CS7180: Special Topics in AI: Text Modeling for the Humanities and Social Sciences
Fall 2017
Class meeting: Tuesdays, 11:45-1:25, and Thursdays, 2:50-4:30, Ryder 124
Instructor: David Smith, Assistant Professor in Computer and Information Science
(Office Hours: Thursdays, 12-2, or by appointment; WVH 356)
Course Description
Researchers and archivists have been digitizing the source
materials for human history and culture for over half a century, but
two further factors sped the emergence in the last decade of the
digital humanities and computational social sciences. First,
industrial-scale scanning projects have increased the available
evidence beyond the ability of individual scholars to manage it;
second, born-digital traces of our social, cultural, economic, and
political lives have become practically archivable and searchable on
a massive scale. Much of this data is text (“unstructured,” as the
database people might say), providing opportunities for advances in
natural language processing.
In this seminar, we will read and discuss papers about building
models of text to answer questions in the humanities and social
sciences. Students will take turns presenting and leading
discussion of papers along with the relevant background material.
All students will write short reviews of the papers we read and
complete a course project and accompanying report.
Prerequisites
There are no official prerequisites; however, it is expected that
students have some background either in NLP, in machine learning, or
in working with text computationally in the humanities or social
sciences.
Syllabus
Each week, we will read roughly two papers on a common theme. The
papers could be tied together by methodology—e.g., text
categorization or convolutional neural networks—or by subject
matter—e.g., criminology or plot analysis.
- September 7: Introduction:
Human language and culture meet Big Government, Big Business,
Big Science, and Big Data. I ended up talking about several
books from, modally, 1983:
- Walter J. Ong. Orality and Literacy: The
Technologizing of the Word, Routledge, 1982.
- Elizabeth Eisenstein. The Printing Press as an Agent
of Change: Communications and Cultural Transformations in
Early-Modern Europe, Cambridge University Press,
1979.
- Ernest Gellner. Nations and Nationalism,
Cornell University Press, 1983.
- James C. Scott. Seeing Like a State: How Certain
Schemes to Improve the Human Condition Have Failed, Yale
University Press, 1998.
- Benedict Anderson. Imagined Communities: Reflections
on the Origins and Spread of Nationalism, Verso,
1983.
- Georges Lefebvre. John Albert White, trans. The Great
Fear of 1789: Rural Panic in Revolutionary France,
Princeton University Press, 1983.
- September 12: Models of Text in the Social Sciences and Humanities
- September 14: Bags of words and other text representations
- September 19: Word vectors and distributed representations
- September 21: Text categorization
- September 26: Language models and topic models
- September 28: Dynamic models and temporal change
- October 3: Entity and relation extraction
- October 5: Discuss project ideas
- October 10: Plot and character
- October 12: Language and power relations
- October 17: Geographical and social variation
- October 19: Dialogue and argumentation
- October 24: Text Reuse
- October 26: Information cascades
- October 31: Community structure and communication
- November 2: Exploratory Data Analysis
- November 7: Document analysis and recognition
- Maria Ryskina, Hannah Alpert-Abrams, Dan Garrette, Taylor Berg-Kirkpatrick.
Automatic compositor attribution in the First Folio of
Shakespeare. In ACL, 2017. (Sreekumar)
- Verónica Romero, Alicia Fornés, Nicolás Serrano, Joan
Andreu Sánchez, Alejandro H. Toselli, Volkmar Frinken, Enrique
Vidal, and Josep Lladós.
The ESPOSALLES database: An ancient marriage license corpus for
off-line handwriting recognition. Pattern
Recognition 46(6):1658–1669, 2013. (Gundogdu)
- November 9: Preliminary project presentations
- November 14: Laws and legislatures
- November 16: Crime and Harm
- Rob Voigt, Nicholas P. Camp, Vinodkumar Prabhakaran, William
L. Hamilton, Rebecca C. Hetey, Camilla M. Griffiths, David
Jurgens, Dan Jurafsky, and Jennifer L. Eberhardt.
Language from police body camera footage shows racial disparities in
officer respect. Proceedings of the National
Academy of Sciences, 114(25):6521–6526, 2017. (Gallagher)
- Munmun De Choudhury, Emre Kiciman, Mark Dredze, Glen
Coppersmith, and Mrinal Kumar.
Discovering shifts to suicidal ideation from mental health content in
social media. In CHI, 2016. (Robertson)
- November 21: Causal Inference
- November 23: Thanksgiving: No Class
- November 28: Final project presentations
- MacLaughlin
- Gallagher
- Murali and Sreekumar
- Robertson
- November 30: Final project presentations
- McCabe and Sherkati
- Muther
- Gundogdu
- Nye
- Pandya
- December 5: Some other topic
- December 7: Interpretation