What is TextXD?

TextXD aims to foster cross-pollination among researchers who work with natural language data, whether they identify as computer, social, data, or information scientists, including linguists.


Berkeley NLP Seminar

Weekly one-hour seminar on the latest topics in the field of Natural Language Processing. Researchers from across UC Berkeley as well as visitors from out of town present their recent work for discussion and feedback.

The Berkeley NLP Group

A part of the UC Berkeley Computer Science division working in the following areas: Linguistic analysis, Machine translation, Computational linguistics, Grounded semantics, Unsupervised learning.


A beginner/intermediate/advanced group of social scientists learning NLP tools together through a group project, and creating re-usable code for every step of a text analysis project.


The MetaNet project seeks to systematically identify and analyze the metaphors that people use to discuss and reason about a broad range of topics and domains.


The FrameNet project is building a lexical database of English that is both human- and machine-readable, based on annotating examples of how words are used in actual texts.


Text Thresher improves the social science practice of content analysis, making it vastly more transparent and scalable to millions of documents.

Literature + Digital Humanities

A beginners’ group that explores a subset of DH tools and theory suited to the study of literature in a variety of languages.


TextXD Fall 2017

TextXD's fall 2017 conference will take place at BIDS on Thursday, November 30 and Friday, December 1. Find the schedule and more details here.

David Bamman To Speak at EECS Colloquium

David Bamman, assistant professor in the School of Information at UC Berkeley, will speak at the EECS Colloquium on Wednesday, November 30, from 4PM-5PM in 306 Soda Hall. His talk, "The Large-Scale Analysis of Books," will outline the opportunities and challenges involved in "distant reading."

David Bamman Kicks Off NYU's Fall 2016 Text as Data Speaker Series

David Bamman, assistant professor in the School of Information at UC Berkeley, is kicking off the Fall 2016 Text as Data Speaker Series, hosted by the Center for Data Science at New York University. Now in its third semester, the speaker series has hosted a wide range of experts on NLP, including Laura Nelson (UC Berkeley Postdoc), Hanna Wallach, David Blei, Molly Roberts, Brandon Stewart, and Noah Smith. Bamman works on applying natural language processing and machine learning to empirical questions in the humanities and social sciences. He recently presented at the 2016 CSCW Workshop on Human-Centered Data Science, advocating for the role of interpretation in predictive models.

Berkeley Team, Led by Marti Hearst, Placed Second at the 2016 PoetiX Competition

Professor Hearst is currently leading a team of researchers who are designing a computer system that writes poetry. Their team placed second in the 2016 PoetiX competition, a Turing Test competition hosted by the Neukom Institute that requires computers to compose sonnets, an especially constrained poetic form.

Berkeley NLP Seminar Meeting at a New Time This Semester

Professor Marti Hearst is leading the Fall 2016 Berkeley NLP Seminar, which is meeting at a new time this semester: Mondays from 3:30-4:30pm in room 202 South Hall. The seminar has hosted two speakers so far this semester, with the next speaker, Sida Wang, scheduled for October 3.

The MetaNet Project Releases First Public Inventory of Conceptual Metaphors and Frames

On August 7, 2016, the MetaNet Project released a public, structured inventory of conceptual metaphors and frames, found here. The goal of the MetaNet Project, a cross-disciplinary effort, is to use computational methods to analyze metaphors. The beta version of this frame repository will be expanded on an ongoing basis, so check back for further updates.

UC Berkeley Libraries Hires New Digital Humanities Librarian

On August 8, 2016, UC Berkeley welcomed its new Literatures & Digital Humanities Librarian, Stacy Reardon. This position is meant to facilitate faculty and student projects relating to the digital humanities, and is part of a larger, campus-wide expansion of the digital humanities at Berkeley.

Using NLP to Enhance Literary Analysis

How do literary scholars use NLP? Teddy Roland, previously an instructor at UC Berkeley, now starting a PhD program at UCSB, covers a number of ways humanists can use NLP to enhance literary analysis. His blog post highlights the rapidly growing domain of cultural analytics, which you can follow via the new Journal of Cultural Analytics.

Sign Up for the NYU Data Science Community Newsletter for Relevant NLP and Data Science News

The NYU Data Science Community Newsletter covers all things data science, with a particular focus on NYU and UC Berkeley. They often cover news, events, and jobs related to NLP, and I find it a useful way to keep up with what's happening in the community. In their recent edition they covered a fascinating paper, currently under review, that analyzes the interaction between maintenance bots on Wikipedia:

Wikipedia's bots fighting Sisyphean culture wars (written by Laura Norén, one of the editors of the newsletter):

Taking a closer look at bot vs. bot competition, Milena Tsvetkova et al. find that "Even Good Bots Fight," using Wikipedia data to show that Wikipedia's maintenance bots are constantly adding and deleting each other's edits. What's more, just like humans, Wikipedia bots exhibit cultural differences. As Pedro Domingos points out in his 10 Myths about Machine Learning, not all learning algorithms start with a blank slate; some use data to refine a preexisting body of knowledge. And that's how we get bots tirelessly waging quiet editorial culture wars on Wikipedia.

Explore more!

Check out some of our other websites: