Scraping Wikipedia Data - Stuart Geiger

February 17, 2016 at 5-6:30pm in BIDS, 190 Doe Library


About 30 folks!

Stuart Geiger

I’m a postdoc at the Berkeley Institute for Data Science, and I completed my Ph.D. last December at the UC-Berkeley School of Information next door. I’m an ethnographer of science and technology, and I study how people produce knowledge. A big focus of my work is how new technologies change what it means to produce knowledge. I use many different kinds of methods: sometimes I look more like an anthropologist, a historian, or a philosopher, while other times I run surveys, experiments, and large-scale data analyses. My Ph.D. research was about Wikipedia’s volunteer editing community, and I’m now studying the emergence of this thing we like to call data science.

Scraping Wikipedia data

We’ll be using two different resources to query Wikipedia. The first is the Wikipedia API, which directly queries the text of Wikipedia articles. The second is Wikidata, a new project that is trying to store all of the information in Wikipedia articles in a standardized, structured database.
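
As a rough illustration of the two resources, here is a minimal sketch that uses only the Python standard library. It is not the notebook from the talk. The helper names (`wikipedia_extract_url`, `wikidata_entity_url`, `fetch_json`, `extract_text`, `coordinates`) and the User-Agent string are made up for this example. It assumes the English Wikipedia endpoint with the `extracts` query property, the Wikidata `wbgetentities` action, and that geocoordinates are stored under Wikidata property P625.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoints: English Wikipedia and Wikidata.
WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"
WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wikipedia_extract_url(title):
    """Build a URL asking the Wikipedia API for an article's plain text."""
    params = {
        "action": "query",
        "prop": "extracts",   # plain-text extract of the page
        "explaintext": 1,     # strip HTML from the extract
        "titles": title,
        "format": "json",
    }
    return WIKIPEDIA_API + "?" + urlencode(params)


def wikidata_entity_url(title):
    """Build a URL asking Wikidata for the item linked to an enwiki title."""
    params = {
        "action": "wbgetentities",
        "sites": "enwiki",
        "titles": title,
        "props": "claims",    # structured statements about the item
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode the JSON body (needs network access)."""
    # Hypothetical User-Agent; replace with one that identifies your script.
    request = Request(url, headers={"User-Agent": "wikipedia-demo/0.1 (example)"})
    with urlopen(request) as response:
        return json.load(response)


def extract_text(api_response):
    """Map page title -> plain-text extract from a Wikipedia API response."""
    pages = api_response["query"]["pages"]
    return {page["title"]: page.get("extract", "") for page in pages.values()}


def coordinates(entity_response):
    """Map Wikidata item id -> (latitude, longitude) from the first P625 claim."""
    result = {}
    for entity_id, entity in entity_response["entities"].items():
        for claim in entity.get("claims", {}).get("P625", []):
            value = claim["mainsnak"].get("datavalue", {}).get("value")
            if value:
                result[entity_id] = (value["latitude"], value["longitude"])
                break
    return result
```

With network access, something like `extract_text(fetch_json(wikipedia_extract_url("Mount Diablo")))` would return the article text, and `coordinates(fetch_json(wikidata_entity_url("Mount Diablo")))` would return whatever coordinates Wikidata holds for that item. The URL builders and parsers are kept separate from `fetch_json` so they can be tried on saved responses without hitting the servers.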

Things you will need

Lightning Talks

Matthias : Hacker Within mybinder

Go check it out: you can run the THW notebooks from your browser.

Brian : Where is a mountain, anyway

Inspired by the geocoordinates in Stuart's talk, Brian pointed out that putting coordinates on a mountain is tricky. Where is a mountain, anyway?