
Wikipedia anniversary

April 4, 2011

Our interactive chart shows how the German Wikipedia has developed over the past ten years. What kind of data did we use? How did we visualize it? Here are the answers.

https://p.dw.com/p/10lsr
Incomplete Wikipedia globe
Wikipedia is one of the most popular websites worldwide (Image: by-nc-sa)

What kind of data set did we use in order to create the visualization?

Wikipedia is run by the non-profit organization Wikimedia, which offers Wikipedia data sets for free at dumps.wikimedia.org. To visualize daily activity, we went through the raw data files covering all edits to the German Wikipedia from 2001 to 2011. The other data we needed had already been analyzed by Wikimedia and is presented in tabular form on Wikimedia's site.

How large are the amounts of data which were used for the visualization?

The raw data is a single 20-gigabyte XML document. It includes a list of the titles of all Wikipedia articles. In addition, a revision history (or edit history) is kept for every article, documenting who changed the article, when, and to what extent.

How is it even possible to analyze these large amounts of data?

The only practical way to analyze such a huge amount of data is to write a so-called scraper, a small program that searches through the data according to specified criteria and automatically retrieves whatever matches. Scrapers are generally used to retrieve information from less structured data sources, such as websites.
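To give a rough idea of what such a program can look like, here is a minimal Python sketch that streams through an XML dump and counts edits per month. It is not the code used for the chart. The element names (page, revision, timestamp) and the timestamp format are assumptions about the dump layout, and the tiny SAMPLE document is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}revision'."""
    return tag.rsplit("}", 1)[-1]


def edits_per_month(source):
    """Count revisions per 'YYYY-MM' while streaming through the dump."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "revision":
            for child in elem:
                # Assumed ISO 8601 timestamps, e.g. 2004-03-17T09:12:45Z
                if local_name(child.tag) == "timestamp" and child.text:
                    counts[child.text[:7]] += 1
            elem.clear()  # drop the finished revision to keep memory low
        elif name == "page":
            elem.clear()  # drop the finished page as well
    return counts


# Made-up miniature dump, only to show the shape of the input.
SAMPLE = b"""<mediawiki>
  <page>
    <title>Beispiel</title>
    <revision><timestamp>2004-03-17T09:12:45Z</timestamp></revision>
    <revision><timestamp>2004-03-20T11:00:00Z</timestamp></revision>
    <revision><timestamp>2004-04-01T08:30:00Z</timestamp></revision>
  </page>
</mediawiki>"""

counts = edits_per_month(io.BytesIO(SAMPLE))
for month in sorted(counts):
    print(month, counts[month])
```

Reading the file as a stream rather than loading it whole is the important design choice here: a 20-gigabyte document would not fit into the memory of an ordinary computer, so each article is discarded as soon as it has been counted.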

Why was a spiral pattern chosen to visualize the data?

Data visualization
A spiral pattern was chosen to visualize the data

The analyzed data was available by month. To find out whether the development of Wikipedia is linear or fluctuates significantly (and can thus be considered seasonal), we needed a format that makes the result visible at first glance.

This is very easy to see in the spiral pattern. If Wikipedia had grown in a strictly linear way, the chart would show a perfectly regular spiral. But this is not the case. A few parameters, such as "New Articles per Day", show anomalies that appear as visible bumps. If you click on "Active Wikipedians", you can see at first glance that the number of Wikipedians has leveled off since 2008 and has even decreased.
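The idea behind a spiral chart can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the code behind the interactive chart: the angle stands for the calendar month (one full turn per year) and the distance from the center stands for the value. The function name spiral_points and the sample numbers are made up.

```python
import math


def spiral_points(monthly_values, months_per_turn=12):
    """Map a monthly series to (x, y) points on a spiral.

    The angle encodes the position within the year and the radius
    encodes the value, so steady linear growth gives an even spiral.
    """
    points = []
    for i, value in enumerate(monthly_values):
        angle = 2 * math.pi * i / months_per_turn
        points.append((value * math.cos(angle), value * math.sin(angle)))
    return points


# Made-up series: two years of perfectly steady growth.
linear = [10 * (i + 1) for i in range(24)]
points = spiral_points(linear)

# Distance of each point from the center of the chart.
radii = [math.hypot(x, y) for x, y in points]
```

With steady growth, each point lies the same distance further out than the point one year earlier, which produces an evenly spaced spiral. A month with unusually many or few new articles would break that spacing and show up as a bump.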

Authors: Steffen Leidel, Gregor Aisch (sst)
Editor: Cyrus Farivar