21C3 Schedule Release 1.1.7

21st Chaos Communication Congress
Lectures and workshops

Speakers
Tim Starling
Brion Vibber
Schedule
Day 1
Location Saal 1
Start Time 15:00 h
Duration 01:00
INFO
ID 63
Type Lecture
Track Hacking
Language english

Scaling the Wiki Beyond 1 Million

Wikipedia developers chronicle the evolution of the MediaWiki software and server farm that manage the popular and ever-growing editable encyclopedia. The talk outlines planned scalability improvements and announces a developers' conference and hacking session.

Since 2001, the Wikipedia encyclopedia project has jumped from a scratchpad side project to one of the top 500 sites on the web (Alexa stats), bringing community and media attention to both wikis and open-licensed content.

While some dispute its quality, Wikipedia's quantity is undeniable: at over one million pages in dozens of languages, Wikipedia is the largest, most populous WikiWiki site on the net. Where traditional wikis have tended to be relatively small communities based on some topical interest, Wikipedia actively seeks attention, visitors, and editors with an open-ended mission to document virtually any topic, from the philosophers of Athens to Slashdot trolling fads.

Phenomenal growth in the editing community, non-editing visitors, and the number of topics covered has put continual social and technical pressure on the scalability of the system. The wiki engine MediaWiki has grown up along with the project, pursuing two sometimes-conflicting goals: being easy to install and use, and performing reasonably well in a multimillion-hit-per-day environment.

Built in the scripting language PHP, MediaWiki attempts to boost performance with "alternate hard and soft layers" of code: most hits to Wikipedia are actually handled by a Squid reverse proxy cache, which is faster than any PHP script could be. Pre-parsed page chunks and dynamically editable user interface data are optionally cached using LiveJournal's distributed memory object cache memcached or the compatible disk-backed Tugela Cache. Ongoing work is being put into accelerated native-code diff, parsing, and Unicode normalization modules, while still retaining compatibility with "pure" PHP code for use in more restricted environments.
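
To make the idea of caching pre-parsed page chunks concrete, here is a minimal cache-aside sketch in Python. It is not MediaWiki's code: the class `ParserCache`, the key format `pcache:title:rev_id`, and the renderer `toy_render` are all hypothetical names invented for illustration, and a plain dictionary stands in for a memcached client.

```python
from typing import Callable, Dict


class ParserCache:
    """Cache-aside store for rendered page HTML, keyed by title and revision.

    Hypothetical illustration only; a dict stands in for memcached.
    """

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render              # the expensive parsing step
        self._store: Dict[str, str] = {}   # stand-in for a memcached client
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(title: str, rev_id: int) -> str:
        # Including the revision ID means an edit produces a new key,
        # so stale HTML is never served for the new revision.
        return f"pcache:{title}:{rev_id}"

    def get_html(self, title: str, rev_id: int, wikitext: str) -> str:
        k = self.key(title, rev_id)
        cached = self._store.get(k)
        if cached is not None:
            self.hits += 1
            return cached
        # Cache miss: parse once, then store the result for later requests.
        self.misses += 1
        html = self._render(wikitext)
        self._store[k] = html
        return html


def toy_render(wikitext: str) -> str:
    # Toy renderer (assumption): strips bold markers and wraps a paragraph.
    return "<p>" + wikitext.replace("'''", "") + "</p>"


if __name__ == "__main__":
    cache = ParserCache(toy_render)
    text = "'''Berlin''' is a city."
    cache.get_html("Berlin", 1, text)   # parsed and stored
    cache.get_html("Berlin", 1, text)   # served from the cache
    print(cache.hits, cache.misses)
```

In a layered setup like the one described above, a front-end proxy cache would answer most anonymous page views before a lookup of this kind is ever reached; the object cache only saves work for requests that get through to the application.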

Additionally, a MediaWiki developers' conference is planned for discussion and work on the next major version of the software, which will carry sites like Wikipedia through 2005 and beyond. Developers will give a peek at upcoming storage back-end improvements to handle an ever-growing text corpus.