29C3 - Version 1.9


Speakers
Aylin Caliskan Islam
Rachel Greenstadt
Sadia Afroz
Schedule
Day: 2 (2012-12-28)
Room: Saal 6
Start time: 23:00
Duration: 01:00
Info
ID: 5230
Event type: Lecture
Language used for presentation: English

Stylometry and Online Underground Markets

Stylometry uses linguistic information found in a document to perform authorship recognition. In this talk, we will present how stylometry can be used to deanonymize users in multilingual underground forums. Our initial results show that, in spite of differences in languages and text lengths, standard stylometric methods perform well at identifying users in this context. We will also present an improved version of Anonymouth, a tool for anonymizing written documents, along with results from user studies.

Stylometry identifies the author of an anonymous text by using linguistic features, a topic that we explore in detail at the Privacy, Security, and Automation Lab at Drexel University. After our previous talks at CCC, people have often asked us how well stylometry works on non-English texts and how well translation tools work at anonymizing texts. We will explore these questions in detail in this year's talk. In particular, we have shown that machine translation does not obscure a writer's style: an anonymous text that has been machine-translated can still be attributed to its original author with a 92% true-positive rate.
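To make the basic idea concrete, here is a minimal sketch of stylometric attribution in Python. It is not the lab's method. The tiny function-word list, the character-bigram features, and the cosine-similarity "closest author profile" rule are all illustrative assumptions; practical systems use much larger feature sets and trained classifiers. The sketch represents each text as relative feature frequencies and assigns an unknown text to the candidate author whose combined training text is most similar.

```python
import math
import re
from collections import Counter

# Illustrative assumption: a tiny function-word list. Real stylometric
# feature sets are far larger and richer.
FUNCTION_WORDS = {"the", "a", "and", "of", "to", "in", "is", "it", "that", "i"}


def profile(text: str) -> dict[str, float]:
    """Relative frequencies of function words and character bigrams."""
    text = text.lower()
    feats = Counter()
    for word in re.findall(r"\w+", text):
        if word in FUNCTION_WORDS:
            feats["w:" + word] += 1
    for i in range(len(text) - 1):
        feats["c:" + text[i : i + 2]] += 1
    total = sum(feats.values()) or 1
    return {k: v / total for k, v in feats.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(unknown: str, training: dict[str, list[str]]) -> str:
    """Return the candidate author whose training profile is most similar."""
    target = profile(unknown)
    scores = {
        author: cosine(target, profile(" ".join(texts)))
        for author, texts in training.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    training = {
        "alice": ["I think that the weather is nice and I like it"],
        "bob": ["Selling cvv fresh, pm me for price list!!!"],
    }
    print(attribute("I think that it is nice and that I like the sun", training))
```

The closest-profile rule was chosen only because it fits in a few lines. The structure is the same with a trained classifier: extract features from each text, then compare the unknown text against the known authors.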

Next, we wanted to see what stylometry could do when applied to an interesting real-world dataset containing short texts in multiple languages. To that end, we applied stylometry to leaked underground forums. Cyber-criminals around the world frequently use online forums to establish trade relationships and to exchange illicit goods and services, such as stolen credit card numbers, compromised hosts, spamming, phishing, and online credential theft. These forums are popular with cyber-criminals because they are easily accessible and offer a high degree of anonymity. In this work, we examine several multilingual underground forums, including thebadhackerz.com, blackhatpalace.com, www.carders.cc, free-hack.com, hackel1te.info, hack-sector.forumh.net, rootwarez.org, L33tcrew.org, and antichat.ru. We performed authorship attribution on the users of these forums and have so far achieved 72% correct attribution. However, we believe this number will improve significantly by the time of the talk as we continue our analysis and bring in new features.

Authorship attribution in underground forums requires new features. The text in these forums is multilingual, contains numerical information such as credit card and bank account numbers, and includes many symbols from the URLs and services being shared. These properties set forum text apart from ordinary writing, and we expect a significant increase in accuracy once features capturing them are implemented. We will also present our results on user attribution across forums. The goal is to see whether we can detect users who are active on several forums, or users who hold multiple accounts on the same forum. Multiple accounts are common because these users tend to get banned.
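The forum-specific properties described above could be captured by features along the following lines. This Python sketch is hypothetical, and several choices in it are illustrative assumptions rather than the feature set used in this work:

* the feature names;
* the regular expressions, for example treating a run of 13 to 19 digits as "card-like";
* using the share of non-ASCII letters as a rough proxy for multilingual text.

```python
import re

# Illustrative patterns (assumptions, not the feature set used in the talk).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# 13 to 19 digits, each optionally followed by a single space or hyphen
# (card-number-like sequences).
CARD_LIKE_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def forum_features(post: str) -> dict[str, float]:
    """Hypothetical features for short, symbol-heavy, multilingual forum posts."""
    n_chars = max(len(post), 1)
    urls = URL_RE.findall(post)
    # Remove URLs before the number and letter features, so URL contents
    # are not counted there. digit_ratio and symbol_ratio deliberately
    # include URL characters.
    rest = URL_RE.sub(" ", post)
    letters = [ch for ch in rest if ch.isalpha()]
    non_ascii = sum(1 for ch in letters if ord(ch) > 127)
    symbols = sum(1 for ch in post if not ch.isalnum() and not ch.isspace())
    return {
        "url_count": float(len(urls)),
        "card_like_numbers": float(len(CARD_LIKE_RE.findall(rest))),
        "digit_ratio": sum(ch.isdigit() for ch in post) / n_chars,
        "symbol_ratio": symbols / n_chars,
        # Rough proxy for non-Latin or accented text (e.g. Cyrillic, umlauts).
        "non_ascii_letter_ratio": non_ascii / max(len(letters), 1),
    }


if __name__ == "__main__":
    print(forum_features("Продаю cc 4111 1111 1111 1111 see http://example.com/x"))
```

In a setup like this, such values would typically be appended to general-purpose stylometric features rather than replace them.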

Finally, we present improvements we have made to Anonymouth, the tool we presented at 28C3. Anonymouth helps writers anonymize their text by suggesting changes for them to make.