28th Chaos Communication Congress
Behind Enemy Lines

Speakers: Michael Brennan, Rachel Greenstadt

Schedule
Day: Day 3 - 2011-12-29
Room: Saal 3
Start time: 16:00
Duration: 01:00

Info
ID: 4781
Event type: Lecture
Track: Hacking
Language used for presentation: English

Deceiving Authorship Detection

Tools to Maintain Anonymity Through Writing Style & Current Trends in Adversarial Stylometry

Stylometry is the art of detecting authorship of a document based on the linguistic style present in the text. As authorship recognition methods based on machine learning have improved, they have also presented a threat to privacy and anonymity. We have developed two open-source tools, Stylo and Anonymouth, which we will release at 28C3 and introduce in this talk. Anonymouth aids individuals in obfuscating documents to protect identity from authorship analysis. Stylo is a machine-learning-based authorship detection research tool that provides the basis for Anonymouth's decision making. We will also review the problem of stylometry and its privacy implications, present new research on detecting writing style deception and on threats to anonymity in short message services like Twitter, examine the implications for languages other than English, and release a large adversarial stylometry corpus for linguistic and privacy research purposes.

Stylometry is the study of authorship recognition based on linguistic style (word choice, punctuation, syntax, etc.). Adversarial stylometry examines authorship recognition in the context of privacy and anonymity through attempts to circumvent stylometry with passages intended to obfuscate or imitate identity.
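To make the idea of style features concrete, here is a minimal Python sketch of authorship attribution. It is not the Stylo implementation: the feature set (a handful of function words, a few punctuation marks, average word length), the names FUNCTION_WORDS, PUNCTUATION, features, distance, and attribute, and the nearest-sample rule are all assumptions chosen for illustration. Research systems use far richer features and trained classifiers.

```python
import math
import re
from collections import Counter

# Hypothetical feature set, chosen only for illustration.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "but", "however"]
PUNCTUATION = [",", ";", ":", "!", "?"]


def features(text):
    """Map a text to a vector of simple style features."""
    words = re.findall(r"[a-z']+", text.lower())
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    counts = Counter(words)
    # Word choice: relative frequency of each function word.
    vec = [counts[w] / n_words for w in FUNCTION_WORDS]
    # Punctuation: marks per word.
    vec += [text.count(p) / n_words for p in PUNCTUATION]
    # A crude lexical feature: average word length.
    vec.append(sum(len(w) for w in words) / n_words)
    return vec


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def attribute(unknown_text, known_samples):
    """Return the author whose known sample is closest in feature space.

    known_samples maps an author name to one sample of that author's writing.
    """
    target = features(unknown_text)
    return min(
        known_samples,
        key=lambda author: distance(target, features(known_samples[author])),
    )
```

The features here are left unscaled, so average word length tends to dominate the distance. A real system would normalize each feature and would need much more text per author than a sentence or two.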

This talk will introduce the open-source authorship recognition and obfuscation projects Anonymouth and Stylo. Anonymouth aids individuals in obfuscating their writing style in order to maintain anonymity against multiple forms of machine-learning-based authorship recognition techniques. The basis for this tool is Stylo, an authorship recognition research tool that implements multiple state-of-the-art stylometry methods. Anonymouth uses Stylo to attempt authorship recognition and suggest changes to a document that will obfuscate the identity of the author to the known set of authorship recognition techniques.
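The recognize-then-suggest loop described above can be illustrated with a small Python sketch. This is not Anonymouth's actual algorithm: the function suggest_changes, the feature names in the usage note, and the rule of nudging a feature toward a population average are hypothetical, and the sketch assumes all features are already on a comparable scale (for example, occurrences per 1000 words).

```python
def suggest_changes(doc_features, author_profile, population_mean, top_n=3):
    """Rank the features that most tie a document to its author.

    All three arguments map a feature name to a numeric value. A feature is
    flagged when the document's value sits closer to the author's usual value
    than to the population average. Hypothetical heuristic, for illustration.
    """
    flagged = []
    for name, value in doc_features.items():
        target = population_mean[name]
        gap = value - target
        if gap == 0:
            continue  # already matches the crowd on this feature
        if abs(value - author_profile[name]) < abs(gap):
            direction = "decrease" if gap > 0 else "increase"
            flagged.append((abs(gap), name, direction, target))
    # Largest gap first: these features are assumed to be the most revealing.
    flagged.sort(reverse=True)
    return [(name, direction, target) for _, name, direction, target in flagged[:top_n]]
```

For instance, given a document with 12 semicolons per 1000 words, an author who usually writes 11, and a population average of 3, the sketch would suggest decreasing semicolon use toward 3. A practical tool would re-run the classifier after each revision to check whether the author is still identified.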

We will also cover our recent work in the field of adversarial authorship recognition in the two years since our 26C3 talk, "Privacy & Stylometry: Practical Attacks Against Authorship Recognition Techniques." Our lab has new research on detecting deception in writing style that may indicate a modified document, demonstrating up to 86% accuracy in detecting the presence of deceptive writing styles. Short messages have been difficult to attribute to an author, but recent work from our lab demonstrates the threat to anonymity present in short message services like Twitter. We have found that, while difficult, it is possible to identify authors of tweets with success rates significantly higher than random chance. We also have new results that examine the ability of authorship recognition to succeed across languages and the use of translation to thwart detection.

This talk will also mark the release of an adversarial stylometry data set that is many times larger than our previous release. This data set, provided by volunteers, includes at least 6500 words of unmodified writing per author, sample adversarial passages intended to preserve the author's anonymity, and demographic information for each author.

The content of this talk will be relevant to those with interest in novel issues in privacy and anonymity, forensics and anti-forensics, and machine learning. All of the work presented here is from the Privacy, Security and Automation Lab at Drexel University. Founded in 2008, our lab focuses on the use of machine learning to augment privacy and security decision making.