Session:Deanonymization and Author Recognition with Digital Humanities-Tools

From 34C3_Wiki
Description Digital Humanities research has developed computational methods for attributing anonymous texts, along with open source tools that implement them. Participants will learn about the methodological foundations of authorship attribution, its possibilities and its limitations. They will then apply it hands-on to a sample text collection and/or their own examples (you can bring your own texts). No programming experience is required, but having the R programming language installed beforehand would be helpful.
Website(s) https://hackmd.okfn.de/34c3-stylometrie?view
Type Workshop
Kids session No
Keyword(s) software, art, hacking
Tags Assembly:Open Knowledge Assembly, Digital Humanities, Stylometrics, Text Analysis, Deanonymization, Workshop
Processing assembly Assembly:Open Knowledge Assembly
Person organizing User:Pielstroem
Language de - German, en - English

Starts at 2017/12/29 13:15
Ends at 2017/12/29 14:45
Duration 90 minutes
Location Room:Esszimmer

The process of writing a text is driven by many, often unconscious, habits that influence writing style, resulting in recurring features that can be used to identify an anonymous author. In recent decades, researchers interested in literary works of disputed authorship have developed computational methods to attribute such texts to candidate authors, and have published open source tools that perform this task.
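To make the idea of recurring stylistic features more concrete, here is a minimal sketch of one widely used attribution measure, Burrows's Delta, which compares how often each text uses the most frequent words of the corpus. It is an illustration under stated assumptions, not necessarily the method or tooling used in the workshop: the function names, the simple `\w+` tokenizer and the default of 50 words are all choices made for this example.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def relative_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary word in one text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty texts
    return [counts[word] / total for word in vocabulary]


def burrows_delta(known_texts, unknown_text, n_words=50):
    """Rank known texts by their Delta distance to the unknown text.

    known_texts: dict mapping a label (e.g. an author name) to a text.
    Returns a list of (label, distance) pairs, smallest distance first.
    """
    # Vocabulary: the n most frequent words across all known texts.
    corpus_counts = Counter()
    for text in known_texts.values():
        corpus_counts.update(tokenize(text))
    vocabulary = [word for word, _ in corpus_counts.most_common(n_words)]

    profiles = {label: relative_frequencies(text, vocabulary)
                for label, text in known_texts.items()}

    # Per-word mean and standard deviation across the known texts.
    columns = list(zip(*profiles.values()))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid division by zero

    def z_scores(profile):
        return [(f - m) / s for f, m, s in zip(profile, means, stds)]

    target = z_scores(relative_frequencies(unknown_text, vocabulary))
    distances = {
        label: mean(abs(a - b) for a, b in zip(z_scores(profile), target))
        for label, profile in profiles.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])
```

Calling `burrows_delta({"AuthorA": text_a, "AuthorB": text_b}, anonymous_text)` returns the candidates ordered from most to least similar. A small distance only indicates stylistic similarity among the candidates supplied; it does not prove authorship, and results on short texts or small comparison sets are unreliable.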

Participants will learn about the methodological foundations of authorship attribution, its possibilities and its limitations. They will then apply it hands-on to a sample text collection and, if they wish, to other examples of their own choosing.

No programming experience is required, but having the R programming language installed beforehand would be helpful. Participants are invited to bring their own examples in txt or xml format, with one file per text. A single anonymous text will not do: the method requires as many comparison texts from candidate authors as possible. Please name comparison text files according to an “AuthorName_TextIdentifyer.txt” pattern.
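If you want to check your file names before the session, the following is a small sketch of how the naming pattern could be read by a script. It assumes that the author name itself contains no underscore, so the name is split at the first underscore; the function names and the example file names are hypothetical and only serve as illustration.

```python
from collections import defaultdict
from pathlib import Path


def parse_text_filename(filename):
    """Split a name like 'AuthorName_TextIdentifyer.txt' into (author, text_id).

    Assumption: the author part contains no underscore, so we split at the first one.
    """
    path = Path(filename)
    if path.suffix.lower() not in {".txt", ".xml"}:
        raise ValueError(f"expected a .txt or .xml file: {filename}")
    author, separator, text_id = path.stem.partition("_")
    if not separator or not author or not text_id:
        raise ValueError(f"file name does not follow the pattern: {filename}")
    return author, text_id


def group_by_author(filenames):
    """Map each author to the list of text identifiers found for that author."""
    groups = defaultdict(list)
    for name in filenames:
        author, text_id = parse_text_filename(name)
        groups[author].append(text_id)
    return dict(groups)
```

For example, `group_by_author(["Austen_Emma.txt", "Austen_Persuasion.txt"])` groups both files under the author `Austen`, while a file called `anonymous.txt` is rejected because it has no author part.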