Session:Deanonymization and Author Recognition with Digital Humanities-Tools
Description | Digital Humanities research has developed computational methods for attributing anonymous texts, implemented in open source tools. Participants will learn about the methodological foundations of authorship attribution, its possibilities and its limitations. They will then apply it hands-on to a sample text collection and/or their own examples (you can bring your own texts). No programming experience is required, but having the R programming language installed beforehand would be helpful. |
---|---|
Website(s) | https://hackmd.okfn.de/34c3-stylometrie?view |
Type | Workshop |
Kids session | No |
Keyword(s) | software, art, hacking |
Tags | Assembly:Open Knowledge Assembly, Digital Humanities, Stylometrics, Text Analysis, Deanonymization, Workshop |
Processing assembly | Assembly:Open Knowledge Assembly |
Person organizing | User:Pielstroem |
Language | de - German, en - English |
Starts at | 2017/12/29 13:15 |
Ends at | 2017/12/29 14:45 |
Duration | 90 minutes |
Location | Room:Esszimmer |
The process of writing a text is driven by many, often unconscious, habits that influence writing style. These habits produce recurring features that can be used to identify an anonymous author. In recent decades, researchers interested in literary works of disputed authorship have developed computational methods to attribute such texts to candidate authors, and have published open source tools that perform this task.
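To make the idea of recurring stylistic features concrete, here is a minimal sketch in Python. It is not the method or tool used in the workshop. It assumes that style can be approximated by the relative frequencies of a few very common words, and it compares texts with a plain Manhattan distance. The function names, the word list, and the toy texts are all invented for illustration. Real analyses use much longer texts and far more words.

```python
import re
from collections import Counter


def profile(text, vocabulary):
    """Relative frequency of each vocabulary word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero for empty texts
    return [counts[w] / total for w in vocabulary]


def distance(a, b):
    """Manhattan distance between two frequency profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def attribute(anonymous, candidates, vocabulary):
    """Return the candidate author whose profile is closest to the anonymous text."""
    target = profile(anonymous, vocabulary)
    return min(
        candidates,
        key=lambda author: distance(target, profile(candidates[author], vocabulary)),
    )


# Toy data, invented for illustration only (hypothetical authors and texts).
vocabulary = ["the", "and", "of", "but"]
candidates = {
    "AuthorA": "the cat and the dog and the bird",
    "AuthorB": "but of course, but of late, but of old",
}
guess = attribute("the fox and the hen", candidates, vocabulary)
print(guess)
```

The closest candidate is only ever the best match among the authors supplied. If the true author is not among the candidates, such a comparison will still name someone, which is one of the limitations of the approach.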
Participants will learn about the methodological foundations of authorship attribution, its possibilities and its limitations. They will then apply it hands-on to a sample text collection and, if they wish, to other examples of their own choosing.
No programming experience is required, but having the R programming language installed beforehand would be helpful. Participants are invited to bring their own examples in txt or xml format, with one file per text. A single anonymous text will not do: the method requires as many comparison texts from candidate authors as possible. Please name comparison text files according to an “AuthorName_TextIdentifyer.txt” pattern.
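The naming pattern above lets a program read the author of each comparison text from its file name. The following Python sketch shows one way to check file names and group them by author. It assumes the author name itself contains no underscore. The helper names and the example file names are hypothetical, and the workshop tools may parse names differently.

```python
from collections import defaultdict
from pathlib import PurePath


def split_name(filename):
    """Split 'AuthorName_TextIdentifyer.txt' into (author, text_id)."""
    stem = PurePath(filename).stem  # drops the .txt or .xml extension
    author, sep, text_id = stem.partition("_")  # split at the first underscore
    if not sep or not author or not text_id:
        raise ValueError(f"{filename!r} does not follow the Author_Text pattern")
    return author, text_id


def group_by_author(filenames):
    """Map each author to the list of text identifiers found for them."""
    groups = defaultdict(list)
    for name in filenames:
        author, text_id = split_name(name)
        groups[author].append(text_id)
    return dict(groups)


# Hypothetical file names, for illustration only.
corpus = group_by_author(
    ["Austen_Emma.txt", "Austen_Persuasion.txt", "Shelley_Frankenstein.xml"]
)
print(corpus)
```

A file without an underscore, such as `unknown.txt`, raises a `ValueError`, so naming mistakes surface before any analysis starts.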