Event

Abstract: If your work involves sifting through and making sense of large amounts of data, we welcome you to a session on OpenAleph (openaleph.org), where we can brainstorm (and commiserate) together. We will show practical examples of surfacing interesting leads from leaked data and explore falsely held beliefs that stand in the way of investigators. Making sense of a large volume of data is marketed as a textbook use case for generative AI. We beg to differ.

Description: Faced with large amounts of unstructured data, many users reach for a chatbot. Newsrooms and organizations we work with hope to “ask questions” in a chat window instead of searching through their data. But these same users rightfully demand accuracy and deterministic results: if they don’t find exactly what they are looking for, they doubt the reliability of the entire software stack.

Our experience at the Data and Research Center (darc.li), supporting research and investigations with algorithms and infrastructure, has led to insights about how to answer difficult questions in a deterministic way.

This session will walk the audience through several features that surface names, companies, and other interesting data from large leaks. We will explore the conundrums around search and deduplication, which pose difficult questions for investigators and programmers alike.
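To give a flavor of the deduplication conundrum, here is a minimal sketch of our own (an illustration, not OpenAleph’s actual implementation), using string similarity from the Python standard library. Two spellings of the same name rarely match byte for byte, so exact matching misses duplicates; yet a similarity threshold loose enough to catch them can also merge distinct people.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] from difflib's longest-matching-block heuristic."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Likely the same person, transliterated differently across documents:
same = name_similarity("Mohammed al-Qasim", "Muhammad Al Qasim")  # high, but below 1.0

# Plausibly two different people with similar names:
different = name_similarity("Jon Hanson", "John Hansen")  # also high -- the conundrum
```

Wherever the threshold is set, one of the two cases above is handled wrongly, which is exactly why deduplication decisions are hard to fully automate.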

There are many falsehoods we tend to believe about our world. What is a name, actually? What constitutes a country, and who decides on that? How do you search for words across several languages, all at once? And how do you reveal hidden links in large volumes of data?
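One of these questions can be made concrete with a small sketch (again our own illustration, not OpenAleph code): searching across languages often begins by folding away surface differences such as diacritics, so that variant spellings land on the same search term. Python’s standard-library Unicode tools are enough to show the idea.

```python
import unicodedata

def fold(text: str) -> str:
    """Lowercase and strip diacritics by decomposing characters
    and dropping the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

# "Müller" and "Muller" fold to the same key, so either query finds both:
key = fold("Müller")  # -> "muller"
```

Folding only covers one layer of the problem: matching names across different scripts entirely (say, Cyrillic against Latin) additionally needs transliteration or alias lists, which is part of what the session will dig into.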

All of these questions will be answered without prompting a chatbot, not even once!