Current techniques for topic modelling and narrative detection are optimised primarily for English and other widely spoken languages, and they perform less well on less widely spoken ones. These languages, including those of the Baltic region, often carry unique cultural and contextual nuances that models trained on major languages capture poorly, which leads to inaccurate or incomplete analysis. In strategic communications, these limitations pose a significant challenge to analysing the digital information environment effectively. Furthermore, equalising strategic communications capabilities across allies is essential.
As highlighted in our report, AI in Support of StratCom Capabilities, this research aims to bring parity in strategic communication tools and practices among allies. Disinformation transcends geographic boundaries, and technological limitations in the digital space necessitate the development of robust capabilities for defence against such challenges. The research also aims to identify gaps in current methodologies that require further exploration and resolution. This report evaluates the potential for topic modelling and narrative detection in the languages of the Baltic States: Estonian, Latvian, and Lithuanian. Specifically, we focus on the annotated datasets and open-source tools suitable for these tasks.