In today’s world, much of what we do, from exercising to texting, is measured and tracked. These data are constantly harvested by data brokers and social media platforms to build behavioural profiles that are later used by artificial intelligence (AI) powered services. The overwhelming abundance of data has ushered in an age of analytics, and the rise of AI has enabled big-data decision making. However, training AI models is often challenging because supervised training strategies require large amounts of labelled data. Synthetic data is a possible solution to several of these challenges, including data labelling and, more importantly, data privacy. Synthetic data can be generated using advances in rendering pipelines, generative adversarial networks, and diffusion models. It is predicted that most online content will soon be AI-generated.
The aim of this report is to inform the general community of AI practitioners and enthusiasts about the risks (red team perspective) and opportunities (blue team perspective) that synthetic data brings to digital content generation. We first take the red team perspective and examine how synthetic content generation can support disinformation actors in their activities, including context-based content generation. We then turn to the blue team perspective and investigate how open-source AI (machine learning and deep learning) could be used both to carry out and to understand hostile communications: how AI has enabled a new generation of synthetic operations, how current sentiment analysis lags behind, and how more granular text analysis makes it possible to search for micro-aggressive and aggressive text patterns, allowing hostile communications to be identified in online and social media data.
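To make the notion of granular, pattern-level text analysis concrete, the following minimal Python sketch scans messages for aggressive and micro-aggressive phrasings at the span level rather than scoring whole documents, which is where conventional sentiment analysis tends to fall short. The lexicon entries, pattern labels, and function name are illustrative assumptions for this report, not an actual detection system.

```python
import re
from typing import List, Tuple

# Illustrative (hypothetical) lexicon of aggressive and micro-aggressive
# patterns. A real system would use a curated, validated lexicon or a
# fine-tuned classifier rather than this toy list.
AGGRESSIVE_PATTERNS = {
    "threat": re.compile(r"\byou(?:'ll| will) (?:regret|pay)\b", re.IGNORECASE),
    "insult": re.compile(r"\b(?:idiot|pathetic|worthless)\b", re.IGNORECASE),
    # Micro-aggressions are often superficially polite, which is why
    # document-level sentiment scores tend to miss them.
    "micro_dismissal": re.compile(r"\b(?:you people|calm down|so articulate)\b", re.IGNORECASE),
}


def find_hostile_spans(text: str) -> List[Tuple[str, str, int, int]]:
    """Return (label, matched text, start, end) for every pattern hit."""
    hits = []
    for label, pattern in AGGRESSIVE_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0), match.start(), match.end()))
    # Sort hits by position so the output reads in document order.
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    post = "You are so articulate for someone like you. Calm down, nobody cares."
    for label, span, start, end in find_hostile_spans(post):
        print(f"{label:>16}: '{span}' at [{start}:{end}]")
```

A production pipeline would replace the toy lexicon with a curated resource or a trained classifier, but the span-level output illustrates the design point: finer granularity surfaces hostile phrasing that document-level polarity scores average away.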