

How to build suitable datasets for successful detection of audio deepfakes
Deepfakes pose a significant threat to democracy as well as to private individuals and companies. They make it possible to spread disinformation, steal intellectual property, and commit fraud, to name just a few examples. While robust AI detection systems offer a possible solution, their effectiveness depends largely on the quality of the underlying data; simply put: »Garbage in, garbage out.« But how do you create a dataset that is well suited to identifying ever-evolving deepfakes and enables robust detection? And what constitutes high-quality training data?