Cybersecurity company Endgame, which provides security solutions for preventing attacks and detecting threats, has released a large data set that can be used for training AI-based security systems. In a research paper they recently published, Endgame’s Hyrum S Anderson and Phil Roth describe EMBER – a “benchmark dataset for training machine learning models to statically detect malicious Windows portable executable files”.
With the number of security threats constantly rising and malware variants multiplying daily as threat actors use automated tools to create bespoke payloads, AI is becoming increasingly important to defenders. But one of the challenges of building AI systems is obtaining datasets large enough to train them. Endgame is hoping EMBER (Endgame Malware BEnchmark for Research) will change that.
The EMBER research paper describes the dataset, noting that it covers more than a million different files, both malicious and benign. These samples can be used to train AI systems.
This is a sign of the changing nature of the cybersecurity industry. In the past, companies held such data tightly, hoping to use it as a market advantage. But the emergence of the dark web, which supports a shadow economy where bad guys can cooperate to share skills and resources, has changed the threat landscape. Consequently, the people we trust to help defend us against threats are responding by sharing information and working together.
The release of EMBER is another step along that road.