Towards Consensus Labeling of Malware Threats

The unprecedented volume and variety of malware threats (e.g., viruses, Trojan horses, worms) have spurred intensive research on large-scale malware analysis in both academic and industrial communities; yet the knowledge bases built by these efforts have not yet been leveraged collectively to any large extent. One fundamental barrier to integrating threat intelligence is the lack of malware labeling standards. We demonstrate the severity of this problem through an in-depth empirical study of the labeling systems of five popular anti-virus engines, using a large collection of malware instances. Instead of attempting to unify malware naming conventions, we propose a pragmatic alternative: leveraging correspondence evidence from multiple anti-virus sources to create a virtual, consensus malware categorization, so that different anti-virus vendors can communicate through this consensus scheme without changing their local naming conventions. We present a prototype malware label matching system, LATIN, that makes it possible to tell, simply from their names, whether two malware samples labeled under different naming conventions refer to the same malware category.

By: Ting Wang, Xin Hu

Published in: RC25288 in 2012


This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

