Alexandre Letois
Last known affiliation: EPITA
Robert Erra 🗣 | Sébastien Larinier 🗣 | Alexandre Letois | Marwan Burelle
Abstract
Malware is now developed at an industrial scale, and human analysts need automated tools to help them.
We present the results of our experiments on a difficult problem: how to cluster a very large set of malware samples, using only static information, so that new malware can be classified. Clustering a set of (numerical) objects means grouping them into meaningful categories: objects in the same group should be closer (or more similar) to each other than to those in other groups. Such groups of similar objects are called clusters. When data are labeled, this problem is called supervised clustering. It is a difficult problem, but easier than the unsupervised clustering problem we face when data are not labeled.
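As an illustration of the "closer to each other than to those in other groups" criterion, here is a minimal sketch that compares the mean pairwise distance within groups to the mean pairwise distance across groups. The toy points, the two labelings, and the helper name `mean_intra_inter` are hypothetical and are not taken from the talk; Euclidean distance is assumed as the similarity measure.

```python
from itertools import combinations
from math import dist
from statistics import mean


def mean_intra_inter(points, labels):
    """Return (mean distance within groups, mean distance across groups)."""
    intra, inter = [], []
    # Visit every unordered pair of points once and sort the pair's
    # Euclidean distance into the matching bucket.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(dist(p, q))
    return mean(intra), mean(inter)


# Toy 2-D feature vectors (hypothetical, not from the talk's dataset).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

good_labels = [0, 0, 0, 1, 1, 1]  # groups follow the two visible blobs
bad_labels = [0, 1, 0, 1, 0, 1]   # groups mix the two blobs

intra_good, inter_good = mean_intra_inter(points, good_labels)
intra_bad, inter_bad = mean_intra_inter(points, bad_labels)

print(f"good grouping: intra={intra_good:.2f} inter={inter_good:.2f}")
print(f"bad grouping:  intra={intra_bad:.2f} inter={inter_bad:.2f}")
```

A meaningful grouping shows a small within-group distance relative to the across-group distance; the mixed labeling does not.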
All our experiments were done with code written in Python, mainly using scikit-learn, so you should be able to reproduce the work with your own feature vectors (well, we hope so!).
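To give an idea of what such a workflow can look like with your own feature vectors, here is a minimal sketch using scikit-learn. It is not the pipeline used in our experiments: the choice of `MiniBatchKMeans`, the number of clusters, the feature scaling, and the synthetic data standing in for static malware features are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for static feature vectors (assumption): three
# well-separated groups of 200 samples each, in 8 dimensions.
centers = np.array([[0.0] * 8, [10.0] * 8, [-10.0] * 8])
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 8)) for c in centers])

# Put all features on a comparable scale before computing distances.
scaler = StandardScaler().fit(X)

# Mini-batch k-means processes the data in small batches, which helps
# when the sample set is large. n_clusters=3 is known here only because
# the data is synthetic; on real data it has to be chosen.
model = MiniBatchKMeans(n_clusters=3, n_init=10, batch_size=256, random_state=0)
labels = model.fit_predict(scaler.transform(X))

# Assign a previously unseen sample to the nearest learned cluster.
new_sample = centers[1] + rng.normal(scale=0.5, size=(1, 8))
new_label = model.predict(scaler.transform(new_sample))[0]

print("cluster sizes:", np.bincount(labels))
print("new sample assigned to cluster", new_label)
```

Replacing `X` with a matrix of your own feature vectors (one row per sample) is the only change needed to try it on real data, although the number of clusters and the algorithm itself would need to be reconsidered.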
We will present some results on our dataset of two million malware samples, give some examples of what we found, and discuss future work that could be interesting (that is, problems still to be solved).