Using systematic code reuse analysis to create robust YARA rules
YARA is a commonly used tool to detect and identify malware. There are roughly two types of YARA rules used on binary files: 1) based on metadata and strings and 2) based on code.
There are certain benefits by basing YARA rules on code. Since code reuse is frequent amongst binaries of a malware family, it offers plenty of options to base a YARA rule on. If the chosen code is heavily reused amongst the binaries, then it can result in very robust rules.
This approach comes with certain challenges. A key aspect is being able to find heavily reused code amongst many binaries of a malware family. Unless some sort of automation is at play, this quickly becomes difficult and time-consuming. Once suitable reused code is identified, it needs to be turned into a YARA rule, so that it works even when compiler differences, optimizations or instruction set changes are involved.
In this workshop we will create robust YARA rules for a handful of malware families based on automatically identifying shared code between many binaries of a family.