Inline Detection of Copy-Paste Botnet C&C
The source code of botnets is often leaked online and re-used by new botnets. Re-using source code helps bot-owners set up their botnets quickly, but the new botnets also inherit similarities to known botnets, and these can assist in detection. In particular, the URL paths that a bot uses to communicate with its C&C are often re-used.
In this talk, we present a system that identifies patterns in URL paths serving known botnets, so that those paths can be blocked if new botnets ever re-use them. The system's output is intended for use in an inline, high-performance HTTP proxy; accordingly, existing approaches to malicious URL detection, such as neural networks, cannot be considered. Instead, we construct an offline language model using the Smith-Waterman algorithm, cluster it, and use a known set of genetic algorithms to propose regular expressions that match sets of bot C&C URLs without matching any benign URL. Our experimental setup includes 1.4M URLs, both bot C&C and benign. Our initial results yielded 1.3k new bot C&C URLs and 96.3% accuracy for patterns that appeared at least twelve times in the training data. Moreover, the system is currently being deployed on large-scale HTTP traffic to report results over time.
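As a rough illustration of the alignment step, the sketch below scores two URL paths with a standard Smith-Waterman local alignment. It is a minimal sketch, not the system described in this talk: the scoring parameters (match, mismatch, gap), the normalization in `similarity`, and the example paths are all assumptions chosen for illustration, and the abstract does not specify how alignment scores feed into the clustering.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not values from the talk.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a: str, b: str, match: int = 2) -> float:
    """Score normalized by the best score the shorter string could reach.

    This normalization is a hypothetical choice made for this sketch.
    """
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    return smith_waterman_score(a, b, match=match) / (match * shortest)


if __name__ == "__main__":
    # Hypothetical URL paths, invented for illustration only.
    known_cc = "/panel/gate.php"
    candidates = ["/admin/gate.php", "/images/logo.png"]
    for path in candidates:
        print(path, round(similarity(known_cc, path), 2))
```

Under these assumptions, a path that shares a long segment with a known C&C path scores noticeably higher than an unrelated one, and a pairwise similarity of this kind is one plausible input to a clustering step. Because the alignment is quadratic in path length, it fits the offline stage; the inline proxy would only evaluate the resulting regular expressions.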