This article is a summary of my contribution to Honeytrap in the context of Google Summer of Code 2018, as required by Google.
I completed the proposed project, “Implementing Yara rules in Honeytrap” (code: 1, 2, 3). I also implemented several features both within and beyond the scope of my project, for example transforms, Redis/Memcached/ADB honeypots, and port-based configuration.
My proposal was about implementing Yara rules in the Honeytrap honeypot. Yara is a pattern-matching language, commonly used to classify malware samples in a language- and platform-agnostic way; I figured it’d be interesting to apply it to events produced by a honeypot.
The text of my proposal follows (amended for conciseness).
Yara is very extensible, and could be conveniently used to describe malicious actors interacting with a honeypot. Specifically, with the pattern-matching we’ll be able to identify specific attacks and link them to specific botnets or exploits.
The final goal of my application is to fully integrate Yara in Honeytrap across several contexts, work that translates directly into a logical timeline:
- Yara filter: Honeytrap implements filters, which are components that select data from a channel and pipe it into another. A Yara-based filter enables the user to write smart filters that group interactions logically: an example application would be a malware researcher storing all Mirai connection attempts into ElasticSearch for later analysis, or a company logging Nmap-based connections to a canary to be alerted of internal network scans. Such filters would allow for both stateless matching (operating on single events, independently of the others) and more complex stateful matching.
- Yara search: users may also be interested in analyzing interactions at a later time. As such, there may be interest in searching logs using Yara rules. This section would implement Yara searching in ElasticSearch, the Honeytrap backend of choice, through server-side scripting.
- Yara reporting: Honeytrap implements a Web-based dashboard. Integrating Yara into the dashboard is a key step in making this contribution easy to use.
- Yara definitions: once the pattern-matching infrastructure is in place, I plan to create Yara definitions for common threats and other items of interest, among them specific CVEs (e.g. CVE-2017-0144, aka EternalBlue), botnets (the Mirai network) and tools (sqlmap, nmap).
I eventually found that filters were too simple for the project requirements, so I refactored them into transforms: stateful components that receive a single event and can emit zero, one or more events (in this sense, a transform is similar to a flat map).
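To make the flat-map analogy concrete, here is a minimal Go sketch of what such a transform could look like. The `Event`, `Transform`, `DropEmpty`, `CountPerIP` and `Apply` names are made up for illustration and are not Honeytrap's actual API; the point is only that a transform may emit nothing, pass the event through, or emit extra events based on state it keeps between calls.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for a honeypot event: a bag of key/value pairs.
type Event map[string]string

// Transform receives one event and may emit zero, one or more events via send.
type Transform func(e Event, send func(Event))

// DropEmpty emits nothing for events without a payload (the "zero outputs" case).
func DropEmpty(e Event, send func(Event)) {
	if e["payload"] != "" {
		send(e)
	}
}

// CountPerIP returns a stateful transform: it keeps a per-source counter,
// passes every event through, and emits one extra synthetic event on the
// third event seen from the same source.
func CountPerIP() Transform {
	seen := map[string]int{}
	return func(e Event, send func(Event)) {
		ip := e["source-ip"]
		seen[ip]++
		send(e)
		if seen[ip] == 3 {
			send(Event{"category": "repeat-offender", "source-ip": ip})
		}
	}
}

// Apply runs a transform over a slice of events and collects everything it emits.
func Apply(t Transform, in []Event) []Event {
	var out []Event
	for _, e := range in {
		t(e, func(o Event) { out = append(out, o) })
	}
	return out
}

func main() {
	in := []Event{
		{"source-ip": "10.0.0.1", "payload": "a"},
		{"source-ip": "10.0.0.1", "payload": "b"},
		{"source-ip": "10.0.0.1", "payload": "c"},
	}
	out := Apply(CountPerIP(), in)
	fmt.Println(len(out)) // three inputs plus one synthetic event
}
```

A plain filter is then just the special case of a transform that emits either zero events or the unmodified input.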
I then implemented a transform for Yara matching in the form of a filter: you pass a Yara file in the config, and it selects all events that match at least one rule. However, I found many situations in which a filter is not sufficient (e.g. detecting an event from ruleset A followed by an event from ruleset B), so I wrote a generic Yara “library” that can be reused in custom, more complex transforms.
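The "ruleset A followed by ruleset B" case can be sketched as a small piece of per-source state. The Go code below is an illustration under stated assumptions, not the library I wrote: `Matcher` is a hypothetical interface that a compiled Yara ruleset could sit behind, and `substringMatcher` is a toy substitute that uses plain substring search instead of Yara.

```go
package main

import (
	"fmt"
	"strings"
)

// Matcher abstracts "does this payload match the ruleset?".
// Hypothetical interface: a real implementation would wrap a compiled Yara ruleset.
type Matcher interface {
	Match(payload string) bool
}

// substringMatcher is a toy stand-in for a ruleset: it matches if any needle occurs.
type substringMatcher []string

func (m substringMatcher) Match(payload string) bool {
	for _, needle := range m {
		if strings.Contains(payload, needle) {
			return true
		}
	}
	return false
}

// Sequence detects sources that trigger ruleset A and, later, ruleset B.
type Sequence struct {
	A, B Matcher
	sawA map[string]bool // per-source state: has A matched already?
}

func NewSequence(a, b Matcher) *Sequence {
	return &Sequence{A: a, B: b, sawA: map[string]bool{}}
}

// Observe feeds one event and returns true when it completes an A-then-B sequence.
func (s *Sequence) Observe(source, payload string) bool {
	if s.sawA[source] && s.B.Match(payload) {
		delete(s.sawA, source) // reset so the next sequence starts from scratch
		return true
	}
	if s.A.Match(payload) {
		s.sawA[source] = true
	}
	return false
}

func main() {
	seq := NewSequence(substringMatcher{"/bin/busybox"}, substringMatcher{"wget "})
	fmt.Println(seq.Observe("10.0.0.1", "wget http://example.invalid/x")) // false: B before A
	fmt.Println(seq.Observe("10.0.0.1", "/bin/busybox MIRAI"))            // false: A seen, waiting for B
	fmt.Println(seq.Observe("10.0.0.1", "wget http://example.invalid/x")) // true: A then B
}
```

Keeping the matching behind an interface is what lets the same stateful logic run against any ruleset, which is the kind of reuse the generic library is meant to allow.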
As a result of this development, I also contributed to an external project, a Yara parser.
PRs: #284, #326
I developed an ElasticSearch plugin for searching events that match a Yara ruleset.
As a personal note, the development of this feature was by far the most excruciating, due to the sheer complexity of Java development and the lack of developer documentation for ElasticSearch.
I added a Yara panel to the Web UI.
I wrote “services” (i.e. honeypots) for the following protocols and software: Redis, Memcached, TFTP, WordPress, CWMP/TR-069, SNMP, ADB.
PRs: #241, #265, #308, #362, #384, #407, #408.
Honeytrap: #257, #267, #280, #289, #371.
I found out that, to apply Yara rules, I first had to define their variables, which required a parser to extract those variables from the rule source. To do that, I extended an existing parser written in Go.
Bug reports: unexpected behaviour when…