I recently started a project to archive network traffic traces with some fellow researchers, from whom I learned about Bro. Bro is an amazing intrusion detection system developed at UC Berkeley, based on many years of research by Prof. Vern Paxson and his team. The interesting thing about Bro is that it can be used to archive network traffic traces at a fine granularity. Unlike utilities such as tcpdump, Bro is intelligent enough to identify individual connections between hosts (TCP/UDP) and the application-level protocols used in those connections. As a result, Bro logs network-level connections, HTTP, SMTP, FTP, SSH, SSL, etc. together with their protocol data instead of logging raw packet headers.
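To make the contrast with raw packet headers a bit more concrete, here is a minimal sketch of reading connection-level records from a Bro-style log. It assumes the tab-separated ASCII layout with a `#fields` header line that Bro 2.x writes. The sample lines are made up for illustration, and `parse_bro_log` is a hypothetical helper name, not part of Bro.

```python
# Made-up sample in the assumed Bro 2.x ASCII layout: header lines start
# with '#', and the '#fields' line names the tab-separated columns.
SAMPLE_LOG = (
    "#separator \\x09\n"
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\tservice\n"
    "1320000000.000000\tCabc123\t10.0.0.5\t51234\t192.0.2.10\t80\ttcp\thttp\n"
    "1320000001.500000\tCdef456\t10.0.0.7\t40222\t192.0.2.25\t22\ttcp\tssh\n"
)


def parse_bro_log(text):
    """Return one dict per log record, keyed by the names in '#fields'."""
    fields = []
    records = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            # Drop the '#fields' token itself, keep the column names.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            records.append(dict(zip(fields, line.split("\t"))))
    return records


records = parse_bro_log(SAMPLE_LOG)
for record in records:
    print(record["id.orig_h"], "->", record["id.resp_h"], record["service"])
```

Each record describes a whole connection (endpoints, ports, protocol, detected service) rather than a single packet, which is what makes this kind of log convenient for archiving.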
Another cool feature of Bro is the new extensible logging framework, which will be available in the upcoming version 2.0. A well-written document on the extensible logging framework can be found here. Bro 1.5 was supposed to support on-the-fly anonymization of traces, but the feature is broken and will be removed from the next version (2.0). According to the Bro devs, we will have to wait for some time to see this very useful feature return. Until they come up with the code for it, based on Pang R, et al., "A High-level Programming Environment for Packet Trace Anonymization and Transformation", I came up with a workaround that works but is far less elegant (more on this in my next blog post). Since the logs are now sufficiently anonymized, we can carry out our research. Apart from the trace anonymization issue, Bro does a fantastic job (though anonymization matters a lot when data is shared between institutions). At the end of the day, all I can say is that Bro is fast, clean and extensible!
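For readers wondering what log-level anonymization can look like, below is a minimal sketch that replaces IP addresses with keyed-hash pseudonyms. This is not my workaround and not the approach from the Pang et al. paper; it is just one simple technique (HMAC-SHA256 truncated to the address length). The function name `anonymize_ip` and the example key are assumptions made up for illustration.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ip(address, key):
    """Map an IP address to a stable pseudonymous address of the same family.

    The same (address, key) pair always yields the same pseudonym, so
    connections can still be correlated after anonymization.
    """
    ip = ipaddress.ip_address(address)  # raises ValueError on bad input
    digest = hmac.new(key, ip.packed, hashlib.sha256).digest()
    if ip.version == 4:
        return str(ipaddress.IPv4Address(digest[:4]))
    return str(ipaddress.IPv6Address(digest[:16]))


# Hypothetical key; in practice it would be random and kept secret.
SECRET_KEY = b"example-key-do-not-reuse"

pseudonym_v4 = anonymize_ip("10.0.0.5", SECRET_KEY)
pseudonym_v6 = anonymize_ip("2001:db8::1", SECRET_KEY)
print(pseudonym_v4, pseudonym_v6)
```

Note that a plain keyed hash like this throws away subnet structure: two hosts in the same /24 end up with unrelated pseudonyms. Prefix-preserving schemes exist for cases where that structure matters, but they are more involved than this sketch.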