Tuesday, September 18, 2012

Two Quick Email Forensic Nuggets

Email is certainly one of the more prevalent methods of spreading malware to a target, and likely one that any network analyst will be investigating on a regular basis. While nearly every part of an email header is thoroughly documented in places such as the Forensic Wiki, I've picked up a few tricks along the way that I haven't seen anywhere else, and they could help you figure out where that suspicious email came from.


Tip #1: Encoded internal IP addresses and machine names

During a pentest a few years ago, I was studying a batch of emails with "malicious" links so I could build a signature to catch any others we might have missed. The pentest team was good: different email addresses each time, completely different themes, different links, and even a different source IP address on each email, drawn from a variety of dial-up accounts around the country. Finding a signature would be a little more challenging than usual.


Not part of a pentest: Sample email header with message ID from Microsoft MimeOLE. Credit to Mila Parkour

The first thing we noticed was that the end of each message ID field was similar:

<0023af[...]$0e34a8c0@wr1p4>
<03e12d[...]$1734a8c0@wr2p3>

Of course, we had to figure out what this was. If you're reading this, you're probably already aware that when Microsoft products (Outlook, Outlook Express) send email over SMTP, they generate their own message ID at send time. The last part (after the "@") is the name of the computer that generated the message ID; in this case, the computer names used by the pentest team: "wr1p4" and "wr2p3".
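If you want to pull those pieces out of many message IDs at once, a minimal Python sketch might look like the following. It assumes the MimeOLE-style layout seen in the examples above (hex groups separated by "$", then "@" and a host name); the regular expression and the `split_message_id` helper are my own illustration, not anything Microsoft documents. It returns the computer name along with the final 8-hex-digit group, whose meaning the next paragraph explains.

```python
import re

# Hypothetical pattern for a MimeOLE-style Message-ID: <hex$hex$hex@hostname>.
# The number of leading "$" groups can vary, so only the final 8-hex-digit
# group and the host after "@" are required.
MIMEOLE_ID = re.compile(
    r"^<(?P<prefix>[^@<>]*)\$(?P<ip_hex>[0-9A-Fa-f]{8})@(?P<host>[^<>@]+)>$"
)


def split_message_id(message_id: str):
    """Return (final_hex_group, host) for a MimeOLE-style Message-ID, or None."""
    match = MIMEOLE_ID.match(message_id.strip())
    if match is None:
        return None  # not the Outlook/Outlook Express SMTP format
    return match.group("ip_hex").lower(), match.group("host")
```

For example, `split_message_id("<0023af[...]$0e34a8c0@wr1p4>")` yields `("0e34a8c0", "wr1p4")`, while a message ID without a "$" group simply returns `None`.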

This "feature" is pretty well known. What was more interesting were the rest of the bits. How are they generated? Well, it seems that, for additional entropy, the MIME library takes the IP address of the computer and embeds it in the message ID! The 32-bit (8-hex-character) group following the last "$" in the message ID is the computer's IP address, hex-encoded in little-endian byte order, which is why the bytes have to be reversed. Here's how it works:

Given the message ID: <000c01cc0998$15c8ec70$0201a8c0@protech.com.tw>
Take the part after the last "$": 0201a8c0
Separate it into four 1-byte groups and reverse their order: c0 a8 01 02
Convert each byte to decimal: c0 = 192, a8 = 168, 01 = 1, 02 = 2
And you've got the IP address assigned to the sending computer: 192.168.1.2
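The four steps above translate directly into a few lines of Python. This is a minimal sketch using only the standard `ipaddress` module; `decode_mimeole_ip` is a hypothetical helper name, and it assumes you have already isolated the 8-hex-digit group after the last "$".

```python
import ipaddress


def decode_mimeole_ip(ip_hex: str) -> ipaddress.IPv4Address:
    """Decode the 8-hex-digit group from a MimeOLE Message-ID into an IPv4 address.

    The group stores the address bytes in reverse (little-endian) order,
    so parse the four bytes and flip them before building the address.
    """
    if len(ip_hex) != 8:
        raise ValueError(f"expected 8 hex digits, got {ip_hex!r}")
    raw = bytes.fromhex(ip_hex)               # "0201a8c0" -> b'\x02\x01\xa8\xc0'
    return ipaddress.IPv4Address(raw[::-1])   # reversed bytes -> 192.168.1.2
```

Run against the pentest message IDs shown earlier, `decode_mimeole_ip("0e34a8c0")` gives 192.168.52.14 and `decode_mimeole_ip("1734a8c0")` gives 192.168.52.23. Reversing the bytes is equivalent to reading the hex string as a little-endian 32-bit integer.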

While a 192.168.0.0/16 address isn't usually too interesting, it can be helpful if you're trying to identify a specific machine behind a NAT. The IP address is also sometimes a globally routable IP address. And in my pentest case, although the IP was an RFC 1918 address, the IP range used was unique to a particular type of older Cisco router, which helped us find other emails sent via this infrastructure, even those without similar computer names.
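To find other emails sent through the same infrastructure, one approach is to decode the embedded address from every message and bucket the results by network. The sketch below assumes a /24 grouping is a useful default (it is a parameter, not something the technique requires); `group_by_subnet` is a hypothetical helper, and the "global"/"non-global" label comes from the standard library's `is_global` check rather than any router-specific knowledge.

```python
import ipaddress
from collections import defaultdict


def decode_mimeole_ip(ip_hex):
    # Same byte reversal as described above: "0201a8c0" -> 192.168.1.2
    return ipaddress.IPv4Address(bytes.fromhex(ip_hex)[::-1])


def group_by_subnet(ip_hex_values, prefix=24):
    """Bucket decoded sender IPs by network so related emails cluster together.

    Returns {network: [(ip, kind), ...]} where kind is "global" for
    publicly routable addresses and "non-global" for RFC 1918 and other
    non-routable ranges.
    """
    groups = defaultdict(list)
    for ip_hex in ip_hex_values:
        ip = decode_mimeole_ip(ip_hex)
        kind = "global" if ip.is_global else "non-global"
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        groups[net].append((ip, kind))
    return dict(groups)
```

Fed the two pentest values plus the protech.com.tw example, this puts the pentest messages together under 192.168.52.0/24 and the other message under 192.168.1.0/24, which is the kind of clustering that makes an unusual private range stand out.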

Of course, this doesn't always work. This type of message ID is only generated when the client itself sends the message over SMTP. I've also seen some crazy values in this field in spam messages. Nonetheless, it has been very useful to me for getting a little more information about an attacker.
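Because spam sometimes carries junk in this field, it can help to sanity-check a decoded value before trusting it. The sketch below is a heuristic of my own, not a documented rule: `plausible_sender_ip` is a hypothetical helper that rejects values that aren't eight hex digits or that decode to addresses a sending workstation wouldn't normally hold (unspecified, loopback, link-local, multicast, or reserved).

```python
import ipaddress
import re

HEX8 = re.compile(r"^[0-9A-Fa-f]{8}$")


def plausible_sender_ip(ip_hex):
    """Return the decoded IPv4Address if it looks like a real host address, else None.

    Heuristic only: drops malformed values and addresses that a
    workstation sending mail would not normally be assigned.
    """
    if not HEX8.match(ip_hex):
        return None
    ip = ipaddress.IPv4Address(bytes.fromhex(ip_hex)[::-1])  # reverse byte order
    if (ip.is_unspecified or ip.is_multicast or ip.is_reserved
            or ip.is_loopback or ip.is_link_local):
        return None
    return ip
```

A value such as "00000000" (0.0.0.0) or "0100007f" (127.0.0.1) is rejected, while "0201a8c0" still decodes to 192.168.1.2.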



Tip #2: Gmail HTTP embeds sender time zones


Sample Gmail header from targeted attack email. Credit to Mila Parkour

When SMTP is used to send email, Gmail headers look much like any other service's. However, when the sender uses the Gmail web interface, the story is a bit different. Unfortunately for those of us who analyze email headers, Google takes its users' privacy seriously, which means it doesn't include the sender's IP address in the header. Instead, all you get is:



Received: by 10.227.157.66 with HTTP

Which isn't very useful for identifying the originator. However, I've found that there is one little hidden gem that's helped me more than once: the timezone!

Date: Fri, 22 Apr 2011 22:32:02 +0800

While it's not the strongest piece of data, the "Date:" line uses the time zone set on the sender's computer (not one derived from IP-based location), as detected by Google's servers. Even if the attacker connects through a VPN endpoint in a more convenient time zone, the computer gives away its own settings and Gmail embeds them in the header. PDT/PST is used elsewhere in the header, but that one line can help point you toward the origin of the email. If the sender claims to be in New York but has GMT+8 as a time zone, perhaps you should respond with 我发现你 ("I found you").
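Both Gmail observations can be checked with Python's standard `email` package. This is a minimal sketch that assumes a Received header containing "with HTTP" marks a message submitted through the web interface, as in the sample above; `gmail_web_timezone` is a hypothetical helper, and it only reports the UTC offset from the "Date:" line without trying to map that offset to a location.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def gmail_web_timezone(raw_headers):
    """Return the sender's UTC offset (a timedelta) if the message looks like Gmail web mail.

    Assumption: a Received header containing "with HTTP" indicates the
    Gmail web interface. Returns None if that marker or the Date header
    is missing.
    """
    msg = message_from_string(raw_headers)
    received = msg.get_all("Received", [])
    if not any("with HTTP" in value for value in received):
        return None
    date_value = msg.get("Date")
    if date_value is None:
        return None
    # The offset in "Date:" reflects the sender's computer settings.
    return parsedate_to_datetime(date_value).utcoffset()
```

For the two header lines shown above (the "Received: ... with HTTP" line and "Date: Fri, 22 Apr 2011 22:32:02 +0800"), the helper returns an offset of eight hours, i.e. GMT+8.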