Parsing HTTP Log Files with Python
A couple of quick snippets on parsing Apache HTTP log files (combined log format) with Python. This is the regular expression for parsing each line of the log:
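import re

# Only the 'date' and 'useragent' group names are confirmed by the code below; the rest are assumed.
combined_format_re = re.compile(
    r'(?P<host>.*?) (?P<ident>.*?) (?P<user>.*?) \[(?P<date>.*?)\] '
    r'"(?P<method>.*?) (?P<path>.*?)(?P<query>\?.*?)? (?P<protocol>.*?)" '
    r'(?P<status>\d*) (?P<size>.*?) "(?P<referrer>.*?)" "(?P<useragent>.*?)"')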
You can use it like so:
match = combined_format_re.search(line)
And you can get the matches in a convenient dictionary form via:
fields = match.groupdict()
print(fields['useragent'])
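One thing to watch for: search() returns None when a line doesn't match, so calling groupdict() on it would raise an AttributeError. Here's a rough sketch of guarding against that. Note that LINE_RE is a simplified stand-in pattern, and parse_line and the sample lines are made up for illustration; none of them come from the code above.

```python
import re

# Hypothetical, simplified stand-in for the full pattern: it only captures
# the host, the timestamp and the user agent (the last quoted field).
LINE_RE = re.compile(r'(?P<host>\S+) .*?\[(?P<date>.*?)\] .*"(?P<useragent>[^"]*)"\s*$')


def parse_line(line):
    """Return a dict of fields, or None if the line doesn't look like a log entry."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return match.groupdict()


# Made-up sample input: one well-formed entry and one junk line.
sample_lines = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "http://www.example.com/" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
    'this line is garbage',
]

parsed = [parse_line(line) for line in sample_lines]
for fields in parsed:
    if fields is not None:  # skip lines that didn't match
        print(fields['useragent'])
```

Returning None for bad lines (rather than raising) makes it easy to count or log the rejects separately when chewing through a big file.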
And while we're at it, here's how you parse the timestamp into a Python datetime object:
import datetime
timestamp = datetime.datetime.strptime(fields['date'].split()[0], '%d/%b/%Y:%H:%M:%S')
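The split()[0] above throws away the UTC offset (the "-0700" part), so you end up with a naive datetime. If you'd rather keep the offset, here's a small sketch assuming Python 3, where strptime understands %z. The sample date string is made up.

```python
from datetime import datetime, timedelta

# Example value as it would appear in fields['date']; the sample is made up.
raw_date = '10/Oct/2000:13:55:36 -0700'

# %z consumes the "+HHMM"/"-HHMM" offset, so the result is timezone-aware.
timestamp = datetime.strptime(raw_date, '%d/%b/%Y:%H:%M:%S %z')

print(timestamp.isoformat())
print(timestamp.utcoffset())
```

Timezone-aware timestamps can be compared safely across logs from servers in different zones, which the naive version can't promise.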
Wordpress formatting will probably mess up some of the code above; in theory I'll be releasing a small piece of code soon that uses all of these so you can get the source.