How to parse the output of git log

Here is how to get the output of "git log" in an easy-to-parse format and build a list of Python dicts (one per commit) from the result. You could then convert that list to JSON, XML, HTML, etc.

First, look at the git-log man page and find the section on "Pretty Formats." It lists printf-style placeholders for the commit metadata (e.g. %an for the author name).

Store these placeholders, along with the corresponding field names, in two lists:

GIT_COMMIT_FIELDS = ['id', 'author_name', 'author_email', 'date', 'message']
GIT_LOG_FORMAT = ['%H', '%an', '%ae', '%ad', '%s']
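Keeping two parallel lists in sync by hand is easy to get wrong when you add or reorder a field. One alternative, sketched here under the assumption that you are free to restructure the constants: keep each field name next to its placeholder as a pair and derive both lists from that. The name GIT_FIELDS is just for illustration.

```python
# Each entry pairs a dict key with the git pretty-format placeholder that fills it.
GIT_FIELDS = [
    ('id', '%H'),             # full commit hash
    ('author_name', '%an'),   # author name
    ('author_email', '%ae'),  # author email
    ('date', '%ad'),          # author date (format follows --date=)
    ('message', '%s'),        # subject (first line of the commit message)
]

# Derive the two parallel lists so their order can never drift apart.
GIT_COMMIT_FIELDS = [name for name, _ in GIT_FIELDS]
GIT_LOG_FORMAT = [code for _, code in GIT_FIELDS]

print(GIT_COMMIT_FIELDS)
print(GIT_LOG_FORMAT)
```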

Then join the format fields together with "\x1f" (the ASCII unit separator) and terminate each record with "\x1e" (the ASCII record separator). These control characters are unlikely to appear in commit data, so they are fairly safe to use as delimiters.
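One Python gotcha worth knowing before parsing: str treats both of these characters as whitespace, so str.split() or str.strip() called with no argument will split on or strip them, and str.splitlines() also breaks on "\x1e". Passing the separator explicitly avoids surprises. A quick check (the sample record is made up):

```python
UNIT_SEP = '\x1f'    # ASCII 31, "unit separator": goes between fields
RECORD_SEP = '\x1e'  # ASCII 30, "record separator": goes after each commit

# Both count as whitespace for Python's str methods.
print(UNIT_SEP.isspace(), RECORD_SEP.isspace())  # True True

record = 'abc123' + UNIT_SEP + 'stevek' + UNIT_SEP  # last field is empty
print(record.split())          # ['abc123', 'stevek']  (empty field lost)
print(record.split(UNIT_SEP))  # ['abc123', 'stevek', '']

# splitlines() treats the record separator as a line boundary.
print(('a' + RECORD_SEP + 'b').splitlines())  # ['a', 'b']
```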

GIT_LOG_FORMAT = '%x1f'.join(GIT_LOG_FORMAT) + '%x1e'
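For the five placeholders above, the joined string looks like this. Note that %x1f and %x1e are git's own hex escapes ("print a byte from a hex code"), so it is git, not Python, that emits the raw separator bytes:

```python
GIT_LOG_FORMAT = ['%H', '%an', '%ae', '%ad', '%s']

# Fields separated by %x1f, each record terminated by %x1e.
fmt = '%x1f'.join(GIT_LOG_FORMAT) + '%x1e'
print(fmt)  # %H%x1f%an%x1f%ae%x1f%ad%x1f%s%x1e
```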

Then run git log --format="..." with your format string, split the output into records and fields, and make a dict from each record:

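from subprocess import Popen, PIPE

# Pass the arguments as a list (no shell) and read the output as text.
p = Popen(['git', 'log', '--format=' + GIT_LOG_FORMAT], stdout=PIPE, universal_newlines=True)
(log, _) = p.communicate()
# Drop the final separator and newline, then split into one string per commit.
log = log.strip('\n\x1e').split('\x1e')
# Strip only the newline git adds after each record, then split the fields.
log = [row.strip('\n').split('\x1f') for row in log]
log = [dict(zip(GIT_COMMIT_FIELDS, row)) for row in log]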
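To test the parsing without a repository, it can help to pull it into a function that takes the raw text. The sketch below uses a hypothetical parse_git_log helper and a hand-written string shaped like git's output for this format: fields joined by "\x1f", and each record ending in "\x1e" plus the newline that --format appends after every commit. The sample values are copied from the output shown below.

```python
GIT_COMMIT_FIELDS = ['id', 'author_name', 'author_email', 'date', 'message']

def parse_git_log(raw):
    """Turn `git log --format=...%x1e` output into a list of dicts (hypothetical helper)."""
    raw = raw.strip('\n\x1e')
    if not raw:
        return []  # empty output: no commits
    records = raw.split('\x1e')
    # Strip only the newline between records so that empty first or
    # last fields (a \x1f at the edge) are not stripped away as whitespace.
    rows = [rec.strip('\n').split('\x1f') for rec in records]
    return [dict(zip(GIT_COMMIT_FIELDS, row)) for row in rows]

# Sample text mimicking git's output (author_email is empty, as in the post).
sample = (
    'f1dc488e092e5e725c2ec3b7afc3962f0ba707d3\x1fstevek\x1f\x1f'
    'Sat Feb 18 12:58:00 2012 -0800\x1fthird commit\x1e\n'
    '1bf26e9aa0cb8c9b95b579695c6af349319a88ab\x1fstevek\x1f\x1f'
    'Sat Feb 18 12:57:54 2012 -0800\x1fsecond commit\x1e\n'
)

commits = parse_git_log(sample)
print(len(commits))            # 2
print(commits[0]['message'])   # third commit
```

In real use you would pass the text returned by p.communicate() to the function instead of the sample string.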


$ python
[{'author_email': '',
  'author_name': 'stevek',
  'date': 'Sat Feb 18 12:58:00 2012 -0800',
  'id': 'f1dc488e092e5e725c2ec3b7afc3962f0ba707d3',
  'message': 'third commit'},
 {'author_email': '',
  'author_name': 'stevek',
  'date': 'Sat Feb 18 12:57:54 2012 -0800',
  'id': '1bf26e9aa0cb8c9b95b579695c6af349319a88ab',
  'message': 'second commit'},
 {'author_email': '',
  'author_name': 'stevek',
  'date': 'Sat Feb 18 12:57:47 2012 -0800',
  'id': '9c2db5dffa7c70358ab78b6092539ce26006775b',
  'message': 'this is the first commit'}]
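Because the result is a plain list of dicts, converting it to JSON (one of the formats mentioned at the start) is a single call to the standard library's json module. A minimal sketch, using one record copied from the output above in place of a live git log run:

```python
import json

# One record from the output above, standing in for the parsed `log` list.
log = [{'author_email': '',
        'author_name': 'stevek',
        'date': 'Sat Feb 18 12:58:00 2012 -0800',
        'id': 'f1dc488e092e5e725c2ec3b7afc3962f0ba707d3',
        'message': 'third commit'}]

text = json.dumps(log, indent=2)
print(text)
```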

Full working example.
