Python script to inventory your ab1 files with md5sums

November 7, 2008 at 11:35 am (opensource, tips)

The problem that needed solving this time: I wanted a list of my ab1 files giving each filename, its location (directory path) and an md5sum, so I can tell whether duplicate filenames are really the same file or just the result of misnaming.

I managed to come up with this after copying from two different scripts: one that was used to make an inventory of a directory of ogg songs, and the other a Python equivalent of the md5sum check in Linux.

Have fun!

#         FILE:
#        USAGE:  ./
#  DESCRIPTION:  Lists all files with extension .ab1, with their directory and md5sum
#  adapted from code from and
#  used md5sum code from
#      OPTIONS:  ---
#         BUGS:  ---
#        NOTES:  needs Python 3; hashlib replaces the old md5 module
#       AUTHOR:  Kevin ,
#      VERSION:  1.0
#      CREATED:  11/07/2008 07:03:16 PM SGT
#     REVISION:  ---

import hashlib, os, sys
counter = 0

def sumfile(fobj):
    '''Returns an md5 hash for an object with read() method.'''
    m = hashlib.md5()
    while True:
        d = fobj.read(8096)  # read in chunks so big files are never held in memory whole
        if not d:
            break
        m.update(d)
    return m.hexdigest()

def md5sum(fname):
    '''Returns an md5 hash for file fname, or stdin if fname is "-".'''
    if fname == '-':
        ret = sumfile(sys.stdin.buffer)  # .buffer gives the raw bytes
    else:
        try:
            f = open(fname, 'rb')
        except IOError:
            return 'Failed to open file'
        ret = sumfile(f)
        f.close()
    return ret

def PrintFiles(indent):
    '''Writes one line per .ab1 file or subdirectory, recursing into subdirectories.'''
    global counter
    thisDir = os.getcwd()  # OS independent, unlike calling "pwd"

    for file in sorted(os.listdir(thisDir)):
        isDir = os.path.isdir(file)
        if (file.endswith('ab1') or isDir) and not file.startswith('.'):
            if isDir:
                md5 = ''  # a directory has no md5sum
            else:
                counter += 1
                md5 = md5sum(file)  # calls the md5sum function, hashlib ships with Python

            ab1File.write('%s%s\t%s\t%s\n' % (indent, file, thisDir, md5))

            if isDir:
                os.chdir(file)  # descend so the relative names resolve
                PrintFiles(indent + '  ')
                os.chdir(thisDir)  # come back before carrying on with the loop

try:
    ab1File = open('ab1files.txt', 'w')
except IOError as e:
    print("Unable to open 'ab1files.txt' for writing: ", e)
else:
    PrintFiles('')
    ab1File.write('\nCurrent number of ab1 files: %d\n\n' % (counter))
    ab1File.close()
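The whole point of the md5sums is to tell whether two files with the same name are really the same file. Here is a rough sketch of how that check could be automated instead of eyeballing ab1files.txt. It assumes Python 3, and the names md5_of, find_name_clashes and report are made up for this example; they are not part of the script above.

```python
import hashlib
import os
from collections import defaultdict

def md5_of(path, chunk_size=8192):
    """Return the hex md5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def find_name_clashes(root, extension='.ab1'):
    """Map each filename found in more than one place to {md5: [directories]}."""
    seen = defaultdict(lambda: defaultdict(list))
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extension):
                full_path = os.path.join(dirpath, name)
                seen[name][md5_of(full_path)].append(dirpath)
    # keep only the names that turn up more than once
    return {name: dict(by_md5) for name, by_md5 in seen.items()
            if sum(len(dirs) for dirs in by_md5.values()) > 1}

def report(clashes):
    """Return one line per repeated filename saying whether the copies match."""
    lines = []
    for name, by_md5 in sorted(clashes.items()):
        verdict = 'same file' if len(by_md5) == 1 else 'different contents'
        lines.append('%s: %s' % (name, verdict))
    return lines
```

Something like print('\n'.join(report(find_name_clashes('.')))) would then list every repeated filename under the current directory. A name marked "different contents" is probably a misnaming, while "same file" is just a copy. os.walk is used here so no chdir juggling is needed.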


Google DevFest D3vF3st now!

October 28, 2008 at 5:32 am (opensource)

Lolz, writing now from the Google DevFest in Singapore… sadly it's not packed to the brim right now, maybe because it's just after a long weekend. Oh well, I'm pretty excited though. I'll post relevant updates, if any.

Check out my online notes as the event progresses.

Update: There’s going to be a SE Asia OpenSocial Application Contest

Check out the details at the event website.


Python script to split a text file by even or odd numbers

June 20, 2008 at 11:50 am (opensource, software, tips)

I've written a short script that splits a file into its even-numbered and odd-numbered lines 🙂

## loop: do something to each line of the input file
## changed to write the even line numbers to one file
## and the odd line numbers to another
## note that even numbers start with line 0 (not 1!)
## usage: inputfile
##  written by kevinl @

import sys

def isodd(n):
    return bool(n%2)

input=open(sys.argv[1], 'r')
evenout=open('evenout', 'w')
oddout=open('oddout', 'w')  # assumed name, chosen to match 'evenout'
L=input.readlines()

for linecount in range(len(L)):
    if isodd(linecount):
        oddout.write(L[linecount])
    else:
        evenout.write(L[linecount])
    #print("line number is " + str(linecount))

input.close()
evenout.close()
oddout.close()
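For what it's worth, the same split can be written a bit more compactly with enumerate, which hands back the line number along with each line. This is only a sketch: split_even_odd and split_file are names invented for this example, and the 'oddout' default is assumed to mirror 'evenout'.

```python
def split_even_odd(lines):
    """Return (even, odd) lists of lines, counting the first line as line 0."""
    even, odd = [], []
    for number, line in enumerate(lines):
        if number % 2:
            odd.append(line)
        else:
            even.append(line)
    return even, odd

def split_file(in_path, even_path='evenout', odd_path='oddout'):
    """Write the even-numbered lines of in_path to one file and the odd ones to another."""
    with open(in_path) as src:
        even, odd = split_even_odd(src)
    with open(even_path, 'w') as out:
        out.writelines(even)
    with open(odd_path, 'w') as out:
        out.writelines(odd)
```

Calling split_even_odd(['a\n', 'b\n', 'c\n']) gives (['a\n', 'c\n'], ['b\n']), so the first line still counts as line 0, same as in the script. Keeping the splitting apart from the file handling makes it easy to try on a plain list first.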


1st to publish with Google apps?

April 8, 2008 at 3:47 pm (bioinformatics, opensource, software, tips)

Gosh, I was down with the flu yesterday and exciting news broke.

To read the reviews and comments, check out the O’Reilly Radar writeup.

I wonder who will be the first to develop an app, host it there, and publish a paper in a journal with it.

Greasemonkey extensions have already been published. What’s stopping a bioinformatician who lacks web resources from using Google’s?


7 bioinformatics secrets every biologist should know

March 28, 2008 at 5:26 pm (bioinformatics, opensource, review, software, tips)

See this lecture on YouTube!

I am already using some of the stuff mentioned in it, and I might even add a few more to the list, but it seems like a cool lecture for biologists.

Consolidated links list from the computational biology blog


SeqAn, an open source C++ seq analysis tool pack

January 28, 2008 at 6:48 am (opensource)


SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data. Our library applies a unique generic design that guarantees high performance, generality, extensibility, and integration with other libraries. SeqAn is easy to use and simplifies the development of new software tools with a minimal loss of performance.

Check it out here (found via computationalbiologynews).
