Python script to inventory your ab1 files with md5sums

November 7, 2008 at 11:35 am (opensource, tips)

The problem that needed solving this time: I wanted a list of my ab1 files with each filename, its location (directory path) and an md5sum, so I know whether duplicate filenames are really the same file or just the result of misnaming.

I managed to come up with this after copying from two different scripts: one that was used to make an inventory of a directory of ogg songs, and the other a Python equivalent of the md5sum check in Linux.

Have fun!

#!/usr/bin/python
#===============================================================================
#
#         FILE:  inventory-abi.py
#
#        USAGE:  ./inventory-abi.py
#
#  DESCRIPTION:  Lists all the files of extension .ab1 with the directory and its md5sum
#  adapted from code from http://pthree.org/2007/08/09/recursion-in-python/ and
#  used md5sum code from http://code.activestate.com/recipes/266486/
#      OPTIONS:  ---
# REQUIREMENTS:  Python 2.6+ or 3.x (uses hashlib instead of the old md5 module)
#         BUGS:  ---
#        NOTES:  directories are listed without a checksum;
#                the path column comes from os.getcwd(), so it is OS independent
#       AUTHOR:  Kevin
#      VERSION:  1.0
#      CREATED:  11/07/2008 07:03:16 PM SGT
#     REVISION:  ---
#===============================================================================

import hashlib, os, sys
counter = 0

def sumfile(fobj):
    '''Returns an md5 hash for an object with read() method.'''
    m = hashlib.md5()
    while True:
        d = fobj.read(8096)
        if not d:
            break
        m.update(d)
    return m.hexdigest()

def md5sum(fname):
    '''Returns an md5 hash for file fname, or stdin if fname is "-".'''
    if fname == '-':
        # use the raw byte stream on Python 3, plain sys.stdin on Python 2
        ret = sumfile(getattr(sys.stdin, 'buffer', sys.stdin))
    else:
        try:
            f = open(fname, 'rb')
        except IOError:
            return 'Failed to open file'
        ret = sumfile(f)
        f.close()
    return ret

def PrintFiles(indent):
    global counter
    thisDir = os.getcwd()  # absolute path of the directory being listed

    for name in sorted(os.listdir(thisDir)):
        if name.startswith('.'):
            continue  # skip hidden files and directories
        isDir = os.path.isdir(name)

        if name.endswith('.ab1') and not isDir:
            counter += 1
            # one line per ab1 file: name, directory, md5sum
            ab1File.write('%s%s\t%s\t%s\n' % (indent, name, thisDir, md5sum(name)))
        elif isDir:
            # directories are listed without a checksum, then walked
            ab1File.write('%s%s\t%s\t\n' % (indent, name, thisDir))
            os.chdir(name)
            PrintFiles(indent + '  ')
            os.chdir(thisDir)  # return to this directory, whatever the OS

try:
    ab1File = open('ab1files.txt', 'w')
except IOError as e:
    print("Unable to open 'ab1files.txt' for writing: %s" % e)
else:
    PrintFiles('')
    ab1File.write('\nCurrent number of ab1 files: %d\n\n' % counter)
    ab1File.close()
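
Once ab1files.txt exists, the duplicate check that motivated the script can be done by grouping checksums per filename. The following is only a rough sketch for Python 3, assuming the tab-separated layout the script writes (indented name, directory, md5sum). The function name `find_conflicts` and the sample paths and checksums are made up for illustration and are not part of the original script.

```python
from collections import defaultdict

def find_conflicts(lines):
    """Return {filename: set of md5sums} for .ab1 names seen with
    more than one distinct checksum (same name, different content)."""
    sums_by_name = defaultdict(set)
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) != 3:
            continue  # blank lines and the trailing summary line
        name, _directory, checksum = fields
        name = name.lstrip(' ')  # drop the indentation used for nesting
        if name.endswith('.ab1') and checksum:
            sums_by_name[name].add(checksum)
    return {n: s for n, s in sums_by_name.items() if len(s) > 1}

# Hypothetical inventory lines, in the same layout as ab1files.txt
sample = [
    "a.ab1\t/data/run1\t0cc175b9c0f1b6a831c399e269772661\n",
    "run2\t/data\t\n",
    "  a.ab1\t/data/run2\t92eb5ffee6ae2fec3ad71c777531578f\n",
    "  b.ab1\t/data/run2\t4a8a08f09d37b73795649038408b5f33\n",
    "\n",
    "Current number of ab1 files: 3\n",
]

for name, sums in sorted(find_conflicts(sample).items()):
    print("%s has %d different checksums" % (name, len(sums)))
```

To run it on a real inventory, pass the open file instead of the sample list, for example `find_conflicts(open('ab1files.txt'))`. Names that come back are the misnamed ones: same filename, different content.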

Google DevFest D3vF3st now!

October 28, 2008 at 5:32 am (opensource)

Lolz, writing now from the Google DevFest in Singapore… hmm, sadly it's not packed to the brim right now, maybe because it's just after a long weekend. Oh well, I'm pretty excited though. Will post relevant updates if any.

Check out the event page:

http://code.google.com/events/apacdevfest/

My online notes, updated as the event progresses:

http://docs.google.com/Doc?id=dhj8xhdw_47djn633f6

Update: There’s going to be a SE Asia OpenSocial Application Contest

Check out details at http://code.google.com/events/apacdevfest/contest/

The Event website

http://www.e27.sg/2008/10/13/googles-1st-hackathon-in-southeast-asia-whos-coming/

Python script to split a text file by even or odd numbers

June 20, 2008 at 11:50 am (opensource, software, tips)

I've written a short script to split a file into its even-numbered and odd-numbered lines 🙂

#!/usr/bin/python
## loop over each line of the input file
## write the even line numbers to one file ('evenout')
## and the odd line numbers to another ('oddout')
## note that line numbers start at 0 (not 1!), so the first line counts as even
## usage: sort-even-odd.py inputfile
##  written by kevinl @ kevinl.wordpress.com

import sys

def isodd(n):
    return bool(n % 2)

# 'infile' rather than 'input', so the builtin input() is not shadowed
infile = open(sys.argv[1], 'r')
L = infile.readlines()
infile.close()
evenout = open('evenout', 'w')
oddout = open('oddout', 'w')

for linecount in range(len(L)):
    if isodd(linecount):
        oddout.write(L[linecount])
    else:
        evenout.write(L[linecount])
    #print "line number is " + str(linecount)

# close the outputs so everything is flushed to disk
evenout.close()
oddout.close()
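
For big files, the same split can be done without reading everything into memory first. This is just a sketch for Python 3, not the script above: `split_even_odd` is a made-up name, and the demo uses in-memory `io.StringIO` objects in place of real files. It keeps the script's convention that counting starts at line 0.

```python
import io

def split_even_odd(source, evenout, oddout):
    """Copy lines from source to evenout or oddout, counting from line 0."""
    for number, line in enumerate(source):
        target = oddout if number % 2 else evenout
        target.write(line)

# Demo on in-memory text; real file objects from open() work the same way
source = io.StringIO("zero\none\ntwo\nthree\nfour\n")
even, odd = io.StringIO(), io.StringIO()
split_even_odd(source, even, odd)
print(even.getvalue())
print(odd.getvalue())
```

Because `enumerate` walks the file object line by line, only one line is held in memory at a time.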

1st to publish with Google apps?

April 8, 2008 at 3:47 pm (bioinformatics, opensource, software, tips)

Gosh, I was down with flu yesterday and exciting news broke:

http://code.google.com/appengine/

To read the reviews and comments, check out:

O’Reilly Radar writeup

http://googleblog.blogspot.com/2008/04/developers-start-your-engines.html

http://nsaunders.wordpress.com/2008/04/08/googles-appengine/

I wonder who will be the first to develop an app, host it there, and publish a paper in a journal with it.

Greasemonkey extensions have already been published. What’s stopping a bioinformatician who lacks web resources from using Google’s?

7 bioinformatics secrets every biologist should know

March 28, 2008 at 5:26 pm (bioinformatics, opensource, review, software, tips)

See this lecture on YouTube!

I am already using some of the stuff mentioned in here. I might even add a few more to the list, but it seems like a cool lecture for biologists.

http://www.mozilla.org/
http://biobar.mozdev.org/
http://www.google.com/intl/en/options/
https://addons.mozilla.org/en-US/firefox/addon/748
http://www.ihop-net.org/UniPub/iHOP/
http://bioinfo.icapture.ubc.ca/iHOPerator/
http://gaggle.systemsbiology.org/docs/
http://apropos.mcw.edu/
http://string.embl.de/

Consolidated list of links from the computational biology blog.

SeqAn, an open source C++ sequence analysis tool pack

January 28, 2008 at 6:48 am (opensource)

Abstract

SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data. Our library applies a unique generic design that guarantees high performance, generality, extensibility, and integration with other libraries. SeqAn is easy to use and simplifies the development of new software tools with a minimal loss of performance.

Check it out here: http://www.seqan.de/ (found via computationalbiologynews)
