The problem this time: I needed a list of the filenames of my ab1 files, each with its location (directory path) and an md5sum, so I know whether duplicate filenames are really the same file or just the result of misnaming.
I managed to come up with this after copying from two different scripts:
one that was used to make an inventory of a directory of ogg songs, and the other a Python equivalent of the md5sum check in Linux.
#!/usr/bin/python
#===============================================================================
#
#         FILE:  inventory-abi.py
#
#        USAGE:  ./inventory-abi.py
#
#  DESCRIPTION:  Lists all the files of extension .ab1 with the directory and its md5sum
#                adapted from code from http://pthree.org/2007/08/09/recursion-in-python/ and
#                used md5sum code from http://code.activestate.com/recipes/266486/
#      OPTIONS:  ---
# REQUIREMENTS:  ---
#         BUGS:  will execute md5 on directory as well
#                current method to get CWD is not OS independent
#        NOTES:  ---
#       AUTHOR:  Kevin
#      VERSION:  1.0
#      CREATED:  11/07/2008 07:03:16 PM SGT
#     REVISION:  ---
#===============================================================================
import dircache, os, md5
import sys  # needed by md5sum() when reading from stdin

counter = 0

def sumfile(fobj):
    '''Returns an md5 hash for an object with read() method.'''
    m = md5.new()
    while True:
        d = fobj.read(8096)
        if not d:
            break
        m.update(d)
    return m.hexdigest()

def md5sum(fname):
    '''Returns an md5 hash for file fname, or stdin if fname is "-".'''
    if fname == '-':
        ret = sumfile(sys.stdin)
    else:
        try:
            f = file(fname, 'rb')
        except:
            return 'Failed to open file'
        ret = sumfile(f)
        f.close()
    return ret

def PrintFiles(indent):
    global counter
    thisDir = os.getcwd()
    for file in dircache.listdir(thisDir):
        if (file.endswith('ab1') or os.path.isdir(file)) and not file.startswith('.'):
            if file.endswith('ab1'):
                counter += 1
                currdir = os.popen("pwd")  #for output of cwd currently works for linux pending upgrade to OS independent
                md5 = md5sum(file)  #calls the md5sum function, md5 lib ships with Python
                ab1File.write('%s%s\t%s\t%s\n' %(indent, file, currdir.readline()[:-1], md5))
            if os.path.isdir(file):
                os.chdir(file)
                PrintFiles(indent + ' ')
                os.chdir('../')

try:
    ab1File = open('ab1files.txt', 'w')
except IOError, e:
    print "Unable to open 'ab1files.txt' for writing: ", e
else:
    PrintFiles('')
    ab1File.write('\nCurrent number of ab1 files: %d\n\n' %(counter))
    ab1File.close()
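The script above is Python 2 (the dircache and md5 modules are gone in Python 3). For anyone on Python 3, here is a rough sketch of the same inventory idea using os.walk and hashlib. It is not the original script: the function names inventory and write_inventory and the chunk_size parameter are made up for illustration, and it drops the indentation prefix in favour of the full directory path.

```python
import hashlib
import os


def md5sum(path, chunk_size=8192):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(root, extension=".ab1"):
    """Yield (filename, directory, md5) for each matching file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if name.endswith(extension) and not name.startswith("."):
                full_path = os.path.join(dirpath, name)
                yield name, os.path.abspath(dirpath), md5sum(full_path)


def write_inventory(root, out_path="ab1files.txt"):
    """Write a tab-separated inventory and return the number of files listed."""
    count = 0
    with open(out_path, "w") as out:
        for name, directory, checksum in inventory(root):
            out.write("%s\t%s\t%s\n" % (name, directory, checksum))
            count += 1
        out.write("\nCurrent number of ab1 files: %d\n\n" % count)
    return count
```

Calling write_inventory(".") from the top of the data directory should give roughly the same ab1files.txt as before. Because os.walk lists files and directories separately, only files get hashed, and os.path.abspath replaces the call to pwd, so this version should not care which OS it runs on.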
Lolz, writing now from the Google DevFest in Singapore… Hmm, sadly it's not packed to the brim right now, maybe because it's just after a long weekend. Oh well, I'm pretty excited though. Will post relevant updates, if any.
my online notes as the event progresses
Update: There’s going to be a SE Asia OpenSocial Application Contest
Check out details at http://code.google.com/events/apacdevfest/contest/
The Event website
I've written a short script to split a file into its even- and odd-numbered lines 🙂
#!/usr/bin/python
## loop do something to each line of input file
## changed to write the even line numbers to a file
## and the odd line numbers to another
## note that even numbers start with line 0 (not 1!)
## usage: sort-even-odd.py inputfile
## written by kevinl @ kevinl.wordpress.com
import sys

def isodd(n):
    return bool(n%2)

input=open(sys.argv[1], 'r')  # sys.argv is a list; the input file is its second element
L=input.readlines()
input.close()
evenout=open('evenout', 'w')
oddout=open('oddout','w')
for linecount in range(len(L)):
    if isodd(linecount):
        oddout.write(L[linecount])
    else:
        evenout.write(L[linecount])
    #print "line number is " + str(linecount)
evenout.close()
oddout.close()
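A more compact take on the same split, as a sketch only: it streams the input with enumerate instead of reading every line into memory first. The function name split_even_odd and its default output names are just assumptions chosen to mirror the script above, and line numbering still starts at 0.

```python
def split_even_odd(in_path, even_path="evenout", odd_path="oddout"):
    """Write 0-based even-numbered lines to even_path and odd ones to odd_path."""
    with open(in_path) as src, \
            open(even_path, "w") as even, \
            open(odd_path, "w") as odd:
        for index, line in enumerate(src):
            # index 0, 2, 4, ... goes to the even file; 1, 3, 5, ... to the odd file
            target = odd if index % 2 else even
            target.write(line)
```

To use it like the script, something like split_even_odd(sys.argv[1]) after an import sys would do. The with block closes all three files even if a write fails halfway.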
Gosh, I was down with the flu yesterday and exciting news broke.
to read the reviews and comments check out
I wonder who will be the first to develop an app, host it there, and publish a paper in a journal with it.
Greasemonkey extensions have already been published. What’s stopping a bioinformatician who lacks web resources from using Google’s?
See this lecture on YouTube!
I am already using some of the stuff mentioned in here, and I might even add a few more to the list. It seems like a cool lecture for biologists.
Consolidated links list from the Computational Biology blog
SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data. Our library applies a unique generic design that guarantees high performance, generality, extensibility, and integration with other libraries. SeqAn is easy to use and simplifies the development of new software tools with a minimal loss of performance.