Python script to inventorise your ab1 files with md5sums
The problem that needed solving this time: I wanted a list of my ab1 files with their filenames, locations (directory paths) and md5sums, so I can tell whether files with duplicate names are the same file or just the result of misnaming.
I managed to come up with this by combining two different scripts:
one that was used to make an inventory of a directory of ogg songs, and a Python equivalent of the Linux md5sum check.
Have fun!
#!/usr/bin/env python3
#===============================================================================
#
# FILE: inventory-abi.py
#
# USAGE: ./inventory-abi.py
#
# DESCRIPTION: Lists all the files of extension .ab1 with the directory and its md5sum
# adapted from code from http://pthree.org/2007/08/09/recursion-in-python/ and
# used md5sum code from http://code.activestate.com/recipes/266486/
# OPTIONS: ---
# REQUIREMENTS: ---
# BUGS: ---
# NOTES: ---
# AUTHOR: Kevin ,
# VERSION: 1.0
# CREATED: 11/07/2008 07:03:16 PM SGT
# REVISION: ---
#===============================================================================
import hashlib
import os
import sys

counter = 0

def sumfile(fobj):
    '''Returns an md5 hash for an object with read() method.'''
    m = hashlib.md5()
    while True:
        d = fobj.read(8192)  # read in chunks so large files never sit in memory whole
        if not d:
            break
        m.update(d)
    return m.hexdigest()

def md5sum(fname):
    '''Returns an md5 hash for file fname, or stdin if fname is "-".'''
    if fname == '-':
        return sumfile(sys.stdin.buffer)  # binary view of stdin
    try:
        with open(fname, 'rb') as f:
            return sumfile(f)
    except OSError:
        return 'Failed to open file'

def PrintFiles(thisDir, indent):
    '''Recursively writes name, directory and md5sum of each .ab1 file under thisDir.'''
    global counter
    for name in sorted(os.listdir(thisDir)):
        if name.startswith('.'):
            continue  # skip hidden files and directories
        path = os.path.join(thisDir, name)
        if os.path.isdir(path):
            # recurse with full paths instead of chdir, so no shell 'pwd' is needed
            PrintFiles(path, indent + ' ')
        elif os.path.isfile(path) and name.endswith('.ab1'):
            # only regular files are hashed, never directories
            counter += 1
            ab1File.write('%s%s\t%s\t%s\n' % (indent, name, thisDir, md5sum(path)))

try:
    ab1File = open('ab1files.txt', 'w')
except IOError as e:
    print("Unable to open 'ab1files.txt' for writing:", e)
else:
    PrintFiles(os.getcwd(), '')
    ab1File.write('\nCurrent number of ab1 files: %d\n\n' % counter)
    ab1File.close()
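The whole point of the inventory is to tell whether same-named files are copies or misnamed. Here is a minimal sketch of that check, assuming the tab-separated layout the script writes to ab1files.txt (optional leading indent, then name, directory and md5sum). `find_duplicate_names` is a hypothetical helper, not part of the original script.

```python
from collections import defaultdict


def find_duplicate_names(lines):
    """Group inventory lines (name<TAB>directory<TAB>md5) by file name.

    Returns {name: {md5: [directories]}} for names that appear more than once.
    One md5 per name means the copies are identical; several md5s mean
    different files that happen to share a name.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for line in lines:
        parts = line.strip().split('\t')  # strip() also drops the indent
        if len(parts) != 3:
            continue  # skip blank lines and the "Current number..." summary
        name, directory, digest = parts
        groups[name][digest].append(directory)
    return {name: dict(by_md5) for name, by_md5 in groups.items()
            if sum(len(dirs) for dirs in by_md5.values()) > 1}
```

Passing it an open ab1files.txt (a file object iterates line by line) returns only the duplicated names; a name mapping to a single md5sum is the same file stored twice, while several md5sums flag a likely misnaming.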
Google DevFest D3vF3st now!
Lolz, writing this from the Google DevFest in Singapore… sadly it's not packed to the brim right now, maybe because it's just after a long weekend. Oh well, I'm pretty excited though. Will post relevant updates, if any.
Check out the event page:
http://code.google.com/events/apacdevfest/
and my online notes as the event progresses:
http://docs.google.com/Doc?id=dhj8xhdw_47djn633f6
Update: There's going to be a SE Asia OpenSocial Application Contest.
Check out the details at http://code.google.com/events/apacdevfest/contest/
More about the event:
http://www.e27.sg/2008/10/13/googles-1st-hackathon-in-southeast-asia-whos-coming/
Python script to split a text file by even or odd line numbers
I've written a short script that splits a file into its even and odd numbered lines.
#!/usr/bin/env python3
## loop over each line of the input file,
## writing the even line numbers to one file
## and the odd line numbers to another
## note that even numbers start with line 0 (not 1!)
## usage: sort-even-odd.py inputfile
## written by kevinl @ kevinl.wordpress.com
import sys

def isodd(n):
    return bool(n % 2)

# 'with' closes all three files even if a write fails
with open(sys.argv[1], 'r') as infile, \
        open('evenout', 'w') as evenout, \
        open('oddout', 'w') as oddout:
    # enumerate numbers lines from 0, so the first line goes to evenout
    for linecount, line in enumerate(infile):
        if isodd(linecount):
            oddout.write(line)
        else:
            evenout.write(line)
        #print("line number is " + str(linecount))
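The same split can also be written with list slicing. This is a minimal sketch, assuming the whole file fits in memory; `split_even_odd` is a hypothetical name, and it keeps the script's convention that numbering starts at line 0.

```python
def split_even_odd(lines):
    """Split lines into (even, odd) lists, numbering lines from 0 like the script.

    lines[0::2] takes lines 0, 2, 4, ...; lines[1::2] takes lines 1, 3, 5, ...
    """
    lines = list(lines)  # accept any iterable, e.g. an open file
    return lines[0::2], lines[1::2]
```

If you would rather count the first line as line 1 (odd), the roles simply swap: the first returned list then holds the odd lines.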
1st to publish with Google App Engine?
Gosh, I was down with flu yesterday when exciting news broke: Google App Engine.
http://code.google.com/appengine/
To read the reviews and comments, check out:
http://googleblog.blogspot.com/2008/04/developers-start-your-engines.html
http://nsaunders.wordpress.com/2008/04/08/googles-appengine/
I wonder who will be the first to develop an app, host it there and publish a paper about it in a journal.
Greasemonkey extensions have already been published, so what's stopping a bioinformatician who lacks web hosting resources from using Google's?
7 bioinformatics secrets every biologist should know
See this lecture on YouTube!
I am already using some of the tools mentioned in it, and I might even add a few more to the list, but it seems like a cool lecture for biologists.
http://www.mozilla.org/
http://biobar.mozdev.org/
http://www.google.com/intl/en/options/
https://addons.mozilla.org/en-US/firefox/addon/748
http://www.ihop-net.org/UniPub/iHOP/
http://bioinfo.icapture.ubc.ca/iHOPerator/
http://gaggle.systemsbiology.org/docs/
http://apropos.mcw.edu/
http://string.embl.de/
Consolidated links list from the computational biology blog
SeqAn: an open source C++ sequence analysis library
Abstract
SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data. Our library applies a unique generic design that guarantees high performance, generality, extensibility, and integration with other libraries. SeqAn is easy to use and simplifies the development of new software tools with a minimal loss of performance.
Check it out here: http://www.seqan.de/ (found via computationalbiologynews)