Python script to split a text file by even or odd line numbers

June 20, 2008 at 11:50 am (opensource, software, tips)

I’ve written a short script that splits a file into its even- and odd-numbered lines :)

#!/usr/bin/python
## Split a text file by line number:
## even-numbered lines are written to the file 'evenout'
## and odd-numbered lines to the file 'oddout'.
## Note that numbering starts at line 0 (not 1!), so the first line is even.
## usage: sort-even-odd.py inputfile
##  written by kevinl @ kevinl.wordpress.com

import sys

def isodd(n):
    return bool(n % 2)

# 'with' closes all three files on exit, so the output is flushed to disk
with open(sys.argv[1], 'r') as infile, \
     open('evenout', 'w') as evenout, \
     open('oddout', 'w') as oddout:
    # enumerate() counts lines from 0 without reading the whole file into memory
    for linecount, line in enumerate(infile):
        if isodd(linecount):
            oddout.write(line)
        else:
            evenout.write(line)
        #print("line number is " + str(linecount))
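
For files small enough to hold in memory, the same even/odd split can also be sketched with list slicing. This is only an illustration of the idea, not a replacement for the script; `split_even_odd` is a made-up helper name, and the sample lines are invented.

```python
def split_even_odd(lines):
    """Return (even_lines, odd_lines), counting the first line as line 0."""
    # lines[0::2] takes indices 0, 2, 4, ...; lines[1::2] takes 1, 3, 5, ...
    return lines[0::2], lines[1::2]


# Invented sample input, just to show which lines land where.
sample = ["line0\n", "line1\n", "line2\n", "line3\n", "line4\n"]
even, odd = split_even_odd(sample)
print(even)
print(odd)
```

Because counting starts at 0, the first line of the file ends up in the even group, just as in the script.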


Comment: Putting evolutionary theory into practice - opinion - 06 May 2008 - New Scientist

May 7, 2008 at 8:03 am (evolution, journal, review)

Fascinating thoughts, and a new journal! I didn’t know there could be a field called applied evolution, although I have to disagree that the question of how fast evolution can be has only surfaced recently.

One of the earliest studies I have read on genetics and applied evolutionary pressure looked at size variation in Trinidadian guppies under predator pressure.

How ‘fast’ evolution proceeds depends for the most part on the lifespan and reproductive patterns of the organism in question.

link

— Nesse says that progress is being hampered by the fact that many medics still think of the body as a machine designed by an engineer, when in fact it is a “bundle of compromises … designed to maximise reproduction, not health”. There is no question about the importance of applied evolution. The trouble is, if biologists themselves are only just waking up to how relevant and crucial evolution can be, what hope is there of educating the leaders and policy makers who need to understand and act upon this research? Not much, I fear.


Lest you think annotation is easy

April 17, 2008 at 9:53 am (bioinformatics, genome, review)

Fungal Genomes and Comparative Genomics - Lest you think annotation is easy - Apr 13
I love the title of this post, and I totally agree: gene annotation is anything but easy.
Keeping track of which prediction programs were used, and of the rationale behind each annotation (manual or automated), is one thing the author didn’t mention.


Bioinformaticians needed!

April 9, 2008 at 5:06 am (bioinformatics, genome)

Chanced upon an ad by the Rubin lab at New Scientist: link

The job scope is largely similar to what I am doing now. Where I am currently, bioinformaticians are moving away from the industry. Sad but true.

Perhaps we need more directors here like

‘Director David C. Page likens the Institute to an artists’ colony. “What we do here at Whitehead is attract the best possible intellectual capital and empower maximally creative—really wildly creative—individuals to realize their dreams within these walls”.’

I love reading job descriptions in my field. They let you in on the developing areas where talent is needed, so you know where to improve yourself. I am surprised, though, that they didn’t mention Python.

Description:

• Develop and implement existing and new computational methods and tools for high-throughput analysis of diverse data.
• Integrate multiple types of data and analytical methods in creative ways to exploit genomic information such as gene expression profiles and large-scale genome sequence data
• Manage data handling and analysis pipeline for Solexa sequencing platform - use software and databases to assemble and analyze genome sequence data
• Assist with the design and development of major bioinformatics-related programming projects.
• Conduct independent research projects, including primary responsibility for authoring manuscripts for publication in biology and bioinformatics journals.
• Write custom scripts to access databases and analyze sequence data.
• Collaborate with and support lab personnel in the area of bioinformatics analysis.

Qualifications:

• M.S./Ph.D. or equivalent in bioinformatics / computational biology disciplines with emphasis on biology.
• Minimum of one year of related experience.
• Proven experience using bioinformatics to solve biologically important questions.
• Experience with microarray data analysis, familiarity with online bioinformatics tools and databases, and pathway analysis.
• Experience with genome sequence alignments, large scale sequence data analysis
• Excellent interpersonal, verbal, and written communication skills.
• Must demonstrate outstanding personal initiative and the ability to work effectively as part of a team
• Background in utility programming (C Shell, Perl, JAVA, or other languages) in a UNIX environment, preferred but not required.
• Familiarity with designing, developing, and programming databases (Oracle, MySQL), preferred but not required.


1st to publish with Google apps?

April 8, 2008 at 3:47 pm (bioinformatics, opensource, software, tips)

Gosh, I was down with the flu yesterday and exciting news broke:

http://code.google.com/appengine/

To read the reviews and comments, check out:

O’Reilly Radar writeup

http://googleblog.blogspot.com/2008/04/developers-start-your-engines.html

http://nsaunders.wordpress.com/2008/04/08/googles-appengine/

I wonder who will be the first to develop an app, host it there, and publish a paper in a journal with it.

Greasemonkey extensions have already been published. What’s stopping a bioinformatician who lacks web resources from using Google’s?


7 bioinformatics secrets every biologist should know

March 28, 2008 at 5:26 pm (bioinformatics, opensource, review, software, tips)

See this lecture on YouTube!

I am already using some of the stuff mentioned in it, and I might even add a few more to the list, but it seems like a cool lecture for biologists.

http://www.mozilla.org/
http://biobar.mozdev.org/
http://www.google.com/intl/en/options/
https://addons.mozilla.org/en-US/firefox/addon/748
http://www.ihop-net.org/UniPub/iHOP/
http://bioinfo.icapture.ubc.ca/iHOPerator/
http://gaggle.systemsbiology.org/docs/
http://apropos.mcw.edu/
http://string.embl.de/

Consolidated list of links, from the computational biology blog.


De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer.

March 17, 2008 at 6:51 am (genome, journal, sequencing, software)

Chanced upon this interesting paper!

De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer.

Geneva University Hospitals;

Novel high-throughput DNA sequencing technologies allow researchers to characterize a bacterial genome during a single experiment and at a moderate cost. However, the increase in sequencing throughput that is allowed by using such platforms is obtained at the expense of individual sequence read length, which must be assembled into longer contigs to be exploitable. This study focuses on the Illumina sequencing platform that produces millions of very short sequences that are 35 bases in length. We propose a de novo assembler software that is dedicated to process such data. Based on a classical overlap graph representation and on the detection of potentially spurious reads, our software generates a set of accurate contigs of several kilobases that cover most of the bacterial genome. The assembly results were validated by comparing datasets that were obtained experimentally for Staphylococcus aureus strain MW2 and Helicobacter acinonychis strain Sheeba with that of their published genomes acquired by conventional sequencing of 1.5 - 3.0 kb fragments. We also provide indications that the broad coverage achieved by high throughput sequencing might allow for the detection of clonal polymorphisms in the set of DNA molecules being sequenced.

PMID: 18332092 [PubMed - as supplied by publisher]


Consolidated quotes on junk DNA aka Non coding sequences

March 11, 2008 at 11:20 am (evolution, genome, junk dna, review)


Rename Multiple Files Efficiently Using Excel or Google Docs

March 5, 2008 at 8:06 am (software, tips, winxp)

link

Batch renaming is what I picked up Perl for in the first place. Then I found interesting software like 14arename (win only). I then also picked up a bit of sed and awk on Linux.

I know about the batch rename feature in WinXP, but it didn’t occur to me that I could do it in Excel. Basically, this page teaches you to

“use SUBSTITUTE to change specific text in the filenames, use CONCATENATE() with DATE() if you want to add date to the filename, etc.” to create a column of rename commands for DOS, something like

ren abcd.fa abcd.gbk

Very old school, I know, but hey, it’s a godsend on a colleague’s Windows box with no admin rights to install anything.
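
Where Python happens to be available, the same "column of rename commands" idea can be sketched in a few lines. This is a rough illustration, not what the linked page does: `rename_commands` is a made-up helper, the filenames are invented, and it assumes names without spaces (names with spaces would need quoting in the `ren` command).

```python
import os


def rename_commands(filenames, old_ext, new_ext):
    """Build one DOS 'ren' command per filename that ends in old_ext."""
    commands = []
    for name in filenames:
        stem, ext = os.path.splitext(name)
        if ext == old_ext:
            # Assumes no spaces in the name; otherwise both names need quotes.
            commands.append("ren %s %s" % (name, stem + new_ext))
    return commands


# Invented filenames; paste the printed lines into a .bat file or a DOS prompt.
for command in rename_commands(["abcd.fa", "efgh.fa", "notes.txt"], ".fa", ".gbk"):
    print(command)
```

The script only prints the commands and renames nothing itself, so the list can be checked before it is run, much like eyeballing the spreadsheet column first.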


Supramap: a tool to map evolutionary trees onto a globe

March 3, 2008 at 9:44 am (bioinformatics, genome)

Interesting. Note to self: should explore this one day.

http://supramap.osu.edu/supramap/index.php?page=theory — Geographic mapping of evolutionary trees projected into a virtual globe allows users to analyze the spread of the organismal lineages into areas of interest. When all these data are integrated, we can visualize patterns in or to develop and test hypotheses. For example, we have used supramap to combine phylogenetic and virtual globe technologies to pinpoint which strains of a virus are infecting which hosts in specific areas (Janies et al., 2007). Finally, because phylogenetic analysis groups like strains into lineages, information drawn from limited experimentation on one strain in a lineage can be used to predict the properties of another strain in the lineage. This transitive property of phylogenetic inference will help us predict which strains are capable of infecting humans, are pathogenic, and/or are resistant to drugs. These capabilities are valuable to the public health community to make informed decisions on where and how to allocate resources to prepare for emerging diseases.
