Wrote a short script to split a file into its even- and odd-numbered lines.
#!/usr/bin/python
## Loop over each line of the input file,
## writing the even-numbered lines to one file
## and the odd-numbered lines to another.
## Note that line numbering starts at 0 (not 1!), so the first line counts as even.
## usage: sort-even-odd.py inputfile
## written by kevinl @ kevinl.wordpress.com
import sys

def isodd(n):
    return bool(n % 2)

# 'infile' rather than 'input', to avoid shadowing the built-in
with open(sys.argv[1], 'r') as infile:
    L = infile.readlines()

# the with-block closes (and flushes) both output files when the loop ends
with open('evenout', 'w') as evenout, open('oddout', 'w') as oddout:
    for linecount in range(len(L)):
        if isodd(linecount):
            oddout.write(L[linecount])
        else:
            evenout.write(L[linecount])
        #print("line number is " + str(linecount))
Permalink
Fascinating thoughts, and a new journal! I didn’t know there could be a field called applied evolution, although I have to disagree that the speed at which evolution can act has only surfaced recently.
One of the earliest studies I have read on genetics and applied evolutionary pressure looked at guppy size variation in Trinidad under predator pressure.
How ‘fast’ evolution proceeds depends for the most part on the lifespan and reproductive patterns of the organism in question.
link
— Nesse says that progress is being hampered by the fact that many medics still think of the body as a machine designed by an engineer, when in fact it is a “bundle of compromises … designed to maximise reproduction, not health”. There is no question about the importance of applied evolution. The trouble is, if biologists themselves are only just waking up to how relevant and crucial evolution can be, what hope is there of educating the leaders and policy makers who need to understand and act upon this research? Not much, I fear.
Permalink
I love the title of this post. And I totally agree: gene annotation is anything but easy.
One thing the author didn’t mention is keeping track of which prediction programs were used and the rationale behind each annotation (manual or automated).
Permalink
Chanced upon an ad from the Rubin lab on New Scientist: link
The job scope is largely similar to what I am doing now. Where I am currently, bioinformaticians are moving away from the industry. Sad but true.
Perhaps we need more directors here like this:
‘Director David C. Page likens the Institute to an artists’ colony. “What we do here at Whitehead is attract the best possible intellectual capital and empower maximally creative—really wildly creative—individuals to realize their dreams within these walls”.’
I love reading job descriptions in my field. They let you in on the developing areas where talent is needed, so you know where to improve yourself. I am surprised, though, that they didn’t mention Python.
Description:
• Develop and implement existing and new computational methods and tools for high-throughput analysis of diverse data.
• Integrate multiple types of data and analytical methods in creative ways to exploit genomic information such as gene expression profiles and large-scale genome sequence data
• Manage data handling and analysis pipeline for Solexa sequencing platform - use software and databases to assemble and analyze genome sequence data
• Assist with the design and development of major bioinformatics-related programming projects.
• Conduct independent research projects, including primary responsibility for authoring manuscripts for publication in biology and bioinformatics journals.
• Write custom scripts to access databases and analyze sequence data.
• Collaborate with and support lab personnel in the area of bioinformatics analysis.
Qualifications:
• M.S./Ph.D. or equivalent in bioinformatics / computational biology disciplines with emphasis on biology.
• Minimum of one year of related experience.
• Proven experience using bioinformatics to solve biologically important questions.
• Experience with microarray data analysis, familiarity with online bioinformatics tools and databases, and pathway analysis.
• Experience with genome sequence alignments, large scale sequence data analysis
• Excellent interpersonal, verbal, and written communication skills.
• Must demonstrate outstanding personal initiative and the ability to work effectively as part of a team
• Background in utility programming (C Shell, Perl, JAVA, or other languages) in a UNIX environment, preferred but not required.
• Familiarity with designing, developing, and programming databases (Oracle, MySQL), preferred but not required.
Permalink
Chanced upon this interesting paper!
De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer.
Geneva University Hospitals;
Novel high-throughput DNA sequencing technologies allow researchers to characterize a bacterial genome during a single experiment and at a moderate cost. However, the increase in sequencing throughput that is allowed by using such platforms is obtained at the expense of individual sequence read length, which must be assembled into longer contigs to be exploitable. This study focuses on the Illumina sequencing platform that produces millions of very short sequences that are 35 bases in length. We propose a de novo assembler software that is dedicated to process such data. Based on a classical overlap graph representation and on the detection of potentially spurious reads, our software generates a set of accurate contigs of several kilobases that cover most of the bacterial genome. The assembly results were validated by comparing datasets that were obtained experimentally for Staphylococcus aureus strain MW2 and Helicobacter acinonychis strain Sheeba with that of their published genomes acquired by conventional sequencing of 1.5 - 3.0 kb fragments. We also provide indications that the broad coverage achieved by high throughput sequencing might allow for the detection of clonal polymorphisms in the set of DNA molecules being sequenced.
PMID: 18332092 [PubMed - as supplied by publisher]
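To make the "overlap graph" idea in the abstract concrete, here is a toy sketch in Python. It is not the paper's software, and the abstract does not describe that implementation; the function names `overlap` and `overlap_graph`, the `min_len` threshold, and the example reads are all made up for illustration. It ignores sequencing errors, reverse complements, and the spurious-read detection the authors mention, and it compares every pair of reads, which would be far too slow for millions of real reads.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no such overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def overlap_graph(reads, min_len):
    """Map each read to a list of (other_read, overlap_length) edges."""
    graph = {}
    for a in reads:
        graph[a] = []
        for b in reads:
            if a != b:
                n = overlap(a, b, min_len)
                if n:
                    graph[a].append((b, n))
    return graph


if __name__ == "__main__":
    # Hypothetical 6-base reads; real Illumina reads in the paper are 35 bases.
    toy_reads = ["ACGTAC", "TACGGA", "GGATTT"]
    for read, edges in overlap_graph(toy_reads, 3).items():
        print(read, edges)
```

Following the edges from read to read is what lets an assembler chain short reads into longer contigs.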
Permalink
Check out this compilation of quotes on the Genomicron blog.
Here’s a snapshot dated today; his post will be updated, so go back to it for updates!
To facilitate access to the series of posts on what has been said in the literature about noncoding DNA and its potential functions, I will maintain an updated list here.
Permalink
link
Batch renaming is what I picked up Perl for in the first place. Then I found interesting software like 14arename (Windows only). I then also picked up a bit of sed and awk on Linux.
I knew about the batch rename feature in Windows XP, but it didn’t occur to me that I could do it in Excel. Basically, this page teaches you to
“use SUBSTITUTE to change specific text in the filenames, use CONCATENATE() with DATE() if you want to add date to the filename, etc.” to create a column of rename commands in DOS, something like
ren abcd.fa abcd.gbk
Very old school, I know, but hey, it’s a godsend on a colleague’s Windows box with no admin rights to install anything.
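For comparison, the same column of rename commands can be generated with a few lines of Python on a machine that does have it. This is only a sketch: the helper name `ren_commands` and the sample filenames are made up, and the quotes around each name are my addition so that filenames with spaces survive.

```python
import os


def ren_commands(filenames, old_ext, new_ext):
    """Return one DOS 'ren' command per filename that ends in old_ext."""
    commands = []
    for name in filenames:
        stem, ext = os.path.splitext(name)
        if ext.lower() == old_ext.lower():
            commands.append('ren "%s" "%s%s"' % (name, stem, new_ext))
    return commands


if __name__ == "__main__":
    # Hypothetical file list; in practice it could come from os.listdir(".")
    for line in ren_commands(["abcd.fa", "efgh.fa", "notes.txt"], ".fa", ".gbk"):
        print(line)
```

Redirect the output to a .bat file and run it in the folder that holds the files.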
Permalink
Interesting. Note to self: should explore this one day.
http://supramap.osu.edu/supramap/index.php?page=theory — Geographic mapping of evolutionary trees projected into a virtual globe allows users to analyze the spread of the organismal lineages into areas of interest. When all these data are integrated, we can visualize patterns in or to develop and test hypotheses. For example, we have used supramap to combine phylogenetic and virtual globe technologies to pinpoint which strains of a virus are infecting which hosts in specific areas (Janies et al., 2007). Finally, because phylogenetic analysis groups like strains into lineages, information drawn from limited experimentation on one strain in a lineage can be used to predict the properties of another strain in the lineage. This transitive property of phylogenetic inference will help us predict which strains are capable of infecting humans, are pathogenic, and/or are resistant to drugs. These capabilities are valuable to the public health community to make informed decisions on where and how to allocate resources to prepare for emerging diseases.
Permalink