
Genome Sequencing (Bioinformatics II)



University of California, San Diego


Phillip Compeau, Pavel Pevzner

Mathematics, Life Sciences, Computer Science

4 weeks



In "Finding Hidden Messages in DNA", we discussed how to find short regulatory motifs in genomes. But how do we know what the DNA sequence making up a genome is in the first place? After all, biologists still do not possess technology that can read all the nucleotides of your genome from beginning to end. In this course, you will learn how entire genomes are assembled from millions of short, overlapping pieces of DNA. The scale of this problem (the human genome is about 3 billion nucleotides long!) means that computers must be involved. Yet the problem is even more complex than it may appear: to solve it, we will need to travel back in time to meet three famous mathematicians and learn about algorithms based on graph theory.

Later in the course, we will see that sequencing genomes is not the only task related to decoding biological macromolecules. Another difficult problem is sequencing antibiotics, short mini-proteins engineered by bacteria to fight each other. Even though antibiotics often contain fewer than 10 amino acids, sequencing them is a formidable challenge. Decoding the sequence of amino acids making up an antibiotic is an important biomedical problem, yet the practical barriers to sequencing short antibiotics are often more substantial than the barriers to assembling a genome with millions of nucleotides! To address this computational challenge, we will learn about brute-force algorithms, which often succeed in various bioinformatics applications.

Finally, in this course, you will learn how to apply popular bioinformatics software tools to assemble the genome of a deadly Staphylococcus bacterium. You will also be introduced to BaseSpace, the popular cloud service offered by Illumina, the leading DNA sequencing company, thus joining the thousands of biologists and bioinformaticians who use BaseSpace every day.