Most Important Biology Knowledge for Bioinformatics

Key knowledge for a hassle-free intro to bioinformatics

Bioinformatics is the design, construction and use of software tools to generate, store, annotate and analyse data and information relating to Molecular Biology. Many people, like me, come into this field without a background in the biological sciences, and so need at least some knowledge of biology to make better sense of the literature.

The good news is you don't need a degree program's worth of knowledge to get started. A few key pieces of information are enough, and you can learn more as you grow and go deeper. In fact, many of the pioneers of molecular biology (e.g. Max Delbrück and Erwin Schrödinger) had no background in biology either, so you're not alone.

In this post, I'll share the pieces of information that are absolutely essential for getting started with bioinformatics. If you need a more detailed introduction, I highly recommend you read Chapter 1.2 of Jiang's Basics of Bioinformatics or Chapter 3 of Jones & Pevzner's An Introduction to Bioinformatics Algorithms. Treat this post as back-of-envelope scribbles of the key points.

I've grouped these key points into sections for easier reading. I hope to update this post from time to time, and I'll need your feedback to help me do that.

Cells

  • Cells are of two types: those that enclose their DNA in a true nucleus (called eukaryotic cells), and those that do not (called prokaryotic cells).
  • All multicellular organisms are eukaryotic. Most unicellular organisms, e.g. bacteria, are prokaryotic.
  • Prokaryotic genes are continuous strings. Eukaryotic genes are broken into pieces (called exons). These exons are separated by pieces called introns.
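
To make the exon/intron idea concrete, here is a minimal Python sketch that joins the exons of a gene and drops the intron between them. The sequence, the exon coordinates and the `splice` helper are all made up for illustration. In a real cell the introns are cut out of the RNA copy of the gene rather than out of the DNA itself; I use DNA letters here only to keep the example short.

```python
def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of a gene, skipping everything in between.

    `exons` holds (start, end) positions, 0-based with `end` excluded,
    listed in the order they appear in the gene.
    """
    return "".join(gene[start:end] for start, end in exons)


# Hypothetical toy gene: exon "ATG", intron "GTAAGT", exon "CCGTAA".
toy_gene = "ATG" + "GTAAGT" + "CCGTAA"
toy_exons = [(0, 3), (9, 15)]

print(splice(toy_gene, toy_exons))  # ATGCCGTAA
```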

DNA

  • DNA is written in a four-letter alphabet.
  • A cell's DNA is organized into a number of chromosomes. Each chromosome contains a number of genes. Genes, like the rest of the DNA, can contain SNPs (single-nucleotide polymorphisms): single positions where the base differs between individuals.
    • A normal human somatic cell contains 23 pairs of chromosomes: two copies of each of chromosomes 1 - 22, plus two copies of the X chromosome in females or one copy of X and one copy of Y in males. X and Y are the sex chromosomes.
  • DNA has four types of bases: A, T, G, C. A and T always pair together. C and G always pair together.
  • DNA usually consists of two strands running in opposite directions.
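
The pairing rules and the opposite-direction point above combine into one of the most common operations in bioinformatics: the reverse complement. Here is a small Python sketch, assuming the strand is given as a plain string of uppercase or lowercase A, T, G and C. The function name is my own choice, not from any particular library.

```python
# Base-pairing rules for DNA: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own direction.

    The two strands run in opposite directions, so we reverse the input
    and then swap each base for its pairing partner.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying the function twice gives back the original strand, which is a handy sanity check.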

RNA

  • RNA is written in a four-letter alphabet.
  • RNA has four types of bases: A, U, G, C. A and U always pair together. C and G always pair together.
  • Transcription: The process by which information coded in DNA sequence is passed on to a type of RNA called messenger RNA (mRNA).
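  • During transcription, the RNA is built as the complement of one DNA strand (the template strand): an A in the template is transcribed to a U in the RNA, a T to an A, a G to a C, and a C to a G. As a result, the RNA reads like the other DNA strand with every T replaced by U.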
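
The transcription rule above is just a letter-by-letter mapping, which a short Python sketch can show. I'm assuming the input is the DNA strand that gets copied (usually called the template strand), written so that the resulting mRNA reads left to right. The `transcribe` name and the example sequence are mine, for illustration only.

```python
# Mapping from a base on the copied DNA strand to the base placed in the RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Build the mRNA that pairs with the given DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())


print(transcribe("TACGGC"))  # AUGCCG
```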

Proteins

  • Proteins are chains of amino acids. There are 20 types of standard amino acids used by living organisms.
  • Proteins are written in a 20-letter alphabet.
  • The structure of a protein determines its function.
  • Translation: The process by which information in mRNA is then passed on to proteins. This is done with a special dictionary, the genetic code, which maps each three-base word (called a codon) to an amino acid.
  • tRNAs (transfer RNAs) deliver the proper amino acid for a given codon. Each tRNA carries one specific amino acid, and several different tRNAs can carry the same amino acid.
  • The tRNA molecules have a three-base segment (called an anticodon) that is complementary to the codon in the mRNA. As in DNA base-pairing, the anticodon on the tRNA sticks to the codon on the mRNA.
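  • One gene can code for many proteins, for example when a eukaryotic cell joins the gene's exons in different combinations (alternative splicing).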
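
The "dictionary" view of translation maps directly onto a Python dict. The sketch below builds the standard genetic code from its usual compact layout (bases ordered U, C, A, G, with `*` marking a stop codon) and reads an mRNA three bases at a time. It is a simplification: it starts at the first base instead of searching for a start codon, and the `translate` name and example sequences are mine.

```python
BASES = "UCAG"
# One-letter amino acid codes for all 64 codons, in UCAG order; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# The genetic code as a codon -> amino acid lookup table.
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon until a stop codon."""
    protein = []
    # Step through complete codons only; leftover bases at the end are ignored.
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGCCGUAA"))  # MP
```

Here AUG gives M (methionine), CCG gives P (proline), and UAA is a stop codon, so the protein is two amino acids long.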

Genomes

  • The term genome refers to all the DNA sequences of an organism or a cell. The genomes of most cell types in an organism are the same.
  • Humans are often regarded as the most advanced form of life on earth, but the human genome is not the largest. The number of genes in a genome is also not directly correlated with the organism’s complexity or size.
  • When we speak of “the” genome of a species, we are referring to some sort of “master” genome that is fairly representative of all the possible genomes that an individual of that species could have.

Programming Skills

In theory you can use any programming language you want, but most of the tools and libraries you'll want to leverage are available in R and Python, and to some extent MATLAB. So you may want to prioritize those.

Miscellaneous

  • Just three types of molecules form the basis of all life on this planet: DNA, RNA, and proteins.
  • When people say genes, they usually mean protein-coding genes - the fragments of the DNA that take part in production of proteins (through the process of transcription and translation).
  • Genes that are not involved in protein production are called nonprotein-coding genes or noncoding genes.
  • microRNAs (miRNAs) are a common example of RNAs produced from noncoding genes.
  • Central Dogma: DNA → transcription → RNA → translation → Protein.

Did you find this useful? Did I miss anything? Please share your thoughts. I'll appreciate your feedback.
