A Review of the Central Dogma

This post is here to provide a basic overview of the central dogma of molecular biology as a introduction for a later post on Haskell functions pertinent to the mechanisms in the central dogma. To get to that post, click here.

If you’re interested in a general review of the central dogma, its mechanisms and current place in modern biology, continue reading. Feel free to skip it if all of that is old news.

This review assumes some basic familiarity with the base-pairing nature of nucleic acids and the structural nature of DNA, RNA, and protein.

What the “central dogma” is all about

The central dogma of molecular biology is the codification of how information is transferred between the biopolymers: DNA, RNA, and protein. It is sometimes given in the simplest form as “DNA proceeds to RNA proceeds to protein”. To put it in other terms, the central dogma as a statement expresses the idea that base-sequence information is conveyed from DNA, to RNA, ultimately to protein, and sequence information in protein is never conveyed back to DNA or RNA. Here’s a little diagram of what the central dogma allows:

central dogma graph

Much has been discovered since the central dogma was originally formulated, giving us a richer view of genetics and biology. In light of some of these discoveries, some argue that the central dogma has been falsified, leaving it perhaps only useful as a general statement of how most things work. I believe that these views frequently reflect how our thinking has shifted with regards to information in a biological context. There’s a lot more to information than solely the sequence information with which the central dogma was originally concerned.

Falsification of the central dogma?

What are the falsification conditions of the central dogma? If any of the following were demonstrated, we can say that the central dogma has been demonstrated false:

If the decoding of protein sequence to nucleic acids (reverse translation) were to be discovered, that would be a refutation of the central dogma. So far, no such discovery has been made. However there is some debate over whether prion activity represents a violation. This paper and the conversation that follows provides a good discussion of the matter. I’ll try to give a short layperson’s summary.

A prion is a protein which may switch to a different folded shape and further induce others to do the same. This is the basis of some neurodegenerative diseases as well as a form of epigenetic heritability in some organisms. One particularly intriguing case is that of prion activity affecting how the translation of many other proteins takes place, the ribosome does not terminate as typical, and so affects the sequence information of other proteins. In light of this, even keeping to a very strict sense of the central dogma, it could be argued that protein sequence information may result in altering protein sequence information and violate the dogma.

I’m not sure whether or not I am fully decided here, because it would seem to me that there is little distinction between this subject (though certainly no less fascinating) and other epigenetic factors. I might suggest then that if epigenetics do impinge on the central dogma, then it is contradicted not only by prions, but other epigenetic modes as well that influence biopolymer sequence.

Whether or not the central dogma stands does not undermine the foundations of biology. If falsified, the central dogma may remain useful as a negation which manages to capture a great quantity of known molecular biology. Either way, it is vital to note that one’s understanding of biology does not end with the central dogma.

Mechanisms of sequence information flow

Now I’ll turn the attention over to known mechanisms of information flow in the central dogma with some examples of sequence transformations.

DNA : {A,C,G,T}

DNA to DNA occurs ubiquitously, replication of DNA by DNA polymerases is essential for providing genetic material to an organism’s’ progeny. Fidelity of replication is important so that viable organisms produce enough viable progeny, as is the occasional infidelity to provide diverse progeny that might respond better to change and competition. Replication of GATTACA would give GATTACA, with possible chance mutations.

DNA to RNA occurs ubiquitously to produce RNAs of a very wide wide variety of functions. The process of producing RNA from DNA templates is known as transcription and is carried out by RNA polymerases that create RNA chains in a pair-wise complementary fashion to the DNA template. RNA is chemically very similar to DNA, one distinguishing feature is the use of the nucleobase uracil (U) in place of the nucleobase thymine (T) of DNA. Both are complementary to the nucleobase adenosine (A). Thus the corresponding RNA sequence of GATTACA, perhaps transcribed from the complementing strand would be GAUUACA.

DNA to Protein is not known to occur in nature, though it has been accomplished in vitro under certain conditions It is not forbidden by the central dogma, and there is speculation as to why it is either rare or non-existant. It is important to address that we frequently refer to the genetic code, the coding of amino acid sequence for protein by nucleotide triplets, in DNA terms even though it must first pass through a transcribed mRNA intermediate (more on that below). I’ll extend GATTACA up to 9 nucleotides for this example: GATTACAGA would translate to the amino acid sequence DYR. Be aware that the genetic code can vary in some cases.

RNA : {A,C,G,U}

RNA to DNA occurs in many organisms and was considered a possibility from the beginning of the central dogma, though evidence for it came somewhat later. RNA is used as the basic genetic material for some viruses; retroviruses, HIV is one example, must reverse transcribe their RNA genetic material to DNA before it is integrated into the genome of the host cell. RNA templates are used in reverse transcription to extende the telomere caps of chromosomes by telomerases. To do reverse transcription, just reverse transcription! GAUUACA produces GATTACA.

RNA to RNA replication occurs via RNA-dependent RNA polymerases sometimes called replicases. This is important to the replication of viruses with RNA as genetic material as well as eukaryotic cells for RNA silencing. RNA Silencing is a huge deal in biology, not just for its natural role, but because it serves as a powerful tool of basic research. Ignoring stochastic mutations, base complementarity means that GAUUACA will give GAUUACA.

RNA to Protein is conducted by a massive complex of protein and RNA called the ribosome, using the genetic code that was mentioned earlier, as it scans over a messenger (mRNA) template. This is the only known way to produce protein and is called translation. Translation from the RNA sequence GAUUACAGA in the (not totally) universal genetic code would produce the amino acid sequence DYR.

Protein : {A,R,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V,...}

Protein to DNA or RNA is not known to occur in nature, in fact it is strongly suspected to not occur. This is not permitted to occur according to the central dogma. There’d be a Nobel Prize for whoever discovered a natural biological system that accomplished this. It might then be interesting that this is a common research job. Before genome sequencing was developed and became relatively cheap, protein sequencing might have yielded the amino acid sequence of the protein, which would need to be reverse translated through the genetic code to either get an idea of what the encoding gene looks like (introns and other gene features are lost data), or to produce the protein transgenically. Protein sequencing is still an important part of biological research, predictions made in bioinformatics on genomic sequences need experimental validation after all. Reverse translation is made somewhat complicated by the fact that the genetic code is degenerate, meaning that amino acids may be coded by more than one codon, so there are many nucleic acid sequences that could hypothetically produce the same protein. To be brief, the choice of codon can affect how well a sequence is translated or what other factors may regulate the mRNA; this is an area of active study. One rudimentary approach to codon optimization is to calculate the relative frequency of codon usage in an organism’s protein-encoding genes and choose common codons rather than underrepresented ones. For the sequence DYR in the universal genetic code, we would have the following (brackets indicate choose one) [GAT,GAC][TAT,TAC][CGA,CGC,CGG,CGT,AGA,AGG] for a total of 2 * 2 * 6 = 24 possible corresponding DNA sequences.

Protein to Protein As mentioned earlier, prions may represent a form of protein sequence modification by other proteins and it is a matter of current some discussion whether or not this is a refutation of the central dogma that excludes protein to protein sequence transferrence. Protein is not replicated from protein, nor is it produced by other means aside from translation.

On a final note

I did not mention that DNA, RNA, and protein can be modified in ways beyond their sequence, but it is very commonly so. In addition they interact with eachother and with various components in their biological system. Much of this is outside the concern of the central dogma, but entirely within our concern as scientists seeking to understand the world. So the central dogma is really a Bio 101 concept, from there we have moved on.

The wonderful and bewildering nature of biology is that it is complex. It is the interplay of numerous agents that gives rise to the diversity of the world in which we live, why biological research remains difficult, why some diseases have no simple cures.

For some further reading on the complexifying factors of general information flow in biology, the following key terms come to mind and in no particular order: epigenetics, intron excision, alternative splicing, RNA editing, RNA silencing, post-translational modification, inteins, riboswitches, horizontal gene transfer.