Texas Tech University

Rawls Professor Partners with Texas Tech Geoscientist on Genetic Code Research

August 16, 2019 | By: Glenys Young 

Professor Yadav
Surya Yadav, a professor of information systems at the Rawls College of Business

Six years ago, Texas Tech University's Sankar Chatterjee released a groundbreaking theory on the beginning of life on Earth, what he called "the Holy Grail of science." He claimed that a heavy bombardment of icy comets and carbon-rich asteroids 4 billion years ago left young Earth's surface pockmarked with craters, similar to the surface of the moon.

Filled with water and the cosmic building blocks for life, delivered by these meteorites, these craters eventually became the primitive cradles in which the first simple organisms grew.

Based on theories of chemical evolution and evidence from the Earth's early geology, Chatterjee's proposal still left one gaping question unanswered: exactly how these primordial organisms developed information systems.

Professor Chatterjee
Sankar Chatterjee is the Paul Whitfield Horn Professor in the Department of Geosciences, as well as curator of paleontology and director of the Antarctic Research Center at the Museum of Texas Tech University

"It's become clear in recent years that the biological world is computational at its core," said Chatterjee, a Horn Professor in the Department of Geosciences and Curator of Paleontology at the Museum of Texas Tech University. "Algorithms, or instruction sets, are found in every cell and in the manner in which information flows through and between cells. Digital storage of molecular information is the key to defining life and understanding its origin. The key mechanism is the origin of the genetic code."

As Chatterjee explains, the genetic code was deciphered in the 1960s, and the many scientists responsible for cracking the code were awarded Nobel Prizes. But since that time, there has been no comprehensive theory about why the genetic code evolved in the first place, before the origin of DNA and the first life.

Until now.

In collaboration with Surya Yadav, a professor of information systems in the Jerry S. Rawls College of Business, Chatterjee has built upon his earlier theory. Their research was recently published in the journal Life.

Professor Yadav viewing model on computer
Professor Surya Yadav explains a model he created as a simulation-visualization of the translation of a portion of the genetic code into protein. It shows the process of translating the genetic code carried by mRNA into the corresponding protein (a sequence of amino acids) by the stage III biological machine complex.

"My role was to look at the evolution of life as a complex information system," he said.

Chatterjee and Yadav used software to simulate their theory of the coevolution of the genetic code and translation machines.

Many scientists have been trying to figure out how life originated and how life evolved, he said.

"It is a hot research topic right now because of interest among scientists in decoding and understanding the evolution of life," Yadav said.

Interdisciplinary research on this subject has intensified with the development of more powerful computer systems that can simulate and model different kinds of ideas related to life formation and evolution, he said.

"The question of the origin of the code is the greatest challenge in modern molecular biology and origin-of-life research," Chatterjee said. "We have provided a novel model: how the genetic code might have evolved gradually with the improvement of the translation machine during protein synthesis."

Origin of the genetic code

In the craters on Earth's surface 4 billion years ago was what Chatterjee calls a prebiotic soup: a combination of water and biomolecules deposited there by comets and meteorites, all stewing together thanks to the hydrothermal energy from erupting vents. Among the biomolecules were likely several dozen types of amino acids and an assortment of nucleotides. Four specific nucleotide bases – uracil (U), cytosine (C), adenine (A) and guanine (G) – began combining into chains of ribonucleic acid (RNA).

Similarly, about 12 kinds of amino acids were joined together to form peptide chains.

The three stages of the evolution of the genetic code
The three stages of the evolution of the genetic code correspond to the evolution of the translation machines and the progressive addition of amino acids. In the abiotic stage, the primitive GNC code appeared, which codes for four amino acids: valine, alanine, aspartic acid and glycine. In the next stage, the translation machine became more efficient with the evolution of the tRNA/aaRS/mRNA complex, and six new amino acids, including glutamic acid, leucine, proline and histidine, were added. With the appearance of ribosomes, the transitional SNS code was modified into the universal genetic code, with 64 codons and 20 amino acids.

Because RNA contains a sequence of these nucleotide bases that is analogous to the letters in a word, it can function as an information-containing molecule, Chatterjee explained. Moreover, RNA, as a single chain, is free to take any kind of shape. From this basic architecture of a single-stranded RNA molecule, different species of RNA – such as ribozymes, transfer RNA (tRNA), messenger RNA (mRNA) and ribosomal RNA (rRNA) – evolved inside protocells. Each species contained a supply of information, distinct in attribute and configuration, in response to the specific amino acids it collected.

"The advent and multifunction of different species of RNA molecules signal the transition from the age of chemistry to the age of information," Chatterjee said.

Within this molecular milieu, mRNAs began to encode the recipe for proteins, while tRNAs carried different amino acids and tried to match the three-nucleotide sequences – called codons – of mRNA, each of which corresponds to a specific amino acid.

But mRNA languages and protein languages are different. A bilingual translator was needed to read the message in mRNA, and a molecular machine was needed to manufacture protein according to the recipe. The translators are special kinds of enzymes, called aminoacyl-tRNA synthetases (aaRS), that help convert the code to the right language. Then the mRNA is fed into the ribosome, and the ribosome reads the message and makes a protein accordingly.

The genetic code is essentially a set of rules defining how the four-letter code of mRNA is translated into the 20-letter code of amino acids, which are the building blocks of proteins. Proteins, in turn, are the "hardware" – the main enzymes and structural material – for cells.
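
As a rough illustration of such a rule set, the Python sketch below builds the standard codon table and translates a made-up mRNA string into one-letter amino acid symbols. The function name `translate`, the compact table layout and the example sequence are assumptions chosen for illustration; they are not taken from Chatterjee and Yadav's simulation.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Map each of the 64 three-base codons to its amino acid letter.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read an mRNA string three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: release the chain
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical message: start codon, four GNC codons, stop codon.
print(translate("AUGGGCGCCGACGUCUAA"))
```

Run as written, this prints MGADV: methionine from the start codon, followed by glycine, alanine, aspartic acid and valine.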

Life Stage I
The primitive pre-aaRS/pre-tRNA/pre-mRNA translation machine. Pre-aaRS is the matchmaker between pre-tRNA and amino acid. Four primitive amino acids and their four pre-tRNAs and pre-aaRS molecules were selected from the prebiotic soup. Each amino acid was joined to its specific pre-tRNA molecule by a pre-aaRS enzyme to create a charged pre-tRNA molecule. In this way, four charged molecules were available to decode the short mRNA string one at a time. Each pre-tRNA delivers the appropriate amino acid, and the amino acids are linked to form, for the first time, a chain of biosynthetic protein containing four amino acids. This is the first stage of translation, when the primitive GNC code evolves.

Evolution of the genetic code

The genetic code developed in three distinct stages that coevolved with the refinement of the translation machine. The primitive genetic code used only four amino acids and four codons to make a simple strand of protein.

"In the primitive translation machine, a symbiotic relationship was established among three components – pre-tRNA, pre-aaRS, and pre-mRNA – to create a short chain of amino acids, which form the biosynthetic protein," Chatterjee said. "The protein chain grew through the addition of further amino acids in the same manner. By linking the amino acids carried by the pre-tRNAs, the first protein synthesis occurred. But at this stage of the primitive code, the translation machine was simple and made errors during protein synthesis."

The transitional genetic code was the second generation, employing 10 amino acids and 16 codons. Compared to the primitive translation machine, the transitional translation machine was somewhat refined to minimize errors. In this stage of translation, pre-tRNA evolved into tRNA through gene duplication. Pre-mRNA evolved into mRNA by linking several strands of pre-mRNA to increase the storage capacity. Pre-aaRS joined to specific tRNA and became aaRS. The protein chain in this stage was moderately long.

Life Stage II
The transitional aaRS/tRNA/mRNA translation machine. Ten primitive amino acids are joined to specific tRNA molecules by aaRS enzymes to form a pool of 10 charged tRNA molecules. These charged tRNA molecules begin to decode mRNA, creating a longer chain of biosynthesized protein. At this stage, the transitional SNS code appears, with 16 codons for 10 amino acids. Translation is moderately efficient, and redundancy appears, minimizing translation errors.

The universal genetic code was the final stage of code development, evolving along with the translation machine and maximizing its efficiency. It contains 64 codons specifying 20 amino acids. Chatterjee says the universal code proved more reliable than the primitive or transitional codes, with minimal errors, so natural selection favored it.
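
The codon and amino acid counts for the three stages can be checked with a short Python sketch. It assumes that "GNC" and "SNS" follow the usual nucleotide shorthand (N = any base, S = G or C) and that the early codes used the same codon assignments as the modern standard table; the article does not spell out either point, and the helper name `stage_code` is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed shorthand: N = any base, S = G or C.
SHORTHAND = {"U": "U", "C": "C", "A": "A", "G": "G", "S": "GC", "N": "UCAG"}


def stage_code(pattern: str) -> dict[str, str]:
    """Return the codons matching a three-letter pattern, with their amino acids."""
    return {
        codon: aa
        for codon, aa in CODON_TABLE.items()
        if all(base in SHORTHAND[p] for base, p in zip(codon, pattern))
    }


for name, pattern in [("primitive", "GNC"), ("transitional", "SNS"), ("universal", "NNN")]:
    code = stage_code(pattern)
    amino_acids = set(code.values()) - {"*"}  # ignore stop signals
    print(f"{name:12s} {pattern}: {len(code)} codons, {len(amino_acids)} amino acids")
```

Under those assumptions the subsets contain 4 codons for 4 amino acids, 16 codons for 10 amino acids, and 64 codons for 20 amino acids, matching the figures quoted for each stage.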

The final and most important component of the translation machine, the ribosome, was a hybrid of rRNAs and r-proteins. With the participation of the ribosome, the translation machinery became more elaborate with tRNA/aaRS/mRNA/ribosome complexes, which enabled higher specificity in the genetic coding. The protein chain in this stage is long and complex, with a biological information system that adds rules, instruction, feedback and algorithm to its repertoire.

Chatterjee and Yadav hypothesize that the genetic code evolved as pathways for the synthesis of new amino acids became available – and these, in turn, were the results of progressive refinement of the translation machine.

"Through successive refinement, the universal code has optimized functional efficiency to minimize coding errors," Chatterjee said. "Once the universal code evolved, the protein synthesis became highly coordinated, beautifully orchestrated and universally adopted by all life."

Life Stage III
The final aaRS/tRNA/mRNA/ribosome translation machine. tRNA delivers amino acids to ribosomes, which serve as the sites of protein synthesis. Each ribosome has a large 50S subunit and a small 30S subunit that join together at the beginning of decoding of mRNA to synthesize a protein chain from amino acids carried by tRNAs. The correct tRNA enters the A site of the ribosome, and the appropriate amino acid is incorporated into the growing peptide chain, which transfers from the tRNA in the P site to the tRNA in the A site. As the ribosome moves, both tRNAs and mRNA then shift to the E site. Each newly translated amino acid is then added to the growing protein chain until the ribosome completes the protein synthesis. At this stage, the universal genetic code is optimized, with 64 codons for 20 amino acids, including start and stop codons. Translation is highly efficient, and redundancy minimizes translation errors and mutations.

Chatterjee and Yadav proposed that the coevolution of the genetic code and the translation machine marks the beginning of Darwinian evolution at the molecular level, an interplay between information and its supporting structure. This hypothesis provides the logical and incremental steps for the origin of programmed protein synthesis.

The code obviously is not the result of a random assignment of codons to amino acids, Chatterjee said, because it has a specific, organized structure with a large number of codons to provide redundancy; that is, several codons may specify the same amino acid.
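
The redundancy described here can be made concrete with a small Python sketch. It counts how many codons in the standard table specify each amino acid, then lists which single-base substitutions of a codon leave the amino acid unchanged. The helper name `silent_variants` and the example codons are illustrative assumptions, not part of the published model.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons specify each amino acid (or the stop signal '*').
CODONS_PER_AMINO_ACID = Counter(CODON_TABLE.values())


def silent_variants(codon: str) -> list[str]:
    """List single-base substitutions that still encode the same amino acid."""
    original = CODON_TABLE[codon]
    variants = []
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == original:
                variants.append(mutant)
    return variants


print(CODONS_PER_AMINO_ACID["L"], CODONS_PER_AMINO_ACID["M"])  # leucine vs. methionine
print(silent_variants("GGC"))  # glycine codon
print(silent_variants("AUG"))  # methionine codon
```

In the standard table, leucine is specified by six codons and methionine by one. Every third-position change to the glycine codon GGC still yields glycine, while no single-base change to AUG preserves methionine, which is one way redundancy can buffer some copying errors but not others.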


"The expanded genetic code is so universal that there is strong evidence that all life on Earth had a single origin in the universal code before the last universal common ancestor evolved," he said. "This universal genetic code has been operating for the last 4 billion years and has remained unchanged since it was perfected."

Information system of life

Life is now often compared with a self-replicating complex information system consisting of various levels of subsystems that process different kinds of information, Yadav said.

"These biological information systems are of various shapes and sizes," he said. "For example, nucleic acids such as RNA, DNA and proteins, are now regarded as bio-nanobots—a sophisticated macromolecular information system."

To explain the origin of the genetic code in simpler terms, Chatterjee compares it to the evolution of personal computing – ironic, since the idea of computing originally came from trying to imitate information processing in living systems. First came Apple I, the first Apple computer, followed by Apple II. Then Macintosh computers built upon the Apple II by adding a graphical interface and a more powerful operating system.

Today, we have iPhones, which allow us to carry more information in our pockets than NASA had during the Apollo missions. As computer systems have been modified and refined, so has the information processing on them – just as the genetic code has been modified and refined with the increasing complexity of the lifeforms it encodes.

"However, the analogy ends there," Chatterjee said. "We know very well that in 50 years, this iPhone would be obsolete. But once life's computing machinery and its software fully evolved, they have remained the same for the last 4 billion years. Isn't that amazing?"

Chatterjee and Yadav point out that life is more sophisticated than any man-made computer: in living systems, the dichotomy between software and hardware is blurred, and the two are integrated. They find the computer analogy too simplistic.

"Both the informational (RNA/DNA) and functional polymers (proteins) in the translational machinery can be viewed as highly mobile nanobots, which are fully equipped with both the information and the material needed to accomplish their task," Chatterjee said. "These nanobots 'know' how to put themselves together by self-assembly or by cooperation with other molecules."

Information-directed mRNA and protein synthesis are remarkable feats of early protocells. All of the information is stored in RNA genomes, and when a new protein or mRNA is needed, the information is read and used to direct its construction. Some essential proteins, which perform central tasks, have remained unchanged for billions of years.

"The beauty of this information system of life is that if there is any minor spelling mistake in A, U, C and G during translation in protein synthesis, this mistake – called mutation – will create variation among population: the raw material for Darwinian natural selection," Chatterjee said. "Because of this occasional spelling mistake in the software during the last 4 billion years, today we see the biodiversity of life. However, the genetic code remains unchanged."

Proteins, Chatterjee said, are regarded as one of the "nanobots" of a cell. They do most of the work – such as controlling metabolism, transport, communication, structure, catalysis and many other aspects of cell function – and they are constructed for many different functions. With the availability of proteins, there was a gradual evolution of the components of protocells.

"Life is more than a computer," Chatterjee said. "Unlike computers, life creates its own custom-made components. No computer can achieve this remarkable feat. Thus, a significant part of the process for creating an organism's components is essentially bootstrapped from its own DNA and mRNA. An early protocell innovated the most powerful technology ever created on this planet."

After spending more than 10 years investigating the origins of life, Chatterjee is proud to have found a potential answer to one of mankind's biggest questions, but he emphasizes that this information system analogy is not the complete story.

"Life is the most sophisticated and durable computer system in the universe, which can create its own copy," he said. "In our computers, we have to upgrade our software every year or so and buy a new model every few years. But for life, the code was so well designed by evolution and became so near foolproof, and the translation machine so sophisticated, that they did not need any upgrading; they are still working perfectly."

Chatterjee and Yadav are continuing their research to understand the origin and evolution of biological information systems.

Staci Semrad contributed reporting.