Regarding different nucleotides on a DNA strand

I am not a biologist, so this might seem like an elementary question. I retrieved rodent sequences from the UTR database.

I know there are supposed to be four nucleotides, namely A, G, C, and T. However, in this database I get sequences that have an 'n' nucleotide in several locations, and 'm' and 'k' as well. What do these letters mean? I haven't been able to find anything about them. Does it mean these positions in the UTR are undefined, or are they some 'intermediate' nucleotides, if there are any?

These are IUPAC codes for so-called "degenerate" or ambiguous bases, alternatives to the usual A, T, C, and G nucleotides (for DNA). The N code represents any base, while M and K map to A or C, and G or T, respectively.
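To make the answer concrete, here is a minimal Python sketch of the standard IUPAC ambiguity table for DNA. The function names `expand` and `ambiguous_positions` are made up for this illustration and do not come from the UTR database or any particular library.

```python
# Standard IUPAC nucleotide ambiguity codes for DNA.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def expand(code):
    """Return the set of bases a single IUPAC letter can stand for."""
    return set(IUPAC[code.upper()])

def ambiguous_positions(seq):
    """List (index, letter) for every position that is not a plain A, C, G or T."""
    return [(i, ch) for i, ch in enumerate(seq) if ch.upper() not in "ACGT"]

if __name__ == "__main__":
    for letter in "nmk":
        print(letter, "->", sorted(expand(letter)))
    print(ambiguous_positions("acgtnmka"))
```

A position marked with one of these letters is not an extra kind of nucleotide. It records that the base at that position could not be narrowed down further than the listed set.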

7.1: DNA Structure

  • Contributed by E. V. Wong
  • Axolotl Academica Publishing (Biology) at Axolotl Academica Publishing

As you can see in Figure 1, the nucleotides only vary slightly, and only in the nitrogenous base. In the case of DNA, those bases are adenine, guanine, cytosine, and thymine. Note the similarity of the shapes of adenine and guanine, and also the similarity between cytosine and thymine. A and G are classified as purines, while C and T are classified as pyrimidines. As long as we're naming things, notice "deoxyribose" and "ribose". As the name implies, deoxyribose is just a ribose without an oxygen. More specifically, where there is a hydroxyl group attached to the 2-carbon of ribose, there is only a hydrogen attached to the 2-carbon of deoxyribose. That is the only difference between the two sugars.

In randomly constructing a single strand of nucleic acid in vitro, there are no particular rules regarding the ordering of the nucleotides with respect to their bases. The identities of their nitrogenous bases are irrelevant because the nucleotides are attached by phosphodiester bonds through the phosphate group and the pentose. This chain of linkages is therefore often referred to as the sugar-phosphate backbone. If we break down the word "phosphodiester", we see that it quite handily describes the connection: the sugars are connected by two ester bonds (-O-) with a phosphorus in between. One of the ideas that often confuses students is the directionality of this bond, and therefore, of nucleic acids in general. For example, when we talk about DNA polymerase, the enzyme that catalyzes the addition of nucleotides in living cells, we say that it works in a 5-prime (5') to 3-prime (3') direction. This may seem like arcane molecular-biologist-speak, but it is actually very simple. Take another look at two of the nucleotides joined together by the phosphodiester bond (Figure 1, bottom left). An adenine nucleotide is joined to a cytosine nucleotide. The phosphodiester bond will always link the 5-carbon of one deoxyribose (or ribose in RNA) to the 3-carbon of the next sugar. This also means that on one end of a chain of linked nucleotides, there will be a free 5' phosphate (-PO4) group, and on the other end, a free 3' hydroxyl (-OH). These define the directionality of a strand of DNA or RNA.

Figure 1. DNA. Deoxyribonucleic acid is a polymer chain of nucleotides connected by 5' to 3' phosphodiester bonds. DNA normally exists as two antiparallel complementary strands held together by hydrogen bonds between adenines (A) and thymines (T), and between guanines (G) and cytosines (C).

DNA is normally found as a double-stranded molecule in the cell whereas RNA is mostly single-stranded. It is important to understand though, that under the appropriate conditions, DNA could be made single-stranded, and RNA can be double-stranded. In fact, the molecules are so similar that it is even possible to create double-stranded hybrid molecules with one strand of DNA and one of RNA. Interestingly, RNA-RNA double helices and RNA-DNA double helices are actually slightly more stable than the more conventional DNA-DNA double helix.

The basis of the double-stranded nature of DNA, and in fact the basis of nucleic acids as the medium for storage and transfer of genetic information, is base-pairing. Base-pairing refers to the formation of hydrogen bonds between adenines and thymines, and between guanines and cytosines. These pairs are significantly more stable than any association formed with the other possible bases. Furthermore, when these base-pair associations form in the context of two strands of nucleic acids, their spacing is also uniform and highly stable. You may recall that hydrogen bonds are relatively weak bonds. However, in the context of DNA, the hydrogen bonding is what makes DNA extremely stable and therefore well suited as a long-term storage medium for genetic information. Since even in simple prokaryotes, DNA double helices are at least thousands of nucleotides long, this means that there are several thousand hydrogen bonds holding the two strands together. Although any individual nucleotide-to-nucleotide hydrogen bonding interaction could easily be temporarily disrupted by a slight increase in temperature, or a minuscule change in the ionic strength of the solution, a full double helix of DNA requires very high temperatures (generally over 90 °C) to completely denature into individual strands.

Because there is an exact one-to-one pairing of nucleotides, it turns out that the two strands are essentially backup copies of each other - a safety net in the event that nucleotides are lost from one strand. In fact, even if parts of both strands are damaged, as long as the other strand is intact in the area of damage, then the essential information is still there in the complementary sequence of the opposite strand and can be written into place. Keep in mind though, that while one strand of DNA can thus act as a "backup" of the other, the two strands are not identical - they are complementary. An interesting consequence of this system of complementary and antiparallel strands is that the two strands can each carry unique information.

Bi-directional gene pairs are two genes on opposite strands of DNA, but sharing a promoter, which lies in between them. Since RNA, like DNA, can only be synthesized in one direction, 5' to 3', this bi-directional promoter, often a CpG island (see next chapter), thus sends the RNA polymerase for each gene in opposite physical directions. This has been shown for a number of genes involved in cancers (breast, ovarian), and is a mechanism for coordinating the expression of networks of gene products.

The strands of a DNA double-helix are antiparallel. This means that if we looked at a double-helix of DNA from left to right, one strand would be constructed in the 5' to 3' direction, while the complementary strand is constructed in the 3' to 5' direction. This is important to the function of enzymes that create and repair DNA, as we will be discussing soon. In Figure 1, the left strand is 5' to 3' from top to bottom, and the other is 5' to 3' from bottom to top.
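One practical consequence of antiparallel, complementary strands is the "reverse complement". If both strands are written 5' to 3', the partner of a sequence is found by complementing each base and then reversing the order. The following is a small sketch of that idea. The function name is chosen for illustration only.

```python
# Watson-Crick pairing partners for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand_5_to_3):
    """Return the partner strand, also written 5' to 3'.

    Walking the input backwards accounts for the opposite orientation
    of the second strand; the dictionary lookup applies base pairing.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3))

if __name__ == "__main__":
    top = "ATGC"
    print("5'-" + top + "-3'")
    print("5'-" + reverse_complement(top) + "-3'  (partner strand)")
```

Applying the function twice returns the original sequence, which reflects the fact that each strand fully determines the other.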

From a physical standpoint, DNA molecules are negatively charged (all those phosphates), and normally a double-helix with a right-handed twist. In this normal (also called the "B" conformation) state, one full twist of the molecule encompasses about 10.5 base pairs, with 0.34 nm between adjacent base pairs. Each of the nitrogenous bases is planar, and when paired with the complementary base, forms a flat planar "rung" on the "ladder" of DNA. These are perpendicular to the longitudinal axis of the DNA. Most of the free-floating DNA in a cell, and most DNA in any aqueous solution of near-physiological osmolarity and pH, is found in this B conformation. However, other conformations have been found, usually under very specific environmental circumstances. A compressed conformation, A-DNA, was observed as an artifact of in vitro crystallization, with slightly more bases per turn, shorter turn length, and base-pairs that are not perpendicular to the longitudinal axis. Another, Z-DNA, appears to form transiently in GC-rich stretches of DNA in which, interestingly, the DNA twists the opposite direction.

Figure 2. Three conformations of DNA. B-DNA is most common, A-DNA is likely an artifact of crystallization in vitro, and Z-DNA may form transiently in parts of the chromosome.

It has been suggested that both the A and Z forms of DNA are, in fact, physiologically relevant. There is evidence to suggest that the A form may occur in RNA-DNA hybrid double helices as well as when DNA is complexed to some enzymes. The Z conformation may occur in response to methylation of the DNA. Furthermore, the "normal" B-DNA conformation is something of an idealized structure based on being fully hydrated, as is certainly very likely inside a cell. However, that hydration state is constantly changing, albeit minutely, so the DNA conformation will often vary slightly from the B-conformation parameters in Figure 2.

In prokaryotes, the DNA is found in the cytoplasm (rather obvious since there is no other choice in those simple organisms), while in eukaryotes, the DNA is found inside the nucleus. Despite the differences in their locations, the level of protection from external forces, and most of all, their sizes, both prokaryotic and eukaryotic DNA are packaged with proteins that help to organize and stabilize the overall chromosome structure. Relatively little is understood with regard to prokaryotic chromosomal packaging, although there are structural similarities between some of the proteins found in prokaryotic and eukaryotic chromosomes. Therefore, most introductory cell biology courses stick to eukaryotic chromosomal packaging.

Figure 3. DNA packaging. (A) A naked strand of DNA is approximately 2 nm in diameter. (B) Histones, which are octameric proteins depicted here as a roughly cylindrical protein, have positive charges distributed on the outer surface to interact with the negatively-charged DNA backbone. (C) Even the organization afforded by histone binding can leave an unmanageable tangle of DNA, especially with longer eukaryotic genomes, and therefore the histone-bound DNA is packaged into the "30-nm strand". This is held together, in part, by histone interactions. (D) The 30-nm fibers are looped into 700-nm fibers, which are themselves formed into the typical eukaryotic chromosome (E).

Naked DNA, whether prokaryotic or eukaryotic, is an extremely thin strand of material, roughly 2 nm in diameter. However, given the size of eukaryotic genomes, if the DNA was stored that way inside the nucleus, it would become unmanageably tangled. Picture a bucket into which you have tossed a hundred meters of yarn without any attempt whatsoever to organize it by coiling it or bunching it. Now consider whether you would be able to reach into that bucket, pull on one strand, and expect to pull up only one strand, or if instead you are likely to pull up at least a small tangle of yarn. The cell does essentially what you would do with the yarn to keep it organized: it is packaged neatly into smaller, manageable skeins. In the case of DNA, each chromosome is looped around a histone complex to form the first order of chromosomal organization: the nucleosome.

Figure 4. The nucleosome is composed of slightly over two turns of DNA around a histone core containing two copies each of H2A, H2B, H3, and H4 histones. The H1 histone is not part of the core unit and functions in coordinating interaction between nucleosomes.

The 30-nm fiber is held together by two sets of interactions. First, the linker histone, H1, brings the nucleosomes together into an approximate 30-nm structure. This structure is then stabilized by disulfide bonds that form between the H2A histone of one nucleosome and the H4 histone of its neighbor.

Histones are a family of basic (positively-charged) proteins. They all function primarily in organizing DNA, and the nucleosome is formed when DNA wraps (a little over 2 times) around a core of eight histones - two each of H2A, H2B, H3, and H4. The number and position of the positive charges (mostly from lysines and arginines) are crucial to their ability to tightly bind DNA, which as previously pointed out, is very negatively charged. That "opposites attract" idea is not just a dating tip from the advice columns.

Figure from RCSB Protein Data Bank (

Upon examination of the 3D structure of the histone core complex, we see that while relatively uncharged protein interaction domains hold the histones together in the center, the positively charged residues are found around the outside of the complex, available to interact with the negatively charged phosphates of DNA.

In a later chapter, we will discuss how enzymes read the DNA to transcribe its information onto smaller, more manageable pieces of RNA. For now, we only need to be aware that at any given time, much of the DNA is packaged tightly away, while some parts of the DNA are not. Because the parts that are available for use can vary depending on what is happening to/in the cell at any given time, the packaging of DNA must be dynamic. There must be a mechanism to quickly loosen the binding of DNA to histones when that DNA is needed for gene expression, and to tighten the binding when it is not. As it turns out, this process involves acetylation and deacetylation of the histones.

Figure 6. (A) Deacetylated histone allows interaction between the negatively charged phosphates of the DNA and the positively charged lysines of the histone. (B) When the histone is acetylated, not only is the positive charge on the lysine lost, the acetyl group also imparts a negative charge, repelling the DNA phosphates.

Histone Acetyltransferases (HATs) are enzymes that place an acetyl group on a lysine of a histone protein. The acetyl groups are negatively charged, and the acetylation not only adds a negatively charged group, it also removes the positive charge from the lysine. This has the effect of not only neutralizing a point of attraction between the protein and the DNA, but even slightly repelling it (with like charges). On the other side of the mechanism, Histone Deacetylases (HDACs) are enzymes that remove the acetylation, and thereby restore the interaction between histone protein and DNA. Since these are such important enzymes, it stands to reason that they are not allowed to operate willy-nilly on any available histone, and in fact, they are often found in a complex with other proteins that control and coordinate their activation with other processes such as activation of transcription.

Each of These Microscopic Glass Beads Stores an Image Encoded on a Strand of DNA

Increasingly, civilization’s information is stored digitally, and that storage is abundant and growing. We don’t bother deleting those seven high-definition videos of the ceiling or 20 blurry photos of a table corner taken by our kid. There’s plenty of room on a smartphone or in the cloud, and we count on both increasing every year.

As we fluidly copy information from device to device, this situation seems durable. But that’s not necessarily true.

The amount of data we create is increasing rapidly. And if we (apocalyptically) lost the ability to produce digital storage devices—hard drives or magnetic tape, for example—our civilization’s collective digital record would begin to sprout holes within years. In decades, it’d become all but unreadable. Digital storage isn’t like books or stone tablets. It has a shorter expiration date. And, although we take storage for granted, it’s still expensive and energy hungry.

Which is why researchers are looking for new ways to archive information. And DNA, life’s very own “hard drive,” may be one solution. DNA offers incredibly dense data storage, and under the right conditions, it can keep information intact for millennia.

In recent years, scientists have advanced DNA data storage. They’ve shown how we can encode individual books, photographs, and even GIFs in DNA and then retrieve them. But there hasn’t been a scalable way to organize and retrieve large collections of DNA files. Until now, that is.

In a new Nature Materials paper, a team from MIT and Harvard’s Broad Institute describes a DNA-based storage system that allows them to search for and pull individual files, in this case images encoded in DNA. It’s a bit like thumbing through your file cabinet, reading the paper tabs to identify a folder, and then pulling the deed to your car from it. Only, obviously, the details are a bit more complicated.

“We need new solutions for storing these massive amounts of data that the world is accumulating, especially the archival data,” said Mark Bathe, an MIT professor of biological engineering and senior author of the paper. “DNA is a thousandfold denser than even flash memory, and another property that’s interesting is that once you make the DNA polymer, it doesn’t consume any energy. You can write the DNA and then store it forever.”

How to Organize a DNA Storage System

How does one encode an image in a strand of DNA, anyway? It’s a fairly simple matter of translation.

Each pixel of a digital image is encoded in bits. These bits are represented by 1s and 0s. To convert it into DNA, scientists assign each of these bits to the DNA’s four base molecules, or nucleotides, adenine, cytosine, guanine, and thymine—usually referred to in shorthand by the letters A, C, G, and T. The DNA bases A and G, for example, could represent 1, and C and T could represent 0.

Next, researchers string together (or synthesize) a chain of DNA bases representing each and every bit of information in the original file. To retrieve the image, researchers reverse the process, reading the sequence of DNA bases (or sequencing it) and translating the data back into bits.
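The translation described above can be sketched in a few lines of Python. This is only an illustration of the example mapping given in the text (A or G for 1, C or T for 0), not the scheme used in the paper. Alternating between the two candidate bases by position is an assumption added here so the encoder is deterministic, and all function names are hypothetical.

```python
ONE_BASES = "AG"   # either base stands for bit 1
ZERO_BASES = "CT"  # either base stands for bit 0

def bytes_to_bits(data):
    """Turn bytes into a string of '0' and '1' characters."""
    return "".join(f"{byte:08b}" for byte in data)

def bits_to_bytes(bits):
    """Inverse of bytes_to_bits; expects a multiple of 8 bits."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def encode(bits):
    """Map each bit to a base, alternating between the two allowed bases."""
    out = []
    for i, bit in enumerate(bits):
        choices = ONE_BASES if bit == "1" else ZERO_BASES
        out.append(choices[i % 2])
    return "".join(out)

def decode(dna):
    """Read a base sequence back into bits."""
    return "".join("1" if base in ONE_BASES else "0" for base in dna)

if __name__ == "__main__":
    message = b"Hi"
    dna = encode(bytes_to_bits(message))
    print(dna)
    print(bits_to_bytes(decode(dna)))
```

Because two bases are available for every bit, this mapping stores one bit per nucleotide. A denser scheme could assign two bits to each of the four bases.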

The standard retrieval process has a few drawbacks, however.

Researchers use a technique called a polymerase chain reaction (PCR) to pull files. Each strand of DNA includes an identifying sequence that matches a short sequence of nucleotides called a PCR primer. When the primer is added to the DNA solution, it bonds with matching DNA strands—the ones we want to read—and only those sequences are amplified (that is, copied for sequencing). The problem? Primers can interact with off-target sequences. Worse, the process uses enzymes that chew up all the DNA.

“You’re kind of burning the haystack to find the needle, because all the other DNA is not getting amplified and you’re basically throwing it away,” said Bathe.

The microscopic glass spheres pictured here are DNA “files.” Each contains an image, encoded in DNA, and is coated in DNA tags describing the image within. Image Credit: Courtesy of the researchers (via MIT News)

To get around this, the Broad Institute team encapsulated the DNA strands in microscopic (6-micron) glass beads. They affixed short, single-stranded DNA labels to the surface of each bead. Like file names, the labels describe the bead’s contents. A tiger image might be labeled “orange,” “cat,” “wild.” A house cat might be labeled “orange,” “cat,” “domestic.” With just four labels per bead, you could uniquely label 10^20 DNA files.

The team can retrieve specific files by adding complementary nucleotide sequences, or primers, corresponding to an individual file’s label. The primers contain fluorescent molecules, and when they link up with a complementary strand—that is, the searched-for label—they form a double helix and glow. Machines separate out the glowing beads, which are opened and the DNA inside sequenced. The rest of the DNA files remain untouched, left in peace to guard their information.

The best part of the method is its scalability. You could, in theory, have a huge DNA library stored in a test tube—Bathe notes a coffee mug of DNA could store all the world’s data—but without an easy way to search and retrieve the exact file you’re looking for, it’s worthless. With this method, everything can be retrieved.

George Church, a Harvard professor of genetics and well-known figure in the field of synthetic biology, called it a “giant leap” for the field.

“The rapid progress in writing, copying, reading, and low-energy archival data storage in DNA form has left poorly explored opportunities for precise retrieval of data files from huge…databases,” he said. “The new study spectacularly addresses this using a completely independent outer layer of DNA and leveraging different properties of DNA (hybridization rather than sequencing), and moreover, using existing instruments and chemistries.”

This Isn’t Coming For Your Computer

To be clear, all DNA data storage, including the work outlined in this study, remains firmly in the research phase. Don’t expect DNA hard drives for your laptop anytime soon.

Synthesizing DNA is still extremely expensive. It’d cost something like $1 trillion to write a petabyte of data in DNA. To match magnetic tape, a common method of archival data storage, Bathe estimates synthesis costs would have to fall six orders of magnitude. Also, this isn’t the speediest technique (to put it mildly).

The cost of DNA synthesis will fall—the technology is being advanced in other areas as well—and with more work, the speed will improve. But the latter may be beside the point. That is, if we’re mainly concerned with backing up essential data for the long term with minimal energy requirements and no need to regularly access it, then speed is less important than fidelity, data density, and durability.

DNA already stores the living world’s information. Now, it seems, it can do the same for all things digital too.

Medical Implications of Detailed Human Genome Maps

Advances in molecular genetics made over the past two decades are already having a major impact on medical research and clinical care. The ability to clone and analyze individual genes and to deduce the amino acid sequences of encoded proteins has greatly increased our understanding of genetic disorders, the immune system, endocrine abnormalities, coronary artery disease, infectious diseases, and cancer. A few proteins produced on a commercial scale by recombinant DNA methods are available for therapeutic use or in clinical trials, and many more are in earlier developmental stages. Recent progress in determining the genetic basis for such neurological and behavioral disorders as Huntington's disease (Gusella et al., 1983), Alzheimer's disease (St George-Hyslop et al., 1987), and manic-depressive illness (Egeland et al., 1987) promises new insights into these common and serious conditions. Higher resolution maps of the human genome will accelerate progress in understanding disease pathogenesis and in developing new approaches to diagnosis, treatment, and prevention in many areas of medicine. In Chapter 3 the potential medical impact of a detailed human genomic map is discussed further.

DNA Replication

  • The two original strands of DNA are separated with the help of enzymes known as DNA helicases. Helicases work by breaking the hydrogen bonds holding the nucleotide bases together.
  • Enzymes known as DNA polymerases add complementary nucleotides to each strand. Adenine bonds with thymine, and cytosine bonds with guanine.
  • Two DNA molecules, which are identical to the original DNA molecule, form. Each newly formed DNA molecule consists of two strands of DNA, one from the parent molecule and one built from scratch using the parent molecule as a template.

DNA replication is said to be semi-conservative. Each copy contains one newly-replicated strand and one strand from the original molecule.
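The three steps above, and the semi-conservative outcome, can be mimicked with plain strings. In this sketch a duplex is a hypothetical `(top, bottom)` pair in which `bottom` is written left to right in the same physical order as `top`, so base `i` of one pairs with base `i` of the other. Strand orientation and enzymes are deliberately ignored.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Build the pairing partner of a strand, base by base."""
    return "".join(COMPLEMENT[base] for base in strand)

def replicate(duplex):
    """Return two daughter duplexes from one parent duplex.

    Step 1: separate the parental strands.
    Step 2: build a new complementary strand on each one.
    Step 3: each daughter keeps one old strand and one new strand.
    """
    old_top, old_bottom = duplex
    daughter_1 = (old_top, complement(old_top))        # old top, new bottom
    daughter_2 = (complement(old_bottom), old_bottom)  # new top, old bottom
    return daughter_1, daughter_2

if __name__ == "__main__":
    parent = ("ATGCCGTA", complement("ATGCCGTA"))
    for daughter in replicate(parent):
        print(daughter)
```

Both daughters carry the same sequence as the parent, yet each contains exactly one of the two original strands.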

The process of DNA replication is biologically significant because it allows the cells of living organisms to copy their DNA before cell division.

Adenine and guanine are purines. Purines are the larger of the two types of bases found in DNA. The 9 atoms that make up the fused rings (5 carbon, 4 nitrogen) are numbered 1-9. All ring atoms lie in the same plane. Cytosine and thymine are pyrimidines. The 6 atoms (4 carbon, 2 nitrogen) are numbered 1-6. Like purines, all pyrimidine ring atoms lie in the same plane.

Deoxyribose Sugar

The deoxyribose sugar of the DNA backbone has 5 carbons and 3 oxygens. The carbon atoms are numbered 1', 2', 3', 4', and 5' to distinguish them from the numbering of the atoms of the purine and pyrimidine rings. The hydroxyl groups on the 5'- and 3'-carbons link to the phosphate groups to form the DNA backbone. Deoxyribose lacks a hydroxyl group at the 2'-position when compared to ribose, the sugar component of RNA.

Nucleosides and Nucleotides

A nucleoside is one of the four DNA bases covalently attached to the C1' position of a sugar. The sugar in deoxynucleosides is 2'-deoxyribose. The sugar in ribonucleosides is ribose. Nucleosides differ from nucleotides in that they lack phosphate groups. The four different nucleosides of DNA are deoxyadenosine (dA), deoxyguanosine (dG), deoxycytidine (dC), and (deoxy)thymidine (dT, or T). In dA and dG, there is an "N-glycoside" bond between the sugar C1' and N9 of the purine. A nucleotide is a nucleoside with one or more phosphate groups covalently attached to the 3'- and/or 5'-hydroxyl group(s).

DNA Backbone

The DNA backbone is a polymer with an alternating sugar-phosphate sequence. The deoxyribose sugars are joined at both the 3'-hydroxyl and 5'-hydroxyl groups to phosphate groups in ester links, also known as "phosphodiester" bonds.

DNA Double Helix

DNA is normally a double-stranded macromolecule. Two polynucleotide chains, held together by weak thermodynamic forces, form a DNA molecule.

Features of the DNA Double Helix

  • Two DNA strands form a helical spiral, winding around a helix axis in a right-handed spiral.
  • The two polynucleotide chains run in opposite directions.
  • The sugar-phosphate backbones of the two DNA strands wind around the helix axis like the railing of a spiral staircase.
  • The bases of the individual nucleotides are on the inside of the helix, stacked on top of each other like the steps of a spiral staircase.

Base Pairs

Within the DNA double helix, A forms 2 hydrogen bonds with T on the opposite strand, and G forms 3 hydrogen bonds with C on the opposite strand. dA-dT and dG-dC base pairs are the same length, and occupy the same space within a DNA double helix. Therefore the DNA molecule has a uniform diameter. dA-dT and dG-dC base pairs can occur in any order within DNA molecules.
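Since each A-T pair contributes 2 hydrogen bonds and each G-C pair contributes 3, the total for a duplex follows directly from the sequence of one strand. The sketch below is a simple tally under that rule. The function names are illustrative.

```python
# Hydrogen bonds contributed by the base pair each base takes part in.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def hydrogen_bonds(strand):
    """Total hydrogen bonds in the duplex formed by this strand and its partner."""
    return sum(H_BONDS[base] for base in strand)

def gc_fraction(strand):
    """Fraction of positions that are G or C (the three-bond pairs)."""
    return sum(base in "GC" for base in strand) / len(strand)

if __name__ == "__main__":
    for seq in ("ATATATAT", "ATGCCGTA", "GCGCGCGC"):
        print(seq, hydrogen_bonds(seq), gc_fraction(seq))
```

For strands of equal length, a higher G-C content means more hydrogen bonds holding the duplex together.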

DNA Helix Axis

The helix axis is most apparent from a view directly down the axis. The sugar-phosphate backbone is on the outside of the helix where the polar phosphate groups (red and yellow atoms) can interact with the polar environment. The nitrogen (blue atoms) containing bases are inside, stacking perpendicular to the helix axis.

Why is DNA sequencing useful?

DNA sequencing is a biological technique that many different technologies rely on. For example, it can be used for:

  • identifying regions of DNA associated with particular features, including specific diseases or increased susceptibility to specific diseases
  • understanding gene expression and how different genes interact
  • identification of substances, individuals and species
  • other genetic technologies such as gene therapy, genealogy research, plant and animal breeding and genetic modification.

1. Which of the following statements about DNA is FALSE? one DNA molecule can include four different nucleotides in its structure, DNA is a double helix, DNA uses the nitrogenous base uracil, DNA has deoxyribose

2. Which of the following statements regarding a DNA double helix is ALWAYS true? the amount of adenine is always equal to the amount of uracil and the amount of guanine is always equal to the amount of cytosine, the amount of adenine is always equal to amount of guanine and the amount of cytosine is always equal to amount of thymine, the amount of adenine is always equal to the amount of thymine and the amount of guanine is always equal to amount of cytosine, the amount of adenine is always equal to amount of cytosine and the amount of guanine is always equal to the amount of thymine.

3. Which of the following statements about RNA is FALSE? RNA comes in 3 shapes, RNA has nucleotide base uracil, RNA is a single stranded molecule, RNA has sugar dextrose

4. If one strand of DNA is CGGTAC, then the complementary strand is: GCCAUG, CGGTAC, CGGUAC, GCCATG

5. Which of the following enzymes catalyzes the elongation of a new strand of DNA? single strand binding protein, ligase, helicase, DNA polymerase.

Biology 171

Since the rediscovery of Mendel’s work in 1900, the definition of the gene has progressed from an abstract unit of heredity to a tangible molecular entity capable of replication, expression, and mutation (Figure). Genes are composed of DNA and are linearly arranged on chromosomes. Genes specify the sequences of amino acids, which are the building blocks of proteins. In turn, proteins are responsible for orchestrating nearly every function of the cell. Both genes and the proteins they encode are absolutely essential to life as we know it.

Learning Objectives

By the end of this section, you will be able to do the following:

  • Explain the “central dogma” of DNA-protein synthesis
  • Describe the genetic code and how the nucleotide sequence prescribes the amino acid and the protein sequence

The cellular process of transcription generates messenger RNA (mRNA), a mobile molecular copy of one or more genes with an alphabet of A, C, G, and uracil (U). Translation of the mRNA template on ribosomes converts nucleotide-based genetic information into a protein product. That is the central dogma of DNA-protein synthesis. Protein sequences consist of 20 commonly occurring amino acids; therefore, it can be said that the protein alphabet consists of 20 “letters” (Figure). Different amino acids have different chemistries (such as acidic versus basic, or polar and nonpolar) and different structural constraints. Variation in amino acid sequence is responsible for the enormous variation in protein structure and function.

Structures of the 20 amino acids found in proteins are shown. Each amino acid is composed of an amino group (NH3+), a carboxyl group (COO−), and a side chain (blue). The side chain may be nonpolar, polar, or charged, as well as large or small. It is the variety of amino acid side chains that gives rise to the incredible variation of protein structure and function.

The Central Dogma: DNA Encodes RNA; RNA Encodes Protein

The flow of genetic information in cells from DNA to mRNA to protein is described by the central dogma (Figure), which states that genes specify the sequence of mRNAs, which in turn specify the sequence of amino acids making up all proteins. The decoding of one molecule to another is performed by specific proteins and RNAs. Because the information stored in DNA is so central to cellular function, it makes intuitive sense that the cell would make mRNA copies of this information for protein synthesis, while keeping the DNA itself intact and protected. The copying of DNA to RNA is relatively straightforward, with one nucleotide being added to the mRNA strand for every nucleotide read in the DNA strand. The translation to protein is a bit more complex because three mRNA nucleotides correspond to one amino acid in the polypeptide sequence. However, the translation to protein is still systematic and colinear, such that nucleotides 1 to 3 correspond to amino acid 1, nucleotides 4 to 6 correspond to amino acid 2, and so on.
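The one-to-one copying step and the three-to-one colinear mapping can both be shown with short string operations. This is a minimal sketch: the template strand is assumed to be written in the order the polymerase reads it (3' to 5'), so the mRNA comes out 5' to 3', and both function names are invented for the example.

```python
# Pairing used during transcription: RNA carries U where DNA would carry T.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Add one RNA nucleotide for every DNA nucleotide read from the template."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

def codon_positions(mrna):
    """Return (amino_acid_number, first_nt, last_nt, codon) using 1-based positions."""
    usable = len(mrna) - len(mrna) % 3  # ignore an incomplete trailing codon
    return [(i // 3 + 1, i + 1, i + 3, mrna[i:i + 3]) for i in range(0, usable, 3)]

if __name__ == "__main__":
    mrna = transcribe("TACGGC")
    print(mrna)
    for row in codon_positions(mrna):
        print(row)
```

Each tuple ties one amino acid position to three consecutive nucleotide positions, which is what colinear means here.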

The Genetic Code Is Degenerate and Universal

Each amino acid is defined by a three-nucleotide sequence called the triplet codon. Given the different numbers of “letters” in the mRNA and protein “alphabets,” scientists theorized that single amino acids must be represented by combinations of nucleotides. Nucleotide doublets would not be sufficient to specify every amino acid because there are only 16 possible two-nucleotide combinations (4²). In contrast, there are 64 possible nucleotide triplets (4³), which is far more than the number of amino acids. Scientists theorized that amino acids were encoded by nucleotide triplets and that the genetic code was “degenerate.” In other words, a given amino acid could be encoded by more than one nucleotide triplet. This was later confirmed experimentally: Francis Crick and Sydney Brenner used the chemical mutagen proflavin to insert one, two, or three nucleotides into the gene of a virus. When one or two nucleotides were inserted, the normal proteins were not produced. When three nucleotides were inserted, the protein was synthesized and functional. This demonstrated that the amino acids must be specified by groups of three nucleotides. These nucleotide triplets are called codons. The insertion of one or two nucleotides completely changed the triplet reading frame, thereby altering the message for every subsequent amino acid (Figure). Though insertion of three nucleotides caused an extra amino acid to be inserted during translation, the integrity of the rest of the protein was maintained.
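The effect of inserting one, two, or three nucleotides on the reading frame can be seen with a small Python sketch. The message `AUGGCUGAAUUUCCG` and the helper names are hypothetical, chosen only to make the shift visible; this is not a model of the original experiment.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into complete triplets, reading from the first nucleotide."""
    usable = len(seq) - len(seq) % 3  # drop any trailing partial triplet
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_bases(seq: str, position: int, bases: str) -> str:
    """Return seq with `bases` inserted before index `position`."""
    return seq[:position] + bases + seq[position:]


original = "AUGGCUGAAUUUCCG"  # hypothetical message
print(0, codons(original))
for extra in ("A", "AA", "AAA"):
    mutant = insert_bases(original, 3, extra)
    print(len(extra), codons(mutant))
```

With one or two inserted bases, every triplet after the insertion point differs from the original. With three, a single extra triplet appears and the downstream triplets are unchanged.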

Scientists painstakingly solved the genetic code by translating synthetic mRNAs in vitro and sequencing the proteins they specified (Figure).

In addition to codons that instruct the addition of a specific amino acid to a polypeptide chain, three of the 64 codons terminate protein synthesis and release the polypeptide from the translation machinery. These triplets are called nonsense codons, or stop codons. Another codon, AUG, also has a special function. In addition to specifying the amino acid methionine, it also serves as the start codon to initiate translation. The reading frame for translation is set by the AUG start codon near the 5′ end of the mRNA. Following the start codon, the mRNA is read in groups of three until a stop codon is encountered.

The arrangement of the coding table reveals the structure of the code. There are sixteen “blocks” of codons, each specified by the first and second nucleotides of the codons within the block, e.g., the “AC*” block that corresponds to the amino acid threonine (Thr). Some blocks are divided into a pyrimidine half, in which the codon ends with U or C, and a purine half, in which the codon ends with A or G. Some amino acids get a whole block of four codons, like alanine (Ala), threonine (Thr) and proline (Pro). Some get the pyrimidine half of their block, like histidine (His) and asparagine (Asn). Others get the purine half of their block, like glutamate (Glu) and lysine (Lys). Note that some amino acids get a block and a half-block for a total of six codons.
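The block structure can be inspected programmatically. The sketch below builds the standard genetic code from a compact 64-letter string (one-letter amino acid codes, `*` for stop, bases ordered U, C, A, G); the names `CODON_TABLE` and `block` are chosen for this example.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, first base varying slowest and third base fastest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def block(first_two: str) -> dict[str, str]:
    """Return the four codons sharing the given first two nucleotides."""
    return {first_two + third: CODON_TABLE[first_two + third] for third in BASES}


print(block("AC"))  # whole block: all four codons give T (threonine)
print(block("CA"))  # split block: pyrimidine half H (His), purine half Q (Gln)

# Number of codons per amino acid, stop codons excluded.
counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
print(sorted(counts.items(), key=lambda item: -item[1]))
```

The count shows leucine, serine, and arginine with six codons each (a block plus a half-block), and methionine and tryptophan with one each.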

The specification of a single amino acid by multiple similar codons is called “degeneracy.” Degeneracy is believed to be a cellular mechanism to reduce the negative impact of random mutations. Codons that specify the same amino acid typically differ by only one nucleotide. In addition, amino acids with chemically similar side chains are encoded by similar codons. For example, aspartate (Asp) and glutamate (Glu), which occupy the GA* block, are both negatively charged. This nuance of the genetic code means that a single-nucleotide substitution mutation might either specify the same amino acid and have no effect, or specify a similar amino acid, which makes it less likely that the protein is rendered completely nonfunctional.
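A rough way to see this buffering is to enumerate every single-nucleotide substitution of every amino-acid-coding codon and sort the outcomes. The sketch below uses the standard genetic code and treats all substitutions as equally likely, which is a simplifying assumption; it also does not score how chemically similar a replacement amino acid is. The function name is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitutions() -> dict[str, int]:
    """Tally outcomes of all single-base changes to the 61 sense codons."""
    tally = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # start only from codons that encode an amino acid
        for pos, base in product(range(3), BASES):
            if base == codon[pos]:
                continue  # not a substitution
            new_aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if new_aa == aa:
                tally["synonymous"] += 1  # same amino acid, no effect
            elif new_aa == "*":
                tally["nonsense"] += 1    # creates a stop codon
            else:
                tally["missense"] += 1    # different amino acid
    return tally


tally = classify_substitutions()
total = sum(tally.values())
for kind, n in tally.items():
    print(f"{kind}: {n} of {total} ({n / total:.0%})")
```

The synonymous share is the portion of point mutations that degeneracy renders silent at the protein level.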

The genetic code is nearly universal. With a few minor exceptions, virtually all species use the same genetic code for protein synthesis. Conservation of codons means that a purified mRNA encoding the globin protein in horses could be transferred to a tulip cell, and the tulip would synthesize horse globin. That there is only one genetic code is powerful evidence that all of life on Earth shares a common origin, especially considering that there are about 10⁸⁴ possible combinations of 20 amino acids and 64 triplet codons.
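The text does not say how the figure of about 10^84 is obtained. One counting that lands on the same order of magnitude, offered here purely as an assumption, lets each of the 64 codons independently take any of 21 meanings (20 amino acids or "stop"):

```python
import math

MEANINGS = 21  # assumption: 20 amino acids plus one "stop" signal
CODONS = 64    # 4 ** 3 possible triplets

assignments = MEANINGS ** CODONS  # independent choice for every codon
print(f"about 10^{math.floor(math.log10(assignments))}")  # about 10^84
```

Other ways of counting (for example, requiring every amino acid to be used at least once) would give a somewhat different number.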

View Transcribe and Translate a Gene (webpage, Flash animation) to transcribe a gene and translate it to protein using complementary pairing and the genetic code.

Which Has More DNA: A Kiwi or a Strawberry?

Question: Would a kiwi and strawberry that are approximately the same size (Figure) also have approximately the same amount of DNA?

Background: Genes are carried on chromosomes and are made of DNA. All mammals are diploid, meaning they have two copies of each chromosome. However, not all plants are diploid. The common strawberry is octoploid (8n) and the cultivated kiwi is hexaploid (6n). Research the total number of chromosomes in the cells of each of these fruits and think about how this might correspond to the amount of DNA in these fruits’ cell nuclei. What other factors might contribute to the total amount of DNA in a single fruit? Read about the technique of DNA isolation to understand how each step in the isolation protocol helps liberate and precipitate DNA.

Hypothesis: Hypothesize whether you would be able to detect a difference in DNA quantity from similarly sized strawberries and kiwis. Which fruit do you think would yield more DNA?

Test your hypothesis: Isolate the DNA from a strawberry and a kiwi that are similarly sized. Perform the experiment in at least triplicate for each fruit.

  1. Prepare a bottle of DNA extraction buffer from 900 mL water, 50 mL dish detergent, and two teaspoons of table salt. Mix by inversion (cap it and turn it upside down a few times).
  2. Grind a strawberry and a kiwi by hand in a plastic bag, or using a mortar and pestle, or with a metal bowl and the end of a blunt instrument. Grind for at least two minutes per fruit.
  3. Add 10 mL of the DNA extraction buffer to each fruit, and mix well for at least one minute.
  4. Remove cellular debris by filtering each fruit mixture through cheesecloth or porous cloth and into a funnel placed in a test tube or an appropriate container.
  5. Pour ice-cold ethanol or isopropanol (rubbing alcohol) into the test tube. You should observe white, precipitated DNA.
  6. Gather the DNA from each fruit by winding it around separate glass rods.

Record your observations: Because you are not quantitatively measuring DNA volume, you can record for each trial whether the two fruits produced the same or different amounts of DNA as observed by eye. If one or the other fruit produced noticeably more DNA, record this as well. Determine whether your observations are consistent with several pieces of each fruit.

Analyze your data: Did you notice an obvious difference in the amount of DNA produced by each fruit? Were your results reproducible?

Draw a conclusion: Given what you know about the number of chromosomes in each fruit, can you conclude that chromosome number necessarily correlates to DNA amount? Can you identify any drawbacks to this procedure? If you had access to a laboratory, how could you standardize your comparison and make it more quantitative?

Section Summary

The genetic code refers to the DNA alphabet (A, T, C, G), the RNA alphabet (A, U, C, G), and the polypeptide alphabet (20 amino acids). The central dogma describes the flow of genetic information in the cell from genes to mRNA to proteins. Genes are used to make mRNA by the process of transcription; mRNA is used to synthesize proteins by the process of translation. The genetic code is degenerate because 64 triplet codons in mRNA specify only 20 amino acids and three nonsense codons. Most amino acids have several similar codons. Almost every species on the planet uses the same genetic code.

Free Response

Imagine if there were 200 commonly occurring amino acids instead of 20. Given what you know about the genetic code, what would be the shortest possible codon length? Explain.

For 200 commonly occurring amino acids, codons consisting of four types of nucleotides would have to be at least four nucleotides long, because 4³ = 64 is too few and 4⁴ = 256 is enough. There would be much less degeneracy in this case.
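The same reasoning can be checked for any number of amino acids with a short Python sketch. The function name `min_codon_length` is hypothetical, and the calculation counts only distinct "meanings"; it ignores any need for dedicated stop codons unless they are included in the count.

```python
def min_codon_length(n_meanings: int, alphabet_size: int = 4) -> int:
    """Smallest codon length whose number of combinations covers n_meanings."""
    length = 1
    while alphabet_size ** length < n_meanings:
        length += 1
    return length


print(min_codon_length(20))   # 3, since 4**2 = 16 < 20 <= 4**3 = 64
print(min_codon_length(200))  # 4, since 4**3 = 64 < 200 <= 4**4 = 256
```

Adding one extra meaning for a stop signal (21 or 201) does not change either answer.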

Discuss how degeneracy of the genetic code makes cells more robust to mutations.

Codons that specify the same amino acid typically only differ by one nucleotide. In addition, amino acids with chemically similar side chains are encoded by similar codons. This nuance of the genetic code ensures that a single-nucleotide substitution mutation might either specify the same amino acid and have no effect, or may specify a similar amino acid, preventing the protein from being rendered completely nonfunctional.

A scientist sequencing mRNA identifies the following strand: CUAUGUGUCGUAACAGCCGAUGACCCG

What is the sequence of the amino acid chain this mRNA makes when it is translated?

The first step in writing the amino acid sequence is to find the start codon AUG. Then, the nucleotide sequence is separated into triplets: CU AUG UGU CGU AAC AGC CGA UGA. We stop the translation at UGA because that triplet is a stop codon. When we convert these codons to amino acids, the sequence becomes Met Cys Arg Asn Ser Arg.
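The same procedure (find the first AUG, read triplets, stop at a stop codon) can be written as a short Python sketch using the standard genetic code. The function name `translate` and the table layout are choices made for this example, and taking the first AUG as the start site is a simplification.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code as one-letter amino acid codes; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(mrna: str) -> list[str]:
    """Translate from the first AUG up to (not including) the first in-frame stop."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break  # stop codon: release the polypeptide
        peptide.append(THREE_LETTER[aa])
    return peptide


print(translate("CUAUGUGUCGUAACAGCCGAUGACCCG"))
# ['Met', 'Cys', 'Arg', 'Asn', 'Ser', 'Arg']
```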



DNA encodes protein sequence by a series of three-nucleotide codons. Any given sequence of DNA can therefore be read in six different ways: three reading frames in one direction (starting at different nucleotides) and three in the opposite direction. During transcription, the RNA polymerase reads the template DNA strand in the 3′→5′ direction, but the mRNA is formed in the 5′→3′ direction. [3] The mRNA is single-stranded and therefore only contains three possible reading frames, of which only one is translated. The codons of the mRNA reading frame are translated in the 5′→3′ direction into amino acids by a ribosome to produce a polypeptide chain.
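The six ways of reading a double-stranded sequence can be listed with a short Python sketch: three offsets on the given strand and three on its reverse complement. The function names and the example sequence are hypothetical.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def frames(strand: str) -> list[list[str]]:
    """Return the three reading frames of one strand as lists of complete codons."""
    return [
        [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
        for offset in range(3)
    ]


def six_frames(dna: str) -> list[list[str]]:
    """Three frames on the given strand followed by three on the opposite strand."""
    return frames(dna) + frames(reverse_complement(dna))


for number, frame in enumerate(six_frames("ATGAAACCCGGG"), start=1):
    print(number, frame)
```

Frames 1 to 3 read the input strand starting at its first, second, and third nucleotide; frames 4 to 6 do the same on the reverse complement.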

An open reading frame (ORF) is a reading frame that has the potential to be transcribed into RNA and translated into protein. It requires a continuous sequence of DNA from a start codon, through a subsequent region which usually has a length that is a multiple of 3 nucleotides, to a stop codon in the same reading frame. [4]
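A minimal ORF scan following this definition might look like the Python sketch below. It assumes ATG as the only start codon and TAA, TAG, TGA as stop codons, scans only the strand it is given (the reverse complement would need a second pass), and ignores start codons nested inside an already open frame. The function name, the `min_codons` parameter, and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ORFs; end is exclusive and includes the stop."""
    orfs = []
    for offset in range(3):  # three reading frames on this strand
        start = None
        for i in range(offset, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a frame at the first start codon seen
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the frame and keep scanning
    return sorted(orfs)


dna = "CCATGAAATGGTAACC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])  # 2 14 ATGAAATGGTAA
```

Because start and stop are found in the same frame, every reported span has a length that is a multiple of three.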

When a putative amino acid sequence resulting from the translation of an ORF remained unknown in mitochondrial and chloroplast genomes, the corresponding open reading frame was called an unidentified reading frame (URF). For example, the MT-ATP8 gene was first described as URF A6L when the complete human mitochondrial genome was sequenced. [5]

The usage of multiple reading frames leads to the possibility of overlapping genes; there may be many of these in viral, prokaryote, and mitochondrial genomes. [6] Some viruses, e.g. hepatitis B virus and BYDV, use several overlapping genes in different reading frames.

In rare cases, a ribosome may shift from one frame to another during translation of an mRNA (translational frameshift). This causes the first part of the mRNA to be translated in one reading frame, and the latter part to be translated in a different reading frame. This is distinct from a frameshift mutation, as the nucleotide sequence (DNA or RNA) is not altered—only the frame in which it is read.