
Lecture 10 - DNA Replication

September 24, 2013

Introduction

We have adapted this approach to presenting DNA replication from that used by Eric Lander in his lectures at MIT.

We reviewed the basics of the chemistry of DNA from last time, starting with how we number the carbons in deoxyribose and what is happening at each carbon. The 1' carbon is the site of attachment of the base. The 2' carbon lacks a hydroxyl group in deoxyribose. The 3' carbon has the hydroxyl that attaches to the 5' carbon of the next deoxyribonucleotide through a phosphodiester linkage. The 5' carbon has three phosphate groups in a deoxyribonucleoside triphosphate; as part of a strand of DNA, there is a single phosphate on the 5' end. We compared the structure of deoxyribonucleoside triphosphates to a familiar ribonucleotide, ATP.

The key innovation in Watson and Crick's model for the secondary structure of DNA was base pairing, using the Chargaff Rules of A=T and C=G. An A-T base pair is held together by two hydrogen bonds, while a G-C base pair is held together by three. An A-T base pair and a G-C base pair have the same dimensions, making the helix regular in structure regardless of the sequence.
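The pairing rules can be written down as a minimal sketch. It assumes sequences are plain strings of A, C, G, and T written 5' to 3'; the names PAIR, H_BONDS, complementary_strand, and hydrogen_bonds are ours, chosen for illustration. Because the two strands of the helix run antiparallel, the partner strand is returned reversed so that it also reads 5' to 3'. Counting bases in a duplex built this way shows why the Chargaff rules hold.

```python
# Watson-Crick pairing partners and the hydrogen bonds each pair forms.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(seq: str) -> str:
    """Partner strand, written 5' to 3'.

    The two strands are antiparallel, so the complement is read in reverse.
    """
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def hydrogen_bonds(seq: str) -> int:
    """Hydrogen bonds holding together the duplex formed by seq and its partner."""
    return sum(H_BONDS[base] for base in seq.upper())


strand = "ATGCGT"
print(complementary_strand(strand))  # ACGCAT
print(hydrogen_bonds(strand))        # 2+2+3+3+3+2 = 15
```

In any duplex made this way, every A on one strand faces a T on the other (and every G a C), so the counts of A and T, and of G and C, come out equal.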

We also reviewed the Meselson-Stahl experiment that established that DNA replication was semiconservative, with each newly-synthesized DNA double helix consisting of one old strand and one new strand.

The Central Dogma

Francis Crick's quip about information flow from DNA has become enshrined as our description of the process: The Central Dogma. Even though we no longer accept the dogma without question (which is what the word means), the name has stuck. The idea is illustrated in the figure below.

[Figure: The Central Dogma]

In the next few lectures, we will present the biochemistry of the Central Dogma, but for today, we will concentrate on the top part of the figure, the copying of information from DNA to DNA during replication.

DNA Polymerase and friends

In order to investigate the biochemistry of DNA replication, we do what any biochemist does: devise an assay for the enzyme activity that we are looking for, then fractionate cells to find that activity. This was first done successfully by Arthur Kornberg, who searched for an enzyme in E. coli that would synthesize DNA. Kornberg's assay included a single-stranded DNA template, deoxyribonucleotides, and one key innovation: short primers that made patches of double-stranded DNA on the template. He found that fractionated E. coli cell lysates contained an enzyme activity that would extend primed DNA templates in the 5' --> 3' direction, as shown below.

[Figure: Kornberg polymerase]

This activity is detected using radioactive dNTPs (a dNTP is dATP, dTTP, dCTP, or dGTP) and a chemical procedure that separates partially replicated DNA from unincorporated dNTPs, so that only radioactivity built into DNA is measured. The exact details are not important.

The DNA polymerase that Kornberg found can only extend a growing DNA strand in the 5' to 3' direction, and it requires a primer. It is worth considering why it would be a bad idea to attempt to synthesize DNA in the 3' to 5' direction.

The drawings below show the addition of a single base to a primed DNA template.

Here is the primed template ready for the addition of a C to the growing DNA strand.
The C (dCTP) base pairs with the G on the template strand. DNA polymerase is ready to form a phosphodiester bond with the 3' hydroxyl of the primer.
DNA polymerase forms the phosphodiester bond, adding to the growing DNA strand. Energy for the reaction comes from the high-energy phosphate bond of the incoming dCTP, which releases pyrophosphate.

The phosphate bonds of nucleotides are subject to spontaneous hydrolysis, which converts dNTPs to dNMPs. If this happens to a nucleotide awaiting incorporation into the growing DNA strand, it is not a big deal: the enzyme can just stall for a millisecond and wait for an intact dNTP.

Imagine, though, if DNA polymerase extended a DNA strand in the 3' to 5' direction. This would place the triphosphate required for energy at the growing end of the new strand. Spontaneous hydrolysis of this triphosphate would terminate chain elongation, which would be a very bad thing.
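The direction argument can be made concrete with a toy simulation. This is a sketch under loudly stated assumptions: `p_hydrolysis` is a made-up per-step probability, not a measured rate, and the "3to5" branch models a hypothetical polymerase that does not exist. In the real 5' to 3' case, a hydrolyzed dNTP is simply skipped; in the hypothetical 3' to 5' case, the triphosphate sits on the chain itself, so its hydrolysis ends elongation.

```python
import random


def extend_strand(target_length: int, p_hydrolysis: float, direction: str,
                  rng: random.Random) -> int:
    """Toy model of chain elongation; returns the final strand length.

    p_hydrolysis is a hypothetical chance that the triphosphate needed for the
    next addition is spontaneously hydrolyzed before the addition happens.
    """
    if not 0 <= p_hydrolysis < 1:
        raise ValueError("p_hydrolysis must be in [0, 1)")
    length = 0
    while length < target_length:
        hydrolyzed = rng.random() < p_hydrolysis
        if direction == "5to3":
            # Real chemistry: the triphosphate is on the incoming dNTP.
            # A hydrolyzed dNTP is not used; the enzyme waits for another.
            if not hydrolyzed:
                length += 1
        elif direction == "3to5":
            # Hypothetical chemistry: the triphosphate sits on the growing end.
            # Once it is hydrolyzed, there is no energy for the next addition.
            if hydrolyzed:
                return length
            length += 1
        else:
            raise ValueError("direction must be '5to3' or '3to5'")
    return length


rng = random.Random(0)
LENGTH, P, TRIALS = 1000, 0.001, 500
for direction in ("5to3", "3to5"):
    full = sum(extend_strand(LENGTH, P, direction, rng) == LENGTH for _ in range(TRIALS))
    print(f"{direction}: {full / TRIALS:.2f} of strands reach full length")
print(f"expected for 3to5: {(1 - P) ** LENGTH:.2f}")  # (1 - p)^L, about 0.37
```

With real 5' to 3' chemistry every strand finishes; under the hypothetical 3' to 5' scheme, the fraction that finishes falls off as (1 - p)^L, so long chromosomes would almost never be completed.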

If this argument doesn't help you to remember the chemistry, just remember that DNA is synthesized in the 5' to 3' direction and you'll be fine.

DNA replication occurs at a replication fork, as the two strands of a DNA double helix are peeled apart with the help of proteins described in your textbook. Now that we know that DNA is only synthesized in the 5' to 3' direction, we are ready to think about the problem of synthesizing both new strands at a replication fork, as shown in the drawings below.

[Figures: DNA replication at a replication fork]
Because DNA can only be synthesized in the 5' to 3' direction, the two newly-synthesized strands are made differently. They are called the "continuous" (top in the figure) and "discontinuous" (bottom in the figure) strands, and also the "leading" (top) and "lagging" (bottom) strands. As the replication fork moves along the helix, the leading strand is synthesized continuously, while the lagging strand must repeatedly initiate synthesis at new sites.

The drawing shows segments of the new strands in red. These are primers synthesized by the enzyme primase. Primase synthesizes short RNA primers that are then extended by DNA polymerase, which is capable of synthesizing DNA using an RNA primer, although DNA polymerase will not incorporate ribonucleotides into a strand that it is synthesizing.

The short fragments on the discontinuous strand are called Okazaki fragments after their discoverer. Eventually, DNA polymerase synthesizes new DNA to replace the RNA primers, and then it is only necessary to close the gap in the phosphodiester backbone. This reaction to seal the nicks is carried out by an enzyme called ligase (to ligate is to tie, as in surgery).
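The asymmetry between the two strands can be captured in a toy bookkeeping sketch. `replicate_fork` and `okazaki_length` are hypothetical names, and the fixed fragment spacing is an assumption for illustration only (real Okazaki fragment lengths vary). Coordinates count template bases exposed by the fork, which moves toward higher positions.

```python
def replicate_fork(fork_travel: int, okazaki_length: int) -> dict:
    """Toy bookkeeping for the two new strands at one replication fork.

    The leading strand grows 5'->3' in the same direction as the fork, so one
    primer suffices. The lagging strand must also grow 5'->3', which points
    away from the fork, so (in this sketch) a new primer is laid down each time
    okazaki_length more bases are exposed, and each fragment is extended back
    toward the previous one.
    """
    if fork_travel < 0 or okazaki_length <= 0:
        raise ValueError("need fork_travel >= 0 and okazaki_length > 0")
    leading = [(0, fork_travel)] if fork_travel else []
    lagging = [
        (start, start + okazaki_length)
        for start in range(0, fork_travel - okazaki_length + 1, okazaki_length)
    ]
    covered = len(lagging) * okazaki_length
    return {
        "leading_primers": len(leading),
        "lagging_primers": len(lagging),  # one RNA primer per Okazaki fragment
        # after primers are replaced with DNA, each junction between adjacent
        # fragments is a nick that DNA ligase must seal
        "lagging_nicks": max(len(lagging) - 1, 0),
        "lagging_bases_awaiting_primer": fork_travel - covered,
    }


print(replicate_fork(1000, 100))
```

The point of the exercise: the leading strand needs one primer, while the lagging strand needs a primer for every fragment and a ligation for every junction. The sketch ignores the nick where each stretch joins DNA made earlier.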

So far, we have described the activity of three different enzymes necessary for DNA replication: DNA polymerase, primase, and DNA ligase.

There is another kind of enzyme activity required for DNA replication. Consider the two figures below.

[Figures: DNA replication (left: replicating a circular molecule; right: unwinding the helix)]

In the figure on the left, we raise a problem best illustrated with the replication of a circular DNA molecule like the bacterial chromosome. Because the two DNA strands are wound around each other in a double helix, when the molecule replicates, the two copies will be catenated (linked), as shown in the bottom part of the figure. While there is a mathematical theorem about this, it was easier to illustrate in class with a long strip of newspaper that was given one full twist before its ends were taped together into a loop (with a half twist it would be a Möbius strip, but that's another story). When I cut the loop down the middle along its entire length, I got two linked loops. Try this at home. There is no way to deform the loops to separate them. To decatenate them, it is necessary to cut one loop, pass the other loop through the cut, and reseal it.

Two copies of a circular DNA molecule can exist in two forms that are chemically identical but topologically distinct: catenated or uncatenated. These are topological isomers, or topoisomers. The enzymes that interconvert topoisomers are called topoisomerases. The activity that decatenates DNA by introducing a double-strand break, passing the other DNA molecule through the gap, and resealing the break is called topoisomerase II.
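One way to make "topologically distinct" quantitative is the linking number of two closed curves, a topological invariant that cannot change unless one curve is cut, which is exactly the cut, pass, and reseal step described above. The sketch below evaluates the Gauss linking integral numerically for two circles standing in for the two daughter molecules. The helpers `circle` and `linking_number` are our own, and the circles are idealized geometry, not a model of real DNA.

```python
import math


def circle(center, u, v, radius=1.0, n=200):
    """n points on a circle; u and v are orthonormal vectors spanning its plane."""
    points = []
    for k in range(n):
        t = 2 * math.pi * k / n
        c, s = math.cos(t), math.sin(t)
        points.append(tuple(center[i] + radius * (c * u[i] + s * v[i]) for i in range(3)))
    return points


def _segments(curve):
    """(midpoint, direction vector) for each edge of a closed polygon."""
    m = len(curve)
    for i in range(m):
        p, q = curve[i], curve[(i + 1) % m]
        yield (tuple((p[j] + q[j]) / 2 for j in range(3)),
               tuple(q[j] - p[j] for j in range(3)))


def linking_number(curve_a, curve_b):
    """Gauss linking integral, Lk = (1/4π) ∮∮ (ra - rb) · (dra × drb) / |ra - rb|³,
    approximated with one midpoint per polygon edge."""
    segs_b = list(_segments(curve_b))
    total = 0.0
    for ma, da in _segments(curve_a):
        for mb, db in segs_b:
            r = [ma[j] - mb[j] for j in range(3)]
            cross = (da[1] * db[2] - da[2] * db[1],
                     da[2] * db[0] - da[0] * db[2],
                     da[0] * db[1] - da[1] * db[0])
            total += sum(r[j] * cross[j] for j in range(3)) / math.dist(ma, mb) ** 3
    return total / (4 * math.pi)


ring = circle((0, 0, 0), (1, 0, 0), (0, 1, 0))    # lies in the xy-plane
linked = circle((1, 0, 0), (1, 0, 0), (0, 0, 1))  # xz-plane, threads through ring
apart = circle((5, 0, 0), (1, 0, 0), (0, 0, 1))   # same shape, well clear of ring
print(round(linking_number(ring, linked)))        # magnitude 1: catenated
print(round(linking_number(ring, apart)))         # 0: free to separate
```

The sign of the result depends on the orientation chosen for each curve; its magnitude does not. No smooth deformation that keeps the curves from passing through each other can change it.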

In the figure on the right, we examine the problem created by unwinding the DNA double helix during replication. Unwinding one part of the molecule induces extra turns (supercoiling) in the rest of the molecule. This was easily illustrated in class using a length of clothesline doubled back on itself, so that I held both ends and a student held the loop at the midpoint. The two halves were wound around each other to produce what looked just like a double helix. When I pulled the ends apart to simulate DNA replication, the rest of the rope wound more tightly. More and more force was required to separate the strands, until at one point I could not separate them any further. When I moved toward the student holding the middle, the rope became supercoiled, with kinks appearing. As I backed away and brought the strands back together, the winding propagated all the way along the line and the tension relaxed. Try this at home!

This means that in order for DNA replication to proceed, there must be an activity at the replication fork that relaxes the supercoiling induced by unwinding the helix. This activity is topoisomerase I, an enzyme that introduces nicks (breaks in the phosphodiester backbone of one strand) and reseals them.
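The clothesline demonstration is often summarized by the relation Lk = Tw + Wr (linking number = twist + writhe), a standard result in DNA topology: as long as no strand is broken, Lk is fixed, so twist removed by unwinding reappears as writhe, that is, supercoiling. The sketch below is a toy model; `ClosedDuplex` is a hypothetical class, 10.5 bp per turn is an assumed typical value for B-DNA, and we assume each topoisomerase I event changes Lk by exactly one.

```python
from dataclasses import dataclass, field


@dataclass
class ClosedDuplex:
    """Toy closed-circular duplex that obeys Lk = Tw + Wr.

    Assumptions: the relaxed molecule has all of its linking number as twist
    (writhe 0), and every base pair that is still paired contributes
    1 / bp_per_turn of a helical turn.
    """
    length_bp: int
    bp_per_turn: float = 10.5
    unwound_bp: int = 0
    linking_number: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Relaxed starting state: Lk equals the twist of the fully paired helix.
        self.linking_number = self.length_bp / self.bp_per_turn

    @property
    def twist(self) -> float:
        return (self.length_bp - self.unwound_bp) / self.bp_per_turn

    @property
    def writhe(self) -> float:
        # Whatever Lk is not accounted for by twist appears as supercoiling.
        return self.linking_number - self.twist

    def unwind(self, bp: int) -> None:
        """Separate bp more base pairs at the fork; no strand is broken, so Lk is unchanged."""
        self.unwound_bp += bp

    def topoisomerase_i(self) -> None:
        """Nick one strand, pass the other through, reseal: modeled as Lk decreasing by 1."""
        self.linking_number -= 1


dna = ClosedDuplex(length_bp=2100)  # relaxed: Lk = Tw = 200 turns
dna.unwind(210)                     # 210 bp unwound = 20 turns of twist removed
print(dna.writhe)                   # 20.0 turns of positive supercoiling
for _ in range(20):
    dna.topoisomerase_i()
print(dna.writhe)                   # 0.0, relaxed again
```

In this model, every 10.5 bp unwound at the fork adds one turn of positive supercoiling that must be removed by one nick-and-reseal event.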

At this point, we know of five enzyme activities required for DNA replication: DNA polymerase, primase, DNA ligase, topoisomerase I, and topoisomerase II. Your textbook details some additional activities: helicase, single-stranded binding protein, and the sliding clamp that holds DNA polymerase in place during strand extension. I showed a slide with a cartoon of a replication fork to show that it is a busy place.

Fidelity of Replication

It is important to look at the accuracy of DNA replication. We have drawn the activity of DNA polymerase as if it never makes mistakes, but in biochemistry we don't say never: there is a measurable equilibrium constant. It is energetically favorable to put the correct base into position, because it will make the correct number of hydrogen bonds. Yet there must be some frequency with which the wrong base is put into place during strand extension. The measured equilibrium constants favor adding the correct base over adding a wrong base by a factor of about 10³. This is just a fancy way of saying that it is 1000 times more likely that the correct base will be added than that the wrong base will be added, for an error rate of about 10⁻³.

Is this an acceptable error rate? The average gene is encoded by about 2000 bases (we'll discuss what this means in detail later), so an error rate of 10⁻³ means that there will be an average of 2 errors/gene/cell division. Everyone agreed that we could not develop as multicellular organisms with an error rate this high.
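The per-gene arithmetic can be pushed one step further. Assuming errors strike each base independently (our simplifying assumption), the chance that a 2000-base gene is copied with no error at all is (1 - 10⁻³)^2000, which is close to e⁻², about 0.135. In other words, most gene copies would carry at least one error. The function names are illustrative.

```python
import math


def expected_errors(length_bases: int, error_rate: float) -> float:
    """Mean number of errors when copying length_bases at a per-base error_rate."""
    return length_bases * error_rate


def p_error_free(length_bases: int, error_rate: float) -> float:
    """Chance a copy has no errors, assuming each base fails independently."""
    return (1 - error_rate) ** length_bases


GENE_BASES = 2000
print(expected_errors(GENE_BASES, 1e-3))  # mean of about 2 errors per gene copy
print(p_error_free(GENE_BASES, 1e-3))     # about 0.135
print(math.exp(-2))                       # Poisson approximation, e^-2
```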

It turns out that DNA polymerase has an additional enzyme activity besides the 5' to 3' DNA polymerase activity. It has a 3' to 5' exonuclease activity. A nuclease is an activity that takes apart nucleic acids. An exonuclease chews away at the end of a nucleic acid molecule, while an endonuclease (we will meet a lot of these later) attacks nucleic acids in the middle of the molecule.

Why does DNA polymerase have a 3' to 5' exonuclease activity? It is this activity that allows DNA polymerase to proofread the strand that it is synthesizing. DNA polymerase is much more likely to remove the last base that it laid down if that base is incorrect, so the 3' to 5' exonuclease activity increases the fidelity of DNA replication. With the proofreading function, the error rate in DNA replication is 10⁻⁶.

Is this an acceptable error rate? The human genome is 3 × 10⁹ bases, so an error rate of 10⁻⁶ would mean about 3000 errors/genome/cell generation. Everyone agreed that this error rate was too high.

The last line of defense against errors in DNA replication is the mismatch repair system. Remember that A:T and G:C base pairs are the same size, allowing the DNA double helix to have uniform dimensions regardless of the sequence. The insertion of an incorrect base during replication will create a bulge or dent in the helix, depending on the exact substitution. The mismatch repair system is a multiprotein complex that recognizes such errors. It removes a stretch of the new strand containing the incorrect base and resynthesizes DNA of the correct sequence. The activity of the mismatch repair system brings the error rate in DNA replication down to 10⁻⁹.

Is this an acceptable error rate? This would mean that there would only be a few errors/genome/cell division. Everyone agreed that because 98% of our genome does not encode proteins, we could live with this error rate, which is great, because that is what the error rate is.
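The three lines of defense can be treated as filters in series, each letting only a fraction of the remaining errors through, so their effects multiply. In this sketch, the pass-through fractions are simply read off the rates quoted in these notes (10⁻³, then 10⁻⁶, then 10⁻⁹), not derived independently, and the gene and genome sizes are the round numbers from the text. `STAGES` and `cumulative_rates` are illustrative names.

```python
import math

# Each stage passes only a fraction of the errors that reach it.
# These fractions are read off the rates quoted in the notes, not measured here.
STAGES = [
    ("base selection by DNA polymerase", 1e-3),  # wrong base added ~1 time in 1000
    ("3'->5' exonuclease proofreading", 1e-3),    # 1 in 1000 errors escapes proofreading
    ("mismatch repair", 1e-3),                    # 1 in 1000 remaining errors escapes repair
]
GENE_BASES = 2_000
GENOME_BASES = 3_000_000_000


def cumulative_rates(stages):
    """Per-base error rate left after each stage (fractions multiply in series)."""
    return [math.prod(p for _, p in stages[: i + 1]) for i in range(len(stages))]


for (name, _), rate in zip(STAGES, cumulative_rates(STAGES)):
    print(f"after {name}: {rate:.0e} per base, "
          f"{GENE_BASES * rate:.3g} errors/gene, "
          f"{GENOME_BASES * rate:,.0f} errors/genome")
```

The genome column works out to about 3 million, then about 3000, then about 3 errors per cell division.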

Of course, this raises the question of how the mismatch repair system can recognize the incorrect base in a mismatched base pair. Clearly, one of the bases is wrong, but which one? The wrong base would be in the newly synthesized strand, so this turns into the problem of recognizing the newly-synthesized strand. In E. coli (and in humans), DNA is extensively methylated. The methylase is slow relative to the mismatch repair system, so the newly-synthesized strand is recognized because it is unmethylated.

Inborn Errors of Mismatch Repair

We have previously mentioned that there are some human pedigrees that show inheritance of a predisposition for certain types of cancer. One of these hereditary cancer syndromes is Hereditary Non-Polyposis Colorectal Cancer (HNPCC), also known as Lynch Syndrome. The diagnostic criteria for HNPCC are: at least three relatives with colorectal cancer, with one of these individuals being a first-degree relative of the other two; at least two generations affected; and at least one relative who develops non-polyposis colorectal cancer before age 50. Analysis of pedigrees that fit these criteria shows inheritance of autosomal dominant mutations in several different genes, all of which have been characterized at the molecular level. An HNPCC pedigree is shown below.

[Figure: HNPCC pedigree]

This pedigree shows that HNPCC is transmitted as an autosomal dominant with incomplete penetrance. Note the female shaded in green in the second generation. Although she did not develop cancer, she is an obligate carrier because she has two affected sons. The females shaded in green in the third generation are indicated as carriers, presumably because they have been tested for variant alleles using molecular techniques.

The risk of developing colorectal cancer before age 70 for people without HNPCC is 5.5%. The risk to carriers of HNPCC is 52-82%. These numbers show that carriers of HNPCC are at greatly elevated risk of colorectal cancer, but also that the trait is not fully penetrant, as we have seen from the pedigree.
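To put the HNPCC numbers on a common scale, a relative risk divides the carriers' risk by the baseline risk. The inputs below are the figures quoted above; the helper function and constant names are illustrative.

```python
def relative_risk(risk_carriers: float, risk_baseline: float) -> float:
    """How many times more likely the outcome is for carriers than for non-carriers."""
    return risk_carriers / risk_baseline


BASELINE = 0.055              # risk before age 70 without HNPCC (from the text)
CARRIER_RANGE = (0.52, 0.82)  # risk before age 70 for HNPCC carriers (from the text)

for risk in CARRIER_RANGE:
    print(f"{risk:.0%} vs {BASELINE:.1%}: {relative_risk(risk, BASELINE):.1f}-fold")
    # Incomplete penetrance: carriers who do not develop colorectal cancer before 70.
    print(f"  carriers without colorectal cancer by 70: {1 - risk:.0%}")
```

With these inputs, carriers face roughly 9.5- to 15-fold the baseline risk, while 18% to 48% of carriers do not develop colorectal cancer before age 70, which is the incomplete penetrance seen in the pedigree.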

The genes associated with HNPCC are MLH1, MSH2, MSH6, and PMS2. These are all human homologs of the genes of the bacterial mismatch repair system. Cancer is a disease caused by somatic mutations in genes that control cell growth and division. Apparently, so many somatic mutations are needed to produce colorectal cancer that this type of cancer also requires mutations that raise the mutation rate by reducing the fidelity of DNA replication. Tumors of this type that develop in people who did not inherit mutations in mismatch repair genes often show somatic mutations in these same genes.

The subject of inborn errors of mismatch repair is of some personal interest to me because of the results of sequencing my genome. I carry three variants in mismatch repair genes: MLH1-V716M, MSH2-G322D, and MSH6-G39E. The first variant, MLH1-V716M, is very rare, with an estimated allele frequency of 0.1%. This is usually not a good sign, because it can indicate selection against the variant, which is an indicator that it is pathogenic. There is no published literature indicating that this variant is pathogenic, but with such a low allele frequency, it might be missed in studies of colorectal cancer unless it was highly pathogenic. MSH2-G322D has an allele frequency of 1.6% and is known to be benign. MSH6-G39E (allele frequency unknown) has been associated with a slight increase in risk of colorectal cancer.

These results are tempered somewhat by my family history (no colorectal cancer in my immediate family), my medical history (normal results from colonoscopy at age 55), and my current health status (everything's OK at 58). As my own genetic counselor in this case, I told myself not to worry about this, but to follow standard medical advice in this area. We present this result in some detail to show the reasoning behind recommendations in genomic medicine.

Polymerase Chain Reaction

It is hard to talk about DNA replication without raising one of the most spectacular pieces of technology to emerge from the study of DNA replication: the Polymerase Chain Reaction, aka PCR. PCR was developed in the early 1980s by Kary Mullis, a scientist at the biotechnology company Cetus in California. PCR allows the isolation of defined DNA segments from tiny amounts of DNA, even a single molecule, and has entirely transformed forensics.

Here is the problem, in the words of Kary Mullis:

"Casual discussions of DNA molecules sometimes make them sound like easily obtained objects. The truth is that in practice it is difficult to get a well-defined molecule of natural DNA from any organism except extremely simple viruses."

Kary Mullis was making lots of DNA primers at Cetus. They began making them manually, but finally someone invented primer-making machines. Scientists weren't buying the primers in the volume that the company had hoped, but everyone was still getting paid, and had plenty of time on their hands to think about experiments. One night, Kary Mullis was on a long moonlit drive with a friend when he thought of a cool experiment. You can read his account in The Unusual Origin of the Polymerase Chain Reaction.

The Polymerase Chain Reaction is a way of making a lot of copies of a specifically targeted sequence. It is described in the table and figures below, and shown in an animation.

We begin with a single molecule of DNA.
We can melt the DNA (break the hydrogen bonds holding the helix together) by heating it to 98 °C.
In this example, we have designed two DNA primers to anneal to a known sequence. In the sequence we are targeting, the primers are separated by a few hundred base pairs. Cooling the reaction from 98 °C to a more moderate temperature (perhaps 48 °C) allows the primers to anneal.
Now we have two primed templates. With dNTPs and DNA polymerase in the reaction mixture, new DNA is synthesized.
We melt the DNA for another cycle. Because there is a vast molar excess of primers, the primers anneal again when we cool the mixture.
New DNA is synthesized.
In the next cycle, we begin to see DNA molecules whose ends are both defined by the primers.
After many cycles of melting, annealing, and replication, the overwhelming majority of DNA molecules in the mixture have ends defined by the primers.
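The cycle-by-cycle steps above can be tracked with a short counting sketch. It assumes an idealized reaction with 100% efficiency, where every strand is primed and copied completely in every cycle, which real reactions do not achieve. The three strand classes and the name `pcr_counts` are our own bookkeeping device.

```python
def pcr_counts(cycles: int) -> list[tuple[int, int, int]]:
    """Idealized PCR from one double-stranded molecule.

    Assumes 100% efficiency: every strand is primed and copied completely in
    every cycle. Strands are tracked in three bookkeeping classes:
      original: the two strands of the starting molecule
      long:     copies of an original strand; the 5' end is set by a primer,
                the 3' end runs to the end of the template
      short:    copies of a long or short strand; both ends are set by primers
    Returns (cycle, total duplexes, duplexes with both ends primer-defined).
    """
    original, long_strands, short_strands = 2, 0, 0
    rows = []
    for cycle in range(1, cycles + 1):
        # A target-length duplex forms only when a short strand is the template.
        target_duplexes = short_strands
        new_long = original                       # copying an original strand
        new_short = long_strands + short_strands  # copying stops at a primer-defined 5' end
        long_strands += new_long
        short_strands += new_short
        total_duplexes = (original + long_strands + short_strands) // 2
        rows.append((cycle, total_duplexes, target_duplexes))
    return rows


for cycle, total, target in pcr_counts(30):
    if cycle in (1, 2, 3, 4, 10, 30):
        print(f"cycle {cycle:2d}: {total:>13,} duplexes, {target:>13,} target-length "
              f"({target / total:.6f} of the mixture)")
```

Under these assumptions, n cycles yield 2ⁿ duplexes, of which 2ⁿ - 2n have both ends defined by the primers. The first two such molecules appear in cycle 3, and by cycle 30 nearly every molecule in the mixture is target length.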

When the polymerase chain reaction was first carried out, new DNA polymerase had to be added after every cycle of heating and annealing, because the enzyme is denatured by heat. This problem was soon solved by using DNA polymerase from Thermus aquaticus, a bacterium isolated from hot springs in Yellowstone National Park. Not surprisingly, the DNA polymerase from this species retains function when heated to 98 °C.

The other innovation in PCR was the development of programmable thermal cyclers (aka PCR machines) that allow researchers to set up an amplification reaction and let it run automatically until it is complete.

We will take up applications of PCR in the future.