DNA sequencing is the process of determining the
precise order of nucleotides within a DNA molecule. It includes
any method or technology that is used to determine the order of the four bases—adenine, guanine, cytosine, and thymine—in a
strand of DNA. The advent of rapid DNA sequencing methods has greatly
accelerated biological and medical research and discovery.
Knowledge of DNA sequences has become indispensable for
basic biological research and for numerous applied fields such as diagnostics, biotechnology,
forensic biology, virology, and biological systematics. The speed of sequencing attained with
modern DNA sequencing technology has been instrumental in determining the
complete DNA sequences, or genomes, of numerous types and species of life, including the human
genome and the genomes of many other animal, plant, and microbial
species.
The first DNA sequences were obtained in the early 1970s by
academic researchers using laborious methods based on two-dimensional chromatography.
Following the development of fluorescence-based sequencing methods with automated
analysis, DNA sequencing has become easier and orders of magnitude faster.
Use of sequencing
DNA sequencing may be used to determine the sequence of
individual genes,
larger genetic regions (i.e. clusters of genes or operons), full
chromosomes or entire genomes. Sequencing provides the
order of individual nucleotides in DNA or RNA (commonly represented
as A, C, G, T, and U) isolated from cells of animals, plants, bacteria, archaea, or
virtually any other source of genetic information. This is useful for:
- Molecular biology – studying the genome itself, how proteins are made, what proteins are made, identifying new genes and associations with diseases and phenotypes, and identifying potential drug targets
- Evolutionary biology – studying how different organisms are related and how they evolved
- Metagenomics – identifying species present in a body of water, sewage, dirt, debris filtered from the air, or swab samples of organisms. Helpful in ecology, epidemiology, microbiome research, and other fields.
Less-precise information is produced by non-sequencing
techniques like DNA fingerprinting. This information may be
easier to obtain and is useful for:
- Detecting the presence of known genes for medical purposes (see genetic testing)
- Forensic identification
- Parental testing
History
Though the structure of DNA was established as a double
helix in 1953, several decades would pass before fragments of DNA could be
reliably analyzed for their sequence in the laboratory. RNA sequencing was one
of the earliest forms of nucleotide sequencing. The major landmark of RNA
sequencing is the sequence of the first complete gene and the complete genome
of Bacteriophage MS2, identified and published by Walter
Fiers and his coworkers at the University of Ghent (Ghent, Belgium), in 1972
and 1976.
The first method for determining DNA sequences involved a
location-specific primer extension strategy established by Ray Wu at Cornell University in 1970. DNA polymerase
catalysis and specific nucleotide labeling, both of which figure prominently in
current sequencing schemes, were used to sequence the cohesive ends of lambda
phage DNA. Between 1970 and 1973, Wu, R. Padmanabhan and colleagues demonstrated
that this method can be employed to determine any DNA sequence using synthetic
location-specific primers. Frederick
Sanger then adopted this primer-extension strategy to develop more rapid
DNA sequencing methods at the MRC Centre, Cambridge, UK
and published a method for "DNA sequencing with chain-terminating
inhibitors" in 1977. Walter Gilbert and Allan Maxam
at Harvard also developed sequencing methods,
including one for "DNA sequencing by chemical degradation". In 1973,
Gilbert and Maxam reported the sequence of 24 basepairs using a method known as
wandering-spot analysis. Advancements in sequencing were aided by the
concurrent development of recombinant DNA technology, allowing DNA samples to
be isolated from sources other than viruses.
The first full DNA genome to be sequenced was that of bacteriophage φX174 in 1977.[18]
Medical Research Council scientists
deciphered the complete DNA sequence of the Epstein-Barr virus in 1984, finding it to be 170
thousand base-pairs long.
A non-radioactive method for transferring the DNA molecules
of sequencing reaction mixtures onto an immobilizing matrix during
electrophoresis was developed by Pohl and co-workers in the early 1980s.
This was followed by the commercialization of the DNA sequencer
“Direct-Blotting-Electrophoresis-System GATC 1500” by GATC
Biotech, which was used intensively in the framework of the EU
genome-sequencing programme that produced the complete DNA sequence of the yeast Saccharomyces cerevisiae chromosome
II. Leroy
E. Hood's laboratory at the California Institute of Technology
announced the first semi-automated DNA sequencing machine in 1986. This was
followed by Applied Biosystems' marketing of the first fully
automated sequencing machine, the ABI 370, in 1987 and by Dupont's Genesis 2000
which used a novel fluorescent labeling technique enabling all four
dideoxynucleotides to be identified in a single lane. By 1990, the U.S. National Institutes of Health (NIH)
had begun large-scale sequencing trials on Mycoplasma capricolum, Escherichia
coli, Caenorhabditis elegans, and Saccharomyces cerevisiae at a cost
of US$0.75 per base. Meanwhile, sequencing of human cDNA sequences called expressed sequence tags began in Craig
Venter's lab, an attempt to capture the coding fraction of the human
genome.[24]
In 1995, Venter, Hamilton Smith, and colleagues at The Institute for Genomic Research
(TIGR) published the first complete genome of a free-living organism, the
bacterium Haemophilus influenzae. The circular
chromosome contains 1,830,137 bases and its publication in the journal Science
marked the first published use of whole-genome shotgun sequencing, eliminating
the need for initial mapping efforts. By 2001, shotgun sequencing methods had
been used to produce a draft sequence of the human genome.
Several new methods for DNA sequencing were developed in the
mid to late 1990s. These techniques comprise the first of the
"next-generation" sequencing methods. In 1996, Pål
Nyrén and his student Mostafa Ronaghi at the Royal Institute of
Technology in Stockholm
published their method of pyrosequencing.[28]
A year later, Pascal Mayer and Laurent Farinelli submitted patents to the World
Intellectual Property Organization describing DNA colony sequencing. Lynx
Therapeutics published and marketed "Massively parallel signature
sequencing", or MPSS, in 2000. This method incorporated a
parallelized, adapter/ligation-mediated, bead-based sequencing technology and
served as the first commercially available "next-generation"
sequencing method, though no DNA
sequencers were sold to independent laboratories. In 2004, 454
Life Sciences marketed a parallelized version of pyrosequencing. The first
version of their machine reduced sequencing costs 6-fold compared to automated
Sanger sequencing, and was the second of the new generation of sequencing
technologies, after MPSS.
The large quantities of data produced by DNA sequencing have
also required development of new methods and programs for sequence analysis.
Phil Green and Brent Ewing of the University of Washington described their phred quality score for sequencer data analysis
in 1998.
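The text above does not spell out the phred score itself. As a minimal sketch, assuming the conventional definition Q = -10 · log10(P), where P is the estimated probability that a base call is wrong, the conversion in both directions can be written as follows. The function names are illustrative only and are not taken from the phred software.

```python
import math


def phred_from_error(p_error: float) -> float:
    """Convert an estimated base-call error probability into a phred-style score."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10.0 * math.log10(p_error)


def error_from_phred(q: float) -> float:
    """Convert a phred-style score back into an error probability."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # Each step of 10 in the score corresponds to a tenfold drop in error rate.
    for q in (10, 20, 30, 40):
        print(f"Q{q}: about 1 error in {round(1 / error_from_phred(q))} base calls")
```

Under this definition a score of 30 corresponds to an estimated error rate of 1 in 1,000 base calls.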
Basic methods
Maxam-Gilbert sequencing
Allan Maxam and Walter
Gilbert published a DNA sequencing method in 1977 based on chemical
modification of DNA and subsequent cleavage at specific bases. Also known as
chemical sequencing, this method allowed purified samples of double-stranded
DNA to be used without further cloning. This method's use of radioactive
labeling and its technical complexity discouraged extensive use after
refinements in the Sanger methods had been made.
Maxam-Gilbert sequencing requires radioactive labeling at
one 5' end of the DNA and purification of the DNA fragment to be sequenced.
Chemical treatment then generates breaks at a small proportion of one or two of
the four nucleotide bases in each of four reactions (G, A+G, C, C+T). The
concentration of the modifying chemicals is controlled to introduce on average
one modification per DNA molecule. Thus a series of labeled fragments is
generated, from the radiolabeled end to the first "cut" site in each
molecule. The fragments in the four reactions are electrophoresed side by side
in denaturing acrylamide gels for size separation. To visualize the fragments,
the gel is exposed to X-ray film for autoradiography, yielding a series of dark
bands each corresponding to a radiolabeled DNA fragment, from which the
sequence may be inferred.
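The way a sequence is inferred from the four reaction lanes can be illustrated with a small sketch. It assumes an idealized gel in which every band is present and is recorded by the position of the cleaved base; the function names and the band representation are hypothetical simplifications, not a description of real gel analysis.

```python
def simulate_lanes(sequence: str) -> dict:
    """Return, for each of the four reactions, the 1-based positions that are cleaved."""
    lanes = {"G": set(), "A+G": set(), "C": set(), "C+T": set()}
    for pos, base in enumerate(sequence, start=1):
        if base == "G":
            lanes["G"].add(pos)
            lanes["A+G"].add(pos)
        elif base == "A":
            lanes["A+G"].add(pos)
        elif base == "C":
            lanes["C"].add(pos)
            lanes["C+T"].add(pos)
        elif base == "T":
            lanes["C+T"].add(pos)
    return lanes


def read_lanes(lanes: dict, length: int) -> str:
    """Infer the sequence by checking which lanes show a band at each position."""
    bases = []
    for pos in range(1, length + 1):
        if pos in lanes["G"] and pos in lanes["A+G"]:
            bases.append("G")  # band in both purine lanes
        elif pos in lanes["A+G"]:
            bases.append("A")  # band only in the A+G lane
        elif pos in lanes["C"] and pos in lanes["C+T"]:
            bases.append("C")  # band in both pyrimidine lanes
        elif pos in lanes["C+T"]:
            bases.append("T")  # band only in the C+T lane
        else:
            bases.append("N")  # no band: base cannot be called
    return "".join(bases)


if __name__ == "__main__":
    lanes = simulate_lanes("GATTACA")
    print(read_lanes(lanes, 7))
```

The key point is that A and T are never observed in a lane of their own: they are called by a band that appears in the combined lane but is absent from the G or C lane.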
Chain-termination methods
The chain-termination method developed by Frederick
Sanger and coworkers in 1977 soon became the method of choice, owing to its
relative ease and reliability. When invented, the chain-terminator method used
fewer toxic chemicals and lower amounts of radioactivity than the Maxam and
Gilbert method. Because of its comparative ease, the Sanger method was soon
automated and was the method used in the first generation of DNA
sequencers.
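As a rough illustration of the chain-termination idea, the sketch below assumes four idealized reactions, one per terminating nucleotide, in which a fragment of every possible length is produced. The primer is not modelled, fragment lengths are counted from the first newly synthesized base, and all names are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def chain_termination_lanes(template: str) -> dict:
    """For each terminator (A, C, G, T), list the lengths of the terminated fragments."""
    new_strand = reverse_complement(template)  # strand synthesized by the polymerase
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)  # synthesis can stop here in the matching reaction
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the synthesized strand by ordering all bands from shortest to longest."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    template = "GATTACA"
    synthesized = read_gel(chain_termination_lanes(template))
    print(synthesized, reverse_complement(synthesized))
```

Reading the bands in order of increasing length yields the newly synthesized strand; the template sequence is its reverse complement.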
Sanger sequencing is the method which prevailed from the
1980s until the mid-2000s. Over that period, great advances were made in the
technique, such as fluorescent labelling, capillary electrophoresis, and
general automation. These developments allowed much more efficient sequencing,
leading to lower costs. The Sanger method, in mass production form, is the
technology which produced the first human genome in 2001, ushering in the
age of genomics.
However, later in the decade, radically different approaches reached the
market, bringing the cost per genome down from $100 million in 2001 to $10,000
in 2011.
Advanced methods and de novo sequencing
Genomic DNA is fragmented into random pieces and cloned as a
bacterial library. DNA from individual bacterial clones is sequenced and the
sequence is assembled by using overlapping DNA regions.
Large-scale sequencing often aims at sequencing very long
DNA pieces, such as whole chromosomes, although large-scale sequencing can also be
used to generate very large numbers of short sequences, such as found in phage
display. For longer targets such as chromosomes, common approaches consist
of cutting (with restriction enzymes) or shearing (with
mechanical forces) large DNA fragments into shorter DNA fragments. The
fragmented DNA may then be cloned into a DNA vector
and amplified in a bacterial host such as Escherichia
coli. Short DNA fragments purified from individual bacterial colonies
are individually sequenced and assembled
electronically into one long, contiguous sequence. Studies have shown that
adding a size selection step to collect DNA fragments of uniform size can
improve sequencing efficiency and accuracy of the genome assembly. In these
studies, automated sizing has proven to be more reproducible and precise than
manual gel sizing.
The term "de novo sequencing" specifically
refers to methods used to determine the sequence of DNA with no previously
known sequence. De novo translates from Latin as "from the
beginning". Gaps in the assembled sequence may be filled by primer
walking. The different strategies have different tradeoffs in speed and
accuracy; shotgun methods are often used for sequencing
large genomes, but their assembly is complex and difficult, particularly when sequence repeats cause gaps in the
genome assembly.
Most sequencing approaches use an in vitro cloning
step to amplify individual DNA molecules, because their molecular detection
methods are not sensitive enough for single molecule sequencing. Emulsion PCR
isolates individual DNA molecules along with primer-coated beads in aqueous
droplets within an oil phase. A polymerase chain reaction (PCR) then
coats each bead with clonal copies of the DNA molecule followed by
immobilization for later sequencing. Emulsion PCR is used in the methods
developed by Margulies et al. (commercialized by 454
Life Sciences), Shendure and Porreca et al. (also known as "Polony sequencing") and SOLiD sequencing (developed by Agencourt,
later Applied Biosystems, now Life Technologies).
Shotgun sequencing
Shotgun sequencing is a sequencing method designed for
analysis of DNA sequences longer than 1000 base pairs, up to and including
entire chromosomes. This method requires the target DNA to be broken into
random fragments. After sequencing individual fragments, the sequences can be
reassembled on the basis of their overlapping regions.
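Reassembly from overlapping regions can be sketched with a simple greedy strategy: repeatedly merge the two reads with the longest suffix-to-prefix overlap. This is only an illustration under strong assumptions (error-free reads, all from the same strand, no repeated sequence longer than the minimum overlap); it is not the algorithm used by any particular assembler, and the function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Merge reads pairwise by largest overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # remaining contigs do not overlap: gaps are left in the assembly
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    genome = "ATGGCGTACGTTAGCCTAGGATCCA"
    reads = [genome[i:i + 10] for i in (15, 0, 10, 5)]  # shuffled, overlapping fragments
    print(greedy_assemble(reads))
```

If the reads do not cover the whole target, the loop stops with several contigs instead of one, which corresponds to gaps in the assembly.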
Bridge PCR
Another method for in vitro
clonal amplification is bridge PCR, in which fragments are amplified upon
primers attached to a solid surface and form "DNA
colonies" or "DNA clusters". This method is used in the Illumina Genome Analyzer sequencers.
Single-molecule methods, such as that developed by Stephen
Quake's laboratory (later commercialized by Helicos) are an exception: they use bright
fluorophores and laser excitation to detect base addition events from
individual DNA molecules fixed to a surface, eliminating the need for molecular
amplification.
Next-generation methods
Next-generation sequencing applies to genome sequencing,
genome resequencing, transcriptome profiling (RNA-Seq),
DNA-protein interactions (ChIP-sequencing), and epigenome
characterization. Resequencing is necessary because the genome of a single
individual of a species will not indicate all of the genome variations among
other individuals of the same species.
The high demand for low-cost sequencing has driven the
development of high-throughput sequencing (or next-generation sequencing)
technologies that parallelize the sequencing process, producing
thousands or millions of sequences concurrently. High-throughput sequencing
technologies are intended to lower the cost of DNA sequencing beyond what is
possible with standard dye-terminator methods. In ultra-high-throughput
sequencing as many as 500,000 sequencing-by-synthesis operations may be run in
parallel.
Comparison of next-generation sequencing methods
| Method | Read length | Accuracy | Reads per run | Time per run | Cost per 1 million bases (in US$) | Advantages | Disadvantages |
|---|---|---|---|---|---|---|---|
| Single-molecule real-time sequencing (Pacific Bio) | 5,500 bp to 8,500 bp avg (10,000 bp N50); maximum read length >30,000 bases | 99.999% consensus accuracy; 87% single-read accuracy | 50,000 per SMRT cell, or ~400 megabases | 30 minutes to 2 hours | $0.33–$1.00 | Longest read length. Fast. Detects 4mC, 5mC, 6mA. | Moderate throughput. Equipment can be very expensive. |
| Ion semiconductor (Ion Torrent sequencing) | up to 400 bp | 98% | up to 80 million | 2 hours | $1 | Less expensive equipment. Fast. | Homopolymer errors. |
| Pyrosequencing (454) | 700 bp | 99.9% | 1 million | 24 hours | $10 | Long read size. Fast. | Runs are expensive. Homopolymer errors. |
| Sequencing by synthesis (Illumina) | 50 to 300 bp | 98% | up to 3 billion | 1 to 10 days, depending upon sequencer and specified read length | $0.05 to $0.15 | Potential for high sequence yield, depending upon sequencer model and desired application. | Equipment can be very expensive. Requires high concentrations of DNA. |
| Sequencing by ligation (SOLiD sequencing) | 50+35 or 50+50 bp | 99.9% | 1.2 to 1.4 billion | 1 to 2 weeks | $0.13 | Low cost per base. | Slower than other methods. Have issue sequencing palindromic sequence. |
| Chain termination (Sanger sequencing) | 400 to 900 bp | 99.9% | N/A | 20 minutes to 3 hours | $2400 | Long individual reads. Useful for many applications. | More expensive and impractical for larger sequencing projects. |
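To give a sense of scale for the cost column, the following sketch multiplies the listed cost per million bases by the number of bases needed to cover a genome at a given depth. The genome size of roughly 3.1 billion bases and the 30x coverage are assumptions chosen for illustration; the per-megabase figures are copied from the table, and the function name is hypothetical.

```python
# Cost per 1 million bases in US$ (low, high), as listed in the comparison table.
COST_PER_MEGABASE = {
    "Single-molecule real-time (Pacific Bio)": (0.33, 1.00),
    "Ion semiconductor (Ion Torrent)": (1.00, 1.00),
    "Pyrosequencing (454)": (10.00, 10.00),
    "Sequencing by synthesis (Illumina)": (0.05, 0.15),
    "Sequencing by ligation (SOLiD)": (0.13, 0.13),
    "Chain termination (Sanger)": (2400.00, 2400.00),
}

HUMAN_GENOME_BASES = 3.1e9  # assumption: approximate haploid human genome size


def sequencing_cost(genome_bases: float, coverage: float, cost_per_megabase: float) -> float:
    """Cost in US$ of sequencing `genome_bases` at the given coverage depth."""
    return genome_bases * coverage / 1e6 * cost_per_megabase


if __name__ == "__main__":
    for method, (low, high) in COST_PER_MEGABASE.items():
        low_cost = sequencing_cost(HUMAN_GENOME_BASES, 30, low)
        high_cost = sequencing_cost(HUMAN_GENOME_BASES, 30, high)
        print(f"{method}: ${low_cost:,.0f} to ${high_cost:,.0f}")
```

The estimate ignores instrument, labour, and analysis costs, so it should be read only as a comparison of the per-base figures.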
Massively parallel signature sequencing (MPSS)
The first of the next-generation sequencing technologies, massively parallel signature
sequencing (or MPSS), was developed in the 1990s at Lynx Therapeutics, a
company founded in 1992 by Sydney Brenner and Sam Eletr. MPSS was a bead-based method that
used a complex approach of adapter ligation followed by adapter decoding,
reading the sequence in increments of four nucleotides. This method made it
susceptible to sequence-specific bias or loss of specific sequences. Because
the technology was so complex, MPSS was only performed 'in-house' by Lynx
Therapeutics and no DNA sequencing machines were sold to independent
laboratories. Lynx Therapeutics merged with Solexa (later acquired by Illumina) in 2004, leading to the development of
sequencing-by-synthesis, a simpler approach acquired from Manteia Predictive Medicine, which
rendered MPSS obsolete. However, the essential properties of the MPSS output
were typical of later "next-generation" data types, including
hundreds of thousands of short DNA sequences. In the case of MPSS, these were
typically used for sequencing cDNA for measurements of gene
expression levels.
Polony sequencing
The Polony sequencing method, developed in the
laboratory of George M. Church at Harvard, was among the first
next-generation sequencing systems and was used to sequence a full genome in
2005. It combined an in vitro paired-tag library with emulsion PCR, an
automated microscope, and ligation-based sequencing chemistry to sequence an E.
coli genome at an accuracy of >99.9999% and a cost approximately 1/9
that of Sanger sequencing. The technology was licensed to Agencourt
Biosciences, subsequently spun out into Agencourt Personal Genomics, and
eventually incorporated into the Applied Biosystems SOLiD platform, which is
now owned by Life Technologies,
itself later acquired by Thermo Fisher Scientific.
454 pyrosequencing
A parallelized version of pyrosequencing
was developed by 454 Life Sciences, which has since been acquired
by Roche Diagnostics. The method amplifies DNA
inside water droplets in an oil solution (emulsion PCR), with each droplet
containing a single DNA template attached to a single primer-coated bead that
then forms a clonal colony. The sequencing machine contains many picoliter-volume
wells each containing a single bead and sequencing enzymes. Pyrosequencing uses
luciferase
to generate light for detection of the individual nucleotides added to the
nascent DNA, and the combined data are used to generate sequence read-outs.
This technology provides intermediate read length and price per base compared
to Sanger sequencing on one end and Solexa and SOLiD on the other.
Illumina (Solexa) sequencing
Solexa,
now part of Illumina, was founded by Shankar Balasubramanian
and David Klenerman in 1998, and developed a sequencing method based on
reversible dye-terminator technology and engineered polymerases. The
reversible terminator chemistry was developed internally at Solexa, and the concept of the
Solexa system was invented by Balasubramanian and Klenerman from Cambridge
University's chemistry department. In 2004, Solexa acquired the company Manteia Predictive Medicine in order to
gain a massively parallel sequencing technology based on "DNA
Clusters", which involves the clonal amplification of DNA on a surface.
The cluster technology was co-acquired with Lynx Therapeutics of California.
Solexa Ltd. later merged with Lynx to form Solexa Inc.
In this method, DNA molecules and primers are first attached
on a slide and amplified with polymerase so that local clonal DNA colonies, later coined
"DNA clusters", are formed. To determine the sequence, four types of
reversible terminator bases (RT-bases) are added and non-incorporated
nucleotides are washed away. A camera takes images of the fluorescently labeled nucleotides, then the
dye, along with the terminal 3' blocker, is chemically removed from the DNA,
allowing for the next cycle to begin. Unlike pyrosequencing, the DNA chains are
extended one nucleotide at a time and image acquisition can be performed at a
delayed moment, allowing for very large arrays of DNA colonies to be captured
by sequential images taken from a single camera.
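The cycle described above yields, for each DNA cluster, one fluorescence measurement per base type per cycle. A deliberately naive sketch of turning such measurements into a sequence is shown below: it simply picks the brightest channel in each cycle. The channel order, the intensity values, and the function name are assumptions for illustration; real base callers also correct for effects such as signal overlap between dyes.

```python
CHANNELS = "ACGT"  # assumed order of the four fluorescence channels


def call_bases(cycles: list) -> str:
    """Call one base per cycle by choosing the channel with the highest intensity.

    `cycles` is a list of 4-tuples holding the intensities for A, C, G and T.
    """
    calls = []
    for intensities in cycles:
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        calls.append(CHANNELS[brightest])
    return "".join(calls)


if __name__ == "__main__":
    # Made-up intensities for three sequencing cycles of a single cluster.
    cluster = [(0.9, 0.1, 0.0, 0.1), (0.1, 0.2, 0.8, 0.0), (0.0, 0.1, 0.1, 0.7)]
    print(call_bases(cluster))
```

Because exactly one terminator base is incorporated per cycle, the number of cycles equals the read length.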
Decoupling the enzymatic reaction and the image capture
allows for optimal throughput and theoretically unlimited sequencing capacity.
With an optimal configuration, the ultimately reachable instrument throughput
is thus dictated solely by the analog-to-digital conversion rate of the camera,
multiplied by the number of cameras and divided by the number of pixels per DNA
colony required for visualizing them optimally (approximately 10
pixels/colony). In 2012, with cameras operating at more than 10 MHz A/D
conversion rates and available optics, fluidics and enzymatics, throughput can
be multiples of 1 million nucleotides/second, corresponding roughly to 1 human
genome equivalent at 1x coverage per hour per instrument, and 1 human genome
re-sequenced (at approx. 30x) per day per instrument (equipped with a single
camera).
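The throughput estimate in this paragraph can be reproduced with a few lines of arithmetic. The sketch below uses the figures quoted in the text (a 10 MHz conversion rate, one camera, about 10 pixels per colony) and assumes a human genome of roughly 3.1 billion bases; the function names are illustrative.

```python
HUMAN_GENOME_BASES = 3.1e9  # assumption: approximate haploid human genome size


def colonies_per_second(adc_rate_hz: float, cameras: int, pixels_per_colony: float) -> float:
    """Colonies imaged per second: conversion rate times cameras, divided by pixels per colony."""
    return adc_rate_hz * cameras / pixels_per_colony


def hours_for_coverage(genome_bases: float, coverage: float, bases_per_second: float) -> float:
    """Hours of imaging needed to read `genome_bases` at the given coverage."""
    return genome_bases * coverage / bases_per_second / 3600.0


if __name__ == "__main__":
    # Each imaged colony contributes one nucleotide per sequencing cycle.
    rate = colonies_per_second(10e6, cameras=1, pixels_per_colony=10)
    print(f"{rate:,.0f} nucleotides per second")
    print(f"1x coverage:  {hours_for_coverage(HUMAN_GENOME_BASES, 1, rate):.1f} hours")
    print(f"30x coverage: {hours_for_coverage(HUMAN_GENOME_BASES, 30, rate):.1f} hours")
```

With these inputs the rate is 1 million nucleotides per second, which gives a little under one hour for 1x coverage and roughly one day for 30x coverage, in line with the figures stated above.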
SOLiD sequencing
Applied Biosystems' (now a Life Technologies
brand) SOLiD technology employs sequencing by ligation. Here, a pool of all
possible oligonucleotides of a fixed length is labeled according to the
sequenced position. Oligonucleotides are annealed and ligated; the preferential
ligation by DNA
ligase for matching sequences results in a signal informative of the
nucleotide at that position. Before sequencing, the DNA is amplified by
emulsion PCR. The resulting beads, each containing single copies of the same
DNA molecule, are deposited on a glass slide. The result is sequences of
quantities and lengths comparable to Illumina sequencing. This sequencing by ligation method has been
reported to have some issues sequencing palindromic sequences.
Ion Torrent semiconductor sequencing
Ion Torrent Systems Inc. (now owned by Life Technologies)
developed a system based on using standard sequencing chemistry, but with a
novel, semiconductor-based detection system. This method of sequencing is based
on the detection of hydrogen ions that are released during the polymerisation
of DNA, as opposed to
the optical methods used in other sequencing systems. A microwell containing a
template DNA strand to be sequenced is flooded with a single type of nucleotide.
If the introduced nucleotide is complementary to the leading
template nucleotide it is incorporated into the growing complementary strand.
This causes the release of a hydrogen ion that triggers a hypersensitive ion
sensor, which indicates that a reaction has occurred. If homopolymer
repeats are present in the template sequence, multiple nucleotides will be
incorporated in a single cycle. This leads to a corresponding number of
released hydrogen ions and a proportionally higher electronic signal.
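The relationship between nucleotide flows, homopolymer runs, and signal strength can be sketched as follows. The example assumes an idealized, noise-free sensor, a fixed repeating flow order, and a template written in the 3' to 5' direction so that the new strand is its base-by-base complement; the flow order and all names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flow_signals(template: str, flow_order: str = "TACG", max_flows: int = 1000) -> list:
    """Simulate successive nucleotide flows over one template strand.

    Returns a list of (nucleotide, signal) pairs, where the signal equals the
    number of bases incorporated, and hence hydrogen ions released, in that flow.
    """
    strand_to_build = [COMPLEMENT[base] for base in template]
    position = 0
    signals = []
    for flow in range(max_flows):
        if position >= len(strand_to_build):
            break
        nucleotide = flow_order[flow % len(flow_order)]
        incorporated = 0
        # A homopolymer run is extended completely within a single flow.
        while position < len(strand_to_build) and strand_to_build[position] == nucleotide:
            incorporated += 1
            position += 1
        signals.append((nucleotide, incorporated))
    return signals


def decode_flows(signals: list) -> str:
    """Rebuild the synthesized strand from the per-flow signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    signals = flow_signals("AAGTC")
    print(signals)
    print(decode_flows(signals))
```

In practice the signal is not an exact integer, so distinguishing, for example, a run of seven from a run of eight identical bases is harder than distinguishing one from two, which is one way to understand the homopolymer errors listed for this method.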
DNA nanoball sequencing
DNA nanoball sequencing is a type of high throughput sequencing technology
used to determine the entire genomic
sequence of an organism. The company Complete
Genomics uses this technology to sequence samples submitted by independent
researchers. The method uses rolling circle replication to amplify
small fragments of genomic DNA into DNA nanoballs. Unchained sequencing by
ligation is then used to determine the nucleotide sequence. This method of DNA
sequencing allows large numbers of DNA nanoballs to be sequenced per run and at
low reagent
costs compared to other next generation sequencing platforms. However, only
short sequences of DNA are determined from each DNA nanoball which makes
mapping the short reads to a reference
genome difficult. This technology has been used for multiple genome
sequencing projects and is scheduled to be used for more.
Heliscope single molecule sequencing
Heliscope sequencing is a method of single-molecule
sequencing developed by Helicos Biosciences. It uses DNA fragments with
added poly-A tail adapters which are attached to the flow cell surface. The
next steps involve extension-based sequencing with cyclic washes of the flow
cell with fluorescently labeled nucleotides (one nucleotide type at a time, as
with the Sanger method). The reads are performed by the Heliscope sequencer.
The reads are short, up to 55 bases per run, but recent improvements allow for
more accurate reads of stretches of one type of nucleotides.
This sequencing method and equipment were used to sequence
the genome of the M13 bacteriophage.
Single molecule real time (SMRT) sequencing
SMRT sequencing is based on the sequencing by synthesis
approach. The DNA is synthesized in zero-mode waveguides (ZMWs) – small
well-like containers with the capturing tools located at the bottom of the
well. The sequencing is performed with use of unmodified polymerase (attached
to the ZMW bottom) and fluorescently labelled nucleotides flowing freely in the
solution. The wells are constructed in a way that only the fluorescence
occurring by the bottom of the well is detected. The fluorescent label is
detached from the nucleotide upon its incorporation into the DNA strand,
leaving an unmodified DNA strand. According to Pacific Biosciences, the SMRT technology
developer, this methodology allows detection of nucleotide modifications (such
as cytosine methylation). This happens through the observation of polymerase
kinetics. This approach allows reads of 20,000 nucleotides or more, with
average read lengths of 5 kilobases.
Methods in development
DNA sequencing methods currently under development include
labeling the DNA polymerase, reading the sequence as a DNA strand transits
through nanopores, and microscopy-based techniques,
such as atomic force microscopy or transmission electron
microscopy that are used to identify the positions of individual
nucleotides within long DNA fragments (>5,000 bp) by nucleotide labeling
with heavier elements (e.g., halogens) for visual detection and recording. Third-generation
technologies aim to increase throughput and decrease the time to
result and cost by eliminating the need for excessive reagents and harnessing
the processivity of DNA polymerase.
Nanopore DNA sequencing
This method is based on the readout of electrical signals
occurring at nucleotides passing by alpha-hemolysin
pores covalently bound with cyclodextrin. The DNA passing through the nanopore
changes its ion current. This change is dependent on the shape, size and length
of the DNA sequence. Each type of the nucleotide blocks the ion flow through
the pore for a different period of time. The method has potential for further
development because it does not require modified nucleotides; however, single-nucleotide
resolution is not yet available.
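The principle that each nucleotide type blocks the ion flow for a different period of time can be illustrated with a toy classifier that assigns each observed blockade to the nucleotide with the closest reference duration. The reference values below are invented purely for illustration and do not correspond to measured data; as noted above, single-nucleotide resolution had not been achieved, so this is a sketch of the idea rather than a working base caller.

```python
# Hypothetical reference blockade durations in milliseconds (invented values).
REFERENCE_DWELL_MS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def classify_events(dwell_times_ms: list) -> str:
    """Assign each blockade event to the nucleotide with the nearest reference duration."""
    calls = []
    for observed in dwell_times_ms:
        nearest = min(REFERENCE_DWELL_MS, key=lambda base: abs(REFERENCE_DWELL_MS[base] - observed))
        calls.append(nearest)
    return "".join(calls)


if __name__ == "__main__":
    print(classify_events([1.1, 3.9, 2.2, 2.9]))
```

A realistic signal would show overlapping distributions for the four nucleotides, so a nearest-value rule like this one would produce frequent miscalls.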
Two main areas of nanopore sequencing in development are
solid-state nanopore sequencing and protein-based nanopore sequencing. Protein
nanopore sequencing utilizes the membrane protein complexes α-hemolysin and
MspA (Mycobacterium smegmatis porin A), which show great promise given their
ability to distinguish between individual and groups of nucleotides.
Solid-state nanopore sequencing, by contrast, utilizes synthetic materials such as silicon nitride
and aluminum oxide and is preferred for its superior mechanical ability and
thermal and chemical stability. The fabrication method is essential for this
type of sequencing given that the nanopore array can contain hundreds of pores
with diameters smaller than eight nanometers.
The concept originated from the idea that single stranded
DNA or RNA molecules can be electrophoretically driven in a strict linear
sequence through a biological pore that can be less than eight nanometers wide, and
can be detected because the molecules alter the ionic current while moving through
the pore. The pore contains a detection region capable of recognizing different
bases, with each base generating a distinct time-specific signal; the signals, which follow
the order in which the bases cross the pore, are then evaluated. When
implementing this process it is important to note that precise control over the
DNA transport through the pore is crucial for success. Various enzymes such as
exonucleases and polymerases have been used to moderate this process by
positioning them near the pore’s entrance.
Tunnelling currents DNA sequencing
Another approach uses measurements of the electrical
tunnelling currents across single-strand DNA as it moves through a channel.
Depending on its electronic structure each base affects the tunnelling current
differently, allowing differentiation between different bases.
The use of tunnelling currents has the potential to sequence
orders of magnitude faster than ionic current methods and the sequencing of
several DNA oligomers and micro-RNA has already been achieved.
Sequencing by hybridization
Sequencing by hybridization is a
non-enzymatic method that uses a DNA
microarray. A single pool of DNA whose sequence is to be determined is
fluorescently labeled and hybridized to an array containing known sequences.
Strong hybridization signals from a given spot on the array identify its
sequence in the DNA being sequenced.
This method of sequencing utilizes the binding characteristics
of a library of short single-stranded DNA molecules (oligonucleotides), also
called DNA probes, to reconstruct a target DNA sequence. Non-specific hybrids
are removed by washing and the target DNA is eluted. Hybrids are re-arranged
such that the DNA sequence can be reconstructed. The benefit of this sequencing
type is its ability to capture a large number of targets with a homogenous
coverage. However, a large number of chemicals and a large amount of starting DNA are usually
required, although with the advent of solution-based hybridization much less
equipment and fewer chemicals are necessary.
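Reconstructing a target from the probes that hybridize to it can be sketched as a chaining problem: each probe of length k is followed by the probe that overlaps it by k-1 bases. The example below assumes an idealized experiment in which every k-mer of the target is detected exactly once with no false signals; the function names are illustrative.

```python
def kmer_spectrum(sequence: str, k: int) -> set:
    """Return the set of all length-k substrings (the probes that would hybridize)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def reconstruct(spectrum: set) -> str:
    """Chain k-mers that overlap by k-1 bases into a single sequence."""
    kmers = set(spectrum)
    suffixes = {kmer[1:] for kmer in kmers}
    # The first k-mer is the one whose leading k-1 bases follow no other k-mer.
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("spectrum has no unique starting k-mer")
    current = starts[0]
    sequence = current
    remaining = kmers - {current}
    while remaining:
        successors = [kmer for kmer in remaining if kmer[:-1] == current[1:]]
        if len(successors) != 1:
            raise ValueError("spectrum does not determine a unique sequence")
        current = successors[0]
        sequence += current[-1]
        remaining.remove(current)
    return sequence


if __name__ == "__main__":
    target = "ATGGCGTACGTT"
    print(reconstruct(kmer_spectrum(target, 5)))
```

With probes of length 4 the same target can no longer be reconstructed by this sketch, because the 3-base overlap CGT is followed by two different probes; repeated subsequences of this kind are what limit the length of target that a given probe length can resolve.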
Sequencing with mass spectrometry
Mass spectrometry may be used to determine DNA
sequences. Matrix-assisted laser desorption ionization time-of-flight mass
spectrometry, or MALDI-TOF MS, has
specifically been investigated as an alternative method to gel electrophoresis
for visualizing DNA fragments. With this method, DNA fragments generated by
chain-termination sequencing reactions are compared by mass rather than by
size. The mass of each nucleotide is different from the others and this
difference is detectable by mass spectrometry. Single-nucleotide mutations in a
fragment can be more easily detected with MS than by gel electrophoresis alone.
MALDI-TOF MS can more easily detect differences between RNA fragments, so
researchers may indirectly sequence DNA with MS-based methods by converting it
to RNA first.
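Comparing fragments by mass can be illustrated with a simplified mass ladder: if successive fragments differ by exactly one nucleotide, the mass difference between neighbours identifies the added base. The residue masses below are approximate average values for nucleotides within a DNA strand and are stated here as an assumption; the tolerance, the starting mass in the example, and the function name are hypothetical, and real spectra require calibration and handling of adducts.

```python
# Approximate average masses (in daltons) of nucleotide residues within a DNA strand.
RESIDUE_MASS_DA = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def bases_from_masses(fragment_masses: list, tolerance: float = 2.0) -> str:
    """Infer added bases from the mass differences between successive fragments."""
    masses = sorted(fragment_masses)
    bases = []
    for lighter, heavier in zip(masses, masses[1:]):
        delta = heavier - lighter
        nearest = min(RESIDUE_MASS_DA, key=lambda base: abs(RESIDUE_MASS_DA[base] - delta))
        if abs(RESIDUE_MASS_DA[nearest] - delta) > tolerance:
            nearest = "N"  # difference matches no single nucleotide
        bases.append(nearest)
    return "".join(bases)


if __name__ == "__main__":
    # Build a made-up ladder starting from an arbitrary fragment mass.
    ladder = [5000.0]
    for base in "GATC":
        ladder.append(ladder[-1] + RESIDUE_MASS_DA[base])
    print(bases_from_masses(ladder))
```

The smallest gap in this table, about 9 daltons between A and T, indicates the mass resolution needed to tell the bases apart, which becomes harder to achieve as fragments grow longer.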
The higher resolution of DNA fragments permitted by MS-based
methods is of special interest to researchers in forensic science, as they may
wish to find single-nucleotide polymorphisms in
human DNA samples to identify individuals. These samples may be highly degraded
so forensic researchers often prefer mitochondrial
DNA for its higher stability and applications for lineage studies. MS-based
sequencing methods have been used to compare the sequences of human
mitochondrial DNA from samples in a Federal Bureau of Investigation
database and from bones found in mass graves of World War I soldiers.
Early chain-termination and TOF MS methods demonstrated read
lengths of up to 100 base pairs. Researchers have been unable to exceed this
average read size; like chain-termination sequencing alone, MS-based DNA
sequencing may not be suitable for large de novo sequencing projects.
Even so, a recent study did use the short sequence reads and mass spectrometry
to compare single-nucleotide polymorphisms in pathogenic Streptococcus
strains.
Microfluidic Sanger sequencing
In microfluidic Sanger sequencing the entire thermocycling
amplification of DNA fragments as well as their separation by electrophoresis
is done on a single glass wafer (approximately 10 cm in diameter) thus
reducing the reagent usage as well as cost. In some instances researchers have
shown that they can increase the throughput of conventional sequencing through
the use of microchips. Research will still need to be done in order to make
this use of technology effective.
Microscopy-based techniques
This approach directly visualizes the sequence of DNA
molecules using electron microscopy. DNA base pairs have been identified
within intact DNA molecules by enzymatically incorporating modified bases
that contain atoms of increased atomic number; direct visualization and
identification of individually labeled bases has been demonstrated within a synthetic 3,272 base-pair
DNA molecule and a 7,249 base-pair viral genome.
RNAP sequencing
This method is based on the use of RNA
polymerase (RNAP), which is attached to a polystyrene
bead. One end of the DNA to be sequenced is attached to another bead, and both
beads are placed in optical traps. RNAP motion during transcription brings
the beads closer together, and the change in their relative distance can be
recorded at single-nucleotide resolution. The sequence is deduced from
four readouts, each taken with a lowered concentration of one of the four
nucleotide types, similarly to the Sanger method.
In more detail, the RNA polymerase is attached to one polystyrene bead
and a second bead is attached to the distal end of a DNA fragment. Each bead is
then held in an optical trap that levitates it. The interactions
between the RNAP and the DNA result in a change in the length of the DNA
between the two beads. This change is then measured with a precision that gives
single-base resolution on a single DNA molecule. The measurement is repeated four
times, each time with a lower concentration of one of the four
nucleotides; this resembles the four separate reactions used in the Sanger
sequencing method. Sequence information is then deduced by comparing regions
of known sequence with regions of unknown sequence.
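The deduction from four readouts can be illustrated with a small sketch. It assumes, as a simplification not spelled out above, that the polymerase pauses at positions where the scarce nucleotide has to be incorporated, and that each of the four runs has already been reduced to a set of pause positions along the transcript. The function name `call_sequence` and the input layout are hypothetical and are not taken from any published software.

```python
def call_sequence(pause_positions, length):
    """Combine four limiting-nucleotide runs into one base call per position.

    pause_positions: dict mapping the limiting nucleotide of a run
        ("A", "C", "G" or "U") to the set of positions where a pause
        was observed in that run (hypothetical input format).
    length: number of positions to call.
    Positions with no pause, or with pauses in more than one run,
    are reported as "N" (ambiguous).
    """
    calls = []
    for pos in range(length):
        hits = [nt for nt, positions in pause_positions.items() if pos in positions]
        calls.append(hits[0] if len(hits) == 1 else "N")
    return "".join(calls)


# Toy data: pause positions from four imaginary runs.
pauses = {"A": {0, 3}, "C": {1}, "G": {2, 5}, "U": {4}}
print(call_sequence(pauses, 7))
```

In this toy input the last position has no pause in any run, so it is left as an ambiguous call rather than guessed.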
In vitro virus high-throughput sequencing
A method has been developed to analyze full sets of protein
interactions using a combination of 454 pyrosequencing and an in vitro
virus mRNA
display method. Specifically, this method covalently links proteins of
interest to the mRNAs encoding them, then detects the mRNA pieces using reverse
transcription PCRs. The mRNA may then be amplified and
sequenced. The combined method was titled IVV-HiTSeq and can be performed under
cell-free conditions, though its results may not be representative of in
vivo conditions.
Development initiatives
In October 2006, the X Prize Foundation established the Archon X Prize,
an initiative to promote the development of full genome sequencing
technologies, intending to award $10 million to "the first Team that can
build a device and use it to sequence 100 human genomes within 10 days or less,
with an accuracy of no more than one error in every 100,000 bases sequenced,
with sequences accurately covering at least 98% of the genome, and at a
recurring cost of no more than $10,000 (US) per genome."
Each year the National Human Genome Research
Institute, or NHGRI, promotes grants for new research and developments in genomics.
Grants awarded in 2010 and candidates for 2011 include continuing work in
microfluidic, polony and base-heavy sequencing methodologies.
Computational challenges
The sequencing technologies described here produce raw data
that need to be assembled into longer sequences such as complete genomes (sequence
assembly). Achieving this poses many computational challenges, such as
the evaluation of the raw sequence data, which is done by programs and
algorithms such as Phred and Phrap. Other
challenges arise from repetitive
sequences, which often prevent complete genome assemblies because they occur in
many places in the genome. As a consequence, many sequences may not be assigned
to particular chromosomes. The production of raw sequence data is only
the beginning of its detailed bioinformatic
analysis. Nevertheless, new methods for sequencing and for correcting sequencing
errors have been developed.
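Programs such as Phred express the reliability of each base call as a quality score Q, related to the estimated error probability P by Q = -10 log10(P). The sketch below shows this conversion in both directions. The decoding helper assumes FASTQ-style quality strings with an ASCII offset of 33, which is an assumption about the input format and is not stated in the text; all function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


def decode_quality_string(qual, offset=33):
    """Decode an ASCII quality string into integer scores.

    The offset of 33 is an assumed encoding; other offsets exist.
    """
    return [ord(ch) - offset for ch in qual]


scores = decode_quality_string("I5!")
print(scores)
print([phred_to_error_prob(q) for q in scores])
```

A score of 20 corresponds to an estimated 1 in 100 chance that the base call is wrong, and a score of 30 to 1 in 1,000.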
Read trimming
Sometimes, the raw reads produced by the sequencer are
correct and precise only over a fraction of their length. Using the entire read
may introduce artifacts in downstream analyses such as genome assembly, SNP
calling, or gene expression estimation. Two classes of trimming programs have
been introduced, based on either window-based or running-sum
algorithms. The following is a partial list of the trimming algorithms currently
available, with the algorithm class each belongs to:
- Cutadapt: running sum
- ConDeTri: window based
- ERNE-FILTER: running sum
- FASTX quality trimmer: window based
- PRINSEQ: window based
- Trimmomatic: window based
- SolexaQA: window based
- SolexaQA-BWA: running sum
- Sickle: window based
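The two algorithm classes can be contrasted with a minimal sketch. Neither function reproduces any of the tools listed above. The window-based version cuts the read at the first window whose mean quality falls below a threshold. The running-sum version is loosely modelled on BWA-style 3' trimming: it accumulates (threshold - quality) from the 3' end and cuts where that sum peaks. The function names, window size, and threshold are illustrative assumptions.

```python
def trim_window(quals, window=4, threshold=20):
    """Window-based trimming: return how many 5' bases to keep.

    Scans windows from the 5' end and cuts at the start of the first
    window whose mean quality is below the threshold. Reads shorter
    than the window are returned untrimmed.
    """
    n = len(quals)
    for start in range(0, n - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            return start
    return n


def trim_running_sum(quals, threshold=20):
    """Running-sum trimming: return how many 5' bases to keep.

    Walks from the 3' end, summing (threshold - quality). The cut is
    placed where the sum is largest; the walk stops once the sum
    becomes negative.
    """
    running = 0
    best = 0
    cut = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


quals = [38, 38, 37, 36, 30, 25, 12, 8, 5, 2]
print(trim_window(quals), trim_running_sum(quals))
```

On this toy read the two approaches choose different cut points: the window-based cut is more conservative because a window is rejected as a whole, while the running-sum cut removes only the low-quality tail.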