12.3: Genomics and Proteomics - Biology

The study of nucleic acids began with the discovery of DNA, progressed to the study of genes and small fragments, and has now expanded into the field of genomics. Genomics is the study of entire genomes, including the complete set of genes, their nucleotide sequence and organization, and their interactions within a species and with other species. The advances in genomics have been made possible by DNA sequencing technology. Just as information technology has led to Google Maps that enable us to get detailed information about locations around the globe, genomic information is used to create similar maps of the DNA of different organisms.

Mapping Genomes

Genome mapping is the process of finding the location of genes on each chromosome. The maps that are created are comparable to the maps that we use to navigate streets. A genetic map is an illustration that lists genes and their location on a chromosome. Genetic maps provide the big picture (similar to a map of interstate highways) and use genetic markers (similar to landmarks). A genetic marker is a gene or sequence on a chromosome that shows genetic linkage with a trait of interest. The genetic marker tends to be inherited with the gene of interest, and one measure of distance between them is the recombination frequency during meiosis. Early geneticists called this linkage analysis.
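The recombination frequency mentioned above can be estimated directly from cross data: the proportion of recombinant offspring approximates the genetic distance (in centimorgans) between a marker and the gene of interest. Below is a minimal sketch of that calculation; the offspring counts and allele labels are invented for illustration only.

```python
# Illustrative estimate of genetic map distance from a test cross.
# Offspring counts below are hypothetical, not taken from the text.
parental = {"AB": 412, "ab": 396}      # non-recombinant classes
recombinant = {"Ab": 47, "aB": 45}     # recombinant classes

total = sum(parental.values()) + sum(recombinant.values())
recombination_frequency = sum(recombinant.values()) / total

# 1% recombination is roughly 1 centimorgan (valid for small distances).
map_distance_cM = recombination_frequency * 100
print(f"Recombination frequency: {recombination_frequency:.3f}")
print(f"Approximate map distance: {map_distance_cM:.1f} cM")
```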

Physical maps get into the intimate details of smaller regions of the chromosomes (similar to a detailed road map) (Figure 10.3.1). A physical map is a representation of the physical distance, in nucleotides, between genes or genetic markers. Both genetic linkage maps and physical maps are required to build a complete picture of the genome. Having a complete map of the genome makes it easier for researchers to study individual genes. Human genome maps help researchers in their efforts to identify human disease-causing genes related to illnesses such as cancer, heart disease, and cystic fibrosis, to name a few. In addition, genome mapping can be used to help identify organisms with beneficial traits, such as microbes with the ability to clean up pollutants or even prevent pollution. Research involving plant genome mapping may lead to methods that produce higher crop yields or to the development of plants that adapt better to climate change.

Genetic maps provide the outline, and physical maps provide the details; both types of genome-mapping techniques are needed to show the big picture, and information from each is used in combination to study a genome. Genome mapping is applied to the various model organisms used in research, and it remains an ongoing process: as more advanced techniques are developed, further advances are expected. Genome mapping is similar to completing a complicated puzzle using every piece of available data. Mapping information generated in laboratories all over the world is entered into central databases, such as those at the National Center for Biotechnology Information (NCBI), and efforts are made to make the information easily accessible to researchers and the general public. Just as we use global positioning systems instead of paper maps to navigate roadways, NCBI provides a genome viewer tool to simplify the data-mining process.

CONCEPT IN ACTION

Online Mendelian Inheritance in Man (OMIM) is a searchable online catalog of human genes and genetic disorders. This website shows genome mapping, and also details the history and research of each trait and disorder. Click the link to search for traits (such as handedness) and genetic disorders (such as diabetes).

Whole Genome Sequencing

Although there have been significant advances in the medical sciences in recent years, doctors are still confounded by many diseases and researchers are using whole genome sequencing to get to the bottom of the problem. Whole genome sequencing is a process that determines the DNA sequence of an entire genome. Whole genome sequencing is a brute-force approach to problem solving when there is a genetic basis at the core of a disease. Several laboratories now provide services to sequence, analyze, and interpret entire genomes.

In 2010, whole genome sequencing was used to save a young boy whose intestines had multiple mysterious abscesses. The child had several colon operations with no relief. Finally, a whole genome sequence revealed a defect in a pathway that controls apoptosis (programmed cell death). A bone marrow transplant was used to overcome this genetic disorder, leading to a cure for the boy. He was the first person to be successfully diagnosed using whole genome sequencing.

The first genomes to be sequenced, such as those belonging to viruses, bacteria, and yeast, were smaller in terms of the number of nucleotides than the genomes of multicellular organisms. The genomes of other model organisms, such as the mouse (Mus musculus), the fruit fly (Drosophila melanogaster), and the nematode (Caenorhabditis elegans), are now known. A great deal of basic research is performed in model organisms because the information can be applied to other organisms. A model organism is a species that is studied as a model to understand the biological processes in other species that it can represent. For example, fruit flies are able to metabolize alcohol like humans, so the genes affecting sensitivity to alcohol have been studied in fruit flies in an effort to understand the variation in sensitivity to alcohol in humans. Having entire genomes sequenced helps with the research efforts in these model organisms (Figure 10.3.2).

The first human genome sequence was published in 2003. The number of whole genomes that have been sequenced steadily increases and now includes hundreds of species and thousands of individual human genomes.

Applying Genomics

The introduction of DNA sequencing and whole genome sequencing projects, particularly the Human Genome Project, has expanded the applicability of DNA sequence information. Genomics is now being used in a wide variety of fields, such as metagenomics, pharmacogenomics, and mitochondrial genomics. The most commonly known application of genomics is to understand and find cures for diseases.

Predicting Disease Risk at the Individual Level

Predicting the risk of disease involves screening currently healthy individuals by genome analysis at the individual level. Intervention with lifestyle changes and drugs can be recommended before disease onset. However, this approach is most applicable when the problem arises from a single gene mutation; such defects only account for about 5 percent of diseases found in developed countries. Most common diseases, such as heart disease, are multifactorial or polygenic, meaning that the phenotype is determined by two or more genes together with environmental factors such as diet. In April 2010, scientists at Stanford University published the genome analysis of a healthy individual, Stephen Quake, a Stanford scientist who had his own genome sequenced; the analysis predicted his propensity to acquire various diseases. A risk assessment analyzed Quake’s risk for 55 different medical conditions. A rare genetic mutation was found that placed him at risk for sudden heart attack, and he was also predicted to have a 23 percent risk of developing prostate cancer and a 1.4 percent risk of developing Alzheimer’s disease. The scientists used databases and several publications to analyze the genomic data. Even though genomic sequencing is becoming more affordable and analytical tools are becoming more reliable, ethical issues surrounding genomic analysis at a population level remain to be addressed. For example, could such data be legitimately used to charge more or less for insurance or to affect credit ratings?

Genome-wide Association Studies

Since 2005, it has been possible to conduct a type of study called a genome-wide association study, or GWAS. A GWAS is a method that identifies differences between individuals in single nucleotide polymorphisms (SNPs) that may be involved in causing diseases. The method is particularly suited to diseases that may be affected by one or many genetic changes throughout the genome; it is very difficult to identify the genes involved in such a disease using family history information. The GWAS method relies on a genetic database, in development since 2002, called the International HapMap Project. The HapMap Project sequenced the genomes of several hundred individuals from around the world and identified groups of SNPs. The groups include SNPs that are located close to each other on a chromosome and therefore tend to stay together through recombination. Because the group stays together, identifying one marker SNP is all that is needed to identify all the SNPs in the group. Several million SNPs have been identified, but genotyping other individuals who have not had their complete genome sequenced is much easier because only the marker SNPs need to be identified.

In a common design for a GWAS, two groups of individuals are chosen; one group has the disease, and the other group does not. The individuals in each group are matched in other characteristics to reduce the effect of confounding variables causing differences between the two groups. For example, the genotypes may differ because the two groups are mostly taken from different parts of the world. Once the individuals are chosen, and typically their numbers are a thousand or more for the study to work, samples of their DNA are obtained. The DNA is analyzed using automated systems to identify large differences in the percentage of particular SNPs between the two groups. Often the study examines a million or more SNPs in the DNA. The results of GWAS can be used in two ways: the genetic differences may be used as markers for susceptibility to the disease in undiagnosed individuals, and the particular genes identified can be targets for research into the molecular pathway of the disease and potential therapies. An offshoot of the discovery of gene associations with disease has been the formation of companies that provide so-called “personal genomics” that will identify risk levels for various diseases based on an individual’s SNP complement. The science behind these services is controversial.
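For a single SNP, the case-control comparison described above reduces to testing whether allele counts differ between the two groups. A minimal sketch of that test is shown below using invented allele counts; a real GWAS pipeline would repeat this for roughly a million SNPs and apply a genome-wide multiple-testing threshold (commonly near 5e-8) rather than 0.05.

```python
# Hypothetical allele counts for one SNP in a case-control GWAS.
from scipy.stats import chi2_contingency

#            risk allele   other allele
cases    = [        640,           360]   # 500 cases    -> 1000 alleles
controls = [        540,           460]   # 500 controls -> 1000 alleles

chi2, p_value, dof, expected = chi2_contingency([cases, controls])
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")

# In a real study this test is repeated for ~1 million SNPs, so the
# genome-wide significance threshold is far stricter than 0.05.
```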

Because GWAS looks for associations between genes and disease, these studies provide data for other research into causes, rather than answering specific questions themselves. An association between a gene difference and a disease does not necessarily mean there is a cause-and-effect relationship. However, some studies have provided useful information about the genetic causes of diseases. For example, three different studies in 2005 identified a gene for a protein involved in regulating inflammation in the body that is associated with age-related macular degeneration, a disease that causes blindness. This opened up new possibilities for research into the cause of this disease. A large number of genes have been identified as associated with Crohn’s disease using GWAS, and some of these have suggested new hypothetical mechanisms for the cause of the disease.

Pharmacogenomics

Pharmacogenomics involves evaluating the effectiveness and safety of drugs on the basis of information from an individual's genomic sequence. Personal genome sequence information can be used to prescribe medications that will be most effective and least toxic on the basis of the individual patient’s genotype. Studying changes in gene expression could provide information about the gene transcription profile in the presence of the drug, which can be used as an early indicator of the potential for toxic effects. For example, genes involved in cellular growth and controlled cell death, when disturbed, could lead to the growth of cancerous cells. Genome-wide studies can also help to find new genes involved in drug toxicity. The gene signatures may not be completely accurate, but can be tested further before pathologic symptoms arise.

Metagenomics

Traditionally, microbiology has been taught with the view that microorganisms are best studied under pure culture conditions, which involves isolating a single type of cell and culturing it in the laboratory. Because microorganisms can go through several generations in a matter of hours, their gene expression profiles adapt to the new laboratory environment very quickly. On the other hand, many species resist being cultured in isolation. Most microorganisms do not live as isolated entities, but in microbial communities known as biofilms. For all of these reasons, pure culture is not always the best way to study microorganisms. Metagenomics is the study of the collective genomes of multiple species that grow and interact in an environmental niche. Metagenomics can be used to identify new species more rapidly and to analyze the effect of pollutants on the environment (Figure 10.3.3). Metagenomics techniques can now also be applied to communities of higher eukaryotes, such as fish.

Creation of New Biofuels

Knowledge of the genomics of microorganisms is being used to find better ways to harness biofuels from algae and cyanobacteria. The primary sources of fuel today are coal, oil, wood, and other plant products such as ethanol. Although plants are renewable resources, there is still a need to find more alternative renewable sources of energy to meet our population’s energy demands. The microbial world is one of the largest resources for genes that encode new enzymes and produce new organic compounds, and it remains largely untapped. This vast genetic resource holds the potential to provide new sources of biofuels (Figure 10.3.4).

Mitochondrial Genomics

Mitochondria are intracellular organelles that contain their own DNA. Mitochondrial DNA mutates at a rapid rate and is often used to study evolutionary relationships. Another feature that makes studying the mitochondrial genome interesting is that in most multicellular organisms, the mitochondrial DNA is passed on from the mother during the process of fertilization. For this reason, mitochondrial genomics is often used to trace genealogy.

Genomics in Forensic Analysis

Information and clues obtained from DNA samples found at crime scenes have been used as evidence in court cases, and genetic markers have been used in forensic analysis. Genomic analysis has also become useful in this field. In 2001, the first use of genomics in forensics was published: a collaborative effort between academic research institutions and the FBI to solve the mysterious cases of anthrax (Figure 10.3.5) that were spread through the US Postal Service. Anthrax bacteria were made into an infectious powder and mailed to news media and two U.S. Senators. The powder infected administrative staff and postal workers who opened or handled the letters. Five people died, and 17 were sickened by the bacteria. Using microbial genomics, researchers determined that a specific strain of anthrax was used in all the mailings; eventually, the source was traced to a scientist at a national biodefense laboratory in Maryland.

Genomics in Agriculture

Genomics can reduce the trials and failures involved in scientific research to a certain extent, which could improve the quality and quantity of crop yields in agriculture (Figure 10.3.6). Linking traits to genes or gene signatures helps to improve crop breeding to generate hybrids with the most desirable qualities. Scientists use genomic data to identify desirable traits, and then transfer those traits to a different organism to create a new genetically modified organism, as described in the previous module. Scientists are discovering how genomics can improve the quality and quantity of agricultural production. For example, scientists could use desirable traits to create a useful product or enhance an existing product, such as making a drought-sensitive crop more tolerant of the dry season.

Proteomics

Proteins are the final products of genes that perform the function encoded by the gene. Proteins are composed of amino acids and play important roles in the cell. All enzymes (except ribozymes) are proteins and act as catalysts that affect the rate of reactions. Proteins are also regulatory molecules, and some are hormones. Transport proteins, such as hemoglobin, help transport oxygen to various organs. Antibodies that defend against foreign particles are also proteins. In the diseased state, protein function can be impaired because of changes at the genetic level or because of direct impact on a specific protein.

A proteome is the entire set of proteins produced by a cell type. Proteomes can be studied using the knowledge of genomes because genes code for mRNAs, and the mRNAs encode proteins. The study of the function of proteomes is called proteomics. Proteomics complements genomics and is useful when scientists want to test their hypotheses that were based on genes. Even though all cells in a multicellular organism have the same set of genes, the set of proteins produced in different tissues is different and dependent on gene expression. Thus, the genome is constant, but the proteome varies and is dynamic within an organism. In addition, RNAs can be alternatively spliced (cut and pasted to create novel combinations and novel proteins), and many proteins are modified after translation. Although the genome provides a blueprint, the final architecture depends on several factors that can change the progression of events that generate the proteome.

Genomes and proteomes of patients suffering from specific diseases are being studied to understand the genetic basis of the disease. The most prominent disease being studied with proteomic approaches is cancer (Figure 10.3.7). Proteomic approaches are being used to improve the screening and early detection of cancer; this is achieved by identifying proteins whose expression is affected by the disease process. An individual protein is called a biomarker, whereas a set of proteins with altered expression levels is called a protein signature. For a biomarker or protein signature to be useful as a candidate for early screening and detection of a cancer, it must be secreted in body fluids such as sweat, blood, or urine, so that large-scale screenings can be performed in a noninvasive fashion. The current problem with using biomarkers for the early detection of cancer is the high rate of false-negative results. A false-negative result is a negative test result that should have been positive. In other words, many cases of cancer go undetected, which makes biomarkers unreliable. Some examples of protein biomarkers used in cancer detection are CA-125 for ovarian cancer and PSA for prostate cancer. Protein signatures may be more reliable than biomarkers to detect cancer cells. Proteomics is also being used to develop individualized treatment plans, which involves the prediction of whether or not an individual will respond to specific drugs and the side effects that the individual may have. Proteomics is also being used to predict the possibility of disease recurrence.
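The false-negative concern described above can be made concrete: the false-negative rate is the fraction of true disease cases that the biomarker test misses. A small sketch with invented screening counts:

```python
# Hypothetical screening results for a single protein biomarker.
true_positives  = 60    # cancer present, test positive
false_negatives = 40    # cancer present, test negative (missed cases)

sensitivity = true_positives / (true_positives + false_negatives)
false_negative_rate = 1 - sensitivity

print(f"Sensitivity: {sensitivity:.0%}")                   # 60% of cancers detected
print(f"False-negative rate: {false_negative_rate:.0%}")   # 40% of cancers missed
```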

The National Cancer Institute has developed programs to improve the detection and treatment of cancer. The Clinical Proteomic Technologies for Cancer and the Early Detection Research Network are efforts to identify protein signatures specific to different types of cancers. The Biomedical Proteomics Program is designed to identify protein signatures and design effective therapies for cancer patients.

Summary

Genome mapping is similar to solving a big, complicated puzzle with pieces of information coming from laboratories all over the world. Genetic maps provide an outline for the location of genes within a genome, and they estimate the distance between genes and genetic markers on the basis of the recombination frequency during meiosis. Physical maps provide detailed information about the physical distance between the genes. The most detailed information is available through sequence mapping. Information from all mapping and sequencing sources is combined to study an entire genome.

Whole genome sequencing is the latest available resource to treat genetic diseases. Some doctors are using whole genome sequencing to save lives. Genomics has many industrial applications, including biofuel development, agriculture, pharmaceuticals, and pollution control.

Imagination is the only barrier to the applicability of genomics. Genomics is being applied to most fields of biology; it can be used for personalized medicine, prediction of disease risks at an individual level, the study of drug interactions before the conduction of clinical trials, and the study of microorganisms in the environment as opposed to the laboratory. It is also being applied to the generation of new biofuels, genealogical assessment using mitochondria, advances in forensic science, and improvements in agriculture.

Proteomics is the study of the entire set of proteins expressed by a given type of cell under certain environmental conditions. In a multicellular organism, different cell types will have different proteomes, and these will vary with changes in the environment. Unlike a genome, a proteome is dynamic and under constant flux, which makes it more complicated and more useful than the knowledge of genomes alone.

Glossary

biomarker
an individual protein that is uniquely produced in a diseased state
genetic map
an outline of genes and their location on a chromosome that is based on recombination frequencies between markers
genomics
the study of entire genomes, including the complete set of genes, their nucleotide sequence and organization, and their interactions within a species and with other species
metagenomics
the study of the collective genomes of multiple species that grow and interact in an environmental niche
model organism
a species that is studied and used as a model to understand the biological processes in other species represented by the model organism
pharmacogenomics
the study of drug interactions with the genome or proteome; also called toxicogenomics
physical map
a representation of the physical distance between genes or genetic markers
protein signature
a set of over- or under-expressed proteins characteristic of cells in a particular diseased tissue
proteomics
study of the function of proteomes

Genomics Proteomics Core

We have recently added SOMAscan from SomaLogic, a powerful new proteomics assay. The core is the exclusive service provider for this 1,310-protein biomarker discovery platform in the Boston/Longwood Medical Area.

SOMAscan (SomaLogic) is a highly multiplexed, high-sensitivity, aptamer-based, immune-like protein and biomarker discovery platform that simultaneously quantifies 1,310 human proteins in protein extracts from bodily fluids such as serum, plasma, urine, saliva, CSF, and cyst fluid, as well as from tissue, cells, lavage, animal models, and exosomes.

Services

Genomics Services
  • GeneChip HT Genome Array Plate Set - New GeneChip HT System designed to complete large projects of up to 96 samples per run, including human, mouse and rat
  • Whole Genome Cartridge Gene Arrays
  • We offer amplification services for limited amount of RNA and partially degraded RNA (FFPE)
Proteomic Services
  • High multiplex, high sensitivity aptamer-based SOMAscan-1310 protein biomarker discovery
  • Relative quantitation by µLC/MS/MS
  • iTRAQ - 8-plex isobaric peptide tagging system that enables you to label all primary amines, regardless of peptide class
  • SILAC - stable isotope labeling of amino acids in cell culture which is a biosynthetic approach
  • GIST - global internal standard technology, a post digestion peptide level labeling technique
  • ICAT - isotope coded affinity tag -based protein profiling
  • Protein identification by LC-MALDI and LC/MS/MS
  • Coomassie & silver stained gel band
  • Identification of protein and peptide modifications
  • Phosphorylation sites
  • Protein modifications such as acetylation, methylation, and ubiquitination
  • Coomassie stain only, purified proteins
  • Protein profiling of complex biological samples (serum, urine, CSF, biopsy tissue, cell extracts, lavage, saliva, etc.)
  • Profiling by µLC/MS
  • Multidimensional protein fractionation by µLC
Data Analysis and Bioinformatics Services

In addition, the core offers full bioinformatics support for data management and analysis, as well as new software development and the capability to perform a wide variety of genomic and high-throughput assays.


Genomics and proteomics analysis of cultured primary rat hepatocytes

The use of animal models in pharmaceutical research is a costly and sometimes misleading method of generating toxicity data and hence predicting human safety. Therefore, in vitro test systems, such as primary rat hepatocytes, together with the developing genomics and proteomics technologies, are playing an increasingly important role in toxicological research. Gene and protein expression were analysed in a time series (up to 5 days) of primary rat hepatocytes cultured on collagen-coated dishes. Especially after 24 h, a significant down-regulation was observed of many important Phase I and Phase II enzymes involved in xenobiotic metabolism (e.g., cytochrome P450s, glutathione-S-transferases, sulfotransferases) and of antioxidative enzymes (e.g., catalase, superoxide dismutase, glutathione peroxidase). Acute-phase-response proteins were frequently up-regulated (e.g., LPS binding protein, α-2-macroglobulin, ferritin, serine proteinase inhibitor B, haptoglobin), which is likely to be a result of cellular stress caused by the cell isolation procedure (perfusion) itself. A parallel observation was the increased expression of several structural genes (e.g., β-actin, α-tubulin, vimentin), possibly caused by other proliferating cell types in the culture, such as fibroblasts, or alternatively by hepatocyte dedifferentiation.

In conclusion, careful interpretation of data derived from this in vitro system indicates that primary hepatocytes can be successfully used for short-term toxicity studies of up to 24 h. However, culturing conditions need to be further optimized to reduce the massive changes in gene and protein expression in long-term cultured hepatocytes and to allow practical application as a long-term toxicity test system.


Experimental procedures

The development of these data formats has taken place since 2014 in an open process via conference calls and discussions at the PSI annual meetings. Both format specifications have been submitted to the PSI document process [31] for review. The overall goal of this process, analogous to an iterative scientific manuscript review, is that all formalized standards are thoroughly assessed. The process is handled by the PSI Editor and external reviewers who provide feedback on the format specifications. Additionally, there is a phase for public comments, ensuring the involvement of heterogeneous points of view from the community. At the time of writing, the PSI review process has been finalized for both formats, and version 1.0 of each is stable.

Both formats use controlled vocabulary (CV) terms and definitions from the PSI-MS CV [32], which is also used in other PSI data formats. All the related documentation, including the detailed file format specifications and example files, is available at http://www.psidev.info/probam and at http://www.psidev.info/probed.

Overview of the proBAM and proBed formats

The proteogenomics formats proBAM and proBed are designed to store a genome-centric representation of proteomics data (Fig. 1). As mentioned above, both formats are highly compatible with their originating genomics counterparts, thus benefiting already from a plethora of existing tools developed by the genomics community.

Overview of the proBAM and proBed proteogenomics standard formats. Both proBAM and proBed can be created from well-established proteomics standard formats containing peptide and protein identification information (mzTab and mzIdentML, blue box), which are derived from their corresponding MS data spectrum files (mzML, brown box). The proBAM and proBed formats (green box) contain similar PSM-related and genomic mapping information, yet proBAM contains more details, including enzymatic (protease) information, key in proteomics experiments (enzyme type, mis-cleavages, enzymatic termini, etc.) and mapping details (CIGAR, flag, etc.). Additionally, proBAM is able to hold a full MS-based proteomics identification result set, enabling further downstream analysis in addition to genome-centric visualization, as it is also the purpose for proBed (purple box)

ProBAM overview

The BAM format was originally designed to hold alignments of short DNA or RNA reads to a reference genome [22, 23]. A BAM file typically consists of a header section storing metadata and an alignment section storing mapping data (Figs. 1 and 2; Additional file 1: Table S1A). The metadata can include information about the sample identity, technical parameters of data generation (such as library, platform, etc.), and data processing (such as the mapping tool used, duplicate marking, etc.). Essential information includes where reads are aligned, how good the alignment is, and the quality of the reads; specific fields or tags are designed to represent or encode such information. The proBAM format inherits all these features, but sequencing reads are replaced by PSMs (see the proBAM specification document for full details, http://www.psidev.info/probam#proBAM_specs).

Fields of proBAM and proBed format. A proBed file holds 12 original BED columns (highlighted by a bold box) and 13 additional proBed columns. The proBAM alignment record contains 11 original BAM columns (highlighted by a bold box) and 21 proBAM-specific columns, using the TAG:TYPE:VALUE format. Each row in the table represents a column in proBAM and proBed. The rows are colored to reflect the categories of information provided in the two formats (see color legend at the bottom, the header section of proBAM format is not included here). The rows without any background color in the proBAM table represent original BAM columns that are not used in proBAM but that are retained for compatibility. The last row in the proBAM table indicates the customized columns that could be potentially used

It should be noted that, since the tags used in BAM usually have recognized meanings, we did not attempt to repurpose any of them but rather created new ones to accommodate specific proteomics data types such as PSM scores, charge states, and protein PTMs (Fig. 2 and proBAM specification document section 4.4.1 for full description on PSM-specific tags). We also envisioned that additional fields and tags may be necessary to hold additional aspects of proteomics data. We thus designed a “Z?” tag as an extension anchor. Analogously to proBed, the format can also accommodate peptides (as groups of PSMs with the same peptide sequence).
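Because proBAM remains valid BAM, existing BAM libraries can read it directly, and the proteomics-specific information is carried in the optional TAG:TYPE:VALUE fields described above. The sketch below uses pysam; the file name is hypothetical, and the tag names queried (XP, XS) are placeholders rather than the official proBAM tag set, which is defined in the specification document.

```python
# Minimal sketch: reading proBAM records with a standard BAM library (pysam).
# File name is hypothetical; XP/XS are placeholder tag names.
import pysam

with pysam.AlignmentFile("experiment.probam.bam", "rb") as bam:
    print(bam.header)  # header section: sample identity, software, references

    for psm in bam.fetch(until_eof=True):   # each alignment record is a PSM
        print(psm.query_name,        # spectrum / PSM identifier
              psm.reference_name,    # chromosome the peptide maps to
              psm.reference_start,   # genomic start coordinate
              psm.cigarstring)       # mapping description (CIGAR)
        if psm.has_tag("XP") and psm.has_tag("XS"):
            # proteomics-specific TAG:TYPE:VALUE fields (placeholder names)
            print("peptide:", psm.get_tag("XP"), "PSM score:", psm.get_tag("XS"))
```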

ProBed overview

The original BED format (https://genome.ucsc.edu/FAQ/FAQformat.html#format1), developed by the UCSC, provides a flexible way to define data lines that can be displayed as annotation tracks. proBed is an extension of the original BED file format [28]. In BED, data lines are formatted as plain text with white-space separated fields, and each data line represents one item mapped to the genome. The first three fields (corresponding to genomic coordinates) are mandatory, and an additional nine fields are standardized and commonly interpreted by genome browsers and other tools, totaling 12 BED fields, re-used here. The proBed format adds a further 13 fields to describe information primarily on peptide-spectrum matches (PSMs) (Figs. 1 and 2; Additional file 1: Table S1B). The format can also accommodate peptides (as groups of PSMs with the same peptide sequence), but in that case some assumptions need to be made for some of the fields (see proBed specification document section 6.8 for details, http://www.psidev.info/probed#proBed_specs).
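Since proBed keeps the plain-text, column-per-line layout of BED, a data line can be parsed with ordinary string handling. The sketch below splits a line into the 12 standard BED fields plus the trailing proBed columns; the example line is invented, and the grouping of the extra columns is illustrative rather than the normative column names from the specification.

```python
# Sketch: splitting a proBed data line into BED fields and proBed extensions.
BED_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts",
]

def parse_probed_line(line):
    columns = line.rstrip("\n").split()          # white-space/tab separated fields
    bed_part = dict(zip(BED_FIELDS, columns[:12]))
    probed_part = columns[12:]                   # 13 extra PSM-related columns
    return bed_part, probed_part

# Illustrative data line (values invented for demonstration):
example = "chr1\t1000\t1100\tPEPTIDEK\t1000\t+\t1000\t1100\t0\t1\t100\t0\tPXD000000"
bed, extra = parse_probed_line(example)
print(bed["chrom"], bed["chromStart"], extra)
```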

Distinct features of proBAM and proBed and their use cases

The proBAM and proBed formats differ from each other in much the same way as their genomic counterparts do, while representing analogous information. In fact, proBAM and proBed are complementary and have different use cases. Figure 3 shows two examples of proBAM and proBed visualization tracks for the same datasets: an IGV visualization including multiple splice-junction peptides (Fig. 3a) and an Ensembl visualization of a novel translation initiation event in the HDGF gene locus (Fig. 3b).

Visualization of proBAM and proBed files in genome browsers. a IGV visualization: proBAM (green box) and proBed (red box) files coming from the same dataset (accession number PXD001524 in the PRIDE database). proBed files are usually loaded as annotation tracks in IGV whereas proBAM files are loaded in the mapping section. b Ensembl visualization: proBAM (green box) and proBed (red box) files derived from the same dataset (accession number PXD000124) illustrating a novel translational event. The N-terminal proteomics identification result points to an alternative translation initiation site (TIS) for the gene HDGF at a near-cognate start-site located in the 5’-UTR of the transcript (blue box)

Similar to the design purposes of SAM/BAM, the basic concepts behind the proBAM format are: (1) to provide genome coordinates as well as detailed mapping information, including CIGAR, flag, nucleotide sequences, etc.; (2) to hold richer proteomics-related information; and (3) to serve as a well-defined interface between PSM identification and downstream analyses. Therefore, the proBAM format contains much more information about peptide-gene mapping status as well as PSM-related information when compared to proBed. Peptide and nucleotide sequences are inherently embedded in proBAM, which can be useful for achieving improved visualization by tools such as IGV. This feature enables intuitive display of the coverage of a region of interest, peptides at splice junctions, single nucleotide/amino acid variation, and alternatively spliced isoforms (Fig. 3), among others. Therefore, proBAM can hold the full MS proteomics result set, whereupon further downstream analysis can be performed: gene-level inference [33], basic spectral-count-based quantitative analysis, and reanalysis based on different scoring systems and/or false discovery rate (FDR) thresholds.
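A simple illustration of the spectral-count idea is to count how many PSM records fall inside annotated gene intervals. The sketch below does this naively with pysam; the file name and gene coordinates are invented placeholders, and the file is assumed to be coordinate-sorted and indexed (see the manipulation sketch further below).

```python
# Sketch: naive spectral-count quantification from a proBAM file, counting
# PSMs whose genomic coordinates fall inside annotated gene intervals.
import pysam
from collections import Counter

# Hypothetical gene intervals (chrom, start, end); real coordinates would
# come from a genome annotation.
genes = {"HDGF": ("chr1", 156_711_000, 156_736_000)}

counts = Counter()
with pysam.AlignmentFile("experiment.sorted.probam.bam", "rb") as bam:
    for gene, (chrom, start, end) in genes.items():
        # count() requires a sorted, indexed file
        counts[gene] = bam.count(chrom, start, end)

print(counts)   # PSMs overlapping each gene, a crude abundance proxy
```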

The proBed format, on the other hand, is more tailored to storing only the final results of a given proteogenomics analysis, without the full details. The BED format is commonly used to represent genomic features; thus, proBed stores browser-track information at the PSM and/or peptide level, mainly for visualization purposes. As a key point, proBed files can be converted to BigBed [34], a binary format based on BED that stores the same information in compressed binary form and is the format routinely used for annotation tracks. It should be noted that conversion from proBAM to proBed should be possible and vice versa; however, “null” values for some of the tags would be expected when mapping from proBed to proBAM.

Software implementations

Both proBAM and proBed are fully compatible out of the box with existing tools designed for the original SAM/BAM and BED files. Therefore, existing popular tools in the genomics community can readily be applied to read, merge, and visualize these formats (Table 1). As mentioned already, several stand-alone and web genome browsers are available to visualize these formats, e.g. the UCSC browser, Ensembl, the Integrative Genomics Viewer, and JBrowse. For visualizing MS/MS identification results, an integrated proteomics data visualization tool, PDV (Table 1), currently accepts a proBAM file and its matched spectrum file as input.

Routinely used command-line tools such as SAMtools allow users to manipulate (index, merge, sort) alignments in proBAM. Bedtools, seen as the “Swiss-army knife” toolkit for a wide range of genomic analysis tasks, supports similar operations on both formats, including, among others, intersection, merging, counting, shuffling, and conversion functionality. Conversion from proBAM to the CRAM format is also enabled by tools such as SAMtools, Scramble, or Picard. With the UCSC “bedToBigBed” converter tool (http://hgdownload.soe.ucsc.edu/admin/exe/), one can also convert proBed to bigBed. In this context, it is important to note that bedToBigBed version 2.87 is highlighted in the proBed format specification as the reliable version for creating bigBed files from proBed (version 1.0) files.
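The sort/index/merge operations mentioned above can also be scripted. A minimal sketch using pysam's wrappers around the samtools commands is shown here; the file names are hypothetical.

```python
# Sketch: routine manipulation of proBAM files using pysam's wrappers around
# the samtools commands (file names are hypothetical).
import pysam

pysam.sort("-o", "sample1.sorted.bam", "sample1.probam.bam")  # coordinate sort
pysam.index("sample1.sorted.bam")                             # build the .bai index

# Merge several sorted proBAM files, e.g. technical replicates of one study.
pysam.merge("study.merged.bam", "sample1.sorted.bam", "sample2.sorted.bam")
```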

There is also software specifically written for proBAM and proBed, supporting all the proteomics-related features. In fact, proteogenomics data encoded in the PSI standard formats mzIdentML and mzTab can be converted into proBAM and proBed, although it should be noted that the representation for proteogenomics data in mzIdentML has only been formalized recently [35]. In this context, first of all, the open-source Java library ms-data-core-api, created to handle different proteomics file formats using the same interface, can be used to write proBed [36]. A Java command line tool, PGConverter (https://github.com/PRIDE-Toolsuite/PGConverter), is also able to convert from mzIdentML and mzTab to proBed and bigBed. Analogously, several tools are available to write proBAM files, such as the Bioconductor proBAMr package. An additional R package, called proBAMtools, is also available to analyze fully exported MS-based proteomics results in proBAM [33]. proBAMtools was specifically designed to perform various analyses using proBAM files, including functions for genome-based proteomics data interpretation, protein and gene inference, count-based quantification, and data integration. It also provides a function to generate a peptide-based proBAM file coming from a PSM-based one.

ProBAMconvert is another intuitive tool that enables conversion from mzIdentML, mzTab, and pepXML (another popular open proteomics format) [37] to both peptide- and PSM-based proBAM and proBed (http://probam.biobix.be) [38]. It is available as a command-line interface (CLI) and a graphical user interface (GUI, for Mac OS X, Windows and Linux). The CLI is also wrapped in a Bioconda package (https://bioconda.github.io/recipes/probamconvert/README.html) and in a Galaxy tool, available from the public test toolshed (https://testtoolshed.g2.bx.psu.edu/view/galaxyp/probamconvert). The PGConverter tool also allows the validation of proBed files. For proBAM files, a validator is available that checks the validity of the original SAM/BAM format (https://github.com/statgen/bamUtil), although additional proteogenomics data verification still needs to be implemented.


Abstract

The biology- and disease-oriented branch of the Human Proteome Project (B/D-HPP) was established by the Human Proteome Organization (HUPO) with the main goal of supporting the broad application of state-of-the-art measurements of proteins and proteomes by life scientists studying the molecular mechanisms of biological processes and human disease. This will be accomplished through the generation of research and informational resources that support the routine and definitive measurement of process- or disease-relevant proteins. The B/D-HPP is highly complementary to the C-HPP and will provide datasets and biological characterization useful to the C-HPP teams. In this manuscript we describe the goals, the plans, and the current status of the B/D-HPP.


Mr. Sandipan Ray. Sandipan Ray received his M.Sc. Degree in Biotechnology from the University of Calcutta, India in 2009. Presently, he is working as a senior research fellow at the Department of Biosciences and Bioengineering, IIT Bombay, India. He has published quite a few peer-reviewed research articles and reviews in the field of clinical proteomics and emerging proteomics technologies. He is a member of the Human Proteome Organisation (HUPO), US-HUPO, and Proteomics Society, India (PSI). He is actively involved in the development of Virtual Proteomics Lab and other related E-Learning resources at IIT Bombay. His current research interests include serum proteome analysis of Falciparum and Vivax Malaria to decipher disease pathogenesis, host immune response and identify surrogate protein markers.

Ms. Nicole Rachel Koshy. Nicole completed her Master's in Bioinformatics at CMS College of Science and Commerce, Coimbatore, in 2008 and went on to pursue her M.S. in Biotechnology at the University of Houston — Clear Lake, Texas, in the U.S. She worked on the Virtual Proteomics Laboratory project at IIT Bombay and contributed to the bioinformatics module of experiments. She has also worked on a number of publications for the Virtual Proteomics Laboratory and is currently working at a biotechnology company in Mumbai, India.

Mr. Panga Jaipal Reddy. Jaipal Reddy obtained his B.Sc. degree from Osmania University and completed his Master's in Biochemistry at the University of Pune, India, in 2008. Presently, he is working as a junior research fellow in the Department of Biosciences and Bioengineering, IIT Bombay, India. He is the author of a few scientific publications in reputed journals. He has participated in the development of the Virtual Proteomics Lab and other related E-Learning resources at IIT Bombay. His current research interests include understanding the regulation of Z-ring assembly and the identification of drug targets using proteomics.

Dr. Sanjeeva Srivastava. Dr. Srivastava completed his Ph.D. at the University of Alberta, Canada, in 2006 and postdoctoral research at the Harvard Institute of Proteomics, Harvard Medical School, USA, in 2009. He has taught a few proteomics courses at the Cold Spring Harbor Laboratory, New York. Presently, he is an Assistant Professor in the Department of Biosciences and Bioengineering, IIT Bombay, India. Current research in his group centers on using high-throughput proteomics for biomarker discovery in cancer and other diseases, studying protein–protein interactions, and drug target discovery. Additionally, multi-dimensional omics data are employed for in silico studies and models. The group has developed E-learning resources such as a Virtual Laboratory as a community resource and is collaborating actively both across India and internationally to advance this knowledge frontier for the benefit of global health. He is the recipient of several awards, including the National Young Scientist Award (Canada), Young Scientist Awards (India), and the Apple Research Technology Support Award (UK). He serves as Editor-in-Chief of the peer-reviewed International Journal of Genomics and Proteomics, and Associate Editor for Current Pharmacogenomics and Personalized Medicine and several other international journals.


Abstract

Forensic DNA profiling currently allows the identification of persons already known to investigating authorities. Recent advances have produced new types of genetic markers with the potential to overcome some important limitations of current DNA profiling methods. Moreover, other developments are enabling completely new kinds of forensically relevant information to be extracted from biological samples. These include new molecular approaches for finding individuals previously unknown to investigators, and new molecular methods to support links between forensic sample donors and criminal acts. Such advances in genetics, genomics and molecular biology are likely to improve human forensic case work in the near future.


Body of the article

Since the 1950s and until recently, it was believed that mutations in a single gene confer vulnerability to multiple infectious diseases. Concomitantly, common infections have been presumed to be associated with the inheritance of mutations in multiple susceptibility genes. In recent work towards a unified genetic theory of disease [1-3], Prof. JL Casanova identified and characterized many new genetic defects that predispose otherwise healthy individuals to a single type of infection [4]. This novel causal relationship has modified the paradigm that dominated the field for several decades. Single-gene inborn errors of immunity in children may confer severe and selective vulnerability to specific infectious illnesses, whereas corresponding infections in adults usually involve more complex gene patterns. Several diseases have been studied including mycobacterial diseases, invasive pneumococcal disease, chronic mucocutaneous candidiasis, severe flu, Kaposi sarcoma and herpes simplex encephalitis (HSE).

Herpes simplex encephalitis

Herpes simplex virus (HSV-1) encephalitis (HSE) is a severe infection of the central nervous system (CNS) [5]. Although HSV-1 is widespread and typically innocuous in human populations, HSE is the most common form of sporadic viral encephalitis in Western countries, where it is estimated to occur in approximately two to four per million individuals per year. Peaks of HSE incidence occur between the ages of 6 months and 3 years, during primary infection with HSV-1. The virus reaches the CNS via a neurotropic route involving the trigeminal and olfactory nerves [6,7]. The mortality rate, which used to be as high as 70%, has declined significantly thanks to treatment with the antiviral acyclovir [8-10]. In spite of treatment, up to 60% of patients suffer from long-term neurological sequelae of varying severity [7,11].

Genomic studies, exome sequencing

Exome sequencing, the targeted sequencing of the protein-coding portion of the human genome, has been shown to be a powerful and efficient method for detecting disease variants underlying Mendelian disorders. Exons represent about 1% of the human genome [12], yet the protein-coding regions are estimated to harbour about 85% of disease-causing mutations [13]. Robust sequencing of the complete coding region (exome) has the potential to be clinically relevant for genetic diagnosis as understanding of the functional consequences of sequence variation improves [13]. Exome sequencing is currently uncovering inborn errors of immunity in children that confer severe and selective vulnerability to certain infectious diseases [14,15].

Childhood HSE had not been associated with known immunodeficiencies, and its pathogenesis remained elusive until the first five genetic aetiologies of this condition were identified [16-21]. Autosomal recessive UNC-93B deficiency abolishes Toll-like receptor 3 (TLR3), TLR7, TLR8, and TLR9 signalling [16], whereas autosomal dominant TLR3 deficiency specifically affects TLR3 signalling [21]. Recently, an autosomal recessive form of complete TLR3 deficiency has been described in a patient compound heterozygous for two loss-of-function TLR3 alleles [17]. Moreover, an autosomal dominant deficiency in TNF receptor-associated factor 3 (TRAF3) [19], a deficiency of the Toll/IL1R (TIR) domain-containing adaptor inducing IFN-β (TRIF) [20], and a TANK-binding kinase 1 (TBK1) deficiency [18] have been described.

All of these genetic defects involve the Toll-like receptor 3 (TLR3) signalling pathway, and these studies suggested that childhood HSE may result from impaired interferon (IFN)-α/β and IFN-λ production in response to stimulation of TLR3 by dsRNA intermediates of HSV-1 in the CNS (Fig. 1). However, the study of proteins implicated in the TLR3-IFN pathway in HSE patients revealed that only a small fraction of children with HSE carry mutations in UNC93B1, TLR3, TRAF3, TRIF or TBK1 [16-21]. A larger proportion of patients display impaired production of IFN types I and III upon TLR3 stimulation of their fibroblasts. Conversely, the study of IFN type I and III production after TLR3 activation in SV40-fibroblasts of HSE patients has shown that 30% of the 89 patients analysed have normal IFN type I and III production. This suggests that, in spite of the importance of the TLR3 pathway in HSE immunity, the genetic defect(s) responsible for susceptibility to HSV-1 in the CNS could lie in TLR3-independent pathways, or in other TLR3- and IFN-dependent pathways that are activated after the initial TLR3 activation.

A simplified diagram of the TLR-mediated and interferon (IFN)-mediated immunity in response to viruses. TLR3 is located in the endoplasmic reticulum (ER) and in endosomes, where it recognizes double-stranded RNA produced during the replication of most viruses. Activation of TLR3 induces activation of IRF-3 and NF-κB via the TRIF adaptor, and the production of IFN-α/β and/or IFN-λ. UNC-93B is required for the trafficking of TLR3, TLR7, TLR8 and TLR9 from the ER to the endosomal compartment. Proteins of the TLR3 pathway for which genetic mutations have been identified and associated with susceptibility to Herpes simplex virus-1 encephalitis (TLR3, TRIF, UNC-93B, TRAF3 and TBK1) are depicted in blue.

Formulating a Proteomics Approach

Conventional attempts to define disease-related genetic defects involving single proteins commonly try to screen large numbers of patient samples to validate single gene defects. This approach often faces the major obstacle that sufficient numbers of patient samples are difficult to obtain. Since TLR3-dependent pathways are clearly involved in HSV-1 susceptibility, it was considered that it might be possible to use a combination of proteomics and systems biology methods to look for other networks and/or proteins [22]. The basic idea behind this strategy is that the increasing amount of available systems biology information has changed the situation: using only small numbers of healthy controls and patient samples with appropriate functional stimulation, it might now be possible to detect disease-related functional networks by monitoring large numbers of proteins simultaneously, even if the underlying genetic defect in any individual patient cannot be statistically validated by such a study. If successful, definition of such functional networks could already have important diagnostic and therapeutic implications.

In the case of HSE, a large volume of previous biochemical experiments on several hundred patients and healthy individuals provided a strong base of knowledge, well-established, highly reproducible sample preparation methods, and related experimental tests of cellular response for the proteomics experiments. These previous studies also made clear a potential problem that is not often considered in present proteomics studies: what constitutes a normal, healthy response? Monitoring the abundance of key proteins (IFN-β, IFN-λ, IL6, NF-κB and IRF3), cell survival after Vesicular stomatitis virus (VSV) infection, and viral replication [16-21] suggested that several different phenotypes could be distinguished, and indicated highly reproducible variations of up to 50-fold in the amounts of IFN types I and III produced by control cells from different healthy individuals in response to dsRNA (Fig. 2). In short, for these kinds of proteomics experiments, highly reproducible sample preparation for “normal” cell samples that span the range of possible responses over the human population seems to be required. Put differently, proteomics experiments should probably only be undertaken in the context of a large base of previous characterization of healthy population variation. With this background in mind, the goal was to obtain an initial test of four propositions. (1) Is the population variation seen in biochemical tests of a few proteins also expressed across larger numbers of proteins? (2) Are there proteomics signatures that are characteristic despite population variation? (3) Can significant differences between healthy and patient cells be detected despite population variation? (4) Are the differences of potential genetic, diagnostic or therapeutic interest? Positive results were obtained for all four propositions, but they also indicated some new challenges for proteomics and bioinformatics (see below).

Production of IFN-β by SV40-fibroblasts after poly(I:C) stimulation (25 μg/ml) for 24 hours, as assessed by ELISA. C1-C5 are the positive healthy controls and UNC93B1-/- is the UNC-93B-deficient patient. Mean values ± SD were calculated from three independent experiments.

A Short Summary of the Biological Findings

The SILAC measurements of differential protein abundance were conducted for six samples: three healthy controls from different individuals showing weak (C3), medium (C1) and strong (C2) production of IFNs in response to dsRNA, chosen to sample population variation; a healthy control without dsRNA stimulation (C2NS); a patient with a UNC-93B-/- defect that abolishes the TLR3 pathway response (UNC); and a patient with an unknown genetic defect (P). As described in the original publication [22], common functional pathways in healthy individuals, implicated in transmigration of immune cells, apoptosis and oxidative stress, were found to be abrogated in HSE patients. Furthermore, a set of new proteins for further investigation of possible disease aetiologies was identified (Fig. 3), and evidence was obtained that manipulation of one of these (SOD2) could have therapeutic benefits. The observation of changes in proteins involved in mitochondrial oxidative stress systems (SOD2, PPIF) opened a new, additional perspective apart from nuclear-directed TLR3 pathways that is consistent with other recent studies of responses to viral infections [23-25]. For the patient with an unknown gene defect and without the fibroblastic phenotype, a lack of ICAM-1 upregulation, strong upregulation of SOD2, and upregulation of a variety of proteins previously associated with TLR3 pathways delineated a new cellular phenotype which will help to dissect his genetic aetiology. The details of these results, and of their context relative to the biological literature, are contained in the original publication, to which the reader is referred; in the following we note some new features of the experimental results that have important consequences for future proteomics studies.

Illustration of the potential biological significance in immunity against HSE for proteins upregulated after TLR3 activation.

New Challenges for Proteomics and Bioinformatics

(1) Population variation is a crucial issue for proteomics

Comparison of the SILAC ratios between the different healthy samples indicated related responses, with correlation values in the ranges usually accepted as biologically relevant (Fig. 4A). Correlation with the unstimulated sample was in all cases very small. However, in agreement with the large differences in IFN production, the response to dsRNA showed large variation in the overall magnitude of SILAC ratios between different healthy individuals (Fig. 4B). The SILAC ratios revealed that the strong variation in response levels over the population, previously detected with small numbers of proteins using western blotting, is in fact reflected in the abundance changes of large numbers of proteins. Although the H/L distributions remained approximately Gaussian, for the healthy cells with the strongest response (C2), several hundred proteins showed 2-fold changes in abundance.

(A) Correlation of log2(H/L) between healthy samples (C1, C2, C3) and the healthy, non-stimulated sample (C2NS). (B) Cumulative proportion of proteins with the indicated H/L ratios for all six samples.
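The comparison described above amounts to correlating the per-protein log2(H/L) vectors of the different samples and counting large fold changes. A minimal sketch of that computation is given below; the ratio values are invented for illustration, not taken from the study.

```python
# Sketch: correlating per-protein SILAC log2(H/L) ratios between samples.
import numpy as np
import pandas as pd

# Invented H/L ratios (rows = proteins, columns = samples).
ratios = pd.DataFrame(
    {"C1": [2.1, 0.9, 1.6, 1.1], "C2": [3.4, 0.8, 2.5, 1.2],
     "C3": [1.3, 1.0, 1.2, 1.0], "C2NS": [1.0, 1.1, 0.9, 1.0]},
    index=["SOD2", "ANXA1", "ICAM1", "ACTB"],
)

log_ratios = np.log2(ratios)
print(log_ratios.corr())             # pairwise Pearson correlation between samples
print(log_ratios.abs().ge(1).sum())  # proteins with >= 2-fold change, per sample
```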

(2) Sets of “Most Significant” Proteins are Dependent on Population Variation

Even though similar functional networks were involved, the identity and rank order of the proteins with the “most significant” abundance changes differed strongly between different individuals. For example, seven annexins were recorded for all samples. For the healthy samples, the general trend for abundance changes was C2 > C1 > C3, in parallel with the changes in IFNs, but the rank order of individual annexins differed: C1: ANXA5 > ANXA1 > ANXA7; C2: ANXA7 > ANXA6 > ANXA11; C3: ANXA11 > ANXA7 > ANXA4 (Fig. 5A). This feature complicates the choice of “most significant” proteins for subsequent analysis of functional networks using systems biology tools such as GeneGo. Because the H/L distributions remained approximately Gaussian (Fig. 4B), a Significance B formulation was used [26] to select the most significant changes for each sample type (Fig. 5B). However, for samples such as C1 and C2, the distribution of H/L is not dominated by experimental noise (C2NS), but rather by cellular response (Fig. 4B). Consequently, proteins excluded from the C2 most significant set in fact showed substantially stronger abundance changes than proteins accepted for the C1 or C3 data sets (Fig. 5B). As an alternative that was the same for all data sets, a Significance B* factor was calculated relative to the signal intensity/scatter of the unstimulated C2NS data set (Fig. 5C), i.e. relative to real experimental noise. The disadvantage of this was that, because of the large differences in cellular response, the number of proteins accepted for network analysis was heavily dominated by C2; e.g. at Significance B* < 1e-5, the C3/C1/C2 significant data sets included 15/351/842 proteins. Conversely, use of Significance B < 0.05 led to the exclusion of large numbers of proteins from the C1 and C2 data sets that had large abundance changes with high reliability relative to real experimental noise. As a compromise, Significance B < 0.05 together with H/L cut-offs was used to select approximately equal numbers of “most significant” proteins from each sample type [22]. This “equal sampling” was successful in identifying relevant functional networks using GeneGo. However, only a minority of the “most significant” proteins were common to all three healthy individuals, even though other proteins in the union over the healthy samples satisfied the stringent cutoff Significance B* < 1e-5. In the context of small numbers of samples it might be possible to use the dosage of the stimulation (amount of dsRNA) to attain similar response levels for different cell samples, but for higher-throughput analyses of larger numbers of samples, new computational approaches are needed.

Figure 5. Heat maps showing alternative strategies for selection of “most significant” protein sets for subsequent functional network searches using GeneGo. (A) SILAC ratios recorded for seven different annexins over the six sample types. The number of ratio counts for individual proteins ranged from 6 to 243 per sample. (B) Proteins retained with a Significance B < 0.05 filter applied to each sample independently. (C) Proteins retained with a Significance B* < 0.001 filter applied across all samples. Boxed regions: proteins excluded despite having |log2(S)| equal to or greater than that of “significant” proteins retained in other samples.
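As a rough sketch of the selection logic described above, the code below scores each stimulated sample's log2(H/L) ratios against the spread of the unstimulated C2NS distribution, i.e. against experimental noise, in the spirit of the Significance B* filter. This is a simplified stand-in, not the intensity-binned Significance B calculation of reference [26]; the threshold, file name and column names are illustrative only.

    # Simplified stand-in for a Significance B*-style filter: z-scores of log2(H/L)
    # changes are computed against the spread of the unstimulated control (C2NS),
    # i.e. against experimental noise, rather than against each sample's own
    # response-dominated distribution. Names and thresholds are illustrative.
    import numpy as np
    import pandas as pd
    from scipy.stats import norm

    log_ratios = np.log2(pd.read_csv("silac_ratios.csv", index_col="protein"))

    noise = log_ratios["C2NS"].dropna()
    sigma = 1.4826 * np.median(np.abs(noise - noise.median()))   # MAD-based noise scale

    for sample in ["C1", "C2", "C3"]:
        z = (log_ratios[sample] - noise.median()) / sigma
        p = 2 * norm.sf(np.abs(z))                               # two-sided tail probability
        selected = log_ratios.index[p < 1e-5]
        print(sample, len(selected), "proteins pass the cutoff")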

Such population variation has been seen in other recent proteomics studies. For example, measurements for 90 genetically different strains of yeast showed that most of the variation in protein abundance was due to variability in translation and/or protein stability rather than in transcript levels [27]. Similarly, a recent study of four patients with acute myeloid leukemia, five patients with acute lymphoid leukemia and eight healthy controls compared the basal abundances of 639 different proteins using alignment-based quantitation of LC-MS/MS data sets [28] and found population variation similar to that shown in Fig. 5.

(3) Current Systems Biology Tools Need Adaptation to the Analysis of Population Variation

The ultimate goal of a population-wide, network-based analysis of function would be to identify common networks across the population and to specify, for different individuals, the extent to which a common stimulation engages the different networks. Such networks will not be easy to define, since they are likely to be highly intertwined (buffered networks in the terminology of complex adaptive systems theory [29]) and the “output” of any sub-network may be diverse, including changes in protein abundance, post-translational state and subcellular spatial distribution [30,31] (from proteomics), changes in the abundance of metabolites, co-factors, etc. (metabolomics) and genetic changes (epigenetics, micro-RNAs, etc.). The conceptual model of similar networks turned on to different degrees in different individuals, reflected in protein abundance changes (Fig. 6A), is testable. Across the space of i cell samples from different individuals, all proteins k that belong to network j have a vector of measured H/L ratios of the form

$$\vec{V}_k = a_{jk}\,(n_{1j},\, n_{2j},\, \ldots,\, n_{ij})$$

in which a_jk represents an amplitude for “unit engagement” of network j for each protein k, and n_ij represents the amplitude to which the network is engaged in each individual i. That is, in a multidimensional space with the H/L ratios for the different cell samples as orthogonal axes, there is an axis described by the vector (n_1j, n_2j, …, n_ij) that is the same for all proteins k in network j (Fig. 6B).

Figure 6. (A) Model of abundance changes for four networks with intrinsic abundance changes a_jk for different proteins for unit turn-on of the network. For the three cell samples from healthy individuals, each network is turned on to a different degree. This results in changes in the set of “most significant” proteins selected with Significance B filters (dashed lines) and in their rank order for each cell sample. (B) Relationships in the 3D space of SILAC ratios [log2(S1), log2(S2), log2(S3)] for proteins from a single network. The red/blue spheres and axis indicate increased/decreased abundance. The relative amplitude to which the network is turned on in the different cell samples is given by the axis; log2(S1):log2(S2):log2(S3) = 1:1:1 corresponds to equal activation in all cell samples. (C) Putative network for proteins involved in redox responses following stimulation of the healthy samples with dsRNA; log2(S1):log2(S2):log2(S3) = 0.42:0.91:0.13. (D) Putative network for proteins involved in nuclear processes following stimulation of the healthy samples with dsRNA; log2(S1):log2(S2):log2(S3) = 0.69:0.73:0.26.

The model implies the need to search for correlations amongst functionally related proteins (systems biology functional correlations) in functional data (H/L ratios) across high-dimensional spaces (many individual samples) – a feature that does not appear to be available in current publicly accessible systems biology tools. There are strong indications of such relationships in the present data (Fig. 6C, D), but new, more sophisticated analyses and statistical validation are required.
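One conceivable starting point, sketched below, is a rank-1 test of the model in Fig. 6: for proteins assigned to a putative network, a singular value decomposition of the protein-by-sample matrix of log2(H/L) ratios yields a candidate network axis (n_1j, …, n_ij), per-protein amplitudes a_jk, and the fraction of variance explained by a single shared axis. The network membership list and input format are hypothetical; this is only an outline of the kind of analysis that would still need proper statistical validation.

    # Sketch of a rank-1 test of the network model in Fig. 6: proteins k in a
    # putative network j should approximately satisfy V_k = a_jk * (n_1j, ..., n_ij).
    # The membership list is hypothetical; input format as in the sketches above.
    import numpy as np
    import pandas as pd

    log_ratios = np.log2(pd.read_csv("silac_ratios.csv", index_col="protein"))
    samples = ["C1", "C2", "C3"]

    members = ["ANXA1", "ANXA5", "ANXA7"]                 # hypothetical network j
    M = log_ratios.loc[members, samples].dropna().to_numpy()

    U, S, Vt = np.linalg.svd(M, full_matrices=False)      # M ~ outer(a, n)
    axis = Vt[0]                                          # candidate (n_1j, ..., n_ij)
    amplitudes = U[:, 0] * S[0]                           # candidate a_jk per protein
    explained = S[0] ** 2 / np.sum(S ** 2)                # variance captured by one axis

    print("network axis:", np.round(axis, 2))
    print("fraction of variance explained:", round(explained, 2))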


Author information

Affiliations

Institute of Biological Sciences, Faculty of Science, University of Malaya, 50603, Kuala Lumpur, Malaysia

Saiful Anuar Karsani & Nor Afiza Saihen

Oral Cancer Research and Co-ordinating Centre & Faculty of Dentistry, University of Malaya, 50603, Kuala Lumpur, Malaysia

Oral Cancer Research Team, 2nd Floor Outpatient Centre, Sime Darby Medical Centre, Cancer Research Initiatives Foundation (CARIF), 47500 Subang Jaya, Selangor, Malaysia

Department of Clinical Oral Biology, Faculty of Dentistry, Universiti Kebangsaan Malaysia, 50300, Kuala Lumpur, Malaysia

Department of Oral and Maxillofacial Surgery, Faculty of Dentistry, University of Malaya, 50603, Kuala Lumpur, Malaysia

University of Malaya Centre for Proteomics Research (UMCPR), University of Malaya, 50603, Kuala Lumpur, Malaysia


The Clark Lab

We study the process of Adaptive Evolution, during which species adopt novel traits to overcome challenges. We retrace the evolutionary histories of genomic elements to determine the changes underlying adaptation and to discover previously unknown genetic networks. These discoveries have already led to advances in human health, species conservation, and molecular biology. To meet these goals we have developed a suite of computational and experimental approaches employing comparative genomics and proteomics. Ultimately, our research program develops an evolutionary model in which genomic elements are shaped by their co-evolution with other elements and their environment.

We are a combined computational and experimental lab in the Department of Human Genetics at the University of Utah. We are a member of the Cluster in Evolutionary Genetics and Genomics (CEGG), whose member labs span Human Genetics and Biology.

Eccles Institute for Human Genetics
Department of Human Genetics
University of Utah
Lab Room 6460
15 S 2030 E
Salt Lake City, Utah 84112-5330

