Are drug-gene interactions predictable?

Are drug-gene interactions predictable?

We are searching data for your request:

Forums and discussions:
Manuals and reference books:
Data from registers:
Wait the end of the search in all databases.
Upon completion, a link will appear to access the found materials.

I have been reading about drug-gene interactions and pharmacogenetic tests. I haven't been able to find many resources that define the process of testing for possible drug-gene interactions without performing trial and error method. For example, drug-drug interactions can be predicted by using several parameters as noted here. Are there any parameters similar to the ones described in given paper for predicting drug-gene interactions if drug characteristics and genetic data is available? Please describe process or provide source.

One can list candidate genes based on their functions.

If a drug acts by inhibiting the expression of geneGin order to reduce processP, then, geneG, all its regulatory elements and all sequences involve in processPare good candidates for such interaction and should be investigated. If there is known genetic variation in the population at these loci and if we know this genetic variation affect phenotypic variance, then definitely, those are good candidates.

However, to my knowledge, this is the extend at which we are able to go. We can list candidates but it does not mean in the absolute that a candidate must present an interaction with the drug of interest and it does not mean that a non candidate gene would necessarily not have an interaction. Only a statistical testing for such interaction would allow one to solve the mystery.

Genetics and Epigenetics of Addiction DrugFacts

Why do some people become addicted while others don't? Family studies that include identical twins, fraternal twins, adoptees, and siblings suggest that as much as half of a person's risk of becoming addicted to nicotine, alcohol, or other drugs depends on his or her genetic makeup. Finding the biological basis for this risk is an important avenue of research for scientists trying to solve the problem of drug addiction.

Genetics is the study of genes. Genes are functional units of DNA that make up the human genome. They provide the information that directs a body's basic cellular activities. Research on the human genome has shown that, on average, the DNA sequences of any two people are 99.9 percent the same. However, that 0.1 percent variation is profoundly important—it accounts for three million differences in the nearly three billion base pairs of DNA sequence! These differences contribute to visible variations, like height and hair color, and invisible traits, such as increased risk for or protection from certain diseases such as heart attack, stroke, diabetes, and addiction.

Some diseases, such as sickle cell anemia or cystic fibrosis, are caused by a change, known as a mutation, in a single gene. Some mutations, like the BRCA 1 and 2 mutations that are linked to a much higher risk of breast and ovarian cancer, have become critical medical tools in evaluating a patient's risk for serious diseases. Medical researchers have had striking success at unraveling the genetics of these single-gene disorders, though finding treatments or cures has not been as simple. Most diseases, including addiction, are complex, and variations in many different genes contribute to a person's overall level of risk or protection. The good news is that scientists are actively pursuing many more paths to treatment and prevention of these complex illnesses.


The ability to systematically interrogate multiple genetic backgrounds with chemical perturbagens is known as chemogenetic profiling. While this approach has many applications in chemical biology, it is particularly relevant to cancer therapy, where clinical compounds or chemical probes are profiled to identify mutations that inform on genetic vulnerabilities, resistance mechanisms, or targets [1]. Systematic surveys of the fitness effects of environmental perturbagens across the yeast deletion collection [2] offered insight into gene function at a large scale, while profiling of drug sensitivity in heterozygous deletion strains identified genetic backgrounds that give rise to increased drug sensitivity [3]. Now, with the advent of CRISPR technology and its adaptation to pooled library screens in mammalian cells, high-resolution chemogenetic screens can be carried out directly in human cells [4,5,6,7]. Major advantages to this approach include the ability to probe all human genes, not just orthologs of model organisms the analysis of how drug-gene interactions vary across different tissue types, genetic backgrounds, and epigenetic states and the identification of suppressor as well as synergistic interactions, which may preemptively indicate mechanisms of acquired resistance or pre-existing sources of resistant cells in heterogeneous tumor populations.

Design and analysis of CRISPR-mediated chemogenetic interaction screens in human cells can be problematic. Positive selection screens identifying genes conferring resistance to cellular perturbations typically have a high signal-to-noise ratio, as only mutants in resistance genes survive. This approach has been used to identify genes conferring resistance to targeted therapeutics, including BRAF and MEK inhibitors, as well as other drugs [5,6,7,8,9,10,11,12,13]. Conversely, negative selection CRISPR screens require growing perturbed cells over 10 or more doublings to allow sensitive detection of genes whose knockout leads to moderate fitness defects. Adding the detection of drug interactions to these experiments necessitates dosing at sub-lethal levels to balance between maintaining cell viability over a long time course and inducing drug-gene interactions beyond native drug effects [14,15,16,17].

In this study, we describe drugZ, an algorithm for the analysis of CRISPR-mediated chemogenetic interaction screens. We apply the algorithm to identify genes that drive normal cellular resistance to the PARP inhibitor olaparib in three cell lines. We demonstrate the greatly enhanced sensitivity of drugZ over contemporary algorithms [7, 18,19,20] by showing how it identifies more hits with higher enrichment for the expected DNA damage response pathway, and further how it identifies both synergistic and suppressor interactions. We further demonstrate the discovery of both synergistic and suppressor interactions in a single experiment with KRAS-mutant pancreatic cancer cell lines treated with an ERK inhibitor, and through reanalysis of published data. Interestingly, we observe a trend across several datasets where tumor suppressor genes score as drug suppressors, indicating a possible systematic source of false positives. We provide all software and data [21] necessary to replicate the analyses presented here see “Availability of data and materials” below for links.


DrugZ algorithm

We calculate the log2 fold change of each gRNA in the pool by normalizing the total read count of each sample (to n = 10 million reads) at the same time point and taking the log ratio, for each replicate, of treated to control reads.

pseudocount = default value is 5

We estimate the variance of each fold change by calculating the standard deviation of fold changes with similar abundance in the control sample:

N = number of fold changes with similar abundance (default = 1000)

fcr, i = fold change for each guide in a replicate

and then calculate a Z-score for each fold change using this estimate:

The guide Z-score of all gRNA across all replicates is summed to get a gene-level sumZ score, which is then normalized (by dividing by the square root of the number of summed terms) to the final normZ (Fig. 1b):

Workflow. a Experimental design. In a drug-gene interaction screen, cells are transduced with a pooled CRISPR library. Cells are split into drug-treated and untreated control samples, grown for several doublings genomic DNA is collected and the relative abundance of CRISPR gRNA sequences in the treated and control population is compared. b DrugZ processing steps include normalizing read counts, calculating fold change, estimating the standard deviation for each fold change, Z-score transformation, and combining guide scores into a gene score. c–e Comparing existing methods vs. drugZ for SUM149PT olaparib screen. DrugZ hits show strongest enrichments for DDR genes across a range of FDR thresholds. c Number of raw hits. d Number of annotated DNA damage response (DDR) genes in hits. e −log P values for DDR gene enrichment by hypergeometric test

A P value is calculated from the normZ and corrected for multiple hypothesis testing using the method of Benjamini and Hochberg [22]. The open-source Python software can be downloaded from

DrugGS algorithm

After empirical Bayes variance estimation approach is applied on normalized log-fold changes to calculate a Z-score for each guide, we applied Gibbs sampling to generate posterior distribution of fold changes for each gene.

Each gene has a distribution composed of Z-scores for guides targeting that specific gene across replicates. Distribution is characterized as (μ, τ), where τ is ( frac<1> ) .

Both μ and τ have hyperparameters (μ : μ, σ 2 , τ : a, b) that we initialize at the very start of sampling.

Γ(a, b) = Gamma prior with a (shape) and b (rate) hyperparameters

P(μ| τ, data)

(μ, σ 2 ) = Normal prior with μ (mean) and σ 2 (variance) hyperparameters

We then update μ and τ with respect to their priors in every 1000 samples that we generate for each gene.

Equations to update μ:

Equations to update τ:

n = number of data points (guide Z-scores) for each gene

( overline ) = actual mean of data points

From those 1000 newly sampled μ and τ, we then calculate the mean and standard deviation. Each gene’s μ posterior distribution’s mean is what was converted into Z-score and used to compare with the drugZ normZ values.

S = number of samples (in our case 1000)

Drug-gene interaction screens

Olaparib screens were described in [14]. Temozolomide screens were described in [23].

Cell culture

hTERT RPE-1 (CRL-4000) and 293T (CRL-3216) cells were purchased from the ATCC and grown in Dulbecco’s high glucose modified Eagle’s medium (DMEM HyClone) with 10% fetal bovine serum (FBS), 1× GlutaMAX (Gibco), 100 mM sodium pyruvate (Gibco), 1× non-essential amino acids (NEAA), 1X penicillin-streptomycin (Pen/Strep), and 5μg ml −1 Plasmocure. Incubator conditions were kept at 37 °C with 5% CO2.

Lentivirus production

For production of the TKOV3 lentivirus, 9.0 × 10 6 293T cells were transfected with psPAX2 (lentiviral packaging Addgene #12260), pMD2.G (VSV-G envelope Addgene #12259), and TKOV3 (Toronto KnockOut CRISPR Library Addgene #90294) using X-tremeGENE 9 DNA transfection reagent (Sigma-Aldrich) in medium with lowered antibiotic concentration (0.1× Pen/Strep). The medium was replaced with viral harvest medium (DMEM + 1.1% BSA + 1× Pen/Strep) 18 h post-transfection. Virus-containing supernatant was collected

24–48 h post-transfection, and fresh viral harvest medium was added to transfected plates. Virus-containing supernatant was collected again

24 later. The virus-containing supernatant was centrifuged to remove cell debris and stored at -80 °C.

CRISPR screening

For transduction of the hTERT RPE-1 cells, the TKOv3 virus was added with 8μg/ml Polybrene. For selection of the transduced cells, puromycin was introduced at a concentration of 20 μg/ml at 24 h post-infection (the hTERT cassette used to immortalize RPE1 cells contains a puromycin resistance marker, necessitating extreme puromycin concentrations for selection). Puromycin selection continued for 72 h post-transduction and completed upon the selection against the hTERT RPE-1 parental line as a control. Completion of selection was considered the initial time point (T0). The TKOv3-transduced cells were split into technical replicates. To ensure proper coverage, 15 × 10 6 cells across 11 × 15 cm dishes were used for infection with the TKOv3 virus per replicate. The chemotherapeutic drugs gemcitabine (2 nM) and vincristine (0.4 nM) were added to separate replicates, with one set of replicates receiving no drug treatment. Both drug-treated and untreated replicates were not allowed to reach confluence in the 15 cm dishes. Cells were lifted, counted, and re-plated at the coverage stated above, and the excess cell pellets were frozen at − 20 °C as a time point. Once 8 doublings were reached from T0, the screens were terminated and pellets frozen at − 20 °C. Coverage of screens was kept at 200 cells per gRNA.

The QIAamp Blood Maxi Kit (Qiagen) was used to isolate the genomic DNA (gDNA) from the frozen cell pellets. Guide sequences were enriched using PCR with HiFi HotStart ReadyMix (Kapa Biosystems) and primers targeting the guide region in the genomic DNA. A second round of PCR was performed with i5 and i7 primers to give each condition and replicate a unique multiplexing barcode. The final PCR products were purified using the E-Gel System (Invitrogen), normalized, and sequenced on the NextSeq500 system to determine the representation of guides under each treated and non-treated condition.

Drug–drug–gene interactions

DDGIs can be divided into three main categories: inhibitory interactions, induction interactions, and phenoconversion interactions. Inhibitory and induction interactions can be defined as any interactions that affect the victim drug’s pharmacokinetics (PK) to increase or reduce concentrations of the drug, respectively. Induction or inhibition can occur either with the administration of a perpetrator drug that alters the victim drug metabolism or transport, or with the presence of loss- or gain-of-function (LOF or GOF) genetic variants that alter function of enzymes that alter metabolism or transport of the victim drug, or the combination of both. A DDGI can be thought of as a double hit—whereby the genetic variant and the perpetrator drug combine to act on transporter or metabolism pathways to greatly alter drug concentrations. It is also possible to see phenoconversion—where the interacting drug effect and the genotype have opposing effects, resulting in a temporary phenotype shift e.g. neutralizing/reversing the effect of a GOF genotype when an inhibitory drug is prescribed. In this review we describe, with examples, different cases of interactions under each of the above three categories, focusing initially on metabolizing enzymes, before considering drug transporters.

Drug–drug-metabolizing enzyme gene interactions (DDMEGIs)

Inhibitory interactions

Inhibitory effects of drugs and genotype can alter substrate metabolism by both drug and genotype impacting on the same metabolizing enzyme, or on two distinct routes of metabolism.

In general, poor metabolizers are expected to experience the highest substrate drug plasma concentration, compared with other genotypes, when co-treated with inhibitors. For example, co-administration of simvastatin (a CYP2C9 inhibitor) with warfarin (CYP2C9 substrate) has been shown to reduce warfarin dosage requirements in CYP2C9*3 carriers with a greater percentage as compared with noncarriers (29% vs 5% respectively) [9]. A similar conclusion has been reported with celecoxib (Supplementary Table 1) [10]. The inhibitory effect of drug and genotype is not always additive—genetically poor metabolizers may have only limited further enzyme inhibition by administration of an inhibitory drug. For instance, a statistically significant elevation in rabeprazole (a CYP2C19 substrate) plasma levels was observed in both normal metabolizers and heterozygous genotype carriers after treatment with fluvoxamine (a CYP2C19 inhibitor) while no additional clinically significant elevation was detected with poor metabolizers who have already experienced the highest rabeprazole plasma levels [11]. A similar scenario is seen with other examples (Supplementary Table 1) [12,13,14,15].

Where a drug is metabolized by two or more CYP enzymes, then inhibition of one of these enzymes alone (by drug or genotype) may have minimal effect, due to redundancy of the pathways. However, if a genotype and interacting drug affect these different routes of metabolism, then the interaction may be very large. For example, it has been observed that for voriconazole (a CYP2C19 and CYP3A4 substrate) bioavailability is increased markedly (

5.6-fold) in patients who have reduced CYP2C19 activity and are administered with atazanavir or ritonavir (potent CYP3A4 inhibitors) [16]. A similar scenario can be noted with other examples (Supplementary Table 1) [17,18,19].

Prodrugs, on the other hand, require the function of certain CYPs to be therapeutically active, and in these cases the effect is the opposite to that described above. Clopidogrel, for example, is activated by CYP1A2, CYP2B6, CYP3A4, CYP2C9, and CYP2C19 [20]. Carriers of LOF variants in one or more of these genes and co-administered with their inhibitors are at increased risk for treatment resistance. For instance, carriers of CYP2C19*2 and/or *3 alleles who are treated with clopidogrel and proton pump inhibitors (CYP2C19 inhibitors) were observed to be more likely to have reduced clopidogrel efficacy the addition of a third risk factor (e.g., calcium channel blockers (CYP3A4 inhibitors)) was also correlated with a greater reduction in efficacy of clopidogrel [21, 22].

Figure 1 shows the predicted changes of plasma levels of active drugs and active metabolites of prodrugs with and without the presence of inhibitors and/or LOF variants.

The predicted active drug/active metabolites of prodrugs plasma levels and biliary excretion changes without or with the presence of inhibitors or LOF variants or both on metabolizing enzymes. The predicted active drug/active metabolites of prodrugs plasma levels and biliary excretion changes without (a-1/a-2) or with the presence of inhibitors or LOF variants (b-1/b-2) or both (c-1/c-2) on metabolizing enzymes. a-1/a-2 represent the normal scenario with no interacting drug or genetic variant. In b-1/b-2 either an inhibitory drug or loss-of-function variant (LOF) in the metabolizing enzyme, results in reduced metabolism to inactive metabolites, and increased (b-1)/decreased (b-2) active drug in the systemic circulation. In c-1/c-2 the presence of inhibitory drug and the LOF genetic variant combine to produce greater increase (c-1)/decrease (c-2) in the systemic concentration of active drug

Induction interactions

Increased metabolism of active drugs by an enzyme inducer or GOF variant will result in reduced efficacy of the victim drug. For example, when voriconazole (a CYP2C19 substrate) is co-prescribed with carbamazepine (CYP2C19 inducer) the voriconazole dose is usually increased to overcome this increased metabolism. In a case report, therapeutic concentrations of voriconazole were not achieved, as the patient carried two GOF CYP2C19 *17 variants [23].

The opposite effect is seen with prodrugs. Increased metabolism by an enzyme inducing drug or GOF variant, will result in high plasma levels of active metabolites leading to increased side effects and/or efficacy. Thus, patients carrying CYP2C19*17 GOF variants have increased conversion of clopidogrel to active metabolites resulting in reduced cardiovascular events and/or increased bleeding episodes [24,25,26,27,28,29,30,31,32,33]. Co-administration of an inducer of CYP1A2, CYP2C9, and/or CYP3A4 would be expected to result in greater efficacy of clopidogrel, with increased risk of bleeding, however no studies have been published to establish this.

Figure 2 shows the predicted changes of plasma levels of active drugs and active metabolites of prodrugs with and without the presence of inducers and/or GOF variants.

The predicted active drug/active metabolites of prodrugs plasma levels and biliary excretion changes with out or with the presence of inducers or GOF variants or both on metabolizing enzymes. The predicted active drug/active metabolites of prodrugs plasma levels and biliary excretion changes without (a-1/a-2) or with the presence of inducers or GOF variants (b-1/b-2) or both (c-1/c-2) on metabolizing enzymes. a-1/a-2 represent the normal scenario with no interacting drug or genetic variant. In b-1/b-2 either an inducer drug or gain-of-function variant (GOF) in the metabolizing enzyme, results in increased metabolism to inactive metabolites, and decreased (b-1)/increased (b-2) active drug in the systemic circulation. In c-1/c-2 the presence of inducer drug and the GOF genetic variant combine to produce greater decrease(c-1)/increase(c-2) in the systemic concentration of active drug

Phenoconversion interactions

As described above, a temporary phenotype shift can be seen when the perpetrator drug and genetic effect are opposed. For example, the presence of reduced function CYP2C9 variants results in reduced tolbutamide (a CYP2C9 substrate) metabolism, yet co-treatment with rifampicin (a CYP2C9 inducer) in these patients reverses this genetic effect resulting in a twofold increase in tolbutamide clearance [34]. Conversely, proton pump inhibitors (CYP2C19 inhibitors) treatment with clopidogrel results in phenoconversion in genetically determined ultra-rapid phenotype to a poor metabolizer status indicated by loss of clopidogrel efficacy [35].

The beneficial side of phenoconversion interactions is that genetically determined phenotypes can be normalized by the addition of medications of opposite effects on metabolism. For example, resistance to nortriptyline (CYP2D6 substrate) due to abnormally rapid metabolism has been successfully reversed and normalized with the addition of paroxetine a (CYP2D6 inhibitor), which produces a recovery of nortriptyline therapeutic plasma levels [36].

Figure 3 presents different scenarios of phenoconversion interactions.

Different scenarios of phenoconversion interactions where genetic effects may be reversed or shifted in the opposite direction. a Represents the normal scenario with no interacting drug or genetic variant. In b the effect of loss-of-function variant (LOF) or gain-of-function variant (GOF) is reversed with the presence of a moderate inducer drug or a moderate inhibitor drug respectively and results in a clinical outcome similar to the normal situation (a). In c the presence of a strong inducer drug has temporarily shifted a poor metabolism status into a rapid metabolism status and results in decreased active drug in the systemic circulation. In d the presence of a strong inhibitor drug has temporarily shifted a rapid metabolism status into a poor metabolism status and results in increased active drug in the systemic circulation

Drug–drug-transporters genes interactions (DDTGIs)

Drug transporters govern the movement of pharmaceutical compounds from and into different body tissues. The liver, kidney, blood–brain barrier (BBB), and intestine are the key sites of transporters that influence drug PK. In addition to summarizing the distribution and localization of transporters, Fig. 4 also classifies transporters into three categories according to the similarity of transport directions in different tissue types (the figure has been formulated with the aid of reference [37]). Drug–drug–gene interactions for transporters are less well studied than for metabolizing enzymes. For each subgroup, Drug Transporter-gene interaction (DTGI) studies will be utilized (if no direct DDTGI studies are available) to illustrate each mechanism for potential interaction. Similar to the drug metabolizing enzyme scenarios outlined above, we predict that these interactions may be intensified or reversed, via inhibitory/induction or phenoconversion pathways, with the co-administration of inhibitors or inducers.

Drug transporters as classified into three categories according to the similarity of the transport directions in different tissue types. Numbers from to = order of oral drug movement through different tissue types. Nonoral drug formulations bypass the effect of intestinal transporters. / = increased/decreased substrate drug plasma level is predicted as a result of impairment of this transporter due to LOF variants or inhibitors. The reverse is predicted with GOF variants or inducers. The presence of the two factors (i.e. LOF variant + an inhibitor or GOF variant + an inducer) is predicted to double the clinical impact with neutralizing or shifting the clinical effect when the preparator drug and genetic effect are opposed (phenoconversion interactions). = Apical membrane. = Basolateral membrane

Efflux transporters

Efflux transporters have been classified into two groups (group I and group II) according to the similarity in the transport directions.

Group I

P-glycoprotein 1 (P-gp, ABCB1), multidrug resistance-associated protein 2 (MRP2, ABCC2), and breast cancer resistant protein (BCRP, ABCG2) transporters are expressed in the intestine, liver, kidney, and BBB, sharing similar transport pathways. They efflux substrates back to intestinal lumen, facilitate hepatic and renal excretion (excluding BCRP), and work inversely in the BBB where they protect the brain from the entry of xenobiotics and return them back to systemic circulation. Blocking their function in the intestine, liver, or kidney is expected to elevate a substrate’s systemic exposure (although opposite effects would be predicted if inhibiting transport across the BBB).

In this group, the most evidence for DDTGI comes from drugs altering ABCB1 (P-gp) transport and genetic variants in the gene encoding this transporter. For example, cyclosporine is an ABCB1 substrate. Diltiazem (a moderate ABCB1 inhibitor [38]) has been shown to increase cyclosporin trough concentrations in Chinese patients who carry the TT genotype (low P-gp activity) at rs1045642(C>T) in ABCB1 yet no effect was seen in other ABCB1 genotypes (e.g., CC at rs1045642) [39]. Methadone is also a P-gp substrate, acting in the brain and effluxed across the BBB via P-gp. Patients with the TT genotype at rs1045642 and treated with quetiapine (ABCB1 inhibitor) experienced the lowest increase in methadone plasma levels compared with those with CT or CC genotypes (3% vs 23% vs 33% respectively) [40]. Low methadone plasma levels in this study would be explained by loss of the ABCB1 protective function in the BBB which results in increased intracerebral concentration of this central nervous system (CNS) drug. As a result of a similar DDTGI mechanism, the CNS drug granisetron was associated with increased efficacy in Japanese subjects (Supplementary Table 1) [41].

In some cases, it seems that adding strong inhibitors abolishes the effect of genotype. For example, no additional inhibitory effects were detected in carriers of different genotypes of the rs1045642 (C>T) ABCB1 variant who were either on dabigatran/rivaroxaban-clarithromycin combination or tacrolimus-itraconazole combination (ABCB1 substrates-ABCB1 strong inhibitors [38]) [42, 43].

ABCC2 and ABCG2 would be predicted to follow similar interaction scenarios as ABCB1, yet we were unable to find any studies that report DDTGIs for these transporters.

Group II

Unlike group I transporters, there are no published studies describing DDTGIs for group II transporters. So here we report DGTIs to highlight the potential mechanisms whereby genes and drugs that alter these transporters may influence drug outcomes. MRP1(ABCC1), MRP3 (ABCC3), and MRP4(ABCC4) share the similar transport direction in the kidney and BBB as the Group I transporters. However, in the liver, they are expressed in the basolateral membrane working to pump drugs back into systemic circulation. MRP1, for example, transports the active metabolite of irinotecan (SN-38) out of hepatocytes into the blood contributing to the well-known side effect of irinotecan induced neutropenia [44]. The reduced function variant, rs17501331, in the ABCC1 gene is associated with low incidence of neutropenia the reverse effect was detected with the GOF variant rs6498588 in the same gene [45]. In some cases, increased activity of the MRP1 transporter can be advantageous, as seen with methotrexate hepatotoxicity where carriers of wild-type genotype of ABCC1 rs246240 (A>G) variant are at higher risk for developing methotrexate toxicity compared with carriers of reduced function alleles [46]. Of note, MRP1 is also expressed in the myocardium protecting the heart from the entry of xenobiotics [47]. For example, the reduced transport associated with the rs45511401 (G>T) in ABCC1 increases the chance of developing cardiotoxicity due to intracellular accumulation of doxorubicin [48]. MRP1 and MRP3, in contrast to P-gp, MRP2 and BCRP, are expressed in the basolateral membrane of the intestine effluxing substrates into the portal circulation. As orally administered drugs are first exposed to intestinal transporters, any modification of their role might affect drug concentration in the other tissues (liver, kidney, or BBB). C.1037 C>T and c.1820G>A ABCC3 variants, for example, have low transport activity [49] suggesting their potential to diminish the bioavailability of oral MRP3 substrates irrespective of subsequent alteration in transport into other tissues, or subsequent metabolism.

Uptake Transporters (Group III)

In the liver, kidney, and BBB, all important uptake transporters (organic cation transporters (OCTs)1/2/3, organic anion-transporting polypeptide (OATP) 1B1/1B3/2B1, and multidrug and toxic compound extrusion proteins (MATE) 1/2), follow an identical main route for transporting their substrates: from systemic circulation into different tissues or urine/bile in case of MATEs. Consequently, reducing or increasing these transport capacities would result in increased or reduced systemic drug concentrations respectively. The reverse effects are seen with the uptake transporters expressed in the intestinal apical membrane such as OATPs and OCT1 since the transportation pathway is in the opposite direction.

In some circumstances, altering uptake transporter function can increase ADRs. For example, it has been observed that carriers of two OCT1 (SLC22A1) reduced function alleles who were treated with OCT1 inhibitors were over four times more likely to develop gastrointestinal side effects with metformin (an OCT1 substrate) treatment, which would be attributable to metformin accumulation in the intestinal lumen (assuming apical OCT1 localization) [50]. This finding was supported by a previous study [51]. At the level of renal uptake transporters, other DDTGIs have been reported in which carrying the mutant alleles and the co-administration of inhibitors was linked to increased metformin plasma levels/toxicity or reduced clearance (see Supplementary Table 1) [52, 53]. By contrast, reducing transport in some cases may reduce certain side effects. For instance, cisplatin (a OCT2 (SLC22A2) substrate) is both a nephrotoxic and an ototoxic agent. People carrying the rs316019 (C>A) OCT2 mutation were protected from these adverse reactions as the variant resulted in reduced transport of cisplatin into the kidney and the inner ear (cochlea) (where OCT2 is expressed as well) [54,55,56].

In many situations, the efficacy of a drug relies upon the ability of that drug to access certain tissues. Statins are taken up into the liver by OATP1B1(SLCO1B1) and this is crucial for their lipid lowering effect. Reducing this uptake pathway reduces statin efficacy and raises plasma concentrations, resulting in myopathy and, rarely, rhabdomyolysis. The rs4149056 (T>C) (SLCO1B1*15) variant has been widely studied, and in 23 studies [57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79], this variant has been persistently connected to increased statin plasma exposure, muscle aches, dose reduction, and/or treatment-resistant phenotypes. A number of other DDGIs have been described for the SLCO1B1 transporter. For example, although the increase in pravastatin (SLCO1B1 substrate) AUC after treatment with ritonavir (SLCO1B1 inhibitor) was not statistically significant (21% increase vs pravastatin alone) a large interaction was seen in those carrying the SLCO1B1*15 or *17 haplotypes, with a resulting 113% elevation in pravastatin AUC [80]. Other DDTGIs with the similar mechanism have also been published (see Supplementary Table 1) [81,82,83]. Interestingly, unlike the ritonavir example just outlined, in some situations reduced function variants do not show any significant PK change until after the addition of inhibitors. For example, patients with AG or AA genotypes at rs2289669(G>A) of the MATE1 transporter only had significantly lower metformin (MATE1 substrate) clearance compared with carriers of GG genotype after treatment with ranitidine (a MATE1 inhibitor) [84].

The genetic landscape of a physical interaction

A key question in human genetics and evolutionary biology is how mutations in different genes combine to alter phenotypes. Efforts to systematically map genetic interactions have mostly made use of gene deletions. However, most genetic variation consists of point mutations of diverse and difficult to predict effects. Here, by developing a new sequencing-based protein interaction assay - deepPCA - we quantified the effects of >120,000 pairs of point mutations on the formation of the AP-1 transcription factor complex between the products of the FOS and JUN proto-oncogenes. Genetic interactions are abundant both in cis (within one protein) and trans (between the two molecules) and consist of two classes - interactions driven by thermodynamics that can be predicted using a three-parameter global model, and structural interactions between proximally located residues. These results reveal how physical interactions generate quantitatively predictable genetic interactions.

Keywords: S. cerevisiae computational biology deep mutagenesis epistasis genetic interaction human protein interactions systems biology transcription factors.

Conflict of interest statement

GD, BL No competing interests declared


( A ) Leucine zipper domains (colored) and heptad positions of human…

Figure 1—figure supplement 1.. Quality control of…

Figure 1—figure supplement 1.. Quality control of the PPI scores measured by deepPCA .

Figure 2.. Effects of single mutants.

Figure 2.. Effects of single mutants.

( A ) Heatmap of single mutant PPI scores…

Figure 2—figure supplement 1.. Single mutant effects.

Figure 2—figure supplement 1.. Single mutant effects.

( A ) Distribution of single mutants effects.…

Figure 3.. A thermodynamic model predicts double…

Figure 3.. A thermodynamic model predicts double mutant outcomes.

( A ) Distribution of double…

Figure 3—figure supplement 1.. Correlations between genetic…

Figure 3—figure supplement 1.. Correlations between genetic interaction scores from the three biological replicates of…

Figure 3—figure supplement 2.. Fitting the thermodynamic…

Figure 3—figure supplement 2.. Fitting the thermodynamic model on the trans library data.

Figure 4.. Structural genetic interactions.

Figure 4.. Structural genetic interactions.

( A ) Heatmap ( top ) and distribution (…

Figure 4—figure supplement 1.. Distribution of number…

Figure 4—figure supplement 1.. Distribution of number of significant genetic interactions for Fos or Jun…

Figure 4—figure supplement 2.. Robustness of enrichments…

Figure 4—figure supplement 2.. Robustness of enrichments in position pairs for significant (FDR

Figure 4—figure supplement 3.. Robustness of enrichments…

Figure 4—figure supplement 3.. Robustness of enrichments in structural features for significant (FDR

Figure 5.. Comparison of double mutant mutation…

Figure 5.. Comparison of double mutant mutation outcome and genetic interactions in cis and trans…

Figure 5—figure supplement 1.. PPI scores for…

Figure 5—figure supplement 1.. PPI scores for the cis library.

( A ) Correlation between…

Figure 5—figure supplement 2.. Proportion of cis…

Figure 5—figure supplement 2.. Proportion of cis and trans double mutants classified as strengthening, intermediate…

Figure 5—figure supplement 3.. Fitting the thermodynamic…

Figure 5—figure supplement 3.. Fitting the thermodynamic model on both the trans and cis data.

Figure 5—figure supplement 4.. Significant structural genetic…

Figure 5—figure supplement 4.. Significant structural genetic interactions in cis and trans .

Figure 5—figure supplement 5.. Comparisons of the…

Figure 5—figure supplement 5.. Comparisons of the patterns of structural genetic interactions in cis and…

Figure 5—figure supplement 6.. Comparisons of the…

Figure 5—figure supplement 6.. Comparisons of the pairs of positions enriched in cis and trans…

Figure 5—figure supplement 7.. Robustness of the…

Figure 5—figure supplement 7.. Robustness of the enrichments in structural features when sub-sampling to match…

Figure 5—figure supplement 8.. Robustness of enrichments…

Figure 5—figure supplement 8.. Robustness of enrichments in significant (FDR cis and trans libraries.

Figure 5—figure supplement 9.. Comparisons of the…

Figure 5—figure supplement 9.. Comparisons of the extent of structural genetic interactions in cis and…


In this study, we test the idea that species interactions with positive effects at the individual level increase diversification rates in lineages in which they are present, whereas those with negative impacts decrease diversification rates. We take advantage of the dozens of phylogenetic studies that have examined the relationship between species interactions and diversification rates. Our results suggest that there are general patterns across plants and animals, despite considerable uncertainty in the literature about how species interactions might impact diversification (e.g. Jablonski, 2008 Ricklefs, 2010 Weber et al., 2017 Chomicki et al., 2019 Hembry and Weber, 2020 ). Below we discuss the potential causes of these patterns, and their broader implications.

How do species interactions affect diversification rates?

We find that species interactions with positive individual-level effects generally increase diversification rates and that negative-effect interactions decrease diversification. At some level, these results make intuitive sense. After all, an interaction that has negative fitness effects on individuals might be expected to increase species-level extinctions over longer timescales. Similarly, traits that increase an individual’s fitness might buffer a clade from extinction and thereby promote its long-term persistence and proliferation. Yet, it is less obvious how increases in individual fitness will necessarily increase speciation rates. One potential mechanism is that larger population sizes might lead to species-level range expansion (e.g. Ricklefs, 2010 ), which can then lead to larger clade-level ranges (e.g. spreading to new regions), which can increase diversification rates (e.g. Gómez and Verdú, 2012 Hernández-Hernández and Wiens, 2020 ).

Chomicki et al. ( 2019 ) discussed other mechanisms by which positive interactions (specifically mutualisms) might increase diversification rates, which seem generally relevant here. In addition to increased range size, they also discussed divergent selection, ecological opportunity and reduced extinction. We discuss these in turn below. One example of how divergent selection might increase diversification involves animal pollination in land plants (especially angiosperms), a trait that is significantly related to increased diversification rates across plants (Hernández-Hernández and Wiens, 2020 ). This increase may have occurred through increased reproductive isolation and speciation among plant species, mediated by divergent selection associated with shifts in pollinator species and flower morphology (e.g. Whitehead and Peakall, 2014 ). The effects of different pollinators on speciation might be particularly important in combination with other isolating factors (reviewed in Kay and Sargent, 2009 ). Partner switching in general might be an important driver of diversification in clades with mutualisms (Chomicki et al., 2019 ) and with antagonisms (e.g. parasites).

Ecological opportunity is thought to fuel adaptive radiation and thus rapid diversification (e.g. Yoder et al., 2010 ), potentially through utilisation of new resources and freedom from the constraints of competition and limited resources. Ecological opportunity might underlie both positive impacts of mutualism on diversification, and positive impacts of antagonistic interactions for the focal clade.

Clade-level diversification rates might also be increased by decreasing species-level extinction rates, through increased survival of individuals involved in the mutualism. For example, one mutualistic trait considered here involves extra-floral nectaries (Weber and Agrawal, 2014 ). These nectaries provide nectar for insects that then defend the plants against herbivores, and are associated with higher diversification rates in the plants (Weber and Agrawal, 2014 ). These insect defenders might reduce the extinction risk of these plant species (although not explicitly tested by Weber and Agrawal, 2014 ).

Chomicki et al. ( 2019 ) also described mechanisms by which mutualisms might decrease diversification rates instead, including stabilising selection associated with co-evolution, reduced genetic diversity of symbionts and increased extinction risk associated with reduced niche breadth or high fitness costs of partner loss. These mechanisms might apply to some of our results that were opposite to the predicted patterns, especially the positive-effect interactions that decreased diversification rates in some animal clades.

At some level, our results might seem contrary to the simulation study by Yoder and Nuismer ( 2010 ). They concluded that the impact of co-evolutionary interactions on diversification depends on the type of interaction, with competition and host–parasite antagonism potentially increasing diversification (when there is a cost to matching phenotypes between interactors), and mutualisms restricting diversification. However, in that study, ‘diversification’ referred to variation in a single phenotypic trait among two interacting species. Thus, those results may not generalise to rates of speciation and extinction among dozens to thousands of species over timescales of tens of millions of years (our focus here). We suggest that new theory is needed, building on Yoder and Nuismer ( 2010 ), to address the impact of different types of species interactions on speciation and extinction rates over macroevolutionary timescales.

There is also a body of empirical and theoretical work that suggests that competition drives speciation (e.g. Schluter, 1994 , 2000 Dieckmann and Doebeli, 1999 ), which seems counter to our finding that competition decreases diversification. However, there are also many studies suggesting that competition impedes diversification, especially studies proposing that adaptive radiations occur when there is ecological opportunity associated with competitive release (e.g. Yoder et al., 2010 ). The effects of competition may be scale specific, with intraspecific competition driving ecological speciation within populations over shorter timescales, and interspecific competition decreasing diversification over longer timescales (Hembry et al., 2014 ). Our results, showing the importance of interspecific competition in decreasing diversification rates over macroevolutionary timescales, are consistent with this idea. Importantly, our results do not rule out the possibility that competition can also help drive speciation, especially at shallower timescales.

Overall, the specific mechanisms by which species interactions increase or decrease diversification may depend on the interaction and taxa. Yet, some mechanisms may be general, such as links between fitness, population size, range sizes, extinction and speciation, and the potential for partner-switching to drive diversification.

Potential caveats

We acknowledge several potential caveats regarding our analyses. First, our analyses are merely correlative. Therefore, it is possible that in some cases, an apparent effect of a species interaction on diversification rates was actually caused by some other factor instead. However, it seems unlikely that this explains the overall pattern across these diverse interactions and groups of organisms. Similarly, it is possible that increases in diversification rates might somehow influence the evolution of species interactions in some cases, instead of vice versa (as we postulate here). We know of few plausible mechanisms by which this would happen. For example, any trait (including a species interaction) is more likely to evolve in a more-species rich clade by chance, all else being equal. Yet, most methods for testing trait-diversification associations should not yield a significant result under these circumstances (e.g. if the interaction originates randomly within a large clade and is therefore present in only some species having accelerated diversification rates). Again, these sorts of false positives seem possible in a few cases, but much less so across many studies. Furthermore, our main result is not simply that species interactions influence diversification rates, but rather that positive-effect interactions increase them and negative-effect interactions decrease them. This more complex pattern makes chance associations between interactions and diversification rates seem even less likely to explain our overall results.

Second, our sampling is based on relevant studies available in the literature, not systematic sampling of all clades and/or interactions. This latter type of sampling is simply not possible at this point in time. Nevertheless, the literature did contain many relevant studies across taxa and interaction types (Table 1). We acknowledge that some patterns might change as more studies are published. We suggest that our summary here can still guide future empirical and theoretical studies going forward.

Third, methods for estimating diversification rates and linking them to traits can be controversial (e.g. Morlon, 2014 ). For example, state-dependent speciation-extinction models (like BiSSE) can potentially infer trait-dependent diversification when no dependency is present (e.g. Maddison and FitzJohn, 2015 ). However, if a method routinely inferred trait-dependent diversification when it was absent, this should make it harder to infer that positive-effect interactions increase diversification and negative-effect interactions decrease diversification. Other methods may underestimate variation in diversification rates among clades (e.g. Rabosky, 2014 Meyer and Wiens, 2018 ), making it more difficult to find significant relationships with traits. Overall, it seems unlikely that our results are an artifact of methods for estimating diversification rates, because problems in these methods should make it harder to find significant patterns.

Non-significant results may have been underrepresented in our analysis due to publication bias. However, the direction of effects (i.e. positive vs. negative effects on diversification rates) should be insensitive to this bias. Furthermore, there is little evidence that effect sizes are generally greater in published than unpublished studies (e.g. Koricheva, 2003 Møller et al., 2005 ). Another bias (“research bias” Koricheva et al., 2013 ), may arise if researchers focus on unevenness in richness across a phylogeny. Again, this should not affect the direction of the effects on diversification. Importantly, our study is testing how different interaction types impact diversification, and not how often species interactions affect diversification.

Finally, we acknowledge that our sample size of studies is limited. However, dismissing significant results because of low sample sizes is statistically nonsensical (since lower sample sizes reduce power). Furthermore, some of our single data points include all animals and also all land plants (together encompassing approximately 90% of all described species on Earth Scholl and Wiens, 2016 ). We also included multiple studies (>10) within both of these groups. Future studies will doubtless find some exceptions to these general patterns (as did we), but our results do suggest that a broad overall pattern may exist.

Materials and Methods

Cell lines and cell culture

Parental HCT116 cells (HCT116, P1) were obtained from ATCC. The second parental HCT116 cell line (HCT116, P2) and all isogenic HCT116 cell lines were obtained from Horizon Discovery Ltd. The isogenic cell lines comprised following genotypes: parental HCT116 cell lines (P1 and P2, HCT116 CTNNB1 wt +/mt + KRAS wt +/mt + PI3KCA wt +/mt + ) CTNNB1 wt where the oncogenic mutation of CTNNB1 (β-catenin) was deleted leaving only the respective wild-type allele (HCT116 CTNNB1 wt +/mt − ) KRAS wt where the oncogenic mutation of KRAS was deleted leaving only the respective wild-type allele (HCT116 KRAS wt +/mt − ) PI3KCA wt where the oncogenic mutation of PI3KCA was deleted leaving only the respective wild-type allele (HCT116 PI3KCA wt +/mt − ) PTEN KO (HCT116 PTEN −/− ) AKT1 KO (HCT116 AKT1 −/− ) AKT1 KO and AKT2 KO (AKT1/2 KO, HCT116 AKT1 −/− AKT2 −/−) MAP2K1 KO (MEK1 KO, HCT116 MAP2K1 −/− ), MAP2K2 KO (MEK2 KO, HCT116 MAP2K2 −/− ), TP53 KO (P53 KO, HCT116 TP53 −/− ) and BAX KO (HCT116 BAX −/− ).

All cell lines were authenticated using SNP profiling (Multiplexion). HCT116 cells were propagated in McCoy's 5a modified medium (Life Technologies) supplemented with 10% FBS (Biochrom) and 1% penicillin/streptomycin (P/S) at 37°C and 5% CO2. Sub-cultivation was performed every 4 days at a ratio of 1:10 – 1:20. DLD-1 cells were obtained from ATCC and propagated in Dulbecco's modified Eagle medium (DMEM) (Life Technologies) supplemented with 10% FBS (Biochrom) and 1% P/S at 37°C and 5% CO2. Sub-cultivation was performed every 4 days at a ratio of 1:10 – 1:20.

Compound treatment

Prior to screening, we prepared serial dilutions of the LOPAC compound library (Sigma) in RPMI medium (Life Technologies) to provide a final stock concentration of 50 μM. Taxol/paclitaxel, vinblastine, and U0126, as well as DMSO (all from Sigma) were included as additional spike-in controls present on all plates. A list of all compounds included in this library is provided with the R/Bioconductor package PGPC and Table EV1. We seeded 1,250 cells in 45 μl McCoy's medium into each well of 384-well clear-bottom microscopy plates (BD Biosciences) and incubated for 1 day at 37°C. 5 μl of compound solution was added using a Beckman Biomek FX robot with 384-well tip head to yield a final concentration of 5 μM and 0.1% DMSO. Cells were cultured for 2 days at 37°C before analysis. For screening, a single drug concentration of 5 μM was used.

Cell staining and imaging

Cell staining was performed using a Biomek FX robot with a 384-well tip head. Cells were fixed and permeabilized with 5% paraformaldehyde (Sigma) and 0.2% Triton X-100 (Sigma) for

60 min at room temperature. Nuclei and actin were stained with 2 μg/ml Hoechst 33342 (Invitrogen) and 75 ng/ml phalloidin labeled with tetramethylrhodamine isothiocyanate (Sigma) for

60 min at room temperature. Cells were washed four times with PBS (Invitrogen), and 0.05% sodium azide (Sigma) was added for storage. Plates were sealed with aluminum seals (Corning) and stored until imaged at 4°C while protected from light. Fluorescence images were acquired with an InCell Analyzer 2000 (GE Healthcare) at 10× magnification. Each well was fully covered by four images in each of the two color channels, resulting in

Image processing and feature extraction

Images were obtained as 16-bit TIFF images with a size of 2,048 pixels × 2,048 pixels. We adapted intensity correction, image segmentation, and feature extraction methods from previous studies, based on the R package EBImage (Pau et al, 2010 ). To remove biases due to lower illumination intensity at the image border, 150 pixels were cropped on each side. Nuclei were segmented by adaptive thresholding of the Hoechst channel images with a window size of 10 by 10 pixels. The number of segmented nuclei was used as a proxy for cell count. Using the segmented nuclei as seeds, a cell segmentation mask was generated by extending the nuclei segmentation into a threshold mask of the actin channel using a Voronoi-based propagation algorithm. Parameter and method settings are documented in the PGPC vignette. Briefly, the detected nuclei were used as seeds and expanded into masks of the cytoplasm for each cell. Morphological and texture features were extracted from the images using the segmentation masks. In total, we extracted, for each well, 385 quantitative phenotypic features (Table EV3). The data were transformed using a generalized logarithm transformation (Huber et al, 2002 ).

Selection of non-redundant features

To select informative, non-redundant features, a stepwise dimensionality reduction and selection algorithm was employed (Laufer et al, 2013 ). Starting with cell number as an initial feature, this iterative approach fits each feature by a linear model using the selected features as predictors. The correlations between the model residuals of each replicate were used as a surrogate for the novel information that the feature contains. The feature with the highest correlation of the model residuals is selected next. This process continues until the percentage of positive model residual correlations over all features is smaller than 50%.

The final set of 20 phenotypic features was grouped into five categories. The category “DNA texture/intensity” includes intensity- and texture-related features computed from the Hoechst staining image, such as Haralick texture features. The “nuclear shape” group includes size- and shape-related features computed from the Hoechst channel, including eccentricity and nuclear radius. The “cell shape” group includes size- and shape-related features and the “actin texture/intensity” group includes intensity- and texture-related features extracted from the actin channel. The 20 phenotypic features were visualized by radar charts, which we termed phenoprints. Here, the radial distance is proportional to the variable shown. Using cell number as an example, higher distance from the origin corresponds to higher cell number.

Quantification of chemical–genetic interactions

The data of each feature were modeled using a multiplicative model as previously described (Laufer et al, 2013 ) and robust L1 regression to estimate the effects of the cell line and compound treatment using the medpolish function of the statistics package R ( In this iterative approach row and column medians are subtracted alternately until the change in S, the sum of absolute residuals, divided by S, falls below the defined threshold of 0.0001. The final row and column values describe the compound and cell line effect, respectively. The residuals, either having a positive or a negative value, represent the interaction coefficients. This process was performed for each replicate and each feature individually. To account for the different proliferation rates of isogenic cell lines, the cell number values on the generalized logarithmic scale were normalized using the range defined by the median of the negative control values (1) and values of the compound taxol (0) for each cell line. Values below 0 and above 1 are possible. To detect significant interactions, the values of replicates were used to perform a moderated t-test against the null hypothesis μ = 0 using the implementation of the lmFit and eBayes functions of the limma R package (Smyth, 2004 ) on the interaction matrix of each feature. P-values were adjusted for multiple testing by controlling for the false discovery rate (FDR) using the method of Benjamini and Hochberg ( 1995 ) as previously described for the quantification of gene–gene interactions (Laufer et al, 2013 ). Significant interactions were selected by using a cutoff of 0.01 (FDR) on the adjusted P-values.

To predict compound mode of action, we performed hierarchical clustering with the complete linkage rule (Fig 5A). As measure of dissimilarity, we used 1 − cor(x, y), where x and y are the interaction profiles for two compounds and cor is the Pearson correlation coefficient.

Phenotypic chemical–genetic interaction map

We used Cytoscape version 2.8. (Shannon et al, 2003 ) to plot the phenotypic chemical–genetic interaction map of a filtered dataset. Briefly, we removed controls and considered only those compounds that had interactions with a maximum of three out of twelve genetic backgrounds tested. We further only considered compounds that affected more than one phenotypic feature. Due to these filtering steps, five of the twelve genetic backgrounds tested (both parental HCT116 lines, BAX, AKT1, and MEK2 KO cells) are not included in the map. A data file to produce the Cytoscape map is included in the R/Bioconductor package PGPC.

Resolution index (∆AUC)

To quantify the gain of information using the high-content phenotypic chemo-genomic approach over a high-content phenotypic (single cell line) or pharmacogenetic (just cell number in all cell lines) approach, we computed the resolution index ∆AUC as follows. First, the correlation of compound profiles was calculated using all features and all cell lines (genotypes and multiparametric phenotypes), all 20 selected phenotypic features of the parental HCT116 cell line P1 (multiparametric phenotypes), and just the cell number feature of all cell lines (genotypes). Second, the annotated target selectivity (Table EV1) was used to classify compound pairs into the “shared selectivity” or “no shared selectivity” class depending on whether compounds share the same target based on annotation. Chemical similarity was used to classify compound pairs into the “similar structure” or “different structure” class. For this, we used the distance matrix calculated from the compound sdf files using the ChemmineR package (Cao et al, 2008 ). Compounds were classified as “similar structure” if their structural distance as defined by the Tanimoto distance calculated by ChemmineR was below 0.6. For both approaches, the empirical cumulative distribution function (ECDF) of the compound profile correlations between compound pairs was calculated separately for each of the two classes. The resolution index is defined as the difference of the area under the curve (∆AUC) between the two classes for each approach.

Analysis of drug combination data

We measured the impact on cell viability of pairwise compound combinations using fixed-dose ratio concentration kinetics and employed the CellTiterGlo assay (Promega) to determine cell proliferation and viability independently from cell number. Compounds were combined at a concentration of 20 mM each in a pairwise fashion and diluted in a 1:2 series to cover 10 concentrations. MK2206 was obtained from Santa Cruz Biotechnology. AKTi VIII was obtained from VWR International. Bendamustine, disulfiram, U0126, and PD98,059 were obtained from Sigma. Compounds were spotted in 384-well plates (Greiner) and were then diluted in RPMI using a Biomek FX robot. Briefly, HCT116 and DLD-1 cells were seeded at a concentration of 1,000 cells in 45 μl McCoy's medium into each well of 384-well plate (Greiner) and incubated for 1 day at 37°C. 5 μl of compounds was then added to cells as described before to cover a concentration range of 10–0.0195 μM. Following compound administration, cells were incubated for 3 days at 37°C and cell viability was measured via the CellTiterGlo assay (Promega) using a Mithras LB940 plate reader (Berthold Technologies). Data were analyzed using cellHTS2 (Pelz et al, 2010 ).

As compound effects E, we used 1 − NPI, which is the normalized percentage inhibition obtained from the CellTiterGlo assay data by subtracting the value of each measurement from the average of the intensities on the plate positive controls and dividing the result by the difference between the means of the measurements on the positive and the negative controls on the plate. The raw plate reader values were logarithm-transformed before these calculations.

We quantified the unexpectedness of the effect of a compound pair by a non-interacting model (Bliss independence, BI) as previously used for large-scale compound synergism screens (Tan et al, 2012 ). In the BI model, the combined effect of the compound combination A and B is given by EAB = EA + EB − EA:B, where EA and EB are the single compound effects at the same dose as in the combination and EA:B is the interaction term, which is zero for non-interacting compounds. For each concentration, we used at least 10 measurements of EAB and 20 measurements each of EA and EB. We estimated the interaction effect EA:B by inserting the means of the measurements and solving the equation for EA:B. To test it against the null hypothesis EA:B = 0, we employed Student's t-test.

Cell-based proteasome activity assay

Compounds to be tested were spotted into 384-well plates (Greiner) at a concentration of 50 μM. ZPCK, disulfiram, CAPE, tyrphostin AG555, AG1478, and DAPH were obtained from Sigma. Bortezomib was obtained from NEB and MG132 was obtained from Merck Bioscience. HCT116 cells were seeded at a concentration of 3,000 cells in 45 μl McCoy's medium into each well of 384-well plate (Greiner) and incubated for 1 day at 37°C. Following compound administration at a final concentration of 5 μM and 0.1% DMSO, cells were incubated for 24 h at 37°C and the chymotrypsin-like, trypsin-like, and caspase-like proteasome activities were measured using the Proteasome-Glo ™ Cell-Based Assay Kit according to the manufacturer's instructions (Promega) using a Mithras LB940 plate reader (Berthold Technologies). To account for compound effects on cell proliferation, cell viability was measured via the CellTiterGlo assay (Promega). The data are normalized to the viability control CTG-assay wells on each plate (CTG was set to 1 on each plate). Based upon values corrected for cell viability, we calculated proteasome activity compared with the DMSO controls of the corresponding wells on each plate. The proteasome activity for DMSO was set to 1 for each assay. The inhibition was calculated relative to this value. Proteasome inhibition is then defined by 100*(1 − (PT/PC)/(VT/VC)), where PT is the respective proteasome activity for each drug treatment, PC is the respective proteasome activity for control (DMSO) treatment, VT is the respective cell viability for each drug treatment, and CV is the cell viability for control (DMSO) treatment. Consequently, DMSO control determines 0% normalized proteasome inhibition. We performed a t-test comparing values for the compounds against the null hypothesis of zero effect.

Western blotting

HCT116 cells were seeded at a concentration of one million cells in 2 ml McCoy's medium in each well of a 6-well plate. The next day, compounds were added in 2 ml fresh medium at a final concentration of 5 μM and 0.1% DMSO and cells were incubated for 24 h. ZPCK, disulfiram, CAPE, tyrphostin AG555, AG1478, and DAPH were obtained from Sigma. Bortezomib was obtained from NEB and MG132 was obtained from Merck Bioscience. Cells were harvested in lysis buffer and prepared for Western blotting as previously described (Kranz & Boutros, 2014 ). Protein concentration was measured using the BCA protein assay kit (Pierce, Thermo Scientific). Twenty microgram samples were supplemented with 5× Laemmli buffer and heated for 5 min at 96°C. Cell lysates were separated on 4–12% NuPAGE Bis/TRIS gels (Life Technologies) and transferred to Immobilon PVDF membranes (Millipore, Merck Biosciences). Antibodies used were anti-ubiquitin (clone P4D1, Cell Signaling 1:1,000), anti-β-actin (Abcam 1:20,000), and HRP-conjugated anti-mouse IgG2b (Southern Biotechnology 1:10,000).

Data availability

Complementary views on the data are available through the following avenues. The image data files are available from the BioStudies database at the European Bioinformatics Institute (EMBL-EBI) under the accession S-BSMS-PGPC1 ( An interactive front-end for exploration of the images is provided by the IDR database ( On, the package PGPC provides an executable document with the code that was used for the analysis reported in the paper, as well as intermediate data types, such as the numeric features (, see Code EV1). The authors are hosting an interactive webpage to browse images and interaction profiles at

We would like to thank all members of the Translational Cancer Genomics team who provided useful insight, including Matthew Coelho for providing careful reading of the manuscript. We are grateful for the diligent contributions of Jon Winter-Holt in terms of compound identification and data clearance, Pedro Beltrao for helpful discussions and Danilo Horta for support integrating the Limix package. Work in M.J.G laboratory was funded by Wellcome (206194) and AstraZeneca.

Conceptualisation: EG and MJG Formal analysis: EG Data curation: EG, CP, DvdM, ABa, HL, JTL, BS, CC, FI, SF and MJG Drug response acquisition and processing: DvdM, TM, ABa, LR, WY, EL, JH, CT, CH, IM, FT, JM, ABe and HL Drug annotation: EG, AS-C, GP, FMB, PJ, EAC, ARL, CC and MJG Writing original draft preparation: EG and MJG Writing, reviewing and editing: all authors Visualisation: EG Supervision: ABa, AL, JTL, BS, CC, FI, SF and MJG Funding acquisition: SF and MJG.

Pharmacokinetics I

DDIs can be prominent for drugs that are metabolized by single mechanisms. For example, caffeine is nearly completely (97%) metabolized by the cytochrome P450 enzyme CYP1A2 to the extent that it is used as a phsyiological probe of CYP1A2 function. High concentrations of caffeine in the bloodstream can monopolize this enzyme leading to interactions with other drugs that require it to be cleared. This co-interaction at the CYP1A2 enzyme with other drugs can lead to toxic drug–drug interactions with a number of drugs including some selective serotonin reuptake inhibitors such as fluvoxamine used in the treatment of obsessive–compulsive disorder and severe depression, antiarrhythmics such as mexiletine, the antipsychotic clozapine, the bronchodilators furafylline and theophylline and the quinolone antibiotic enoxacin [13] .


Cell culture, pathogens, and toxins

TZM-bl cells were obtained from the NIH AIDS Research and Reference Reagent Program (Germantown, MD). HepG2, Hep3B, L, MDCK, Sup-T1, and Vero E6 cells were obtained from the American Type Culture Collection (ATCC Manassas, VA). Cowpox virus (Brighton strain), human rhinovirus 2 (HGP strain), human rhinovirus type 16 (11757 strain), and poliovirus (Chat strain) were obtained from the ATCC. Herpes simplex virus type 1 (KA Strain) was kindly provided by Dr. David Knipe (Harvard University). Herpes simplex virus type 2 (186 strain) was a gift from Dr. Patricia Spear (Northwestern University). Reovirus type 1 (Lang strain) was obtained from Bernard N. Fields. Ebola virus (Zaire species, 1976 Mayinga strain) and Marburg virus (1967 Voege strain) were studied in a BSL4 containment facility at the Centers for Disease Control in Atlanta, GA. The U3neoSV1 retrovirus shuttle vector [73] was obtained from H. Earl Ruley (Vanderbilt University) and was used as an insertional mutagen to prepare gene-trap libraries with parental, virus-sensitive cells, as described [18,74–76].

Production of clonal gene-trap library cell lines resistant to lytic viral infection or toxin exposure

Methods describing the preparation of clonal gene-trap library cell lines resisting lytic infection using RIE-1 cells (reovirus), Sup-T1 (HIV-1), TZM-bl cells (human rhinovirus 2 and 16), and Vero E6 cells (cowpox, Ebola, Herpes simplex virus 1 and 2, Marburg, and poliovirus) were described previously [18,75–79]. Briefly, gene-trap libraries, each harboring approximately 10 4 gene entrapment events, were expanded to 80–90% confluency until

10 3 daughter cells represented each clone. The indicated cell lines were infected with a low MOI (range = 0.0002–0.01), and infection proceeded until > 90% cytopathic effects were observed (3–7 days). The medium was changed every 2–3 days until surviving clones were visible, which were generally observed after 2–3 weeks in culture. Surviving clones were expanded in duplicate wells of separate 24-well plates, and resistance was confirmed in clones by re-infecting 1 of the duplicate wells at a 10-fold higher MOI than the original cell populations were exposed to. Resistant clones showing > 70% survival following re-infection were selected for expansion to identify trapped genes, using cells growing in the uninfected wells of 24-well plates.

Gene-trap library cells resisting cytolytic toxin exposure were prepared as follows. Clostridium difficile-TcdB toxin experiments were performed by first plating Caco-2 cells in 75 cm 2 flasks and incubated with U3neoSV1 (multiplicity of infection, MOI = 0.1) at 37°C for 1 h in the presence of 4 μg/mL polybrene (Sigma), a cationic polymer used to increase the infection efficiency [80]. Next, gene-trapped Caco-2 cells were plated in 10-cm dishes and challenged with 15 nM native TcdB toxin for 4 h at 37°C, after which the medium was exchanged and the cells were left to recover for 96 h [22]. Clostridium perfringens ε toxin and Staphylococcus aureus α toxin experiments were performed after plating MDCK cells transduced with the gene-trap vector in nine 100-mm dishes (approximately 3.3 × 10 6 cells per dish), in Leibovitz's L-15 medium. Clostridium perfringens ε toxin was added to a final concentration of 20 nM, and the treated cells were incubated at 37°C for 16 hours [74]. AZ-521 cells were infected for 1 h with Helicobacter pylori vacuolating toxin at an MOI of 0.1 in the presence of 4 μg/ml of polybrene. THP-1 cells were infected with Francisella tularensis at three different MOI values. Native ricin holotoxin was obtained commercially or purified from extracts of developing Ricinus communis seeds by standard procedures using a column of propionic acid-treated Sepharose 6LB, followed by specific elution of the cytotoxic lectin with 50 mM N-acetylgalactosamine. Recombinant ricin A chain variants (e.g., carrying C-terminal sulfation sites and glycosylation sequins) were prepared by expressing the ricin A chain cDNA in Escherichia coli. After incubating each cell type with indicated toxin or bacterium, resistant clones were expanded in separate wells of multi-well plates. The detailed protocols were described previously [18,75–79].

Rescue and sequencing the U3neoSV1 shuttle vector from resistant clones

Genomic DNA from clonal, virus-resistant cell lines was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Inc., Valencia, CA). Shuttle vectors and genomic DNA fragments flanking the U3neoSV1 integration site were recovered by digesting genomic DNA with either BamH1 or EcoRI, self-ligating the resulting genomic DNA fragments, transforming Escherichia coli, and selecting for bacteria harboring carbenicillin-resistant plasmids, as described [75]. DNA sequences flanking the U3neoSV1 integration sites were sequenced using primers annealing to the U3neoSV1 shuttle vector.

Sequence analysis

Genomic sequences obtained from shuttle clones were analyzed by the RepeatMasker (, followed by nucleotide-nucleotide BLAST searches against the National Center for Biotechnology Information (NCBI) database ( Virtually all genes that we identified matched murine and human sequences with probability scores (P) of <10 −10 and <10 −20 , respectively. Detailed descriptions of this process were provided previously [18,22,74,80].

Construction of a high-quality human protein interactome

We downloaded protein-protein interaction data from various publications and bioinformatics databases. Because the current publicly available human protein interaction databases are still incomplete, we constructed 5 different yet complementary human PINs: (i) a large-scale physical PIN, (ii) a three-dimensional structural PIN, (iii) a kinase-substrate interaction network (KSIN), (iv) a comprehensive innate immunity PIN, and (v) a large-scale computationally predicted PIN, based on our previous studies [25,26]. We implemented 2 data cleaning steps. First, we defined high-quality interactions as those that have been experimentally validated in human models through a well-defined experimental protocol. Interactions that did not satisfy this criterion were discarded. Second, we annotated all protein-coding genes using gene Entrez ID, chromosome location, and the official gene symbols from NCBI database (, as described in detail previously [25,26].

Construction of the drug-gene interactome

Drug-gene interactions (DGI) were acquired from the DrugBank database (v3.0) [81], the Therapeutic Target Database (TTD, v4.3.02) [55], and the PharmGKB database (December 30, 2014) [56]. Drugs were grouped using ATC classification system codes and annotated using Medical Subject Headings (MeSH) and Unified Medical Language System (UMLS) vocabularies (November 1, 2014) [82]. All genes were mapped and annotated using the gene Entrez ID and official gene symbols found in the NCBI database. All duplicated DGI pairs were removed. In total, we obtained 17,490 DGI pairs connecting 4,059 FDA approved or investigational drugs and 2,746 gene products.

Categories of different disease gene sets

Cancer driver genes.

A set of 384 genes that are significantly mutated in cancer was selected from several large-scale cancer genomic analysis projects [83–86].

Other cancer genes.

Additional cancer genes were selected for bioinformatics analysis from the following resources. First, 560 experimentally validated cancer genes were downloaded on December 18, 2015 from the Cancer Gene Census [87] and denoted as CGC genes. We also collected 4,050 cancer genes assembled in a previous study [25], referred to here as the comprehensive catalogue of cancer genes, CCG set. Together, these resources provide overlapping and complementary candidate cancer genes.

Mendelian disease genes (MDGs).

A set of 2,714 MDGs was downloaded from the Online Mendelian Inheritance in Man (OMIM) database [88] in December 2012. The OMIM database contained 4,132 gene-disease association pairs connecting 2,716 disease genes in 3,294 Mendelian diseases or disorders (December 2012).

Orphan disease-causing mutant genes (ODMGs).

We collected 2,123 ODMGs from a previous study [89]. The United States Rare Disease Act of 2002 defines a disease as an orphan disease that affects fewer than 200,000 individuals in the United States, the equivalent of approximately 6.5 people per 10,000 [90].

Essential genes.

Essential genes (2,719) were compiled from the OGEE database [24].

Cell cycle genes.

Human host cell cycle genes (986 genes) regulating G0/1, S, and G2 phase transitions were collected from a previous study identified by a genome-wide RNAi screening [35].

Innate immune genes.

Human innate immunity genes (971) playing a critical role in the innate immune response were collected from InnateDB [27].

Computing selective pressure and evolutionary rates

We calculated dN/dS ratios [91] to examine selective pressures on genes. Initially, human-mouse orthologous genes were used to compute dN and dS substitution rates using human-mouse sequence data for 16,854 genes available in the Ensemble BioMart database ( In addition, evolutionary rate ratios were determined, as described in a previous study [92]. Details of data and analyses were provided in our previous publication [25].

Inferring protein evolutionary origins

The evolutionary origin of a protein refers to the approximate date that the protein originated and can be inferred from phylogenetic analysis. We used the protein origin data from ProteinHistorian [93]. Specially, the origin (age) of a protein was estimated by considering 3 factors: the species tree, the protein family database, and the ancestral family reconstruction algorithm. Furthermore, evolutionary distances were calculated by comparing human sequences with orthologous sequences from other animals, as described [92].

Computational identification of new antiviral indications for existing drugs

We collected drug-gene signatures from the Connectivity Map (CMap, build 02) [20]. The CMap is comprised of over 7,000 gene expression profiles from human cultured cell lines treated with various small bioactive molecules (1,309 total) at different concentrations, covering 6,100 individual instances. The CMap thus provides a measure of the extent of differential expression for a given probe set. The amplitude (a) was defined as follows: where t is the scaled and thresholded average difference value for the drug treatment group and c is the thresholded average difference value for the control group. Thus, a = 0 indicates no differential expression, a > 0 indicates increased expression (up-regulation) upon treatment, and a < 0 indicates decreased expression (down-regulation) upon treatment. For example, an amplitude of 0.67 represents a two-fold induction. Drug gene signatures with amplitudes of > 0.67 were defined as up-regulated drug-gene pairs, and amplitudes <—0.67 reflected down-regulated drug-gene pairs. To build a complete virus-host interactome, we combined the 712 host genes identified in our gene-trap study with the 2,449 host genes that were extracted from the literature based on experiments on 54 viruses using RNAi. Detailed data information was provided in S1 Table. After removing the duplicated data, we obtained

2,600 host genes, which were then used to build the global virus-host interactome. We then mapped probe sets into the global virus-host interactome. In total, we compiled

500,000 drug-gene pairs from the CMap connecting 1,309 drugs and 2,600 virus target genes.

For each drug-virus pair, we counted the number of host genes targeted by a given virus, those that are up- or down-regulated by drug treatments, as well as overlapping or mutually exclusive pairs (Fig 1E). Next, we calculated P values by Fisher’s exact test-corrected P values using Bonferroni’s multiple comparison test in R package for each drug-virus pair. We then used q < 0.1 as a cutoff to identify significant drug-virus pairs for antiviral drug repositioning.

Network topology measurements

Network theory proposes that there are 2 important components of networks, namely nodes and edges. We studied virus-host bipartite networks, wherein nodes represented viruses and host cellular genes, and edges denoted interactions found by gene-trap. For PIN studies, nodes were comprised of proteins and edges were based on known physical interactions, protein structure evidence, and phosphorylation. We calculated connectivity (degree) values using Cytoscape v3.0.1. Hubs were defined as nodes ranked in the top 20% in the connectivity distribution, based on two previous studies [25,26].