Altogether, our work provides a strong foundation for future mechanistic experimental and computational studies of transcriptional adaptation. For Perturb-seq data-derived gene expression distribution analyses, we chose to focus on human transcription factor genes, as defined by the most recent version of AnimalTFDB3, last accessed July 28, 2023 [120]. We searched for overlapping regulons between a knockout target and the paralog gene of interest in DoRothEA, only considering downstream genes with annotation confidence level A, B, or C (out of a possible range of A-E, see original source for evidence level descriptions) [63, 64]. There were 55 annotated regulon genes for the three target-paralog pairs with transcriptional adaptation versus 439 annotated regulon genes for all target-paralog pairs without transcriptional adaptation. We analyze existing bulk and single-cell transcriptomic datasets to uncover the prevalence of transcriptional adaptation in mammalian systems across diverse contexts and cell types.
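The regulon-overlap search described above can be sketched as a set intersection over confidence-filtered regulator-target edges. This is a minimal illustration, not the authors' pipeline: the `RegulonEdge` record, the function names, and the toy gene names (`TF1`, `TF1_PARALOG`, `GENE_X`, ...) are all hypothetical, and a real analysis would load the edges from the DoRothEA resource rather than from a hand-written list.

```python
from typing import FrozenSet, NamedTuple, Sequence, Set


class RegulonEdge(NamedTuple):
    """One regulator-target annotation (hypothetical in-memory layout)."""
    tf: str          # transcription factor (regulator)
    confidence: str  # evidence level, "A" (highest) to "E" (lowest)
    target: str      # downstream regulated gene


def regulon(edges: Sequence[RegulonEdge], tf: str,
            allowed: FrozenSet[str] = frozenset("ABC")) -> Set[str]:
    """Targets of `tf` whose annotation confidence is in `allowed`."""
    return {e.target for e in edges if e.tf == tf and e.confidence in allowed}


def shared_regulon(edges: Sequence[RegulonEdge], knockout_target: str,
                   paralog: str,
                   allowed: FrozenSet[str] = frozenset("ABC")) -> Set[str]:
    """Targets annotated for both the knockout target and its paralog."""
    return regulon(edges, knockout_target, allowed) & regulon(edges, paralog, allowed)


# Toy edges with placeholder names (not real annotations).
EDGES = [
    RegulonEdge("TF1", "A", "GENE_X"),
    RegulonEdge("TF1", "C", "GENE_Y"),
    RegulonEdge("TF1", "E", "GENE_Z"),          # dropped: confidence E
    RegulonEdge("TF1_PARALOG", "B", "GENE_X"),
    RegulonEdge("TF1_PARALOG", "D", "GENE_Y"),  # dropped: confidence D
    RegulonEdge("TF1_PARALOG", "A", "GENE_Z"),
]

shared = shared_regulon(EDGES, "TF1", "TF1_PARALOG")
```

In this toy input only `GENE_X` is annotated at level A, B, or C for both regulators, so it is the only gene retained in the overlap.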
Gene expression distribution robustness to mutation is dependent on model parameters
We were particularly interested in a robust method for identifying whether a distribution was unimodal and symmetric, suggesting a degree of homogeneity in expression. For distributions that were bimodal (or multimodal), one could imagine different emergent properties in a population of cells, e.g., with bistability or other kinds of functional diversity. For distributions that were unimodal but not symmetric, i.e., skewed, one could imagine a bias toward low-frequency diversity in behavior, either being very high expressors or very low expressors. Lastly, we also needed to identify when expression levels were very low in general, reflecting overall minimal transcriptional activity. We wanted to identify differentially expressed genes across the dozens of knockout samples we reanalyzed. When available, we used author-provided gene expression change calculations based on DESeq2 (for results from [58]).
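To make the four categories above concrete, the following sketch assigns a label using moment-based summary statistics. It is an assumed heuristic for illustration only and should not be read as the classification algorithm used in this work: the thresholds (`low_mean`, `skew_cut`, `bc_cut`), the function names, and the choice of Sarle's bimodality coefficient (computed here from population moments) are all assumptions.

```python
from typing import Dict, Sequence


def shape_stats(values: Sequence[float]) -> Dict[str, float]:
    """Mean, variance, skewness, kurtosis, and bimodality coefficient."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(values) / n
    # Central moments (population form, dividing by n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        # All values identical: no spread, so no skew or bimodality.
        return {"mean": mean, "var": 0.0, "skew": 0.0, "kurt": 0.0, "bc": 0.0}
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2              # non-excess kurtosis (normal = 3)
    bc = (skew ** 2 + 1) / kurt      # Sarle's coefficient, population form
    return {"mean": mean, "var": m2, "skew": skew, "kurt": kurt, "bc": bc}


def classify_shape(values: Sequence[float], low_mean: float = 1.0,
                   skew_cut: float = 1.0, bc_cut: float = 5 / 9) -> str:
    """Heuristic label; all thresholds are illustrative assumptions."""
    s = shape_stats(values)
    if s["mean"] < low_mean:
        return "low expression"
    if abs(s["skew"]) >= skew_cut:
        return "unimodal skewed"
    if s["bc"] > bc_cut:
        return "bimodal"
    return "unimodal symmetric"


label_bimodal = classify_shape([0] * 50 + [20] * 50)
label_symmetric = classify_shape([8, 9, 9, 10, 10, 10, 10, 11, 11, 12])
label_skewed = classify_shape([1] * 9 + [11])
label_low = classify_shape([0, 0, 0, 1, 0])
```

One known weakness of this ordering is that a strongly asymmetric two-mode mixture would be labeled skewed rather than bimodal, so a peak-counting step on a smoothed histogram may be a useful complement.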
Checking for genomic regulatory feature association with paralog upregulation
This behavior is not restricted to genes in specific pathways or encoding products with specific molecular functions. Transcription factors that show evidence of transcriptional adaptation have downstream regulons that are more robust to the transcription factor’s mutation compared to regulons of transcription factors without mutation-induced paralog upregulation. Lastly, simulations of a gene regulatory network with transcriptional adaptation produce a variety of expression distributions of downstream targets upon compensation, recapitulating observed diverse regulon expression changes after transcription factor mutation.
Gene expression distribution shape varies widely across the parameter search space
Localization of UPF proteins to compensating loci dependent on nonsense-mutated RNA in [79], tubulin family upregulation after Tubb4a mutation in [36], and others, such as the knockdown-knockout discrepancies reviewed in [2], could contribute to the field of transcriptional adaptation. A common consensus may facilitate faster discovery and reconciliation of paradoxical findings across contexts moving forward. We present several analyses centered on the question of when an expression distribution can remain robust to the mutation of an upstream regulator. Therefore, we built an algorithm for classifying distribution shapes to reflect plausibly important differences.
Gene expression distribution shape depends on model parameters
However, precisely how compensated expression fluctuations manifest into downstream effects has not been formally investigated. For example, what is the ensemble of gene expression distributions of effector molecules post-compensation? Under what conditions can we expect the system to exhibit robustness of the distribution shape and mean? In a similar vein, does the answer depend on the nature of interactions or network size (negative or positive; one or multiple paralogs)? In summary, our integrative analyses highlight the genome-wide prevalence of and gene regulatory constraints on transcriptional adaptation in mammalian cells. We show that upregulation of paralogs after reference gene mutation is common, but not necessarily ubiquitous, across cell types and contexts.
For example, one study used single-molecule approaches to study the effect of nonsense-mediated decay in U2OS cells with and without nonsense immunoglobulin-μ genes. They showed that UPF1 depletion increased the speed of transcriptional elongation in the wild-type but not in the nonsense immunoglobulin-μ gene [79]. Furthermore, regulatory network mappings at a single-cell level could also help explain incomplete phenotypic penetrance reported in association with transcriptional adaptation. Another set of questions centers on whether gene length, number of introns and exons, chromosomal location, and chromatin landscape play a role in which gene families exhibit nonsense-induced transcriptional compensation.
We then included genes only if they were expressed at a level of 10 raw counts or higher across all samples. We chose to classify paralogs as upregulated if DESeq2 reported an adjusted p-value ≤ 0.05 and a log2 fold-change ≥ 0.5. In supplementary analyses, we also show results when paralogs are classified as upregulated using either (1) only the adjusted p-value ≤ 0.05 filter or (2) adjusted p-value ≤ 0.05, log2 fold-change ≥ 0.5, and basemean ≥ 10 filters. The final analysis included all knockout target genes with any significant paralog differential expression, up or down, irrespective of log2 fold-change.
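The thresholds above translate directly into a filter over per-gene differential expression results. The sketch below assumes a hypothetical `DEResult` record holding the three DESeq2 output fields named in the text (base mean, log2 fold-change, adjusted p-value); the record layout, function names, and toy values are illustrative and are not taken from the analysis code. It covers the main rule and supplementary variant (2); missing adjusted p-values, which DESeq2 reports as NA for some genes, are treated as not significant.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DEResult:
    """One gene's differential expression result (hypothetical layout)."""
    gene: str
    base_mean: float
    log2_fold_change: float
    padj: Optional[float]  # None stands in for an NA adjusted p-value


def is_significant(r: DEResult, padj_max: float = 0.05) -> bool:
    """Adjusted p-value present and at or below the cutoff."""
    return r.padj is not None and not math.isnan(r.padj) and r.padj <= padj_max


def is_upregulated(r: DEResult, padj_max: float = 0.05, lfc_min: float = 0.5,
                   base_mean_min: Optional[float] = None) -> bool:
    """Main rule: padj <= 0.05 and log2FC >= 0.5.

    Passing base_mean_min=10 adds the base mean filter of variant (2).
    """
    if not is_significant(r, padj_max):
        return False
    if r.log2_fold_change < lfc_min:
        return False
    if base_mean_min is not None and r.base_mean < base_mean_min:
        return False
    return True


def has_significant_paralog(paralogs: Iterable[DEResult],
                            padj_max: float = 0.05) -> bool:
    """True if any paralog changes significantly, up or down."""
    return any(is_significant(r, padj_max) for r in paralogs)


# Toy values with placeholder gene names (not real results).
paralogs = [
    DEResult("PARALOG_1", 250.0, 0.8, 0.01),
    DEResult("PARALOG_2", 4.0, 1.2, 0.03),    # fails the base mean filter
    DEResult("PARALOG_3", 90.0, -0.7, 0.02),  # significant but downregulated
    DEResult("PARALOG_4", 120.0, 0.9, None),  # no adjusted p-value
]

up_main = [r.gene for r in paralogs if is_upregulated(r)]
up_strict = [r.gene for r in paralogs if is_upregulated(r, base_mean_min=10)]
include_target = has_significant_paralog(paralogs)
```

Keeping the inclusion test (`has_significant_paralog`) separate from the upregulation test mirrors the text: a knockout target enters the final analysis if any paralog changes significantly in either direction, while the upregulation call itself uses the stricter thresholds.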
- We find that even a relatively parsimonious model of transcriptional adaptation can recapitulate paralog upregulation after mutation and diverse population-level gene expression distributions of downstream effectors qualitatively similar to those observed in real data.
- The mapping between simulation and wet-lab experiment can uncover plausible network and parameter constraints for individual compensating genes and could provide evidence for particular compensating gene regulatory steps affected by transcriptional adaptation.
- Another opportunity is presented by recently reported combinatorial CRISPR screens (e.g., [24, 74]), which include paired knockout of two or more genes in the same cells, which could identify gene sets for which transcriptional adaptation confounds the outcomes.
- Our integrative approach identifies several putative hits (genes demonstrating possible transcriptional adaptation) to follow up on experimentally and provides a formal quantitative framework to test and refine models of transcriptional adaptation.
The fact that transcriptional adaptation occurred across a wide range of processes and for gene sets not necessarily belonging to a single regulatory module or signaling pathway highlights the need to consider its implications when screening for any phenotypic outcome. One way to address this concern is to perform screens with perturbation methods that avoid nonsense mutations. Techniques such as CRISPRi, already being used in pooled screens [74, 75], or other methods of engineering knockdowns, could be helpful.
The edges between a given regulator gene product and the target gene alleles are set at equal weight, reflecting no regulatory differences at the allele level. One limitation of our work is that most of the analysis was performed on datasets from bulk RNA sequencing studies, which limits quantitative mapping between single-cell data and simulations. Another limitation of our framework is that we focused primarily on mouse and human datasets, given the breadth of data available for these species. In principle, our bioinformatic pipeline can be generalized to include other animal systems to reveal both species-specific and universal gene targets displaying transcriptional compensation [14, 15]. The mapping between simulation and wet-lab experiment can uncover plausible network and parameter constraints for individual compensating genes and could provide evidence for particular compensating gene regulatory steps affected by transcriptional adaptation.
Another set of questions centers on the regulatory constraints on upregulated paralogs and their downstream effector molecules [34, 35]. In particular, nonsense-induced transcriptional compensation can result in incomplete phenotypic penetrance, such that there are either attenuated defects or a subset of cells or organisms that continue to have strong defects despite compensation. In some cases, compensation can happen without necessarily rescuing a phenotypic defect induced by knockout mutations [28, 36, 37, 38]. These observations, coupled with the documented evidence that transcription is bursty [39], raise the possibility that inherent stochasticity underlying the compensatory gene regulatory networks may translate into single-cell differences. Single-cell differences, in turn, could result in incomplete penetrance, particularly for phenotypes resulting from variable downstream effects on relevant effector gene expression.
Additionally, such mappings can help with the design and interpretation of functional genetic screens by taking into account genes known to be exhibiting transcriptional adaptation and the extent of its impact. The breadth of genes that appear to have transcriptional compensation also invites study of potential negative consequences of nonsense-induced paralog—or other related gene—upregulation. Might some compensatory changes be deleterious and, if so, could such deleterious changes explain select negative phenotypes previously ascribed to haploinsufficiency or gene dosage effects [80]? In a similar vein, our framework could be extended to analyze cases where paralogs are downregulated upon Cas9-induced nonsense mutations, potentially revealing new biology.
First, we argue that a systematic bioinformatic analysis of publicly available transcriptome-wide datasets that rely on CRISPR-Cas9-mediated mutagenesis can, in principle, suggest the presence of transcriptional adaptation, or lack thereof. Second, we extend the analysis of nonsense-induced compensatory effects to downstream regulatory targets of mutated genes, using annotated transcription factor regulons. We show that transcription factors that display potential transcriptional adaptation have more stable downstream regulatory targets after mutation. Lastly, we develop stochastic mathematical models of biallelic gene regulation and simulate over tens of millions of cells. We find that even a relatively parsimonious model of transcriptional adaptation can recapitulate paralog upregulation after mutation and diverse population-level gene expression distributions of downstream effectors qualitatively similar to those observed in real data. Our integrative framework is generalizable and lays the foundation for future work to test our findings experimentally and to refine models of transcriptional compensation.
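As a rough illustration of what a stochastic simulation of biallelic regulation with transcriptional adaptation can look like, the sketch below runs a Gillespie simulation of a deliberately simplified network: a reference gene, one paralog, and one downstream target activated by the summed regulator level. It is not the model developed in this work. It omits promoter bursting, and every rate constant, the Hill activation form, and the rule that each nonsense-mutated allele adds a fixed amount (`k_ta`) to paralog production are assumptions chosen for readability.

```python
import random
from typing import List, Tuple


def simulate_cell(mutant_alleles: int, rng: random.Random, t_end: float = 200.0,
                  k_ref: float = 1.0, k_par: float = 0.2, k_ta: float = 0.4,
                  k_tgt: float = 2.0, deg: float = 0.1,
                  hill_k: float = 10.0, hill_n: int = 2) -> Tuple[int, int, int]:
    """Return (reference, paralog, target) product counts at time t_end."""
    functional = 2 - mutant_alleles  # alleles still making functional product
    counts = [0, 0, 0]               # reference, paralog, downstream target
    # (species index, count change) for each reaction, in rate order below.
    changes = [(0, 1), (0, -1), (1, 1), (1, -1), (2, 1), (2, -1)]
    t = 0.0
    while True:
        ref, par, tgt = counts
        reg = ref + par  # both regulators act on the target with equal weight
        activation = reg ** hill_n / (hill_k ** hill_n + reg ** hill_n)
        rates = [
            k_ref * functional,                 # reference production
            deg * ref,                          # reference degradation
            2 * k_par + k_ta * mutant_alleles,  # paralog production (assumed rule)
            deg * par,                          # paralog degradation
            k_tgt * activation,                 # target production
            deg * tgt,                          # target degradation
        ]
        total = sum(rates)                      # > 0 as long as k_par > 0
        t += rng.expovariate(total)             # waiting time to next event
        if t > t_end:
            return counts[0], counts[1], counts[2]
        pick = rng.random() * total
        cumulative = 0.0
        chosen = 0
        for i, rate in enumerate(rates):
            if rate <= 0:
                continue                        # never fire a zero-rate reaction
            chosen = i
            cumulative += rate
            if pick < cumulative:
                break
        species, delta = changes[chosen]
        counts[species] += delta


def simulate_population(mutant_alleles: int, n_cells: int = 30,
                        seed: int = 0) -> List[Tuple[int, int, int]]:
    rng = random.Random(seed)
    return [simulate_cell(mutant_alleles, rng) for _ in range(n_cells)]


wild_type = simulate_population(mutant_alleles=0)
knockout = simulate_population(mutant_alleles=2)
mean_par_wt = sum(c[1] for c in wild_type) / len(wild_type)
mean_par_ko = sum(c[1] for c in knockout) / len(knockout)
```

Because paralog production here is a simple birth-death process, its expected steady-state level rises from `2 * k_par / deg` in wild-type cells to `(2 * k_par + 2 * k_ta) / deg` in homozygous mutants. Collecting the target counts across cells gives a population-level distribution whose shape can then be compared across parameter choices.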
We thank members of the Goyal lab, including Rohan Sohini, for insightful discussions related to this work. We also thank Lea Schuh (Helmholtz Munich), Karun Kiani (University of Pennsylvania), and Granton Jindal (UCSD) for discussions and manuscript comments. We thank Aviv Regev (Genentech) for pointing us to relevant datasets, and Timothee Lionnet (New York University) for pointing us to relevant literature. For each of the different models, we sampled over several other rate and ratio model parameters.
Similarly, gene regulatory network effects can result in distributions of effector molecules in single cells that are non-trivial to predict, yet they can have profound phenotypic implications. Elegans intestinal fate can result in downstream effector expression heterogeneity, further dependent on the continued function of other regulatory network components [65]. Here, we combine computational analysis of existing datasets with mathematical modeling of stochastic gene regulatory interactions to address the questions posed above.
Alternatively, if knockout is a requirement of the experimental design, engineering whole-gene deletion alleles could help decouple the effects of transcriptional adaptation from those of specific gene knockouts. Another opportunity is presented by recently reported combinatorial CRISPR screens (e.g., [24, 74]), which include paired knockout of two or more genes in the same cells and could identify gene sets for which transcriptional adaptation confounds the outcomes. For example, combined knockout of a reference gene and its paralogs could help to overcome the effects of transcriptional adaptation, while paired knockout of a reference gene and other interacting genes could help to disentangle reference- vs. paralog-specific functions. When such experimental designs are impractical or infeasible, the interpretation of nonsense-based knockout results could account for possible transcriptional adaptation by concurrent measurement of the expression of paralogs of the knockout target. In this way, caution must be taken in interpreting the phenotypic changes, or lack thereof, if a knockout target shows paralog upregulation. We next asked whether any specific annotated biological pathways, molecular functions, or cellular components were enriched for genes that display transcriptional adaptation by paralogs, either in bulk or in single-cell datasets.
We sought to describe the variability in gene expression emerging from gene regulatory networks with transcriptional adaptation and to quantify differences in aspects of variability between network outputs given different parameter values. Therefore, we calculated several summary statistics related to distribution shape to highlight important features of gene expression distributions. During the process of mining published datasets from disparate studies, we found several cases where the phenomenology could reflect what the two landmark studies [14, 15] term as “transcriptional adaptation,” but the results were not explicitly contextualized (nor definitively proven) as such.
We demonstrated the existence of transcriptional adaptation in mice and humans across multiple contexts. In particular, the results from Perturb-seq datasets suggest incomplete penetrance of transcriptional adaptation at a population level, with single-cell differences in the frequency and magnitude of related paralog upregulation (Fig. 3) as well as in downstream effector molecules (Fig. 4). While such publicly available datasets provide an important view of nonsense-induced transcriptional compensation, several related questions remain unanswered. For example, can simple gene regulatory networks recapitulate single-cell variability in compensating paralogs? Furthermore, under what conditions is transcriptional adaptation capable of inducing robustness across a population of cells, in that the compensating paralog expression precisely mimics wild-type expression at a single-cell level? Of note, robust paralog expression alone may not be sufficient, as paralog activity (e.g., that of a paralogous enzyme or a transcription factor) can differ substantially from that of the original gene.