Link between macro lncRNA and DNA looping


I was wondering if anybody knows of a publication about macro lncRNAs (very long unspliced RNAs) or, more generally, a transcribed RNA that may lead to cis-DNA looping of the genomic regions overlapped by the (macro) lncRNA transcript?


A Keystone for ncRNA

A report on the Keystone symposium 'Non-coding RNAs' held at Snowbird, Utah, USA, 31 March to 5 April 2012.

Once upon a time RNA fit cleanly into the central dogma as a messenger between DNA and protein. Over the past 50 years, RNA molecules have continually emerged as dynamic and versatile regulators of the genome. Our modern understanding of non-coding RNAs (ncRNAs) may look like an intertwined mess of molecules, but collectively they exhibit architecture and coordination, leading to elegantly choreographed regulation of DNA and protein by RNA. The emerging role of RNA as an orchestrator resonated throughout the inaugural Keystone symposium on ncRNAs, with many examples of long ncRNAs (lncRNAs) having critical roles across numerous biological pathways. After the meeting, it was clear that more surprises would emerge from the 'dark matter' of the genome.

The keynote address by Nick Proudfoot (University of Oxford) set the stage by describing how the most classically studied regions of the genome, such as the β-globin locus, are now emerging as being exquisitely regulated by RNA. Specifically, Proudfoot summarized years of work showing how the act of transcription through non-coding regions, and importantly where transcriptional termination occurs, regulates the epigenetic dynamics of the locus. Intriguingly, convergent transcription by RNA polymerase II (RNA pol II) may serve as a substrate to recruit Dicer and other factors of the RNA interference (RNAi) machinery. Similarly, Robert Martienssen (Cold Spring Harbor Laboratory) presented an interplay between RNA/DNA polymerase activity and RNAi in establishing heterochromatic domains. The dependence on co-transcriptional RNAi allows the release of RNA polymerase and prevents collision with the centromeric DNA replication machinery. Together these studies demonstrate the need for not only identifying lncRNAs involved in epigenetic establishment but also for understanding many simultaneous intertwined layers of regulation.

The human noncoding transcriptome reveals a map of 'noncodarnia'

Thomas Gingeras (Cold Spring Harbor Laboratory) provided an overview of the complexity of the human transcriptome resulting from the efforts of the ENCODE consortium. The transcriptomic map has gained an unprecedented resolution, revealing that 76% of our genome is transcribed. With an average of approximately eight transcripts per genic region, the wealth of ENCODE has redefined the 'one gene - one function' hypothesis into 'many transcripts - one function', or possibly many. Using complementary datasets and approaches, Piero Carninci and the Riken OMICs Center have provided new insights into lncRNA promoter regulation. By fine mapping of the 5' 7-methyl guanosine caps on RNA, the group have found that 6 to 30% of 5' start sites of mouse and human transcripts initiate within repetitive elements. Remarkably, over 250,000 retrotransposon-derived transcription start sites show tissue- and cell-compartment-specific expression.

Leonard Lipovich (Wayne State University) and colleagues added 6,000 lncRNAs to this catalog by examining unclassified human cDNA clones and their expression profiles to determine whether these lncRNAs contribute to neurological disease phenotypes. They found that certain primate-specific and non-conserved lncRNAs are differentially expressed in brain regions that show high levels of activity. Some of these lncRNAs, antisense to protein-coding genes, can regulate their neighbors' expression. Weaving the intricacy of the transcriptome with the complexity of the mammalian body development and cognition, John Mattick (Garvan Institute of Medical Research) presented examples that emphasized the need to further understand the diversity of lncRNAs. Digging into the depths of the 'dark matter in the genome' using capture enrichment methods revealed not only numerous novel lncRNAs and their isoforms but also isoforms of well-studied protein-coding mRNAs such as p53. Hundreds of lncRNAs were shown to change during stem cell differentiation and to have similar transcript stability to mRNAs, and many are associated with epigenetic complexes, suggesting that this complexity cannot be dismissed en masse as transcriptional noise.

Features of lncRNA


As discussed above, transcripts that do not encode proteins and are more than 200 nucleotides in length are known as long non-coding RNAs (lncRNAs). The length of a lncRNA can be more than 2 kb while their coding potential is less than 100 amino acids [5]. Kaur and colleagues showed that in the human genome 20% of the transcriptional progress would be associated with protein-coding genes. This information illustrates that lncRNAs are four times longer than the coding RNA sequence [5].

Location in genome

The lncRNAs are harbored mostly in poorly conserved regions of the genome, including the intronic regions of genes [51]. In addition, some lncRNAs are reported to be transcribed from one of the strands of a DNA sequence [61] within a protein-coding locus. The genomic locations of lncRNAs are directly associated with their evolutionary conservation [52, 53]. Research findings suggest that a plethora of lncRNAs are evolutionarily conserved [54], albeit to a lesser extent than protein-coding genes [55]. Interestingly, the promoter regions of lncRNAs are more conserved than the sequences of the lncRNAs themselves [56]. The presence of open reading frames in some lncRNAs makes these molecules difficult to distinguish from protein-coding RNAs [17]. The lncRNA gene ‘X Inactive Specific Transcript’ (or Xist), responsible for X-chromosome inactivation, is an example of a lncRNA located within a less conserved region of the genome [81].


Different families of lncRNAs exercise varying modes of action in regulating gene expression and protein synthesis. These non-coding RNAs (ncRNAs) can act as scaffolds in sub-nuclear domains or can possess secondary structures that interact with DNA, RNA, and protein. Long non-coding RNAs have cell-specific expression. It has been reported that transcription of individual lncRNAs occurs at a specific time; hence they can serve as molecular signals that respond to diverse stimuli [103].

Cis- and trans-regulating action

The specific category of RNAs that exhibit sequence complementarity to other RNA transcripts is known as natural antisense transcripts (NATs). The trans-NATs and their respective targets are physically located at different loci in the genome, like miRNAs, whereas the cis-NATs and their targets are located at the same locus but on opposite strands of the DNA. These cis-NATs were first identified in viruses, then in prokaryotes and finally in eukaryotes. In eukaryotes (except nematodes), approximately 5–29% of the transcriptional units are involved in such overlap [51]. The cis-NATs are transcribed by RNA polymerase II, which shows their involvement in mRNA processing. The interaction of sense and antisense transcripts suggests a role for NATs in gene expression regulation. In addition, it has been reported that RNA hybrid formation and transcription of a gene locus in both orientations can induce gene silencing or trigger an immune response [108].
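The cis-NAT criterion described above (same locus, opposite strands) can be sketched as a simple interval-overlap test. This is only an illustration: the `Transcript` record, the 0-based half-open coordinates and the function names are assumptions made for this example, not part of any annotation tool.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    """Hypothetical minimal transcript record (assumed for this sketch)."""
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def is_cis_nat_pair(a: Transcript, b: Transcript) -> bool:
    """True if a and b overlap at the same locus but lie on opposite strands."""
    overlaps = a.chrom == b.chrom and a.start < b.end and b.start < a.end
    return overlaps and a.strand != b.strand


def find_cis_nat_pairs(transcripts: List[Transcript]) -> List[Tuple[str, str]]:
    """Return the names of all transcript pairs that satisfy the cis-NAT criterion."""
    pairs = []
    for i, a in enumerate(transcripts):
        for b in transcripts[i + 1:]:
            if is_cis_nat_pair(a, b):
                pairs.append((a.name, b.name))
    return pairs
```

The pairwise loop is quadratic in the number of transcripts, which is fine for a handful of loci; a genome-wide scan would normally sort the intervals or use an interval tree instead.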

Comparison with miRNA

miRNAs and lncRNAs are both non-coding in nature. miRNAs are about 22 nucleotides long, whereas lncRNAs are 8–10 times longer. The exact functions of lncRNAs are not yet clear, but it has been reported that both miRNAs and lncRNAs act as regulators of biological processes through post-transcriptional repression of protein-coding genes [101, 102, 105]. In addition, lncRNAs can act as miRNA sponges and reduce the regulatory effect of miRNAs on mRNA [78]. Experimental studies of the human genome have identified approximately 2000 different miRNAs and around 50,000 lncRNAs [15, 21, 34].

R-loop formation and distribution

R-loops are three-stranded structures consisting of an RNA–DNA hybrid and the displaced strand of DNA (Fig. 1, red circles). R-loop features have been extensively discussed in several excellent reviews 2,5,6,17. Briefly, the three-stranded structures are typically, but not exclusively, linked to ongoing transcription, and until recently were assumed to be mere ‘by-products’ of transcription that occur exclusively in cis, at the site of transcription. The negative supercoiling, and hence increased tendency for DNA melting (strand separation), that occurs behind the RNA polymerase provides an ideal opportunity for the nascent transcript to anneal to the complementary template strand and form an R-loop. Although RNA–DNA hybrids are continually formed within the transcription bubble of the RNA polymerase, it is unlikely that R-loops are simply an extension of this limited hybrid; rather, they form through re-invasion of the 5’ end of the RNA (Fig. 1). This is supported by the structure of the RNA polymerase complex, which demonstrates that RNA and DNA are extruded from the complex at different exit channels, which would prevent an R-loop from simply being extended 28. Moreover, a recent study demonstrated that efficient formation of co-transcriptional R-loops requires a free RNA end and a GC skew (asymmetry in the distribution of guanines and cytosines between the strands; see below) 29. RNA–DNA hybrids can also form in trans when the RNA is produced at a spatially distinct site (Fig. 1) 30.

The methodology for the detection of RNA–DNA hybrids has been crucial for understanding how R-loops are regulated and for mapping where and when they form in the genome. Although the predominant tool for detecting RNA–DNA hybrids has been the monoclonal hybrid-specific S9.6 antibody 31, alternative reagents and techniques have emerged. Catalytic-dead versions of ribonuclease H1 (RNase H1; an enzyme that can degrade the RNA moiety of RNA–DNA hybrids) 29,32 or the RNase H1 hybrid binding domain fused to a fluorescent protein 33 can be used as hybrid sensors. Comprehensive summaries of these approaches, along with a discussion of their advantages and disadvantages, are available in refs. 3,5,34. Although there is a general consensus on where hybrids accumulate, discrepancies between studies may be owing to different hybrid detection methods. One conserved feature of R-loops is their transient nature. This was first pointed out in a study using budding yeast, which demonstrated that only upon loss of both RNase H enzymes could a uniformly distributed nuclear staining of RNA–DNA hybrids be detected 35. This indicated that hybrids frequently arise, but are rapidly removed, at least in part, by RNase H. Subsequently, in addition to RNase H, multiple helicases have been shown to contribute to hybrid resolution (reviewed in ref. 3). The dispersed S9.6 staining throughout the nucleus also suggested that RNA–DNA hybrids were forming at multiple loci across the genome 35. These predictions were verified when genome-wide sequencing of hybrid-harboring loci in yeast revealed a widespread distribution, including a strong presence in retrotransposons, telomeres and highly expressed genes, such as the ribosomal RNA and tRNA loci and other structured ncRNAs (Fig. 2).

R-loops have been mapped genome-wide in a number of species. The most prevalent predictor of R-loop presence is high transcriptional activity; indeed, R-loops are found at promoter regions, where they promote transcription by inducing DNA demethylation, and at transcription termination regions, where they promote transcription termination. Other features of R-loop-rich areas include high GC content and GC skew, G-quadruplex (G4) structures, antisense transcription and regions where the replication and transcription machineries collide. RNA–DNA hybrids (and perhaps R-loops) also form at sites of DNA damage, particularly at double-stranded breaks (DSBs), where they promote homologous recombination (HR)-mediated repair through the recruitment of breast cancer susceptibility protein 1 (BRCA1). ncRNA, non-coding RNA; snoRNAs, small nucleolar RNAs; rDNA, ribosomal DNA.

R-loops are considerably enriched at genes associated with anti-sense transcription and have recently been shown to directly promote anti-sense expression 36 (Fig. 2), suggesting they have a regulatory role in gene expression (see below). Although one study, which included treatment with S1 nuclease as part of the DRIP protocol to stabilize hybrids through removal of the displaced strand, indicated that AT skew and especially poly(A) tracts may be hotspots of R-loop formation 37, the general consensus was that, in yeast, sequence per se is not a crucial determinant of R-loop accumulation, but rather the rate of transcription at the locus. Indeed, an R-loop-poor locus can be converted to an R-loop-rich locus simply by boosting rates of transcription by changing promoters 37. There is a tendency for GC-rich sequences to harbor more R-loops; however, the same study revealed that GC-rich genes were typically more highly expressed in yeast 36. In mammalian cells, GC skew strongly favors R-loop formation (see below) (Fig. 2). However, yeast genomes do not display much GC skew, apart from at telomeres. In plants, regions with either GC skew or AT skew are enriched in R-loops 39. The link between high levels of expression and R-loops suggests either that high rates of transcription promote R-loop accumulation, or that R-loops promote high rates of transcription. One could also envision a positive feedback loop whereby transcription results in R-loop accumulation, which promotes further transcription. On the other hand, it is important to keep in mind that R-loops are known to be potent inhibitors of transcription and lead to polymerase stalling 10. The contradictory relationship between R-loops and transcription suggests that tight control of R-loop persistence (half-life) is required to sustain high rates of transcription, or that a spatial separation exists between R-loops and open reading frames.

A study in yeast found little genomic overlap between R-loop hotspots and the localization of a subunit of RNA polymerase II (Pol II) 36 . This revealed that R-loop mapping by DRIP with the S9.6 antibody was not exclusively documenting ongoing transcription, but rather that some R-loops may persist, or be formed, post-transcriptionally. This was an early indication that R-loops may be more than mere transcription byproducts and suggested that hybrids may form independently of transcription, that is in trans, or influence transcription from a distance.

Similar to yeast, in human cells and in plants, R-loops accumulate at repetitive sequences such as transposable elements, ribosomal DNA, centromeres and telomeres 32,39 (Fig. 2). Strikingly, in humans and plants they are also prominent at promoter regions that harbor CpG islands (CGIs) 29,32,39,42 (Fig. 2). CGIs are present in the promoters of approximately 60% of human genes. One characteristic of CGIs is the presence of a positive GC skew: enrichment of guanine over cytosine on the non-template strand, downstream of the transcription start site (TSS). A more precise R-loop localization within promoter regions revealed that R-loops are generally constrained between the TSS and the first intron–exon junction 43. In general, a strong GC skew correlates with high gene expression and with R-loop enrichment. Accumulation of R-loops in this sequence-specific manner may be due to the particularly strong thermodynamic binding properties of G-rich RNA to a complementary sequence 44,45. Moreover, the presence of DNA secondary structures such as G-quadruplexes (G4s) on the displaced DNA can contribute to R-loop stability 46 (Fig. 1), either by preventing access of R-loop resolving proteins or by decreasing the re-annealing capacity of the DNA. A recent in vitro study using atomic force microscopy indicated that R-loops can attain secondary structures independently of the presence of G4 47. These secondary structures are formed at the displaced DNA strand, and include ‘blobs’, ‘spurs’ and ‘loops’. Such ‘R-loop objects’ impose local physical constraints on the surrounding DNA, for example by bending the DNA. A fascinating possibility is that R-loop objects could encode higher order regulatory information, for example by recruiting different chromatin modifiers. In the future, it would be important to confirm whether such R-loop objects exist in a chromatin context in vivo.
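As a rough illustration of the GC skew described above, the skew of a sequence is commonly computed as (G − C)/(G + C), so that a positive value means guanine is enriched over cytosine on the strand being read. The sketch below assumes the input is the non-template strand; the function names, window size and step are arbitrary choices for this example and are not taken from the studies cited here.

```python
from typing import List, Tuple


def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C) of one strand; 0.0 if the strand has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq: str, window: int = 100, step: int = 50) -> List[Tuple[int, float]]:
    """GC skew in sliding windows, returned as (window start, skew) pairs.

    Window and step sizes are illustrative assumptions. A sequence shorter
    than one window is scored once as a whole.
    """
    last_start = max(len(seq) - window + 1, 1)
    return [(i, gc_skew(seq[i:i + window])) for i in range(0, last_start, step)]
```

For example, `gc_skew("GGGC")` gives 0.5, whereas the reverse situation `gc_skew("CCCG")` gives -0.5. Scanning downstream of a TSS with `windowed_gc_skew` would show where the skew is positive, but skew on its own is only a correlate of R-loop formation, not a predictor.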

The strong presence of R-loops within gene promoters was a first indication that these structures may have important gene regulatory functions. In addition to promoter regions, genome-wide studies found considerable enrichment of R-loops at transcription termination sites, especially those with GC skew. This is in accordance with the proposed function of R-loops in promoting transcription termination 48,49. Finally, hybrids are produced at sites of DNA damage, including at dysfunctional telomeres and DSBs (Fig. 2), and have important functions in coordinating DNA repair (reviewed in 5 and discussed below).

R-loops have long been considered accidental by-products of transcription, detrimental to cellular physiology, and merely a source of genomic instability when not properly removed 5,6. However, it has recently emerged that there is a class of beneficial, ‘regulatory’ R-loops, which have an essential role in a variety of biological processes. Regulatory R-loops harness the sequence specificity of RNA–DNA base pairing either to repel cofactors from distinct chromosomal loci or to target them there. A paramount example, from prokaryotes, is the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)–Cas9 system, which has evolved in bacteria as a natural defence mechanism to recognize and destroy foreign DNA elements of viral origin 50,51. Viral DNA that has been integrated into a CRISPR array in the bacterial genome is transcribed and associates with the Cas9 nuclease. The Cas9–RNA complex forms an R-loop with a matching sequence at the invading viral DNA and generates a DNA break, which eventually results in the destruction of the viral DNA. This module has now been harnessed as a highly precise and easily engineered gene editing tool using guide RNAs to target Cas9 to a specific sequence in the genome, a process which involves R-loop formation in trans 52,53.

The first examples of endogenous regulatory R-loops in eukaryotes came from studies on class-switch recombination at the Ig heavy chain locus 54. R-loops that form within the G-rich switch regions promote recombination-based deletions and drive antibody class diversity 55,56. A similar recombination-based switch occurring in the variant surface glycoprotein (VSG) locus during immune evasion in trypanosomes is also R-loop-regulated 57. The sequence specificity of RNA–DNA hybrids positions R-loops as prime candidate regulators of gene expression through precise targeting of promoter and termination sequences (Fig. 2).


White adipocytes are responsible for energy storage, whereas brown and beige adipocytes are specialized in fuel oxidation and energy expenditure. Major progress has been made in delineating the molecular control of lineage-specific development of white, brown, and beige adipocytes (1–3), adipose tissue remodeling and inflammation (4–6), thermogenic energy expenditure (7–9), and more recently, the emerging endocrine functions of brown and beige fat (10,11). Increased adipose thermogenesis is often linked to an improved metabolic profile. Brown and beige fat thermogenesis is mediated by uncoupling protein 1 (UCP1)-dependent and UCP1-independent mechanisms; the latter include the creatine substrate cycle (12–14) and the calcium futile cycle (15). Beyond thermogenesis, brown and beige fat exert their effects on metabolic physiology by secreting endocrine factors and microRNA-containing exosomes that act on other tissues in the body (10,11,16). Neuregulin 4 (Nrg4) is a brown fat–enriched secreted factor that attenuates hepatic lipogenesis and liver injury (17–19), whereas microRNAs encapsulated in exosomes are released by brown adipocytes and may serve as important messengers for intertissue cross-talk (20,21). Adipose tissue is densely innervated by sympathetic nerve fibers (22–24). Recent work also sheds light on the mechanisms that govern adipose sympathetic innervation and plasticity (25–28).

Long noncoding RNAs (lncRNAs) are emerging as important regulators of cellular signaling and gene expression in numerous cell types. lncRNAs are long RNA transcripts (>200 nt) that do not encode proteins. Many lncRNAs contain a 5′ cap, multiple exons, and 3′ polyadenylation (29). Some of these transcripts are intergenic while others are generated from genomic regions close to or partially overlapping with protein-coding genes. Depending on their position relative to nearby coding genes, lncRNAs can be generally categorized into intergenic, antisense, divergent, intronic, and enhancer lncRNAs (29). lncRNAs can regulate the functions of cells through a variety of mechanisms. For instance, they can function as scaffolds to bring two or more proteins into a functional ribonucleoprotein complex, as decoys to titrate a protein away from its original target, as guides to recruit chromatin modification enzymes to specific loci on chromosomes, and as microRNA sponges to buffer microRNAs’ inhibitory functions on gene expression (29,30). It is noteworthy that the coding potential of lncRNAs is often assessed by computational methods based on open reading frame length, conservation, codon usage, etc. These procedures are not error proof and may misannotate some micropeptide-coding transcripts as lncRNAs (31). It is therefore important to experimentally determine the coding potential of lncRNA candidates.
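To make the ORF-length criterion concrete, here is a minimal sketch of the simplest such computational check: find the longest ATG-to-stop open reading frame in the three forward frames and compare it with a cutoff. The function names are hypothetical, and the defaults (a transcript longer than 200 nt, an ORF shorter than 100 codons) are assumptions echoing the rule-of-thumb figures quoted on this page. Real annotation pipelines also weigh conservation and codon usage, which this sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop excluded) of the longest ATG...stop ORF.

    Only the three forward frames are scanned, and ORFs lacking an
    in-frame stop codon are ignored (a simplification for this sketch).
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None                   # ORF closed; look for the next one
    return best


def looks_noncoding(seq: str, min_length: int = 200, max_orf_codons: int = 100) -> bool:
    """Crude heuristic: long transcript with no ORF of max_orf_codons or more.

    Both thresholds are assumed defaults, not a validated classifier.
    """
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons
```

As the text notes, a check like this will misclassify transcripts that encode micropeptides shorter than the cutoff, which is one reason experimental confirmation is recommended.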

Recent years have seen a rapid increase in the number of adipocyte lncRNA studies focusing on genome-wide annotation of lncRNAs, molecular and functional analyses in cultured adipocytes, and, more recently, in vivo studies to delineate their role in physiology and disease. Here, we seek to highlight the latest advances in adipose lncRNA research, discuss new insights into the emerging lncRNA–protein regulatory interface, and provide perspectives on the current challenges and future directions of this field. Readers are also referred to a few other excellent lncRNA reviews that cover additional metabolic tissues (32–35).

LncRNA genes in the genome

The complex genome of eukaryotes is pervasively transcribed and efforts to comprehensively define all transcripts have led to the idea that about half of the genome can be transcribed into RNA in an individual cell (Djebali et al., 2012). The units that produce RNAs, the genes, can roughly be categorized into two main biotypes: protein-coding genes (PCGs) and non-protein-coding genes (NCGs). The largest and most coherent category is the PCG, which encodes RNAs that serve as the template for all the peptides and proteins in the cell. The NCG category is a highly heterogeneous collection and can be sub-grouped into small ncRNA (non-coding RNA) and long ncRNA (lncRNA) genes, where the term long refers to the arbitrary length of 200 nucleotides or longer. In particular, the lncRNA genes have attracted a lot of attention in recent years due to their wide range of action and mostly unexplored functions. While their number was overestimated after their initial discovery, similar to the overestimation of the number of PCGs at the beginning of the human genome project (Lander et al., 2001), current and careful curation projects, such as the GENCODE and FANTOM projects, list 17,957 and 27,919 lncRNA genes, respectively (Figure 1A), in their most recent data releases of the human genome (Frankish et al., 2019; Hon et al., 2017). Hence, the number of lncRNA genes is in the same range as, or even a bit higher than, the number of PCGs (19,954). In the future, this currently very heterogeneous class of NCGs may be sub-categorized further into more specific biotypes.

Figure 1. LncRNA genes in the genome. (A) Overview of genes and transcript numbers in the human genome (GENCODE v35). Circle area represents relative quantities. (B) Schematics of three possible functional properties of lncRNA loci.

Currently, three major functional principles can be assigned to lncRNA loci (Figure 1B): (1) the RNA is the functional biomolecule and interacts with other components in the cell, for example DNA, proteins or RNAs; (2) a gene regulatory element is embedded in the transcription body of a lncRNA gene and the activity of the lncRNA gene directs the activity of the regulatory element; or (3) the process of transcription influences genome and thereby gene activity. A lncRNA locus can have one of these functions or a mixture of them (Yin et al., 2015). In this review we will focus on the latter two functional lncRNA properties, in which the RNA is, at least partially, dispensable for the lncRNA gene function.

The transcription of genes

The generation of RNA using the genome as a template, or the process of transcription, depends on certain functional genomic elements (Figure 2). The core element of a gene that initiates the production of an RNA is the promoter. A GC-rich element that is accessible (open chromatin) will attract the polymerase machinery and general transcription factors (TFs). This minimal core element serves as a core promoter and can be sufficient to initiate transcription (Deaton and Bird, 2011). Transcription of RNA starts at the transcriptional start site (TSS), which is located within the core promoter. Like PCGs, most lncRNAs are transcribed by POL II (RNA polymerase 2, a multiprotein complex), but are more tissue-specific compared to PCGs (for review see Ransohoff et al., 2018). Both biotypes (PCGs and lncRNAs) have conserved core promoter sequences, with fewer overlapping TF binding motifs in lncRNA promoters, resulting in an overall lower expression level compared to PCGs (Figure 2; Mattioli et al., 2019). Thus, the architecture of the core promoter is the first player that defines the degree of lncRNA expression (Batut and Gingeras, 2017; Mattioli et al., 2019). The second important element that influences the transcription of genes is the enhancer, a cis-regulatory element that can have either a positive or a negative impact on its target genes (negatively acting elements are often referred to as repressors). Consequently, enhancers are genomic regions that encode binding sites for sequence-specific activator or repressor TFs. These elements often confer specificity in spatiotemporal expression. Many lncRNAs can also be generated from such enhancer elements, which contributes to their overall more tissue-specific expression when compared to PCGs (Mattioli et al., 2019).

Figure 2. Distinguishing features of transcript generation of PCGs and lncRNAs. (A) LncRNAs and (B) mRNAs: lncRNA genes are lowly expressed as fewer transcription factors (TFs) bind the promoter. In addition, lncRNA TSSs, exons and/or pA sites more often associate with transposable elements (TEs), while TEs contribute mostly to UTRs and/or introns of mRNAs. In addition, mRNAs are more efficiently spliced.

The core promoter initiates transcription and thereby the generation of an RNA that may or may not be further diversified by splicing (Figure 2). This depends on whether splice sites are present between the promoter and the transcription termination element, the polyadenylation signal (pA). The mechanism of PCG and lncRNA splicing is similar, although the splicing efficiency of lncRNAs is lower than that of PCGs, likely due to the loss of proximal RNA POL II phosphorylation over 5’ splice sites (Krchnáková et al., 2019). In addition, lncRNAs show signs of co-transcriptional cleavage and premature termination, with Thr4p POL II enriched over the entire lncRNA body (Schlackow et al., 2017). At some point the transcriptional machinery will run into a termination signal, a DNA sequence element consisting of AATAAA and downstream GU (or U)-rich motifs (Eaton et al., 2020). These elements are ubiquitously present in the genome. In humans, one can find 569,005 elements that meet the criterion of a pA signal (301,001 in mouse and 20,931 in C. elegans) (Herrmann et al., 2020). Moreover, this high number likely ensures successful termination of transcription (Eaton and West, 2020).
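A toy version of such a pA-signal scan is sketched below: it looks for the AATAAA hexamer and then checks whether the sequence immediately downstream is G/T-rich (the DNA equivalent of GU/U-rich). The 30-nt downstream window and the 0.7 G/T fraction are arbitrary assumptions chosen for illustration; this is not the criterion used by Herrmann et al. (2020), and the function name is hypothetical.

```python
import re
from typing import List


def find_pa_signals(seq: str, downstream: int = 30, min_gt_fraction: float = 0.7) -> List[int]:
    """Return 0-based start positions of AATAAA hexamers followed by a G/T-rich stretch.

    `downstream` and `min_gt_fraction` are illustrative assumptions. If fewer
    than `downstream` bases remain after the hexamer, the available bases are
    scored; a hexamer at the very end of the sequence is skipped.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    # A lookahead pattern lets overlapping hexamers (e.g. AATAAATAAA) all be found.
    for match in re.finditer(r"(?=AATAAA)", seq):
        pos = match.start()
        window = seq[pos + 6: pos + 6 + downstream]
        if not window:
            continue
        gt_fraction = sum(base in "GT" for base in window) / len(window)
        if gt_fraction >= min_gt_fraction:
            hits.append(pos)
    return hits
```

Running a scan of this kind over a whole genome would return a very large number of candidate sites, in line with the point above that such elements are ubiquitous.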

Another class of genetic elements that play an important role in gene and genome activity are transposable elements (TEs) (for review see Chuong et al., 2017). These mobile genomic elements make up more than 44% of the human genome (Lander et al., 2001) and have attracted attention as important regulators of gene and genome activity (Bourque et al., 2018). In this respect, TEs are an important component of lncRNA biology as well (Figure 2A). Approximately 75% of lncRNA transcripts contain sequence elements from TEs (Kapusta et al., 2013) and some of them represent important sequence elements that direct lncRNA localization (Lubelsky and Ulitsky, 2018). In addition, 25% of TEs are found to overlap with TSSs and pA signals of lncRNA genes (Kapusta et al., 2013). Hence, they are an important driving force of lncRNA expression. One recent example is the primate-specific lncRNA XACT (Table 1), which has been shown to protect the active X chromosome from being silenced (antagonizing the effect of the XIST lncRNA) and whose sequence contains elements derived from a TE (Casanova et al., 2019). Interestingly, the XACT lncRNA is also regulated by a TE-derived enhancer element that harbors pioneer pluripotency factor binding sites. This exemplifies that TEs containing embedded TF motifs can direct tissue-specific expression when they insert next to a promoter element. Several other TE-derived lncRNAs are described elsewhere (Kapusta et al., 2013).

Table 1. Selection of lncRNA genes with RNA-independent function.

LncRNA | Relative location of respective TSSs to target gene | Literature | Mode of action

Regulatory element located within the transcription unit
Haunt (Halr1) | 40 kb downstream of HOXA | Yin et al., 2015 | Activation of HOXA
Lockd | 4 kb downstream of Cdkn1b | Paralkar et al., 2016 | Positive regulation of Cdkn1b via loop formation
Meteor | 80 kb upstream of Eomes | Alexanian et al., 2017 | Positive licensing of Eomes expression
ThymoD | 844 kb downstream of Bcl11b | Isoda et al., 2017 | DNA methylation, CTCF-binding
Pcdhα-as | Pcdhα | Canzio et al., 2019 | DNA methylation, CTCF-binding
GAL10-ncRNA | GAL10 antisense transcript | Houseley et al., 2008 | GAL10 promoter acetylation
AIRN | 28 kb antisense to Igf2r | Latos et al., 2012 | Promoter methylation
Upperhand (Hand2os1) | 0.1 kb upstream of Hand2 | Anderson et al., 2016; Han et al., 2019 | Promotes enhancer accessibility for Hand2 activation

Activity exerted by transcription initiation or elongation
Ftx | 140 kb upstream of Xist | Furlan et al., 2018 | Xist activation independent of Ftx RNA
Chaserr | 16 kb upstream of Chd2 | Rom et al., 2019 | Negative regulation of Chd2
PVT1 | 52 kb downstream of Myc | Cho et al., 2018 | Enhancer boundary element
Handsdown (Handlr) | 11 kb downstream of Hand2 | George et al., 2019; Ritter et al., 2019 | Transcriptional elongation-based enhancer shielding

In summary, the genome stores the information required to generate the RNAs that are necessary for a cell’s proper function, whether the RNA is protein-coding or not. An elaborate machinery controls the specific activation of genes and whole genomic regions via positive or negative mechanisms. These regulatory mechanisms require energy investment from the cell. It is conceivable that sometimes it can be ‘cheaper’ for a cell to let spurious transcription of non-harmful transcripts occur, be they coding or non-coding, than to invest energy in silencing all of these transcriptionally active sites.

Layers of gene regulation

The expression of genes and whole genomic regions is controlled by several layers of regulation. In addition to the genomic elements described above, DNA is packed with histone proteins into chromatin. These protein components can be modified to act as signaling centers for the transcription machinery (for review see Talbert et al., 2019). In addition, nuclear proteins regulate the 3D arrangement of genomic DNA in such a way that functionally connected elements of gene regulation come together. In short, each chromosome is composed of sub-megabase units known as topologically associating domains (TADs), the structural and functional units of the chromosome (for review see Szabo et al., 2019). Such genome arrangements can allow for promoter-enhancer contacts and bring functionally dependent regulatory elements together (Hnisz et al., 2017). The major factors that regulate this organization are CTCF (CCCTC-binding TF) and the cohesin complex (Ali et al., 2016; Rao et al., 2017). CTCF frequently co-localizes and interacts with the cohesin complex at TAD borders (Li et al., 2020). Indeed, elimination of cohesin dissolves all chromatin TADs even in the presence of CTCF (Rao et al., 2017). Interestingly, disruption of TADs by removal of either CTCF or cohesin has unexpectedly mild effects on gene expression (Nora et al., 2017; Rao et al., 2017). While it is accepted that gene expression and 3D genome folding are correlated, the functional relevance of this correlation is still to be elucidated (Ibrahim and Mundlos, 2020).

All of these enhancers and genome organizing regions must be functionally regulated to accurately control gene and genome activity. As many such regulatory sites are associated with lncRNAs, these lncRNA loci might be important functional support elements. The process of transcription can assist in reorganizing chromatin marks (van Steensel and Furlong, 2019), making regions accessible to some factors or excluding others by diverting or directing the transcription machinery to nearby genes.

Current annotations in the database are a work-in-progress

Current annotations of genomic databases categorize genes according to various criteria. One that appears, on the surface, to be very simple is the separation of protein-coding genes (PCGs) and non-protein-coding genes (NCGs). It was found some time ago that RNAs originating from NCGs do associate with ribosomes, the machinery that translates mRNAs into proteins (Ingolia et al., 2011; van Heesch et al., 2014). This association is not surprising, as the ribosome's function is to bind RNAs in the cytosol and attempt to translate them into peptides or proteins. However, just because an RNA is bound to a ribosome does not mean it is translated, and even if it is translated, the mere presence of a peptide does not prove that the peptide has a function. More recent in-depth studies found that some lncRNAs do produce peptides and that some of these peptides are even functional (Chen et al., 2020; Ji et al., 2015; van Heesch et al., 2019), including peptides encoded within the 5’ and 3’ untranslated regions (UTRs) of mRNAs. Hence, until databases are updated with suitable information that incorporates the presence of peptides derived from expressed RNAs, the peptide-coding probability must always be taken into consideration when studying lncRNA function. Equally important, many PCGs and NCGs have a high number of splice variants, some of which might encode a peptide and others not.

The revolution of high-throughput sequencing of fragmented cDNA libraries revealed the complexity of expression from the genome. Enrichment of lowly expressed transcripts and subsequent sequence analysis identified an even more complex pattern of splice variants (Mercer et al., 2012). However, these analyses relied on the sequencing of fragmented cDNA libraries and subsequent reconstruction of the transcriptome against a reference genome. The most recent generation of long-read sequencers, such as the PacBio or the Nanopore systems, allows the direct analysis of RNAs and eliminates the intermediate step of a fragmented cDNA library. Specifically capturing lncRNA genes and resequencing them on a long-read platform (known as Capture Long Sequence or CLS) determined the full variety of splice variants of the mammalian transcriptome (Lagarde et al., 2017). The advantage of this technology is its capability to precisely determine 5’ and 3’ ends and, ideally, all splice variants of a transcript. For example, the estimated mean number of exons per lncRNA using CLS was 4.27, compared to 3.59 measured by short-read RNA-seq (Lagarde et al., 2017). While this approach does not entirely eliminate the need to carefully determine the splice variants from a lncRNA locus, it does provide a very good starting point for detailed analysis. In particular, when CLS data are not available for your locus-of-interest or your tissue-of-interest, one should determine the full transcript length, splice variants and regulatory elements of the lncRNA-of-interest. Only then can a successful strategy to study the lncRNA be initiated.
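To put the two exon counts quoted above into proportion, the short calculation below computes the absolute and relative difference between the CLS and short-read estimates. The variable names are illustrative only, and the two input values are taken directly from the text (Lagarde et al., 2017); nothing else is assumed.

```python
# Mean exons per lncRNA, as quoted in the text (Lagarde et al., 2017).
cls_mean_exons = 4.27         # Capture Long Sequence (long reads)
short_read_mean_exons = 3.59  # short-read RNA-seq

# Absolute and relative gain of the long-read estimate over the short-read one.
absolute_gain = cls_mean_exons - short_read_mean_exons
relative_gain = absolute_gain / short_read_mean_exons

print(f"CLS detects {absolute_gain:.2f} more exons per lncRNA on average "
      f"({relative_gain:.1%} more than short reads)")
```

In other words, the long-read estimate is roughly one fifth higher than the short-read estimate, which illustrates how many exons short-read reconstruction can miss.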

Gene regulation by lncRNA genes – regulatory elements within the transcription unit

Surveying the chromatin and DNA modification landscape led to the annotation of potential regulatory regions across the genome and sometimes even for specific tissues and cell types. Regulatory elements, whether they are promoters or other regulatory elements, can be found within or far away from the transcription unit of a gene. The occurrence of such a regulatory element within a transcription unit, for example of a lncRNA gene, can indicate that the function of this element might be affected by its activity.

One interesting example that reflects the duality of lncRNA genes, with an RNA-based mechanism on one side and an enhancer element on the other, is Haunt. While the RNA of Haunt is thought to be required for negative regulation of HoxA, the Haunt locus contains regulatory elements that activate the HoxA locus during in vitro differentiation of pluripotent stem cells (Yin et al., 2015). While it has been shown that these enhancers can interact with HoxA directly, neither the elements themselves nor the dependence of their function on Haunt transcriptional activity has been defined further.

A similar early example of a lncRNA locus that contains a regulatory element within its transcription unit is the Lockd lncRNA locus, which regulates its cis gene Cdkn1b. The deletion of the entire Lockd locus, including elements upstream of the TSS, leads to a reduction of Cdkn1b expression (Paralkar et al., 2016). While the 5’ genomic region of Lockd physically contacts the promoter of Cdkn1b, this interaction is not altered if the transcription of Lockd is depleted by a pA signal inserted into the first exon of Lockd. Thus, the genomic locus itself is important as a regulatory element rather than its transcriptional activity.

Even if a specific regulatory element cannot be defined, careful analysis and genetic dissection of a lncRNA can point toward such a regulatory principle. The TSS of the Meteor lncRNA locus is important to license its cis-located gene Eomes for activation in the mesendoderm (Alexanian et al., 2017). The lack of Meteor expression caused by TSS deletion leads to the loss of Eomes activation during mesendoderm differentiation of mouse ESCs. Decreasing the levels of Meteor RNA during this process did not alter expression of downstream genes, arguing against an RNA-based function of Meteor. Interestingly, endogenous activation of Meteor licenses not only Eomes gene activation, but that of other cardiac mesodermal genes as well. Moreover, transcriptional inhibition of Meteor using a polyadenylation element inserted downstream of the Meteor TSS does not cause the Eomes gene to be silenced during mesendoderm differentiation (Alexanian et al., 2017; Engreitz et al., 2016). This finding argues against a transcription-based mechanism of Meteor and suggests that the Meteor genomic locus harbors important regulatory elements that render the cis-located Eomes gene activatable during differentiation.

An excellent example of a lncRNA with a defined regulatory element within its transcription unit is the ThymoD lncRNA locus. Its transcription prevents methylation of a CTCF-binding site located within its transcriptional unit (Isoda et al., 2017; Figure 3A). The binding of CTCF allows looping that places the Bcl11b transcription unit in the same domain as the activating regions of Bcl11b. This activation is lost when the transcription of ThymoD is blocked by insertion of a pA signal after exon two and before the CTCF-binding site and, consequently, the CTCF-binding site is methylated (Figure 3A). Therefore, the transcriptional activity has an indirect, structural effect on the regulation of Bcl11b while the ThymoD RNA is dispensable.

Modulation of gene expression by lncRNA transcription.

(A) Transcriptional activity modulates DNA methylation and thereby alters the occupancy of DNA-binding factors within the gene body, for example CTCF. The POL2 complex is indicated in violet. Black drumsticks indicate methylated CpGs, white drumsticks non-methylated CpGs. (B) LncRNA expression alters promoter (Prom.) activity by modifying, for example, acetylation of histones at TSS sites. (C) Transcription elongation can activate poised enhancers within the gene body (only acetylation shown).

A more complex situation, in which several antisense transcripts regulate their cis gene, is found at the Protocadherin alpha (Pcdhα) cluster. The variable, stochastic expression from several Protocadherin clusters provides cell-surface proteins for cellular identity recognition in the neuronal system, allowing dendrites and axons to distinguish self from other neurons. This stochastic expression is partly regulated by a distal enhancer region. The Pcdhα cluster produces three distinct variants from three alternative TSSs to achieve stochastic expression of splice variants from this cluster. The first exon of each of these variants contains an antisense lncRNA transcript (Pcdhα-as) (Canzio et al., 2019). The expression of the lncRNAs precedes the expression of the PCGs and positively regulates the expression of the nearest PCG. Mechanistically, the Pcdhα lncRNAs act similarly to the ThymoD lncRNA (above) (Figure 3A). Expression of the Pcdhα-as variants leads to the demethylation of a CTCF-binding site in the region upstream of the Pcdhα PCG, thereby allowing stable loop formation with the distal enhancer region and exerting a positive effect on PCG expression.

There are also examples of lncRNA genes that reside within another transcription unit, that of their cis target gene. Here, it is even more conceivable that their activity has an impact on the gene they are embedded in. One of the first examples was a ncRNA within the GAL10 gene cluster in the yeast Saccharomyces cerevisiae. In the absence of galactose, the TF Reb1 binds to the promoter region of GAL10-ncRNA, which lies antisense to GAL10, and fully activates its expression (Houseley et al., 2008). The transcriptional unit of GAL10-ncRNA overlaps with the TSSs of GAL10 and GAL1, leading to inhibition of the GAL10 and GAL1 genes by promoting high levels of H3K36 trimethylation (H3K36me3) and hypoacetylation at the GAL10 and GAL1 promoters. Addition of galactose to the growth medium blocks GAL10-ncRNA expression and results in hyperacetylation of the GAL10 and GAL1 promoters, leading to expression of genes that encode galactose fermenting proteins (Figure 3B).

A similar principle was shown in higher eukaryotes at the AIRN (antisense Igf2r RNA non-coding) locus. The TSS of the lncRNA AIRN is located in the second intron of the Igf2r PCG, and AIRN is transcribed antisense to Igf2r. Transcription of AIRN negatively regulates Igf2r (Santoro et al., 2013). When transcription of AIRN is blocked by a polyA (pA) insertion before the promoter of Igf2r, this negative regulation is abolished (Figure 3B). However, if the same pA signal is inserted after the promoter of Igf2r, the negative regulation of Igf2r is retained (Latos et al., 2012). These findings support the hypothesis that the transcription of AIRN across the Igf2r promoter, and not the RNA product itself, is important for the transcriptional regulation of Igf2r.

An example of a lncRNA gene whose transcription influences an enhancer is Upperhand, which is divergently expressed from the Hand2 protein-coding gene (Anderson et al., 2016). Loss of Upperhand transcription leads to a loss of histone acetylation upstream of Hand2, including at the cardiac enhancer. As a result, binding of GATA4 to its previously defined enhancer (McFadden et al., 2000) is reduced, and Hand2 expression in the heart is reduced as well. Hence, the Upperhand loss-of-function phenotype is similar to cardiac loss of Hand2 (Figure 3A). Additional mutants of Upperhand draw a more complicated picture of the role of Upperhand in activating Hand2. A complete deletion of the Upperhand transcription unit, which also encompasses all known regulatory regions of the Hand2 gene, causes loss of Hand2 5’UTR expression (Han et al., 2019). These findings assert the presence of important Hand2-activating genetic elements directly upstream of its TSS, independently of any RNA originating from this region. However, a promoter deletion of Upperhand causes a loss of its RNA while leaving all other elements in that region intact, and no effect on Hand2 expression was observed in this case. Furthermore, a deletion of the last two exons of Upperhand has a slight effect on Hand2 expression. There might be so far uncharacterized enhancer elements in the genomic region of these two exons, and their deletion may influence Hand2 expression. In addition, although the Upperhand RNA is suggested not to be required for its in vivo function, the RNA generates peptides that might be functional (van Heesch et al., 2019). These somewhat conflicting results underline the complexity of regulation of the Hand2 gene.

These examples highlight the importance of taking a careful look at the whole lncRNA locus that produces an RNA. The occurrence of an annotated regulatory element or the occupation of a genome regulating factor such as CTCF within the transcription unit can be an important indication to look for a genomic function of a lncRNA.
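The genetic logic running through the examples above (RNA knockdown versus pA insertion versus promoter or locus deletion) can be summarized as a small decision procedure. The following is a minimal sketch under stated assumptions, not an established classification scheme: the class `Perturbations`, the function `suggest_mechanism`, the field names, and the returned labels are all hypothetical, and the encoding of each locus is a simplified reading of the descriptions above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Perturbations:
    """Effect of each perturbation on the cis target gene (hypothetical encoding).

    True = target gene expression changes, False = no change, None = not tested.
    """
    rna_knockdown: Optional[bool] = None       # depletion of the mature RNA only
    early_pa_insertion: Optional[bool] = None  # pA signal near the TSS, blocks elongation
    promoter_deletion: Optional[bool] = None   # removes transcription initiation
    locus_deletion: Optional[bool] = None      # removes the whole transcription unit


def suggest_mechanism(p: Perturbations) -> str:
    """Return a coarse, illustrative hypothesis for how the locus acts in cis."""
    if p.rna_knockdown:
        return "RNA product may contribute"
    if p.early_pa_insertion:
        return "transcription through the locus (elongation)"
    if p.promoter_deletion:
        if p.early_pa_insertion is False:
            return "initiation or promoter-proximal DNA element"
        return "transcription-dependent, initiation vs elongation unresolved"
    if p.locus_deletion:
        return "DNA regulatory element within the locus"
    return "no effect detected with the tested perturbations"


# Simplified encodings of three loci described in the text (assumptions).
examples = {
    "Lockd": Perturbations(early_pa_insertion=False, locus_deletion=True),
    "ThymoD": Perturbations(early_pa_insertion=True),
    "Meteor": Perturbations(rna_knockdown=False, early_pa_insertion=False,
                            promoter_deletion=True),
}
for name, perturbations in examples.items():
    print(f"{name}: {suggest_mechanism(perturbations)}")
```

The sketch deliberately returns a single label per locus. Loci with mixed evidence, such as Upperhand, do not fit cleanly into one branch, which is exactly why the whole locus needs to be dissected experimentally.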

Gene regulation by lncRNA genes – the act of transcription is functional

The absence of a regulatory element within the transcription unit could reflect incomplete annotation or a yet unknown factor that binds there. Alternatively, the act of transcription initiation or transcriptional elongation itself may be important for the function of a lncRNA locus.

One example of such a regulation principle comes from work on the XIST lncRNA, one of the first lncRNAs to be extensively studied (Brockdorff et al., 1992). While XIST acts via the produced RNA (Brannan et al., 1990; Brown et al., 1992), the regulation of XIST, at least in part, does not. The XIST lncRNA locus is flanked by many lncRNAs, and one of them is the Ftx locus found 140 kb upstream of Xist (Chureau et al., 2002). It was initially proposed that the Ftx RNA functions to regulate XIST (Chureau et al., 2011). However, detailed analysis uncovered that the transcription of Ftx, and not the produced RNA, is important to regulate Xist (Furlan et al., 2018). Knockdown of Ftx RNA does not cause a loss of Xist expression, but deletion of the Ftx promoter, and the consequent loss of Ftx transcription, does. CRISPRi of Ftx similarly causes loss of Xist expression, suggesting that transcription of Ftx is the positive regulator of Xist expression. One possibility is that 3D genome architecture is changed by the transcriptional activity of a genomic locus (Figure 4). Strikingly, the promoters of Xist and Ftx are flanked by CTCF-occupied sites. However, deleting the CTCF-binding sites alone at the Ftx promoter has no effect on the expression level of Xist, arguing that genome folding induced by Ftx activity does not involve CTCF binding.

Alteration of genome interactions by lncRNA activity.

DNA:DNA contacts can change upon transcriptional activity of nearby, cis located lncRNA genes.

Another good example is the Chaserr lncRNA locus, which lies 16 kb upstream of the Chd2 protein-coding gene (Rom et al., 2019). Although knockdown of Chaserr RNA does cause a slight increase in Chd2 expression, additional lines of evidence indicate that the transcription of the lncRNA gene is likely the most important function of Chaserr in regulating Chd2 (Figure 4). In addition, the promoter of Chaserr interacts with the Chd2 promoter in chromosome conformation capture analysis. Upon deletion of the Chaserr promoter region, the Chd2 promoter increasingly interacts with other enhancer elements upstream. In contrast, if the gene body of Chaserr is deleted, leaving the promoter intact, these changes in enhancer/Chd2-promoter contacts are not observed. A plausible explanation is that transcription initiation rather than transcription elongation is important for regulation of Chd2 by Chaserr.

Similarly, transcription initiation is important for the PVT1 lncRNA locus. The Pvt-1 locus was originally discovered through a genomic translocation that causes the activation of the Myc oncogene (Adams and Cory, 1985). Initially, it was suggested that miRNAs embedded in the lncRNA transcript of PVT1 are important for regulation of target genes (Wang et al., 2019). It turns out that PVT1 transcription has an RNA-independent function as well. The PVT1 locus encodes several transcripts with alternative start sites. The activity of its major TSS serves as a boundary element to shield the MYC promoter from over-activation by an enhancer located within the transcriptional unit of PVT1 (Cho et al., 2018; Figure 4). Transcription initiation, but not transcription elongation, is important for this shielding capacity (Figure 4). This does not mean that the miRNAs produced by PVT1 do not serve a function, but it seems that the major activity of the PVT1 lncRNA locus, and its effect on MYC, is conveyed by the transcriptional activation of PVT1.

In addition to the Upperhand lncRNA upstream of Hand2 (see above), there are Hand2-regulating lncRNA loci downstream of Hand2. We initially characterized one such locus and termed it Handsdown, due to its location downstream of Hand2. The Handsdown locus is expressed in the same tissues as Hand2 but is most strongly expressed in the developing heart. We have shown that transcription of Handsdown is important to negatively regulate the expression of Hand2 (Figure 4). Moreover, the HAND2 TF binds two distinct sites around the TSS of Handsdown in the developing E9.5 heart (Laurent et al., 2017). This suggests that HAND2 activates its own suppressor region in a negative feedback loop to control its dosage. However, deletion of the TSS region of Handsdown, including only one of the HAND2-occupied sites, does not result in the expected upregulation of Hand2 (George et al., 2019). Multiple potential TSS regions are present at the 5’ region of Handsdown, and the deletion of one of them, or of the major TSS, can lead to the appearance of alternate transcripts (Lavalou et al., 2019). Therefore, it is plausible that the second HAND2-occupied site may be sufficient to instruct the transcription of an alternate Handsdown transcript. Hence, as long as transcriptional activity is present in the Handsdown region, Hand2 can be negatively regulated and its expression level adjusted. The dosage of Hand2 is particularly important, as loss of one copy of the Hand2 gene, as well as the gain of an additional copy, causes malformations during development (Tamura et al., 2014). In addition to these lncRNA loci flanking the Hand2 gene, further putative enhancers are predicted up- and downstream of Hand2, underlining the complex regulome of this important developmental gene.

While functions of lncRNAs on the transcript level are becoming increasingly understood, elucidating how loci whose function is based on the transcriptional level exert their effect (Table 1) is still in its infancy. While this list is not exhaustive, the number of lncRNAs that at least partially act by such a mechanism will likely increase in the future. One very promising model of how they may act is that of functional microdomains. In such a scenario, these microdomains promote the co-operativity between interacting components such as TFs, co-factors, chromatin regulators, RNA polymerase II, and non-coding RNA, thereby governing basic processes of gene regulation. Such microdomains are preferentially formed by super-enhancers, which often also generate an RNA but function on the transcriptional level. Hence, transcriptional activity itself can influence chromatin accessibility, DNA methylation, histone modification, and higher order chromatin structure.


In the last decade, chromatin structure and histone modifications have emerged as key regulators of AS. The interaction between histone modifications, chromatin-binding proteins and SFs possibly constitutes a complex network of communication between chromatin and RNA (100). Also, it was demonstrated that the chromatin context influences the RNA polymerase II (Pol II) elongation rate, which in turn affects AS (101–102). This means that epigenetic regulation not only determines which parts of the genome are expressed, but also how they are spliced (100). Several lncRNAs participate in chromatin structure determination and dynamics, which may then impact the splicing output, notably by: (i) direct interaction between lncRNAs and DNA forming heteroduplexes, (ii) recruitment of chromatin modifiers to specific loci, or (iii) shaping the 3D organization of chromatin across the cell nucleus. Finally, we discuss how lncRNA-derived small RNAs control chromatin remodeling and may determine AS patterns through this mechanism.

Splicing regulation by lncRNA-driven DNA–RNA duplexes

Circular RNAs (circRNAs) are covalently closed circular molecules of single-stranded RNA that result from a non-canonical splicing event, the so-called back-splicing. In this event, a downstream splice donor site of the pre-mRNA is ligated in reverse order to an upstream splice acceptor site, generating a circular lncRNA molecule. These circular transcripts are abundant and highly stable, and they may efficiently compete with the linear pre-mRNA for recognition by related splicing protein complexes (103). For instance, in flies and humans, the SF MUSCLEBLIND can strongly and specifically bind to the circRNA derived from its own locus, called circMbl (104). In human cells, circRNAs are dynamically modulated by the SF QKI during EMT (105), and it was shown that in human endothelial cells circRNA occurrence correlates with exon skipping throughout the genome (106). However, the molecular mechanisms involving circRNAs in animals remain largely unknown.
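The definition of back-splicing given above, a downstream donor joined to an upstream acceptor, translates into a simple coordinate test. The snippet below is a minimal sketch of that test; the function name `classify_junction` and the convention of passing a single genomic coordinate for each splice site are assumptions made for illustration and do not correspond to any particular circRNA detection tool.

```python
def classify_junction(donor: int, acceptor: int, strand: str = "+") -> str:
    """Classify a splice junction as 'linear' or 'back-splice'.

    donor    -- genomic coordinate of the splice donor (5' splice site)
    acceptor -- genomic coordinate of the splice acceptor (3' splice site)
    strand   -- '+' or '-', the strand of the transcribed gene
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if donor == acceptor:
        raise ValueError("donor and acceptor must differ")

    # In transcript orientation, "upstream" means a smaller coordinate on
    # the + strand and a larger coordinate on the - strand.
    if strand == "+":
        donor_is_upstream = donor < acceptor
    else:
        donor_is_upstream = donor > acceptor

    # Canonical splicing joins an upstream donor to a downstream acceptor;
    # back-splicing joins a downstream donor to an upstream acceptor.
    return "linear" if donor_is_upstream else "back-splice"


print(classify_junction(1_000, 5_000))        # donor upstream of acceptor
print(classify_junction(5_000, 1_000))        # donor downstream of acceptor
print(classify_junction(5_000, 1_000, "-"))   # same coordinates, minus strand
```

Real detection pipelines additionally have to handle read alignment, splice-site motifs and artifacts such as template switching, none of which is modeled here.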

It was recently demonstrated in Arabidopsis that a circRNA can modulate the AS of its own parent gene by directly interacting with the DNA, forming an RNA-DNA hybrid known as an R-loop (107). The overexpression of the circRNA from exon 6 of the SEPALLATA 3 (SEP3) gene enhances the accumulation of the naturally-occurring SEP3.3 isoform, which consists of the exon 6-skipped transcript. SEP3 belongs to the MADS-box family (named after the founder members MCM1-AGAMOUS-DEFICIENS-SRF) of DNA-binding proteins, and it is involved in flower development in Arabidopsis. The modulation of SEP3 splicing gives rise to homeotic phenotypes in the flower. Strikingly, the exon 6 circRNA is capable of generating an R-loop by direct interaction with its own genomic locus, further supporting the idea that chromatin conformation plays a major role in splicing pattern determination (Figure 4i). Genome-wide characterization of R-loops will be needed to assess how widely this mechanism occurs throughout the Arabidopsis genome (108).

Long noncoding RNAs as chromatin remodelers. (i) Exon 6 of the SEP3 gene is transcribed and back-spliced into a circular RNA. The SEP3 circRNA directly interacts with the DNA of its parent gene, forming a DNA–RNA duplex known as an R-loop and promoting exon 6 skipping, and thus the accumulation of the SEP3.3 mRNA isoform. (ii) The antisense transcript of the FGFR2 gene, called asFGFR2 (in red), recruits the PRC2 proteins EZH2 and SUZ12 to its parent locus, triggering the deposition of H3K27me3 and the recruitment of the H3K36 demethylase KDM2a. This complex keeps H3K36me3 levels low and impairs the binding of the chromatin-splicing adaptor complex MRG15–PTB to exon IIIb, which is finally included in the mature mRNA (in green). (iii) NEAT1 and MALAT1 bind to common and distinct actively transcribed loci across the genome. Their binding patterns along the gene body differ, e.g. NEAT1 binding peaks at the transcription start site as well as at the end of the locus, whereas MALAT1 preferentially binds only at the end of the gene. It was proposed that MALAT1 and NEAT1 promote the formation of splicing-related nuclear speckles and paraspeckles, respectively, around the transcription sites of targeted loci.

Long noncoding RNAs as recruiters of chromatin remodelers

An example of cell-specific AS mediated by a lncRNA was linked to the antisense transcript called asFGFR2. The lncRNA asFGFR2 is generated from the human FGFR2 locus and induces epithelial-specific AS of FGFR2 by promoting chromatin modifications at its own locus. It was proposed that asFGFR2 recruits chromatin modifiers specifically to this locus, perhaps via RNA-DNA heteroduplexes. Interestingly, chromatin pulldown of a biotinylated asFGFR2 RNA showed that upon its overexpression, asFGFR2 was targeted to the FGFR2 locus precisely around the differentially spliced intron (109). In epithelial cells, asFGFR2 was found to recruit chromatin modifiers like the Polycomb-group proteins and the H3K36 demethylase KDM2a to the FGFR2 locus. As a result, it generates a chromatin environment that prevents binding of inhibitory splicing regulators and favors exon IIIb inclusion (109). Polycomb-related proteins and KDM2a are differentially recruited along FGFR2 in a cell type–specific manner, correlating with FGFR2 splicing outcome. The hypothesis of a direct role of H3K27me3 and the PRC2 component EZH2 in FGFR2 AS was ruled out by expressing EZH2 in cells after knockdown of KDM2a. EZH2 failed to induce exon IIIb inclusion in the absence of KDM2a, indicating that PRC2 promotes exon IIIb inclusion by maintaining low H3K36me2/3 levels via recruitment of KDM2a. This epigenetic landscape impairs binding of the chromatin-splicing adaptor complex MRG15–PTB, which normally inhibits the inclusion of exon IIIb. The antagonistic effect of H3K36me3 and H3K27me3 on FGFR2 splicing points to a lncRNA-mediated cross-talk between these histone modifications (109) (Figure 4ii).

The aforementioned splicing-related lncRNA MALAT1 was also identified as a key regulator of the methylation status of the Polycomb 2 protein (Pc2), impacting chromatin conformation (110). It was reported that the methylation/demethylation of Pc2 determines the relocation of growth control genes between Polycomb bodies (PcGs) and interchromatin granules (ICGs). This behavior is governed by the binding of methylated and unmethylated Pc2 to two lncRNAs, TUG1 and MALAT1, located in PcGs and ICGs, respectively. TUG1- and MALAT1-associated proteins were identified by pull-down using biotinylated RNAs followed by mass spectrometric analysis. This approach revealed that MALAT1 RNA binds not only to pre-mRNA SFs, but also to transcriptional co-activators and histone methyltransferases/demethylases associated with active histone marks, whereas TUG1 RNA specifically binds to a number of proteins involved in transcriptional repression, including histone methyltransferases/demethylases and chromatin modifiers. It transpired that these lncRNAs mediate the assembly of multiple co-repressors/co-activators, and can alter the histone marks read by Pc2 in vitro. Additionally, binding of MALAT1 to unmethylated Pc2 promotes SUMOylation of the transcription factor E2F1, leading to activation of the growth control gene program. Therefore, MALAT1 also participates in the modulation of the chromatin remodeling environment by selectively interacting with chromatin modifier proteins (110). As there is currently no evidence of a direct link between MALAT1- or TUG1-mediated chromatin modulation and splicing, future studies will be needed to determine whether the chromatin-related function of these lncRNAs may consequently affect AS of target genes.

Long noncoding RNAs shape the three-dimensional genome organization

The molecular pathway linking the actions of subnuclear structure-specific lncRNAs, such as TUG1 and MALAT1, and non-histone protein methylation to the spatial relocation of transcription units in the nucleus hints at a role for lncRNAs in the dynamic 3D configuration of the genome in the cell nucleus. Nuclear spatial organization and chromatin 3D modulation by lncRNAs (37, 111–116) as well as AS modulation by chromatin modifications (for review see (100, 117)) have long been described in mammalian cells as well as in plants.

The splicing-related lncRNAs NEAT1 and MALAT1 were used as baits to map their binding sites across the human genome (118) by Capture Hybridization Analysis of RNA Targets (CHART (119)). Strikingly, NEAT1 and MALAT1 localize to hundreds of loci in human cells, primarily on actively transcribed genes. Many of these loci were co-enriched in NEAT1 and MALAT1 CHARTs, although displaying distinct gene body binding patterns, suggesting independent but complementary functions for both lncRNAs. CHART followed by mass spectrometry was also performed to identify NEAT1 and MALAT1 interactors, revealing common nuclear speckle and paraspeckle components. The elucidation of ribonucleoprotein complexes further supports complementary binding and functions exerted by both lncRNAs. The dynamic interactions between nuclear speckles and gene bodies indicate that speckles may serve as a concentrated reservoir of SFs that shuttle to transcribed genes (120, 121). Considering that speckles frequently localize in the vicinity of actively transcribed genes undergoing co-transcriptional splicing (122, 123), it was proposed that nuclear bodies may be organized around genes regulated by NEAT1 and MALAT1 (118). The previous elucidation of the 3D chromatin organization of human cells indicated that the MALAT1 and NEAT1 genomic loci are located in close proximity in the nucleus (124). According to this model, NEAT1 and MALAT1 could shape the structure of nuclear bodies at highly transcribed loci, as NEAT1 also participates in the organization of paraspeckle formation around its site of transcription (111, 112) (Figure 4iii). Alternatively, NEAT1 and MALAT1 may serve as scaffolds, like Xist or HOTAIR (119, 125, 126), bringing proteins that also interact with components of nuclear speckles and paraspeckles together with RNA and/or DNA binding proteins. This model considers lncRNAs to act as molecular bridges between specific chromosomal locations and nuclear speckles and paraspeckles (118).

Small RNAs in the interplay between splicing and chromatin compaction

Small ncRNAs (smRNAs) derived from lncRNA precursors act as small molecules of <50 nt. Since the discovery of RNA-mediated gene silencing by small interfering RNAs (siRNAs) ( 127, 128), other classes of small RNAs with multiple functions have been identified, such as microRNAs (miRNAs) ( 129), small RNA fragments derived from tRNAs (tsRNAs) and small RNA fragments derived from small nucleolar (sno)RNAs (sdRNAs, sno-derived RNAs) ( 130, 131). It has been demonstrated that they regulate gene expression by multiple mechanisms, such as targeting mRNA cleavage, translational or transcriptional repression, decoys of mRNAs or through the generation of other secondary smRNAs ( 132–134) and protein sequestering or titration ( 135). During transcriptional gene silencing, siRNAs trigger heterochromatin formation at DNA target sequences. Various studies in plants and yeast have reported a relationship between splicing and silencing mediated by smRNA-directed heterochromatin formation. In particular, several mutants in SF-encoding genes also turned out to be impaired in the silencing of certain genes ( 136–140). In Schizosaccharomyces pombe, mutation in either of the two SF-encoding genes Cwf10 or Prp39 was found to reduce centromeric siRNA accumulation and to increase the levels of repeat transcripts such as dg and dh ( 136). In the same way, mutation of a single nucleotide in the U4 snRNA gene impairs centromere silencing ( 137). In both cases, some SFs were found to facilitate siRNA production to modulate heterochromatin formation and induce centromere silencing. Similarly, in Arabidopsis the SF SR45 was found to be involved in de novo methylation by the RNA-directed DNA methylation (RdDM) pathway. In fact, the sr45 mutant showed decreased siRNA levels and DNA methylation at transgenic FWA (FLOWERING WAGENINGEN), together with the associated late-flowering phenotype ( 138).
Another example is the Arabidopsis SMALL NUCLEAR RIBONUCLEOPROTEIN D1 (SmD1), which was proposed to play a role in splicing in plants since a mutant exhibits altered AS of certain genes. This protein was also found to facilitate post-transcriptional gene silencing (PTGS) by protecting transgene aberrant RNAs from degradation by the NMD pathway. As a result, enough template is provided for siRNA production, establishing a link between aberrant RNA and AS ( 141). Apart from these examples of interaction between the splicing machinery and smRNA-directed heterochromatin formation, recent studies have also pointed to a role for smRNAs in regulating AS through chromatin remodelling. It was shown in humans, flies and worms that nucleosome density is higher over exons than introns, suggesting that nucleosome positioning defines exons at the chromatin level ( 142–146). The chromatin context influences the RNA polymerase II (Pol II) elongation rate, which in turn affects AS ( 100–102). Rapid transcription favors exon skipping whereas slower transcription stimulates the use of weak splice sites of variant exons promoting the intron inclusion process or other alternative sites for splicing reactions ( 146, 147). For instance, the FIBRONECTIN 1 gene (FN1) produces different protein isoforms through AS of exon extra domain I (EDI). In hepatoma and HeLa cells, exogenously applied siRNAs targeting gene sequences located close to the EDI alternative exon were shown to induce a heterochromatic state at the site, which affects Pol II elongation efficiency and mediates AS of EDI ( 148). This regulation was found to be dependent on ARGONAUTE 1 (AGO1), a crucial actor in RNA silencing that binds siRNAs to recognize their target RNAs.
Another example of AS mediated by siRNAs is the inclusion of exon 18 of the NEURAL CELL ADHESION MOLECULE (NCAM) gene, which is regulated by heterochromatin marks after differentiation of mouse N2a neural cells. This process could also be induced by exogenous application of exon-targeted siRNAs in undifferentiated N2a cells ( 149). These examples suggest that siRNAs could regulate AS through the modulation of heterochromatin at specific sites in order to fine-tune the Pol II elongation rate. Beyond the mechanistic implications of exogenously applied siRNAs targeting specific alternative exons, a genome-wide approach hinted at the potential relevance of this mechanism in physiological conditions ( 150). Remarkably, purification of AGO1 and AGO2 chromatin-associated complexes revealed their interaction with SFs. Furthermore, approximately one-third of smRNAs loaded in these AGO1 and AGO2 complexes align specifically with 3′ ends of introns, the intron–exon junctions ( 150). These observations suggest that these intron-related smRNAs and the RNAi protein machinery could have a function in AS regulation. Moreover, genome-wide exon arrays on embryonic fibroblasts of Ago2- or Dicer-null mice showed that they have similar altered AS events ( 150). In this work, the CD44 gene was taken as a model to characterise the underlying mechanism because several of its alternative exons can be highly included by a phorbol-12-myristate 13-acetate (PMA) treatment ( 150, 151). In mammalian cells treated with PMA, smRNAs recruited AGO1 and AGO2 to the transcribed regions of CD44 in a Dicer- and HP1 (HETEROCHROMATIN PROTEIN 1)-dependent manner, which increased local H3K9me3 levels at the region corresponding to the variable exons. Subsequently, the AGO proteins facilitate spliceosome recruitment and modulation of Pol II processivity in order to shape CD44 AS ( 150), further suggesting an involvement of smRNAs in splicing regulation.
Heterochromatin regulation by smRNAs is found in a wide range of organisms, including yeast, plants and mammals. Therefore, it is possible that the interplay between splicing and the smRNA/heterochromatin pathway is a widespread mechanism in eukaryotes to rapidly regulate splicing and AS rates of specific genes throughout growth and differentiation.


Drought is one of the vital factors limiting crop productivity and survival. Due to the ongoing global climate change, more and more research has focused on understanding the mechanisms of how crops resist drought stress and improve their resistance level [1,2,3,4,5,6,7,8]. Plants sense drought signals and produce second messenger substances, such as Ca 2+ , phosphatidylinositol and reactive oxygen species (ROS) [9, 10], while causing an increase in intracellular calcium ion concentration, initiating a cascade network of protein phosphorylation pathways. Finally, the target proteins are directly involved in the protection of cells, or regulate the expression of a series of specific stress-related genes through TFs (MYC/MYB, ABF, CBF/DREB, bZIP, etc.), thereby protecting the cells and improving the resistance of plants to adversity [11,12,13]. Although rapid developments in modern molecular biology have gradually uncovered the molecular mechanisms of plant drought resistance, developing drought-resistant plants to cope with drought stress will remain a substantive challenge in the future.

Long non-coding RNAs (lncRNAs) are RNA transcripts that are more than 200 nucleotides in length and have no or limited protein-coding ability [14,15,16]. A growing body of evidence has shown that lncRNAs regulate gene expression at the epigenetic, transcriptional and post-transcriptional levels [17,18,19,20,21,22,23,24,25]. With the advent of next-generation sequencing technologies and bioinformatics approaches, many lncRNAs have been discovered in model plants, such as Arabidopsis [26,27,28,29], wheat [30], maize [31,32,33] and rice [34], indicating that lncRNAs play an important role in various biological processes of plant development and stress response. Recent research has confirmed that lncRNAs respond to abiotic stresses [31, 35, 36], including drought stress. For example, in maize, 664 transcripts were confirmed as drought-responsive lncRNAs, 8 of which were shown to be precursors of miRNAs [31]. In Populus trichocarpa, 2542 lncRNA candidates were identified under drought stress, 504 of which were found to be drought-responsive [37]. In Arabidopsis, the expression of 1832 lncRNAs changed after 2 h and/or 10 h of drought, cold, high-salt, and/or abscisic acid (ABA) treatments [29]. In rice, pre-miRNA expression profiling indicated that miR171f is involved in the progression of rice root development and growth, as well as the response to drought stress [38]. In cotton, the long intervening/intergenic noncoding RNAs (lincRNAs) XLOC 063105 and XLOC 115463 were involved in the drought stress response by regulating neighboring genes [39]. Furthermore, 19 lncRNAs (17 lincRNAs and 2 natural antisense transcripts (NATs)) in foxtail millet responded to polyethylene glycol-6000 (PEG)-induced drought stress, and only one of these drought-responsive lncRNAs had synteny with its sorghum counterpart [40]. Qin et al. (2017) identified an Arabidopsis lncRNA, drought-induced lncRNA (DRIR), which is significantly activated by drought and salt stress as well as by ABA treatment [41]. In addition, 318 lncRNAs responsive to cold and/or drought stress were identified in cassava; these are associated with hormone signal transduction, biosynthesis of secondary metabolites, and the sucrose metabolism pathway [42]. Additionally, numerous lncRNAs involved in the regulation of gene expression in response to stress have been identified and characterized in Brassica [43,44,45,46]. In Chinese cabbage (Brassica rapa ssp. chinensis), 4594 putative lncRNAs were identified as responsive to heat stress, 25 of which were co-expressed with 10 heat-responsive genes [47]. In Brassica rapa L., the expression of 549 lncRNAs was significantly altered in response to cold treatment, and short-term cold treatment induced NATs at the BrFLC and BrMAF genes, which are involved in vernalization [48]. Summanwar et al. (2019) identified 530 differentially expressed lncRNAs from the roots of clubroot-susceptible and -resistant Brassica napus lines. Twenty-four differentially expressed lncRNAs were identified from chromosome A08, which has been reported to confer resistance to different P. brassicae pathotypes [49]. In Brassica juncea, 1614 lncRNAs were differentially expressed in response to heat and drought stress, and some were co-expressed with TFs involved in the abiotic stress response [50].

Rapeseed (Brassica napus L.) is an important oilseed crop in the world [51]. It is vulnerable to drought, which influences the production of rapeseed substantially [52,53,54]. Although many lncRNAs have been found in different plant species, indicating that lncRNAs can play an important role in response to abiotic stresses, a genome-wide identification and characterization of responses of lncRNAs to drought stress and rehydration treatments is still lacking, especially in B. napus. In order to further understand the molecular mechanisms of the response of B. napus to drought stress and re-watering, we compared changes in transcriptome between Q2 (a drought-tolerant genotype) and Qinyou8 (a drought-sensitive genotype) in response to drought stress and rehydration treatments at the seedling stage, and identified the lncRNAs involved in drought stress and rehydration treatments. The present study used a co-expression-based method, in which lncRNA functions were predicted, based on the functions of their co-expressed protein-coding genes [55]. Therefore, the lncRNA-mRNA co-expression network was constructed for pathway enrichment analysis. Moreover, the lncRNA-mRNA co-expression network of plant hormone signal transduction was analyzed to further explore the potential roles of differentially expressed lncRNAs in response to drought stress and re-watering.
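The co-expression idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which correlation measure or cut-off was used, so the Pearson coefficient, the 0.9 threshold, the function names and the toy expression values below are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(vx * vy)

def coexpression_edges(lnc_expr, mrna_expr, threshold=0.9):
    """Return (lncRNA, mRNA, r) edges whose |r| meets the threshold.

    The threshold is an assumed value, not one taken from the study.
    """
    edges = []
    for lnc, lx in lnc_expr.items():
        for gene, gx in mrna_expr.items():
            r = pearson(lx, gx)
            if abs(r) >= threshold:
                edges.append((lnc, gene, round(r, 3)))
    return edges

# Hypothetical expression values across four samples
lnc_expr = {"lnc_A": [1.0, 2.0, 3.0, 4.0]}
mrna_expr = {
    "gene_X": [2.0, 4.0, 6.0, 8.0],  # rises with lnc_A
    "gene_Y": [5.0, 1.0, 4.0, 2.0],  # weakly anti-correlated
}
edges = coexpression_edges(lnc_expr, mrna_expr)
```

The protein-coding genes attached to each lncRNA by such edges would then be the input for pathway enrichment, with the lncRNA's putative function inferred from the enriched terms.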


Considerations when interpreting phenotypes resulting from lncRNA mutation

The design of functional experiments should be guided by the essential RNA biology of the chosen lncRNA locus: its proximity to protein-coding genes, its chromatin signatures, stability, copy number, full-length transcript models and tissue expression profiles. If it shares a bidirectional promoter then minimise interference with the adjacent locus when designing targeting strategies. If more abundant and stable, with promoter-like chromatin marks at its transcriptional start site, then consider whether the lncRNA acts in trans in an RNA dependent manner.

Consider all available transcript, regulatory element and evolutionary evidence when designing mutations.

Consider whether, contrary to initial expectations, the lncRNA encodes protein or, as for H19, harbours a miRNA.

Choice of loss-of-function strategy and prediction of whether the lncRNA acts in cis or in trans should be informed by its cytoplasmic, nuclear or chromatin localisation. If found in the cytoplasm, consider whether it is, in fact, translated. If chromatin-associated consider whether it acts in cis. In contrast, if cytoplasmic or nucleoplasmic, consider whether it is trans-acting.

Choose cells for functional experiments in which the lncRNA is relatively highly expressed, certainly at greater than one molecule per cell.

Minimise genomic sequence disruptions when investigating lncRNA or lncRNA locus function. Use control manipulations to distinguish disruptions influencing flanking genes from those influencing the lncRNA.

Investigate each locus using multiple complementary strategies, for example introduction of minimal targeted DNA deletions, inversions or disruptions and, separately, of transcriptional truncation cassettes. Consider using controls for genetic manipulations of lncRNA loci: inverting the truncation cassette where possible, using a mutated truncation cassette, using a different type of truncation cassette, and using different sites to truncate the lncRNA. It is important to remove any selection cassettes and to consider the influence of reporter genes and loxP sites on the locus. Fully describe the mutated locus, including whether the selection cassette is retained.

Assay biological replicates separately. Embryonic stem (ES) or induced pluripotent (iPS) cells frequently vary in their differentiation kinetics, especially after undergoing gene targeting and selection, and mouse embryos, particularly early implantation stage mouse embryos, show considerable variation in developmental timing. Similarly, cancer cell lines are inherently genetically unstable. This variability makes it essential to study multiple clones of cells or independently derived mutants to ensure that the effects observed are due to the mutation of interest, and not dependent on other effects of the genetic background. This is especially important when the phenotypic effects are subtle.

Assessment of evidence for lncRNA functionality

Consider the evidence for each of the many known transcriptional or post-transcriptional, nuclear or cytoplasmic, cis or trans, RNA-dependent or -independent mechanisms of lncRNAs.

Employ RNAi-based techniques principally when investigating cytoplasmic RNAs and post-transcriptional RNA-dependent mechanisms. If using RNAi, the knockdown effect on the cytoplasmic and nuclear compartment should be determined separately. An alternative is to use antisense DNA oligos to induce an RNase H activity in the nucleus.

Only claim that a phenotype is caused by alteration of a trans-acting lncRNA transcript when it is successfully and repeatedly rescued upon expression of the lncRNA from an independent transgene.

Take advantage of carefully controlled biochemical approaches when assessing the potential function of a lncRNA.

Publications and reporting

Assess and report objectively all evidence for or against RNA sequence-dependent function or transcription-dependent (RNA sequence-independent) function.

Report phenotypes precisely. Commonly, gene knockouts kill embryos at critical periods, for example at implantation, at gastrulation, at 12.5 dpc when the cardiovascular system becomes essential, and at birth when lungs and many other systems become essential. In general the maternal organs rescue many organ defects of the embryo. For ES cells, phenotypes affecting pluripotency need to be defined and should be considered with caution due to the inherent instability of this state.

Explicitly caution when evidence for RNA-dependent vs -independent function, or trans- vs cis-acting function, is not clear-cut.

In vivo, loss-of-function strategies

Different genetic loss-of-function strategies can be employed in vivo to study the function of lncRNAs (Figure 2). Prioritisation of strategy should depend on the lncRNA's known biology, including its localisation to one or more of the cytoplasm, nucleus or chromatin. In one study, the majority of human lncRNAs were enriched in the cytoplasm (van Heesch et al., 2014) and these may associate with ribosomes and, contrary to expectations, some may be translated (Guttman et al., 2013; Kim et al., 2014; Wilhelm et al., 2014). Nuclear lncRNAs, particularly those that are chromatin-associated, could act as cis-acting transcriptional regulators, whereas cytoplasmic or nucleoplasmic lncRNAs might be predicted to function in trans. By contrast, some nucleoplasmic lncRNAs may of course be non-functional products of transcription.
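As a rough illustration, the localisation-based reasoning above can be written as a small lookup. This is a sketch only: the compartment names, the hypothesis labels and the function `suggest_hypotheses` are hypothetical and simply restate the heuristics in the text; they are not an established classification scheme.

```python
def suggest_hypotheses(localisation):
    """Map observed compartments to the hypotheses worth testing first.

    `localisation` is a list of compartments in which the lncRNA was detected.
    The rules paraphrase the heuristics in the text and are not exhaustive.
    """
    rules = {
        "chromatin": ["cis-acting transcriptional regulator"],
        "nucleoplasm": ["trans-acting", "non-functional product of transcription"],
        "cytoplasm": ["trans-acting", "ribosome-associated, possibly translated"],
    }
    hypotheses = []
    for compartment in localisation:
        for hypothesis in rules.get(compartment, []):
            if hypothesis not in hypotheses:  # keep order, drop repeats
                hypotheses.append(hypothesis)
    return hypotheses

# A transcript detected in both nucleoplasm and cytoplasm
candidates = suggest_hypotheses(["nucleoplasm", "cytoplasm"])
```

The output is a starting list of hypotheses to prioritise, not a conclusion; each one still needs the complementary loss-of-function and rescue experiments discussed in this section.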

Figure 2. Different strategies for analysis of lncRNA loss-of-function. Strategies that have been used to alter lncRNA function are described pictorially, with the wild type situation on the top-most line.

The lncRNA locus is indicated in pink, neighbouring protein-coding gene in blue, transcription factor binding sites within it by blue and purple ovals, transcriptional terminator sequences in yellow (‘Term’) and the process of transcription by grey dotted lines. Antisense oligonucleotides are able to bind to nascent RNA transcripts and trigger RNase H mediated degradation of the transcript in the nucleus. RNAi is elicited by short RNA species that bind to argonaute proteins (Ago, green oval) within the cell. This complex recognises complementary lncRNA molecules in the cytoplasm, and triggers their destabilisation by the endogenous cellular machinery. The CRISPR and TALE systems use designer DNA binding factors to recruit repressor or activator domains (orange oval) to the lncRNA to affect transcriptional initiation. The effects of each strategy upon the process of transcription and presence of underlying DNA elements such as transcription factor binding sites are indicated. The possibility of generating stable transgenic animals to investigate phenotypes throughout development is also noted.

Depletion of protein-coding transcripts is often achieved using RNAi-based techniques, which supply double-stranded RNA that is able to trigger post-transcriptional destabilisation of the mature mRNA and inhibit translation, predominantly in the cytoplasm. Although the presence of active RNAi factors in human cell nuclei has been proposed (Gagnon et al., 2014) the extent to which exclusively nuclear lncRNAs can be knocked down remains unclear. Whilst useful for studies of many trans-acting lncRNAs, RNAi-based knockdown acts post-transcriptionally, and therefore does not block the act of transcription, precluding analyses of lncRNAs which may produce their effects via this mechanism.

Another experimental approach is to genetically manipulate the lncRNA locus. When inserting transcriptional terminator sequences care must be taken to control for changes in spacing between DNA regulatory elements and to take account of regulatory elements that may be inadvertently inserted, such as promoters of resistance genes, since these may be able to drive expression of neighbouring genes or divert activities from nearby enhancers. Insertion of exogenous sequences can induce phenotypes (Steshina et al., 2006). Even single loxP sites can attract germline methylation that might potentially repress flanking regulatory elements (Rassoulzadegan et al., 2002). Extra controls are thus needed to identify possible gain-of-function effects arising from inserted sequences, such as reporters or selection cassettes. The advent of programmable nucleases (Kim and Kim, 2014) provides opportunities to investigate these possibilities. Transcriptional terminator sequences can vary in their efficacy depending on the genomic context into which they are inserted, which can cause termination to be highly inefficient. For example, a sequence that efficiently terminates transcription in multiple contexts in Airn, failed to do so when inserted close to a CpG island (Latos et al., 2012).

Other approaches include deletion of the full-length lncRNA locus or its promoter sequence, mutation of putative functional domains or targeted interruption between the promoter and the RNA sequence through an engineered inversion (Figure 2; Table 1). Whilst useful, such strategies may not always be successful. Promoter inversion, for instance, may not always abrogate transcription, because of the bidirectionality of promoters (Wu and Sharp, 2013), and promoter deletion may also disrupt the expression level of protein-coding transcripts with which lncRNAs share a bidirectional promoter. In all of these cases, it is important to minimise the removal or reorganisation of regulatory factor binding sites or other regulatory elements within the DNA, and to control for the addition of novel binding sites. For example, it should be borne in mind that many lncRNAs initiate within enhancers (Marques et al., 2013) and in these cases disruption of the lncRNA promoter could also cause unintended changes in gene expression. In the case of transcription terminators, to ensure effects are due to changes in RNA rather than DNA, inversions of the terminator sequence or a variety of different terminators can be used. In the experimental design it is also important to consider alternatively spliced transcripts and additional transcriptional start sites to ensure full abrogation of lncRNA expression.

Antisense oligonucleotides might provide an alternative technique for analysis of lncRNA function. They are thought to act by forming a DNA/RNA hybrid with the nascent RNA transcript, and triggering RNase H-dependent degradation of the RNA in the nucleus (Figure 2). This reduces the level of the RNA before the mature transcript is produced, but the nature and extent of off-target effects are not fully understood and may be substantial (Sahu et al., 2007). Also, it is not possible to generate stable transgenic lines, which restricts analysis to cell lines or to systems where the oligonucleotides can be supplied by injection. Other approaches to disrupting lncRNA function use morpholino antisense oligos targeting e.g. splice sites (Ulitsky et al., 2011), or locked nucleic acid antisense oligonucleotides (Sarma et al., 2010).

Recent developments in rational design of DNA binding factors using transcription activator-like effector (TALE) proteins or the clustered regularly interspaced short palindromic repeats (CRISPR) system have enabled recruitment of transcriptional activation (Cheng et al., 2013) or repression domains (Cong et al., 2012; Gilbert et al., 2013) to defined sites within the genome to modulate transcription, or to directly interfere with the passage of the RNA polymerase. These techniques could be used to modulate the rate of transcriptional initiation or elongation of the lncRNA (Figure 2), but care must be taken to control for direct effects of these factors on the transcription of neighbouring genes.

Separating RNA- from DNA-sequence dependent effects

Deletion of a lncRNA genomic locus does not cleanly separate a role of the lncRNA per se from a role of other functional elements contained within the underlying DNA. Such elements might be irrelevant to the lncRNA's function, yet critical to the normal function of a neighbouring protein-coding gene. Eighteen mouse knockout lines were recently described in which genomic regions containing intergenic lncRNA loci (21.6 kb mean size, 4.8 kb–49.7 kb range) were deleted and replaced by a lacZ reporter cassette (Sauvageau et al., 2013). For 13 of these lines no overt phenotypes were reported. In contrast, strong phenotypes were observed in 5 knockout lines: Peril −/− or Fendrr −/− mice have reduced viability; Mdgt −/− and linc-Pint −/− mice show growth defects; and linc-Brn1b −/− mice exhibit abnormal cortical anatomy. The authors conclude that these developmental disorders generated by DNA deletions demonstrate the critical roles that lncRNAs play in vivo (Sauvageau et al., 2013).

While this may be the correct interpretation, the strong phenotypes observed in these lines may derive from the engineered deletion of cis-regulatory DNA elements lying within these large DNA deletions that are critical for the normal functions of proximal protein-coding genes. For instance Fendrr is 1.4 kb from Foxf1, and Mdgt starts only 84 bp from the 5′ exon of Hoxd1 and terminates close to Hoxd3 (Figure 3). Consistent with this notion, data from the ENCODE project indicate that the genomic region deleted in Mdgt −/− lines contains binding sites for several transcription factors and chromatin regulatory proteins (Figure 3). Whilst the authors detected no global change of neighbouring protein-coding gene expression as assessed by limited RNAseq of tissues, it is still possible that altered cell type or developmental stage specific expression of these genes escaped detection. LncRNAs are often transcribed in a highly restricted cell population and a global, high-throughput analysis of even the full embryo may not have been informative. Ultimately, the best evidence for RNA-dependent lncRNA function derives from loss-of-function, followed by complementation approaches, as for example described in Grote et al. (2013).

Figure 3. Human and mouse ENCODE data indicate that Mdgt −/− lines contain deletions of conserved binding sites for transcription factors and chromatin regulatory proteins.

The engineered deletion in mouse, and its equivalent sequence in human, are indicated by red rectangles, and span 85% (12.4 kb of 14.7 kb) of the intergenic sequence between mouse Hoxd1 and Hoxd3. Mdgt virtually shares its start site with Hoxd1, a gene expressed with exquisite specificities in only a few cell populations during early development (Zakany et al., 2001). Predicted transcription factor binding sites (TFBs) that are conserved in human, mouse and rat are shown against the human genome (Consortium, 2012; Ernst and Kellis, 2012). Numbers of experimentally-determined TFBs per genomic interval are shown in the histogram, and clusters of DNase 1 hypersensitivity sites are also shown aligned against the human locus. Predicted CpG islands acquired from the UCSC Genome Browser are shown in green, and chained human-mouse alignments are shown in olive green. Evolutionary conservation (GERP) scores are indicated below the mouse locus.

This issue is also relevant for other lncRNAs transcribed from within Hox gene clusters. In the case of Hotair (Rinn et al., 2007), a deletion of several kb removing the entire Hotair genomic DNA in vivo induces a subtle morphological phenotype in the spine, which was interpreted as a gain-of-function of Hoxd genes in trans (Li et al., 2013). However, Hotair is embedded in the HoxC gene cluster and topological modifications or re-arrangements in such a dense series of transcription units are likely to modify the expression of neighbouring genes. Further insights have been acquired by removing the entire HoxC locus, including both the lncRNA locus and flanking genes (Suemori and Noguchi, 2000; Schorderet and Duboule, 2011). Even when multiple alleles are available, as for Hotair, lncRNA function remains difficult to evaluate.

Expression specificity and allelic series

Deletion of the mouse Hotair lncRNA also induced a subtle developmental phenotype in the wrist (Li et al., 2013). However, because murine Hotair transcripts were not detected in developing forelimb buds (Schorderet and Duboule, 2011) it remains possible that this phenotype develops from a lack of Hotair RNAs during subsequent stages of wrist development. This possibility could only be assessed by further analysis of the expression pattern of this lncRNA. The systematic introduction of a reporter cassette into lncRNAs (Sauvageau et al., 2013) can help solve this problem, provided the difference between the stability of the reporter staining and the half-life of the RNA is kept in mind, in particular for small and dynamic cell populations (Zakany et al., 2001).

As for protein-coding genes, an exhaustive description of functional traits associated with a particular lncRNA cannot be achieved by using a single mutant allele, hence allelic series are necessary. As indicated above, the nature of the alleles required to assess the function of a given lncRNA depends upon its genomic location and its expression specificity during development and adulthood. This can be quite challenging, as exemplified by the bidirectional Hotdog and Twin of hotdog lncRNAs: even though these RNAs are located hundreds of kb distant from the HoxD gene cluster in the middle of a gene desert, their shared start site physically interacts with Hoxd genes as part of a general regulatory structure. In this case, a cis-effect could in principle be evaluated by separating the lncRNA loci from the HoxD cluster via a large inversion with a breakpoint in-between. It turns out, however, that this inversion globally disrupts the regulation of HoxD by displacing long-range acting enhancers along with the lncRNA loci, making interpretation difficult (Delpretti et al., 2013).

Discrepancies between different strategies

The lncRNA Fendrr has been studied using two independent strategies: genetic deletion (Sauvageau et al., 2013) and transcriptional terminator insertion (Grote et al., 2013). Whilst both studies describe a lethal phenotype, highlighting the potential importance of this lncRNA in development, the outcomes differ. Genetic deletion results in lung maturation and mesenchymal differentiation defects (Sauvageau et al., 2013), whilst terminator insertion leads to heart and body wall defects and to effects on the expression of the neighbouring Foxf1 gene (Grote et al., 2013). Importantly, the defects caused by terminator insertion were rescued by a transgene containing a single wild type copy of the Fendrr lncRNA locus (without its functional Foxf1 neighbour); this strongly implicates loss of the RNA product, rather than of its genomic DNA, as causing the observed phenotypes (Grote et al., 2013). Transgene rescue experiments are thus crucial for establishing RNA-dependent lncRNA function. An earlier successful illustration of this principle was the rescue of developmental defects in zebrafish by co-injection of spliced RNA for each of two lncRNAs, cyrano and megamind, whose precursor RNAs had been knocked down using morpholino antisense oligos (Ulitsky et al., 2011). However, regulatory sequences necessary for the transcription of the lncRNA itself should ideally be included in the rescue construct so as to maintain physiological levels of expression. This, added to the length of lncRNAs that can sometimes reach several hundred kb, may represent a challenge for a transgenic approach.

Substantial differences have also been observed between RNAi-mediated knockdown and transcriptional terminator insertion at the Evf-2 lncRNA locus (Feng et al., 2006; Bond et al., 2009; Berghoff et al., 2013; Kohtz, 2014). This lncRNA is transcribed across an enhancer element between the Dlx5 and Dlx6 genes, and initial studies in cell culture using RNAi suggested a model whereby Evf-2 was important for activation of Dlx5/6 (Feng et al., 2006). However, transcriptional terminator insertion in mice has shown the opposite effect on expression of Dlx5/6 (Bond et al., 2009) and causes specific changes in DNA methylation at the enhancer. Importantly, these changes can be rescued by Evf-2 expression from a separate transgene, implying that they are dependent on the lncRNA itself (Berghoff et al., 2013).

Similarly, knockdown of lincRNA-p21 by RNAi originally suggested a trans-acting mechanism, in which the lncRNA was involved in recruiting protein complexes to chromatin (Huarte et al., 2010). Subsequent studies, in which the promoter of the lncRNA was deleted or its transcription was blocked by antisense oligonucleotides, have instead highlighted a different role: this lncRNA regulates the adjacent p21 gene in cis, without having trans-acting effects (Dimitrova et al., 2014). Whilst both studies used RNA-seq to analyse the effect of lncRNA depletion on global gene expression in mouse embryonic fibroblasts, the two sets of differentially expressed genes did not overlap significantly. When analysing lncRNA function, it is thus important to consider multiple loss-of-function strategies that address multiple mechanisms of action.
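Whether two sets of differentially expressed genes overlap more than expected by chance is commonly assessed with a hypergeometric test. The following Python code is a minimal sketch of that calculation, not the analysis performed in either study; the function name, the gene identifiers and the size of the gene universe are all hypothetical and chosen only for illustration.

```python
from math import comb


def overlap_p_value(n_universe, n_a, n_b, n_overlap):
    """Probability of seeing at least n_overlap shared genes by chance.

    n_universe: number of genes assayed in both experiments (assumed shared)
    n_a, n_b:   sizes of the two differentially expressed gene sets
    n_overlap:  number of genes present in both sets
    """
    max_overlap = min(n_a, n_b)
    # Number of ways to draw set B from the universe.
    total = comb(n_universe, n_b)
    # Sum the hypergeometric probabilities for overlaps >= n_overlap.
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical gene sets from two loss-of-function strategies.
knockdown_hits = {"geneA", "geneB", "geneC", "geneD"}
deletion_hits = {"geneC", "geneE", "geneF"}
shared = knockdown_hits & deletion_hits

# Hypothetical universe of 20 expressed genes.
p = overlap_p_value(20, len(knockdown_hits), len(deletion_hits), len(shared))
print(f"shared genes: {sorted(shared)}, p = {p:.3f}")
```

A large p-value, as in this toy example, indicates that the overlap is no greater than expected by chance. The result depends strongly on the choice of gene universe, which should be restricted to genes detectable in both experiments.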

The potential confounding effects of techniques used to separate DNA- from RNA-dependent function are further exemplified by studies of the Drosophila bxd lncRNA, which is expressed from within the HOX cluster, adjacent to the Ultrabithorax (Ubx) gene. Its expression is highly specific and occurs in the same broad region of the embryo as the Ubx gene, although notably never within the same cell (Petruk et al., 2006). Studies of bxd loss-of-function using different techniques have yielded conflicting interpretations. It has long been known that small deletions within this lncRNA cause dramatic effects on expression of the neighbouring Ubx gene (Lewis, 1978), resulting in homoeotic transformations. Indeed, certain allelic combinations are able to generate a four-winged fly. More recent studies of the same deletions suggest that the act of transcription of this lncRNA represses Ubx in cis by altering protein binding to the Ubx promoter (Petruk et al., 2006). In contrast, it was reported that inversion of the bxd promoter, which drives transcription in the opposite direction whilst maintaining genomic composition, results in only very minor effects on Ubx expression, and then only later in development (Pease et al., 2013). In addition, a deletion removing the promoter induced a Cbx-like gain of function of Ubx (Sipos et al., 2007). Clearly, correct interpretation of such loss-of-function experiments at such complex loci requires careful consideration of potentially confounding factors.

Contrasting results of different experiments may also arise because of a lncRNA's involvement in different mechanisms in different cellular contexts. For example, in embryonic cells, transcription of Airn silences the adjacent Igf2r gene (Latos et al., 2012), whereas in extraembryonic tissues it acts more distally by recruiting the histone methyltransferase G9a to imprinted genes (Nagano et al., 2008).

The end of the beginning: a maturing lncRNA field

The study of lncRNAs is still in its infancy, and the biochemical and genetic techniques used to address the true significance and mechanisms of action of this class of RNA have only recently been developed or adapted from those used for investigating protein-coding genes. Such techniques must therefore be used with caution and with appropriate controls (Brockdorff, 2013; Riley and Steitz, 2013). From the examples described above, it is apparent that the optimal strategy with which to study a lncRNA's loss of function depends both on the mechanism by which it acts, in particular in a cis or trans configuration, and the regulatory sequences present within its locus. We suggest that early lessons learnt from paradigm repressor lncRNAs, such as Xist, and imprinted lncRNAs such as Airn or Kcnq1ot1, should guide the design of experiments on more recently identified lncRNAs. We have attempted to distil these lessons into the proposed considerations in Box 1. Introduction of the multiple alleles that will be necessary to adequately dissect lncRNA function in vivo will be greatly aided by recent advances in genome engineering using designer site-specific nucleases such as CRISPR/Cas9 and TALENs. The introduction of fast acute loss-of-function systems for lncRNAs, for example those that insert a sequence-specific ribonuclease site whose nuclease is under drug inducible control, would also greatly facilitate lncRNA investigation.

The trans function of a lncRNA may be investigated using locus deletion, promoter deletion, inversions, transcriptional termination or RNAi. Where possible, these strategies should be combined with genetic rescue experiments, where the lncRNA is expressed from an independent transgene inserted at a location distinct from the lncRNA locus. This strategy separates RNA-dependent effects from those arising from the manipulation of the underlying DNA. Rescue experiments using expression of the lncRNA from an independent transgene are only possible for trans-acting lncRNAs where the RNA moiety itself and not the act of transcription is critical for function.

The cis function of a lncRNA may be investigated using a combination of several alleles, such as insertion of transcriptional terminators, promoter deletions and inversions. Several alleles are likely to be required to separate lncRNA-dependent from other effects and, as controls, to reveal artefacts of genetic engineering. Engineered inversions can also be used to separate the lncRNA locus from its potential neighbouring target gene to investigate its roles in cis. Use of site-specific recombinases, such as the phiC31/attP system (Bateman et al., 2006; Zhu et al., 2014) as ‘landing sites’ or for recombination mediated cassette exchange, will greatly enhance our ability to generate such allelic series. For example, the lncRNA locus may be deleted and replaced by a recombinase ‘landing site’ into which different constructs can be introduced to investigate phenotype rescue.

In summary, if lncRNA biologists are to resolve the true in vivo functions of these numerous and enigmatic transcripts, then the strengths and weaknesses of available techniques will need to be acknowledged. Resolution will no doubt derive from the careful and comprehensive genetic dissection of individual loci using multiple alleles. The field of lncRNA biology would benefit greatly from the development of additional approaches that are effective in distinguishing effects mediated by lncRNAs as molecular species from their effect on gene regulatory elements with which lncRNA loci are interleaved across the mammalian genome.
