Which sequence characteristics influence the transcription efficiency of T7 polymerase?

The T7 polymerase doesn't transcribe all sequences equally well; the transcription efficiency can vary widely between sequences. One well-known requirement of the T7 polymerase is that the sequence should begin with 2-3 guanines for efficient transcription.

Is there any literature that quantifies the effect of the sequence on the transcription efficiency? I'm mostly interested in the effect of the first few nucleotides, as changing other parts of the sequence is usually not possible.

By far the most important part is the very beginning of the transcript, especially positions +1 and +2. The conserved consensus sequence in class III T7 promoters is GGGAGA; any change in the first two nucleotides severely reduces the transcription efficiency. Changes at positions +3 to +6 have much smaller effects.

Additionally, changes that put many AU base pairs in that region (e.g. GGUUU) seem to affect the transcription efficiency negatively.

Other sequences that are problematic anywhere, not only at the beginning, are long stretches of uridines or adenines. Stretches of eight or more uridines or adenines can cause the polymerase to slip, which results in transcripts with more uridines or adenines than are encoded in the template.
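If you want to screen candidate sequences against these rules of thumb, a minimal sketch in Python could look like the following. The function name `t7_start_warnings` is hypothetical, and the cutoff of three A/U residues in positions +3 to +6 is my own assumption (chosen so that the GGGAGA consensus passes), not a value taken from the paper. It only flags potential problems; it does not predict yields.

```python
import re


def t7_start_warnings(transcript):
    """Return heuristic warnings for a transcript written 5'->3'.

    Accepts RNA or DNA letters; T is treated as U.
    """
    seq = transcript.upper().replace("T", "U")
    warnings = []

    # Positions +1 and +2: the consensus start is GG.
    if seq[:2] != "GG":
        warnings.append("positions +1/+2 are not GG")

    # Positions +3 to +6: flag AU-rich starts.
    # The cutoff of 3 is an assumption made for this sketch.
    au_count = sum(base in "AU" for base in seq[2:6])
    if au_count >= 3:
        warnings.append("positions +3 to +6 are AU-rich")

    # Runs of eight or more U or A anywhere in the transcript.
    if re.search(r"U{8,}|A{8,}", seq):
        warnings.append("run of 8 or more U or A (possible slippage)")

    return warnings
```

For example, `t7_start_warnings("GGUUUCC")` reports the AU-rich start, while the consensus start `GGGAGA` produces no warnings.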

These characteristics are detailed in the 1989 paper by Milligan and Uhlenbeck.

Milligan, J. F. & Uhlenbeck, O. C. Synthesis of small RNAs using T7 RNA polymerase. Meth. Enzymol. 180, 51-62 (1989).

