Tips for Optimizing Protein Expression and Purification

 

SDS-PAGE of a goat IgG antibody purification

Recombinant proteins are used throughout biological and biomedical science. The development of simple, commercially available expression systems has made recombinant protein production more widespread and, most significantly, has dramatically expanded the number of proteins that can be investigated both biochemically and structurally. Because every protein is different, purification protocols and strategies must be worked out for each individual protein, with an eye to its intended use. Below, we describe the factors that most strongly affect soluble protein expression and how to adjust them in order to express folded, active proteins.

  • Influence of gene and/or protein sequence on expression and solubility

    One of the most common reasons that heterologous proteins fail to express is the presence of “rare” codons in the target mRNA. This codon bias can be overcome by codon-optimized gene synthesis, which allows the codon usage of the gene to be changed to match that of the recombinant host. Alternatively, for E. coli, expression strains supplemented with the rare tRNAs can compensate for the codon bias of the recombinant gene.
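
    As a rough illustration of how a coding sequence might be screened for rare codons before choosing between gene synthesis and a tRNA-supplemented strain, the sketch below counts occurrences of a small codon set. The set `RARE_CODONS`, the function names, and the example sequence are assumptions made for illustration; the codons that actually matter should be taken from a codon usage table for the intended host.

```python
# Illustrative set of codons often described as rare in E. coli.
# This list is an assumption for the sketch, not taken from the article;
# check a codon usage table for the host strain actually used.
RARE_CODONS = {"AGG", "AGA", "CGA", "CTA", "ATA", "CCC", "GGA", "CGG"}


def find_rare_codons(cds, rare=RARE_CODONS):
    """Return (codon_index, codon) pairs for rare codons in an in-frame CDS.

    codon_index is 0-based, counted in codons from the start of the sequence.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return [(i, codon) for i, codon in enumerate(codons) if codon in rare]


def rare_codon_fraction(cds, rare=RARE_CODONS):
    """Fraction of codons in the CDS that belong to the rare set."""
    n_codons = len(cds) // 3
    if n_codons == 0:
        return 0.0
    return len(find_rare_codons(cds, rare)) / n_codons


# Made-up 7-codon sequence: ATG AGG AGA CTG CCC AAA TAA
example_cds = "ATGAGGAGACTGCCCAAATAA"
hits = find_rare_codons(example_cds)
print(hits)
print(f"{rare_codon_fraction(example_cds):.2f}")
```

    A high fraction, or several rare codons clustered together, would be one reason to consider a codon-optimized synthetic gene or a strain that supplies the corresponding tRNAs.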

    The probability of successful soluble protein expression decreases with increasing molecular weight, especially for proteins larger than 60 kDa. When using E. coli as an expression host, it is advantageous to design constructs of individual protein domains rather than the full-length protein, and to use solubility-enhancing fusion tags, because these tags greatly aid protein purification and seldom adversely affect biological or biochemical activity.
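
    To check quickly whether a construct falls above the 60 kDa range mentioned here, its mass can be estimated from its length. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb rather than an exact value; the function names and the threshold parameter are hypothetical.

```python
# Rule-of-thumb average mass per amino acid residue (an approximation;
# the true mass depends on the amino acid composition).
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(protein_seq):
    """Approximate molecular mass in kDa from the number of residues."""
    return len(protein_seq) * AVERAGE_RESIDUE_MASS_DA / 1000.0


def exceeds_size_threshold(protein_seq, threshold_kda=60.0):
    """True if the estimated mass is above the given threshold."""
    return estimate_mass_kda(protein_seq) > threshold_kda


full_length = "A" * 600   # placeholder 600-residue sequence
single_domain = "A" * 300  # placeholder 300-residue domain construct

for name, seq in [("full length", full_length), ("domain", single_domain)]:
    print(name, estimate_mass_kda(seq), exceeds_size_threshold(seq))
```

    By this estimate, a construct longer than roughly 550 residues crosses the 60 kDa mark, which is one point at which splitting the target into individual domains becomes worth considering.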

    The starting and ending residues of the target domain can also affect expression yield and solubility. The optimal boundaries for the protein domain construct should be determined using the available functional and structural data for the protein. For a protein of unknown domain structure, threading the target protein sequence onto a homologous protein structure can help in determining the optimal domain boundaries. When a homologous protein structure is not available, predicted secondary structural elements should be used to guide the choice of boundaries.

  • Influence of vector on expression and solubility

    Expression vectors carry DNA sequence elements that direct the transcription and translation of the target gene, including promoters, regulatory sequences, the Shine-Dalgarno box, and transcriptional terminators, as well as an origin of replication. In addition, expression vectors contain a selection element to aid in plasmid selection within the host cell. Another critical feature of an E. coli expression vector is the presence of a fusion tag.

    • When selecting a promoter system, the nature of the protein target and its desired downstream use must be considered. If the protein target is a toxic protein, consider using promoter systems that have extremely low basal expression. Alternatively, for maximal protein yields, a strong promoter should be selected. For aggregation-prone proteins, a cold-shock promoter, in which expression is carried out at low temperatures, may be tested.
    • Larger bacterial and heterologous proteins fold more slowly and tend to aggregate. To prevent aggregation and facilitate folding in E. coli, the target protein can be co-expressed with protein chaperones or folding catalysts, which are encoded on either the same plasmid or a separate plasmid.

    Fusion tags are genetically fused to target proteins to increase protein solubility, or allow easy detection and purification. It is often necessary to test multiple fusion tags to determine which tag results in the maximum yields of soluble proteins. The placement of the tag, either at the N-terminus or C-terminus of the target protein, is also important. N-terminal fusions are the most common and have the added benefit that they often enhance soluble protein expression more successfully than C-terminal fusions.

    • The presence of a fusion tag may interfere with the biological activity of the recombinantly expressed protein, and thus, it may be important to enzymatically remove the tag after the fusion protein has been purified. It is recommended to include a cleavage site for a sequence-specific protease to enable removal of the tag.
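
    The layout of a cleavable N-terminal fusion can be sketched at the sequence level. The example below assumes a hexahistidine tag and a TEV protease site (ENLYFQG, cleaved between Q and G); the article does not name a specific tag or protease, so both choices, the function names, and the target fragment are illustrative assumptions.

```python
HIS6_TAG = "HHHHHH"   # assumed affinity tag for this sketch
TEV_SITE = "ENLYFQG"  # assumed protease site; TEV cleaves between Q and G


def build_fusion(target, tag=HIS6_TAG, site=TEV_SITE):
    """N-terminal fusion: start Met + tag + cleavage site + target."""
    return "M" + tag + site + target


def cleave(fusion, site=TEV_SITE, cut_offset=6):
    """Split the fusion at the first occurrence of the site.

    cut_offset is the number of site residues that stay with the tag;
    6 places the cut after the Q of ENLYFQG.
    """
    start = fusion.find(site)
    if start == -1:
        raise ValueError("cleavage site not found in fusion sequence")
    cut = start + cut_offset
    return fusion[:cut], fusion[cut:]


target = "SKGEELFTG"  # hypothetical target fragment
fusion = build_fusion(target)
tag_part, released = cleave(fusion)
print(fusion)    # MHHHHHHENLYFQGSKGEELFTG
print(tag_part)  # MHHHHHHENLYFQ
print(released)  # GSKGEELFTG
```

    In this sketch one residue of the site (the glycine) remains on the released protein, which shows why the choice of protease and the placement of the site affect how much non-native sequence is left after tag removal.
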
  • Influence of host strains on expression of heterologous proteins

    Bacterial host strains have been developed to support the expression of heterologous proteins. Commercially available E. coli strains are designed specifically for the expression of proteins that are susceptible to proteolysis, contain rare codons, or require disulfide bonds.

    • For proteins that are susceptible to proteolytic degradation, the use of protease-deficient strains such as E. coli BL21 (which lacks the Lon and OmpT proteases) or its derivatives is recommended.
    • Differences in codon frequency between the target gene and the expression host can lead to translational stalling, premature translation termination, and amino acid mis-incorporation. This difference may be overcome by supplying the rare tRNAs during expression. Bacterial strains that contain plasmids that encode rare tRNAs should be used to promote the efficient expression of genes that contain high frequencies of rare codons.
    • For proteins that contain disulfide bonds, expression in host strains deficient in thioredoxin reductase (trxB) and/or glutathione reductase (gor) will aid the formation of cytosolic disulfide bonds and enhance the solubility of folded, disulfide-containing proteins. An alternative strategy for expressing disulfide-containing proteins is to target the expressed protein to the E. coli periplasm, which is an oxidizing environment and thus promotes the formation of disulfide bonds.
  • Improving solubility of proteins by changing expression conditions

    The use of strong expression promoters and high inducer concentrations can result in high protein concentrations that lead to protein aggregation before folding. Reducing the rates of transcription and/or translation facilitates folding by allowing the newly synthesized protein to fold before it aggregates. The following are the common expression parameters that can be manipulated to enhance protein solubility.

    • Temperature: Lowering the expression temperature (15–25°C) often improves the solubility of recombinantly expressed proteins. At lower temperatures, cell processes slow down, which reduces the rates of transcription, translation, and cell division and, in turn, protein aggregation. Lowering the expression temperature also reduces the degradation of proteolytically sensitive proteins.
    • Concentration of the inducer: Lowering the concentration of the induction agent reduces the transcription rate, which can improve the solubility and activity of recombinant proteins.
    • Choice of media: Batch culture is the most common method to cultivate cells for recombinant protein expression. All nutrients that are required for growth must be supplied from the beginning by inclusion in the growth medium.
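
    Because temperature and inducer concentration are usually varied together, a small screening grid is a convenient way to plan test expressions. The sketch below enumerates every combination; the temperatures fall within the 15–25°C range given above, while the inducer concentrations and all names are hypothetical placeholders to be replaced with values suited to the promoter system in use.

```python
from itertools import product

temperatures_c = [15, 20, 25]   # within the range suggested in the text
inducer_mm = [0.05, 0.1, 0.5]   # hypothetical inducer concentrations (mM)


def screening_conditions(temps, inducers):
    """Return one dict per temperature/inducer combination."""
    return [
        {"temperature_c": t, "inducer_mm": c}
        for t, c in product(temps, inducers)
    ]


conditions = screening_conditions(temperatures_c, inducer_mm)
for number, cond in enumerate(conditions, start=1):
    print(f"culture {number}: {cond['temperature_c']} C, "
          f"{cond['inducer_mm']} mM inducer")
```

    Three temperatures and three inducer levels give nine small-scale cultures, whose soluble fractions can then be compared before scaling up.
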
  • Improving protein purification

    • Solubilize and purify the protein in a well-buffered solution with an ionic strength equivalent to 300–500 mM of a monovalent salt, such as NaCl.
    • Use immobilized metal affinity chromatography (IMAC) as the initial purification step when the protein carries a polyhistidine tag.
    • If additional purification is required, use size-exclusion chromatography (gel filtration). If necessary, use ion exchange chromatography as a final ‘polishing’ step.
    • The affinity tag may be removed to minimize non-native sequences in the recombinant protein and to achieve further purification.
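
    The ionic-strength guideline in the first point of this list can be checked with the standard definition I = ½ Σ cᵢzᵢ², where cᵢ is the molar concentration and zᵢ the charge of each ion. The sketch below is a minimal calculation that assumes complete dissociation of the salt and ignores the contribution of the buffer species; the function name and the example solutions are illustrative.

```python
def ionic_strength(ions):
    """Ionic strength in mol/L from (molar concentration, charge) pairs.

    Implements I = 1/2 * sum(c_i * z_i**2), assuming each salt is fully
    dissociated and that only the listed ions are present.
    """
    return 0.5 * sum(conc * charge ** 2 for conc, charge in ions)


# 300 mM NaCl: 0.300 M Na+ and 0.300 M Cl-
nacl_300mm = [(0.300, +1), (0.300, -1)]

# 100 mM MgCl2: 0.100 M Mg2+ and 0.200 M Cl-
mgcl2_100mm = [(0.100, +2), (0.200, -1)]

print(f"NaCl:  {ionic_strength(nacl_300mm):.3f} M")
print(f"MgCl2: {ionic_strength(mgcl2_100mm):.3f} M")
```

    For a monovalent salt the ionic strength equals the salt concentration, whereas a salt with a divalent ion reaches the same ionic strength at a lower concentration. This is why the recommendation is phrased as an equivalent of 300–500 mM monovalent salt.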