Proteome Sequencing Technology
Protein sequencing is the process of determining all or part of a protein's primary structure, that is, the order of its amino acids. This order dictates the protein's higher-level structure and influences its function, and determining it is fundamental to protein chemistry research. Analyzing amino acid sequences yields a wealth of information that can be applied in related fields such as protein identification, the design of molecular cloning probes, and the development of biological drugs. Since F. Sanger determined the primary structure of insulin in 1953, the primary structures of well over 100,000 different proteins have been determined.
Protein sequencing has advanced significantly over the years. Advances in mass spectrometry have transformed the field, allowing scientists to analyze proteins with unprecedented speed and accuracy. Next-generation sequencing (NGS) of DNA and RNA has contributed by supplying the reference sequences that mass spectrometry data are searched against. The sections below describe the main protein sequencing techniques.
Edman-Based Protein Sequencing
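Edman degradation is a classic method for determining the amino acid sequence of a protein. Each cycle selectively cleaves the N-terminal residue from the peptide chain without breaking the remaining peptide bonds. Phenyl isothiocyanate (PITC) couples to the free N-terminal amino group, and the labeled residue is released under acidic conditions and converted to a phenylthiohydantoin (PTH) amino acid, which is identified by chromatography. The cycle is then repeated on the shortened chain, reading the sequence one residue at a time from the N-terminus.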
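The cycle can be sketched in Python as a loop that removes and reports one N-terminal residue per round while leaving the rest of the chain untouched. This is a minimal illustration only: the function name edman_cycles and the example peptide are hypothetical, and the chemistry (coupling, cleavage, PTH identification) is reduced to taking the first letter of a sequence string.

```python
from typing import Iterator, Tuple


def edman_cycles(peptide: str, max_cycles: int) -> Iterator[Tuple[int, str, str]]:
    """Yield (cycle number, released residue, remaining chain) for each cycle.

    Hypothetical model: each cycle removes only the N-terminal residue and
    leaves the rest of the chain intact, as in Edman degradation.
    """
    remaining = peptide
    for cycle in range(1, max_cycles + 1):
        if not remaining:
            break  # nothing left to degrade
        released, remaining = remaining[0], remaining[1:]
        yield cycle, released, remaining


for cycle, residue, rest in edman_cycles("MKTAYIAK", max_cycles=4):
    print(f"cycle {cycle}: released {residue}, remaining {rest}")
```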
1. Applications
Edman degradation is suitable for determining the N-terminal sequence of a purified protein or peptide. Because the yield of each cycle is below 100%, reliable reads are usually limited to fewer than about 50 residues, so the method is best suited to peptides and small to medium-sized proteins, or to the N-terminal region of larger ones.
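The length limit follows from simple arithmetic. If each cycle recovers a fraction y of the chains left from the previous cycle (the repetitive yield), the signal after n cycles is y^n of the starting signal. The sketch below assumes a constant repetitive yield and a hypothetical detection threshold; the yields and the 5% threshold are illustrative values, not specifications of any instrument, and the function names are hypothetical.

```python
import math


def signal_after(cycles: int, repetitive_yield: float) -> float:
    """Fraction of the starting signal left after `cycles` Edman cycles,
    assuming the same (repetitive) yield in every cycle."""
    return repetitive_yield ** cycles


def max_readable_cycles(repetitive_yield: float, min_fraction: float) -> int:
    """Largest number of cycles whose remaining signal is still >= min_fraction."""
    return math.floor(math.log(min_fraction) / math.log(repetitive_yield))


for y in (0.90, 0.93, 0.95):
    print(f"yield {y:.2f}: signal after 50 cycles = {signal_after(50, y):.3f}, "
          f"cycles readable above 5% signal = {max_readable_cycles(y, 0.05)}")
```

Under these assumptions, a per-cycle yield of 95% leaves roughly 8% of the starting signal after 50 cycles. The model ignores other factors, such as background that builds up from incomplete reactions.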
2. Advantages
(1) Edman degradation provides accurate N-terminal sequencing information.
(2) It has been widely used and established as a reliable protein sequencing method.
3. Limitations
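(1) Edman degradation is time-consuming and laborious, especially for larger proteins, because every residue requires its own reaction and analysis cycle.
(2) It reads only from the N-terminus, so it cannot determine the C-terminal sequence or, on its own, the complete sequence of a larger protein. It also requires a free N-terminal amino group, so proteins whose N-terminus is blocked (for example, by acetylation) cannot be sequenced directly.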
(3) The accuracy of Edman degradation can be affected by certain amino acids, such as proline, which may require special handling.
Mass Spectrometry-Based Protein N-Terminal Sequencing
1. Enzymatic Digestion
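To determine the N-terminal sequence, the protein is typically digested with a proteolytic enzyme such as trypsin or chymotrypsin. These enzymes cleave at specific residues (trypsin after Lys and Arg; chymotrypsin mainly after Phe, Tyr, and Trp), so every resulting peptide except the protein's own N-terminal peptide begins immediately after a cleavage site. The N-terminal peptide is the one whose N-terminus was not produced by the enzyme.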
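An in-silico digest makes the idea concrete. The sketch below applies simplified cleavage rules (cut after the listed residues, but not before proline) to a hypothetical sequence; real enzyme specificity is more nuanced, and the names CLEAVAGE_SITES, digest, and n_terminal_peptide are illustrative assumptions.

```python
from typing import List

# Simplified cleavage rules (an assumption for illustration): cut on the
# C-terminal side of these residues unless the next residue is proline.
CLEAVAGE_SITES = {
    "trypsin": "KR",
    "chymotrypsin": "FYW",  # high-specificity sites only; Leu and Met ignored
}


def digest(sequence: str, enzyme: str) -> List[str]:
    """Split a protein sequence into peptides using a simple cleavage rule."""
    sites = CLEAVAGE_SITES[enzyme]
    peptides: List[str] = []
    start = 0
    for i in range(len(sequence) - 1):
        if sequence[i] in sites and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # the last peptide runs to the C-terminus
    return peptides


def n_terminal_peptide(sequence: str, enzyme: str) -> str:
    """The only peptide whose N-terminus was not created by the enzyme."""
    return digest(sequence, enzyme)[0]


protein = "MASKPRGLFYEWK"  # hypothetical sequence
print(digest(protein, "trypsin"))       # ['MASKPR', 'GLFYEWK'] (K-P bond not cut)
print(digest(protein, "chymotrypsin"))  # ['MASKPRGLF', 'Y', 'EW', 'K']
print(n_terminal_peptide(protein, "trypsin"))  # MASKPR
```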
2. MALDI-TOF MS Analysis
After digestion, the peptide fragments can be analyzed by MALDI-TOF MS (Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry), which measures peptide masses directly. Comparing the measured masses with those predicted from the candidate sequence helps identify the peptide carrying the protein's N-terminus, and fragmenting that peptide by tandem MS (MS/MS) can confirm its sequence.
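Matching measured masses against predicted ones can be sketched as follows. The code computes the singly protonated monoisotopic mass [M+H]+ (MALDI mostly produces singly charged ions) and looks for observed peaks within a tolerance. Assumptions: unmodified residues (cysteine is often alkylated in practice, which this ignores), a 20 ppm tolerance chosen for illustration, and a hypothetical peak list and peptide set.

```python
from typing import Dict, List

# Monoisotopic residue masses in daltons (unmodified residues).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def mh_plus(peptide: str) -> float:
    """Singly protonated monoisotopic peptide mass, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def match_peaks(observed: List[float], candidates: List[str],
                tol_ppm: float = 20.0) -> Dict[str, float]:
    """Map each candidate peptide to the first observed peak within tol_ppm."""
    matches: Dict[str, float] = {}
    for pep in candidates:
        expected = mh_plus(pep)
        for peak in observed:
            if abs(peak - expected) / expected * 1e6 <= tol_ppm:
                matches[pep] = peak
                break
    return matches


candidates = ["MASKPR", "GLFYEWK"]  # hypothetical tryptic peptides
observed = [689.377, 1001.5]        # hypothetical peak list
print({p: round(mh_plus(p), 4) for p in candidates})
print(match_peaks(observed, candidates))
```

A mass match alone can be ambiguous (Leu and Ile, for example, have identical masses), which is why MS/MS is used to confirm the sequence.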
3. Applications
Protein N-terminal sequencing by mass spectrometry is a valuable tool for elucidating the N-terminal amino acid sequence of a protein. It is typically used to determine the start point of a protein or to verify its identity.
4. Advantages
Mass spectrometry provides high sensitivity and precision when performing N-terminal sequencing. It is applicable to a wide range of proteins and peptides.
5. Limitations
The accuracy of N-terminal sequencing can be affected by the presence of specific amino acids and post-translational modifications. Moreover, this method only provides information about the N-terminus and does not yield a complete protein sequence.
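To see why modifications matter, consider removal of the initiator methionine and acetylation of the new N-terminus, two common N-terminal processing events. Either one changes the N-terminal peptide's mass, so a search that only considers the unmodified sequence will miss it. The sketch below enumerates these forms for a hypothetical peptide and reports which one explains an observed peak; only these two changes are modeled, and the peptide, the 0.01 Da tolerance, and the function names are assumptions.

```python
from typing import Dict, List

RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
ACETYL = 42.010565  # mass added by N-terminal acetylation (C2H2O)


def mh_plus(peptide: str) -> float:
    """Singly protonated monoisotopic peptide mass, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def n_terminal_variants(peptide: str) -> Dict[str, float]:
    """[M+H]+ of the N-terminal peptide with and without the modeled changes."""
    bodies = {"intact": peptide}
    if peptide.startswith("M") and len(peptide) > 1:
        bodies["Met removed"] = peptide[1:]
    variants: Dict[str, float] = {}
    for label, body in bodies.items():
        variants[label] = mh_plus(body)
        variants[label + ", acetylated"] = mh_plus(body) + ACETYL
    return variants


def explain_peak(peak: float, peptide: str, tol_da: float = 0.01) -> List[str]:
    """Labels of the variants whose mass lies within tol_da of the peak."""
    return [label for label, mass in n_terminal_variants(peptide).items()
            if abs(mass - peak) <= tol_da]


for label, mass in n_terminal_variants("MASKPR").items():
    print(f"{label}: {mass:.4f}")
print(explain_peak(600.3464, "MASKPR"))  # ['Met removed, acetylated']
```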
Mass Spectrometry-Based Protein C-Terminal Sequencing
1. Enzymatic Digestion
Similar to N-terminal sequencing, this strategy uses proteolytic enzymes that cleave at specific residues. Lys-C cuts after Lys and Glu-C mainly after Glu, so every resulting peptide except the protein's own C-terminal peptide ends in the cleavage residue. The C-terminal peptide is the one whose C-terminus was not produced by the enzyme.
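The same reasoning can be applied to a peptide list without knowing the order of the peptides: any peptide that does not end in the enzyme's cleavage residue is a C-terminal candidate. The sketch below uses simplified rules as an assumption (proline effects and Glu-C cleavage after Asp in phosphate buffer are ignored) and a hypothetical sequence; the function names are illustrative.

```python
from typing import List

# Simplified rules (assumed): Lys-C cuts after Lys, Glu-C after Glu.
CLEAVAGE_SITES = {"Lys-C": "K", "Glu-C": "E"}


def digest(sequence: str, enzyme: str) -> List[str]:
    """Split a sequence after every cleavage residue (except at the very end)."""
    sites = CLEAVAGE_SITES[enzyme]
    peptides: List[str] = []
    start = 0
    for i in range(len(sequence) - 1):
        if sequence[i] in sites:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def c_terminal_candidates(peptides: List[str], enzyme: str) -> List[str]:
    """Peptides that do not end in the enzyme's cleavage residue.

    Every internal peptide ends in a cleavage residue, so only the protein's
    own C-terminal peptide can fail this test (unless the protein itself ends
    in that residue, in which case the list is empty).
    """
    sites = CLEAVAGE_SITES[enzyme]
    return [p for p in peptides if p and p[-1] not in sites]


protein = "MKTEAGLKDWSEHQ"  # hypothetical sequence
for enzyme in ("Lys-C", "Glu-C"):
    peptides = digest(protein, enzyme)
    print(enzyme, peptides, "->", c_terminal_candidates(peptides, enzyme))
```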
2. MALDI-TOF MS Analysis
After digestion, the peptide fragments are analyzed by MALDI-TOF MS to measure their masses. Comparing these masses with those predicted from the candidate sequence helps identify the peptide carrying the protein's C-terminus.
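A MALDI-TOF measurement gives the intact peptide mass. To read the residues at the C-terminus, the candidate peptide is typically fragmented in a tandem (MS/MS) measurement, for example on a MALDI-TOF/TOF instrument. In peptide fragmentation, y ions contain the C-terminus: the smallest y ion reveals the C-terminal residue, and each successive difference adds the next residue inward. The sketch below computes the expected singly charged y-ion series for a hypothetical peptide, assuming unmodified residues; the function name y_ions is illustrative.

```python
from typing import Dict, List

RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def y_ions(peptide: str) -> List[float]:
    """Singly charged y1..y(n-1) m/z values; y_i holds the last i residues."""
    ions: List[float] = []
    total = WATER + PROTON
    for aa in reversed(peptide[1:]):  # walk inward from the C-terminus
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


for i, mz in enumerate(y_ions("DWSEHQ"), start=1):  # hypothetical C-terminal peptide
    print(f"y{i}: {mz:.4f}")
```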
3. Applications
Protein C-terminal sequencing by mass spectrometry is invaluable for revealing the C-terminal amino acid sequence of a protein. It is used to determine the termination point of a protein and verify its integrity.
4. Advantages
Mass spectrometry offers excellent precision and sensitivity for C-terminal sequencing. This technique is applicable to various proteins and peptides.
5. Limitations
Similar to N-terminal sequencing, C-terminal sequencing can be affected by specific amino acids and post-translational modifications. It provides information only about the C-terminus and does not yield a complete protein sequence.
Full Protein Sequencing
Full protein sequencing aims to reveal the complete amino acid sequence of a protein, rather than just examining its N- or C-terminus.
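A classic way to obtain a full sequence is to digest the protein separately with enzymes of different specificity and stitch the resulting peptides together where they overlap. The sketch below is a simple greedy overlap assembler under strong assumptions: the peptide sequences are already known exactly (for example from MS/MS), there are no I/L ambiguities or errors, and the example peptides are hypothetical Lys-C and Glu-C digests of the sequence MKTEAGLKDWSEHQ. The function names and the min_len threshold are illustrative.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b,
    or 0 if no such overlap of at least min_len residues exists."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(peptides: List[str], min_len: int = 3) -> List[str]:
    """Greedily merge peptides by their largest suffix/prefix overlap."""
    # Drop duplicates and peptides contained in longer ones (deterministic order).
    contigs: List[str] = []
    for pep in sorted(set(peptides), key=lambda p: (-len(p), p)):
        if not any(pep in c for c in contigs):
            contigs.append(pep)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlap long enough: leave a gap between contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs = [c for c in rest if c not in merged] + [merged]
    return contigs


lys_c = ["MK", "TEAGLK", "DWSEHQ"]
glu_c = ["MKTE", "AGLKDWSE", "HQ"]
print(assemble(lys_c + glu_c, min_len=2))  # ['MKTEAGLKDWSEHQ']
print(assemble(lys_c + glu_c, min_len=3))  # ['MKTE', 'TEAGLKDWSEHQ']
```

With min_len=3 the two-residue overlap "TE" is rejected, leaving two contigs. Short overlaps can occur by chance, so the threshold trades completeness against the risk of false joins.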
1. Applications
Full protein sequence determination is crucial for a comprehensive understanding of a protein's amino acid sequence (including identifying any post-translational modifications). It plays a vital role in proteomics for identifying unknown proteins and verifying gene predictions.
2. Advantages
Mass spectrometry-based techniques are effective and versatile, capable of handling proteins of different sizes. They can also detect post-translational modifications, because each modification shifts a peptide's mass by a characteristic amount, which makes them valuable for studying protein changes.
3. Limitations
Performing mass spectrometry-based analysis requires specialized equipment and expertise in data analysis. The success of full protein sequence determination depends on the quality of the data and the availability of suitable data interpretation software.
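One common measure of data quality in full sequence determination is sequence coverage: the fraction of residues that fall within at least one identified peptide. The sketch below computes it by exact string matching, which is an assumption (it treats Leu and Ile as distinct and ignores modifications); the protein and peptides are hypothetical.

```python
from typing import Iterable


def sequence_coverage(protein: str, peptides: Iterable[str]) -> float:
    """Fraction of residues in `protein` covered by at least one identified peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:  # mark every occurrence, not just the first
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


protein = "MKTEAGLKDWSEHQ"  # hypothetical sequence
print(f"{sequence_coverage(protein, ['TEAGLK', 'HQ']):.1%}")  # 57.1%
```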
De Novo Protein Sequencing
The aim of de novo sequencing is to determine the complete amino acid sequence of a protein without relying on a known reference sequence. It relies on high-resolution tandem mass spectrometry and on computational algorithms that infer the sequence from the mass differences between fragment ions.
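The core idea of reading a sequence from fragment-ion mass differences can be sketched on an idealized b-ion ladder. The code assumes a clean, complete series of singly charged b ions with no noise, no missing peaks, and no y ions; real de novo tools must search through noisy spectra. The peak values, the 0.02 Da tolerance, and the function name are hypothetical. Leucine and isoleucine have identical masses, so they cannot be distinguished this way.

```python
from typing import Dict, List

RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def read_b_ladder(b_ions: List[float], tol: float = 0.02) -> List[str]:
    """Read residues from mass differences between consecutive b ions."""
    # b_i = (sum of the first i residue masses) + proton, so anchor at the proton.
    ladder = [PROTON] + sorted(b_ions)
    calls: List[str] = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        delta = heavier - lighter
        hits = sorted({aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol})
        calls.append("/".join(hits) if hits else "?")  # "?" marks an unexplained gap
    return calls


b_ions = [98.0600, 227.1026, 324.1554, 425.2031, 538.2871]  # hypothetical b1..b5
print(read_b_ladder(b_ions))  # ['P', 'E', 'P', 'T', 'I/L']
```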
1. Applications
(1) Identifying new proteins or isoforms.
(2) Characterizing post-translational modifications.
(3) Studying protein variations in disease.
(4) Studying non-model organisms with limited genomic data.
2. Challenges
(1) It is computationally intensive and requires specialized software and hardware.
(2) It is more challenging for larger or highly modified proteins.
(3) Sequence accuracy may be lower for proteins with repetitive regions or low sequence complexity.
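Part of the computational burden in challenge (1) is combinatorial: the number of residue strings whose masses add up to a given value grows very quickly with that value, so an algorithm cannot simply enumerate them. The sketch below counts such strings with dynamic programming. It assumes unmodified residues and nominal (integer) masses, and it counts Leu and Ile (both 113) and Lys and Gln (both 128 at nominal resolution) as separate letters; the function name is illustrative.

```python
from typing import Dict

# Nominal (integer) residue masses of the 20 standard amino acids.
NOMINAL_MASS: Dict[str, int] = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def count_sequences(total_mass: int) -> int:
    """Number of ordered residue strings whose nominal masses sum to total_mass."""
    ways = [0] * (total_mass + 1)
    ways[0] = 1  # the empty string
    for m in range(1, total_mass + 1):
        # Each string of mass m ends in some residue r, preceded by a string of mass m - r.
        ways[m] = sum(ways[m - r] for r in NOMINAL_MASS.values() if r <= m)
    return ways[total_mass]


for mass in (500, 1000, 1500):
    print(mass, count_sequences(mass))
```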