Introduction to De Novo Sequencing: Principles, Workflow, and Applications
What Is De Novo Sequencing?
De Novo sequencing refers to a method that infers the sequences of biological macromolecules (such as proteins, DNA, or RNA) directly from experimental data, without relying on any known reference sequences or databases. In the field of proteomics, it specifically denotes the process of acquiring MS/MS fragment data of enzymatically digested peptides using high-resolution mass spectrometry, followed by algorithmic analysis of the fragment ions to reconstruct the amino acid sequence, ultimately assembling the full-length primary structure of a protein.
Technical Principles of De Novo Sequencing
De Novo sequencing fundamentally relies on mass spectrometry platforms, particularly high-resolution MS/MS technology. The typical process involves the following steps:
1. Protein digestion: The target protein is enzymatically cleaved into short peptide fragments using proteases such as trypsin or chymotrypsin;
2. MS/MS acquisition: Fragment ion spectra (b- and y-type ions) for each peptide are collected through LC-MS/MS;
3. Spectrum interpretation: De Novo algorithms analyze the mass differences between peaks to sequentially infer the amino acid residues;
4. Sequence assembly: Results from multiple peptides are aligned and assembled to reconstruct the complete protein sequence;
5. Manual validation and structural modeling: Key regions undergo manual curation and structural verification.
The overall accuracy and sequence coverage of De Novo sequencing are determined by the quality of the fragment ion spectra, the extent of peptide overlap, and the logical consistency of the sequence assembly.
Standard Workflow of De Novo Sequencing
To achieve high-quality De Novo sequencing results, the workflow typically includes the following steps:
1. Sample Preparation and Quality Assessment
A variety of sample types are accepted (e.g., protein solutions, PAGE bands, supernatants), and each is evaluated for concentration, purity, and potential background signal interference. If necessary, Protein A/G affinity purification or gel excision can be performed to ensure optimal sample quality prior to mass spectrometry analysis.
2. Multi-Enzyme Digestion and Peptide Enrichment
Parallel digestion with multiple proteases (e.g., trypsin, chymotrypsin, AspN, LysC) generates a highly overlapping peptide pool, enhancing sequence coverage in structurally challenging regions such as complementarity-determining regions (CDRs).
3. High-Resolution Mass Spectrometry Acquisition
Multiple rounds of HCD, CID, and ETD fragmentation spectra are acquired using platforms such as Orbitrap Fusion Lumos or timsTOF Pro, ensuring fragment ion completeness and high mass accuracy.
4. Sequence Analysis and Assembly Reconstruction
Outputs from multiple De Novo sequencing algorithms (e.g., PEAKS Studio, pNovo) are integrated, and in-house assembly tools are employed to reconstruct full-length protein sequences. Structural databases are consulted to assign peptide fragments to their corresponding regions.
5. Structural Modeling and Expression Validation (Optional)
Follow-up services may include cDNA back-translation, structural modeling, and expression template construction, thus completing the workflow from protein identification to functional validation.
Applications of De Novo Sequencing
1. Antibody Analysis When Gene Sequences Are Unavailable
This is commonly encountered in animal-derived immune antibodies, clinical serum antibodies, or legacy antibodies whose expression vectors have been lost. De Novo sequencing enables direct reconstruction of light and heavy chain sequences from the protein itself, facilitating recombinant expression and supporting patent development.
2. Proteomics Research in Non-Model Organisms
In studies involving traditional medicinal materials, marine organisms, or microorganisms, protein identification using conventional methods is often hindered by incomplete or missing database annotations. In such cases, De Novo sequencing is essential for reconstructing protein sequences.
3. Discovery of Cancer Neoantigens and Mutated Proteins
De Novo sequencing allows for the identification of tumor-specific mutations and novel fusion proteins without relying on genomic data, providing precise targets for personalized immunotherapy.
4. Consistency Verification and Modification Profiling of Protein Therapeutics
It enables comparison of protein structures across different production batches, expression systems, or processing conditions, and facilitates the detection of subtle mutations or variations in post-translational modifications (PTMs).
5. Known Proteins With Structural Isomers
For proteins exhibiting structural isoforms such as glycosylation variants, oxidative forms, or cleavage products, De Novo sequencing combined with modification analysis can accurately reconstruct the native expression state, aiding in both quality control and functional assessment.
As life sciences advance toward the deep integration of structure and function, De Novo sequencing serves as a crucial bridge between protein identity and functional application. It empowers researchers to overcome database limitations and reconstruct complete structural information directly from proteins, shifting the paradigm of protein research from mere identification to functional reconstruction. Whether you aim to resolve the structure of a natural antibody, elucidate the function of an unknown protein, or verify the identity of a protein therapeutic, De Novo sequencing stands out as a key technical option for your project. If you are planning a related study or designing a project, feel free to contact the technical consultants at MtoZ Biolabs. We will provide you with expert technical advice and sample evaluation services.
MtoZ Biolabs, an integrated chromatography and mass spectrometry (MS) services provider.
Related Services
How to order?