De Novo Protein Sequencing: How to Decode Unknown Proteins?

    In proteomics, clinical sample analysis, and studies of non-model organisms, researchers increasingly face a recurring challenge: peptide spectra acquired by mass spectrometry do not match any entry in existing sequence databases. The underlying sequences, termed “unknown proteins” or “orphan peptides,” may originate from:

    • Previously unannotated proteins

    • Novel splice variants

    • Pathogen-derived, tumor-specific, or exogenously expressed proteins

    • Sequence changes caused by mutations, or mass shifts caused by post-translational modifications

     

    Conventional database search tools (e.g., Mascot, MaxQuant) score each spectrum only against peptides derived from a precompiled FASTA reference database, so a peptide whose sequence is absent from that database cannot be correctly identified. In these cases, De Novo protein sequencing becomes an essential approach.

     

    What Is De Novo Protein Sequencing? How Does It Differ from Database Search?

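    De Novo sequencing (also known as “from scratch” sequencing) deduces amino acid sequences directly from MS/MS fragmentation data without relying on any pre-existing sequence database. The mass difference between two consecutive fragment ions of the same series (e.g., b or y ions) equals the mass of one amino acid residue, so reading these differences along a spectrum spells out the peptide sequence. Compared with database search methods, key differences include: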

     

    [Figure: comparison of De Novo sequencing and database search]
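    The principle of reading residues from fragment mass differences can be sketched in a few lines of Python. This is a minimal illustration, not a De Novo sequencing engine: it assumes a clean, complete series of singly charged b ions, standard monoisotopic residue masses, a 0.02 Da tolerance, and a hypothetical toy peptide, and the function names (b_ions, precursor_mh, read_ladder) are made up for this example. Real software must also handle noise, missing peaks, y ions, and higher charge states.

```python
# Standard monoisotopic residue masses in Da (cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_ions(peptide):
    """Singly charged b-ion m/z values for the prefixes of length 1..n-1."""
    ions, total = [], PROTON
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        ions.append(total)
    return ions


def precursor_mh(peptide):
    """Singly protonated precursor [M+H]+: residues + water + proton."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def read_ladder(b_ladder, mh, tol=0.02):
    """Call one residue per mass gap between consecutive b ions.

    The ladder is anchored at the bare proton and closed by [M+H]+ minus
    water, so the last residue is recovered too. A gap that matches no
    single residue (for example because a peak is missing) becomes "?".
    """
    peaks = [PROTON] + sorted(b_ladder) + [mh - WATER]
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        gap = high - low
        hits = sorted(r for r, m in RESIDUE_MASS.items() if abs(m - gap) <= tol)
        calls.append("/".join(hits) if hits else "?")
    return calls


peptide = "ACDEFGHK"  # hypothetical toy peptide
ladder = b_ions(peptide)
print(read_ladder(ladder, precursor_mh(peptide)))
print(read_ladder(ladder[:2] + ladder[3:], precursor_mh(peptide)))  # b3 missing
```

    With the full ladder every residue is recovered. Omitting the third b ion merges two gaps into one whose mass equals D + E, which matches no single residue, so that position is reported as "?"; incomplete fragmentation is one reason De Novo interpretation is difficult.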

     

    What Types of Research Require De Novo Sequencing to Decode Unknown Proteins?

    1. Research on Non-Model Organisms

    For well-characterized species such as human and mouse, protein databases are typically comprehensive and well annotated. However, for non-model organisms (including many fish species, plants, and microorganisms), genomic data are often incomplete or inaccurately annotated, resulting in low identification rates in database searches. De Novo sequencing is therefore essential for obtaining accurate protein sequences in such cases.

     

    2. Tumor Neoantigen Screening

    Tumor-specific mutations, such as single amino acid substitutions, and gene fusions, which create novel junction sequences, produce peptides that are absent from reference databases and are therefore missed by standard database searches. De Novo sequencing reads these altered residues directly from the MS/MS spectra, making it an important tool for identifying novel neoantigens in cancer immunotherapy research.
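    Why a substitution is detectable, yet hard to localize from the precursor mass alone, follows from simple arithmetic: replacing residue X with Y shifts the peptide mass by m(Y) minus m(X). The sketch below lists every single substitution that explains an observed mass. The helper names, the 0.005 Da tolerance, and the toy peptides are assumptions for illustration only.

```python
# Standard monoisotopic residue masses in Da (cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def single_substitutions(reference, observed_mass, tol=0.005):
    """Return (position, original, replacement) tuples, 1-based, for every
    single-residue substitution of `reference` whose predicted mass lies
    within `tol` Da of `observed_mass`."""
    base = peptide_mass(reference)
    hits = []
    for pos, original in enumerate(reference, start=1):
        for replacement, mass in RESIDUE_MASS.items():
            if replacement == original:
                continue
            # Note: I <-> L swaps have zero shift and only match an unshifted mass.
            predicted = base - RESIDUE_MASS[original] + mass
            if abs(predicted - observed_mass) <= tol:
                hits.append((pos, original, replacement))
    return hits


# Hypothetical example: the observed peptide carries D6 -> N relative to PEPTIDE.
observed = peptide_mass("PEPTINE")
print(single_substitutions("PEPTIDE", observed))
```

    A D-to-N change shifts the mass by about -0.984 Da, which an E-to-Q change at position 2 or 7 reproduces exactly, so three candidates remain; only fragment ions that bracket these positions can decide between them. A shift of the same size but opposite sign (+0.984 Da) also arises from deamidation of N or Q, a common modification.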

     

    3. Characterization of Proteins from Exogenous Expression or Unknown Sources

    In complex biological products such as traditional medicine formulations, natural extracts, or recombinant expression systems, the exact protein composition is often unknown and no corresponding genomic information is available. In such scenarios, De Novo sequencing provides the most direct route to determining the sequences of these proteins.

     

    4. Identification of Post-Translationally Modified Proteins

    Standard database searches typically consider only the modifications specified before the search, so peptides carrying novel or uncharacterized post-translational modifications are difficult to identify. De Novo sequencing, combined with specialized modification-detection algorithms, can determine both the peptide sequence and the corresponding modification sites.

     

    What Are the Key Technical Challenges in Decoding Unknown Proteins?

    Challenge 1: Complex Fragmentation Patterns Complicate Algorithmic Interpretation

    Mass spectrometry data often exhibit complications such as neutral losses (e.g., of water or ammonia), isobaric amino acids that cannot be distinguished by mass (e.g., isoleucine and leucine, which have identical residue masses), and overlapping signals from post-translational modifications. These factors hinder accurate sequence interpretation by automated algorithms.
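    Both effects can be reproduced in a small calculation. The sketch below, with hypothetical function names and standard monoisotopic masses, generates singly charged b and y ions plus water- and ammonia-loss peaks for two toy peptides that differ only at position 5 (I versus L). The two peak lists are identical, so b and y ions alone cannot tell these sequences apart, and each y ion brings two extra loss peaks that an algorithm must not misread as separate fragments.

```python
# Standard monoisotopic residue masses in Da (cysteine unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
AMMONIA = 17.026549


def fragment_peaks(peptide):
    """Singly charged b and y ions plus water- and ammonia-loss peaks of each y ion.

    Simplification: every y ion gets both loss peaks, whereas in real spectra
    water loss mostly involves S/T/E/D and ammonia loss R/K/N/Q.
    """
    n = len(peptide)
    peaks = {}
    for i in range(1, n):
        b = sum(RESIDUE_MASS[r] for r in peptide[:i]) + PROTON
        y = sum(RESIDUE_MASS[r] for r in peptide[n - i:]) + WATER + PROTON
        peaks[f"b{i}"] = b
        peaks[f"y{i}"] = y
        peaks[f"y{i}-H2O"] = y - WATER
        peaks[f"y{i}-NH3"] = y - AMMONIA
    return peaks


with_ile = fragment_peaks("PEPTIDE")  # hypothetical toy peptides that differ
with_leu = fragment_peaks("PEPTLDE")  # only by I/L at position 5
print(with_ile == with_leu)  # identical peak lists
print(len(with_ile), "expected peaks, half of them b/y ions")
```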

     

    MtoZ Biolabs Solutions:

    • Use of multi-enzyme digestion strategies (e.g., trypsin combined with chymotrypsin) to generate overlapping peptide fragments

    • Acquisition of high-resolution MS/MS data using instruments such as the Orbitrap Fusion Lumos

    • Manual spectrum validation and structural modeling to correct potential misassemblies
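    The value of combining enzymes lies in overlaps: a peptide from the second enzyme that spans a cut site of the first one shows which tryptic peptides are adjacent. The sketch below assumes simplified cleavage rules (trypsin after K/R, high-specificity chymotrypsin after F/Y/W, neither before P), complete cleavage without missed sites, and a hypothetical toy sequence; the function names are illustrative only.

```python
TRYPSIN = "KR"        # simplified rule: cut after K or R
CHYMOTRYPSIN = "FYW"  # simplified high-specificity rule: cut after F, Y or W


def cleavage_sites(seq, residues):
    """Cut positions (exclusive end indices) after any residue in `residues`,
    unless the next residue is proline."""
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] in residues and seq[i + 1] != "P"]


def digest(seq, residues):
    """(start, end) spans of fully cleaved peptides, assuming no missed cleavages."""
    bounds = [0] + cleavage_sites(seq, residues) + [len(seq)]
    return list(zip(bounds, bounds[1:]))


def bridging_peptides(seq, first, second):
    """Peptides from the `second` digest that span a cut site of the `first`
    enzyme, i.e. the overlaps that reveal the order of the first digest."""
    cuts = cleavage_sites(seq, first)
    return [seq[s:e] for s, e in digest(seq, second) if any(s < c < e for c in cuts)]


protein = "GASPKLWVERAFYDKPTR"  # hypothetical toy sequence
print([protein[s:e] for s, e in digest(protein, TRYPSIN)])
print([protein[s:e] for s, e in digest(protein, CHYMOTRYPSIN)])
print(bridging_peptides(protein, TRYPSIN, CHYMOTRYPSIN))
```

    For the toy sequence, VERAF bridges the tryptic peptides LWVER and AFYDKPTR, and GASPKLW bridges GASPK and LWVER, which fixes their order.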

     

    Challenge 2: Insufficient Sequence Coverage Prevents Full-Length Reconstruction

    Low expression levels or inefficient enzymatic digestion of certain proteins may lead to insufficient peptide coverage, making it difficult to reconstruct the complete protein sequence.
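    Sequence coverage is usually reported as the fraction of residues covered by at least one identified peptide. The sketch below assumes that a candidate sequence is available to map peptides onto (for example, one derived from transcriptomic data) and that peptides match it exactly; the helper names and the toy sequence are hypothetical.

```python
def coverage_map(protein, peptides):
    """Return the protein with residues not covered by any peptide shown as '.'."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for k in range(start, start + len(pep)):
                covered[k] = True
            start = protein.find(pep, start + 1)
    return "".join(aa if hit else "." for aa, hit in zip(protein, covered))


def coverage_fraction(protein, peptides):
    """Fraction of residues covered by at least one peptide."""
    cmap = coverage_map(protein, peptides)
    return sum(ch != "." for ch in cmap) / len(protein)


protein = "GASPKLWVERAFYDKPTR"       # hypothetical candidate sequence
identified = ["GASPK", "AFYDKPTR"]  # hypothetical identified peptides
print(coverage_map(protein, identified))
print(round(coverage_fraction(protein, identified), 3))
```

    Here the undetected peptide LWVER leaves 5 of 18 residues uncovered (about 72% coverage), the kind of gap that additional enzymes or fractionation aim to close. Practical tools usually also treat I and L as equivalent when matching; this sketch does not.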

     

    MtoZ Biolabs Solutions:

    • Application of multi-round digestion combined with peptide pre-fractionation (e.g., high-pH reversed-phase chromatography)

    • Integration of multi-omics data, such as validating candidate sequences against transcriptomic data

    • Employment of inference-based assembly algorithms and sequence similarity scoring models to enhance sequence reconstruction quality
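    The assembly step can be illustrated with a simple greedy overlap merge, a common heuristic for the shortest common superstring problem. This is a toy sketch, not the inference-based assembly algorithms mentioned above: it assumes error-free peptide sequences, exact suffix/prefix overlaps, and a minimum overlap of two residues that is only reasonable for this hypothetical example. Real assemblers must tolerate sequencing errors, I/L ambiguity, and repeated motifs, which can lead a greedy merge to choose the wrong partner.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`
    (at least `min_len`), or 0 if there is none."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(peptides, min_len=3):
    """Repeatedly merge the pair with the longest overlap until none remains."""
    unique = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    # Peptides fully contained in another one add no ordering information.
    contigs = [p for p in unique if not any(p != q and p in q for q in unique)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical tryptic and chymotryptic peptides from one toy sequence.
peptides = ["GASPK", "LWVER", "AFYDKPTR", "GASPKLW", "VERAF", "DKPTR", "Y"]
print(greedy_assemble(peptides, min_len=2))
```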

     

    Challenge 3: Lack of Functional Annotation Impedes Protein Identification

    Even with a successfully reconstructed sequence, determining the protein’s function remains a significant challenge in the absence of annotation.

     

    MtoZ Biolabs Solutions:

    • Homology-based analysis using tools such as BLAST to infer structure and functional domains

    • In silico prediction of secondary and tertiary structures to assess potential functional categories, such as enzymes, signaling proteins, or antigens

    • Experimental validation of expression and function, including activity assays and ELISA-based methods
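    BLAST reports the percent identity of each alignment it finds. As a minimal sketch of what that figure means, the code below computes percent identity from a plain Needleman-Wunsch global alignment with simple match, mismatch, and gap scores. This is not how BLAST works (BLAST uses a heuristic local search with substitution matrices such as BLOSUM62); the scores, the function names, and the choice to treat I and L as identical, reflecting the isobaric ambiguity of De Novo sequences, are assumptions for illustration.

```python
def same(x, y):
    """Residue identity, treating I and L as equal (isobaric in MS)."""
    return x == y or {x, y} == {"I", "L"}


def percent_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment; identical columns / alignment length."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if same(a[i - 1], b[j - 1]) else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if same(a[i - 1], b[j - 1]) else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                identical += same(a[i - 1], b[j - 1])
                i, j = i - 1, j - 1
                columns += 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


# Hypothetical De Novo sequence versus two hypothetical database hits.
print(percent_identity("PEPTIDE", "PEPTLDE"))
print(percent_identity("PEPTIDE", "PEPKIDE"))
```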

     

    How Does MtoZ Biolabs Facilitate the Structural Elucidation of Unknown Proteins?

    We offer a comprehensive De Novo protein sequencing workflow, from sample preparation to functional validation, including:

    1. Technical Platform Support

    • High-resolution mass spectrometry using Orbitrap Fusion Lumos and timsTOF Pro

    • Multi-enzyme digestion combined with fractionation-based enrichment

    • Parallel analysis with widely used De Novo sequencing software such as PEAKS, pNovo, and Novor

     

    2. Proprietary Algorithms and Expert Review

    • In-house developed modules for sequence assembly optimization and post-translational modification integration

    • Homology-based annotation and functional scoring systems specifically designed for unknown proteins

    • Manual verification of MS/MS results by a team of PhD-level scientists for every project

     

    3. Deliverables

    • Full-length De Novo protein sequences

    • Peptide coverage maps with annotated modifications

    • BLAST alignment reports accompanied by functional prediction annotations

    • Optional services including expression validation and bioactivity assays

     

    As increasingly complex biological samples, non-model organisms, and mutated proteins become central to current research, De Novo protein sequencing is evolving from a niche technology into an indispensable tool for advancing life sciences. Leveraging a robust mass spectrometry platform and extensive experience in unknown protein characterization, MtoZ Biolabs provides dependable support for your research—enabling the decoding of uncharacterized proteins from experimental data and revealing novel functions through structural insights. If you are working on the identification of unknown proteins or aim to characterize the structure of unannotated protein therapeutics, we welcome you to contact the MtoZ Biolabs technical team for sample evaluation and customized sequencing strategies.

     

    MtoZ Biolabs, an integrated chromatography and mass spectrometry (MS) services provider.
