Introduction
Pioneer and its companion tool Altimeter are an open-source, performant solution for analyzing protein MS data acquired by data-independent acquisition (DIA). Pioneer includes routines for searching DIA experiments from Thermo and Sciex instruments and for building spectral libraries using the Koina interface. Given a spectral library of precursor fragment ion intensities and retention time estimates, Pioneer identifies and quantifies peptides from the library in the data.
Key Features
Isotope-Aware DIA Analysis: Narrow isolation windows distort fragment ion isotope distributions because the quadrupole partially transmits precursor isotopic envelopes. Pioneer addresses this by estimating a quadrupole transmission efficiency function for each scan and re-isotoping library spectra accordingly, using methods from Goldfarb et al. This correction is critical for accurate matching and quantification in narrow-window DIA.
Altimeter: Collision Energy-Independent Spectral Libraries: Altimeter predicts coefficients for B-splines that model total rather than monoisotopic fragment ion intensities as a function of normalized collision energy (NCE). Evaluating the splines at a given NCE produces a complete spectrum, so a single library works across different instruments and acquisition settings. Pioneer calibrates the optimal NCE per data file automatically.
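As a rough illustration of evaluating a spline at a chosen NCE, the sketch below computes one fragment ion's intensity as a coefficient-weighted sum of B-spline basis functions, using the standard Cox-de Boor recursion. The spline degree, knot vector, coefficients, and the function names `bspline_basis` and `fragment_intensity` are assumptions made for this example; they are not Altimeter's actual parameters or API.

```julia
# Illustrative only: degree, knots, and coefficients are made up,
# not Altimeter's actual values.

# Cox-de Boor recursion: value at x of the i-th B-spline basis function of degree p.
function bspline_basis(i::Int, p::Int, knots::Vector{Float64}, x::Float64)
    if p == 0
        # Half-open interval, so x equal to the last knot evaluates to zero here.
        return knots[i] <= x < knots[i + 1] ? 1.0 : 0.0
    end
    left_width = knots[i + p] - knots[i]
    right_width = knots[i + p + 1] - knots[i + 1]
    # Terms over zero-width knot spans are defined as zero.
    left = left_width == 0 ? 0.0 : (x - knots[i]) / left_width * bspline_basis(i, p - 1, knots, x)
    right = right_width == 0 ? 0.0 : (knots[i + p + 1] - x) / right_width * bspline_basis(i + 1, p - 1, knots, x)
    return left + right
end

# Fragment intensity at a given NCE: coefficient-weighted sum of the basis functions.
function fragment_intensity(coeffs::Vector{Float64}, p::Int, knots::Vector{Float64}, nce::Float64)
    return sum(coeffs[i] * bspline_basis(i, p, knots, nce) for i in eachindex(coeffs))
end

# Hypothetical cubic spline clamped to the NCE range 20 to 40.
knots = [20.0, 20.0, 20.0, 20.0, 40.0, 40.0, 40.0, 40.0]
coeffs = [0.1, 0.6, 0.9, 0.3]  # hypothetical coefficients for one fragment ion

intensity_at_nce_25 = fragment_intensity(coeffs, 3, knots, 25.0)
intensity_at_nce_30 = fragment_intensity(coeffs, 3, knots, 30.0)
```

Repeating the evaluation for every fragment of a precursor gives a full predicted spectrum at that NCE, which is why one set of coefficients can serve several acquisition settings.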
Intensity-Aware Fragment Index: Pioneer implements a fast fragment index search inspired by MSFragger and Sage. Pioneer's implementation uniquely leverages accurate fragment intensity predictions from in silico libraries—indexing only the highest-ranked fragments—to improve both speed and specificity of candidate identification.
Spectral Deconvolution with Robust Regression: Pioneer explains each observed mass spectrum as a linear combination of template spectra from the library. To reduce quantitative bias from interfering signals in chimeric spectra, Pioneer minimizes the pseudo-Huber loss rather than squared error. For other examples of linear regression applied to DIA analyses, see Specter and Chimerys.
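To show why the pseudo-Huber loss is less sensitive to interference than squared error, the sketch below compares the two on residuals of increasing size. It uses the standard definition of the pseudo-Huber loss; the scale parameter `delta` and the function names are assumptions for illustration and do not reflect Pioneer's settings or internals.

```julia
# Pseudo-Huber loss of a residual r with scale parameter delta (delta > 0).
# It behaves like r^2 / 2 for |r| << delta and like delta * |r| for |r| >> delta.
pseudo_huber(r::Real, delta::Real) = delta^2 * (sqrt(1 + (r / delta)^2) - 1)

# Squared-error loss on the same scale, for comparison.
squared_error(r::Real) = r^2 / 2

delta = 1.0  # hypothetical scale, not Pioneer's setting
for r in (0.1, 1.0, 10.0, 100.0)
    println("residual = ", r,
            "  squared error = ", squared_error(r),
            "  pseudo-Huber = ", pseudo_huber(r, delta))
end
```

Because the loss grows only linearly for large residuals, a single peak inflated by a co-eluting interferent pulls the fitted coefficients far less than it would under least squares.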
Dual-Window Quantification: In narrow-window DIA, a precursor's isotopic envelope is split across adjacent windows. Pioneer normalizes quantification by the isolated precursor fraction and combines signal from adjacent windows for denser chromatographic sampling and improved quantitative accuracy.
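The arithmetic behind normalizing by an isolated fraction can be sketched as follows. This is a simplified illustration, not Pioneer's implementation: it assumes the isolated fraction is the transmission-weighted share of the isotopic envelope, and the isotope abundances, per-isotope transmission values, and function names are all hypothetical.

```julia
# Fraction of a precursor's isotopic envelope transmitted by one isolation window.
# `abundances` are relative isotope abundances; `transmission` is the fraction of
# each isotope peak passed by the quadrupole (both hypothetical inputs).
function isolated_fraction(abundances::Vector{Float64}, transmission::Vector{Float64})
    length(abundances) == length(transmission) || throw(ArgumentError("length mismatch"))
    return sum(abundances .* transmission) / sum(abundances)
end

# Rescale an observed signal to the value expected under full isolation.
normalize_signal(observed::Float64, fraction::Float64) = observed / fraction

abundances = [0.55, 0.30, 0.15]  # M+0, M+1, M+2 (made-up values)
window_a = [1.0, 0.5, 0.0]       # window covering the low-m/z side of the envelope
window_b = [0.0, 0.5, 1.0]       # adjacent window covering the high-m/z side

frac_a = isolated_fraction(abundances, window_a)
frac_b = isolated_fraction(abundances, window_b)

# Signals from both windows land on a common scale after normalization,
# so they can be interleaved into one chromatogram.
signal_a = normalize_signal(700.0, frac_a)
signal_b = normalize_signal(300.0, frac_b)
```

In this toy case the two windows together transmit the whole envelope, so the normalized signals from each window agree and the chromatogram gains twice as many sampling points.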
Match Between Runs: Pioneer transfers peptide identifications across runs with false transfer rate (FTR) control, increasing coverage in large-scale experiments.
Spectral Library Prediction via Koina: Using Koina, Pioneer constructs fully predicted spectral libraries from a FASTA file; an internet connection is required. Pioneer uses Chronologer for retention time prediction and Altimeter for fragment ion intensity prediction.
Performance
- Speed: 2–6x faster than DIA-NN and AlphaDIA on benchmark datasets
- FDR Control: Conservative false discovery rate control validated by entrapment analysis
- Scalability: Memory consumption remains constant as the number of raw files grows, scaling to experiments with hundreds of runs
Current Limitations
- Variable modifications: Only oxidation of methionine (Unimod:35) is currently supported as a variable PTM
- Digestion: Fully enzymatic digestion only (no semi-enzymatic or non-specific searches)
- Interface: Command-line only; no graphical user interface yet
Authors and Development
Pioneer is developed and maintained by:
- Nathan Wamsley (Major Lab/Goldfarb Lab, Washington University)
- Dennis Goldfarb (Goldfarb Lab, Washington University)
Citation
If you use Pioneer or Altimeter in your research, please cite:
Wamsley, N. T., Wilkerson, E. M., Major, M., & Goldfarb, D. "Pioneer and Altimeter: Fast Analysis of DIA Proteomics Data Optimized for Narrow Isolation Windows." bioRxiv (2025). DOI: [forthcoming]
Contact
For questions about Pioneer or to collaborate, please contact:
- Nathan Wamsley (wamsleynathan@gmail.com)
- Dennis Goldfarb (dennis.goldfarb@wustl.edu)
For troubleshooting, use the Issues page on GitHub. To critique methods or propose features, use the Discussions page.
Exported Methods
- Pioneer.BuildSpecLib
- Pioneer.GetBuildLibParams
- Pioneer.GetSearchParams
- Pioneer.SearchDIA
- Pioneer.convertMzML
Pioneer.SearchDIA — Function
SearchDIA(params_path::String)
Main entry point for the DIA (Data-Independent Acquisition) search workflow. Executes a series of SearchMethods and generates performance metrics.
Parameters:
- params_path: Path to JSON configuration file containing search parameters
Output:
- Generates a log file in the results directory
- Writes long- and wide-format tables (.arrow and .tsv) with protein-group and precursor-level identifications and quantitation
- Reports timing and memory usage statistics
Example:
julia> SearchDIA("/path/to/config.json")
==========================================================================================
Starting SearchDIA
==========================================================================================
Starting search at: 2024-12-30T14:01:01.510
Output directory: ./../data/ecoli_test/ecoli_test_results
[ Info: Loading Parameters...
[ Info: Loading Spectral Library...
.
.
.
If it does not already exist, SearchDIA creates the user-specified results_dir and generates quality control plots, data tables, and logs.
results_dir/
├── pioneer_search_log.txt
├── qc_plots/
│   ├── collision_energy_alignment/
│   │   └── nce_alignment_plots.pdf
│   ├── quad_transmission_model/
│   │   ├── quad_data/
│   │   │   └── quad_data_plots.pdf
│   │   └── quad_models/
│   │       └── quad_model_plots.pdf
│   ├── rt_alignment_plots/
│   │   └── rt_alignment_plots.pdf
│   ├── mass_error_plots/
│   │   └── mass_error_plots.pdf
│   └── QC_PLOTS.pdf
├── precursors_long.arrow
├── precursors_long.tsv
├── precursors_wide.arrow
├── precursors_wide.tsv
├── protein_groups_long.arrow
├── protein_groups_long.tsv
├── protein_groups_wide.arrow
└── protein_groups_wide.tsv
Pioneer.GetSearchParams — Function
GetSearchParams(lib_path::String, ms_data_path::String, results_path::String;
                params_path::Union{String, Missing} = missing,
                simplified::Bool = true)
Creates a search parameter configuration file with user-specified paths.
The function loads default parameters from either the simplified or full JSON template (from assets/example_config/) and creates a customized parameter file with the user's file paths. All other parameters retain their default values and can be modified later.
Arguments:
- lib_path: Path to the spectral library file (.poin)
- ms_data_path: Path to the MS data directory
- results_path: Path where search results will be stored
- params_path: Output path for the parameter file. Can be a directory (creates search_parameters.json) or a full file path. Defaults to "search_parameters.json" in the current directory.
- simplified: If true (default), uses simplified template with essential parameters only. If false, uses full template with all advanced options.
Returns:
- String: Path to the newly created search parameters file
Templates used:
- Simplified: defaultSearchParamsSimplified.json (basic parameters)
- Full: defaultSearchParams.json (all advanced parameters)
Example:
# Create simplified parameter file
output_path = GetSearchParams(
    "/path/to/speclib.poin",
    "/path/to/ms/data/dir",
    "/path/to/results/dir"
)
# Create full parameter file with custom output location
output_path = GetSearchParams(
    "/path/to/speclib.poin",
    "/path/to/ms/data/dir",
    "/path/to/results/dir";
    params_path = "/custom/path/my_params.json",
    simplified = false
)
Pioneer.BuildSpecLib — Function
BuildSpecLib(params_path::String)
Main function to build a spectral library from parameters. Executes a series of steps:
- Parameter validation and directory setup
- Fragment bound detection
- Retention time prediction (optional)
- Fragment prediction (optional)
- Library index building
Parameters:
- params_path: Path to JSON configuration file containing library building parameters
Output:
- Generates a spectral library in the specified output directory
- Creates a detailed log file with timing and performance metrics
- Returns nothing
Pioneer.GetBuildLibParams — Function
GetBuildLibParams(out_dir::String, lib_name::String, fasta_inputs;
                  params_path::Union{String, Missing} = missing,
                  regex_codes::Union{Missing, Dict, Vector} = missing,
                  simplified::Bool = true)
Creates a library building parameter configuration file with user-specified paths and FASTA files.
The function loads default parameters from either the simplified or full JSON template (from assets/example_config/) and creates a customized parameter file with the user's paths and automatically discovered FASTA files. All other parameters retain their default values and can be modified later.
Arguments:
- out_dir: Output directory path where the library will be built
- lib_name: Name for the spectral library (used for directory and file naming)
- fasta_inputs: FASTA file specification. Can be:
  - A single directory path (String), which is searched for .fasta/.fasta.gz files
  - A single FASTA file path (String)
  - An array of directories and/or FASTA file paths
- params_path: Output path for the parameter file. Can be a directory (creates buildspeclib_params.json) or a full file path. Defaults to "buildspeclib_params.json" in the current directory.
- regex_codes: Optional FASTA header regex patterns for protein annotation extraction. Can be:
  - A single Dict with keys "accessions", "genes", "proteins", "organisms" (applied to all FASTA files)
  - A Vector of Dicts for positional mapping to fasta_inputs
  - If missing, uses default patterns from the template
- simplified: If true (default), uses simplified template with essential parameters only. If false, uses full template with all advanced library building options.
Returns:
- String: Path to the newly created library building parameters file
Templates used:
- Simplified: defaultBuildLibParamsSimplified.json (basic parameters)
- Full: defaultBuildLibParams.json (all advanced parameters)
The function automatically:
- Discovers FASTA files in specified directories
- Generates appropriate library names from FASTA filenames
- Expands regex patterns to match the number of FASTA files found
- Validates that all specified paths exist and are accessible
Example:
# Create simplified parameter file with directory of FASTA files
output_path = GetBuildLibParams(
    "/path/to/output",
    "my_library",
    "/path/to/fasta/directory"
)
# Create full parameter file with specific FASTA files and custom regex
# (raw strings keep the regex backslashes from being read as string escapes)
output_path = GetBuildLibParams(
    "/path/to/output",
    "my_library",
    ["/path/to/human.fasta", "/path/to/yeast.fasta"];
    params_path = "/custom/path/build_params.json",
    regex_codes = Dict("accessions" => raw"^sp\|(\w+)\|", "genes" => raw" GN=(\S+)"),
    simplified = false
)
Pioneer.convertMzML — Function
convertMzML(mzml_dir::String; skip_scan_header::Bool=true)
Convert mzML mass spectrometry data files to Arrow IPC format.
Takes either a directory containing mzML files or a path to a single mzML file and converts them to Arrow format, preserving scan data including m/z arrays, intensity arrays, and scan metadata.
Arguments
- mzml_dir::String: Path to either a directory containing mzML files or a path to a single mzML file
- skip_scan_header::Bool=true: When true, omits scan header information from the output to reduce file size
Returns
nothing
Output
Creates Arrow (.arrow) files in the same directory as the input mzML files and with the same base filename.
Examples
# Convert all mzML files in a directory
convertMzML("path/to/mzml/files")
# Convert a single mzML file
convertMzML("path/to/single/file.mzML")
# Include scan headers in output
convertMzML("path/to/mzml/files", skip_scan_header=false)
Notes
Each mzML file is converted to a corresponding Arrow IPC (.arrow) file in the same directory. This is particularly useful for Sciex data, where direct .wiff/.wiff2 conversion is not supported.