Pioneer uses JSON configuration files to control analysis. This guide explains the parameters for both SearchDIA and BuildSpecLib.
Most parameters should not be changed, but the following may need adjustment.
first_search.fragment_settings.min_score: The minimum score determines which fragments must match in the fragment-index search in order for the precursor to pass. Each precursor is awarded a score based on which fragments match the spectrum. The score assigned to each fragment depends on its intensity rank. The default scheme is 8,4,4,2,2,1,1. That is, if the 1st, 3rd, and 7th ranking fragments matched the spectrum, the precursor would be awarded a score of 8+4+1=13. If all 7 of the fragments matched, the precursor would be awarded a score of 22. For normal instrument settings on an Orbitrap or Astral mass analyzer, the mass tolerance is about +/- 5-15 ppm and 15 is a reasonable default score threshold. However, for instruments with less mass accuracy (Sciex ZenoTOF 7600 or different Orbitrap scan settings), the score threshold may need to be set higher, perhaps to 20. It may be worthwhile to test different values when searching data from a new instrument or sample type. In order to pass the first search, a precursor need only pass the threshold and score sufficiently well in at least one of the MS data files.
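
The scoring arithmetic described above can be sketched in a few lines of Python. This only illustrates the rank-to-score scheme from the text and is not Pioneer's implementation; the function name `fragment_index_score` is made up for the example.

```python
# Default rank-to-score scheme from the text; index 0 is the most intense fragment.
RANK_TO_SCORE = [8, 4, 4, 2, 2, 1, 1]


def fragment_index_score(matched_ranks, rank_to_score=RANK_TO_SCORE):
    """Sum the scores of the matched fragments, given their 1-based intensity ranks."""
    return sum(
        rank_to_score[rank - 1]
        for rank in set(matched_ranks)
        if 1 <= rank <= len(rank_to_score)
    )


score = fragment_index_score([1, 3, 7])  # 8 + 4 + 1
passes = score >= 15                     # compare against min_score
print(score, passes)                     # 13 False
```

With a threshold of 15, matching only the 1st, 3rd, and 7th fragments is not enough, whereas matching the three most intense fragments (8 + 4 + 4 = 16) would pass.
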
first_search.fragment_settings.max_rank: Search against only the n most abundant fragments for each precursor. Including more fragments can improve performance but increases memory consumption and can make the search take longer. In our experience, returns diminish beyond 25-50 fragments.
quant_search.fragment_settings.max_rank: See above
quant_search.fragment_settings.n_isotopes: If searching with non-Altimeter libraries (not recommended), such as Prosit or UniSpec, this should be set to 1 as the second fragment isotopes will not be calculated accurately.
acquisition.nce: The initial guess for the normalized collision energy (NCE) that best aligns the Altimeter library with the empirical data. Altimeter values should agree with those from Thermo instruments manufactured in Bremen, Germany. If the quality control plots show that the initial guess is far from the estimated value, re-searching with a better initial guess may improve search results slightly.
acquisition.quad_transmission.fit_from_data: Estimate the quadrupole transmission function from the data. If disabled, a symmetric, smooth function is used instead.
optimization.machine_learning.max_psm_memory_mb: Memory budget (in MB) for PSMs held in memory during LightGBM training. Pioneer dynamically estimates how many PSMs fit within this budget based on the column sizes of the Arrow file. Default is 2000 MB.
During LightGBM training, any missing feature values are replaced with the column median. If a column is entirely missing, the values are filled with zero of the appropriate type.
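
The imputation rule above can be illustrated as follows. This is a minimal sketch, assuming missing values are represented as `None` in a plain Python list; Pioneer itself works on Arrow tables in Julia, and the function name `impute_column` is hypothetical.

```python
from statistics import median


def impute_column(values, fill_when_empty=0.0):
    """Replace None entries with the median of the observed values in the column."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Entirely missing column: fall back to a zero of the appropriate type.
        return [fill_when_empty for _ in values]
    fill = median(observed)
    return [fill if v is None else v for v in values]


print(impute_column([1.0, None, 3.0, 10.0]))           # [1.0, 3.0, 3.0, 10.0]
print(impute_column([None, None], fill_when_empty=0))  # [0, 0]
```
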
| Parameter | Type | Description |
|---|---|---|
isotope_settings.err_bounds_first_pass | [Int, Int] | Number of isotope spacings (NEUTRON/charge, in Thomsons) by which the precursor monoisotope may lie outside the quadrupole isolation window, given as (left, right) (default: [1, 0]) |
isotope_settings.err_bounds_quant_search | [Int, Int] | Number of isotope spacings (NEUTRON/charge, in Thomsons) by which the precursor monoisotope may lie outside the quadrupole isolation window, given as (left, right) (default: [3, 0]) |
isotope_settings.combine_traces | Boolean | Whether to combine precursor isotope traces in quantification (default: true) |
isotope_settings.partial_capture | Boolean | If true, estimate the conditional fragment isotope distribution; if false, assume complete transmission of the entire precursor isotopic envelope (default: true) |
isotope_settings.min_fraction_transmitted | Float | Minimum fraction of the precursor isotope distribution that must be isolated for scoring and quantitation (default: 0.25) |
scoring.q_value_threshold | Float | Global q-value threshold for filtering results (default: 0.01) |
normalization.n_rt_bins | Int | Number of retention time bins for quant normalization (default: 100) |
normalization.spline_n_knots | Int | Number of knots in quant normalization spline (default: 7) |
huber_override.override_huber_delta_fit | Boolean | Whether to override the automatic Huber delta fitting with a manual value (default: false) |
huber_override.huber_delta | Float | Huber delta value when override is enabled (default: 1055) |
ms1_scoring | Boolean | Enable MS1-level scoring features (default: true) |
ms1_quant | Boolean | Enable MS1-level quantification (default: false) |
| Parameter | Type | Description |
|---|---|---|
fragment_settings.min_count | Int | Minimum number of matching fragment ions (default: 7) |
fragment_settings.max_rank | Int | Maximum rank of fragments to consider (default: 25, meaning fragments ranked 26th or lower in abundance for each precursor are filtered out) |
fragment_settings.min_score | [Int, Int] | Minimum fragment-index score thresholds (default: [22, 17]) |
fragment_settings.min_spectral_contrast | Float | Minimum cosine similarity score (default: 0.5) |
fragment_settings.relative_improvement_threshold | Float | Minimum relative Scribe score improvement needed to ignore an interfering peak (default: 1.25) |
fragment_settings.min_log2_ratio | Float | Minimum log2 ratio of matched library fragment intensities to unmatched library fragment intensities (default: 1.5) |
fragment_settings.min_top_n | [Int, Int] | Minimum number of top N matches - [requirement, denominator]. Default: [3, 3] |
fragment_settings.n_isotopes | Int | Number of fragment isotopes to consider in matching (default: 1, mono only) |
fragment_settings.intensity_filter_quantile | Float | Quantile for intensity-based fragment filtering (default: 0.50) |
search_settings.min_samples | Int | Minimum number of PSMs required for tuning (default: 1200) |
search_settings.max_presearch_iters | Int | Maximum number of parameter tuning iterations (default: 10) |
search_settings.frag_err_quantile | Float | Quantile for fragment error estimation (default: 0.005) |
search_settings.max_q_value | Float | Maximum q-value for parameter tuning PSMs (default: 0.01) |
search_settings.topn_peaks | Int | Top N peaks per spectrum to consider (default: 200) |
search_settings.max_frags_for_mass_err_estimation | Int | Maximum fragments used for mass error model (default: 12) |
nce_tuning.min_psms | Int | Minimum PSMs for NCE tuning (default: 2000) |
nce_tuning.initial_percent | Float | Initial sampling percentage for NCE tuning (default: 2.5) |
nce_tuning.min_initial_scans | Int | Minimum initial scans for NCE tuning (default: 5000) |
quad_tuning.min_psms_per_thompson | Int | Minimum PSMs per Thomson of width for quad transmission tuning (default: 250) |
quad_tuning.min_fragments | Int | Minimum fragments per PSM for quad tuning (default: 3) |
quad_tuning.initial_percent | Float | Initial sampling percentage for quad tuning (default: 2.5) |
iteration_settings.init_mass_tol_ppm | [Float, Float] | Initial fragment mass tolerance guesses in ppm (default: [20.0, 30.0]) |
iteration_settings.ms1_tol_ppm | Float | Initial MS1 mass tolerance in ppm (default: 20.0) |
iteration_settings.scan_counts | [Int] | Scan counts to sample during parameter tuning (default: [10000]) |
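
For readers less used to ppm tolerances, the sketch below converts a tolerance such as those in `init_mass_tol_ppm` into an absolute m/z window. It assumes a symmetric window around the theoretical m/z; how Pioneer applies its fitted tolerances internally is not described here, and `ppm_window` is a hypothetical helper.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a symmetric tolerance given in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


low, high = ppm_window(500.0, 20.0)
print(f"{low:.4f} - {high:.4f}")  # 499.9900 - 500.0100
```
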
| Parameter | Type | Description |
|---|---|---|
fragment_settings.min_count | Int | Minimum number of matching fragments (default: 4) |
fragment_settings.max_rank | Int | Maximum fragment rank to consider (default: 25) |
fragment_settings.min_score | Int | Minimum score for fragment matches (default: 15) |
fragment_settings.min_spectral_contrast | Float | Minimum cosine similarity required (default: 0.5) |
fragment_settings.relative_improvement_threshold | Float | Minimum relative Scribe score improvement needed to ignore an interfering peak (default: 1.25) |
fragment_settings.min_log2_ratio | Float | Minimum log2 ratio of matched library fragment intensities to unmatched library fragment intensities (default: 0.0, meaning the summed intensity of matched library fragments equals that of unmatched library fragments for the precursor) |
fragment_settings.min_top_n | [Int, Int] | Minimum top N matches - [requirement, denominator]. Default: [2, 3] |
fragment_settings.n_isotopes | Int | Number of isotopes to consider (default: 1) |
scoring_settings.n_train_rounds | Int | Number of training rounds for scoring model (default: 2) |
scoring_settings.max_iterations | Int | Maximum iterations for scoring optimization (default: 20) |
scoring_settings.max_q_value_probit_rescore | Float | Maximum q-value threshold for semi-supervised learning during probit regression (default: 0.05) |
scoring_settings.max_PEP | Float | Maximum local FDR threshold for passing the first search (default: 0.9) |
scoring_settings.global_pep_threshold | Float | Maximum global PEP for precursor selection in cross-run aggregation (default: 0.5) |
irt_mapping.max_prob_to_impute_irt | Float | If the probability of a PSM in the first-pass search is below this value, the precursor's iRT is imputed from the globally determined value from the other runs (default: 0.75) |
irt_mapping.fwhm_nstd | Float | Number of standard deviations of the FWHM to add to the retention time tolerance (default: 4) |
irt_mapping.irt_nstd | Int | Number of standard deviations of run-to-run iRT tolerance to add to the retention time tolerance (default: 4) |
irt_mapping.plot_rt_alignment | Boolean | Whether to generate RT alignment diagnostic plots (default: false) |
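
The `min_log2_ratio` filter in the table above compares matched and unmatched library intensity. The sketch below computes that ratio for one precursor; the handling of the edge cases (nothing matched, everything matched) is an assumption made for the example, and `log2_matched_ratio` is a hypothetical name.

```python
import math


def log2_matched_ratio(library_intensities, matched):
    """log2(sum of matched / sum of unmatched) library fragment intensities."""
    matched_sum = sum(i for i, m in zip(library_intensities, matched) if m)
    unmatched_sum = sum(i for i, m in zip(library_intensities, matched) if not m)
    if unmatched_sum == 0:
        return math.inf   # every library fragment matched
    if matched_sum == 0:
        return -math.inf  # no library fragment matched
    return math.log2(matched_sum / unmatched_sum)


intensities = [1.0, 0.5, 0.25, 0.25]
print(log2_matched_ratio(intensities, [True, False, False, False]))  # 0.0
print(log2_matched_ratio(intensities, [True, True, False, False]))   # log2(3), about 1.585
```

The first call sits exactly at a threshold of 0.0; the second would also pass a threshold of 1.5.
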
The first search uses a hybrid filter to decide how many precursors to carry forward from each stage. Apart from global_pep_threshold, its thresholds are hardcoded constants (not JSON-configurable) because they should rarely need adjustment.
Per-file pre-filter (applied per MS file before cross-run aggregation): keeps the largest of three counts:
- PSMs with PEP ≤ 0.95
- PSMs where cumulative decoy/target ratio ≤ PERFILE_QVALUE_THRESHOLD (0.50)
- PERFILE_MIN_PSMS (10,000, or all PSMs if fewer exist)
Global post-filter (applied after cross-run global PEP computation): keeps the largest of three counts:
- Precursors with global PEP ≤ global_pep_threshold (default 0.5, JSON-configurable)
- Precursors where cumulative decoy/target ratio ≤ GLOBAL_QVALUE_THRESHOLD (0.15)
- GLOBAL_MIN_PRECURSORS (50,000, or all precursors if fewer exist)
The hard minimum floors ensure sparse datasets (e.g. single-cell proteomics) always retain enough precursors for second-pass scoring, while the q-value floors prevent decoy contamination in large experiments.
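
The "largest of three counts" rule can be sketched as below. This is an illustration under stated assumptions, not Pioneer's code: the inputs are assumed to be sorted best-first (ascending PEP), the decoy/target ratio is taken over each prefix of that ranking, and the function name `hybrid_keep_count` is hypothetical.

```python
def hybrid_keep_count(peps, is_decoy, pep_threshold, ratio_threshold, min_count):
    """Number of top-ranked entries to keep; inputs must be sorted best-first."""
    # Count 1: entries at or below the PEP threshold.
    n_by_pep = sum(1 for p in peps if p <= pep_threshold)

    # Count 2: longest prefix whose cumulative decoy/target ratio is within bounds.
    n_by_ratio = 0
    targets = decoys = 0
    for i, decoy in enumerate(is_decoy, start=1):
        decoys += decoy
        targets += not decoy
        if targets > 0 and decoys / targets <= ratio_threshold:
            n_by_ratio = i

    # Count 3: hard minimum floor, or everything if fewer entries exist.
    n_floor = min(min_count, len(peps))
    return max(n_by_pep, n_by_ratio, n_floor)


peps = [0.01, 0.02, 0.5, 0.97, 0.99]
is_decoy = [False, False, True, False, True]
print(hybrid_keep_count(peps, is_decoy, 0.95, 0.50, min_count=2))       # 4
print(hybrid_keep_count(peps, is_decoy, 0.95, 0.50, min_count=10_000))  # 5
```

In the second call the floor dominates: with fewer than 10,000 PSMs available, all of them are kept, which is the behavior that protects sparse datasets.
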
| Parameter | Type | Description |
|---|---|---|
fragment_settings.min_count | Int | Minimum fragment count for quantification (default: 3) |
fragment_settings.min_y_count | Int | Minimum number of y-ions required (default: 2) |
fragment_settings.max_rank | Int | Maximum fragment rank (default: 255) |
fragment_settings.min_spectral_contrast | Float | Minimum spectral contrast score (default: 0.0) |
fragment_settings.min_log2_ratio | Float | Minimum log2 ratio of intensities (default: -1.7) |
fragment_settings.min_top_n | [Int, Int] | Minimum top N matches - [requirement, denominator]. Default: [2, 3] |
fragment_settings.n_isotopes | Int | Number of fragment isotopes for quantification (default: 2, i.e. the monoisotopic peak and the next isotope peak) |
chromatogram.smoothing_strength | Float | Strength of chromatogram smoothing (default: 1e-6) |
chromatogram.padding | Int | Number of zeros to pad chromatograms on either side (default: 0) |
chromatogram.max_apex_offset | Int | Maximum allowed shift of the chromatographic apex between the second-pass search and re-integration with 1% FDR precursors, counted in scans where the precursor could have been detected (default: 2) |
| Parameter | Type | Description |
|---|---|---|
nce | Int | Normalized collision energy initial guess (used in pre-search before NCE tuning) (default: 26) |
quad_transmission.fit_from_data | Boolean | Whether to fit quadrupole transmission from data (default: true) |
quad_transmission.overhang | Float | Deprecated (default: 0.25) |
quad_transmission.smoothness | Float | Smoothness parameter for transmission curve. Higher value means more "box-like" shape. (default: 5.0) |
| Parameter | Type | Description |
|---|---|---|
n_bins | Int | Number of retention time bins for alignment (default: 200) |
bandwidth | Float | Bandwidth for kernel density estimation (default: 0.25) |
sigma_tolerance | Int | Number of standard deviations for iRT tolerance after pre-search (default: 4) |
min_probability | Float | Minimum probability for alignment PSMs in pre-search (default: 0.95) |
lambda_penalty | Float | Lambda penalty for spline fitting (default: 0.1) |
ransac_threshold_psms | Int | RANSAC threshold in number of PSMs (default: 500) |
min_psms_for_spline | Int | Minimum PSMs required for spline fitting (default: 10) |
The deconvolution parameters are split into ms1 and ms2 sub-objects for separate control over MS1 and MS2 deconvolution, plus shared iteration settings.
| Parameter | Type | Description |
|---|---|---|
deconvolution.ms1.lambda | Float | L2 regularization parameter for MS1 deconvolution (default: 0.0001) |
deconvolution.ms1.reg_type | String | Regularization type for MS1: "none", "l1", or "l2" (default: "l2") |
deconvolution.ms1.huber_delta | Float | Huber delta for MS1 loss function (default: 1e9) |
deconvolution.ms2.lambda | Float | L2 regularization parameter for MS2 deconvolution (default: 0.0) |
deconvolution.ms2.reg_type | String | Regularization type for MS2: "none", "l1", or "l2" (default: "none") |
deconvolution.ms2.huber_delta | Float | Huber delta for MS2 loss function (default: 300) |
deconvolution.huber_exp | Float | Exponent for Huber delta progression (default: 1.5) |
deconvolution.huber_iters | Int | Number of Huber outer iterations (default: 15) |
deconvolution.newton_iters | Int | Maximum Newton iterations per outer iteration (default: 50) |
deconvolution.bisection_iters | Int | Maximum bisection iterations when Newton fails (default: 100) |
deconvolution.outer_iters | Int | Maximum outer iterations for convergence (default: 1000) |
deconvolution.newton_accuracy | Float | Absolute convergence threshold for Newton method (default: 10) |
deconvolution.bisection_accuracy | Float | Absolute convergence threshold for bisection method (default: 10) |
deconvolution.max_diff | Float | Relative convergence threshold - maximum relative change in weights between iterations. Also used as relative tolerance for Newton's method (default: 0.01) |
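
For orientation, the role of a Huber delta can be seen in the textbook Huber loss, sketched below. This is the standard definition only; Pioneer's exact loss parameterization and the way `huber_exp` and `huber_iters` step through delta values are not specified in this guide.

```python
def huber_loss(residual, delta):
    """Standard Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


# With delta = 300 (the MS2 default), small residuals are penalized quadratically
# and large residuals only linearly, which limits the influence of outliers.
print(huber_loss(100, 300))   # 5000.0
print(huber_loss(1000, 300))  # 255000.0
```

A very large delta, such as the MS1 default of 1e9, makes the loss effectively ordinary least squares.
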
| Parameter | Type | Description |
|---|---|---|
machine_learning.max_psm_memory_mb | Real | Memory budget in MB for PSMs held in memory during LightGBM training. Row count is dynamically estimated from Arrow column sizes (default: 2000) |
machine_learning.force_oom | Boolean | Force out-of-memory processing regardless of dataset size (default: false) |
machine_learning.min_trace_prob | Float | Minimum trace probability threshold (default: 0.75) |
machine_learning.min_PEP_neg_threshold_itr | Float | Minimum posterior error probability threshold for reclassifying weak target PSMs as negatives during the ITR stage of LightGBM rescoring (default: 0.90) |
machine_learning.spline_points | Int | Number of points for probability spline (default: 500) |
machine_learning.interpolation_points | Int | Number of interpolation points (default: 10) |
machine_learning.n_quantile_bins | Int | Number of quantile bins for score binning (default: 25) |
machine_learning.enable_model_comparison | Boolean | Enable comparison of scoring models (default: true) |
machine_learning.validation_split_ratio | Float | Fraction of data held out for validation (default: 0.2) |
machine_learning.qvalue_threshold | Float | q-value threshold for model comparison (default: 0.01) |
machine_learning.min_psms_for_comparison | Int | Minimum PSMs to enable model comparison (default: 1000) |
machine_learning.max_psms_for_comparison | Int | Maximum PSMs for in-memory model comparison (default: 100000) |
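
The idea behind `max_psm_memory_mb` (estimating a row count from column sizes) can be sketched as follows. The calculation is an assumption made for illustration: it treats 1 MB as 1024 × 1024 bytes and uses the average bytes per row; `max_psms_in_memory` is a hypothetical helper, not a Pioneer function.

```python
def max_psms_in_memory(column_bytes, n_rows, budget_mb=2000):
    """Estimate how many rows fit in the budget from the total bytes per column."""
    if n_rows == 0:
        return 0
    bytes_per_row = sum(column_bytes.values()) / n_rows
    if bytes_per_row == 0:
        return n_rows
    budget_bytes = budget_mb * 1024 * 1024
    # Never report more rows than the file actually contains.
    return min(n_rows, int(budget_bytes // bytes_per_row))


# Hypothetical column sizes for a file with one million PSMs (13 bytes per row).
columns = {"score": 8_000_000, "rt": 4_000_000, "charge": 1_000_000}
print(max_psms_in_memory(columns, 1_000_000, budget_mb=10))  # 806596
print(max_psms_in_memory(columns, 1_000_000))                # 1000000
```
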
| Parameter | Type | Description |
|---|---|---|
min_peptides | Int | Minimum number of peptides required for a protein group (default: 1) |
| Parameter | Type | Description |
|---|---|---|
run_to_run_normalization | Boolean | Whether to use run-to-run normalized abundances for precursor and protein quantification (default: false) |
max_chunk_size_mb | Int | Maximum chunk size in MB for MaxLFQ chunked merge processing (default: 1024) |
| Parameter | Type | Description |
|---|---|---|
write_csv | Boolean | Whether to write results to CSV (default: true) |
write_decoys | Boolean | Whether to quantify and include decoys in the output files (default: false) |
delete_temp | Boolean | Whether to delete temporary files (default: true) |
plots_per_page | Int | Number of plots per page in reports (default: 12) |
| Parameter | Type | Description |
|---|---|---|
debug_console_level | Int | Verbosity of console debug output (0 disables; higher values include more details). |
max_message_bytes | Int | Maximum bytes of a single log message before truncation (default: 4096). Truncation preserves valid UTF-8 and appends a suffix like … [truncated N bytes]. Can be overridden at runtime with PIONEER_MAX_LOG_MSG_BYTES (values clamped to [1024, 1048576]). |
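
The truncation behavior described for `max_message_bytes` can be sketched like this. It is an illustration of UTF-8-safe truncation and of the documented clamping range, not Pioneer's logging code; both function names are hypothetical, and the exact byte count Pioneer reports in the suffix is an assumption.

```python
MIN_BYTES, MAX_BYTES = 1024, 1_048_576


def clamp_limit(requested):
    """Clamp a requested limit to the documented range [1024, 1048576]."""
    return max(MIN_BYTES, min(requested, MAX_BYTES))


def truncate_message(message, max_bytes):
    """Cut a message to at most max_bytes of UTF-8 without splitting a character."""
    data = message.encode("utf-8")
    if len(data) <= max_bytes:
        return message
    # errors="ignore" drops a partial multi-byte character left at the cut point.
    kept = data[:max_bytes].decode("utf-8", errors="ignore")
    dropped = len(data) - len(kept.encode("utf-8"))
    return f"{kept}… [truncated {dropped} bytes]"


print(clamp_limit(10))                # 1024
print(truncate_message("héllo", 2))   # h… [truncated 5 bytes]
```

In the second call the cut falls in the middle of the two-byte character "é", so that character is dropped whole rather than leaving invalid UTF-8 behind.
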
| Parameter | Type | Description |
|---|---|---|
library | String | Path to spectral library file |
ms_data | String | Path to mass spectrometry data directory |
results | String | Path to output results directory |
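
Parameters in this guide are written as dotted names such as `first_search.fragment_settings.min_score`. The sketch below assumes these names correspond to nested objects in the JSON file and shows one way to set them programmatically. The helper `set_param` is hypothetical; in practice you would start from a parameter file generated by Pioneer rather than from an empty dictionary.

```python
import json


def set_param(params, dotted_key, value):
    """Set a nested value, e.g. 'acquisition.nce', creating objects as needed."""
    *parents, leaf = dotted_key.split(".")
    node = params
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return params


params = {}  # stand-in for a loaded parameter file
set_param(params, "first_search.fragment_settings.min_score", 20)
set_param(params, "acquisition.nce", 28)
print(json.dumps(params, indent=2))
```
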
Pioneer supports flexible FASTA input through GetBuildLibParams:
- Single directory: Scans for all .fasta and .fasta.gz files
- Single file: Directly uses the specified FASTA file
- Mixed array: Any combination of directories and files
The regex patterns for parsing FASTA headers can be configured in three ways:
Single regex set for all files (default):

    GetBuildLibParams(out_dir, lib_name, [dir1, dir2, file1])
    # All FASTA files use the same default regex patterns

Custom single regex set:

    GetBuildLibParams(out_dir, lib_name, [dir1, file1],
        regex_codes = Dict(
            "accessions" => "^>(\\S+)",
            "genes" => "GN=(\\S+)",
            "proteins" => "\\s+(.+?)\\s+OS=",
            "organisms" => "OS=(.+?)\\s+GN="
        ))
    # All files use these custom patterns

Positional mapping (one regex set per input):

    GetBuildLibParams(out_dir, lib_name, [uniprot_dir, custom_file],
        regex_codes = [
            Dict("accessions" => "^\\w+\\|(\\w+)\\|", ...),  # For uniprot_dir files
            Dict("accessions" => "^>(\\S+)", ...)            # For custom_file
        ])

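
Before building a library, it can help to check header regexes against a sample header. The sketch below does this in Python for the patterns of the custom regex set shown above. The header is invented for the example, Python's `re` may differ from Julia's regex engine in edge cases, and how Pioneer treats a pattern that fails to match is not stated here.

```python
import re

# Same patterns as the custom regex set, written as Python raw strings.
REGEX_CODES = {
    "accessions": r"^>(\S+)",
    "genes": r"GN=(\S+)",
    "proteins": r"\s+(.+?)\s+OS=",
    "organisms": r"OS=(.+?)\s+GN=",
}


def parse_header(header, regex_codes=REGEX_CODES):
    """Return the first capture group of each pattern, or None if it does not match."""
    fields = {}
    for name, pattern in regex_codes.items():
        match = re.search(pattern, header)
        fields[name] = match.group(1) if match else None
    return fields


# Hypothetical header, invented for this example.
header = ">sp|P00000|TEST_HUMAN Example protein OS=Homo sapiens GN=TEST1"
print(parse_header(header))
```

For this header the sketch extracts the accession `sp|P00000|TEST_HUMAN`, the gene `TEST1`, the protein name `Example protein`, and the organism `Homo sapiens`.
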
| Parameter | Type | Description |
|---|---|---|
min_length | Int | Minimum peptide length (default: 7) |
max_length | Int | Maximum peptide length (default: 30) |
min_charge | Int | Minimum charge state (default: 2) |
max_charge | Int | Maximum charge state (default: 4) |
cleavage_regex | String | Regular expression for cleavage sites (default: "[KR][^_|$]"; to exclude cleavage when K or R is followed by proline, use "[KR][^P|$]") |
missed_cleavages | Int | Maximum allowed missed cleavages (default: 1) |
max_var_mods | Int | Maximum variable modifications per peptide (default: 1) |
add_decoys | Boolean | Generate decoy sequences (default: true) |
entrapment_r | Float | Ratio of entrapment sequences (default: 0) |
decoy_method | String | Method for generating decoy sequences: "shuffle" or "reverse" (default: "shuffle") |
entrapment_method | String | Method for generating entrapment sequences: "shuffle" or "reverse" (default: "shuffle") |
fasta_header_regex_accessions | [String] | Regex with a capture group for the accession, one per FASTA file |
fasta_header_regex_genes | [String] | Regex with a capture group for the gene name, one per FASTA file |
fasta_header_regex_proteins | [String] | Regex with a capture group for the protein name, one per FASTA file |
fasta_header_regex_organisms | [String] | Regex with a capture group for the organism, one per FASTA file |
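
To show how `cleavage_regex`, `missed_cleavages`, `min_length`, and `max_length` interact, here is a small in-silico digestion sketch. It rests on assumptions that this guide does not confirm: the cut is placed after the first character of each regex match, overlapping matches are all considered, and a simplified proline-excluding pattern is used. Both function names are hypothetical.

```python
import re


def cleavage_sites(sequence, cleavage_regex="[KR][^P]"):
    """Positions after which to cut: one past the start of each (overlapping) match."""
    lookahead = re.compile(f"(?=({cleavage_regex}))")
    return [m.start() + 1 for m in lookahead.finditer(sequence)]


def digest(sequence, cleavage_regex="[KR][^P]", missed_cleavages=1,
           min_length=7, max_length=30):
    """Enumerate peptides, allowing up to `missed_cleavages` internal sites."""
    bounds = [0] + cleavage_sites(sequence, cleavage_regex) + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptide = sequence[bounds[i]:bounds[j]]
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


# Short toy sequence; min_length is lowered so the result is visible.
print(cleavage_sites("AKRPGKTR"))        # [2, 6]
print(digest("AKRPGKTR", min_length=2))  # ['AK', 'AKRPGK', 'RPGK', 'RPGKTR', 'TR']
```

Note that no cut is made between R and P, which is the effect of the proline-excluding pattern.
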
| Parameter | Type | Description |
|---|---|---|
nce | Float | Base normalized collision energy (default: 26.0) |
default_charge | Int | Default charge state for NCE calculations (default: 2) |
dynamic_nce | Boolean | Use charge-dependent NCE adjustments (default: true) |
| Parameter | Type | Description |
|---|---|---|
rt_bin_tol | Float | Retention time binning tolerance in minutes (default: 1.0) |
frag_bin_tol_ppm | Float | Fragment mass tolerance in PPM (default: 10.0) |
rank_to_score | [Int] | Score awarded for each matched fragment in the fragment-index search, ordered by fragment intensity rank (default: [8,4,4,2,2,1,1]) |
y_start_index | Int | Starting index for y-ion annotation (default: 4) |
b_start_index | Int | Starting index for b-ion annotation (default: 3) |
y_start | Int | Minimum y-ion to consider (default: 3) |
b_start | Int | Minimum b-ion to consider (default: 2) |
include_p_index | Boolean | Include proline-containing index fragments (default: false) |
include_p | Boolean | Include proline-containing fragments (default: false) |
auto_detect_frag_bounds | Boolean | Auto-detect fragment mass bounds from calibration file (default: true) |
frag_mz_min | Float | Minimum fragment m/z (default: 150.0) |
frag_mz_max | Float | Maximum fragment m/z (default: 2020.0) |
prec_mz_min | Float | Minimum precursor m/z (default: 390.0) |
prec_mz_max | Float | Maximum precursor m/z (default: 1010.0) |
max_frag_charge | Int | Maximum fragment ion charge (default: 3) |
max_frag_rank | Int | Maximum fragment rank (default: 255) |
length_to_frag_count_multiple | Float | Multiplier for peptide length to determine fragment count (default: 2) |
min_frag_intensity | Float | Minimum relative fragment intensity (default: 0.00) |
include_isotope | Boolean | Include isotope peak annotations (default: false) |
include_internal | Boolean | Include internal fragment annotations (default: false) |
include_immonium | Boolean | Include immonium ion annotations (default: false) |
include_neutral_diff | Boolean | Include neutral loss annotations (default: true) |
instrument_type | String | Instrument type for predictions (default: "NONE") |
prediction_model | String | Model for fragment predictions (default: "altimeter") |
| Parameter | Type | Description |
|---|---|---|
variable_mods.pattern | [String] | Amino acids to modify (default: ["M"]) |
variable_mods.mass | [Float] | Modification masses (default: [15.99491]) |
variable_mods.name | [String] | Modification identifiers (default: ["Unimod:35"]) |
fixed_mods.pattern | [String] | Amino acids to modify (default: ["C"]) |
fixed_mods.mass | [Float] | Modification masses (default: [57.021464]) |
fixed_mods.name | [String] | Modification identifiers (default: ["Unimod:4"]) |
isotope_mod_groups | [Object] | Isotope labeling groups for multiplexed experiments (default: []) |
| Parameter | Type | Description |
|---|---|---|
max_koina_requests | Int | Maximum concurrent Koina API requests (default: 24) |
max_koina_batch | Int | Maximum batch size for API requests (default: 1000) |
match_lib_build_batch | Int | Batch size for library building (default: 100000) |
As of version 0.1.13, Koina API retry warnings are logged at debug level 2 instead of being shown to users by default. To see retry attempts while debugging, set debug_console_level: 2 in your SearchDIA parameters. The library build fails only if all retry attempts are exhausted.
| Parameter | Type | Description |
|---|---|---|
library_path | String | Output path for the spectral library |
fasta_paths | [String] | List of FASTA file or directory paths |
fasta_names | [String] | Names for each FASTA file |
calibration_raw_file | String | Path to calibration Arrow file for automatic m/z range detection (optional) |
include_contaminants | Boolean | Append a contaminants FASTA to the build (default: true) |
predict_fragments | Boolean | Predict fragment intensities (default: true) |