Parameter Configuration

Pioneer uses JSON configuration files to control analysis. This guide explains the parameters for both SearchDIA and BuildSpecLib.
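
Parameter names in this guide are written as dotted paths such as first_search.fragment_settings.min_score. Assuming each dot corresponds to one level of nesting in the JSON file (an assumption about the file layout, not something stated above), the mapping can be sketched with a plain nested Dict. The helper set_param! is hypothetical and is not part of Pioneer.

```julia
# Hypothetical helper (not a Pioneer function): set a value in a nested Dict
# from a dotted parameter path, creating intermediate objects as needed.
function set_param!(config::Dict{String,Any}, path::AbstractString, value)
    parts = split(path, '.')
    node = config
    for key in parts[1:end-1]
        # Descend one level; create the nested object if it does not exist yet.
        node = get!(node, String(key), Dict{String,Any}())
    end
    node[String(parts[end])] = value
    return config
end

config = Dict{String,Any}()
set_param!(config, "first_search.fragment_settings.min_score", 20)
set_param!(config, "first_search.fragment_settings.max_rank", 25)
set_param!(config, "acquisition.nce", 26)

println(config["first_search"]["fragment_settings"]["min_score"])
```

Writing the Dict out as JSON would need a JSON package, which is left out here to keep the sketch dependency-free.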

SearchDIA Configuration

Frequently Modified Parameters

Most parameters should not be changed, but the following may need adjustment.

  • first_search.fragment_settings.min_score: The minimum score a precursor must reach in the fragment-index search in order to pass. Each precursor is awarded a score based on which of its fragments match the spectrum, and the score assigned to each fragment depends on its intensity rank. The default scheme is 8,4,4,2,2,1,1. That is, if the 1st, 3rd, and 7th ranking fragments matched the spectrum, the precursor would be awarded a score of 8+4+1=13. If all 7 fragments matched, the precursor would be awarded a score of 22. For normal instrument settings on an Orbitrap or Astral mass analyzer, the mass tolerance is about ±5-15 ppm and 15 is a reasonable default score threshold. However, for instruments with lower mass accuracy (Sciex ZenoTOF 7600 or different Orbitrap scan settings), the score threshold may need to be set higher, perhaps to 20. It may be worthwhile to test different values when searching data from a new instrument or sample type. To pass the first search, a precursor need only pass the threshold and score sufficiently well in at least one of the MS data files.

  • first_search.fragment_settings.max_rank: Search against only the n most abundant fragments for each precursor. Including more fragments can improve performance but increases memory consumption, and the search could take longer. From experience, there are diminishing returns after 25-50 fragments.

  • quant_search.fragment_settings.max_rank: See first_search.fragment_settings.max_rank above.

  • quant_search.fragment_settings.n_isotopes: If searching with non-Altimeter libraries (not recommended), such as Prosit or UniSpec, this should be set to 1 as the second fragment isotopes will not be calculated accurately.

  • acquisition.nce: This is the initial guess for the normalized collision energy that will best align the Altimeter library with the empirical data. Altimeter values should agree with those from Thermo instruments manufactured in Bremen, Germany. If inspection of the quality control plots shows that the initial guess is far from the estimated value, it might be possible to improve search results slightly by re-searching with a better initial guess.

  • acquisition.quad_transmission.fit_from_data: Estimate the quad transmission function from the data. Otherwise, a symmetric, smooth function is used.

  • optimization.machine_learning.max_psm_memory_mb: Memory budget (in MB) for PSMs held in memory during LightGBM training. Pioneer dynamically estimates how many PSMs fit within this budget based on the column sizes of the Arrow file. Default is 2000 MB.

  • During LightGBM training, any missing feature values are replaced with the column median. If a column is entirely missing, the values are filled with zero of the appropriate type.
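
The scoring arithmetic described for first_search.fragment_settings.min_score can be sketched as follows. The function names are hypothetical, and treating the threshold as inclusive (score >= min_score) is an assumption.

```julia
# Default rank-to-score scheme quoted in the text above.
const RANK_TO_SCORE = [8, 4, 4, 2, 2, 1, 1]

# Hypothetical helper: sum the scores of the matched fragment ranks (1-based).
# Ranks beyond the end of the scheme contribute nothing in this sketch.
function fragment_index_score(matched_ranks; scheme = RANK_TO_SCORE)
    return sum((scheme[r] for r in matched_ranks if 1 <= r <= length(scheme)); init = 0)
end

# Assumption: a precursor passes when its score is at least min_score.
passes(matched_ranks, min_score) = fragment_index_score(matched_ranks) >= min_score

println(fragment_index_score([1, 3, 7]))  # 8 + 4 + 1 = 13
println(fragment_index_score(1:7))        # all seven fragments: 22
println(passes([1, 3, 7], 15))            # false, 13 is below 15
println(passes([1, 2, 3], 15))            # true, 8 + 4 + 4 = 16
```

With a threshold of 20, a precursor would need nearly all of its top-ranked fragments to match, which is why higher thresholds are suggested for less accurate instruments.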

Global Parameters

Parameter | Type | Description
isotope_settings.err_bounds_first_pass | [Int, Int] | Precursor monoisotope may lie NEUTRON/charge Thompsons (left, right) outside the quadrupole isolation window (default: [1, 0])
isotope_settings.err_bounds_quant_search | [Int, Int] | Precursor monoisotope may lie NEUTRON/charge Thompsons (left, right) outside the quadrupole isolation window (default: [3, 0])
isotope_settings.combine_traces | Boolean | Whether to combine precursor isotope traces in quantification (default: true)
isotope_settings.partial_capture | Boolean | Whether to estimate the conditional fragment isotope distribution (true) or assume complete transmission of the entire precursor isotopic envelope (false) (default: true)
isotope_settings.min_fraction_transmitted | Float | Minimum fraction of the precursor isotope distribution that must be isolated for scoring and quantitation (default: 0.25)
scoring.q_value_threshold | Float | Global q-value threshold for filtering results (default: 0.01)
normalization.n_rt_bins | Int | Number of retention time bins for quant normalization (default: 100)
normalization.spline_n_knots | Int | Number of knots in quant normalization spline (default: 7)
huber_override.override_huber_delta_fit | Boolean | Whether to override the automatic Huber delta fitting with a manual value (default: false)
huber_override.huber_delta | Float | Huber delta value when override is enabled (default: 1055)
ms1_scoring | Boolean | Enable MS1-level scoring features (default: true)
ms1_quant | Boolean | Enable MS1-level quantification (default: false)
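
One possible reading of the err_bounds settings is that the accepted m/z range for the precursor monoisotope is the isolation window widened by a whole number of isotope spacings on each side. The sketch below follows that reading. The function name, the example window, and the value used for NEUTRON (the 13C minus 12C mass difference) are assumptions, not taken from Pioneer.

```julia
# Assumed isotope spacing in Da (approximately the 13C - 12C mass difference).
const NEUTRON = 1.00335

# Hypothetical helper: widen an isolation window by (left, right) isotope
# spacings, each spacing being NEUTRON / charge in m/z units (Thompsons).
function monoisotope_bounds(window_lo, window_hi, charge, err_bounds)
    left, right = err_bounds
    lo = window_lo - left * NEUTRON / charge
    hi = window_hi + right * NEUTRON / charge
    return (lo, hi)
end

# First-pass default [1, 0] for a 2+ precursor and a made-up 500-502 m/z window.
lo, hi = monoisotope_bounds(500.0, 502.0, 2, (1, 0))
println((lo, hi))
```

Under this reading, the quant-search default of [3, 0] would admit precursors whose monoisotope sits up to three isotope spacings below the window, so that their heavier isotopes still fall inside it.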

Parameter Tuning Settings

Parameter | Type | Description
fragment_settings.min_count | Int | Minimum number of matching fragment ions (default: 7)
fragment_settings.max_rank | Int | Maximum rank of fragments to consider (default: 25, meaning fragments ranked 26th and below by abundance are filtered out for each precursor)
fragment_settings.min_score | [Int, Int] | Minimum fragment-index score thresholds (default: [22, 17])
fragment_settings.min_spectral_contrast | Float | Minimum cosine similarity score (default: 0.5)
fragment_settings.relative_improvement_threshold | Float | Minimum relative Scribe score improvement needed to ignore an interfering peak (default: 1.25)
fragment_settings.min_log2_ratio | Float | Minimum log2 ratio of matched library fragment intensities to unmatched library fragment intensities (default: 1.5)
fragment_settings.min_top_n | [Int, Int] | Minimum number of top N matches as [requirement, denominator] (default: [3, 3])
fragment_settings.n_isotopes | Int | Number of fragment isotopes to consider in matching (default: 1, mono only)
fragment_settings.intensity_filter_quantile | Float | Quantile for intensity-based fragment filtering (default: 0.50)
search_settings.min_samples | Int | Minimum number of PSMs required for tuning (default: 1200)
search_settings.max_presearch_iters | Int | Maximum number of parameter tuning iterations (default: 10)
search_settings.frag_err_quantile | Float | Quantile for fragment error estimation (default: 0.005)
search_settings.max_q_value | Float | Maximum q-value for parameter tuning PSMs (default: 0.01)
search_settings.topn_peaks | Int | Top N peaks per spectrum to consider (default: 200)
search_settings.max_frags_for_mass_err_estimation | Int | Maximum fragments used for mass error model (default: 12)
nce_tuning.min_psms | Int | Minimum PSMs for NCE tuning (default: 2000)
nce_tuning.initial_percent | Float | Initial sampling percentage for NCE tuning (default: 2.5)
nce_tuning.min_initial_scans | Int | Minimum initial scans for NCE tuning (default: 5000)
quad_tuning.min_psms_per_thompson | Int | Minimum PSMs per Thompson width for quad transmission tuning (default: 250)
quad_tuning.min_fragments | Int | Minimum fragments per PSM for quad tuning (default: 3)
quad_tuning.initial_percent | Float | Initial sampling percentage for quad tuning (default: 2.5)
iteration_settings.init_mass_tol_ppm | [Float, Float] | Initial fragment mass tolerance guesses in ppm (default: [20.0, 30.0])
iteration_settings.ms1_tol_ppm | Float | Initial MS1 mass tolerance in ppm (default: 20.0)
iteration_settings.scan_counts | [Int] | Scan counts to sample during parameter tuning (default: [10000])
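
The [requirement, denominator] form of fragment_settings.min_top_n is read here as "at least requirement of the denominator highest-ranked library fragments must match". That reading, and the helper name, are assumptions made for illustration only.

```julia
# Hypothetical check for min_top_n = (requirement, denominator).
# matched_ranks holds the 1-based intensity ranks of the library fragments
# that matched the spectrum (assumed to contain no duplicates).
function passes_top_n(matched_ranks, min_top_n)
    requirement, denominator = min_top_n
    n_hits = count(r -> r <= denominator, matched_ranks)
    return n_hits >= requirement
end

println(passes_top_n([1, 3, 7], (2, 3)))  # ranks 1 and 3 fall in the top 3
println(passes_top_n([1, 3, 7], (3, 3)))  # rank 2 is missing from the top 3
```

Under this reading, the parameter-tuning default of [3, 3] demands all three top fragments, while a default of [2, 3] tolerates one missing top fragment.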

First Search Parameters

Parameter | Type | Description
fragment_settings.min_count | Int | Minimum number of matching fragments (default: 4)
fragment_settings.max_rank | Int | Maximum fragment rank to consider (default: 25)
fragment_settings.min_score | Int | Minimum score for fragment matches (default: 15)
fragment_settings.min_spectral_contrast | Float | Minimum cosine similarity required (default: 0.5)
fragment_settings.relative_improvement_threshold | Float | Minimum relative Scribe score improvement needed to ignore an interfering peak (default: 1.25)
fragment_settings.min_log2_ratio | Float | Minimum log2 ratio of matched library fragment intensities to unmatched library fragment intensities (default: 0.0, meaning the sum of matched library fragment intensities equals the sum of unmatched library fragment intensities for the precursor)
fragment_settings.min_top_n | [Int, Int] | Minimum top N matches as [requirement, denominator] (default: [2, 3])
fragment_settings.n_isotopes | Int | Number of isotopes to consider (default: 1)
scoring_settings.n_train_rounds | Int | Number of training rounds for scoring model (default: 2)
scoring_settings.max_iterations | Int | Maximum iterations for scoring optimization (default: 20)
scoring_settings.max_q_value_probit_rescore | Float | Maximum q-value threshold for semi-supervised learning during probit regression (default: 0.05)
scoring_settings.max_PEP | Float | Maximum local FDR threshold for passing the first search (default: 0.9)
scoring_settings.global_pep_threshold | Float | Maximum global PEP for precursor selection in cross-run aggregation (default: 0.5)
irt_mapping.max_prob_to_impute_irt | Float | If the probability of the PSM in the first-pass search is less than this value, impute the iRT for the precursor with the globally determined value from the other runs (default: 0.75)
irt_mapping.fwhm_nstd | Float | Number of standard deviations of the FWHM to add to the retention time tolerance (default: 4)
irt_mapping.irt_nstd | Int | Number of standard deviations of run-to-run iRT tolerance to add to the retention time tolerance (default: 4)
irt_mapping.plot_rt_alignment | Boolean | Whether to generate RT alignment diagnostic plots (default: false)
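
The fragment_settings.min_log2_ratio threshold compares matched and unmatched library intensity. A minimal sketch of that ratio, with a hypothetical function name and made-up intensities, might look like this:

```julia
# Hypothetical helper: log2 of (summed library intensity of matched fragments)
# over (summed library intensity of unmatched fragments) for one precursor.
function log2_matched_ratio(lib_intensities, matched::AbstractVector{Bool})
    matched_sum = sum(lib_intensities[matched]; init = 0.0)
    unmatched_sum = sum(lib_intensities[.!matched]; init = 0.0)
    # If nothing is unmatched the ratio is Inf, which passes any threshold.
    return log2(matched_sum / unmatched_sum)
end

intensities = [4.0, 2.0, 1.0, 1.0]          # made-up library intensities
matched     = [true, true, false, false]    # which fragments matched

ratio = log2_matched_ratio(intensities, matched)   # log2(6 / 2) = log2(3)
println(ratio >= 0.0)   # compare against a threshold of 0.0
println(ratio >= 1.5)   # compare against a threshold of 1.5
```

A ratio of 0.0 corresponds to equal matched and unmatched intensity, so negative thresholds accept precursors where most of the library intensity went unmatched.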

First Search Filter Constants

The first search uses a hybrid filter to decide how many precursors to carry forward from each stage. These are hardcoded constants (not JSON-configurable) because they should rarely need adjustment.

Per-file pre-filter (applied per MS file before cross-run aggregation): keeps the largest of three counts:

  • PSMs with PEP ≤ 0.95
  • PSMs where cumulative decoy/target ratio ≤ PERFILE_QVALUE_THRESHOLD (0.50)
  • PERFILE_MIN_PSMS (10,000, or all PSMs if fewer exist)

Global post-filter (applied after cross-run global PEP computation): keeps the largest of three counts:

  • Precursors with global PEP ≤ global_pep_threshold (default 0.5, JSON-configurable)
  • Precursors where cumulative decoy/target ratio ≤ GLOBAL_QVALUE_THRESHOLD (0.15)
  • GLOBAL_MIN_PRECURSORS (50,000, or all precursors if fewer exist)

The hard minimum floors ensure sparse datasets (e.g. single-cell proteomics) always retain enough precursors for second-pass scoring, while the q-value floors prevent decoy contamination in large experiments.
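
The "largest of three counts" rule can be sketched as below for the per-file pre-filter. This is an illustration under stated assumptions, not Pioneer's code: the function name is hypothetical, PSMs are assumed to be sorted best-first, and the decoy/target count is taken as the last position at which the running ratio is still within the threshold.

```julia
# Hypothetical sketch of a hybrid filter: return how many of the top-ranked
# PSMs to keep. pep and is_decoy are assumed to be sorted best-first.
function n_to_keep(pep::Vector{Float64}, is_decoy::Vector{Bool};
                   pep_threshold = 0.95, ratio_threshold = 0.50, min_count = 10_000)
    n = length(pep)

    # Count 1: PSMs with PEP at or below the threshold.
    n_pep = count(<=(pep_threshold), pep)

    # Count 2: last position where cumulative decoys / targets is within the threshold.
    n_ratio, decoys, targets = 0, 0, 0
    for i in 1:n
        is_decoy[i] ? (decoys += 1) : (targets += 1)
        if targets > 0 && decoys / targets <= ratio_threshold
            n_ratio = i
        end
    end

    # Count 3: the hard floor, or everything if fewer PSMs exist.
    n_floor = min(min_count, n)

    return max(n_pep, n_ratio, n_floor)
end

pep      = [0.01, 0.02, 0.50, 0.97, 0.99]
is_decoy = [false, false, true, false, true]
println(n_to_keep(pep, is_decoy; min_count = 2))  # small floor for the demo
println(n_to_keep(pep, is_decoy))                 # default floor keeps all five
```

The global post-filter has the same shape, with global PEP, GLOBAL_QVALUE_THRESHOLD, and GLOBAL_MIN_PRECURSORS in place of the per-file values.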

Quantification Search Parameters

Parameter | Type | Description
fragment_settings.min_count | Int | Minimum fragment count for quantification (default: 3)
fragment_settings.min_y_count | Int | Minimum number of y-ions required (default: 2)
fragment_settings.max_rank | Int | Maximum fragment rank (default: 255)
fragment_settings.min_spectral_contrast | Float | Minimum spectral contrast score (default: 0.0)
fragment_settings.min_log2_ratio | Float | Minimum log2 ratio of intensities (default: -1.7)
fragment_settings.min_top_n | [Int, Int] | Minimum top N matches as [requirement, denominator] (default: [2, 3])
fragment_settings.n_isotopes | Int | Number of isotopes for quantification (default: 2, include the M1 and M2 isotopes)
chromatogram.smoothing_strength | Float | Strength of chromatogram smoothing (default: 1e-6)
chromatogram.padding | Int | Number of zeros to pad chromatograms on either side (default: 0)
chromatogram.max_apex_offset | Int | Maximum allowed offset, in number of scans, of the precursor's apex between the second-pass search and re-integration with 1 percent FDR precursors (default: 2)

Acquisition Parameters

Parameter | Type | Description
nce | Int | Normalized collision energy initial guess (used in pre-search before NCE tuning) (default: 26)
quad_transmission.fit_from_data | Boolean | Whether to fit quadrupole transmission from data (default: true)
quad_transmission.overhang | Float | Deprecated (default: 0.25)
quad_transmission.smoothness | Float | Smoothness parameter for transmission curve. Higher values give a more "box-like" shape (default: 5.0)

RT Alignment Parameters

Parameter | Type | Description
n_bins | Int | Number of retention time bins for alignment (default: 200)
bandwidth | Float | Bandwidth for kernel density estimation (default: 0.25)
sigma_tolerance | Int | Number of standard deviations for iRT tolerance after pre-search (default: 4)
min_probability | Float | Minimum probability for alignment PSMs in pre-search (default: 0.95)
lambda_penalty | Float | Lambda penalty for spline fitting (default: 0.1)
ransac_threshold_psms | Int | RANSAC threshold in number of PSMs (default: 500)
min_psms_for_spline | Int | Minimum PSMs required for spline fitting (default: 10)

Optimization Parameters

Deconvolution

The deconvolution parameters are split into ms1 and ms2 sub-objects for separate control over MS1 and MS2 deconvolution, plus shared iteration settings.

Parameter | Type | Description
deconvolution.ms1.lambda | Float | L2 regularization parameter for MS1 deconvolution (default: 0.0001)
deconvolution.ms1.reg_type | String | Regularization type for MS1: "none", "l1", or "l2" (default: "l2")
deconvolution.ms1.huber_delta | Float | Huber delta for MS1 loss function (default: 1e9)
deconvolution.ms2.lambda | Float | L2 regularization parameter for MS2 deconvolution (default: 0.0)
deconvolution.ms2.reg_type | String | Regularization type for MS2: "none", "l1", or "l2" (default: "none")
deconvolution.ms2.huber_delta | Float | Huber delta for MS2 loss function (default: 300)
deconvolution.huber_exp | Float | Exponent for Huber delta progression (default: 1.5)
deconvolution.huber_iters | Int | Number of Huber outer iterations (default: 15)
deconvolution.newton_iters | Int | Maximum Newton iterations per outer iteration (default: 50)
deconvolution.bisection_iters | Int | Maximum bisection iterations when Newton fails (default: 100)
deconvolution.outer_iters | Int | Maximum outer iterations for convergence (default: 1000)
deconvolution.newton_accuracy | Float | Absolute convergence threshold for Newton method (default: 10)
deconvolution.bisection_accuracy | Float | Absolute convergence threshold for bisection method (default: 10)
deconvolution.max_diff | Float | Relative convergence threshold: maximum relative change in weights between iterations. Also used as relative tolerance for Newton's method (default: 0.01)

Machine Learning

Parameter | Type | Description
machine_learning.max_psm_memory_mb | Real | Memory budget in MB for PSMs held in memory during LightGBM training. Row count is dynamically estimated from Arrow column sizes (default: 2000)
machine_learning.force_oom | Boolean | Force out-of-memory processing regardless of dataset size (default: false)
machine_learning.min_trace_prob | Float | Minimum trace probability threshold (default: 0.75)
machine_learning.min_PEP_neg_threshold_itr | Float | Minimum posterior error probability threshold for reclassifying weak target PSMs as negatives during the ITR stage of LightGBM rescoring (default: 0.90)
machine_learning.spline_points | Int | Number of points for probability spline (default: 500)
machine_learning.interpolation_points | Int | Number of interpolation points (default: 10)
machine_learning.n_quantile_bins | Int | Number of quantile bins for score binning (default: 25)
machine_learning.enable_model_comparison | Boolean | Enable comparison of scoring models (default: true)
machine_learning.validation_split_ratio | Float | Fraction of data held out for validation (default: 0.2)
machine_learning.qvalue_threshold | Float | q-value threshold for model comparison (default: 0.01)
machine_learning.min_psms_for_comparison | Int | Minimum PSMs to enable model comparison (default: 1000)
machine_learning.max_psms_for_comparison | Int | Maximum PSMs for in-memory model comparison (default: 100000)
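
To give a feel for what a memory budget such as max_psm_memory_mb implies, the sketch below estimates a row count from per-column element sizes. It is only a back-of-the-envelope illustration: the column types are invented, 1 MB is taken as 1024^2 bytes, and Pioneer's actual estimate from the Arrow file may differ.

```julia
# Hypothetical estimate: how many rows fit in a budget, given the element
# type of each column. Variable-width columns (e.g. strings) are ignored.
function max_rows_in_budget(column_types, budget_mb)
    bytes_per_row = sum(sizeof, column_types)
    return floor(Int, budget_mb * 1024^2 / bytes_per_row)
end

# Invented PSM feature columns, 21 bytes per row in total.
cols = [Float32, Float32, UInt32, Bool, Float64]
println(max_rows_in_budget(cols, 2000))
```

Halving the budget halves the estimated row count, so lowering max_psm_memory_mb trades training-set size for a smaller memory footprint.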

Protein Inference Parameters

Parameter | Type | Description
min_peptides | Int | Minimum number of peptides required for a protein group (default: 1)

MaxLFQ Parameters

Parameter | Type | Description
run_to_run_normalization | Boolean | Whether to use run-to-run normalized abundances for precursor and protein quantification (default: false)
max_chunk_size_mb | Int | Maximum chunk size in MB for MaxLFQ chunked merge processing (default: 1024)

Output Parameters

Parameter | Type | Description
write_csv | Boolean | Whether to write results to CSV (default: true)
write_decoys | Boolean | Whether to quantify and include decoys in the output files (default: false)
delete_temp | Boolean | Whether to delete temporary files (default: true)
plots_per_page | Int | Number of plots per page in reports (default: 12)

Logging Parameters

Parameter | Type | Description
debug_console_level | Int | Verbosity of console debug output (0 disables; higher values include more details).
max_message_bytes | Int | Maximum bytes of a single log message before truncation (default: 4096). Truncation preserves valid UTF-8 and appends a suffix like … [truncated N bytes]. Can be overridden at runtime with PIONEER_MAX_LOG_MSG_BYTES (values clamped to [1024, 1048576]).
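
The truncation behaviour described for max_message_bytes can be sketched as follows. The function names are hypothetical, and two details are assumptions: that only the environment override is clamped, and that the suffix does not count toward the byte limit.

```julia
const LIMIT_KEY = "PIONEER_MAX_LOG_MSG_BYTES"

# Hypothetical: use the environment override when present (clamped to the
# documented range), otherwise the configured value.
function effective_limit(configured::Int = 4096; env = ENV)
    haskey(env, LIMIT_KEY) || return configured
    return clamp(parse(Int, env[LIMIT_KEY]), 1024, 1_048_576)
end

# Hypothetical: cut a message at a character boundary so the kept prefix is
# valid UTF-8 and no longer than max_bytes, then append a suffix.
function truncate_message(msg::String, max_bytes::Int)
    ncodeunits(msg) <= max_bytes && return msg
    kept_end = 0
    for i in eachindex(msg)
        last_byte = nextind(msg, i) - 1    # final byte of the character at i
        last_byte > max_bytes && break
        kept_end = i
    end
    kept = msg[1:kept_end]
    dropped = ncodeunits(msg) - ncodeunits(kept)
    return kept * " … [truncated $(dropped) bytes]"
end

# A tiny limit is used here only to keep the demo short.
println(truncate_message("ééé", 3))
println(effective_limit(4096; env = Dict(LIMIT_KEY => "10")))
```

Cutting at byte 3 would split the second two-byte character, so the sketch steps back to the previous character boundary instead.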

Path Parameters

Parameter | Type | Description
library | String | Path to spectral library file
ms_data | String | Path to mass spectrometry data directory
results | String | Path to output results directory

BuildSpecLib Configuration

FASTA Input and Regex Mapping

Pioneer supports flexible FASTA input through GetBuildLibParams:

Input Options

  1. Single directory: Scans for all .fasta and .fasta.gz files
  2. Single file: Directly uses the specified FASTA file
  3. Mixed array: Any combination of directories and files
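
The three input options can be pictured with a small sketch that expands directories into their FASTA files and passes files through unchanged. The function names are hypothetical, and the sketch does not descend into subdirectories; whether GetBuildLibParams does is not stated here.

```julia
is_fasta(path) = endswith(path, ".fasta") || endswith(path, ".fasta.gz")

# Hypothetical helper: accept one path or an array of paths. Directories are
# expanded to the .fasta / .fasta.gz files they contain; files are used directly.
function collect_fasta_files(inputs)
    inputs = inputs isa AbstractString ? [inputs] : inputs
    files = String[]
    for input in inputs
        if isdir(input)
            append!(files, filter(is_fasta, readdir(input; join = true)))
        else
            push!(files, String(input))
        end
    end
    return files
end

# Demo in a temporary directory with made-up file names.
mktempdir() do dir
    touch(joinpath(dir, "human.fasta"))
    touch(joinpath(dir, "yeast.fasta.gz"))
    touch(joinpath(dir, "readme.txt"))
    println(basename.(collect_fasta_files(dir)))
end
```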

Regex Code Mapping

The regex patterns for parsing FASTA headers can be configured in three ways:

  1. Single regex set for all files (default):

    GetBuildLibParams(out_dir, lib_name, [dir1, dir2, file1])
    # All FASTA files use the same default regex patterns

  2. Custom single regex set:

    GetBuildLibParams(out_dir, lib_name, [dir1, file1],
        regex_codes = Dict(
            "accessions" => "^>(\\S+)",
            "genes" => "GN=(\\S+)",
            "proteins" => "\\s+(.+?)\\s+OS=",
            "organisms" => "OS=(.+?)\\s+GN="
        ))
    # All files use these custom patterns

  3. Positional mapping (one regex set per input):

    GetBuildLibParams(out_dir, lib_name, [uniprot_dir, custom_file],
        regex_codes = [
            Dict("accessions" => "^\\w+\\|(\\w+)\\|", ...),  # For uniprot_dir files
            Dict("accessions" => "^>(\\S+)", ...)             # For custom_file
        ])
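
To see what the custom patterns from option 2 capture, they can be applied to a header with Julia's built-in match. The header below is made up, and parse_header is a hypothetical helper. Whether the leading > is still present when Pioneer applies the pattern is not stated here; the sketch keeps it.

```julia
regex_codes = Dict(
    "accessions" => "^>(\\S+)",
    "genes"      => "GN=(\\S+)",
    "proteins"   => "\\s+(.+?)\\s+OS=",
    "organisms"  => "OS=(.+?)\\s+GN=",
)

# Hypothetical helper: return the first capture group of each pattern,
# or an empty string when the pattern does not match.
function parse_header(header::AbstractString, regex_codes)
    parsed = Dict{String,String}()
    for (field, pattern) in regex_codes
        m = match(Regex(pattern), header)
        parsed[field] = m === nothing ? "" : String(m.captures[1])
    end
    return parsed
end

# Made-up header in a UniProt-like layout.
header = ">sp|X00000|DEMO_HUMAN Demo protein OS=Homo sapiens GN=DEMO1 PE=1"
parsed = parse_header(header, regex_codes)
for field in ("accessions", "genes", "proteins", "organisms")
    println(field, " => ", parsed[field])
end
```

Testing patterns this way against a few real headers from each FASTA file is a quick check before building a library, since a pattern that fails to match silently yields an empty field in this sketch.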

FASTA Digest Parameters

Parameter | Type | Description
min_length | Int | Minimum peptide length (default: 7)
max_length | Int | Maximum peptide length (default: 30)
min_charge | Int | Minimum charge state (default: 2)
max_charge | Int | Maximum charge state (default: 4)
cleavage_regex | String | Regular expression for cleavage sites (default: "[KR][^_|$]", to exclude cleavage after proline: "[KR][^P|$]")
missed_cleavages | Int | Maximum allowed missed cleavages (default: 1)
max_var_mods | Int | Maximum variable modifications per peptide (default: 1)
add_decoys | Boolean | Generate decoy sequences (default: true)
entrapment_r | Float | Ratio of entrapment sequences (default: 0)
decoy_method | String | Method for generating decoy sequences: "shuffle" or "reverse" (default: "shuffle")
entrapment_method | String | Method for generating entrapment sequences: "shuffle" or "reverse" (default: "shuffle")
fasta_header_regex_accessions | [String] | Regex with a capture group for the accession, one per FASTA file
fasta_header_regex_genes | [String] | Regex with a capture group for the gene name, one per FASTA file
fasta_header_regex_proteins | [String] | Regex with a capture group for the protein name, one per FASTA file
fasta_header_regex_organisms | [String] | Regex with a capture group for the organism, one per FASTA file
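
The effect of the two cleavage_regex values can be illustrated with a simple digest that cuts after the first residue of every match. This is a sketch only: digest is a hypothetical function, the sequence is made up, and missed cleavages and length limits are ignored.

```julia
# Hypothetical digest: cut the sequence after the first character of each
# regex match. overlap = true lets adjacent sites such as "KR" both be found.
function digest(sequence::AbstractString, cleavage_regex::Regex)
    peptides = String[]
    start = 1
    for m in eachmatch(cleavage_regex, sequence; overlap = true)
        push!(peptides, sequence[start:m.offset])
        start = m.offset + 1
    end
    push!(peptides, sequence[start:end])   # C-terminal peptide
    return peptides
end

seq = "MKRPSTKAAR"   # made-up sequence

println(digest(seq, r"[KR][^_|$]"))   # cleaves after every K/R that has a following residue
println(digest(seq, r"[KR][^P|$]"))   # same, but not when the next residue is P
```

With the default pattern the R before P is a cleavage site, giving MK, R, PSTK, AAR; with the proline-excluding pattern that site is skipped, giving MK, RPSTK, AAR.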

NCE Parameters

Parameter | Type | Description
nce | Float | Base normalized collision energy (default: 26.0)
default_charge | Int | Default charge state for NCE calculations (default: 2)
dynamic_nce | Boolean | Use charge-dependent NCE adjustments (default: true)

Library Parameters

Parameter | Type | Description
rt_bin_tol | Float | Retention time binning tolerance in minutes (default: 1.0)
frag_bin_tol_ppm | Float | Fragment mass tolerance in PPM (default: 10.0)
rank_to_score | [Int] | Intensity multipliers for ranked peaks (default: [8,4,4,2,2,1,1])
y_start_index | Int | Starting index for y-ion annotation (default: 4)
b_start_index | Int | Starting index for b-ion annotation (default: 3)
y_start | Int | Minimum y-ion to consider (default: 3)
b_start | Int | Minimum b-ion to consider (default: 2)
include_p_index | Boolean | Include proline-containing index fragments (default: false)
include_p | Boolean | Include proline-containing fragments (default: false)
auto_detect_frag_bounds | Boolean | Auto-detect fragment mass bounds from calibration file (default: true)
frag_mz_min | Float | Minimum fragment m/z (default: 150.0)
frag_mz_max | Float | Maximum fragment m/z (default: 2020.0)
prec_mz_min | Float | Minimum precursor m/z (default: 390.0)
prec_mz_max | Float | Maximum precursor m/z (default: 1010.0)
max_frag_charge | Int | Maximum fragment ion charge (default: 3)
max_frag_rank | Int | Maximum fragment rank (default: 255)
length_to_frag_count_multiple | Float | Multiplier for peptide length to determine fragment count (default: 2)
min_frag_intensity | Float | Minimum relative fragment intensity (default: 0.00)
include_isotope | Boolean | Include isotope peak annotations (default: false)
include_internal | Boolean | Include internal fragment annotations (default: false)
include_immonium | Boolean | Include immonium ion annotations (default: false)
include_neutral_diff | Boolean | Include neutral loss annotations (default: true)
instrument_type | String | Instrument type for predictions (default: "NONE")
prediction_model | String | Model for fragment predictions (default: "altimeter")

Modification Parameters

Parameter | Type | Description
variable_mods.pattern | [String] | Amino acids to modify (default: ["M"])
variable_mods.mass | [Float] | Modification masses (default: [15.99491])
variable_mods.name | [String] | Modification identifiers (default: ["Unimod:35"])
fixed_mods.pattern | [String] | Amino acids to modify (default: ["C"])
fixed_mods.mass | [Float] | Modification masses (default: [57.021464])
fixed_mods.name | [String] | Modification identifiers (default: ["Unimod:4"])
isotope_mod_groups | [Object] | Isotope labeling groups for multiplexed experiments (default: [])

Processing Parameters

Parameter | Type | Description
max_koina_requests | Int | Maximum concurrent Koina API requests (default: 24)
max_koina_batch | Int | Maximum batch size for API requests (default: 1000)
match_lib_build_batch | Int | Batch size for library building (default: 100000)

Koina API Retry Behavior

As of version 0.1.13, Koina API retry warnings are now logged at debug level 2 instead of being shown to users by default. To see retry attempts during debugging, set debug_console_level: 2 in your SearchDIA parameters. The library build will only fail if all retry attempts are exhausted.

Path Parameters

Parameter | Type | Description
library_path | String | Output path for the spectral library
fasta_paths | [String] | List of FASTA file or directory paths
fasta_names | [String] | Names for each FASTA file
calibration_raw_file | String | Path to calibration Arrow file for automatic m/z range detection (optional)
include_contaminants | Boolean | Append a contaminants FASTA to the build (default: true)
predict_fragments | Boolean | Predict fragment intensities (default: true)