Parameter Configuration

Pioneer.jl uses JSON configuration files to control analysis. This guide explains the parameters for both SearchDIA and BuildSpecLib functions.

SearchDIA Configuration

Frequently Modified Parameters

Most parameters should not be changed, but the following may need adjustment.

  • first_search.fragment_settings.min_score: The minimum score a precursor must reach in the fragment-index search in order to pass. Each precursor is awarded a score based on which of its fragments match the spectrum, and the score assigned to each fragment depends on its intensity rank. The default scheme is 8,4,4,2,2,1,1. For example, if the 1st, 3rd, and 7th ranking fragments matched the spectrum, the precursor would be awarded a score of 8+4+1=13. If all 7 fragments matched, the precursor would be awarded a score of 22. For normal instrument settings on an Orbitrap or Astral mass analyzer, the mass tolerance is about +/- 5-15 ppm and 15 is a reasonable default score threshold. However, for instruments with lower mass accuracy (Sciex ZenoTOF 7600 or different Orbitrap scan settings), the score threshold may need to be set higher, perhaps to 20. It may be worthwhile to test different values when searching data from a new instrument or sample type. To pass the first search, a precursor need only pass the threshold and score sufficiently well in at least one of the MS data files.

  • first_search.fragment_settings.max_rank: Search against only the n most abundant fragments for each precursor. Including more fragments can improve performance, but it increases memory consumption and the search may take longer. From experience, there are diminishing returns beyond 25-50 fragments.

  • quant_search.fragment_settings.max_rank: See first_search.fragment_settings.max_rank above.

  • quant_search.fragment_settings.n_isotopes: If searching with non-Altimeter libraries (not recommended), such as Prosit or UniSpec, this should be set to 1 as the second fragment isotopes will not be calculated accurately.

  • acquisition.nce: This is the initial guess for the normalized collision energy that will best align the Altimeter library with the empirical data. Altimeter values should agree with those from Thermo instruments manufactured in Bremen, Germany. If inspection of the quality control plots shows that the initial guess is far from the estimated value, it might be possible to improve search results slightly by re-searching with a better initial guess.

  • acquisition.quad_transmission.fit_from_data: Estimate the quad transmission function from the data. Otherwise, Pioneer defaults to a symmetric, smooth function.

  • optimization.machine_learning.max_samples: This is the maximum number of PSMs to use for training the XGBoost model. These PSMs need to fit comfortably in memory in addition to the spectral library. As a rule of thumb, 7M rows is about 1 GB. At the default maximum of 50M rows, the PSMs table will consume about 7 GB of memory.

  • global.isotope_settings.combine_traces: Some precursors may be split across different acquisition windows. Pioneer refers to these as separate isotope traces. When set to true, Pioneer does not distinguish between a precursor's isotope traces; they are combined for scoring and quantitation. With a clever acquisition scheme this can increase the number of data points across chromatographic peaks. This is recommended only for acquisition windows of 2-4 m/z. It should also be combined with acquisition.quad_transmission.fit_from_data = true.
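
The scoring rule described for first_search.fragment_settings.min_score can be illustrated with a short Python sketch. This is not Pioneer's implementation: the function names are hypothetical, and the handling of ranks beyond the end of the scheme (they contribute nothing) is an assumption made for the example.

```python
# Illustrative sketch only; not Pioneer's implementation.
# Default rank-to-score scheme quoted in this guide: 8,4,4,2,2,1,1.
RANK_TO_SCORE = [8, 4, 4, 2, 2, 1, 1]


def fragment_index_score(matched_ranks, rank_to_score=RANK_TO_SCORE):
    """Sum the weights of the matched fragments (hypothetical helper).

    matched_ranks holds 1-based intensity ranks. Assumption: ranks beyond
    the length of the scheme contribute nothing to the score.
    """
    return sum(
        rank_to_score[rank - 1]
        for rank in set(matched_ranks)
        if 1 <= rank <= len(rank_to_score)
    )


def passes_min_score(matched_ranks, min_score=15):
    """Return True when the summed score reaches the threshold."""
    return fragment_index_score(matched_ranks) >= min_score


print(fragment_index_score([1, 3, 7]))     # 8 + 4 + 1 = 13
print(fragment_index_score(range(1, 8)))   # all seven fragments: 22
print(passes_min_score([1, 3, 7]))         # 13 < 15, so False
```

With a threshold of 15, matching only the 1st, 3rd, and 7th ranked fragments is not enough, whereas matching the three highest-ranked fragments (8+4+4=16) is.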

Global Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| isotope_settings.err_bounds_first_pass | [Int, Int] | Precursor monoisotope may lie NEUTRON/charge Thomsons (left, right) outside the quadrupole isolation window (default: [1, 0]) |
| isotope_settings.err_bounds_second_pass | [Int, Int] | Precursor monoisotope may lie NEUTRON/charge Thomsons (left, right) outside the quadrupole isolation window (default: [3, 1]) |
| isotope_settings.combine_traces | Boolean | Whether to combine precursor isotope traces in quantification. Experimental, so set to false (default: false) |
| isotope_settings.partial_capture | Boolean | Whether to estimate the conditional fragment isotope distribution (true) or assume complete transmission of the entire precursor isotopic envelope (false) (default: true) |
| isotope_settings.min_fraction_transmitted | Float | Minimum fraction of the precursor isotope distribution that must be isolated for scoring and quantitation (default: 0.25) |
| scoring.q_value_threshold | Float | Global q-value threshold for filtering results (default: 0.01) |
| normalization.n_rt_bins | Int | Number of retention time bins for quant normalization (default: 100) |
| normalization.spline_n_knots | Int | Number of knots in quant normalization spline (default: 7) |
| match_between_runs | Boolean | Whether to attempt to transfer peptide identifications across runs. Turning this on adds features to the XGBoost model (default: true) |
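
As a rough illustration of how dotted parameter names relate to a JSON file, the following Python sketch expands dotted keys into nested objects. It assumes that the global parameters nest under a top-level "global" key, as the name global.isotope_settings.combine_traces used earlier in this guide suggests; the helper set_by_path is hypothetical and not part of Pioneer.

```python
import json


def set_by_path(config, dotted_key, value):
    """Set config["a"]["b"]["c"] = value for dotted_key "a.b.c" (hypothetical helper)."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        # Create intermediate objects on demand.
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


# Values are the defaults listed in the table above.
# Assumption: these sections live under a top-level "global" key.
overrides = {
    "global.isotope_settings.err_bounds_first_pass": [1, 0],
    "global.isotope_settings.err_bounds_second_pass": [3, 1],
    "global.isotope_settings.combine_traces": False,
    "global.scoring.q_value_threshold": 0.01,
    "global.match_between_runs": True,
}

config = {}
for dotted_key, value in overrides.items():
    set_by_path(config, dotted_key, value)

print(json.dumps(config, indent=2))
```

The same helper could be applied to a configuration loaded with json.load before writing it back out, which keeps untouched parameters at their existing values.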

Parameter Tuning Settings

| Parameter | Type | Description |
| --- | --- | --- |
| fragment_settings.min_count | Int | Minimum number of matching fragment ions (default: 7) |
| fragment_settings.max_rank | Int | Maximum rank of fragments to consider (default: 25, meaning that fragments ranked 26th or lower in abundance are filtered out for each precursor) |
| fragment_settings.tol_ppm | Float | Initial fragment mass tolerance guess in parts per million (default: 20.0, should be set lower for some TOF instruments) |
| fragment_settings.min_score | Int | Minimum fragment-index score threshold for fragment matches (default: 22) |
| fragment_settings.min_spectral_contrast | Float | Minimum cosine similarity score (default: 0.9) |
| fragment_settings.relative_improvement_threshold | Float | Minimum relative Scribe score improvement needed to ignore an interfering peak (default: 1.25) |
| fragment_settings.min_log2_ratio | Float | Minimum log2 ratio of matched library fragment intensities to unmatched library fragment intensities (default: 1.5) |
| fragment_settings.min_top_n | [Int, Int] | Minimum number of top N matches - [requirement, denominator] (default: [3, 3]) |
| fragment_settings.n_isotopes | Int | Number of fragment isotopes to consider in matching (default: 1, mono only) |
| search_settings.sample_rate | Float | Fraction of spectra to sample during parameter tuning (default: 0.02) |
| search_settings.min_samples | Int | Minimum number of samples required for tuning (default: 3500) |
| search_settings.min_quad_tuning_psms | Int | Minimum number of PSMs required for estimating quad transmission (default: 5000) |
| search_settings.min_quad_tuning_fragments | Int | Must match at least n fragments to each quad tuning PSM (default: 3) |
| search_settings.max_presearch_iters | Int | Maximum number of parameter tuning iterations (default: 10) |
| search_settings.frag_err_quantile | Float | Quantile for fragment error estimation (default: 0.01) |
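
To make the unit of fragment_settings.tol_ppm concrete, here is a minimal Python sketch of a parts-per-million mass error check. The ppm formula is standard; the function names and the symmetric tolerance window are assumptions for illustration and may not reflect how Pioneer applies its tolerance internally.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Mass error of an observed peak relative to its theoretical m/z, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=20.0):
    """Assumption: a symmetric +/- tol_ppm window around the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


theoretical = 500.0
for observed in (500.005, 500.02):
    error = ppm_error(observed, theoretical)
    print(f"{observed}: {error:.1f} ppm, match={within_tolerance(observed, theoretical)}")
```

At m/z 500, a 20 ppm tolerance corresponds to a window of about +/- 0.01 m/z, so the second peak (about 40 ppm away) would not match under this sketch.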

First Search Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| fragment_settings.min_count | Int | Minimum number of matching fragments (default: 4) |
| fragment_settings.max_rank | Int | Maximum fragment rank to consider (default: 50, meaning that fragments ranked 51st or lower in abundance are filtered out for each precursor) |
| fragment_settings.min_score | Int | Minimum score for fragment matches (default: 15) |
| fragment_settings.min_spectral_contrast | Float | Minimum cosine similarity required (default: 0.5) |
| fragment_settings.relative_improvement_threshold | Float | Minimum relative Scribe score improvement needed to ignore an interfering peak (default: 1.25) |
| fragment_settings.min_log2_ratio | Float | Minimum log2 ratio of matched library fragment intensities to unmatched library fragment intensities (default: 0.0, meaning that the sum of matched library fragment intensities equals the sum of unmatched library fragment intensities for the precursor) |
| fragment_settings.min_top_n | [Int, Int] | Minimum top N matches - [requirement, denominator] (default: [2, 3]) |
| fragment_settings.n_isotopes | Int | Number of isotopes to consider (default: 1) |
| scoring_settings.n_train_rounds | Int | Number of training rounds for scoring model (default: 2) |
| scoring_settings.max_iterations | Int | Maximum iterations for scoring optimization (default: 20) |
| scoring_settings.max_q_value_probit_rescore | Float | Maximum q-value threshold for semi-supervised learning during probit regression (default: 0.05) |
| scoring_settings.max_local_fdr | Float | Maximum local FDR threshold for passing the first search (default: 1.0) |
| irt_mapping.max_prob_to_impute_irt | Float | If the probability of the PSM is less than x in the first-pass search, impute the iRT for the precursor with the globally determined value from the other runs (default: 0.75) |
| irt_mapping.fwhm_nstd | Float | Number of standard deviations of the FWHM to add to the retention time tolerance (default: 4) |
| irt_mapping.irt_nstd | Int | Number of standard deviations of run-to-run iRT tolerance to add to the retention time tolerance (default: 4) |
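
The meaning of fragment_settings.min_log2_ratio can be sketched in a few lines of Python. This follows the description in the table (log2 of summed matched library intensities over summed unmatched library intensities); the function name and the treatment of the all-matched and none-matched edge cases are assumptions made for this example.

```python
import math


def log2_matched_ratio(library_intensities, matched_flags):
    """log2(sum of matched library intensities / sum of unmatched ones)."""
    matched = sum(i for i, m in zip(library_intensities, matched_flags) if m)
    unmatched = sum(i for i, m in zip(library_intensities, matched_flags) if not m)
    # Edge cases (assumed behaviour for this sketch).
    if matched == 0:
        return -math.inf
    if unmatched == 0:
        return math.inf
    return math.log2(matched / unmatched)


library = [1.0, 0.5, 0.25, 0.25]

# Only the most intense fragment matched: 1.0 matched vs 1.0 unmatched.
print(log2_matched_ratio(library, [True, False, False, False]))  # 0.0

# Two most intense fragments matched: 1.5 matched vs 0.5 unmatched.
ratio = log2_matched_ratio(library, [True, True, False, False])
print(ratio >= 1.5)  # log2(3) is about 1.585, so True
```

Under this reading, the first-search default of 0.0 accepts a precursor once matched and unmatched library intensity are equal.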

Quantification Search Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| fragment_settings.min_count | Int | Minimum fragment count for quantification (default: 3) |
| fragment_settings.min_y_count | Int | Minimum number of y-ions required (default: 2) |
| fragment_settings.max_rank | Int | Maximum fragment rank (default: 255) |
| fragment_settings.min_spectral_contrast | Float | Minimum spectral contrast score (default: 0.0) |
| fragment_settings.min_log2_ratio | Float | Minimum log2 ratio of intensities (default: -1.7) |
| fragment_settings.min_top_n | [Int, Int] | Minimum top N matches - [requirement, denominator] (default: [2, 3]) |
| fragment_settings.n_isotopes | Int | Number of isotopes for quantification (default: 2, include the M1 and M2 isotopes) |
| chromatogram.smoothing_strength | Float | Strength of chromatogram smoothing (default: 0.0002) |
| chromatogram.padding | Int | Number of zeros to pad chromatograms on either side (default: 20) |
| chromatogram.max_apex_offset | Int | Maximum allowed apex offset, in number of scans, between where the precursor was detected in the second-pass search and where it is found on re-integration with 1 percent FDR precursors (default: 2) |

Acquisition Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| nce | Int | Normalized collision energy initial guess (used in pre-search before NCE tuning) (default: 25) |
| quad_transmission.fit_from_data | Boolean | Whether to fit quadrupole transmission from data (default: false) |
| quad_transmission.overhang | Float | Deprecated (default: 0.25) |
| quad_transmission.smoothness | Float | Smoothness parameter for transmission curve. Higher value means more "box-like" shape. (default: 5.0) |

RT Alignment Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| sigma_tolerance | Int | Number of standard deviations for iRT tolerance after pre-search (default: 4) |
| min_probability | Float | Minimum probability for alignment PSMs in pre-search (default: 0.95) |

Optimization Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| deconvolution.lambda | Float | Regularization parameter for deconvolution (deprecated, not in use) (default: 0.0) |
| deconvolution.huber_delta | Float | Delta parameter for Huber loss (default: 300) |
| deconvolution.huber_exp | Float | Exponent for Huber delta progression (default: 2) |
| deconvolution.huber_iters | Int | Number of Huber iterations (default: 15) |
| deconvolution.newton_iters | Int | Maximum Newton iterations (default: 100) |
| deconvolution.newton_accuracy | Float | Convergence threshold for Newton method (default: 10) |
| deconvolution.max_diff | Float | Maximum allowed difference in optimization (default: 0.01) |
| machine_learning.max_samples | Int | Maximum number of samples for XGBoost training (default: 5000000) |
| machine_learning.min_trace_prob | Float | Minimum trace probability threshold (default: 0.75) |
| machine_learning.max_q_value_xgboost_rescore | Float | q-value threshold for semi-supervised learning with XGBoost (default: 0.01) |
| machine_learning.max_q_value_xgboost_mbr_rescore | Float | q-value threshold for match-between-runs candidates during semi-supervised learning with XGBoost (default: 0.20) |
| machine_learning.spline_points | Int | Number of points for probability spline (default: 500) |
| machine_learning.interpolation_points | Int | Number of interpolation points (default: 10) |
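
The rule of thumb given earlier for machine_learning.max_samples (about 7M PSM rows per GB) lends itself to a quick back-of-the-envelope calculation. The Python sketch below only restates that arithmetic; the function names are hypothetical and actual memory use will depend on the data.

```python
# Rule of thumb quoted in this guide: about 7 million PSM rows per GB.
ROWS_PER_GB = 7_000_000


def estimated_psm_memory_gb(max_samples, rows_per_gb=ROWS_PER_GB):
    """Approximate memory needed for the PSMs table, in GB."""
    return max_samples / rows_per_gb


def max_samples_for_budget(budget_gb, rows_per_gb=ROWS_PER_GB):
    """Largest max_samples value that fits an assumed memory budget."""
    return int(budget_gb * rows_per_gb)


print(round(estimated_psm_memory_gb(50_000_000), 1))  # about 7.1 GB for 50M rows
print(max_samples_for_budget(2))                      # 14000000 rows for a 2 GB budget
```

Remember that this estimate covers only the PSMs table; the spectral library must fit in memory as well.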

Protein Inference Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| min_peptides | Int | Minimum number of peptides required for a protein group (default: 1) |

MaxLFQ Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| run_to_run_normalization | Boolean | Whether to use run-to-run normalized abundances for precursor and protein quantification (default: true) |

Output Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| write_csv | Boolean | Whether to write results to CSV |
| delete_temp | Boolean | Whether to delete temporary files |
| plots_per_page | Int | Number of plots per page in reports (default: 12) |

Path Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| library | String | Path to spectral library file |
| ms_data | String | Path to mass spectrometry data directory |
| results | String | Path to output results directory |

BuildSpecLib Configuration

FASTA Digest Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| min_length | Int | Minimum peptide length (default: 7) |
| max_length | Int | Maximum peptide length (default: 30) |
| min_charge | Int | Minimum charge state (default: 2) |
| max_charge | Int | Maximum charge state (default: 4) |
| cleavage_regex | String | Regular expression for cleavage sites (default: "[KR][^_|$]", to exclude cleavage after proline: "[KR][^P |
| missed_cleavages | Int | Maximum allowed missed cleavages (default: 1) |
| max_var_mods | Int | Maximum variable modifications per peptide (default: 1) |
| add_decoys | Boolean | Generate decoy sequences (default: true) |
| entrapment_r | Float | Ratio of entrapment sequences (default: 0) |
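
To show how min_length, max_length, and missed_cleavages interact, here is a simplified in-silico digest in Python. It is a sketch under stated assumptions, not Pioneer's digestion code: it cleaves after every K or R using its own lookbehind pattern rather than the cleavage_regex default, and the function name digest is hypothetical.

```python
import re


def digest(sequence, min_length=7, max_length=30, missed_cleavages=1):
    """Enumerate peptides from a simplified tryptic digest (illustrative only)."""
    # Assumption: cleave after every K or R. This is a simplification and
    # is not the cleavage_regex default documented above.
    pieces = [p for p in re.split(r"(?<=[KR])", sequence) if p]

    peptides = []
    for start in range(len(pieces)):
        # Join 1..missed_cleavages+1 consecutive pieces.
        for n_missed in range(missed_cleavages + 1):
            end = start + n_missed + 1
            if end > len(pieces):
                break
            peptide = "".join(pieces[start:end])
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


for peptide in digest("MKAAAAAAAKCCCCCCCR"):
    print(peptide)
```

In this example the two-residue piece "MK" is too short to be kept on its own, but it survives as part of the missed-cleavage peptide "MKAAAAAAAK".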

NCE Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| nce | Float | Base normalized collision energy (default: 25.0) |
| default_charge | Int | Default charge state for NCE calculations (default: 2) |
| dynamic_nce | Boolean | Use charge-dependent NCE adjustments (default: true) |

Library Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| rt_bin_tol | Float | Retention time binning tolerance in minutes (default: 1.0) |
| frag_bin_tol_ppm | Float | Fragment mass tolerance in ppm (default: 10.0) |
| rank_to_score | [Int] | Intensity multipliers for ranked peaks (default: [8,4,4,2,2,1,1]) |
| y_start_index | Int | Starting index for y-ion annotation (default: 4) |
| b_start_index | Int | Starting index for b-ion annotation (default: 3) |
| y_start | Int | Minimum y-ion to consider (default: 3) |
| b_start | Int | Minimum b-ion to consider (default: 2) |
| include_p_index | Boolean | Include proline-containing index fragments (default: false) |
| include_p | Boolean | Include proline-containing fragments (default: false) |
| auto_detect_frag_bounds | Boolean | Auto-detect fragment mass bounds (default: true) |
| calibration_raw_file | String | Path to calibration raw file |
| frag_mz_min | Float | Minimum fragment m/z (default: 150.0) |
| frag_mz_max | Float | Maximum fragment m/z (default: 2020.0) |
| prec_mz_min | Float | Minimum precursor m/z (default: 390.0) |
| prec_mz_max | Float | Maximum precursor m/z (default: 1010.0) |
| max_frag_charge | Int | Maximum fragment ion charge (default: 3) |
| max_frag_rank | Int | Maximum fragment rank (default: 255) |
| min_frag_intensity | Float | Minimum relative fragment intensity (default: 0.00) |
| include_isotope | Boolean | Include isotope peak annotations (default: false) |
| include_internal | Boolean | Include internal fragment annotations (default: false) |
| include_immonium | Boolean | Include immonium ion annotations (default: false) |
| include_neutral_diff | Boolean | Include neutral loss annotations (default: true) |
| instrument_type | String | Instrument type for predictions (default: "NONE") |
| prediction_model | String | Model for fragment predictions (default: "altimeter") |

Modification Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| variable_mods.pattern | [String] | Amino acids to modify (default: ["M"]) |
| variable_mods.mass | [Float] | Modification masses (default: [15.99491]) |
| variable_mods.name | [String] | Modification identifiers (default: ["Unimod:35"]) |
| fixed_mods.pattern | [String] | Amino acids to modify (default: ["C"]) |
| fixed_mods.mass | [Float] | Modification masses (default: [57.021464]) |
| fixed_mods.name | [String] | Modification identifiers (default: ["Unimod:4"]) |
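
The interaction between fixed modifications, variable modifications, and max_var_mods can be illustrated with a small Python sketch. The masses and identifiers are the defaults from the table above; everything else is an assumption for illustration, including the function name modified_forms and the treatment of each pattern as a single amino acid letter.

```python
from itertools import combinations

# Defaults from the table above: residue -> (identifier, mass shift).
FIXED_MODS = {"C": ("Unimod:4", 57.021464)}
VARIABLE_MODS = {"M": ("Unimod:35", 15.99491)}


def modified_forms(peptide, max_var_mods=1):
    """Return (variable-mod positions, total mass shift) for each form.

    Assumption: each pattern is a single residue letter. Fixed mods apply
    to every matching residue; variable mods apply to any subset of the
    matching residues containing at most max_var_mods sites.
    """
    fixed_shift = sum(FIXED_MODS[aa][1] for aa in peptide if aa in FIXED_MODS)
    sites = [i for i, aa in enumerate(peptide) if aa in VARIABLE_MODS]

    forms = []
    for n_mods in range(max_var_mods + 1):
        for positions in combinations(sites, n_mods):
            var_shift = sum(VARIABLE_MODS[peptide[i]][1] for i in positions)
            forms.append((positions, round(fixed_shift + var_shift, 6)))
    return forms


for positions, shift in modified_forms("MCM"):
    print(positions, shift)
```

For the toy peptide "MCM" with max_var_mods = 1, this yields three forms: the unoxidized peptide and one form for each methionine, all carrying the fixed cysteine modification.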

Processing Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| max_koina_requests | Int | Maximum concurrent Prosit API requests (default: 24) |
| max_koina_batch | Int | Maximum batch size for API requests (default: 1000) |
| match_lib_build_batch | Int | Batch size for library building (default: 100000) |

Path Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| fasta_paths | [String] | List of FASTA file paths |
| fasta_names | [String] | Names for each FASTA file |
| out_dir | String | Output directory path |
| lib_name | String | Base name for library files |
| new_lib_name | String | Name for updated library files |
| out_name | String | Output filename |
| predict_fragments | Boolean | Predict fragment intensities (default: true) |