Using pygwb_pipe: a quickstart manual
The various modules of the pygwb package can be combined into a pipeline, as done in the pygwb_pipe script. This script takes data as input and outputs an estimate of the gravitational-wave background (GWB), in the form of a point estimate and its variance, for these data. More information on how the various modules interact and are combined into a pipeline can be found in the pygwb paper.
Note
The proposed pygwb_pipe pipeline is only one of the many ways to assemble the pygwb modules, and users should feel free to create their own pipeline that addresses their needs.
1. Script parameters
The parameters of the pygwb_pipe
script can be visualized by running the following command:
pygwb_pipe --help
This will display the following set of parameters, which can be passed to the pipeline:
--param_file PARAM_FILE
Parameter file to use for analysis.
--output_path OUTPUT_PATH
Location to save output to.
--calc_coh CALC_COH
Calculate coherence spectrum from data.
--calc_pt_est CALC_PT_EST
Calculate omega point estimate and sigma from data.
--apply_dsc APPLY_DSC
Apply delta sigma cut when calculating final output.
--pickle_out PICKLE_OUT
Pickle output Baseline of the analysis.
--wipe_ifo WIPE_IFO Wipe interferometer data to reduce size of pickled Baseline.
--t0 T0 Initial time.
--tf TF Final time.
--data_type DATA_TYPE
Type of data to access/download; options are private,
public, local. Default is public.
--channel CHANNEL Channel name; needs to match an existing channel. Default is
"GWOSC-16KHZ_R1_STRAIN"
--new_sample_rate NEW_SAMPLE_RATE
Sample rate to use when downsampling the data (Hz). Default
is 4096 Hz.
--input_sample_rate INPUT_SAMPLE_RATE
Sample rate of the read data (Hz). Default is 16384 Hz.
--cutoff_frequency CUTOFF_FREQUENCY
Lower frequency cutoff; applied in filtering in
preprocessing (Hz). Default is 11 Hz.
--segment_duration SEGMENT_DURATION
Duration of the individual segments to analyse (seconds).
Default is 192 seconds.
--number_cropped_seconds NUMBER_CROPPED_SECONDS
Number of seconds to crop at the start and end of the
analysed data (seconds). Default is 2 seconds.
--window_downsampling WINDOW_DOWNSAMPLING
Type of window to use in preprocessing. Default is "hamming"
--ftype FTYPE Type of filter to use in downsampling. Default is "fir"
--frequency_resolution FREQUENCY_RESOLUTION
Frequency resolution of the final output spectrum (Hz).
Default is 1/32 Hz.
--polarization POLARIZATION
Polarisation type for the overlap reduction function calculation; options are scalar, vector, tensor. Default is tensor.
--alpha ALPHA Spectral index to filter the data for. Default is 0.
--fref FREF Reference frequency to filter the data at (Hz). Default is 25 Hz.
--flow FLOW Lower frequency to include in the analysis (Hz). Default is 20 Hz.
--fhigh FHIGH Higher frequency to include in the analysis (Hz). Default is 1726 Hz.
--coarse_grain COARSE_GRAIN
Whether to apply coarse graining to the spectra. Default is 0.
--interferometer_list INTERFEROMETER_LIST [INTERFEROMETER_LIST ...]
List of interferometers to run the analysis with. Default is
["H1", "L1"]
--local_data_path LOCAL_DATA_PATH
Path(s) to local data, if the local data option is chosen.
Default is empty.
--notch_list_path NOTCH_LIST_PATH
Path to the notch list file. Default is empty.
--N_average_segments_welch_psd N_AVERAGE_SEGMENTS_WELCH_PSD
Number of segments to average over when calculating the psd
with Welch method. Default is 2.
--window_fft_dict WINDOW_FFT_DICT
Dictionary containing name and parameters relative to which
window to use when producing fftgrams for psds and csds.
Default is "hann".
--calibration_epsilon CALIBRATION_EPSILON
Calibration coefficient. Default is 0.
--overlap_factor OVERLAP_FACTOR
Factor by which to overlap consecutive segments for
analysis. Default is 0.5 (50% overlap)
--zeropad_csd ZEROPAD_CSD
Whether to zeropad the csd or not. Default is True.
--delta_sigma_cut DELTA_SIGMA_CUT
Cutoff value for the delta sigma cut. Default is 0.2.
--alphas_delta_sigma_cut ALPHAS_DELTA_SIGMA_CUT [ALPHAS_DELTA_SIGMA_CUT ...]
List of spectral indexes to use in delta sigma cut
calculation. Default is [-5, 0, 3].
--save_data_type SAVE_DATA_TYPE
Suffix for the output data file. Options are hdf5, npz,
json, pickle. Default is json.
--time_shift TIME_SHIFT
Seconds to timeshift the data by in preprocessing. Default
is 0.
--gate_data GATE_DATA
Whether to apply self-gating to the data in preprocessing.
Default is False.
--gate_tzero GATE_TZERO
Gate tzero. Default is 1.0.
--gate_tpad GATE_TPAD
Gate tpad. Default is 0.5.
--gate_threshold GATE_THRESHOLD
Gate threshold. Default is 50.
--cluster_window CLUSTER_WINDOW
Cluster window. Default is 0.5.
--gate_whiten GATE_WHITEN
Whether to whiten when gating. Default is True.
--tag TAG Hint for the read_data function to retrieve one specific
type of data, e.g.: C00, C01
--return_naive_and_averaged_sigmas RETURN_NAIVE_AND_AVERAGED_SIGMAS
Option to return naive and sliding sigmas from the delta sigma
cut. Default value: False
As can be seen, each of the parameters above comes with a brief description, which should help the user understand its functionality. In particular,
we note that the above parameters are the ones present in the pygwb.parameters
module. More details can be found in the pygwb paper.
Tip
Feeling overwhelmed by the number of parameters? Make sure to have a look at the pygwb.parameters
documentation.
Note
The current default for notch_list_path
is an empty string, which means no notches are applied.
If notching should be applied, a path to a notch list file can be added to these parameters.
An example of such a notch list can be downloaded here.
This particular notch list was used in the analysis of the third observing run of the LIGO-Virgo-KAGRA network.
This file can also be found in the pygwb/pygwb_pipe
folder.
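For instance, applying this notch list simply amounts to pointing the notch_list_path parameter at the file, either in the .ini file or on the command line; the path below is a placeholder for the actual location of the notch list:
pygwb_pipe --param_file {path_to_param_file} --notch_list_path {path_to_notch_list_file}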
2. Running the script
Although all of the parameters shown above can be passed to the script, we start by running pygwb_pipe
while passing as few optional parameters as possible directly to the script.
The only required argument is a path to a parameter file, which contains the parameter values
to use for the analysis. As an example, one can run the script with the parameters.ini
file provided in the pygwb_pipe
directory of the
repository. To test the pipeline, run the command:
pygwb_pipe --param_file pygwb_pipe/parameters.ini --apply_dsc False
The output of the command above should be:
2023-02-21 14:43:40.817 | SUCCESS | __main__:main:160 - Ran stochastic search over times 1247644138-1247645038
2023-02-24 16:35:25.625 | SUCCESS | __main__:main:163 - POINT ESTIMATE: -6.496991e-06
2023-02-24 16:35:25.625 | SUCCESS | __main__:main:164 - SIGMA: 2.688128e-06
However, one may want to run with different parameters. One option is to modify the parameters.ini
file; alternatively, parameters can be passed as arguments to the script directly. For example:
pygwb_pipe --param_file {path_to_param_file} --apply_dsc True --gate_data True
Warning
Passing any parameters through the command line overrides the corresponding value in the parameters.ini
file.
Note: detector-specific parameters
It is possible to pass detector-specific parameters, both in the .ini
file and on the command line. The syntax is:
param: IFO1:val1,IFO2:val2
For example, if passing different channel names for LIGO Hanford and LIGO Livingston:
channel: H1:GWOSC-16KHZ_R1_STRAIN,L1:PYGWB-SIMULATED_STRAIN
The same syntax applies when passing these on the command line:
--channel H1:GWOSC-16KHZ_R1_STRAIN,L1:PYGWB-SIMULATED_STRAIN
3. Output of the script
As mentioned previously, the purpose of the pygwb
analysis package is to compute an estimate of the GWB, through the computation of a
point estimate and variance spectrum, which can be combined into a single point estimate and variance. By default, the output of the analysis is saved in
the ./output
folder of your run directory, unless otherwise specified through the --output_path
argument of the script.
A few files can be found in this directory, including a version of the parameters file used for the
analysis. Note that this file takes into account any parameters that were modified through the command line. It follows the naming convention parameters_{t0}_{length_of_job}_final.ini.
Additionally, the power-spectral densities (PSDs) and cross-spectral densities (CSDs) are saved in a file with naming convention:
psds_csds_{start_time_of_job}_{job_duration}.npz
Tip
Not sure what exactly is in a file? Load the file and print out all its keys, as shown below.
Printing these keys displays the following:
import numpy

npzfile = numpy.load("psds_csds_{start_time_of_job}_{job_duration}.npz")
print(list(npzfile.keys()))
['freqs', 'avg_freqs', 'csd', 'avg_csd', 'psd_1', 'psd_2', 'avg_psd_1', 'avg_psd_2',
'csd_times', 'avg_csd_times', 'psd_times', 'avg_psd_times',
'coherence', 'psd_1_coh', 'psd_2_coh', 'csd_coh', 'n_segs_coh']
The above keys of the .npz
file have corresponding data associated with them, which can be read using:
variable = npzfile['{key}']
More specifically, the frequencies for naive estimates can be accessed through the 'freqs'
key, whereas the ones for averaged
estimates of the spectral densities can be accessed through the 'avg_freqs'
key. Additionally, the CSD can be read using the
'csd'
key, and the average CSD can be found with the key 'avg_csd'
. Analogously, one can load the PSDs of the interferometers.
One can also read the times associated with these spectral densities by using the keys '{insert_spectral_density}_times'
. If the --calc_coh
argument was set to True
during the analysis, the coherence information will also be stored in this file under the 'coherence'
key, together with the corresponding PSDs, CSD, and the number of segments used to compute the coherence.
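As a minimal sketch of how to access these quantities, following the same pattern as the snippet above (the placeholders in the file name should again be replaced by the actual start time and duration of the job):

import numpy

# Load the spectral densities saved by pygwb_pipe.
npzfile = numpy.load("psds_csds_{start_time_of_job}_{job_duration}.npz")

# Frequencies and averaged spectral densities, stored under the keys described above.
avg_freqs = npzfile["avg_freqs"]
avg_csd = npzfile["avg_csd"]
avg_psd_1 = npzfile["avg_psd_1"]
avg_psd_2 = npzfile["avg_psd_2"]

# Times associated with the averaged cross-spectral density.
avg_csd_times = npzfile["avg_csd_times"]

print(avg_freqs.shape, avg_csd.shape)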
Note
Depending on the parameters used to run pygwb_pipe
, some keys above might not have a value associated with them.
A second file contains the actual point estimate spectrum, variance spectrum, point estimate and variance. These can be found in:
point_estimate_sigma_{start_time_of_job}_{job_duration}.npz
This file can be read in similarly to the previous file, and has the following keys:
['frequencies', 'frequency_mask', 'point_estimate_spectrum', 'sigma_spectrum',
'point_estimate', 'sigma', 'point_estimate_spectrogram', 'sigma_spectrogram',
'badGPStimes', 'delta_sigma_alphas', 'delta_sigma_times', 'delta_sigma_values',
'naive_sigma_values', 'slide_sigma_values', 'ifo_1_gates', 'ifo_1_gate_pad',
'ifo_2_gates', 'ifo_2_gate_pad']
Note
Depending on the parameters used to run pygwb_pipe
, some keys above might not have a value associated with them,
in particular the ones related to gating and the delta sigma cut.
The file and associated keys can be read in with the same code as shown above. The 'frequencies'
key contains the frequencies corresponding to those of the point_estimate_spectrum
, which can in turn be
read using the key of the same name. The spectrograms are read analogously, but with spectrogram at the end of the key name
instead of spectrum. The key 'frequency_mask'
provides information about the frequencies which were notched, i.e., not used, in the analysis.
The overall point estimate and its standard deviation can be loaded using the 'point_estimate'
and 'sigma'
keys.
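In the same spirit, a short sketch reading the spectra and the overall results (again with a placeholder file name, and without assuming anything about the run beyond the keys listed above):

import numpy

# Load the point estimate and sigma output of pygwb_pipe.
npzfile = numpy.load("point_estimate_sigma_{start_time_of_job}_{job_duration}.npz")

frequencies = npzfile["frequencies"]
point_estimate_spectrum = npzfile["point_estimate_spectrum"]
sigma_spectrum = npzfile["sigma_spectrum"]
# Encodes which frequencies were notched out of the analysis, as described above.
frequency_mask = npzfile["frequency_mask"]

print("Point estimate:", npzfile["point_estimate"])
print("Sigma:", npzfile["sigma"])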
The output of the data quality checks in pygwb
is also saved in the same file. The output of the delta sigma cut is stored in
different keys. First, one can find the times excluded from the analysis, i.e., the
times that do not pass the cut, using the key 'badGPStimes'
. The spectral indices used for the delta sigma cut are stored in 'delta_sigma_alphas'
, the times
in 'delta_sigma_times'
, and the actual values of the computed delta sigmas can be found through the 'delta_sigma_values'
key.
The cut computes both the naive and sliding sigma values, which are also stored, in the keys 'naive_sigma_values'
and 'slide_sigma_values'
.
If gating was applied during the analysis, the gated times are saved in 'ifo_{i}_gates'
where i
can be 1 or 2, labeling the interferometer.
The 'ifo_{i}_gate_pad'
refers to the value of the parameter gate_tpad
during the analysis.
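A similar sketch gives access to the data quality information stored in the same file (placeholder file name as before; the exact shapes of these arrays depend on the settings of the run):

import numpy

npzfile = numpy.load("point_estimate_sigma_{start_time_of_job}_{job_duration}.npz")

# GPS times that did not pass the delta sigma cut.
bad_gps_times = npzfile["badGPStimes"]

# Spectral indices, times and values used in the delta sigma cut.
delta_sigma_alphas = npzfile["delta_sigma_alphas"]
delta_sigma_times = npzfile["delta_sigma_times"]
delta_sigma_values = npzfile["delta_sigma_values"]

print(f"{len(bad_gps_times)} time(s) flagged by the delta sigma cut")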
To conclude, if the script was run with --pickle_out True
, a pickle
file will be present in the output directory, containing a pickled
version of the baseline. This file contains all the information present in the other two npz
files, and in addition allows the user to create a baseline object
directly from the pickle
file. More information about how to create a baseline from such a file can be found here.
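Assuming the output is a plain Python pickle file, a minimal way to load it back is with the standard library pickle module, as sketched below; the path is a placeholder, and the pygwb package must be importable in the environment for the Baseline object to be reconstructed. The dedicated way of creating a baseline from such a file is described in the documentation linked above.

import pickle

# Load the pickled Baseline object written by pygwb_pipe (path is a placeholder).
with open("{path_to_pickled_baseline}", "rb") as f:
    baseline = pickle.load(f)

print(type(baseline))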
Warning
Saving pickle
files can take up a lot of memory. Furthermore, loading in a baseline from a pickle
file can take quite some time. Working
with npz
files is therefore recommended when possible.
Note
Depending on the parameters used to run pygwb_pipe
, the output of the script and the number of files might differ from what is described here.
This tutorial provides a brief overview of the pygwb_pipe
script and how to run it for one job, i.e., a small stretch of data. In practice,
however, one probably wants to analyze months, if not years, of data. To address this need, pygwb_pipe
can be run on multiple jobs, i.e., different
stretches of data, through parallelization using Condor (more information about Condor can be found here).
The concrete implementation within the pygwb
package is outlined in the following tutorial.