pygwb.preprocessing

The preprocessing module collects the functions that prepare data for a pygwb analysis run. It can read data from frame files, either local or public (for additional information on frame files, see here). Other functionalities include resampling the data, applying a high-pass filter and applying a timeshift. These come together in the triplet of preprocessing_data functions, which read in data and resample and/or high-pass it on the fly. The triplet accepts a gwpy.timeseries.TimeSeries, a plain array, or the name of a gravitational-wave channel, in which case the data is read from that channel using the provided local or public frame files. The module can also gate data, based on the gating function in gwpy, gwpy.timeseries.TimeSeries.gate. More information can be found here.

Examples

As an example, we read in some data from a certain channel and then resample, high-pass and apply gating to the data. First, we have to import the module.

>>> import pygwb.preprocessing as ppp

Then, we read in some data using the read_data function. For concreteness, we read in public data from the LIGO Hanford “H1” detector, as shown below. The “public” data type indicates that the data is obtained from the GWOSC servers.

>>> IFO = "H1"
>>> data_timeseries = ppp.read_data(
...     IFO,
...     "public",                   # data_type
...     "H1:GWOSC-16KHZ_R1_STRAIN", # channel
...     1247644138,                 # t0
...     1247648138,                 # tf
...     "",                         # local_data_path
...     16384                       # input_sample_rate
... )
>>> print(data_timeseries.sample_rate)
16384.0 Hz

The sample rate is printed for illustrative purposes. Next, we preprocess the data: it is resampled and a high-pass filter is applied. In this example, the data is resampled from 16384 Hz to 4096 Hz.

>>> new_sample_rate = 4096
>>> preprocessed_timeseries = ppp.preprocessing_data_gwpy_timeseries(
...     IFO,
...     data_timeseries,
...     new_sample_rate,
...     11,        # cutoff_frequency
...     2,         # number_cropped_seconds
...     "hamming", # window_downsampling
...     "fir",     # ftype
...     0          # timeshift
... )
>>> print(preprocessed_timeseries.sample_rate)
4096.0 Hz
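As a rough sanity check on the resampling step, the arithmetic behind the numbers in this example can be sketched in plain Python. This is only bookkeeping, not a call into pygwb. It assumes that number_cropped_seconds is removed from both the start and the end of the data, which is an assumption made here for illustration.

```python
# Values taken from the example on this page.
input_sample_rate = 16384            # Hz, rate of the data returned by read_data
new_sample_rate = 4096               # Hz, target rate passed to the preprocessing call
t0, tf = 1247644138, 1247648138      # GPS start and end times
number_cropped_seconds = 2

# Ratio between the two rates; a zero remainder means an integer decimation factor.
decimation_factor, remainder = divmod(input_sample_rate, new_sample_rate)

duration = tf - t0                            # seconds of data requested
samples_in = duration * input_sample_rate     # samples before resampling
samples_out = duration * new_sample_rate      # samples after resampling

# ASSUMPTION: the crop is applied at both ends of the data.
cropped_duration = duration - 2 * number_cropped_seconds
samples_after_crop = cropped_duration * new_sample_rate

# Highest frequency representable at the new sample rate.
nyquist = new_sample_rate / 2

print(decimation_factor, samples_out, samples_after_crop, nyquist)
```

The cutoff_frequency of 11 Hz used in the example lies far below the 2048 Hz Nyquist frequency of the resampled data, so the high-pass filter only affects the low-frequency end of the band.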

The printed sample rate of 4096 Hz confirms that the data was resampled. Another important part of preprocessing is gating the data. Again using the default parameter values, one can run the following lines:

>>> gated_timeseries, deadtime = ppp.self_gate_data(
...     preprocessed_timeseries,
...     1.0,  # gate_tzero
...     0.5,  # gate_tpad
...     50.0, # gate_threshold
...     0.5,  # cluster_window
...     True  # gate_whiten
... )

More information on the gating procedure can be found here.
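To give a feel for what threshold gating does, the following is a minimal pure-Python sketch. It is not the gwpy or pygwb implementation: it omits the whitening, the tapered padding (tpad) and the clustering of nearby triggers, and simply zeroes every sample within tzero seconds of a sample whose absolute value exceeds the threshold. The function name gate_sketch and the toy data are hypothetical.

```python
def gate_sketch(samples, sample_rate, tzero, threshold):
    """Zero all samples within `tzero` seconds of any sample above `threshold`.

    Returns the gated copy and the list of zeroed (start, stop) index
    intervals, with `stop` exclusive. Illustrative only.
    """
    half_width = int(round(tzero * sample_rate))
    n = len(samples)
    intervals = []
    for i, value in enumerate(samples):
        if abs(value) > threshold:
            start = max(0, i - half_width)
            stop = min(n, i + half_width + 1)
            # Merge with the previous interval when they touch or overlap.
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], max(intervals[-1][1], stop))
            else:
                intervals.append((start, stop))
    gated = list(samples)
    for start, stop in intervals:
        for j in range(start, stop):
            gated[j] = 0.0
    return gated, intervals


# Toy data at a deliberately tiny sample rate, with one loud glitch at index 4.
sample_rate = 4  # Hz
samples = [0.1, -0.2, 0.1, 0.3, 60.0, 0.2, -0.1, 0.1, 0.2, -0.3, 0.1, 0.2]

gated, intervals = gate_sketch(samples, sample_rate, tzero=0.5, threshold=50.0)

# Total time removed by the gates, in seconds.
deadtime_seconds = sum(stop - start for start, stop in intervals) / sample_rate
print(intervals, deadtime_seconds)
```

In this sketch the dead time is simply the summed length of the zeroed intervals. How self_gate_data represents the dead time it returns is not restated here.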

Functions

apply_high_pass_filter(timeseries, ...[, ...])

Function to apply a high pass filter to a timeseries.

preprocessing_data_channel_name(IFO, t0, tf, ...)

Function doing the pre-processing of the data to be used in the remainder of the code.

preprocessing_data_gwpy_timeseries(...[, ...])

Function doing the pre-processing of a gwpy timeseries to be used in the remainder of the code.

preprocessing_data_timeseries_array(t0, tf, ...)

Function performing the pre-processing of a time-series array to be used in the remainder of the code.

read_data(IFO, data_type, channel, t0, tf[, ...])

Function that reads in the data to be used in the rest of the code.

resample_filter(time_series_data, ...[, ...])

Function doing part of the pre-processing (resampling and filtering) of the data to be used in the remainder of the code.

self_gate_data(time_series_data[, tzero, ...])

Function to self-gate data to be used in the stochastic pipeline.

set_start_time(job_start_GPS, job_end_GPS, ...)

Function to identify segment start times either with or without sidereal option.

shift_timeseries(time_series_data[, time_shift])

Function that shifts a timeseries by an amount time_shift in order to perform the timeshifted analysis.
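As an illustration of the idea behind a time shift, the sketch below converts a shift in seconds into a number of samples and rotates the data by that amount. It assumes the shift is circular, so that samples pushed past the end wrap around to the start. Whether shift_timeseries behaves exactly this way is an assumption made here, and the helper circular_shift is hypothetical, not part of pygwb.

```python
def circular_shift(samples, sample_rate, time_shift):
    """Rotate `samples` forward by `time_shift` seconds (illustrative only)."""
    n = len(samples)
    if n == 0:
        return []
    # Convert the shift from seconds to a whole number of samples.
    shift = int(round(time_shift * sample_rate)) % n
    if shift == 0:
        return list(samples)
    # The last `shift` samples wrap around to the front.
    return list(samples[-shift:]) + list(samples[:-shift])


samples = [0, 1, 2, 3, 4, 5, 6, 7]
shifted = circular_shift(samples, sample_rate=4, time_shift=0.5)
print(shifted)
```

A shift of 0.5 s at 4 Hz corresponds to two samples, so the last two values move to the front and the total number of samples is unchanged.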