====================================
Running multiple ``pygwb_pipe`` jobs
====================================

In practice, one will probably want to run ``pygwb`` on long stretches of data. This is achieved most easily by splitting the large data set into smaller chunks, which can then be analyzed individually and combined after the analysis to form one overall result for the whole data set. To this end, ``pygwb`` comes with two scripts: ``pygwb_dag`` and ``pygwb_combine``. The former allows the user to run ``pygwb_pipe`` (for which a tutorial can be found `here `_) simultaneously on shorter stretches of data, whereas the latter allows the user to combine the output of the individual runs into an overall result for the whole data set.

**The pygwb_dag script**
========================

**1. Script parameters**
-------------------------

To be able to run multiple ``pygwb_pipe`` jobs simultaneously, ``pygwb`` relies on `Condor `_. This requires a ``dag`` file, which contains information about all the jobs, i.e., running ``pygwb_pipe`` on different stretches of data. In ``pygwb``, this file can be created by using the ``pygwb_dag`` script. To visualize the expected arguments of the script, one can call:

.. code-block:: shell

   pygwb_dag --help

This will display the required parameters, together with a small description:

.. code-block:: shell

   --subfile SUBFILE     Submission file.
   --jobfile JOBFILE     Job file with start and end times and duration for
                         each job.
   --flag FLAG           Flag that is searched for in the DQSegDB.
   --t0 T0               Begin time of analysed data, will query the DQSegDB.
                         If used with jobfile, it is an optional argument if
                         one does not wish to analyse the whole job file.
   --tf TF               End time of analysed data, will query the DQSegDB.
                         If used with jobfile, it is an optional argument if
                         one does not wish to analyse the whole job file.
   --parentdir PARENTDIR
                         Starting folder.
   --param_file PARAM_FILE
                         Path to parameters.ini file.
   --dag_name DAG_NAME   Dag file name.
   --apply_dsc APPLY_DSC
                         Apply delta-sigma cut flag for pygwb_pipe.
   --pickle_out PICKLE_OUT
                         Pickle output Baseline of the analysis.
   --wipe_ifo WIPE_IFO   Wipe interferometer data to reduce size of pickled
                         Baseline.
   --calc_pt_est CALC_PT_EST
                         Calculate omega point estimate and sigma from data.

An important argument of the script is the path to the job file, passed through ``--jobfile``. The job file is a simple ``.txt`` file and contains the different jobs, or in other words, the different stretches of data to run the analysis on. For concreteness, consider the case where one wants to run ``pygwb`` on 12000 seconds of data, split into smaller jobs. The job file could then look as follows:

.. code-block:: shell

   1 0 4000 4000
   1 4000 9000 5000
   1 9000 12000 3000

The first column does not play a role; the second and third columns indicate the start and end time of the job, respectively, whereas the last column shows the duration of the job, i.e., the difference between end and start time. The job file therefore allows the script to *know* on which stretches of data to run. In case one wants to run on a subset of the jobs in the job file, one can pass an additional start and end time to the script through the ``--t0`` and ``--tf`` arguments.
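Since the job file is a plain ``.txt`` file, it can also be generated programmatically. Below is a minimal sketch (not part of ``pygwb``; the file name, times, and chunk size are purely illustrative) that splits a stretch of data into jobs of at most a given duration:

.. code-block:: python

   # Minimal sketch: write a job file that splits the stretch [t0, tf)
   # into jobs of at most max_job_duration seconds.
   t0, tf, max_job_duration = 0, 12000, 5000  # illustrative values

   with open("jobfile.txt", "w") as jobfile:
       start = t0
       while start < tf:
           end = min(start + max_job_duration, tf)
           # Columns: placeholder, start time, end time, duration.
           jobfile.write(f"1 {start} {end} {end - start}\n")
           start = end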
The ``--parentdir`` argument allows the user to pass the full path to the run directory, and the ``--param_file`` argument should point to the parameter file to be used by ``pygwb_pipe``.

.. seealso::

   For more information about ``pygwb_pipe`` and the usage of a parameter file, we refer the user to the tutorial `here `_.

For the remainder of the arguments, we refer the user to the ``pygwb_pipe`` `tutorial `_, as the ``dag`` file passes the relevant arguments, e.g., the parameter file and the ``apply_dsc`` flag, to ``pygwb_pipe`` behind the scenes. Note that an additional argument should be passed to the script, namely the submission file. This file passes the necessary information to Condor and depends on the cluster/server on which the user is running the ``pygwb`` jobs.

.. warning::

   The Condor submission file, passed through ``--subfile``, is not included in the ``pygwb`` package. Its specific implementation will depend on the server or cluster where the user runs the analysis. More information about Condor, together with inspiration for the submission file, can be found `here `_.

**2. Running the script**
-------------------------

The arguments described above can be passed to the script through the following command:

.. code-block:: shell

   pygwb_dag --subfile {full_path_to_subfile} --jobfile {full_path_to_jobfile} --parentdir {full_path_to_parent_dir} --param_file {full_path_to_param_file}

.. note::

   If the ``dag`` name was not specified when calling ``pygwb_dag`` in the previous step, the default name ``dag_name.dag`` is used.

The ``dag`` file is now created in the ``{full_path_to_parent_dir}/output`` folder. To submit the jobs to Condor and actually run them, navigate to that folder and run the following line in the command line:

.. code-block:: shell

   condor_submit_dag {your-dag-file.dag}

To check the status of the jobs, one can execute the command:

.. code-block:: shell

   condor_q

For additional information on Condor jobs, we refer the user to the Condor `documentation `_.

**3. Output of the script**
---------------------------

Once all the jobs submitted through Condor via the ``dag`` file have finished running, the output folder should contain files similar to the ones already discussed in the ``pygwb_pipe`` tutorial `here `_. However, there will be many more files compared to a single run, as ``pygwb_pipe`` was run for all the jobs and therefore produced output for each of them. We refrain from repeating the information about the output of ``pygwb_pipe`` and refer to the previous `tutorial `_ for more information about the output.

**Combining runs with pygwb_combine**
=====================================

The ``pygwb_dag`` script described above runs multiple ``pygwb_pipe`` jobs on stretches of data. For each of these runs, the usual ``pygwb_pipe`` output is produced (see `here `_ for more information on the output of the ``pygwb_pipe`` script). However, the user is usually interested in an overall result for the whole data set. This is where ``pygwb_combine`` comes in, allowing the user to combine the separate results into an overall result. For example, all separate point estimate and variance spectra are combined into one overall spectrum for the whole data set. More information on this procedure can be found in the `pygwb paper `_.
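To give a flavor of what such a combination entails, the snippet below sketches a standard inverse-variance weighting of per-job point estimate and sigma spectra. This is only an illustration with made-up arrays; the actual ``pygwb_combine`` implementation (spectral reweighting, notching, etc.) is more involved:

.. code-block:: python

   import numpy

   # Made-up per-job spectra for illustration: point estimates Y_i(f) and
   # standard deviations sigma_i(f), stacked as (n_jobs, n_frequencies) arrays.
   rng = numpy.random.default_rng(42)
   Y_jobs = rng.normal(0.0, 1e-6, size=(3, 100))
   sigma_jobs = rng.uniform(1e-6, 2e-6, size=(3, 100))

   # Inverse-variance weighting: jobs with smaller sigma contribute more.
   weights = sigma_jobs**-2
   Y_combined = numpy.sum(weights * Y_jobs, axis=0) / numpy.sum(weights, axis=0)
   sigma_combined = numpy.sum(weights, axis=0) ** -0.5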
**1. Script parameters**
-------------------------

The required arguments of the ``pygwb_combine`` script can be displayed through:

.. code-block:: shell

   pygwb_combine -h

This shows the following arguments with a short description:

.. code-block:: shell

   --data_path DATA_PATH [DATA_PATH ...]
                         Path to data files or folder.
   --alpha ALPHA         Spectral index alpha to use for spectral
                         re-weighting.
   --fref FREF           Reference frequency to use when presenting results.
   --param_file PARAM_FILE
                         Parameter file.
   --h0 H0               Value of h0 to use. Default is pygwb.constants.h0.
   --combine_coherence COMBINE_COHERENCE
                         Calculate combined coherence over all available
                         data.
   --coherence_path COHERENCE_PATH [COHERENCE_PATH ...]
                         Path to coherence data files, if individual files
                         are passed.
   --out_path OUT_PATH   Output path.
   --file_tag FILE_TAG   File naming tag. By default, reads in first and
                         last time in dataset.

**2. Running the script**
-------------------------

To run the script, one executes the following command:

.. code-block:: shell

   pygwb_combine --data_path {my_pygwb_output_folder} --alpha {my_spectral_index} --fref {my_fref} --param_file {my_parameter_file_path} --out_path {my_combine_folder}

Note that not all arguments listed above are required to be able to run the script.

.. warning::

   The ``--combine_coherence`` functionality is not supported when combining runs produced by the ``pygwb_dag`` script.

**3. Output of the script**
---------------------------

As mentioned above, the output of the ``pygwb_combine`` script is one overall point estimate and variance (spectrum). The directory passed through the ``--out_path`` argument should contain a file that looks as follows:

.. code-block:: shell

   point_estimate_sigma_spectra_alpha_0.0_fref_25_t0-tf.npz

This file contains the combined spectra; the notation indicates that the run used a spectral index of 0 and a reference frequency of 25 Hz, and t0 and tf would be actual numbers corresponding to the start and end time of the analysis, respectively. The keys of this ``npz`` file are:

.. code-block:: shell

   ['point_estimate', 'sigma', 'point_estimate_spectrum', 'sigma_spectrum', 'frequencies', 'frequency_mask', 'point_estimates_seg_UW', 'sigmas_seg_UW']

The value associated with a given key can be accessed from the ``npz`` file through:

.. code-block:: python

   import numpy

   npzfile = numpy.load("point_estimate_sigma_spectra_alpha_0.0_fref_25_t0-tf.npz")
   variable = npzfile["key"]

One obtains the value of the overall point estimate and its standard deviation through the ``point_estimate`` and ``sigma`` keys, respectively. The corresponding spectra are found by using the ``point_estimate_spectrum`` and ``sigma_spectrum`` keys. The frequencies for these spectra can be retrieved through the ``frequencies`` key. The ``frequency_mask`` key returns the notched frequencies. For more information about notching, check the demo `here `_ or the API of the notch module `here `_. Lastly, one can also access the unweighted point estimates, i.e., without reweighting of the spectral index, and their standard deviations for every segment in the analysis. These are labeled with ``_UW`` at the end of the keys.

.. tip::

   Not sure what exactly is in the ``.npz`` file? Load in the file and print out all its `keys` as shown `here `_.
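As an illustration, the snippet below loads the combined file and restricts the spectra to the unnotched frequencies; the file name stands in for one with actual start and end times, and we assume the mask is ``True`` for frequencies that are kept:

.. code-block:: python

   import numpy

   # Illustrative file name; t0-tf are actual GPS times in practice.
   npzfile = numpy.load("point_estimate_sigma_spectra_alpha_0.0_fref_25_t0-tf.npz")

   # Overall (broadband) point estimate and standard deviation.
   print(f"Omega = {npzfile['point_estimate']} +/- {npzfile['sigma']}")

   # Spectra restricted to the frequencies that survive the notching,
   # assuming the mask is True for frequencies that are kept.
   mask = npzfile["frequency_mask"].astype(bool)
   frequencies = npzfile["frequencies"][mask]
   point_estimate_spectrum = npzfile["point_estimate_spectrum"][mask]
   sigma_spectrum = npzfile["sigma_spectrum"][mask]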
If the ``pygwb_pipe`` analyses were run with the delta sigma cut turned on, a file ``delta_sigma_cut_t0-tf.npz`` should be present in the output directory as well. This file contains the following keys:

.. code-block:: shell

   ['naive_sigma_values', 'slide_sigma_values', 'delta_sigma_values', 'badGPStimes', 'delta_sigma_times', 'ifo_1_gates', 'ifo_2_gates', 'ifo_1_gate_pad', 'ifo_2_gate_pad']

The times flagged by the delta sigma cut, which are excluded from the analysis, can be retrieved with the ``'badGPStimes'`` key. The alphas used for the delta sigma cut are stored in the ``'delta_sigma_alphas'`` key, the times in ``'delta_sigma_times'``, and the actual values of the delta sigmas in ``'delta_sigma_values'``. The delta sigma cut computes both the naive and sliding sigma values, which are stored in the ``'naive_sigma_values'`` and ``'slide_sigma_values'`` keys. If gating is turned on, the gated times are saved in ``'ifo_{i}_gates'``, where ``i`` denotes the first or second interferometer used in the analysis. The ``'ifo_{i}_gate_pad'`` key refers to the value of the parameter ``gate_tpad`` during the analysis.
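In the same spirit as before, one could inspect this file as sketched below; the file name again stands in for one with actual start and end times:

.. code-block:: python

   import numpy

   # Illustrative file name; t0-tf are actual GPS times in practice.
   dsc_file = numpy.load("delta_sigma_cut_t0-tf.npz")

   bad_times = dsc_file["badGPStimes"]       # times excluded from the analysis
   delta_sigmas = dsc_file["delta_sigma_values"]
   times = dsc_file["delta_sigma_times"]     # times at which the cut was evaluated

   print(f"{len(bad_times)} segments were flagged by the delta sigma cut.")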