# Interpret the output of the `statistical_checks` module

This notebook is not meant as a tutorial on how to run the `statistical_checks` module, as this is already covered in the tutorial about the `pygwb_stats` script [here](stat_checks.html). Instead, we go over the plots generated by the module and briefly describe each of them to help the user interpret the results.

For more information on the module, we refer the user to the module API page [here](api/pygwb.statistical_checks.html). For more general information about the module, we refer the user to the [pygwb paper](https://arxiv.org/pdf/2303.15696.pdf).

## Plots

### Plotting running quantities

The two plots below show the running point estimate and its standard deviation as the analysis run evolves, i.e., as more data is accumulated. This is particularly useful to determine whether the point estimate evolves towards a non-zero value, which would hint at the presence of a signal, or remains consistent with zero. In addition, the standard deviation should follow a $1/\sqrt{\rm time}$ behavior, which can be easily verified in the plot. More information about the above quantities can be found in the [pygwb paper](https://arxiv.org/pdf/2303.15696.pdf).

**Output of plot_running_point_estimate()**

![image.png](attachment:4d16dbdf-04c3-4a53-a30c-8abd2303e6e5.png)

**Output of plot_running_sigma()**

![image.png](attachment:bb201497-387d-4f35-bc79-3f56d9202c1c.png)
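To make the expected $1/\sqrt{\rm time}$ behavior concrete, the sketch below builds running quantities from toy per-segment values. It assumes the segments are combined with inverse-variance weights; the function name `running_estimates` and the simulated data are hypothetical and not taken from the `pygwb` source.

```python
import numpy as np


def running_estimates(omega_seg, sigma_seg):
    """Inverse-variance weighted running point estimate and standard deviation.

    Hypothetical helper for illustration; not part of pygwb.
    """
    omega_seg = np.asarray(omega_seg, dtype=float)
    sigma_seg = np.asarray(sigma_seg, dtype=float)
    weights = 1.0 / sigma_seg**2
    cum_weights = np.cumsum(weights)
    # Weighted mean of all segments accumulated so far
    running_omega = np.cumsum(omega_seg * weights) / cum_weights
    # Standard deviation of that weighted mean
    running_sigma = 1.0 / np.sqrt(cum_weights)
    return running_omega, running_sigma


# Toy data: pure noise with the same sigma in every segment (assumption)
rng = np.random.default_rng(0)
n_segments = 1000
sigma_seg = np.full(n_segments, 2.0)
omega_seg = rng.normal(0.0, sigma_seg)

running_omega, running_sigma = running_estimates(omega_seg, sigma_seg)

# With equal-length, equal-sigma segments the running sigma is sigma / sqrt(N)
expected_sigma = sigma_seg[0] / np.sqrt(np.arange(1, n_segments + 1))
print(np.allclose(running_sigma, expected_sigma))
```

In this noise-only toy case the running point estimate stays within a few running standard deviations of zero, which is the behavior to look for in the plots above when no signal is present.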

### Plotting the IFFT of the point estimate integrand

**Output of plot_IFFT_point_estimate_integrand()**

The plot below shows the inverse Fourier transform of the point estimate integrand. In the case of a detection, the plot should display a peak around zero time lag, corresponding to a signal that is present only when the two data streams are not time-shifted with respect to each other, and that disappears for non-zero time shifts. Combining this plot with the previous two plots provides additional confidence in the detection of a signal.

![image.png](attachment:20408486-f6b9-48b7-a3a3-bf83cd2a4ef4.png)
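The peak at zero lag can be reproduced with a toy integrand. The sketch below assumes a one-sided frequency-domain integrand with a constant real part (standing in for a signal) plus complex noise; all values are made up for illustration and the code does not call `pygwb`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy one-sided spectrum (assumption): constant real "signal" plus complex noise
n_freqs = 1025
delta_f = 0.25  # Hz, hypothetical frequency resolution
signal = np.ones(n_freqs)
noise = 0.1 * (rng.normal(size=n_freqs) + 1j * rng.normal(size=n_freqs))
integrand = signal + noise

# Inverse real FFT gives the integrand as a function of time lag
series = np.fft.irfft(integrand)
n_times = series.size
dt = 1.0 / (n_times * delta_f)

# Reorder so that zero lag sits in the middle of the array
series_shifted = np.fft.fftshift(series)
lags = (np.arange(n_times) - n_times // 2) * dt

peak_lag = lags[np.argmax(np.abs(series_shifted))]
print(peak_lag)
```

Because the toy signal is coherent across all frequency bins, its contributions add up only at zero lag, while the noise averages down at every lag.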

### Plotting spectra

The plots below show the signal-to-noise ratio (SNR) as a function of frequency, together with the standard deviation. Both the real and imaginary parts of the spectrum are shown, as the real part of the spectrum contains information about the signal, whereas the imaginary part contains information about the noise. The standard deviation plot shows how noisy each frequency bin was and therefore, together with the SNR plot, how much that frequency bin contributed to the analysis.

**Output of plot_SNR_spectrum()**

![image.png](attachment:2a9c210d-c60e-448a-87eb-863cb591adf8.png)

**Output of plot_cumulative_SNR_spectrum()**

![image.png](attachment:084036c3-8aa5-41c6-bd3c-75f1633ab8ee.png)

**Output of plot_real_SNR_spectrum()**

![image.png](attachment:ee703e9c-6e7c-46ff-abb4-a54672f590d8.png)

**Output of plot_imag_SNR_spectrum()**

![image.png](attachment:ca448dac-8f08-4011-b374-d46deb94a6e8.png)

**Output of plot_sigma_spectrum()**

![image.png](attachment:9ab728dd-9056-4899-8332-5c4fe4279a07.png)

**Output of plot_cumulative_sensitivity()**

![image.png](attachment:9284c882-ebe4-4d69-b741-d524d57663b6.png)
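As a rough illustration of how the standard deviation determines the contribution of each frequency bin, the sketch below computes per-bin SNRs and a cumulative sensitivity from a toy spectrum. The definitions used here (SNR as spectrum divided by sigma, and sensitivity as the normalized cumulative sum of $1/\sigma^2(f)$) are assumptions made for illustration and may differ in detail from what the module plots.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy spectra (assumption): smooth sigma(f) with one artificially noisy bin
freqs = np.arange(20.0, 100.0, 0.25)
sigma_f = 1e-6 * (1.0 + ((freqs - 40.0) / 20.0) ** 2)
noisy_bin = 100
sigma_f[noisy_bin] *= 50.0

# Noise-only complex point estimate spectrum
Y_f = sigma_f * (rng.normal(size=freqs.size) + 1j * rng.normal(size=freqs.size))

# Per-bin SNR of the real and imaginary parts
snr_real = Y_f.real / sigma_f
snr_imag = Y_f.imag / sigma_f

# Inverse-variance weights: noisy bins contribute very little
weights = sigma_f ** -2.0
cumulative_sensitivity = np.cumsum(weights) / np.sum(weights)

# Broadband combination over all bins
broadband_sigma = 1.0 / np.sqrt(np.sum(weights))
broadband_Y = np.sum(Y_f.real * weights) / np.sum(weights)
print(broadband_Y / broadband_sigma)
```

A bin with a large standard deviation carries a small weight, so it barely moves the cumulative sensitivity curve even if its SNR looks unremarkable.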

### Plotting the evolution as a function of time (i.e. analysis segment)

The plot below shows the value of the point estimate and its standard deviation per analysis segment, as a function of time. Deviations from the mean of the whole analysis run are shown as well. The quantities are shown before and after the delta-sigma cut, which makes the effect of the cut visible. Outliers in the standard deviation plot should be removed by the delta-sigma cut; any that remain should be flagged for follow-up. Any other irregularities in these plots are easily picked up and can then be flagged for further investigation.

**Output of plot_omega_sigma_in_time()**

![image.png](attachment:11b58872-5c44-42d0-959b-977c16e3efb8.png)

### Plotting the effect of the delta-sigma cut

The plots below illustrate the effect of the delta-sigma cut on the quantities of interest (see [here](api/pygwb.delta_sigma_cut.html) for information about the delta-sigma cut in `pygwb`). A clear cut should be seen in the value of the delta-sigma itself when comparing the values before and after the cut (the cut should be at the value specified in the parameter file for the delta-sigma cut). Note that any outliers should in principle be removed by the delta-sigma cut, as this removes any abnormally loud segments. Any remaining outliers should be flagged for follow-up.

**Output of plot_hist_sigma_dsc()**

![image.png](attachment:45bb2d9a-5cc9-49e3-a505-790eb07d8fb4.png)

**Output of plot_scatter_sigma_dsc()**

![image.png](attachment:8c9846eb-478f-4375-bb96-2e5bc3bd3c9f.png)

**Output of plot_scatter_omega_sigma_dsc()**

![image.png](attachment:70c1bc17-206c-42da-b33e-9c91ac36f4c1.png)

**Output of plot_hist_omega_pre_post_dsc()**

![image.png](attachment:d5de6a9f-3d7f-4ffd-bf9d-f51e4cffb20f.png)
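The logic of such a cut can be sketched with toy data. The example below is a simplified stand-in, not the `pygwb` implementation: it assumes two standard deviation estimates per segment (here called `sigma_naive` and `sigma_slide`, hypothetical names), defines delta-sigma as their relative difference, and ignores any bias factors.

```python
import numpy as np


def delta_sigma_mask(sigma_naive, sigma_slide, threshold):
    """Return delta-sigma per segment and a boolean mask of segments to keep.

    Simplified, hypothetical definition for illustration only.
    """
    sigma_naive = np.asarray(sigma_naive, dtype=float)
    sigma_slide = np.asarray(sigma_slide, dtype=float)
    delta_sigma = np.abs(sigma_naive - sigma_slide) / sigma_slide
    return delta_sigma, delta_sigma <= threshold


rng = np.random.default_rng(3)
n_segments = 500

# Two estimates of sigma that agree to within a few percent in quiet data
sigma_slide = 1.0 + 0.02 * rng.normal(size=n_segments)
sigma_naive = sigma_slide * (1.0 + 0.02 * rng.normal(size=n_segments))

# Inject two abnormally loud segments
loud_segments = [50, 300]
sigma_naive[loud_segments] *= 3.0

delta_sigma, keep = delta_sigma_mask(sigma_naive, sigma_slide, threshold=0.2)
print(n_segments - keep.sum(), "segments removed")
```

After applying the mask, no surviving segment has a delta-sigma above the threshold, which is the sharp edge expected in the histogram and scatter plots above.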

### Testing Gaussianity

The Kolmogorov-Smirnov (KS) test can be used to verify the consistency of the data with some assumed distribution. For stochastic searches, we assume Gaussianity of our data, so this test is well suited to verifying this assumption. The plot shows the cumulative distribution function of the assumed distribution, together with the empirical one of the data. The maximum deviation between the two is displayed as a test statistic, together with the p-value. More information about the KS test can be found [here](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).

**Output of plot_KS_test()**

![image.png](attachment:b6b39195-1906-4805-b690-1a09ad6dc59f.png)
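The test statistic described above, the maximum deviation between the empirical and the assumed cumulative distribution function, can be computed by hand in a few lines. The sketch below assumes the data have already been normalized so that they should follow a standard normal distribution; the function name and the toy samples are hypothetical.

```python
import math

import numpy as np


def ks_statistic_normal(samples):
    """Maximum distance between the empirical CDF and the standard normal CDF."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    # Standard normal CDF evaluated at each sorted sample
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])
    # The empirical CDF jumps at each sample: check both sides of every step
    ecdf_after = np.arange(1, n + 1) / n
    ecdf_before = np.arange(0, n) / n
    return max(np.max(ecdf_after - cdf), np.max(cdf - ecdf_before))


rng = np.random.default_rng(4)
gaussian_samples = rng.normal(size=2000)
shifted_samples = rng.normal(loc=1.0, size=2000)  # deliberately non-standard

d_gaussian = ks_statistic_normal(gaussian_samples)
d_shifted = ks_statistic_normal(shifted_samples)
print(d_gaussian < d_shifted)
```

Converting the statistic into a p-value requires the Kolmogorov distribution; `scipy.stats.kstest` returns both the statistic and the p-value and is a convenient cross-check.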

### Visualizing the variation in variance

The plot below illustrates the variation in the variance. The histogram should be centered around 1, as this would correspond to values of the variance centered around the mean of the variance. Large fluctuations or outlier bins should be investigated.

**Output of plot_hist_sigma_squared()**

![image.png](attachment:92b30dba-31d8-435b-95e7-c75d23587ccb.png)
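A histogram of this kind can be sketched as follows, assuming the plotted quantity is the per-segment variance divided by the mean variance over the run (consistent with the description above, but not checked against the module source). The data are simulated.

```python
import numpy as np

rng = np.random.default_rng(5)
n_segments = 2000

# Toy per-segment standard deviations with 5% scatter (assumption)
sigma = 1e-6 * (1.0 + 0.05 * rng.normal(size=n_segments))

# Variance normalized by its mean: centered around 1 by construction
sigma_sq_ratio = sigma**2 / np.mean(sigma**2)

counts, edges = np.histogram(sigma_sq_ratio, bins=50)
centers = 0.5 * (edges[:-1] + edges[1:])

# Segments far from 1 are candidates for follow-up
outliers = np.flatnonzero(np.abs(sigma_sq_ratio - 1.0) > 5.0 * np.std(sigma_sq_ratio))
print(centers[np.argmax(counts)], outliers.size)
```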

### Fitting as a function of time

The two plots below show the values of the point estimate and the standard deviation as a function of time, together with a linear fit. Any trend in these quantities can easily be visualized in these plots, alongside the numerical values of the fit parameters.

**Output of plot_omega_time_fit()**

![image.png](attachment:b5a7267a-1cca-4954-9ea5-42c77170ff43.png)

**Output of plot_sigma_time_fit()**

![image.png](attachment:1fe8f0e3-94c4-4c39-8be6-d046c71c2e31.png)
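A linear fit of this kind can be reproduced with `numpy.polyfit`. The sketch below uses simulated standard deviations with a small injected drift; the time unit (days) and all numerical values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy data: sigma drifting slowly upward over a 30-day run (assumption)
t_days = np.linspace(0.0, 30.0, 200)
true_slope, true_intercept = 0.01, 1.0
sigma_t = true_intercept + true_slope * t_days + 0.02 * rng.normal(size=t_days.size)

# Degree-1 polynomial fit; coefficients are returned highest power first
slope, intercept = np.polyfit(t_days, sigma_t, 1)
fit_line = slope * t_days + intercept

print(f"slope = {slope:.4f} per day, intercept = {intercept:.4f}")
```

A slope that is clearly non-zero compared to the scatter of the data points to a trend in the detector sensitivity or in the point estimate that is worth investigating.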

### Effect of gating

If gating was performed during the analysis run, the duration of the gates can be represented visually as a function of the analysis segment. Gates lasting longer than a few seconds should be flagged for follow-up. More information on the gating procedure can be found in the [pygwb paper](https://arxiv.org/pdf/2303.15696.pdf) or in the implementation of gating in the `pygwb` module [here](api/pygwb.preprocessing.self_gate_data.html#pygwb.preprocessing.self_gate_data).

**Output of plot_gates_in_time()**

![image.png](attachment:fa842663-568e-4280-b43c-001a27969988.png)