Statistics functions
Some generally helpful statistical functions for base QC.
- marine_qc.statistics.missing_mean(inarr)
Return the mean of the input array.
- marine_qc.statistics.p_data_given_good(x, q, r_hi, r_lo, mu, sigma)
Probability of an observed value assuming it comes from a “good” measurement.
Calculate the probability of an observed value x given a normal distribution with mean mu and standard deviation sigma, where x is constrained to fall between R_lo and R_hi and is known only to an integer multiple of Q, the quantization level.
- Parameters:
  - x (float) – Observed value for which probability is required.
  - q (float) – Quantization of x, i.e. x is an integer multiple of Q.
  - r_hi (float) – The upper limit on x imposed by previous QC choices.
  - r_lo (float) – The lower limit on x imposed by previous QC choices.
  - mu (float) – The mean of the distribution.
  - sigma (float) – The standard deviation of the distribution.
- Return type: float
- Returns: Probability of the observed value given the specified distribution.
- Raises: ValueError – When inputs are incorrectly specified: q <= 0, sigma <= 0, r_lo > r_hi, x < r_lo or x > r_hi.
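The description above corresponds to a normal distribution that is both truncated to [R_lo, R_hi] and discretised to steps of Q. A minimal sketch, assuming each reported value x stands for the bin [x - q/2, x + q/2] and that the distribution is renormalised over all bins from r_lo to r_hi; this is not necessarily how marine_qc computes it, and `p_data_given_good_sketch` and `normal_cdf` are hypothetical names:

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_data_given_good_sketch(x, q, r_hi, r_lo, mu, sigma):
    """Hypothetical sketch: probability of a quantized, truncated normal value."""
    # Mirror the ValueError conditions listed in the documentation.
    if q <= 0 or sigma <= 0 or r_lo > r_hi or x < r_lo or x > r_hi:
        raise ValueError("inputs incorrectly specified")

    def cdf(v):
        # CDF of N(mu, sigma**2) evaluated at v.
        return normal_cdf((v - mu) / sigma)

    # Probability mass of the quantization bin that x represents (assumption).
    bin_mass = cdf(x + 0.5 * q) - cdf(x - 0.5 * q)
    # Renormalise by the mass of all bins between r_lo and r_hi.
    total_mass = cdf(r_hi + 0.5 * q) - cdf(r_lo - 0.5 * q)
    return bin_mass / total_mass
```

With mu = 0, sigma = 1, q = 0.1 and wide limits, the value at x = 0 is close to the normal density at 0 times q (about 0.0399). Summing the result over every quantized value from r_lo to r_hi gives 1, which is a quick check on the normalisation.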
- marine_qc.statistics.p_data_given_gross(q, r_hi, r_lo)
Probability of an observed value assuming it is a gross error.
Calculate the probability of the data given a gross error, assuming that gross errors are uniformly distributed between R_lo and R_hi and that the quantization (rounding) level is Q.
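- Parameters:
  - q (float) – Quantization of x, i.e. x is an integer multiple of Q.
  - r_hi (float) – The upper limit on x imposed by previous QC choices.
  - r_lo (float) – The lower limit on x imposed by previous QC choices.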
- Return type: float
- Returns: Probability of the observed value given that it is a gross error.
- Raises: ValueError – When limits are not ascending or q <= 0.
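Under a uniform gross-error model on a quantized grid, every representable value between R_lo and R_hi is equally likely. A minimal sketch, assuming there are (r_hi - r_lo) / q + 1 such values (both limits included) and that equal limits are allowed; `p_data_given_gross_sketch` is a hypothetical name, not the library function:

```python
def p_data_given_gross_sketch(q: float, r_hi: float, r_lo: float) -> float:
    """Hypothetical sketch: uniform gross-error likelihood on a quantized grid."""
    # Mirror the documented ValueError conditions (equal limits accepted here).
    if q <= 0 or r_hi < r_lo:
        raise ValueError("limits must be ascending and q must be positive")
    # Number of distinct quantized values from r_lo to r_hi inclusive (assumption).
    n_values = (r_hi - r_lo) / q + 1.0
    # A gross error is equally likely to land on any one of them.
    return 1.0 / n_values
```

For example, with q = 0.1 and limits of 0 and 1 there are 11 possible reported values, so the sketch returns 1/11 (about 0.091).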
- marine_qc.statistics.p_gross(p0, q, r_hi, r_lo, x, mu, sigma)
Posterior probability that an observation is a gross error.
Calculate the posterior probability of a gross error given: the prior probability p0; the quantization level of the observed value, Q; previous limits on the observed value, R_hi and R_lo; the observed value, x; and the mean (mu) and standard deviation (sigma) of the distribution of good observations, which are assumed to be normally distributed. Gross errors are assumed to be uniformly distributed between R_lo and R_hi.
- Parameters:
  - p0 (float) – Prior probability of gross error.
  - q (float) – Quantization of x, i.e. x is an integer multiple of Q.
  - r_hi (float) – The upper limit on x imposed by previous QC choices.
  - r_lo (float) – The lower limit on x imposed by previous QC choices.
  - x (float) – Observed value for which probability is required.
  - mu (float) – The mean of the distribution of good obs.
  - sigma (float) – The standard deviation of the distribution of good obs.
- Return type: float
- Returns: Probability of gross error given an observed value.
- Raises: ValueError – When inputs are incorrectly specified: p0 < 0, p0 > 1, q <= 0, r_hi < r_lo, x < r_lo, x > r_hi, sigma <= 0.
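A posterior probability of this kind follows from Bayes' theorem: weight the gross-error likelihood by p0, weight the good-observation likelihood by 1 - p0, and take the gross share of the total. A minimal sketch, assuming the two likelihoods are the quantities described for p_data_given_gross and p_data_given_good and are passed in directly; `posterior_gross_sketch` and its argument names are hypothetical:

```python
def posterior_gross_sketch(p0: float, like_gross: float, like_good: float) -> float:
    """Hypothetical sketch: Bayes' theorem for the probability of a gross error."""
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must lie in [0, 1]")
    # Prior-weighted likelihood under the gross-error hypothesis.
    gross_term = p0 * like_gross
    # Total probability of the observation under both hypotheses.
    evidence = gross_term + (1.0 - p0) * like_good
    if evidence == 0.0:
        raise ValueError("observation has zero probability under both hypotheses")
    return gross_term / evidence
```

For example, a prior of 0.1 with a gross-error likelihood of 0.5 and a good-observation likelihood of 0.05 gives 0.05 / 0.095, roughly 0.53: an observation that fits the good distribution poorly can raise a small prior substantially.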
- marine_qc.statistics.trim_mean(inarr, trim)
Calculate a resistant (aka robust) mean of an input array given a trimming criterion.
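This page does not say how `trim` is interpreted, so this sketch uses a hypothetical `prop` argument instead: the fraction of sorted values discarded from each end before averaging (the same idea as the `proportiontocut` argument of `scipy.stats.trim_mean`). It illustrates a trimmed mean in general, not marine_qc's implementation; `trimmed_mean_sketch` is a hypothetical name:

```python
def trimmed_mean_sketch(values, prop):
    """Hypothetical sketch: mean after cutting a fraction `prop` from each end."""
    if not 0.0 <= prop < 0.5:
        raise ValueError("prop must be in [0, 0.5)")
    data = sorted(values)
    if not data:
        raise ValueError("input is empty")
    # Number of values removed from each end; prop < 0.5 keeps at least one.
    k = int(len(data) * prop)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)
```

For example, trimming 10% from each end of [1, 2, ..., 9, 1000] discards 1 and 1000 and returns 5.5, whereas the plain mean is 104.5.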
- marine_qc.statistics.trim_std(inarr, trim)
Calculate a resistant (aka robust) standard deviation of an input array given a trimming criterion.
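A trimmed standard deviation can be sketched the same way as a trimmed mean: sort the values, discard a fraction `prop` from each end, and take the standard deviation of what remains. Both `prop` and `trimmed_std_sketch` are hypothetical; how marine_qc interprets `trim`, and whether it divides by N or N - 1, are not stated here, so this sketch assumes the population form (N):

```python
import statistics


def trimmed_std_sketch(values, prop):
    """Hypothetical sketch: population standard deviation after trimming."""
    if not 0.0 <= prop < 0.5:
        raise ValueError("prop must be in [0, 0.5)")
    data = sorted(values)
    if not data:
        raise ValueError("input is empty")
    # Number of values removed from each end of the sorted data.
    k = int(len(data) * prop)
    kept = data[k:len(data) - k]
    # pstdev divides by N; marine_qc may use a different normalisation.
    return statistics.pstdev(kept)
```

On [1, 2, ..., 9, 1000] with prop = 0.1, the single outlier is removed and the result is sqrt(5.25), about 2.29, instead of a value near 300 for the untrimmed data.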
- marine_qc.statistics.winsorised_mean(inarr)
Compute the 25% winsorised mean of the input array.
The winsorised mean is a resistant way of calculating an average.
- Parameters:
  - inarr – The input array.
- Return type: float
- Returns: The winsorised mean of the input array with a 25% trimming.
- Raises: ValueError – If the length of inarr is equal to 0.
Notes
The winsorised mean is what you get if you set the first quarter of the sorted input array to the 1st quartile value and the last quarter to the 3rd quartile value, and then take the mean. This is quite a heavy trimming of the distribution. It makes the mean very resistant (about half the obs can be egregiously bad without strongly affecting it), but it will be less accurate than an ordinary mean when there are lots of observations or when the quality of the obs is higher.
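The procedure in the Notes can be sketched as follows. This is a minimal illustration, assuming the quartile values are taken as the sorted elements at positions n // 4 and n - 1 - n // 4 (marine_qc may define quartiles differently, which matters for short arrays); `winsorised_mean_sketch` is a hypothetical name:

```python
def winsorised_mean_sketch(values):
    """Hypothetical sketch of a 25% winsorised mean, as described in the Notes."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("input is empty")
    # Size of each quarter that gets clamped.
    k = n // 4
    # Index-based 1st and 3rd quartile values (assumption).
    low, high = data[k], data[n - 1 - k]
    # Raise the bottom quarter to `low` and lower the top quarter to `high`.
    clamped = [low] * k + data[k:n - k] + [high] * k
    return sum(clamped) / n
```

For [1, 2, 3, 4, 5, 6, 7, 100] the clamped array is [3, 3, 3, 4, 5, 6, 6, 6] and the result is 4.5. Replacing 100 with 1e6 leaves the result unchanged, which illustrates the resistance described above.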