marine_qc package ¶

Returns:

float – Azimuth.

marine_qc.astronomical_geometry.calculate_sun_parameters(time)[source]¶

Calculate both right ascension and declination of sun.

Parameters:: time (float) – Time value.
Return type:: tuple[float, float]
Returns:: tuple of float – A tuple of two floats representing right ascension and declination of sun.

marine_qc.astronomical_geometry.convert_degrees(deg)[source]¶

Convert an angle in degrees to the range 0 to 360.

Parameters:: deg (float) – Value in degrees.
Return type:: float
Returns:: float – Degree (from 0 to 360).
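The source does not say how the conversion is carried out. A minimal sketch, assuming the function simply wraps an angle into the range 0 to 360 with a modulo operation (the name `wrap_degrees` is hypothetical and not part of marine_qc):

```python
def wrap_degrees(deg: float) -> float:
    """Wrap an angle in degrees into the range 0 to 360 (assumed behaviour)."""
    # Python's % takes the sign of the divisor, so negative angles wrap upward.
    return deg % 360.0


print(wrap_degrees(370.0))  # 10.0
print(wrap_degrees(-90.0))  # 270.0
```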

marine_qc.astronomical_geometry.elliptic_angle(time)[source]¶

Get the angle between the plane of the ecliptic and the plane of the celestial equator.

Parameters:: time (float) – Time value.
Return type:: float
Returns:: float – Angle between the plane of the ecliptic and the plane of the celestial equator.

marine_qc.astronomical_geometry.mean_earth_anomaly(time, theta)[source]¶

Calculate mean anomaly of earth (g).

Parameters:

time (float) – Time value.
theta (float) – Position of the sun.

Return type:

float

Returns:

float – Mean anomaly of the earth (g).

marine_qc.astronomical_geometry.sin_of_elevation(phi, declination, hour_angle)[source]¶

Get the sine of the geometric elevation.

Parameters:

phi (float) – Latitude value in rad.
declination (float) – Declination.
hour_angle (float) – Hour angle.

Return type:

float

Returns:

float – Sine of the geometric elevation.
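For orientation, the standard spherical-astronomy relation between these three inputs can be sketched as follows. Whether marine_qc uses exactly this form is an assumption; the name `sin_elevation_sketch` is hypothetical and all angles are taken to be in radians:

```python
import math


def sin_elevation_sketch(phi: float, declination: float, hour_angle: float) -> float:
    """Sine of the elevation from latitude, declination and hour angle (radians)."""
    return (math.sin(phi) * math.sin(declination)
            + math.cos(phi) * math.cos(declination) * math.cos(hour_angle))


# Sun directly overhead: latitude equals declination and the hour angle is zero,
# so the result should be very close to 1.
print(sin_elevation_sketch(0.3, 0.3, 0.0))
```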

marine_qc.astronomical_geometry.sun_ascension(long_of_sun, sin_long_of_sun, angle_of_elliptic)[source]¶

Calculate right ascension.

Parameters:

long_of_sun (float) – Longitude of the sun.
sin_long_of_sun (float) – Sine of the longitude of the sun.
angle_of_elliptic (float) – Angle of the ecliptic.

Return type:

float

Returns:

float – Right ascension.

marine_qc.astronomical_geometry.sun_azimuth(phi, declination)[source]¶

Get azimuth.

Parameters:

phi (float) – Latitude value in rad.
declination (float) – Declination.

Return type:

float

Returns:

float – Azimuth.

marine_qc.astronomical_geometry.sun_declination(sin_long_of_sun, angle_of_elliptic)[source]¶

Calculate declination of sun.

Parameters:

sin_long_of_sun (float) – Sine of the longitude of the sun.
angle_of_elliptic (float) – Angle of the ecliptic.

Return type:

float

Returns:

float – Declination of sun.

marine_qc.astronomical_geometry.sun_hour_angle(local_siderial_time, right_ascension)[source]¶

Get hour angle.

Parameters:

local_siderial_time (float) – Local sidereal time value.
right_ascension (float) – Right ascension.

Return type:

float

Returns:

float – Hour angle.

marine_qc.astronomical_geometry.sun_longitude(time)[source]¶

Get longitude of sun.

Parameters:: time (float) – Time value.
Return type:: float
Returns:: float – Longitude of the sun.

marine_qc.astronomical_geometry.sun_position(time)[source]¶

Find position of sun in celestial sphere, assuming circular orbit (radians).

Parameters:: time (float) – Time value.
Return type:: float
Returns:: float – Position of the sun.

marine_qc.astronomical_geometry.sunangle(year, day, hour, minute, sec, zone, dasvtm, lat, lon)[source]¶

Calculate the local azimuth and elevation of the sun at a specified location and time.

Parameters:

year (int) – Year.
day (int) – Day number of year starting with 1 for Jan 1st and running up to 365/6.
hour (int) – Hour.
minute (int) – Minute.
sec (int) – Second.
zone (int) – The local international time zone, counted westward from Greenwich.
dasvtm (int) – 1 if daylight saving time is in effect, otherwise 0.
lat (float) – Latitude in degrees, north is positive.
lon (float) – Longitude in degrees, east is positive.

Return type:

tuple[float, float, float, float, float, float]

Returns:

tuple of float – A tuple of six floats representing the azimuth angle of the sun (degrees east of north), elevation of the sun (degrees), right ascension of the sun (degrees), hour angle of the sun (degrees), hour angle of local sidereal time (degrees) and declination of the sun (degrees).

Notes

Copied from Rob Hackett’s area 28 Apr 1998 by J. Arnott. Added protection for ASIN near +/- 90 degrees 07 Jan 2002 by J. Arnott. Pythonised 25/09/2015 by J.J. Kennedy.

The Python version gets within a fraction of a degree of the original Fortran code from which it was ported for a range of values. The differences are larger if single precision values are used, suggesting that this is not the most numerically robust scheme.
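The `day` argument is a day number within the year rather than a day of the month. A small sketch of how a caller might derive it from a calendar date with the standard library (the helper `day_of_year` is hypothetical and not part of marine_qc):

```python
from datetime import date


def day_of_year(year: int, month: int, day_of_month: int) -> int:
    """Day number of the year, with 1 for Jan 1st and 365 or 366 for Dec 31st."""
    return date(year, month, day_of_month).timetuple().tm_yday


print(day_of_year(2015, 9, 25))   # 268
print(day_of_year(2016, 12, 31))  # 366 (leap year)
```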

marine_qc.astronomical_geometry.to_local_siderial_time(time, time_in_hours, delyear, lon)[source]¶

Convert to local sidereal time.

Parameters:

time (float) – Time value.
time_in_hours (float) – Time value in hours.
delyear (int) – Relative year number.
lon (float) – Longitude value in degrees.

Return type:

float

Returns:

float – Local sidereal time.

marine_qc.astronomical_geometry.to_siderial_time(time, delyear)[source]¶

Convert to sidereal time.

Parameters:

time (float) – Time value.
delyear (int) – Relative year number.

Return type:

float

Returns:

float – Sidereal time.

marine_qc.auxiliary module¶

Auxiliary functions for QC.

marine_qc.auxiliary.convert_to(value, source_units, target_units)[source]¶

Convert a float or sequence from source units to target units.

Parameters:

value (SequenceNumberType) – A single float value, None, or a sequence (e.g., list, tuple, array-like) containing floats and/or None values. None values are passed through unchanged.
source_units (str) – The unit(s) of the input value(s), e.g., ‘degC’, ‘km/h’.
target_units (str) – The unit(s) to convert to, e.g., ‘K’, ‘m/s’. If set to “unknown”, the value(s) will be converted to the base SI units of the source_units, e.g., ‘degC’ to ‘kelvin’, ‘km/h’ to ‘meter/s’.

Return type:

SequenceNumberType

Returns:

SequenceNumberType – The converted value(s), preserving the input structure (scalar, list, tuple, array). None values remain unchanged.

Examples

>>> convert_to(100, "degC", "K")
373.15

>>> convert_to([0, 100], "degC", "K")
[273.15, 373.15]

>>> convert_to([None, 100], "degC", "K")
[None, 373.15]

>>> convert_to(5, "km", "unknown")  # Converts to base unit 'meter'
5000.0

marine_qc.auxiliary.convert_units(**units_by_name)[source]¶

Decorator to automatically convert specified function arguments to target units.

This decorator allows a function to accept inputs in various units and automatically converts them to desired target units before the function executes. It is especially useful for scientific or engineering functions where users may provide inputs in different unit systems.

Parameters:: **units_by_name (str) – Keyword arguments mapping function argument names to their target units. Special case: if a target unit is “unknown”, it will be converted to the base SI unit for the given source unit (e.g., “degC” → “K”, “km/h” → “m/s”).
Return type:: Callable[..., Any]
Returns:: Callable[..., Any] – A decorator that converts specified parameters to the target units prior to executing the decorated function.

Notes

The decorated function must be called with a units keyword argument, which can be:
- A dictionary mapping argument names to their source units, or
- A single string unit applied to all arguments.
Parameters not listed in units_by_name are not converted.
Parameters with None values are skipped.
If a target unit is “unknown”, the value is converted to the base SI unit.

Examples

>>> @convert_units(temperature="K")
... def func_single(temperature):
...     print(f"Temperature: {temperature:.2f} K")

>>> func_single(25.0, units={"temperature": "degC"})
Temperature: 298.15 K

>>> @convert_units(speed="m/s", altitude="m")
... def func_multiple(speed, altitude):
...     print(f"Speed: {speed:.1f} m/s, Altitude: {altitude:.0f} m")

>>> func_multiple(72.0, 0.5, units={"speed": "km/h", "altitude": "km"})
Speed: 20.0 m/s, Altitude: 500 m

>>> @convert_units(distance="unknown")
... def func_base(distance):
...     print(f"Distance in SI units: {distance} m")

>>> func_base(1.2, units={"distance": "km"})
Distance in SI units: 1200.0 m

marine_qc.auxiliary.ensure_arrays(**values)[source]¶

Ensure that all input values are NumPy arrays.

Parameters:: **values (Mapping[str, Any]) – Mapping of names to values expected to be NumPy arrays.
Return type:: tuple[ndarray[tuple[Any, ...], dtype[Any]], ...]
Returns:: tuple of np.ndarray – A tuple containing the NumPy arrays corresponding to the input values, in the same order as provided.
Raises:: TypeError – If any input value is not a NumPy array.

marine_qc.auxiliary.format_return_type(result_array, *input_values, dtype=<class 'int'>)[source]¶

Convert the result numpy array(s) to the same type as the input value.

If result_array is a sequence of arrays, format each element recursively, preserving the container type.

Parameters:

result_array (np.ndarray) – The numpy array of results.
*input_values (Any) – One or more original input values to infer the desired return type from.
dtype (type, optional) – Desired data type of the result. Default is int.

Return type:

Any

Returns:

Same type as input(s) – The result formatted to match the type of the first valid input value.

marine_qc.auxiliary.generic_decorator(pre_handler=None, post_handler=None)[source]¶

Create a decorator that binds function arguments and applies pre- and post-processing handlers.

This decorator factory allows you to inspect, modify, or validate function arguments before and after the original function is called. Reserved keyword arguments can be passed to the handlers via _decorator_kwargs and removed from the call to the original function.

Parameters:

pre_handler (Callable[[dict], None]) – A function that takes the bound arguments dictionary (bound_args.arguments) and optionally additional keyword arguments, to inspect or modify arguments before the decorated function executes. Signature: handler(arguments: dict, **meta_kwargs) -> None.
post_handler (Callable[[dict], None]) – A function that takes the bound arguments dictionary (bound_args.arguments) and optionally additional keyword arguments, to inspect or modify arguments after the decorated function executes. Signature: handler(arguments: dict, **meta_kwargs) -> None.

Return type:

Callable

Returns:

Callable – A decorator that wraps any function. When applied, the function’s arguments are bound and passed to the handlers before execution.

Notes

Handlers can define a _decorator_kwargs attribute (a set of reserved keyword argument names). These reserved kwargs will be extracted from the decorated function’s call kwargs, passed to the handler, and removed before calling the original function.
The original function is called with the possibly modified bound arguments after handler processing.
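The pattern described in these notes can be sketched with `inspect.signature`. This is a simplified illustration under stated assumptions, not the marine_qc implementation: `generic_decorator_sketch`, `reserved_for`, `scale_inputs` and the `factor` keyword are all hypothetical names.

```python
import functools
import inspect


def reserved_for(handler, kwargs):
    """Collect the reserved keyword arguments that a handler declares."""
    names = getattr(handler, "_decorator_kwargs", set())
    return {name: kwargs[name] for name in names if name in kwargs}


def generic_decorator_sketch(pre_handler=None, post_handler=None):
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            pre_meta = reserved_for(pre_handler, kwargs)
            post_meta = reserved_for(post_handler, kwargs)
            # Remove reserved kwargs so the wrapped function never sees them.
            for name in {**pre_meta, **post_meta}:
                kwargs.pop(name)
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            if pre_handler is not None:
                pre_handler(bound.arguments, **pre_meta)
            # Call with the (possibly modified) bound arguments.
            result = func(*bound.args, **bound.kwargs)
            if post_handler is not None:
                post_handler(bound.arguments, **post_meta)
            return result

        return wrapper

    return decorator


def scale_inputs(arguments, **meta_kwargs):
    """Example pre-handler: multiply every argument by a reserved 'factor'."""
    factor = meta_kwargs.get("factor", 1)
    for key in arguments:
        arguments[key] = arguments[key] * factor


scale_inputs._decorator_kwargs = {"factor"}


@generic_decorator_sketch(pre_handler=scale_inputs)
def add(a, b):
    return a + b


print(add(1, 2, factor=10))  # 30
```

Here `factor` is consumed by the handler and stripped from the call, so `add` itself only ever receives `a` and `b`.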

marine_qc.auxiliary.inspect_arrays(params, sortby=None)[source]¶

Decorator to convert and validate specified function input parameters as 1D NumPy arrays.

This decorator ensures that specified input arguments are sequence-like, converts them to 1D NumPy arrays, validates that they are one-dimensional, and checks that all arrays have the same length. Optionally, the arrays can be sorted by another parameter and later restored to the original order.

Parameters:

params (list of str) – Names of parameters to inspect in the decorated function. Each specified parameter will be converted to a 1D NumPy array and validated.
sortby (str, optional) – Name of a parameter to sort the arrays by, if desired. The result will be returned in the original order of this parameter.

Return type:

Callable[..., Any]

Returns:

Callable[..., Any] – A decorator that, when applied, converts the specified parameters to 1D NumPy arrays, validates them, optionally sorts them, and passes them to the decorated function.

Raises:

ValueError – If a specified parameter is missing from the function arguments. If any specified parameter is not one-dimensional. If the lengths of the specified arrays do not all match.

Notes

If sortby is specified, the result of the function is reordered to match the original order of sortby after the function executes.

Examples

>>> @inspect_arrays(["a", "b"])
... def add_arrays(a, b):
...     return a + b

>>> add_arrays([1, 2, 3], [4, 5, 6])
array([5, 7, 9])

>>> add_arrays([1, 2], [3, 4, 5])
Traceback (most recent call last):
    ...
ValueError: Input ['a', 'b'] must all have the same length.

marine_qc.auxiliary.is_scalar_like(x)[source]¶

Return True if the input is scalar-like.

A value is considered scalar-like if it is one of the following:

Built-in Python scalars: int, float, bool, None
Strings and bytes
NumPy scalars (subclasses of np.generic), e.g. np.int32, np.float64, np.datetime64
Zero-dimensional NumPy arrays (e.g. np.array(5))
Pandas scalar types:
- pd.Timestamp
- pd.Timedelta
- pd.NA
- pd.NaT
Python datetime types:
- datetime.date
- datetime.datetime
- datetime.time

Container types such as lists, tuples, sets, dicts, pandas Series, pandas DataFrame, and NumPy arrays with one or more dimensions are not considered scalar-like.

Parameters:: x (Any) – The value to check.
Return type:: bool
Returns:: bool – True if x is scalar-like, False otherwise.

marine_qc.auxiliary.isvalid(inval)[source]¶

Check whether the input value(s) are numerically valid (neither None nor NaN).

Parameters:: inval (ValueNumberType) – Input value(s) to be tested.
Return type:: bool | ndarray[tuple[Any, ...], dtype[bool]]
Returns:: bool or np.ndarray of bool – Returns False where the input is None or NaN, True otherwise. Returns a boolean scalar if input is scalar, else a boolean array.
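A minimal scalar-only sketch of the rule stated above, using only the standard library. The name `isvalid_scalar_sketch` is hypothetical, and the array handling of the real function is not reproduced here:

```python
import math


def isvalid_scalar_sketch(value) -> bool:
    """Return False for None or NaN, True for any other numeric value."""
    if value is None:
        return False
    return not math.isnan(value)


print([isvalid_scalar_sketch(v) for v in (1.5, None, float("nan"))])
# [True, False, False]
```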

marine_qc.auxiliary.post_format_return_type(params, dtype=<class 'int'>, multiple=False)[source]¶

Decorator to format a function’s return value to match the type of its original input(s).

This decorator ensures that the output of the decorated function is converted back to the same structure/type as the original input(s) specified by params. It uses a context object (_ctx) if available to retrieve the original inputs before any preprocessing was applied. If no context is found, it falls back to the current bound arguments.

Parameters:

params (list of str) – List of parameter names whose original input types should be used to format the return value.
dtype (type, optional) – Desired data type of the result. Default is int.
multiple (bool, optional) – If True, assumes the function returns a sequence of results (e.g., a tuple), and applies format_return_type to each element individually. If False (default), applies format_return_type once on the entire result.

Return type:

Callable[..., Any]

Returns:

Callable[..., Any] – A decorator that modifies the decorated function’s output to match the input types.

Notes

Assumes a TypeContext object may be passed via _ctx keyword argument, storing original input values for accurate type formatting.
Falls back gracefully if no context is available, using current arguments.
Useful when function inputs are preprocessed (e.g., converted to arrays), and the output should match the original input types.

marine_qc.buoy_tracking_qc module¶

Buoy tracking QC module.

Module containing QC functions for sequential reports from a single drifting buoy.

class marine_qc.buoy_tracking_qc.AgroundChecker(lons, lats, dates, smooth_win, min_win_period, max_win_period)[source]¶

Bases: object

Class used to carry out do_aground_check().

Check to see whether a drifter has run aground based on 1/100th degree precision positions. A flag is set for each input report: flag=1 for reports deemed aground, else flag=0.

Positional errors introduced by lon/lat ‘jitter’ and data precision can be of the order of several km. Longitude and latitude timeseries are smoothed prior to assessment to reduce position ‘jitter’. Some post-smoothing position ‘jitter’ may remain, and its expected magnitude is set within the function by the ‘tolerance’ parameter. A drifter is deemed aground when, after a period of time, the distance between reports is less than the ‘tolerance’. The minimum period of time over which this assessment is made is set by ‘min_win_period’. This period must be long enough that slow-moving drifters are not falsely flagged as aground given errors in position (e.g. a buoy drifting at around 1 cm/s will travel around 1 km/day; given ‘tolerance’ and precision errors of a few km, ‘min_win_period’ needs to be several days to ensure that the distance travelled exceeds the error, so that motion is reliably detected and the buoy is not falsely flagged as aground). However, ‘min_win_period’ should not be longer than necessary, as buoys that run aground for less than ‘min_win_period’ will not be detected.

Because temporal sampling can be erratic, the time period over which an assessment is made is specified as a range (bounded by ‘min_win_period’ and ‘max_win_period’); the assessment uses the longest time separation available within this range. If a drifter is deemed aground and subsequently starts moving (e.g. if a drifter has moved very slowly for a prolonged period), incorrectly flagged reports will be reinstated.

smooth_win: length of window (odd number) in datapoints used for smoothing lon/lat
min_win_period: minimum period of time in days over which position is assessed for no movement (see description)
max_win_period: maximum period of time in days over which position is assessed for no movement (this should be greater than min_win_period and allow for erratic temporal sampling e.g. min_win_period+2 to allow for gaps of up to 2-days in sampling).

Parameters:

lons (SequenceNumberType) – 1-dimensional longitude array in degrees.
lats (SequenceNumberType) – 1-dimensional latitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
smooth_win (int) – Length of window (odd number) in datapoints used for smoothing lon/lat.
min_win_period (int) – Minimum period of time in days over which position is assessed for no movement (see description).
max_win_period (int or None) – Maximum period of time in days over which position is assessed for no movement (this should be greater than min_win_period and allow for erratic temporal sampling e.g. min_win_period+2 to allow for gaps of up to 2-days in sampling).
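The core comparison in the description, distance between two reports against a tolerance, can be sketched as below. This is an illustration only: it skips the smoothing and the window selection, the helper names are hypothetical, and the tolerance is assumed to be expressed in km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed value for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def looks_aground(lat1, lon1, lat2, lon2, tolerance_km):
    """True if two reports are closer together than the tolerance."""
    return haversine_km(lat1, lon1, lat2, lon2) < tolerance_km


# A buoy drifting at 1 cm/s covers a little under 1 km per day.
drift_km_per_day = 0.01 * 86400 / 1000
print(drift_km_per_day)

print(looks_aground(50.0, -10.0, 50.0, -10.0, 1.57))  # True: no movement
print(looks_aground(50.0, -10.0, 50.1, -10.0, 1.57))  # False: about 11 km apart
```

The drift figure shows why the window has to span several days: one day of slow drift is smaller than the positional error, so the two cases cannot be told apart over a short period.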

do_aground_check()[source]¶

Perform the actual aground check.

Return type:: None

get_qc_outcomes()[source]¶

Return the QC outcomes.

Return type:: ndarray
Returns:: array-like of int, shape (n,) – 1-dimensional array containing QC flags.

hrs_smooth: ndarray¶

lat_smooth: ndarray¶

lon_smooth: ndarray¶

smooth_arrays()[source]¶

Preprocess the latitude, longitude and time arrays.

Return type:: None

tolerance = 1.5725359013624185¶

valid_arrays()[source]¶

Check the input arrays are valid. Raises a warning and returns False if not valid.

Return type:: bool
Returns:: bool – True if array is valid, otherwise False.

valid_parameters()[source]¶

Check the parameters are valid. Raises a warning and returns False if not valid.

Return type:: bool
Returns:: bool – True if parameter is valid, otherwise False.

class marine_qc.buoy_tracking_qc.NewSpeedChecker(lons, lats, dates, speed_limit, min_win_period, ship_speed_limit, delta_d, delta_t, n_neighbours)[source]¶

Bases: object

Class used to carry out do_new_speed_check().

Check to see whether a drifter has been picked up by a ship (out of water) based on 1/100th degree precision positions. A flag is set for each input report: flag=1 for reports deemed picked up, else flag=0.

A drifter is deemed picked up if it is moving faster than might be expected for a fast ocean current (a few m/s). Unreasonably fast movement is detected when the speed of travel between report-pairs exceeds the chosen ‘speed_limit’ (speed is estimated as the distance between reports divided by their time separation; this ‘straight line’ speed between the two points is a minimum speed estimate, given that a less direct path may have been followed). Positional errors introduced by lon/lat ‘jitter’ and data precision can be of the order of several km. Reports must be separated by a suitably long period of time (the ‘min_win_period’) to minimise the effect of these errors when calculating speed, e.g. for reports separated by 9 hours, errors of order 10 cm/s would result, which are a few percent of fast ocean current speed. Conversely, the period of time chosen should not be too long, so as to resolve short-lived bursts of speed on manoeuvring ships. Larger positional errors may also trigger the check.

For each report, speed is assessed over the shortest available period that exceeds ‘min_win_period’.

Prior to assessment the drifter record is screened for positional errors using the iQuam track check method (from ex.Voyage). When running the iQuam check the record is treated as a ship (not a drifter) so as to avoid accidentally filtering out observations made aboard a ship (which is what we are trying to detect). This iQuam track check does not overwrite any existing iQuam track check flags.

IMPORTANT - for optimal performance, drifter records with observations failing this check should be subsequently manually reviewed. Ships move around in all sorts of complicated ways that can readily confuse such a simple check (e.g. pausing at sea, crisscrossing its own path) and once some erroneous movement is detected it is likely a human operator can then better pick out the actual bad data. False fails caused by positional errors (particularly in fast ocean currents) will also need reinstating.

The class has the following class attributes which can be modified using the set_parameters method.

iquam_parameters: Parameter dictionary for Voyage.iquam_track_check() function.
speed_limit: maximum allowable speed for an in situ drifting buoy (metres per second)
min_win_period: minimum period of time in days over which position is assessed for speed estimates (see description)

Parameters:

lons (SequenceNumberType) – 1-dimensional longitude array in degrees.
lats (SequenceNumberType) – 1-dimensional latitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
speed_limit (float) – Maximum allowable speed for an in situ drifting buoy (metres per second).
min_win_period (float) – Minimum period of time in days over which position is assessed for speed estimates (see description).
ship_speed_limit (float) – Ship speed limit for the IQUAM track check.
delta_d (float) – The smallest increment in distance that can be resolved. For 0.01 degrees of lat-lon this is 1.11 km. Used in the IQUAM track check.
delta_t (float) – The smallest increment in time that can be resolved. For hourly data expressed as a float this is 0.01 hours. Used in the IQUAM track check.
n_neighbours (int) – Number of neighbours considered in the IQUAM track check.
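The pairwise speed test described for this class can be sketched as follows. It is a simplified illustration, not the library's algorithm: the iQuam screening is omitted, the window is given in hours rather than days, flagging both reports of a fast pair is an assumption, and all function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an assumed value for this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_fast_pairs(lats, lons, hours, speed_limit, min_win_hours):
    """Flag report pairs whose straight-line speed (m/s) exceeds speed_limit."""
    n = len(hours)
    flags = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            dt_hours = hours[j] - hours[i]
            # Use the shortest available separation that reaches the window.
            if dt_hours >= min_win_hours and dt_hours > 0:
                distance = haversine_m(lats[i], lons[i], lats[j], lons[j])
                speed = distance / (dt_hours * 3600.0)
                if speed > speed_limit:
                    flags[i] = flags[j] = 1
                break
    return flags


lats = [0.0, 0.0, 0.0, 0.0]
lons = [0.0, 0.1, 0.2, 3.0]  # the last report jumps about 311 km in 6 hours
hours = [0.0, 6.0, 12.0, 18.0]
print(flag_fast_pairs(lats, lons, hours, speed_limit=2.5, min_win_hours=6.0))
# [0, 0, 1, 1]
```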

do_new_speed_check()[source]¶

Perform the actual new speed check.

Return type:: None

get_qc_outcomes()[source]¶

Retrieve the QC outcomes for all observations.

Return type:: ndarray
Returns:: np.ndarray – Array of QC flags for each observation (0 = valid, 1 = flagged, untested otherwise).

iquam_track_ship: ndarray¶

perform_iquam_track_check()[source]¶

Perform iQuam track check as if reports are from a ship.

A deep copy of reps is made so that metadata can be safely modified ahead of the iQuam check. The result is an array of QC flags (iquam_track_ship).

Return type:: None

valid_arrays()[source]¶

Validate the input arrays (longitude, latitude, and time differences).

Checks for:
- NaN values in longitude, latitude, or time arrays.
- Monotonicity of the time array (self.hrs).

Warnings are raised for any issues detected.

Return type:: bool
Returns:: bool – True if all arrays are valid, False otherwise.

valid_parameters()[source]¶

Validate the QC parameters to ensure they are sensible.

Checks that:
- speed_limit is non-negative.
- min_win_period is non-negative.

Warnings are raised for any invalid parameters.

Return type:: bool
Returns:: bool – True if all parameters are valid, False otherwise.

class marine_qc.buoy_tracking_qc.SSTBiasedNoisyChecker(lat, lon, dates, sst, ostia, bgvar, ice, n_eval, bias_lim, drif_intra, drif_inter, err_std_n, n_bad, background_err_lim)[source]¶

Bases: object

Class used to perform the do_sst_biased_check(), do_sst_noisy_check(), and do_sst_biased_noisy_short_check().

Check to see whether a drifter sea surface temperature record is unacceptably biased or noisy as a whole.

The check makes an assessment of the quality of data in a drifting buoy record by comparing to a background reference field. If the record is found to be unacceptably biased or noisy relative to the background all observations are flagged by the check. For longer records the flags ‘drf_bias’ and ‘drf_noise’ are set for each input report: flag=1 for records with erroneous data, else flag=0. For shorter records ‘drf_short’ is set for each input report: flag=1 for reports with erroneous data, else flag=0.

When making the comparison an allowance is made for background error variance and also normal drifter error (both bias and random measurement error). A background error variance limit is also specified, beyond which the background is deemed unreliable and is excluded from comparison. Observations made during the day, in icy regions or where the background value is missing are also excluded from the comparison.

The check has two separate streams; a ‘long-record check’ and a ‘short-record check’. Records with at least n_eval observations are passed to the long-record check, else they are passed to the short-record check. The long-record check looks for records that are too biased or noisy as a whole. The short record check looks for individual observations exceeding a noise limit within a record. The purpose of n_eval is to ensure records with too few observations for their bias and noise to be reliably estimated are handled separately by the short-record check.

The correlation of the background error is treated as unknown and handled differently for each assessment. For the long-record noise-check and the short-record check the background error is treated as uncorrelated, which maximises the possible impact of background error on these assessments. For the long-record bias-check a limit (bias_lim) is specified beyond which the record is considered biased. The default value for this limit was chosen based on histograms of drifter-background bias. An alternative approach would be to treat the background error as entirely correlated across a long-record, which maximises its possible impact on the bias assessment. In this case the histogram approach was used as the limit could be tuned to give better results.

Parameters:

lat (SequenceNumberType) – 1-dimensional latitude array in degrees.
lon (SequenceNumberType) – 1-dimensional longitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
sst (SequenceNumberType) – 1-dimensional sea surface temperature array in K.
ostia (SequenceNumberType) – 1-dimensional background field sea surface temperature array in K.
bgvar (SequenceNumberType) – 1-dimensional background variance array in K^2.
ice (SequenceNumberType) – 1-dimensional ice concentration array in the range 0.0 to 1.0.
n_eval (int) – The minimum number of drifter observations required to be assessed by the long-record check.
bias_lim (float) – Maximum allowable drifter-background bias, beyond which a record is considered biased (degC or K).
drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).
drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).
err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which short-record data are deemed suspicious.
n_bad (int) – Minimum number of suspicious data points required for failure of short-record check.
background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).
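The routing between the two streams and the role of `n_eval`, `bias_lim`, `err_std_n` and `n_bad` can be sketched roughly as below. This is an assumption-laden illustration, not the marine_qc algorithm: the error terms are combined in quadrature as a guess, the long-record noise check and the day/ice/unreliable-background exclusions are omitted, and `sketch_record_check` is a hypothetical name.

```python
import math
from statistics import mean


def sketch_record_check(sst, background, bgvar, n_eval, bias_lim,
                        drif_intra, drif_inter, err_std_n, n_bad):
    """Return (stream, failed) for a drifter record compared with a background."""
    anomalies = [obs - bg for obs, bg in zip(sst, background)]
    if len(anomalies) >= n_eval:
        # Long-record stream: is the record biased as a whole?
        return "long", abs(mean(anomalies)) > bias_lim
    # Short-record stream: count individual observations beyond the noise limit.
    suspicious = 0
    for anom, var in zip(anomalies, bgvar):
        # Assumed combination of drifter and background error (quadrature sum).
        limit = err_std_n * math.sqrt(drif_intra ** 2 + drif_inter ** 2 + var)
        if abs(anom) > limit:
            suspicious += 1
    return "short", suspicious >= n_bad


sst = [290.0, 290.2, 295.0]
background = [290.1, 290.1, 290.1]
bgvar = [0.09, 0.09, 0.09]
params = dict(bias_lim=1.1, drif_intra=1.0, drif_inter=0.29, err_std_n=3.0)

print(sketch_record_check(sst, background, bgvar, n_eval=5, n_bad=2, **params))
# ('short', False): only one observation exceeds the limit
print(sketch_record_check(sst, background, bgvar, n_eval=3, n_bad=2, **params))
# ('long', True): the mean anomaly of about 1.63 exceeds bias_lim
```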

bgerr: ndarray¶

bgvar_is_masked: bool¶

do_sst_biased_noisy_check()[source]¶

Perform the bias/noise check QC.

Return type:: None

get_qc_outcomes_bias()[source]¶

Return the QC outcomes for the bias check.

Return type:: ndarray
Returns:: array-like of int, shape (n,) – 1-dimensional array containing QC flags.

get_qc_outcomes_noise()[source]¶

Return the QC outcomes for the noisy check.

Return type:: ndarray
Returns:: array-like of int, shape (n,) – 1-dimensional array containing QC flags.

get_qc_outcomes_short()[source]¶

Return the QC outcomes for the short check.

Return type:: ndarray
Returns:: array-like of int, shape (n,) – 1-dimensional array containing QC flags.

set_all_qc_outcomes_to(input_state)[source]¶

Set all the QC outcomes to the specified input_state.

Parameters:: input_state (int) – QC flag to map to the QC outcomes.
Return type:: None

sst_anom: ndarray¶

valid_parameters()[source]¶

Check the parameters are valid. Raises a warning and returns False if not valid.

Return type:: bool
Returns:: bool – True if parameter is valid, otherwise False.

class marine_qc.buoy_tracking_qc.SSTTailChecker(lat, lon, sst, ostia, ice, bgvar, dates, long_win_len, long_err_std_n, short_win_len, short_err_std_n, short_win_n_bad, drif_inter, drif_intra, background_err_lim)[source]¶

Bases: object

Class used to carry out do_sst_start_tail_check() and do_sst_end_tail_check().

Check to see whether there is erroneous sea surface temperature data at the beginning or end of a drifter record (referred to as ‘tails’). Flags are set for each input report: flag=1 for reports with erroneous data, else flag=0, ‘drf_tail1’ is used for bad data at the beginning of a record, ‘drf_tail2’ is used for bad data at the end of a record.

The tail check makes an assessment of the quality of data at the start and end of a drifting buoy record by comparing to a background reference field. Data found to be unacceptably biased or noisy relative to the background are flagged by the check. When making the comparison an allowance is made for background error variance and also normal drifter error (both bias and random measurement error). The correlation of the background error is treated as unknown and takes on a value which maximises background error dependent on the assessment being made. A background error variance limit is also specified, beyond which the background is deemed unreliable. Observations made during the day, in icy regions or where the background value is missing are excluded from the comparison.

The check proceeds in two steps; a ‘long tail-check’ followed by a ‘short tail-check’. The idea is that the short tail-check has finer resolution but lower sensitivity than the long tail-check and may pick off noisy data not picked up by the long tail check. Only observations that pass the long tail-check are passed to the short tail-check. Both of these tail checks proceed by moving a window over the data and assessing the data in each window. Once good data are found the check stops and any bad data preceding this are flagged. If unreliable background data are encountered the check stops. The checks are run forwards and backwards over the record so as to assess data at the start and end of the record. If the whole record fails no observations are flagged as there are then no ‘tails’ in the data (this is left for other checks). The long tail check looks for groups of observations that are too biased or noisy as a whole. The short tail check looks for individual observations exceeding a noise limit within the window.

Parameters:

lat (SequenceNumberType) – 1-dimensional latitude array in degrees.
lon (SequenceNumberType) – 1-dimensional longitude array in degrees.
sst (SequenceNumberType) – 1-dimensional array of sea surface temperatures in K.
ostia (SequenceNumberType) – 1-dimensional array of background field sea surface temperatures in K.
ice (SequenceNumberType) – 1-dimensional array of ice concentrations in the range 0.0 to 1.0.
bgvar (SequenceNumberType) – 1-dimensional array of background sea surface temperature field variances in K^2.
dates (SequenceDatetimeType) – 1-dimensional date array.
long_win_len (int) – Length of window (in data-points) over which to make long tail-check (must be an odd number).
long_err_std_n (float) – Number of standard deviations of combined background and drifter bias error, beyond which data fail bias check.
short_win_len (int) – Length of window (in data-points) over which to make the short tail-check.
short_err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which data are deemed suspicious.
short_win_n_bad (int) – Minimum number of suspicious data points required for failure of short check window.
drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).
drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).
background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).
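
The window logic described above can be illustrated with a deliberately simplified sketch. It is not the package's implementation: the function name flag_start_tail, the one-observation window step and the rule "a window fails when at least n_bad anomalies exceed a fixed limit" are all assumptions made for illustration, and the sketch ignores background error variance, drifter error terms and the day/ice exclusions.

```python
def flag_start_tail(anoms, limit, win_len, n_bad):
    """Return 0/1 flags; 1 marks observations in a failed leading tail.

    Hypothetical helper: `anoms` are observation-minus-background values,
    `limit` is a fixed noise limit in the same units.
    """
    n = len(anoms)
    flags = [0] * n
    first_good = None
    # Move a window over the record, starting at the first observation.
    for start in range(0, n - win_len + 1):
        window = anoms[start:start + win_len]
        n_suspicious = sum(1 for a in window if abs(a) > limit)
        if n_suspicious < n_bad:
            first_good = start  # good data found: stop here
            break
    if first_good is None:
        # The whole record failed, so there is no 'tail' to flag.
        return flags
    # Flag everything that precedes the first acceptable window.
    for i in range(first_good):
        flags[i] = 1
    return flags


anoms = [5.0, 4.0, 6.0, 0.1, 0.2, -0.1, 0.0, 0.3]
start_flags = flag_start_tail(anoms, limit=1.0, win_len=3, n_bad=2)
# The end of the record can be assessed by running over the reversed record.
end_flags = flag_start_tail(anoms[::-1], limit=1.0, win_len=3, n_bad=2)[::-1]
```

In this sketch the first two observations are flagged at the start of the record and nothing is flagged at the end, because the first window seen from the end is already acceptable.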

bgerr: ndarray¶

do_sst_tail_check(start_tail)[source]¶

Perform the actual SST tail check.

Parameters:: start_tail (bool) – If True flag the start of the record as failed, otherwise flag the end of the record as failed.
Return type:: None

end_tail_ind: int¶

get_qc_outcomes()[source]¶

Return the QC outcomes.

Return type:: ndarray
Returns:: array-like of int, shape (n,) – 1-dimensional array containing QC flags.

reps_ind: ndarray¶

sst_anom: ndarray¶

start_tail_ind: int¶

valid_parameters()[source]¶

Check the parameters are valid. Raises a warning and returns False if not valid.

Return type:: bool
Returns:: bool – True if the parameters are valid, otherwise False.

class marine_qc.buoy_tracking_qc.SpeedChecker(lons, lats, dates, speed_limit, min_win_period, max_win_period)[source]¶

Bases: object

Class used to carry out do_speed_check().

The check identifies whether a drifter has been picked up by a ship (out of water) based on 1/100th degree precision positions. A flag is set for each input report: flag=1 for reports deemed picked up, else flag=0.

A drifter is deemed picked up if it is moving faster than might be expected for a fast ocean current (a few m/s). Unreasonably fast movement is detected when the speed of travel between report-pairs exceeds the chosen ‘speed_limit’. Speed is estimated as the distance between reports divided by their time separation; this ‘straight line’ speed between the two points is a minimum speed estimate, given that a less direct path may have been followed. Positional errors introduced by lon/lat ‘jitter’ and data precision can be of the order of several km. Reports must be separated by a suitably long period of time (the ‘min_win_period’) to minimise the effect of these errors when calculating speed: for example, for reports separated by 24 hours, errors of several cm/s would result, which is two orders of magnitude less than a fast ocean current and seems reasonable. Conversely, the period of time chosen should not be so long that short-lived bursts of speed on manoeuvring ships cannot be resolved. Larger positional errors may also trigger the check. Because temporal sampling can be erratic, the time period over which this assessment is made is specified as a range (bounded by ‘min_win_period’ and ‘max_win_period’); the assessment uses the longest time separation available within this range.

Parameters:

lons (SequenceNumberType) – 1-dimensional longitude array in degrees.
lats (SequenceNumberType) – 1-dimensional latitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
speed_limit (float) – Maximum allowable speed for an in situ drifting buoy (metres per second).
min_win_period (float) – Minimum period of time in days over which position is assessed for speed estimates (see description).
max_win_period (float) – Maximum period of time in days over which position is assessed for speed estimates (this should be greater than min_win_period and allow for some erratic temporal sampling e.g. min_win_period + 0.2 to allow for gaps of up to 0.2 days in sampling).
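
As a rough illustration of the ‘straight line’ speed estimate described above, the sketch below computes a great-circle distance between two reports and divides it by their time separation. The haversine formula, the Earth radius constant and the function name straight_line_speed are assumptions for this example; the package may compute distance differently.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in metres


def straight_line_speed(lon1, lat1, t1, lon2, lat2, t2):
    """Speed in m/s between two reports (hypothetical helper)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine great-circle distance between the two positions.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    distance_m = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    seconds = (t2 - t1).total_seconds()
    return distance_m / seconds


# One degree of latitude covered in 24 hours.
speed = straight_line_speed(
    0.0, 10.0, datetime(2020, 1, 1), 0.0, 11.0, datetime(2020, 1, 2)
)
exceeds_limit = speed > 2.5  # 2.5 m/s is used here only as an example limit

# A positional error of 5 km over the same 24 hours is a few cm/s.
error_speed = 5000.0 / 86400.0
```

Here the drifter moves at roughly 1.3 m/s, below the example limit, while a 5 km positional error contributes only about 0.06 m/s over the same period.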

do_speed_check()[source]¶

Perform the actual speed check.

Return type:: None

get_qc_outcomes()[source]¶

Retrieve the QC outcomes for all observations.

Return type:: ndarray
Returns:: np.ndarray – Array of QC flags for each observation (0 = passed, 1 = failed, untested otherwise).

valid_arrays()[source]¶

Validate the input observation arrays (longitude, latitude, and time differences).

Checks for:

- NaN values in longitude, latitude, or time arrays.
- Monotonicity of the time array (self.hrs).

Warnings are raised for any issues detected.

Return type:: bool
Returns:: bool – True if all arrays are valid, False otherwise.

valid_parameters()[source]¶

Validate the QC parameters to ensure they are sensible.

Checks that:

- speed_limit is non-negative.
- min_win_period is non-negative.
- max_win_period is greater than or equal to min_win_period.

Warnings are raised for any invalid parameters.

Return type:: bool
Returns:: bool – True if all parameters are valid, False otherwise.

marine_qc.buoy_tracking_qc.do_aground_check(lons, lats, dates, smooth_win, min_win_period, max_win_period)[source]¶

Perform the aground check.

Parameters:

lons (SequenceNumberType) – 1-dimensional longitude array in degrees.
lats (SequenceNumberType) – 1-dimensional latitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
smooth_win (int) – Length of window (odd number) in datapoints used for smoothing lon/lat.
min_win_period (int) – Minimum period of time in days over which position is assessed for no movement (see description).
max_win_period (int or None) – Maximum period of time in days over which position is assessed for no movement (this should be greater than min_win_period and allow for erratic temporal sampling e.g. min_win_period+2 to allow for gaps of up to 2-days in sampling).

Return type:

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if aground check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

smooth_win = 41
min_win_period = 8
max_win_period = 10

marine_qc.buoy_tracking_qc.do_new_aground_check(lons, lats, dates, smooth_win, min_win_period)[source]¶

Perform the new aground check.

Parameters:

lons (SequenceNumberType) – 1-dimensional longitude array in degrees.
lats (SequenceNumberType) – 1-dimensional latitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
smooth_win (int) – Length of window (odd number) in datapoints used for smoothing lon/lat.
min_win_period (int) – Minimum period of time in days over which position is assessed for no movement (see description).

Return type:

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if new aground check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

smooth_win = 41
min_win_period = 8

marine_qc.buoy_tracking_qc.do_new_speed_check(lons, lats, dates, speed_limit, min_win_period, ship_speed_limit, delta_d, delta_t, n_neighbours)[source]¶

Perform the new speed check.

Parameters:

lons (SequenceNumberType) – 1-dimensional longitude array in degrees.
lats (SequenceNumberType) – 1-dimensional latitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
speed_limit (float) – Maximum allowable speed for an in situ drifting buoy (metres per second).
min_win_period (float) – Minimum period of time in days over which position is assessed for speed estimates (see description).
ship_speed_limit (float) – Ship speed limit for the IQUAM track check.
delta_d (float) – The smallest increment in distance that can be resolved. For 0.01 degrees of lat-lon this is 1.11 km. Used in the IQUAM track check.
delta_t (float) – The smallest increment in time that can be resolved. For hourly data expressed as a float this is 0.01 hours. Used in the IQUAM track check.
n_neighbours (int) – Number of neighbours considered in the IQUAM track check.

Return type:

Returns:

array-like of int, shape (n,) – Array containing the QC outcomes for the new speed check.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

speed_limit = 3.0
min_win_period = 0.375

And, for the IQUAM-specific parameters:

ship_speed_limit = 60.0
delta_d = 1.11
delta_t = 0.01
n_neighbours = 5

marine_qc.buoy_tracking_qc.do_speed_check(lons, lats, dates, speed_limit, min_win_period, max_win_period)[source]¶

Perform the Track QC speed check.

Parameters:

lons (SequenceNumberType) – 1-dimensional longitude array in degrees.
lats (SequenceNumberType) – 1-dimensional latitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
speed_limit (float) – Maximum allowable speed for an in situ drifting buoy (metres per second).
min_win_period (float) – Minimum period of time in days over which position is assessed for speed estimates (see description).
max_win_period (float) – Maximum period of time in days over which position is assessed for speed estimates (this should be greater than min_win_period and allow for some erratic temporal sampling e.g. min_win_period + 0.2 to allow for gaps of up to 0.2 days in sampling).

Return type:

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if speed check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

speed_limit = 2.5
min_win_period = 0.8
max_win_period = 1.8

marine_qc.buoy_tracking_qc.do_sst_biased_check(lons, lats, dates, sst, ostia, ice, bgvar, n_eval, bias_lim, drif_intra, drif_inter, err_std_n, n_bad, background_err_lim)[source]¶

Perform the SST bias check.

Parameters:

lons (SequenceNumberType) – 1-dimensional longitude array in degrees.
lats (SequenceNumberType) – 1-dimensional latitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
sst (SequenceNumberType) – 1-dimensional array of sea surface temperatures in K.
ostia (SequenceNumberType) – 1-dimensional array of background field sea surface temperatures in K.
ice (SequenceNumberType) – 1-dimensional array of ice concentrations in the range 0.0 to 1.0.
bgvar (SequenceNumberType) – 1-dimensional array of background sea surface temperature fields variances in K^2.
n_eval (int) – The minimum number of drifter observations required to be assessed by the long-record check.
bias_lim (float) – Maximum allowable drifter-background bias, beyond which a record is considered biased (degC or K).
drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).
drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).
err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which short-record data are deemed suspicious.
n_bad (int) – Minimum number of suspicious data points required for failure of short-record check.
background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).

Return type:

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if SST bias check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

n_eval = 30
bias_lim = 1.10
drif_intra = 1.0
drif_inter = 0.29
err_std_n = 3.0
n_bad = 2
background_err_lim = 0.3

marine_qc.buoy_tracking_qc.do_sst_biased_noisy_short_check(lons, lats, dates, sst, ostia, ice, bgvar, n_eval, bias_lim, drif_intra, drif_inter, err_std_n, n_bad, background_err_lim)[source]¶

Perform the SST short check.

Parameters:

lons (SequenceNumberType) – 1-dimensional longitude array in degrees.
lats (SequenceNumberType) – 1-dimensional latitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
sst (SequenceNumberType) – 1-dimensional array of sea surface temperatures in K.
ostia (SequenceNumberType) – 1-dimensional array of background field sea surface temperatures in K.
ice (SequenceNumberType) – 1-dimensional array of ice concentrations in the range 0.0 to 1.0.
bgvar (SequenceNumberType) – 1-dimensional array of background sea surface temperature fields variances in K^2.
n_eval (int) – The minimum number of drifter observations required to be assessed by the long-record check.
bias_lim (float) – Maximum allowable drifter-background bias, beyond which a record is considered biased (degC or K).
drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).
drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).
err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which short-record data are deemed suspicious.
n_bad (int) – Minimum number of suspicious data points required for failure of short-record check.
background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).

Return type:

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if SST short check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

n_eval = 30
bias_lim = 1.10
drif_intra = 1.0
drif_inter = 0.29
err_std_n = 3.0
n_bad = 2
background_err_lim = 0.3

marine_qc.buoy_tracking_qc.do_sst_end_tail_check(lons, lats, dates, sst, ostia, ice, bgvar, long_win_len, long_err_std_n, short_win_len, short_err_std_n, short_win_n_bad, drif_inter, drif_intra, background_err_lim)[source]¶

Perform the SST End Tail Check.

Parameters:

lons (SequenceNumberType) – 1-dimensional longitude array in degrees.
lats (SequenceNumberType) – 1-dimensional latitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
sst (SequenceNumberType) – 1-dimensional array of sea surface temperatures in K.
ostia (SequenceNumberType) – 1-dimensional array of background field sea surface temperatures in K.
ice (SequenceNumberType) – 1-dimensional array of ice concentrations in the range 0.0 to 1.0.
bgvar (SequenceNumberType) – 1-dimensional array of background sea surface temperature fields variances in K^2.
long_win_len (int) – Length of window (in data-points) over which to make long tail-check (must be an odd number).
long_err_std_n (float) – Number of standard deviations of combined background and drifter bias error, beyond which data fail bias check.
short_win_len (int) – Length of window (in data-points) over which to make the short tail-check.
short_err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which data are deemed suspicious.
short_win_n_bad (int) – Minimum number of suspicious data points required for failure of short check window.
drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).
drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).
background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).

Return type:

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if SST end tail check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

long_win_len = 121
long_err_std_n = 3.0
short_win_len = 30
short_err_std_n = 3.0
short_win_n_bad = 2
drif_inter = 0.29
drif_intra = 1.00
background_err_lim = 0.3

marine_qc.buoy_tracking_qc.do_sst_noisy_check(lons, lats, dates, sst, ostia, ice, bgvar, n_eval, bias_lim, drif_intra, drif_inter, err_std_n, n_bad, background_err_lim)[source]¶

Perform the SST noise check.

Parameters:

lons (SequenceNumberType) – 1-dimensional longitude array in degrees.
lats (SequenceNumberType) – 1-dimensional latitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
sst (SequenceNumberType) – 1-dimensional array of sea surface temperatures in K.
ostia (SequenceNumberType) – 1-dimensional array of background field sea surface temperatures in K.
ice (SequenceNumberType) – 1-dimensional array of ice concentrations in the range 0.0 to 1.0.
bgvar (SequenceNumberType) – 1-dimensional array of background sea surface temperature fields variances in K^2.
n_eval (int) – The minimum number of drifter observations required to be assessed by the long-record check.
bias_lim (float) – Maximum allowable drifter-background bias, beyond which a record is considered biased (degC or K).
drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).
drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).
err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which short-record data are deemed suspicious.
n_bad (int) – Minimum number of suspicious data points required for failure of short-record check.
background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).

Return type:

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if SST noise check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

n_eval = 30
bias_lim = 1.10
drif_intra = 1.0
drif_inter = 0.29
err_std_n = 3.0
n_bad = 2
background_err_lim = 0.3

marine_qc.buoy_tracking_qc.do_sst_start_tail_check(lons, lats, dates, sst, ostia, ice, bgvar, long_win_len, long_err_std_n, short_win_len, short_err_std_n, short_win_n_bad, drif_inter, drif_intra, background_err_lim)[source]¶

Perform the SST Start Tail Check.

Parameters:

lons (SequenceNumberType) – 1-dimensional longitude array in degrees.
lats (SequenceNumberType) – 1-dimensional latitude array in degrees.
dates (SequenceDatetimeType) – 1-dimensional date array.
sst (SequenceNumberType) – 1-dimensional array of sea surface temperatures in K.
ostia (SequenceNumberType) – 1-dimensional array of background field sea surface temperatures in K.
ice (SequenceNumberType) – 1-dimensional array of ice concentrations in the range 0.0 to 1.0.
bgvar (SequenceNumberType) – 1-dimensional array of background sea surface temperature fields variances in K^2.
long_win_len (int) – Length of window (in data-points) over which to make long tail-check (must be an odd number).
long_err_std_n (float) – Number of standard deviations of combined background and drifter bias error, beyond which data fail bias check.
short_win_len (int) – Length of window (in data-points) over which to make the short tail-check.
short_err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which data are deemed suspicious.
short_win_n_bad (int) – Minimum number of suspicious data points required for failure of short check window.
drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).
drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).
background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).

Return type:

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if SST start tail check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

long_win_len = 121
long_err_std_n = 3.0
short_win_len = 30
short_err_std_n = 3.0
short_win_n_bad = 2
drif_inter = 0.29
drif_intra = 1.00
background_err_lim = 0.3

marine_qc.buoy_tracking_qc.is_monotonic(inarr)[source]¶

Test if elements in an array are increasing monotonically.

I.e. each element is greater than or equal to the preceding element.

Parameters:: inarr (array-like of datetime, shape (n,)) – 1-dimensional date array.
Return type:: bool
Returns:: bool – True if array is increasing monotonically, False otherwise.
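
A minimal sketch of the test described above, written with plain Python sequences. The function name is_non_decreasing is hypothetical and this is not necessarily how the package implements it.

```python
from datetime import datetime


def is_non_decreasing(values):
    """True if each element is greater than or equal to the one before it."""
    return all(earlier <= later for earlier, later in zip(values, values[1:]))


times = [
    datetime(2020, 1, 1, 0),
    datetime(2020, 1, 1, 6),
    datetime(2020, 1, 1, 6),  # a repeated time still counts as monotonic
]
ordered = is_non_decreasing(times)
reversed_ordered = is_non_decreasing(times[::-1])
```

Note that equal neighbouring values pass, matching the "greater than or equal to" wording above.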

marine_qc.buoy_tracking_qc.track_day_test(year, month, day, hour, lat, lon, elevdlim=-2.5)[source]¶

Given date, time, lat and lon calculate if the sun elevation is > elevdlim.

This is the “day” test used by tracking QC to decide whether an SST measurement is night or day. This is important because daytime diurnal heating can affect comparison with an SST background. It uses the function sunangle to calculate the elevation of the sun. A default solar_zenith angle of 92.5 degrees (elevation of -2.5 degrees) delimits night from day.

Parameters:

year (int) – Year.
month (int) – Month.
day (int) – Day.
hour (float) – Hour expressed as decimal fraction (e.g. 20.75 = 20:45).
lat (float) – Latitude in degrees.
lon (float) – Longitude in degrees.
elevdlim (float, default: -2.5) – Elevation day/night delimiter in degrees above horizon.

Return type:

Returns:

bool – True if daytime, else False.

Raises:

ValueError – If either year, month, day, hour, lat or lon is numerically invalid or None, or if either month, day, hour or lat is not in valid range.

marine_qc.calculate_humidity module¶

The CalcHums module contains a set of functions for calculating humidity variables.

At present, it can only cope with scalars, not arrays.

There are routines for:

specific humidity from dew point temperature and temperature and pressure
vapour pressure from dew point temperature and temperature and pressure
relative humidity from dew point temperature and temperature and pressure
wet bulb temperature from dew point temperature and temperature and pressure
dew point depression from dew point temperature and temperature

There are also routines for:

vapour pressure from specific humidity and pressure and temperature
dew point temperature from vapour pressure and temperature and pressure
relative humidity from vapour pressure and temperature and pressure
wet bulb temperature from vapour pressure and dew point temperature and temperature (and pressure?)

Where vapour pressure is used as part of the equation a pseudo wet bulb temperature is calculated. If this is at or below 0 deg C then the ice bulb equation is used.

ALL NUMBERS ARE RETURNED TO ONE SIGNIFICANT DECIMAL FIGURE.

THIS ROUTINE CANNOT COPE WITH MISSING DATA

THIS ROUTINE HAS a roundit=True/False option. The default is True, which rounds to one decimal place; set roundit=False to disable rounding.

Written by Kate Willett 7th Feb 2016

marine_qc.calculate_humidity.dpd(td, t, roundit=True)[source]¶

Calculate dew point depression from dew point temperature and dry bulb temperature.

Parameters:

td (float) – Dew point temperature in degrees C (array or scalar).
t (float) – Dry bulb temperature in degrees C (array or scalar).
roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

Returns:

float – Dew point depression in degrees C (array or scalar).

Notes

Ref:

TESTED! dpd = dpd(10.,15.) dpd = 5.0

marine_qc.calculate_humidity.rh(td, t, p, roundit=True)[source]¶

Calculate relative humidity from dew point temperature, dry bulb temperature and pressure.

It calculates the saturated vapour pressure from t. It requires a pressure value, strictly at station level, although sea level pressure is acceptable for marine data. This can be a scalar or an array, even if vapour pressure is an array (CHECK). To test whether to apply the ice or water calculation a dewpoint and dry bulb temperature are needed. We can assume that the dry bulb t is the same as the wet bulb t at saturation. This allows calculation of a pseudo-wet bulb temperature (imprecise) first. If the wet bulb temperature is at or below 0 deg C then the ice calculation is used.

Parameters:

td (float) – Dew point temperature in degrees C (array or scalar).
t (float) – Dry bulb temperature in degrees C (array or scalar).
p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).
roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

Returns:

float – Relative humidity in %rh (array or scalar).

Notes

Ref:

TESTED! rh = rh(10.,15.,1013.) rh = 72.0
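
The documented test value can be reproduced with a short sketch, assuming the Buck (1981) vapour pressure equation over water that is cited elsewhere in this module. The function names buck_water and relative_humidity are hypothetical, and the sketch omits the ice branch and the rounding option.

```python
import math


def buck_water(temp_c, p_hpa):
    """Vapour pressure over water in hPa, Buck (1981) form (assumed)."""
    f = 1.0007 + 3.46e-6 * p_hpa  # enhancement factor
    return 6.1121 * f * math.exp((18.729 - temp_c / 227.3) * temp_c / (257.87 + temp_c))


def relative_humidity(td, t, p):
    """Relative humidity in %rh from dew point, dry bulb and pressure."""
    e = buck_water(td, p)   # actual vapour pressure, from the dew point
    es = buck_water(t, p)   # saturation vapour pressure, from the dry bulb
    return 100.0 * e / es


rh_value = relative_humidity(10.0, 15.0, 1013.0)
rh_rounded = round(rh_value, 1)
```

Because the enhancement factor is the same in numerator and denominator, pressure cancels in this water-only sketch.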

marine_qc.calculate_humidity.sh(td, t, p, roundit=True)[source]¶

Calculate specific humidity from dew point temperature, dry bulb temperature and pressure.

It requires a pressure value, strictly at station level, although sea level pressure is acceptable for marine data. This can be a scalar or an array, even if vapour pressure is an array (CHECK).

Parameters:

td (float) – Dew point temperature in degrees C (array or scalar).
t (float) – Dry bulb temperature in degrees C (array or scalar).
p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).
roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

Returns:

float – Specific humidity in g/kg (array or scalar).

Notes

Peixoto & Oort, 1996; Ross & Elliott, 1996. Peixoto, J. P. and Oort, A. H.: The climatology of relative humidity in the atmosphere, J. Climate, 9, 3443-3463, 1996.

TESTED! sh = sh(10.,15.,1013.) sh = 7.6

marine_qc.calculate_humidity.sh_from_vap(e, p, roundit=True)[source]¶

Calculate specific humidity from vapour pressure and pressure.

It requires a pressure value, strictly at station level, although sea level pressure is acceptable for marine data. This can be a scalar or an array, even if vapour pressure is an array (CHECK).

Parameters:

e (float) – Vapour pressure in hPa (array or scalar).
p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).
roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

Returns:

float – Specific humidity in g/kg (array or scalar).

Notes

Peixoto & Oort, 1996; Ross & Elliott, 1996. Peixoto, J. P. and Oort, A. H.: The climatology of relative humidity in the atmosphere, J. Climate, 9, 3443-3463, 1996.

TESTED! sh = sh_from_vap(12.3,1013.) sh = 7.6
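
For orientation, the conversion from vapour pressure to specific humidity can be sketched with the standard relation q = 1000 * 0.622 e / (p - 0.378 e), where 0.622 is the ratio of the molecular weights of water vapour and dry air. The function name below is hypothetical and whether the package uses exactly these constants is an assumption.

```python
def specific_humidity_from_vapour_pressure(e, p):
    """Specific humidity in g/kg from vapour pressure and pressure (both hPa)."""
    epsilon = 0.622  # ratio of molecular weights, water vapour to dry air
    return 1000.0 * (epsilon * e) / (p - (1.0 - epsilon) * e)


q = specific_humidity_from_vapour_pressure(12.3, 1013.0)
q_rounded = round(q, 1)
```

With a vapour pressure of 12.3 hPa at 1013 hPa this gives about 7.6 g/kg.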

marine_qc.calculate_humidity.td_from_vap(e, p, t, roundit=True)[source]¶

Calculate dew point temperature from vapour pressure, pressure and dry bulb temperature.

It also requires temperature to check whether the wet bulb temperature is <= 0.0 - if so the ice bulb calculation is used.

Parameters:

e (float) – Vapour pressure in hPa (array or scalar).
p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).
t (float) – Dry bulb temperature in degrees C (array or scalar).
roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

Returns:

float – Dew point temperature in degrees C (array or scalar).

Notes

Buck 1981: Buck, A. L.: New equations for computing vapor pressure and enhancement factor, J. Appl. Meteorol., 20, 1527-1532, 1981. Jensen et al. 1990: Jensen, M. E., Burman, R. D., and Allen, R. G. (Eds.): Evapotranspiration and Irrigation Water Requirements: ASCE Manuals and Reports on Engineering Practices No. 70, American Society of Civil Engineers, New York, 360 pp., 1990.

TESTED! td = td_from_vap(12.3,1013.,15.) td = 10.0

marine_qc.calculate_humidity.vap(td, t, p, roundit=True)[source]¶

Calculate vapour pressure from dew point temperature, dry bulb temperature and pressure.

It requires a pressure value, strictly at station level, although sea level pressure is acceptable for marine data. This can be a scalar or an array, even if dewpoint temperature is an array (CHECK). To test whether to apply the ice or water calculation a dry bulb temperature is needed. This allows calculation of a pseudo-wet bulb temperature (imprecise) first. If the wet bulb temperature is at or below 0 deg C then the ice calculation is used.

Parameters:

td (float) – Dew point temperature in degrees C (array or scalar).
t (float) – Dry bulb temperature in degrees C (array or scalar).
p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).
roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

Returns:

float – Vapour pressure in hPa (array or scalar).

Notes

TESTED! e = vap(10.,15.,1013.) e = 12.3
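
The water and ice calculations mentioned above can be sketched with the Buck (1981) equations. The coefficients are those published by Buck; that the package uses exactly these forms is an assumption, and the function name and the over_ice switch are hypothetical (the package chooses the branch from a pseudo-wet bulb temperature instead).

```python
import math


def buck_vapour_pressure(td, p, over_ice=False):
    """Vapour pressure in hPa from dew point (deg C) and pressure (hPa)."""
    if over_ice:
        f = 1.0003 + 4.18e-6 * p  # enhancement factor over ice
        return 6.1115 * f * math.exp((23.036 - td / 333.7) * td / (279.82 + td))
    f = 1.0007 + 3.46e-6 * p      # enhancement factor over water
    return 6.1121 * f * math.exp((18.729 - td / 227.3) * td / (257.87 + td))


e_water = buck_vapour_pressure(10.0, 1013.0)
e_rounded = round(e_water, 1)

# Below freezing the ice form gives a lower vapour pressure than the water form.
e_cold_water = buck_vapour_pressure(-10.0, 1013.0)
e_cold_ice = buck_vapour_pressure(-10.0, 1013.0, over_ice=True)
```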

marine_qc.calculate_humidity.vap_from_sh(sh, p, roundit=True)[source]¶

Calculate vapour pressure from specific humidity and pressure.

It requires a pressure value, strictly at station level, although sea level pressure is acceptable for marine data. This can be a scalar or an array, even if specific humidity is an array (CHECK).

Parameters:

sh (float) – Specific humidity in g/kg (array or scalar).
p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).
roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

Returns:

float – Vapour pressure in hPa (array or scalar).

Notes

Peixoto & Oort, 1996; Ross & Elliott, 1996. Peixoto, J. P. and Oort, A. H.: The climatology of relative humidity in the atmosphere, J. Climate, 9, 3443-3463, 1996.

TESTED! e = vap_from_sh(7.6,1013.) e = 12.3
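
A minimal sketch of this conversion, assuming the standard relation between specific humidity and vapour pressure solved for e, with q converted from g/kg to kg/kg. The function name is hypothetical and the constants are an assumption about the package.

```python
def vapour_pressure_from_specific_humidity(sh, p):
    """Vapour pressure in hPa from specific humidity (g/kg) and pressure (hPa)."""
    epsilon = 0.622          # ratio of molecular weights, water vapour to dry air
    q = sh / 1000.0          # g/kg to kg/kg
    return (q * p) / (epsilon + (1.0 - epsilon) * q)


e_from_sh = vapour_pressure_from_specific_humidity(7.6, 1013.0)
e_from_sh_rounded = round(e_from_sh, 1)
```

With 7.6 g/kg at 1013 hPa this returns about 12.3 hPa, in line with the test value quoted above.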

marine_qc.calculate_humidity.wb(td, t, p, roundit=True)[source]¶

Calculate wet bulb temperature from dew point temperature, dry bulb temperature and pressure.

It requires a pressure value, strictly at station level, although sea level pressure is acceptable for marine data. This can be a scalar or an array, even if vapour pressure is an array (CHECK). To test whether to apply the ice or water calculation a dewpoint and dry bulb temperature are needed. This allows calculation of a pseudo-wet bulb temperature (imprecise) first. If the wet bulb temperature is at or below 0 deg C then the ice calculation is used.

Parameters:

td (float) – Dew point temperature in degrees C (array or scalar).
t (float) – Dry bulb temperature in degrees C (array or scalar).
p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).
roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

Returns:

float – Wet bulb temperature in degrees C (array or scalar).

Notes

Ref: Jensen et al. 1990: Jensen, M. E., Burman, R. D., and Allen, R. G. (Eds.): Evapotranspiration and Irrigation Water Requirements: ASCE Manuals and Reports on Engineering Practices No. 70, American Society of Civil Engineers, New York, 360 pp., 1990.

TESTED! wb = wb(10.,15.,1013) wb = 12.2
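
The pseudo-wet bulb step can be sketched with the weighted-average form usually attributed to Jensen et al. (1990), with coefficients expressed for pressures in hPa. That the package uses this exact form is an assumption, and the function name pseudo_wet_bulb is hypothetical; the vapour pressure is passed in directly to keep the example short.

```python
def pseudo_wet_bulb(td, t, p, e):
    """Approximate wet bulb temperature in deg C.

    td, t in deg C; p and vapour pressure e in hPa.
    """
    a = 0.000066 * p                      # psychrometric weighting
    b = 409.8 * e / (td + 237.3) ** 2     # slope of the vapour pressure curve
    return (a * t + b * td) / (a + b)


tw = pseudo_wet_bulb(10.0, 15.0, 1013.0, 12.3)
tw_rounded = round(tw, 1)
use_ice_calculation = tw <= 0.0  # switch described in the text above
```

The result always lies between the dew point and the dry bulb temperature, since it is a weighted average of the two.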

marine_qc.external_clim module¶

Module to read external climatology files.

class marine_qc.external_clim.Climatology(data, time_axis=None, lat_axis=None, lon_axis=None, source_units=None, target_units=None, valid_ntime=None)[source]¶

Bases: object

Class for dealing with climatologies, reading, extracting values etc.

Automatically detects if this is a single field, pentad or daily climatology.

Parameters:

data (xr.DataArray) – Climatology data.
time_axis (str, optional) – Name of time axis. Set if time axis in data is not CF compatible.
lat_axis (str, optional) – Name of latitude axis. Set if latitude axis in data is not CF compatible.
lon_axis (str, optional) – Name of longitude axis. Set if longitude axis in data is not CF compatible.
source_units (str, optional) – Name of units in data. Set if units are not defined in data.
target_units (str, optional) – Name of target units to which units must conform.
valid_ntime (int or list, default: [1, 73, 365]) – Number of valid time steps: 1 (single field climatology), 73 (pentad climatology) or 365 (daily climatology).

convert_units_to(target_units, source_units=None)[source]¶

Convert units to user-specific units.

Parameters:

target_units (str) – Target units to which units must conform.
source_units (str, optional) – Source units if not specified in Climatology.

Return type:

Notes

For more information see: xclim.core.units.convert_units_to()

static get_t_index(month, day, ntime)[source]¶

Convert arrays of months and days to an array of indices for the grid.

Parameters:

month (ndarray) – Array of months.
day (ndarray) – Array of days.
ntime (int) – Number of time points in the grid, valid values are 1, 73 (pentad resolution) and 365 (daily resolution).

Return type:

Returns:

ndarray – Array of indices.
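
As a rough illustration of the mapping described above, the sketch below converts a single month and day to a zero-based time index for the three valid grid sizes. The name sketch_t_index, the 365-day (non-leap) calendar, and the choice to give 29 February the same index as 28 February are assumptions made for this example; they are not confirmed behaviour of get_t_index.

```python
# Cumulative days before each month in a non-leap year (assumption: 365-day calendar).
DAYS_BEFORE_MONTH = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def sketch_t_index(month, day, ntime):
    """Hypothetical helper: zero-based time index for one month and day."""
    if ntime not in (1, 73, 365):
        raise ValueError("ntime must be 1, 73 or 365")
    if ntime == 1:
        return 0  # a single-field climatology has only one time step
    if month == 2 and day == 29:
        day = 28  # assumption: the leap day shares the index of 28 February
    day_of_year = DAYS_BEFORE_MONTH[month - 1] + day - 1  # 0..364
    if ntime == 365:
        return day_of_year
    return day_of_year // 5  # 73 pentads of 5 days each


print(sketch_t_index(1, 1, 73))     # first pentad of the year
print(sketch_t_index(12, 31, 73))   # last pentad of the year
print(sketch_t_index(12, 31, 365))  # last day of the year
```

The real method works on arrays of months and days; the scalar form is used here only to keep the arithmetic visible.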

get_tindex(month, day)[source]¶

Get the time index of the input month and day.

Parameters:

month (int) – Month for which the time index is required.
day (int) – Day for which the time index is required.

Return type:

Returns:

int – Time index for specified month and day.

get_value(lat, lon, date=None, month=None, day=None)[source]¶

Get the value from a climatology at the given position and time.

Parameters:

lat (SequenceNumberType, optional) – Latitude of location to extract value from in degrees.
lon (SequenceNumberType, optional) – Longitude of location to extract value from in degrees.
date (SequenceDatetimeType, optional) – Date for which the value is required.
month (SequenceIntType, optional) – Month for which the value is required.
day (SequenceIntType, optional) – Day for which the value is required.

Return type:

ndarray | Series

Returns:

ndarray or pd.Series – Climatology value at specified location and time.

Notes

Use only exact matches for selecting time and nearest valid index value for selecting location.

get_value_fast(lat, lon, date=None, month=None, day=None)[source]¶

Get the value from a climatology at the given position and time.

Parameters:

lat (SequenceNumberType, optional) – Latitude of location to extract value from in degrees.
lon (SequenceNumberType, optional) – Longitude of location to extract value from in degrees.
date (SequenceDatetimeType, optional) – Date for which the value is required.
month (SequenceIntType, optional) – Month for which the value is required.
day (SequenceIntType, optional) – Day for which the value is required.

Return type:

ndarray | Series

Returns:

ndarray or pd.Series – Climatology value at specified location and time.

Notes

Assumes that the grid is a regular latitude longitude grid. The alternative method get_value works with non-regular grids.

static get_x_index(lon_arr, lon_axis)[source]¶

Convert an array of longitudes to an array of indices for the grid.

Parameters:

lon_arr (ndarray) – Array of longitudes.
lon_axis (ndarray) – Array containing the longitude axis.

Return type:

Returns:

ndarray – Array of indices.

static get_y_index(lat_arr, lat_axis)[source]¶

Convert an array of latitudes to an array of indices for the grid.

Parameters:

lat_arr (np.ndarray) – Array of latitudes.
lat_axis (np.ndarray) – Array containing the latitude axis.

Return type:

Returns:

np.ndarray – Array of indices.

classmethod open_netcdf_file(file_name, clim_name, **kwargs)[source]¶

Open a NetCDF climatology file and construct a Climatology instance.

Parameters:

file_name (str or path-like) – Path to the NetCDF file to open.
clim_name (str) – Name of the climatology variable within the NetCDF file.
**kwargs (dict) – Additional keyword arguments passed to the Climatology constructor.

Return type:

Climatology

Returns:

Climatology – A Climatology instance constructed from the specified variable in the NetCDF file. If the file cannot be opened, an empty climatology object is returned.

marine_qc.external_clim.get_climatological_value(climatology, **kwargs)[source]¶

Get the value from a climatology.

Parameters:

climatology (Climatology) – Climatology class.
**kwargs (dict) – Keyword arguments passed to Climatology.get_value().

Return type:

Returns:

ndarray – Climatology value at specified location and time.

marine_qc.external_clim.inspect_climatology(*climatology_keys, optional=None)[source]¶

A decorator factory to preprocess function arguments that may be Climatology objects.

This decorator inspects the specified function arguments and normalizes them to concrete numerical values before the decorated function is executed. Supported input types include raw numeric values, xarray objects, file paths, and Climatology instances.

Parameters:

*climatology_keys (str) – Names of required function arguments to be inspected. These should be arguments that may be:
- a numeric value
- a xr.DataArray
- a xr.Dataset
- a string or path-like object pointing to a valid NetCDF file on disk
- a Climatology instance
If a Climatology object (or an object convertible to one) is detected, it will be resolved to a concrete value using its .get_value_fast(**kwargs) method.
optional (str or sequence of str, optional) – Argument names that should be treated as optional. If they are explicitly passed when the decorated function is called, they will be treated the same way as climatology_keys.

Return type:

Returns:

Callable[..., Any] – A decorator that wraps the target function, processing specified arguments before the function is called.

Raises:

TypeError – If a required climatology argument is missing from the decorated function call.
ValueError – If an xr.Dataset is provided without specifying clim_name, or if a string/Path input does not point to a valid file on disk.

Warns:

UserWarning – Issued if required keyword arguments for get_value_fast() are missing. This warning does not stop execution; missing values are replaced with np.nan.

Notes

xr.Dataset inputs require the keyword argument clim_name to select the relevant data variable.
xr.DataArray inputs are automatically wrapped in a Climatology object.
String or path-like inputs must point to an existing file and are opened via open_netcdf_file().
If a Climatology object is processed, it is resolved using .get_value_fast(**kwargs).
If required keyword arguments for .get_value_fast() are missing, a warning is issued.
If resolution fails due to TypeError or ValueError, the value is replaced with np.nan.
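
The resolution steps listed above can be pictured with a much simplified decorator factory. Everything in this sketch is hypothetical: FakeClimatology stands in for a real Climatology, sketch_inspect handles keyword arguments only, and the lookup keyword names (lat, lon, date, month, day) are taken from the get_value_fast signature documented above. File paths, xarray inputs and the optional argument are not covered.

```python
import functools
import math
import warnings


class FakeClimatology:
    """Hypothetical stand-in for a Climatology; always returns one value."""

    def __init__(self, value):
        self.value = value

    def get_value_fast(self, lat, lon, date=None, month=None, day=None):
        return self.value


def sketch_inspect(*keys):
    """Hypothetical decorator factory mimicking the documented behaviour."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):  # simplification: keyword arguments only
            for key in keys:
                if key not in kwargs:
                    raise TypeError(f"Missing required argument: {key}")
                obj = kwargs[key]
                if not hasattr(obj, "get_value_fast"):
                    continue  # plain numbers pass through unchanged
                names = ("lat", "lon", "date", "month", "day")
                lookup = {k: kwargs[k] for k in names if k in kwargs}
                try:
                    kwargs[key] = obj.get_value_fast(**lookup)
                except (TypeError, ValueError):
                    # Missing lookup arguments: warn and fall back to NaN.
                    warnings.warn(f"Could not resolve '{key}'; using NaN.", UserWarning)
                    kwargs[key] = math.nan
            return func(**kwargs)

        return wrapper

    return decorator


@sketch_inspect("climatology")
def anomaly(value, climatology, lat=None, lon=None):
    return value - climatology


print(anomaly(value=15.0, climatology=FakeClimatology(12.5), lat=50.0, lon=-10.0))
print(anomaly(value=15.0, climatology=12.5))
```

Calling anomaly with a FakeClimatology but without lat and lon triggers the warning path and yields NaN, which mirrors the last two notes above.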

marine_qc.external_clim.open_xrdataset(files, use_cftime=True, decode_cf=False, decode_times=False, parallel=False, data_vars='minimal', chunks='default', coords='minimal', compat='override', combine='by_coords', **kwargs)[source]¶

Optimized function for opening large CF-compliant datasets with xarray.

This implementation follows guidance from: https://github.com/pydata/xarray/issues/1385#issuecomment-561920115

decode_timedelta=False is added to leave variables and coordinates with time units in {“days”, “hours”, “minutes”, “seconds”, “milliseconds”, “microseconds”} encoded as numbers.

Parameters:

files (str or list of str or path-like) – See the documentation for xarray.open_mfdataset().
use_cftime (bool, default: True) – See the documentation for xarray.decode_cf().
decode_cf (bool, default: False) – See the documentation for xarray.decode_cf().
decode_times (bool, default: False) – See the documentation for xarray.decode_cf().
parallel (bool, default: False) – See the documentation for xarray.open_mfdataset().
data_vars ({"minimal", "different", "all"} or list of str, default: "minimal") – See the documentation for xarray.open_mfdataset().
chunks (int, dict, "auto" or None, optional, default: "default") – If chunks is "default", chunks are set to {"time": 1}. See the documentation for xarray.open_mfdataset().
coords ({"minimal", "different", "all"} or list of str, optional, default: "minimal") – See the documentation for xarray.open_mfdataset().
compat ({"identical", "equals", "broadcast_equals", "no_conflicts", "override", "minimal"}, default: "override") – See the documentation for xarray.open_mfdataset().
combine ({"by_coords", "nested"}, optional, default: "by_coords") – See the documentation for xarray.open_mfdataset().
**kwargs (dict) – Additional keyword arguments passed to xarray.open_mfdataset().

Return type:

Dataset

Returns:

xarray.Dataset – Opened xarray Dataset, optimized for large CF datasets.

marine_qc.location_control module¶

Some generally helpful location control functions for base QC.

marine_qc.location_control.fill_missing_vals(q11, q12, q21, q22)[source]¶

Fill missing values.

For a group of four neighbouring grid boxes which form a square, with values q11, q12, q21, q22, fill gaps using means of neighbours.

Parameters:

q11 (float) – Value of first gridbox.
q12 (float) – Value of second gridbox.
q21 (float) – Value of third gridbox.
q22 (float) – Value of fourth gridbox.

Return type:

tuple[float | None, float | None, float | None, float | None]

Returns:

tuple of float – A tuple of four floats representing neighbour means.

marine_qc.location_control.filler(value_to_fill, neighbour1, neighbour2, opposite)[source]¶

Fill invalid values.

If the value_to_fill is invalid it is replaced with the mean of the neighbours and if it is still invalid then it is replaced with the value from the opposite member.

Parameters:

value_to_fill (float) – The value to fill.
neighbour1 (float) – The first neighbour.
neighbour2 (float) – The second neighbour.
opposite (float) – The opposite member.

Return type:

float | None

Returns:

float – Filled invalid input values.
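
The fill rule can be restated in a few lines. This is a minimal sketch under stated assumptions: None marks an invalid value, the mean is taken over whichever of the two neighbours are valid, each corner of the square is filled from the original (unfilled) inputs, and q11 is adjacent to q12 and q21 with q22 opposite. The sketch_ names are hypothetical and are not the library functions.

```python
def sketch_filler(value_to_fill, neighbour1, neighbour2, opposite):
    """Hypothetical restatement of the fill rule; None marks an invalid value."""
    if value_to_fill is not None:
        return value_to_fill  # already valid, nothing to do
    valid = [n for n in (neighbour1, neighbour2) if n is not None]
    if valid:
        return sum(valid) / len(valid)  # mean of the valid neighbours
    return opposite  # may itself be None if every box is missing


def sketch_fill_missing_vals(q11, q12, q21, q22):
    """Apply the fill rule to each corner of the 2x2 square."""
    return (
        sketch_filler(q11, q12, q21, q22),
        sketch_filler(q12, q11, q22, q21),
        sketch_filler(q21, q11, q22, q12),
        sketch_filler(q22, q12, q21, q11),
    )


print(sketch_fill_missing_vals(None, 2.0, 4.0, 6.0))
print(sketch_fill_missing_vals(None, None, None, 6.0))
```

If all four boxes are missing the result stays None for every corner, which is consistent with the documented return type float | None.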

marine_qc.location_control.get_four_surrounding_points(lat, lon, res, max90=True)[source]¶

Get the four surrounding points of a specified latitude and longitude point.

Parameters:

lat (float) – Latitude of point.
lon (float) – Longitude of point.
res (int) – Resolution of the grid in degrees.
max90 (bool, default: True) – If True then cap latitude at 90.0, otherwise don’t cap latitude.

Return type:

tuple[float, float, float, float]

Returns:

tuple of floats – A tuple of floats representing the longitudes of the leftmost and rightmost pairs of points, and the latitudes of the topmost and bottommost pairs of points.

marine_qc.location_control.lat_to_yindex(lat, res)[source]¶

For a given latitude return the y index in a 1x1x5-day global grid.

Parameters:

lat (float) – Latitude of the point.
res (float) – Resolution of grid in degrees.

Return type:

Returns:

int – Grid box index.

Notes

The routine assumes that the structure of the SST array is a grid that is 360 x 180 x 73, i.e. one year of 1 degree lat x 1 degree lon data split up into pentads. The west-most box is at 180 degrees W with index 0 and the northernmost box also has index zero. Inputs on the border between grid cells are pushed south.

In previous versions, res had the default value 1.0.
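
The indexing convention in the notes above (row 0 at the North Pole, borderline latitudes pushed south) can be sketched as follows. The function name is hypothetical, and clamping the South Pole into the last row is an assumption of this example rather than documented behaviour.

```python
def sketch_lat_to_yindex(lat, res):
    """Hypothetical helper: row index with row 0 at the North Pole."""
    n_rows = int(180 / res)
    # Truncation sends a latitude on a cell border into the box to the south.
    yindex = int((90.0 - lat) / res)
    # Assumption: clamp so that lat = -90 lands in the last row.
    return min(max(yindex, 0), n_rows - 1)


print(sketch_lat_to_yindex(89.5, 1.0))   # inside the northernmost box
print(sketch_lat_to_yindex(89.0, 1.0))   # on a border, pushed south
print(sketch_lat_to_yindex(-90.0, 1.0))  # South Pole, clamped to the last row
```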

marine_qc.location_control.lon_to_xindex(lon, res)[source]¶

For a given longitude return the x index in a 1x1x5-day global grid.

Parameters:

lon (float) – Longitude of the point.
res (float) – Resolution of grid in degrees.

Return type:

Returns:

int – Grid box index.

Notes

The routine assumes that the structure of the SST array is a grid that is 360 x 180 x 73, i.e. one year of 1 degree lat x 1 degree lon data split up into pentads. The west-most box is at 180 degrees W with index 0 and the northernmost box also has index zero. Inputs on the border between grid cells are pushed east.

In previous versions, res had the default value 1.0.
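
A comparable sketch for longitude, with column 0 at 180 degrees W and borderline longitudes pushed east. The function name is hypothetical, and wrapping 180 degrees E (and values outside -180 to 180) back onto the grid with a modulo is an assumption of this example.

```python
def sketch_lon_to_xindex(lon, res):
    """Hypothetical helper: column index with column 0 at 180 degrees W."""
    n_cols = int(360 / res)
    # Assumption: wrap longitudes onto [0, 360) measured eastward from 180 W.
    shifted = (lon + 180.0) % 360.0
    # Truncation sends a longitude on a cell border into the box to the east.
    xindex = int(shifted / res)
    return min(xindex, n_cols - 1)  # guard against floating-point overshoot


print(sketch_lon_to_xindex(-180.0, 1.0))  # westernmost box
print(sketch_lon_to_xindex(-179.0, 1.0))  # on a border, pushed east
print(sketch_lon_to_xindex(180.0, 1.0))   # wraps back to the westernmost box
```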

marine_qc.location_control.mds_lat_to_yindex(lat, res)[source]¶

For a given latitude return the y-index as it was in MDS2/3 in a 1x1 global grid.

Parameters:

lat (float) – Latitude of the point.
res (float) – Resolution of grid in degrees.

Return type:

Returns:

int – Grid box index.

Notes

In the northern hemisphere, borderline latitudes which fall on grid boundaries are pushed north, except 90 which goes south. In the southern hemisphere, they are pushed south, except -90 which goes north. At 0 degrees they are pushed south.

Expects that latitudes run from 90N to 90S.

In previous versions, res had the default value 1.0.
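
The border rules in the notes above can be reproduced with integer truncation toward zero plus a small nudge at the poles. This is a sketch that is consistent with those rules, not a copy of the library code; the function name and the 0.001 degree nudge are assumptions.

```python
def sketch_mds_lat_to_yindex(lat, res):
    """Hypothetical helper following the MDS-style border rules for latitude."""
    # Nudge the poles inward so that 90 goes south and -90 goes north.
    if lat == 90.0:
        lat -= 0.001
    elif lat == -90.0:
        lat += 0.001
    rows_per_hemisphere = int(90 / res)
    # int() truncates toward zero, which pushes borders away from the equator.
    if lat > 0.0:
        return rows_per_hemisphere - 1 - int(lat / res)
    return rows_per_hemisphere - int(lat / res)


for lat in (90.0, 1.0, 0.0, -1.0, -90.0):
    print(lat, sketch_mds_lat_to_yindex(lat, 1.0))
```

With a 1 degree grid, row 88 spans 1N to 2N and row 90 spans 1S to the equator, so a latitude of exactly 1.0 is pushed north and a latitude of exactly 0.0 is pushed south.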

marine_qc.location_control.mds_lat_to_yindex_fast(lat, res)[source]¶

For a given latitude return the y-index as it was in MDS2/3 in a 1x1 global grid.

Parameters:

lat (np.ndarray) – Latitude(s) of observation in degrees.
res (float) – Resolution of grid in degrees.

Return type:

Returns:

np.ndarray – Grid box indexes.

Notes

Expects that latitudes run from 90N to 90S.

In previous versions, res had the default value 1.0.

marine_qc.location_control.mds_lon_to_xindex(lon, res)[source]¶

For a given longitude return the x-index as it was in MDS2/3 in a 1x1 global grid.

Parameters:

lon (float) – Longitude of the point.
res (float) – Resolution of grid in degrees.

Return type:

Returns:

int – Grid box index.

Notes

In the western hemisphere, borderline longitudes which fall on grid boundaries are pushed west, except -180 which goes east. In the eastern hemisphere, they are pushed east, except 180 which goes west. At 0 degrees they are pushed west.

In previous versions, res had the default value 1.0.
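
The longitude border rules above admit the same kind of sketch. The function name, the 0.001 degree nudge at the date line, and the assumption that column 0 is the westernmost box are all illustrative choices rather than confirmed details of the library.

```python
def sketch_mds_lon_to_xindex(lon, res):
    """Hypothetical helper following the MDS-style border rules for longitude."""
    # Nudge the date line inward so that 180 goes west and -180 goes east.
    if lon == 180.0:
        lon -= 0.001
    elif lon == -180.0:
        lon += 0.001
    cols_per_hemisphere = int(180 / res)
    # int() truncates toward zero, which pushes borders away from 0 degrees.
    if lon > 0.0:
        return cols_per_hemisphere + int(lon / res)
    return cols_per_hemisphere - 1 + int(lon / res)


for lon in (-180.0, -1.0, 0.0, 1.0, 180.0):
    print(lon, sketch_mds_lon_to_xindex(lon, 1.0))
```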

marine_qc.location_control.mds_lon_to_xindex_fast(lon, res)[source]¶

For a given longitude return the x-index as it was in MDS2/3 in a 1x1 global grid.

Parameters:

lon (np.ndarray) – Longitude(s) of observation in degrees.
res (float) – Resolution of grid in degrees.

Return type:

Returns:

np.ndarray – Grid box indexes.

Notes

In previous versions, res had the default value 1.0.

marine_qc.location_control.xindex_to_lon(xindex, res)[source]¶

Convert xindex to longitude.

Parameters:

xindex (int) – Index of the longitude.
res (float) – Resolution of grid in degrees.

Return type:

Returns:

float – Longitude (degrees).

Notes

In previous versions, res had the default value 1.0.

marine_qc.location_control.yindex_to_lat(yindex, res)[source]¶

Convert yindex to latitude.

Parameters:

yindex (int) – Index of the latitude.
res (float) – Resolution of grid in degrees.

Return type:

Returns:

float – Latitude (degrees).

Notes

In previous versions, res had the default value 1.0.

marine_qc.multiple_checks module¶

Module containing base QC functions which call multiple QC functions and can be applied to a DataBundle.

marine_qc.multiple_checks.do_multiple_grouped_check(data, qc_dict=None, preproc_dict=None, return_method='all')[source]¶

Apply one or more buddy-check quality-control (QC) functions to a DataFrame or Series.

Parameters:

data (pd.Series or pd.DataFrame) – Hashable input data.
qc_dict (Mapping, optional) – Nested QC dictionary. Keys represent arbitrary user-specified names for the checks. The values are dictionaries which contain the keys “func” (name of the QC function), “names” (input data names as keyword arguments, that will be retrieved from data) and, if necessary, “arguments” (the corresponding keyword arguments). For more information see Examples.
preproc_dict (Mapping, optional) – Nested pre-processing dictionary. Keys represent variable names that can be used by qc_dict. The values are dictionaries which contain the keys “func” (name of the pre-processing function), “names” (input data names as keyword arguments, that will be retrieved from data), and “inputs” (list of input-given variables). For more information see Examples.
return_method ({"all", "passed", "failed"}, default: "all") – If “all”, return QC dictionary containing all requested QC check flags. If “passed”, return QC dictionary containing all requested QC check flags until the first check passes; other QC checks are flagged as untested (3). If “failed”, return QC dictionary containing all requested QC check flags until the first check fails; other QC checks are flagged as untested (3).

Return type:

DataFrame | Series

Returns:

pd.DataFrame or pd.Series – A DataFrame (or Series if the input was a Series) whose columns correspond to the QC names in qc_dict and whose values contain QC flags for each row. Flags depend on the QC functions used.

Raises:

NameError – If a function listed in qc_dict or preproc_dict is not defined. If columns listed in qc_dict or preproc_dict are not available in data.
ValueError – If return_method is not one of [“all”, “passed”, “failed”]. If variable names listed in qc_dict or preproc_dict are not valid parameters of the QC function.

Notes

If a variable is pre-processed using preproc_dict, mark the variable name as “__preprocessed__” in qc_dict. For example: “climatology”: “__preprocessed__”.

For more information, see do_multiple_individual_check().

marine_qc.multiple_checks.do_multiple_individual_check(data, qc_dict=None, preproc_dict=None, return_method='all')[source]¶

Apply one or more quality-control (QC) functions independently to each row of a DataFrame or Series.

Parameters:

data (pd.Series or pd.DataFrame) – Hashable input data.
qc_dict (Mapping, optional) – Nested QC dictionary. Keys represent arbitrary user-specified names for the checks. The values are dictionaries which contain the keys “func” (name of the QC function), “names” (input data names as keyword arguments, that will be retrieved from data) and, if necessary, “arguments” (the corresponding keyword arguments). For more information see Examples.
preproc_dict (Mapping, optional) – Nested pre-processing dictionary. Keys represent variable names that can be used by qc_dict. The values are dictionaries which contain the keys “func” (name of the pre-processing function), “names” (input data names as keyword arguments, that will be retrieved from data), and “inputs” (list of input-given variables). For more information see Examples.
return_method ({"all", "passed", "failed"}, default: "all") – If “all”, return QC dictionary containing all requested QC check flags. If “passed”, return QC dictionary containing all requested QC check flags until the first check passes; other QC checks are flagged as untested (3). If “failed”, return QC dictionary containing all requested QC check flags until the first check fails; other QC checks are flagged as untested (3).

Return type:

DataFrame | Series

Returns:

Raises:

NameError – If a function listed in qc_dict or preproc_dict is not defined. If columns listed in qc_dict or preproc_dict are not available in data.
ValueError – If return_method is not one of [“all”, “passed”, “failed”]. If variable names listed in qc_dict or preproc_dict are not valid parameters of the QC function.

Notes

If a variable is pre-processed using preproc_dict, mark the variable name as “__preprocessed__” in qc_dict. For example: “climatology”: “__preprocessed__”.

For more information, see Examples.

Examples

An example qc_dict for a hard limit test:

qc_dict = {
    "hard_limit_check": {
        "func": "do_hard_limit_check",
        "names": "ATEMP",
        "arguments": {"limits": [193.15, 338.15]},
    }
}

An example qc_dict for a climatology test. Variable “climatology” was previously defined:

qc_dict = {
    "climatology_check": {
        "func": "do_climatology_check",
        "names": {
            "value": "observation_value",
            "lat": "latitude",
            "lon": "longitude",
            "date": "date_time",
        },
        "arguments": {
            "climatology": climatology,
            "maximum_anomaly": 10.0,  # K
        },
    },
}

An example preproc_dict for extracting a climatological value:

preproc_dict = {
    "func": "get_climatological_value",
    "names": {
        "lat": "latitude",
        "lon": "longitude",
        "date": "date_time",
    },
    "inputs": climatology,
}

Make use of both dictionaries:

preproc_dict = {
    "func": "get_climatological_value",
    "names": {
        "lat": "latitude",
        "lon": "longitude",
        "date": "date_time",
    },
    "inputs": climatology,
}

qc_dict = {
    "climatology_check": {
        "func": "do_climatology_check",
        "names": {
            "value": "observation_value",
        },
        "arguments": {
            "climatology": "__preprocessed__",
            "maximum_anomaly": 10.0,  # K
        },
    },
}

Finally, run the function:

do_multiple_individual_check(
    data=df,
    qc_dict=qc_dict,
    preproc_dict=preproc_dict,
    return_method="failed",
)
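
The effect of return_method on one row of results can be pictured with a small stand-alone helper. This sketch post-processes flags that have already been computed, whereas the real function may simply skip the remaining checks. The helper name is hypothetical, and the flag values 0 (passed) and 1 (failed) are assumptions; only 3 (untested) is stated in the parameter description above.

```python
PASSED, FAILED, UNTESTED = 0, 1, 3  # 0 and 1 are assumed; 3 is documented


def sketch_apply_return_method(flags, return_method="all"):
    """Hypothetical helper: flags maps check name -> flag, in execution order."""
    if return_method not in ("all", "passed", "failed"):
        raise ValueError("return_method must be 'all', 'passed' or 'failed'")
    if return_method == "all":
        return dict(flags)
    stop_flag = PASSED if return_method == "passed" else FAILED
    result = {}
    stopped = False
    for name, flag in flags.items():
        # Once the stopping condition has been met, later checks are untested.
        result[name] = UNTESTED if stopped else flag
        if flag == stop_flag:
            stopped = True
    return result


row_flags = {"hard_limit_check": 0, "climatology_check": 1, "buddy_check": 0}
print(sketch_apply_return_method(row_flags, "failed"))
print(sketch_apply_return_method(row_flags, "passed"))
```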

marine_qc.multiple_checks.do_multiple_sequential_check(data, groupby=None, qc_dict=None, preproc_dict=None, return_method='all')[source]¶

Apply one or more sequential quality-control (QC) functions to groups of a DataFrame or Series.

Typically for time-ordered or track-based checks.

Parameters:

data (pd.Series or pd.DataFrame) – Hashable input data.
groupby (str, iterable of str, or pandas GroupBy, optional) – Specifies how the data should be grouped before applying QC functions. If a string or iterable of strings, data.groupby is called on those keys. If a pandas.DataFrameGroupBy object is provided, its groups are used directly. Any groups that contain indices not present in data are automatically trimmed. If None, the entire input data is treated as a single group. For more information see Examples.
qc_dict (Mapping, optional) – Nested QC dictionary. Keys represent arbitrary user-specified names for the checks. The values are dictionaries which contain the keys “func” (name of the QC function), “names” (input data names as keyword arguments, that will be retrieved from data) and, if necessary, “arguments” (the corresponding keyword arguments).
preproc_dict (Mapping, optional) – Nested pre-processing dictionary. Keys represent variable names that can be used by qc_dict. The values are dictionaries which contain the keys “func” (name of the pre-processing function), “names” (input data names as keyword arguments, that will be retrieved from data), and “inputs” (list of input-given variables). For more information see Examples.
return_method ({"all", "passed", "failed"}, default: "all") – If “all”, return QC dictionary containing all requested QC check flags. If “passed”, return QC dictionary containing all requested QC check flags until the first check passes; other QC checks are flagged as untested (3). If “failed”, return QC dictionary containing all requested QC check flags until the first check fails; other QC checks are flagged as untested (3).

Return type:

DataFrame | Series

Returns:

Raises:

NameError – If a function listed in qc_dict or preproc_dict is not defined. If columns listed in qc_dict or preproc_dict are not available in data.
ValueError – If return_method is not one of [“all”, “passed”, “failed”]. If variable names listed in qc_dict or preproc_dict are not valid parameters of the QC function.

Notes

If a variable is pre-processed using preproc_dict, mark the variable name as “__preprocessed__” in qc_dict. For example: “climatology”: “__preprocessed__”.

For more information, see do_multiple_individual_check().

marine_qc.plot_qc_outcomes module¶

Plot QC outcomes.

Some plotting routines for QC outcomes

marine_qc.plot_qc_outcomes.latitude_longitude_plot(lat, lon, qc_outcomes, filename=None)[source]¶

Plot a graph of points showing the latitude and longitude of a set of observations coloured according to the QC outcomes.

Parameters:

lat (np.ndarray) – Array of latitude values in degrees.
lon (np.ndarray) – Array of longitude values in degrees.
qc_outcomes (np.ndarray) – Array containing the QC outcomes, with 0 meaning pass and non-zero entries indicating failure.
filename (str or None) – Filename to save the figure to. If None, the figure is saved with a standard name.

Return type:

Figure

Returns:

Figure – The main figure object created by plt.subplots().

marine_qc.plot_qc_outcomes.latitude_variable_plot(lat, value, qc_outcomes, filename=None)[source]¶

Plot a graph of points showing the latitude and value of a set of observations coloured according to the QC outcomes.

Parameters:

lat (np.ndarray) – Array of latitude values in degrees.
value (np.ndarray) – Array of observed values for the variable.
qc_outcomes (np.ndarray) – Array containing the QC outcomes, with 0 meaning pass and non-zero entries indicating failure.
filename (str or None) – Filename to save the figure to. If None, the figure is saved with a standard name.

Return type:

Figure

Returns:

Figure – The main figure object created by plt.subplots().

marine_qc.qc_grouped_reports module¶

QC of grouped reports.

Module containing QC functions for quality control of grouped marine reports.

class marine_qc.qc_grouped_reports.SuperObsGrid[source]¶

Bases: object

Class for gridding data in buddy check, based on numpy arrays.

add_multiple_observations(lat, lon, value, date=None, month=None, day=None)[source]¶

Add a series of observations to the grid and take the grid average.

Parameters:

lat (SequenceNumberType) – 1-dimensional latitude array.
lon (SequenceNumberType) – 1-dimensional longitude array.
value (SequenceNumberType) – 1-dimensional anomaly array.
date (SequenceDatetimeType, optional) – 1-dimensional datetime array.
month (SequenceIntType, optional) – 1-dimensional month array. Used if date is not provided.
day (SequenceIntType, optional) – 1-dimensional day array. Used if date is not provided.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Return type:

Notes

The observations should be anomalies.

add_single_observation(lat, lon, month, day, anom)[source]¶

Add an anomaly to the grid from specified lat lon and date.

Parameters:

lat (float) – Latitude of the observation in degrees.
lon (float) – Longitude of the observation in degrees.
month (int) – Month of the observation.
day (int) – Day of the observation.
anom (float) – Value to be added to the grid.

Return type:

Returns:

None – The function performs its operations in-place and does not return anything.

get_buddy_limits_with_parameters(pentad_stdev, limits, number_of_obs_thresholds, multipliers)[source]¶

Get buddy limits with parameters.

Parameters:

pentad_stdev (Climatology) – Climatology containing the 3-dimensional latitude array containing the standard deviations.
limits (list[list[int]]) – List of the limits.
number_of_obs_thresholds (list[list[int]]) – List containing the number of obs thresholds.
multipliers (list[list[float]]) – List containing the multipliers to be applied.

Return type:

Returns:

None – The function performs its operations in-place and does not return anything.

get_buddy_mean(lat, lon, month, day)[source]¶

Get the buddy mean from the grid for a specified time and place.

Parameters:

lat (float) – Latitude of the location for which the buddy mean is desired.
lon (float) – Longitude of the location for which the buddy mean is desired.
month (int) – Month for which the buddy mean is desired.
day (int) – Day for which the buddy mean is desired.

Return type:

Returns:

float – Buddy mean at the specified location.

get_buddy_stdev(lat, lon, month, day)[source]¶

Get the buddy standard deviation from the grid for a specified time and place.

Parameters:

lat (float) – Latitude of the location for which the buddy standard deviation is desired.
lon (float) – Longitude of the location for which the buddy standard deviation is desired.
month (int) – Month for which the buddy standard deviation is desired.
day (int) – Day for which the buddy standard deviation is desired.

Return type:

Returns:

float – Buddy standard deviation at the specified location.

get_neighbour_anomalies(search_radius, xindex, yindex, pindex)[source]¶

Search within a specified search radius of the given point and extract the neighbours for buddy check.

Parameters:

search_radius (list[int]) – Three element array search radius [lon, lat, time].
xindex (int) – The xindex of the gridcell to start from.
yindex (int) – The yindex of the gridcell to start from.
pindex (int) – The pindex of the gridcell to start from.

Return type:

tuple[list[float], list[float]]

Returns:

tuple of list of float – Anomalies and numbers of observations in two lists.

get_new_buddy_limits(stdev1, stdev2, stdev3, limits, sigma_m, noise_scaling)[source]¶

Get buddy limits for the new Bayesian buddy check.

Parameters:

stdev1 (Climatology) – Field of standard deviations representing standard deviation of difference between target gridcell and complete neighbour average (grid area to neighbourhood difference).
stdev2 (Climatology) – Field of standard deviations representing standard deviation of difference between a single observation and the target gridcell average (point to grid area difference).
stdev3 (Climatology) – Field of standard deviations representing standard deviation of difference between random neighbour gridcell and full neighbour average (uncertainty in neighbour average).
limits (list[int, int, int]) – Three membered list of number of degrees in latitude and longitude and number of pentads.
sigma_m (float) – Estimated measurement error uncertainty.
noise_scaling (float) – Scale noise by a factor of noise_scaling used to match observed variability.

Return type:

Returns:

None – The function performs its operations in-place and does not return anything.

Notes

In previous versions, limits, sigma_m, and noise_scaling defaulted to:

limits = (2, 2, 4)
sigma_m = 1.0
noise_scaling = 3.0

take_average()[source]¶

Take the average of a grid to which reps have been added using add_rep.

Return type:: None
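
The add-then-average workflow of the grid can be pictured with a dictionary-based stand-in. SketchGrid is hypothetical: it takes grid indices directly instead of latitude, longitude, month and day, and it uses plain dictionaries where the real class is described as being based on numpy arrays.

```python
from collections import defaultdict


class SketchGrid:
    """Hypothetical dictionary-based stand-in for a gridded accumulator."""

    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)
        self.means = {}

    def add_single_observation(self, xindex, yindex, pindex, anom):
        # Accumulate the anomaly in its (lon, lat, pentad) cell.
        cell = (xindex, yindex, pindex)
        self.sums[cell] += anom
        self.counts[cell] += 1

    def take_average(self):
        # Convert the per-cell sums into per-cell means.
        self.means = {
            cell: self.sums[cell] / self.counts[cell] for cell in self.sums
        }


grid = SketchGrid()
grid.add_single_observation(10, 20, 5, 0.4)
grid.add_single_observation(10, 20, 5, 0.8)
grid.add_single_observation(11, 20, 5, -1.0)
grid.take_average()
print(grid.means)
```

Keeping sums and counts separate until take_average is called means observations can be added in any order before the means are formed.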

marine_qc.qc_grouped_reports.do_bayesian_buddy_check(lat, lon, date, value, climatology, stdev1, stdev2, stdev3, prior_probability_of_gross_error, quantization_interval, one_sigma_measurement_uncertainty, limits, noise_scaling, maximum_anomaly, fail_probability, ignore_indexes=None)[source]¶

Do the Bayesian buddy check.

The Bayesian buddy check assigns a probability of gross error to each observation, which is rounded down to the tenth and then multiplied by 10 to yield a flag between 0 and 9.

Parameters:

lat (SequenceNumberType) – 1-dimensional latitude array.
lon (SequenceNumberType) – 1-dimensional longitude array.
date (SequenceDatetimeType) – 1-dimensional date array.
value (SequenceNumberType) – 1-dimensional anomaly array.
climatology (ClimArgType) – The climatological average(s) used to calculate anomalies. Can be a scalar, sequence, a one-dimensional NumPy array, a pandas Series, a Climatology, a path-like string on disk, a xarray Dataset or a xarray DataArray.
stdev1 (Climatology) – Field of standard deviations representing standard deviation of difference between target gridcell and complete neighbour average (grid area to neighbourhood difference).
stdev2 (Climatology) – Field of standard deviations representing standard deviation of difference between a single observation and the target gridcell average (point to grid area difference).
stdev3 (Climatology) – Field of standard deviations representing standard deviation of difference between random neighbour gridcell and full neighbour average (uncertainty in neighbour average).
prior_probability_of_gross_error (float) – Prior probability of gross error, which is the background rate of gross errors.
quantization_interval (float) – Smallest possible increment in the input values.
one_sigma_measurement_uncertainty (float) – Estimated one sigma measurement uncertainty.
limits (list[int]) – List with three members which specify the search range for the buddy check.
noise_scaling (float) – Tuning parameter used to multiply stdev2. This was determined to be approximately 3.0 by comparison with observed point data. stdev2 was estimated from OSTIA data and typically underestimates the point to area-average difference by this factor.
maximum_anomaly (float) – Largest absolute anomaly, assumes that the maximum and minimum anomalies have the same magnitude.
fail_probability (float) – Probability of gross error that corresponds to a failed test. Anything with a probability of gross error greater than fail_probability will be considered failing.
ignore_indexes (list[int], optional) – List of row numbers to be skipped.

Return type:

Returns:

SequenceIntType – Same type as input, but with integer values

Returns array/sequence/Series of 2s if there are no buddies in the specified limits.
Returns array/sequence/Series of 1s if the Bayesian buddy check fails.
Returns array/sequence/Series of 0s otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions the default values for the parameters were:

prior_probability_of_gross_error = 0.05
quantization_interval = 0.1
limits = [2, 2, 4]
noise_scaling = 3.0
one_sigma_measurement_uncertainty = 1.0
maximum_anomaly = 8.0
fail_probability = 0.3

marine_qc.qc_grouped_reports.do_mds_buddy_check(lat, lon, date, value, climatology, standard_deviation, limits, number_of_obs_thresholds, multipliers, ignore_indexes=None)[source]¶

Do the old style buddy check.

The buddy check compares an observation to the average of its near neighbours (called the buddy mean). Depending on how many neighbours there are and their proximity to the observation being tested a multiplier is set. If the difference between the observation and the buddy mean is larger than the multiplier times the standard deviation then the observation fails the buddy check. If no buddy observations are found within the specified limits, then the limits are expanded until the check runs out of specified limits or observations are found within the limits.

Parameters:

lat (SequenceNumberType) – 1-dimensional latitude array.
lon (SequenceNumberType) – 1-dimensional longitude array.
date (SequenceDatetimeType) – 1-dimensional date array.
value (SequenceNumberType) – 1-dimensional anomaly array.
climatology (ClimArgType) – The climatological average(s) used to calculate anomalies. Can be a scalar, sequence, a one-dimensional NumPy array, a pandas Series, a Climatology, a path-like string on disk, a xarray Dataset or a xarray DataArray.
standard_deviation (Climatology) – Field of 1x1xpentad standard deviations.
limits (list[list]) – List of lists. Each member is a three-membered list specifying the longitudinal, latitudinal, and time range within which buddies are sought at each level of search.
number_of_obs_thresholds (list[list]) – Number of observations corresponding to each multiplier in multipliers. The initial list should be the same length as the limits list.
multipliers (list[list]) – Multiplier, x, used for buddy check mu +- x * sigma. The list should have the same structure as number_of_obs_thresholds.
ignore_indexes (list[int], optional) – List of row numbers to be skipped.

Return type:

Returns:

SequenceIntType – Same type as input, but with integer values

Returns array/sequence/Series of 1s if the MDS buddy check fails
Returns array/sequence/Series of 0s otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

The limits, number_of_obs_thresholds, and multipliers parameters are rather complex. The buddy check basically looks within a lat-lon-time range specified by the first element in limits. If there are more than zero observations in the search range then a multiplier is chosen based on how many observations there are.

If the first element of limits is [1,1,2] then we first look within a distance equivalent to 1 degree latitude and longitude at the equator and 2 pentads in time. If there are more than zero observations then we calculate the buddy mean, and we consult the corresponding entry of number_of_obs_thresholds. If, for example, this is [0, 5, 15, 100] then we look for the last entry that the number of obs exceeds. We then look up the multiplier at the same position in the appropriate list (say [4, 3.5, 3.0, 2.5]). So, if there were 10 observations then the multiplier would be 3.5. If the difference between an observation and the buddy mean is greater than the multiplier times the standard deviation at that point then it fails the buddy check.

Previous versions had default values for the parameters of:

limits = [[1, 1, 2], [2, 2, 2], [1, 1, 4], [2, 2, 4]]
number_of_obs_thresholds = [[0, 5, 15, 100], [0], [0, 5, 15, 100], [0]]
multipliers = [[4.0, 3.5, 3.0, 2.5], [4.0], [4.0, 3.5, 3.0, 2.5], [4.0]]

marine_qc.qc_grouped_reports.get_threshold_multiplier(total_nobs, nob_limits, multiplier_values)[source]¶

Find the highest value of i such that total_nobs is greater than nob_limits[i] and return multiplier_values[i].

This routine is used by the buddy check. It’s a bit niche.

Parameters:

total_nobs (int) – Total number of neighbour observations.
nob_limits (list[int]) – List containing the limiting numbers of observations in ascending order; the first element must be zero.
multiplier_values (list[float]) – List containing the multiplier values associated with each entry in nob_limits.

Return type:

Returns:

float – The multiplier value.
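
The lookup described above can be sketched in a few lines. This is an illustration only, not the package's implementation: the name `pick_multiplier` is hypothetical, and raising an error when no limit is exceeded is an assumption.

```python
def pick_multiplier(total_nobs, nob_limits, multiplier_values):
    """Return the multiplier for the highest i with total_nobs > nob_limits[i]."""
    if len(nob_limits) != len(multiplier_values):
        raise ValueError("nob_limits and multiplier_values must have the same length")
    chosen = None
    for limit, multiplier in zip(nob_limits, multiplier_values):
        # nob_limits is ascending, so the last match has the highest index.
        if total_nobs > limit:
            chosen = multiplier
    if chosen is None:
        # Assumption: the sketch treats "no limit exceeded" as an error.
        raise ValueError("total_nobs does not exceed any limit")
    return chosen


thresholds = [0, 5, 15, 100]
multipliers = [4.0, 3.5, 3.0, 2.5]
multiplier = pick_multiplier(10, thresholds, multipliers)  # 3.5

# Buddy-check style decision with made-up numbers: fail when the difference
# from the buddy mean exceeds multiplier * standard deviation.
anomaly, buddy_mean, stdev = 2.0, 0.2, 0.5
fails = abs(anomaly - buddy_mean) > multiplier * stdev
```

With 10 neighbours the thresholds 0 and 5 are exceeded but 15 is not, so the second multiplier is selected.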

marine_qc.qc_individual_reports module¶

QC of individual reports.

Module containing main QC functions which could be applied on a DataBundle.

marine_qc.qc_individual_reports.do_climatology_check(value, climatology, maximum_anomaly, standard_deviation='default', standard_deviation_limits=None, lowbar=None)[source]¶

Climatology check to compare a value with a climatological average within specified anomaly limits.

This check supports optional parameters to customize the comparison.

If standard_deviation is provided, the value is converted into a standardised anomaly. Optionally, if standard deviation is outside the range specified by standard_deviation_limits then standard_deviation is set to whichever of the lower or upper limits is closest. If lowbar is provided, the anomaly must be greater than lowbar to fail regardless of standard_deviation.

Parameters:

value (ValueNumberType) – Value(s) to be compared to climatology. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
climatology (ClimArgType) – The climatological average(s) to which the values(s) will be compared. Can be a scalar, sequence, a one-dimensional NumPy array, a pandas Series, a Climatology, a path-like string on disk, a xarray Dataset or a xarray DataArray.
maximum_anomaly (float) – Largest allowed anomaly. If standard_deviation is provided, this is interpreted as the largest allowed standardised anomaly.
standard_deviation (ClimArgType, default: "default") – The standard deviation(s) used to standardise the anomaly. If set to “default”, it is internally treated as 1.0. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, a pandas Series, a Climatology, a path-like string on disk, a xarray Dataset or a xarray DataArray.
standard_deviation_limits (tuple of float, optional) – A tuple of two floats representing the lower and upper limits for standard deviation used in the check.
lowbar (float, optional) – The anomaly must be greater than lowbar to fail regardless of standard deviation.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

Returns 2 (or array/sequence/Series of 2s) if standard_deviation_limits[1] is less than or equal to standard_deviation_limits[0], or if maximum_anomaly is less than or equal to 0, or if any of value, climatology, or standard_deviation is numerically invalid (None or NaN).
Returns 1 (or array/sequence/Series of 1s) if the difference is outside the specified range.
Returns 0 (or array/sequence/Series of 0s) otherwise.

Raises:

TypeError – If decorator inspect_arrays does not return np.ndarrays.

Notes

If either climatology or standard_deviation is a Climatology object, pass lon and lat and date, or month and day, as keyword arguments to extract the relevant climatological value(s).
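
The flag logic described for this check can be illustrated with a scalar sketch. It is not the package's code: the name `climatology_flag` is hypothetical, only plain floats are handled, and the strict `>` comparison, the treatment of `lowbar` as a bound on the raw anomaly, and the handling of a non-positive standard deviation are assumptions.

```python
import math


def climatology_flag(value, climatology, maximum_anomaly,
                     standard_deviation=1.0,
                     standard_deviation_limits=None, lowbar=None):
    """Scalar sketch of the 0/1/2 flag logic described above."""
    inputs = (value, climatology, standard_deviation)
    if any(v is None or math.isnan(v) for v in inputs):
        return 2  # untestable: missing or NaN input
    if maximum_anomaly <= 0:
        return 2
    if standard_deviation_limits is not None:
        lower, upper = standard_deviation_limits
        if upper <= lower:
            return 2
        # Clamp the standard deviation to the nearest limit.
        standard_deviation = min(max(standard_deviation, lower), upper)
    if standard_deviation <= 0:
        return 2  # assumption: cannot standardise with a non-positive value
    anomaly = abs(value - climatology)
    if lowbar is not None and anomaly <= lowbar:
        return 0  # assumption: anomalies at or below lowbar never fail
    return 1 if anomaly / standard_deviation > maximum_anomaly else 0


flag_plain = climatology_flag(25.0, 15.0, 8.0)
flag_clamped = climatology_flag(20.0, 15.0, 3.0, standard_deviation=0.5,
                                standard_deviation_limits=(1.0, 4.0))
```

In the second call the standard deviation of 0.5 is raised to the lower limit of 1.0 before the anomaly of 5.0 is standardised.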

marine_qc.qc_individual_reports.do_date_check(date=None, year=None, month=None, day=None, year_init=None, year_end=None)[source]¶

Perform the date QC check on the report. Checks whether the given date or date components are valid.

Parameters:

date (ValueDatetimeType, optional) – Date(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
year (ValueIntType, optional) – Year(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
month (ValueIntType, optional) – Month(s) of observation (1-12). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
day (ValueIntType, optional) – Day(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
year_init (int, optional) – Initial valid year.
year_end (int, optional) – Last valid year.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

Returns 2 (or array/sequence/Series of 2s) if any of year, month, or day is numerically invalid or None.
Returns 1 (or array/sequence/Series of 1s) if the date is not valid.
Returns 0 (or array/sequence/Series of 0s) otherwise.

Raises:

TypeError – If decorator inspect_arrays does not return np.ndarrays.
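
A scalar sketch of the three return codes follows, using the standard library to validate the calendar date. The name `date_flag` is hypothetical, and treating a year outside `year_init`..`year_end` as a failed (1) rather than untestable (2) date is an assumption.

```python
import math
from datetime import date as Date


def date_flag(year=None, month=None, day=None, year_init=None, year_end=None):
    """Return 2 for missing components, 1 for an invalid date, 0 otherwise."""
    for component in (year, month, day):
        if component is None:
            return 2
        if isinstance(component, float) and math.isnan(component):
            return 2
    # Assumption: years outside the allowed range count as invalid dates.
    if year_init is not None and year < year_init:
        return 1
    if year_end is not None and year > year_end:
        return 1
    try:
        Date(int(year), int(month), int(day))
    except ValueError:
        return 1  # e.g. month 13 or 29 February in a non-leap year
    return 0


leap_ok = date_flag(2024, 2, 29)
leap_bad = date_flag(2023, 2, 29)
```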

marine_qc.qc_individual_reports.do_day_check(date=None, year=None, month=None, day=None, hour=None, lat=None, lon=None, time_since_sun_above_horizon=None)[source]¶

Determine if the sun was above the horizon a specified time before the report.

This “day” test is used to classify Marine Air Temperature (MAT) measurements as either Night MAT (NMAT) or Day MAT, accounting for solar heating biases and a potential lag between sunrise and the onset of significant warming. The function calculates the sun’s elevation using the sunangle function, offset by the specified time_since_sun_above_horizon.

Parameters:

date (ValueDatetimeType, optional) – Date(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
year (ValueIntType, optional) – Year(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
month (ValueIntType, optional) – Month(s) of observation (1-12). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
day (ValueIntType, optional) – Day(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
hour (ValueFloatType, optional) – Hour(s) of observation (minutes as decimal). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
lat (ValueNumberType, optional) – Latitude(s) of observation in degrees. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
lon (ValueNumberType, optional) – Longitude(s) of observation in degrees. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
time_since_sun_above_horizon (float) – Maximum time, in hours, that the sun can have been above (or below) the horizon to still count as night. The original QC test had this set to 1.0, i.e. it was night between one hour after sundown and one hour after sunrise.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

Returns 2 (or array/sequence/Series of 2s) if any of do_position_check, do_date_check, or do_time_check returns 2.
Returns 1 (or array/sequence/Series of 1s) if any of do_position_check, do_date_check, or do_time_check returns 1 or if it is night (sun below horizon an hour ago).
Returns 0 (or array/sequence/Series of 0s) if it is day (sun above horizon an hour ago).

Raises:

TypeError – If decorator inspect_arrays does not return np.ndarrays.

marine_qc.qc_sequential_reports module¶

QC of sequential reports.

Module containing QC functions for track checking which could be applied on a DataBundle.

marine_qc.qc_sequential_reports.do_few_check(value)[source]¶

Check if number of observations is less than 3.

Parameters:

value (SequenceNumberType) – One-dimensional array of values to be analyzed. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

Return type:

Returns:

SequenceIntType – Same type as input, but with integer values

Returns array/sequence/Series of 1s if number of observations is less than 3.
Returns array/sequence/Series of 0s otherwise.

Raises:

ValueError – If the input is not 1-dimensional.
TypeError – If inspect_arrays does not return np.ndarrays.

marine_qc.qc_sequential_reports.do_iquam_track_check(lat, lon, date, speed_limit, delta_d, delta_t, n_neighbours)[source]¶

Perform the IQUAM track check as detailed in Xu and Ignatov 2013.

The track check calculates speeds between pairs of observations and counts how many exceed a threshold speed. The ob with the most violations of this limit is flagged as bad and removed from the calculation. Then the next worst is found and removed until no violations remain.

Parameters:

lat (SequenceNumberType) – One-dimensional latitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
lon (SequenceNumberType) – One-dimensional longitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
date (SequenceDatetimeType) – One-dimensional date array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
speed_limit (float) – Speed limit of platform in kilometers per hour. Typically, 60.0 for ships and 15.0 for drifting buoys.
delta_d (float) – Latitude tolerance in degrees.
delta_t (float) – Time tolerance in hundredths of an hour.
n_neighbours (int) – Number of neighbouring points considered in the analysis.

Return type:

Returns:

SequenceIntType – Same type as input, but with integer values

Returns array/sequence/Series of 1s if the IQUAM QC fails.
Returns array/sequence/Series of 0s otherwise.

Raises:

ValueError – If either input is not 1-dimensional or if their lengths do not match.
TypeError – If inspect_arrays does not return np.ndarrays.

Notes

Previous versions had default values for the parameters of:

speed_limit = 60.0 for ships and 15.0 for drifting buoys
delta_d = 1.11
delta_t = 0.01
n_neighbours = 5
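
The greedy removal loop described above can be sketched as follows. This is a simplified illustration, not the IQUAM algorithm as implemented in the package: it compares every pair of observations, ignores the `delta_d`, `delta_t`, and `n_neighbours` parameters entirely, and assumes a mean Earth radius of 6371 km. The names `distance_km` and `greedy_speed_flags` are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def greedy_speed_flags(lat, lon, date, speed_limit):
    """Flag the worst speed offender repeatedly until no violations remain."""
    flags = [0] * len(lat)
    active = list(range(len(lat)))
    while True:
        violations = {i: 0 for i in active}
        for pos, i in enumerate(active):
            for j in active[pos + 1:]:
                hours = abs((date[j] - date[i]).total_seconds()) / 3600.0
                dist = distance_km(lat[i], lon[i], lat[j], lon[j])
                # Comparing distance with speed * time avoids dividing by zero.
                if dist > speed_limit * hours:
                    violations[i] += 1
                    violations[j] += 1
        if not violations or max(violations.values()) == 0:
            return flags
        worst = max(active, key=lambda i: violations[i])
        flags[worst] = 1
        active.remove(worst)


start = datetime(2020, 1, 1)
dates = [start + timedelta(hours=h) for h in range(4)]
flags = greedy_speed_flags([0.0, 0.0, 0.0, 0.0], [0.0, 0.1, 20.0, 0.3], dates, 60.0)
```

In this made-up track the third report sits about 2200 km from its neighbours, so it violates the 60 km/h limit against every other report and is the only one flagged.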

marine_qc.qc_sequential_reports.do_spike_check(value, lat, lon, date, max_gradient_space, max_gradient_time, delta_t, n_neighbours)[source]¶

Perform IQUAM-like spike check.

Parameters:

value (SequenceNumberType) – One-dimensional array of values to be analyzed. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
lat (SequenceNumberType) – One-dimensional array of latitudes in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
lon (SequenceNumberType) – One-dimensional array of longitudes in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
date (SequenceDatetimeType) – One-dimensional array of datetime values. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
max_gradient_space (float, default: 0.5) – Maximum allowed spatial gradient. The unit is “units of value” per kilometer.
max_gradient_time (float, default: 1.0) – Maximum allowed temporal gradient. The unit is “units of value” per hour.
delta_t (float, default: 2.0) – Temperature delta used in the comparison. Typically set to 2.0 for ships and 1.0 for drifting buoys.
n_neighbours (int, default: 5) – Number of neighboring points considered in the analysis.

Return type:

Returns:

SequenceIntType – Same type as input, but with integer values

Returns array/sequence/Series of 1s if the spike check fails.
Returns array/sequence/Series of 0s otherwise.

Raises:

ValueError – If either input is not 1-dimensional or if their lengths do not match.
TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

max_gradient_space: float = 0.5
max_gradient_time: float = 1.0
delta_t: float = 2.0
n_neighbours: int = 5

marine_qc.qc_sequential_reports.do_track_check(vsi, dsi, lat, lon, date, max_direction_change, max_speed_change, max_absolute_speed, max_midpoint_discrepancy)[source]¶

Perform one pass of the track check.

This is an implementation of the MDS track check code which was originally written in the 1990s. I don’t know why this piece of historic trivia so exercises my mind, but it does: the 1990s! I wish my code would last so long.

Parameters:

vsi (SequenceNumberType) – One-dimensional reported speed array in km/h. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
dsi (SequenceNumberType) – One-dimensional reported heading array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
lat (SequenceNumberType) – One-dimensional latitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
lon (SequenceNumberType) – One-dimensional longitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
date (SequenceDatetimeType) – One-dimensional date array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
max_direction_change (float, default: 60.0) – Maximum valid direction change in degrees.
max_speed_change (float, default: 10.0) – Maximum valid speed change in km/h.
max_absolute_speed (float, default: 40.0) – Maximum valid absolute speed in km/h.
max_midpoint_discrepancy (float, default: 150.0) – Maximum valid midpoint discrepancy in meters.

Return type:

Returns:

SequenceIntType – Same type as input, but with integer values

Returns array/sequence/Series of 1s if the track check fails.
Returns array/sequence/Series of 0s otherwise.

Raises:

ValueError – If either input is not 1-dimensional or if their lengths do not match.
TypeError – If inspect_arrays does not return np.ndarrays.

Notes

If the number of observations is less than three, the track check always passes.

In previous versions, the default values of the parameters were:

max_direction_change = 60.0
max_speed_change = 10.0
max_absolute_speed = 40.0
max_midpoint_discrepancy = 150.0

marine_qc.qc_sequential_reports.find_multiple_rounded_values(value, min_count, threshold)[source]¶

Find instances when more than “threshold” of the observations are whole numbers and set the ‘round’ flag.

Used in the humidity QC where there are times when the values are rounded and this may have caused a bias.

Parameters:

value (SequenceNumberType) – One-dimensional array of values. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
min_count (int, default: 20) – Minimum number of rounded figures that will trigger the test.
threshold (float, default: 0.5) – Minimum fraction of all observations that will trigger the test.

Return type:

Returns:

SequenceIntType – Same type as input, but with integer values

Returns array/sequence/Series of 1s if the value is a whole number.
Returns array/sequence/Series of 0s otherwise.

Raises:

ValueError – If threshold is not between 0.0 and 1.0.
TypeError – If inspect_arrays does not return np.ndarrays.

Notes

Previous versions had default values for the parameters of:

min_count = 20
threshold = 0.5
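
A plain-list sketch of the rounding test is given below. The name `rounded_value_flags` is hypothetical, and the exact comparisons (count at least `min_count`, fraction strictly greater than `threshold`) are assumptions about how the two criteria combine.

```python
def rounded_value_flags(value, min_count=20, threshold=0.5):
    """Flag whole-number values when enough of the series is rounded."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    valid = [v for v in value if v is not None]
    is_whole = [v is not None and float(v).is_integer() for v in value]
    n_whole = sum(is_whole)
    # Assumption: both the count and the fraction criteria must be met.
    triggered = (bool(valid) and n_whole >= min_count
                 and n_whole / len(valid) > threshold)
    return [1 if (triggered and whole) else 0 for whole in is_whole]


flags = rounded_value_flags([10.0, 11.0, 12.0, 10.5], min_count=2)
```

Three of the four values are whole numbers (a fraction of 0.75), so those three are flagged and the fractional value is not.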

marine_qc.qc_sequential_reports.find_repeated_values(value, min_count, threshold)[source]¶

Find cases where more than a given proportion of SSTs have the same value.

This function goes through a voyage and finds any cases where more than a threshold fraction of the observations have the same values for a specified variable.

Parameters:

value (SequenceNumberType) – One-dimensional array of values. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
min_count (int, default: 20) – Minimum number of repeated values that will trigger the test.
threshold (float, default: 0.7) – Smallest fraction of all observations that will trigger the test.

Return type:

Returns:

SequenceIntType – Same type as input, but with integer values

Returns array/sequence/Series of 1s if the value is repeated.
Returns array/sequence/Series of 0s otherwise.

Raises:

ValueError – If threshold is not between 0.0 and 1.0.
TypeError – If inspect_arrays does not return np.ndarrays.

Notes

Previous versions had default values for the parameters of:

min_count = 20
threshold = 0.7

marine_qc.qc_sequential_reports.find_saturated_runs(at, dpt, lat, lon, date, min_time_threshold, shortest_run)[source]¶

Perform checks on persistence of 100% rh while going through the voyage.

While going through the voyage repeated strings of 100 %rh (AT == DPT) are noted. If a string extends beyond 20 reports and two days/48 hrs in time then all values are set to fail the repsat qc flag.

Parameters:

at (SequenceNumberType) – One-dimensional air temperature array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
dpt (SequenceNumberType) – One-dimensional dew point temperature array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
lat (SequenceNumberType) – One-dimensional latitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
lon (SequenceNumberType) – One-dimensional longitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
date (SequenceDatetimeType) – One-dimensional date array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
min_time_threshold (float, default: 48.0) – Minimum time threshold in hours.
shortest_run (int, default: 4) – Shortest number of observations.

Return type:

Returns:

SequenceIntType – Same type as input, but with integer values

Returns array/sequence/Series of 1s if a saturated run is found.
Returns array/sequence/Series of 0s otherwise.

Raises:

ValueError – If either input is not 1-dimensional or if their lengths do not match.
TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

min_time_threshold = 48.0
shortest_run = 4
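
Run detection of this kind can be sketched with a single pass over the reports. The sketch below is not the package's implementation: the name `saturated_run_flags` is hypothetical, positions are ignored, and the `>=` comparisons against `shortest_run` and `min_time_threshold` are assumptions.

```python
from datetime import datetime, timedelta


def saturated_run_flags(at, dpt, date, min_time_threshold=48.0, shortest_run=4):
    """Flag runs where AT == DPT that are long enough in count and duration."""
    n = len(at)
    flags = [0] * n
    start = None
    for i in range(n + 1):  # one extra step closes a run that ends the series
        saturated = i < n and at[i] is not None and at[i] == dpt[i]
        if saturated and start is None:
            start = i
        elif not saturated and start is not None:
            length = i - start
            hours = (date[i - 1] - date[start]).total_seconds() / 3600.0
            if length >= shortest_run and hours >= min_time_threshold:
                for k in range(start, i):
                    flags[k] = 1
            start = None
    return flags


t0 = datetime(2020, 1, 1)
dates = [t0 + timedelta(hours=12 * k) for k in range(6)]
at = [15.0, 15.5, 16.0, 15.0, 14.5, 14.0]
dpt = [15.0, 15.5, 16.0, 15.0, 14.5, 12.0]
flags = saturated_run_flags(at, dpt, dates)
```

The first five reports are saturated and span 48 hours, so they are flagged; the final report breaks the run.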

marine_qc.spherical_geometry module¶

Quality control suite spherical geometry module.

The spherical geometry module is a simple collection of calculations on a sphere. Sourced from https://edwilliams.org/avform147.htm (formerly williams.best.vwh.net/avform.htm).

marine_qc.spherical_geometry.angular_distance(lat1, lon1, lat2, lon2)[source]¶

Calculate the great-circle angular distance between two points on a sphere.

Input latitudes and longitudes should be in degrees. Output distance is returned in radians.

Parameters:

lat1 (SequenceNumberType) – Latitude of the first point in degrees.
lon1 (SequenceNumberType) – Longitude of the first point in degrees.
lat2 (SequenceNumberType) – Latitude of the second point in degrees.
lon2 (SequenceNumberType) – Longitude of the second point in degrees.

Return type:

Returns:

np.ndarray – Angular great-circle distance between the two points in radians. NaN is returned for any invalid input values.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.
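
For reference, a scalar sketch of a great-circle angular distance is shown below. It uses the haversine formula, which is an assumption about the method (the package may use a different but equivalent expression), and the name `angular_distance_rad` is hypothetical.

```python
import math


def angular_distance_rad(lat1, lon1, lat2, lon2):
    """Great-circle angular distance in radians; inputs in degrees."""
    values = (lat1, lon1, lat2, lon2)
    if any(v is None or math.isnan(v) for v in values):
        return math.nan  # mirrors the documented NaN for invalid input
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    # min() guards against rounding pushing the argument just above 1.
    return 2 * math.asin(min(1.0, math.sqrt(a)))


quarter_turn = angular_distance_rad(0.0, 0.0, 0.0, 90.0)
```

Multiplying the result by an Earth radius in kilometres gives a surface distance; a quarter of the equator comes out as pi/2 radians.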

marine_qc.spherical_geometry.course_between_points(lat1, lon1, lat2, lon2)[source]¶

Given two points, find the initial true course at point 1. Inputs and output are in degrees.

Parameters:

lat1 (SequenceNumberType) – Latitude of the first point in degrees.
lon1 (SequenceNumberType) – Longitude of the first point in degrees.
lat2 (SequenceNumberType) – Latitude of the second point in degrees.
lon2 (SequenceNumberType) – Longitude of the second point in degrees.

Return type:

SequenceFloatType

Returns:

SequenceFloatType – Initial true course in degrees at point one along the great circle between point one and point two.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.
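
The initial course can be sketched with the standard bearing formula. The sketch assumes east-positive longitudes and returns a value in [0, 360); it omits any special handling for a start point at a pole, and the name `initial_course_deg` is hypothetical.

```python
import math


def initial_course_deg(lat1, lon1, lat2, lon2):
    """Initial true course at point 1, clockwise from north, in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)  # assumes east-positive longitudes
    y = math.sin(dlon) * math.cos(p2)
    x = (math.cos(p1) * math.sin(p2)
         - math.sin(p1) * math.cos(p2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


due_east = initial_course_deg(0.0, 0.0, 0.0, 10.0)
due_north = initial_course_deg(0.0, 0.0, 10.0, 0.0)
```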

marine_qc.spherical_geometry.intermediate_point(lat1, lon1, lat2, lon2, f)[source]¶

Compute the intermediate point along the great-circle path between two points.

Given two lat, lon points, find the latitude and longitude that are a fraction f of the great-circle distance between them. See https://edwilliams.org/avform147.htm (formerly williams.best.vwh.net/avform.htm#Intermediate).

Parameters:

lat1 (SequenceNumberType) – Latitude of the first point in degrees.
lon1 (SequenceNumberType) – Longitude of the first point in degrees.
lat2 (SequenceNumberType) – Latitude of the second point in degrees.
lon2 (SequenceNumberType) – Longitude of the second point in degrees.
f (float) – Fraction of distance between the two points.

Return type:

tuple[ndarray, ndarray]

Returns:

tuple of (np.ndarray, np.ndarray) – A tuple containing: - Latitude(s) of the intermediate point(s) in degrees. - Longitude(s) of the intermediate point(s) in degrees. The outputs have the same shape as the broadcasted inputs.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.
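
A scalar sketch of the intermediate-point formula from the Aviation Formulary is given below. The name `intermediate_point_deg` is hypothetical, longitudes are assumed east-positive, and coincident points are returned unchanged as an assumption; antipodal points, for which the path is undefined, are not handled.

```python
import math


def intermediate_point_deg(lat1, lon1, lat2, lon2, f):
    """Point a fraction f along the great circle from point 1 to point 2."""
    p1, l1 = math.radians(lat1), math.radians(lon1)
    p2, l2 = math.radians(lat2), math.radians(lon2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    d = 2 * math.asin(min(1.0, math.sqrt(a)))  # angular distance in radians
    if d == 0.0:
        return lat1, lon1  # assumption: coincident points map to themselves
    # Weights of the two end points along the great circle.
    w1 = math.sin((1.0 - f) * d) / math.sin(d)
    w2 = math.sin(f * d) / math.sin(d)
    x = w1 * math.cos(p1) * math.cos(l1) + w2 * math.cos(p2) * math.cos(l2)
    y = w1 * math.cos(p1) * math.sin(l1) + w2 * math.cos(p2) * math.sin(l2)
    z = w1 * math.sin(p1) + w2 * math.sin(p2)
    lat = math.atan2(z, math.hypot(x, y))
    lon = math.atan2(y, x)
    return math.degrees(lat), math.degrees(lon)


mid_lat, mid_lon = intermediate_point_deg(0.0, 0.0, 0.0, 90.0, 0.5)
```

Halfway along a quarter of the equator the sketch returns a point at latitude 0 and longitude 45 degrees, up to rounding.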

marine_qc.spherical_geometry.lat_lon_from_course_and_distance(lat1, lon1, tc, d)[source]¶

Calculate latitude and longitude given a starting point, true course and distance.

Uses spherical trigonometry formulas from https://edwilliams.org/avform147.htm to compute the endpoint given a starting latitude and longitude, a true course (bearing), and a distance travelled along a great-circle path.

Parameters:

lat1 (SequenceNumberType) – Latitude of the first point in degrees.
lon1 (SequenceNumberType) – Longitude of the first point in degrees.
tc (float) – True course measured clockwise from north in degrees.
d (float) – Distance travelled in kilometres.

Return type:

tuple[ndarray, ndarray]

Returns:

tuple of (SequenceFloatType, SequenceFloatType) – A tuple containing: - Latitude(s) of the end point(s) in degrees. - Longitude(s) of the end point(s) in degrees. The outputs have the same shape as the broadcasted inputs.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.
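
The endpoint calculation can be sketched for scalar inputs as follows. The Earth radius of 6371 km, east-positive longitudes, and the normalisation of the output longitude to [-180, 180) are all assumptions, and the name `destination_deg` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def destination_deg(lat1, lon1, tc, d):
    """End point after travelling d km from (lat1, lon1) on true course tc."""
    p1 = math.radians(lat1)
    course = math.radians(tc)
    ang = d / EARTH_RADIUS_KM  # distance as an angle in radians
    sin_lat = (math.sin(p1) * math.cos(ang)
               + math.cos(p1) * math.sin(ang) * math.cos(course))
    lat = math.asin(max(-1.0, min(1.0, sin_lat)))
    dlon = math.atan2(math.sin(course) * math.sin(ang) * math.cos(p1),
                      math.cos(ang) - math.sin(p1) * math.sin(lat))
    # Assumption: wrap the longitude into [-180, 180).
    lon = (lon1 + math.degrees(dlon) + 180.0) % 360.0 - 180.0
    return math.degrees(lat), lon


ten_degrees_km = EARTH_RADIUS_KM * math.radians(10.0)
east_lat, east_lon = destination_deg(0.0, 0.0, 90.0, ten_degrees_km)
north_lat, north_lon = destination_deg(0.0, 0.0, 0.0, ten_degrees_km)
```

Travelling the arc length of ten degrees due east along the equator ends at longitude 10; travelling it due north ends at latitude 10.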

marine_qc.spherical_geometry.sphere_distance(lat1, lon1, lat2, lon2)[source]¶

Calculate the great-circle distance between two points on a sphere.

Input latitudes and longitudes should be in degrees. Output distance is returned in kilometres.

The great-circle distance is the shortest distance between any two points on the Earth's surface. The calculation is done by first calculating the angular distance between the points and then multiplying that by the radius of the Earth. The angular distance calculation is handled by another function.

Parameters:

lat1 (SequenceNumberType) – Latitude of the first point in degrees.
lon1 (SequenceNumberType) – Longitude of the first point in degrees.
lat2 (SequenceNumberType) – Latitude of the second point in degrees.
lon2 (SequenceNumberType) – Longitude of the second point in degrees.

Return type:

Returns:

np.ndarray – Great-circle distance between the two points in kilometres.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

marine_qc.statistics module¶

Some generally helpful statistical functions for base QC.

marine_qc.statistics.missing_mean(inarr)[source]¶

Return mean of input array.

Parameters:: inarr (list of float) – List of values for which mean is required. Missing values represented by None in list.
Return type:: float | None
Returns:: float or None – Mean of non-missing values or None.

marine_qc.statistics.p_data_given_good(x, q, r_hi, r_lo, mu, sigma)[source]¶

Probability of an observed value assuming it comes from a “good” measurement.

Calculate the probability of an observed value x given a normal distribution with mean mu and standard deviation sigma, where x is constrained to fall between r_lo and r_hi and is known only to an integer multiple of q, the quantization level.

Parameters:

x (float) – Observed value for which probability is required.
q (float) – Quantization of x, i.e. x is an integer multiple of Q.
r_hi (float) – The upper limit on x imposed by previous QC choices.
r_lo (float) – The lower limit on x imposed by previous QC choices.
mu (float) – The mean of the distribution.
sigma (float) – The standard deviation of the distribution.

Return type:

Returns:

float – Probability of the observed value given the specified distribution.

Raises:

ValueError – When inputs are incorrectly specified: q<=0, sigma<=0, r_lo > r_hi, x < r_lo or x > r_hi.

marine_qc.statistics.p_data_given_gross(q, r_hi, r_lo)[source]¶

Probability of an observed value assuming it is a gross error.

Calculate the probability of the data given a gross error, assuming gross errors are uniformly distributed between r_lo and r_hi and that the quantization (rounding) level is q.

Parameters:

q (float) – Quantization of x, i.e. x is an integer multiple of Q.
r_hi (float) – The upper limit on x imposed by previous QC choices.
r_lo (float) – The lower limit on x imposed by previous QC choices.

Return type:

Returns:

float – Probability of the observed value given that it is a gross error.

Raises:

ValueError – When limits are not ascending or q<=0.

marine_qc.statistics.p_gross(p0, q, r_hi, r_lo, x, mu, sigma)[source]¶

Posterior probability that an observation is a gross error.

Calculate the posterior probability of a gross error given the prior probability p0, the quantization level of the observed value, q, previous limits on the observed value, r_hi and r_lo, the observed value, x, and the mean (mu) and standard deviation (sigma) of the distribution of good observations assuming they are normally distributed. Gross errors are assumed to be uniformly distributed between r_lo and r_hi.

Parameters:

p0 (float) – Prior probability of gross error.
q (float) – Quantization of x, i.e. x is an integer multiple of Q.
r_hi (float) – The upper limit on x imposed by previous QC choices.
r_lo (float) – The lower limit on x imposed by previous QC choices.
x (float) – Observed value for which probability is required.
mu (float) – The mean of the distribution of good obs.
sigma (float) – The standard deviation of the distribution of good obs.

Return type:

Returns:

float – Probability of gross error given an observed value.

Raises:

ValueError – When inputs are incorrectly specified: p0 < 0, p0 > 1, q <= 0, r_hi < r_lo, x < r_lo, x > r_hi, sigma <= 0.
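
The Bayesian update behind this posterior can be sketched with the standard library. The three function names below are hypothetical, and the exact forms of the two likelihoods (a normal probability mass over a bin of width q, renormalised to the allowed range, and a uniform mass over the quantized values in the range) are assumptions consistent with the descriptions in this module, not a copy of the package's code.

```python
import math


def _normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_data_given_good(x, q, r_hi, r_lo, mu, sigma):
    """Normal probability of the bin [x - q/2, x + q/2], truncated to the range."""
    upper = _normal_cdf((x + 0.5 * q - mu) / sigma)
    lower = _normal_cdf((x - 0.5 * q - mu) / sigma)
    norm = (_normal_cdf((r_hi + 0.5 * q - mu) / sigma)
            - _normal_cdf((r_lo - 0.5 * q - mu) / sigma))
    return (upper - lower) / norm


def prob_data_given_gross(q, r_hi, r_lo):
    """Uniform probability over the quantized values between r_lo and r_hi."""
    return 1.0 / (1.0 + (r_hi - r_lo) / q)


def posterior_gross(p0, q, r_hi, r_lo, x, mu, sigma):
    """Bayes' rule: P(gross | x) from the prior p0 and the two likelihoods."""
    if not 0.0 <= p0 <= 1.0 or q <= 0 or sigma <= 0:
        raise ValueError("invalid p0, q, or sigma")
    if r_hi < r_lo or x < r_lo or x > r_hi:
        raise ValueError("invalid limits or x outside limits")
    gross = p0 * prob_data_given_gross(q, r_hi, r_lo)
    good = (1.0 - p0) * prob_data_given_good(x, q, r_hi, r_lo, mu, sigma)
    return gross / (gross + good)


near_mean = posterior_gross(0.05, 0.1, 8.0, -8.0, 0.0, 0.0, 1.0)
far_tail = posterior_gross(0.05, 0.1, 8.0, -8.0, 7.9, 0.0, 1.0)
```

An observation at the mean lowers the probability of gross error below the 5% prior, while one nearly eight standard deviations away pushes it close to 1.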

marine_qc.statistics.trim_mean(inarr, trim)[source]¶

Calculate a resistant (aka robust) mean of an input array given a trimming criterion.

Parameters:

inarr (array-like of float, shape (n,)) – 1-dimensional value array.
trim (int) – Trimming criteria. A value of 10 trims one tenth of the values off each end of the sorted array before calculating the mean.

Return type:

Returns:

float – Trimmed mean.
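
A minimal sketch of trimming follows, assuming that `trim` means "drop len(inarr) // trim values from each end of the sorted array" and that a `trim` of 0 means no trimming. Both readings are assumptions, and the name `trimmed_mean` is hypothetical.

```python
def trimmed_mean(inarr, trim):
    """Mean of the sorted values after dropping n // trim from each end."""
    values = sorted(inarr)
    n = len(values)
    k = n // trim if trim > 0 else 0  # assumption: trim == 0 keeps everything
    kept = values[k:n - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)


result = trimmed_mean([100.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 10)
```

With ten values and a `trim` of 10, one value is dropped from each end, so the outlier of 100 no longer affects the mean of 5.5.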

marine_qc.statistics.trim_std(inarr, trim)[source]¶

Calculate a resistant (aka robust) standard deviation of an input array given a trimming criterion.

Parameters:

inarr (array-like of float, shape (n,)) – 1-dimensional value array.
trim (int) – Trimming criteria. A value of 10 trims one tenth of the values off each end of the sorted array before calculating the standard deviation.

Return type:

Returns:

float – Trimmed standard deviation.

marine_qc.statistics.winsorised_mean(inarr)[source]¶

Compute the 25% winsorised mean of the input array.

The winsorised mean is a resistant way of calculating an average.

Parameters:: inarr (list of float) – Input array to be averaged.
Return type:: float
Returns:: float – The winsorised mean of the input array with a 25% trimming.
Raises:: ValueError – if length of inarr is equal to 0.

Notes

The winsorised mean is that which you get if you set the first quarter of the sorted input array to the 1st quartile value and the last quarter to the 3rd quartile and then take the mean. This is quite a heavy trimming of the distribution. It makes it very resistant - about half the obs can be egregiously bad without affecting the mean strongly - but it will be less accurate if there are lots of observations, or the quality of the obs is higher.
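
The procedure in the note can be sketched directly. How the quartile positions are chosen for lengths that are not a multiple of four is an assumption here (integer division), and the name `winsorised_mean_25` is hypothetical.

```python
def winsorised_mean_25(inarr):
    """Replace the lowest and highest quarters with their neighbours, then average."""
    values = sorted(inarr)
    n = len(values)
    if n == 0:
        raise ValueError("input must not be empty")
    k = n // 4  # assumption: quarter size by integer division
    if k > 0:
        values[:k] = [values[k]] * k               # lowest quarter
        values[n - k:] = [values[n - k - 1]] * k   # highest quarter
    return sum(values) / n


result = winsorised_mean_25([0.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 1000.0])
```

The two lowest values become 11 and the two highest become 14, giving a mean of 12.5, whereas the plain mean of about 134 is dominated by the outlier.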

marine_qc.time_control module¶

Some generally helpful time control functions for base QC.

marine_qc.time_control.convert_date(params)[source]¶

Decorator to extract date components and inject them as function parameters.

This decorator intercepts the ‘date’ argument from the function call, splits it into its components (e.g., year, month, day), and assigns those components to specified parameters in the wrapped function. It supports scalar or sequence inputs for ‘date’.

Parameters:: params (list of str) – List of parameter names corresponding to date components to be extracted and passed to the decorated function.
Return type:: Callable[..., Any]
Returns:: Callable[..., Any] – A decorator that wraps a function, extracting date components before calling it.

Notes

The decorator expects the wrapped function to accept the parameters listed in params. If a parameter is missing, it raises a ValueError.
If the ‘date’ argument is None, the original function is called without modification.
Supports scalar-like ‘date’ values as well as iterable sequences.
Assumes a helper function split_date exists that splits a date into components and returns a dictionary mapping parameter names to their values.
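The behaviour in these notes can be illustrated with a simplified decorator. Everything here is a hypothetical sketch: convert_date_sketch and split_date_sketch are stand-ins, only scalar keyword `date` values are handled, the fractional hour is an assumption, and raising ValueError at decoration time (rather than at call time) is also an assumption.

```python
import functools
import inspect
from datetime import datetime


def split_date_sketch(date):
    """Stand-in for split_date: map component names to values (assumed layout)."""
    hour = date.hour + date.minute / 60.0 + date.second / 3600.0
    return {"year": date.year, "month": date.month, "day": date.day, "hour": hour}


def convert_date_sketch(params):
    """Inject the components named in `params` from a `date` keyword argument."""
    def decorator(func):
        accepted = inspect.signature(func).parameters
        for name in params:
            if name not in accepted:
                raise ValueError(f"{func.__name__} has no parameter {name!r}")

        @functools.wraps(func)
        def wrapper(*args, date=None, **kwargs):
            if date is None:
                # No date supplied: call the function unchanged
                return func(*args, **kwargs)
            parts = split_date_sketch(date)
            for name in params:
                kwargs[name] = parts[name]
            return func(*args, **kwargs)

        return wrapper
    return decorator


@convert_date_sketch(["year", "month", "day"])
def describe(year=None, month=None, day=None):
    return f"{year:04d}-{month:02d}-{day:02d}"


from_date = describe(date=datetime(2020, 3, 7))
from_parts = describe(year=1999, month=1, day=2)
```

The wrapped function can then be called either with a single `date` or with the individual components.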

marine_qc.time_control.convert_date_to_hours(dates)[source]¶

Convert an array of datetimes to an array of hours since the first element.

Parameters:: dates (array-like of datetime, shape (n,)) – 1-dimensional date array.
Return type:: Sequence[float]
Returns:: array-like of float, shape (n,) – 1-dimensional array containing hours since the first element in the array.
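As an illustration of the conversion, the sketch below uses only the standard library. The name hours_since_first_sketch is hypothetical, and it returns a plain list rather than whatever array type the package returns.

```python
from datetime import datetime


def hours_since_first_sketch(dates):
    """Hours elapsed between each datetime and the first one (hypothetical helper)."""
    first = dates[0]
    return [(d - first).total_seconds() / 3600.0 for d in dates]


hours = hours_since_first_sketch(
    [datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 1, 6, 30), datetime(2020, 1, 2, 0, 0)]
)
```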

marine_qc.time_control.convert_time_in_hours(hour, minute, sec, zone, daylight_savings_time)[source]¶

Convert integer hour, minute, and second to time in decimal hours.

Parameters:

hour (int) – Hour.
minute (int) – Minute.
sec (int) – Second.
zone (int or float) – Correction for timezone.
daylight_savings_time (float) – Set to 1 if daylight savings time is in effect else set to 0.

Return type:

Returns:

float – Time converted to decimal hour in day.

marine_qc.time_control.day_in_year(year=None, month=1, day=1)[source]¶

Get the day in year from 1 to 365 or 366.

Parameters:

year (int, optional, default: None) – Year to be tested. If none, set year to default leap year.
month (int, default: 1) – Month to be tested.
day (int, default: 1) – Day to be tested.

Return type:

Returns:

int – Day in year. If year is not specified then the year is treated as a non-leap year and 29 February returns the same value as 1 March.

marine_qc.time_control.day_in_year_array(month, day)[source]¶

Get the day in year from 1 to 365. Leap years are dealt with by allowing Feb 29 and Mar 1 to be the same day.

Parameters:

month (1D np.ndarray) – Array of months.
day (1D np.ndarray) – Array of days.

Return type:

Returns:

np.ndarray – Array of day number from 1-365.

marine_qc.time_control.get_month_lengths(year)[source]¶

Return a list holding the lengths of the months in a given year.

Parameters:: year (int) – Year for which you want month lengths.
Return type:: list[int]
Returns:: list of int – List of month lengths.
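One way to obtain such a list with the standard library is shown below. This is a sketch of the idea only; month_lengths_sketch is a hypothetical name and the package may compute the lengths differently.

```python
import calendar


def month_lengths_sketch(year):
    """Lengths of the twelve months of `year`, January first (hypothetical helper)."""
    # calendar.monthrange returns (weekday of first day, number of days in month)
    return [calendar.monthrange(year, month)[1] for month in range(1, 13)]


leap = month_lengths_sketch(2020)
common = month_lengths_sketch(2021)
```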

marine_qc.time_control.jul_day(year, month, day)[source]¶

Routine to calculate the Julian day. This is the weird astronomical thing which counts from 1 Jan 4713 BC.

Parameters:

year (int) – Year.
month (int) – Month.
day (int) – Day.

Return type:

Returns:

int – Julian day.

Notes

This is one of those routines that looks baffling but works. No one is sure exactly how. It gets written once and then remains untouched for centuries, mysteriously working.
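For readers who want to see what such a routine can look like, here is a widely used integer formula for the Julian day number of a Gregorian calendar date. It is offered as an assumption about the general approach, not as the code used by jul_day; the name julian_day_number_sketch is hypothetical.

```python
def julian_day_number_sketch(year, month, day):
    """Julian day number for a Gregorian calendar date (hypothetical helper)."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a             # shift the year so the count starts in March
    m = month + 12 * a - 3          # March = 0, ..., February = 11
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


jdn_2000 = julian_day_number_sketch(2000, 1, 1)
```

Shifting the start of the year to March puts the leap day at the end of the counting year, which is why the formula looks odd but needs no special cases.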

marine_qc.time_control.leap_year(years_since_1980)[source]¶

Check if input year is a leap year.

Parameters:: years_since_1980 (int) – Number of years since 1980.
Return type:: int
Returns:: int – 1 if it is a leap year, 0 otherwise.

marine_qc.time_control.leap_year_correction(time_in_hours, day, years_since_1980)[source]¶

Make leap year correction.

Parameters:

time_in_hours (float) – Time in hours.
day (int) – Day number.
years_since_1980 (int) – Years since 1980.

Return type:

Returns:

float – Leap year corrected time.

marine_qc.time_control.pentad_to_month_day(p)[source]¶

Given a pentad number, return the month and day of the first day in the pentad.

Parameters:: p (int) – Pentad number from 1 to 73.
Return type:: tuple[int, int]
Returns:: tuple of int – A tuple of two ints representing month and day of the first day of the pentad.

marine_qc.time_control.relative_year_number(year, reference=1979)[source]¶

Get the number of years relative to a reference year (1979 by default).

Parameters:

year (int) – Year.
reference (int, default: 1979) – Reference year.

Return type:

Returns:

int – Number of years relative to the reference year.

marine_qc.time_control.split_date(date)[source]¶

Split datetime date into year, month, day and hour.

Parameters:: date (datetime) – Date to split.
Return type:: dict[str, float]
Returns:: dict – Dictionary containing year, month, day and hour.

marine_qc.time_control.time_difference(times1, times2)[source]¶

Convert two arrays of datetimes to the difference in hours.

Parameters:

times1 (SequenceDatetimeType) – 1-dimensional array of reference time points.
times2 (SequenceDatetimeType) – 1-dimensional array of time points to compare against times1.

Return type:

Returns:

array-like of float, shape (n,) – 1-dimensional array containing the time difference in hours computed as times2 - times1.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

marine_qc.time_control.time_in_whole_days(time_in_hours, day, years_since_1980, leap)[source]¶

Convert time in hours to time in whole days.

Parameters:

time_in_hours (int) – Time in hours.
day (int) – Day number.
years_since_1980 (int) – Number of years since 1980.
leap (int) – Set to 1 for a leap year, else set to 0.

Return type:

Returns:

float – Time in whole days.

marine_qc.time_control.valid_month_day(year=None, month=1, day=1)[source]¶

Return True if month and day combination are allowed, False otherwise. Assumes that Feb 29th is valid.

Parameters:

year (int, optional, default: None) – Year to be tested. If none, set year to default leap year.
month (int, default: 1) – Month to be tested.
day (int, default: 1) – Day to be tested.

Return type:

Returns:

bool – True if month and day (or year month and day) are a valid combination (e.g. 12th March) and False if not (e.g. 30th February).

Notes

Assumes that February 29th is a valid date.

marine_qc.time_control.which_pentad(month, day)[source]¶

Take month and day as inputs and return pentad in range 1-73.

Parameters:

month (int) – Month containing the day for which we want to calculate the pentad.
day (int) – Day of the month for which we want to calculate the pentad.

Return type:

Returns:

int – Pentad (5-day period) containing input day, from 1 (1 Jan-5 Jan) to 73 (27-31 Dec).

Raises:

ValueError – If month not in range 1-12 or day not in range 1-31.

Notes

The calculation is rather simple. It just loops through the year and adds up days till it reaches the day we are interested in. February 29th is treated as though it were March 1st in a regular year.
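The pentad arithmetic can be sketched as follows. Both function names are hypothetical. Instead of looping through the year day by day as the note describes, this version uses a closed-form day-of-year calculation on a fixed 365-day calendar, which is a swapped-in technique that should give the same pentads; it does not reject impossible dates such as 30 February.

```python
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # non-leap year


def which_pentad_sketch(month, day):
    """Pentad (1-73) containing the given month and day (hypothetical helper)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in range 1-12")
    if not 1 <= day <= 31:
        raise ValueError("day must be in range 1-31")
    if month == 2 and day == 29:
        month, day = 3, 1  # 29 February is treated as 1 March
    day_of_year = sum(MONTH_LENGTHS[:month - 1]) + day
    return (day_of_year - 1) // 5 + 1


def pentad_to_month_day_sketch(p):
    """Month and day of the first day of pentad `p` (hypothetical helper)."""
    if not 1 <= p <= 73:
        raise ValueError("pentad must be in range 1-73")
    day_of_year = (p - 1) * 5 + 1
    for month, length in enumerate(MONTH_LENGTHS, start=1):
        if day_of_year <= length:
            return month, day_of_year
        day_of_year -= length
    raise ValueError("pentad out of range")


last = which_pentad_sketch(12, 31)
first_day_of_last = pentad_to_month_day_sketch(73)
```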

marine_qc.time_control.which_pentad_array(month, day)[source]¶

Take month and day arrays as inputs and return array of pentads in range 1-73.

Parameters:

month (ndarray) – Month containing the day for which we want to calculate the pentad.
day (ndarray) – Day of the month for which we want to calculate the pentad.

Return type:

Returns:

ndarray – Pentad (5-day period) containing input day, from 1 (1 Jan-5 Jan) to 73 (27-31 Dec).

marine_qc.track_check_utils module¶

The New Track Check QC module provides the functions needed to perform the track check.

The main routine is mds_full_track_check, which takes a list of MarineReport objects from a single ship and runs the track check on them. This is an update of the MDS system track check in that it assumes the Earth is a sphere. In practice, it gives similar results to the cylindrical earth formerly assumed.

marine_qc.track_check_utils.backward_discrepancy(lat, lon, date, vsi, dsi)[source]¶

Calculate the distance between the projected position and the actual position.

The projected position is based on the reported speed and heading at the current and previous time steps. The calculation proceeds from the final, later observation to the first (in contrast to distr1, which runs in time order).

This takes the speed and direction reported by the ship and projects it forwards half a time step. It then projects it forwards another half time step using the speed and direction for the next report, to which the projected location is then compared. The distances between the projected and actual locations are returned.

Parameters:

lat (SequenceNumberType) – One-dimensional latitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
lon (SequenceNumberType) – One-dimensional longitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
date (SequenceDatetimeType) – One-dimensional date array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
vsi (SequenceNumberType) – One-dimensional reported speed array in km/h. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
dsi (SequenceNumberType) – One-dimensional reported heading array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

Return type:

SequenceFloatType

Returns:

SequenceFloatType – Same type as input, but with float values, shape (n,)

One-dimensional array, sequence, or pandas Series containing distances from estimated positions.

Raises:

ValueError – If any input is not 1-dimensional or if their lengths do not match.
TypeError – If decorator inspect_arrays does not return np.ndarrays.

marine_qc.track_check_utils.calculate_course_parameters(lat_later, lat_earlier, lon_later, lon_earlier, date_later, date_earlier)[source]¶

Calculate course parameters.

Parameters:

lat_later (float) – Latitude in degrees of later timestamp.
lat_earlier (float) – Latitude in degrees of earlier timestamp.
lon_later (float) – Longitude in degrees of later timestamp.
lon_earlier (float) – Longitude in degrees of earlier timestamp.
date_later (datetime) – Date of later timestamp.
date_earlier (datetime) – Date of earlier timestamp.

Return type:

tuple[float, float, float, float]

Returns:

tuple of float – A tuple of four floats representing the speed, distance, course and time difference.
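A sketch of how these four values could be derived on a spherical Earth. The function name course_parameters_sketch, the Earth radius of 6371 km, the haversine distance and the initial-bearing formula are all assumptions for illustration; the package's own spherical geometry routines may differ in detail.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def course_parameters_sketch(lat_later, lat_earlier, lon_later, lon_earlier,
                             date_later, date_earlier):
    """Return (speed km/h, distance km, course degrees, time difference hours)."""
    phi1, phi2 = math.radians(lat_earlier), math.radians(lat_later)
    dlon = math.radians(lon_later - lon_earlier)
    # Haversine great-circle distance
    a = (math.sin((phi2 - phi1) / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2.0) ** 2)
    distance = 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
    # Initial bearing from the earlier to the later position, in 0-360 degrees
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    course = math.degrees(math.atan2(y, x)) % 360.0
    hours = (date_later - date_earlier).total_seconds() / 3600.0
    # Assumption: speed is undefined (NaN) when no time has elapsed
    speed = distance / hours if hours > 0 else float("nan")
    return speed, distance, course, hours


speed, distance, course, hours = course_parameters_sketch(
    0.0, 0.0, 1.0, 0.0, datetime(2020, 1, 1, 1), datetime(2020, 1, 1, 0)
)
```

In this usage example the ship moves one degree east along the equator in one hour, so the course is due east and the speed equals the distance.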

marine_qc.track_check_utils.calculate_midpoint(lat, lon, timediff)[source]¶

Interpolate between alternate reports and compare the interpolated location to the actual location.

E.g. take difference between reports 2 and 4 and interpolate to get an estimate for the position at the time of report 3. Then compare the estimated and actual positions at the time of report 3.

The calculation linearly interpolates the latitudes and longitudes (allowing for wrapping around the dateline and so on).

Parameters:

lat (SequenceNumberType) – One-dimensional latitude array in degrees.
lon (SequenceNumberType) – One-dimensional longitude array in degrees.
timediff (SequenceNumberType) – One-dimensional time difference array.

Return type:

Returns:

1D np.ndarray of float – One-dimensional array of distances from estimated positions in kilometers.

Raises:

ValueError – If any input is not 1-dimensional or if their lengths do not match.

marine_qc.track_check_utils.calculate_speed_course_distance_time_difference(lat, lon, date, alternating=False)[source]¶

Calculate speeds, courses, distances and time differences using consecutive reports.

Parameters:

lat (SequenceNumberType) – One-dimensional latitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
lon (SequenceNumberType) – One-dimensional longitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
date (SequenceDatetimeType) – One-dimensional date array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
alternating (bool, default: False) – Whether to use alternating reports for calculation.

Return type:

tuple[ndarray, ndarray, ndarray, ndarray]

Returns:

tuple of np.ndarray, each with float values, shape (n,) – A tuple containing four one-dimensional arrays representing: speed, distance, course, and time difference.

marine_qc.track_check_utils.check_distance_from_estimate(vsi, time_differences, fwd_diff_from_estimated, rev_diff_from_estimated, vsi_previous=None)[source]¶

Check that distances from estimated positions are less than calculated distance.

The estimated positions are calculated forward and backwards in time. The calculated distance is the time difference multiplied by the average reported speeds.

Parameters:

vsi (SequenceNumberType) – Reported speed in km/h at current time step.
time_differences (SequenceNumberType) – Calculated time differences between reports in hours.
fwd_diff_from_estimated (SequenceNumberType) – Distance in km from estimated position, estimates made forward in time.
rev_diff_from_estimated (SequenceNumberType) – Distance in km from estimated position, estimates made backward in time.
vsi_previous (SequenceNumberType, optional) – One-dimensional array of reported speed in km/h at previous time step. If None, get vsi_previous from vsi.

Return type:

Returns:

np.ndarray – Returned array elements set to 10 if estimated and reported positions differ by more than the reported speed multiplied by the calculated time difference, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

marine_qc.track_check_utils.direction_continuity(dsi, directions, dsi_previous=None, max_direction_change=60.0)[source]¶

Check if reported and calculated directions are within the allowed change.

This function compares the heading at the previous time step with the calculated ship direction from reported positions, flagging differences that exceed the maximum allowed direction change.

Parameters:

dsi (SequenceNumberType) – Heading at current time step in degrees.
directions (SequenceNumberType) – Calculated ship direction from reported positions in degrees.
dsi_previous (SequenceNumberType, optional) – Heading at previous time step in degrees. If None, get dsi_previous from dsi.
max_direction_change (float) – Largest deviations that will not be flagged in degrees.

Return type:

Returns:

np.ndarray – Returned array elements are 10.0 if the difference between reported and calculated direction is greater than the max_direction_change (default, 60 degrees), 0.0 otherwise.
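The core of this check is an angular difference that wraps correctly at 360 degrees. The sketch below shows that idea for a single pair of direction arrays; direction_flags_sketch is a hypothetical name, and the real function also takes the heading at the previous time step into account, which is omitted here.

```python
def direction_flags_sketch(reported, calculated, max_direction_change=60.0):
    """Flag (10.0) each pair of directions that differ by more than the limit."""
    flags = []
    for r, c in zip(reported, calculated):
        diff = abs(r - c) % 360.0
        # Take the shorter way round the compass, e.g. 350 vs 10 is 20 degrees
        diff = min(diff, 360.0 - diff)
        flags.append(10.0 if diff > max_direction_change else 0.0)
    return flags


flags = direction_flags_sketch([350.0, 90.0, 10.0], [10.0, 200.0, 80.0])
```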

marine_qc.track_check_utils.forward_discrepancy(lat, lon, date, vsi, dsi)[source]¶

Calculate the distance between the projected position and the actual position.

The projected position is based on the reported speed and heading at the current and previous time steps. The observations are taken in time order.

This takes the speed and direction reported by the ship and projects it forwards half a time step. It then projects it forwards another half time step using the speed and direction for the next report, to which the projected location is then compared. The distances between the projected and actual locations are returned.

Parameters:

lat (SequenceNumberType) – One-dimensional latitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
lon (SequenceNumberType) – One-dimensional longitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
date (SequenceDatetimeType) – One-dimensional date array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
vsi (SequenceNumberType) – One-dimensional reported speed array in km/h. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.
dsi (SequenceNumberType) – One-dimensional reported heading array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

Return type:

SequenceFloatType

Returns:

SequenceFloatType – Same type as input, but with float values, shape (n,)

One-dimensional array, sequence, or pandas Series containing distances from estimated positions.

Raises:

ValueError – If any input is not 1-dimensional or if their lengths do not match.
TypeError – If decorator inspect_arrays does not return np.ndarrays.

marine_qc.track_check_utils.increment_position(alat1, alon1, avs, ads, timediff)[source]¶

Compute latitude and longitude increments over half a time interval.

This function takes latitudes and longitude, a speed, a direction and a time difference and returns increments of latitude and longitude which correspond to half the time difference.

Parameters:

alat1 (SequenceNumberType) – One-dimensional array of latitude at starting point in degrees.
alon1 (SequenceNumberType) – One-dimensional array of longitude at starting point in degrees.
avs (SequenceNumberType) – One-dimensional array of speed of ship in km/h.
ads (SequenceNumberType) – One-dimensional array of heading of ship in degrees.
timediff (SequenceNumberType) – One-dimensional array of time difference between the points in hours.

Return type:

tuple[ndarray, ndarray]

Returns:

tuple of 1D np.ndarray of float – Latitude and longitude increments, or (None, None) if timediff is None.
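To make the half-time-step idea concrete, here is a scalar sketch using a small-step approximation. It is not the package's formula: increment_position_sketch is a hypothetical name, the 111.195 km per degree figure assumes a 6371 km sphere, and the approximation breaks down near the poles where the cosine of latitude approaches zero.

```python
import math

KM_PER_DEGREE = 111.195  # assumed: one degree of arc on a 6371 km sphere


def increment_position_sketch(lat, lon, speed_kmh, heading_deg, timediff_h):
    """Latitude and longitude increments covering half of `timediff_h`.

    `lon` is unused in this approximation; it is kept to mirror the documented
    argument order.
    """
    if timediff_h is None:
        return None, None
    distance = speed_kmh * timediff_h / 2.0  # km travelled in half the interval
    heading = math.radians(heading_deg)
    dlat = distance * math.cos(heading) / KM_PER_DEGREE
    dlon = distance * math.sin(heading) / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return dlat, dlon


north = increment_position_sketch(0.0, 0.0, 20.0, 0.0, 2.0)
east = increment_position_sketch(0.0, 0.0, 20.0, 90.0, 2.0)
```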

marine_qc.track_check_utils.modal_speed(speeds)[source]¶

Calculate the modal speed from the input array in 3 knot bins.

Returns the bin-centre for the modal group.

The data are binned into 3-knot bins with the first from 0-3 knots having a bin centre of 1.5 and the highest containing all speed in excess of 33 knots with a bin centre of 34.5. The bin with the most speeds in it is found. The higher of the modal speed or 8.5 is returned:

Bins: 0-3, 3-6, 6-9, 9-12, 12-15, 15-18, 18-21, 21-24, 24-27, 27-30, 30-33, 33-36
Centres: 1.5, 4.5, 7.5, 10.5, 13.5, 16.5, 19.5, 22.5, 25.5, 28.5, 31.5, 34.5

Parameters:: speeds (list) – Input speeds in km/h.
Return type:: float
Returns:: float – Bin-centre speed (expressed in km/h) for the 3 knot bin which contains most speeds in input array, or 8.5, whichever is higher.
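The binning described above can be sketched as follows. This is an illustration under stated assumptions: modal_speed_sketch is a hypothetical name, the input is taken to be already expressed in knots (any unit conversion is left out), a speed exactly on a boundary goes into the higher bin, and ties are resolved in favour of the lowest bin.

```python
def modal_speed_sketch(speeds_in_knots):
    """Centre of the most populated 3-knot bin, or 8.5 if that is higher."""
    counts = [0] * 12  # bins 0-3, 3-6, ..., 30-33, and 33+ (centre 34.5)
    for speed in speeds_in_knots:
        index = int(speed // 3)
        index = max(0, min(index, 11))  # everything above 33 knots shares the top bin
        counts[index] += 1
    best = counts.index(max(counts))  # assumption: lowest bin wins ties
    centre = 1.5 + 3.0 * best
    return max(centre, 8.5)


typical = modal_speed_sketch([10.0, 10.5, 11.0, 2.0])
slow = modal_speed_sketch([1.0, 2.0, 2.5])
fast = modal_speed_sketch([35.0, 40.0, 50.0])
```

The floor of 8.5 means that a very slow ship still gets a reasonable modal speed for the later speed-limit calculation.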

marine_qc.track_check_utils.set_speed_limits(amode)[source]¶

Take a modal speed and calculate speed limits for the track checker.

Parameters:: amode (float) – Modal speed in km/h.
Return type:: tuple[float, float, float]
Returns:: (float, float, float) – Max speed, maximum max speed and min speed.

marine_qc.track_check_utils.speed_continuity(vsi, speeds, vsi_previous=None, max_speed_change=10.0)[source]¶

Check if reported speeds are within the allowed change from calculated speeds.

This function compares the reported speed at the current and previous time steps with the speed calculated from positions. Flags positions where the change exceeds the maximum allowed speed change.

Parameters:

vsi (SequenceNumberType) – One-dimensional array of reported speed in km/h at current time step.
speeds (SequenceNumberType) – One-dimensional array of speed of ship calculated from locations at current and previous time steps in km/h.
vsi_previous (SequenceNumberType, optional) – One-dimensional array of reported speed in km/h at previous time step. If None, get vsi_previous from vsi.
max_speed_change (float, optional) – Largest change of speed that will not raise flag in km/h, default 10 (km/h).

Return type:

Returns:

np.ndarray – Returned array elements are 10 if the reported and calculated speeds differ by more than max_speed_change (default 10 km/h), 0 otherwise.

marine_qc.validations module¶

Module containing base QC routines that call multiple QC functions and can be applied to a DataBundle.

marine_qc.validations.is_func_param(func, param)[source]¶

Return True if param is the name of a parameter of function func.

Parameters:

func (Callable) – Function whose parameters are to be inspected.
param (str) – Name of the parameter.

Return type:

Returns:

bool – Returns True if param is one of the function's parameters or the function uses **kwargs.
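A check of this kind can be written with the standard inspect module. The sketch below shows the idea; is_func_param_sketch is a hypothetical name and the package's implementation may differ.

```python
import inspect


def is_func_param_sketch(func, param):
    """True if `param` names a parameter of `func` or `func` accepts **kwargs."""
    parameters = inspect.signature(func).parameters
    if param in parameters:
        return True
    # A **kwargs catch-all accepts any keyword name
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values())


def fixed(a, b=1):
    return a + b


def flexible(a, **kwargs):
    return a


checks = (
    is_func_param_sketch(fixed, "a"),
    is_func_param_sketch(fixed, "c"),
    is_func_param_sketch(flexible, "c"),
)
```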

marine_qc.validations.is_in_data(name, data)[source]¶

Return True if named column or variable, name, is in data.

Parameters:

name (str) – Name of variable.
data (pd.Series or pd.DataFrame) – Pandas Series or DataFrame to be tested.

Return type:

Returns:

bool – Returns True if name is one of the columns or variables in data, False otherwise.

Raises:

TypeError – If data type is not pd.Series or pd.DataFrame.

marine_qc.validations.validate_arg(key, value, func_name, parameters, type_hints, reserved_keys, has_arguments)[source]¶

Validate argument against a function’s signature, taking decorators into account.

Parameters:

key (str) – The name of the argument to validate.
value (Any) – The value of the argument to validate.
func_name (str) – The name of the function (used in error message).
parameters (Mapping[str, inspect.Parameter]) – A mapping of parameter names to inspect.Parameter objects, typically from inspect.signature(func).parameters.
type_hints (Mapping[str, type]) – A mapping of parameter names to expected types, typically from typing.get_type_hints(func).
reserved_keys (set[str]) – Argument names that are considered reserved and should not raise errors.
has_arguments (bool) – Whether the function accepts arbitrary arguments.

Return type:

marine_qc.validations.validate_args(func, args=None, kwargs=None)[source]¶

Validate positional and keyword arguments against a function’s signature, taking decorators into account.

This function checks that:
- All provided keyword arguments correspond to valid parameters of the given function.
- All required parameters of the function (i.e., parameters without default values) are present in the provided keyword arguments.

Parameters:

func (Callable[..., Any]) – The function whose signature is used for validation.
args (Sequence[Any], optional) – Sequence of arguments intended to be passed to func.
kwargs (Mapping[str, Any], optional) – Dictionary of keyword arguments intended to be passed to func.

Raises:

ValueError – If kwargs contains a key that is not a parameter of func.
TypeError – If a required parameter of func is missing from kwargs.

Return type:
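The two checks and their exceptions can be sketched with inspect.signature, which follows functools.wraps-style decorators by default. This is a simplified illustration: validate_args_sketch is a hypothetical name, type hints and reserved keys are ignored, and counting positional arguments towards the required parameters is an assumption.

```python
import inspect


def validate_args_sketch(func, args=None, kwargs=None):
    """Raise ValueError for unknown keywords, TypeError for missing required ones."""
    args = tuple(args or ())
    kwargs = dict(kwargs or {})
    parameters = inspect.signature(func).parameters
    kinds = inspect.Parameter
    accepts_any = any(p.kind is kinds.VAR_KEYWORD for p in parameters.values())
    for key in kwargs:
        if key not in parameters and not accepts_any:
            raise ValueError(f"{key!r} is not a parameter of {func.__name__}")
    positional = [name for name, p in parameters.items()
                  if p.kind in (kinds.POSITIONAL_ONLY, kinds.POSITIONAL_OR_KEYWORD)]
    # Assumption: positional arguments fill the leading parameters in order
    supplied = set(kwargs) | set(positional[:len(args)])
    for name, p in parameters.items():
        if p.kind in (kinds.VAR_POSITIONAL, kinds.VAR_KEYWORD):
            continue
        if p.default is kinds.empty and name not in supplied:
            raise TypeError(f"missing required parameter {name!r}")


def demo(a, b=1):
    return a + b


validate_args_sketch(demo, kwargs={"a": 1})   # passes silently
validate_args_sketch(demo, args=[1])          # passes silently
```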

marine_qc.validations.validate_dict(input_dict)[source]¶

Validate that the input is a dictionary with string keys and dictionary values.

This function checks that:
- input_dict is a dictionary.
- All keys in the dictionary are strings.
- All top-level values in the dictionary are themselves dictionaries.

Parameters:: input_dict (Mapping[str, Mapping[str, Any]]) – The object to validate.
Raises:: TypeError – If input_dict is not a dictionary, if any key is not a string, or if any value is not a dictionary.
Return type:: None
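The three checks map directly onto isinstance tests. The sketch below is illustrative only; validate_dict_sketch is a hypothetical name and the error messages are invented.

```python
def validate_dict_sketch(input_dict):
    """Raise TypeError unless `input_dict` is a dict of str keys to dict values."""
    if not isinstance(input_dict, dict):
        raise TypeError("input must be a dictionary")
    for key, value in input_dict.items():
        if not isinstance(key, str):
            raise TypeError(f"key {key!r} is not a string")
        if not isinstance(value, dict):
            raise TypeError(f"value for {key!r} is not a dictionary")


ok = validate_dict_sketch({"sst": {"units": "K"}})  # returns None when valid
```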

marine_qc.validations.validate_type(value, expected)[source]¶

Recursively validate that a value matches the expected type hint.

Parameters:

value (Any) – The value to validate.
expected (Any) – The expected value type for validation.

Return type: