marine_qc package

Marine Quality Control package.

Submodules

marine_qc.astronomical_geometry module

Some generally helpful astronomical geometry functions for base QC.

marine_qc.astronomical_geometry.azimuth_elevation(lat, declination, hour_angle)[source]

Get both azimuth and geometric elevation of sun.

Parameters:
  • lat (float) – Latitude value in degrees.

  • declination (float) – Declination.

  • hour_angle (float) – Hour angle.

Return type:

tuple[float, float]

Returns:

tuple of float – A tuple of two floats representing azimuth and geometric elevation of sun.

marine_qc.astronomical_geometry.calculate_azimuth(declination, hour_angle, elevation, phi)[source]

Calculate azimuth.

Parameters:
  • declination (float) – Declination.

  • hour_angle (float) – Hour angle.

  • elevation (float) – Elevation.

  • phi (float) – Latitude value in rad.

Return type:

float

Returns:

float – Azimuth.

marine_qc.astronomical_geometry.calculate_sun_parameters(time)[source]

Calculate both right ascension and declination of sun.

Parameters:

time (float) – Time value.

Return type:

tuple[float, float]

Returns:

tuple of float – A tuple of two floats representing right ascension and declination of sun.

marine_qc.astronomical_geometry.convert_degrees(deg)[source]

Convert degrees.

Parameters:

deg (float) – Value in degrees.

Return type:

float

Returns:

float – Degree (from 0 to 360).
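A minimal sketch of the kind of wrapping described here, assuming the conversion simply maps any angle onto the half-open range [0, 360). The name `wrap_degrees` is hypothetical and this is not the package's implementation.

```python
def wrap_degrees(deg: float) -> float:
    """Map an angle in degrees onto the range [0, 360) (assumed behaviour)."""
    # Python's % operator returns a result with the sign of the divisor,
    # so negative angles wrap round to positive ones.
    return deg % 360.0


print(wrap_degrees(370.0))  # 10.0
print(wrap_degrees(-90.0))  # 270.0
```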

marine_qc.astronomical_geometry.elliptic_angle(time)[source]

Get the angle between the plane of the ecliptic and the plane of the celestial equator.

Parameters:

time (float) – Time value.

Return type:

float

Returns:

float – Angle between the plane of the ecliptic and the plane of the celestial equator.

marine_qc.astronomical_geometry.mean_earth_anomaly(time, theta)[source]

Calculate mean anomaly of earth (g).

Parameters:
  • time (float) – Time value.

  • theta (float) – Position of the sun.

Return type:

float

Returns:

float – Mean anomaly of the earth (g).

marine_qc.astronomical_geometry.sin_of_elevation(phi, declination, hour_angle)[source]

Get the sine of the geometric elevation.

Parameters:
  • phi (float) – Latitude value in rad.

  • declination (float) – Declination.

  • hour_angle (float) – Hour angle.

Return type:

float

Returns:

float – Sine of the geometric elevation.
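For orientation, the textbook spherical-astronomy relation for the sine of the elevation is sin(phi)·sin(declination) + cos(phi)·cos(declination)·cos(hour_angle). The sketch below evaluates that relation, assuming all three angles are in radians. The function name is hypothetical and the package's own code may differ in detail.

```python
import math


def sin_elevation_sketch(phi: float, declination: float, hour_angle: float) -> float:
    """Sine of the geometric elevation; all angles assumed to be in radians."""
    return (
        math.sin(phi) * math.sin(declination)
        + math.cos(phi) * math.cos(declination) * math.cos(hour_angle)
    )


# When latitude equals declination and the hour angle is zero, the sun is
# overhead, so the sine of the elevation is (numerically close to) 1.
print(sin_elevation_sketch(0.3, 0.3, 0.0))
```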

marine_qc.astronomical_geometry.sun_ascension(long_of_sun, sin_long_of_sun, angle_of_elliptic)[source]

Calculate right ascension.

Parameters:
  • long_of_sun (float) – Longitude of the sun.

  • sin_long_of_sun (float) – Sine of the longitude of the sun.

  • angle_of_elliptic (float) – Angle of the ecliptic.

Return type:

float

Returns:

float – Right ascension.

marine_qc.astronomical_geometry.sun_azimuth(phi, declination)[source]

Get azimuth.

Parameters:
  • phi (float) – Latitude value in rad.

  • declination (float) – Declination.

Return type:

float

Returns:

float – Azimuth.

marine_qc.astronomical_geometry.sun_declination(sin_long_of_sun, angle_of_elliptic)[source]

Calculate declination of sun.

Parameters:
  • sin_long_of_sun (float) – Sine of the longitude of the sun.

  • angle_of_elliptic (float) – Angle of the ecliptic.

Return type:

float

Returns:

float – Declination of sun.
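The standard relation between these quantities is sin(declination) = sin(angle of ecliptic) · sin(longitude of sun). A small sketch of that relation follows, assuming the angle is given in radians. The function name is hypothetical and this is not taken from the package source.

```python
import math


def declination_sketch(sin_long_of_sun: float, angle_of_ecliptic: float) -> float:
    """Declination in radians from the sine of the solar longitude (assumed units)."""
    return math.asin(math.sin(angle_of_ecliptic) * sin_long_of_sun)


# At the June solstice the sine of the solar longitude is 1, so the
# declination equals the obliquity of the ecliptic (about 0.409 rad).
print(declination_sketch(1.0, 0.409))
```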

marine_qc.astronomical_geometry.sun_hour_angle(local_siderial_time, right_ascension)[source]

Get hour angle.

Parameters:
  • local_siderial_time (float) – Local sidereal time value.

  • right_ascension (float) – Right ascension.

Return type:

float

Returns:

float – Hour angle.

marine_qc.astronomical_geometry.sun_longitude(time)[source]

Get longitude of sun.

Parameters:

time (float) – Time value.

Return type:

float

Returns:

float – Longitude of the sun.

marine_qc.astronomical_geometry.sun_position(time)[source]

Find position of sun in celestial sphere, assuming circular orbit (radians).

Parameters:

time (float) – Time value.

Return type:

float

Returns:

float – Position of the sun.

marine_qc.astronomical_geometry.sunangle(year, day, hour, minute, sec, zone, dasvtm, lat, lon)[source]

Calculate the local azimuth and elevation of the sun at a specified location and time.

Parameters:
  • year (int) – Year.

  • day (int) – Day number of year starting with 1 for Jan 1st and running up to 365/6.

  • hour (int) – Hour.

  • minute (int) – Minute.

  • sec (int) – Second.

  • zone (int) – The local international time zone, counted westward from Greenwich.

  • dasvtm (int) – 1 if daylight saving time is in effect, otherwise 0.

  • lat (float) – Latitude in degrees, north is positive.

  • lon (float) – Longitude in degrees, east is positive.

Return type:

tuple[float, float, float, float, float, float]

Returns:

tuple of float – A tuple of six floats: azimuth angle of the sun (degrees east of north), elevation of the sun (degrees), right ascension of the sun (degrees), hour angle of the sun (degrees), hour angle of local sidereal time (degrees), and declination of the sun (degrees).

Notes

Copied from Rob Hackett’s area 28 Apr 1998 by J.Arnott. Protection for ASIN near +/- 90 degrees added 07 Jan 2002 by J.Arnott. Pythonised 25/09/2015 by J.J. Kennedy.

The Python version gets within a fraction of a degree of the original Fortran code from which it was ported for a range of values. The differences are larger if single precision values are used suggesting that this is not the most numerically robust scheme.
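Because `day` is a day-of-year number rather than a day of the month, callers holding a calendar date need to convert it first. The helper below is a hypothetical convenience built on the standard library, not part of the package. The commented call only shows where the value would go.

```python
from datetime import date


def day_of_year(year: int, month: int, day_of_month: int) -> int:
    """Day number within the year, starting at 1 for 1 January."""
    return date(year, month, day_of_month).timetuple().tm_yday


print(day_of_year(2015, 9, 25))   # 268
print(day_of_year(2016, 12, 31))  # 366 (leap year)

# Hypothetical usage, argument order taken from the signature above:
# sunangle(2015, day_of_year(2015, 9, 25), 12, 0, 0, 0, 0, 50.7, -3.5)
```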

marine_qc.astronomical_geometry.to_local_siderial_time(time, time_in_hours, delyear, lon)[source]

Convert to local sidereal time.

Parameters:
  • time (float) – Time value.

  • time_in_hours (float) – Time value in hours.

  • delyear (int) – Relative year number.

  • lon (float) – Longitude value in degrees.

Return type:

float

Returns:

float – Local sidereal time.

marine_qc.astronomical_geometry.to_siderial_time(time, delyear)[source]

Convert to sidereal time.

Parameters:
  • time (float) – Time value.

  • delyear (int) – Relative year number.

Return type:

float

Returns:

float – Sidereal time.

marine_qc.auxiliary module

Auxiliary functions for QC.

marine_qc.auxiliary.convert_to(value, source_units, target_units)[source]

Convert a float or sequence from source units to target units.

Parameters:
  • value (SequenceNumberType) – A single float value, None, or a sequence (e.g., list, tuple, array-like) containing floats and/or None values. None values are passed through unchanged.

  • source_units (str) – The unit(s) of the input value(s), e.g., ‘degC’, ‘km/h’.

  • target_units (str) – The unit(s) to convert to, e.g., ‘K’, ‘m/s’. If set to “unknown”, the value(s) will be converted to the base SI units of the source_units, e.g., ‘degC’ to ‘kelvin’, ‘km/h’ to ‘meter/s’.

Return type:

SequenceNumberType

Returns:

SequenceNumberType – The converted value(s), preserving the input structure (scalar, list, tuple, array). None values remain unchanged.

Examples

>>> convert_to(100, "degC", "K")
373.15
>>> convert_to([0, 100], "degC", "K")
[273.15, 373.15]
>>> convert_to([None, 100], "degC", "K")
[None, 373.15]
>>> convert_to(5, "km", "unknown")  # Converts to base unit 'meter'
5000.0

marine_qc.auxiliary.convert_units(**units_by_name)[source]

Decorator to automatically convert specified function arguments to target units.

This decorator allows a function to accept inputs in various units and automatically converts them to desired target units before the function executes. It is especially useful for scientific or engineering functions where users may provide inputs in different unit systems.

Parameters:

**units_by_name (str) – Keyword arguments mapping function argument names to their target units. Special case: if a target unit is “unknown”, it will be converted to the base SI unit for the given source unit (e.g., “degC” → “K”, “km/h” → “m/s”).

Return type:

Callable[..., Any]

Returns:

Callable[..., Any] – A decorator that converts specified parameters to the target units prior to executing the decorated function.

Notes

  • The decorated function must be called with a units keyword argument, which can be:
    • A dictionary mapping argument names to their source units, or

    • A single string unit applied to all arguments.

  • Parameters not listed in units_by_name are not converted.

  • Parameters with None values are skipped.

  • If a target unit is “unknown”, the value is converted to the base SI unit.

Examples

>>> @convert_units(temperature="K")
... def func_single(temperature):
...     print(f"Temperature: {temperature:.2f} K")
>>> func_single(25.0, units={"temperature": "degC"})
Temperature: 298.15 K
>>> @convert_units(speed="m/s", altitude="m")
... def func_multiple(speed, altitude):
...     print(f"Speed: {speed:.1f} m/s, Altitude: {altitude:.0f} m")
>>> func_multiple(72.0, 0.5, units={"speed": "km/h", "altitude": "km"})
Speed: 20.0 m/s, Altitude: 500 m
>>> @convert_units(distance="unknown")
... def func_base(distance):
...     print(f"Distance in SI units: {distance} m")
>>> func_base(1.2, units={"distance": "km"})
Distance in SI units: 1200.0 m

marine_qc.auxiliary.ensure_arrays(**values)[source]

Ensure that all input values are NumPy arrays.

Parameters:

**values (Mapping[str, Any]) – Mapping of names to values expected to be NumPy arrays.

Return type:

tuple[ndarray[tuple[Any, ...], dtype[Any]], ...]

Returns:

tuple of np.ndarray – A tuple containing the NumPy arrays corresponding to the input values, in the same order as provided.

Raises:

TypeError – If any input value is not a NumPy array.

marine_qc.auxiliary.format_return_type(result_array, *input_values, dtype=<class 'int'>)[source]

Convert the result numpy array(s) to the same type as the input value.

If result_array is a sequence of arrays, format each element recursively, preserving the container type.

Parameters:
  • result_array (np.ndarray) – The numpy array of results.

  • *input_values (Any) – One or more original input values to infer the desired return type from.

  • dtype (type, optional) – Desired data type of the result. Default is int.

Return type:

Any

Returns:

Same type as input(s) – The result formatted to match the type of the first valid input value.
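A rough sketch of the idea, assuming only three input shapes (scalar, list, tuple) and a flat sequence of results. The real function also handles arrays and nested results, and the name `format_like_input` is hypothetical.

```python
def format_like_input(result_values, original_input, dtype=int):
    """Cast results to dtype and return them in the container type of the input."""
    converted = [dtype(v) for v in result_values]
    if isinstance(original_input, (list, tuple)):
        # Rebuild the same container type the caller passed in.
        return type(original_input)(converted)
    # Scalar input: hand back a single scalar.
    return converted[0]


print(format_like_input([1.0, 0.0], [280.1, 281.4]))  # [1, 0]
print(format_like_input([1.0, 0.0], (280.1, 281.4)))  # (1, 0)
print(format_like_input([1.0], 280.1))                # 1
```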

marine_qc.auxiliary.generic_decorator(pre_handler=None, post_handler=None)[source]

Create a decorator that binds function arguments and applies pre- and post-processing handlers.

This decorator factory allows you to inspect, modify, or validate function arguments before and after the original function is called. Reserved keyword arguments can be passed to the handlers via _decorator_kwargs and removed from the call to the original function.

Parameters:
  • pre_handler (Callable[[dict], None]) – A function that takes the bound arguments dictionary (bound_args.arguments) and optionally additional keyword arguments, to inspect or modify arguments before the decorated function executes. Signature: handler(arguments: dict, **meta_kwargs) -> None.

  • post_handler (Callable[[dict], None]) – A function that takes the bound arguments dictionary (bound_args.arguments) and optionally additional keyword arguments, to inspect or modify arguments after the decorated function executes. Signature: handler(arguments: dict, **meta_kwargs) -> None.

Return type:

Callable[..., Any]

Returns:

Callable – A decorator that wraps any function. When applied, the function’s arguments are bound and passed to the handlers before execution.

Notes

  • Handlers can define a _decorator_kwargs attribute (a set of reserved keyword argument names). These reserved kwargs will be extracted from the decorated function’s call kwargs, passed to the handler, and removed before calling the original function.

  • The original function is called with the possibly modified bound arguments after handler processing.
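The mechanism described in these notes can be sketched with `inspect.signature` from the standard library. This is an illustrative reimplementation under stated assumptions, not the package's code: the names `generic_decorator_sketch` and `scale_handler` are hypothetical, and every handler here receives all reserved keyword arguments.

```python
import functools
import inspect


def generic_decorator_sketch(pre_handler=None, post_handler=None):
    """Bind call arguments, run the handlers, and strip reserved kwargs."""
    handlers = [h for h in (pre_handler, post_handler) if h is not None]
    # Union of the reserved keyword names declared by the handlers.
    reserved = set()
    for handler in handlers:
        reserved |= set(getattr(handler, "_decorator_kwargs", ()))

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Remove reserved kwargs so the wrapped function never sees them.
            meta = {k: kwargs.pop(k) for k in list(kwargs) if k in reserved}
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            if pre_handler is not None:
                pre_handler(bound.arguments, **meta)
            # bound.args / bound.kwargs reflect any changes made by the handler.
            result = func(*bound.args, **bound.kwargs)
            if post_handler is not None:
                post_handler(bound.arguments, **meta)
            return result

        return wrapper

    return decorator


def scale_handler(arguments, **meta):
    """Hypothetical pre-handler: multiply argument 'x' by a reserved 'factor'."""
    arguments["x"] = arguments["x"] * meta.get("factor", 1)


scale_handler._decorator_kwargs = {"factor"}


@generic_decorator_sketch(pre_handler=scale_handler)
def double(x):
    return 2 * x


print(double(3, factor=10))  # 60
print(double(3))             # 6
```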

marine_qc.auxiliary.inspect_arrays(params, sortby=None)[source]

Decorator to convert and validate specified function input parameters as 1D NumPy arrays.

This decorator ensures that specified input arguments are sequence-like, converts them to 1D NumPy arrays, validates that they are one-dimensional, and checks that all arrays have the same length. Optionally, the arrays can be sorted by another parameter and later restored to the original order.

Parameters:
  • params (list of str) – Names of parameters to inspect in the decorated function. Each specified parameter will be converted to a 1D NumPy array and validated.

  • sortby (str, optional) – Name of a parameter to sort the arrays by, if desired. The result will be returned in the original order of this parameter.

Return type:

Callable[..., Any]

Returns:

Callable[..., Any] – A decorator that, when applied, converts the specified parameters to 1D NumPy arrays, validates them, optionally sorts them, and passes them to the decorated function.

Raises:

ValueError – If a specified parameter is missing from the function arguments. If any specified parameter is not one-dimensional. If the lengths of the specified arrays do not all match.

Notes

  • If sortby is specified, the result of the function is reordered to match the original order of sortby after the function executes.
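The sort-then-restore behaviour can be pictured as applying a permutation and then inverting it. The sketch below uses plain Python lists in place of NumPy arrays to stay dependency-free. The helper name `apply_in_sorted_order` is hypothetical and the decorator's actual internals may differ.

```python
from itertools import accumulate


def apply_in_sorted_order(func, values, key):
    """Run func on values sorted by key, then restore the original order."""
    # Indices that would sort the key (a stable argsort).
    order = sorted(range(len(key)), key=key.__getitem__)
    sorted_result = func([values[i] for i in order])
    # Invert the permutation so results line up with the original inputs.
    restored = [None] * len(values)
    for position, original_index in enumerate(order):
        restored[original_index] = sorted_result[position]
    return restored


def running_total(xs):
    """Order-dependent function used for the demonstration."""
    return list(accumulate(xs))


# Running total accumulated in key order, reported in input order.
print(apply_in_sorted_order(running_total, [10, 20, 30], [3, 1, 2]))  # [60, 20, 50]
```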

Examples

>>> @inspect_arrays(["a", "b"])
... def add_arrays(a, b):
...     return a + b
>>> add_arrays([1, 2, 3], [4, 5, 6])
array([5, 7, 9])
>>> add_arrays([1, 2], [3, 4, 5])
Traceback (most recent call last):
    ...
ValueError: Input ['a', 'b'] must all have the same length.

marine_qc.auxiliary.is_scalar_like(x)[source]

Return True if the input is scalar-like.

A value is considered scalar-like if it is one of the following:

  • Built-in Python scalars: int, float, bool, None

  • Strings and bytes

  • NumPy scalars (subclasses of np.generic), e.g. np.int32, np.float64, np.datetime64

  • Zero-dimensional NumPy arrays (e.g. np.array(5))

  • Pandas scalar types:
    • pd.Timestamp

    • pd.Timedelta

    • pd.NA

    • pd.NaT

  • Python datetime types:
    • datetime.date

    • datetime.datetime

    • datetime.time

Container types such as lists, tuples, sets, dicts, pandas Series, pandas DataFrame, and NumPy arrays with one or more dimensions are not considered scalar-like.

Parameters:

x (Any) – The value to check.

Return type:

bool

Returns:

bool – True if x is scalar-like, False otherwise.

marine_qc.auxiliary.isvalid(inval)[source]

Check whether the input value(s) are numerically valid (neither None nor NaN).

Parameters:

inval (ValueNumberType) – Input value(s) to be tested.

Return type:

bool | ndarray[tuple[Any, ...], dtype[bool]]

Returns:

bool or np.ndarray of bool – Returns False where the input is None or NaN, True otherwise. Returns a boolean scalar if input is scalar, else a boolean array.
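A simplified sketch of the rule for plain Python floats and lists, assuming numeric inputs only. The real function also accepts NumPy arrays and returns a boolean array for them. Both helper names are hypothetical.

```python
import math


def is_valid_scalar(value) -> bool:
    """True unless the value is None or NaN."""
    return value is not None and not math.isnan(value)


def is_valid_each(values) -> list:
    """Element-wise validity for a sequence of numbers and/or None."""
    return [is_valid_scalar(v) for v in values]


print(is_valid_scalar(5.0))                       # True
print(is_valid_each([1.0, None, float("nan")]))   # [True, False, False]
```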

marine_qc.auxiliary.post_format_return_type(params, dtype=<class 'int'>, multiple=False)[source]

Decorator to format a function’s return value to match the type of its original input(s).

This decorator ensures that the output of the decorated function is converted back to the same structure/type as the original input(s) specified by params. It uses a context object (_ctx) if available to retrieve the original inputs before any preprocessing was applied. If no context is found, it falls back to the current bound arguments.

Parameters:
  • params (list of str) – List of parameter names whose original input types should be used to format the return value.

  • dtype (type, optional) – Desired data type of the result. Default is int.

  • multiple (bool, optional) – If True, assumes the function returns a sequence of results (e.g., a tuple), and applies format_return_type to each element individually. If False (default), applies format_return_type once on the entire result.

Return type:

Callable[..., Any]

Returns:

Callable[..., Any] – A decorator that modifies the decorated function’s output to match the input types.

Notes

  • Assumes a TypeContext object may be passed via _ctx keyword argument, storing original input values for accurate type formatting.

  • Falls back gracefully if no context is available, using current arguments.

  • Useful when function inputs are preprocessed (e.g., converted to arrays), and the output should match the original input types.

marine_qc.buoy_tracking_qc module

Buoy tracking QC module.

Module containing QC functions for sequential reports from a single drifting buoy.

class marine_qc.buoy_tracking_qc.AgroundChecker(lons, lats, dates, smooth_win, min_win_period, max_win_period)[source]

Bases: object

Class used to carry out do_aground_check().

Check to see whether a drifter has run aground based on 1/100th degree precision positions. A flag is set for each input report: flag=1 for reports deemed aground, else flag=0.

Positional errors introduced by lon/lat ‘jitter’ and data precision can be of the order of several km. Longitude and latitude time series are smoothed prior to assessment to reduce position ‘jitter’. Some post-smoothing position ‘jitter’ may remain and its expected magnitude is set within the function by the ‘tolerance’ parameter. A drifter is deemed aground when, after a period of time, the distance between reports is less than the ‘tolerance’. The minimum period of time over which this assessment is made is set by ‘min_win_period’. This period must be long enough that slow-moving drifters are not falsely flagged as aground given errors in position (e.g. a buoy drifting at around 1 cm/s will travel around 1 km/day; given ‘tolerance’ and precision errors of a few km, the ‘min_win_period’ needs to be several days to ensure that the distance travelled exceeds the error, so that motion is reliably detected and the buoy is not falsely flagged as aground). However, min_win_period should not be longer than necessary, as buoys that run aground for less than min_win_period will not be detected.
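The arithmetic behind the "several days" guidance can be checked directly. The sketch below uses hypothetical helper names, and the 3 km position error is an illustrative value standing in for "a few km".

```python
SECONDS_PER_DAY = 86_400


def drift_km_per_day(speed_m_per_s: float) -> float:
    """Distance covered in one day at a constant drift speed."""
    return speed_m_per_s * SECONDS_PER_DAY / 1000.0


def days_to_exceed(error_km: float, speed_m_per_s: float) -> float:
    """Days needed for the drift distance to exceed a given position error."""
    return error_km / drift_km_per_day(speed_m_per_s)


print(drift_km_per_day(0.01))     # about 0.864 km/day, i.e. roughly 1 km/day
print(days_to_exceed(3.0, 0.01))  # roughly 3.5 days for an assumed 3 km error
```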

Because temporal sampling can be erratic the time period over which an assessment is made is specified as a range (bound by ‘min_win_period’ and ‘max_win_period’) - assessment uses the longest time separation available within this range. If a drifter is deemed aground and subsequently starts moving (e.g. if a drifter has moved very slowly for a prolonged period) incorrectly flagged reports will be reinstated.

  • smooth_win: length of window (odd number) in datapoints used for smoothing lon/lat

  • min_win_period: minimum period of time in days over which position is assessed for no movement (see description)

  • max_win_period: maximum period of time in days over which position is assessed for no movement (this should be greater than min_win_period and allow for erratic temporal sampling e.g. min_win_period+2 to allow for gaps of up to 2-days in sampling).

Parameters:
  • lons (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • lats (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • smooth_win (int) – Length of window (odd number) in datapoints used for smoothing lon/lat.

  • min_win_period (int) – Minimum period of time in days over which position is assessed for no movement (see description).

  • max_win_period (int or None) – Maximum period of time in days over which position is assessed for no movement (this should be greater than min_win_period and allow for erratic temporal sampling e.g. min_win_period+2 to allow for gaps of up to 2-days in sampling).
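To make the windowed comparison concrete, here is a heavily simplified sketch of the pairwise step only. It assumes that times are in days and sorted, that the tolerance is in km, and that distances are great-circle distances. Smoothing, end-of-record handling and the reinstatement of flags are all omitted, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stationary_flags(lats, lons, days, min_win, max_win, tolerance_km):
    """Flag report i if the report at the longest separation in the window is within tolerance."""
    n = len(days)
    flags = [0] * n
    for i in range(n):
        partner = None
        for j in range(i + 1, n):
            if min_win <= days[j] - days[i] <= max_win:
                partner = j  # times are sorted, so later j means longer separation
        if partner is not None:
            dist = haversine_km(lats[i], lons[i], lats[partner], lons[partner])
            if dist < tolerance_km:
                flags[i] = 1
    return flags


# Made-up track: drifts east for four days, then stops moving.
days = list(range(10))
lats = [0.0] * 10
lons = [0.0, 0.1, 0.2, 0.3, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]
print(stationary_flags(lats, lons, days, min_win=3, max_win=5, tolerance_km=1.5))
# [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
```

The last three reports stay unflagged in this sketch only because no later report lies within the window; how the class itself treats the end of a record is not covered here.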

do_aground_check()[source]

Perform the actual aground check.

Return type:

None

get_qc_outcomes()[source]

Return the QC outcomes.

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags.

hrs_smooth: ndarray

lat_smooth: ndarray

lon_smooth: ndarray

smooth_arrays()[source]

Preprocess the latitude, longitude and time arrays.

Return type:

None

tolerance = 1.5725359013624185

valid_arrays()[source]

Check the input arrays are valid. Raises a warning and returns False if not valid.

Return type:

bool

Returns:

bool – True if array is valid, otherwise False.

valid_parameters()[source]

Check the parameters are valid. Raises a warning and returns False if not valid.

Return type:

bool

Returns:

bool – True if parameter is valid, otherwise False.

class marine_qc.buoy_tracking_qc.NewSpeedChecker(lons, lats, dates, speed_limit, min_win_period, ship_speed_limit, delta_d, delta_t, n_neighbours)[source]

Bases: object

Class used to carry out do_new_speed_check().

Check to see whether a drifter has been picked up by a ship (out of water) based on 1/100th degree precision positions. A flag is set for each input report: flag=1 for reports deemed picked up, else flag=0.

A drifter is deemed picked up if it is moving faster than might be expected for a fast ocean current (a few m/s). Unreasonably fast movement is detected when the speed of travel between report pairs exceeds the chosen ‘speed_limit’. Speed is estimated as the distance between reports divided by their time separation; this ‘straight line’ speed between the two points is a minimum speed estimate, given that a less direct path may have been followed. Positional errors introduced by lon/lat ‘jitter’ and data precision can be of the order of several km. Reports must be separated by a suitably long period of time (the ‘min_win_period’) to minimise the effect of these errors when calculating speed, e.g. for reports separated by 9 hours, errors of order 10 cm/s would result, which are a few percent of fast ocean current speed. Conversely, the period of time chosen should not be so long that short-lived bursts of speed on manoeuvring ships can no longer be resolved. Larger positional errors may also trigger the check.

For each report, speed is assessed over the shortest available period that exceeds ‘min_win_period’.

Prior to assessment the drifter record is screened for positional errors using the iQuam track check method (from ex.Voyage). When running the iQuam check the record is treated as a ship (not a drifter) so as to avoid accidentally filtering out observations made aboard a ship (which is what we are trying to detect). This iQuam track check does not overwrite any existing iQuam track check flags.

IMPORTANT - for optimal performance, drifter records with observations failing this check should be subsequently manually reviewed. Ships move around in all sorts of complicated ways that can readily confuse such a simple check (e.g. pausing at sea, crisscrossing their own paths) and once some erroneous movement is detected it is likely a human operator can then better pick out the actual bad data. False fails caused by positional errors (particularly in fast ocean currents) will also need reinstating.

The class has the following class attributes which can be modified using the set_parameters method.

  • iquam_parameters: Parameter dictionary for Voyage.iquam_track_check() function.

  • speed_limit: maximum allowable speed for an in situ drifting buoy (metres per second)

  • min_win_period: minimum period of time in days over which position is assessed for speed estimates (see description)

Parameters:
  • lons (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • lats (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • speed_limit (float) – Maximum allowable speed for an in situ drifting buoy (metres per second).

  • min_win_period (float) – Minimum period of time in days over which position is assessed for speed estimates (see description).

  • ship_speed_limit (float) – Ship speed limit for the IQUAM track check.

  • delta_d (float) – The smallest increment in distance that can be resolved. For 0.01 degrees of lat-lon this is 1.11 km. Used in the IQUAM track check.

  • delta_t (float) – The smallest increment in time that can be resolved. For hourly data expressed as a float this is 0.01 hours. Used in the IQUAM track check.

  • n_neighbours (int) – Number of neighbours considered in the IQUAM track check.
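The speed estimate described above can be sketched as follows: for each report, take the first later report whose time separation exceeds `min_win_period` and divide the great-circle distance by the elapsed time. This is a sketch only. It assumes times in days, sorted ascending, and it omits the iQuam pre-screening and the way the class turns speeds into flags. All function names and track values are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
SECONDS_PER_DAY = 86_400.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pair_speeds(lats, lons, days, min_win_period):
    """Speed (m/s) per report over the shortest separation exceeding min_win_period."""
    n = len(days)
    speeds = [None] * n
    for i in range(n):
        for j in range(i + 1, n):
            dt_days = days[j] - days[i]
            if dt_days > min_win_period:
                dist = haversine_m(lats[i], lons[i], lats[j], lons[j])
                speeds[i] = dist / (dt_days * SECONDS_PER_DAY)
                break  # first qualifying report gives the shortest period
    return speeds


# Made-up track: slow drift, then about 1 degree of longitude every 6 hours.
days = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
lats = [0.0] * 7
lons = [0.0, 0.01, 0.02, 0.03, 1.03, 2.03, 3.03]
speeds = pair_speeds(lats, lons, days, min_win_period=0.375)  # 9 hours
exceeded = [s is not None and s > 2.5 for s in speeds]        # 2.5 m/s limit
print(exceeded)  # [False, False, True, True, True, False, False]
```

Reports without a later partner beyond the window get a speed of `None` here. Report 2 is still adrift but its partner is already on the ship, which illustrates why flagged records benefit from the manual review recommended above.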

do_new_speed_check()[source]

Perform the actual new speed check.

Return type:

None

get_qc_outcomes()[source]

Retrieve the QC outcomes for all observations.

Return type:

ndarray

Returns:

np.ndarray – Array of QC flags for each observation (0 = valid, 1 = flagged, untested otherwise).

iquam_track_ship: ndarray

perform_iquam_track_check()[source]

Perform iQuam track check as if reports are from a ship.

A deep copy of reps is made so that metadata can be safely modified ahead of the iQuam check. The result is an array of QC flags (iquam_track_ship).

Return type:

None

valid_arrays()[source]

Validate the input arrays (longitude, latitude, and time differences).

Checks for:

  • NaN values in longitude, latitude, or time arrays.

  • Monotonicity of the time array (self.hrs).

Warnings are raised for any issues detected.

Return type:

bool

Returns:

bool – True if all arrays are valid, False otherwise.
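A sketch of what such a validation might look like for plain Python lists, assuming that "monotonic" means non-decreasing times. The function name and warning messages are hypothetical, not those emitted by the package.

```python
import math
import warnings


def arrays_are_valid(lons, lats, hrs) -> bool:
    """Warn and return False on NaNs or on a time array that goes backwards."""
    for name, values in (("lons", lons), ("lats", lats), ("hrs", hrs)):
        if any(math.isnan(v) for v in values):
            warnings.warn(f"NaN values found in {name}", stacklevel=2)
            return False
    # Compare each time with its successor to detect out-of-order reports.
    if any(later < earlier for earlier, later in zip(hrs, hrs[1:])):
        warnings.warn("Times are not sorted", stacklevel=2)
        return False
    return True


print(arrays_are_valid([0.0, 0.1], [10.0, 10.1], [0.0, 6.0]))  # True
```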

valid_parameters()[source]

Validate the QC parameters to ensure they are sensible.

Checks that:

  • speed_limit is non-negative.

  • min_win_period is non-negative.

Warnings are raised for any invalid parameters.

Return type:

bool

Returns:

bool – True if all parameters are valid, False otherwise.

class marine_qc.buoy_tracking_qc.SSTBiasedNoisyChecker(lat, lon, dates, sst, ostia, bgvar, ice, n_eval, bias_lim, drif_intra, drif_inter, err_std_n, n_bad, background_err_lim)[source]

Bases: object

Class used to perform the do_sst_biased_check(), do_sst_noisy_check(), and do_sst_biased_noisy_short_check().

Check to see whether a drifter sea surface temperature record is unacceptably biased or noisy as a whole.

The check makes an assessment of the quality of data in a drifting buoy record by comparing to a background reference field. If the record is found to be unacceptably biased or noisy relative to the background all observations are flagged by the check. For longer records the flags ‘drf_bias’ and ‘drf_noise’ are set for each input report: flag=1 for records with erroneous data, else flag=0. For shorter records ‘drf_short’ is set for each input report: flag=1 for reports with erroneous data, else flag=0.

When making the comparison an allowance is made for background error variance and also normal drifter error (both bias and random measurement error). A background error variance limit is also specified, beyond which the background is deemed unreliable and is excluded from comparison. Observations made during the day, in icy regions or where the background value is missing are also excluded from the comparison.

The check has two separate streams; a ‘long-record check’ and a ‘short-record check’. Records with at least n_eval observations are passed to the long-record check, else they are passed to the short-record check. The long-record check looks for records that are too biased or noisy as a whole. The short record check looks for individual observations exceeding a noise limit within a record. The purpose of n_eval is to ensure records with too few observations for their bias and noise to be reliably estimated are handled separately by the short-record check.

The correlation of the background error is treated as unknown and handled differently for each assessment. For the long-record noise-check and the short-record check the background error is treated as uncorrelated, which maximises the possible impact of background error on these assessments. For the long-record bias-check a limit (bias_lim) is specified beyond which the record is considered biased. The default value for this limit was chosen based on histograms of drifter-background bias. An alternative approach would be to treat the background error as entirely correlated across a long-record, which maximises its possible impact on the bias assessment. In this case the histogram approach was used as the limit could be tuned to give better results.

Parameters:
  • lat (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • lon (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • sst (SequenceNumberType) – 1-dimensional sea surface temperature array in K.

  • ostia (SequenceNumberType) – 1-dimensional background field sea surface temperature array in K.

  • bgvar (SequenceNumberType) – 1-dimensional background variance array in K^2.

  • ice (SequenceNumberType) – 1-dimensional ice concentration array in the range 0.0 to 1.0.

  • n_eval (int) – The minimum number of drifter observations required to be assessed by the long-record check.

  • bias_lim (float) – Maximum allowable drifter-background bias, beyond which a record is considered biased (degC or K).

  • drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).

  • drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).

  • err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which short-record data are deemed suspicious.

  • n_bad (int) – Minimum number of suspicious data points required for failure of short-record check.

  • background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).
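The split between the two streams can be illustrated with a deliberately reduced sketch. It makes several assumptions that are not confirmed by the text above: the bias is estimated as the mean drifter-minus-background anomaly, and the short-record limit combines background variance and drifter errors in quadrature. The long-record noise check and the exclusion of daytime, icy or unreliable-background observations are left out. All names and numbers are illustrative and are not package defaults.

```python
import math
from statistics import mean


def classify_record(sst, background, bgvar, n_eval, bias_lim,
                    drif_inter, drif_intra, err_std_n, n_bad):
    """Route a record to the long or short stream and apply a simplified test."""
    anomalies = [obs - bg for obs, bg in zip(sst, background)]
    if len(anomalies) >= n_eval:
        # Long-record stream: is the record biased as a whole? (assumed estimator)
        return {"stream": "long", "failed": abs(mean(anomalies)) > bias_lim}
    # Short-record stream: count individual observations beyond the noise limit.
    limits = [err_std_n * math.sqrt(v + drif_inter ** 2 + drif_intra ** 2) for v in bgvar]
    n_suspicious = sum(abs(a) > lim for a, lim in zip(anomalies, limits))
    return {"stream": "short", "failed": n_suspicious >= n_bad}


params = dict(n_eval=30, bias_lim=1.1, drif_inter=0.29, drif_intra=1.0,
              err_std_n=3.0, n_bad=2)

long_record = classify_record([291.5] * 35, [290.0] * 35, [0.09] * 35, **params)
short_record = classify_record([290.1, 293.0, 289.8, 293.5, 290.0],
                               [290.0] * 5, [0.09] * 5, **params)
print(long_record)   # {'stream': 'long', 'failed': True}
print(short_record)  # {'stream': 'short', 'failed': False}
```

In the short example only one anomaly (3.5 K) exceeds the limit of roughly 3.25 K, which is below `n_bad`, so the record passes.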

bgerr: ndarray

bgvar_is_masked: bool

do_sst_biased_noisy_check()[source]

Perform the bias/noise check QC.

Return type:

None

get_qc_outcomes_bias()[source]

Return the QC outcomes for the bias check.

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags.

get_qc_outcomes_noise()[source]

Return the QC outcomes for the noisy check.

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags.

get_qc_outcomes_short()[source]

Return the QC outcomes for the short check.

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags.

set_all_qc_outcomes_to(input_state)[source]

Set all the QC outcomes to the specified input_state.

Parameters:

input_state (int) – QC flag to map to the QC outcomes.

Return type:

None

sst_anom: ndarray

valid_parameters()[source]

Check the parameters are valid. Raises a warning and returns False if not valid.

Return type:

bool

Returns:

bool – True if parameter is valid, otherwise False.

class marine_qc.buoy_tracking_qc.SSTTailChecker(lat, lon, sst, ostia, ice, bgvar, dates, long_win_len, long_err_std_n, short_win_len, short_err_std_n, short_win_n_bad, drif_inter, drif_intra, background_err_lim)[source]

Bases: object

Class used to carry out do_sst_start_tail_check() and do_sst_end_tail_check().

Check to see whether there is erroneous sea surface temperature data at the beginning or end of a drifter record (referred to as ‘tails’). Flags are set for each input report: flag=1 for reports with erroneous data, else flag=0. ‘drf_tail1’ is used for bad data at the beginning of a record and ‘drf_tail2’ for bad data at the end of a record.

The tail check makes an assessment of the quality of data at the start and end of a drifting buoy record by comparing to a background reference field. Data found to be unacceptably biased or noisy relative to the background are flagged by the check. When making the comparison an allowance is made for background error variance and also normal drifter error (both bias and random measurement error). The correlation of the background error is treated as unknown and takes on a value which maximises background error dependent on the assessment being made. A background error variance limit is also specified, beyond which the background is deemed unreliable. Observations made during the day, in icy regions or where the background value is missing are excluded from the comparison.

The check proceeds in two steps; a ‘long tail-check’ followed by a ‘short tail-check’. The idea is that the short tail-check has finer resolution but lower sensitivity than the long tail-check and may pick off noisy data not picked up by the long tail check. Only observations that pass the long tail-check are passed to the short tail-check. Both of these tail checks proceed by moving a window over the data and assessing the data in each window. Once good data are found the check stops and any bad data preceding this are flagged. If unreliable background data are encountered the check stops. The checks are run forwards and backwards over the record so as to assess data at the start and end of the record. If the whole record fails no observations are flagged as there are then no ‘tails’ in the data (this is left for other checks). The long tail check looks for groups of observations that are too biased or noisy as a whole. The short tail check looks for individual observations exceeding a noise limit within the window.

Parameters:
  • lat (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • lon (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • sst (SequenceNumberType) – 1-dimensional array of sea surface temperatures in K.

  • ostia (SequenceNumberType) – 1-dimensional array of background field sea surface temperatures in K.

  • ice (SequenceNumberType) – 1-dimensional array of ice concentrations in the range 0.0 to 1.0.

  • bgvar (SequenceNumberType) – 1-dimensional array of background sea surface temperature fields variances in K^2.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • long_win_len (int) – Length of window (in data-points) over which to make long tail-check (must be an odd number).

  • long_err_std_n (float) – Number of standard deviations of combined background and drifter bias error, beyond which data fail bias check.

  • short_win_len (int) – Length of window (in data-points) over which to make the short tail-check.

  • short_err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which data are deemed suspicious.

  • short_win_n_bad (int) – Minimum number of suspicious data points required for failure of short check window.

  • drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).

  • drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).

  • background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).

bgerr: ndarray
do_sst_tail_check(start_tail)[source]

Perform the actual SST tail check.

Parameters:

start_tail (bool) – If True flag the start of the record as failed, otherwise flag the end of the record as failed.

Return type:

None

end_tail_ind: int
get_qc_outcomes()[source]

Return the QC outcomes.

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags.

reps_ind: ndarray
sst_anom: ndarray
start_tail_ind: int
valid_parameters()[source]

Check the parameters are valid. Raises a warning and returns False if not valid.

Return type:

bool

Returns:

bool – True if parameter is valid, otherwise False.

class marine_qc.buoy_tracking_qc.SpeedChecker(lons, lats, dates, speed_limit, min_win_period, max_win_period)[source]

Bases: object

Class used to carry out do_speed_check().

The check identifies whether a drifter has been picked up by a ship (out of water) based on 1/100th degree precision positions. A flag is set for each input report: flag=1 for reports deemed picked up, else flag=0.

A drifter is deemed picked up if it is moving faster than might be expected for a fast ocean current (a few m/s). Unreasonably fast movement is detected when the speed of travel between report-pairs exceeds the chosen ‘speed_limit’. Speed is estimated as the distance between reports divided by their time separation; this ‘straight line’ speed between the two points is a minimum speed estimate, given that a less direct path may have been followed. Positional errors introduced by lon/lat ‘jitter’ and data precision can be of the order of several km. Reports must therefore be separated by a suitably long period of time (the ‘min_win_period’) to minimise the effect of these errors when calculating speed. For example, for reports separated by 24 hours, errors of several cm/s would result, which is two orders of magnitude less than a fast ocean current and seems reasonable. Conversely, the period of time chosen should not be so long that short-lived bursts of speed on manoeuvring ships go unresolved. Larger positional errors may also trigger the check. Because temporal sampling can be erratic, the time period over which this assessment is made is specified as a range (bounded by ‘min_win_period’ and ‘max_win_period’); the assessment uses the longest time separation available within this range.

IMPORTANT - for optimal performance, drifter records with observations failing this check should subsequently be reviewed manually. Ships move around in all sorts of complicated ways that can readily confuse such a simple check (e.g. pausing at sea, crisscrossing their own path), and once some erroneous movement is detected it is likely that a human operator can better pick out the actual bad data. False fails caused by positional errors (particularly in fast ocean currents) will also need reinstating.
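The ‘straight line’ speed estimate can be sketched as follows. This is an illustration under stated assumptions rather than the class's own code: the function names are hypothetical, the distance is a haversine great-circle distance, and the Earth radius is an assumed mean value.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def great_circle_distance_m(lat1, lon1, lat2, lon2):
    """Haversine distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def straight_line_speed(lat1, lon1, lat2, lon2, separation_days):
    """Minimum speed estimate in m/s for two reports separated in time."""
    seconds = separation_days * 86_400.0
    return great_circle_distance_m(lat1, lon1, lat2, lon2) / seconds


# A 5 km positional error spread over 24 hours is a speed error of
# roughly 0.06 m/s, i.e. several cm/s as stated in the description.
error_speed = 5_000.0 / 86_400.0
```

A report pair would fail when this speed exceeds ‘speed_limit’; one degree of latitude covered in one day, for instance, corresponds to about 1.3 m/s.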

Parameters:
  • lons (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • lats (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • speed_limit (float) – Maximum allowable speed for an in situ drifting buoy (metres per second).

  • min_win_period (float) – Minimum period of time in days over which position is assessed for speed estimates (see description).

  • max_win_period (float) – Maximum period of time in days over which position is assessed for speed estimates (this should be greater than min_win_period and allow for some erratic temporal sampling, e.g. min_win_period + 0.2 to allow for gaps of up to 0.2 days in sampling).

do_speed_check()[source]

Perform the actual speed check.

Return type:

None

get_qc_outcomes()[source]

Retrieve the QC outcomes for all observations.

Return type:

ndarray

Returns:

np.ndarray – Array of QC flags for each observation (0 = passed, 1 = failed, untested otherwise).

valid_arrays()[source]

Validate the input observation arrays (longitude, latitude, and time differences).

Checks for:

  • NaN values in longitude, latitude, or time arrays.

  • Monotonicity of the time array (self.hrs).

Warnings are raised for any issues detected.

Return type:

bool

Returns:

bool – True if all arrays are valid, False otherwise.

valid_parameters()[source]

Validate the QC parameters to ensure they are sensible.

Checks that:

  • speed_limit is non-negative.

  • min_win_period is non-negative.

  • max_win_period is greater than or equal to min_win_period.

Warnings are raised for any invalid parameters.

Return type:

bool

Returns:

bool – True if all parameters are valid, False otherwise.
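The three conditions listed for valid_parameters() could be expressed along these lines. This is a minimal sketch: the standalone function `parameters_are_valid` and its warning messages are hypothetical, and the real method works on the checker's own attributes.

```python
import warnings


def parameters_are_valid(speed_limit, min_win_period, max_win_period):
    """Return True if the speed-check parameters are sensible.

    A warning is issued for each invalid parameter instead of raising.
    """
    valid = True
    if speed_limit < 0:
        warnings.warn("speed_limit must be non-negative", stacklevel=2)
        valid = False
    if min_win_period < 0:
        warnings.warn("min_win_period must be non-negative", stacklevel=2)
        valid = False
    if max_win_period < min_win_period:
        warnings.warn("max_win_period must be >= min_win_period", stacklevel=2)
        valid = False
    return valid
```

Warning rather than raising lets a caller skip the check for one record and carry on with the rest.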

marine_qc.buoy_tracking_qc.do_aground_check(lons, lats, dates, smooth_win, min_win_period, max_win_period)[source]

Perform the aground check.

Parameters:
  • lons (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • lats (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • smooth_win (int) – Length of window (odd number) in datapoints used for smoothing lon/lat.

  • min_win_period (int) – Minimum period of time in days over which position is assessed for no movement (see description).

  • max_win_period (int or None) – Maximum period of time in days over which position is assessed for no movement (this should be greater than min_win_period and allow for erratic temporal sampling e.g. min_win_period+2 to allow for gaps of up to 2-days in sampling).

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if aground check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

  • smooth_win = 41

  • min_win_period = 8

  • max_win_period = 10

marine_qc.buoy_tracking_qc.do_new_aground_check(lons, lats, dates, smooth_win, min_win_period)[source]

Perform the new aground check.

Parameters:
  • lons (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • lats (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • smooth_win (int) – Length of window (odd number) in datapoints used for smoothing lon/lat.

  • min_win_period (int) – Minimum period of time in days over which position is assessed for no movement (see description).

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if new aground check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

  • smooth_win = 41

  • min_win_period = 8

marine_qc.buoy_tracking_qc.do_new_speed_check(lons, lats, dates, speed_limit, min_win_period, ship_speed_limit, delta_d, delta_t, n_neighbours)[source]

Perform the new speed check.

Parameters:
  • lons (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • lats (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • speed_limit (float) – Maximum allowable speed for an in situ drifting buoy (metres per second).

  • min_win_period (float) – Minimum period of time in days over which position is assessed for speed estimates (see description).

  • ship_speed_limit (float) – Ship speed limit for the IQUAM track check.

  • delta_d (float) – The smallest increment in distance that can be resolved. For 0.01 degrees of lat-lon this is 1.11 km. Used in the IQUAM track check.

  • delta_t (float) – The smallest increment in time that can be resolved. For hourly data expressed as a float this is 0.01 hours. Used in the IQUAM track check.

  • n_neighbours (int) – Number of neighbours considered in the IQUAM track check.

Return type:

ndarray

Returns:

array-like of int, shape (n,) – Array containing the QC outcomes for the new speed check.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

  • speed_limit = 3.0

  • min_win_period = 0.375

And, for the IQUAM-specific parameters:

  • ship_speed_limit = 60.0

  • delta_d = 1.11

  • delta_t = 0.01

  • n_neighbours = 5
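The figure of 1.11 km quoted for delta_d follows from the length of 0.01 degrees of arc on the Earth's surface. The short calculation below reproduces it and also converts the former default min_win_period to hours. The Earth radius is an assumed mean value and `degrees_to_km` is a hypothetical helper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def degrees_to_km(delta_deg):
    """Arc length in km spanned by delta_deg degrees on a great circle."""
    return math.radians(delta_deg) * EARTH_RADIUS_KM


delta_d = degrees_to_km(0.01)   # roughly 1.11 km
min_win_hours = 0.375 * 24      # 0.375 days is 9 hours
```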

marine_qc.buoy_tracking_qc.do_speed_check(lons, lats, dates, speed_limit, min_win_period, max_win_period)[source]

Perform the Track QC speed check.

Parameters:
  • lons (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • lats (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • speed_limit (float) – Maximum allowable speed for an in situ drifting buoy (metres per second).

  • min_win_period (float) – Minimum period of time in days over which position is assessed for speed estimates (see description).

  • max_win_period (float) – Maximum period of time in days over which position is assessed for speed estimates (this should be greater than min_win_period and allow for some erratic temporal sampling, e.g. min_win_period + 0.2 to allow for gaps of up to 0.2 days in sampling).

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if speed check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

  • speed_limit = 2.5

  • min_win_period = 0.8

  • max_win_period = 1.8

marine_qc.buoy_tracking_qc.do_sst_biased_check(lons, lats, dates, sst, ostia, ice, bgvar, n_eval, bias_lim, drif_intra, drif_inter, err_std_n, n_bad, background_err_lim)[source]

Perform the SST bias check.

Parameters:
  • lons (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • lats (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • sst (SequenceNumberType) – 1-dimensional array of sea surface temperatures in K.

  • ostia (SequenceNumberType) – 1-dimensional array of background field sea surface temperatures in K.

  • ice (SequenceNumberType) – 1-dimensional array of ice concentrations in the range 0.0 to 1.0.

  • bgvar (SequenceNumberType) – 1-dimensional array of background sea surface temperature fields variances in K^2.

  • n_eval (int) – The minimum number of drifter observations required to be assessed by the long-record check.

  • bias_lim (float) – Maximum allowable drifter-background bias, beyond which a record is considered biased (degC or K).

  • drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).

  • drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).

  • err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which short-record data are deemed suspicious.

  • n_bad (int) – Minimum number of suspicious data points required for failure of short-record check.

  • background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if SST bias check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

  • n_eval = 30

  • bias_lim = 1.10

  • drif_intra = 1.0

  • drif_inter = 0.29

  • err_std_n = 3.0

  • n_bad = 2

  • background_err_lim = 0.3

marine_qc.buoy_tracking_qc.do_sst_biased_noisy_short_check(lons, lats, dates, sst, ostia, ice, bgvar, n_eval, bias_lim, drif_intra, drif_inter, err_std_n, n_bad, background_err_lim)[source]

Perform the SST short check.

Parameters:
  • lons (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • lats (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • sst (SequenceNumberType) – 1-dimensional array of sea surface temperatures in K.

  • ostia (SequenceNumberType) – 1-dimensional array of background field sea surface temperatures in K.

  • ice (SequenceNumberType) – 1-dimensional array of ice concentrations in the range 0.0 to 1.0.

  • bgvar (SequenceNumberType) – 1-dimensional array of background sea surface temperature fields variances in K^2.

  • n_eval (int) – The minimum number of drifter observations required to be assessed by the long-record check.

  • bias_lim (float) – Maximum allowable drifter-background bias, beyond which a record is considered biased (degC or K).

  • drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).

  • drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).

  • err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which short-record data are deemed suspicious.

  • n_bad (int) – Minimum number of suspicious data points required for failure of short-record check.

  • background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if SST short check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

  • n_eval = 30

  • bias_lim = 1.10

  • drif_intra = 1.0

  • drif_inter = 0.29

  • err_std_n = 3.0

  • n_bad = 2

  • background_err_lim = 0.3

marine_qc.buoy_tracking_qc.do_sst_end_tail_check(lons, lats, dates, sst, ostia, ice, bgvar, long_win_len, long_err_std_n, short_win_len, short_err_std_n, short_win_n_bad, drif_inter, drif_intra, background_err_lim)[source]

Perform the SST End Tail Check.

Parameters:
  • lons (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • lats (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • sst (SequenceNumberType) – 1-dimensional array of sea surface temperatures in K.

  • ostia (SequenceNumberType) – 1-dimensional array of background field sea surface temperatures in K.

  • ice (SequenceNumberType) – 1-dimensional array of ice concentrations in the range 0.0 to 1.0.

  • bgvar (SequenceNumberType) – 1-dimensional array of background sea surface temperature fields variances in K^2.

  • long_win_len (int) – Length of window (in data-points) over which to make long tail-check (must be an odd number).

  • long_err_std_n (float) – Number of standard deviations of combined background and drifter bias error, beyond which data fail bias check.

  • short_win_len (int) – Length of window (in data-points) over which to make the short tail-check.

  • short_err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which data are deemed suspicious.

  • short_win_n_bad (int) – Minimum number of suspicious data points required for failure of short check window.

  • drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).

  • drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).

  • background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if SST end tail check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

  • long_win_len = 121

  • long_err_std_n = 3.0

  • short_win_len = 30

  • short_err_std_n = 3.0

  • short_win_n_bad = 2

  • drif_inter = 0.29

  • drif_intra = 1.00

  • background_err_lim = 0.3

marine_qc.buoy_tracking_qc.do_sst_noisy_check(lons, lats, dates, sst, ostia, ice, bgvar, n_eval, bias_lim, drif_intra, drif_inter, err_std_n, n_bad, background_err_lim)[source]

Perform the SST noise check.

Parameters:
  • lons (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • lats (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • sst (SequenceNumberType) – 1-dimensional array of sea surface temperatures in K.

  • ostia (SequenceNumberType) – 1-dimensional array of background field sea surface temperatures in K.

  • ice (SequenceNumberType) – 1-dimensional array of ice concentrations in the range 0.0 to 1.0.

  • bgvar (SequenceNumberType) – 1-dimensional array of background sea surface temperature fields variances in K^2.

  • n_eval (int) – The minimum number of drifter observations required to be assessed by the long-record check.

  • bias_lim (float) – Maximum allowable drifter-background bias, beyond which a record is considered biased (degC or K).

  • drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).

  • drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).

  • err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which short-record data are deemed suspicious.

  • n_bad (int) – Minimum number of suspicious data points required for failure of short-record check.

  • background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if SST noise check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

  • n_eval = 30

  • bias_lim = 1.10

  • drif_intra = 1.0

  • drif_inter = 0.29

  • err_std_n = 3.0

  • n_bad = 2

  • background_err_lim = 0.3

marine_qc.buoy_tracking_qc.do_sst_start_tail_check(lons, lats, dates, sst, ostia, ice, bgvar, long_win_len, long_err_std_n, short_win_len, short_err_std_n, short_win_n_bad, drif_inter, drif_intra, background_err_lim)[source]

Perform the SST Start Tail Check.

Parameters:
  • lons (SequenceNumberType) – 1-dimensional longitude array in degrees.

  • lats (SequenceNumberType) – 1-dimensional latitude array in degrees.

  • dates (SequenceDatetimeType) – 1-dimensional date array.

  • sst (SequenceNumberType) – 1-dimensional array of sea surface temperatures in K.

  • ostia (SequenceNumberType) – 1-dimensional array of background field sea surface temperatures in K.

  • ice (SequenceNumberType) – 1-dimensional array of ice concentrations in the range 0.0 to 1.0.

  • bgvar (SequenceNumberType) – 1-dimensional array of background sea surface temperature fields variances in K^2.

  • long_win_len (int) – Length of window (in data-points) over which to make long tail-check (must be an odd number).

  • long_err_std_n (float) – Number of standard deviations of combined background and drifter bias error, beyond which data fail bias check.

  • short_win_len (int) – Length of window (in data-points) over which to make the short tail-check.

  • short_err_std_n (float) – Number of standard deviations of combined background and drifter error, beyond which data are deemed suspicious.

  • short_win_n_bad (int) – Minimum number of suspicious data points required for failure of short check window.

  • drif_inter (float) – Spread of biases expected in drifter data (standard deviation, degC or K).

  • drif_intra (float) – Maximum random measurement uncertainty reasonably expected in drifter data (standard deviation, degC or K).

  • background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).

Return type:

ndarray

Returns:

array-like of int, shape (n,) – 1-dimensional array containing QC flags. 1 if SST start tail check fails, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

  • long_win_len = 121

  • long_err_std_n = 3.0

  • short_win_len = 30

  • short_err_std_n = 3.0

  • short_win_n_bad = 2

  • drif_inter = 0.29

  • drif_intra = 1.00

  • background_err_lim = 0.3

marine_qc.buoy_tracking_qc.is_monotonic(inarr)[source]

Test if elements in an array are increasing monotonically.

I.e. each element is greater than or equal to the preceding element.

Parameters:

inarr (array-like of datetime, shape (n,)) – 1-dimensional date array.

Return type:

bool

Returns:

bool – True if array is increasing monotonically, False otherwise.
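The definition above (each element greater than or equal to its predecessor) can be sketched with NumPy as follows. The function name `is_non_decreasing` is hypothetical, and the conversion to `datetime64[s]` is an assumption made for the illustration, not necessarily what the library does.

```python
from datetime import datetime

import numpy as np


def is_non_decreasing(dates):
    """Return True if each date is >= the one before it."""
    arr = np.asarray(dates, dtype="datetime64[s]")
    # Compare every element with its predecessor; empty and
    # single-element arrays are trivially monotonic.
    return bool(np.all(arr[1:] >= arr[:-1]))


ordered = [datetime(2020, 1, 1), datetime(2020, 1, 1), datetime(2020, 1, 2)]
shuffled = [datetime(2020, 1, 2), datetime(2020, 1, 1)]
```

Note that repeated timestamps, as in `ordered`, still count as monotonic under this definition.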

marine_qc.buoy_tracking_qc.track_day_test(year, month, day, hour, lat, lon, elevdlim=-2.5)[source]

Given date, time, lat and lon calculate if the sun elevation is > elevdlim.

This is the “day” test used by tracking QC to decide whether an SST measurement is night or day. This is important because daytime diurnal heating can affect comparison with an SST background. It uses the function sunangle to calculate the elevation of the sun. A default solar_zenith angle of 92.5 degrees (elevation of -2.5 degrees) delimits night from day.

Parameters:
  • year (int) – Year.

  • month (int) – Month.

  • day (int) – Day.

  • hour (float) – Hour expressed as decimal fraction (e.g. 20.75 = 20:45).

  • lat (float) – Latitude in degrees.

  • lon (float) – Longitude in degrees.

  • elevdlim (float, default: -2.5) – Elevation day/night delimiter in degrees above horizon.

Return type:

bool

Returns:

bool – True if daytime, else False.

Raises:

ValueError – If any of year, month, day, hour, lat or lon is numerically invalid or None, or if any of month, day, hour or lat is not in its valid range.
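The relationship between the default solar zenith angle of 92.5 degrees and the default elevdlim of -2.5 degrees is simply elevation = 90 - zenith. The sketch below shows that conversion and the final comparison only. Both helpers are hypothetical, and the solar position calculation itself is not reproduced.

```python
def zenith_to_elevation(zenith_deg):
    """Convert a solar zenith angle to an elevation above the horizon."""
    return 90.0 - zenith_deg


def is_daytime(solar_elevation_deg, elevdlim=-2.5):
    """Day if the sun is strictly above the elevation delimiter."""
    return solar_elevation_deg > elevdlim
```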

marine_qc.calculate_humidity module

The CalcHums module contains a set of functions for calculating humidity variables.

At present, it can only cope with scalars, not arrays.

There are routines for:

  • specific humidity from dew point temperature, temperature and pressure

  • vapour pressure from dew point temperature, temperature and pressure

  • relative humidity from dew point temperature, temperature and pressure

  • wet bulb temperature from dew point temperature, temperature and pressure

  • dew point depression from dew point temperature and temperature

There are also routines for:

  • vapour pressure from specific humidity, pressure and temperature

  • dew point temperature from vapour pressure, temperature and pressure

  • relative humidity from vapour pressure, temperature and pressure

  • wet bulb temperature from vapour pressure, dew point temperature and temperature (and pressure?)

Where vapour pressure is used as part of the equation a pseudo wet bulb temperature is calculated. If this is at or below 0 deg C then the ice bulb equation is used.

All numbers are returned rounded to one decimal place by default.

These routines cannot cope with missing data.

Each routine has a roundit flag (True/False). The default is True, which rounds the result to one decimal place; set roundit=False to return the unrounded value.

Written by Kate Willett 7th Feb 2016

marine_qc.calculate_humidity.dpd(td, t, roundit=True)[source]

Calculate dew point depression from dew point temperature and dry bulb temperature.

Parameters:
  • td (float) – Dew point temperature in degrees C (array or scalar).

  • t (float) – Dry bulb temperature in degrees C (array or scalar).

  • roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

float

Returns:

float – Dew point depression in degrees C (array or scalar).

Notes

Ref:

TESTED! dpd = dpd(10.,15.) dpd = 5.0

marine_qc.calculate_humidity.rh(td, t, p, roundit=True)[source]

Calculate relative humidity from dew point temperature, dry bulb temperature and pressure.

It calculates the saturated vapour pressure from t. It requires a pressure value (strictly station-level pressure, but sea-level pressure is acceptable for marine data). This can be a scalar or an array, even if vapour pressure is an array (CHECK). To test whether to apply the ice or water calculation a dewpoint and dry bulb temperature are needed. We can assume that the dry bulb t is the same as the wet bulb t at saturation. This allows calculation of a pseudo-wet bulb temperature (imprecise) first. If the wet bulb temperature is at or below 0 deg C then the ice calculation is used.

Parameters:
  • td (float) – Dew point temperature in degrees C (array or scalar).

  • t (float) – Dry bulb temperature in degrees C (array or scalar).

  • p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).

  • roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

float

Returns:

float – Relative humidity in %rh (array or scalar).

Notes

Ref:

TESTED! rh = rh(10.,15.,1013.) rh = 72.0
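As a rough illustration of the calculation, relative humidity can be taken as the ratio of the vapour pressure at the dew point to the saturation vapour pressure at the dry bulb temperature. The sketch below uses a Buck (1981) style formula over water only, with coefficients that are assumed here rather than taken from the package source, and it omits the ice-bulb branch. The function names are hypothetical. With these assumptions it reproduces the documented test value.

```python
import math


def buck_vapour_pressure(temp_c, pressure_hpa):
    """Vapour pressure over water (hPa) at temp_c, Buck (1981) style."""
    # Enhancement factor for moist air (assumed coefficients).
    f = 1.0 + 7.0e-4 + 3.46e-6 * pressure_hpa
    return 6.1121 * f * math.exp(
        ((18.729 - temp_c / 227.3) * temp_c) / (257.87 + temp_c)
    )


def rh_sketch(td, t, p, roundit=True):
    """Relative humidity (%rh) from dew point, dry bulb and pressure."""
    e = buck_vapour_pressure(td, p)   # actual vapour pressure
    es = buck_vapour_pressure(t, p)   # saturation vapour pressure
    value = 100.0 * e / es
    return round(value, 1) if roundit else value
```

Because the same enhancement factor appears in both e and es, pressure cancels in this water-only form; it matters in the full routine only through the choice between the ice and water calculations.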

marine_qc.calculate_humidity.sh(td, t, p, roundit=True)[source]

Calculate specific humidity from dew point temperature, dry bulb temperature and pressure.

It requires a pressure value (strictly station-level pressure, but sea-level pressure is acceptable for marine data). This can be a scalar or an array, even if vapour pressure is an array (CHECK).

Parameters:
  • td (float) – Dew point temperature in degrees C (array or scalar).

  • t (float) – Dry bulb temperature in degrees C (array or scalar).

  • p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).

  • roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

float

Returns:

float – Specific humidity in g/kg (array or scalar).

Notes

Peixoto & Oort, 1996; Ross & Elliott, 1996. Peixoto, J. P. and Oort, A. H.: The climatology of relative humidity in the atmosphere, J. Climate, 9, 3443–3463, 1996.

TESTED! sh = sh(10.,15.,1013.) sh = 7.6

marine_qc.calculate_humidity.sh_from_vap(e, p, roundit=True)[source]

Calculate specific humidity from vapour pressure and pressure.

It requires a pressure value (strictly station-level pressure, but sea-level pressure is acceptable for marine data). This can be a scalar or an array, even if vapour pressure is an array (CHECK).

Parameters:
  • e (float) – Vapour pressure in hPa (array or scalar).

  • p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).

  • roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

float

Returns:

float – Specific humidity in g/kg (array or scalar).

Notes

Peixoto & Oort, 1996; Ross & Elliott, 1996. Peixoto, J. P. and Oort, A. H.: The climatology of relative humidity in the atmosphere, J. Climate, 9, 3443–3463, 1996.

TESTED! sh = sh(10.,15.,1013.) sh = 7.6

marine_qc.calculate_humidity.td_from_vap(e, p, t, roundit=True)[source]

Calculate dew point temperature from vapour pressure, pressure and dry bulb temperature.

It also requires temperature to check whether the wet bulb temperature is <= 0.0 - if so the ice bulb calculation is used.

Parameters:
  • e (float) – Vapour pressure in hPa (array or scalar).

  • p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).

  • t (float) – Dry bulb temperature in degrees C (array or scalar).

  • roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

float

Returns:

float – Dew point temperature in degrees C (array or scalar).

Notes

Buck 1981: Buck, A. L.: New equations for computing vapor pressure and enhancement factor, J. Appl. Meteorol., 20, 1527–1532, 1981. Jensen et al. 1990: Jensen, M. E., Burman, R. D., and Allen, R. G. (Eds.): Evapotranspiration and Irrigation Water Requirements: ASCE Manuals and Reports on Engineering Practices No. 70, American Society of Civil Engineers, New York, 360 pp., 1990.

TESTED! td = td_from_vap(12.3,1013.,15.) td = 10.0

marine_qc.calculate_humidity.vap(td, t, p, roundit=True)[source]

Calculate vapour pressure from dew point temperature, dry bulb temperature and pressure.

It requires a pressure value (strictly station-level pressure, but sea-level pressure is acceptable for marine data). This can be a scalar or an array, even if dewpoint temperature is an array (CHECK). To test whether to apply the ice or water calculation a dry bulb temperature is needed. This allows calculation of a pseudo-wet bulb temperature (imprecise) first. If the wet bulb temperature is at or below 0 deg C then the ice calculation is used.

Parameters:
  • td (float) – Dew point temperature in degrees C (array or scalar).

  • t (float) – Dry bulb temperature in degrees C (array or scalar).

  • p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).

  • roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

float

Returns:

float – Vapour pressure in hPa (array or scalar).

Notes

Buck 1981: Buck, A. L.: New equations for computing vapor pressure and enhancement factor, J. Appl. Meteorol., 20, 1527–1532, 1981. Jensen et al. 1990: Jensen, M. E., Burman, R. D., and Allen, R. G. (Eds.): Evapotranspiration and Irrigation Water Requirements: ASCE Manuals and Reports on Engineering Practices No. 70, American Society of Civil Engineers, New York, 360 pp., 1990.

TESTED! e = vap(10.,15.,1013.) e = 12.3

marine_qc.calculate_humidity.vap_from_sh(sh, p, roundit=True)[source]

Calculate vapour pressure from specific humidity and pressure.

It requires a pressure value (strictly station-level pressure, but sea-level pressure is acceptable for marine data). This can be a scalar or an array, even if specific humidity is an array (CHECK).

Parameters:
  • sh (float) – Specific humidity in g/kg (array or scalar).

  • p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).

  • roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

float

Returns:

float – Vapour pressure in hPa (array or scalar).

Notes

Peixoto & Oort, 1996; Ross & Elliott, 1996. Peixoto, J. P. and Oort, A. H.: The climatology of relative humidity in the atmosphere, J. Climate, 9, 3443–3463, 1996.

Tested: e = vap_from_sh(7.6, 1013.) gives e = 12.3.
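The conversion can be sketched by inverting the standard relation q = 0.622 e / (p - 0.378 e) between specific humidity q (kg/kg), vapour pressure e and pressure p. The function name below is hypothetical, and whether marine_qc uses exactly these constants is an assumption.

```python
def vapour_pressure_from_specific_humidity(sh, p):
    """Vapour pressure (hPa) from specific humidity sh (g/kg) and pressure p (hPa)."""
    q = sh / 1000.0  # convert g/kg to kg/kg
    # Invert q = 0.622 * e / (p - 0.378 * e) for e.
    return q * p / (0.622 + 0.378 * q)


print(round(vapour_pressure_from_specific_humidity(7.6, 1013.0), 1))  # 12.3
```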

marine_qc.calculate_humidity.wb(td, t, p, roundit=True)[source]

Calculate wet bulb temperature from dew point temperature, dry bulb temperature and pressure.

A pressure value is required. Strictly this should be the station-level pressure, but sea-level pressure is acceptable for marine data. It can be a scalar or an array, even if the other inputs are arrays (CHECK). Both dew point and dry bulb temperature are needed to decide whether to apply the ice or the water calculation: they allow an (imprecise) pseudo-wet bulb temperature to be calculated first. If the wet bulb temperature is at or below 0 deg C, the ice calculation is used.

Parameters:
  • td (float) – Dew point temperature in degrees C (array or scalar).

  • t (float) – Dry bulb temperature in degrees C (array or scalar).

  • p (float) – Pressure at observation level in hPa (array or scalar - can be scalar even if others are arrays).

  • roundit (bool) – Flag to tell function to round to one decimal place, default TRUE.

Return type:

float

Returns:

float – Wet bulb temperature in degrees C (array or scalar).

Notes

Ref: Jensen et al. 1990: Jensen, M. E., Burman, R. D., and Allen, R. G. (Eds.): Evapotranspiration and Irrigation Water Requirements: ASCE Manuals and Reports on Engineering Practices No. 70, American Society of Civil Engineers, New York, 360 pp., 1990.

Tested: wb = wb(10., 15., 1013) gives wb = 12.2.
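A minimal sketch of a pseudo-wet bulb estimate, assuming the weighted-mean form commonly attributed to Jensen et al. (1990): the dry bulb temperature is weighted by a psychrometric term and the dew point by the slope of the saturation vapour pressure curve. The function name, the coefficient values (written here for pressures in hPa) and the explicit vapour pressure argument e are assumptions for illustration. The ice test described above is not included.

```python
def pseudo_wet_bulb(td, t, p, e):
    """Approximate wet bulb temperature (deg C).

    td: dew point (deg C), t: dry bulb (deg C),
    p: pressure (hPa), e: vapour pressure (hPa).
    """
    a = 0.000066 * p                    # psychrometric term
    b = 409.8 * e / (td + 237.3) ** 2   # slope of the saturation curve at td
    return (a * t + b * td) / (a + b)


print(round(pseudo_wet_bulb(10.0, 15.0, 1013.0, 12.3), 1))  # 12.2
```

Because the result is a weighted mean, it always lies between the dew point and the dry bulb temperature.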

marine_qc.external_clim module

Module to read external climatology files.

class marine_qc.external_clim.Climatology(data, time_axis=None, lat_axis=None, lon_axis=None, source_units=None, target_units=None, valid_ntime=None)[source]

Bases: object

Class for dealing with climatologies, reading, extracting values etc.

Automatically detects if this is a single field, pentad or daily climatology.

Parameters:
  • data (xr.DataArray) – Climatology data.

  • time_axis (str, optional) – Name of time axis. Set if time axis in data is not CF compatible.

  • lat_axis (str, optional) – Name of latitude axis. Set if latitude axis in data is not CF compatible.

  • lon_axis (str, optional) – Name of longitude axis. Set if longitude axis in data is not CF compatible.

  • source_units (str, optional) – Name of units in data. Set if units are not defined in data.

  • target_units (str, optional) – Name of target units to which units must conform.

  • valid_ntime (int or list, default: [1, 73, 365]) – Number of valid time steps:

    • 1: single field climatology

    • 73: pentad climatology

    • 365: daily climatology

convert_units_to(target_units, source_units=None)[source]

Convert units to user-specific units.

Parameters:
  • target_units (str) – Target units to which units must conform.

  • source_units (str, optional) – Source units if not specified in Climatology.

Return type:

None

Notes

For more information see: xclim.core.units.convert_units_to()

static get_t_index(month, day, ntime)[source]

Convert arrays of months and days to an array of indices for the grid.

Parameters:
  • month (ndarray) – Array of months.

  • day (ndarray) – Array of days.

  • ntime (int) – Number of time points in the grid, valid values are 1, 73 (pentad resolution) and 365 (daily resolution).

Return type:

ndarray

Returns:

ndarray – Array of indices.
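The mapping from a calendar date to a time index can be sketched for a single date as follows. The function name time_index is hypothetical, the scalar interface is a simplification of the array interface documented above, and the treatment of 29 February is an assumption rather than documented behaviour.

```python
from datetime import date


def time_index(month, day, ntime):
    """Zero-based time index for a single-field, pentad or daily climatology."""
    if ntime == 1:
        return 0  # single field: only one time step
    if month == 2 and day == 29:
        day = 28  # assumption: the leap day shares the slot of 28 February
    # 2001 is not a leap year, so day_of_year runs from 1 to 365.
    day_of_year = date(2001, month, day).timetuple().tm_yday
    if ntime == 365:
        return day_of_year - 1
    if ntime == 73:
        return (day_of_year - 1) // 5  # 73 pentads of 5 days each
    raise ValueError("ntime must be 1, 73 or 365")


print(time_index(1, 6, 73), time_index(12, 31, 73), time_index(12, 31, 365))
```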

get_tindex(month, day)[source]

Get the time index of the input month and day.

Parameters:
  • month (int) – Month for which the time index is required.

  • day (int) – Day for which the time index is required.

Return type:

int

Returns:

int – Time index for specified month and day.

get_value(lat, lon, date=None, month=None, day=None)[source]

Get the value from a climatology at the given position and time.

Parameters:
Return type:

ndarray | Series

Returns:

ndarray or pd.Series – Climatology value at specified location and time.

Notes

Time is selected using exact matches only; location is selected using the nearest valid index value.

get_value_fast(lat, lon, date=None, month=None, day=None)[source]

Get the value from a climatology at the given position and time.

Parameters:
Return type:

ndarray | Series

Returns:

ndarray or pd.Series – Climatology value at specified location and time.

Notes

Assumes that the grid is a regular latitude longitude grid. The alternative method get_value works with non-regular grids.
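The difference between the two lookup strategies can be illustrated with a short sketch. On a regular axis the index follows arithmetically from the origin and spacing, whereas a general axis needs a nearest-neighbour search. Both helper names are hypothetical, and this is not necessarily how get_value or get_value_fast are implemented.

```python
def nearest_index(value, axis):
    """General case: index of the axis entry closest to value."""
    return min(range(len(axis)), key=lambda i: abs(axis[i] - value))


def regular_index(value, axis):
    """Regular grid: derive the index from the axis origin and spacing."""
    step = axis[1] - axis[0]
    index = int(round((value - axis[0]) / step))
    return max(0, min(len(axis) - 1, index))  # clamp to the axis range


lat_axis = [89.5 - i for i in range(180)]  # 1-degree cell centres, north to south
print(nearest_index(12.3, lat_axis), regular_index(12.3, lat_axis))
```

The arithmetic version avoids scanning the axis, which is why a regular-grid assumption allows a faster lookup.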

static get_x_index(lon_arr, lon_axis)[source]

Convert an array of longitudes to an array of indices for the grid.

Parameters:
  • lon_arr (ndarray) – Array of longitudes.

  • lon_axis (ndarray) – Array containing the longitude axis.

Return type:

ndarray

Returns:

ndarray – Array of indices.

static get_y_index(lat_arr, lat_axis)[source]

Convert an array of latitudes to an array of indices for the grid.

Parameters:
  • lat_arr (np.ndarray) – Array of latitudes.

  • lat_axis (np.ndarray) – Array containing the latitude axis.

Return type:

ndarray

Returns:

np.ndarray – Array of indices.

classmethod open_netcdf_file(file_name, clim_name, **kwargs)[source]

Open a NetCDF climatology file and construct a Climatology instance.

Parameters:
  • file_name (str or path-like) – Path to the NetCDF file to open.

  • clim_name (str) – Name of the climatology variable within the NetCDF file.

  • **kwargs (dict) – Additional keyword arguments passed to the Climatology constructor.

Return type:

Climatology

Returns:

Climatology – A Climatology instance constructed from the specified variable in the NetCDF file. If the file cannot be opened, an empty climatology object is returned.

marine_qc.external_clim.get_climatological_value(climatology, **kwargs)[source]

Get the value from a climatology.

Parameters:
  • climatology (Climatology) – Climatology class.

  • **kwargs (dict) – Keyword arguments passed to Climatology.get_value().

Return type:

ndarray

Returns:

ndarray – Climatology value at specified location and time.

marine_qc.external_clim.inspect_climatology(*climatology_keys, optional=None)[source]

A decorator factory to preprocess function arguments that may be Climatology objects.

This decorator inspects the specified function arguments and normalizes them to concrete numerical values before the decorated function is executed. Supported input types include raw numeric values, xarray objects, file paths, and Climatology instances.

Parameters:
  • *climatology_keys (str) – Names of required function arguments to be inspected. These should be arguments that may be:

    • a numeric value

    • an xr.DataArray

    • an xr.Dataset

    • a string or path-like object pointing to a valid NetCDF file on disk

    • a Climatology instance

    If a Climatology object (or an object convertible to one) is detected, it will be resolved to a concrete value using its .get_value_fast(**kwargs) method.

  • optional (str or sequence of str, optional) – Argument names that should be treated as optional. If they are explicitly passed when the decorated function is called, they will be treated the same way as climatology_keys.

Return type:

Callable[..., Any]

Returns:

Callable[..., Any] – A decorator that wraps the target function, processing specified arguments before the function is called.

Raises:
  • TypeError – If a required climatology argument is missing from the decorated function call.

  • ValueError – If an xr.Dataset is provided without specifying clim_name, or if a string/Path input does not point to a valid file on disk.

Warns:

UserWarning – Issued if required keyword arguments for get_value_fast() are missing. This warning does not stop execution; missing values are replaced with np.nan.

Notes

  • xr.Dataset inputs require the keyword argument clim_name to select the relevant data variable.

  • xr.DataArray inputs are automatically wrapped in a Climatology object.

  • String or path-like inputs must point to an existing file and are opened via open_netcdf_file().

  • If a Climatology object is processed, it is resolved using .get_value_fast(**kwargs).

  • If required keyword arguments for .get_value_fast() are missing, a warning is issued.

  • If resolution fails due to TypeError or ValueError, the value is replaced with np.nan.
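The general pattern described in these notes can be sketched as a small decorator factory. Everything below is a simplified, hypothetical stand-in: FakeClimatology, resolve_climatology and anomaly are illustrative names, only keyword arguments are handled, and file or xarray inputs are not covered.

```python
import functools
import math
import warnings


class FakeClimatology:
    """Hypothetical stand-in for a Climatology with a get_value_fast method."""

    def __init__(self, value):
        self.value = value

    def get_value_fast(self, lat, lon):
        return self.value


def resolve_climatology(*keys):
    """Decorator factory: resolve the named keyword arguments before the call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            for key in keys:
                if key not in kwargs:
                    raise TypeError(f"Missing required argument: {key}")
                arg = kwargs[key]
                if isinstance(arg, FakeClimatology):
                    try:
                        kwargs[key] = arg.get_value_fast(kwargs["lat"], kwargs["lon"])
                    except KeyError:
                        # Mirror the documented behaviour: warn and fall back to NaN.
                        warnings.warn(f"Cannot resolve {key}: lat/lon missing")
                        kwargs[key] = math.nan
            return func(**kwargs)

        return wrapper

    return decorator


@resolve_climatology("climatology")
def anomaly(value, climatology, lat=None, lon=None):
    return value - climatology


print(anomaly(value=12.0, climatology=FakeClimatology(10.0), lat=1.0, lon=2.0))
```

Plain numeric values pass through unchanged, so the decorated function can be written as if it always received numbers.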

marine_qc.external_clim.open_xrdataset(files, use_cftime=True, decode_cf=False, decode_times=False, parallel=False, data_vars='minimal', chunks='default', coords='minimal', compat='override', combine='by_coords', **kwargs)[source]

Optimized function for opening large CF-compliant datasets with xarray.

This implementation follows guidance from: https://github.com/pydata/xarray/issues/1385#issuecomment-561920115

decode_timedelta=False is added to leave variables and coordinates with time units in {“days”, “hours”, “minutes”, “seconds”, “milliseconds”, “microseconds”} encoded as numbers.

Parameters:
Return type:

Dataset

Returns:

xarray.Dataset – Opened xarray Dataset, optimized for large CF datasets.

marine_qc.location_control module

Some generally helpful location control functions for base QC.

marine_qc.location_control.fill_missing_vals(q11, q12, q21, q22)[source]

Fill missing values.

For a group of four neighbouring grid boxes which form a square, with values q11, q12, q21, q22, fill gaps using means of neighbours.

Parameters:
  • q11 (float) – Value of first gridbox.

  • q12 (float) – Value of second gridbox.

  • q21 (float) – Value of third gridbox.

  • q22 (float) – Value of fourth gridbox.

Return type:

tuple[float | None, float | None, float | None, float | None]

Returns:

tuple of float – A tuple of four floats representing neighbour means.

marine_qc.location_control.filler(value_to_fill, neighbour1, neighbour2, opposite)[source]

Fill invalid values.

If the value_to_fill is invalid it is replaced with the mean of the neighbours and if it is still invalid then it is replaced with the value from the opposite member.

Parameters:
  • value_to_fill (float) – The value to fill.

  • neighbour1 (float) – The first neighbour.

  • neighbour2 (float) – The second neighbour.

  • opposite (float) – The opposite member.

Return type:

float | None

Returns:

float – Filled invalid input values.
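A minimal sketch of the filling logic described for filler and fill_missing_vals. It assumes that invalid values are represented by None, that the neighbour mean is taken over whichever neighbours are valid, and that each corner's neighbours are the two corners sharing an edge with it. These choices and all function names are assumptions for illustration.

```python
def mean_of_valid(values):
    """Mean of the entries that are not None, or None if there are none."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


def fill_value(value_to_fill, neighbour1, neighbour2, opposite):
    """Keep a valid value; otherwise try the neighbour mean, then the opposite corner."""
    if value_to_fill is not None:
        return value_to_fill
    filled = mean_of_valid([neighbour1, neighbour2])
    return filled if filled is not None else opposite


def fill_square(q11, q12, q21, q22):
    """Fill each corner of the square from the original values of the other three."""
    return (
        fill_value(q11, q12, q21, q22),
        fill_value(q12, q11, q22, q21),
        fill_value(q21, q11, q22, q12),
        fill_value(q22, q12, q21, q11),
    )


print(fill_square(None, 2.0, 4.0, 6.0))
```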

marine_qc.location_control.get_four_surrounding_points(lat, lon, res, max90=True)[source]

Get the four surrounding points of a specified latitude and longitude point.

Parameters:
  • lat (float) – Latitude of point.

  • lon (float) – Longitude of point.

  • res (int) – Resolution of the grid in degrees.

  • max90 (bool, default: True) – If True then cap latitude at 90.0, otherwise don’t cap latitude.

Return type:

tuple[float, float, float, float]

Returns:

tuple of floats – A tuple of floats representing the longitudes of the leftmost and rightmost pairs of points, and the latitudes of the topmost and bottommost pairs of points.

marine_qc.location_control.lat_to_yindex(lat, res)[source]

For a given latitude return the y index in a 1x1x5-day global grid.

Parameters:
  • lat (float) – Latitude of the point.

  • res (float) – Resolution of grid in degrees.

Return type:

int

Returns:

int – Grid box index.

Notes

The routine assumes that the SST array is a 360 x 180 x 73 grid, i.e. one year of 1 degree latitude x 1 degree longitude data split up into pentads. The westernmost box is at 180 degrees W with index 0 and the northernmost box also has index 0. Inputs on the border between grid cells are pushed south.

In previous versions, res had the default value 1.0.

marine_qc.location_control.lon_to_xindex(lon, res)[source]

For a given longitude return the x index in a 1x1x5-day global grid.

Parameters:
  • lon (float) – Longitude of the point.

  • res (float) – Resolution of grid in degrees.

Return type:

int

Returns:

int – Grid box index.

Notes

The routine assumes that the SST array is a 360 x 180 x 73 grid, i.e. one year of 1 degree latitude x 1 degree longitude data split up into pentads. The westernmost box is at 180 degrees W with index 0 and the northernmost box also has index 0. Inputs on the border between grid cells are pushed east.

In previous versions, res had the default value 1.0.
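The index conventions described for lat_to_yindex and lon_to_xindex can be sketched as below. The function names are hypothetical, inputs are assumed to lie within -90..90 and -180..180 degrees, and the handling of the poles and of 180 degrees E is an assumption.

```python
def lat_index(lat, res):
    """Row index counted from the North Pole; borders fall into the southern cell."""
    n_rows = int(180 / res)
    index = int((90.0 - lat) / res)
    return max(0, min(n_rows - 1, index))  # keep -90 inside the last row


def lon_index(lon, res):
    """Column index counted eastwards from 180 W; borders fall into the eastern cell."""
    n_cols = int(360 / res)
    index = int((lon + 180.0) / res)
    return index % n_cols  # 180 E wraps round to column 0


print(lat_index(89.0, 1.0), lon_index(-179.0, 1.0), lat_index(0.0, 5.0))
```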

marine_qc.location_control.mds_lat_to_yindex(lat, res)[source]

For a given latitude return the y-index as it was in MDS2/3 in a 1x1 global grid.

Parameters:
  • lat (float) – Latitude of the point.

  • res (float) – Resolution of grid in degrees.

Return type:

int

Returns:

int – Grid box index.

Notes

In the northern hemisphere, borderline latitudes which fall on grid boundaries are pushed north, except 90 which goes south. In the southern hemisphere, they are pushed south, except -90 which goes north. At 0 degrees they are pushed south.

Expects that latitudes run from 90N to 90S.

In previous versions, res had the default value 1.0.
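The border rules above can be worked through with a short sketch. The function name mds_lat_index and the 0.001 degree nudge at the poles are assumptions for illustration; the sketch is one way to realise the stated rules and is not copied from the package.

```python
def mds_lat_index(lat, res):
    """Row index (0 at the North Pole) following the MDS-style border rules."""
    lat_local = lat
    if lat == 90.0:
        lat_local -= 0.001  # nudge the North Pole into the top row
    if lat == -90.0:
        lat_local += 0.001  # nudge the South Pole into the bottom row
    if lat > 0.0:
        # int() truncates towards zero, so a northern border joins the cell to its north.
        return int(90 / res - 1 - int(lat_local / res))
    # At or below the equator a border joins the cell to its south.
    return int(90 / res - int(lat_local / res))


for lat in (90.0, 1.0, 0.0, -1.0, -90.0):
    print(lat, mds_lat_index(lat, 1.0))
```

For a 1 degree grid this gives index 88 for latitude 1.0 (the cell from 1N to 2N), index 90 for the equator (the cell from 0 to 1S) and index 91 for latitude -1.0.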

marine_qc.location_control.mds_lat_to_yindex_fast(lat, res)[source]

For a given latitude return the y-index as it was in MDS2/3 in a 1x1 global grid.

Parameters:
  • lat (np.ndarray) – Latitude(s) of observation in degrees.

  • res (float) – Resolution of grid in degrees.

Return type:

ndarray

Returns:

np.ndarray – Grid box indexes.

Notes

In the northern hemisphere, borderline latitudes which fall on grid boundaries are pushed north, except 90 which goes south. In the southern hemisphere, they are pushed south, except -90 which goes north. At 0 degrees they are pushed south.

Expects that latitudes run from 90N to 90S.

In previous versions, res had the default value 1.0.

marine_qc.location_control.mds_lon_to_xindex(lon, res)[source]

For a given longitude return the x-index as it was in MDS2/3 in a 1x1 global grid.

Parameters:
  • lon (float) – Longitude of the point.

  • res (float) – Resolution of grid in degrees.

Return type:

int

Returns:

int – Grid box index.

Notes

In the western hemisphere, borderline longitudes which fall on grid boundaries are pushed west, except -180 which goes east. In the eastern hemisphere, they are pushed east, except 180 which goes west. At 0 degrees they are pushed west.

In previous versions, res had the default value 1.0.

marine_qc.location_control.mds_lon_to_xindex_fast(lon, res)[source]

For a given longitude return the x-index as it was in MDS2/3 in a 1x1 global grid.

Parameters:
  • lon (np.ndarray) – Longitude(s) of observation in degrees.

  • res (float) – Resolution of grid in degrees.

Return type:

ndarray

Returns:

np.ndarray – Grid box indexes.

Notes

In the western hemisphere, borderline longitudes which fall on grid boundaries are pushed west, except -180 which goes east. In the eastern hemisphere, they are pushed east, except 180 which goes west. At 0 degrees they are pushed west.

In previous versions, res had the default value 1.0.

marine_qc.location_control.xindex_to_lon(xindex, res)[source]

Convert xindex to longitude.

Parameters:
  • xindex (int) – Index of the longitude.

  • res (float) – Resolution of grid in degrees.

Return type:

float

Returns:

float – Longitude (degrees).

Notes

In previous versions, res had the default value 1.0.

marine_qc.location_control.yindex_to_lat(yindex, res)[source]

Convert yindex to latitude.

Parameters:
  • yindex (int) – Index of the latitude.

  • res (float) – Resolution of grid in degrees.

Return type:

float

Returns:

float – Latitude (degrees).

Notes

In previous versions, res had the default value 1.0.

marine_qc.multiple_checks module

Module containing base QC routines that call multiple QC functions and can be applied to a DataBundle.

marine_qc.multiple_checks.do_multiple_grouped_check(data, qc_dict=None, preproc_dict=None, return_method='all')[source]

Apply one or more buddy-check quality-control (QC) functions to a DataFrame or Series.

Parameters:
  • data (pd.Series or pd.DataFrame) – Hashable input data.

  • qc_dict (Mapping, optional) – Nested QC dictionary. Keys represent arbitrary user-specified names for the checks. The values are dictionaries which contain the keys “func” (name of the QC function), “names” (input data names as keyword arguments, that will be retrieved from data) and, if necessary, “arguments” (the corresponding keyword arguments). For more information see Examples.

  • preproc_dict (Mapping, optional) – Nested pre-processing dictionary. Keys represent variable names that can be used by qc_dict. The values are dictionaries which contain the keys “func” (name of the pre-processing function), “names” (input data names as keyword arguments, that will be retrieved from data), and “inputs” (list of input-given variables). For more information see Examples.

  • return_method ({"all", "passed", "failed"}, default: "all") – If “all”, return a QC dictionary containing all requested QC check flags. If “passed”, return a QC dictionary containing the requested QC check flags up to the first check that passes; the remaining QC checks are flagged as untested (3). If “failed”, return a QC dictionary containing the requested QC check flags up to the first check that fails; the remaining QC checks are flagged as untested (3).

Return type:

DataFrame | Series

Returns:

pd.DataFrame or pd.Series – A DataFrame (or Series if the input was a Series) whose columns correspond to the QC names in qc_dict and whose values contain QC flags for each row. Flags depend on the QC functions used.

Raises:
  • NameError – If a function listed in qc_dict or preproc_dict is not defined. If columns listed in qc_dict or preproc_dict are not available in data.

  • ValueError – If return_method is not one of [“all”, “passed”, “failed”]. If variable names listed in qc_dict or preproc_dict are not valid parameters of the QC function.

Notes

If a variable is pre-processed using preproc_dict, mark the variable name as “__preprocessed__” in qc_dict. For example: “climatology”: “__preprocessed__”.

For more information, see do_multiple_individual_check().
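The effect of return_method on the flags of a single row can be sketched as follows. The helper apply_return_method is hypothetical, and the values 0 for a pass and 1 for a failure are assumptions; only the untested flag (3) is stated in the parameter description. The sketch masks flags after the fact, whereas the package may simply skip the remaining checks.

```python
PASSED, FAILED, UNTESTED = 0, 1, 3


def apply_return_method(flags, return_method="all"):
    """Mask the flags of one row once the stopping condition has been met."""
    if return_method not in ("all", "passed", "failed"):
        raise ValueError(f"Invalid return_method: {return_method}")
    if return_method == "all":
        return dict(flags)
    stop_flag = PASSED if return_method == "passed" else FAILED
    result, stopped = {}, False
    for name, flag in flags.items():
        # Checks after the stopping point are reported as untested.
        result[name] = UNTESTED if stopped else flag
        if flag == stop_flag:
            stopped = True
    return result


row_flags = {"hard_limit_check": 0, "climatology_check": 1, "buddy_check": 0}
print(apply_return_method(row_flags, "failed"))
```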

marine_qc.multiple_checks.do_multiple_individual_check(data, qc_dict=None, preproc_dict=None, return_method='all')[source]

Apply one or more quality-control (QC) functions independently to each row of a DataFrame or Series.

Parameters:
  • data (pd.Series or pd.DataFrame) – Hashable input data.

  • qc_dict (Mapping, optional) – Nested QC dictionary. Keys represent arbitrary user-specified names for the checks. The values are dictionaries which contain the keys “func” (name of the QC function), “names” (input data names as keyword arguments, that will be retrieved from data) and, if necessary, “arguments” (the corresponding keyword arguments). For more information see Examples.

  • preproc_dict (Mapping, optional) – Nested pre-processing dictionary. Keys represent variable names that can be used by qc_dict. The values are dictionaries which contain the keys “func” (name of the pre-processing function), “names” (input data names as keyword arguments, that will be retrieved from data), and “inputs” (list of input-given variables). For more information see Examples.

  • return_method ({"all", "passed", "failed"}, default: "all") – If “all”, return a QC dictionary containing all requested QC check flags. If “passed”, return a QC dictionary containing the requested QC check flags up to the first check that passes; the remaining QC checks are flagged as untested (3). If “failed”, return a QC dictionary containing the requested QC check flags up to the first check that fails; the remaining QC checks are flagged as untested (3).

Return type:

DataFrame | Series

Returns:

pd.DataFrame or pd.Series – A DataFrame (or Series if the input was a Series) whose columns correspond to the QC names in qc_dict and whose values contain QC flags for each row. Flags depend on the QC functions used.

Raises:
  • NameError – If a function listed in qc_dict or preproc_dict is not defined. If columns listed in qc_dict or preproc_dict are not available in data.

  • ValueError – If return_method is not one of [“all”, “passed”, “failed”]. If variable names listed in qc_dict or preproc_dict are not valid parameters of the QC function.

Notes

If a variable is pre-processed using preproc_dict, mark the variable name as “__preprocessed__” in qc_dict. For example: “climatology”: “__preprocessed__”.

For more information, see Examples.

Examples

An example qc_dict for a hard limit test:

qc_dict = {
    "hard_limit_check": {
        "func": "do_hard_limit_check",
        "names": "ATEMP",
        "arguments": {"limits": [193.15, 338.15]},
    }
}

An example qc_dict for a climatology test. Variable “climatology” was previously defined:

qc_dict = {
    "climatology_check": {
        "func": "do_climatology_check",
        "names": {
            "value": "observation_value",
            "lat": "latitude",
            "lon": "longitude",
            "date": "date_time",
        },
        "arguments": {
            "climatology": climatology,
            "maximum_anomaly": 10.0,  # K
        },
    },
}

An example preproc_dict for extracting a climatological value:

preproc_dict = {
    "climatology": {
        "func": "get_climatological_value",
        "names": {
            "lat": "latitude",
            "lon": "longitude",
            "date": "date_time",
        },
        "inputs": climatology,
    },
}

Make use of both dictionaries:

preproc_dict = {
    "climatology": {
        "func": "get_climatological_value",
        "names": {
            "lat": "latitude",
            "lon": "longitude",
            "date": "date_time",
        },
        "inputs": climatology,
    },
}

qc_dict = {
    "climatology_check": {
        "func": "do_climatology_check",
        "names": {
            "value": "observation_value",
        },
        "arguments": {
            "climatology": "__preprocessed__",
            "maximum_anomaly": 10.0,  # K
        },
    },
}

Finally, run the function:

do_multiple_individual_check(
    data=df,
    qc_dict=qc_dict,
    preproc_dict=preproc_dict,
    return_method="failed",
)

marine_qc.multiple_checks.do_multiple_sequential_check(data, groupby=None, qc_dict=None, preproc_dict=None, return_method='all')[source]

Apply one or more sequential quality-control (QC) functions to groups of a DataFrame or Series.

Typically for time-ordered or track-based checks.

Parameters:
  • data (pd.Series or pd.DataFrame) – Hashable input data.

  • groupby (str, iterable of str, or pandas GroupBy, optional) – Specifies how the data should be grouped before applying QC functions. If a string or iterable of strings, data.groupby is called on those keys. If a pandas.DataFrameGroupBy object is provided, its groups are used directly. Any groups that contain indices not present in data are automatically trimmed. If None, the entire input data is treated as a single group. For more information see Examples.

  • qc_dict (Mapping, optional) – Nested QC dictionary. Keys represent arbitrary user-specified names for the checks. The values are dictionaries which contain the keys “func” (name of the QC function), “names” (input data names as keyword arguments, that will be retrieved from data) and, if necessary, “arguments” (the corresponding keyword arguments).

  • preproc_dict (Mapping, optional) – Nested pre-processing dictionary. Keys represent variable names that can be used by qc_dict. The values are dictionaries which contain the keys “func” (name of the pre-processing function), “names” (input data names as keyword arguments, that will be retrieved from data), and “inputs” (list of input-given variables). For more information see Examples.

  • return_method ({"all", "passed", "failed"}, default: "all") – If “all”, return a QC dictionary containing all requested QC check flags. If “passed”, return a QC dictionary containing the requested QC check flags up to the first check that passes; the remaining QC checks are flagged as untested (3). If “failed”, return a QC dictionary containing the requested QC check flags up to the first check that fails; the remaining QC checks are flagged as untested (3).

Return type:

DataFrame | Series

Returns:

pd.DataFrame or pd.Series – A DataFrame (or Series if the input was a Series) whose columns correspond to the QC names in qc_dict and whose values contain QC flags for each row. Flags depend on the QC functions used.

Raises:
  • NameError – If a function listed in qc_dict or preproc_dict is not defined. If columns listed in qc_dict or preproc_dict are not available in data.

  • ValueError – If return_method is not one of [“all”, “passed”, “failed”]. If variable names listed in qc_dict or preproc_dict are not valid parameters of the QC function.

Notes

If a variable is pre-processed using preproc_dict, mark the variable name as “__preprocessed__” in qc_dict. For example: “climatology”: “__preprocessed__”.

For more information, see do_multiple_individual_check().

marine_qc.plot_qc_outcomes module

Plot QC outcomes.

Some plotting routines for QC outcomes.

marine_qc.plot_qc_outcomes.latitude_longitude_plot(lat, lon, qc_outcomes, filename=None)[source]

Plot a graph of points showing the latitude and longitude of a set of observations coloured according to the QC outcomes.

Parameters:
  • lat (np.ndarray) – Array of latitude values in degrees.

  • lon (np.ndarray) – Array of longitude values in degrees.

  • qc_outcomes (np.ndarray) – Array containing the QC outcomes, with 0 meaning pass and non-zero entries indicating failure.

  • filename (str or None) – Filename to save the figure to. If None, the figure is saved with a standard name.

Return type:

Figure

Returns:

Figure – The main figure object created by plt.subplots().

marine_qc.plot_qc_outcomes.latitude_variable_plot(lat, value, qc_outcomes, filename=None)[source]

Plot a graph of points showing the latitude and value of a set of observations coloured according to the QC outcomes.

Parameters:
  • lat (np.ndarray) – Array of latitude values in degrees.

  • value (np.ndarray) – Array of observed values for the variable.

  • qc_outcomes (np.ndarray) – Array containing the QC outcomes, with 0 meaning pass and non-zero entries indicating failure.

  • filename (str or None) – Filename to save the figure to. If None, the figure is saved with a standard name.

Return type:

Figure

Returns:

Figure – The main figure object created by plt.subplots().

marine_qc.qc_grouped_reports module

QC of grouped reports.

Module containing QC functions for quality control of grouped marine reports.

class marine_qc.qc_grouped_reports.SuperObsGrid[source]

Bases: object

Class for gridding data in buddy check, based on numpy arrays.

add_multiple_observations(lat, lon, value, date=None, month=None, day=None)[source]

Add a series of observations to the grid and take the grid average.

Parameters:
Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Return type:

None

Notes

The observations should be anomalies.

add_single_observation(lat, lon, month, day, anom)[source]

Add an anomaly to the grid from specified lat lon and date.

Parameters:
  • lat (float) – Latitude of the observation in degrees.

  • lon (float) – Longitude of the observation in degrees.

  • month (int) – Month of the observation.

  • day (int) – Day of the observation.

  • anom (float) – Value to be added to the grid.

Return type:

None

Returns:

None – The function performs its operations in-place and does not return anything.

get_buddy_limits_with_parameters(pentad_stdev, limits, number_of_obs_thresholds, multipliers)[source]

Get buddy limits with parameters.

Parameters:
  • pentad_stdev (Climatology) – Climatology containing the 3-dimensional array of standard deviations.

  • limits (list[list[int]]) – List of the limits.

  • number_of_obs_thresholds (list[list[int]]) – List containing the number of obs thresholds.

  • multipliers (list[list[float]]) – List containing the multipliers to be applied.

Return type:

None

Returns:

None – The function performs its operations in-place and does not return anything.

get_buddy_mean(lat, lon, month, day)[source]

Get the buddy mean from the grid for a specified time and place.

Parameters:
  • lat (float) – Latitude of the location for which the buddy mean is desired.

  • lon (float) – Longitude of the location for which the buddy mean is desired.

  • month (int) – Month for which the buddy mean is desired.

  • day (int) – Day for which the buddy mean is desired.

Return type:

float

Returns:

float – Buddy mean at the specified location.

get_buddy_stdev(lat, lon, month, day)[source]

Get the buddy standard deviation from the grid for a specified time and place.

Parameters:
  • lat (float) – Latitude of the location for which the buddy standard deviation is desired.

  • lon (float) – Longitude of the location for which the buddy standard deviation is desired.

  • month (int) – Month for which the buddy standard deviation is desired.

  • day (int) – Day for which the buddy standard deviation is desired.

Return type:

float

Returns:

float – Buddy standard deviation at the specified location.

get_neighbour_anomalies(search_radius, xindex, yindex, pindex)[source]

Search within a specified search radius of the given point and extract the neighbours for buddy check.

Parameters:
  • search_radius (list[int]) – Three element array search radius [lon, lat, time].

  • xindex (int) – The xindex of the gridcell to start from.

  • yindex (int) – The yindex of the gridcell to start from.

  • pindex (int) – The pindex of the gridcell to start from.

Return type:

tuple[list[float], list[float]]

Returns:

tuple of list of float – Anomalies and numbers of observations in two lists.
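The neighbour search can be sketched as an enumeration of the grid cells around a target cell. This sketch assumes a 360 x 180 x 73 grid, a search radius expressed in grid cells, wrap-around in longitude and in time, and no wrap-around across the poles. The function name and all of these choices are assumptions; the documented method additionally collects the anomalies and observation counts of those cells.

```python
def neighbour_cells(search_radius, xindex, yindex, pindex, nx=360, ny=180, npentad=73):
    """List the (x, y, p) cells around a target cell, excluding the target itself."""
    rx, ry, rp = search_radius
    cells = []
    for dx in range(-rx, rx + 1):
        for dy in range(-ry, ry + 1):
            for dp in range(-rp, rp + 1):
                if dx == dy == dp == 0:
                    continue  # the target cell is not its own buddy
                y = yindex + dy
                if y < 0 or y >= ny:
                    continue  # no wrap-around across the poles
                # Longitude and pentad indices wrap round at the grid edges.
                cells.append(((xindex + dx) % nx, y, (pindex + dp) % npentad))
    return cells


print(len(neighbour_cells([1, 1, 1], 10, 10, 10)))  # 26
```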

get_new_buddy_limits(stdev1, stdev2, stdev3, limits, sigma_m, noise_scaling)[source]

Get buddy limits for the new Bayesian buddy check.

Parameters:
  • stdev1 (Climatology) – Field of standard deviations representing standard deviation of difference between target gridcell and complete neighbour average (grid area to neighbourhood difference).

  • stdev2 (Climatology) – Field of standard deviations representing standard deviation of difference between a single observation and the target gridcell average (point to grid area difference).

  • stdev3 (Climatology) – Field of standard deviations representing standard deviation of difference between random neighbour gridcell and full neighbour average (uncertainty in neighbour average).

  • limits (list[int, int, int]) – Three membered list of number of degrees in latitude and longitude and number of pentads.

  • sigma_m (float) – Estimated measurement error uncertainty.

  • noise_scaling (float) – Scale noise by a factor of noise_scaling used to match observed variability.

Return type:

None

Returns:

None – The function performs its operations in-place and does not return anything.

Notes

In previous versions, limits, sigma_m, and noise_scaling defaulted to:

  • limits = (2, 2, 4)

  • sigma_m = 1.0

  • noise_scaling = 3.0

take_average()[source]

Take the average of a grid to which reports have been added using add_single_observation().

Return type:

None
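The accumulate-then-average pattern behind add_single_observation and take_average can be sketched with a miniature grid. TinySuperObsGrid is a hypothetical stand-in that takes a precomputed cell index instead of lat, lon, month and day, and it stores sums in dictionaries rather than numpy arrays.

```python
from collections import defaultdict


class TinySuperObsGrid:
    """Hypothetical miniature of a super-observation grid keyed by cell index."""

    def __init__(self):
        self.total = defaultdict(float)  # running sum of anomalies per cell
        self.nobs = defaultdict(int)     # number of observations per cell
        self.mean = {}

    def add_single_observation(self, cell, anom):
        self.total[cell] += anom
        self.nobs[cell] += 1

    def take_average(self):
        # Divide each cell's sum by its observation count.
        self.mean = {cell: self.total[cell] / self.nobs[cell] for cell in self.total}


grid = TinySuperObsGrid()
grid.add_single_observation((10, 20, 3), 1.0)
grid.add_single_observation((10, 20, 3), 2.0)
grid.add_single_observation((11, 20, 3), -0.5)
grid.take_average()
print(grid.mean[(10, 20, 3)])  # 1.5
```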

marine_qc.qc_grouped_reports.do_bayesian_buddy_check(lat, lon, date, value, climatology, stdev1, stdev2, stdev3, prior_probability_of_gross_error, quantization_interval, one_sigma_measurement_uncertainty, limits, noise_scaling, maximum_anomaly, fail_probability, ignore_indexes=None)[source]

Do the Bayesian buddy check.

The Bayesian buddy check assigns a probability of gross error to each observation. This probability is rounded down to the nearest tenth and then multiplied by 10 to yield a flag between 0 and 9.

Parameters:
  • lat (SequenceNumberType) – 1-dimensional latitude array.

  • lon (SequenceNumberType) – 1-dimensional longitude array.

  • date (SequenceDatetimeType) – 1-dimensional date array.

  • value (SequenceNumberType) – 1-dimensional anomaly array.

  • climatology (ClimArgType) – The climatological average(s) used to calculate anomalies. Can be a scalar, a sequence, a one-dimensional NumPy array, a pandas Series, a Climatology, a path-like string pointing to a file on disk, an xarray Dataset or an xarray DataArray.

  • stdev1 (Climatology) – Field of standard deviations representing standard deviation of difference between target gridcell and complete neighbour average (grid area to neighbourhood difference).

  • stdev2 (Climatology) – Field of standard deviations representing standard deviation of difference between a single observation and the target gridcell average (point to grid area difference).

  • stdev3 (Climatology) – Field of standard deviations representing standard deviation of difference between random neighbour gridcell and full neighbour average (uncertainty in neighbour average).

  • prior_probability_of_gross_error (float) – Prior probability of gross error, which is the background rate of gross errors.

  • quantization_interval (float) – Smallest possible increment in the input values.

  • one_sigma_measurement_uncertainty (float) – Estimated one sigma measurement uncertainty.

  • limits (list[int]) – List with three members which specify the search range for the buddy check.

  • noise_scaling (float) – Tuning parameter used to multiply stdev2. This was determined to be approximately 3.0 by comparison with observed point data. stdev2 was estimated from OSTIA data and typically underestimates the point to area-average difference by this factor.

  • maximum_anomaly (float) – Largest absolute anomaly, assumes that the maximum and minimum anomalies have the same magnitude.

  • fail_probability (float) – Probability of gross error that corresponds to a failed test. Anything with a probability of gross error greater than fail_probability will be considered failing.

  • ignore_indexes (list[int], optional) – List of row numbers to be skipped.

Return type:

SequenceIntType

Returns:

SequenceIntType – Same type as input, but with integer values

  • Returns array/sequence/Series of 2s if there are no buddies in the specified limits

  • Returns array/sequence/Series of 1s if the Bayesian buddy check fails

  • Returns array/sequence/Series of 0s otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions the default values for the parameters were:

  • prior_probability_of_gross_error = 0.05

  • quantization_interval = 0.1

  • limits = [2, 2, 4]

  • noise_scaling = 3.0

  • one_sigma_measurement_uncertainty = 1.0

  • maximum_anomaly = 8.0

  • fail_probability = 0.3

marine_qc.qc_grouped_reports.do_mds_buddy_check(lat, lon, date, value, climatology, standard_deviation, limits, number_of_obs_thresholds, multipliers, ignore_indexes=None)[source]

Do the old style buddy check.

The buddy check compares an observation to the average of its near neighbours (called the buddy mean). Depending on how many neighbours there are and their proximity to the observation being tested a multiplier is set. If the difference between the observation and the buddy mean is larger than the multiplier times the standard deviation then the observation fails the buddy check. If no buddy observations are found within the specified limits, then the limits are expanded until the check runs out of specified limits or observations are found within the limits.

Parameters:
  • lat (SequenceNumberType) – 1-dimensional latitude array.

  • lon (SequenceNumberType) – 1-dimensional longitude array.

  • date (SequenceDatetimeType) – 1-dimensional date array.

  • value (SequenceNumberType) – 1-dimensional anomaly array.

  • climatology (ClimArgType) – The climatological average(s) used to calculate anomalies. Can be a scalar, sequence, a one-dimensional NumPy array, a pandas Series, a Climatology, a path-like string on disk, a xarray Dataset or a xarray DataArray.

  • standard_deviation (Climatology) – Field of 1x1xpentad standard deviations.

  • limits (list[list]) – A list of lists. Each member is a three-membered list specifying the longitudinal, latitudinal, and time range within which buddies are sought at each level of search.

  • number_of_obs_thresholds (list[list]) – Number of observations corresponding to each multiplier in multipliers. The initial list should be the same length as the limits list.

  • multipliers (list[list]) – Multiplier, x, used for buddy check mu +- x * sigma. The list should have the same structure as number_of_obs_thresholds.

  • ignore_indexes (list[int], optional) – List of row numbers to be skipped.

Return type:

SequenceIntType

Returns:

SequenceIntType – Same type as input, but with integer values

  • Returns array/sequence/Series of 1s if the MDS buddy check fails

  • Returns array/sequence/Series of 0s otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

Notes

The limits, number_of_obs_thresholds, and multipliers parameters are rather complex. The buddy check basically looks within a lat-lon-time range specified by the first element in limits. If there are more than zero observations in the search range then a multiplier is chosen based on how many observations there are.

If the first element of limits is [1,1,2] then we first look within a distance equivalent to 1 degree latitude and longitude at the equator and 2 pentads in time. If there are more than zero observations then we calculate the buddy mean, and we consult the corresponding list in number_of_obs_thresholds. If, for example, this is [0, 5, 15, 100] then we look for the last entry that the number of obs exceeds. We then look up the multiplier at the same position in the appropriate list (say [4, 3.5, 3.0, 2.5]). So, if there were 10 observations then the multiplier would be 3.5. If the difference between an observation and the buddy mean is greater than the multiplier times the standard deviation at that point then it fails the buddy check.

Previous versions had default values for the parameters of:

  • limits = [[1, 1, 2], [2, 2, 2], [1, 1, 4], [2, 2, 4]]

  • number_of_obs_thresholds = [[0, 5, 15, 100], [0], [0, 5, 15, 100], [0]]

  • multipliers = [[4.0, 3.5, 3.0, 2.5], [4.0], [4.0, 3.5, 3.0, 2.5], [4.0]]

marine_qc.qc_grouped_reports.get_threshold_multiplier(total_nobs, nob_limits, multiplier_values)[source]

Find the highest value of i such that total_nobs is greater than nob_limits[i] and return multiplier_values[i].

This routine is used by the buddy check. It’s a bit niche.

Parameters:
  • total_nobs (int) – Total number of neighbour observations.

  • nob_limits (list[int]) – List containing the limiting numbers of observations in ascending order. The first element must be zero.

  • multiplier_values (list[float]) – List containing the multiplier values associated with each entry in nob_limits.

Return type:

float

Returns:

float – The multiplier value.
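
The lookup described above can be sketched in a few lines. This is only an illustration of the documented rule (highest i such that total_nobs is greater than nob_limits[i]); the function name threshold_multiplier_sketch is hypothetical and this is not the package's implementation.

```python
def threshold_multiplier_sketch(total_nobs, nob_limits, multiplier_values):
    """Return the multiplier paired with the last limit that total_nobs exceeds.

    Hypothetical helper for illustration; assumes nob_limits is ascending
    and has the same length as multiplier_values.
    """
    multiplier = None
    for limit, value in zip(nob_limits, multiplier_values):
        if total_nobs > limit:
            # Keep overwriting so the highest qualifying index wins.
            multiplier = value
    return multiplier


# 10 observations exceed the limits 0 and 5 but not 15, so index 1 is chosen.
chosen = threshold_multiplier_sketch(10, [0, 5, 15, 100], [4.0, 3.5, 3.0, 2.5])
```

With zero observations no limit is exceeded and the sketch returns None; how the real routine treats that case is not stated here.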

marine_qc.qc_individual_reports module

QC of individual reports.

Module containing main QC functions which could be applied on a DataBundle.

marine_qc.qc_individual_reports.do_climatology_check(value, climatology, maximum_anomaly, standard_deviation='default', standard_deviation_limits=None, lowbar=None)[source]

Climatology check to compare a value with a climatological average within specified anomaly limits.

This check supports optional parameters to customize the comparison.

If standard_deviation is provided, the value is converted into a standardised anomaly. Optionally, if standard deviation is outside the range specified by standard_deviation_limits then standard_deviation is set to whichever of the lower or upper limits is closest. If lowbar is provided, the anomaly must be greater than lowbar to fail regardless of standard_deviation.

Parameters:
  • value (ValueNumberType) – Value(s) to be compared to climatology. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • climatology (ClimArgType) – The climatological average(s) to which the value(s) will be compared. Can be a scalar, sequence, a one-dimensional NumPy array, a pandas Series, a Climatology, a path-like string on disk, a xarray Dataset or a xarray DataArray.

  • maximum_anomaly (float) – Largest allowed anomaly. If standard_deviation is provided, this is interpreted as the largest allowed standardised anomaly.

  • standard_deviation (ClimArgType, default: "default") – The standard deviation(s) used to standardise the anomaly. If set to “default”, it is internally treated as 1.0. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, a pandas Series, a Climatology, a path-like string on disk, a xarray Dataset or a xarray DataArray.

  • standard_deviation_limits (tuple of float, optional) – A tuple of two floats representing the lower and upper limits for standard deviation used in the check.

  • lowbar (float, optional) – The anomaly must be greater than lowbar to fail regardless of standard deviation.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 2 (or array/sequence/Series of 2s) if standard_deviation_limits[1] is less than or equal to standard_deviation_limits[0], or if maximum_anomaly is less than or equal to 0, or if any of value, climatology, or standard_deviation is numerically invalid (None or NaN).

  • Returns 1 (or array/sequence/Series of 1s) if the difference is outside the specified range.

  • Returns 0 (or array/sequence/Series of 0s) otherwise.

Raises:

TypeError – If decorator inspect_arrays does not return np.ndarrays.

Notes

If either climatology or standard_deviation is a Climatology object, pass lon and lat and date, or month and day, as keyword arguments to extract the relevant climatological value(s).
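
As a rough illustration of the rules above, the following scalar-only sketch applies them in order. The function name climatology_check_sketch is hypothetical, the anomaly is assumed to be taken as an absolute difference, and the sketch assumes a positive standard deviation; the real function also handles arrays, Series, and Climatology objects.

```python
import math


def climatology_check_sketch(value, climatology, maximum_anomaly,
                             standard_deviation=1.0,
                             standard_deviation_limits=None, lowbar=None):
    """Scalar illustration of the documented flag logic (not the real code)."""
    inputs = (value, climatology, standard_deviation)
    if any(x is None or math.isnan(x) for x in inputs):
        return 2
    if maximum_anomaly is None or maximum_anomaly <= 0:
        return 2
    if standard_deviation_limits is not None:
        lower, upper = standard_deviation_limits
        if upper <= lower:
            return 2
        # Clamp the standard deviation to the nearest limit.
        standard_deviation = min(max(standard_deviation, lower), upper)
    anomaly = abs(value - climatology)
    # Assumes standard_deviation > 0 at this point.
    standardised = anomaly / standard_deviation
    if standardised > maximum_anomaly and (lowbar is None or anomaly > lowbar):
        return 1
    return 0
```

For example, a value of 20.0 against a climatology of 10.0 with maximum_anomaly 8.0 gives an anomaly of 10.0 and would be flagged 1 by this sketch.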

marine_qc.qc_individual_reports.do_date_check(date=None, year=None, month=None, day=None, year_init=None, year_end=None)[source]

Perform the date QC check on the report. Checks whether the given date or date components are valid.

Parameters:
  • date (ValueDatetimeType, optional) – Date(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • year (ValueIntType, optional) – Year(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • month (ValueIntType, optional) – Month(s) of observation (1-12). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • day (ValueIntType, optional) – Day(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • year_init (int, optional) – Initial valid year.

  • year_end (int, optional) – Last valid year.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 2 (or array/sequence/Series of 2s) if any of year, month, or day is numerically invalid or None,

  • Returns 1 (or array/sequence/Series of 1s) if the date is not valid,

  • Returns 0 (or array/sequence/Series of 0s) otherwise.

Raises:

TypeError – If decorator inspect_arrays does not return np.ndarrays.
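
A minimal scalar sketch of the flag logic described above, assuming integer year, month, and day and using the standard library to validate the calendar date. The name date_check_sketch is hypothetical, and treating a year outside year_init to year_end as a failed (1) date is an assumption of this sketch.

```python
import datetime


def date_check_sketch(year, month, day, year_init=None, year_end=None):
    """Scalar illustration only; expects integer components or None."""
    if any(part is None for part in (year, month, day)):
        return 2
    # Assumed behaviour: years outside the valid span fail the check.
    if year_init is not None and year < year_init:
        return 1
    if year_end is not None and year > year_end:
        return 1
    try:
        datetime.date(year, month, day)
    except ValueError:
        # Raised for impossible dates such as 30 February.
        return 1
    return 0
```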

marine_qc.qc_individual_reports.do_day_check(date=None, year=None, month=None, day=None, hour=None, lat=None, lon=None, time_since_sun_above_horizon=None)[source]

Determine if the sun was above the horizon a specified time before the report.

This “day” test is used to classify Marine Air Temperature (MAT) measurements as either Night MAT (NMAT) or Day MAT, accounting for solar heating biases and a potential lag between sunrise and the onset of significant warming. The function calculates the sun’s elevation using the sunangle function, offset by the specified time_since_sun_above_horizon.

Parameters:
  • date (ValueDatetimeType, optional) – Date(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • year (ValueIntType, optional) – Year(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • month (ValueIntType, optional) – Month(s) of observation (1-12). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • day (ValueIntType, optional) – Day(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • hour (ValueFloatType, optional) – Hour(s) of observation (minutes as decimal). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lat (ValueNumberType, optional) – Latitude(s) of observation in degrees. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lon (ValueNumberType, optional) – Longitude(s) of observation in degrees. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • time_since_sun_above_horizon (float) – Maximum time sun can have been above horizon (or below) to still count as night. Original QC test had this set to 1.0 i.e. it was night between one hour after sundown and one hour after sunrise.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 2 (or array/sequence/Series of 2s) if any of do_position_check, do_date_check, or do_time_check returns 2.

  • Returns 1 (or array/sequence/Series of 1s) if any of do_position_check, do_date_check, or do_time_check returns 1 or if it is night (sun below horizon an hour ago).

  • Returns 0 if it is day (sun above horizon an hour ago).

Raises:

TypeError – If decorator inspect_arrays does not return np.ndarrays.

See also

do_night_check

Determine if the sun was below the horizon an hour ago based on date, time, and position.

Notes

In previous versions, time_since_sun_above_horizon had the default value 1.0, as one hour is used as a definition of “day” for marine air temperature QC. Solar heating biases were considered to be negligible more than one hour after sunset and up to one hour after sunrise.

marine_qc.qc_individual_reports.do_hard_limit_check(value, limits)[source]

Check if a value is outside specified limits.

Parameters:
  • value (ValueNumberType) – The value(s) to be tested against the limits. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • limits (tuple of float) – A tuple of two floats representing the lower and upper limit.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 2 (or array/sequence/Series of 2s) if the upper limit is less than or equal to the lower limit, or if the input is invalid (None or NaN).

  • Returns 1 (or array/sequence/Series of 1s) if value(s) are outside the specified limits.

  • Returns 0 (or array/sequence/Series of 0s) if value(s) are within limits.

Raises:

TypeError – If decorator inspect_arrays does not return np.ndarrays.
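
The three return codes above can be illustrated for a single value as follows. The name hard_limit_check_sketch is hypothetical, and treating values exactly equal to a limit as within limits is an assumption of this sketch.

```python
import math


def hard_limit_check_sketch(value, limits):
    """Scalar illustration of the documented 0/1/2 flags (not the real code)."""
    lower, upper = limits
    if upper <= lower:
        return 2
    if value is None or math.isnan(value):
        return 2
    # Assumed: the limits themselves count as valid values.
    if value < lower or value > upper:
        return 1
    return 0
```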

marine_qc.qc_individual_reports.do_landlocked_check(lat, lon, land_sea_mask, land_flag)[source]

Check input position(s) to determine whether they correspond to a land point.

Parameters:
  • lat (ValueNumberType) – Latitude(s) of observation in degrees. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lon (ValueNumberType) – Longitude(s) of observation in degrees. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • land_sea_mask (ClimArgType) – Land-sea classification value(s) to which the latitude and longitude value(s) will be compared. Can be a scalar, sequence, a one-dimensional NumPy array, a pandas Series, a Climatology, a path-like string on disk, a xarray Dataset or a xarray DataArray.

  • land_flag (int) – Integer value in land_sea_mask that denotes a land point.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 2 (or array/sequence/Series of 2s) if either latitude or longitude is numerically invalid (None/NaN).

  • Returns 1 (or array/sequence/Series of 1s) if the position does not correspond to a land point

  • Returns 0 (or array/sequence/Series of 0s) otherwise

Raises:

ValueError – If decorator inspect_arrays does not return np.ndarrays.

marine_qc.qc_individual_reports.do_maritime_check(lat, lon, sea_land_mask, sea_flag)[source]

Check input position(s) to determine whether they correspond to a sea point.

Parameters:
  • lat (ValueNumberType) – Latitude(s) of observation in degrees. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lon (ValueNumberType) – Longitude(s) of observation in degrees. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • sea_land_mask (ClimArgType) – Sea-land classification value(s) to which the latitude and longitude value(s) will be compared. Can be a scalar, sequence, a one-dimensional NumPy array, a pandas Series, a Climatology, a path-like string on disk, a xarray Dataset or a xarray DataArray.

  • sea_flag (int) – Integer value in sea_land_mask that denotes a sea point.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 2 (or array/sequence/Series of 2s) if either latitude or longitude is numerically invalid (None/NaN).

  • Returns 1 (or array/sequence/Series of 1s) if the position does not correspond to a sea point

  • Returns 0 (or array/sequence/Series of 0s) otherwise

Raises:

ValueError – If decorator inspect_arrays does not return np.ndarrays.

marine_qc.qc_individual_reports.do_missing_value_check(value)[source]

Check if a value is equal to None or numerically invalid (NaN).

Parameters:

value (ValueNumberType) – The input value(s) to be tested. Can be a scalar, sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 1 (or array/sequence/Series of 1s) if the input value is None or numerically invalid (NaN)

  • Returns 0 (or array/sequence/Series of 0s) otherwise.

Raises:

TypeError – If decorator inspect_arrays in value_check() does not return np.ndarrays.

marine_qc.qc_individual_reports.do_missing_value_clim_check(climatology, **kwargs)[source]

Check if a climatological value is equal to None or numerically invalid (NaN).

Parameters:
  • climatology (ClimArgType) – The input climatological value(s) to be tested. Can be a scalar, sequence, a one-dimensional NumPy array, a pandas Series, a Climatology, a path-like string on disk, a xarray Dataset or a xarray DataArray.

  • **kwargs (dict) – Additional keyword arguments passed by the decorator framework (unused).

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 1 (or array/sequence/Series of 1s) if the input value is None or numerically invalid (NaN)

  • Returns 0 (or array/sequence/Series of 0s) otherwise.

Raises:

TypeError – If decorator inspect_arrays in value_check() does not return np.ndarrays.

Notes

If climatology is a Climatology object, pass lon and lat and date, or month and day, as keyword arguments to extract the relevant climatological value.

marine_qc.qc_individual_reports.do_night_check(date=None, year=None, month=None, day=None, hour=None, lat=None, lon=None, time_since_sun_above_horizon=None)[source]

Determine if the sun was below the horizon a specified time before the report.

This “night” test is used to classify Marine Air Temperature (MAT) measurements as either Night MAT (NMAT) or Day MAT, accounting for solar heating biases and a potential lag between sunrise and the onset of significant warming. The function calculates the sun’s elevation using the sunangle function, offset by the specified time_since_sun_above_horizon.

Parameters:
  • date (ValueDatetimeType, optional) – Date(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • year (ValueIntType, optional) – Year(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • month (ValueIntType, optional) – Month(s) of observation (1-12). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • day (ValueIntType, optional) – Day(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • hour (ValueFloatType, optional) – Hour(s) of observation (minutes as decimal). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lat (ValueNumberType, optional) – Latitude(s) of observation in degrees. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lon (ValueNumberType, optional) – Longitude(s) of observation in degrees. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • time_since_sun_above_horizon (float) – Maximum time sun can have been above horizon (or below) to still count as night. Original QC test had this set to 1.0 i.e. it was night between one hour after sundown and one hour after sunrise.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 2 (or array/sequence/Series of 2s) if any of do_position_check, do_date_check, or do_time_check returns 2.

  • Returns 1 (or array/sequence/Series of 1s) if any of do_position_check, do_date_check, or do_time_check returns 1 or if it is day (sun above horizon an hour ago).

  • Returns 0 if it is night (sun below horizon an hour ago).

Raises:
  • ValueError – If mode is not in valid list [“day”, “night”].

  • TypeError – If decorator inspect_arrays does not return np.ndarrays.

See also

do_day_check

Determine if the sun was above the horizon an hour ago based on date, time, and position.

Notes

In previous versions, time_since_sun_above_horizon had the default value 1.0, as one hour is used as a definition of “day” for marine air temperature QC. Solar heating biases were considered to be negligible more than one hour after sunset and up to one hour after sunrise.

marine_qc.qc_individual_reports.do_position_check(lat, lon)[source]

Perform the positional QC check on the report.

Simple check to make sure that the latitude and longitude are within specified bounds:

  • Latitude is between -90 and 90.

  • Longitude is between -180 and 360.

Parameters:
  • lat (ValueNumberType) – Latitude(s) of observation in degrees. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lon (ValueNumberType) – Longitude(s) of observation in degrees. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 2 (or array/sequence/Series of 2s) if either latitude or longitude is numerically invalid (None/NaN).

  • Returns 1 (or array/sequence/Series of 1s) if either latitude or longitude is out of the valid range.

  • Returns 0 (or array/sequence/Series of 0s) otherwise.

Raises:

TypeError – If decorator inspect_arrays does not return np.ndarrays.
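
A scalar sketch of the bounds test follows. The name position_check_sketch is hypothetical, and the sketch assumes latitude in [-90, 90] and longitude in [-180, 360], with the bounds themselves treated as valid; the lower longitude bound of -180 in particular is an assumption.

```python
import math


def position_check_sketch(lat, lon):
    """Scalar illustration of the documented 0/1/2 flags (not the real code)."""
    if any(x is None or math.isnan(x) for x in (lat, lon)):
        return 2
    if not -90.0 <= lat <= 90.0:
        return 1
    # Assumed longitude range: both -180..180 and 0..360 conventions pass.
    if not -180.0 <= lon <= 360.0:
        return 1
    return 0
```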

marine_qc.qc_individual_reports.do_sst_freeze_check(sst, freezing_point, freeze_check_n_sigma='default', sst_uncertainty='default')[source]

Check input sea-surface temperature(s) to see if it is above freezing.

This is a simple freezing point check made slightly more complex. We want to check if a measurement of SST is above freezing, but there are two problems. First, the freezing point can vary from place to place depending on the salinity of the water. Second, there is uncertainty in SST measurements. If we place a hard cut-off at -1.8C, then we are likely to bias the average of many measurements too high when they are near the freezing point - observational error will push the measurements randomly higher and lower, and this test will trim out the lower tail, thus biasing the result. The inclusion of an SST uncertainty parameter might mitigate that, and we allow that possibility here. Note also that many ships make sea-surface temperature measurements to the nearest whole degree, which in the case of water at or close to freezing would round to -2C and would fail a naive test.

Parameters:
  • sst (ValueNumberType) – Input sea-surface temperature value(s) to be checked. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • freezing_point (float, optional) – The freezing point of the water.

  • freeze_check_n_sigma (float, optional, default: "default") – Number of uncertainty standard deviations that sea surface temperature can be below the freezing point before the QC check fails.

  • sst_uncertainty (float, optional, default: "default") – The uncertainty in the SST value.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 2 (or array/sequence/Series of 2s) if any of sst, freezing_point, sst_uncertainty, or freeze_check_n_sigma is numerically invalid (None or NaN).

  • Returns 1 (or array/sequence/Series of 1s) if sst is below freezing_point by more than freeze_check_n_sigma times sst_uncertainty.

  • Returns 0 (or array/sequence/Series of 0s) otherwise.

Raises:

TypeError – If decorator inspect_arrays does not return np.ndarrays.

Notes

In previous versions, some parameters had default values:

  • sst_uncertainty: 0.0

  • freezing_point: -1.80

  • freeze_check_n_sigma: 2.0
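
The comparison described in the Returns section can be sketched for a single measurement as below. The name sst_freeze_check_sketch is hypothetical, and the defaults are simply the previous-version values listed above, used here for illustration.

```python
import math


def sst_freeze_check_sketch(sst, freezing_point=-1.80,
                            freeze_check_n_sigma=2.0, sst_uncertainty=0.0):
    """Scalar illustration of the documented 0/1/2 flags (not the real code)."""
    inputs = (sst, freezing_point, freeze_check_n_sigma, sst_uncertainty)
    if any(x is None or math.isnan(x) for x in inputs):
        return 2
    # Fail only if sst is colder than freezing by more than n_sigma uncertainties.
    if sst < freezing_point - freeze_check_n_sigma * sst_uncertainty:
        return 1
    return 0
```

With zero uncertainty a reading of -2.0 fails, whereas with sst_uncertainty=0.5 the threshold drops to -2.8 and the same reading passes, which is the rounding case the description above is concerned with.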

marine_qc.qc_individual_reports.do_supersaturation_check(dpt, at2)[source]

Perform the supersaturation check.

Check if a valid dewpoint temperature is greater than a valid air temperature.

Parameters:
  • dpt (ValueNumberType) – Dewpoint temperature value(s). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • at2 (ValueNumberType) – Air temperature value(s). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 2 (or array/sequence/Series of 2s) if either dpt or at2 is invalid (None or NaN).

  • Returns 1 (or array/sequence/Series of 1s) if supersaturation is detected,

  • Returns 0 (or array/sequence/Series of 0s) otherwise.

Raises:

TypeError – If decorator inspect_arrays does not return np.ndarrays.

marine_qc.qc_individual_reports.do_time_check(date=None, hour=None)[source]

Check that the time is valid i.e. in the range 0.0 to 23.99999…

Parameters:
  • date (ValueDatetimeType, optional) – Date(s) of observation. Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • hour (ValueFloatType, optional) – Hour(s) of observation (minutes as decimal). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 2 (or array/sequence/Series of 2s) if hour is numerically invalid or None,

  • Returns 1 (or array/sequence/Series of 1s) if hour is not a valid hour,

  • Returns 0 (or array/sequence/Series of 0s) otherwise.

Raises:

TypeError – If decorator inspect_arrays does not return np.ndarrays.
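
For a single decimal hour the range test might look like the sketch below. The name time_check_sketch is hypothetical; the sketch assumes 0.0 is valid and 24.0 is not, in line with the stated range of 0.0 to 23.99999…

```python
import math


def time_check_sketch(hour):
    """Scalar illustration of the documented 0/1/2 flags (not the real code)."""
    if hour is None or math.isnan(hour):
        return 2
    # Half-open interval: midnight is 0.0, and 24.0 is out of range.
    if not 0.0 <= hour < 24.0:
        return 1
    return 0
```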

marine_qc.qc_individual_reports.do_wind_consistency_check(wind_speed, wind_direction)[source]

Test to compare wind speed to wind direction to check if they are consistent.

Zero wind speed should correspond to no particular direction (variable) and wind speeds above a threshold should correspond to a particular direction.

Parameters:
  • wind_speed (ValueNumberType) – Wind speed value(s). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • wind_direction (ValueNumberType) – Wind direction value(s). Can be a scalar, a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 2 (or array/sequence/Series of 2s) if either wind_speed or wind_direction is invalid (None or NaN).

  • Returns 1 (or array/sequence/Series of 1s) if wind_speed and wind_direction are inconsistent,

  • Returns 0 (or array/sequence/Series of 0s) otherwise.

Raises:

TypeError – If decorator inspect_arrays does not return np.ndarrays.

marine_qc.qc_individual_reports.value_check(value)[source]

Check if a value is equal to None or numerically invalid (NaN).

Parameters:

value (ValueNumberType) – The input value(s) to be tested. Can be a scalar, sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

Return type:

ValueIntType

Returns:

ValueIntType – Same type as input, but with integer values

  • Returns 1 (or array/sequence/Series of 1s) if the input value is None or numerically invalid (NaN)

  • Returns 0 (or array/sequence/Series of 0s) otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

marine_qc.qc_sequential_reports module

QC of sequential reports.

Module containing QC functions for track checking which could be applied on a DataBundle.

marine_qc.qc_sequential_reports.do_few_check(value)[source]

Check if number of observations is less than 3.

Parameters:

value (SequenceNumberType) – One-dimensional array of values to be analyzed. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

Return type:

SequenceIntType

Returns:

SequenceIntType – Same type as input, but with integer values

  • Returns array/sequence/Series of 1s if number of observations is less than 3.

  • Returns array/sequence/Series of 0s otherwise.

Raises:
  • ValueError – If either input is not 1-dimensional.

  • TypeError – If inspect_arrays does not return np.ndarrays.

marine_qc.qc_sequential_reports.do_iquam_track_check(lat, lon, date, speed_limit, delta_d, delta_t, n_neighbours)[source]

Perform the IQUAM track check as detailed in Xu and Ignatov 2013.

The track check calculates speeds between pairs of observations and counts how many exceed a threshold speed. The ob with the most violations of this limit is flagged as bad and removed from the calculation. Then the next worst is found and removed until no violations remain.

Parameters:
  • lat (SequenceNumberType) – One-dimensional latitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lon (SequenceNumberType) – One-dimensional longitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • date (SequenceDatetimeType) – One-dimensional date array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • speed_limit (float) – Speed limit of platform in kilometers per hour. Typically, 60.0 for ships and 15.0 for drifting buoys.

  • delta_d (float) – Latitude tolerance in degrees.

  • delta_t (float) – Time tolerance in hundredths of an hour.

  • n_neighbours (int) – Number of neighbouring points considered in the analysis.

Return type:

SequenceIntType

Returns:

SequenceIntType – Same type as input, but with integer values

  • Returns array/sequence/Series of 1s if the IQUAM QC fails.

  • Returns array/sequence/Series of 0s otherwise.

Raises:
  • ValueError – If either input is not 1-dimensional or if their lengths do not match.

  • TypeError – If inspect_arrays does not return np.ndarrays.

Notes

Previous versions had default values for the parameters of:

  • speed_limit = 60.0 for ships and 15.0 for drifting buoys

  • delta_d = 1.11

  • delta_t = 0.01

  • n_neighbours = 5
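
The iterative removal described for this check can be illustrated with a much simplified sketch. It is not the package's implementation: it compares all pairs instead of n_neighbours, ignores the delta_d and delta_t tolerances, takes times as decimal hours instead of datetimes, and uses a haversine distance on a 6371 km sphere. All function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def iterative_speed_flags(lat, lon, hours, speed_limit):
    """Flag the worst speed violator, remove it, and repeat until none remain."""
    flags = [0] * len(lat)
    active = list(range(len(lat)))
    while True:
        violations = {i: 0 for i in active}
        for pos, i in enumerate(active):
            for j in active[pos + 1:]:
                dist = haversine_km(lat[i], lon[i], lat[j], lon[j])
                elapsed = abs(hours[j] - hours[i])
                # Distance covered exceeds what the speed limit allows.
                if dist > speed_limit * elapsed:
                    violations[i] += 1
                    violations[j] += 1
        if not violations:
            break
        worst = max(violations, key=violations.get)
        if violations[worst] == 0:
            break
        flags[worst] = 1
        active.remove(worst)
    return flags
```

In a track of four hourly reports along the equator where the third report is displaced by 50 degrees of longitude, only that report accumulates violations against every other report, so it alone is flagged.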

marine_qc.qc_sequential_reports.do_spike_check(value, lat, lon, date, max_gradient_space, max_gradient_time, delta_t, n_neighbours)[source]

Perform IQUAM-like spike check.

Parameters:
  • value (SequenceNumberType) – One-dimensional array of values to be analyzed. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lat (SequenceNumberType) – One-dimensional array of latitudes in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lon (SequenceNumberType) – One-dimensional array of longitudes in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • date (SequenceDatetimeType) – One-dimensional array of datetime values. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • max_gradient_space (float, default: 0.5) – Maximum allowed spatial gradient. The unit is “units of value” per kilometer.

  • max_gradient_time (float, default: 1.0) – Maximum allowed temporal gradient. The unit is “units of value” per hour.

  • delta_t (float, default: 2.0) – Temperature delta used in the comparison. Typically set to 2.0 for ships and 1.0 for drifting buoys.

  • n_neighbours (int, default: 5) – Number of neighbouring points considered in the analysis.

Return type:

SequenceIntType

Returns:

SequenceIntType – Same type as input, but with integer values

  • Returns array/sequence/Series of 1s if the spike check fails.

  • Returns array/sequence/Series of 0s otherwise.

Raises:
  • ValueError – If either input is not 1-dimensional or if their lengths do not match.

  • TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

  • max_gradient_space: float = 0.5

  • max_gradient_time: float = 1.0

  • delta_t: float = 2.0

  • n_neighbours: int = 5

marine_qc.qc_sequential_reports.do_track_check(vsi, dsi, lat, lon, date, max_direction_change, max_speed_change, max_absolute_speed, max_midpoint_discrepancy)[source]

Perform one pass of the track check.

This is an implementation of the MDS track check code which was originally written in the 1990s. I don’t know why this piece of historic trivia so exercises my mind, but it does: the 1990s! I wish my code would last so long.

Parameters:
  • vsi (SequenceNumberType) – One-dimensional reported speed array in km/h. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • dsi (SequenceNumberType) – One-dimensional reported heading array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lat (SequenceNumberType) – One-dimensional latitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lon (SequenceNumberType) – One-dimensional longitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • date (SequenceDatetimeType) – One-dimensional date array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • max_direction_change (float, default: 60.0) – Maximum valid direction change in degrees.

  • max_speed_change (float, default: 10.0) – Maximum valid speed change in km/h.

  • max_absolute_speed (float, default: 40.0) – Maximum valid absolute speed in km/h.

  • max_midpoint_discrepancy (float, default: 150.0) – Maximum valid midpoint discrepancy in meters.

Return type:

SequenceIntType

Returns:

SequenceIntType – Same type as input, but with integer values

  • Returns array/sequence/Series of 1s if the track check fails.

  • Returns array/sequence/Series of 0s otherwise.

Raises:
  • ValueError – If either input is not 1-dimensional or if their lengths do not match.

  • TypeError – If inspect_arrays does not return np.ndarrays.

Notes

If the number of observations is less than three, the track check always passes.

In previous versions, the default values of the parameters were:

  • max_direction_change = 60.0

  • max_speed_change = 10.0

  • max_absolute_speed = 40.0

  • max_midpoint_discrepancy = 150.0

marine_qc.qc_sequential_reports.find_multiple_rounded_values(value, min_count, threshold)[source]

Find instances when more than “threshold” of the observations are whole numbers and set the ‘round’ flag.

Used in the humidity QC where there are times when the values are rounded and this may have caused a bias.

Parameters:
  • value (SequenceNumberType) – One-dimensional array of values. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • min_count (int, default: 20) – Minimum number of rounded figures that will trigger the test.

  • threshold (float, default: 0.5) – Minimum fraction of all observations that will trigger the test.

Return type:

SequenceIntType

Returns:

SequenceIntType – Same type as input, but with integer values

  • Returns array/sequence/Series of 1s if the value is a whole number.

  • Returns array/sequence/Series of 0s otherwise.

Raises:
  • ValueError – If threshold is not between 0.0 and 1.0.

  • TypeError – If inspect_arrays does not return np.ndarrays.

Notes

Previous versions had default values for the parameters of:

  • min_count = 20

  • threshold = 0.5
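
A minimal sketch of the rounding test described above, in plain Python. The function name flag_rounded_values is hypothetical, and the exact comparisons (at least min_count whole numbers, strictly more than threshold as a fraction) are assumptions rather than confirmed behaviour of the package.

```python
def flag_rounded_values(values, min_count=20, threshold=0.5):
    """Return 1 for whole-number values when rounding is widespread, else 0."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    valid = [v for v in values if v is not None]
    # A value counts as rounded if it has no fractional part.
    is_whole = [v is not None and float(v).is_integer() for v in values]
    n_whole = sum(is_whole)
    triggered = (
        bool(valid)
        and n_whole >= min_count
        and n_whole / len(valid) > threshold
    )
    return [1 if (triggered and whole) else 0 for whole in is_whole]


few = flag_rounded_values([1.0, 2.0, 3.0, 4.5], min_count=2, threshold=0.5)
default = flag_rounded_values([1.0, 2.0, 3.0, 4.5])
```

With min_count lowered to 2, the three whole numbers are flagged; with the default of 20, the series is too short to trigger the test.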

marine_qc.qc_sequential_reports.find_repeated_values(value, min_count, threshold)[source]

Find cases where more than a given proportion of SSTs have the same value.

This function goes through a voyage and finds any cases where more than a threshold fraction of the observations have the same values for a specified variable.

Parameters:
  • value (SequenceNumberType) – One-dimensional array of values. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • min_count (int, default: 20) – Minimum number of repeated values that will trigger the test.

  • threshold (float, default: 0.7) – Smallest fraction of all observations that will trigger the test.

Return type:

SequenceIntType

Returns:

SequenceIntType – Same type as input, but with integer values

  • Returns array/sequence/Series of 1s if the value is repeated.

  • Returns array/sequence/Series of 0s otherwise.

Raises:
  • ValueError – If threshold is not between 0.0 and 1.0.

  • TypeError – If inspect_arrays does not return np.ndarrays.

Notes

Previous versions had default values for the parameters of:

  • min_count = 20

  • threshold = 0.7
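
The repeated-value test can be sketched in the same spirit. The name flag_repeated_values and the exact comparison operators are assumptions made for illustration only.

```python
from collections import Counter


def flag_repeated_values(values, min_count=20, threshold=0.7):
    """Return 1 for values that dominate the series, else 0."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    valid = [v for v in values if v is not None]
    if not valid:
        return [0] * len(values)
    counts = Counter(valid)
    # A value is suspicious if it occurs often enough both in absolute
    # terms (min_count) and as a fraction of all valid observations.
    repeated = {
        v for v, c in counts.items()
        if c >= min_count and c / len(valid) > threshold
    }
    return [1 if v in repeated else 0 for v in values]


repeat_flags = flag_repeated_values([5.0, 5.0, 5.0, 5.0, 1.0], min_count=3)
```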

marine_qc.qc_sequential_reports.find_saturated_runs(at, dpt, lat, lon, date, min_time_threshold, shortest_run)[source]

Perform checks on persistence of 100% rh while going through the voyage.

While going through the voyage, repeated strings of 100% rh (AT == DPT) are noted. If a string extends beyond 20 reports and two days (48 hours) in time, then all values in the string are set to fail the repsat QC flag.

Parameters:
  • at (SequenceNumberType) – One-dimensional air temperature array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • dpt (SequenceNumberType) – One-dimensional dew point temperature array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lat (SequenceNumberType) – One-dimensional latitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lon (SequenceNumberType) – One-dimensional longitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • date (SequenceDatetimeType) – One-dimensional date array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • min_time_threshold (float, default: 48.0) – Minimum time threshold in hours.

  • shortest_run (int, default: 4) – Shortest number of observations.

Return type:

SequenceIntType

Returns:

SequenceIntType – Same type as input, but with integer values

  • Returns array/sequence/Series of 1s if a saturated run is found.

  • Returns array/sequence/Series of 0s otherwise.

Raises:
  • ValueError – If either input is not 1-dimensional or if their lengths do not match.

  • TypeError – If inspect_arrays does not return np.ndarrays.

Notes

In previous versions, default values for the parameters were:

  • min_time_threshold = 48.0

  • shortest_run = 4
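
A rough sketch of the run detection, assuming a run is flagged when it contains at least shortest_run reports and spans at least min_time_threshold hours. The helper name flag_saturated_runs is hypothetical, and the lat and lon arguments of the real function are left out because their role is not described here.

```python
from datetime import datetime, timedelta


def flag_saturated_runs(at, dpt, date, min_time_threshold=48.0, shortest_run=4):
    """Return 1 for reports inside a long run of AT == DPT, else 0."""
    flags = [0] * len(at)
    run = []  # indices of the current saturated run

    def close_run():
        # Flag the run only if it is long enough in both count and time.
        if len(run) >= shortest_run:
            hours = (date[run[-1]] - date[run[0]]).total_seconds() / 3600.0
            if hours >= min_time_threshold:
                for i in run:
                    flags[i] = 1
        run.clear()

    for i, (a, d) in enumerate(zip(at, dpt)):
        if a is not None and d is not None and a == d:
            run.append(i)
        else:
            close_run()
    close_run()
    return flags


start = datetime(2020, 1, 1)
slow = [start + timedelta(hours=15 * k) for k in range(6)]
fast = [start + timedelta(hours=6 * k) for k in range(6)]
at = [10.0, 10.5, 11.0, 11.0, 10.0, 12.0]
dpt = [10.0, 10.5, 11.0, 11.0, 10.0, 9.0]
long_run = flag_saturated_runs(at, dpt, slow)
short_run = flag_saturated_runs(at, dpt, fast)
```

The first five reports are saturated. Spread over 60 hours they are flagged; spread over 24 hours they are not.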

marine_qc.spherical_geometry module

Quality control suite spherical geometry module.

The spherical geometry module is a simple collection of calculations on a sphere. Sourced from https://edwilliams.org/avform147.htm (formerly williams.best.vwh.net/avform.htm).

marine_qc.spherical_geometry.angular_distance(lat1, lon1, lat2, lon2)[source]

Calculate the great-circle angular distance between two points on a sphere.

Input latitudes and longitudes should be in degrees. Output distance is returned in radians.

Parameters:
Return type:

ndarray

Returns:

np.ndarray – Angular great-circle distance between the two points in radians. NaN is returned for any invalid input values.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

marine_qc.spherical_geometry.course_between_points(lat1, lon1, lat2, lon2)[source]

Given two points, find the initial true course at point one. Inputs and output are in degrees.

Parameters:
Return type:

SequenceFloatType

Returns:

SequenceFloatType – Initial true course in degrees at point one along the great circle between point one and point two.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.
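
For orientation, the standard initial-bearing formula for scalar inputs is sketched below. It uses east-positive longitudes and a hypothetical function name (initial_course_deg); the package's own vectorised code may be arranged differently.

```python
import math


def initial_course_deg(lat1, lon1, lat2, lon2):
    """Initial true course in degrees (0 to 360, clockwise from north)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlam))
    return math.degrees(math.atan2(y, x)) % 360.0


due_east = initial_course_deg(0.0, 0.0, 0.0, 90.0)
due_north = initial_course_deg(0.0, 0.0, 10.0, 0.0)
due_west = initial_course_deg(0.0, 0.0, 0.0, -90.0)
```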

marine_qc.spherical_geometry.intermediate_point(lat1, lon1, lat2, lon2, f)[source]

Compute the intermediate point along the great-circle path between two points.

Given two (lat, lon) points, find the latitude and longitude of the point that lies a fraction f of the great-circle distance between them. See https://edwilliams.org/avform147.htm (formerly williams.best.vwh.net/avform.htm#Intermediate).

Parameters:
Return type:

tuple[ndarray, ndarray]

Returns:

tuple of (np.ndarray, np.ndarray) – A tuple containing:

  • Latitude(s) of the intermediate point(s) in degrees.

  • Longitude(s) of the intermediate point(s) in degrees.

The outputs have the same shape as the broadcasted inputs.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.
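
The intermediate-point calculation can be sketched for scalar inputs as below, following the formula on the referenced page but with east-positive longitudes. The function name intermediate_point_deg is hypothetical, and the antipodal case (where the path is undefined) is not handled.

```python
import math


def intermediate_point_deg(lat1, lon1, lat2, lon2, f):
    """Point a fraction f along the great circle from point 1 to point 2."""
    phi1, lam1 = math.radians(lat1), math.radians(lon1)
    phi2, lam2 = math.radians(lat2), math.radians(lon2)
    # Angular distance between the two points (haversine form).
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    d = 2 * math.asin(math.sqrt(min(1.0, a)))
    if d == 0.0:
        return lat1, lon1  # identical points: nothing to interpolate
    wa = math.sin((1 - f) * d) / math.sin(d)
    wb = math.sin(f * d) / math.sin(d)
    # Weighted sum of the two unit vectors, then back to lat/lon.
    x = wa * math.cos(phi1) * math.cos(lam1) + wb * math.cos(phi2) * math.cos(lam2)
    y = wa * math.cos(phi1) * math.sin(lam1) + wb * math.cos(phi2) * math.sin(lam2)
    z = wa * math.sin(phi1) + wb * math.sin(phi2)
    return math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x))


mid_lat, mid_lon = intermediate_point_deg(0.0, 0.0, 0.0, 90.0, 0.5)
```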

marine_qc.spherical_geometry.lat_lon_from_course_and_distance(lat1, lon1, tc, d)[source]

Calculate latitude and longitude given a starting point, true course and distance.

Uses spherical trigonometry formulas from https://edwilliams.org/avform147.htm to compute the endpoint given a starting latitude and longitude, a true course (bearing), and a distance travelled along a great-circle path.

Parameters:
  • lat1 (SequenceNumberType) – Latitude of the first point in degrees.

  • lon1 (SequenceNumberType) – Longitude of the first point in degrees.

  • tc (float) – True course measured clockwise from north in degrees.

  • d (float) – Distance travelled in kilometres.

Return type:

tuple[ndarray, ndarray]

Returns:

tuple of (SequenceFloatType, SequenceFloatType) – A tuple containing:

  • Latitude(s) of the end point(s) in degrees.

  • Longitude(s) of the end point(s) in degrees.

The outputs have the same shape as the broadcasted inputs.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.
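
A scalar sketch of the endpoint calculation, using the standard destination-point formula with east-positive longitudes. The function name destination_deg and the Earth radius of 6371 km are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def destination_deg(lat1, lon1, tc, d_km):
    """End point reached from (lat1, lon1) on course tc after d_km kilometres."""
    phi1, lam1 = math.radians(lat1), math.radians(lon1)
    theta = math.radians(tc)
    delta = d_km / EARTH_RADIUS_KM  # distance as an angle in radians
    sin_phi2 = (math.sin(phi1) * math.cos(delta)
                + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    phi2 = math.asin(max(-1.0, min(1.0, sin_phi2)))
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    lon2 = (math.degrees(lam2) + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return math.degrees(phi2), lon2


quarter_circle_km = EARTH_RADIUS_KM * math.pi / 2
east_lat, east_lon = destination_deg(0.0, 0.0, 90.0, quarter_circle_km)
north_lat, north_lon = destination_deg(0.0, 0.0, 0.0, EARTH_RADIUS_KM * math.radians(10.0))
```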

marine_qc.spherical_geometry.sphere_distance(lat1, lon1, lat2, lon2)[source]

Calculate the great-circle distance between two points on a sphere.

Input latitudes and longitudes should be in degrees. Output distance is returned in kilometres.

The great-circle distance is the shortest distance between any two points on the Earth's surface. The calculation is done by first calculating the angular distance between the points and then multiplying that by the radius of the Earth. The angular distance calculation is handled by another function.

Parameters:
Return type:

ndarray

Returns:

np.ndarray – Great-circle distance between the two points in kilometres.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.
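
The two-step calculation described above (angular distance first, then scaling by the Earth's radius) might look like this for scalar inputs. The haversine form, the function names, and the radius of 6371 km are assumptions; the package may use a different formula or constant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def angular_distance_rad(lat1, lon1, lat2, lon2):
    """Great-circle angular distance in radians (inputs in degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * math.asin(math.sqrt(min(1.0, a)))


def sphere_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres: angle times radius."""
    return angular_distance_rad(lat1, lon1, lat2, lon2) * EARTH_RADIUS_KM


quarter = angular_distance_rad(0.0, 0.0, 0.0, 90.0)
one_degree_km = sphere_distance_km(0.0, 0.0, 0.0, 1.0)
```

With this radius, one degree of arc along the equator is roughly 111.2 km.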

marine_qc.statistics module

Some generally helpful statistical functions for base QC.

marine_qc.statistics.missing_mean(inarr)[source]

Return mean of input array.

Parameters:

inarr (list of float) – List of values for which mean is required. Missing values represented by None in list.

Return type:

float | None

Returns:

float or None – Mean of non-missing values or None.
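
A minimal sketch of the behaviour described, with a hypothetical function name:

```python
def mean_ignoring_none(values):
    """Mean of the non-None entries, or None if there are none."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


partial = mean_ignoring_none([1.0, None, 3.0])
empty = mean_ignoring_none([None, None])
```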

marine_qc.statistics.p_data_given_good(x, q, r_hi, r_lo, mu, sigma)[source]

Probability of an observed value assuming it comes from a “good” measurement.

Calculate the probability of an observed value x given a normal distribution with mean mu and standard deviation sigma, where x is constrained to fall between R_lo and R_hi and is known only to an integer multiple of Q, the quantization level.

Parameters:
  • x (float) – Observed value for which probability is required.

  • q (float) – Quantization of x, i.e. x is an integer multiple of Q.

  • r_hi (float) – The upper limit on x imposed by previous QC choices.

  • r_lo (float) – The lower limit on x imposed by previous QC choices.

  • mu (float) – The mean of the distribution.

  • sigma (float) – The standard deviation of the distribution.

Return type:

float

Returns:

float – Probability of the observed value given the specified distribution.

Raises:

ValueError – When inputs are incorrectly specified: q<=0, sigma<=0, r_lo > r_hi, x < r_lo or x > r_hi.

marine_qc.statistics.p_data_given_gross(q, r_hi, r_lo)[source]

Probability of an observed value assuming it is a gross error.

Calculate the probability of the data given a gross error, assuming gross errors are uniformly distributed between R_lo and R_hi and that the quantization (rounding) level is Q.

Parameters:
  • q (float) – Quantization of x, i.e. x is an integer multiple of Q.

  • r_hi (float) – The upper limit on x imposed by previous QC choices.

  • r_lo (float) – The lower limit on x imposed by previous QC choices.

Return type:

float

Returns:

float – Probability of the observed value given that it is a gross error.

Raises:

ValueError – When limits are not ascending or q<=0.

marine_qc.statistics.p_gross(p0, q, r_hi, r_lo, x, mu, sigma)[source]

Posterior probability that an observation is a gross error.

Calculate the posterior probability of a gross error given the prior probability p0, the quantization level of the observed value, Q, previous limits on the observed value, R_hi and R_lo, the observed value, x, and the mean (mu) and standard deviation (sigma) of the distribution of good observations assuming they are normally distributed. Gross errors are assumed to be uniformly distributed between R_lo and R_hi.

Parameters:
  • p0 (float) – Prior probability of gross error.

  • q (float) – Quantization of x, i.e. x is an integer multiple of Q.

  • r_hi (float) – The upper limit on x imposed by previous QC choices.

  • r_lo (float) – The lower limit on x imposed by previous QC choices.

  • x (float) – Observed value for which probability is required.

  • mu (float) – The mean of the distribution of good obs.

  • sigma (float) – The standard deviation of the distribution of good obs.

Return type:

float

Returns:

float – Probability of gross error given an observed value.

Raises:

ValueError – When inputs are incorrectly specified: p0 < 0, p0 > 1, q <= 0, r_hi < r_lo, x < r_lo, x > r_hi, sigma <= 0.
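
The three probability functions fit together through Bayes' rule. The sketch below is one plausible reading of the descriptions and is not taken from the package source: the good-data likelihood integrates a truncated normal over the quantization bin, the gross-error likelihood is uniform over the allowed bins, and the posterior combines them with the prior p0. All function names are hypothetical and input validation is omitted.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def likelihood_good(x, q, r_hi, r_lo, mu, sigma):
    """P(x | good): normal mass in the bin around x, renormalised to the limits."""
    in_bin = (normal_cdf((x + 0.5 * q - mu) / sigma)
              - normal_cdf((x - 0.5 * q - mu) / sigma))
    in_range = (normal_cdf((r_hi + 0.5 * q - mu) / sigma)
                - normal_cdf((r_lo - 0.5 * q - mu) / sigma))
    return in_bin / in_range


def likelihood_gross(q, r_hi, r_lo):
    """P(x | gross error): one bin out of all bins between the limits."""
    return 1.0 / (1.0 + (r_hi - r_lo) / q)


def posterior_gross(p0, q, r_hi, r_lo, x, mu, sigma):
    """Posterior probability of a gross error via Bayes' rule."""
    gross = p0 * likelihood_gross(q, r_hi, r_lo)
    good = (1.0 - p0) * likelihood_good(x, q, r_hi, r_lo, mu, sigma)
    return gross / (gross + good)


near = posterior_gross(0.05, 0.1, 8.0, -8.0, 0.0, 0.0, 1.0)
far = posterior_gross(0.05, 0.1, 8.0, -8.0, 7.0, 0.0, 1.0)
```

An observation at the mean gets a posterior below the prior, while one seven standard deviations away is almost certainly a gross error.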

marine_qc.statistics.trim_mean(inarr, trim)[source]

Calculate a resistant (aka robust) mean of an input array given a trimming criterion.

Parameters:
  • inarr (array-like of float, shape (n,)) – 1-dimensional value array.

  • trim (int) – Trimming criterion. A value of 10 trims one tenth of the values off each end of the sorted array before calculating the mean.

Return type:

float

Returns:

float – Trimmed mean.

marine_qc.statistics.trim_std(inarr, trim)[source]

Calculate a resistant (aka robust) standard deviation of an input array given a trimming criterion.

Parameters:
  • inarr (array-like of float, shape (n,)) – 1-dimensional value array.

  • trim (int) – Trimming criterion. A value of 10 trims one tenth of the values off each end of the sorted array before calculating the standard deviation.

Return type:

float

Returns:

float – Trimmed standard deviation.
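
The trimming used by trim_mean and trim_std can be illustrated as follows. The helper names are hypothetical, and two details are assumptions: a trim of 0 means no trimming, and the standard deviation is the population form (dividing by n).

```python
import math


def trimmed(values, trim):
    """Sorted values with len(values) // trim entries removed from each end."""
    ordered = sorted(values)
    if trim == 0:
        return ordered  # assumed: 0 means no trimming
    k = len(ordered) // trim
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return kept


def trim_mean_sketch(values, trim):
    kept = trimmed(values, trim)
    return sum(kept) / len(kept)


def trim_std_sketch(values, trim):
    kept = trimmed(values, trim)
    mean = sum(kept) / len(kept)
    return math.sqrt(sum((v - mean) ** 2 for v in kept) / len(kept))


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
robust_mean = trim_mean_sketch(data, 10)
robust_std = trim_std_sketch(data, 10)
```

With ten values and trim set to 10, one value is dropped from each end, so the outlier of 1000 no longer affects the result.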

marine_qc.statistics.winsorised_mean(inarr)[source]

Compute the 25% winsorised mean of the input array.

The winsorised mean is a resistant way of calculating an average.

Parameters:

inarr (list of float) – Input array to be averaged.

Return type:

float

Returns:

float – The winsorised mean of the input array with a 25% trimming.

Raises:

ValueError – If the length of inarr is 0.

Notes

The winsorised mean is that which you get if you set the first quarter of the sorted input array to the 1st quartile value and the last quarter to the 3rd quartile and then take the mean. This is quite a heavy trimming of the distribution. It makes it very resistant - about half the obs can be egregiously bad without affecting the mean strongly - but it will be less accurate if there are lots of observations, or the quality of the obs is higher.
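
The procedure in the note can be sketched as below. Approximating the quartile values by the elements at positions n // 4 and n - n // 4 - 1 of the sorted array is an assumption, and the function name is hypothetical.

```python
def winsorised_mean_sketch(values):
    """25% winsorised mean: clamp the outer quarters, then average."""
    if len(values) == 0:
        raise ValueError("input must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    k = n // 4  # number of values clamped at each end
    if k > 0:
        ordered[:k] = [ordered[k]] * k              # lowest quarter -> lower quartile value
        ordered[n - k:] = [ordered[n - k - 1]] * k  # highest quarter -> upper quartile value
    return sum(ordered) / n


resistant = winsorised_mean_sketch([1, 2, 3, 4, 5, 6, 7, 1000])
```

Here the sorted array becomes [3, 3, 3, 4, 5, 6, 6, 6], so the outlier of 1000 contributes only 6 to the sum.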

marine_qc.time_control module

Some generally helpful time control functions for base QC.

marine_qc.time_control.convert_date(params)[source]

Decorator to extract date components and inject them as function parameters.

This decorator intercepts the ‘date’ argument from the function call, splits it into its components (e.g., year, month, day), and assigns those components to specified parameters in the wrapped function. It supports scalar or sequence inputs for ‘date’.

Parameters:

params (list of str) – List of parameter names corresponding to date components to be extracted and passed to the decorated function.

Return type:

Callable[..., Any]

Returns:

Callable[..., Any] – A decorator that wraps a function, extracting date components before calling it.

Notes

  • The decorator expects the wrapped function to accept the parameters listed in params. If a parameter is missing, it raises a ValueError.

  • If the ‘date’ argument is None, the original function is called without modification.

  • Supports scalar-like ‘date’ values as well as iterable sequences.

  • Assumes a helper function split_date exists that splits a date into components and returns a dictionary mapping parameter names to their values.
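
A simplified sketch of such a decorator is given below. It handles only scalar dates passed as a keyword argument, and the names convert_date_sketch and split_date_sketch are stand-ins rather than the package's own code. Treating the hour as a decimal value is also an assumption.

```python
import functools
import inspect
from datetime import datetime


def split_date_sketch(date):
    """Split a datetime into named components (hour as decimal hours)."""
    return {
        "year": date.year,
        "month": date.month,
        "day": date.day,
        "hour": date.hour + date.minute / 60.0 + date.second / 3600.0,
    }


def convert_date_sketch(params):
    """Decorator factory: replace a 'date' keyword by the listed components."""
    def decorator(func):
        accepted = inspect.signature(func).parameters

        @functools.wraps(func)
        def wrapper(*args, date=None, **kwargs):
            if date is None:
                return func(*args, **kwargs)  # nothing to convert
            components = split_date_sketch(date)
            for name in params:
                if name not in accepted:
                    raise ValueError(f"{func.__name__} has no parameter {name!r}")
                kwargs[name] = components[name]
            return func(*args, **kwargs)

        return wrapper
    return decorator


@convert_date_sketch(["year", "month", "day"])
def describe(year=None, month=None, day=None):
    return (year, month, day)


from_date = describe(date=datetime(2020, 2, 29, 12, 30))
untouched = describe(year=1999)
```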

marine_qc.time_control.convert_date_to_hours(dates)[source]

Convert an array of datetimes to an array of hours since the first element.

Parameters:

dates (array-like of datetime, shape (n,)) – 1-dimensional date array.

Return type:

Sequence[float]

Returns:

array-like of float, shape (n,) – 1-dimensional array containing hours since the first element in the array.
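
For plain Python datetime objects, the conversion amounts to the following sketch (the function name is hypothetical):

```python
from datetime import datetime


def hours_since_first(dates):
    """Hours elapsed between each datetime and the first one in the list."""
    if not dates:
        return []
    first = dates[0]
    return [(d - first).total_seconds() / 3600.0 for d in dates]


elapsed = hours_since_first([
    datetime(2020, 1, 1, 0, 0),
    datetime(2020, 1, 1, 1, 30),
    datetime(2020, 1, 2, 0, 0),
])
```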

marine_qc.time_control.convert_time_in_hours(hour, minute, sec, zone, daylight_savings_time)[source]

Convert integer hour, minute, and second to time in decimal hours.

Parameters:
  • hour (int) – Hour.

  • minute (int) – Minute.

  • sec (int) – Second.

  • zone (int or float) – Correction for timezone.

  • daylight_savings_time (float) – Set to 1 if daylight savings time is in effect else set to 0.

Return type:

float

Returns:

float – Time converted to decimal hour in day.

marine_qc.time_control.day_in_year(year=None, month=1, day=1)[source]

Get the day in year from 1 to 365 or 366.

Parameters:
  • year (int, optional, default: None) – Year to be tested. If None, set year to default leap year.

  • month (int, default: 1) – Month to be tested.

  • day (int, default: 1) – Day to be tested.

Return type:

int

Returns:

int – Day in year. If year is not specified then the year is treated as a non-leap year and 29 February returns the same value as 1 March.

marine_qc.time_control.day_in_year_array(month, day)[source]

Get the day in year from 1 to 365. Leap years are dealt with by allowing Feb 29 and Mar 1 to be the same day.

Parameters:
  • month (1D np.ndarray) – Array of months.

  • day (1D np.ndarray) – Array of days.

Return type:

ndarray

Returns:

np.ndarray – Array of day number from 1-365.

marine_qc.time_control.get_month_lengths(year)[source]

Return a list holding the lengths of the months in a given year.

Parameters:

year (int) – Year for which you want month lengths.

Return type:

list[int]

Returns:

list of int – List of month lengths.
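
An equivalent result can be obtained from the standard library, shown here as a sketch with a hypothetical function name:

```python
import calendar


def month_lengths_sketch(year):
    """Lengths of the twelve months of the given year."""
    return [calendar.monthrange(year, month)[1] for month in range(1, 13)]


leap = month_lengths_sketch(2020)
regular = month_lengths_sketch(2021)
```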

marine_qc.time_control.jul_day(year, month, day)[source]

Routine to calculate the Julian day. This is the weird astronomical thing which counts from 1 Jan 4713 BC.

Parameters:
  • year (int) – Year.

  • month (int) – Month.

  • day (int) – Day.

Return type:

int

Returns:

int – Julian day.

Notes

This is one of those routines that looks baffling but works. No one is sure exactly how. It gets written once and then remains untouched for centuries, mysteriously working.
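
One widely used integer formula for the Julian day number of a Gregorian calendar date is sketched below. It is not necessarily the formula this routine uses. The result is checked against Python's proleptic Gregorian ordinal, which differs from the Julian day number by a constant offset of 1721425.

```python
from datetime import date


def julian_day_number(year, month, day):
    """Julian day number for a Gregorian calendar date (integer arithmetic)."""
    a = (14 - month) // 12      # 1 for January and February, else 0
    y = year + 4800 - a         # shifted year, so the year starts in March
    m = month + 12 * a - 3      # March = 0, ..., February = 11
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


jdn_2000 = julian_day_number(2000, 1, 1)
jdn_check = date(2000, 1, 1).toordinal() + 1721425
```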

marine_qc.time_control.leap_year(years_since_1980)[source]

Check if input year is a leap year.

Parameters:

years_since_1980 (int) – Number of years since 1980.

Return type:

int

Returns:

int – 1 if it is a leap year, 0 otherwise.

marine_qc.time_control.leap_year_correction(time_in_hours, day, years_since_1980)[source]

Make leap year correction.

Parameters:
  • time_in_hours (float) – Time in hours.

  • day (int) – Day number.

  • years_since_1980 (int) – Years since 1980.

Return type:

float

Returns:

float – Leap year corrected time.

marine_qc.time_control.pentad_to_month_day(p)[source]

Given a pentad number, return the month and day of the first day in the pentad.

Parameters:

p (int) – Pentad number from 1 to 73.

Return type:

tuple[int, int]

Returns:

tuple of int – A tuple of two ints representing month and day of the first day of the pentad.

marine_qc.time_control.relative_year_number(year, reference=1979)[source]

Get the number of years relative to the reference year (1979 by default).

Parameters:
  • year (int) – Year.

  • reference (int, default: 1979) – Reference year.

Return type:

int

Returns:

int – Number of years relative to the reference year.

marine_qc.time_control.split_date(date)[source]

Split datetime date into year, month, day and hour.

Parameters:

date (datetime) – Date to split.

Return type:

dict[str, float]

Returns:

dict – Dictionary containing year, month, day and hour.

marine_qc.time_control.time_difference(times1, times2)[source]

Convert two arrays of datetimes to the difference in hours.

Parameters:
Return type:

ndarray

Returns:

array-like of float, shape (n,) – 1-dimensional array containing the time difference in hours computed as times2 - times1.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

marine_qc.time_control.time_in_whole_days(time_in_hours, day, years_since_1980, leap)[source]

Calculate from time in hours to time in whole days.

Parameters:
  • time_in_hours (int) – Time in hours.

  • day (int) – Day number.

  • years_since_1980 (int) – Number of years since 1980.

  • leap (int) – Set to 1 for a leap year, else set to 0.

Return type:

float

Returns:

float – Time in whole days.

marine_qc.time_control.valid_month_day(year=None, month=1, day=1)[source]

Return True if month and day combination are allowed, False otherwise. Assumes that Feb 29th is valid.

Parameters:
  • year (int, optional, default: None) – Year to be tested. If None, set year to default leap year.

  • month (int, default: 1) – Month to be tested.

  • day (int, default: 1) – Day to be tested.

Return type:

bool

Returns:

bool – True if month and day (or year month and day) are a valid combination (e.g. 12th March) and False if not (e.g. 30th February).

Notes

Assumes that February 29th is a valid date.

marine_qc.time_control.which_pentad(month, day)[source]

Take month and day as inputs and return pentad in range 1-73.

Parameters:
  • month (int) – Month containing the day for which we want to calculate the pentad.

  • day (int) – Day of the month for which we want to calculate the pentad.

Return type:

int

Returns:

int – Pentad (5-day period) containing input day, from 1 (1 Jan-5 Jan) to 73 (27-31 Dec).

Raises:

ValueError – If month not in range 1-12 or day not in range 1-31.

Notes

The calculation is rather simple. It just loops through the year and adds up days till it reaches the day we are interested in. February 29th is treated as though it were March 1st in a regular year.
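
The pentad arithmetic can be sketched as follows. Instead of looping through the year, this version sums the month lengths of a non-leap year directly, which should give the same answer. The function names are hypothetical, and days that do not exist (such as 30 February) are not rejected.

```python
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # non-leap year


def which_pentad_sketch(month, day):
    """Pentad (1 to 73) containing the given month and day."""
    if not 1 <= month <= 12 or not 1 <= day <= 31:
        raise ValueError("month must be 1-12 and day 1-31")
    if month == 2 and day == 29:
        month, day = 3, 1  # 29 February is treated as 1 March
    day_of_year = sum(MONTH_LENGTHS[:month - 1]) + day
    return (day_of_year - 1) // 5 + 1


def pentad_to_month_day_sketch(p):
    """Month and day of the first day of pentad p (1 to 73)."""
    if not 1 <= p <= 73:
        raise ValueError("pentad must be 1-73")
    day_of_year = (p - 1) * 5 + 1
    for month, length in enumerate(MONTH_LENGTHS, start=1):
        if day_of_year <= length:
            return month, day_of_year
        day_of_year -= length
    raise ValueError("pentad out of range")


first = which_pentad_sketch(1, 1)
last = which_pentad_sketch(12, 31)
leap_day = which_pentad_sketch(2, 29)
```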

marine_qc.time_control.which_pentad_array(month, day)[source]

Take month and day arrays as inputs and return array of pentads in range 1-73.

Parameters:
  • month (ndarray) – Month containing the day for which we want to calculate the pentad.

  • day (ndarray) – Day of the month for which we want to calculate the pentad.

Return type:

ndarray

Returns:

ndarray – Pentad (5-day period) containing input day, from 1 (1 Jan-5 Jan) to 73 (27-31 Dec).

marine_qc.track_check_utils module

The New Track Check QC module provides the functions needed to perform the track check.

The main routine is mds_full_track_check, which takes a list of MarineReport objects from a single ship and runs the track check on them. This is an update of the MDS system track check in that it assumes the Earth is a sphere. In practice, it gives similar results to the cylindrical Earth formerly assumed.

marine_qc.track_check_utils.backward_discrepancy(lat, lon, date, vsi, dsi)[source]

Calculate the distance between the projected position and the actual position.

The projected position is based on the reported speed and heading at the current and previous time steps. The calculation proceeds from the final, later observation to the first (in contrast to distr1, which runs in time order).

This takes the speed and direction reported by the ship and projects it forwards half a time step. It then projects it forwards another half time step using the speed and direction for the next report, to which the projected location is then compared. The distances between the projected and actual locations are returned.

Parameters:
  • lat (SequenceNumberType) – One-dimensional latitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lon (SequenceNumberType) – One-dimensional longitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • date (SequenceDatetimeType) – One-dimensional date array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • vsi (SequenceNumberType) – One-dimensional reported speed array in km/h. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • dsi (SequenceNumberType) – One-dimensional reported heading array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

Return type:

SequenceFloatType

Returns:

SequenceFloatType – Same type as input, but with float values, shape (n,)

One-dimensional array, sequence, or pandas Series containing distances from estimated positions.

Raises:
  • ValueError – If either input is not 1-dimensional or if their lengths do not match.

  • TypeError – If decorator inspect_arrays does not return np.ndarrays.

marine_qc.track_check_utils.calculate_course_parameters(lat_later, lat_earlier, lon_later, lon_earlier, date_later, date_earlier)[source]

Calculate course parameters.

Parameters:
  • lat_later (float) – Latitude in degrees of later timestamp.

  • lat_earlier (float) – Latitude in degrees of earlier timestamp.

  • lon_later (float) – Longitude in degrees of later timestamp.

  • lon_earlier (float) – Longitude in degrees of earlier timestamp.

  • date_later (datetime) – Date of later timestamp.

  • date_earlier (datetime) – Date of earlier timestamp.

Return type:

tuple[float, float, float, float]

Returns:

tuple of float – A tuple of four floats representing the speed, distance, course and time difference.

marine_qc.track_check_utils.calculate_midpoint(lat, lon, timediff)[source]

Interpolate between alternate reports and compare the interpolated location to the actual location.

E.g. take difference between reports 2 and 4 and interpolate to get an estimate for the position at the time of report 3. Then compare the estimated and actual positions at the time of report 3.

The calculation linearly interpolates the latitudes and longitudes (allowing for wrapping around the dateline and so on).

Parameters:
Return type:

ndarray

Returns:

1D np.ndarray of float – One-dimensional array of distances from estimated positions in kilometers.

Raises:

ValueError – If either input is not 1-dimensional or if their lengths do not match.
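
The interpolation idea can be sketched for a single triple of reports. The sketch assumes each report carries its own time in hours, which differs from the timediff argument of the real function. It wraps the longitude difference into the range -180 to 180 degrees so that tracks crossing the dateline interpolate sensibly. All names and the Earth radius are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def midpoint_discrepancy_km(before, middle, after):
    """Distance between the middle report and its interpolated position.

    Each argument is a (lat, lon, hours) tuple.
    """
    lat_a, lon_a, t_a = before
    lat_b, lon_b, t_b = middle
    lat_c, lon_c, t_c = after
    if t_c == t_a:
        raise ValueError("outer reports must have different times")
    fraction = (t_b - t_a) / (t_c - t_a)
    # Shortest longitude difference, allowing for the dateline.
    dlon = (lon_c - lon_a + 180.0) % 360.0 - 180.0
    est_lat = lat_a + fraction * (lat_c - lat_a)
    est_lon = lon_a + fraction * dlon
    return haversine_km(est_lat, est_lon, lat_b, lon_b)


on_track = midpoint_discrepancy_km((0.0, 179.0, 0.0), (0.0, 180.0, 1.0), (0.0, -179.0, 2.0))
off_track = midpoint_discrepancy_km((0.0, 179.0, 0.0), (1.0, 180.0, 1.0), (0.0, -179.0, 2.0))
```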

marine_qc.track_check_utils.calculate_speed_course_distance_time_difference(lat, lon, date, alternating=False)[source]

Calculate speeds, courses, distances and time differences using consecutive reports.

Parameters:
  • lat (SequenceNumberType) – One-dimensional latitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lon (SequenceNumberType) – One-dimensional longitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • date (SequenceDatetimeType) – One-dimensional date array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • alternating (bool, default: False) – Whether to use alternating reports for calculation.

Return type:

tuple[ndarray, ndarray, ndarray, ndarray]

Returns:

tuple of np.ndarray, each with float values, shape (n,) – A tuple containing four one-dimensional arrays representing: speed, distance, course, and time difference.

marine_qc.track_check_utils.check_distance_from_estimate(vsi, time_differences, fwd_diff_from_estimated, rev_diff_from_estimated, vsi_previous=None)[source]

Check that distances from estimated positions are less than calculated distance.

The estimated positions are calculated forward and backwards in time. The calculated distance is the time difference multiplied by the average reported speeds.

Parameters:
  • vsi (SequenceNumberType) – Reported speed in km/h at current time step.

  • time_differences (SequenceNumberType) – Calculated time differences between reports in hours.

  • fwd_diff_from_estimated (SequenceNumberType) – Distance in km from estimated position, estimates made forward in time.

  • rev_diff_from_estimated (SequenceNumberType) – Distance in km from estimated position, estimates made backward in time.

  • vsi_previous (SequenceNumberType, optional) – One-dimensional array of reported speed in km/h at previous time step. If None, get vsi_previous from vsi.

Return type:

ndarray

Returns:

np.ndarray – Returned array elements set to 10 if estimated and reported positions differ by more than the reported speed multiplied by the calculated time difference, 0 otherwise.

Raises:

TypeError – If inspect_arrays does not return np.ndarrays.

marine_qc.track_check_utils.direction_continuity(dsi, directions, dsi_previous=None, max_direction_change=60.0)[source]

Check if reported and calculated directions are within the allowed change.

This function compares the heading at the previous time step with the calculated ship direction from reported positions, flagging differences that exceed the maximum allowed direction change.

Parameters:
  • dsi (SequenceNumberType) – Heading at current time step in degrees.

  • directions (SequenceNumberType) – Calculated ship direction from reported positions in degrees.

  • dsi_previous (SequenceNumberType, optional) – Heading at previous time step in degrees. If None, get dsi_previous from dsi.

  • max_direction_change (float, default: 60.0) – Largest deviation, in degrees, that will not be flagged.

Return type:

ndarray

Returns:

np.ndarray – Returned array elements are 10.0 if the difference between reported and calculated direction is greater than the max_direction_change (default, 60 degrees), 0.0 otherwise.
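
The key detail in a check like this is measuring the difference between two headings around the circle, so that 350 and 10 degrees are 20 degrees apart rather than 340. A scalar sketch follows. Flagging when either the current or the previous heading deviates is an assumption, and the function names are hypothetical.

```python
def heading_difference(a, b):
    """Smallest absolute angle in degrees between two headings."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def direction_flag(dsi, dsi_previous, direction, max_direction_change=60.0):
    """Return 10.0 if a reported heading deviates too far, else 0.0."""
    for heading in (dsi, dsi_previous):
        if heading_difference(heading, direction) > max_direction_change:
            return 10.0
    return 0.0


consistent = direction_flag(350.0, 355.0, 10.0)
inconsistent = direction_flag(0.0, 0.0, 90.0)
```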

marine_qc.track_check_utils.forward_discrepancy(lat, lon, date, vsi, dsi)[source]

Calculate the distance between the projected position and the actual position.

The projected position is based on the reported speed and heading at the current and previous time steps. The observations are taken in time order.

This takes the speed and direction reported by the ship and projects it forwards half a time step. It then projects it forwards another half time step using the speed and direction for the next report, to which the projected location is then compared. The distances between the projected and actual locations are returned.

Parameters:
  • lat (SequenceNumberType) – One-dimensional latitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • lon (SequenceNumberType) – One-dimensional longitude array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • date (SequenceDatetimeType) – One-dimensional date array. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • vsi (SequenceNumberType) – One-dimensional reported speed array in km/h. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

  • dsi (SequenceNumberType) – One-dimensional reported heading array in degrees. Can be a sequence (e.g., list or tuple), a one-dimensional NumPy array, or a pandas Series.

Return type:

SequenceFloatType

Returns:

SequenceFloatType – Same type as input, but with float values, shape (n,)

One-dimensional array, sequence, or pandas Series containing distances from estimated positions.

Raises:
  • ValueError – If any input is not 1-dimensional or if the input lengths do not match.

  • TypeError – If decorator inspect_arrays does not return np.ndarrays.

marine_qc.track_check_utils.increment_position(alat1, alon1, avs, ads, timediff)[source]

Compute latitude and longitude increments over half a time interval.

This function takes latitudes and longitudes, a speed, a direction and a time difference, and returns increments of latitude and longitude that correspond to half the time difference.

Parameters:
  • alat1 (SequenceNumberType) – One-dimensional array of Latitude at starting point in degrees.

  • alon1 (SequenceNumberType) – One-dimensional array of Longitude at starting point in degrees.

  • avs (SequenceNumberType) – One-dimensional array of speed of ship in km/h.

  • ads (SequenceNumberType) – One-dimensional array of heading of ship in degrees.

  • timediff (SequenceNumberType) – One-dimensional array of time difference between the points in hours.

Return type:

tuple[ndarray, ndarray]

Returns:

1D np.ndarray of float – Latitude and longitude increments, or None and None if timediff is None.
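
To make the "half a time interval" idea concrete, here is a minimal scalar sketch. It is not the package's implementation: the name increment_position_sketch is hypothetical, and the geometry is a local flat-earth approximation with an assumed mean Earth radius rather than a proper spherical calculation. It breaks down near the poles, where the cosine of the latitude approaches zero.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, for illustration only


def increment_position_sketch(lat, lon, speed_kmh, heading_deg, timediff_hours):
    """Approximate latitude and longitude increments over half the time difference.

    `lon` is accepted only to mirror the documented signature; the flat-earth
    approximation used here does not need it.
    """
    if timediff_hours is None:
        return None, None
    # Distance covered in half of the time difference.
    distance_km = speed_kmh * timediff_hours / 2.0
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    heading = math.radians(heading_deg)
    # Northward component changes latitude, eastward component changes longitude.
    dlat = distance_km * math.cos(heading) / km_per_degree
    dlon = distance_km * math.sin(heading) / (km_per_degree * math.cos(math.radians(lat)))
    return dlat, dlon


# A ship heading due north at 20 km/h with 2 hours between reports
# covers 20 km in the half interval.
dlat_north, dlon_north = increment_position_sketch(50.0, 0.0, 20.0, 0.0, 2.0)
```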

marine_qc.track_check_utils.modal_speed(speeds)[source]

Calculate the modal speed from the input array in 3 knot bins.

Returns the bin-centre for the modal group.

The data are binned into 3-knot bins, with the first bin (0-3 knots) having a bin centre of 1.5 and the highest containing all speeds in excess of 33 knots, with a bin centre of 34.5. The bin containing the most speeds is found. The higher of the modal speed or 8.5 is returned:

Bins:    0-3, 3-6, 6-9, 9-12, 12-15, 15-18, 18-21, 21-24, 24-27, 27-30, 30-33, 33-36
Centres: 1.5, 4.5, 7.5, 10.5, 13.5, 16.5, 19.5, 22.5, 25.5, 28.5, 31.5, 34.5

Parameters:

speeds (list) – Input speeds in km/h.

Return type:

float

Returns:

float – Bin-centre speed (expressed in km/h) for the 3 knot bin which contains most speeds in input array, or 8.5, whichever is higher.
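
The binning rule can be sketched as below. This is an illustration under stated assumptions, not the package's code. The name modal_speed_sketch is hypothetical, and the sketch works entirely in knots with non-negative inputs and performs no unit conversion, whereas the documented function takes and returns km/h. It also assumes that a speed on a bin edge falls into the higher bin and that ties resolve to the lowest bin.

```python
import math

BIN_WIDTH = 3.0        # knots
N_BINS = 12            # 0-3 up to 33-36; the last bin also takes anything faster
MIN_MODAL_SPEED = 8.5  # floor applied to the returned value


def modal_speed_sketch(speeds_knots):
    """Return the centre of the most populated 3-knot bin, but never less than 8.5."""
    counts = [0] * N_BINS
    for speed in speeds_knots:
        # Speeds beyond the last bin edge are clipped into the top bin.
        index = min(int(math.floor(speed / BIN_WIDTH)), N_BINS - 1)
        counts[index] += 1
    modal_index = counts.index(max(counts))  # ties resolve to the lowest bin
    centre = modal_index * BIN_WIDTH + BIN_WIDTH / 2.0
    return max(centre, MIN_MODAL_SPEED)


# Three of the four speeds fall in the 9-12 knot bin, whose centre is 10.5.
example_mode = modal_speed_sketch([10.0, 11.0, 10.5, 20.0])
```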

marine_qc.track_check_utils.set_speed_limits(amode)[source]

Take a modal speed and calculate speed limits for the track checker.

Parameters:

amode (float) – Modal speed in km/h.

Return type:

tuple[float, float, float]

Returns:

(float, float, float) – Max speed, maximum max speed and min speed.

marine_qc.track_check_utils.speed_continuity(vsi, speeds, vsi_previous=None, max_speed_change=10.0)[source]

Check if reported speeds are within the allowed change from calculated speeds.

This function compares the reported speed at the current and previous time steps with the speed calculated from positions. It flags positions where the change exceeds the maximum allowed speed change.

Parameters:
  • vsi (SequenceNumberType) – One-dimensional array of reported speed in km/h at current time step.

  • speeds (SequenceNumberType) – One-dimensional array of speed of ship calculated from locations at current and previous time steps in km/h.

  • vsi_previous (SequenceNumberType, optional) – One-dimensional array of reported speed in km/h at previous time step. If None, get vsi_previous from vsi.

  • max_speed_change (float, optional) – Largest change of speed, in km/h, that will not raise a flag. Default is 10 km/h.

Return type:

ndarray

Returns:

np.ndarray – Returned array elements are 10 if the reported and calculated speeds differ by more than max_speed_change (default 10 km/h), 0 otherwise.

marine_qc.validations module

Module containing base QC routines that call multiple QC functions and can be applied to a DataBundle.

marine_qc.validations.is_func_param(func, param)[source]

Return True if param is the name of a parameter of function func.

Parameters:
  • func (Callable) – Function whose parameters are to be inspected.

  • param (str) – Name of the parameter.

Return type:

bool

Returns:

bool – Returns True if param is one of the function's parameters or the function accepts **kwargs.
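
One way such a check could be written with the standard library is shown below. It is a sketch of the documented behaviour rather than the package's own code. The name is_func_param_sketch and the two example functions are hypothetical.

```python
import inspect


def is_func_param_sketch(func, param):
    """Return True if `param` names a parameter of `func` or `func` accepts **kwargs."""
    parameters = inspect.signature(func).parameters
    if param in parameters:
        return True
    # A **kwargs parameter means any keyword name is acceptable.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values())


def fixed(lat, lon):
    return lat, lon


def flexible(lat, **kwargs):
    return lat, kwargs


has_lat = is_func_param_sketch(fixed, "lat")
has_date = is_func_param_sketch(fixed, "date")
accepts_date = is_func_param_sketch(flexible, "date")
```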

marine_qc.validations.is_in_data(name, data)[source]

Return True if named column or variable, name, is in data.

Parameters:
  • name (str) – Name of variable.

  • data (pd.Series or pd.DataFrame) – Pandas Series or DataFrame to be tested.

Return type:

bool

Returns:

bool – Returns True if name is one of the columns or variables in data, False otherwise.

Raises:

TypeError – If data type is not pd.Series or pd.DataFrame.

marine_qc.validations.validate_arg(key, value, func_name, parameters, type_hints, reserved_keys, has_arguments)[source]

Validate argument against a function’s signature, taking decorators into account.

Parameters:
  • key (str) – The name of the argument to validate.

  • value (Any) – The value of the argument to validate.

  • func_name (str) – The name of the function (used in error message).

  • parameters (Mapping[str, inspect.Parameter]) – A mapping of parameter names to inspect.Parameter objects, typically from inspect.signature(func).parameters.

  • type_hints (Mapping[str, type]) – A mapping of parameter names to expected types, typically from typing.get_type_hints(func).

  • reserved_keys (set[str]) – Argument names that are considered reserved and should not raise errors.

  • has_arguments (bool) – Whether the function accepts arbitrary arguments.

Return type:

None

marine_qc.validations.validate_args(func, args=None, kwargs=None)[source]

Validate positional and keyword arguments against a function’s signature, taking decorators into account.

This function checks that:
  • All provided keyword arguments correspond to valid parameters of the given function.

  • All required parameters of the function (i.e., parameters without default values) are present in the provided keyword arguments.

Parameters:
  • func (Callable[..., Any]) – The function whose signature is used for validation.

  • args (Sequence[Any], optional) – Sequence of arguments intended to be passed to func.

  • kwargs (Mapping[str, Any], optional) – Dictionary of keyword arguments intended to be passed to func.

Raises:
  • ValueError – If kwargs contains a key that is not a parameter of func.

  • TypeError – If a required parameter of func is missing from kwargs.

Return type:

None
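
The two documented checks could look roughly like this using inspect.signature, which follows __wrapped__ and therefore sees through decorators built with functools.wraps. This is a simplified sketch, not the package's implementation. The names validate_args_sketch, passthrough and check are hypothetical, type hints are not validated here, and treating positional arguments as filling the leading parameters is an assumption.

```python
import functools
import inspect


def validate_args_sketch(func, args=None, kwargs=None):
    """Raise if kwargs contains unknown names or a required parameter is missing."""
    args = tuple(args or ())
    kwargs = dict(kwargs or {})
    parameters = inspect.signature(func).parameters
    accepts_any_keyword = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()
    )
    for key in kwargs:
        if key not in parameters and not accepts_any_keyword:
            raise ValueError(f"'{key}' is not a parameter of {func.__name__}")
    # Assumption: positional arguments fill the leading parameters in order.
    filled_positionally = set(list(parameters)[: len(args)])
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    for name, parameter in parameters.items():
        if parameter.kind in variadic:
            continue
        required = parameter.default is inspect.Parameter.empty
        if required and name not in kwargs and name not in filled_positionally:
            raise TypeError(f"Missing required parameter '{name}' for {func.__name__}")


def passthrough(func):
    """Hypothetical decorator that preserves the wrapped signature."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@passthrough
def check(value, limit, strict=False):
    return value <= limit


# Passes silently: both required parameters are supplied by keyword.
validate_args_sketch(check, kwargs={"value": 1, "limit": 2})
```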

marine_qc.validations.validate_dict(input_dict)[source]

Validate that the input is a dictionary with string keys and dictionary values.

This function checks that:
  • input_dict is a dictionary.

  • All keys in the dictionary are strings.

  • All top-level values in the dictionary are themselves dictionaries.

Parameters:

input_dict (Mapping[str, Mapping[str, Any]]) – The object to validate.

Raises:

TypeError – If input_dict is not a dictionary, if any key is not a string, or if any value is not a dictionary.

Return type:

None
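
The three checks listed above amount to a few isinstance tests. The following is a minimal sketch, with the hypothetical name validate_dict_sketch and a made-up example mapping; the error messages are illustrative and not taken from the package.

```python
def validate_dict_sketch(input_dict):
    """Raise TypeError unless input_dict maps string keys to dictionary values."""
    if not isinstance(input_dict, dict):
        raise TypeError(f"Expected a dictionary, got {type(input_dict).__name__}")
    for key, value in input_dict.items():
        if not isinstance(key, str):
            raise TypeError(f"Key {key!r} is not a string")
        if not isinstance(value, dict):
            raise TypeError(f"Value for key {key!r} is not a dictionary")


# A hypothetical mapping from a check name to its keyword arguments passes.
validate_dict_sketch({"some_check": {"limit": 5}})
```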

marine_qc.validations.validate_type(value, expected)[source]

Recursively validate that a value matches the expected type hint.

Parameters:
  • value (Any) – The value to validate.

  • expected (Any) – The expected value type for validation.

Return type:

bool

Returns:

bool – True if the type of value matches expected, False otherwise.
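
Recursive validation against a type hint can be sketched with typing.get_origin and typing.get_args. The version below is an assumption about how such a check might work, not the package's implementation. The name validate_type_sketch is hypothetical, and only Any, Union (including Optional), list, set, frozenset, tuple and dict hints are handled. Unions written with the | operator and string annotations are not covered.

```python
from typing import Any, Dict, List, Optional, Tuple, Union, get_args, get_origin


def validate_type_sketch(value, expected):
    """Recursively check `value` against a small subset of typing hints."""
    if expected is Any:
        return True
    origin = get_origin(expected)
    args = get_args(expected)
    if origin is None:
        # Plain class such as int, str or type(None).
        return isinstance(value, expected)
    if origin is Union:
        return any(validate_type_sketch(value, option) for option in args)
    if not args:
        # Unparametrised generic such as typing.List.
        return isinstance(value, origin)
    if origin in (list, set, frozenset):
        return isinstance(value, origin) and all(
            validate_type_sketch(item, args[0]) for item in value
        )
    if origin is tuple:
        if not isinstance(value, tuple):
            return False
        if len(args) == 2 and args[1] is Ellipsis:
            # Homogeneous tuple of any length, e.g. Tuple[int, ...].
            return all(validate_type_sketch(item, args[0]) for item in value)
        return len(value) == len(args) and all(
            validate_type_sketch(item, hint) for item, hint in zip(value, args)
        )
    if origin is dict:
        return isinstance(value, dict) and all(
            validate_type_sketch(k, args[0]) and validate_type_sketch(v, args[1])
            for k, v in value.items()
        )
    return isinstance(value, origin)


# Nested hints are checked element by element.
nested_ok = validate_type_sketch({"sst": [1.5, 2.0]}, Dict[str, List[float]])
nested_bad = validate_type_sketch({"sst": [1.5, "x"]}, Dict[str, List[float]])
```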