Helpers in QC checks for individual reports module

marine_qc.qc_individual_reports._do_daytime_check(year, month, day, hour, lat, lon, time_since_sun_above_horizon, mode)

Determine if the sun was above the horizon a specified time before the report.

Parameters:
  • year (1D np.ndarray of int) – Year(s) of observation.

  • month (1D np.ndarray of int) – Month(s) of observation (1-12).

  • day (1D np.ndarray of int) – Day(s) of observation.

  • hour (1D np.ndarray of float) – Hour(s) of observation (minutes as decimal).

  • lat (1D np.ndarray of float) – Latitude(s) of observation in degrees.

  • lon (1D np.ndarray of float) – Longitude(s) of observation in degrees.

  • time_since_sun_above_horizon (float) – Maximum time (in hours) for which the sun can have been above (or below) the horizon while the report still counts as night. The original QC test set this to 1.0, i.e. it was night between one hour after sundown and one hour after sunrise.

  • mode ({"day", "night"}) – If “day”, check if the sun is above the horizon. If “night”, check if the sun is below the horizon.

Return type:

ndarray

Returns:

np.ndarray of int

  • Returns 2 (or array/sequence/Series of 2s) if any of do_position_check, do_date_check, or do_time_check returns 2.

  • Returns 1 (or array/sequence/Series of 1s) if any of do_position_check, do_date_check, or do_time_check returns 1 or if it is night (sun below horizon an hour ago).

  • Returns 0 if it is day (sun above horizon an hour ago).

Raises:

ValueError – If mode is not in valid list [“day”, “night”].
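The way the returned flags combine can be sketched as follows. This is an illustration only, not the module's implementation: `combine_daytime_flags_sketch` and its arguments are hypothetical, the solar position is taken as a precomputed boolean array, flag 2 is assumed to take precedence over flag 1, and the behaviour for mode="night" is assumed to mirror the "day" case.

```python
import numpy as np


def combine_daytime_flags_sketch(validity_flags, sun_above_horizon, mode="day"):
    """Combine validity flags with a day/night condition (hypothetical helper)."""
    if mode not in ("day", "night"):
        raise ValueError(f"mode must be 'day' or 'night', got {mode!r}")
    # Highest flag per report across the position, date, and time checks.
    validity = np.max(np.vstack(validity_flags), axis=0)
    sun_up = np.asarray(sun_above_horizon, dtype=bool)
    condition_met = sun_up if mode == "day" else ~sun_up
    # 0 where the requested condition holds, 1 otherwise.
    flags = np.where(condition_met, 0, 1)
    flags = np.where(validity == 1, 1, flags)
    # Assumption: a 2 from any validity check overrides everything else.
    flags = np.where(validity == 2, 2, flags)
    return flags


# Toy inputs standing in for the results of the three validity checks.
position = np.array([0, 0, 2, 0])
date = np.array([0, 1, 0, 0])
time = np.array([0, 0, 0, 0])
sun_up = np.array([True, True, True, False])
day_flags = combine_daytime_flags_sketch([position, date, time], sun_up, mode="day")
```

With these toy inputs, only the first report is flagged 0: the second inherits a 1 from the date check, the third a 2 from the position check, and the fourth is flagged 1 because the sun was below the horizon.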

Helpers in multiple checks module

marine_qc.multiple_checks._get_function(name)

Return the function with the given name or raise a NameError.

Parameters:

name (str) – Name of the function to be returned.

Return type:

Callable[..., Any]

Returns:

Callable[..., Any] – Function of a given name.

Raises:

NameError – If a callable with the given name does not exist.
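A name-based lookup of this kind can be sketched as follows. This is a minimal sketch, not the module's implementation: `get_function_sketch` and the `registry` dictionary are hypothetical, and the actual helper may search a module namespace instead of a dictionary.

```python
from collections.abc import Callable, Mapping
from typing import Any


def get_function_sketch(name: str, registry: Mapping[str, Any]) -> Callable[..., Any]:
    """Return the callable stored under ``name`` or raise NameError (hypothetical helper)."""
    func = registry.get(name)
    # Reject both missing names and names bound to non-callable objects.
    if not callable(func):
        raise NameError(f"Function '{name}' is not defined.")
    return func


# Hypothetical registry used only for this illustration.
registry = {"absolute": abs, "not_a_function": 42}
found = get_function_sketch("absolute", registry)
```

Checking `callable(func)` rather than only key membership mirrors the documented error condition, which refers to a callable with the given name.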

marine_qc.multiple_checks._get_requests_from_params(params, func, data)

Get requests from func or data using params.

Given a dictionary of key value pairs where the keys are parameters in the function, func, and the values are columns or variables in data, create a new dictionary in which the keys are the parameter names (as in the original dictionary) and the values are the numbers extracted from data.

Parameters:
  • params (Mapping or None) – Dictionary. Keys are parameter names for the function func, and values are the names of columns or variables in data.

  • func (Callable) – Function for which the parameters will be checked.

  • data (pd.Series or pd.DataFrame) – Series or DataFrame containing the data to be extracted.

Return type:

Mapping[str, Series | Any]

Returns:

Mapping – Dictionary containing the key value pairs where the keys are as in the input dictionary and the values are extracted from the corresponding columns of data.

Raises:
  • ValueError – If one of the dictionary keys from params is not a valid argument in func.

  • NameError – If one of the dictionary values from params is not a column or variable in data.
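The mapping from parameter names to extracted values, together with the two documented error cases, can be illustrated with a short sketch. The names `get_requests_sketch` and `check_range` are hypothetical, and using `inspect.signature` to validate the keys is an assumption about how such a check could be done, not a statement about the module's code.

```python
from __future__ import annotations

import inspect
from collections.abc import Callable, Mapping
from typing import Any

import pandas as pd


def get_requests_sketch(
    params: Mapping[str, str] | None,
    func: Callable[..., Any],
    data: pd.DataFrame | pd.Series,
) -> dict[str, Any]:
    """Map parameter names of ``func`` to values pulled from ``data`` (hypothetical helper)."""
    requests: dict[str, Any] = {}
    if params is None:
        return requests
    valid_args = inspect.signature(func).parameters
    for param, column in params.items():
        if param not in valid_args:
            raise ValueError(f"'{param}' is not a valid argument of {func.__name__}.")
        # ``in`` tests column labels for a DataFrame and index labels for a Series.
        if column not in data:
            raise NameError(f"'{column}' is not available in the data.")
        requests[param] = data[column]
    return requests


def check_range(value, lower=0.0, upper=1.0):
    """Toy function standing in for a QC check."""
    return (value >= lower) & (value <= upper)


df = pd.DataFrame({"sst": [0.2, 1.5], "lat": [10.0, 20.0]})
requests = get_requests_sketch({"value": "sst"}, check_range, df)
```

Here `requests` holds a single entry, `"value"`, whose value is the `sst` column of `df`.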

marine_qc.multiple_checks._prepare_functions(config, data, preprocessed=None, execute=False)

Prepare functions defined in a configuration dictionary.

Parameters:
  • config (Mapping[str, Mapping[str, Any]]) – Dictionary describing functions, their inputs, and arguments.

  • data (pd.DataFrame or pd.Series) – Data used to extract requested parameters.

  • preprocessed (Mapping[str, Any], optional) – Previously computed preprocessed variables (used for QC functions).

  • execute (bool, default: False) – If True, execute the functions and return their results. If False, return function references and resolved arguments.

Return type:

Mapping[str, Any]

Returns:

Mapping[str, Any] – If execute=True, returns a dict mapping names to results. If execute=False, returns a dict mapping names to dicts: {“function”: callable, “requests”: dict, “kwargs”: dict}.

marine_qc.multiple_checks._apply_qc_to_masked_rows(qc_func, args, kwargs, data_index, mask)

Apply a QC function to masked rows and return a Series aligned to data_index.

Parameters:
  • qc_func (Callable) – QC function to execute.

  • args (Mapping[str, Any]) – Keyword arguments constructed from requests.

  • kwargs (Mapping[str, Any]) – Additional keyword arguments, typically from preprocessed variables.

  • data_index (pd.Index) – Full index of the dataset for aligning the QC result.

  • mask (pd.Series) – Boolean mask indicating which rows the QC function applies to.

Return type:

Series

Returns:

pd.Series – A Series indexed by data_index containing QC results for masked rows and default values elsewhere.
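A rough sketch of the masked application is given below. It is not the module's implementation: `apply_to_masked_rows_sketch` and `is_negative` are hypothetical, and the default value for rows outside the mask is assumed to be 3 (untested).

```python
import numpy as np
import pandas as pd

UNTESTED = 3  # assumed default flag for rows outside the mask


def apply_to_masked_rows_sketch(qc_func, args, kwargs, data_index, mask):
    """Run ``qc_func`` on masked rows only and align the result to ``data_index``."""
    result = pd.Series(UNTESTED, index=data_index, dtype=int)
    if not mask.any():
        return result
    # Subset Series inputs to the masked rows; pass other values through unchanged.
    masked_args = {
        key: value[mask] if isinstance(value, pd.Series) else value
        for key, value in args.items()
    }
    # Write the results back positionally into the masked rows.
    result.loc[mask] = np.asarray(qc_func(**masked_args, **kwargs))
    return result


def is_negative(value, passed=0, failed=1):
    """Toy QC check: flag negative values as failed."""
    return value.lt(0).map({True: failed, False: passed})


index = pd.Index([10, 11, 12])
values = pd.Series([-1.0, 2.0, -3.0], index=index)
mask = pd.Series([True, True, False], index=index)
flags = apply_to_masked_rows_sketch(is_negative, {"value": values}, {}, index, mask)
```

The third row is excluded by the mask, so it keeps the assumed default flag even though its value would fail the toy check.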

marine_qc.multiple_checks._normalize_groupby(data, groupby)

Return iterable of (name, group_df) pairs, trimming invalid rows.

Parameters:
  • data (pd.Series or pd.DataFrame) – Hashable input data.

  • groupby (DataFrameGroupBy or object) – A groupby object or column(s) to group by. If None, the full DataFrame is returned as a single group.

Return type:

list[tuple[Any, DataFrame]]

Returns:

list[tuple[Any, pd.DataFrame]] – A list of tuples containing the group name (or None) and the corresponding DataFrame slice.

marine_qc.multiple_checks._normalize_input(data, return_method)

Validate the return method and ensure the input is a DataFrame.

Converts a Series to a single-column DataFrame and tracks if the original input was a Series.

Parameters:
  • data (pd.Series or pd.DataFrame) – Hashable input data.

  • return_method ({'all', 'passed', 'failed'}) – Specifies which rows to return; must be one of ‘all’, ‘passed’, or ‘failed’.

Return type:

tuple[DataFrame, bool]

Returns:

tuple of (pd.DataFrame, bool)

  • Normalized DataFrame version of the input.

  • Boolean indicating if the original input was a Series.
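The normalization step can be sketched as below. The helper name is hypothetical, and raising a ValueError for an invalid return method is an assumption, since the exception type is not documented here.

```python
import pandas as pd

VALID_RETURN_METHODS = ("all", "passed", "failed")


def normalize_input_sketch(data, return_method):
    """Validate ``return_method`` and return (DataFrame, was_series) (hypothetical helper)."""
    if return_method not in VALID_RETURN_METHODS:
        # Assumed exception type; the original helper may raise something else.
        raise ValueError(f"return_method must be one of {VALID_RETURN_METHODS}.")
    was_series = isinstance(data, pd.Series)
    if was_series:
        # A Series becomes a single-column DataFrame.
        data = data.to_frame()
    return data, was_series


series = pd.Series([1.0, 2.0], name="sst")
frame, was_series = normalize_input_sketch(series, "all")
```

Tracking `was_series` lets a caller convert the result back to a Series at the end of processing.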

marine_qc.multiple_checks._prepare_all_inputs(data, qc_dict, preproc_dict)

Build all inputs required for QC execution.

This includes preprocessed variables, resolved QC function arguments, an initial boolean mask, and an empty results table.

Parameters:
  • data (pd.Series or pd.DataFrame) – Hashable input data.

  • qc_dict (Mapping or None) – Dictionary defining QC functions and their arguments.

  • preproc_dict (Mapping or None) – Dictionary defining preprocessing steps.

Return type:

tuple[Mapping[str, Any], Series, DataFrame]

Returns:

tuple of (Mapping, pd.Series, pd.DataFrame)

  • QC inputs dictionary: {qc_name: {function, requests, kwargs}}.

  • Initial boolean mask Series (all True).

  • Empty results DataFrame with shape (n_rows, n_qcs).

marine_qc.multiple_checks._group_iterator(data, groupby)

Yield groups of a DataFrame as (group_name, group_df) pairs.

If groupby is None, yields the entire DataFrame as a single group. Otherwise, yields each group as returned by _normalize_groupby.

Parameters:
  • data (pd.DataFrame or pd.Series) – The DataFrame to iterate over in groups.

  • groupby (str, iterable of str, DataFrameGroupBy, or None) – Column(s) or a groupby object to split data into groups. If None, the full DataFrame is returned as a single group.

Yields:

tuple of (Any, pd.DataFrame) – Tuples containing the group key (or None) and the corresponding DataFrame for that group.

Return type:

Iterator[tuple[Any | None, DataFrame | Series]]
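The yielding behaviour described above can be sketched with a small generator. This simplified version is hypothetical: it only handles None and column names, and it omits both the case of a ready-made groupby object and the trimming done by _normalize_groupby.

```python
import pandas as pd


def group_iterator_sketch(data, groupby=None):
    """Yield (group_name, group_df) pairs (hypothetical, simplified helper)."""
    if groupby is None:
        # No grouping requested: the whole DataFrame is a single group.
        yield None, data
        return
    for name, group_df in data.groupby(groupby):
        yield name, group_df


df = pd.DataFrame({"ship": ["A", "B", "A"], "sst": [1.0, 2.0, 3.0]})
by_ship = list(group_iterator_sketch(df, "ship"))
whole = list(group_iterator_sketch(df))
```

Grouping by the `ship` column yields one pair per ship, while calling the generator without a grouping key yields a single pair whose key is None.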

marine_qc.multiple_checks._run_qc_engine(data, qc_inputs, groups, return_method)

Execute QC checks on the provided data groups and collect the results.

Each QC function is applied to the corresponding group, respecting a shared mask that propagates pass/fail status. The results are stored in a DataFrame aligned with the original data.

Parameters:
  • data (pd.Series or pd.DataFrame) – Hashable input data.

  • qc_inputs (Mapping) – Dictionary of QC inputs, each containing: {“function”: callable, “requests”: dict, “kwargs”: dict}.

  • groups (iterable) – Iterable of (group_name, group_df) pairs, as returned by _group_iterator.

  • return_method ({"all", "passed", "failed"}, default: "all") – If “all”, return a QC dictionary containing all requested QC check flags. If “passed”, return a QC dictionary containing all requested QC check flags until the first check passes; other QC checks are flagged as untested (3). If “failed”, return a QC dictionary containing all requested QC check flags until the first check fails; other QC checks are flagged as untested (3).

Return type:

DataFrame | Series

Returns:

pd.DataFrame – DataFrame of QC results with the same index as data and columns corresponding to QC names.
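The effect of return_method on a single row can be illustrated with plain Python. This is a sketch under stated assumptions, not the engine itself: `run_checks_sketch` and the toy checks are hypothetical, a pass is assumed to be flag 0 and a failure flag 1, and the `active` variable plays the role of the shared mask for one row.

```python
PASSED, FAILED, UNTESTED = 0, 1, 3  # flag values assumed for this illustration


def run_checks_sketch(value, checks, return_method="all"):
    """Apply checks in order, stopping early depending on ``return_method``."""
    results = {}
    active = True  # stands in for the shared mask entry of this row
    for name, check in checks.items():
        if not active:
            results[name] = UNTESTED
            continue
        flag = check(value)
        results[name] = flag
        # Deactivate the row once the stopping condition is met.
        if return_method == "passed" and flag == PASSED:
            active = False
        elif return_method == "failed" and flag == FAILED:
            active = False
    return results


checks = {
    "positive": lambda v: PASSED if v > 0 else FAILED,
    "small": lambda v: PASSED if v < 10 else FAILED,
    "even": lambda v: PASSED if v % 2 == 0 else FAILED,
}
all_flags = run_checks_sketch(12, checks, "all")
passed_flags = run_checks_sketch(12, checks, "passed")
failed_flags = run_checks_sketch(12, checks, "failed")
```

For the value 12, the "passed" variant stops after the first check, so the two remaining checks are flagged 3, while the "failed" variant stops after the second check and flags only the last one as 3.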

marine_qc.multiple_checks._do_multiple_check(data, groupby=None, qc_dict=None, preproc_dict=None, return_method='all')

Internal entry point for performing QC checks on data.

Prepares inputs, constructs groups, and executes the QC engine for individual, sequential, or grouped checks.

Parameters:
  • data (pd.Series or pd.DataFrame) – Hashable input data.

  • groupby (str, iterable of str, or pandas GroupBy, optional) – Specifies how the data should be grouped before applying QC functions. If a string or iterable of strings, data.groupby is called on those keys. If a pandas.DataFrameGroupBy object is provided, its groups are used directly. Any groups that contain indices not present in data are automatically trimmed. If None, the entire input data is treated as a single group.

  • qc_dict (Mapping, optional) – Nested QC dictionary. Keys represent arbitrary user-specified names for the checks. The values are dictionaries which contain the keys “func” (name of the QC function), “names” (input data names as keyword arguments, that will be retrieved from data) and, if necessary, “arguments” (the corresponding keyword arguments). For more information see Examples.

  • preproc_dict (Mapping, optional) – Nested pre-processing dictionary. Keys represent variable names that can be used by qc_dict. The values are dictionaries which contain the keys “func” (name of the pre-processing function), “names” (input data names as keyword arguments, that will be retrieved from data), and “inputs” (list of input-given variables). For more information see Examples.

  • return_method ({"all", "passed", "failed"}, default: "all") – If “all”, return a QC dictionary containing all requested QC check flags. If “passed”, return a QC dictionary containing all requested QC check flags until the first check passes; other QC checks are flagged as untested (3). If “failed”, return a QC dictionary containing all requested QC check flags until the first check fails; other QC checks are flagged as untested (3).

Return type:

DataFrame | Series

Returns:

pd.DataFrame or pd.Series – A DataFrame (or Series if the input was a Series) whose columns correspond to the QC names in qc_dict and whose values contain QC flags for each row. Flags depend on the QC functions used.
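The nested layout of qc_dict described above might look like the following. This is an illustrative sketch only: the column names, the `hypothetical_range_check` function, and the keyword names `lat` and `lon` are assumptions, and `validate_qc_dict_sketch` is a hypothetical helper that merely checks the documented keys.

```python
# Hypothetical configuration: column names, "hypothetical_range_check", and the
# keyword names "lat" and "lon" are illustrative assumptions.
qc_dict = {
    "position": {
        "func": "do_position_check",
        "names": {"lat": "latitude", "lon": "longitude"},
    },
    "sst_range": {
        "func": "hypothetical_range_check",
        "names": {"value": "sst"},
        "arguments": {"lower": -5.0, "upper": 45.0},
    },
}

ALLOWED_KEYS = {"func", "names", "arguments"}


def validate_qc_dict_sketch(config):
    """Check that each entry names a function and uses only the documented keys."""
    for name, spec in config.items():
        if "func" not in spec:
            raise KeyError(f"Entry '{name}' is missing the 'func' key.")
        unknown = set(spec) - ALLOWED_KEYS
        if unknown:
            raise KeyError(f"Entry '{name}' has unknown keys: {sorted(unknown)}.")
    return True


is_valid = validate_qc_dict_sketch(qc_dict)
```

The outer keys ("position", "sst_range") are the user-chosen check names, which per the description above become the columns of the returned QC results.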

Helpers in external climatology module

marine_qc.external_clim._select_point(i, da_slice, lat_arr, lon_arr, lat_axis, lon_axis)

Select nearest grid point value for a single lat/lon pair.

Parameters:
  • i (int) – Index of the latitude/longitude pair.

  • da_slice (xr.DataArray) – DataArray slice to sample from.

  • lat_arr (SequenceNumberType) – Array of latitude values.

  • lon_arr (SequenceNumberType) – Array of longitude values.

  • lat_axis (str) – Name of the latitude dimension in da_slice.

  • lon_axis (str) – Name of the longitude dimension in da_slice.

Return type:

tuple[int, float]

Returns:

tuple of (int, float) – Index i and the selected grid-point value.
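Nearest grid point selection can be sketched with plain NumPy arrays. Note that this swaps the xr.DataArray of the real helper for a 2D array plus explicit coordinate vectors, so the signature differs; `select_point_sketch` and all argument names other than `i`, `lat_arr`, and `lon_arr` are hypothetical.

```python
import numpy as np


def select_point_sketch(i, grid, grid_lats, grid_lons, lat_arr, lon_arr):
    """Return (i, value) for the grid cell nearest to the i-th lat/lon pair."""
    # Nearest coordinate along each axis, found by minimum absolute difference.
    lat_idx = int(np.abs(grid_lats - lat_arr[i]).argmin())
    lon_idx = int(np.abs(grid_lons - lon_arr[i]).argmin())
    return i, float(grid[lat_idx, lon_idx])


grid_lats = np.array([-10.0, 0.0, 10.0])
grid_lons = np.array([0.0, 20.0, 40.0, 60.0])
grid = np.arange(12, dtype=float).reshape(3, 4)  # rows follow latitude, columns longitude
index, value = select_point_sketch(1, grid, grid_lats, grid_lons, [9.0, -2.0], [5.0, 33.0])
```

Returning the index alongside the value makes it possible to reassemble results in the original order when points are processed out of order, for example in parallel.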

marine_qc.external_clim._empty_dataarray()

Create an empty 3D DataArray with latitude, time, and longitude dimensions.

Return type:

DataArray

Returns:

xr.DataArray – Empty 3D DataArray with latitude, time, and longitude dimensions.

Helpers in spherical geometry module

marine_qc.spherical_geometry._geod_inv(lon1, lat1, lon2, lat2)

Compute forward azimuth, back azimuth, and distance between two points using an ellipsoidal model.

Parameters:
Return type:

tuple[ndarray, ndarray, ndarray]

Returns:

tuple of (np.ndarray, np.ndarray, np.ndarray)

  • Forward azimuth(s) from point 1 to point 2 in degrees.

  • Back azimuth(s) from point 2 to point 1 in degrees.

  • Distance(s) between the points in meters.

The outputs have the same shape as the broadcast inputs.

Helpers in statistical functions module

marine_qc.statistics._trim_stat(inarr, trim, stat)

Calculate a resistant (aka robust) statistic of an input array given a trimming criterion.

Parameters:
  • inarr (array-like of float, shape (n,)) – 1-dimensional value array.

  • trim (int) – Trimming criterion. A value of 10 trims one tenth of the values off each end of the sorted array before calculating the statistic.

  • stat (str) – Name of the numpy statistic function to apply, e.g., “mean”, “std”.

Return type:

float

Returns:

float – The trimmed statistic calculated on the array.
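The trimming described above can be sketched as follows. This is a minimal sketch rather than the module's code: `trim_stat_sketch` is hypothetical, the number of values cut from each end is assumed to be the integer division of the array length by trim, and a trim of 0 is assumed to mean no trimming.

```python
import numpy as np


def trim_stat_sketch(inarr, trim, stat):
    """Apply a numpy statistic to the sorted array after trimming both ends."""
    arr = np.sort(np.asarray(inarr, dtype=float))
    if trim > 0:
        # Assumed rule: drop len(arr) // trim values from each end.
        cut = len(arr) // trim
        arr = arr[cut : len(arr) - cut]
    # Look up the statistic by name, e.g. np.mean or np.std.
    return float(getattr(np, stat)(arr))


values = [100.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, -50.0]
trimmed_mean = trim_stat_sketch(values, 10, "mean")
plain_mean = trim_stat_sketch(values, 0, "mean")
```

With ten values and trim=10, one value is removed from each end, so the two outliers (100 and -50) are dropped and the trimmed mean is 4.5, compared with 8.6 for the untrimmed mean.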

Helpers in plotting module

marine_qc.plot_qc_outcomes._get_colours_labels(qc_outcomes)

Get colour labels.

Parameters:

qc_outcomes (np.ndarray) – Array containing the QC outcomes, with 0 meaning pass and non-zero entries indicating failure.

Return type:

tuple[ndarray, list[Line2D]]

Returns:

tuple of (list of str, list of Line2D) – Color names and legend elements.

marine_qc.plot_qc_outcomes._make_plot(xvalue, yvalue, flags, xlim, ylim, xlabel, ylabel, filename)

Make plot.

Parameters:
  • xvalue (np.ndarray) – Array of x values.

  • yvalue (np.ndarray) – Array of y values.

  • flags (np.ndarray) – Array containing the QC outcomes, with 0 meaning pass and non-zero entries indicating failure.

  • xlim (list of float or None) – If not None: set xlim for plotting.

  • ylim (list of float or None) – If not None: set ylim for plotting.

  • xlabel (str) – Name of the x axis.

  • ylabel (str) – Name of the y axis.

  • filename (str or None) – Filename to save the figure to. If None, the figure is saved with a standard name.

Return type:

Figure

Returns:

Figure – The main figure object created by plt.subplots().

Static methods of buoy tracking QC classes

marine_qc.buoy_tracking_qc.SSTTailChecker._parse_rep(lat, lon, ostia, ice, bgvar, dates)

Process a report.

Parameters:
  • lat (float) – Latitude.

  • lon (float) – Longitude.

  • ostia (float) – OSTIA value matched to this observation.

  • ice (float) – Ice concentration value matched to this observation.

  • bgvar (float) – Background variance value matched to this observation.

  • dates (np.datetime) – Date and time of the observation.

Return type:

tuple[float, float, float, bool, bool]

Returns:

(float, float, float, bool) – Background value, ice concentration, background variance, and a boolean variable indicating whether the report is “good”.

marine_qc.buoy_tracking_qc.SSTTailChecker._preprocess_reps(self)

Process the reps and calculate the values used in the QC check.

Return type:

bool

Returns:

bool – True if any invalid observations or background values were encountered during preprocessing, otherwise False.

marine_qc.buoy_tracking_qc.SSTTailChecker._do_long_tail_check(self, forward=True)

Perform the long tail check.

Parameters:

forward (bool) – Flag to set for a forward (True) or backward (False) pass of the long tail check.

Return type:

None

marine_qc.buoy_tracking_qc.SSTTailChecker._do_short_tail_check(self, first_pass_ind, last_pass_ind, forward=True)

Perform the short tail check.

Parameters:
  • first_pass_ind (int) – Index.

  • last_pass_ind (int) – Index.

  • forward (bool) – Flag to set for a forward (True) or backward (False) pass of the short tail check.

Return type:

None

marine_qc.buoy_tracking_qc.SSTBiasedNoisyChecker._parse_rep(lat, lon, ostia, ice, bgvar, dates, background_err_lim)

Extract QC-relevant variables from a marine report.

Parameters:
  • lat (float) – Latitude of the observation to be parsed.

  • lon (float) – Longitude of the observation to be parsed.

  • ostia (float) – Background SST field value.

  • ice (float) – Ice concentration field value.

  • bgvar (float) – Background variance field value.

  • dates (datetime) – Date and time of the observation to be parsed.

  • background_err_lim (float) – Background error variance beyond which the SST background is deemed unreliable (degC squared or K squared).

Return type:

tuple[float, float, float, bool, bool, bool]

Returns:

float, float, float, bool, bool, bool – The background SST value, the ice value, the background SST variance, a flag indicating a good match, a flag indicating whether the background variance is valid, and a flag indicating whether the observation is valid overall.

marine_qc.buoy_tracking_qc.SSTBiasedNoisyChecker._preprocess_reps(self)

Fill SST anomalies, background errors, and flags for invalid background values.

This method processes each observation to compute sea surface temperature (SST) anomalies, background error standard deviations, and flags any missing or invalid background values. It also checks whether the time series is sorted and sets a mask flag if any background variances are masked.

Return type:

bool

Returns:

bool – True if any invalid observations or background values were encountered during preprocessing, otherwise False.

marine_qc.buoy_tracking_qc.SSTBiasedNoisyChecker._long_record_qc(self)

Perform the long record check.

Return type:

None

marine_qc.buoy_tracking_qc.SSTBiasedNoisyChecker._short_record_qc(self)

Perform the short record check.

Return type:

None

Internal data type aliases

marine_qc.PandasNAType = <class 'pandas.api.typing.NAType'>

NA (“not available”) missing value indicator.

Warning

Experimental: the behaviour of NA can still change without warning.

The NA singleton is a missing value indicator defined by pandas. It is used in certain new extension dtypes (currently the “string” dtype).

See also

numpy.nan

Floating point representation of Not a Number (NaN) for numerical data.

isna

Detect missing values for an array-like object.

notna

Detect non-missing values for an array-like object.

DataFrame.fillna

Fill missing values in a DataFrame.

Series.fillna

Fill missing values in a Series.

Examples

>>> pd.NA
<NA>
>>> True | pd.NA
True
>>> True & pd.NA
<NA>
>>> pd.NA != pd.NA
<NA>
>>> pd.NA == pd.NA
<NA>
marine_qc.PandasNaTType = <class 'pandas.api.typing.NaTType'>

(N)ot-(A)-(T)ime, the time equivalent of NaN.

NaT is used to denote missing or null values in datetime and timedelta objects in pandas. It functions similarly to how NaN is used for numerical data. Operations with NaT will generally propagate the NaT value, similar to NaN. NaT can be used in pandas data structures like Series and DataFrame to represent missing datetime values. It is useful in data analysis and time series analysis when working with incomplete or sparse time-based data. Pandas provides robust handling of NaT to ensure consistency and reliability in computations involving datetime objects.

See also

NA

NA (“not available”) missing value indicator.

isna

Detect missing values (NaN or NaT) in an array-like object.

notna

Detect non-missing values.

numpy.nan

Floating point representation of Not a Number (NaN) for numerical data.

Examples

>>> pd.DataFrame([pd.Timestamp("2023"), np.nan], columns=["col_1"])
        col_1
0  2023-01-01
1         NaT
marine_qc.ScalarIntType = int | numpy.integer | pandas.api.typing.NAType | None

Represent a union type

E.g. for int | str

marine_qc.ScalarFloatType = float | numpy.floating | pandas.api.typing.NAType | None

marine_qc.ScalarNumberType = int | numpy.integer | pandas.api.typing.NAType | None | float | numpy.floating

marine_qc.ScalarDatetimeType = datetime.datetime | numpy.datetime64 | pandas.Timestamp | pandas.api.typing.NaTType | None

marine_qc.SequenceIntType = collections.abc.Sequence[int | numpy.integer | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.integer]] | pandas.Series | numpy.ndarray

marine_qc.SequenceFloatType = collections.abc.Sequence[float | numpy.floating | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.floating]] | pandas.Series | numpy.ndarray

marine_qc.SequenceNumberType = collections.abc.Sequence[int | numpy.integer | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.integer]] | pandas.Series | numpy.ndarray | collections.abc.Sequence[float | numpy.floating | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.floating]]

marine_qc.SequenceDatetimeType = collections.abc.Sequence[datetime.datetime | numpy.datetime64 | pandas.Timestamp | pandas.api.typing.NaTType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.datetime64]] | pandas.Series | numpy.ndarray

marine_qc.ValueIntType = int | numpy.integer | pandas.api.typing.NAType | None | collections.abc.Sequence[int | numpy.integer | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.integer]] | pandas.Series | numpy.ndarray

marine_qc.ValueFloatType = float | numpy.floating | pandas.api.typing.NAType | None | collections.abc.Sequence[float | numpy.floating | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.floating]] | pandas.Series | numpy.ndarray

marine_qc.ValueNumberType = int | numpy.integer | pandas.api.typing.NAType | None | collections.abc.Sequence[int | numpy.integer | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.integer]] | pandas.Series | numpy.ndarray | float | numpy.floating | collections.abc.Sequence[float | numpy.floating | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.floating]]

marine_qc.ValueDatetimeType = datetime.datetime | numpy.datetime64 | pandas.Timestamp | pandas.api.typing.NaTType | None | collections.abc.Sequence[datetime.datetime | numpy.datetime64 | pandas.Timestamp | pandas.api.typing.NaTType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.datetime64]] | pandas.Series | numpy.ndarray

marine_qc.ClimArgType = int | numpy.integer | pandas.api.typing.NAType | None | collections.abc.Sequence[int | numpy.integer | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.integer]] | pandas.Series | numpy.ndarray | float | numpy.floating | collections.abc.Sequence[float | numpy.floating | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.floating]] | marine_qc.external_clim.Climatology | str | os.PathLike[str] | xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset

marine_qc.ClimIntType = int | numpy.integer | pandas.api.typing.NAType | None | collections.abc.Sequence[int | numpy.integer | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.integer]] | pandas.Series | numpy.ndarray | marine_qc.external_clim.Climatology

marine_qc.ClimFloatType = float | numpy.floating | pandas.api.typing.NAType | None | collections.abc.Sequence[float | numpy.floating | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.floating]] | pandas.Series | numpy.ndarray | marine_qc.external_clim.Climatology

marine_qc.ClimNumberType = int | numpy.integer | pandas.api.typing.NAType | None | collections.abc.Sequence[int | numpy.integer | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.integer]] | pandas.Series | numpy.ndarray | float | numpy.floating | collections.abc.Sequence[float | numpy.floating | pandas.api.typing.NAType | None] | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.floating]] | marine_qc.external_clim.Climatology

marine_qc.ClimInputType = str | os.PathLike[str] | xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset
