merge_region_samples

merge_region_samples(samples: dict[str, DataFrame], margins: tuple[int, int, int]) → DataFrame

Merge samples from different regions into a single dataframe of valid samples.

The input samples are formatted as:

{
    "DEFAULT": (dataframe with columns = id, x, y, z),
    "<REGION>": (dataframe with columns = id, x, y, z),
    "<REGION>": (dataframe with columns = id, x, y, z),
    ...
}

The DEFAULT region is used as the superset of (x, y, z) samples; any sample found only in a non-DEFAULT region is ignored. For a given id, there must be at least one sample in each region.

The output samples are formatted as:

┍━━━━━━┯━━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━━┑
│  id  │    x     │    y     │    z     │  region  │
┝━━━━━━┿━━━━━━━━━━┿━━━━━━━━━━┿━━━━━━━━━━┿━━━━━━━━━━┥
│ <id> │ <x + dx> │ <y + dy> │ <z + dz> │ DEFAULT  │
│ <id> │ <x + dx> │ <y + dy> │ <z + dz> │ <REGION> │
│ ...  │   ...    │   ...    │   ...    │   ...    │
│ <id> │ <x + dx> │ <y + dy> │ <z + dz> │ <REGION> │
┕━━━━━━┷━━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━━┙

Samples that are found in the DEFAULT region but not in any non-DEFAULT region are marked as DEFAULT. All other samples are marked with the region that contains them. Region samples should be mutually exclusive, so each sample belongs to at most one non-DEFAULT region.

Parameters:
  • samples – Map of region names to region samples.

  • margins – Margin in the x, y, and z directions applied to sample locations.

Returns:

Dataframe of merged samples with applied margins.
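The region-labeling rule described above can be illustrated with a minimal sketch. This is not the library's implementation: the helper name label_regions, the NUCLEUS region name, and the sample values are hypothetical, and the margin shift is deliberately left out so that only the labeling logic is shown.

```python
import pandas as pd


def label_regions(samples: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Label each DEFAULT sample with the region containing it (hypothetical sketch)."""
    keys = ["id", "x", "y", "z"]

    # DEFAULT is the superset, so start from it and label everything DEFAULT.
    merged = samples["DEFAULT"][keys].copy()
    merged["region"] = "DEFAULT"

    for region, frame in samples.items():
        if region == "DEFAULT":
            continue
        # A left merge keeps only DEFAULT rows, so samples that appear
        # solely in a non-DEFAULT region are dropped.
        flagged = merged.merge(
            frame[keys].drop_duplicates(), on=keys, how="left", indicator=True
        )
        in_region = (flagged["_merge"] == "both").to_numpy()
        merged.loc[in_region, "region"] = region

    return merged


# Hypothetical input: (2, 9, 9, 0) exists only in NUCLEUS and is ignored.
samples = {
    "DEFAULT": pd.DataFrame(
        {"id": [1, 1, 1, 2, 2], "x": [0, 1, 2, 5, 6], "y": [0, 0, 0, 1, 1], "z": [0, 0, 0, 0, 0]}
    ),
    "NUCLEUS": pd.DataFrame(
        {"id": [1, 2, 2], "x": [1, 6, 9], "y": [0, 1, 9], "z": [0, 0, 0]}
    ),
}
merged = label_regions(samples)
print(merged)
```

The sketch relies on the regions being mutually exclusive; if two non-DEFAULT regions shared a sample, the region processed last would overwrite the earlier label.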

transform_sample_coordinates(samples: pd.DataFrame, margins: tuple[int, int, int], reference: pd.DataFrame | None = None) → pd.DataFrame

Transform samples into centered coordinates.

Parameters:
  • samples – Sample cell ids and coordinates.

  • margins – Margin size in x, y, and z directions.

  • reference – Reference samples used to calculate transformation.

Returns:

Transformed sample cell ids and coordinates.

filter_valid_samples(samples: DataFrame) → DataFrame

Filter samples for valid cell ids.

Filter conditions include:

  • Each cell must have at least one sample assigned to each specified region

Parameters:

samples – Sample cell ids and coordinates.

Returns:

Valid sample cell ids and coordinates.
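The filter condition above can be sketched as a group-wise check. This is an illustration under assumptions, not the library's code: the helper name keep_ids_with_all_regions is hypothetical, and the set of "specified regions" is taken to be the distinct values in a region column.

```python
import pandas as pd


def keep_ids_with_all_regions(samples: pd.DataFrame) -> pd.DataFrame:
    """Keep only ids that have a sample in every region (hypothetical sketch)."""
    # Without region labels there is nothing to check.
    if "region" not in samples.columns:
        return samples

    # Assumption: the specified regions are those present in the data.
    n_regions = samples["region"].nunique()

    # Count the distinct regions seen for each id, aligned to every row.
    regions_per_id = samples.groupby("id")["region"].transform("nunique")

    return samples[regions_per_id == n_regions].reset_index(drop=True)


# Hypothetical input: id 2 has no NUCLEUS sample, so it is removed.
samples = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "x": [0, 1, 5, 6],
        "y": [0, 0, 1, 1],
        "z": [0, 0, 0, 0],
        "region": ["DEFAULT", "NUCLEUS", "DEFAULT", "DEFAULT"],
    }
)
valid = keep_ids_with_all_regions(samples)
print(valid)
```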