spcoral.pp.downsampling

spcoral.pp.downsampling(adata, resolution, method='sum', use_obsm='spatial', celltype_label=None, drop_min=0)

Downsample a spatial omics dataset by binning spots/cells into a regular grid.

This function aggregates expression data from individual spots or cells into rectangular bins defined by a fixed physical resolution. Cells are assigned to bins based on their spatial coordinates. Aggregation can be performed by summing counts (default) or averaging (method='mean'). Optionally, cell-type composition per bin can be recorded if a cell-type annotation column is provided.
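The binning and aggregation described above can be illustrated with a minimal, library-free sketch. This is not spcoral's implementation: the helper name `bin_and_aggregate` and the toy data are hypothetical, and assigning bins by floor division relative to the minimum coordinate is an assumption based on the description of `resolution`.

```python
import math
from collections import defaultdict


def bin_and_aggregate(coords, counts, resolution, method="sum"):
    """Hypothetical helper: group cells into grid bins and aggregate per gene."""
    if method not in ("sum", "mean"):
        raise ValueError("method must be 'sum' or 'mean'")

    # Assumption: the grid origin is the minimum x and y coordinate.
    x_min = min(x for x, _ in coords)
    y_min = min(y for _, y in coords)

    # Collect the expression rows of all cells that fall into the same bin.
    bins = defaultdict(list)
    for (x, y), row in zip(coords, counts):
        key = (
            math.floor((x - x_min) / resolution),
            math.floor((y - y_min) / resolution),
        )
        bins[key].append(row)

    # Aggregate gene by gene: column sums, optionally divided by the cell count.
    result = {}
    for key, rows in bins.items():
        totals = [sum(column) for column in zip(*rows)]
        if method == "mean":
            totals = [total / len(rows) for total in totals]
        result[key] = totals
    return result


# Toy data: three cells, two genes, bins of size 10 in coordinate units.
coords = [(0.0, 0.0), (4.0, 3.0), (12.0, 1.0)]
counts = [[1, 0], [3, 2], [5, 5]]

summed = bin_and_aggregate(coords, counts, resolution=10.0)
averaged = bin_and_aggregate(coords, counts, resolution=10.0, method="mean")
```

In this toy case the first two cells share bin (0, 0) and the third falls into bin (1, 0), so the 'sum' result for bin (0, 0) is the column-wise total of the first two rows and the 'mean' result is that total divided by two.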

Parameters:
  • adata (anndata.AnnData) – Input AnnData object containing spatial transcriptomics data. Spatial coordinates must be stored in obsm[use_obsm] (typically obsm['spatial']).

  • resolution (float) – Physical size of each bin (same units as spatial coordinates). The grid is aligned to the minimum coordinates and covers the full extent of the tissue.

  • method ({'sum', 'mean'}, optional (default: 'sum')) – Aggregation method:
      - 'sum': total counts per gene in each bin.
      - 'mean': average expression per gene (total counts divided by the number of cells/spots in the bin).

  • use_obsm (str, optional (default: 'spatial')) – Key in .obsm where spatial coordinates (n_cells × 2) are stored.

  • celltype_label (str, optional) – Column name in adata.obs containing cell-type annotations. If provided, a matrix of cell-type counts per bin is stored in .uns['cell_type'] of the returned AnnData object.

  • drop_min (int, optional (default: 0)) – Minimum number of cells/spots required in a bin for it to be retained. Bins with fewer cells are discarded.
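How `celltype_label` and `drop_min` interact can be sketched as follows. The helper `celltype_counts_per_bin`, the bin keys, and the labels are hypothetical, and the sketch assumes that a bin is kept when its cell count is greater than or equal to `drop_min`, which is how the parameter description reads.

```python
from collections import Counter, defaultdict


def celltype_counts_per_bin(bin_keys, celltypes, drop_min=0):
    """Hypothetical helper: count cell types per bin and drop sparse bins."""
    per_bin = defaultdict(Counter)
    for key, label in zip(bin_keys, celltypes):
        per_bin[key][label] += 1

    # Assumption: bins with fewer than drop_min cells are discarded.
    return {
        key: dict(counter)
        for key, counter in per_bin.items()
        if sum(counter.values()) >= drop_min
    }


# Toy data: two cells in bin (0, 0), one cell in bin (1, 0).
bin_keys = [(0, 0), (0, 0), (1, 0)]
celltypes = ["T", "B", "T"]

all_bins = celltype_counts_per_bin(bin_keys, celltypes)
dense_bins = celltype_counts_per_bin(bin_keys, celltypes, drop_min=2)
```

With the default `drop_min=0` both bins are retained; with `drop_min=2` only bin (0, 0), which holds two cells, survives.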

Returns:

Downsampled AnnData object where each observation corresponds to a spatial bin:
  • .X: aggregated gene expression matrix (sum or mean).
  • .obsm['spatial']: integer bin coordinates (grid indices).
  • .obsm['spatial_raw']: physical center coordinates of each bin.
  • .obs['cell_counts']: number of original cells/spots in each bin.
  • .uns['cell_type'] (if celltype_label is provided): DataFrame of cell-type counts per bin.

Return type:

anndata.AnnData
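The returned object stores both integer grid indices (.obsm['spatial']) and physical bin centers (.obsm['spatial_raw']). One plausible relation between the two is sketched below. The helper `bin_center` is hypothetical, and the formula (grid origin plus index plus one half, times the resolution) is an assumption rather than something this page states.

```python
def bin_center(index, origin, resolution):
    """Hypothetical helper: physical center of a bin from its grid index."""
    # Assumption: bin i spans [origin + i * resolution, origin + (i + 1) * resolution).
    return tuple(o + (i + 0.5) * resolution for i, o in zip(index, origin))


# Bin (1, 0) on a grid anchored at (0.0, 0.0) with 10-unit bins.
center = bin_center((1, 0), origin=(0.0, 0.0), resolution=10.0)
```

With the documented signature, a call would look like `spcoral.pp.downsampling(adata, resolution=50, method='mean', celltype_label='celltype', drop_min=3)`, where the values 50 and 3 and the column name 'celltype' are placeholders chosen for illustration.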