Python

The Python interface provides access to the atlas approximation API from Python. It caches results to speed up repeated requests, so it is generally as fast as or faster than using the REST API directly.

Quick start

import atlasapprox

api = atlasapprox.API()
human_organs = api.organs(organism="h_sapiens")

Requirements

You need the following Python packages:
  • requests

  • pandas

Installation

You can use pip to install the atlasapprox package:

pip install atlasapprox

Getting started

Import the package and instantiate the API object:

import atlasapprox

api = atlasapprox.API()

Use whichever method you wish, e.g.:

human_organs = api.organs(organism="h_sapiens")
print(human_organs)

If you are exploring the API from scratch, you would usually:

  1. Ask about available organisms.

  2. Ask about available organs within your organism of interest.

  3. Ask about average gene expression in that organ.
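The three steps above can be sketched as a small helper. This is a minimal sketch rather than part of the package: first_look is a hypothetical name, api is assumed to be an atlasapprox.API() instance, and picking the first organism and the first organ is an arbitrary choice made only for illustration.

```python
def first_look(api, genes):
    """Walk the three exploration steps and return what was found.

    Hypothetical helper; `api` is assumed to be an atlasapprox.API() instance.
    """
    # 1. Ask about available organisms.
    organisms = api.organisms()
    organism = organisms[0]  # arbitrary pick, for illustration only

    # 2. Ask about available organs within that organism.
    organs = api.organs(organism=organism)
    organ = organs[0]  # arbitrary pick, for illustration only

    # 3. Ask about average gene expression in that organ.
    expression = api.average(
        organism=organism,
        organ=organ,
        features=list(genes),
    )
    return organism, organ, expression
```

With the real client this would be called as first_look(atlasapprox.API(), ["COL1A1", "PTPRC"]), where the gene names are only examples.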

Each API method is described in detail below.

Reference API

Cell atlas approximations, Python API interface.

class atlasapprox.API

Main object used to access the atlas approximation API

average(organism: str, organ: str, features: Sequence[str], measurement_type: str = 'gene_expression')

Get average gene expression for specific features.

Parameters:
  • organism – The organism to query.

  • organ – The organ to query.

  • features – The features (e.g. genes) to query.

  • measurement_type – The measurement type to query.

Returns: A pandas.DataFrame with the gene expression. Each column is a cell type, each row a feature. The unit of measurement, or normalisation, is counts per ten thousand (cptt).
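To make the unit concrete, the sketch below shows how counts per ten thousand are conventionally computed for a single cell: each raw count is divided by the cell's total count and multiplied by 10,000. The function name and the numbers are made up for illustration, and how the atlas approximations aggregate cells into an average is not covered here.

```python
def counts_per_ten_thousand(raw_counts):
    """Normalise one cell's raw counts so that they sum to 10,000."""
    total = sum(raw_counts.values())
    if total == 0:
        # An empty cell has no meaningful normalisation; return zeros.
        return {feature: 0.0 for feature in raw_counts}
    return {
        feature: 10_000 * count / total
        for feature, count in raw_counts.items()
    }


# Made-up raw counts for a single cell with three features.
cell = {"geneA": 5, "geneB": 15, "geneC": 30}
normalised = counts_per_ten_thousand(cell)
print(normalised)
```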

celltype_location(organism: str, cell_type: str, measurement_type: str = 'gene_expression')

Get the organs/locations where a cell type is found.

Parameters:
  • organism – The organism to query.

  • cell_type – The cell type to locate.

  • measurement_type – The measurement type to query.

Returns: A list of organs where that cell type is found.

celltypes(organism: str, organ: str, measurement_type: str = 'gene_expression')

Get a list of cell types in an organ and organism.

Parameters:
  • organism – The organism to query.

  • organ – The organ to query.

  • measurement_type – The measurement type to query.

Returns: A list of cell types.

celltypexorgan(organism: str, organs: None | str = None, measurement_type: str = 'gene_expression', boolean=False)

Get the table of cell types x organ across a whole organism.

Parameters:
  • organism – The organism to query.

  • organs (optional) – If None, cover all organs from the chosen organism. If a list of organs, limit the table to those organs.

  • measurement_type – The measurement type to query.

  • boolean – If True, return a presence/absence matrix for each cell type in each organ. If False (default), return the number of sampled cells/nuclei for each cell type in each organ.

Returns: A pandas.DataFrame with the presence/absence or number of sampled cells/nuclei for each cell type (index) in each organ (columns).
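The relationship between the two modes of the boolean flag can be illustrated with plain dictionaries instead of the real pandas.DataFrame. This sketch uses made-up numbers and assumes that a cell type counts as present in an organ when at least one cell or nucleus was sampled there; to_presence is a hypothetical name.

```python
def to_presence(counts):
    """Turn a {cell_type: {organ: n_cells}} table into presence/absence."""
    return {
        cell_type: {organ: n_cells > 0 for organ, n_cells in row.items()}
        for cell_type, row in counts.items()
    }


# Made-up numbers of sampled cells: cell types as rows, organs as columns.
counts = {
    "fibroblast": {"lung": 120, "heart": 45},
    "cardiomyocyte": {"lung": 0, "heart": 310},
}
presence = to_presence(counts)
print(presence)
```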

data_sources()

List the cell atlases used as data sources.

dotplot(organism: str, organ: str, features: Sequence[str], measurement_type: str = 'gene_expression')

Get average and fraction detected for specific features.

Parameters:
  • organism – The organism to query.

  • organ – The organ to query.

  • features – The features (e.g. genes) to query.

  • measurement_type – The measurement type to query.

Returns: A pandas.DataFrame with the fraction expressing. Each column is a cell type, each row a feature.

features(organism: str, measurement_type: str = 'gene_expression')

Get names of features (e.g. genes) in this organism and measurement type.

Parameters:
  • organism – The organism to query.

  • measurement_type – The measurement type to query.

Returns: A pandas.Index with the features.
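One possible use of this method is to check a list of candidate genes before querying them. The sketch below is a hypothetical helper, not part of the package; it assumes api is an atlasapprox.API() instance and that feature names must match exactly, including case.

```python
def split_by_availability(api, organism, candidates):
    """Split candidate feature names into those found and those missing."""
    # A set makes each membership test cheap, whatever the returned container.
    available = set(api.features(organism=organism))
    found = [name for name in candidates if name in available]
    missing = [name for name in candidates if name not in available]
    return found, missing
```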

fraction_detected(organism: str, organ: str, features: Sequence[str], measurement_type: str = 'gene_expression')

Get the fraction of cells in which specific features are detected.

Parameters:
  • organism – The organism to query.

  • organ – The organ to query.

  • features – The features (e.g. genes) to query.

  • measurement_type – The measurement type to query.

Returns: A pandas.DataFrame with the fraction expressing. Each column is a cell type, each row a feature.
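As an illustration of what a fraction detected means, the sketch below computes it for one feature in one cell type from made-up per-cell counts. It assumes that a feature is detected in a cell when its count is above zero, which may differ from the threshold used to build the approximations; the function name is hypothetical.

```python
def fraction_of_cells_detecting(counts_per_cell):
    """Fraction of cells in which a feature has a non-zero count."""
    if not counts_per_cell:
        raise ValueError("need at least one cell")
    detected = sum(1 for count in counts_per_cell if count > 0)
    return detected / len(counts_per_cell)


# Made-up counts of one gene in eight cells of a single cell type.
example_fraction = fraction_of_cells_detecting([0, 3, 0, 0, 1, 7, 0, 2])
print(example_fraction)
```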

highest_measurement(organism: str, feature: str, number: int, measurement_type: str = 'gene_expression')

Get the highest measurements by cell type across an organism.

Parameters:
  • organism – The organism to query.

  • feature – The feature (e.g. gene) to query.

  • number – The number of cell types to list. The actual number might be lower if not enough cell types were found.

  • measurement_type – The measurement type to query.

Returns: A pandas.Series with a multi-index containing cell type and organ and values corresponding to the average measurement (e.g. gene expression) for that feature in that cell type and organ.
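The returned series can be flattened into plain tuples for printing or further processing. This is a minimal sketch with a hypothetical helper name; it assumes api is an atlasapprox.API() instance and that the multi-index levels are ordered as (cell type, organ), as the description suggests.

```python
def top_expressors(api, organism, gene, number=5):
    """List (cell_type, organ, value) tuples for the top cell types."""
    result = api.highest_measurement(
        organism=organism,
        feature=gene,
        number=number,
    )
    # Iterating over items() yields ((cell_type, organ), value) pairs;
    # fewer than `number` rows may come back.
    return [
        (cell_type, organ, value)
        for (cell_type, organ), value in result.items()
    ]
```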

markers(organism: str, organ: str, cell_type: str, number: int, measurement_type: str = 'gene_expression')

Get marker features (e.g. genes) for a cell type within an organ.

Parameters:
  • organism – The organism to query.

  • organ – The organ to query.

  • cell_type – The cell type to get markers for.

  • number – The number of markers to look for. The actual number might be lower if not enough distinctive features were found.

  • measurement_type – The measurement type to query.

Returns: A list of markers for the specified cell type in that organ. The number of markers might be less than requested if the cell type lacks distinctive features.
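Combined with celltypes, this method can be used to build a marker table for a whole organ. The helper below is a hypothetical sketch that assumes api is an atlasapprox.API() instance; it issues one request per cell type, so it may be slow for organs with many cell types.

```python
def markers_by_celltype(api, organism, organ, number=5):
    """Map each cell type in an organ to its list of marker features."""
    cell_types = api.celltypes(organism=organism, organ=organ)
    return {
        cell_type: api.markers(
            organism=organism,
            organ=organ,
            cell_type=cell_type,
            number=number,
        )
        for cell_type in cell_types
    }
```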

measurement_types()

Get a list of measurement types.

Returns: A list of measurement types.

neighborhood(organism: str, organ: str, features: None | Sequence[str] = None, include_embeding: bool = True, measurement_type: str = 'gene_expression')

Neighborhood or cell state information.

Parameters:
  • organism – The organism to query.

  • organ – The organ to query.

  • features – The features (e.g. genes) to query. This argument is optional.

  • measurement_type – The measurement type to query.

Return: A dict with a few key/value pairs:

TODO

organisms(measurement_type: str = 'gene_expression')

Get a list of available organisms.

Parameters:
  • measurement_type – The measurement type to query.

Returns: A list of organisms.

organs(organism: str, measurement_type: str = 'gene_expression')

Get a list of available organs.

Parameters:
  • organism – The organism to query.

  • measurement_type – The measurement type to query.

Returns: A list of organs.

sequences(organism: str, features: Sequence[str], measurement_type: str = 'gene_expression')

Return the sequences of the requested features and their type.

Parameters:
  • organism – The organism to query.

  • features – The features (e.g. genes) to query.

  • measurement_type – The measurement type to query.

Returns: A dictionary with two keys: “type”, indicating what kind of sequences they are, and “sequences”, with a list of the sequences in the same order. If a feature sequence is not found, it is set to None.
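Since the sequences come back as a bare list, it can be convenient to pair them with the requested feature names. The sketch below is a hypothetical helper; it assumes api is an atlasapprox.API() instance and that "the same order" refers to the order of the features argument.

```python
def sequences_by_feature(api, organism, features):
    """Return the sequence type and a {feature: sequence} mapping."""
    features = list(features)
    result = api.sequences(organism=organism, features=features)
    # Skip features whose sequence was not found (reported as None).
    found = {
        feature: sequence
        for feature, sequence in zip(features, result["sequences"])
        if sequence is not None
    }
    return result["type"], found
```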

similar_celltypes(organism: str, organ: str, celltype: str, number: int, method: str = 'correlation', measurement_type: str = 'gene_expression')

Get cell types most similar to a focal one, across organs.

Parameters:
  • organism – The organism to query.

  • organ – The organ to query. This and the next argument are to be interpreted together as fully specifying a cell type of interest.

  • celltype – The cell type to find similar cell types to.

  • number – The number of similar cell types to return.

  • method – The method used to compute similarity between cell types. The following methods are available:

    - correlation (default): Pearson correlation of the fraction_detected
    - cosine: Cosine similarity/distance of the fraction_detected
    - euclidean: Euclidean distance of average measurement (e.g. expression)
    - manhattan: Taxicab/Manhattan/L1 distance of average measurement
    - log-euclidean: Log the average measurement with a pseudocount of 0.001, then compute Euclidean distance. This tends to highlight sparsely measured features.

  • measurement_type – The measurement type to query.

Returns: A pandas.Series with the similar (organ, celltype) pairs and their distance from the focal cell type according to the chosen method.

similar_features(organism: str, organ: str, feature: str, number: int, method: str = 'correlation', measurement_type: str = 'gene_expression')

Get features most similar to a focal one.

Parameters:
  • organism – The organism to query.

  • organ – The organ to query.

  • feature – The feature (e.g. gene) to look for similar features to.

  • number – The number of similar features to return.

  • method – The method used to compute similarity between features. The following methods are available:

    - correlation (default): Pearson correlation of the fraction_detected
    - cosine: Cosine similarity/distance of the fraction_detected
    - euclidean: Euclidean distance of average measurement (e.g. expression)
    - manhattan: Taxicab/Manhattan/L1 distance of average measurement
    - log-euclidean: Log the average measurement with a pseudocount of 0.001, then compute Euclidean distance. This tends to highlight sparsely measured features.

  • measurement_type – The measurement type to query.

Returns: A pandas.Series with the similar features and their distance from the focal feature according to the chosen method.
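For readers unfamiliar with the similarity methods listed above, the following sketch spells out the textbook definition of each one for two equal-length profiles (for example, one value per cell type). It is an illustration only: the function names are made up, the use of the natural logarithm for log-euclidean is an assumption, and the computation performed by the API may differ in detail.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def cosine_similarity(x, y):
    """Cosine of the angle between two profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def euclidean_distance(x, y):
    """Straight-line (L2) distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Taxicab (L1) distance between two profiles."""
    return sum(abs(a - b) for a, b in zip(x, y))


def log_euclidean_distance(x, y, pseudocount=0.001):
    """Euclidean distance after log-transforming with a pseudocount.

    The natural logarithm is an assumption made for this sketch.
    """
    log_x = [math.log(a + pseudocount) for a in x]
    log_y = [math.log(b + pseudocount) for b in y]
    return euclidean_distance(log_x, log_y)
```

The pseudocount keeps the logarithm defined for zero measurements, which is why small, sparsely measured values contribute more to the log-euclidean distance than to the plain Euclidean one.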

exception atlasapprox.BadRequestError

The API request was not formulated correctly.
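A caller may want to handle this exception instead of letting it propagate. The sketch below assumes that API methods raise BadRequestError when a request is rejected, for instance because of an unknown organism; organs_or_empty is a hypothetical helper, and the fallback class exists only so that the example can be run without the package installed.

```python
try:
    from atlasapprox import BadRequestError
except ImportError:
    # Stand-in so this sketch can be run without the package installed.
    class BadRequestError(Exception):
        pass


def organs_or_empty(api, organism):
    """Return the organs of an organism, or [] if the request is rejected."""
    try:
        return api.organs(organism=organism)
    except BadRequestError:
        # Assumption: a malformed or unknown query ends up here.
        return []
```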