lamindb.core.MappedCollection
- class lamindb.core.MappedCollection(path_list, layers_keys=None, obs_keys=None, obsm_keys=None, join='inner', encode_labels=True, unknown_label=None, cache_categories=True, parallel=False, dtype=None)
Bases: object
Map-style collection for use in data loaders.
This class virtually concatenates AnnData arrays as a PyTorch map-style dataset.
If your AnnData collection is in the cloud, move it into a local cache first for faster access.
__getitem__ of the MappedCollection object takes a single integer index and returns a dictionary with the observation data sample for this index from the AnnData objects in path_list. The dictionary has keys for layers_keys (.X is in "X"), obs_keys, obsm_keys (under f"obsm_{key}") and also "_store_idx" for the index of the AnnData object containing this observation sample.
Note
For a guide, see Train a machine learning model on a collection.
For more convenient use within MappedCollection, see mapped().
This currently only works for collections of AnnData objects.
The implementation was influenced by the SCimilarity data loader.
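The index lookup described above can be illustrated with a toy, in-memory stand-in. This is a minimal sketch, not the lamindb implementation: `ToyMappedCollection` and its store layout (plain dicts with `"X"` and `"obs"`) are hypothetical names introduced only for illustration. It shows how a single global integer index can be resolved to a store and a local row, and how the returned dictionary carries `"X"`, the obs keys and `"_store_idx"`.

```python
from bisect import bisect_right
from itertools import accumulate


class ToyMappedCollection:
    """Hypothetical toy stand-in: virtually concatenates in-memory stores."""

    def __init__(self, stores):
        # Each store is a dict with "X" (list of rows) and "obs" (dict of columns).
        self.stores = stores
        self.n_obs_list = [len(s["X"]) for s in stores]
        # Cumulative end offsets, e.g. [2, 5] for stores with 2 and 3 rows.
        self.offsets = list(accumulate(self.n_obs_list))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Find the store whose index range contains the global index.
        store_idx = bisect_right(self.offsets, idx)
        start = self.offsets[store_idx - 1] if store_idx > 0 else 0
        local_idx = idx - start
        store = self.stores[store_idx]
        out = {"X": store["X"][local_idx], "_store_idx": store_idx}
        for key, column in store["obs"].items():
            out[key] = column[local_idx]
        return out


stores = [
    {"X": [[1.0, 0.0], [0.0, 2.0]], "obs": {"cell_type": ["T", "B"]}},
    {"X": [[3.0, 3.0], [4.0, 0.0], [0.0, 5.0]], "obs": {"cell_type": ["B", "NK", "T"]}},
]
toy = ToyMappedCollection(stores)
sample = toy[3]  # second row of the second store
```

Because the toy class implements `__len__` and `__getitem__`, it follows the same map-style protocol that PyTorch data loaders expect; no data is copied when the stores are combined.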
- Parameters:
path_list (list[str | Path]) – A list of paths to AnnData objects stored in .h5ad or .zarr formats.
layers_keys (str | list[str] | None, default: None) – Keys from the .layers slot. layers_keys=None or "X" in the list retrieves .X.
obsm_keys (str | list[str] | None, default: None) – Keys from the .obsm slots.
obs_keys (str | list[str] | None, default: None) – Keys from the .obs slots.
join (Literal['inner', 'outer'] | None, default: 'inner') – "inner" or "outer" virtual joins. If None is passed, does not join.
encode_labels (bool | list[str], default: True) – Encode labels into integers. Can be a list with elements from obs_keys.
unknown_label (str | dict[str, str] | None, default: None) – Encode this label to -1. Can be a dictionary with keys from obs_keys if encode_labels=True or from encode_labels if it is a list.
cache_categories (bool, default: True) – Enable caching categories of obs_keys for faster access.
parallel (bool, default: False) – Enable sampling with multiple processes.
dtype (str | None, default: None) – Convert numpy arrays from .X, .layers and .obsm
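The effect of join and unknown_label can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `join_var_names` and `encode_labels_toy` are hypothetical helpers, and the ordering of joined variables and of integer codes (alphabetical here) is an assumption that may differ from what MappedCollection produces.

```python
def join_var_names(var_lists, join="inner"):
    """Hypothetical helper: variable names of the virtually joined dataset."""
    if join == "inner":
        # Keep only names present in every store, in the order of the first store.
        common = set(var_lists[0]).intersection(*var_lists[1:])
        return [v for v in var_lists[0] if v in common]
    if join == "outer":
        # Keep every name seen in any store, in first-seen order.
        seen = {}
        for names in var_lists:
            for v in names:
                seen.setdefault(v, None)
        return list(seen)
    raise ValueError(f"unsupported join: {join!r}")


def encode_labels_toy(labels, unknown_label=None):
    """Hypothetical helper: map labels to integer codes, unknown label to -1."""
    # Assumption: codes follow the sorted order of the known categories.
    categories = sorted(set(labels) - {unknown_label})
    codes = {cat: i for i, cat in enumerate(categories)}
    if unknown_label is not None:
        codes[unknown_label] = -1
    return [codes[label] for label in labels], codes


var_lists = [["A", "B", "C"], ["B", "C", "D"]]
inner_vars = join_var_names(var_lists, join="inner")
outer_vars = join_var_names(var_lists, join="outer")

encoded, codes = encode_labels_toy(
    ["T", "B", "unknown", "T"], unknown_label="unknown"
)
```

An inner join keeps only the variables shared by all stores, so every sample has the same length without padding; an outer join keeps all variables, which means samples from a store lacking a variable need a fill value for it.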
Attributes
- closed
Check if connections to array streaming backend are closed.
- original_shapes
Shapes of the underlying AnnData objects.
- shape
Shape of the (virtually aligned) dataset.
Methods
- close()
Close connections to array streaming backend.
No effect if parallel=True.
- get_label_weights(obs_keys)
Get all weights for the given label keys.
- get_merged_categories(label_key)
Get merged categories for label_key from all .obs.
- get_merged_labels(label_key)
Get merged labels for label_key from all .obs.
- static torch_worker_init_fn(worker_id)
worker_init_fn for torch.utils.data.DataLoader.
Improves performance for num_workers > 1.
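Per-observation label weights such as those returned by get_label_weights are typically used for class-balanced sampling. The sketch below assumes weights inversely proportional to label frequency; this page does not state the exact weighting scheme, so treat that as an assumption. `inverse_frequency_weights` is a hypothetical helper, and the standard library `random.choices` stands in for a weighted sampler.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Hypothetical helper: one weight per observation, 1 / count of its label."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


labels = ["T", "T", "T", "B"]
weights = inverse_frequency_weights(labels)

# Draw observation indices with probability proportional to their weight,
# so the rare label "B" is drawn about as often as the frequent label "T".
rng = random.Random(0)
sampled_idx = rng.choices(range(len(labels)), weights=weights, k=8)
```

With this scheme the weights of each label sum to 1, so every label contributes equally to the sampling distribution regardless of how many observations carry it.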