lamindb.FeatureSet¶
- class lamindb.FeatureSet(features: Iterable[Registry], type: str | None = None, name: str | None = None)¶
Feature sets.
Stores references to sets of Feature and other registries that may be used to identify features (e.g., bionty.Gene or bionty.Protein).
Why does LaminDB model feature sets, not just features?
Performance: Imagine you measure the same panel of 20k transcripts in 1M samples. By modeling the panel as a feature set, you can link all your artifacts against one feature set and only need to store 1M instead of 1M x 20k = 20B links.
Interpretation: Model protein panels, gene panels, etc.
Data integration: Feature sets provide the currency that determines whether two collections can be easily concatenated.
These reasons do not hold for label sets. Hence, LaminDB does not model label sets.
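The performance argument above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the extra `member_links` term assumes the feature set references each of its members exactly once, which this page does not state explicitly.

```python
# Link counts for the example above: 1M samples, one panel of 20k transcripts.
n_samples = 1_000_000
n_features = 20_000

# Option 1: link every artifact to every feature individually.
links_per_feature = n_samples * n_features

# Option 2: link every artifact to one shared feature set.
links_per_feature_set = n_samples

# Assumption for this sketch: the feature set references each member once.
member_links = n_features

print(links_per_feature)                     # 20000000000
print(links_per_feature_set + member_links)  # 1020000
```

Even with the member links counted, the feature-set model needs roughly four orders of magnitude fewer links in this scenario.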
- Parameters:
features – Iterable[Registry]
An iterable of Feature records to hash, e.g., [Feature(...), Feature(...)]. Is turned into a set upon instantiation. If you’d like to pass values, use from_values() or from_df().
type – str | None = None
The simple type. Defaults to None for sets of Feature records, and otherwise defaults to "number" (e.g., for sets of Gene).
name – str | None = None
A name.
Note
A feature set is identified by the hash of the feature uids in the set.
A slot provides a string key to access feature sets. It’s typically the accessor within the registered data object, here pd.DataFrame.columns.
See also
from_values()
Create from values.
from_df()
Create from dataframe columns.
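The note above says a feature set is identified by the hash of the feature uids in the set. The following is a minimal sketch of what an order-independent set hash can look like. It is not LaminDB's actual hashing scheme: the helper `hash_uid_set`, the choice of SHA-256, the 20-character truncation, and the example uids are all assumptions made for illustration.

```python
import hashlib


def hash_uid_set(uids):
    """Order-independent hash of a collection of uids (illustrative only)."""
    # Deduplicate and sort so that order and repeats do not affect the result.
    unique_sorted = sorted(set(uids))
    joined = "\n".join(unique_sorted)
    # Hypothetical choice: SHA-256, truncated to 20 hex characters.
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:20]


panel_a = ["uid_gene1", "uid_gene2", "uid_gene3"]
panel_b = ["uid_gene3", "uid_gene1", "uid_gene2", "uid_gene1"]  # reordered, one duplicate
panel_c = ["uid_gene1", "uid_gene2"]

# Same members give the same identity, regardless of order or duplicates.
print(hash_uid_set(panel_a) == hash_uid_set(panel_b))  # True
# Different members give a different identity.
print(hash_uid_set(panel_a) == hash_uid_set(panel_c))  # False
```

An identity of this kind is what makes the data-integration argument work: two datasets measured on the same features resolve to the same feature set, which signals that they can be concatenated along those features.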
Examples
Create a feature set from a dataframe, with types:
>>> import lamindb as ln
>>> import pandas as pd
>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> feature_set = ln.FeatureSet.from_df(df)
Create a feature set from features:
>>> features = ln.Feature.from_values(["feat1", "feat2"], type=float)
>>> feature_set = ln.FeatureSet(features)
Create a feature set from feature values:
>>> import bionty as bt
>>> feature_set = ln.FeatureSet.from_values(adata.var["ensemble_id"], bt.Gene.ensembl_gene_id, organism="mouse")
>>> feature_set.save()
Link a feature set to an artifact:
>>> artifact.features.add_feature_set(feature_set, slot="var")
Link features to an artifact (will create a feature set under the hood):
>>> artifact.features.add_values(features)
Attributes¶
- objects Manager¶
Fields¶
- created_at DateTimeField¶
Time of creation of record.
- id AutoField¶
Internal id, valid only in one DB instance.
- uid CharField¶
A universal id (hash of the set of feature values).
- name CharField¶
A name (optional).
- n IntegerField¶
Number of features in the set.
- dtype CharField¶
Data type, e.g., “number”, “float”, “int”. Is None for Feature.
For Feature, types are expected to be heterogeneous and defined on a per-feature level.
- registry CharField¶
The registry that stores the feature identifiers, e.g., 'core.Feature' or 'bionty.Gene'.
Depending on the registry, .members stores, e.g., Feature or Gene records.
- hash CharField¶
The hash of the set.
Methods¶
- classmethod from_df(df, field=FieldAttr(Feature.name), name=None, mute=False, organism=None, public_source=None)¶
Create feature set for validated features.
- Return type:
FeatureSet | None
- classmethod from_values(values, field=FieldAttr(Feature.name), type=None, name=None, mute=False, organism=None, public_source=None, raise_validation_error=True)¶
Create feature set for validated features.
- Parameters:
values (List[str] | Series | array) – A list of values, like feature names or ids.
field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a reference registry to map values.
type (str | None, default: None) – The simple type. Defaults to None if reference registry is Feature, defaults to "float" otherwise.
name (str | None, default: None) – A name.
organism (str | Registry | None, default: None) – An organism to resolve gene mapping.
public_source (Registry | None, default: None) – A public ontology to resolve feature identifier mapping.
raise_validation_error (bool, default: True) – Whether to raise a validation error if some values are not valid.
- Raises:
ValidationError – If some values are not valid.
- Return type:
Examples
>>> features = ["feat1", "feat2"]
>>> feature_set = ln.FeatureSet.from_values(features)
>>> import bionty as bt
>>> genes = ["ENS980983409", "ENS980983410"]
>>> feature_set = ln.FeatureSet.from_values(genes, bt.Gene.ensembl_gene_id, float)
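The raise_validation_error flag described above can be illustrated with a toy model. This is not LaminDB code: `validate_values`, the local `ValidationError` class, and the `known` set are hypothetical stand-ins for a registry lookup. This section does not say what from_values() returns when the flag is False and some values are invalid, so the sketch assumes it returns None.

```python
class ValidationError(Exception):
    """Local stand-in for the library's validation error (hypothetical)."""


def validate_values(values, known, raise_validation_error=True):
    """Return the values if all are known; otherwise raise or return None."""
    invalid = [v for v in values if v not in known]
    if not invalid:
        return list(values)
    if raise_validation_error:
        raise ValidationError(f"{len(invalid)} value(s) not validated: {invalid}")
    # Assumption for this sketch: signal failure with None instead of raising.
    return None


known_features = {"feat1", "feat2"}  # toy stand-in for a registry

print(validate_values(["feat1", "feat2"], known_features))
print(validate_values(["feat1", "typo"], known_features, raise_validation_error=False))

try:
    validate_values(["feat1", "typo"], known_features)
except ValidationError as err:
    print(f"raised: {err}")
```

The first mode suits interactive use, where a typo should stop execution early. The non-raising mode suits pipelines that prefer to check the result and decide what to do themselves.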
- save(*args, **kwargs)¶
Save.
- Return type:
None