
Validate & register scRNA-seq datasets

scRNA-seq measures gene expression of individual cells.

Analysis of scRNA-seq data is typically based on data objects such as AnnData, SingleCellExperiment, or Seurat objects.

These objects often contain non-validated metadata, making data integration & interpretation hard.

In this notebook, LaminDB is used to turn AnnData objects into validated & queryable assets.

Setup

!lamin init --storage ./test-scrna --schema bionty
💡 creating schemas: core==0.47.3 bionty==0.30.3
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-04 09:33:15)
✅ saved: Storage(id='Szavfu1U', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-09-04 09:33:15, created_by_id='DzTjkKse')
✅ loaded instance: testuser1/test-scrna
💡 did not register local instance on hub (if you want, call `lamin register`)

import lamindb as ln
import lnschema_bionty as lb
import pandas as pd

ln.track()
✅ loaded instance: testuser1/test-scrna (lamindb 0.52.1)
💡 notebook imports: lamindb==0.52.1 lnschema_bionty==0.30.3 pandas==1.5.3
✅ saved: Transform(id='Nv48yAceNSh8z8', name='Validate & register scRNA-seq datasets', short_name='scrna', version='0', type=notebook, updated_at=2023-09-04 09:33:17, created_by_id='DzTjkKse')
✅ saved: Run(id='e7dMSW0bVIQUINDGufH3', run_at=2023-09-04 09:33:17, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')

Human immune cells: Conde22

lb.settings.species = "human"


Access

Let's look at a scRNA-seq count matrix in the form of an AnnData object that we'd like to ingest into LaminDB:

adata = ln.dev.datasets.anndata_human_immune_cells(
    populate_registries=True  # this pre-populates registries
)

adata
AnnData object with n_obs × n_vars = 1648 × 36503
    obs: 'donor', 'tissue', 'cell_type', 'assay'
    var: 'feature_is_filtered', 'feature_reference', 'feature_biotype'
    uns: 'cell_type_ontology_term_id_colors', 'default_embedding', 'schema_version', 'title'
    obsm: 'X_umap'

This AnnData object does not require filtering, normalizing or formatting, so no preprocessing step is needed before validation.

Validate

Validate genes in .var

lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id);
✅ 36355 terms (99.60%) are validated for ensembl_gene_id
❗ 148 terms (0.40%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...

148 gene identifiers can't be validated because they are not currently in the Gene registry. Let's inspect them to see what to do:

inspector = lb.Gene.inspect(adata.var.index, lb.Gene.ensembl_gene_id)
✅ 36355 terms (99.60%) are validated for ensembl_gene_id
❗ 148 terms (0.40%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...
💡    detected 35 Gene terms in Bionty for ensembl_gene_id: 'ENSG00000198840', 'ENSG00000198727', 'ENSG00000198899', 'ENSG00000275249', 'ENSG00000277836', 'ENSG00000212907', 'ENSG00000198804', 'ENSG00000274847', 'ENSG00000277196', 'ENSG00000278384', 'ENSG00000198712', 'ENSG00000275869', 'ENSG00000275063', 'ENSG00000277630', 'ENSG00000278704', 'ENSG00000198938', 'ENSG00000276345', 'ENSG00000198786', 'ENSG00000278817', 'ENSG00000277475', ...
💡 →  add records from Bionty to your {model_name} registry via .from_values()
💡    couldn't validate 113 terms: 'ENSG00000228906', 'ENSG00000271734', 'ENSG00000259855', 'ENSG00000236996', 'ENSG00000258861', 'ENSG00000276814', 'ENSG00000271409', 'ENSG00000236886', 'ENSG00000227220', 'ENSG00000261490', 'ENSG00000286699', 'ENSG00000233776', 'ENSG00000249860', 'ENSG00000267637', 'ENSG00000237133', 'ENSG00000270394', 'ENSG00000268955', 'ENSG00000280095', 'ENSG00000239665', 'ENSG00000251044', ...
💡 →  if you are sure, create new records via ln.Gene() and save to your registry

The log shows that 35 of the non-validated IDs can be found in the Bionty reference. Let's register them:

records = lb.Gene.from_values(inspector.non_validated, lb.Gene.ensembl_gene_id)
ln.save(records)
✅ created 35 Gene records from Bionty matching ensembl_gene_id: 'ENSG00000198804', 'ENSG00000198712', 'ENSG00000228253', 'ENSG00000198899', 'ENSG00000198938', 'ENSG00000198840', 'ENSG00000212907', 'ENSG00000198886', 'ENSG00000198786', 'ENSG00000198695', 'ENSG00000198727', 'ENSG00000278704', 'ENSG00000277400', 'ENSG00000274847', 'ENSG00000276256', 'ENSG00000277630', 'ENSG00000278384', 'ENSG00000273748', 'ENSG00000271254', 'ENSG00000277475', ...
❗ did not create Gene records for 113 non-validated ensembl_gene_ids: 'ENSG00000112096', 'ENSG00000182230', 'ENSG00000203812', 'ENSG00000204092', 'ENSG00000215271', 'ENSG00000221995', 'ENSG00000224739', 'ENSG00000224745', 'ENSG00000225932', 'ENSG00000226377', 'ENSG00000226380', 'ENSG00000226403', 'ENSG00000227021', 'ENSG00000227220', 'ENSG00000227902', 'ENSG00000228139', 'ENSG00000228906', 'ENSG00000229352', 'ENSG00000231575', 'ENSG00000232196', ...

The remaining 113 are legacy IDs, not present in the current Ensembl assembly (e.g. ENSG00000112096).

We'd still like to register them:

validated = lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id)
# create a bare Gene record for every ID that is still not validated
records = [lb.Gene(ensembl_gene_id=gene_id) for gene_id in adata.var.index[~validated]]
ln.save(records)
✅ 36390 terms (99.70%) are validated for ensembl_gene_id
❗ 113 terms (0.30%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...

Now all genes pass validation:

lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id);
✅ 36503 terms (100.00%) are validated for ensembl_gene_id
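
The validate, inspect and register steps above can be pictured with plain Python sets. This is only a minimal sketch under stated assumptions: `partition_ids`, `registry` and `public_reference` are hypothetical names, the in-memory sets stand in for the Gene registry and the Bionty reference, and nothing here calls LaminDB.

```python
def partition_ids(ids, registry, public_reference):
    """Split identifiers into validated, importable, and unknown groups."""
    validated, importable, unknown = [], [], []
    for identifier in ids:
        if identifier in registry:
            validated.append(identifier)    # already registered
        elif identifier in public_reference:
            importable.append(identifier)   # known to the public reference
        else:
            unknown.append(identifier)      # e.g. legacy IDs
    return validated, importable, unknown


# Toy stand-ins (assumptions) for the registry and the public reference.
registry = {"ENSG00000188290", "ENSG00000186827"}
public_reference = registry | {"ENSG00000198804"}
ids = ["ENSG00000188290", "ENSG00000198804", "ENSG00000112096"]

validated, importable, unknown = partition_ids(ids, registry, public_reference)

registry.update(importable)  # step 1: add records found in the public reference
registry.update(unknown)     # step 2: deliberately register the remaining IDs
all_valid = all(identifier in registry for identifier in ids)
print(validated, importable, unknown, all_valid)
```

Keeping the two registration steps separate mirrors the notebook: records backed by a public reference are added first, and only then are the remaining IDs registered as a conscious decision.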

Validate metadata in .obs

adata.obs.columns
Index(['donor', 'tissue', 'cell_type', 'assay'], dtype='object')
validated = ln.Feature.validate(adata.obs.columns)
✅ 3 terms (75.00%) are validated for name
❗ 1 term (25.00%) is not validated for name: donor

One feature, "donor", is not validated. Let's register it:

feature = ln.Feature.from_df(adata.obs.loc[:, ~validated])[0]
ln.save(feature)

All metadata columns are now validated as features:

ln.Feature.validate(adata.obs.columns);
✅ 4 terms (100.00%) are validated for name
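
The `~validated` selection used above to pick the unregistered column is an inverted boolean mask. A minimal sketch in plain Python, assuming a hypothetical `registered_features` set in place of the Feature registry:

```python
columns = ["donor", "tissue", "cell_type", "assay"]
registered_features = {"tissue", "cell_type", "assay"}  # hypothetical registry content

# One boolean per column, in column order.
validated = [name in registered_features for name in columns]

# Equivalent of selecting with `~validated`: keep the columns whose flag is False.
to_register = [name for name, ok in zip(columns, validated) if not ok]
print(to_register)  # ['donor']
```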

Next, let's validate the corresponding labels of each feature.

Some of the metadata labels can be typed using dedicated registries like CellType:

validated = lb.CellType.validate(adata.obs.cell_type)
❗ received 32 unique terms, 1616 empty/duplicated terms are ignored
✅ 30 terms (93.80%) are validated for name
❗ 2 terms (6.20%) are not validated for name: germinal center B cell, megakaryocyte

Register the non-validated cell types; both can be loaded from a public ontology through Bionty:

records = lb.CellType.from_values(adata.obs.cell_type[~validated], "name")
ln.save(records)
✅ created 2 CellType records from Bionty matching name: 'germinal center B cell', 'megakaryocyte'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='uMLhrmbZ', name='germinal center B cell', ontology_id='CL:0000844', synonyms='GC B-cell|GC B cell|GC B lymphocyte|germinal center B lymphocyte|GC B-lymphocyte|germinal center B-cell|germinal center B-lymphocyte', description='A Rapidly Cycling Mature B Cell That Has Distinct Phenotypic Characteristics And Is Involved In T-Dependent Immune Responses And Located Typically In The Germinal Centers Of Lymph Nodes. This Cell Type Expresses Ly77 After Activation.', updated_at=2023-09-04 09:33:46, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000785'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='0I51jgPp', name='mature B cell', ontology_id='CL:0000785', synonyms='mature B lymphocyte|mature B-cell|mature B-lymphocyte', description='A B Cell That Is Mature, Having Left The Bone Marrow. Initially, These Cells Are Igm-Positive And Igd-Positive, And They Can Be Activated By Antigen.', updated_at=2023-09-04 09:33:47, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0001201'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='CIS4VJI0', name='B cell, CD19-positive', ontology_id='CL:0001201', synonyms='CD19+ B cell|B lymphocyte, CD19-positive|B-lymphocyte, CD19-positive|CD19-positive B cell|B-cell, CD19-positive', description='A B Cell That Is Cd19-Positive.', updated_at=2023-09-04 09:33:48, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000236'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='cx8VcggA', name='B cell', ontology_id='CL:0000236', synonyms='B-cell|B lymphocyte|B-lymphocyte', description='A Lymphocyte Of B Lineage That Is Capable Of B Cell Mediated Immunity.', updated_at=2023-09-04 09:33:49, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000945'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='Z0yFV7vU', name='lymphocyte of B lineage', ontology_id='CL:0000945', description='A Lymphocyte Of B Lineage With The Commitment To Express An Immunoglobulin Complex.', updated_at=2023-09-04 09:33:49, bionty_source_id='UUUq', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='UrtDirMx', name='megakaryocyte', ontology_id='CL:0000556', synonyms='megalocaryocyte|megalokaryocyte|megacaryocyte', description='A Large Hematopoietic Cell (50 To 100 Micron) With A Lobated Nucleus. Once Mature, This Cell Undergoes Multiple Rounds Of Endomitosis And Cytoplasmic Restructuring To Allow Platelet Formation And Release.', updated_at=2023-09-04 09:33:46, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000763'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='g1zY6vUW', name='myeloid cell', ontology_id='CL:0000763', description='A Cell Of The Monocyte, Granulocyte, Mast Cell, Megakaryocyte, Or Erythroid Lineage.', updated_at=2023-09-04 09:33:50, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000988'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='Q0aQr5JB', name='hematopoietic cell', ontology_id='CL:0000988', synonyms='haematopoietic cell|hemopoietic cell|haemopoietic cell', description='A Cell Of A Hematopoietic Lineage.', updated_at=2023-09-04 09:33:51, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: 'CL:0000548'
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0002371'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='QMAH6IlS', name='somatic cell', ontology_id='CL:0002371', description='A Cell Of An Organism That Does Not Pass On Its Genetic Material To The Organism'S Offspring (I.E. A Non-Germ Line Cell).', updated_at=2023-09-04 09:33:52, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: 'CL:0000548'
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000003'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='VT73gpK2', name='native cell', ontology_id='CL:0000003', description='A Cell That Is Found In A Natural Setting, Which Includes Multicellular Organism Cells 'In Vivo' (I.E. Part Of An Organism), And Unicellular Organisms 'In Environment' (I.E. Part Of A Natural Environment).', updated_at=2023-09-04 09:33:53, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000000'
lb.ExperimentalFactor.validate(adata.obs.assay)
lb.Tissue.validate(adata.obs.tissue);
✅ 3 terms (100.00%) are validated for name
✅ 17 terms (100.00%) are validated for name

Because we didn't mount a custom schema that contains a Donor registry, we use the Label registry to track donor IDs:

ln.Label.validate(adata.obs.donor);
❗ received 12 unique terms, 1636 empty/duplicated terms are ignored
❗ 12 terms (100.00%) are not validated for name: D496, 621B, A29, A36, A35, 637C, A52, A37, D503, 640C, A31, 582C

Donor labels are not validated, so let's register them:

donors = [ln.Label(name=name) for name in adata.obs.donor.unique()]
ln.save(donors)
ln.Label.validate(adata.obs.donor);
✅ 12 terms (100.00%) are validated for name
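
The summary lines in this section (for example "received 32 unique terms, 1616 empty/duplicated terms are ignored") follow from per-cell columns repeating the same label many times. The sketch below reproduces such counts on toy data. It is an illustration only: `summarize_validation` is a hypothetical helper and the rounding is an assumption, not LaminDB's actual implementation.

```python
def summarize_validation(values, registry):
    """Count unique, ignored, validated, and non-validated terms."""
    # Drop empty values and duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(v for v in values if v))
    n_validated = sum(term in registry for term in unique)
    pct = round(100 * n_validated / len(unique), 1) if unique else 0.0
    return {
        "n_unique": len(unique),
        "n_ignored": len(values) - len(unique),
        "n_validated": n_validated,
        "n_not_validated": len(unique) - n_validated,
        "pct_validated": pct,
    }


# Toy data shaped like the cell_type column: 1648 values, 32 unique terms,
# of which 30 are already in the (hypothetical) registry.
terms = [f"cell_type_{i}" for i in range(32)]
values = [terms[i % 32] for i in range(1648)]
registry = set(terms[:30])

summary = summarize_validation(values, registry)
print(summary)
```

Percentages are computed over unique terms rather than over cells, which is why 2 unvalidated cell types out of 32 show up as roughly 6% even though they may label only a handful of cells.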

Register

modalities = ln.Modality.lookup()
experimental_factors = lb.ExperimentalFactor.lookup()
species = lb.Species.lookup()
features = ln.Feature.lookup()

Register data

When we create a File object from an AnnData object, we'll automatically link its feature sets and get information about unmapped categories:

file = ln.File.from_anndata(
    adata, description="Conde22", field=lb.Gene.ensembl_gene_id, modality=modalities.rna
)
💡 file will be copied to default storage upon `save()` with key `None` ('.lamindb/Daaf9uCsQE8YxbVaUdt6.h5ad')
💡 parsing feature names of X stored in slot 'var'
✅    36503 terms (100.00%) are validated for ensembl_gene_id
✅    linked: FeatureSet(id='PxAuIGFMKq6tBtlAQT9R', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', modality_id='NIYnYOo8', created_by_id='DzTjkKse')
💡 parsing feature names of slot 'obs'
✅    4 terms (100.00%) are validated for name
✅    linked: FeatureSet(id='vjlYQsrjl0Qk1gTkwn5v', n=4, registry='core.Feature', hash='FB8NM5R-dAp_lUBKW16U', modality_id='UouDKKfD', created_by_id='DzTjkKse')
file.save()
✅ saved 2 feature sets for slots: 'var','obs'
✅ storing file 'Daaf9uCsQE8YxbVaUdt6' at '.lamindb/Daaf9uCsQE8YxbVaUdt6.h5ad'

The file has the following 2 linked feature sets:

file.features
Features:
  var: FeatureSet(id='PxAuIGFMKq6tBtlAQT9R', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-09-04 09:33:56, modality_id='NIYnYOo8', created_by_id='DzTjkKse')
    LIMASI (number)
    DAXX (number)
    CXCL8 (number)
    SLC43A3 (number)
    MTOR (number)
    None (number)
    MIR3936HG (number)
    PIGY (number)
    THOP1 (number)
    None (number)
    ... 
  obs: FeatureSet(id='vjlYQsrjl0Qk1gTkwn5v', n=4, registry='core.Feature', hash='FB8NM5R-dAp_lUBKW16U', updated_at=2023-09-04 09:34:00, modality_id='UouDKKfD', created_by_id='DzTjkKse')
    cell_type (category)
    assay (category)
    tissue (category)
    donor (category)

A less well-curated dataset

Access

Let's now consider a dataset with less well-curated features:

pbmc68k = ln.dev.datasets.anndata_pbmc68k_reduced()
pbmc68k
AnnData object with n_obs × n_vars = 70 × 765
    obs: 'cell_type', 'n_genes', 'percent_mito', 'louvain'
    var: 'n_counts', 'highly_variable'
    uns: 'louvain', 'louvain_colors', 'neighbors', 'pca'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    obsp: 'connectivities', 'distances'

We see that this dataset is indexed by gene symbols:

pbmc68k.var.head()
index    n_counts     highly_variable
HES4     1153.387451  True
TNFRSF4  304.358154   True
SSU72    2530.272705  False
PARK7    7451.664062  False
RBP7     272.811035   True

Validate

lb.Gene.validate(pbmc68k.var.index, lb.Gene.symbol);
✅ 695 terms (90.80%) are validated for symbol
❗ 70 terms (9.20%) are not validated for symbol: ATPIF1, C1orf228, CCBL2, RP11-782C8.1, RP11-277L2.3, RP11-156E8.1, AC079767.4, GPX1, H1FX, SELT, ATP5I, IGJ, CCDC109B, FYB, H2AFY, FAM65B, HIST1H4C, HIST1H1E, ZNRD1, C6orf48, ...
lb.Gene.inspect(pbmc68k.var.index, lb.Gene.symbol);
✅ 695 terms (90.80%) are validated for symbol
❗ 70 terms (9.20%) are not validated for symbol: ATPIF1, C1orf228, CCBL2, RP11-782C8.1, RP11-277L2.3, RP11-156E8.1, AC079767.4, GPX1, H1FX, SELT, ATP5I, IGJ, CCDC109B, FYB, H2AFY, FAM65B, HIST1H4C, HIST1H1E, ZNRD1, C6orf48, ...
💡    detected 54 terms with synonyms: ATPIF1, C1orf228, CCBL2, AC079767.4, H1FX, SELT, ATP5I, IGJ, CCDC109B, FYB, H2AFY, FAM65B, HIST1H4C, HIST1H1E, ZNRD1, C6orf48, SEPT7, WBSCR22, RSBN1L-AS1, CCDC132, ...
💡 →  standardize terms via .standardize()
💡    detected 5 Gene terms in Bionty for symbol: 'SNORD3B-2', 'SOD2', 'IGLL5', 'RN7SL1', 'GPX1'
💡 →  add records from Bionty to your {model_name} registry via .from_values()
💡    couldn't validate 11 terms: 'RP11-390E23.6', 'RP11-620J15.3', 'AC084018.1', 'TMBIM4-1', 'CTD-3138B18.5', 'RP11-277L2.3', 'RP3-467N11.1', 'RP11-291B21.2', 'RP11-156E8.1', 'RP11-782C8.1', 'RP11-489E7.4'
💡 →  if you are sure, create new records via ln.Gene() and save to your registry

Standardize symbols and register additional symbols from Bionty:

pbmc68k.var.index = lb.Gene.standardize(pbmc68k.var.index, lb.Gene.symbol)
gene_records = lb.Gene.from_values(pbmc68k.var.index, lb.Gene.symbol)
ln.save(gene_records)
💡 standardized 749/765 terms
✅ loaded 748 Gene records matching symbol: 'HES4', 'TNFRSF4', 'SSU72', 'PARK7', 'RBP7', 'SRMS', 'MAD2L2', 'AGTRAP', 'TNFRSF1B', 'EFHD2', 'NECAP2', 'HP1BP3', 'C1QA', 'C1QB', 'HNRNPR', 'GALE', 'STMN1', 'CD52', 'FGR', 'ATP5IF1', ...
✅ created 5 Gene records from Bionty matching symbol: 'GPX1', 'IGLL5', 'RN7SL1', 'SNORD3B-2', 'SOD2'
❗ did not create Gene records for 11 non-validated symbols: 'AC084018.1', 'CTD-3138B18.5', 'RP11-156E8.1', 'RP11-277L2.3', 'RP11-291B21.2', 'RP11-390E23.6', 'RP11-489E7.4', 'RP11-620J15.3', 'RP11-782C8.1', 'RP3-467N11.1', 'TMBIM4-1'

In this case, we only want to register data with validated genes:

validated = lb.Gene.validate(pbmc68k.var.index, lb.Gene.symbol)
❗ received 764 unique terms, 1 empty/duplicated term is ignored
✅ 753 terms (98.60%) are validated for symbol
❗ 11 terms (1.40%) are not validated for symbol: RP11-782C8.1, RP11-277L2.3, RP11-156E8.1, RP3-467N11.1, RP11-390E23.6, RP11-489E7.4, RP11-291B21.2, RP11-620J15.3, TMBIM4-1, AC084018.1, CTD-3138B18.5
pbmc68k_validated = pbmc68k[:, validated].copy()
/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/anndata/_core/anndata.py:1840: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
  utils.warn_names_duplicates("var")
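
Standardizing symbols through synonyms can collapse two distinct input names onto the same current symbol, which is one plausible source of the duplicate reported in the log and warning above. A minimal sketch, assuming a small illustrative synonym table (`SYNONYMS` is hand-written here, not Bionty data) and hypothetical helper names:

```python
from collections import Counter

# Illustrative synonym table (old symbol -> current symbol); an assumption.
SYNONYMS = {"ATPIF1": "ATP5IF1", "IGJ": "JCHAIN", "FYB": "FYB1"}


def standardize_symbols(symbols, synonyms):
    """Replace known synonyms; leave every other symbol untouched."""
    return [synonyms.get(symbol, symbol) for symbol in symbols]


def duplicated(symbols):
    """Return symbols that occur more than once, in first-seen order."""
    counts = Counter(symbols)
    return [symbol for symbol, n in counts.items() if n > 1]


# Toy input containing both an old symbol (IGJ) and its current name (JCHAIN).
symbols = ["ATPIF1", "HES4", "IGJ", "JCHAIN", "RP11-782C8.1"]
standardized = standardize_symbols(symbols, SYNONYMS)
duplicates = duplicated(standardized)
print(standardized, duplicates)
```

Checking for duplicates right after standardization makes it easier to decide whether to merge, drop, or rename the affected variables before the index is mapped to other identifiers.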

Convert gene symbols into Ensembl gene IDs:

records = lb.Gene.filter(id__in=[record.id for record in gene_records])
mapper = pd.DataFrame(records.values_list("symbol", "ensembl_gene_id")).set_index(0)[1]
pbmc68k_validated.var.insert(0, "gene_symbol", pbmc68k_validated.var.index)
pbmc68k_validated.var.rename(index=mapper, inplace=True)
pbmc68k_validated.var.head()
                 gene_symbol  n_counts     highly_variable
ENSG00000188290  HES4         1153.387451  True
ENSG00000186827  TNFRSF4      304.358154   True
ENSG00000160075  SSU72        2530.272705  False
ENSG00000116288  PARK7        7451.664062  False
ENSG00000162444  RBP7         272.811035   True
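
Renaming an index through a mapper leaves any label that is missing from the mapper unchanged, so it can be worth checking for unmapped symbols after the conversion. The sketch below shows the same idea with a plain dict. The three symbol-to-ID pairs come from the table above, while `NOT_A_GENE` is a made-up placeholder.

```python
# Symbol -> Ensembl gene ID pairs taken from the table above.
mapper = {
    "HES4": "ENSG00000188290",
    "TNFRSF4": "ENSG00000186827",
    "SSU72": "ENSG00000160075",
}

# "NOT_A_GENE" is a hypothetical symbol with no entry in the mapper.
symbols = ["HES4", "TNFRSF4", "SSU72", "NOT_A_GENE"]

# Labels without a mapping are kept as-is, mirroring a rename with a mapper.
new_index = [mapper.get(symbol, symbol) for symbol in symbols]

# Anything listed here still carries a symbol instead of an Ensembl ID.
unmapped = [symbol for symbol in symbols if symbol not in mapper]
print(new_index, unmapped)
```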

Validate cell types:

# inspect shows none of the terms are mappable
lb.CellType.inspect(pbmc68k_validated.obs["cell_type"])

# search the public ontology for each cell type name and take the top match,
# then add the original pbmc68k name as a synonym of the matched record
celltype_bt = lb.CellType.bionty()
mapper = {}
for ct in pbmc68k_validated.obs["cell_type"].unique():
    ontology_id = celltype_bt.search(ct).iloc[0].ontology_id
    record = lb.CellType.from_bionty(ontology_id=ontology_id)
    mapper[ct] = record.name
    record.save()
    record.add_synonym(ct)

# standardize cell type names in the dataset
pbmc68k_validated.obs["cell_type"] = pbmc68k_validated.obs["cell_type"].map(mapper)
❗ received 9 unique terms, 61 empty/duplicated terms are ignored
❗ 9 terms (100.00%) are not validated for name: Dendritic cells, CD19+ B, CD4+/CD45RO+ Memory, CD8+ Cytotoxic T, CD4+/CD25 T Reg, CD14+ Monocytes, CD56+ NK, CD8+/CD45RA+ Naive Cytotoxic, CD34+
💡    couldn't validate 9 terms: 'CD19+ B', 'CD4+/CD25 T Reg', 'CD56+ NK', 'CD8+ Cytotoxic T', 'CD34+', 'CD14+ Monocytes', 'Dendritic cells', 'CD4+/CD45RO+ Memory', 'CD8+/CD45RA+ Naive Cytotoxic'
💡 →  if you are sure, create new records via ln.CellType() and save to your registry
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000451'
💡 also saving parents of CellType(id='9JGbXeUA', name='dendritic cell', ontology_id='CL:0000451', description='A Cell Of Hematopoietic Origin, Typically Resident In Particular Tissues, Specialized In The Uptake, Processing, And Transport Of Antigens To Lymph Nodes For The Purpose Of Stimulating An Immune Response Via T Cell Activation. These Cells Are Lineage Negative (Cd3-Negative, Cd19-Negative, Cd34-Negative, And Cd56-Negative).', updated_at=2023-09-04 09:34:11, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000738'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='MkrH0gsX', name='leukocyte', ontology_id='CL:0000738', synonyms='white blood cell|leucocyte', description='An Achromatic Cell Of The Myeloid Or Lymphoid Lineages Capable Of Ameboid Movement, Found In Blood Or Other Tissue.', updated_at=2023-09-04 09:34:12, bionty_source_id='UUUq', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='9JGbXeUA', name='dendritic cell', ontology_id='CL:0000451', synonyms='Dendritic cells', description='A Cell Of Hematopoietic Origin, Typically Resident In Particular Tissues, Specialized In The Uptake, Processing, And Transport Of Antigens To Lymph Nodes For The Purpose Of Stimulating An Immune Response Via T Cell Activation. These Cells Are Lineage Negative (Cd3-Negative, Cd19-Negative, Cd34-Negative, And Cd56-Negative).', updated_at=2023-09-04 09:34:12, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0001087'
💡 also saving parents of CellType(id='6VQXlWS7', name='effector memory CD4-positive, alpha-beta T cell, terminally differentiated', ontology_id='CL:0001087', synonyms='CD4-positive TEMRA|CD4+ TEMRA', description='A Cd4-Positive, Alpha Beta Memory T Cell With The Phenotype Cd45Ra-Positive, Cd45Ro-Negative, And Ccr7-Negative.', updated_at=2023-09-04 09:34:13, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 2 CellType records from Bionty matching ontology_id: 'CL:4030002', 'CL:0000897'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='ylUbqlrS', name='effector memory CD45RA-positive, alpha-beta T cell, terminally differentiated', ontology_id='CL:4030002', synonyms='terminally differentiated effector memory cells re-expressing CD45RA|terminally differentiated effector memory CD45RA+ T cells|TEMRA cell', description='An Alpha-Beta Memory T Cell With The Phenotype Cd45Ra-Positive.', updated_at=2023-09-04 09:34:13, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000791'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='WKpZjuYS', name='mature alpha-beta T cell', ontology_id='CL:0000791', synonyms='mature alpha-beta T-lymphocyte|mature alpha-beta T lymphocyte|mature alpha-beta T-cell', description='A Alpha-Beta T Cell That Has A Mature Phenotype.', updated_at=2023-09-04 09:34:14, bionty_source_id='UUUq', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='s6Ag7R5U', name='CD4-positive, alpha-beta memory T cell', ontology_id='CL:0000897', synonyms='CD4-positive, alpha-beta memory T-cell|CD4-positive, alpha-beta memory T-lymphocyte|CD4-positive, alpha-beta memory T lymphocyte', description='A Cd4-Positive, Alpha-Beta T Cell That Has Differentiated Into A Memory T Cell.', updated_at=2023-09-04 09:34:13, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000624'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='05vQoepH', name='CD4-positive, alpha-beta T cell', ontology_id='CL:0000624', synonyms='CD4-positive, alpha-beta T lymphocyte|CD4-positive, alpha-beta T-cell|CD4-positive, alpha-beta T-lymphocyte', description='A Mature Alpha-Beta T Cell That Expresses An Alpha-Beta T Cell Receptor And The Cd4 Coreceptor.', updated_at=2023-09-04 09:34:15, bionty_source_id='UUUq', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='6VQXlWS7', name='effector memory CD4-positive, alpha-beta T cell, terminally differentiated', ontology_id='CL:0001087', synonyms='CD4+ TEMRA|CD4+/CD45RO+ Memory|CD4-positive TEMRA', description='A Cd4-Positive, Alpha Beta Memory T Cell With The Phenotype Cd45Ra-Positive, Cd45Ro-Negative, And Ccr7-Negative.', updated_at=2023-09-04 09:34:15, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000910'
💡 also saving parents of CellType(id='OxsmyL44', name='cytotoxic T cell', ontology_id='CL:0000910', synonyms='cytotoxic T lymphocyte|cytotoxic T-lymphocyte|cytotoxic T-cell', description='A Mature T Cell That Differentiated And Acquired Cytotoxic Function With The Phenotype Perforin-Positive And Granzyme-B Positive.', updated_at=2023-09-04 09:34:16, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000911'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='yvHkIrVI', name='effector T cell', ontology_id='CL:0000911', synonyms='effector T-lymphocyte|effector T-cell|effector T lymphocyte', description='A Differentiated T Cell With Ability To Traffic To Peripheral Tissues And Is Capable Of Mounting A Specific Immune Response.', updated_at=2023-09-04 09:34:17, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0002419'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='2C5PhwrW', name='mature T cell', ontology_id='CL:0002419', synonyms='mature T-cell|CD3e-positive T cell', description='A T Cell That Expresses A T Cell Receptor Complex And Has Completed T Cell Selection.', updated_at=2023-09-04 09:34:18, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000084'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='BxNjby0x', name='T cell', ontology_id='CL:0000084', synonyms='T-lymphocyte|T-cell|T lymphocyte', description='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', updated_at=2023-09-04 09:34:18, bionty_source_id='UUUq', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='OxsmyL44', name='cytotoxic T cell', ontology_id='CL:0000910', synonyms='cytotoxic T-cell|CD8+ Cytotoxic T|cytotoxic T-lymphocyte|cytotoxic T lymphocyte', description='A Mature T Cell That Differentiated And Acquired Cytotoxic Function With The Phenotype Perforin-Positive And Granzyme-B Positive.', updated_at=2023-09-04 09:34:19, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000919'
💡 also saving parents of CellType(id='ORD0dMdt', name='CD8-positive, CD25-positive, alpha-beta regulatory T cell', ontology_id='CL:0000919', synonyms='CD8+CD25+ Treg|CD8+CD25+ T-lymphocyte|CD8+CD25+ T(reg)|CD8+CD25+ T lymphocyte|CD8+CD25+ T cell|CD8-positive, CD25-positive Treg|CD8-positive, CD25-positive, alpha-beta regulatory T-lymphocyte|CD8-positive, CD25-positive, alpha-beta regulatory T-cell|CD8+CD25+ T-cell|CD8-positive, CD25-positive, alpha-beta regulatory T lymphocyte', description='A Cd8-Positive Alpha Beta-Positive T Cell With The Phenotype Foxp3-Positive And Having Suppressor Function.', updated_at=2023-09-04 09:34:20, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000795'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='oTsFrhYW', name='CD8-positive, alpha-beta regulatory T cell', ontology_id='CL:0000795', synonyms='CD8-positive, alpha-beta regulatory T-cell|CD8-positive, alpha-beta Treg|CD8-positive T(reg)|CD8-positive, alpha-beta regulatory T lymphocyte|CD8+ Treg|CD8+ T(reg)|CD8+ regulatory T cell|CD8-positive, alpha-beta regulatory T-lymphocyte|CD8-positive Treg', description='A Cd8-Positive, Alpha-Beta T Cell That Regulates Overall Immune Responses As Well As The Responses Of Other T Cell Subsets Through Direct Cell-Cell Contact And Cytokine Release.', updated_at=2023-09-04 09:34:20, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0000625'
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='VnKkQsME', name='CD8-positive, alpha-beta T cell', ontology_id='CL:0000625', synonyms='CD8-positive, alpha-beta T lymphocyte|CD8-positive, alpha-beta T-lymphocyte|CD8-positive, alpha-beta T-cell', description='A T Cell Expressing An Alpha-Beta T Cell Receptor And The Cd8 Coreceptor.', updated_at=2023-09-04 09:34:21, bionty_source_id='UUUq', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='ORD0dMdt', name='CD8-positive, CD25-positive, alpha-beta regulatory T cell', ontology_id='CL:0000919', synonyms='CD8+CD25+ T-cell|CD8-positive, CD25-positive, alpha-beta regulatory T-cell|CD8+CD25+ T lymphocyte|CD4+/CD25 T Reg|CD8+CD25+ Treg|CD8+CD25+ T(reg)|CD8+CD25+ T cell|CD8-positive, CD25-positive, alpha-beta regulatory T lymphocyte|CD8-positive, CD25-positive, alpha-beta regulatory T-lymphocyte|CD8-positive, CD25-positive Treg|CD8+CD25+ T-lymphocyte', description='A Cd8-Positive Alpha Beta-Positive T Cell With The Phenotype Foxp3-Positive And Having Suppressor Function.', updated_at=2023-09-04 09:34:21, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0002057'
💡 also saving parents of CellType(id='O0AQiAuv', name='CD14-positive, CD16-negative classical monocyte', ontology_id='CL:0002057', synonyms='CD16-negative monocyte|CD16- monocyte', description='A Classical Monocyte That Is Cd14-Positive, Cd16-Negative, Cd64-Positive, Cd163-Positive.', updated_at=2023-09-04 09:34:22, bionty_source_id='UUUq', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='O0AQiAuv', name='CD14-positive, CD16-negative classical monocyte', ontology_id='CL:0002057', synonyms='CD16-negative monocyte|CD14+ Monocytes|CD16- monocyte', description='A Classical Monocyte That Is Cd14-Positive, Cd16-Negative, Cd64-Positive, Cd163-Positive.', updated_at=2023-09-04 09:34:22, bionty_source_id='UUUq', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: 'CL:0002102'
💡 also saving parents of CellType(id='Xkw89opD', name='CD38-negative naive B cell', ontology_id='CL:0002102', synonyms='CD38-negative naive B lymphocyte|CD38-negative naive B-cell|CD38- naive B-cell|CD38-negative naive B-lymphocyte|CD38- naive B lymphocyte|CD38- naive B-lymphocyte|CD38- naive B cell', description='A Cd38-Negative Naive B Cell Is A Mature B Cell That Has The Phenotype Cd38-Negative, Surface Igd-Positive, Surface Igm-Positive, And Cd27-Negative, That Has Not Yet Been Activated By Antigen In The Periphery.', updated_at=2023-09-04 09:34:23, bionty_source_id='UUUq', created_by_id='DzTjkKse')
πŸ’‘ also saving parents of CellType(id='Xkw89opD', name='CD38-negative naive B cell', ontology_id='CL:0002102', synonyms='CD38-negative naive B lymphocyte|CD38- naive B lymphocyte|CD38-negative naive B-cell|CD38-negative naive B-lymphocyte|CD38- naive B cell|CD38- naive B-cell|CD38- naive B-lymphocyte|CD8+/CD45RA+ Naive Cytotoxic', description='A Cd38-Negative Naive B Cell Is A Mature B Cell That Has The Phenotype Cd38-Negative, Surface Igd-Positive, Surface Igm-Positive, And Cd27-Negative, That Has Not Yet Been Activated By Antigen In The Periphery.', updated_at=2023-09-04 09:34:23, bionty_source_id='UUUq', created_by_id='DzTjkKse')
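
The log above shows parents being saved one level at a time: saving CL:0000919 creates CL:0000795, which in turn creates CL:0000625. The following is a minimal sketch of that kind of recursive walk over a toy parent map. It does not call LaminDB or Bionty; the `PARENTS` dictionary and the `save_with_parents` helper are hypothetical names used only for illustration, and CL:0000625 is treated as a root here purely to end the walk.

```python
# Toy parent map. The first two edges are read off the log above;
# treating CL:0000625 as a root is an assumption made for illustration.
PARENTS = {
    "CL:0000919": ["CL:0000795"],
    "CL:0000795": ["CL:0000625"],
    "CL:0000625": [],
}


def save_with_parents(ontology_id, registry, parents=PARENTS):
    """Add ontology_id and all of its ancestors to registry (a set)."""
    if ontology_id in registry:
        return  # already saved, so this branch is not walked again
    registry.add(ontology_id)
    for parent_id in parents.get(ontology_id, []):
        save_with_parents(parent_id, registry, parents)


registry = set()
save_with_parents("CL:0000919", registry)
print(sorted(registry))
```

Because already-registered terms are skipped, a second call for the same term does no extra work, which matches the log's note that the recursion only happens once.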

Now, all cell types are validated:

lb.CellType.validate(pbmc68k_validated.obs["cell_type"]);
✅ 9 terms (100.00%) are validated for name
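
Validation by name succeeds only when every label matches a registered name, which is why dataset-specific labels such as 'CD14+ Monocytes' show up as synonyms in the records above. The sketch below illustrates the general idea with plain Python: map labels to canonical names through a synonym table, then check them against the registry. It is not LaminDB's implementation; `REGISTRY`, `build_lookup`, `standardize` and `validate_names` are hypothetical names, and 'Unknown cell' is a made-up label that is deliberately left unmapped.

```python
# Toy registry: canonical name -> "|"-separated synonyms.
# The single entry is copied from the CellType record in the log above.
REGISTRY = {
    "CD14-positive, CD16-negative classical monocyte":
        "CD16-negative monocyte|CD14+ Monocytes|CD16- monocyte",
}


def build_lookup(registry):
    """Map every canonical name and every synonym to its canonical name."""
    lookup = {}
    for name, synonyms in registry.items():
        lookup[name] = name
        for synonym in synonyms.split("|"):
            if synonym:
                lookup[synonym] = name
    return lookup


def standardize(values, lookup):
    """Replace known synonyms by canonical names; leave other values as-is."""
    return [lookup.get(value, value) for value in values]


def validate_names(values, registry):
    """Return {unique value: True if it is a registered canonical name}."""
    return {value: value in registry for value in dict.fromkeys(values)}


raw_labels = ["CD14+ Monocytes", "CD16- monocyte", "Unknown cell"]
standardized = standardize(raw_labels, build_lookup(REGISTRY))
report = validate_names(standardized, REGISTRY)
print(report)
```

In this toy run the two monocyte labels collapse to one validated name, while the unmapped label stays unvalidated and would need a new synonym or a new record.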

Register #

file = ln.File.from_anndata(
    pbmc68k_validated,
    description="10x reference pbmc68k",
    field=lb.Gene.ensembl_gene_id,  # validate var names against Ensembl gene IDs
)
💡 file will be copied to default storage upon `save()` with key `None` ('.lamindb/S4DBVuVCt1xuRFb06uIq.h5ad')
💡 parsing feature names of X stored in slot 'var'
✅    753 terms (100.00%) are validated for ensembl_gene_id
✅    linked: FeatureSet(id='awdEqckqgyaMHEsV80gw', n=753, type='number', registry='bionty.Gene', hash='-FY8VK1f6T3U_MXzH_pj', created_by_id='DzTjkKse')
💡 parsing feature names of slot 'obs'
✅    1 term (25.00%) is validated for name
❗    3 terms (75.00%) are not validated for name: n_genes, percent_mito, louvain
✅    linked: FeatureSet(id='oR5xshWA4l6b4yE6ANM9', n=1, registry='core.Feature', hash='TBgy-69f02DeCujyo_kM', modality_id='UouDKKfD', created_by_id='DzTjkKse')

file.save()
✅ saved 2 feature sets for slots: 'var','obs'
✅ storing file 'S4DBVuVCt1xuRFb06uIq' at '.lamindb/S4DBVuVCt1xuRFb06uIq.h5ad'
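
The obs slot above reports 1 of 4 column names as validated, and only the validated one ends up in the linked feature set. A small sketch of how such a summary can be computed is shown here. It assumes that `cell_type` is the validated column (inferred from the feature listing further down) and that the registered feature names are available as a plain set; `summarize_features` and `registered_features` are hypothetical names, not part of LaminDB.

```python
def summarize_features(names, registered):
    """Split names into validated and non-validated, with percentages."""
    validated = [name for name in names if name in registered]
    missing = [name for name in names if name not in registered]
    total = len(names)
    pct_validated = 100 * len(validated) / total if total else 0.0
    return validated, missing, pct_validated


# Column names: the three non-validated ones are listed in the log above;
# 'cell_type' as the fourth, validated column is an assumption.
obs_columns = ["cell_type", "n_genes", "percent_mito", "louvain"]
registered_features = {"cell_type", "species", "assay"}  # illustrative set

validated, missing, pct = summarize_features(obs_columns, registered_features)
print(f"{len(validated)} of {len(obs_columns)} ({pct:.2f}%) validated: {validated}")
print(f"not validated: {missing}")
```

Columns such as `n_genes` stay in the stored AnnData object either way; in this sketch they are simply not part of the validated set.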

# look up the CellType records that match the validated names
cell_types = lb.CellType.from_values(pbmc68k_validated.obs["cell_type"], "name")
# link labels to the file, each under its corresponding feature
file.add_labels(cell_types, feature=features.cell_type)
file.add_labels(species.human, feature=features.species)
file.add_labels(experimental_factors.single_cell_rna_sequencing, feature=features.assay)
✅ loaded: FeatureSet(id='iWPs8Zo6ctPFnd3zS32O', n=1, registry='core.Feature', hash='ul8bYqoVbCUC_zI7iIVM', updated_at=2023-09-04 09:34:00, modality_id='UouDKKfD', created_by_id='DzTjkKse')
✅ linked new feature 'species' together with new feature set FeatureSet(id='iWPs8Zo6ctPFnd3zS32O', n=1, registry='core.Feature', hash='ul8bYqoVbCUC_zI7iIVM', updated_at=2023-09-04 09:34:24, modality_id='UouDKKfD', created_by_id='DzTjkKse')
💡 no file links to it anymore, deleting feature set FeatureSet(id='iWPs8Zo6ctPFnd3zS32O', n=1, registry='core.Feature', hash='ul8bYqoVbCUC_zI7iIVM', updated_at=2023-09-04 09:34:24, modality_id='UouDKKfD', created_by_id='DzTjkKse')
✅ linked new feature 'assay' together with new feature set FeatureSet(id='jnBKcYYcJZ2pme0oVlqw', n=2, registry='core.Feature', hash='mmZjEmOkPr0wfDr63GXa', updated_at=2023-09-04 09:34:24, modality_id='UouDKKfD', created_by_id='DzTjkKse')

file.features
Features:
  var: FeatureSet(id='awdEqckqgyaMHEsV80gw', n=753, type='number', registry='bionty.Gene', hash='-FY8VK1f6T3U_MXzH_pj', updated_at=2023-09-04 09:34:24, created_by_id='DzTjkKse')
    EIF3D (number)
    DAXX (number)
    COX6A1 (number)
    MED28 (number)
    SERPINB6 (number)
    ARMH1 (number)
    CD3D (number)
    IGHA1 (number)
    MRPL18 (number)
    NDUFA8 (number)
    ... 
  obs: FeatureSet(id='oR5xshWA4l6b4yE6ANM9', n=1, registry='core.Feature', hash='TBgy-69f02DeCujyo_kM', updated_at=2023-09-04 09:34:24, modality_id='UouDKKfD', created_by_id='DzTjkKse')
    🔗 cell_type (9, bionty.CellType): 'conventional dendritic cell', 'B cell, CD19-positive', 'CD38-negative naive B cell', 'effector memory CD4-positive, alpha-beta T cell, terminally differentiated', 'CD14-positive, CD16-negative classical monocyte', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'CD16-positive, CD56-dim natural killer cell, human', 'dendritic cell', 'cytotoxic T cell'
  external: FeatureSet(id='jnBKcYYcJZ2pme0oVlqw', n=2, registry='core.Feature', hash='mmZjEmOkPr0wfDr63GXa', updated_at=2023-09-04 09:34:24, modality_id='UouDKKfD', created_by_id='DzTjkKse')
    🔗 assay (1, bionty.ExperimentalFactor): 'single-cell RNA sequencing'
    🔗 species (1, bionty.Species): 'human'

file.describe()
💡 File(id='S4DBVuVCt1xuRFb06uIq', suffix='.h5ad', accessor='AnnData', description='10x reference pbmc68k', size=663138, hash='ezj0ByeaJEju69Ka3vkwFw', hash_type='md5', updated_at=2023-09-04 09:34:24)

Provenance:
  🗃️ storage: Storage(id='Szavfu1U', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-09-04 09:33:15, created_by_id='DzTjkKse')
  💫 transform: Transform(id='Nv48yAceNSh8z8', name='Validate & register scRNA-seq datasets', short_name='scrna', version='0', type=notebook, updated_at=2023-09-04 09:34:24, created_by_id='DzTjkKse')
  👣 run: Run(id='e7dMSW0bVIQUINDGufH3', run_at=2023-09-04 09:33:17, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
  👤 created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-04 09:33:15)
Features:
  var: FeatureSet(id='awdEqckqgyaMHEsV80gw', n=753, type='number', registry='bionty.Gene', hash='-FY8VK1f6T3U_MXzH_pj', updated_at=2023-09-04 09:34:24, created_by_id='DzTjkKse')
    EIF3D (number)
    DAXX (number)
    COX6A1 (number)
    MED28 (number)
    SERPINB6 (number)
    ARMH1 (number)
    CD3D (number)
    IGHA1 (number)
    MRPL18 (number)
    NDUFA8 (number)
    ... 
  obs: FeatureSet(id='oR5xshWA4l6b4yE6ANM9', n=1, registry='core.Feature', hash='TBgy-69f02DeCujyo_kM', updated_at=2023-09-04 09:34:24, modality_id='UouDKKfD', created_by_id='DzTjkKse')
    🔗 cell_type (9, bionty.CellType): 'conventional dendritic cell', 'B cell, CD19-positive', 'CD38-negative naive B cell', 'effector memory CD4-positive, alpha-beta T cell, terminally differentiated', 'CD14-positive, CD16-negative classical monocyte', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'CD16-positive, CD56-dim natural killer cell, human', 'dendritic cell', 'cytotoxic T cell'
  external: FeatureSet(id='jnBKcYYcJZ2pme0oVlqw', n=2, registry='core.Feature', hash='mmZjEmOkPr0wfDr63GXa', updated_at=2023-09-04 09:34:24, modality_id='UouDKKfD', created_by_id='DzTjkKse')
    🔗 assay (1, bionty.ExperimentalFactor): 'single-cell RNA sequencing'
    🔗 species (1, bionty.Species): 'human'
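
The File record above carries a content hash with `hash_type='md5'`. Its 22-character value is consistent with an MD5 digest (16 bytes) written as URL-safe base64 without padding, though that encoding is an assumption here and not confirmed by the output. The sketch below shows how such a hash could be computed for any local file to check whether two files have identical content; `md5_b64` is a hypothetical helper, not a LaminDB function, and the file names are placeholders.

```python
import base64
import hashlib
import tempfile
from pathlib import Path


def md5_b64(path, chunk_size=1 << 20):
    """MD5 of a file's bytes as unpadded URL-safe base64 (assumed encoding)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # read in chunks so large files do not have to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.urlsafe_b64encode(digest.digest()).decode("ascii").rstrip("=")


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.h5ad"
    copy = Path(tmp) / "copy.h5ad"
    other = Path(tmp) / "other.h5ad"
    first.write_bytes(b"same bytes")
    copy.write_bytes(b"same bytes")
    other.write_bytes(b"different bytes")
    hash_first, hash_copy, hash_other = md5_b64(first), md5_b64(copy), md5_b64(other)

print(len(hash_first), hash_first == hash_copy, hash_first == hash_other)
```

Identical bytes give identical hashes regardless of file name, which is what makes a content hash useful for spotting duplicate uploads.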

file.view_flow()
https://d33wubrfki0l68.cloudfront.net/f9a7c27ec847f7241232c98bd5a28bc9484240a4/f2f68/_images/e610ee4b30a4c8c3913c6456e2f02d918b23cefa8435e7c629f09a7e7e1770c4.svg

🎉 Now let's continue with data integration: Integrate scRNA-seq datasets