ddipy: Python package
A Python package for obtaining data from the Omics Discovery Index (OmicsDI). It retrieves the data through the RESTful web services at OmicsDI WS.
Installation
Install ddipy with pip:

```shell
pip install ddipy
```

Client | Method | Result Structure | Description |
---|---|---|---|
DatasetClient | search | DataSetResult | Search for datasets in the resource |
DatasetClient | get_dataset_details | DatasetSummary | Retrieve a specific dataset |
DatasetClient | get_dataset_files | array[string] | Retrieve the list of a dataset's files using positions |
DatasetClient | batch | BatchDataset | Retrieve a batch of datasets |
DatasetClient | latest | DataSetResult | Retrieve the latest datasets in the repository |
DatasetClient | most_accessed | DataSetResult | Retrieve the most accessed datasets |
DatasetClient | get_file_links | array[string] | Retrieve all file links for a given dataset |
DatasetClient | get_similar | DataSetResult | Retrieve the datasets related to a given dataset |
DatasetClient | get_similar_by_pubmed | array[DatasetSummary] | Retrieve all similar datasets based on a PubMed ID |
DatabaseClient | get_database_all | array[DatabaseDetail] | Get details of all databases |
SeoClient | get_seo_home | StructuredDataGraph | Retrieve JSON-LD for the home page |
SeoClient | get_seo_search | StructuredData | Retrieve JSON-LD for the browse page |
SeoClient | get_seo_api | StructuredData | Retrieve JSON-LD for the API page |
SeoClient | get_seo_database | StructuredData | Retrieve JSON-LD for the databases page |
SeoClient | get_seo_dataset | StructuredData | Retrieve JSON-LD for a dataset page |
SeoClient | get_seo_about | StructuredData | Retrieve JSON-LD for the about page |
TermClient | get_term_by_pattern | DictWord | Search dictionary terms |
TermClient | get_term_frequently_term_list | Term | Retrieve frequently used terms from the repository |
StatisticsClient | get_statistics_organisms | array[StatRecord] | Return statistics about the number of datasets per organism |
StatisticsClient | get_statistics_tissues | array[StatRecord] | Return statistics about the number of datasets per tissue |
StatisticsClient | get_statistics_omics | array[StatRecord] | Return statistics about the number of datasets per omics type |
StatisticsClient | get_statistics_diseases | array[StatRecord] | Return statistics about the number of datasets per disease |
StatisticsClient | get_statistics_domains | array[DomainStats] | Return statistics about the number of datasets per repository |
StatisticsClient | get_statistics_omics_by_year | array[StatOmicsRecord] | Return statistics about the number of datasets by omics type over the most recent 5 years |
Examples
DatasetClient
This example shows how to retrieve the details of one dataset with ddipy.

```python
from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    # Database name, dataset accession, and the boolean flag from the original call
    res = client.get_dataset_details("pride", "PXD000210", False)
```

This example searches for 20 datasets matching the query "cancer human", sorted by publication date in ascending order.

```python
from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    # Query string, sort field, sort order
    res = client.search("cancer human", "publication_date", "ascending")
```

This example searches for 30 datasets matching the query "cancer human" and skips the first 1200 datasets.

```python
from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    # After the sort order: skip 1200 datasets, return 30; the final 20 is kept from the original call
    res = client.search("cancer human", "publication_date", "ascending", 1200, 30, 20)
```

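To walk through a large result set page by page, the offset and page size can be computed ahead of each call. The following is a minimal sketch, not part of ddipy: `page_windows` is a hypothetical helper, and it assumes that the fourth and fifth positional arguments of `search` are the start offset and the page size, as the example above suggests. The total of 1290 is a made-up number for illustration.

```python
from typing import Iterator, Tuple


def page_windows(total: int, size: int, start: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover `total` results from `start` onward."""
    if size <= 0:
        raise ValueError("size must be positive")
    offset = start
    while offset < total:
        # The last page may be shorter than `size`.
        yield offset, min(size, total - offset)
        offset += size


if __name__ == "__main__":
    # Hypothetical total; in practice it might be read from the `count`
    # field of a DataSetResult.
    for offset, length in page_windows(total=1290, size=30, start=1200):
        # Assumed mapping onto the positional arguments of the example above:
        # client.search("cancer human", "publication_date", "ascending", offset, length, 20)
        print(offset, length)
```

Keeping the arithmetic separate from the client call makes it easy to check the page boundaries without contacting the web service.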
This example retrieves all datasets that report the UniProt protein P21399 as identified.

```python
from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    # Cross-reference query in the form DATABASE:ACCESSION
    res = client.search("UNIPROT:P21399")
```

This example finds all datasets in which the gene ENSG00000147251 is reported as differentially expressed.

```python
from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    # Cross-reference query in the form DATABASE:ACCESSION
    res = client.search("ENSEMBL:ENSG00000147251")
```

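Both cross-reference queries above follow the same `DATABASE:ACCESSION` pattern, so the query string can be built by a small helper. This is only a sketch: `xref_query` is a hypothetical function, not part of ddipy, and the upper-casing of the database prefix is an assumption based on the two examples shown.

```python
def xref_query(database: str, accession: str) -> str:
    """Build a cross-reference query string such as 'UNIPROT:P21399'."""
    database = database.strip().upper()  # assumed: prefixes are upper case
    accession = accession.strip()
    if not database or not accession:
        raise ValueError("database and accession must be non-empty")
    return f"{database}:{accession}"


if __name__ == "__main__":
    print(xref_query("uniprot", "P21399"))
    print(xref_query("ensembl", "ENSG00000147251"))
```

The resulting string could then be passed to `client.search(...)` in the same way as the literal queries above.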
DatabaseClient
This example retrieves all databases recorded in OmicsDI.

```python
from ddipy.dataset_client import DatabaseClient

if __name__ == '__main__':
    client = DatabaseClient()
    res = client.get_database_all()
```

SeoClient
This example retrieves the JSON-LD for a dataset page.

```python
from ddipy.dataset_client import SeoClient

if __name__ == '__main__':
    client = SeoClient()
    # Database name and dataset accession
    res = client.get_seo_dataset("pride", "PXD000210")
```

This example retrieves the JSON-LD for the home page.

```python
from ddipy.dataset_client import SeoClient

if __name__ == '__main__':
    client = SeoClient()
    res = client.get_seo_home()
```

StatisticsClient
This example retrieves statistics about the number of datasets per tissue.

```python
from ddipy.dataset_client import StatisticsClient

if __name__ == '__main__':
    client = StatisticsClient()
    res = client.get_statistics_tissues(20)
```

This example retrieves statistics about the number of datasets per disease.

```python
from ddipy.dataset_client import StatisticsClient

if __name__ == '__main__':
    client = StatisticsClient()
    res = client.get_statistics_diseases(20)
```

TermClient
This example searches the dictionary terms that match a pattern.

```python
from ddipy.dataset_client import TermClient

if __name__ == '__main__':
    client = TermClient()
    # Search pattern followed by the number 10 from the original call
    res = client.get_term_by_pattern("hom", 10)
```

This example retrieves frequently used terms from the repository.

```python
from ddipy.dataset_client import TermClient

if __name__ == '__main__':
    client = TermClient()
    # Uses get_term_frequently_term_list, the method listed for this purpose in the table above
    res = client.get_term_frequently_term_list("pride", "description", 20)
```

Structure
DataSetResult
Name | Type |
---|---|
datasets | array[DatasetSummary] |
facets | array[Facet] |
count | integer |
DatasetSummary
Name | Type |
---|---|
accession | string |
database | string |
title | string |
description | string |
dates | Date |
scores | Score |
keywords | array[string] |
omics_type | array[string] |
organisms | array[Organism] |
cross_references | any |
files | array[string] |
additional | any |
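When only a few of these fields are needed, it can help to copy them into a small typed object. The sketch below is illustrative only: `DatasetSummaryView` is a hypothetical class, not part of ddipy, and it assumes the summary is available as a plain dictionary keyed by the field names in the table above. The sample record is made up apart from the accession and database name used earlier.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DatasetSummaryView:
    """Typed subset of the DatasetSummary fields listed above."""
    accession: str
    database: str
    title: str = ""
    description: str = ""
    keywords: List[str] = field(default_factory=list)
    omics_type: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "DatasetSummaryView":
        # accession and database identify the dataset, so they are required here;
        # missing or null optional fields fall back to empty values.
        return cls(
            accession=raw["accession"],
            database=raw["database"],
            title=raw.get("title") or "",
            description=raw.get("description") or "",
            keywords=list(raw.get("keywords") or []),
            omics_type=list(raw.get("omics_type") or []),
        )


if __name__ == "__main__":
    # Made-up record for illustration; only the key names come from the table.
    sample = {"accession": "PXD000210", "database": "pride", "keywords": None}
    print(DatasetSummaryView.from_dict(sample))
```

Treating null lists as empty lists keeps later loops over `keywords` or `omics_type` free of `None` checks.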
Score
Name | Type |
---|---|
citationCount | integer |
reanalysisCount | integer |
searchCount | integer |
viewCount | integer |
connectionsCount | integer |
downloadCount | integer |
Facet
Name | Type |
---|---|
facet_values | array[FacetValue] |
label | string |
total | integer |
id | string |
BatchDataset
Name | Type |
---|---|
failure | array[Failure] |
datasets | array[DatasetSummary] |
DatabaseDetail
Name | Type |
---|---|
repository | string |
orcid_name | string |
url_template | string |
accession_prefix | array[string] |
title | string |
img_alt | string |
source_url | string |
description | string |
domain | string |
image | array[byte] |
icon | string |
source | string |
database_name | string |
StructuredDataGraph
Name | Type |
---|---|
graph | array[StructuredData] |
StructuredData
Name | Type |
---|---|
logo | string |
alternateName | string |
potentialAction | StructuredDataAction |
variableMeasured | string |
sameAs | string |
creator | array[StructuredDataAuthor] |
citation | StructuredDataCitation |
string | |
keywords | string |
primaryImageOfPage | StructuredDataImage |
description | string |
image | string |
name | string |
context | string |
type | string |
url | string |
StructuredDataCitation
Name | Type |
---|---|
author | StructuredDataAuthor |
publisher | StructuredDataAuthor |
name | string |
type | string |
url | string |
DomainStats
Name | Type |
---|---|
domain | StatRecord |
subdomains | array[DomainStats] |
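Because `subdomains` holds further DomainStats entries, the structure is a tree, and a recursive walk is one way to visit every node. The following sketch assumes each node is available as a dictionary with the keys `domain` and `subdomains`; `iter_domains` is a hypothetical helper, not part of ddipy, and the contents of the sample `domain` values are invented because the fields of StatRecord are not listed here.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_domains(node: Dict[str, Any], depth: int = 0) -> Iterator[Tuple[int, Any]]:
    """Yield (depth, domain) for a DomainStats-like node and all of its subdomains."""
    yield depth, node.get("domain")
    # A missing or null `subdomains` entry is treated as "no children".
    for child in node.get("subdomains") or []:
        yield from iter_domains(child, depth + 1)


if __name__ == "__main__":
    # Made-up tree for illustration; `domain` is passed through unchanged.
    tree = {
        "domain": {"label": "root"},
        "subdomains": [
            {"domain": {"label": "child-a"}, "subdomains": []},
            {"domain": {"label": "child-b"}, "subdomains": None},
        ],
    }
    for depth, domain in iter_domains(tree):
        print("  " * depth, domain)
```

The walk is depth-first, so each domain is listed before its own subdomains.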