ddipy: Python package¶

A Python package for obtaining data from the Omics Discovery Index (OmicsDI). It uses the RESTful web services provided by OmicsDI WS.

Installation¶

Install ddipy with pip:

 pip install ddipy

Client Documents¶
Client	Method	Result Structure	Description
DatasetClient	search	DataSetResult	Search for datasets in the resource
	get_dataset_details	DatasetSummary	Retrieve a specific dataset
	get_dataset_files	array[string]	Retrieve the list of a dataset's files using positions
	batch	BatchDataset	Retrieve a batch of datasets
	latest	DataSetResult	Retrieve the latest datasets in the repository
	most_accessed	DataSetResult	Retrieve the most accessed datasets
	get_file_links	array[string]	Retrieve all file links for a given dataset
	get_similar	DataSetResult	Retrieve the datasets related to a given dataset
	get_similar_by_pubmed	array[DatasetSummary]	Retrieve all similar datasets based on a PubMed ID
DatabaseClient	get_database_all	array[DatabaseDetail]	Get details of all databases
SeoClient	get_seo_home	StructuredDataGraph	Retrieve JSON-LD for the home page
	get_seo_search	StructuredData	Retrieve JSON-LD for the browse page
	get_seo_api	StructuredData	Retrieve JSON-LD for the API page
	get_seo_database	StructuredData	Retrieve JSON-LD for the databases page
	get_seo_dataset	StructuredData	Retrieve JSON-LD for a dataset page
	get_seo_about	StructuredData	Retrieve JSON-LD for the about page
TermClient	get_term_by_pattern	DictWord	Search dictionary Terms
	get_term_frequently_term_list	Term	Retrieve frequent terms from the repository
StatisticsClient	get_statistics_organisms	array[StatRecord]	Return statistics about the number of datasets per organism
	get_statistics_tissues	array[StatRecord]	Return statistics about the number of datasets per tissue
	get_statistics_omics	array[StatRecord]	Return statistics about the number of datasets per omics type
	get_statistics_diseases	array[StatRecord]	Return statistics about the number of datasets per disease
	get_statistics_domains	array[DomainStats]	Return statistics about the number of datasets per repository
	get_statistics_omics_by_year	array[StatOmicsRecord]	Return statistics about the number of datasets by omics type over the most recent 5 years

Examples¶

DatasetClient¶

This example shows how to retrieve the details of one dataset using the Python package ddipy.

from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    # Look up dataset PXD000210 in the "pride" database.
    res = client.get_dataset_details("pride", "PXD000210", False)

This example searches for 20 datasets matching "cancer human", sorted by publication date in ascending order.

from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    # Query, sort field, sort order.
    res = client.search("cancer human", "publication_date", "ascending")

This example searches for 30 datasets matching "cancer human", skipping the first 1200 results.

from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    # Skip the first 1200 results and return the next 30.
    res = client.search("cancer human", "publication_date", "ascending", 1200, 30, 20)
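
The offset and page size used above suggest a simple way to walk through a large result set. The following is a minimal sketch, not part of ddipy: `iter_datasets`, `fetch_page`, and `fake_fetch` are hypothetical names, and `fetch_page` is assumed to return a dict shaped like DataSetResult (with `datasets` and `count` keys). In real use, `fetch_page` would wrap a call such as `client.search(...)` and convert its result to that shape, which may need adapting to what ddipy actually returns.

```python
from typing import Callable, Dict, Iterator


def iter_datasets(fetch_page: Callable[[int, int], Dict],
                  page_size: int = 30,
                  max_items: int = 100) -> Iterator[Dict]:
    """Yield dataset summaries page by page until max_items or the end is reached."""
    start = 0
    yielded = 0
    while yielded < max_items:
        page = fetch_page(start, page_size)
        datasets = page.get("datasets", [])
        if not datasets:
            break  # nothing left to read
        for dataset in datasets:
            if yielded >= max_items:
                return
            yield dataset
            yielded += 1
        start += page_size
        if start >= page.get("count", 0):
            break  # walked past the reported total


def fake_fetch(start: int, size: int) -> Dict:
    """Hypothetical stand-in for the web service: 70 made-up datasets in total."""
    total = 70
    items = [{"accession": f"FAKE{i:04d}"} for i in range(start, min(start + size, total))]
    return {"datasets": items, "count": total}


accessions = [d["accession"] for d in iter_datasets(fake_fetch, page_size=30)]
print(len(accessions), accessions[0], accessions[-1])
```

Keeping the fetching function as a parameter means the paging logic can be exercised without network access, as the stand-in shows.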

This example is a query to retrieve all the datasets that reported the UniProt protein P21399 as identified.

from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    res = client.search("UNIPROT:P21399")

This example is a query to find all the datasets where the gene ENSG00000147251 is reported as differentially expressed.

from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    res = client.search("ENSEMBL:ENSG00000147251")

DatabaseClient¶

This example retrieves all databases recorded in OmicsDI.

from ddipy.database_client import DatabaseClient

if __name__ == '__main__':
    client = DatabaseClient()
    res = client.get_database_all()

SeoClient¶

This example retrieves the JSON-LD for a dataset page.

from ddipy.seo_client import SeoClient

if __name__ == '__main__':
    client = SeoClient()
    res = client.get_seo_dataset("pride", "PXD000210")

This example retrieves the JSON-LD for the home page.

from ddipy.seo_client import SeoClient

if __name__ == '__main__':
    client = SeoClient()
    res = client.get_seo_home()

StatisticsClient¶

This example queries statistics on the number of datasets per tissue.

from ddipy.statistics_client import StatisticsClient

if __name__ == '__main__':
    client = StatisticsClient()
    res = client.get_statistics_tissues(20)

This example queries statistics on the number of datasets per disease.

from ddipy.statistics_client import StatisticsClient

if __name__ == '__main__':
    client = StatisticsClient()
    res = client.get_statistics_diseases(20)

TermClient¶

This example searches the dictionary for terms matching a pattern.

from ddipy.term_client import TermClient

if __name__ == '__main__':
    client = TermClient()
    res = client.get_term_by_pattern("hom", 10)

This example retrieves frequent terms from the repository.

from ddipy.term_client import TermClient

if __name__ == '__main__':
    client = TermClient()
    res = client.get_term_frequently_term_list("pride", "description", 20)

Structure¶

DataSetResult¶

DataSetResult Structure¶
Name	Type
datasets	array[DatasetSummary]
facets	array[Facet]
count	integer

DatasetSummary¶

DatasetSummary Structure¶
Name	Type
accession	string
database	string
title	string
description	string
dates	Date
scores	Score
keywords	array[string]
omics_type	array[string]
organisms	array[Organism]
cross_references	any
files	array[string]
additional	any
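
To illustrate how the nested fields in the table above fit together, here is a small sketch that builds a one-line description from a DatasetSummary. It assumes the summary is available as a plain dict with the field names listed above; `describe` is a hypothetical helper, and the sample values are placeholders rather than real OmicsDI output.

```python
from typing import Any, Dict


def describe(summary: Dict[str, Any]) -> str:
    """Build a one-line description from a DatasetSummary-shaped dict."""
    # organisms is an array of Organism records (acc, name).
    organisms = ", ".join(o.get("name", "?") for o in summary.get("organisms") or [])
    omics = "/".join(summary.get("omics_type") or [])
    # dates is a Date record (publication, submission, update).
    published = (summary.get("dates") or {}).get("publication", "unknown")
    return (f'{summary["database"]}:{summary["accession"]} '
            f'[{omics}] {summary["title"]} ({organisms}; published {published})')


# Placeholder values for illustration only.
example = {
    "accession": "PXD000210",
    "database": "pride",
    "title": "Placeholder title",
    "dates": {"publication": "2000-01-01"},
    "omics_type": ["Proteomics"],
    "organisms": [{"acc": "9606", "name": "Homo sapiens"}],
}
print(describe(example))
```

Optional fields are read defensively, so a summary that lacks `organisms`, `omics_type`, or `dates` still produces a description.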

Date¶

Date Structure¶
Name	Type
publication	string
submission	string
update	string

Score¶

Score Structure¶
Name	Type
citationCount	integer
reanalysisCount	integer
searchCount	integer
viewCount	integer
connectionsCount	integer
downloadCount	integer

Organism¶

Organism Structure¶
Name	Type
acc	string
name	string

FacetValue¶

FacetValue Structure¶
Name	Type
label	string
count	string
value	string

BatchDataset¶

BatchDataset Structure¶
Name	Type
failure	array[Failure]
datasets	array[DatasetSummary]

Failure¶

Failure Structure¶
Name	Type
database	string
accession	string
name	string
source_url	string
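
A batch result carries both the datasets that were found and the entries that failed, so callers usually want to separate the two. The sketch below assumes a BatchDataset is available as a plain dict with the `datasets` and `failure` keys listed above; `split_batch` is a hypothetical helper, and the failing accession is made up.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[str, str]  # (database, accession)


def split_batch(batch: Dict[str, Any]) -> Tuple[Dict[Key, Dict[str, Any]], List[Key]]:
    """Return the found datasets keyed by (database, accession) and the keys that failed."""
    found = {(d["database"], d["accession"]): d for d in batch.get("datasets") or []}
    failed = [(f["database"], f["accession"]) for f in batch.get("failure") or []]
    return found, failed


# Placeholder batch for illustration only.
batch = {
    "datasets": [{"database": "pride", "accession": "PXD000210", "title": "Placeholder"}],
    "failure": [{"database": "pride", "accession": "FAKE0001"}],
}
found, failed = split_batch(batch)
print(sorted(found), failed)
```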

DatabaseDetail¶

DatabaseDetail Structure¶
Name	Type
repository	string
orcid_name	string
url_template	string
accession_prefix	array[string]
title	string
img_alt	string
source_url	string
description	string
domain	string
image	array[byte]
icon	string
source	string
database_name	string

DictWord¶

DictWord Structure¶
Name	Type
total_count	integer
items	array[Item]

Item¶

Item Structure¶
Name	Type
name	string

Term¶

Term Structure¶
Name	Type
frequent	string
label	string

StructuredDataGraph¶

StructuredDataGraph Structure¶
Name	Type
graph	array[StructuredData]

StructuredData¶

StructuredData Structure¶
Name	Type
logo	string
alternateName	string
potentialAction	StructuredDataAction
variableMeasured	string
sameAs	string
creator	array[StructuredDataAuthor]
citation	StructuredDataCitation
email	string
keywords	string
primaryImageOfPage	StructuredDataImage
description	string
image	string
name	string
context	string
type	string
url	string
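
The `context` and `type` fields above presumably correspond to the JSON-LD keywords `@context` and `@type`, and `graph` in StructuredDataGraph to `@graph`. That mapping is an assumption, not something this page states. Under that assumption, the sketch below turns a StructuredData-shaped dict into a JSON-LD document; `to_jsonld` is a hypothetical helper and the sample values are placeholders.

```python
import json

# Assumed mapping from the field names in the tables to JSON-LD keywords.
JSONLD_KEYS = {"context": "@context", "type": "@type", "graph": "@graph"}


def to_jsonld(value):
    """Recursively rename keys to JSON-LD keywords and drop fields set to None."""
    if isinstance(value, dict):
        return {JSONLD_KEYS.get(k, k): to_jsonld(v)
                for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [to_jsonld(v) for v in value]
    return value


# Placeholder page description for illustration only.
page = {
    "context": "http://schema.org",
    "type": "WebSite",
    "name": "Placeholder",
    "url": None,
    "potentialAction": {"type": "SearchAction", "target": "placeholder"},
}
doc = to_jsonld(page)
print(json.dumps(doc, indent=2))
```

If the client already returns keys with the `@` prefix, the renaming step is a no-op and only the removal of empty fields applies.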

StructuredDataAction¶

StructuredDataAction Structure¶
Name	Type
query_input	string
type	string
target	string

StructuredDataAuthor¶

StructuredDataAuthor Structure¶
Name	Type
name	string
type	string

StructuredDataCitation¶

StructuredDataCitation Structure¶
Name	Type
author	StructuredDataAuthor
publisher	StructuredDataAuthor
name	string
type	string
url	string

StructuredDataImage¶

StructuredDataImage Structure¶
Name	Type
author	string
contentUrl	string
contentLocation	string
type	string

StatRecord¶

StatRecord Structure¶
Name	Type
label	string
name	string
value	string
id	string
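
Because `value` is typed as a string in the table above, sorting records directly would compare text ("5" sorts after "12"). The sketch below ranks StatRecord entries numerically. It assumes each record is a plain dict and that `value` holds an integer in string form; `top_records` is a hypothetical helper and the sample counts are made up.

```python
from typing import Dict, List


def top_records(records: List[Dict[str, str]], n: int = 3) -> List[Dict[str, str]]:
    """Return the n records with the largest numeric value, largest first."""
    return sorted(records, key=lambda r: int(r["value"]), reverse=True)[:n]


# Made-up counts for illustration only.
records = [
    {"name": "a", "value": "5"},
    {"name": "b", "value": "12"},
    {"name": "c", "value": "7"},
]
top_names = [r["name"] for r in top_records(records, n=2)]
print(top_names)
```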

DomainStats¶

DomainStats Structure¶
Name	Type
domain	StatRecord
subdomains	array[DomainStats]
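
DomainStats is recursive: each node holds one StatRecord and a list of child DomainStats. A depth-first walk is one way to flatten it. The sketch below assumes the structure is available as nested dicts and that `value` holds an integer in string form; `walk_domains` is a hypothetical helper and the tree is made up.

```python
from typing import Any, Dict, Iterator, Tuple


def walk_domains(node: Dict[str, Any], depth: int = 0) -> Iterator[Tuple[int, str, int]]:
    """Yield (depth, name, value) for a DomainStats node and all of its subdomains."""
    record = node["domain"]
    yield depth, record["name"], int(record["value"])
    for child in node.get("subdomains") or []:
        yield from walk_domains(child, depth + 1)


# Made-up tree for illustration only.
tree = {
    "domain": {"name": "root", "value": "30"},
    "subdomains": [
        {"domain": {"name": "repo_a", "value": "10"}, "subdomains": []},
        {"domain": {"name": "repo_b", "value": "20"}, "subdomains": None},
    ],
}
rows = list(walk_domains(tree))
for depth, name, value in rows:
    print("  " * depth + f"{name}: {value}")
```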

StatOmicsRecord¶

StatOmicsRecord Structure¶
Name	Type
proteomics	string
transcriptomics	string
genomics	string
metabolomics	string
year	string
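
All StatOmicsRecord fields are typed as strings, so they need converting before any arithmetic. The sketch below computes the total number of datasets per year. It assumes each record is a plain dict with the fields listed above, each holding an integer in string form; `totals_by_year` and `OMICS_FIELDS` are hypothetical names and the counts are made up.

```python
from typing import Dict, List

OMICS_FIELDS = ("proteomics", "transcriptomics", "genomics", "metabolomics")


def totals_by_year(records: List[Dict[str, str]]) -> Dict[int, int]:
    """Sum the per-omics counts of each record, keyed by year in ascending order."""
    totals = {}
    for record in records:
        totals[int(record["year"])] = sum(int(record[field]) for field in OMICS_FIELDS)
    return dict(sorted(totals.items()))


# Made-up counts for illustration only.
records = [
    {"year": "2002", "proteomics": "1", "transcriptomics": "2", "genomics": "3", "metabolomics": "4"},
    {"year": "2001", "proteomics": "5", "transcriptomics": "6", "genomics": "7", "metabolomics": "8"},
]
totals = totals_by_year(records)
print(totals)
```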