ddipy: Python package

A Python package for retrieving data from the Omics Discovery Index (OmicsDI). It does so through the RESTful web services provided by OmicsDI WS.

Installation

Install ddipy with pip:

 pip install ddipy

Client Documents

| Client | Method | Result Structure | Description |
| --- | --- | --- | --- |
| DatasetClient | search | DataSetResult | Search for datasets in the resource |
| | get_dataset_details | DatasetSummary | Retrieve a specific dataset |
| | get_dataset_files | array[string] | Retrieve the list of a dataset's files by position |
| | batch | BatchDataset | Retrieve a batch of datasets |
| | latest | DataSetResult | Retrieve the latest datasets in the repository |
| | most_accessed | DataSetResult | Retrieve the most accessed datasets |
| | get_file_links | array[string] | Retrieve all file links for a given dataset |
| | get_similar | DataSetResult | Retrieve the datasets related to a given dataset |
| | get_similar_by_pubmed | array[DatasetSummary] | Retrieve all similar datasets based on a PubMed ID |
| DatabaseClient | get_database_all | array[DatabaseDetail] | Get details of all databases |
| SeoClient | get_seo_home | StructuredDataGraph | Retrieve JSON-LD for the home page |
| | get_seo_search | StructuredData | Retrieve JSON-LD for the browse page |
| | get_seo_api | StructuredData | Retrieve JSON-LD for the API page |
| | get_seo_database | StructuredData | Retrieve JSON-LD for the databases page |
| | get_seo_dataset | StructuredData | Retrieve JSON-LD for a dataset page |
| | get_seo_about | StructuredData | Retrieve JSON-LD for the about page |
| TermClient | get_term_by_pattern | DictWord | Search dictionary terms |
| | get_term_frequently_term_list | Term | Retrieve frequent terms from the repository |
| StatisticsClient | get_statistics_organisms | array[StatRecord] | Return the number of datasets per organism |
| | get_statistics_tissues | array[StatRecord] | Return the number of datasets per tissue |
| | get_statistics_omics | array[StatRecord] | Return the number of datasets per omics type |
| | get_statistics_diseases | array[StatRecord] | Return the number of datasets per disease |
| | get_statistics_domains | array[DomainStats] | Return the number of datasets per repository |
| | get_statistics_omics_by_year | array[StatOmicsRecord] | Return the number of datasets by omics type over the most recent 5 years |

Examples

DatasetClient

This example shows how to retrieve the details of a single dataset with ddipy.

from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    res = client.get_dataset_details("pride", "PXD000210", False)

This example searches for datasets matching "cancer human", sorted by publication date in ascending order, and returns the first 20 datasets.

from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    res = client.search("cancer human", "publication_date", "ascending")

This example searches for 30 datasets matching "cancer human", skipping the first 1200 datasets.

from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    res = client.search("cancer human", "publication_date", "ascending", 1200, 30, 20)
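
The skip and size arguments make it possible to walk a long result list page by page. The following is a minimal sketch of such a loop. It assumes the positional order seen in the call above is query, sort field, order, start offset, page size and facet count. `iter_pages` and `fake_search` are hypothetical helpers that are not part of ddipy; `fake_search` stands in for `DatasetClient().search` so the snippet runs without network access.

```python
from typing import Any, Callable, Iterator


def iter_pages(search: Callable[..., Any], query: str, total: int,
               size: int = 30) -> Iterator[Any]:
    """Call `search` once per page until `total` records are covered."""
    for start in range(0, total, size):
        # Assumed positional order: query, sort field, order, start, size, facet count.
        yield search(query, "publication_date", "ascending", start, size, 20)


def fake_search(query, sort_field, order, start, size, face_count):
    """Hypothetical stand-in for DatasetClient().search; echoes the window."""
    return {"query": query, "start": start, "size": size}


pages = list(iter_pages(fake_search, "cancer human", total=100, size=30))
print([page["start"] for page in pages])  # → [0, 30, 60, 90]
```

To use it against the live service, pass `DatasetClient().search` instead of `fake_search`, provided the argument order assumed here matches the installed version.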

This example retrieves all datasets that report the UniProt protein P21399 as identified.

from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    res = client.search("UNIPROT:P21399")

This example finds all datasets in which the gene ENSG00000147251 is reported as differentially expressed.

from ddipy.dataset_client import DatasetClient

if __name__ == '__main__':
    client = DatasetClient()
    res = client.search("ENSEMBL:ENSG00000147251")

DatabaseClient

This example retrieves all databases recorded in OmicsDI.

from ddipy.database_client import DatabaseClient

if __name__ == '__main__':
    client = DatabaseClient()
    res = client.get_database_all()

SeoClient

This example retrieves the JSON-LD for a dataset page.

from ddipy.seo_client import SeoClient

if __name__ == '__main__':
    client = SeoClient()
    res = client.get_seo_dataset("pride", "PXD000210")

This example retrieves the JSON-LD for the home page.

from ddipy.seo_client import SeoClient

if __name__ == '__main__':
    client = SeoClient()
    res = client.get_seo_home()

StatisticsClient

This example queries statistics about the number of datasets per tissue.

from ddipy.statistics_client import StatisticsClient

if __name__ == '__main__':
    client = StatisticsClient()
    res = client.get_statistics_tissues(20)

This example queries statistics about the number of datasets per disease.

from ddipy.statistics_client import StatisticsClient

if __name__ == '__main__':
    client = StatisticsClient()
    res = client.get_statistics_diseases(20)

TermClient

This example searches dictionary terms.

from ddipy.term_client import TermClient

if __name__ == '__main__':
    client = TermClient()
    res = client.get_term_by_pattern("hom", 10)

This example retrieves frequent terms from the repository.

from ddipy.term_client import TermClient

if __name__ == '__main__':
    client = TermClient()
    res = client.get_term_frequently_term_list("pride", "description", 20)

Structure

DataSetResult

DataSetResult Structure
Name Type
datasets array[DatasetSummary]
facets array[Facet]
count integer

DatasetSummary

DatasetSummary Structure
Name Type
accession string
database string
title string
description string
dates Date
scores Score
keywords array[string]
omics_type array[string]
organisms array[Organism]
cross_references any
files array[string]
additional any
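
The two structures above nest: a DataSetResult carries a list of DatasetSummary records. The following sketch models a subset of those fields with dataclasses, assuming the result is available as a plain dictionary keyed by the field names listed above. The `parse_result` helper and the sample payload are hypothetical; only the accession and database values come from the earlier examples.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class DatasetSummary:
    accession: str
    database: str
    title: str = ""
    keywords: List[str] = field(default_factory=list)
    omics_type: List[str] = field(default_factory=list)


@dataclass
class DataSetResult:
    count: int = 0
    datasets: List[DatasetSummary] = field(default_factory=list)


def parse_result(payload: Dict[str, Any]) -> DataSetResult:
    """Build a DataSetResult from a dict, ignoring fields not modeled here."""
    known = {f.name for f in fields(DatasetSummary)}
    datasets = [
        DatasetSummary(**{k: v for k, v in item.items() if k in known})
        for item in payload.get("datasets", [])
    ]
    return DataSetResult(count=payload.get("count", 0), datasets=datasets)


# Hypothetical payload; the title and omics type are placeholders.
sample = {
    "count": 1,
    "datasets": [
        {
            "accession": "PXD000210",
            "database": "pride",
            "title": "placeholder title",
            "omics_type": ["Proteomics"],
            "files": [],
        }
    ],
}

result = parse_result(sample)
print(result.count, result.datasets[0].accession)
```

Dropping unmodeled keys keeps the sketch tolerant of fields such as cross_references and additional, whose type is listed as any.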

Date

Date Structure
Name Type
publication string
submission string
update string

Score

Score Structure
Name Type
citationCount integer
reanalysisCount integer
searchCount integer
viewCount integer
connectionsCount integer
downloadCount integer

Organism

Organism Structure
Name Type
acc string
name string

Facet

Facet Structure
Name Type
facet_values array[FacetValue]
label string
total integer
id string

FacetValue

FacetValue Structure
Name Type
label string
count string
value string
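
Note that FacetValue lists count as a string, so it usually needs converting before sorting or summing. A minimal sketch, assuming a Facet is available as a dictionary with the field names above; `facet_counts` and the sample facet are hypothetical and the numbers are illustrative only.

```python
from typing import Any, Dict


def facet_counts(facet: Dict[str, Any]) -> Dict[str, int]:
    """Map each facet value's label to its count as an integer."""
    return {fv["label"]: int(fv["count"]) for fv in facet.get("facet_values", [])}


# Hypothetical facet; labels and counts are made up for illustration.
sample_facet = {
    "id": "omics_type",
    "label": "Omics type",
    "total": 2,
    "facet_values": [
        {"label": "Proteomics", "count": "120", "value": "Proteomics"},
        {"label": "Genomics", "count": "45", "value": "Genomics"},
    ],
}

counts = facet_counts(sample_facet)
print(max(counts, key=counts.get))  # → Proteomics
```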

BatchDataset

BatchDataset Structure
Name Type
failure array[Failure]
datasets array[DatasetSummary]

Failure

Failure Structure
Name Type
database string
accession string
name string
source_url string
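
A batch result separates the datasets that were found from the requests that failed, so callers typically handle the two lists differently. The sketch below assumes a BatchDataset is available as a dictionary with the field names above. `split_batch` is a hypothetical helper and the failing accession is a made-up placeholder.

```python
from typing import Any, Dict, List, Tuple


def split_batch(batch: Dict[str, Any]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return found accessions and (database, accession) pairs that failed."""
    found = [d["accession"] for d in batch.get("datasets", [])]
    failed = [(f["database"], f["accession"]) for f in batch.get("failure", [])]
    return found, failed


# Hypothetical batch; "PXD-MISSING" is a placeholder, not a real accession.
sample_batch = {
    "datasets": [{"accession": "PXD000210", "database": "pride"}],
    "failure": [
        {"database": "pride", "accession": "PXD-MISSING", "name": "", "source_url": ""}
    ],
}

found, failed = split_batch(sample_batch)
print(found, failed)
```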

DatabaseDetail

DatabaseDetail Structure
Name Type
repository string
orcid_name string
url_template string
accession_prefix array[string]
title string
img_alt string
source_url string
description string
domain string
image array[byte]
icon string
source string
database_name string

DictWord

DictWord Structure
Name Type
total_count integer
items array[Item]

Item

Item Structure
Name Type
name string

Term

Term Structure
Name Type
frequent string
label string

StructuredDataGraph

StructuredDataGraph Structure
Name Type
graph array[StructuredData]

StructuredData

StructuredData Structure
Name Type
logo string
alternateName string
potentialAction StructuredDataAction
variableMeasured string
sameAs string
creator array[StructuredDataAuthor]
citation StructuredDataCitation
email string
keywords string
primaryImageOfPage StructuredDataImage
description string
image string
name string
context string
type string
url string

StructuredDataAction

StructuredDataAction Structure
Name Type
query_input string
type string
target string

StructuredDataAuthor

StructuredDataAuthor Structure
Name Type
name string
type string

StructuredDataCitation

StructuredDataCitation Structure
Name Type
author StructuredDataAuthor
publisher StructuredDataAuthor
name string
type string
url string

StructuredDataImage

StructuredDataImage Structure
Name Type
author string
contentUrl string
contentLocation string
type string

StatRecord

StatRecord Structure
Name Type
label string
name string
value string
id string

DomainStats

DomainStats Structure
Name Type
domain StatRecord
subdomains array[DomainStats]
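
Because subdomains is itself a list of DomainStats, the structure is a tree and is most easily consumed recursively. A minimal sketch, assuming each node is a dictionary with the fields above and that the StatRecord value holds a numeric string; `walk_domains` and the sample tree are hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple


def walk_domains(node: Dict[str, Any], depth: int = 0) -> Iterator[Tuple[int, str, int]]:
    """Yield (depth, label, value) for a node and all of its subdomains."""
    record = node["domain"]
    yield depth, record["label"], int(record["value"])
    # Treat a missing or null subdomains field as an empty list.
    for child in node.get("subdomains") or []:
        yield from walk_domains(child, depth + 1)


# Hypothetical tree; labels and values are made up for illustration.
sample_tree = {
    "domain": {"id": "root", "name": "root", "label": "All", "value": "30"},
    "subdomains": [
        {
            "domain": {"id": "a", "name": "a", "label": "Repo A", "value": "20"},
            "subdomains": [],
        },
        {
            "domain": {"id": "b", "name": "b", "label": "Repo B", "value": "10"},
            "subdomains": None,
        },
    ],
}

rows = list(walk_domains(sample_tree))
for depth, label, value in rows:
    print("  " * depth + f"{label}: {value}")
```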

StatOmicsRecord

StatOmicsRecord Structure
Name Type
proteomics string
transcriptomics string
genomics string
metabolomics string
year string
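
All StatOmicsRecord fields are listed as strings, including the per-omics counts. The sketch below sums them per year, assuming each record is a dictionary with the fields above and that the counts are numeric strings; `totals_by_year` and the sample records are hypothetical and the figures are invented.

```python
from typing import Dict, Iterable

OMICS_FIELDS = ("proteomics", "transcriptomics", "genomics", "metabolomics")


def totals_by_year(records: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Sum the four omics counts of each record, keyed by year."""
    return {
        record["year"]: sum(int(record[name]) for name in OMICS_FIELDS)
        for record in records
    }


# Hypothetical records; the counts are illustrative, not real statistics.
sample_records = [
    {"year": "2019", "proteomics": "10", "transcriptomics": "20",
     "genomics": "5", "metabolomics": "1"},
    {"year": "2020", "proteomics": "12", "transcriptomics": "25",
     "genomics": "7", "metabolomics": "3"},
]

totals = totals_by_year(sample_records)
print(totals)  # → {'2019': 36, '2020': 47}
```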