CBDD stands for ‘Computational Biology for Drug Discovery’. It is a pre-competitive consortium of Clarivate Analytics (formerly the IP & Science business of Thomson Reuters) and several leading pharmaceutical companies, aimed at evaluating and uniformly implementing algorithms for network and pathway analysis of molecular data sets.
Selecting an appropriate network analysis algorithm is a challenge. In the last few years, dozens of network and pathway analysis approaches have been published in the peer-reviewed literature. However, using them requires understanding the inputs they need (data and network), the assumptions they make about the biology, and the goals they are intended to achieve. Furthermore, each new algorithm has its own learning curve, and implementations are not always readily available.
The goal of CBDD is, first, to select state-of-the-art algorithms that could be beneficial for drug development research and, second, to implement them in a uniform fashion, providing a robust and easy-to-use software package.
The CBDD toolkit is designed to enable seamless analysis of diverse data sets within the context of any networks available to the user. A typical computational workflow in systems biology involves three steps: obtaining the molecular data, generating a network, and selecting an analysis algorithm.
The molecular data for analysis are usually readily available (either from the public domain or generated in-house) and have a relatively simple structure. CBDD helps with the two more complex steps, namely network generation and algorithm selection.
The collection of algorithms implemented during the CBDD program is available first and foremost as an R package. A web application is available for users who prefer a visual interface to the command line.
The functionality of the CBDD R package can be roughly classified as follows:
CBDD does not end with the R package. Deliverables also include a GUI version of CBDD, which gives users access to the algorithms through a web interface.
In addition, the collaborative nature of the CBDD program gives members opportunities to exchange ideas and best practices in network analysis.
The following sections describe each of these pieces of functionality. A more detailed description of the algorithm areas and other features is available here.
Network generation is a more challenging area. There are two major approaches to creating large-scale networks that describe relationships in biological systems:
Both approaches have their advantages. CBDD provides infrastructure to load pre-existing networks from text files and from Clarivate Analytics’ MetaBase. In addition, several algorithms are available for data-driven generation and modification of networks:
Area | Description |
---|---|
Data-driven network | De novo generation of relationships between biological entities (e.g. from similarities in gene expression profiles). |
Network adjustment | Weighting and adjustment of networks (e.g. based on tissue expression data), making them more specific to a particular biological context. |
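The two areas in the table above can be illustrated with a minimal Python sketch that is independent of the CBDD R package and does not reproduce its API. It assumes Pearson correlation as the similarity measure and a simple "minimum tissue expression of the two end nodes" rule for edge weighting; the function names, the threshold, and the toy values are all hypothetical choices made for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_network(profiles, threshold=0.8):
    """Data-driven network: link genes whose profiles correlate strongly."""
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


def adjust_by_tissue(edges, tissue_expression):
    """Network adjustment: scale each edge by the lower tissue expression
    of its two end nodes (an assumed, deliberately simple weighting rule)."""
    return {
        (a, b): abs(w) * min(tissue_expression.get(a, 0.0),
                             tissue_expression.get(b, 0.0))
        for (a, b), w in edges.items()
    }


# Toy expression profiles (hypothetical values, four samples per gene).
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],  # proportional to GENE_A
    "GENE_C": [4.0, 1.0, 3.0, 2.0],  # only weakly related to the others
}
network = coexpression_network(profiles, threshold=0.8)

# Hypothetical tissue expression levels scaled to the range [0, 1].
tissue = {"GENE_A": 1.0, "GENE_B": 0.5, "GENE_C": 0.0}
weighted = adjust_by_tissue(network, tissue)
```

In this toy example only the GENE_A and GENE_B pair passes the correlation threshold, and its weight is then reduced because GENE_B is assumed to be expressed at half the level of GENE_A in the tissue of interest. Real data would call for a more careful choice of similarity measure and threshold.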
Network analysis algorithms differ in what they do and can be used for different purposes. It is often hard to classify them into precisely defined categories. CBDD uses a classification based on the end goal, which roughly corresponds to typical research needs in drug development:
Area | Description | Example purpose |
---|---|---|
Node prioritization | Learn which nodes in the network are well connected to the nodes of interest and might regulate the phenotype. | Drug target identification |
Subnetwork prioritization | Find modules in networks that are associated with the phenotype. | Mechanism reconstruction; Biomarker discovery |
Pathway prioritization (coming soon) | Learn which canonical signaling pathways are associated with the phenotype; this gives good clues to the underlying molecular mechanisms and may help with biomarker search. | Mechanism reconstruction; Biomarker discovery |
Unsupervised analysis (coming soon) | Learn how patients stratify into subtypes and which networks and pathways drive this stratification. | Patient stratification |
Integrative analysis | Many omics data types are routinely available, and there is often a need to understand how they relate to one another. Networks can be used to answer questions such as ‘which mutations in my dataset affect the differential expression in disease?’. | Any purpose |
Network comparison | Compare mechanisms underlying different diseases or disease models. | Mechanism reconstruction |
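As an illustration of the node prioritization area in the table above, the following Python sketch scores nodes by random walk with restart, a widely used technique for ranking nodes by their proximity to a set of nodes of interest. It is not taken from the CBDD R package, and nothing here implies that CBDD implements this particular method; the function name, parameters, and toy network are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Score every node by its proximity to the seed nodes.

    adjacency: dict mapping each node to a list of its neighbours
               (undirected edges must be listed in both directions).
    seeds:     set of nodes of interest; each must be a key of adjacency.
    """
    nodes = sorted(adjacency)
    # The walker restarts at a seed chosen uniformly at random.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            moving = (1.0 - restart) * scores[n]
            if not neighbours:
                # Isolated node: send its probability mass back to the seeds.
                for s in seeds:
                    updated[s] += moving / len(seeds)
                continue
            for m in neighbours:
                updated[m] += moving / len(neighbours)
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Toy undirected network (hypothetical node names).
adjacency = {
    "TARGET_1": ["HUB"],
    "TARGET_2": ["HUB"],
    "HUB": ["TARGET_1", "TARGET_2", "DISTANT"],
    "DISTANT": ["HUB", "LEAF"],
    "LEAF": ["DISTANT"],
}
seeds = {"TARGET_1", "TARGET_2"}

scores = random_walk_with_restart(adjacency, seeds)

# Rank the non-seed nodes: the closer a node is to the seeds, the higher it ranks.
candidates = sorted((n for n in scores if n not in seeds),
                    key=scores.get, reverse=True)
```

The restart probability controls how local the ranking is: higher values keep the walker closer to the seed nodes, while lower values let the scores spread further through the network.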
This block of features in the R package includes functionality intended to help make sense of the results produced by the algorithms, namely:
The CBDD program includes leadership events dedicated to exchanging ideas on how best to apply network analysis tools to drug development tasks.
In addition, a tutorials section has been added to the CBDD website, allowing registered CBDD users to share their work on different applications of network analysis and to provide useful tips and tricks to the rest of the community.
In addition to the R package, CBDD functionality is also available as a web service that provides a GUI for the algorithms. The web service is not hosted by Clarivate Analytics; instead, it should be deployed on the members’ own IT infrastructure. The main purpose of this software extension is to enable more users (beyond R users) to use the CBDD network algorithms and explore the results in a more familiar graphical interface.
The GUI has the following features:
Interactive network visualizations available in the CBDD R package are plugged into the GUI in appropriate places (e.g. viewing subnetworks from the algorithm result page).
Algorithms and other features of CBDD are selected by a collective vote of CBDD members.
The priorities in the development of CBDD are as follows: