Overview

CBDD stands for ‘Computational Biology for Drug Discovery’. It is a pre-competitive consortium between Clarivate Analytics (formerly the IP & Science business of Thomson Reuters) and several leading pharmaceutical companies, aimed at the evaluation and uniform implementation of algorithms for network and pathway analysis of molecular data sets.

Selecting an appropriate network analysis algorithm is a challenge. In the last few years, dozens of network and pathway analysis approaches have been published in the peer-reviewed literature. However, using them requires understanding their required inputs (data and network), the assumptions they make about the biology, and the goals they are intended to achieve. Furthermore, each new algorithm comes with its own learning curve, and implementations are not always readily available.

The goal of CBDD is, first, to select the state-of-the-art algorithms that could be beneficial for drug development research and, second, to implement them in a uniform fashion, providing a robust and easy-to-use software package.

The CBDD toolkit is designed to enable seamless analysis of diverse data sets within the context of any networks available to the user. A typical design of a computational workflow in systems biology looks like the following:

  1. Gather molecular data to analyze;
  2. Collect an appropriate molecular network;
  3. Select the algorithm intended to solve a particular research problem;
  4. Run the algorithm;
  5. Interpret the results to get insight about the data.
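The five steps above can be sketched in a few lines of Python. This is only an illustration of the workflow shape, not the CBDD R package API: the function names, the tab-delimited edge-list format, the gene list, and the toy scoring rule (fraction of a node's neighbors that are hits) are all assumptions made for this example.

```python
import io
from collections import defaultdict


def load_edges(handle):
    """Read a tab-delimited edge list (source<TAB>target) into an undirected adjacency map."""
    adjacency = defaultdict(set)
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, target = line.split("\t")[:2]
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


def neighbor_hit_fraction(adjacency, hits):
    """Toy algorithm (an assumption for this sketch): fraction of each node's neighbors that are hits."""
    return {node: len(nbrs & hits) / len(nbrs) for node, nbrs in adjacency.items()}


# 1. Gather molecular data: a hypothetical list of genes of interest.
hits = {"TP53", "MDM2", "CDKN1A"}

# 2. Collect a network: here an in-memory stand-in for a text file of edges.
network_text = "TP53\tMDM2\nTP53\tCDKN1A\nMDM2\tMDM4\nCDKN1A\tCCND1\nCCND1\tCDK4\n"
adjacency = load_edges(io.StringIO(network_text))

# 3-4. Select and run the algorithm.
scores = neighbor_hit_fraction(adjacency, hits)

# 5. Interpret: rank the nodes that were not in the input list.
ranked = sorted((n for n in scores if n not in hits), key=lambda n: (-scores[n], n))
print(ranked)
```

Keeping the network loader, the algorithm, and the ranking step as separate functions mirrors the workflow: any one of them can be swapped without touching the others.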

The molecular data for analysis are typically readily available (either from the public domain or generated in-house) and have a relatively simple structure. CBDD helps with the two more complex steps, namely network generation and algorithm selection.

The collection of algorithms implemented during the CBDD program is available first and foremost as an R package. A web application is available for users who prefer a visual interface over the command line.

Functionality

The functionality of the CBDD R package can be roughly classified as follows:

CBDD does not end with the R package. Deliverables include a GUI version of CBDD, giving users access to the algorithms via a web interface.

In addition, the collaborative nature of the CBDD program provides opportunities for members to exchange ideas and best practices in network analysis.

The following sections describe each of these pieces of functionality. A more detailed description of the algorithm areas and other features is available here.

Build network

Network generation is a more challenging area. There are two major approaches to creating large-scale networks that describe relationships in biological systems:

Both approaches have their advantages. CBDD provides infrastructure to load pre-existing networks from text files and from Clarivate Analytics’ MetaBase. In addition, several algorithms are available for data-driven generation and modification of networks:

Area | Description
Data-driven network | De novo generation of relationships between biological entities (e.g. from similarities in gene expression profiles).
Network adjustment | Weighting and adjustment of networks (e.g. based on tissue expression data), making them more specific to a particular biological context.
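To make the two areas in the table above concrete, here is a minimal Python sketch. It assumes one common approach to each: a co-expression network built by thresholding Pearson correlation between expression profiles, and edge weights scaled by the lower tissue expression of the two endpoints. The function names, the threshold, the gene labels, and the numbers are hypothetical; the source does not say which methods CBDD actually implements.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles; returns 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpression_network(profiles, threshold=0.9):
    """Data-driven network: link two genes when |correlation| >= threshold."""
    edges = {}
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene_a], profiles[gene_b])
        if abs(r) >= threshold:
            edges[(gene_a, gene_b)] = r
    return edges


def adjust_for_tissue(edges, tissue_expression):
    """Network adjustment: scale |weight| by the lower tissue expression (0..1) of the endpoints."""
    return {
        (a, b): abs(w) * min(tissue_expression.get(a, 0.0), tissue_expression.get(b, 0.0))
        for (a, b), w in edges.items()
    }


# Hypothetical expression profiles across four samples.
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 3.0, 2.0, 3.0],
}
edges = coexpression_network(profiles)

# Hypothetical normalized expression in the tissue of interest (missing genes count as 0).
tissue_expression = {"GENE_A": 1.0, "GENE_B": 0.5, "GENE_C": 0.0}
adjusted = adjust_for_tissue(edges, tissue_expression)
print(sorted(adjusted.items()))
```

In this sketch an edge keeps its full weight only when both genes are expressed in the tissue, so edges touching a gene that is silent there drop to zero.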

Analyze

Network analysis algorithms do different things and can be used for different purposes. It is often hard to classify them into precisely defined categories. CBDD uses a classification based on the end goal (roughly corresponding to typical research needs in drug development):

Area | Description | Example purpose
Node prioritization | Learn which nodes in the network are well connected to the nodes of interest and might regulate the phenotype. | Drug target identification
Subnetwork prioritization | Find modules in networks which are associated with phenotype. | Mechanism reconstruction; Biomarker discovery
Pathway prioritization (coming soon) | Learning which of the canonical signaling pathways are associated with phenotype provides good clues to the molecular mechanisms behind it and may help with biomarker search. | Mechanism reconstruction; Biomarker discovery
Unsupervised analysis (coming soon) | Learn how patients stratify into subtypes and which networks and pathways drive this stratification. | Patient stratification
Integrative analysis | Many omics data types are routinely available, and there is often a need to understand how they talk to one another. Networks can be utilized for answering questions such as ‘which mutations in my dataset affect the differential expression in disease?’. | Any purpose
Network comparison | Compare mechanisms underlying different diseases or disease models. | Mechanism reconstruction
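As an illustration of the node prioritization area, the sketch below uses random walk with restart, a widely used technique for scoring how well each node is connected to a set of seed nodes. It is an assumption chosen for this example; the source does not state which algorithms CBDD implements for this area. The function name, parameters, and the five-node network are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score nodes by the steady-state probability of a walker that restarts at the seeds.

    adjacency must be undirected and list every node as a key.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * start[n] for n in nodes}
        for node in nodes:
            neighbors = adjacency[node]
            if not neighbors:
                # Isolated node: its probability mass stays in place.
                nxt[node] += (1 - restart) * current[node]
                continue
            # Otherwise the walker moves to a uniformly chosen neighbor.
            share = (1 - restart) * current[node] / len(neighbors)
            for neighbor in neighbors:
                nxt[neighbor] += share
        delta = sum(abs(nxt[n] - current[n]) for n in nodes)
        current = nxt
        if delta < tol:
            break
    return current


# Hypothetical network: A and B are the nodes of interest (seeds).
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
seeds = {"A", "B"}
scores = random_walk_with_restart(adjacency, seeds)

# Candidate nodes ranked by proximity to the seeds.
ranked = sorted((n for n in scores if n not in seeds), key=lambda n: -scores[n])
print(ranked)
```

The restart probability controls how local the scores are: higher values keep the walker close to the seeds, lower values let the network topology dominate.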

Interpret results

This block of features in the R package includes functionality intended to help make sense of the results produced by the algorithms, namely:

Beyond R package

Community

The CBDD program includes leadership events dedicated to the exchange of ideas on how best to apply network analysis tools to drug development tasks.

In addition, a tutorials section has been added to the CBDD website, allowing registered CBDD users to share their work on different applications of network analysis and provide useful tips and tricks to the rest of the community.

GUI

In addition to the R package, CBDD functionality is also available as a web service providing a GUI for the algorithms. The web service is not hosted by Clarivate Analytics; instead, it should be deployed on the members’ IT infrastructure. The main purpose of this software extension is to enable more users (beyond R users) to utilize CBDD network algorithms and explore the results in a more familiar graphical interface.

The GUI has the following features:

  • Load data and networks, store them on the server side, and manage the files in the data and network navigators. Share data with other users;
  • Run one or more algorithms which are applicable to selected data sets with full control of options;
  • Save / load workflows (particular data and particular settings of algorithms used in analysis);
  • Explore, save, export results of algorithms;
  • Compare results of different algorithm runs.

Interactive network visualizations available in the CBDD R package are plugged into the GUI in appropriate places (e.g. to view subnetworks from the algorithm result page).

Algorithm selection and implementation principles

Algorithms and other features of CBDD are selected based on a collective vote of CBDD members.

The priorities in the development of CBDD are as follows: