
Current Computer-Aided Drug Design, 2008, 4, 13-22
Changing Paradigms in Drug Discovery: Scientific Business Intelligence™
and Workflow Solutions
Shikha Varma-O’Brien, Frank K. Brown*, Andrew LeBeau and Robert D. Brown
Accelrys Inc., 10188 Telesis Court, Suite 100, San Diego, CA 92121, USA
Abstract: Workflow solutions driven by data pipelining are becoming increasingly popular for accessing, aggregating and analyzing disparate data to make informed and intelligent decisions. Workflow technologies that facilitate business intelligence (BI) improve productivity, decision making and research efficiency. To provide BI in a scientific or clinical organization, the application or workflow technology must be compatible with multiple data types and formats, be able to analyze the data, and make it available throughout the organization. We term this Scientific Business Intelligence (SBI) and discuss how modeling, simulation and informatics software, integrated with an open, standards-based scientific operating platform (SOP), can deliver scientifically relevant BI solutions. We illustrate SBI with several examples encompassing all levels of users within an organization.
Keywords: Scientific business intelligence, data pipelining, workflows, web deployment, data integration, dashboard, Pipeline Pilot™.
*Address correspondence to this author at Accelrys Inc., 10188 Telesis Court, Suite 100, San Diego, CA 92121, USA; E-mail: fbrown@accelrys.com
1573-4099/08 $55.00+.00
© 2008 Bentham Science Publishers Ltd.

LIMITATIONS OF TRADITIONAL BUSINESS INTELLIGENCE IN THE SCIENTIFIC COMMUNITY
Business Intelligence technologies, including corporate dashboards, hyper-cubes and visualization systems, have been successfully applied in enterprise operations such as sales, marketing and finance [1]. Generally speaking, these BI tools have primarily served executive-level decision makers looking for information about existing products and services. However, the needs of the business community are driving vendors to provide improved analytics and more flexible, configurable interfaces.
Yet even with these changes, the use and relevance of traditional BI technologies within the research and development world has been limited, primarily because they can handle little beyond structured numerical data and lack advanced scientific analysis and drilldown capabilities. For BI technologies to become more relevant and more broadly deployed, they must be able to federate differing types of data more robustly across an enterprise’s operations, providing knowledge workers with the information necessary to increase their productivity and effectiveness. Even with that realization, however, the complexity of the data types and analysis methods used in the scientific community has left the scientific and clinical research markets with unmet needs.

DOWNFALLS OF DATA DISPARITY AND POINT PRODUCTS
Over a decade ago, scientific informatics was transformed by the advent of two new disciplines, bioinformatics and cheminformatics, which were created to change the way scientists mined data and how IT managed that data [2, 3]. The underlying concept has since widened into a more general notion of “omics”-based data mining (e.g., genomics, proteomics). As these disciplines have developed organically, their associated data structures, databases and analysis tools have grown exponentially in both size and disparity, often resulting in disconnected and isolated research efforts. Consequently, software in the materials and pharmaceutical discovery arena has been sold as a series of point applications with several different interfaces. This not only requires users to write complicated scripts that are difficult to reuse across different types and levels of users, but also locks their data into one vendor’s software or another’s, forcing users to write and read files in multiple standard formats to transfer data between applications.

SCIENTIFIC BUSINESS INTELLIGENCE
To fully leverage the vast quantities and types of data within their organizations, scientific and clinical research organizations require a platform that encourages and enables exploration of data across many scientific disciplines. The platform must be able to access and aggregate both structured and unstructured data from multiple sources (files, databases, the internet) into a single environment. It must enable advanced scientific analytics and allow users to integrate the applications and algorithms that work best for them. It must give users the flexibility to view results in the manner most effective for their needs, which may range from web portals to sophisticated 3D visualization. Finally, for users to realize the true value of their data, the platform must be able to deliver precisely the information users seek, exactly when and how they need it, through interactive reports and dashboards. We use the term “Scientific Business Intelligence” to describe technologies that meet these demands, enabling scientists and scientific managers to attain new insights [4].
The three cornerstones of SBI are integration, information delivery, and analysis. Effective delivery and communication of information throughout an organization can take the form of simple reports, web-based user interfaces, and dashboards. Dashboards are a particularly effective delivery mechanism because they allow the user to rapidly process a large amount of information and to drill down to an appropriate level of detail. Effective decision making, in turn, requires high-quality data analysis. The SOP framework provides an extensive array of statistical methods, ranging from simple statistical indicators to advanced statistical modeling methods (e.g., Bayesian statistics). These are typically used to align key performance indicators with a strategic objective and to provide links to related reports and information. These analytical methods are equally applicable to scientific and business process data.
By formalizing and institutionalizing standard scientific research and decision-making tools to the same degree that BI tools have become mainstream technologies, SBI creates foundational approaches and applications that can enable important new initiatives such as translational medicine and medical outcomes research.
Such a technology requires an open, standards-based SOP that enables the integration and aggregation of diverse scientific data and applications. Before this SOP was introduced, disparate tools and data could not be aggregated in a single environment. The SOP has broken new ground by addressing the existing need for a “plug and play” environment. It enables users to readily leverage their preferred technologies, allowing them to meet their own individual needs without the burden of maintaining “home grown” applications. Users benefit from this “plug and play” environment because they are no longer locked into a single vendor relationship and can incorporate “best of breed” components, which in turn challenges vendors to push the envelope of innovation.
By adhering to the concept of Service-Oriented Architecture (SOA) through the provision and consumption of standard web services, such technology can stand on its own or merge into a larger, corporate SOA environment. All the requirements for such an SOP are readily met by our workflow platform, Pipeline Pilot™. Several other workflow management tools can address some of the issues described here; these include, but are not limited to, KDE from Inforsense [5], KNIME, developed at the University of Konstanz, Germany [6], and Taverna [7]. The intent of this paper is not to review the pros and cons of these workflow solutions, but rather to illustrate concepts in SBI, which we do using Pipeline Pilot™. The specific advantage of our SOA approach is that, as an environment, it moves beyond the concept of a proprietary backplane. Instead, in this paradigm, components deliver advanced data aggregation, data mining and scientific analysis functionality from a variety of sources, covering a range of capabilities across multiple scientific disciplines. These components can be joined together into workflows ranging from simple (e.g., a one-click check against Lipinski filters [8]) to extremely complex (e.g., homology modeling or docking workflows).
As illustrated in Fig. 1 and described in the following paragraphs, the power and openness of this architecture allow the delivery of truly user-based solutions via:
1. The ability to select methods and configure them into user-defined workflows.
2. The freedom to choose a client interface for creating and deploying those workflows.
3. The capacity to deliver data and analysis results in interactive reports and dashboards.
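The idea of joining components into workflows, from a one-click check against Lipinski filters upward, can be illustrated with a minimal Python sketch of data pipelining. This is not Pipeline Pilot’s API: it assumes each component is simply a function that consumes a stream of records and yields a stream of records, and every function name, record field (`mw`, `logp`, `hbd`, `hba`) and sample value is hypothetical. The descriptors are assumed to be precomputed rather than calculated from structures, and the filter uses the widely cited rule-of-five thresholds [8] with the common convention of tolerating at most one violation.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
# A component consumes a stream of records and yields a (possibly transformed) stream.
Component = Callable[[Iterable[Record]], Iterator[Record]]


def read_records(rows: List[Record]) -> Component:
    """Reader component: ignores its input and emits copies of in-memory rows
    (a hypothetical stand-in for a file or database reader)."""
    def component(_: Iterable[Record]) -> Iterator[Record]:
        for row in rows:
            yield dict(row)  # copy so downstream components never mutate the source
    return component


def count_lipinski_violations(records: Iterable[Record]) -> Iterator[Record]:
    """Calculator component: annotates each record with its rule-of-five violations."""
    for rec in records:
        rec["violations"] = sum([
            rec["mw"] > 500,   # molecular weight
            rec["logp"] > 5,   # octanol-water partition coefficient
            rec["hbd"] > 5,    # hydrogen-bond donors
            rec["hba"] > 10,   # hydrogen-bond acceptors
        ])
        yield rec


def keep_if(predicate: Callable[[Record], bool]) -> Component:
    """Filter component: passes through only records that satisfy the predicate."""
    def component(records: Iterable[Record]) -> Iterator[Record]:
        return (rec for rec in records if predicate(rec))
    return component


def sort_by(key: str) -> Component:
    """Sort component: must see every record before it can emit any."""
    def component(records: Iterable[Record]) -> Iterator[Record]:
        return iter(sorted(records, key=lambda rec: rec[key]))
    return component


def run_workflow(components: List[Component]) -> List[Record]:
    """Chains components so that each consumes the stream produced by the previous one."""
    stream: Iterable[Record] = iter(())
    for component in components:
        stream = component(stream)
    return list(stream)


# Hypothetical compounds with precomputed descriptors (illustrative values only).
compounds = [
    {"id": "cpd-1", "mw": 342.4, "logp": 2.1, "hbd": 2, "hba": 5},
    {"id": "cpd-2", "mw": 612.7, "logp": 6.3, "hbd": 1, "hba": 9},
    {"id": "cpd-3", "mw": 518.0, "logp": 4.2, "hbd": 3, "hba": 7},
]

# A "one click" workflow: read -> annotate -> filter (at most one violation) -> sort.
workflow = [
    read_records(compounds),
    count_lipinski_violations,
    keep_if(lambda rec: rec["violations"] <= 1),
    sort_by("mw"),
]
passed = run_workflow(workflow)
print([rec["id"] for rec in passed])  # ['cpd-1', 'cpd-3']
```

Because every component shares one signature, a user-defined workflow is just an ordered list that can be rearranged or extended; the sorting step is included to show that some components must consume the whole stream before emitting anything.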
Fig. (1). A scientific operating platform based on an SOA model enables users to pick “best of breed” components from multiple vendors and configure the methods and interface to suit their needs. The new approach is highly customizable to the user’s process, enabling the process to drive the informatics solution and not vice versa. This allows the user to attain a competitive advantage by optimizing their process and then layering in the IT solutions.

COMPONENTIZATION, WORKFLOWS, AND AUTOMATION
We have previously reviewed the concept of developing and customizing automated workflows using data pipelining for application in virtual screening [9]. An individual operation, executable, or algorithm (e.g., read a record from a file, calculate a property, cluster data, sort records) can be termed a component. When connected through a pipeline, these components make up workflows, and a workflow in turn documents a data analysis process. Componentization of a powerful and diverse range of functionalities on a single platform represents an industry revolution that promises to improve research efficiency. Users can improve efficiency and attain new insights by creating workflows that not only integrate functionalities previously disconnected due to vendor incompatibilities, but also transcend traditional scientific discipline barriers. With such newfound flexibility, users are limited only by their imagination. This ability to rapidly configure components will also drive new science, as has been the case with the development of new scoring methods for docking experiments. In the last few years, workflow tools have been introduced into the computational life sciences and have made a significant impact on the way cheminformatics, data mining and modeling experiments are carried out. The power of workflow tools is that the entire process can be defined graphically, captured, and then executed automatically from beginning to end. Workflow platforms provide integration tools that allow external codes to be integrated and data sources in various formats to be accessed. A workflow that connects the various steps of one such complex methodology is illustrated in Fig. 2.
We developed this protocol in Pipeline Pilot™ to show how easily a complex set of computational experiments can be automated in a workflow environment or made available in a 3D modeling environment such as Discovery Studio®. The protocol calculates the binding free energy of a ligand to its receptor using well-validated, published physics-based methods such as Molecular Mechanics-Poisson Boltzmann Surface Area (MM-PBSA) and Molecular Mechanics-Generalized Born Surface Area (MM-GBSA) [10-12]. This methodology is used in structure-based design during lead prioritization and optimization, allowing modelers to score and rank compounds for synthesis [13].

Fig. (2). A workflow that configures complex science into a simple solution. A Pipeline Pilot™ protocol that uses Molecular Mechanics-Poisson Boltzmann Surface Area (MM-PBSA) and Molecular Mechanics-Generalized Born Surface Area (MM-GBSA) for structure-based design to estimate the binding free energy of a ligand to a protein target. The protocol uses components to read and manipulate three-dimensional molecular input data (proteins and ligands), perform CHARMm force-field based simulations, and report the free energy of binding in an automated, high-throughput workflow.

FLEXIBLE VISUALIZATION
Visualizing 2D or 3D data remains a critical step throughout the workflow automation process. Because the components for Accelrys’ SOP can be developed independently of any specific client interface, end users are free to deploy advanced scientific functionality in a variety of ways and can choose an interface that fits their needs. In some cases this may be a web-based client that provides easy delivery and access for non-experts; in other cases it will be a configured expert client. An example of client interface flexibility is Discovery Studio® (DS), a comprehensive suite of life science modeling and simulation software built upon Accelrys’ SOP technology. The DS interface can be configured into a simplified user interface for non-experts or a highly featured interface for experts. For example, a company can configure the same interface into a tool for communicating the results of molecular modeling tasks performed by a Computer Aided Drug Design group to chemists and biologists. The interface can also be configured as a simple modeling tool for chemists and biologists to use themselves. These two levels of flexibility, together with modular design, will allow software development to bring new innovations to market faster, with greater ease of use and higher quality standards.

INTERACTIVE REPORTING AND DASHBOARDS
Another advantage of this type of SOP is that it facilitates the creation of “dashboards”: customized interfaces that provide high-level “views” into critical organizational information, ranging from the status of a company as a whole to a particular research site or project. Often delivered as browser- or PDF-based reports, dashboards are designed to provide direct access to current information in a visually compelling form. They give users the power to “drill down” to the data behind the high-level reports in real time, so that they can immediately obtain the information they need to investigate questions or problems. Any analyses required to obtain the underlying information are set up to run automatically behind the scenes. As a result, end users avoid the delays and productivity losses incurred by calling on database administrators or scientists to obtain the information they need.

PUTTING THE “SCIENCE” IN “SCIENTIFIC BUSINESS INTELLIGENCE”: INNOVATION AND AN ECOSYSTEM OF PARTNERS
Creating a true “plug and play” architecture has demanded that the platform technology respect its users’ decisions on what is “best of breed” science and let them choose from proprietary and external technologies. Recognizing this critical need for flexibility, and to facilitate freedom of choice, we have fostered an environment of “co-opetition”, in which we continue to innovate in our areas of key strength while also establishing an ecosystem of premier academic groups and Independent Software Vendor (ISV) partners.
Furthermore, an architecture that allows flexibility brings financial benefits. The independence that companies can achieve from a “plug and play” environment is truly revolutionizing the way investment is made in software for informatics and modeling. Given a platform as shown in Fig. 1, users can better adapt to any changes in a company’s process by modifying just a fraction of the system instead of replacing the entire system. Investments will no longer be “once and done”, which increases the return on investment (ROI) for users who invest in the new environment. The ROI will come not only from faster, better and more cost-effective solutions, but also from a more competitive assembly of applications into workflows that facilitate creativity. In most R&D IT groups, the main goal is to use IT strategies to enhance not just bottom-line efficiencies but also top-line growth. It is now possible to quickly meet the ever-changing demands of the scientific community through rapid delivery of applications that can be built at the speed of discovery, without excessive development or legacy support expense.

INFORMATICS AND SBI
Cheminformatics and bioinformatics provide data models for some of the fundamental objects that must be processed within an SBI system, and they equip the platform with the scientific intelligence to process these data types. On the chemistry side, these include types such as molecules, reactions and chemical queries; on the biology side, they include protein and DNA sequences.
Informatics also provides the scientific methods that are key to storing, searching, manipulating and integrating the disparate sources of these complex data types. Cheminformatics powers chemistry databases, providing search and retrieval methods. Workflow and automation platforms, like Pipeline Pilot™, provide the capability to merge different chemical databases and files, joining records on the basis of structure rather than identifiers. Bioinformatics functions provide equivalent search, retrieval, merge and manipulation capabilities for sequence databases and files. Finally, the two methodologies can be used together to join chemical and bioinformatics algorithms into unified chemogenomics workflows. Chem- and bioinformatics calculators provide annotation of data, and sophisticated data mining capabilities in cheminformatics, including Bayesian learning and recursive partitioning, provide the data analysis tools necessary to derive knowledge from the raw data.

EXAMPLES OF SBI IN DRUG DISCOVERY: SOLUTIONS FOR ALL LEVELS OF USERS
By combining science with platform flexibility, an SBI solution can be customized to provide exactly the analytic and reporting functionality a user needs. For example, as illustrated in the first case study below, data from corporate scientific databases can be accessed and presented in one way for executives needing to assess the state of the company, and in a different way for managers or scientists looking to make specific research decisions. Moreover, customized solutions can be built with a low burden of time and cost, because the underlying technology architecture is open and a drag-and-drop graphical interface extends application development to a wider audience. This limits the involvement of IT groups and database administrators, improving productivity, decision making, and efficiency for all types and levels of users, from executive to scientist.

CASE STUDY 1: SITE MANAGEMENT DASHBOARD FOR EXECUTIVES
Executives, managers and other key decision makers rely on having accurate and up-to-date information at their disposal. However, this need is often unmet because of (a) the difficulty of accessing necessary data; (b) the challenge of gathering complete data and being able to explore it at different levels, from high-level overview to very specific data; (c) the high cost of manually creating reports; and (d) the slow process of creating reports, which often results in out-of-date information. A critical value of SBI tools is the ability to present diverse types of data and let users drill down (or up) through the data to monitor project, site or company performance, or to address emergent issues as they arise. Specifically, dashboards offer an interactive means of providing both high-level summary data and drilldown capabilities via graphical web or PDF pages. Fig. 3 shows an example dashboard that could be used by decision makers at a global pharmaceutical company to monitor compound registration performance, anywhere from the site level to the individual level. After the initial set-up of this dashboard, end users can access current data without needing a database administrator to execute queries against the database. Updating the dashboard is easy and costs are negligible.
In Fig. 3, the upper left pane shows the initial starting point of the dashboard deployed in a web browser. The user simply clicks a link in an email or web page to get this report. This example shows various graphical elements and tables that provide a high-level overview of the number and success rate of compound registrations over a six-month period, across seven research sites. Conditional color-coding on the bar chart and table highlights sites that do not meet a defined performance threshold; such visual cues are essential for effective information presentation. To access the most up-to-date information, a user would click the “Click for six month update” link, which retrieves data for the most recently completed period and repopulates the page (Fig. 3, upper right panel). Because actions are performed through hyperlinks in a familiar web browser environment, users face essentially no learning curve. In the updated report, the user can easily assess site performance, for instance detecting that the Atlanta site appears to be underperforming in its success rate for compound registration (i.e., the bar is now red). To investigate further, the user can click the bar, pie slice or table entry for Atlanta and drill down to view data for the Atlanta site (Fig. 3, lower left panel). Here the data shown at the site level resemble the higher-level data that compared all sites (though this need not be the case); however, in the Atlanta site view, the report compares individual scientists. The user can now assess the performance of the individual scientists and drill down to see the individual compounds that were registered, as well as additional information on these compounds (Fig. 3, lower right panel), with each click driving a protocol that collects fresh data. Overall, the dashboard gives the user a means of quickly navigating from high-level information (e.g., overall research division performance) to detailed information (e.g., data about individual compounds), all within a simple point-and-click interface.

Fig. (3). Screenshots of various reports from an interactive site management dashboard. Top left: overview of compounds registered by site, January to June. Top right: six-month update of site performance for July to December. Lower left: Atlanta site details. Lower right: individual scientist report.
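The drill-down navigation described in Case Study 1 amounts to re-aggregating the same registration records at successively finer levels (all sites, one site, one scientist) and flagging groups that fall below a performance threshold. The Python sketch below illustrates that idea only; the record layout, the 80% success-rate threshold, the site and scientist names other than Atlanta, and all data values are hypothetical, and the `color` field stands in for the dashboard’s conditional color-coding. In the tool described above each click runs a protocol that collects fresh data, whereas this sketch simply filters records held in memory.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Registration(NamedTuple):
    site: str
    scientist: str
    compound_id: str
    succeeded: bool


class Summary(NamedTuple):
    group: str
    registered: int
    success_rate: float
    color: str  # stand-in for conditional color-coding in a chart or table


def summarize(records: List[Registration], level: str,
              threshold: float = 0.8) -> List[Summary]:
    """Aggregate registrations by 'site' or 'scientist'; flag groups below threshold."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # group -> [total, successes]
    for rec in records:
        key = getattr(rec, level)
        counts[key][0] += 1
        counts[key][1] += int(rec.succeeded)
    rows = []
    for group, (total, ok) in sorted(counts.items()):
        rate = ok / total
        rows.append(Summary(group, total, rate, "red" if rate < threshold else "green"))
    return rows


def drill_down(records: List[Registration], site: Optional[str] = None,
               scientist: Optional[str] = None) -> List[Registration]:
    """Narrow the records to one site and, optionally, one scientist."""
    return [r for r in records
            if (site is None or r.site == site)
            and (scientist is None or r.scientist == scientist)]


# Hypothetical registration records (illustrative only; the 0.8 threshold is assumed).
data = [
    Registration("Atlanta", "Lee", "C-101", True),
    Registration("Atlanta", "Lee", "C-102", False),
    Registration("Atlanta", "Diaz", "C-103", False),
    Registration("Boston", "Kim", "C-201", True),
    Registration("Boston", "Kim", "C-202", True),
]

# Top level: compare sites.
for row in summarize(data, "site"):
    print(row.group, row.registered, f"{row.success_rate:.0%}", row.color)
# Atlanta 3 33% red
# Boston 2 100% green

# "Click" on Atlanta: compare its scientists, then list one scientist's compounds.
atlanta = drill_down(data, site="Atlanta")
print([(row.group, row.color) for row in summarize(atlanta, "scientist")])
# [('Diaz', 'red'), ('Lee', 'red')]
print([r.compound_id for r in drill_down(atlanta, scientist="Lee")])
# ['C-101', 'C-102']
```

Because every level is computed from the same underlying records, moving down (or back up) a level is just a different grouping key plus a filter, which is what makes the point-and-click navigation cheap to offer.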

CASE STUDY 2: LEAD MANAGEMENT TOOL FOR MANAGERS
When teams of scientists are working toward a common goal, the ability to gather information from multiple users and databases is imperative because it aids decision making and helps prevent duplication of effort. However, efficiency suffers when scientists on a discovery research team spend an inordinate amount of time gathering data and preparing presentation materials for review. As a result, less time is spent actually generating data, which is the scientists’ primary role.
Fig. 4 shows an example of a dashboard that could be used by a manager or research team to make decisions about lead candidates. The dashboard draws data from multiple databases (chemistry, biology, safety), as well as from project documents containing property criteria and project metadata. It can be generated five minutes before a review session without the involvement of any of the scientists, allowing them to work right up to meeting time. During the meeting, the team can assess each new candidate molecule with the help of conditional color-coding that quickly reveals molecular property values. With this information, the team can decide whether to keep a molecule (as a lead, second or third candidate) or discard it. Decisions on candidate molecules take immediate effect, and the database is updated automatically. The values of the property criteria, and their priorities, can also be altered at any time; these changes are likewise applied in real time, allowing the molecules to be reassessed immediately.

CASE STUDY 3: WEB-BASED LIGAND-PHARMACOPHORE PROFILING TOOL FOR CHEMISTS AND MODELERS
The SBI concept can be extended to in silico experiments. In the life sciences, for example, it can support new methods for several processes, including target identification, hit identification, lead optimization, druggability assessment, IP capture and reporting. When undertaking lead identification, lead optimization and druggability assessment, molecular modelers and cheminformaticians employ a combination of structure-based design (SBD), 3D ligand-based design and QSAR methodologies, but they do so in very different ways for each process. For example, when employing SBD tools for lead identification, one might use a “quick and satisfactory” 3D tool to generate a simple list of samples for screening, because screening is relatively inexpensive and quick; in this case, speed of calculation matters most. However, when conducting lead optimization studies, one would want the most accurate methods available, because making compounds is very expensive and time-consuming. Therefore, to support the scientific process, tools that cross traditional product lines must be integrated within a unified solution rather than kept in individual point solutions.

Fig. (4). Lead management dashboard that draws from multiple data sources and allows users to update property criteria and quickly review, prioritize or discard leads in real time.
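The lead management dashboard of Case Study 2 (Fig. 4) color-codes candidate properties against criteria whose values and priorities can be changed at any time, with candidates reassessed immediately. A minimal Python sketch of such a reassessment follows, assuming each criterion is an acceptable range with an integer priority; the property names, ranges, priorities and candidate values are hypothetical, and the priority-weighted score is an illustrative ranking rule rather than the method used in the tool described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Criterion:
    prop: str
    low: float
    high: float
    priority: int  # larger = more important when ranking (hypothetical scheme)


def assess(props: Dict[str, float],
           criteria: List[Criterion]) -> Tuple[Dict[str, str], int]:
    """Color-code each property value and return a priority-weighted score."""
    colors: Dict[str, str] = {}
    score = 0
    for c in criteria:
        ok = c.low <= props[c.prop] <= c.high
        colors[c.prop] = "green" if ok else "red"
        if ok:
            score += c.priority
    return colors, score


def rank(candidates: Dict[str, Dict[str, float]],
         criteria: List[Criterion]) -> List[Tuple[str, int]]:
    """Score every candidate and sort best first (ties broken by name)."""
    scored = [(name, assess(props, criteria)[1]) for name, props in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical criteria and candidate values, for illustration only.
criteria = [
    Criterion("ic50_nM", 0, 100, priority=3),
    Criterion("solubility_uM", 10, float("inf"), priority=2),
    Criterion("herg_ic50_uM", 10, float("inf"), priority=1),
]
candidates = {
    "mol-A": {"ic50_nM": 45, "solubility_uM": 5, "herg_ic50_uM": 30},
    "mol-B": {"ic50_nM": 150, "solubility_uM": 80, "herg_ic50_uM": 25},
    "mol-C": {"ic50_nM": 20, "solubility_uM": 40, "herg_ic50_uM": 4},
}

print(rank(candidates, criteria))
# [('mol-C', 5), ('mol-A', 4), ('mol-B', 3)]
print(assess(candidates["mol-C"], criteria)[0])
# {'ic50_nM': 'green', 'solubility_uM': 'green', 'herg_ic50_uM': 'red'}

# Raising the safety criterion's priority changes the ranking on the next call.
criteria[2].priority = 4
print(rank(candidates, criteria))
# [('mol-A', 7), ('mol-B', 6), ('mol-C', 5)]
```

Keeping the criteria as plain data rather than hard-coding them is what lets a change of threshold or priority take effect on the next evaluation, as in the review-meeting scenario described above.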

Changing Paradigms in Drug Discovery
Current Computer-Aided Drug Design, 2008, Vol. 4, No. 1 19
Ligand Profiler [14], a tool recently built in the ligand-based design arena (Fig. 5), is one of the best illustrations of innovation through such configuration. Pharmacophore modeling and three-dimensional database searching for scaffold hopping and lead generation are now considered common techniques in drug discovery. In Ligand Profiler, the classical pharmacophore approach is extended by screening a database of pharmacophores in parallel to determine the overall retrieval of a compound by different models. This type of parallel screening determines whether the compound maps to pharmacophore models derived from other targets, which indicates potential selectivity issues. If the pharmacophore database also contains models derived from ADME/Tox data, for example 3D pharmacophore models of CYP isoforms [15, 16], the screen can also predict whether the compound would be problematic as a drug. The aim is the fast in silico determination of the biological activity profile of a molecule, in order to speed up the time- and cost-intensive drug discovery and development process and increase its efficiency.

Ligand Profiler brings together 3D ligand-based design, reporting and web portal technology. It has shown the ability to accurately select compounds for protein types, as well as the ability to distinguish active compounds from inactive compounds within a closely related chemical series. The example shown in Fig. 5 also demonstrates how versatile this environment is. The same approach can be applied to any number of applications, which greatly improves users' ability to leverage their investment in software to better meet their needs. Integrated solutions such as this are easy to share and modify, making them a corporate asset that can be leveraged by expert and non-expert users alike. This allows a company to take advantage of its overall intellectual capital. This example also illustrates an SBI solution that allows scientists to mine data using sophisticated methods and to visualize complex data types, capabilities that are completely out of the reach of traditional BI tools.

As a direct result of the flexibility to tailor underlying complex scientific methods, this simple-to-use tool illustrates the concept of “one click science.” “One click science” allows expert users to package complex scientific methodologies into a solution delivered as a set of workflows that are shared on a portal and executed by end-users through a single click. The single click could potentially represent hundreds of behind-the-scenes operations. To an end-user, however, it is reduced to a simple-to-interpret report, which is the essence of SBI.

Fig. (5). Graphical user interface of the web-based ligand profiling tool showing screening results for an active compound set in a detailed pharmacophore profile mode. Left: 2D hit compound structure. Middle: selectable 3D hit compound structure and pharmacophore mapping. Right: Bar chart showing hit models, their score, and model information.

20 Current Computer-Aided Drug Design, 2008, Vol. 4, No. 1
Varma-O’Brien et al.

CASE STUDY 4: MINING UNSTRUCTURED DATA – TEXT MINING FOR CHEMISTS TO EXTRACT STRUCTURES FROM DOCUMENTS

Organizations such as pharmaceutical and biotechnology companies have access to a vast amount of unstructured text content in the form of external patents, journal articles and their own internal company documents. This combined content is a hugely valuable resource for many types of information. Of critical importance to pharmaceutical and biotechnology companies, however, is the need to identify and extract the chemical structures found in these documents. Ironically, the sheer bulk of this content can be a barrier to its own use, limiting the value that can be extracted.

For external data sources, such as patents, there are content providers that use either manual or automated processes to curate the documents and provide the content as a service to subscribers. This can be extremely valuable but has limitations. The cost of accessing data is often high, preventing the sort of exploration of the content landscape that is needed to address some questions. Also, since the data are all preprocessed, there are often limitations on what data are available and in what formats. Furthermore, the end-user has no input into how the documents are processed and how ambiguous chemical names should be handled. Finally, there is no such service for internal documents, unless a given company has undertaken the task of processing these documents itself. These limitations argue for a more flexible solution to process documents and extract knowledge of interest.

Exploring the content of internal and external documents is not just a scientific need but also an important SBI undertaking. For external documents, it allows users to survey the intellectual property landscape of competitors. For internal documents, it improves organizational knowledge and reduces inefficiencies due to duplicated effort.

A modular approach to identifying and extracting information, such as chemical structures, from unstructured text documents can address many of the limitations of content providers and fulfill these SBI needs. An example of this, implemented on the Pipeline Pilot platform as the ChemMining collection, is shown in Fig. 6. In this example, documents from various sources are read into the workflow. These can include external documents, such as patents, and internal documents, and several different document types can be processed, including PDF, HTML and Microsoft® Word documents. The text of the documents is passed through a component (Identify Chemical Names) that scans the text to find text strings representing chemical names. Types of names that can be recognized include IUPAC names, SMILES strings and common/brand names. A combination of linguistic rules and an internal dictionary is used for name recognition. However, as is typical with automated text processing procedures, this process is not foolproof and can produce false-positive names. Therefore, the next step in the workflow is to pass each candidate chemical name to a name-to-structure converter. Any name that is successfully converted to a structure is considered “valid,” while those that do not convert are sent to the red “Fail port” on the component (these failures can be collected for further analysis). Several name-to-structure converters are available to users, and most of these have been integrated onto the Pipeline Pilot™ platform [17, 18]. The converted names, along with the documents in which they were found, can then be further processed to meet the particular requirements of the task, such as:

o Process one or a few documents to characterize the structures contained in them, for example to determine the coverage of structures from a set of patents of interest.
o Process a large set of documents and construct a database that allows users to search with a structure query (e.g., similarity and/or substructure) and retrieve documents containing matching structures.

In both of these cases, additional content (such as biological molecules and disease processes) can be extracted from the documents to place the structures found in the wider context of the article, which is of critical value [19].

Fig. (6). Pipeline Pilot protocol showing the modular approach to extracting chemical names from text documents. Documents from a variety of sources, and in multiple formats, can be processed to identify candidate chemical names that are then converted to structures. The structures can be viewed immediately in a report or stored to a database for future querying.
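The protocol in Fig. (6) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the ChemMining implementation. The recognizer is a hypothetical dictionary plus a crude suffix rule, standing in for the linguistic rules and internal dictionary described above. The name-to-structure converter is a hypothetical lookup table standing in for the converters cited [17, 18]. Names that fail conversion are collected separately, mirroring the "Fail port". The structure lookup is an exact string match on SMILES, which is simpler than the similarity or substructure search a real database would offer.

```python
import re

# Hypothetical stand-in for a real name-to-structure converter: a tiny
# dictionary from recognized names to SMILES strings.
NAME_TO_SMILES = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "benzene": "c1ccccc1",
}

# Crude "linguistic rule": lowercase words ending in a common chemical
# suffix. It is deliberately loose, so it also yields false positives.
SUFFIX_RULE = re.compile(r"[a-z]+(?:ane|ene|ine|ol)")
WORD = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def identify_chemical_names(text):
    """Return sorted candidate names found by dictionary lookup or the suffix rule."""
    candidates = set()
    for word in WORD.findall(text):
        name = word.lower()
        if name in NAME_TO_SMILES or SUFFIX_RULE.fullmatch(name):
            candidates.add(name)
    return sorted(candidates)


def name_to_structure(name):
    """Return a SMILES string, or None if the name cannot be converted."""
    return NAME_TO_SMILES.get(name)


def process_documents(docs):
    """Build a structure -> documents index and collect failed names separately."""
    index = {}     # SMILES -> set of document ids
    failures = []  # (doc_id, name) pairs routed to the "fail port"
    for doc_id, text in docs.items():
        for name in identify_chemical_names(text):
            smiles = name_to_structure(name)
            if smiles is None:
                failures.append((doc_id, name))
            else:
                index.setdefault(smiles, set()).add(doc_id)
    return index, failures


def find_documents(index, smiles):
    """Exact-match lookup of the documents in which a structure was found."""
    return sorted(index.get(smiles, set()))
```

Consider a two-document input such as `{"patent-1": "The tablet contains aspirin and caffeine.", "memo-7": "Benzene was used; see the routine on line 4."}`. Here "routine" and "line" are picked up by the suffix rule but fail conversion and land in the failure list. Meanwhile aspirin, caffeine and benzene are indexed together with their source documents. This is why the conversion step acts as a filter for false-positive names.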

Fig. (7). Example results of searching structure and text databases to retrieve documents containing structures that match, in this case, a substructure query.
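Searches like the one in Fig. (7) depend on comparing structures. The following minimal sketch assumes that each indexed structure has already been reduced to a fingerprint, represented as a set of integer feature IDs. The IDs below are hypothetical; real fingerprints would be computed from the structure by a cheminformatics toolkit. A similarity search can then rank documents by the Tanimoto coefficient. A substructure search can be prescreened by requiring every query bit to be present in the candidate. That prescreen is only a necessary condition, valid for fingerprints whose bits encode substructural features. Candidates still need atom-by-atom confirmation, which this sketch omits.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of feature IDs."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(index, query_fp, threshold=0.7):
    """Return (doc_id, best score) pairs for structures at or above threshold."""
    hits = {}
    for fp, doc_ids in index.items():
        s = tanimoto(query_fp, fp)
        if s >= threshold:
            for doc_id in doc_ids:
                hits[doc_id] = max(s, hits.get(doc_id, 0.0))
    # Highest score first; ties broken alphabetically by document id.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


def substructure_screen(index, query_fp):
    """Prescreen: keep documents whose structures contain every query bit."""
    docs = set()
    for fp, doc_ids in index.items():
        if query_fp <= fp:  # subset test on feature IDs
            docs |= doc_ids
    return sorted(docs)
```

Here `index` maps a `frozenset` fingerprint to the set of documents in which that structure was found. This mirrors the structure-to-document index built during text mining.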

This modular approach to chemical text mining has many benefits, including:

o The ability to process and extract knowledge from both external documents (many of which are freely available) and internal documents, using the same process.
o The ability to do a very targeted analysis of a few documents or a widespread exploratory search.
o The ability to create databases of structures, and the documents where they were found, for a sizeable but targeted set of documents, such as those covering G-protein coupled receptor (GPCR) inhibitors or drugs for cardiac diseases.
o Having both the documents and structures (and any other relevant content) available in a flexible form that can be manipulated for further analyses (Fig. 7).

In summary, a modular, workflow-based approach to chemical text mining can help pharmaceutical and biotechnology companies leverage the vast amount of unstructured text content available to them, thereby fulfilling critical needs in SBI.

CONCLUSION

The scientific and clinical research communities stand to improve productivity, decision making and research efficiency by leveraging SBI technologies. The technology described in this review, coupled with well-validated scientific modeling, simulation and informatics technologies, can deliver SBI solutions suited to any company's infrastructure.

ACKNOWLEDGEMENTS

We are very thankful to William Taylor for his direction and vision regarding Scientific Business Intelligence and gratefully recognize Kathleen Los for her critical review of the manuscript.

ABBREVIATIONS

BI = Business Intelligence
SBI = Scientific Business Intelligence
SOP = Scientific Operating Platform
SOA = Service-Oriented Architecture
ROI = Return on Investment

REFERENCES
[1] Schlegel, K.; Hostmann, B.; Bitterer, A. "Magic Quadrant for Business Intelligence Platforms, 1Q07." Gartner RAS Core Research Note G00145507. January 26, 2007.
[2] Brown, F.K. "Chapter 35. Chemoinformatics: What is it and How does it Impact Drug Discovery." Annual Reports in Med. Chem., Ed. James A. Bristol, 1998, 33, 375.
[3] Bull, A.T.; Ward, A.C.; Goodfellow, M. Microbiol. Mol. Biol. Rev., 2000, 64, 573.
[4] Scientific Business Intelligence has been trademarked by Accelrys.
[5] http://www.inforsense.com/.
[6] http://www.knime.org/.
[7] http://taverna.sourceforge.net/index.php.
[8] Lipinski, C.A. J. Pharmacol. Toxicol. Methods, 2000, 44, 235.
[9] Brown, R.D.; Varma-O’Brien, S.; Rogers, D. QSAR and Comb. Sci., 2006, 12, 1181.
[10] Massova, I.; Kollman, P. Perspect. Drug Discov. Des., 2000, 18, 113.
[11] Kuhn, B.; Gerber, P.; Schulz-Gasch, T.; Stahl, M. J. Med. Chem., 2005, 48, 4040.
[12] Kuhn, B.; Gerber, P.; Schulz-Gasch, T.; Stahl, M. J. Med. Chem., 2005, 48, 4040.
[13] http://www.accelrys.com/info/charmm_sbd/charmm_forcefield_sbd_solutions.html.
[14] Steindl, T.M.; Schuster, D.; Laggner, C.; Chuang, K.; Hoffmann, R.D.; Langer, T. J. Chem. Inf. Model., 2007, 47, 563.
[15] Ekins, S.; Bravi, G.; Wikel, J.H.; Wrighton, S.A. J. Pharmacol. Exp. Ther., 1999, 291, 424.
[16] Ekins, S.; Stresser, D.M.; Williams, J.A. Trends Pharmacol. Sci., 2003, 24, 161.
[17] Advanced Chemistry Development Labs, http://www.acdlabs.com/.
[18] CambridgeSoft, http://www.cambridgesoft.com/.
[19] Banville, D.L. Drug Discov. Today, 2006, 11, 35.


Received: July 16, 2007
Revised: September 5, 2007
Accepted: September 10, 2007



