SMILES CHEMICAL REACTION DATABASE

Document Description
The SMILES Chemical Reaction Database is a set of files containing structural information about pairs of reactants and products of two million different chemical reactions. The simplified molecular-input line-entry system (SMILES) of representing molecular structures is used to represent molecular connectivity and stereochemical relationships as strings of characters, and indeed chemical reactions as well. The SMILES Reaction Database is now 186.8 MB in size, and it contains two million reactant-product pairs extracted from thousands of respected journals and patents, contained in six files.
File Details
  • Added: April 16, 2012
  • Reads: 204
  • Downloads: 2
  • File size: 179.58kb
  • Pages: 4
  • Tags: smiles chemical reaction database, treasure trove of mathematics, chemistry database, reaction examples, chemical reactions, reaction database, chemical reaction database, chemistry examples
Content Preview
Treasure Trove of Mathematics
WELCOME
The SMILES Chemical Reaction Database is a set of files containing structural information about pairs of
reactant(s) and product(s) of two million different chemical reactions. The simplified molecular-input line-entry
system (SMILES) of representing molecular structures is used to represent molecular connectivity and
stereochemical relationships as strings of characters, and indeed chemical reactions as well. These SMILES string
representations inspired the creation of machine learning computer programs that learn the input/output relationship
that exists between reactant space and product space, using novel string transformation algorithms (implemented
within the book A New Kind of Chemistry (c) 2012, scheduled to be released in the Fall of 2012 on Amazon.com,
using the Mathematica programming language).

Applications: Chemical Reaction Outcome Prediction, QSARs and Retrosynthetic Analysis.

As a demonstration of the use of SMILES strings to represent the connectivity and steric geometry of chemical
structures and reactions, and the utility of the machine learning technique, consider the following two verified results
which were correctly predicted by a mathematical model derived from a dataset of 100,000 reactions (from which
these two reactions were excluded) possessing reactant profiles (structural and stoichiometric) somewhat similar
(very similar cases were excluded for purposes of testing) to each of the novel test cases:

[O-]S([O-])(=O)=O.CCCCc1ccc(CCCC)c(c1)[N+]#N.CCCCc1ccc(CCCC)c(c1)[N+]#N.OS(O)(=O)=O>>CCCCc1ccc2CCC(C)c2c1

[H][C@@](OP(Oc1c(C)cccc1C)c1ccccc1)(c1ccnc2ccccc12)[C@@]1([H])C[C@@]2([H])CCN1C[C@]2([H])C=C>>[H][C@@](OP(c1ccccc1)c1c(C)cc(C)cc1C)(c1ccnc2ccccc12)[C@@]1([H])C[C@@]2([H])CCN1C[C@]2([H])C=C
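To make the structure of such reaction strings concrete, the following minimal Python sketch splits the first reaction above into its components and counts how often each species appears (its stoichiometric profile). It assumes the common reaction SMILES layout "reactants>agents>products", where ">>" simply means that no agents are listed, and it compares species by plain string equality. The function name parse_reaction is hypothetical and is not part of the database or the book.

```python
from collections import Counter

# First demonstration reaction from the text, copied verbatim.
RXN = ("[O-]S([O-])(=O)=O.CCCCc1ccc(CCCC)c(c1)[N+]#N."
       "CCCCc1ccc(CCCC)c(c1)[N+]#N.OS(O)(=O)=O>>CCCCc1ccc2CCC(C)c2c1")

def parse_reaction(rxn):
    """Split a reaction SMILES into (reactants, agents, products) counters.

    Assumption: the string has the form reactants>agents>products and
    '.' separates individual species within each part.
    """
    parts = rxn.strip().split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    # An empty part (as in '>>') becomes an empty Counter.
    return tuple(Counter(p.split(".")) if p else Counter() for p in parts)

reactants, agents, products = parse_reaction(RXN)
for species, count in reactants.items():
    print(count, species)
```

The counter shows that the diazonium cation occurs twice among the reactants, which is the kind of stoichiometric information the text refers to.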

Of course, the machine learning technique is equally applicable to retrosynthetic analysis: having a target product in mind,
one is able to predict the structure of successful starting materials for the prior synthetic step. Many tentative starting
materials, or leads, for a synthetic step can be obtained by computing different predictive models, each based on a
different subset of the database. Such subsets can be chosen on some selection
criteria, or randomly, but in this case each training subset must be entirely composed of reactions having unique sets of
reactants to avoid multivalued data.
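The subset constraint described above can be sketched as follows. This is only an illustration under stated assumptions: reactant sets are compared as sorted tuples of raw SMILES strings (no canonicalization), the helper names reactant_key and unique_reactant_subset are hypothetical, and the three toy reaction strings are invented for the example rather than taken from the database.

```python
import random

def reactant_key(rxn):
    """Order-independent key for the reactant side of a reaction SMILES."""
    reactants = rxn.split(">")[0]
    return tuple(sorted(reactants.split(".")))

def unique_reactant_subset(reactions, size, seed):
    """Draw a random training subset in which no reactant set repeats."""
    rng = random.Random(seed)
    shuffled = list(reactions)
    rng.shuffle(shuffled)
    seen, subset = set(), []
    for rxn in shuffled:
        key = reactant_key(rxn)
        if key in seen:
            continue  # a second outcome for the same reactants would be multivalued
        seen.add(key)
        subset.append(rxn)
        if len(subset) == size:
            break
    return subset

# Toy input: the first two entries share the same reactant set.
toy = ["CC.O>>CCO", "O.CC>>CC=O", "C>>CO"]
subsets = [unique_reactant_subset(toy, size=3, seed=s) for s in range(3)]
```

Different seeds give different subsets, and a separate model could then be trained on each one to produce different leads.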
Reaction prediction is a one-to-one (1:1) relationship whereas retrosynthetic analysis concerns a one-to-many (1:M)
relationship. In the case of retrosynthetic analysis, this situation is dealt with by decreasing the size of the training data set to
the point where the resulting model makes incorrect suggestions a good fraction of the time. Having not incorporated a
significant amount (and possibly type) of knowledge from the database, the model has room to get creative, so to speak.
Yet by subsequently running the results through a well-trained reaction prediction model, we borrow back definitiveness, and
thereby confirm whether the suggested reactions are feasible or not.
Machine learning of chemical reactions can be distinguished from the more orthodox approaches in three very important
ways: First, the work is entirely non-reductionist, explaining chemical reactivity not as the result of the behaviors of the
constituent subatomic particles, but rather as the result of higher mathematical conservation laws.
To understand why conservation laws, which represent mathematical symmetries, are used, consider any set of non-collinear
data points in the Cartesian plane. The number of possible curves which could pass through those data points is infinite. It is
highly presumptuous and almost certainly in error to naively assume that a smooth curve connecting the data points would
represent the intermediate points correctly given an arbitrary curvy data set. Data fitting, which in essence even includes
techniques such as neural networks, in and of itself simply cannot be used to generalize data generically. The fact remains
that at least one condition must be applied to the curve which would distinguish the curve as the solution. And this requires
prior knowledge of a model. Data fitting, in any form, is only properly used to tweak the parameters of a model, not to derive
a model. This is a very common oversight that plagues much research in the field of computational intelligence.
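The claim that infinitely many curves pass through the same data points can be checked with a small Python sketch. The three points and the function name curve are chosen here purely for illustration: adding any multiple c of a polynomial that vanishes at every data point leaves the fit at those points unchanged while altering the curve everywhere else.

```python
# Three non-collinear points that lie on y = x**2 (illustrative values).
POINTS = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]

def curve(x, c):
    """One member of a family of curves through all of POINTS.

    The product (x - 0)(x - 1)(x - 2) is zero at every data point, so the
    free parameter c changes the curve only between and beyond the points.
    """
    vanishing = 1.0
    for px, _ in POINTS:
        vanishing *= (x - px)
    return x * x + c * vanishing

for c in (-3.0, 0.0, 5.0):
    on_points = all(curve(px, c) == py for px, py in POINTS)
    print(c, on_points, curve(1.5, c))
```

Every value of c reproduces the data exactly, yet the intermediate value at x = 1.5 differs for each, so the data alone cannot single out one curve.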
In this work, we instead search for what is mathematically conserved to within a proportionality factor. The mathematical
conservation law H is isomorphic to the linear relationship y=bx, such that $H(m(D_{i,2})) = b\,H(m(D_{i,1}))$, where the
$D_{i,j}$ are empirical data points, $b$ is a proportionality factor and m(*) is the chemical metric. Given that the space is
discrete and finite, we may legitimately conclude, under the conditions of a sufficiently simple function H, sufficiently large i,
and a well-chosen metric, that a mathematical conservation law has been determined, and that the values of the novel points
$[H(m(d_{r,1})), H(m(d_{r,2}))]$ between the empirical points $[H(m(D_{i,1})), H(m(D_{i,2}))]$ also lie along the straight
line connecting the empirical points. The map can then be considered completed and the $d_{r,2}$ can be numerically solved
for. The whole point of linearization is that there are beth-2 possible different curves (arbitrary functions of a real variable),
a bigger infinity than that of the set of real numbers, beth-1; these coincide with aleph-2 and aleph-1 only if the generalized
continuum hypothesis is assumed. But the set of linear rays bound to a particular point has cardinality beth-1, depending
only upon the real value of $b$.
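As a rough numerical illustration of the linear relationship y=bx, the sketch below estimates the proportionality factor from pairs of already-computed values x = H(m(D_{i,1})) and y = H(m(D_{i,2})) by least squares through the origin, then predicts the value for a novel reactant. The numbers and the function names are hypothetical, and the final step of recovering the product structure from its predicted H value is not shown because the text does not specify H or m.

```python
def fit_proportionality(xs, ys):
    """Least-squares slope b of the line y = b*x constrained through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def predict(b, x_new):
    """Predicted H value of the product for a novel reactant value x_new."""
    return b * x_new

# Hypothetical H(m(.)) values for reactants (xs) and products (ys).
xs = [1.0, 2.0, 4.0]
ys = [2.5, 5.0, 10.0]

b = fit_proportionality(xs, ys)
y_novel = predict(b, 3.0)  # a point between the empirical ones
```

With a single free parameter, the empirical points either fall on one ray through the origin or they do not, which is what makes the conservation hypothesis easy to test.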
H is searched for through a process of evolution. Random functional forms are generated, put through rounds of crossover,
mutation, simplification and selection. Both task performance and functional simplicity are applied as selective pressures.
Simplicity is sought such that we find true conservation functions. An unreasonable effectiveness of the function at task
completion is the goal.
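A toy version of such an evolutionary search is sketched below in Python. It is not the authors' implementation: the expression grammar (x, small integer constants, addition, multiplication), the penalty weight, and all function names are assumptions made for illustration, and the simplification step mentioned above is omitted. Fitness combines task error with expression size, so both selective pressures are present.

```python
import random

OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def random_tree(rng, depth=2):
    """Random expression tree built from x, integer constants, add and mul."""
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.5 else ("const", rng.randint(-2, 2))
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    if tree[0] == "x":
        return x
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def size(tree):
    return 1 if tree[0] not in OPS else 1 + size(tree[1]) + size(tree[2])

def paths(tree, path=()):
    """Yield the index path of every subtree."""
    yield path
    if tree[0] in OPS:
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def score(tree, data, penalty=0.01):
    """Lower is better: mean squared error plus a simplicity penalty."""
    err = sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data)
    return err + penalty * size(tree)

def crossover(rng, a, b):
    donor = get(b, rng.choice(list(paths(b))))
    return replace(a, rng.choice(list(paths(a))), donor)

def mutate(rng, tree):
    return replace(tree, rng.choice(list(paths(tree))), random_tree(rng, 1))

def evolve(data, generations=30, pop_size=40, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda t: score(t, data))
        history.append(score(pop[0], data))
        survivors = pop[: pop_size // 2]  # selection keeps the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            child = crossover(rng, rng.choice(survivors), rng.choice(survivors))
            if rng.random() < 0.3:
                child = mutate(rng, child)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda t: score(t, data)), history

data = [(x, 3 * x + 1) for x in range(-3, 4)]  # toy target relationship
best, history = evolve(data)
```

Because the best half of each generation survives unchanged, the best score can never get worse from one generation to the next; whether the search actually reaches the target form depends on the seed and settings.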
When we apply our mathematical model-building technology to the mathematical analogue of the SMILES Reaction
Database or any subset thereof, we are applying the very same logic to a subset of chemical space - the discrete space of
all molecular structures.

The second distinguishing factor is that the high-level mathematical conservation laws we use to predict reactions are based
directly upon:
* Experimental reaction data - the reaction database stores two million reaction strings.
* Unique string representations of chemical graphs -- SMILES.
* Unique, uniformly-sized, order-dependent and reversible mathematical representations of strings as the product of
matrix (non-commutative) multiplication using a character-to-matrix substitution.
* Data splicing - defined as data fusion through the discovery of mathematical conservation laws.
* Evolution of simplest possible function H is key.
* H is a scalar function, while m is a matrix function.
* The functional form of H is dependent upon the functional form of m, the value of $b$ and the $D_{i,k}$.
* Chemical metric - a scalar-valued matrix function based on an advanced theory of prototypicality.
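The text does not give the character-to-matrix substitution, so the Python sketch below uses one well-known construction as a stand-in: each character is written as 8 bits, and each bit is replaced by one of two 2x2 integer matrices whose products are known to be uniquely decomposable. The result is always a 2x2 matrix (uniformly sized), depends on character order (matrix multiplication is non-commutative), and can be decoded back to the string (reversible). The names A, B, encode and decode are hypothetical, and ASCII input is assumed.

```python
A = ((1, 1), (0, 1))   # stands for bit 0
B = ((1, 0), (1, 1))   # stands for bit 1
I = ((1, 0), (0, 1))   # identity, the empty string

def matmul(p, q):
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def encode(s):
    """Map an ASCII string to a single 2x2 integer matrix."""
    m = I
    for ch in s:
        for bit in format(ord(ch), "08b"):
            m = matmul(m, A if bit == "0" else B)
    return m

def decode(m):
    """Recover the string by peeling factors off the right-hand side."""
    bits = []
    while m != I:
        (a, b), (c, d) = m
        if b >= a and d >= c:            # last factor was A
            bits.append("0")
            m = ((a, b - a), (c, d - c))
        else:                            # last factor was B
            bits.append("1")
            m = ((a - b, b), (c - d, d))
    bits.reverse()
    joined = "".join(bits)
    return "".join(chr(int(joined[i:i + 8], 2)) for i in range(0, len(joined), 8))

forward = encode("CCO")
reverse_order = encode("OCC")
round_trip = decode(encode("C=O>>CO"))
```

The entries grow quickly with string length, so arbitrary-precision integers (as in Python) are needed for real reaction strings.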
Since the strings are represented by matrices while m(*) is scalar-valued, we are essentially assigning multidimensional data
points to points on the real line. This need not lead to the assignment of more than one multidimensional data point to a
single point on the real line. In fact the size of the infinity representing all of the points in the plane and the size of the infinity
representing all of the points on the real line are the same. Thus unique assignments of all n-dim data points to points on the
real line are possible, which is provable. Take a point on a two-dimensional plane (x,y). We can take the digits which we
would use to write down x and y and simply interleave them. This interleaving technique results in a real number for every
possible point, and no two points on the plane map to the same number. This same argument can be extended to any
number of dimensions, as long as we have a finite number of dimensions. The concept of dimension has no effect on the
size or cardinality of an infinite space; dimensions are cardinally meaningless. Yet here we are dealing with a discrete
hypervolume, a countable infinity if the whole volume is considered, but in this case - a very large finite number. The total
number of possible small organic molecules alone that populate 'chemical space' has been estimated to exceed $10^{60}$.
Reaction space is thus unfathomably large, yet finite.
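The digit-interleaving argument can be illustrated with a short Python sketch that works on the fractional digit strings of x and y. The function names are hypothetical, and only finite digit strings are handled, so this demonstrates the idea rather than proving the general statement.

```python
def interleave(x_digits, y_digits):
    """Merge two digit strings into one by alternating their digits."""
    n = max(len(x_digits), len(y_digits))
    x = x_digits.ljust(n, "0")   # pad the shorter expansion with zeros
    y = y_digits.ljust(n, "0")
    return "".join(a + b for a, b in zip(x, y))

def deinterleave(z_digits):
    """Recover the two original digit strings (up to trailing zeros)."""
    return z_digits[0::2], z_digits[1::2]

# The point (0.123, 0.456) maps to the single number 0.142536.
z = interleave("123", "456")
x_back, y_back = deinterleave(z)
```

Because the two coordinates can be read back from the single number, distinct points cannot collide, which is the uniqueness property the paragraph relies on.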
The third distinguishing factor is that the machine learning technique is more definitive, more efficient and more capable
than the traditional approaches when applied to chemical reaction questions. For example, traditional quantum reactive
scattering calculations are typically limited, at any useful degree of accuracy, to reactions involving fewer than six atoms.
Reactive scattering problems involving more than six atoms become effectively intractable due to the combinatorial increase
in the number of operations that must be performed on the mathematical objects inherited from quantum theory to get at a
reasonable answer.
String transformations have many valuable applications in mathematics and physics as well (for example, the formal
technique known as term rewriting is used in the field of computer algebra systems).
ABOUT THE SMILES REACTION DATABASE
In 2007, rapid work at TTM began on the assemblage of a human-reviewed chemical reaction database, soon after the
development of the supporting image knowledge-extraction and spidering software was finally achieved. The SMILES
Reaction Database is now 186.8 MB in size, and it contains two million reactant-product pairs extracted from thousands of
respected journals and patents, contained in six files. The reaction data entries in each file of the database occur on
consecutive lines of the file, which are delineated by newline characters.
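Given that layout, a file can be streamed one reaction per line. The Python sketch below is a minimal reader under assumptions: each non-empty line is taken to hold one "reactants>>products" string with no agents section, the actual file names are not stated in the text and so an in-memory stand-in is used, and the function name iter_reactions and the two toy entries are hypothetical.

```python
import io

def iter_reactions(lines):
    """Yield (reactants, products) lists for each non-empty line.

    Assumption: every entry has the form reactants>>products, with '.'
    separating the individual species on each side.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        reactants, _, products = line.partition(">>")
        yield reactants.split("."), products.split(".")

# Stand-in for one database file; real use would pass an open file handle.
sample_file = io.StringIO("CC.O>>CCO\n\nC>>CO\n")
entries = list(iter_reactions(sample_file))
```

Streaming line by line keeps memory use flat, which matters for files holding millions of entries.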

OBTAINING THE SMILES REACTION DATABASE
Legal Notice
Pricing Information Sheet

Purchase an immediate download:
You may download a maximum of three times, so please save your files to a removable disc and store it safely.
Questions or Comments? Email Us
SMILES Reaction Database is Copyrighted (c) 2012 by Treasure Trove of Mathematics. All rights are reserved worldwide.
