Assalamualaikum dear brothers and sisters!
It has been a tiring week, with the final examinations just around the corner. But giving up is not in our dictionary.
Today's post, maybe the last, maybe not, is about SMILES.
SMILES
SMILES stands for Simplified Molecular-Input Line-Entry System. It is a specification, in the form of a line notation, for describing the structure of chemical species using short ASCII strings. SMILES strings can be imported by most molecule editors and converted back into two-dimensional drawings or three-dimensional models of the molecules. The notation was initiated by David Weininger at the USEPA Mid-Continent Ecology Division Laboratory in Duluth in the 1980s. (Wikipedia)
Canonical SMILES
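The SMILES format is a linear text format that can describe the connectivity and chirality of a molecule. The same molecule can usually be written in several valid ways (ethanol, for example, can be written CCO, OCC or C(O)C), so canonical SMILES gives a single 'canonical' form for any particular molecule.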
Isomeric SMILES
The variant of the SMILES specification that includes extensions for specifying isotopes, chirality and configuration about double bonds.
Applications of SMILES
The main rules of the SMILES notation are summarized below, each with explanations and examples.
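1. SMILES Bonds
Single ( - ), double ( = ), triple ( # ) and aromatic ( : ) bonds. Single and aromatic bonds are normally left implicit, so usually only double and triple bonds need a symbol. For example: ethene C=C, chloroethene ClC=C, 1,1-dichloroethene ClC(Cl)=C, cis-1,2-dichloroethene Cl/C=C\Cl (the / and \ marks fix the cis configuration about the double bond), trichloroethene ClC(Cl)=CCl and perchloroethene ClC(Cl)=C(Cl)Cl. Cyclohexane and 1,4-dioxane can be written as C1CCCCC1 and O1CCOCC1.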
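If you want to play with this, here is a minimal Python sketch (our own illustration, not part of any SMILES toolkit) that counts the explicit bond symbols from the list above. It assumes a simplified reading of SMILES: text inside square brackets is skipped because there '-' and '+' mean charge, and implicit bonds are not counted since they have no symbol. The name explicit_bonds is hypothetical.

```python
# Bond symbols from the list above; ':' is the explicit aromatic bond.
BOND_SYMBOLS = {"-": "single", "=": "double", "#": "triple", ":": "aromatic"}


def explicit_bonds(smiles: str) -> dict[str, int]:
    """Count explicit bond symbols, skipping text inside [...] where
    '-' is a charge rather than a bond (simplified sketch)."""
    counts = {name: 0 for name in BOND_SYMBOLS.values()}
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and ch in BOND_SYMBOLS:
            counts[BOND_SYMBOLS[ch]] += 1
    return counts


examples = {
    "ethene": "C=C",
    "trichloroethene": "ClC(Cl)=CCl",
    "cyclohexane": "C1CCCCC1",
    "biphenyl": "c1ccccc1-c2ccccc2",
}
for name, smi in examples.items():
    print(name, smi, explicit_bonds(smi))
```

Cyclohexane comes out with zero explicit bonds: all six of its ring bonds are implicit single bonds.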
2. SMILES Aromaticity
Aromatic C, O, S and N atoms are written in lower case: 'c', 'o', 's' and 'n' respectively. Benzene, pyridine and furan can be represented by the SMILES c1ccccc1, n1ccccc1 and o1cccc1, respectively. Bonds between aromatic atoms are, by default, aromatic, although they can be specified explicitly using the ':' symbol. Aromatic atoms can also be joined by a single bond; for example, biphenyl is c1ccccc1-c2ccccc2. Aromatic nitrogen bonded to hydrogen, as found in pyrrole, must be written as [nH]; for example, imidazole is written n1c[nH]cc1.
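Because aromaticity is carried by letter case, a tiny tokenizer is enough to pick out the aromatic atoms. This is a simplified sketch under our own assumptions (organic-subset atoms and bracket atoms only; the helper name aromatic_atoms is hypothetical), not a full SMILES parser.

```python
import re

# Simplified atom tokenizer (a sketch, not a full SMILES parser):
# bracket atoms, two-letter halogens, aliphatic and aromatic organic-subset atoms.
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def aromatic_atoms(smiles: str) -> list[str]:
    """Return the atom tokens written in lower case, i.e. the aromatic atoms."""
    aromatic = []
    for token in ATOM.findall(smiles):
        # Drop brackets and any isotope digits, e.g. "[14c]" -> "c", "[nH]" -> "nH".
        symbol = token.strip("[]").lstrip("0123456789")
        if symbol[:1].islower():
            aromatic.append(token)
    return aromatic


for name, smi in [("benzene", "c1ccccc1"), ("imidazole", "n1c[nH]cc1"),
                  ("cyclohexane", "C1CCCCC1")]:
    print(f"{name}: {len(aromatic_atoms(smi))} aromatic atoms")
```

This prints 6 aromatic atoms for benzene, 5 for imidazole (including the [nH]) and 0 for cyclohexane.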
3. SMILES Isotopes
Isotopes are specified by writing the integer isotopic mass immediately before the atomic symbol, inside square brackets. Benzene in which one atom is carbon-14 is written as [14c]1ccccc1, and deuterochloroform is [2H]C(Cl)(Cl)Cl.
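Since the isotope is just a number at the start of a bracket atom, a regular expression can read it back out. The sketch below is our own simplified illustration (it ignores everything after the element symbol, and the name isotopes is hypothetical).

```python
import re

# Bracket atom start: '[' then optional isotope mass, then the element symbol
# (upper-case element, or a single lower-case aromatic letter). Simplified sketch.
BRACKET_ATOM = re.compile(r"\[(\d+)?([A-Z][a-z]?|[a-z])")


def isotopes(smiles: str) -> list[tuple[int, str]]:
    """Return (mass, element) pairs for every bracket atom that carries an isotope."""
    return [(int(mass), elem) for mass, elem in BRACKET_ATOM.findall(smiles) if mass]


print(isotopes("[14c]1ccccc1"))      # [(14, 'c')]
print(isotopes("[2H]C(Cl)(Cl)Cl"))   # [(2, 'H')]
```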
4. SMILES Branches
Branches are represented by enclosure in parentheses and can be nested or stacked. Examples: CC(O)CC is 2-butanol, OCC(C)C is isobutanol and OC(C)(C)C is tert-butanol. A branch cannot begin a SMILES string, and a branch cannot immediately follow a double- or triple-bond symbol. For example, C=(CC)C is invalid, but C(=CC)C and C(CC)=C are valid SMILES.
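The two branch rules are easy to check mechanically. Here is a minimal sketch of our own (the function name branch_errors is hypothetical) that tests only those rules plus balanced parentheses; it is not a full SMILES validator.

```python
def branch_errors(smiles: str) -> list[str]:
    """Check the branch rules listed above (a simplified sketch)."""
    errors = []
    depth = 0
    for i, ch in enumerate(smiles):
        if ch == "(":
            if i == 0:
                errors.append("branch at start of string")
            elif smiles[i - 1] in "=#":
                errors.append(f"branch right after bond symbol at position {i}")
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                errors.append(f"unmatched ')' at position {i}")
                depth = 0
    if depth > 0:
        errors.append("unclosed '('")
    return errors


for smi in ["CC(O)CC", "C=(CC)C", "C(=CC)C", "C(CC)=C"]:
    print(smi, branch_errors(smi) or "ok")
```

Only C=(CC)C is reported, because its '(' comes straight after '='.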
5. SMILES Symbols
A SMILES string is a string of alphanumeric characters and certain punctuation symbols. It terminates at the first space encountered when read from left to right. Atoms of the ORGANIC SUBSET (B, C, N, O, P, S, F, Cl, Br, I) may be written without square brackets.
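A reader of SMILES has to take the two-letter symbols Cl and Br before the one-letter ones, and stop at the first space. The sketch below shows both points; it is our own simplified tokenizer (the names read_smiles and tokens are hypothetical), not a complete parser.

```python
import re

# Two-letter symbols come first so "Cl" is read as chlorine, not C followed by l.
ORGANIC_SUBSET = ["Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I"]
TOKEN = re.compile(r"\[[^\]]*\]|" + "|".join(ORGANIC_SUBSET) + r"|[bcnops]|.")


def read_smiles(text: str) -> str:
    """The SMILES ends at the first space; anything after it (e.g. a name) is ignored."""
    return text.split(" ", 1)[0]


def tokens(text: str) -> list[str]:
    """Split the SMILES part of `text` into atoms and single punctuation characters."""
    return TOKEN.findall(read_smiles(text))


print(tokens("ClC(Cl)=CCl trichloroethene"))
# ['Cl', 'C', '(', 'Cl', ')', '=', 'C', 'Cl']
```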
6. Other SMILES Atoms
Aliphatic (non-aromatic) carbon is written C, while an atom in an aromatic ring is written as a lowercase letter. Ring closures are designated with pairs of matching digits; e.g. c1ccccc1 is benzene, whereas C1CCCCC1 is cyclohexane.
7. SMILES Charges
Attached hydrogens and charges are specified inside square brackets. The number of attached hydrogens is given by the symbol H followed by an optional digit. Examples: [H+] proton, [OH-] hydroxide anion, [OH3+] hydronium cation, [Fe++] iron(II) cation (also written [Fe+2]).
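To see how a bracket atom packs the element, hydrogen count and charge together, here is a small sketch that unpacks them. The regular expression is our own simplified grammar (isotope, symbol, H count, charge; chirality marks and atom classes are not handled), and parse_bracket_atom and charge_value are hypothetical names.

```python
import re

# Simplified bracket-atom pattern: [isotope? symbol Hcount? charge?]
# Chirality marks (@) and atom classes are deliberately not handled.
BRACKET = re.compile(r"^\[(\d*)([A-Z][a-z]?|[a-z][a-z]?)(H\d*)?([+-]+\d*)?\]$")


def charge_value(text: str) -> int:
    """'+' -> 1, '++' -> 2, '+2' -> 2, '-' -> -1 (empty string -> 0)."""
    if not text:
        return 0
    sign = 1 if text[0] == "+" else -1
    digits = text.lstrip("+-")
    return sign * (int(digits) if digits else len(text))


def parse_bracket_atom(atom: str) -> tuple[str, int, int]:
    """Return (element symbol, attached H count, formal charge)."""
    match = BRACKET.match(atom)
    if match is None:
        raise ValueError(f"unsupported bracket atom: {atom}")
    _isotope, symbol, hydrogens, charge = match.groups()
    h_count = 0 if hydrogens is None else int(hydrogens[1:] or 1)
    return symbol, h_count, charge_value(charge or "")


for atom in ["[H+]", "[OH-]", "[OH3+]", "[Fe++]"]:
    print(atom, parse_bracket_atom(atom))
# [H+] ('H', 0, 1)
# [OH-] ('O', 1, -1)
# [OH3+] ('O', 3, 1)
# [Fe++] ('Fe', 0, 2)
```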
8. SMILES Cyclic Structures
To write a ring, break one single or one aromatic bond in each ring and designate the two ring-breaking atoms by the same digit, written immediately after each atomic symbol. The first occurrence of the digit marks the start of the ring and the second marks its end; rings may be numbered in any order. Only the numbers 1 to 9 are used here (the full specification also allows two-digit ring numbers written with %, e.g. %10), and each ring number is used as a pair: once to open the ring and once to close it. An atom can be associated with two consecutive numbers, e.g. naphthalene: c12ccccc1cccc2.
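The pairing of ring digits can be followed step by step: the first time a digit appears it is remembered, and the second time it closes a bond back to the remembered atom. The sketch below does exactly that; it is our own simplified illustration (single-digit ring numbers only, %nn is not handled, and ring_bonds is a hypothetical name).

```python
import re

# Simplified tokenizer: bracket atoms, organic-subset atoms (aliphatic and aromatic),
# single-digit ring numbers, and any other single character (bonds, parentheses).
TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|\d|.")


def ring_bonds(smiles: str) -> list[tuple[int, int]]:
    """Pair ring-closure digits; return (opening_atom, closing_atom) indices."""
    open_rings: dict[str, int] = {}
    bonds = []
    atom_index = -1
    for tok in TOKEN.findall(smiles):
        if tok.isdigit():
            if tok in open_rings:
                # Second occurrence: close the ring back to the remembered atom.
                bonds.append((open_rings.pop(tok), atom_index))
            else:
                # First occurrence: remember which atom opened this ring.
                open_rings[tok] = atom_index
        elif tok[0].isalpha() or tok[0] == "[":
            atom_index += 1
    if open_rings:
        raise ValueError(f"unclosed ring number(s): {sorted(open_rings)}")
    return bonds


print(ring_bonds("c1ccccc1"))        # [(0, 5)]
print(ring_bonds("c12ccccc1cccc2"))  # [(0, 5), (0, 9)]
```

For naphthalene, atom 0 takes part in both ring closures, which is why it carries the two consecutive digits 1 and 2.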
9. SMILES Conventions
Avoid two consecutive left parentheses if possible, and strive for the fewest possible branches. Tautomeric bonds are not designated; enter the appropriate tautomeric form.
10. SMILES Fragments
Nitro N(=O)(=O), nitrate ON(=O)(=O), nitrite ON(=O), sulfonic acid S(=O)(=O)O, cyanide/nitrile C#N, azide N=N#N, azido [N+]=[N-].
11. SMILES Metals
Metals are not in the organic subset, so they are always written in square brackets:
[Al] [As] [Au] [Be] [Bi] [Cd] [Ca] [Fe] [Hg] [K] [Li] [Mg] [Na] [Ni] [Pt] [Sb] [Sn] [Zn] [Zr]
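A quick way to see why the brackets matter: outside brackets only organic-subset atoms, bond symbols, parentheses and ring digits make sense, so an unbracketed metal leaves stray letters behind. The checker below is our own simplified sketch (unbracketed_problems is a hypothetical name, and it ignores less common SMILES features such as '.' and '%').

```python
import re

# What this sketch accepts outside square brackets: bracket atoms, organic-subset
# atoms (aliphatic or aromatic), bond symbols, parentheses and ring digits.
VALID = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|[-=#:/\\()]|\d")


def unbracketed_problems(smiles: str) -> list[str]:
    """Return the characters that cannot be interpreted outside brackets."""
    problems = []
    pos = 0
    while pos < len(smiles):
        match = VALID.match(smiles, pos)
        if match:
            pos = match.end()
        else:
            problems.append(smiles[pos])
            pos += 1
    return problems


print(unbracketed_problems("C[Hg]C"))  # [] : dimethylmercury, metal in brackets
print(unbracketed_problems("CHgC"))    # ['H', 'g'] : same atoms without brackets
```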
Examples of Molecular Structure and SMILES Notations
| Aromatic SMILES | Kekulé SMILES | Molecule |
| c1ccccc1 | C1=CC=CC=C1 | benzene |
| c1ccc2CCCc2c1 | C1=CC=CC(CCC2)=C12 | indane |
| c1occc1 | C1=COC=C1 | furan |
| c1ccc1 | C1=CC=C1 | cyclobutadiene |
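The aromatic and Kekulé forms of the same molecule must contain the same atoms, so comparing element counts is a cheap way to catch a mistyped pair. The sketch below is our own illustration under simplifying assumptions (only organic-subset and bracket atoms are recognized, implicit hydrogens are not counted, and element_counts is a hypothetical name).

```python
import re
from collections import Counter

# Simplified atom tokenizer: bracket atoms and organic-subset atoms only.
ATOM = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def element_counts(smiles: str) -> Counter:
    """Count written atoms by element; aromatic 'c' and aliphatic 'C' both count as C."""
    counts = Counter()
    for token in ATOM.findall(smiles):
        inner = token.strip("[]").lstrip("0123456789")
        symbol = re.match(r"[A-Z][a-z]?|[a-z]", inner).group()
        counts[symbol.capitalize()] += 1
    return counts


pairs = {
    "benzene": ("c1ccccc1", "C1=CC=CC=C1"),
    "indane": ("c1ccc2CCCc2c1", "C1=CC=CC(CCC2)=C12"),
    "furan": ("c1occc1", "C1=COC=C1"),
    "cyclobutadiene": ("c1ccc1", "C1=CC=C1"),
}
for name, (aromatic, kekule) in pairs.items():
    ok = element_counts(aromatic) == element_counts(kekule)
    print(f"{name}: {dict(element_counts(aromatic))}, forms agree: {ok}")
```

Matching counts do not prove the two strings are the same molecule, but a mismatch always means one of them is wrong.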
We hope this information is useful for your reference. Thanks for lending us your time here.
Have a good day, XOXO.