23 - 24 October 2018
Hotel Palace Berlin,
Berlin

This article describes an unsupervised machine learning method for computing distributed vector representations of molecular fragments. These vectors encode fragment features in a continuous high-dimensional space and enable similarity computation between individual fragments, even for small fragments with only two heavy atoms.

The method is based on a word embedding algorithm borrowed from the field of natural language processing, and approximately 6 million unlabeled PubChem chemicals were used for training. The resulting dense fragment vectors contrast with the traditional sparse “one-hot” fragment representation and capture rich relational structure in the fragment space. The vectors of small linear fragments were averaged to yield distributed vectors of bigger fragments and molecules, which were then used for different tasks, e.g., clustering, ligand recall, and quantitative structure-activity relationship modeling. The distributed vectors were found to be better than standard binary fingerprints at clustering ring systems and at recalling kinase ligands.
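The averaging and similarity steps described above can be illustrated with a minimal Python sketch. It is not the author's implementation: the fragment labels, the 3-dimensional vector values, and the helper names `average_vector` and `cosine_similarity` are all assumptions made up for illustration, and cosine similarity is assumed here as the similarity measure. Real fragment vectors would come from a word embedding model trained on a large set of structures.

```python
import math

# Hypothetical 3-dimensional fragment vectors, invented for illustration only.
# Trained vectors would be learned from unlabeled chemical structures.
FRAGMENT_VECTORS = {
    "C-C": [0.9, 0.1, 0.0],
    "C-O": [0.2, 0.8, 0.1],
    "C-N": [0.3, 0.7, 0.2],
    "C=O": [0.1, 0.6, 0.7],
}


def average_vector(fragments):
    """Average the vectors of the given small fragments, component by component."""
    vectors = [FRAGMENT_VECTORS[name] for name in fragments]
    if not vectors:
        raise ValueError("at least one fragment is required")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Larger structures are represented by the mean of their small-fragment vectors.
alcohol_like = average_vector(["C-C", "C-O"])
amine_like = average_vector(["C-C", "C-N"])
acid_like = average_vector(["C-C", "C=O", "C-O"])

sim_alcohol_amine = cosine_similarity(alcohol_like, amine_like)
sim_alcohol_acid = cosine_similarity(alcohol_like, acid_like)

# One-hot vectors of two different fragments share no nonzero component,
# so their cosine similarity is always zero.
sim_one_hot = cosine_similarity([1, 0, 0, 0], [0, 1, 0, 0])
```

With dense vectors, two different fragments can still receive a graded similarity score, whereas one-hot vectors of distinct fragments are always orthogonal. That is the contrast the paragraph above draws between the two representations.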

This work demonstrates unsupervised learning of fragment chemistry from large sets of unlabeled chemical structures and subsequent application to supervised training on relatively small data sets of labeled chemicals.

Teaching computers a little bit of chemistry by showing them a large number of chemical structures

SUMAN CHAKRAVARTI
Vice President, Chief Scientific Officer at Multicase
