--GENERAL INFORMATION--
TITLE: Chemically Standardized Dataset of 512 Kinases for Statistical Modeling
CONTRIBUTOR: Leiden Academic Centre for Drug Research, Leiden University 

This dataset was compiled from publicly available compound sets, then filtered and standardized for statistical modeling purposes. 

--METHOD--
Data for this dataset were derived from ChEMBL (version 23), Eidogen, and ExCAPE-DB. Two filters were applied: only compounds with a molecular weight < 700 were kept, and duplicates were removed. The compounds were standardized using BIOVIA Pipeline Pilot 2016: salts were removed, the largest fragment was kept, stereochemistry and pi-systems were standardized, and charges were neutralized. 
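
The two filters can be illustrated with a minimal Python sketch. It is not the Pipeline Pilot protocol that was used to build the dataset. The field name "mol_weight", the placeholder identifiers, and the choice to treat a repeated (InChIKey, md5) pair as a duplicate are assumptions made for illustration only; the criterion actually used for duplicates is not specified here.

```python
def filter_records(records, max_weight=700.0):
    """Keep records with mol_weight < max_weight and drop repeated pairs.

    Assumption: a duplicate is a repeated (InChIKey, md5) pair. The first
    occurrence is kept and later ones are dropped.
    """
    seen = set()
    kept = []
    for rec in records:
        # Molecular weight filter: keep only weights strictly below the cutoff.
        if rec["mol_weight"] >= max_weight:
            continue
        # Duplicate filter on the assumed compound-protein key.
        key = (rec["InChIKey"], rec["md5"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


# Placeholder records; the identifiers and weights are made up.
example = [
    {"InChIKey": "KEY-A", "md5": "hash1", "mol_weight": 350.0},
    {"InChIKey": "KEY-A", "md5": "hash1", "mol_weight": 350.0},  # duplicate pair
    {"InChIKey": "KEY-B", "md5": "hash1", "mol_weight": 712.4},  # too heavy
    {"InChIKey": "KEY-A", "md5": "hash2", "mol_weight": 350.0},
]
filtered = filter_records(example)
```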

--DATA SPECIFIC INFORMATION--
The dataset contains the following properties/columns:
Interaction_ID - The identifier representing the interaction that was measured (protein-compound)
InChIKey - The compound identifier 
md5 - The protein identifier
Active - Binary (0/1), with value 1 if the pChEMBL value is >= 6.5 and 0 otherwise
Uniprot - The identifier for the protein as published on UniProt
parent_Uniprot - The identifier for the parent protein as published on UniProt
Cluster - The number (1-5) of the chemical cluster to which the compound is assigned
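
As an illustration of how these columns might be used, the Python sketch below reads a few rows, applies the activity threshold, and holds out one chemical cluster for testing. The comma delimiter, the sample rows, and the placeholder identifiers are assumptions, as is the idea of using the Cluster column for a cluster-based train/test split. The helper names label_active and split_by_cluster are hypothetical.

```python
import csv
import io

ACTIVITY_THRESHOLD = 6.5  # pChEMBL cutoff stated for the Active column


def label_active(pchembl):
    """Return 1 if the pChEMBL value meets the threshold, else 0."""
    return 1 if pchembl >= ACTIVITY_THRESHOLD else 0


def split_by_cluster(rows, held_out_cluster):
    """Split rows into (train, test), holding out one chemical cluster."""
    train = [r for r in rows if int(r["Cluster"]) != held_out_cluster]
    test = [r for r in rows if int(r["Cluster"]) == held_out_cluster]
    return train, test


# Made-up sample using the documented column names; all values are placeholders.
SAMPLE = """Interaction_ID,InChIKey,md5,Active,Uniprot,parent_Uniprot,Cluster
1,KEY-A,hash1,1,ACC1,ACC1,1
2,KEY-B,hash1,0,ACC1,ACC1,2
3,KEY-C,hash2,1,ACC2,ACC2,2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
train, test = split_by_cluster(rows, held_out_cluster=2)
```

Holding out a whole cluster keeps chemically similar compounds on the same side of the split, which is one common way to get a more conservative estimate of model performance.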

