--GENERAL INFORMATION--

TITLE: Chemically Standardized Dataset of 512 Kinases for Statistical Modeling

CONTRIBUTOR: Leiden Academic Centre for Drug Research, Leiden University

This dataset was compiled from publicly available compound sets, then filtered and standardized for statistical modeling.

--METHOD--

Data for this dataset were derived from ChEMBL (version 23), Eidogen, and ExCAPE-DB. The following filters were applied: molecular weight < 700 and removal of duplicates. The compounds were standardized using BIOVIA Pipeline Pilot 2016: salts were removed, the largest fragment was kept, stereochemistry and pi-systems were standardized, and charges were neutralized.

--DATA SPECIFIC INFORMATION--

The dataset contains the following properties/columns:

Interaction_ID - The identifier of the measured protein-compound interaction
InChIKey - The compound identifier
md5 - The protein identifier
Active - Binary (0/1); 1 if the pChEMBL value is >= 6.5, otherwise 0
Uniprot - The protein's UniProt identifier
parent_Uniprot - The UniProt identifier of the parent protein
Cluster - The chemical cluster (1-5) to which the compound is assigned
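The column definitions can be illustrated with a minimal Python sketch. It assumes the data is a delimited text file with a header row using the column names listed above; the file name and delimiter are not stated here, so the hypothetical `load_rows` helper takes any iterable of text lines and assumes a comma delimiter by default. `active_label` only restates the stated pChEMBL >= 6.5 cutoff, since the dataset itself provides the binary Active column rather than pChEMBL values. `leave_one_cluster_out` is another hypothetical helper showing one plausible use of the Cluster column: holding out one chemical cluster at a time, so that test compounds come from a cluster not seen during training.

```python
import csv
from collections import defaultdict

# Cutoff stated in the dataset description: Active = 1 if pChEMBL >= 6.5.
ACTIVE_THRESHOLD = 6.5


def load_rows(lines, delimiter=","):
    """Parse header-plus-data lines into a list of dicts keyed by column name.

    Hypothetical helper: the delimiter is an assumption, not documented.
    `lines` may be an open file object or any iterable of strings.
    """
    return list(csv.DictReader(lines, delimiter=delimiter))


def active_label(pchembl: float) -> int:
    """Binarize a pChEMBL value with the dataset's stated cutoff."""
    return 1 if pchembl >= ACTIVE_THRESHOLD else 0


def leave_one_cluster_out(rows):
    """Yield (held_out_cluster, train_rows, test_rows) for each Cluster value.

    Hypothetical helper: assumes each row has a 'Cluster' field holding an
    integer (csv.DictReader returns it as a string, so it is converted here).
    """
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[int(row["Cluster"])].append(row)
    # Iterate clusters in ascending order for reproducible folds.
    for held_out in sorted(by_cluster):
        test = by_cluster[held_out]
        train = [r for c, rs in by_cluster.items() if c != held_out for r in rs]
        yield held_out, train, test
```

With all five clusters present, `leave_one_cluster_out(load_rows(fh))` on an opened file `fh` yields five train/test folds, one per held-out cluster.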