In silico target prediction for protein-protein interaction (PPI) small molecule inhibitors using CGBVS
(CGBVSによるPPI小分子阻害剤のin silicoターゲット予測)

This is a poster paper we presented at the 37th Medicinal Chemistry Symposium, held in Hachioji, Tokyo, Japan, from 27 to 29 November 2019. The PDF of the poster can be accessed through the link at the end of this post.

The paper describes the use of Chemical Genomics-Based Virtual Screening (CGBVS) to find compounds in the ChEMBL25 database with potential inhibitory activity against protein-protein interactions.

What is CGBVS anyway?

CGBVS is a machine learning-based method that predicts the activity of a compound from binding patterns learned from interaction information (chemical genomics information) between proteins (biological space) and compounds (chemical space).

Public databases such as ChEMBL and UniProt provide large amounts of chemical genomics information that can be used as training data for predictive models. The concept of the technique is summarized in the following figure.

CGBVS was developed in the lab of Professor Yasushi Okuno during his tenure at the Graduate School of Pharmaceutical Sciences at Kyoto University. He has since moved to the Graduate School of Medicine at the same university. His expertise is mainly in bio- and chemoinformatics, fields in which he has published extensively.

The steps leading to the creation of predictive models are illustrated in the figure below. It is worth noting that the technique uses support vector machines (SVMs), which are still considered one of the more reliable approaches in chemoinformatics.

Eight predictive models are currently available for screening against protein targets: GPCRs, kinases, ion channels, nuclear receptors, proteases, transporters, cytochrome P450s, and PPI-related proteins. The cytochrome P450 and PPI models are the most recent additions; they were added after this poster paper was presented at the Medicinal Chemistry Symposium.

Building these models requires only 2D compound descriptors, protein descriptors, and compound-protein interaction data. Compound descriptors are generated with alvaDesc, the successor to the widely used DRAGON application. Protein descriptors are generated with the PROFEAT 2016 web server. The compound-protein interaction data were obtained mainly from the ChEMBL25 database. For this research, however, the PPI predictive model was built from the TIMBAL database, which is dedicated mainly to PPI data.
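To make the three inputs concrete, here is a minimal sketch of how a training set could be assembled from them. It assumes that each compound-protein pair is represented by simply concatenating the compound descriptor vector with the protein descriptor vector, which is our illustration and not necessarily the exact encoding used inside CGBVS. The function names, IDs, and descriptor values are all hypothetical.

```python
def pair_vector(compound_desc, protein_desc):
    """Represent one compound-protein pair as a single feature vector.

    Assumption: plain concatenation of the two descriptor vectors.
    """
    return list(compound_desc) + list(protein_desc)


def build_training_set(interactions, compound_descs, protein_descs):
    """Turn interaction records into feature vectors X and labels y.

    interactions: iterable of (compound_id, protein_id, is_active)
    compound_descs / protein_descs: dicts mapping an ID to its descriptors
    """
    X, y = [], []
    for compound_id, protein_id, is_active in interactions:
        X.append(pair_vector(compound_descs[compound_id],
                             protein_descs[protein_id]))
        y.append(1 if is_active else 0)
    return X, y


# Toy descriptors (made-up numbers, far shorter than real descriptor sets).
compound_descs = {"cpd1": [0.2, 1.5], "cpd2": [0.9, 0.1]}
protein_descs = {"BCL2_HUMAN": [0.3, 0.7, 0.4], "MDM2_HUMAN": [0.8, 0.1, 0.6]}
interactions = [
    ("cpd1", "BCL2_HUMAN", True),
    ("cpd2", "BCL2_HUMAN", False),
    ("cpd2", "MDM2_HUMAN", True),
]

X, y = build_training_set(interactions, compound_descs, protein_descs)
print(len(X), len(X[0]), y)
```

The resulting X and y are in the shape that a typical SVM library expects, so the same table of pairs can serve every target protein in a model.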

The TIMBAL database combines several types of assay data on protein-protein interactions. Activity values are reported as IC50, Ki, Kd, or %Inh, in units of either % or molarity (mM, µM, nM, pM). We selected only the records reported in molarity units and converted all values to µM. For the training data sets, we set the positive cutoff at <=10 µM and the negative cutoff at >=15 µM. In total, there were 8,602 positive and 1,680 negative interaction records.
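The unit conversion and labeling described above can be sketched in a few lines of Python. This is only an illustration: the function names and sample records are hypothetical, and we assume here that values falling between the two cutoffs (more than 10 µM but less than 15 µM) are left out of the training data.

```python
# Conversion factors from each molarity unit to micromolar (uM).
TO_UM = {"mM": 1e3, "uM": 1.0, "nM": 1e-3, "pM": 1e-6}

POSITIVE_MAX_UM = 10.0  # <= 10 uM is labeled positive
NEGATIVE_MIN_UM = 15.0  # >= 15 uM is labeled negative


def to_micromolar(value, unit):
    """Convert a value to uM; return None for non-molarity units such as %."""
    factor = TO_UM.get(unit)
    return None if factor is None else value * factor


def label_activity(value, unit):
    """Return 'positive', 'negative', or None when the record is excluded."""
    um = to_micromolar(value, unit)
    if um is None:
        return None  # not reported in molarity units
    if um <= POSITIVE_MAX_UM:
        return "positive"
    if um >= NEGATIVE_MIN_UM:
        return "negative"
    return None  # between the cutoffs: assumed to be dropped


# Hypothetical records: (activity value, unit)
records = [
    (250.0, "nM"),  # 0.25 uM
    (0.02, "mM"),   # 20 uM
    (12.0, "uM"),   # between the two cutoffs
    (45.0, "%"),    # percent inhibition, not a molarity unit
]
labels = [label_activity(value, unit) for value, unit in records]
print(labels)  # ['positive', 'negative', None, None]
```

Leaving a gap between the positive and negative cutoffs keeps borderline measurements from ending up on either side of the decision boundary.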

The following figure shows the flow for the creation of the PPI model.

Using the list of target proteins extracted from the TIMBAL database, we collected the compounds in the ChEMBL25 database that have shown activity against those targets. We kept only compounds that are unique to ChEMBL25; that is, we removed any compound that also appears in the training data used to create the PPI model. In total, we obtained 67,422 compounds, which we screened against the PPI model.
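The overlap removal amounts to a set difference between the candidate compounds and the training compounds. The sketch below assumes compounds are matched on some canonical structure key; the post does not say which identifier was actually used, and the keys shown are placeholders.

```python
def exclude_training_compounds(candidates, training):
    """Keep candidates that do not occur in the training set, preserving order."""
    training_keys = set(training)
    return [key for key in candidates if key not in training_keys]


# Placeholder structure keys, not real identifiers.
chembl_candidates = ["KEY-A", "KEY-B", "KEY-C", "KEY-D"]
timbal_training = ["KEY-B", "KEY-D", "KEY-E"]

screening_set = exclude_training_compounds(chembl_candidates, timbal_training)
print(screening_set)  # ['KEY-A', 'KEY-C']
```

Screening only compounds the model has never seen is what makes the later hits a test of generalization and not of memorization.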

CGBVS scores generally range from 0 to 1. A score of 0.5 or higher is considered positive, meaning the compound is predicted to have potential activity against the corresponding target protein. The table below lists the number of positive compounds for each target protein. Although 71 target proteins were registered in the PPI model, only 46 of them were present in the training data, so the CGBVS calculation covered only those 46 proteins.

Target Protein    No. of Compounds with Activity
ITB3_HUMAN        7655
BCL2_HUMAN        7106
BRD4_HUMAN        5043
B2CL1_HUMAN       4986
MDM2_HUMAN        4848
ITB1_HUMAN        4824
ITA4_HUMAN        4009
ITA2B_HUMAN       3281
ITB2_HUMAN        3141
BAD_HUMAN         2769
ITB7_HUMAN        2126
TTHY_HUMAN        2084
PPIA_HUMAN        1956
HIF1A_HUMAN       1884
ITB5_HUMAN        1491
FKB1A_HUMAN       1262
XIAP_HUMAN        1166
CTNB1_HUMAN       1159
BRD2_HUMAN        715
ITA5_HUMAN        569
ITB6_HUMAN        537
TNFA_HUMAN        195
MEN1_HUMAN        122
TAB1_HUMAN        118
IL2_HUMAN         82
PPIB_HUMAN        81
STAT3_HUMAN       72
RAD51_HUMAN       71
VEGFA_HUMAN       57
S10AA_HUMAN       51
ANXA2_HUMAN       476
BRDT_HUMAN        32
NRP1_HUMAN        32
MDM4_HUMAN        28
KEAP1_HUMAN       20
TF7L2_HUMAN       20
S100B_HUMAN       18
MED23_HUMAN       5
ELF3_HUMAN        4
ITA2_HUMAN        3
TNR5_HUMAN        3
XPO1_HUMAN        3
RAC1_HUMAN        2
RASK_HUMAN        2
CLH1_HUMAN        0
MYC_HUMAN         0
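
Per-target counts like those in the table come from applying the 0.5 threshold to every compound-target score and tallying the positives. The sketch below illustrates this with made-up scores; the function name and the data layout are assumptions, not the actual CGBVS output format.

```python
THRESHOLD = 0.5  # scores >= 0.5 are treated as positive


def count_positives(scores, targets):
    """Count positive compounds per target, sorted by descending count.

    scores: iterable of (compound_id, target, score)
    targets: all screened targets, so that targets with no hits report 0
    """
    counts = {target: 0 for target in targets}
    for _compound_id, target, score in scores:
        if score >= THRESHOLD:
            counts[target] = counts.get(target, 0) + 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores for illustration only.
scores = [
    ("cpd1", "BCL2_HUMAN", 0.91),
    ("cpd2", "BCL2_HUMAN", 0.50),  # exactly at the threshold: positive
    ("cpd3", "BCL2_HUMAN", 0.49),
    ("cpd1", "MDM2_HUMAN", 0.73),
    ("cpd2", "MYC_HUMAN", 0.12),
]
targets = ["BCL2_HUMAN", "MDM2_HUMAN", "MYC_HUMAN"]

summary = count_positives(scores, targets)
for target, n_positive in summary:
    print(f"{target:<16}{n_positive}")
```

Passing the full target list in explicitly is what lets targets with no predicted actives, such as the last two rows of the table, appear with a count of 0.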

Among the compounds with positive results, we presented four that show activity against the Bcl-2 (BCL2_HUMAN) and Bcl-xL (BAD_HUMAN) proteins. Their IC50 values were obtained from the ChEMBL25 database. I would like to emphasize that Venetoclax has been on the market for some time and is a known Bcl-2 inhibitor. It is generally used to treat adult patients with chronic lymphocytic leukemia (CLL).

We have shown that, even though the training data did not include compounds from the ChEMBL25 database, the PPI model was still sensitive to compounds active against the Bcl-2 protein.