Computer-aided drug design to discover DNMT inhibitors from phytochemicals

DNA methyltransferases (DNMTs) are now a major family of epigenetic targets of therapeutic interest. However, only two cytosine analogues, 5-azacytidine (azacitidine) and 2'-deoxy-5-azacytidine (decitabine), have been approved, with some restrictions, as epigenetic medications for treating cancer. In this context, computational methods that rely on quantitative structure-activity relationships (QSAR) play a crucial role, allowing us to predict the biological activity of potential molecules from their theoretically calculated physicochemical properties. When coupled with machine learning (ML), QSAR approaches create an ideal platform for discovering potential drug candidates. In this study, three ML models (Random Forest, Support Vector Machine, and Artificial Neural Network) were trained using modified TeachOpenCADD KNIME workflows and applied to the identification of plant molecules that are structurally similar to the active pharmaceuticals of current DNMT inhibitors. Molecular docking simulations were then performed using AutoDock Vina, employing two human DNMT structures (PDB codes: 4WXX and 2QRV) as target proteins and the predicted phytochemicals as ligands. Additionally, we focused on the R882H mutation hotspot in the catalytic domain of DNMT3A, which is associated with aberrant DNA methylation in acute myeloid leukemia (AML). Consequently, the structure of R882H DNMT3A (PDB code: 6W8J) was docked with the identified novel ligands. As a result of our computational analysis, eight phytochemicals were predicted as potential DNMT inhibitors through the ML approaches from KNIME. Subsequently, three of these phytochemicals, namely Herbacetin, Kaempferide, and Morin, were identified as virtual hits against DNMTs following the molecular docking simulations. Overall, our study demonstrates the effectiveness of this computational strategy in identifying DNMT inhibitors.
These findings hold promise for the discovery of potent and selective anticancer drugs targeting DNMTs.


INTRODUCTION
DNA methyltransferases (DNMTs) are epigenetic enzymes that methylate cytosine at the C5 position, playing a vital role in cell differentiation and development (Cheng & Blumenthal, 2008; Goll & Bestor, 2005; Hermann et al., 2004). The three major DNMTs involved in this process are DNMT1, DNMT3A, and DNMT3B. DNMT1 is primarily considered a maintenance DNA methyltransferase responsible for maintaining CpG methylation patterns, playing important roles in embryonic development and the survival of somatic cells (Li et al., 2016). DNMT3A and DNMT3B are classified as de novo methyltransferases that play essential roles in establishing DNA methylation patterns during gametogenesis and early development, contributing to the establishment of embryonic methylation patterns (Okano et al., 1999). DNMTs are large multidomain proteins, consisting of a catalytic C-terminal methyltransferase domain responsible for methylating DNA and a complex N-terminal part containing diverse targeting and regulatory functions (Cheng & Blumenthal, 2008). DNMTs recognize flipped-out cytosines within double-stranded DNA and operate via a nucleophilic attack mechanism (Jones & Liang, 2009). In this mechanism, the unstable methyl group from S-adenosylmethionine (SAM) is transferred to the C-5 atom of cytosine, leading to the formation of 5-methylcytosine (Pfaffeneder et al., 2011). Deregulation of the DNMTs has been shown in many types of cancer, including lung, breast, stomach, and colon cancer, as well as in leukemia (Chik et al., 2011; Chik & Szyf, 2010; Gnyszka et al., 2013). In cancer cells, DNMTs can be overexpressed, leading to hypermethylation of tumor-suppressor genes, silencing their expression and promoting tumor growth. Conversely, DNMTs can also be downregulated, resulting in global hypomethylation and genomic instability (Delpu et al., 2013).
One of the benefits of epigenetic alterations, in contrast to genetic mutations, is their potential reversibility (Esteller, 2011). Certain drugs like azacytidine and decitabine have been developed as epigenetic therapies, targeting DNMTs to reverse abnormal DNA methylation patterns (Jones & Taylor, 1980). These drugs work by inhibiting DNMT activity, leading to the reactivation of silenced genes and restoring normal cellular functions. In addition to synthetic drugs, natural products, such as phytochemicals derived from plants, have also been identified as DNMT inhibitors (Saldívar-González et al., 2018). These natural compounds offer the advantage of being relatively more readily available and often exhibit lower toxicity compared to synthetic compounds, making them promising candidates for developing DNMT-targeted therapies with potentially fewer side effects (Mund et al., 2016; Malongane et al., 2017). The discovery of such agents provides opportunities for developing novel epigenetic-based treatments and advancing the field of personalized medicine.
For decades, natural product-based drug discovery involved a trial-and-error approach, in which compounds from natural sources were isolated and tested in vitro and in animal models to determine their efficacy against specific diseases, resulting in a lengthy and costly process to bring new drugs to market. A recent report showcased the development of a drug molecule using a computer-aided drug discovery approach, which saved nearly a decade in the drug discovery process (Jarada et al., 2020). Such computational approaches, when coupled with advances in genomics, proteomics, and metabolomics as well as advanced cheminformatics applications such as theoretical quantitative structure-activity relationship (QSAR) calculations, have the potential to revolutionize pharmaceutical drug discovery (Chakravarti & Alla, 2019).
In recent years, advances in computational approaches and cheminformatics applications have resulted in large collections of molecular data being available in publicly accessible databases. Accessing and undertaking in silico experimentation with these vast libraries of data is now possible due to open-source cheminformatics libraries such as CDK, ChemmineR, RDKit, OpenChem, etc. However, the use of this software requires a high level of coding knowledge in Python, R, C++, or Java. This creates a barrier to entry for experts who are highly knowledgeable in plant sciences but do not have a computer science background. To overcome this limitation, the Konstanz Information Miner (KNIME) platform allows the creation of modular yet very powerful data mining workflows using a visual programming approach.
This study examines the effectiveness of phytochemicals in inhibiting DNMTs and employs computational methods to discover new potential inhibitors of DNMTs from a personally curated phytochemical database containing over 400 natural compounds. To the best of our knowledge, this is the first investigation to systematically screen a diverse collection of natural products for DNMT inhibitors using the KNIME analytics platform. Predicted drug-like compounds exhibiting favourable DNMT-binding traits and possessing diverse chemical scaffolds were identified through a docking-based virtual screening approach using the AutoDock Vina program (Trott & Olson, 2010).

Data collection, training and evaluation of ML models in KNIME
KNIME Analytics Platform v4.5.2 (Fillbrunn et al., 2017) was downloaded, and then the cheminformatics extension (Roughley, 2018) was installed. The performance of the trained ML algorithms was tested using the 'Scorer' and 'ROC Curve' KNIME nodes. Overall statistics and the confusion matrix were calculated by the Scorer node.
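As an illustration of the kind of overall statistics derived from a confusion matrix, the following minimal Python sketch computes accuracy, precision, recall, and F1-score for a binary classifier. It is not the KNIME Scorer implementation; the function name and the example counts are hypothetical and chosen only for demonstration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return common metrics from binary confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, for illustration only (not results from this study).
example = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```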

Drug likeness test from KNIME
Predicted molecules were filtered according to Lipinski's rule of five, implemented with 18 KNIME nodes, to distinguish between drug-like and non-drug-like molecules. A molecule was predicted to have a high probability of success as a drug if it complied with 3 or more of the following rules: molecular weight ≤ 500, number of hydrogen bond donors ≤ 5, number of hydrogen bond acceptors ≤ 10, molar refractivity (SMR) between 40 and 130, and log P value ≤ 5 (Lipinski, 2004).
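The filtering logic described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the five descriptors have already been calculated elsewhere (for example by a cheminformatics toolkit); the dictionary keys, function names, and descriptor values are hypothetical placeholders, not data from this study.

```python
# Thresholds as stated in the text; each rule maps a descriptor to a pass/fail check.
RULES = {
    "molecular_weight": lambda v: v <= 500,
    "h_bond_donors": lambda v: v <= 5,
    "h_bond_acceptors": lambda v: v <= 10,
    "molar_refractivity": lambda v: 40 <= v <= 130,
    "log_p": lambda v: v <= 5,
}


def count_rules_passed(descriptors):
    """Count how many of the five rules a molecule satisfies."""
    return sum(1 for name, rule in RULES.items() if rule(descriptors[name]))


def is_drug_like(descriptors, min_rules=3):
    """A molecule is treated as drug-like if it passes at least min_rules rules."""
    return count_rules_passed(descriptors) >= min_rules


# Placeholder descriptor values, invented for illustration only.
placeholder_a = {
    "molecular_weight": 302.2,
    "h_bond_donors": 6,
    "h_bond_acceptors": 8,
    "molar_refractivity": 75.0,
    "log_p": 1.5,
}
placeholder_b = {
    "molecular_weight": 820.0,
    "h_bond_donors": 9,
    "h_bond_acceptors": 15,
    "molar_refractivity": 200.0,
    "log_p": 6.2,
}

print(count_rules_passed(placeholder_a), is_drug_like(placeholder_a))
print(count_rules_passed(placeholder_b), is_drug_like(placeholder_b))
```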

Docking-based virtual screening
Chimera v1.16 (Pettersen et al., 2004) along with AutoDock Vina v1.1.2 (Eberhardt et al., 2021) was used to dock compounds (Trott & Olson, 2010). The 3D structures of the 8 compounds predicted as DNMT inhibitors by the self-prepared machine learning KNIME workflow were optimized with the Tripos force field with Gasteiger charges and docked on two human DNMT structures [PDB codes: 4WXX (Zhang et al., 2015) and 2QRV (Jia et al., 2007)] using AutoDock Vina. Compounds were prioritized according to their affinity value (lower than -6 kcal/mol), hydrogen bond count, root-mean-square deviation of atomic positions (RMSD, Å), and number of active torsions. In addition, R882H in DNMT3A is a mutation hotspot in the catalytic domain that causes aberrant DNA methylation in acute myeloid leukemia (AML) (Anteneh et al., 2020). Therefore, the R882H DNMT3A structure [PDB code: 6W8J (Anteneh et al., 2020)] was docked with the compounds identified as novel ligands. All protein structures were downloaded from the Protein Data Bank (PDB) (http://www.rcsb.org/pdb/home/). The x, y, and z coordinates of the grid box centres were -47.0, -60.0, and 7.0 for the A chain of 4WXX; 63.0, -16.0, and -2.0 for the E chain of 2QRV; and 165.0, -146.0, and 16.0 for the A chain of 6W8J. There is one missing loop in the 2QRV E chain and 5 missing regions in 4WXX, which are far away from the active site, and no missing loop in the 6W8J A chain. Three docking runs were performed for each ligand, and the pose with the highest absolute value of affinity was saved. Finally, the binding affinity value for a specific complex was determined using the mean affinity value for the optimal pose. UCSF Chimera was used for the graphical visualisation (Pettersen et al., 2004).

L. R. L. S. Kumari and W. R. P. Wijesinghe

Figure 1:
The graphical interface of KNIME, demonstrating structure-based screening of DNMT inhibitors by machine learning (A) and molecular filtering by Lipinski's rule of five as a drug-likeness test (B). Every node and metanode in the workflow is tagged with a concise topic description that outlines the primary steps within the workflow.
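To make the docking set-up above more concrete, the sketch below writes an AutoDock Vina configuration from the reported grid box centres and summarises repeated runs. It is an illustration under stated assumptions, not the protocol used in this study: the box size, file names, and affinity values are hypothetical (the text does not report them), and the docking here may equally be launched through the Chimera interface.

```python
from statistics import mean

# Grid box centres (x, y, z) as reported in the text.
GRID_CENTRES = {
    "4WXX_A": (-47.0, -60.0, 7.0),
    "2QRV_E": (63.0, -16.0, -2.0),
    "6W8J_A": (165.0, -146.0, 16.0),
}


def vina_config(receptor, ligand, centre, box_size=(20.0, 20.0, 20.0)):
    """Build the text of a Vina configuration file.

    box_size is an assumed value in Angstrom; it is not given in the text.
    """
    cx, cy, cz = centre
    sx, sy, sz = box_size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
    ]
    return "\n".join(lines)


def best_pose_affinity(pose_affinities):
    """Most negative affinity (kcal/mol), i.e. highest absolute value."""
    return min(pose_affinities)


def summarise_runs(runs, cutoff=-6.0):
    """Mean of the best-pose affinity over runs, and whether it beats the cutoff."""
    avg = mean(best_pose_affinity(run) for run in runs)
    return avg, avg < cutoff


# Hypothetical file names and affinities, for illustration only.
config_text = vina_config("4wxx_A.pdbqt", "ligand.pdbqt", GRID_CENTRES["4WXX_A"])
print(config_text)
print(summarise_runs([[-7.0, -6.1], [-7.5, -6.8], [-8.0, -7.2]]))
```

Taking the minimum within each run reflects the convention that more negative Vina scores indicate stronger predicted binding.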

RESULTS
The entire KNIME workflow for training and evaluating the models contains 60 KNIME nodes (Figure 1). The models can identify molecules that are structurally similar to drugs used as DNMT inhibitors with an accuracy of 95.81% for RF, 95.39% for ANN, and 94.13% for SVM. Eight phytochemicals (Allicin, Betaine, Citric acid, Dehydrocostuslactone, Herbacetin, Kaempferide, Morin, Pyrogallol) were predicted as DNMT inhibitors by the ANN (Table S1a). Although the already known DNMT drugs were predicted by the SVM and RF models in the KNIME workflow, these models identified none of the phytochemicals as DNMT inhibitors.
The drug-likeness test results are shown below (Table 1, S1b). According to Lipinski's rule of five, all phytochemicals were predicted to be successful oral drugs, as each complied with 3 or more of the rules (Figure 2).
Molecular docking validation, performed by re-docking azacitidine onto its complexes with the DNMTs (4WXX, 2QRV, and 6W8J, respectively), showed that the predicted poses reproduced the experimental binding mode of the co-crystallized ligand well, with satisfactory results from AutoDock Vina. These results agreed with similar studies conducted for this purpose (Robert et al., 2006), which reported best RMSD values of >2-5 Å.
For illustrative purposes, the best pose obtained for each DNMT-ligand complex with the molecular docking protocols used is shown in Fig. 3. The best binding poses relative to the experimental co-crystallized ligand, the RMSD values, and the binding affinities (AutoDock Vina) are reported in Table S2 of the supplementary file.
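The RMSD used to compare a re-docked pose with the co-crystallized ligand is the square root of the mean squared distance between matched atoms. The following minimal Python sketch shows that calculation; it assumes both poses list the same atoms in the same order and share the receptor's coordinate frame (no superposition is performed), and the coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as the input) between two poses.

    Each argument is a list of (x, y, z) tuples with matching atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Invented two-atom poses, for illustration only: every atom is shifted by 1 Å in z.
reference_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
redocked_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference_pose, redocked_pose))
```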

DISCUSSION
In this study, we built a modified ML workflow with three types of ML algorithms, and the models can predict structural similarity with > 94% accuracy. This KNIME workflow is the initial step of our drug discovery pipeline to identify potential drug candidates from plant molecules. We used SMILES notation, a widely accepted and efficient method, to represent molecular data.
Our data were retrieved from the ChEMBL database, a highly regarded resource curated by experts. This database provides access to an extensive array of compounds with comprehensively documented biological activities, in line with the established standard procedure within the field for obtaining bioactivity data (Gaulton et al., 2012). We methodically selected three distinct categories: DNMT inhibitors, phytochemicals, and rheumatoid arthritis drugs (those lacking DNMT inhibitory activity). This classification scheme played a crucial role in forming a well-balanced dataset, which is essential for robust machine learning. FeatMorgan fingerprint generation with a high number of bits is a reasonable choice for molecular feature extraction. Nevertheless, we acknowledge that alternative fingerprinting techniques such as MACCS or Morgan fingerprints are worth exploring in future studies (Barta, 2016).
ML approaches are a highly effective way to lower the cost and time of drug discovery, and the current study was based on identifying plant molecules that have similar structures to DNMT inhibitors available in the drug industry. ANN, Random Forest, and SVM are widely used for cheminformatics tasks (Tropsha, 2010). Also, 10-fold cross-validation is a common practice in machine learning model evaluation (Chicco, 2017). Evaluation metrics such as accuracy, precision, recall, and F1-score are commonly used in cheminformatics (Basha et al., 2019; Russo et al., 2018). Here we used the Scorer (JavaScript) node to show the evaluation matrix. ROC curve analysis provides a comprehensive view of classifier performance (Fawcett, 2006). Once these molecules were identified, another method was applied to check their drug-likeness.
Lipinski's Rule of Five is a widely accepted guideline for drug-likeness (Lipinski, 2004).
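The 10-fold cross-validation mentioned above can be illustrated with a short Python sketch that partitions sample indices into folds, each used once for testing. This is a generic, unstratified split written for illustration; it is not the partitioning performed by the KNIME nodes, and the function name, seed, and sample count are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 25 samples is an arbitrary illustrative number.
splits = list(k_fold_indices(25, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

Each sample appears in exactly one test fold, so every molecule contributes to the evaluation once while never being used to train the model that scores it.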
Chimera is a widely used molecular visualization tool (Pettersen et al., 2004). AutoDock Vina is a popular docking software known for its accuracy and efficiency (Trott & Olson, 2010). DNMT1, DNMT3A, and DNMT3B play essential roles in DNA methylation processes (Hermann et al., 2004). This study shows how docking simulations with DNMT structures, including those with mutations, can shed light on potential novel ligands or compounds that may target the active sites or allosteric regions of DNMTs. DNMT3A mutations, especially the R882H mutation, have gained considerable attention in the context of AML. This mutation is associated with abnormal DNA methylation patterns and is considered a driver mutation in AML pathogenesis (Anteneh et al., 2020); it contributes to epigenetic dysregulation and is implicated in disease progression (Shih et al., 2012). To date, scientists have investigated the use of xenogeneic hematopoietic stem cell transplantation (Xu et al., 2015) and chemotherapy (Ayala et al., 2021; Döhner et al., 2018) as potential treatments for individuals with AML who have the DNMT3A R882 mutation. Some research findings have indicated that survival rates among these patients can be significantly improved solely with a high dosage of daunorubicin (Luskin et al., 2016). However, such elevated dosages have been shown to cause side effects (Lebaron et al., 1988).
Although a number of studies on the discovery of DNMT inhibitors have been supported by molecular docking methods, this report is the first in which two different approaches, ML and AutoDock Vina, are used to determine the structural similarity and interaction feasibility between compounds and DNMTs, applying diverse criteria for the selection of phytochemicals as new DNMT inhibitors.
The chemical structures of the predicted molecules are depicted in Figure 4. Herbacetin is a naturally occurring flavonoid compound that can be found in sources like Ephedrae herba (Koyama et al., 2021) and various other plants. Herbacetin is particularly renowned for its potent antioxidant capabilities and its ability to combat tumors in breast, colon, and skin tissues (Kim et al., 2017; Kim et al., 2016).
Kaempferide, an O-methylated flavonol, is closely related to Kaempferol, which serves as the precursor for Kaempferide. The main distinction between them lies in a monomethoxy substitution at the 4th position of the B ring in Kaempferide. Kaempferide is widely found in many plants (Lai et al., 2007). Notably, Kaempferide has shown superior pharmacokinetic properties compared to various other flavonoids, including Kaempferol (Jiang et al., 2018). Researchers have investigated the anticancer potential of Kaempferide against a range of cancer types in vitro. These include cervical cancer (Nath et al., 2015), breast cancer (Yusuf et al., 2021), lung cancer (Li et al., 2020), and colon cancer (Chen et al., 2021).
Morin is a polyphenolic flavonol compound primarily found in plant families like Moraceae, Rosaceae, and Fagaceae (Solairaja et al., 2021). It has been shown to act as a potent cell proliferation inhibitor for human leukemia (Kuo et al., 2007) and to exert an anti-tumor promotion effect by significantly inhibiting skin tumor promotion (Iwase et al., 2001). Morin also acts as a chemopreventive agent against oral carcinogenesis, in vitro and in vivo (Kawabata et al., 1999). To the best of our knowledge, while the anticancer properties of Herbacetin, Kaempferide, and Morin have been explored in a limited number of other cancer types, their impact on DNMT has not yet been documented.
A wide array of secondary metabolites, particularly flavonoids, are present in significant quantities in a variety of medicinal plants, fruits, and vegetables. These compounds offer numerous health benefits, such as acting as antioxidants and having the potential to combat inflammation and tumors (Grotewold, 2006). Flavonoids are among the most prevalent phenolic compounds in foods like fruits, vegetables, grains, spices, beverages, and medicinal plants. They are characterized by a C6-C3-C6 carbon skeleton comprising three rings, designated A, B, and C (as depicted in Figure 5). Flavonoids can be further categorized into subclasses, including flavones, flavonols, flavanones, anthocyanidins, and isoflavonoids (Fan et al., 2019).
With reference to the in silico docking studies of Kanwal et al. (2016) on some flavones, the substituents at the R4 position of the B ring in the predicted phytochemicals (R4 = OCH3 in Kaempferide and R4 = OH in Herbacetin and Morin; Figure 5, Table 1) appear to bind efficiently to the catalytic pocket of DNMT, and these compounds may share similarities with well-known Food and Drug Administration (FDA)-approved drugs like azacitidine (5-azacytidine) and decitabine (5-aza-2'-deoxycytidine). These drugs contain a nitrogen atom at position 5 of their pyrimidine ring, in place of the carbon found in regular cytidine, which is essential for their inhibitory effects on DNA methyltransferases (DNMTs). Similarly, our docking study suggested that the identified flavonoids contain structural elements that could potentially exchange a methyl group and subsequently inhibit DNMTs. In addition, flavones offer some advantages over azacitidine as novel non-nucleoside inhibitors of DNMT1 because they are known to rapidly intercalate with DNA (Kausar et al., 2009; Zhang et al., 2012).
Furthermore, methyltransferase activity can be directly and competitively inhibited by various natural compounds, including EGCG from green tea, genistein from soybeans, apigenin and luteolin, and myricetin as a representative dietary flavonoid (Busch et al., 2015; Kanwal et al., 2016; Morris et al., 2016; Sanaei et al., 2018). This inhibition underscores the potential of flavonoids to interfere with the activity and stability of methyltransferases.
Computer-aided drug design is an important field in the discovery of new drugs. Machine learning models and molecular docking simulations play a crucial role in identifying potential drug-like molecules from vast chemical libraries. However, these techniques have limitations. In developing machine learning models, both the quantity and the quality of the training data must be considered, along with the selection of pertinent molecular properties that give optimum performance (Badar, 2023).
The interaction between a protein and its ligand in docking simulations is usually simplified: factors such as protein mobility and solvent effects are often ignored, which can lead to unreliable results (Pujadas et al., 2008). As a result, laboratory experiments (in vivo and in vitro) play a significant role in verifying computational predictions, while improved computational algorithms are needed to address such limitations. Nevertheless, computational methodologies can significantly reduce cost and time in the drug discovery process.

CONCLUSION
The results presented in this study demonstrate the successful classification of structurally similar molecules, namely Herbacetin, Kaempferide, and Morin, as potential inhibitors of DNMT using trained machine learning models. However, the in vitro and in vivo efficacy of these phytochemicals needs further investigation.

Table 1:
The analysis results of Lipinski's rule of five
Rules: Less than five | Less than five | Less than ten | Less than 500 | Should be between

Table 3:
Main differences in predicted phytochemicals

Table S1a:
Predicted active compounds list from machine learning workflow