
The Update of Enamine Dataset and Virtual Screening on Cloud HPC

This article describes how to use VirtualFlow to virtually screen Enamine's Advance, HTS, and REAL Diverse databases.



Introduction to ENAMINE Database


After the update, four Enamine databases are currently available:


1. Enamine REAL

Migrated from www.virtual-flow.org on 2020/07/20. It contains 1.46 billion compounds. The supplier is Enamine (agent: Shanghai Taosu), and the compounds are made to order.

Location: $LIBRARY/enamine/ligand-library

todo.all generation and spatial coordinates: available online at www.virtual-flow.org.


2. Enamine REAL-Diverse

Contains 21 million diverse REAL compounds. The supplier is Enamine (agent: Shanghai Taosu), and the compounds are made to order.


Location: $LIBRARY/enamine-diverse-real-drug-like/ligand-library


todo.all generation and spatial coordinates: see todo.all in the database directory, see README for the spatial coordinate explanation.


3. Enamine Advance


The May 2020 version contains 448,388 compounds. The supplier is Enamine (agent: Shanghai Taosu), and the compounds are in stock.


Location: $LIBRARY/enamine-adv-202005/ligand-library


todo.all generation: see todo.all in the database directory, see README for the spatial coordinate explanation.


4. Enamine HTS


The May 2020 version contains 1,756,280 compounds. The supplier is Enamine (agent: Shanghai Taosu), and the compounds are in stock.


Location: $LIBRARY/enamine_hts_vfvs_202005/ligand-library




todo.all generation: see todo.all in the database directory, see README for the spatial coordinate explanation.


The environment variable LIBRARY is defined as follows:


export VFVS_ROOT=/public/software/.local/easybuild/software/virtualflow
export LIBRARY=$VFVS_ROOT/libs
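
Before starting, it can help to confirm that the library directories listed above are actually present. The following is a minimal sketch, not part of VirtualFlow: check_libraries is a hypothetical helper, and the directory names are simply the four locations given in this article.

```shell
# Report which of the four Enamine libraries described above are present
# under a given library root (normally $LIBRARY).
# check_libraries is a hypothetical helper, not a VirtualFlow tool.
check_libraries() {
    local root="$1" name
    for name in enamine enamine-diverse-real-drug-like \
                enamine-adv-202005 enamine_hts_vfvs_202005; do
        if [ -d "$root/$name/ligand-library" ]; then
            echo "found   $name"
        else
            echo "missing $name"
        fi
    done
}

# Example: check_libraries "$LIBRARY"
```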

Preparation for Virtual Screening


Take the virtual screening of BCR-ABL kinase inhibitors as an example, assuming you have prepared the following documents:


1. Protein file for the docking calculation: 1iep_prot.pdbqt


2. Docking configuration file for QVINA2: config.txt


3. VirtualFlow configuration file: all.ctrl

The settings in this file mainly depend on the number of cores per compute node. Three variants have been prepared: configuration files for 4-core (all.ctrl-c4), 8-core, and 16-core (all.ctrl-c16) nodes. They are already configured for virtual screening with QVina.


4. List of compounds used for docking: todo.all

In this example, Enamine Advance is used, and its todo.all file is in the database directory.
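
For readers who have not yet written the docking configuration file (item 2), the sketch below generates one using the standard AutoDock Vina style keys that QVina2 also accepts. Everything specific here is an assumption: write_docking_config is a hypothetical helper, the box centre and edge length are arguments you must supply for your own binding site, exhaustiveness and cpu are placeholder values, and the relative receptor path should be checked against the VirtualFlow documentation for your installation.

```shell
# Write a Vina-style docking config file (hypothetical helper).
# Usage: write_docking_config <outfile> <center_x> <center_y> <center_z> <box_edge>
# The receptor path, exhaustiveness and cpu values are assumptions.
write_docking_config() {
    local out="$1" cx="$2" cy="$3" cz="$4" edge="$5"
    cat > "$out" <<EOF
receptor = ../input-files/receptors/1iep_prot.pdbqt
center_x = $cx
center_y = $cy
center_z = $cz
size_x = $edge
size_y = $edge
size_z = $edge
exhaustiveness = 1
cpu = 1
EOF
}

# Example (coordinates are placeholders, not a validated binding site):
# write_docking_config config.txt 0.0 0.0 0.0 20
```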


Using VirtualFlow


1. Set the project file directory


Suppose we plan to run the virtual screening in the VFVS_ABL directory under HOME. We first create this directory to hold the project and, to simplify later steps, define a VFVS_DIR environment variable that points to it:


cd ~
mkdir -p VFVS_ABL
export VFVS_DIR=$HOME/VFVS_ABL

2. Prepare the input-files directory


2.1 Create input-files directory


First, create an input-files directory to hold the input files.


cd $VFVS_DIR
mkdir input-files

2.2 Designating databases for virtual screening


Since we want to screen Enamine Advance, make the library available in the input-files directory through a symbolic link:


ln -sf $LIBRARY/enamine-adv-202005/ligand-library $VFVS_DIR/input-files

2.3 Create the receptor directory and place the receptor file


mkdir -p $VFVS_DIR/input-files/receptors
cp 1iep_prot.pdbqt $VFVS_DIR/input-files/receptors

2.4 Create a docking scenario directory and place the docking parameter file


mkdir -p $VFVS_DIR/input-files/qvina02_rigid_receptor1
cp config.txt $VFVS_DIR/input-files/qvina02_rigid_receptor1/config.txt
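
Steps 2.1 to 2.4 can also be wrapped in a single function that fails early when an input is missing. This is only a sketch: prepare_input_files is a hypothetical helper, and it uses the same directory and scenario names as the commands above.

```shell
# Usage: prepare_input_files <project_dir> <ligand_library_dir> <receptor.pdbqt> <config.txt>
# Hypothetical helper that repeats steps 2.1-2.4 with basic checks.
prepare_input_files() {
    local project="$1" library="$2" receptor="$3" config="$4"
    local scenario="qvina02_rigid_receptor1"
    local f

    # Fail early if any input is missing.
    [ -d "$library" ] || { echo "missing library: $library" >&2; return 1; }
    for f in "$receptor" "$config"; do
        [ -f "$f" ] || { echo "missing file: $f" >&2; return 1; }
    done

    mkdir -p "$project/input-files/receptors" \
             "$project/input-files/$scenario"
    # -n replaces an existing link instead of creating a new link inside it.
    ln -sfn "$library" "$project/input-files/ligand-library"
    cp "$receptor" "$project/input-files/receptors/"
    cp "$config" "$project/input-files/$scenario/config.txt"
}

# Example:
# prepare_input_files "$VFVS_DIR" "$LIBRARY/enamine-adv-202005/ligand-library" \
#     1iep_prot.pdbqt config.txt
```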

3. Prepare the tools directory

First, copy the VirtualFlow tools to the project directory:


cp -fr $VFVS_ROOT/vfvs/tools $VFVS_DIR

Copy todo.all to tools:


cp  $VFVS_DIR/input-files/ligand-library/todo.all $VFVS_DIR/tools/templates/todo.all
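
As a quick sanity check, the copied todo.all can be summarized. The sketch below assumes that each line holds a collection name followed by its ligand count, separated by whitespace; that layout is an assumption, so inspect the file first. count_todo is a hypothetical helper, not part of VirtualFlow.

```shell
# Usage: count_todo <todo.all>
# Prints "<number of collections> <total ligands>".
# Assumes each line is "<collection> <ligand count>" (an assumption).
count_todo() {
    awk 'NF >= 2 { n++; total += $2 }
         END { printf "%d %.0f\n", n, total }' "$1"
}

# Example: count_todo "$VFVS_DIR/tools/templates/todo.all"
```

If the assumed layout holds, the ligand total for Enamine Advance would be expected to be close to the 448,388 compounds quoted earlier.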

Copy the VirtualFlow configuration file:


cp   all.ctrl-c4 $VFVS_DIR/tools/templates/all.ctrl

Here, all.ctrl-c4 is the VirtualFlow configuration file prepared in advance for 4-core compute nodes. Corresponding configuration files exist for 8-core and 16-core (all.ctrl-c16) nodes.


Check the contents of the project directory:


[cloudam@master 1_input]$ tree
.
├── input-files
│   ├── ligand-library -> /public/software/.local/easybuild/software/virtualflow/libs/enamine-adv-202005/ligand-library
│   ├── qvina02_rigid_receptor1
│   │   └── config.txt
│   └── receptors
│       └── 1iep_prot.pdbqt
├── output-files
└── tools
    ├── bin
    │   ├── qvina02
    │   ├── qvina_w
    │   ├── smina
    │   ├── sqs
    │   ├── time_bin
    │   ├── vina
    │   ├── vina_carb
    │   └── vina_xb
    ├── slave
    │   ├── continue-jobline.sh
    │   ├── copy-templates.sh
    │   ├── copy-templates.sh.clodam
    │   ├── copy-templates.sh.old
    │   ├── exchange-continue-jobline.sh
    │   ├── exchange-jobfile.sh
    │   ├── prepare-todolists-cloudam.sh
    │   ├── prepare-todolists.sh
    │   ├── prepare-todolists.sh.old.old
    │   ├── show_banner.sh
    │   ├── submit.sh
    │   └── sync-jobfile.sh
    ├── templates
    │   ├── all.ctrl
    │   ├── one-queue.sh
    │   ├── one-step.sh
    │   ├── template1.lsf.sh
    │   ├── template1.pbs.sh
    │   ├── template1.sge.sh
    │   ├── template1.slurm.sh
    │   ├── template1.slurm.sh.old
    │   ├── template1.torque.sh
    │   └── todo.all
    ├── tmp
    │   └── README.md
    ├── vf_continue_all.sh
    ├── vf_continue_jobline.sh
    ├── vf_prepare_folders.sh
    ├── vf_redistribute_collections_multiple.sh
    ├── vf_redistribute_collections_single.sh
    ├── vf_report.sh
    ├── vf_start_jobline.sh
    └── vf_start_jobline.sh.old
 
10 directories, 41 files

4. Prepare the workflow directory (workflow)


cd $VFVS_DIR/tools
./vf_prepare_folders.sh

5. Begin Virtual Screening


Suppose we use 200 nodes for virtual screening. Start the screening with vf_start_jobline.sh:


cd $VFVS_DIR/tools
NODE_NUMBER=200
./vf_start_jobline.sh 1 $NODE_NUMBER templates/template1.slurm.sh submit 1
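
The positional arguments are easy to mix up, so the sketch below names them and prints the resulting command line without running it. The argument meanings (first jobline ID, last jobline ID, job template, submit mode, delay in seconds between submissions) reflect my reading of the VirtualFlow usage message and should be confirmed by running vf_start_jobline.sh without arguments; start_cmd itself is a hypothetical helper.

```shell
# Print (do not run) the vf_start_jobline.sh command line.
# Usage: start_cmd <last_jobline_id> [template] [submit_mode] [delay_seconds]
# Hypothetical helper; argument meanings are an assumption to verify.
start_cmd() {
    local first_id=1
    local last_id="$1"
    local template="${2:-templates/template1.slurm.sh}"
    local submit_mode="${3:-submit}"
    local delay="${4:-1}"
    echo "./vf_start_jobline.sh $first_id $last_id $template $submit_mode $delay"
}

# Example: start_cmd 200
```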

6. Progress view and docking result statistics


cd $VFVS_DIR/tools
./vf_report.sh  -c vs -d qvina02_rigid_receptor1

Post-processing of virtual screening: list top compounds and pose extraction


This step requires Open Babel, so an environment that provides it must be loaded first. On Cloudam, the rdkit conda environment has Open Babel pre-installed:


module add Anaconda3/2020.02
source activate
conda activate rdkit
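
To confirm that the environment really provides Open Babel before continuing, a small check such as the following can be used. It is a sketch: require_cmd is a hypothetical helper, and obabel is assumed to be the name of the Open Babel command-line program in your installation.

```shell
# Return 0 if a command is on PATH, otherwise print a hint and return 1.
# require_cmd is a hypothetical helper.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found on PATH; is the conda environment active?" >&2
    return 1
}

# Example: require_cmd obabel || echo "load the rdkit environment first"
```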

1. Copy VFTools


First, copy VFTools into the project directory and add it to PATH:


cd $VFVS_DIR
cp -r  $VFVS_ROOT/vftools/VFTools .
export PATH=$VFVS_DIR/VFTools/bin:$PATH

2. Sort and get the top compound list


After the VirtualFlow calculation is complete, all compounds can be ranked by score. Create a dedicated directory pp/ranking and change into it:


cd $VFVS_DIR
mkdir -p pp/ranking
cd pp/ranking

Sort all compounds:


vfvs_pp_ranking_all.sh ../../output-files/complete/ 2 meta_tranche

If it doesn't work, type:


vfvs_pp_ranking_all.sh -h

Check the parameters against the help output, and confirm that VFTools/bin has been added to the PATH environment variable.


List the 100 best-scoring compounds and save them to a file named compounds:


head -n 100 qvina02_rigid_receptor1/firstposes.all.minindex.sorted.clean > compounds
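
Instead of a fixed count, compounds can also be selected by a score cutoff. The sketch below is an illustration only: top_by_score is a hypothetical helper, and because the column layout of the ranking file is not described in this article, the score column is passed as an argument that you need to determine by inspecting the file.

```shell
# Usage: top_by_score <ranking_file> <score_column> <cutoff>
# Prints rows whose value in <score_column> is <= <cutoff>
# (docking scores: more negative is better). Hypothetical helper.
top_by_score() {
    awk -v col="$2" -v cutoff="$3" '$col + 0 <= cutoff + 0' "$1"
}

# Example (the column number 4 is a placeholder, check your file):
# top_by_score qvina02_rigid_receptor1/firstposes.all.minindex.sorted.clean 4 -9.0 > compounds
```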

3. Extract the docking poses (binding modes) of the 100 best-scoring compounds


Create a new directory to hold the poses:


cd $VFVS_DIR/pp
mkdir -p docking_poses/qvina02_rigid_receptor1
cd docking_poses/qvina02_rigid_receptor1

Using the list (compounds) from the previous step, extract the poses of the top 100 compounds:


cp ../../ranking/compounds .  # copy the list to the current directory
vfvs_pp_prepare_dockingposes.sh ../../../output-files/complete/qvina02_rigid_receptor1/results/ meta_tranche compounds dockingsposes overwrite

This generates three files and two directories:


compounds  compounds.energies  compounds.energies.uniq.csv dockingsposes  dockingsposes.plain

The best-scoring pose of each compound is saved in PDB format in the dockingsposes.plain directory, in ranking order, so the results can be analyzed directly with visualization software. FlareViewer (free) is recommended for this.


About Cloudam


Cloudam HPC is a one-stop HPC platform with 300+ pre-installed applications ready to deploy. The system schedules compute nodes and software licenses dynamically, optimizing workflows and boosting efficiency for engineers and researchers in Life Sciences, AI/ML, CAE/CFD Simulations, Universities/Colleges, etc.


Partnered with AWS, Azure, Google Cloud, Oracle Cloud, etc., Cloudam powers your R&D with massive cloud resources without queuing.


You can submit jobs by intuitive templates, SLURM, and Windows/Linux workstations. Whether you are a beginner or a professional, you can always find it handy to run and manage your job.


There is a $30 Free Trial for every new user. Why not register and boost your R&D NOW?




