This article shows how to use VirtualFlow to virtually screen Enamine's Advance, HTS, and REAL Diverse databases.

Introduction to ENAMINE Database
After the update, four ENAMINE databases are currently available:
1. Enamine REAL
Migrated from www.virtual-flow.org on 2020/07/20. Contains 1.46 billion compounds. The supplier is Enamine (agent: Shanghai Taosu), and the compounds must be custom-synthesized.
Location: $LIBRARY/enamine/ligand-library
todo.all generation and spatial coordinates: available online at www.virtual-flow.org.
2. Enamine REAL-Diverse
Contains 21 million diverse REAL compounds. The supplier is Enamine (agent: Shanghai Taosu), and the compounds must be custom-synthesized.
Location: $LIBRARY/enamine-diverse-real-drug-like/ligand-library
todo.all generation and spatial coordinates: use the todo.all file in the database directory; see the README there for an explanation of the spatial coordinates.
3. Enamine Advance
The May 2020 version contains 448,388 compounds. The supplier is Enamine (agent: Shanghai Taosu), and the compounds are in stock.
Location: $LIBRARY/enamine-adv-202005/ligand-library
todo.all generation and spatial coordinates: use the todo.all file in the database directory; see the README there for an explanation of the spatial coordinates.
4. Enamine HTS
The May 2020 version contains 1,756,280 compounds. The supplier is Enamine (agent: Shanghai Taosu), and the compounds are in stock.
Location: $LIBRARY/enamine_hts_vfvs_202005/ligand-library
todo.all generation and spatial coordinates: use the todo.all file in the database directory; see the README there for an explanation of the spatial coordinates.
The environment variable LIBRARY is defined as follows:
export VFVS_ROOT=/public/software/.local/easybuild/software/virtualflow
export LIBRARY=$VFVS_ROOT/libs
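Before going further, it can help to confirm that each library is actually present under $LIBRARY. The following is a minimal sketch that assumes the directory names listed above; the helper name check_libraries is hypothetical and is not part of VirtualFlow.

```shell
#!/usr/bin/env bash
# check_libraries ROOT: report which Enamine ligand libraries exist under ROOT.
# The directory names are the ones listed in this article; the function itself
# is a hypothetical helper, not a VirtualFlow tool.
check_libraries() {
    local root="$1" name status=0
    for name in enamine enamine-diverse-real-drug-like \
                enamine-adv-202005 enamine_hts_vfvs_202005; do
        if [ -d "$root/$name/ligand-library" ]; then
            echo "found   $name"
        else
            echo "missing $name"
            status=1
        fi
    done
    # Non-zero exit status if at least one library is missing.
    return "$status"
}
```

On the cluster, this would be called as: check_libraries "$LIBRARY"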
Preparation for Virtual Screening
This example screens for BCR-ABL kinase inhibitors and assumes you have prepared the following files:
1. Protein file for the docking calculation: 1iep_prot.pdbqt
2. Docking configuration file for QVINA2: config.txt
3. VirtualFlow configuration file: all.ctrl
The settings in this file mainly depend on the number of cores per compute node. I have prepared configuration files for 4-core (all.ctrl-c4), 8-core, and 16-core (all.ctrl-c16) nodes; they are already set up for virtual screening with qvina.
4. List of compounds used for docking: todo.all
This example uses Enamine Advance, whose todo.all file is in the database directory.
Use of VirtualFlow
1. Set the project file directory
Suppose we plan to run the virtual screening in the VFVS_ABL directory under HOME. Create this directory to hold the project and, to simplify later steps, define a VFVS_DIR environment variable that points to it:
cd ~
mkdir VFVS_ABL
export VFVS_DIR=/home/cloudam/VFVS_ABL
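The hard-coded /home/cloudam path only works for the cloudam user. As a sketch, a slightly more portable variant derives the path from $HOME and uses mkdir -p so that it can be re-run safely:

```shell
# Create the project directory under the current user's home directory and
# export its absolute path. mkdir -p does nothing if the directory exists.
mkdir -p "$HOME/VFVS_ABL"
export VFVS_DIR="$HOME/VFVS_ABL"
echo "Project directory: $VFVS_DIR"
```

Because VFVS_DIR is only set for the current shell, it has to be exported again in every new login session before running the later steps.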
2. Prepare the input-files directory
2.1 Create input-files directory
First, create an input-files directory to hold the input files.
cd $VFVS_DIR
mkdir input-files
2.2 Designating databases for virtual screening
Since we are screening Enamine Advance, make the library available in the input-files directory through a symbolic link:
ln -sf $LIBRARY/enamine-adv-202005/ligand-library $VFVS_DIR/input-files
2.3 Create the receptor directory and place the receptor file
mkdir -p $VFVS_DIR/input-files/receptors
cp 1iep_prot.pdbqt $VFVS_DIR/input-files/receptors
2.4 Create a docking scenario directory and place the docking parameter file
mkdir -p $VFVS_DIR/input-files/qvina02_rigid_receptor1
cp config.txt $VFVS_DIR/input-files/qvina02_rigid_receptor1/config.txt
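The contents of config.txt are not shown in this walkthrough. As a rough sketch, QuickVina 2 reads the same configuration keys as AutoDock Vina, and VirtualFlow is expected to supply the ligand and output paths itself. In the example below, every numeric value is a placeholder to be replaced with the search box for your own binding site, the relative receptor path is an assumption about how VirtualFlow resolves it, and the helper name write_example_config is hypothetical.

```shell
# write_example_config DIR: write an illustrative QuickVina 2 config.txt
# into DIR. Box center and size are placeholders, not values for 1iep.
write_example_config() {
    local scenario_dir="$1"
    mkdir -p "$scenario_dir"
    cat > "$scenario_dir/config.txt" <<'EOF'
receptor = ../input-files/receptors/1iep_prot.pdbqt
center_x = 0.0
center_y = 0.0
center_z = 0.0
size_x = 20.0
size_y = 20.0
size_z = 20.0
exhaustiveness = 1
cpu = 1
EOF
}
```

Writing the file to a scratch directory first, for example write_example_config /tmp/config-demo, avoids overwriting a config.txt that is already prepared.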
3. Prepare the tools directory
First, copy the VirtualFlow tools to the project directory:
cp -fr $VFVS_ROOT/vfvs/tools $VFVS_DIR
Copy todo.all to tools:
cp $VFVS_DIR/input-files/ligand-library/todo.all $VFVS_DIR/tools/templates/todo.all
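To estimate the size of the job before starting it, the copied todo.all can be summarized. This sketch assumes that each line of todo.all has the form "<collection name> <ligand count>"; check the README in the database directory if the file looks different. The helper name summarize_todo is hypothetical.

```shell
# summarize_todo FILE: print the number of collections and the total number
# of ligands, assuming lines of the form "<collection> <ligand count>".
summarize_todo() {
    awk '{ n += 1; total += $2 }
         END { printf "%d collections, %d ligands\n", n, total }' "$1"
}
```

For this project, it would be called as: summarize_todo "$VFVS_DIR/tools/templates/todo.all"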
Copy the VirtualFlow configuration file:
cp all.ctrl-c4 $VFVS_DIR/tools/templates/all.ctrl
Here, all.ctrl-c4 is the VirtualFlow configuration file I prepared in advance for 4-core compute nodes. Corresponding files exist for 8-core and 16-core (all.ctrl-c16) nodes.
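Since the choice of configuration file depends on the node size, it is worth checking the copied all.ctrl before submitting. The sketch below prints a few settings; the key names are assumptions based on the upstream VirtualFlow all.ctrl template and may differ in the prepared files, and the helper name show_ctrl_settings is hypothetical.

```shell
# show_ctrl_settings FILE: print selected key=value lines from an all.ctrl
# file. Key names are assumed from the upstream VirtualFlow template.
show_ctrl_settings() {
    grep -E '^(batchsystem|partition|cpus_per_step|queues_per_step|cpus_per_queue|docking_scenario_names|docking_scenario_programs)=' "$1"
}
```

For this project, it would be called as: show_ctrl_settings "$VFVS_DIR/tools/templates/all.ctrl". With all.ctrl-c4, the CPU-related values would be expected to match a 4-core node, and the docking scenario name should match the qvina02_rigid_receptor1 directory created earlier.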
Check the contents of the project directory:
[cloudam@master 1_input]$ tree
.
├── input-files
│ ├── ligand-library -> /public/software/.local/easybuild/software/virtualflow/libs/enamine-adv-202005/ligand-library
│ ├── qvina02_rigid_receptor1
│ │ └── config.txt
│ └── receptors
│ └── 1iep_prot.pdbqt
├── output-files
└── tools
├── bin
│ ├── qvina02
│ ├── qvina_w
│ ├── smina
│ ├── sqs
│ ├── time_bin
│ ├── vina
│ ├── vina_carb
│ └── vina_xb
├── slave
│ ├── continue-jobline.sh
│ ├── copy-templates.sh
│ ├── copy-templates.sh.clodam
│ ├── copy-templates.sh.old
│ ├── exchange-continue-jobline.sh
│ ├── exchange-jobfile.sh
│ ├── prepare-todolists-cloudam.sh
│ ├── prepare-todolists.sh
│ ├── prepare-todolists.sh.old.old
│ ├── show_banner.sh
│ ├── submit.sh
│ └── sync-jobfile.sh
├── templates
│ ├── all.ctrl
│ ├── one-queue.sh
│ ├── one-step.sh
│ ├── template1.lsf.sh
│ ├── template1.pbs.sh
│ ├── template1.sge.sh
│ ├── template1.slurm.sh
│ ├── template1.slurm.sh.old
│ ├── template1.torque.sh
│ └── todo.all
├── tmp
│ └── README.md
├── vf_continue_all.sh
├── vf_continue_jobline.sh
├── vf_prepare_folders.sh
├── vf_redistribute_collections_multiple.sh
├── vf_redistribute_collections_single.sh
├── vf_report.sh
├── vf_start_jobline.sh
└── vf_start_jobline.sh.old
10 directories, 41 files
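Instead of comparing the tree output by eye, the key files can be checked with a short script. This is a sketch based only on the paths used in this walkthrough; the helper name check_project is hypothetical.

```shell
# check_project DIR: verify that the inputs prepared in steps 2 and 3 exist.
check_project() {
    local dir="$1" path status=0
    for path in \
        input-files/ligand-library \
        input-files/receptors/1iep_prot.pdbqt \
        input-files/qvina02_rigid_receptor1/config.txt \
        tools/templates/all.ctrl \
        tools/templates/todo.all; do
        # -e follows symbolic links, so a dangling ligand-library link fails.
        if [ -e "$dir/$path" ]; then
            echo "ok      $path"
        else
            echo "MISSING $path"
            status=1
        fi
    done
    return "$status"
}
```

For this project, it would be called as: check_project "$VFVS_DIR"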
4. Prepare the workflow directory (workflow)
cd $VFVS_DIR/tools
./vf_prepare_folders.sh
5. Begin Virtual Screening
Suppose we use 200 nodes for virtual screening. Start the screening with vf_start_jobline.sh:
cd $VFVS_DIR/tools
NODE_NUMBER=200
./vf_start_jobline.sh 1 $NODE_NUMBER templates/template1.slurm.sh submit 1
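The five positional arguments are easy to mix up. As far as I understand the script's usage text, they are the first jobline ID, the last jobline ID, the job template, the submit mode, and a delay in seconds between joblines; run ./vf_start_jobline.sh without arguments to confirm this for your installation. The sketch below only prints the command (a dry run), and the helper name build_start_command is hypothetical.

```shell
# build_start_command FIRST LAST: print the vf_start_jobline.sh command line
# for joblines FIRST..LAST without running it.
build_start_command() {
    local first_jobline="$1"                        # ID of the first jobline
    local last_jobline="$2"                         # ID of the last jobline
    local template="templates/template1.slurm.sh"   # SLURM job template
    local submit_mode="submit"                      # submit jobs immediately
    local delay_seconds=1                           # pause between joblines
    echo "./vf_start_jobline.sh $first_jobline $last_jobline $template $submit_mode $delay_seconds"
}
```

For example, build_start_command 1 200 prints the same command as above. With one jobline per node, joblines 1 to 200 correspond to the 200 nodes in this example.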
6. Progress view and docking result statistics
cd $VFVS_DIR/tools
./vf_report.sh -c vs -d qvina02_rigid_receptor1
Post-processing of virtual screening: list top compounds and pose extraction
This part requires OpenBabel, so an environment that provides it must be loaded first. On Cloudam, the rdkit conda environment has OpenBabel pre-installed:
module add Anaconda3/2020.02
source activate
conda activate rdkit
1. Copy VFTools
First, copy VFTools into the project directory and add its bin directory to PATH:
cd $VFVS_DIR
cp -r $VFVS_ROOT/vftools/VFTools .
export PATH=$VFVS_DIR/VFTools/bin:$PATH
2. Sort and get the top compound list
After the VirtualFlow calculation completes, all compounds can be ranked. Create a dedicated directory, pp, and run:
cd $VFVS_DIR
mkdir -p pp/ranking
cd pp/ranking
Sort all compounds:
vfvs_pp_ranking_all.sh ../../output-files/complete/ 2 meta_tranche
If this fails, display the help:
vfvs_pp_ranking_all.sh -h
Check the parameters against the help output, and confirm that VFTools/bin has been added to the PATH environment variable.
List the 100 best-scoring compounds and save them in a file named compounds:
head -n 100 qvina02_rigid_receptor1/firstposes.all.minindex.sorted.clean > compounds
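If the screening produced fewer than 100 ranked entries, head silently returns a shorter list. The following sketch wraps the same head call and warns in that case; the helper name top_compounds is hypothetical.

```shell
# top_compounds FILE N: print the first N lines of a sorted ranking file and
# warn on stderr if the file holds fewer than N entries.
top_compounds() {
    local file="$1" n="$2" available
    # tr strips the padding that some wc implementations add.
    available=$(wc -l < "$file" | tr -d ' ')
    if [ "$available" -lt "$n" ]; then
        echo "warning: only $available of $n requested entries available" >&2
    fi
    head -n "$n" "$file"
}
```

For this project, it would be called as: top_compounds qvina02_rigid_receptor1/firstposes.all.minindex.sorted.clean 100 > compounds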
3. Extract the docking poses (binding modes) of the 100 best-scoring compounds
Create a new directory to save the poses:
cd $VFVS_DIR/pp
mkdir -p docking_poses/qvina02_rigid_receptor1
cd docking_poses/qvina02_rigid_receptor1
Using the list (compounds) from the previous step, extract the poses of the top 100 compounds:
cp ../../ranking/compounds . # copy the list to the current directory
vfvs_pp_prepare_dockingposes.sh ../../../output-files/complete/qvina02_rigid_receptor1/results/ meta_tranch compounds dockingsposes overwrite
This generates three files and two directories:
compounds compounds.energies compounds.energies.uniq.csv dockingsposes dockingsposes.plain
The best-scoring pose of each compound is saved in PDB format in the dockingsposes.plain directory, in ranking order, so the results can be analyzed directly with visualization software. FlareViewer (free) is recommended for this.