Building a Structural Model of Adk 1) Download 4ake_a.pdb (code: 4AKE) from the RCSD website (http://www.rcsb.org/pdb/home/home.do). Prepare the coordinates (.pdb) and structural files (.psf) of the protein with missing atoms and hydrogens. Create adk_psfgen.tcl and insert the following commands in this tcl file: package require psfgen resetpsf set path /home/jyhuang/WS_NAMD/Adk topology /home/jyhuang/WS_NAMD/toppar/top_all27_prot_lipid.rtf pdbalias residue HIS HSE pdbalias atom ILE CD1 CD pdbalias atom HOH O OH2 pdbalias residue HOH TIP3 segment ADK { first NTER last CTER pdb $path/4ake_a.pdb } coordpdb $path/4ake_a.pdb ADK guesscoord writepdb $path/adk.pdb writepsf $path/adk.psf Save and quit. 2) Start vmd with the display driver in text mode (-dispdev text), and execute (-e) the script file 2h3f_psfgen.tcl. $ vmd -dispdev text -e ./adk_psfgen.tcl This will run psfgen with the input file adk_psfgen.tcl, which will generate adk.pdb and adk.psf including missing atoms and missing hydrogen atoms. 3) Solvating the Protein a) Open vmd in the commandline $ vmd In vmd, follow the sequence: File-->New Molecule-->Browse-->adk.psf-->OK-->Load-->Browse-->adk.pdb-->OK-->Load to input the newly constructed protein structure file adk.psf and coordinates file adk.pdb. Close Molecule File Bowser. b) In the VMD TkCon window of your VMD session type: package require solvate solvate adk.psf adk.pdb -t 12 -o adk_wb 4) In the VMD Main window, click Molecule → Delete Molecule. Then input new molecule with File → New Molecule. Use the Browse... button to find the file adk_wb.psf. Load it by pressing the Load button. This will load the structural information into VMD. In the same Molecule File Browser, follow Browse--> 2adk_wb.pdb--> OK--> Load. Then close Molecule File Bowser. 5) In the VMD TkCon window type: set all [atomselect top all] $all moveby [ vecinvert [ measure center $all ]] measure minmax $all -> {-31.5 -40.6 -38.7} {31.5 40.4 38.6} set prot [atomselect top "protein"] $prot moveby [ vecinvert [ measure center $prot ]] measure center $prot -> {-6.32e-7 8.84e-7 -3.65e-6} 6) Ionize: Extension->Modeling->Add Ions i. Check the files adk_wb.psf & adk_wb.pdb ii. Output = adk_wi iii. Neutralize and Set Concentration = 0.05 mol/L iv. Segname of placed ions = ION v. Click on Autoionize Close tcl console and exit vmd. ############################################################################# ############ Energy Minimization and Configuration Relaxation ############### ############################################################################# adk_wi.psf & adk_wi.pdb will be the starting point for running MD simulations. Energy minimization and constrained dynamics are needed to prepared the starting configuration for a production MD run. a) Energetically minimize the system and then run a constrained simulation where only water molecules can move. In a text editor, prepare (or edit) adk_wimineq.conf. Note: if {X} { A } which implies that if X=1 the procedure coded by commands A will be executed; if X=0, the procedure will be skipped. In adk_wimineq.conf: # Some Adjustable Parameters # Names of input files and output file, temperature, and restart option. # Simulation Parameters # parameter file & temperature initialization (comment out if using a restart file). Use the following commands to read in the ff parameters of protein, lipid, and water/ions: paraTypeCharmm on parameters /home/jyhuang/WS_NAMD/toppar/par_all27_prot_lipidNBFIX.prm # PBC: In vmd, you can check the dimensions and center of adk_wi by For example: to find out the minmax and the center, use the minmax command: set all [atomselect top all] measure minmax $all Making the dimensions a little bit bigger (1-1.5 Ang) so that atoms on the boundary can relax. # PME shall be small multiples of 2, 3, or 5 with grid spacing around 1A # Pressure control is turned off since in NVT most atoms will be fixed # Fixed atoms: Set the beta column of adk_fix.pdb to 1.0 for all atoms (i.e., everything except the protein). fixedAtomsCol is where you specify to use the beta column (B). fix_adk.tcl' in the current directory can be used to fix the atoms of the protein. Source this script or type it out in the Tk console. # Minimization and dynamics: Run 1000 steps of minimization, re-initialize the temperature and then 0.5 ns of dynamics. In a console navigate to the current working directory. Then type: $ /opt/NAMD/namd2 adk_wimineq.conf > adk_wimineq.log # Restart from the previous finished run Open file: adk_wieq.conf outputName adk_wieq now turned on restart options, "if {1} {..." inputname adk_wimineq i. commented out "temperature $temperature". We get it from '.vel' file. ii. Turned off manually entering cell dimensions since it will come from the '.xsc' restart file. iii. Enable constant pressure control for a NPT run iv. Disabled Fixed Atoms # EXTRA PARAMETERS # Run the simulation: $ /opt/NAMD/namd2 adk_wieq.conf > adk_wieq.log Look at the .log file. Notice that the energies now have gone down. Finally look at the simulation adk_wieq.dcd. No gap in the periodic boundaries. ################ Starting a Production MD Run ##################### 1. Open in a text editor: adk_wimd.conf set up the correct restart and output file names. Notice that all external forces have been removed and there is no minimization step this time. 2. Run Dynamics: $ namd2 adk_wimd.conf > adk_wimd.log ############### Analyzing MD Trajectories with bio3D ####################### $ sudo R > library(bio3d) > pdb <- read.pdb("adk_wi.pdb") > dcd <- read.dcd("adk_wimd.dcd", cell = FALSE) > ca.inds <- atom.select(pdb, "protein", elety="CA") > trj <- fit.xyz(pdb$xyz, dcd, fixed.inds=ca.inds$xyz, mobile.inds=ca.inds$xyz) > protpdb <- trim.pdb(pdb, ca.inds) > protdcd <- trim(trj, col.inds=ca.inds$xyz) > print(protpdb$xyz) Total Frames#: 1 Total XYZs#: 642, (Atoms#: 214) [1] -7.371 -16.143 2.799 <...> -8.017 -19.417 -6.735 [642] + attr: Matrix DIM = 1 x 642 > print(protdcd) Total Frames#: 1000 Total XYZs#: 642, (Atoms#: 214) [1] -8.098 -14.896 5.585 <...> -3.051 -1.928 -21.151 [642000] + attr: Matrix DIM = 1000 x 642 ############## Root Mean Square Deviation ################## > rd <- rmsd(protpdb, protdcd, fit=TRUE, ncore=4) > plot(rd, typ="l", ylab="RMSD", xlab="Frame No.") > points(lowess(rd), typ="l", col="red", lty=1, lwd=3) > dev.copy(jpeg,filename="rmsd_plot.jpg") > dev.off() > hist(rd, breaks=40, freq=FALSE, main="RMSD Histogram", xlab="RMSD") > lines(density(rd), col="red", lwd=3) > dev.copy(jpeg,filename="rmsd_hist.jpg") > dev.off() ############# Root Mean Squared Fluctuations ############### > rf <- rmsf(protdcd) > plot(rf, ylab="RMSF", xlab="Residue Position", typ="l") > dev.copy(jpeg,filename="rmsf_plot.jpg") > dev.off() ##### Principal Component Analysis ##### > pc <- pca.xyz(protdcd) > plot(pc, col=bwr.colors(nrow(dcd))) > dev.copy(jpeg,filename="pc_plot.jpg") > dev.off() Examine the contribution of each residue to the first two principal components > plot.bio3d(pc$au[,1], ylab="PC1 (B), PC2 (R)", xlab="Residue Position", typ="l", col="blue", lty=1, lwd=3) > points(pc$au[,2], typ="l", col="red", lty=1, lwd=3) > dev.copy(jpeg,filename="pc1&2_resid.jpg") > dev.off() ####### Perform NMA ###### # Normal mode analysis (NMA) of a single protein structure can be carried out by providing a PDB object to the function nma(). > modes <- nma(protpdb) > print(modes) # Note that the returned nma object consists of a number of attributes listed on the +attr: line. To get a quick overview of the results one can simply call the plot() function on the returned nma object. > plot(modes, sse=pdb) # Calculate the cross-correlation matrix > cm <- dccm(modes) # Plot a correlation map with plot.dccm(cm) > plot(cm, sse=protpdb, contour=F, col.regions=bwr.colors(20), at=seq(-1,1,0.1) ) # View the correlations in the structure with pymol > view.dccm(cm, protpdb, launch=TRUE) # Deformation analysis provides a measure for the amount of local flexibility in the protein structure. We can calculate deformation energies (with deformation.nma()) and atomic fluctuations (with fluct.nma()) of the first three modes and visualize the results in PyMOL: > dfmE <- deformation.nma(modes) > dfmSum <- rowSums(dfmE$ei[,1:3]) > flucts <- fluct.nma(modes, mode.inds=seq(7,9)) > write.pdb(pdb=NULL, xyz=modes$xyz, file="R-defor.pdb", b=dfmSum) > write.pdb(pdb=NULL, xyz=modes$xyz, file="R-fluct.pdb", b=flucts) # Domain analysis can identify regions in the protein that move as rigid bodies. Run geostas to find domains > gs <- geostas(modes, k=4) # Write NMA trajectory with domain assignment > mktrj(modes, mode=7, chain=gs$grps) # Plot a atomic movement similarity matrix > plot(gs, margin.segments=gs$grps, contour=FALSE) ####### Buld protein structure network from MD trajectory ################## > cij <- dccm(protdcd, ncore=4) > net <- cna(cij, ncore=4, cutoff.cij=0.45) > plot(net, protpdb) > dev.copy(jpeg,filename="net_ge045.jpg") > dev.off() # View the correlations in pymol > view.dccm(cij, protpdb, launch = TRUE) # View the structure mapped network view.cna(net, protpdb, launch = TRUE) # Look into partitions with modularity close to the maximal value but with an overall smaller number of communities > tree <- community.tree(net, rescale=TRUE) > plot(tree$num.of.comms, tree$modularity, xlab="Communities", ylab="Modularity") # Function using community.tree() and network.amendment() mod.select <- function(x, thres=0.1) { remodel <- community.tree(x, rescale = TRUE) n.max = length(unique(x$communities$membership)) ind.max = which(remodel$num.of.comms == n.max) v = remodel$modularity[length(remodel$modularity):ind.max] v = rev(diff(v)) fa = which(v>=thres)[1] - 1 ncomm = ifelse(is.na(fa), min(remodel$num.of.comms), n.max - fa) ind <- which(remodel$num.of.comms == ncomm) network.amendment(x, remodel$tree[ind, ]) } > nnet = mod.select(net) > nnet > plot(nnet, protpdb) > view.cna(nnet, pdb, launch=TRUE) # Use of a contact map filter effectively ignores long range correlations between residues that are not in physical contact for the majority of the trajectory. > cm <- cmap(protdcd, ncore=4, dcut = 4.5, scut = 0, pcut = 0.75, mask.lower = FALSE) > net.cut <- cna(cij, cm = cm) > plot.dccm((cij * cm), main="") > plot.dccm(cij, margin.segments = net$communities$membership) # The layout.cna() function can be used to determine the geometric center of protein residues and communities and thus provide 3D and 2D network and community graph layouts that most closely match the protein systems 3D coords. > layout3D <- layout.cna(net, protpdb, k=3) > layout2D <- layout.cna(net, protpdb, k=2) > xy <- plot.cna(net) > dev.copy(jpeg,filename="cnet_2D.jpg") > dev.off() > identify.cna(xy, cna = net) # To remove communities composed of only 2 residues, use > net.pruned <- prune.cna(net, size.min = 3) > par(mfcol = c(1, 2), mar = c(0, 0, 0, 0)) > plot.cna(net) > plot.cna(net.pruned) > dev.copy(jpeg,filename="cnet_pruned.jpg") > dev.off() # In protein-networks, identifying the nodes with many edges can yield insights into the internal dynamic coordination of protein regions. Centrality measures can be employed to characterize protein sections. > node.betweenness <- betweenness(net$network) > plot(node.betweenness, xlab="Residue No", ylab="Centrality", type="h") # You can map these centrality values to the B-factor column of a PDB file and observe it in VMD. > write.pdb(protpdb, b=normalize.vector(node.betweenness), file="Betweeness.pdb")