GROMACS SUBSET STATISTICS
This is a method for generating data in gromacs programs which require the system to be composed of only those molecules or atoms on which statistics will be run.
Let’s say you have a system composed of N different species and you’ve got your xtc trajectory file from the run. Then let’s say you want to know about average cluster sizes of one of the species in the simulation. For some programs in gromacs (not g_clustsize, the one in question here) this is fairly easy because the software lets you specify, either through the program options itself or through an index file, what you’d like to consider. With others though, especially those with options for dealing with molecule statistics explicitly, that don’t allow you to do this for whatever reason. So, a workaround is necessary.
First, edit your input mdp file and topology with extreme prejudice, eliminating or commenting out references to anything you’re not interested in. For instance, in my files, I need to remove all water and ions. I call all these new files “fake” versions of the real files.
$ diff fake_fullmd.mdp fullmd.mdp
15,16c15,16
< xtc_grps = protein ; sol na+ cl-
< energygrps = protein ; sol na+ cl-
---
> xtc_grps = protein sol na+ cl-
> energygrps = protein sol na+ cl-
$ diff fake_topol.top topol.top
163a164,166
> SOL 9529
> NA+ 10
> CL- 10
Make sure you have an index file if you don’t already:
$ make_ndx -f conf.gro
Then dump the first frame of the real simulation. The program (trjconv) will ask you which parts of the frame you’d like to dump.
$ trjconv -f traj.xtc -o fake_protein.gro -s b4md.tpr -n index.ndx -dump 0
...
Select group for output
Group 0 ( System) has 37968 elements
Group 1 ( Protein) has 1568 elements
Group 2 ( Protein-H) has 784 elements
...
Select a group:
Now generate the new input binary for your “fake” system.
$ grompp -f fake_fullmd.mdp -c fake_protein.gro -p fake_topol.top -o fake_b4md.tpr
Convert your trajectory, again selecting whichever part of the trajectory you’re interested in.
$ trjconv -f traj.xtc -o fake_protein.xtc -s b4md.tpr -n index.ndx
...
Select group for output
Group 0 ( System) has 37968 elements
Group 1 ( Protein) has 1568 elements
Group 2 ( Protein-H) has 784 elements
...
Select a group:
Now you’re ready to run your analysis! It won’t actually use the index file you specify here (since you’re only looking at molecules with the -mol option), though it requires it for some reason that eludes me.
$ g_clustsize -f fake_protein.xtc -s fake_b4md.tpr -mol -n index.ndx
And there you have it. Cluster statistics for an arbitrary subset of your system. Cheers.
NOTE: There actually are slightly more elegant ways of doing this, but this is perfectly sufficient for simple situations, like clustering of some molecule in some other explicit medium.
Nice. I don’t know if I’ll get a chance to use this but I’m sure someone will. This is probably some functionality that I should consider adding to my code one day.
i have been stuggling with this for the past 48 hours!!!!
Thank you so much for adding this !!! it works awesome!!!