GAUSSIAN STEP MISMATCH

Science, Technical — acosta @ 11:39 am

I run into this weird error all the time in Gaussian when restarting optimizations or frequency calculations. This is a big problem for me because I often run these on a large number of processors in a queue with a relatively short max wall time, so restarts are unavoidable. Searching the web for this error yields one hit, and it’s in Chinese and not helpful even under Google Translate.

Error originates: RdWrOT: IFlag = 2 Data mismatch
Search? I only get here. And the results aren’t exactly useful …

你给出的信息太少了,能不能多贴一点出来? (“You’ve given too little information; can you post a bit more?”)

Anyway, this almost always seems to be a collision between the route stored in the checkpoint file and the route of the new job. Often I have to increase the number of optimization or SCF cycles because my systems are large and I do optimizations with diffuse functions, which tend to be pretty ill-behaved. Here’s the full head of the log:

******************************************
Gaussian 03: AM64L-G03RevE.01 11-Sep-2007
29-Mar-2009
******************************************
%chk=freq_min60.chk
%nprocshared=8
Will use up to 8 processors via shared memory.
%nproclinda=16
Will use up to 16 processors via Linda.
%mem=100MW
Default route: MaxDisk=200GB
----------------------------------------------------------------------
# freq=restart b3lyp/6-31+g(d) geom=allcheckpoint guess=read int=fmmna
toms=300 scf=tight
----------------------------------------------------------------------
1/10=4,30=1,35=1/3;
99//99;

GradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGrad
Berny optimization.
Restoring state from the checkpoint file "freq_min60.chk".
Title: min60
Route: # opt b3lyp/6-31+g(d) geom=allcheckpoint guess=read int=fmmnat
oms=300 optcyc=1000 scfcyc=1000
RdWrOT: IFlag = 2 Data mismatch
MaxStp (old) = 504 MaxStp (new) = 2
MaxJob (old) = 1 MaxJob (new) = 1
RdWrOT: Data mismatch on MaxStp/MaxJob
Error termination via Lnk1e in /apps/steele/g03-E.01/l103.exe at Sun Mar 29 01:19:04 2009.
Job cpu time: 0 days 0 hours 0 minutes 13.6 seconds.
File lengths (MBytes): RWF= 49 Int= 0 D2E= 0 Chk= 56 Scr= 1
Command exited with non-zero status 1

I’m posting this mostly so that a Google search for this error leads to Vdov, and maybe, just maybe, someone can tell me about it (no one, including people who really know the software well, has been able to provide an acceptable explanation thus far). If the two routes are both optimizations, for instance, you can usually get around this error by eliminating the opt cycle specification in the new restarted route. But if you’re moving the guess and geometry to some new calculation, it’s nearly impossible to get around this. The solution is almost always to create a formatted checkpoint file (formchk) and convert it back (unfchk), so the old route disappears. You could obviously also do this by specifying the new geometry as a Z-matrix in the initial calculation, but I much prefer to read my initial guess from the checkpoint, so this is not a good option in many cases. Starting the calculation and then restarting from the new binary checkpoint file usually does the trick, as there appear to be no collisions in the route cycles.

Anyway, cheers. Hopefully someone who knows something about this will let me know.

GROMACS SUBSET STATISTICS

Science, Technical — acosta @ 10:57 am

This is a method for generating data with Gromacs programs that require the system to be composed of only those molecules or atoms on which statistics will be run.

Let’s say you have a system composed of N different species and you’ve got your xtc trajectory file from the run. Then let’s say you want to know about average cluster sizes of one of the species in the simulation. For some programs in Gromacs (not g_clustsize, the one in question here) this is fairly easy because the software lets you specify, either through the program options themselves or through an index file, what you’d like to consider. Others, though, especially those with options for dealing with molecule statistics explicitly, don’t allow you to do this for whatever reason. So, a workaround is necessary.

First, edit your input mdp file and topology with extreme prejudice, eliminating or commenting out references to anything you’re not interested in. For instance, in my files, I need to remove all water and ions. I call all these new files “fake” versions of the real files.

$ diff fake_fullmd.mdp fullmd.mdp
15,16c15,16
< xtc_grps = protein ; sol na+ cl-
< energygrps = protein ; sol na+ cl-
---
> xtc_grps = protein sol na+ cl-
> energygrps = protein sol na+ cl-


$ diff fake_topol.top topol.top
163a164,166
> SOL 9529
> NA+ 10
> CL- 10

Make sure you have an index file if you don’t already:

$ make_ndx -f conf.gro

Then dump the first frame of the real simulation. The program (trjconv) will ask you which parts of the frame you’d like to dump.

$ trjconv -f traj.xtc -o fake_protein.gro -s b4md.tpr -n index.ndx -dump 0
...
Select group for output
Group 0 ( System) has 37968 elements
Group 1 ( Protein) has 1568 elements
Group 2 ( Protein-H) has 784 elements
...
Select a group:

Now generate the new input binary for your “fake” system.

$ grompp -f fake_fullmd.mdp -c fake_protein.gro -p fake_topol.top -o fake_b4md.tpr

Convert your trajectory, again selecting whichever part of the trajectory you’re interested in (the same group you dumped above, so the atom counts match the “fake” input binary).

$ trjconv -f traj.xtc -o fake_protein.xtc -s b4md.tpr -n index.ndx
...
Select group for output
Group 0 ( System) has 37968 elements
Group 1 ( Protein) has 1568 elements
Group 2 ( Protein-H) has 784 elements
...
Select a group:

Now you’re ready to run your analysis! It won’t actually use the index file you specify here (since you’re only looking at molecules with the -mol option), though it requires it for some reason that eludes me.

$ g_clustsize -f fake_protein.xtc -s fake_b4md.tpr -mol -n index.ndx

And there you have it. Cluster statistics for an arbitrary subset of your system. Cheers.

NOTE: There actually are slightly more elegant ways of doing this, but this is perfectly sufficient for simple situations, like clustering of some molecule in some other explicit medium.

TEST YOUR MPI

Personal, Science, Technical — acosta @ 4:53 pm

It sometimes amazes me how a lot of people are much happier to ask stupid questions than to just do the basic work themselves, maybe even learning something in the process. In the Gromacs community, the past couple of weeks have provided some nice examples of this. Version 4 came out, which *substantially* improves the scalability of parallelized molecular simulations, due to a move from the previously standard particle decomposition method to the much more general domain decomposition (DD) method. The DD method has been popular in continuum physics and in other fields for quite some time, but this is the first time Gromacs has applied it to discrete particle work.

So, of course, people need to know how to do parallel simulations with this code. Version 4.0.2 hasn’t made it into any of the major package managers yet, so people have to build it themselves. Unlike with most major scientific packages, building Gromacs is absurdly simple. Things are quite beautiful actually.

Anyway, my point isn’t to extol the virtues of Gromacs but rather to suggest that if something doesn’t work, do the initial work to figure out the problem and exhaust at least the most obvious problems with the software before throwing your hands up in the air. Problem with MPI? Test it first! Anyone working with MPI should at the very least be able to look up how to write a basic MPI application.

An example:

#include <stdio.h>
#include <mpi.h>

int
main(int argc, char *argv[])
{
  int  rank, size, length;

  char name[MPI_MAX_PROCESSOR_NAME];

  /* start MPI, then ask who we are and how many of us there are */
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Get_processor_name(name, &length);

  /* one line per process; the order of the lines is not deterministic */
  printf("process %d of %d on %s\n", rank, size, name);

  MPI_Finalize();

  return 0;
}

If you’re on a modern Debian or Ubuntu build with OpenMPI installed (pretty much the standard MPI implementation you should be using), then build and run it:

$ mpicc.openmpi -o hello hello.c
$ mpirun.openmpi -np 4 hello
process 1 of 4 on enskog
process 2 of 4 on enskog
process 3 of 4 on enskog
process 0 of 4 on enskog

I do love my MPI. And Gromacs does dynamic load balancing now … so freaking fast.

Cheers.

EXPLAIN THIS [UPDATED]

Personal, Science — acosta @ 4:24 pm

A few minutes ago I was making chocolate milk (yes, I know, I’m 5 years old, feel free to insert witty banter). Well, as I was stirring I realized that the frequency of the sound I was hearing as the spoon hit the side of the glass decreased with increasing rotational velocity of the fluid. I have yet to come up with a satisfactory explanation for this phenomenon, though I’ve only thought about it for about 5 minutes now. Thoughts?

Cheers.

UPDATE: Verdict: lame. See comment #1.

COMPLEXITY OF SONGS

Music, Science, Technical — acosta @ 11:03 am

A short post, but I have to post it. ‘The Complexity of Songs’ is a short communication Don Knuth wrote back in the 70s which is really quite interesting. It’s also a pretty funny joke.

The article capitalizes on the tendency of popular songs to evolve from long and content-rich ballads to highly repetitive texts with little or no meaningful content.

[...]

“…our ancient ancestors invented the concept of refrain” to reduce the space complexity of songs, which becomes crucial when a large number of songs is to be committed to one’s memory.

[...]

Finally, progress during the twentieth century—stimulated by the fact that “the advent of modern drugs has led to demands for still less memory”—leads to the ultimate improvement: Arbitrarily long songs with space complexity O(1), e.g. for a song to be defined by the recurrence relation.

We’ve really taken the concept to heart in modern popular music, haven’t we? See here for an explanation and here for the original paper.

Cheers.

C++ THREADS

Personal, Science, Technical — acosta @ 7:07 pm

I’m used to writing in C (and Matlab, unfortunately), though I’m not particularly proficient in either. But lately I’ve taken on C++ and holy hell what a huge language. Still, it has a lot of nice features that are going to be important to me in the next year of my graduate work and I’m gonna stick with it. Yay OO, ugh.

For all its size, one of the areas where I have been left completely unsatisfied is support for threads. Yes, of course POSIX threads are there, and I’ve had some success using them in some of my older, now completely obsolete C code which I never want to look at again. It’s baffling to me that there is nothing in the STL that provides some nice thread classes. I know there are at least 2 (if not more) very experienced C++ programmers who read vdov.net, and I’m looking for advice. Have you looked at any existing thread classes, and if so, what did you think? Recommendations? I would really rather not have to write my own thread classes from scratch (especially since accessing the C pthread library would be a nightmare here), as this is both utterly useless for my research and, well, I’d probably screw it up with my near-fledgling knowledge of the language.

Cheers.

“SUPERINSULATION” PART I [PHYSICS]

Personal, Science — shollen @ 1:12 pm

This story is both scientifically interesting and hilarious in some places; you should continue reading it. I’ve divided it into several parts, as it is fairly long. It involves science, scientific politics, and gracious insults. Most importantly, it discusses how my lab at Brown University has shown strong evidence for the existence of Cooper pairs in insulators. In case some readers are backlogged on their scientific jargon (do they have RSS feeds for that?), I’ll describe what I mean. (more…)

TWEENERS

Personal, Science, Technical — jrgreen @ 5:17 pm

I’m drawn to writing with a clear purpose and logical structure: writing that places the readers’ consumption of the content above all else. When studying a technical subject, I attempt to find the clearest, most concise text(s) available. That is, I look for the book or books that will expose the roots of the area. Further, I find reading more fruitful when the text is designed to lay a foundation for a field using a line of reasoning with a concise argument or set of arguments, as opposed to a purely axiomatic or pedagogical approach.

Typically, such books are shorter than those I use for reference and much longer than a Wikipedia article: they are in between. I’ve taken to calling these books “tweeners” (n., pl., pronounced tee-wieners), as in “they are be-tween-ers”. Another possible term was “t’ain’ts” (n., pl., a contracted contraction of it with ain’t), as in “t’ain’t a Wikipedia article and t’ain’t a reference book”. While I prefer the equally appropriate term t’ain’t, the unfortunate (inappropriate) slang meaning justifies avoiding this collision of terminology (no link). There are also less severe collisions with “tweener”:

Let it be understood that I am not referring to a tweener, n., (1) a person capable of playing multiple positions in a sport, (2) a person that falls between two age generations, (3) a bowling form, (4) a hobbit between the ages of 20 and 32 or (5) a man that looks like a woman or vice versa.

Currently, I’m reading A.I. Khinchin’s “Mathematical Foundations of Statistical Mechanics”. It’s definitely a tweener! As far as I know, the readers (and writers) of vdov.net are a diverse group. Do you have a tweener? Are you man, woman, man that looks like a woman, woman that looks like a man or hobbit enough to share it?

OPTIMAL DECOMPOSITION OF A BOX [UPDATED]

Science, Technical — acosta @ 2:18 pm

For a while now I’ve been doing distributed computing based on two major methods: the METIS graph partitioning method for decomposition and the MPI method for parallelism. Both of these techniques are well established and used extensively in many fields of computational physics, engineering and chemistry. I’ve been doing simulations in a simple mesh for a few months now. This mesh is simply a box with 200 x 200 x 200 cells. I decompose the box into 8 parts, each part to be run on a different processor using MPI as the construct to deal with processor-processor boundaries/communication. It occurred to me that the METIS method does something particularly ridiculous in this case.

If you simply break the box up into 8 pieces, the easiest possible way to do this is to cut through the three mid-planes of the box. The faces of the global box do not count as processor boundaries, as I apply boundary conditions on all these faces. Each cutting plane has 200 x 200 faces, so you don’t need a CS or math degree to know that the number of processor faces in this case would be 3 x 200 x 200 = 120,000. Is this what METIS gives you? No! It gives you 164,033 processor faces. What the hell?
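A quick sanity check of that plane-cut number, as a minimal sketch in C. It assumes a regular box of cells cut by axis-aligned planes into px x py x pz blocks; cut_faces is a name made up for this illustration and is not part of METIS or MPI.

```c
/*
 * Count the cell faces that lie on processor-processor boundaries when an
 * nx x ny x nz box of cells is cut into px x py x pz blocks by axis-aligned
 * planes. Faces on the outside of the global box are not counted, since
 * those carry boundary conditions rather than communication.
 * (cut_faces is a hypothetical helper, not a METIS or MPI routine.)
 */
long cut_faces(long nx, long ny, long nz, long px, long py, long pz)
{
    /* (p - 1) cutting planes per direction, each one sheet of cell faces */
    return (px - 1) * ny * nz
         + (py - 1) * nx * nz
         + (pz - 1) * nx * ny;
}
```

With this, cut_faces(200, 200, 200, 2, 2, 2) gives 120000, while cutting the same box into 8 slabs, cut_faces(200, 200, 200, 8, 1, 1), gives 280000, so among plane cuts the octant split is the cheap one.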

Here’s a little graph of what this looks like (excuse my very quick and dirty xfig’ing). The width of the boundaries is directly related to the number of processor-processor faces between each decomposed domain.

While there is some obvious symmetry here (within a certain level of approximation), this is far from the cleanest solution. While METIS may be fantastic for complex domains, it doesn’t do well with simple domains with obvious symmetry. Further, each domain should have a maximum of 3 processor-processor boundaries! It’s important to note here that each processor does in fact have 3 major processor-processor boundaries (each node has 3 wide connections; this tells us that METIS is in fact roughly trying to get to the optimal structure described above). It’s all the little connections that would be removed with some knowledge of the basic full domain structure. I understand, and perhaps believe, that this could all be due to some convergence criterion in the method that I am unaware of (in my reading of the papers on the subject and the code itself I haven’t found any such parameter). Still, I see no reason why some front-end part of the algorithmic implementation shouldn’t take into consideration the symmetry of the full domain and the subdomains.
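The “maximum of 3 boundaries” figure for the clean plane-cut decomposition can be sketched in a few lines of C. This assumes a non-periodic box split into px x py x pz blocks, with each block indexed by (i, j, k) starting from zero; face_neighbors is a hypothetical helper for illustration, not anything METIS provides.

```c
/*
 * Number of neighbouring blocks that share a face with block (i, j, k)
 * in a px x py x pz Cartesian decomposition of a non-periodic box.
 * (face_neighbors is a hypothetical helper, not a METIS or MPI routine.)
 */
int face_neighbors(int px, int py, int pz, int i, int j, int k)
{
    int n = 0;

    n += (i > 0) + (i < px - 1);   /* neighbours in -x and +x */
    n += (j > 0) + (j < py - 1);   /* neighbours in -y and +y */
    n += (k > 0) + (k < pz - 1);   /* neighbours in -z and +z */

    return n;
}
```

For the 2 x 2 x 2 split every block is a corner block, so face_neighbors(2, 2, 2, i, j, k) is 3 for all eight of them; anything beyond 3 connections per node in the METIS result is extra communication.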

Cheers.

[UPDATE after the jump] (more…)

A THEOREM OF ATHEORISM

Personal, Science — jrgreen @ 12:30 pm

After dinner at work last night, I met a new postdoc working down the hall from my office. I said hello, attempting to overcome my social awkwardness, and asked what type of research she does in the chemistry department. She replied “I’m an experimentalist. You, ahem, must be a theorist.” Whoa!

How in the spirit of chemistry did she know?! So I asked. She replied “I can just tell.” Baffling! Then I looked down and realized the first corollary and theorem, in my developing theory of how to not behave like a theorist (hereby termed atheorism):

Corollary 1: Chicken noodle soup shrapnel on a shirt is neither necessary nor sufficient to indicate someone is a theorist.

Theorem 1: Chicken noodle soup shrapnel on a male wearing a t-shirt that says “Visionary Women: Challenging assumptions and inspiring change” from 1993 is sufficient but not necessary to indicate the male is a theorist.

It turned out that I had forgotten to bring dining utensils with my dinner to work. Slurping Campbell’s chicken noodle soup seemed like a good idea at dinner time. Forgetfulness is also typical theorist behavior and will be a later theorem, when my sinful theorist nature catches up with me.

Sincerely,
A devoted atheorist

COMPUTER CHEMISTRY

Science — acosta @ 11:40 am

In this month’s Physics Today, there is an article called “Chemistry on the computer”. The first major quote from the article caught my eye. It comes from Auguste Comte, a natural philosopher, in 1830.

Every attempt to employ mathematical methods in the study of chemical questions must be considered profoundly irrational and contrary to the spirit of chemistry. If mathematical analysis should ever hold a prominent place in chemistry — an aberration which is happily almost impossible — it would occasion a rapid and widespread degeneration of that science.

Awesome. I don’t think that Mr. Comte would be very happy with me or a number of people here at Vdov.net.

TRAFFIC FLOWS

Science — acosta @ 2:25 pm

There has been a lot of talk on the tubes lately about the traffic flow problem, specifically a part of this problem that we’re all familiar with: complete stoppages that seem to have no explanation. Some recent links on the popularized tubes (a.k.a. not the science tubes) seem to indicate that there has been some incredible breakthrough in our understanding of this subject. For example:

Slashdot: Scientists solve the mystery of traffic jams

This is all well and good, but unfortunately these people fail to mention the most important work on the subject, which initially came from the theory of nonlinear wave equations and was more or less solved in 1974. It was summed up in a classic text on linear and nonlinear waves, so titled and written by G. B. Whitham. The book is out of print but it’s around on Amazon as well as other stores, and any self-respecting science library should have this book sitting on the shelves. The main problem is one of wave propagation leading to “shock fronts” in traffic. If one person brakes for no reason, shock waves develop and travel backwards (for most flow problems) relative to the moving frame of the cars. Consider the velocity of the cars V as a function of the density \rho (vehicles per mile), with Q(\rho) the flow (vehicles per hour).

V(\rho) = Q(\rho)/\rho

It’s quite reasonable to assume that V(\rho) must be a decreasing function of \rho which starts from some maximum value at \rho=0 and decreases to zero as \rho\rightarrow\rho_j, and that the flow Q(\rho) reaches its maximum at some specific density \rho_m. Guess what? Actual observations peg the value of \rho_j at about 255 vehicles per mile and the density of maximum flow \rho_m at about 80 vehicles per mile, where the flow is about 1500 vehicles per hour. Amazingly these values scale in a near linear fashion as lanes are added to the flow on a simple highway. It turns out the maximum flow rate is actually achieved at about 20 miles per hour (1500/80 is just under 19). If we then develop a simple expression for the propagation velocity:

c(\rho) = Q'(\rho) = V(\rho) + \rho V'(\rho)

Since the derivative of the velocity function is less than 0, c(\rho) is less than V(\rho): waves in a traffic flow travel backwards relative to the cars and, according to Whitham, “warn the drivers of disturbances ahead”. Unfortunately this has some pretty negative consequences for you and me, the drivers, who will inevitably be fed up with random stoppages in the road for no particular reason. Whitham goes on to make some elementary arguments on the status of a wave near the stoppage density of traffic on a road. It turns out that the second derivative of the flow function Q(\rho) is less than zero, which means that a local increase of density propagates backwards, and a shock forms somewhere behind the initial disturbance.
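To make the sign arguments concrete, here is a worked example under an assumed model. The linear velocity law below (often called the Greenshields model) is an assumption for illustration, not something taken from the post, and v_f (the free-flow speed at zero density) is a symbol introduced here.

```latex
\begin{align*}
V(\rho)   &= v_f \left(1 - \frac{\rho}{\rho_j}\right), &
Q(\rho)   &= \rho\, V(\rho) = v_f \left(\rho - \frac{\rho^2}{\rho_j}\right), \\
c(\rho)   &= Q'(\rho) = V(\rho) + \rho V'(\rho)
           = v_f \left(1 - \frac{2\rho}{\rho_j}\right), &
Q''(\rho) &= -\frac{2 v_f}{\rho_j} < 0 .
\end{align*}
```

In this sketch c(\rho) - V(\rho) = -v_f \rho/\rho_j is negative at every density, so waves always fall back relative to the cars, and once \rho exceeds \rho_j/2 they move backwards relative to the road as well. The linear law puts the density of maximum flow at \rho_j/2, which does not match the observed values quoted above, so treat it as an illustration of the signs only.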

Now I’m sure that people have made some improvements in the mathematical description of this problem since the pioneering work of Whitham, but don’t be fooled: pretty much everything you read about “new developments” in this area in the popular media has been solved for more than 3 decades.

Cheers.

vdov.net is an anthony costa production. ownership of the content provided is retained by the author and by vdov.net.