GAUSSIAN STEP MISMATCH

Science, Technical — acosta @ 11:39 am

I run into this weird error all the time in Gaussian when doing optimization or frequency restarts. This is a big problem for me because a lot of times I run these on a large number of processors, but in a queue with a relatively short max wall time. Searching the web for this error yields one hit, and it’s in Chinese and not helpful even under Google translate.

The error in question: RdWrOT: IFlag = 2 Data mismatch
Search for it? This is the only thread you get, and the one reply isn’t exactly useful …

“You’ve given too little information. Could you post a bit more?” (translated from Chinese)

Anyway, this almost always seems to be caused by a collision between the previous and current routes. I often have to increase the number of optimization or SCF cycles, because my systems are large and I run optimizations with diffuse functions, which tend to be pretty ill-behaved. Here’s the full head of the log:

******************************************
Gaussian 03: AM64L-G03RevE.01 11-Sep-2007
29-Mar-2009
******************************************
%chk=freq_min60.chk
%nprocshared=8
Will use up to 8 processors via shared memory.
%nproclinda=16
Will use up to 16 processors via Linda.
%mem=100MW
Default route: MaxDisk=200GB
----------------------------------------------------------------------
# freq=restart b3lyp/6-31+g(d) geom=allcheckpoint guess=read int=fmmna
toms=300 scf=tight
----------------------------------------------------------------------
1/10=4,30=1,35=1/3;
99//99;

GradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGrad
Berny optimization.
Restoring state from the checkpoint file "freq_min60.chk".
Title: min60
Route: # opt b3lyp/6-31+g(d) geom=allcheckpoint guess=read int=fmmnat
oms=300 optcyc=1000 scfcyc=1000
RdWrOT: IFlag = 2 Data mismatch
MaxStp (old) = 504 MaxStp (new) = 2
MaxJob (old) = 1 MaxJob (new) = 1
RdWrOT: Data mismatch on MaxStp/MaxJob
Error termination via Lnk1e in /apps/steele/g03-E.01/l103.exe at Sun Mar 29 01:19:04 2009.
Job cpu time: 0 days 0 hours 0 minutes 13.6 seconds.
File lengths (MBytes): RWF= 49 Int= 0 D2E= 0 Chk= 56 Scr= 1
Command exited with non-zero status 1

I’m posting this mostly so that searches for this error lead to vdov, and maybe, just maybe, someone can tell me about it (no one, including people who really know the software well, has been able to give an acceptable explanation so far). If both routes are optimizations, for instance, you can usually get around the error by removing the optimization cycle specification from the restarted route. But if you’re carrying the guess and geometry over to a new kind of calculation, it’s nearly impossible to get around. The fix is almost always to create a formatted checkpoint file (formchk) and convert it back (unfchk), so that the stored route disappears. You could also avoid the problem by specifying the new geometry as a Z-matrix in the initial calculation, but I much prefer to read my initial guess from the checkpoint, so this is not a good option in many cases. Starting the calculation and then restarting from the new binary checkpoint file usually does the trick, as there then appear to be no collisions in the route cycles.

Anyway, cheers. Hopefully someone who knows something about this will let me know.

GROMACS SUBSET STATISTICS

Science, Technical — acosta @ 10:57 am

This is a workaround for GROMACS analysis programs that require the system to consist only of the molecules or atoms you want statistics on.

Let’s say you have a system composed of N different species and you’ve got your xtc trajectory file from the run. Then let’s say you want to know the average cluster sizes of one of the species in the simulation. For some GROMACS programs (though not g_clustsize, the one in question here) this is fairly easy, because the program lets you specify what you’d like to consider, either through its own options or through an index file. Others, though, especially those with options for handling molecule statistics explicitly, don’t let you do this for whatever reason. So a workaround is necessary.

First, edit your input mdp file and topology with extreme prejudice, eliminating or commenting out references to anything you’re not interested in. For instance, in my files, I need to remove all water and ions. I call all these new files “fake” versions of the real files.

$ diff fake_fullmd.mdp fullmd.mdp
15,16c15,16
< xtc_grps = protein ; sol na+ cl-
< energygrps = protein ; sol na+ cl-
---
> xtc_grps = protein sol na+ cl-
> energygrps = protein sol na+ cl-


$ diff fake_topol.top topol.top
163a164,166
> SOL 9529
> NA+ 10
> CL- 10

Make sure you have an index file; if you don’t already, generate one:

$ make_ndx -f conf.gro

Then dump the first frame of the real simulation. The program (trjconv) will ask you which parts of the frame you’d like to dump.

$ trjconv -f traj.xtc -o fake_protein.gro -s b4md.tpr -n index.ndx -dump 0
...
Select group for output
Group 0 ( System) has 37968 elements
Group 1 ( Protein) has 1568 elements
Group 2 ( Protein-H) has 784 elements
...
Select a group:

Now generate the new input binary for your “fake” system.

$ grompp -f fake_fullmd.mdp -c fake_protein.gro -p fake_topol.top -o fake_b4md.tpr

Convert your trajectory, again selecting whichever part of the trajectory you’re interested in.

$ trjconv -f traj.xtc -o fake_protein.xtc -s b4md.tpr -n index.ndx
...
Select group for output
Group 0 ( System) has 37968 elements
Group 1 ( Protein) has 1568 elements
Group 2 ( Protein-H) has 784 elements
...
Select a group:

Now you’re ready to run your analysis! g_clustsize won’t actually use the index file you specify here (since you’re only looking at molecules, via the -mol option), though it still requires one for some reason that eludes me.

$ g_clustsize -f fake_protein.xtc -s fake_b4md.tpr -mol -n index.ndx

And there you have it. Cluster statistics for an arbitrary subset of your system. Cheers.

NOTE: There actually are slightly more elegant ways of doing this, but this is perfectly sufficient for simple situations, like clustering of some molecule in some other explicit medium.

YOUR GOVT ON TWITTER AGAIN [SOCIAL MEDIA AND GOVT]

Links, Politics, Technical, USA — afischer @ 9:50 pm

[Image: a map of congressional twitterers]

A while ago I wrote a post about government types on Twitter. What surprised me at the time was that Republicans seemed to outnumber Democrats on Twitter. A UK blogger, Mat Morrison, confirms my supposition with a nice map. He also raises a point I hadn’t even considered: there is almost no cross-talk between the two parties (at least on Twitter). It really makes one wonder what contact Republicans and Democrats have online. Do congressmen text each other all the time? Email? Is participation in social media balkanizing or unifying? The one thing Mr. Morrison doesn’t address is the “authenticity” of congressional tweets. There is a HUGE difference between Hillary’s and Obama’s staffer-written tweet directives and the personal tweeting of Reps. John Culberson, Tim Ryan, Thad McCotter and Neil Abercrombie (who has a very odd Twitter feed and starts a lot of tweets with “Hi everybody” or a similar phrase).

YOUR HOME NETWORK

Personal, Technical — acosta @ 6:23 pm

I’ve been giving a lot of thought to how crappy my home network has become. Of course this has gotten me to think about what my ideal home network would look like. So, for all of you out there, let’s do a little thought experiment. Say you have the money to put together your “perfect” home network. What would it look like? What major components would it use? What services would it run? How would it be organized?

This has been a pretty fun little experiment for me, and I’ve come up with configurations ranging from relatively modest ones to incredible setups that would cost tens if not hundreds of thousands of dollars and include substantial modifications to wiring and home layout (network closet, anyone?).

Maybe some of you have some ideas. Cheers.

TEST YOUR MPI

Personal, Science, Technical — acosta @ 4:53 pm

It sometimes amazes me how many people would rather ask stupid questions than just do the basic work themselves, maybe even learning something in the process. In the Gromacs community, the past couple of weeks have provided some nice examples of this. Version 4 came out, which *substantially* improves the scalability of parallel molecular simulations thanks to a move from the previously standard particle decomposition method to the much more general domain decomposition (DD) method. The DD method has been popular in continuum physics and other fields for quite some time, and Gromacs 4 is the first Gromacs release to use it.

So, of course, people need to know how to run parallel simulations with this code. Version 4.0.2 hasn’t yet made it into any of the major package managers, so people have to build it themselves. Unlike most major scientific packages, Gromacs is absurdly simple to build. Things are quite beautiful, actually.

Anyway, my point isn’t to extol the virtues of Gromacs but rather to suggest that if something doesn’t work, do the initial work to figure out the problem and exhaust at least the most obvious problems with the software before throwing your hands up in the air. Problem with MPI? Test it first! Anyone working with MPI should at the very least be able to look up how to write a basic MPI application.

An example:

#include <stdio.h>
#include <mpi.h>

/* Minimal MPI sanity check: every process reports its rank,
   the total number of processes and the host it runs on. */
int
main(int argc, char *argv[])
{
  int  rank, size, length;
  char name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's ID */
  MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
  MPI_Get_processor_name(name, &length);  /* host name */

  printf("process %d of %d on %s\n", rank, size, name);

  MPI_Finalize();
  return 0;
}

If you’re on a modern Debian or Ubuntu install with Open MPI (pretty much the standard MPI implementation you should be using), build and run it:

$ mpicc.openmpi -o hello hello.c
$ mpirun.openmpi -np 4 hello
process 1 of 4 on enskog
process 2 of 4 on enskog
process 3 of 4 on enskog
process 0 of 4 on enskog

I do love my MPI. And Gromacs does dynamic load balancing now … so freaking fast.

Cheers.

COMPLEXITY OF SONGS

Music, Science, Technical — acosta @ 11:03 am

A short post, but I have to post it. ‘The Complexity of Songs’ is a short communication Don Knuth wrote back in the 70s which is really quite interesting. It’s also a pretty funny joke.

The article capitalizes on the tendency of popular songs to evolve from long and content-rich ballads to highly repetitive texts with little or no meaningful content.

[...]

“…our ancient ancestors invented the concept of refrain” to reduce the space complexity of songs, which becomes crucial when a large number of songs is to be committed to one’s memory.

[...]

Finally, progress during the twentieth century (stimulated by the fact that “the advent of modern drugs has led to demands for still less memory”) leads to the ultimate improvement: arbitrarily long songs with space complexity O(1), e.g. a song defined by a simple recurrence relation.

We’ve really taken the concept to heart in modern popular music haven’t we? See here for explanation and here for the original paper.

Cheers.

C++ THREADS

Personal, Science, Technical — acosta @ 7:07 pm

I’m used to writing in C (and Matlab, unfortunately), though I’m not particularly proficient in either. But lately I’ve taken on C++ and holy hell what a huge language. Still, it has a lot of nice features that are going to be important to me in the next year of my graduate work and I’m gonna stick with it. Yay OO, ugh.

For all its size, one area where I have been left completely unsatisfied is support for threads. Yes, of course POSIX threads are there, and I’ve had some success using them in some of my older, now completely obsolete C code which I never want to look at again. It’s baffling to me that the STL provides no nice thread classes. I know there are at least 2 (if not more) very experienced C++ programmers who read vdov.net, and I’m looking for advice. Have you used any existing thread class libraries, and if so, what did you think? Recommendations? I would really rather not write my own thread classes from scratch (especially since wrapping the C pthread library would be a nightmare here), as this is both utterly useless for my research and, well, I’d probably screw it up with my near-fledgling knowledge of the language.

Cheers.

TWEENERS

Personal, Science, Technical — jrgreen @ 5:17 pm

I’m drawn to writing with a clear purpose and logical structure: writing that places the readers’ consumption of the content above all else. When studying a technical subject, I attempt to find the clearest, most concise text(s) available. That is, I look for the book or books that will expose the roots of the area. Further, I find reading more fruitful when the text is designed to lay a foundation for a field using a line of reasoning with a concise argument or set of arguments, as opposed to a purely axiomatic or pedagogical approach.

Typically, such books are shorter than those I use for reference and much longer than a wikipedia article – they are in between. I’ve taken to calling these books “tweeners” (n., pl., pronounced tee-wieners), as in “they are be-tween-ers”. Another possible term was “t’ain’ts” (n., pl., a contracted contraction of it with ain’t), as in “t’ain’t a wikipedia article and t’ain’t a reference book”. While I prefer the equally appropriate term t’ain’t, the unfortunate (inappropriate) slang meaning justifies avoiding this collision of terminology (no link). There are also less severe collisions with “tweener”:

Let it be understood that I am not referring to a tweener, n., (1) a person capable of playing multiple positions in a sport, (2) a person that falls between two age generations, (3) a bowling form, (4) a hobbit between the ages of 20 and 32 or (5) a man that looks like a woman or vice versa.

Currently, I’m reading A.I. Khinchin’s “Mathematical Foundations of Statistical Mechanics”. It’s definitely a tweener! As far as I know, the readers (and writers) of vdov.net are a diverse group. Do you have a tweener? Are you man, woman, man that looks like a woman, woman that looks like a man or hobbit enough to share it?

OPTIMAL DECOMPOSITION OF A BOX [UPDATED]

Science, Technical — acosta @ 2:18 pm

For a while now I’ve been doing distributed computing based on two major tools: the METIS graph partitioning method for decomposition and MPI for parallelism. Both are well established and used extensively in many fields of computational physics, engineering and chemistry. I’ve been running simulations on a simple mesh for a few months now. This mesh is simply a box of 200 x 200 x 200 cells. I decompose the box into 8 parts, each run on a different processor, with MPI handling the processor-processor boundaries and communication. It occurred to me that METIS does something particularly ridiculous in this case.

If you simply break the box up into 8 pieces, the easiest way to do it is to cut the box with the three planes through its center, one normal to each axis. The faces of the global box never lie on processor boundaries, since I apply boundary conditions on all of them. Each cutting plane crosses 200 x 200 = 40,000 faces, so you don’t need a CS or math degree to see that the number of processor faces in this case is 3 x 40,000 = 120,000. Is this what METIS gives you? No! It gives you 164,033 processor faces, about 37% more. What the hell?

Here’s a little graph of what this looks like (excuse my very quick and dirty xfig’ing). The width of the boundaries is directly related to the number of processor-processor faces between each decomposed domain.

While there is some obvious symmetry here (to a certain level of approximation), this is far from the cleanest solution. METIS may be fantastic for complex domains, but it doesn’t do well with simple domains with obvious symmetry. Further, each domain should have at most 3 processor-processor boundaries! It’s important to note that each processor does in fact have 3 major processor-processor boundaries (each node has 3 wide connections, which tells us that METIS is roughly trying to reach the optimal structure described above). It’s all the little connections that would be removed with some knowledge of the basic structure of the full domain. I understand, and perhaps believe, that this could all be due to some convergence criterion in the method that I’m unaware of (in my reading of the papers on the subject and of the code itself I haven’t found any such parameter), but still, I see no reason why some front-end part of the algorithmic implementation shouldn’t take into account the symmetry of the full domain and of the subdomains.

Cheers.

[UPDATE after the jump]

MATLAB IS INFURIATING BUT HERE’S SOME CODE

Personal, Science, Technical — acosta @ 11:42 am

I’ve had to do a lot of work in Matlab recently, not because I want to work in Matlab or learn a new (albeit very contrived) language, but because I’d rather not rewrite the huge sections of Matlab code that do a lot of the important work in my bioinformatics applications. Yes, I could write my own principal component engine, my own Savitzky-Golay smoothing, my own normalization and plotting code, my own peak discovery and alignment code, but hell … why would I do all of that, especially since this application is not particularly computationally expensive? Knowing that all these functions already exist in Matlab, I thought this might be a one-day project. Little did I know that Matlab totally sucks.

Let me give an example. Let’s say you want to plot a bunch of points from some matrix of data, where some of those points come from group 1, some from group 2, etc. You’d think that in something like Matlab this would be obvious, and to a first approximation it is. In theory you just use the command ‘hold on’, which holds the plot so you can successively add data points without deleting everything you already added with the plot command. In theory this looks something like this (don’t worry about the other functions; they are hashes associated with each experiment so that each group of points is plotted distinguishably):

hold on;
for k = 1:numfiles
  for l = 1:numexpt
    if (isequal(char(grp(k)),expt(l).name)) pplot(l) = ...
    plot(P(k,compa),P(k,compb),plothash_a{l}, ...
    'MarkerSize',10,'MarkerEdgeColor','k','MarkerFaceColor',plothash_c{l});
    end
  end
end

Indeed, this works very well. So let’s say I instead want to plot in 3D, so I use the command ‘plot3’ instead of ‘plot’. One would expect this to be very simple. The part that counts looks like:

hold on;
[...]
if (isequal(char(grp(k)),expt(l).name)) pplot(l) = ...
plot3(P(k,compa),P(k,compb),P(k,compc),plothash_a{l}, ...
'MarkerSize',10,'MarkerEdgeColor','k','MarkerFaceColor',plothash_c{l});
[...]

Knowing that plot3 is the correct command, you’d expect a 3D plot, but this produces what looks like a 2D plot of only the P(k,compa),P(k,compb) data. What the hell? It turns out that if you hold a new plot with ‘hold on’, Matlab assumes you want a 2D plot. Then, when you plot in 3D, Matlab decides it is smarter than you are, that your implicit choice of a 2D plot outweighs your explicit use of ‘plot3’, and shows the plot in 2D anyway without throwing an error. Why would ‘plot3’ tell me nothing??? (Strictly speaking, the points are plotted with their z values; ‘hold on’ on empty axes keeps the default top-down 2D view, and calling view(3) after plotting rotates to a 3D view.) I realize this is a pretty trivial complaint, and there are plenty of other great examples of ridiculous crap in Matlab that makes no sense.

Anyway, done complaining. In a ton of Matlab data processing demos, the program asks you to import a series of files into one data matrix, and does it with some very clumsy code that requires you to manually change the program every time you move to a new data set. Not really my style. Let’s say you have a bunch of data vectors organized in a series of directories (happens all the time), where each directory represents a data group that should be accessible as a unit. How about something like this:

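% Collect every CSV file from each experiment directory (*.enabled)
% under the current directory into one data matrix Y.
repository = pwd;
expt = dir('*.enabled');            % one directory per experiment group
numexpt = size(expt,1);
for i = 1:numexpt
  repo{i} = strcat(repository,'/',expt(i).name,'/');
  % Assumes every experiment directory holds the same number of CSV
  % files, since each row of file/files must have the same length.
  file(i,:) = dir([repo{i} '*.csv']);
  num(i,:) = numel(file(i,:));
  files(i,:) = strcat(repo{i},{file(i,:).name});
end
expt = transpose(expt);
file = transpose(file);
num = transpose(num);
files = transpose(files);
numfiles = numel(files);
for k = 1:numfiles
  % Assumes two comma-separated numeric columns per file (X, then signal);
  % textread needs an explicit format when it returns more than one output.
  [X,Y(:,k)] = textread(files{k},'%f %f','delimiter',',');
end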

I use the transposes just because they are nice later in my code, they are certainly not required. I am no Matlab programmer, and I know some of you out there are, so any suggestions as to better file import mechanisms would be greatly appreciated. Short of that though, this is a million times better and far more general than the crap they put you through in the Matlab demos (specifically anything in the bioinformatics sections).

Cheers.

SIMPLE LINUX, UBUNTU, LINUS & COMPIZ

Personal, Science, Technical — acosta @ 9:05 pm

Quite a title, eh? Well, this is sort of a random stream-of-consciousness kind of post, so be prepared. But this place was getting a little dull, so I thought I’d rehash some of the things I’ve done to my machines recently and perhaps review them a bit. So here goes.

I think I read (though I can’t seem to find the reference anywhere) an interview with Linus Torvalds recently in which he said something like the following (if you know the reference feel free to let me know, I’m pretty sure it was on Kernel Trap this year sometime):

I don’t use Debian or any other ‘low-level’ Linux flavors because I feel like Linux should be easy to use and manageable for day-to-day work, etc.

Those of you who know me well probably know that over the past 3 years I’ve made nothing short of a career of going in exactly the opposite direction. Recently, however, I decided to take the plunge, for a number of reasons. Briefly: 1) I’ve got way too many machines to take care of these days, 2) I love Debian, but on laptops I find it a bit annoying to have to configure dynamic things every time I move, 3) I recently screwed up a bunch of my machines and decided it was time to reinstall them, and 4) being ridiculously OCD, I need all my machines running the same software and looking basically the same. Lastly, and definitely most importantly, Ion3 was really having trouble running a lot of the software I needed, including Fluent (ANSYS), Gambit, Matlab, etc. All these things together, along with my acquisition of a brand new laptop, made me decide to reinstall all my machines with … (drum-roll) 64-bit Ubuntu.

Generally I’m pretty happy with my choice. I loved the Ion3 window manager and Debian in general, but Ubuntu is basically Debian with some fancy crap built on top of it. So the backend is basically the same. Plus the update cycle is way better in Ubuntu … well, at least faster. As far as using Gnome, I’m not completely sold yet. I sort of like it … I guess, and I’m getting used to it. But I do miss the simplicity of Ion3. I don’t, however, miss configuring everything manually in Debian for my laptop or the huge number of problems I had with applications really not liking the Ion3 windowing model.

Oddly, Ubuntu Gutsy’s (7.10) compositing window manager (Compiz Fusion 0.52) is pretty annoying. There are no real benefits to it as far as I can tell, other than Aero/Aqua-type effects, and there are plenty of annoyances. When I first reinstalled my systems, basically everything that didn’t work well with Ion3 also didn’t work with Compiz, so I had to disable it out of the box on all of my machines. Annoying.

Alec turned me on to ‘unison’, a nice little remote folder-syncing utility, which is quite wonderful. I now use it to sync my document tree between my 3 machines (work laptop, work desktop and home desktop). It syncs two replicas at a time, but run pairwise it works equally well with 3.

I also got a nice new laptop recently, a Dell Latitude D430, which is Dell’s ultra-portable business machine. I’ve used it extensively already and generally I’m quite happy with it. Ubuntu runs great on it (I haven’t detected even the slightest hitch yet), it’s got fantastic battery life, and the performance sacrificed for ultra-portability and long battery life really doesn’t affect me in the slightest. It’s going to be brilliant to be able to work on a plane or while traveling, not to mention when I just need to get out of the lab for any number of reasons (there are lots of them).

I’m not sure I have much else to say. I have lots of real work to get done now that my OP is over, as my boss wants to publish pretty soon and I really don’t have enough done yet. Hopefully I’ll be writing paper #2 in February. I doubt anyone will really care about this post, but it’s here if you like; I had to write something, this place is dead. Hey you … write something for vdov.

Cheers.

MLB.COM BASEBALL ON THE IPHONE

Personal, Sports, Technical — acosta @ 6:46 pm

Dear MLB.com,

As evidenced by your almost immediate response to the release of the iPhone on your mobile updates page for real-time info, stats and pitch-by-pitch play, some non-trivial number of your customers must have iPhones by now. Anyone who reads this site with any regularity knows that I waited in line to get an iPhone the day it was released, and I was pleased by your response. Even though most of the places where I watch or check baseball have some sort of Wi-Fi, I still prefer the low-bandwidth version much of the time. Recently, however, you decided to put a banner ad across the top of most of these pages. This means that on all of the real-time game stat and pitch-by-pitch windows, the most important stats are now obscured, as there is no longer enough real estate on the iPhone screen and no real way to scale the image. This is pretty much just me whining, but it has resulted in me not using that page very much anymore. If ads are necessary at all, I am sure there is a more logical way to place them so the content doesn’t suffer.

Thank you,
acosta

vdov.net is an anthony costa production. ownership of the content provided is retained by the author and by vdov.net.