COMPLEXITY OF SONGS

Music, Science, Technical — acosta @ 11:03 am

A short post, but I have to post it. ‘The Complexity of Songs’ is a short communication Don Knuth wrote back in the 70s which is really quite interesting. It’s also a pretty funny joke.

The article capitalizes on the tendency of popular songs to evolve from long and content-rich ballads to highly repetitive texts with little or no meaningful content.

[...]

“…our ancient ancestors invented the concept of refrain” to reduce the space complexity of songs, which becomes crucial when a large number of songs is to be committed to one’s memory.

[...]

Finally, progress during the twentieth century—stimulated by the fact that “the advent of modern drugs has led to demands for still less memory”—leads to the ultimate improvement: Arbitrarily long songs with space complexity O(1), e.g. for a song to be defined by the recurrence relation.
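A minimal sketch of the O(1) idea, not the recurrence from the paper (which is elided above). It assumes a made-up refrain and a hypothetical function name `sing`: the stored description of the song stays the same size however many verses you ask for.

```python
from itertools import islice, repeat

# Hypothetical refrain, invented for this sketch; it is not taken from Knuth's paper.
REFRAIN = "la la la, yeah"

def sing(n_verses):
    """Return an iterator over n_verses lines of the song.

    Assumed form of the recurrence for this sketch: S_0 is the empty song,
    and S_k is one refrain followed by S_(k-1).
    """
    # Only REFRAIN is stored, so the description has constant size.
    return islice(repeat(REFRAIN), n_verses)

print("\n".join(sing(4)))
```

Asking for 4 verses or 4 million changes the running time but not what has to be memorized.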

We’ve really taken the concept to heart in modern popular music, haven’t we? See here for an explanation and here for the original paper.

Cheers.

C++ THREADS

Personal, Science, Technical — acosta @ 7:07 pm

I’m used to writing in C (and Matlab, unfortunately), though I’m not particularly proficient in either. But lately I’ve taken on C++ and holy hell what a huge language. Still, it has a lot of nice features that are going to be important to me in the next year of my graduate work and I’m gonna stick with it. Yay OO, ugh.

For all its size, one of the areas where I have been left completely unsatisfied is in support for threads. Yes, of course POSIX threads are there and I’ve had some success implementing them in some of my older, now completely obsolete C code which I never want to look at again. It’s baffling to me that there is nothing in the STL which develops some nice thread classes. I know there are at least 2 (if not more) very experienced C++ programmers who read vdov.net, and I’m looking for advice. Have you looked at some developed thread classes and if so what have you thought? Recommendations? I would really rather not have to write my own thread classes from scratch (especially since accessing the C pthread library would be a nightmare here), as this is both utterly useless for my research and, well, I’d probably screw it up with near-fledgling knowledge of the language.

Cheers.

TWEENERS

Personal, Science, Technical — jrgreen @ 5:17 pm

I’m drawn to writing with a clear purpose and logical structure: writing that places the readers’ consumption of the content above all else. When studying a technical subject, I attempt to find the clearest, most concise text(s) available. That is, I look for the book or books that will expose the roots of the area. Further, I find reading more fruitful when the text is designed to lay a foundation for a field using a line of reasoning with a concise argument or set of arguments, as opposed to a purely axiomatic or pedagogical approach.

Typically, such books are shorter than those I use for reference and much longer than a Wikipedia article - they are in between. I’ve taken to calling these books “tweeners” (n., pl., pronounced tee-wieners), as in “they are be-tween-ers”. Another possible term was “t’ain’ts” (n., pl., a contracted contraction of it with ain’t), as in “t’ain’t a Wikipedia article and t’ain’t a reference book”. While I prefer the equally appropriate term t’ain’t, the unfortunate (inappropriate) slang meaning justifies avoiding this collision of terminology (no link). There are also less severe collisions with “tweener”:

Let it be understood that I am not referring to a tweener, n., (1) a person capable of playing multiple positions in a sport, (2) a person that falls between two age generations, (3) a bowling form, (4) a hobbit between the ages of 20 and 32 or (5) a man that looks like a woman or vice versa.

Currently, I’m reading A.I. Khinchin’s “Mathematical Foundations of Statistical Mechanics”. It’s definitely a tweener! As far as I know, the readers (and writers) of vdov.net are a diverse group. Do you have a tweener? Are you man, woman, man that looks like a woman, woman that looks like a man or hobbit enough to share it?

OPTIMAL DECOMPOSITION OF A BOX [UPDATED]

Science, Technical — acosta @ 2:18 pm

For a while now I’ve been doing distributed computing based on two major methods: the METIS graph partitioning method for decomposition and the MPI method for parallelism. Both of these techniques are well established and used extensively in many fields of computational physics, engineering and chemistry. I’ve been doing simulations on a simple mesh for a few months now. This mesh is simply a box with 200 x 200 x 200 cells. I decompose the box into 8 parts, each part to be run on a different processor using MPI as the construct to deal with processor-processor boundaries/communication. It occurred to me that the METIS method does something particularly ridiculous in this case.

If you simply break the box up into 8 pieces, the easiest possible way to do this is to cut along the three mid-planes of the box, one per axis. The faces of the global box do not lie on processor boundaries, as I apply boundary conditions on all these faces. Each cutting plane has 200 x 200 = 40,000 faces, so you don’t need a CS or math degree to know that the number of processor faces in this case would be 3 x 40,000 = 120,000. Is this what METIS gives you? No! It gives you 164,033 processor faces. What the hell?
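The plane-cut count above can be checked with a few lines of Python. This is only a sketch: the function name `cut_faces` is made up here, and it assumes a regular block decomposition with the cuts aligned to cell faces.

```python
def cut_faces(cells, parts):
    """Count cell faces lying on the internal cutting planes of a block decomposition.

    cells: (nx, ny, nz) cells per axis; parts: (px, py, pz) blocks per axis.
    """
    nx, ny, nz = cells
    px, py, pz = parts
    # An axis split into p blocks needs p - 1 cutting planes; a plane normal
    # to x contains ny * nz cell faces, and similarly for y and z.
    return (px - 1) * ny * nz + (py - 1) * nx * nz + (pz - 1) * nx * ny

ideal = cut_faces((200, 200, 200), (2, 2, 2))
metis = 164033  # figure quoted in the post
print(ideal)                    # 120000
print(round(metis / ideal, 3))  # 1.367
```

So the METIS partition carries roughly 37% more processor-processor faces than the plain plane cuts.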

Here’s a little graph of what this looks like (excuse my very quick and dirty xfig’ing). The width of the boundaries is directly related to the number of processor-processor faces between each decomposed domain.

While there is some obvious symmetry here (within a certain level of approximation), this yields far from the cleanest solution. While METIS may be fantastic for complex domains, it doesn’t do well with simple domains with obvious symmetry. Further, each domain should have a maximum of 3 processor-processor boundaries! It’s important to note here that in fact each processor has 3 major processor-processor boundaries (each node has 3 wide connections; this tells us that METIS is in fact roughly trying to get to the optimal structure described above). It’s all the little connections that would be removed with some knowledge of the basic full domain structure. I understand and perhaps believe that this could all be due to some convergence criterion in the method which I am unaware of (in my reading of the papers on the subject and the code itself I haven’t found any such parameter), though still, I see no reason why some front-end part of the algorithmic implementation shouldn’t take into consideration the symmetry of the large and subdomain groups.
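The "maximum of 3 boundaries" claim is easy to verify for the plane-cut decomposition. In this sketch (the helper name `face_neighbors` is hypothetical), each subdomain is labelled by its block index in a 2 x 2 x 2 grid, and two subdomains are neighbours only if they share a face.

```python
from itertools import product

def face_neighbors(block, parts):
    """Return the blocks that share a face with `block` in a regular block grid."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            other = list(block)
            other[axis] += step
            # Keep only neighbours that are still inside the grid of blocks.
            if 0 <= other[axis] < parts[axis]:
                neighbors.append(tuple(other))
    return neighbors

parts = (2, 2, 2)
counts = {b: len(face_neighbors(b, parts)) for b in product(range(2), repeat=3)}
print(counts)
```

Every one of the 8 blocks reports exactly 3 neighbours, which matches the 3 wide connections per node in the METIS graph; the thin extra connections are what the plane cuts avoid.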

Cheers.

[UPDATE after the jump] (more…)

MATLAB IS INFURIATING BUT HERE’S SOME CODE

Personal, Science, Technical — acosta @ 11:42 am

I’ve had to do a lot of work in Matlab recently, not because I want to work in Matlab or learn a new (albeit very contrived) language. The only reason is that I prefer not to rewrite huge sections of Matlab code that do a lot of the important work for me in my bioinformatics applications. Yes, I could write my own principal component engine, my own Golay smoothing, my own normalizations and plotting code, my own peak discovery and alignment code, but hell … why would I do all of that, especially since this application is not particularly computationally expensive? Knowing that all these functions already exist in Matlab, I thought maybe this would be a one day project. Little did I know that Matlab totally sucks.

Let me give an example. Let’s say you want to plot a bunch of points from some matrix of data, and some of those points come from group 1, some from group 2, etc. You’d think in something like Matlab this would be obvious. And indeed, at first approximation it is. In theory you just use the command ‘hold on’, which will hold the plot such that you can successively add data points to it without deleting all the stuff you already added with the plot command. In theory this looks something like this (don’t worry about the other functions, they are hashes associated with each experiment such that the data gets plotted with groups of points correctly distinguished):

hold on;
for k = 1:numfiles
  for l = 1:numexpt
    % Plot point k with the marker style and face color of the experiment
    % whose name matches its group.
    if isequal(char(grp(k)), expt(l).name)
      pplot(l) = plot(P(k,compa), P(k,compb), plothash_a{l}, ...
        'MarkerSize', 10, 'MarkerEdgeColor', 'k', 'MarkerFaceColor', plothash_c{l});
    end
  end
end

Indeed, this works very well. So, let’s say instead I want to plot in 3D. So, I use the command ‘plot3’ instead of ‘plot’. Of course, one would expect this to be very simple. The part here that counts looks like:

hold on;
[...]
if isequal(char(grp(k)), expt(l).name)
  pplot(l) = plot3(P(k,compa), P(k,compb), P(k,compc), plothash_a{l}, ...
    'MarkerSize', 10, 'MarkerEdgeColor', 'k', 'MarkerFaceColor', plothash_c{l});
[...]

Even though plot3 is the correct command, this produces a 2D plot that shows only the P(k,compa), P(k,compb) part of the data. What the hell? So it turns out that if you hold a new plot with ‘hold on’, Matlab assumes you want a 2D plot. Then upon trying to plot in 3D, Matlab decides it is smarter than you are and that clearly your choice of a 2D plot outweighs your decision to use the ‘plot3’ command, and plots in 2D anyway without throwing an error. Why would ‘plot3’ tell me nothing??? I realize this is a pretty trivial complaint and there are plenty of other great examples of ridiculous crap in Matlab that makes no sense.

Anyway, done complaining. In a ton of data processing Matlab demos, the program asks you to import a series of files into one data matrix, and does it with some very clumsy code that requires you to manually change the program every time you move to a new data set. Not really my style. Let’s say you have a bunch of data vectors organized in a series of directories (happens all the time), where the directories are representative of some data group that should be accessible as a unit. How about something like this:

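repository = pwd;
% Each experiment is a directory in the current folder whose name ends in .enabled.
expt = dir('*.enabled');
numexpt = size(expt,1);
for i = 1:numexpt
  repo{i} = strcat(repository,'/',expt(i).name,'/');
  % List the CSV files of this experiment; the row assignments below assume
  % every experiment holds the same number of files.
  file(i,:) = dir([repo{i} '*.csv']);
  num(i,:) = numel(file(i,:));
  files(i,:) = strcat(repo{i},{file(i,:).name});
end
expt = transpose(expt);
file = transpose(file);
num = transpose(num);
files = transpose(files);
numfiles = numel(files);
% Read each file into column k of Y; X is overwritten on every pass.
for k = 1:numfiles
  [X,Y(:,k)] = textread(files{k});
end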

I use the transposes just because they are nice later in my code; they are certainly not required. I am no Matlab programmer, and I know some of you out there are, so any suggestions as to better file import mechanisms would be greatly appreciated. Short of that though, this is a million times better and far more general than the crap they put you through in the Matlab demos (specifically anything in the bioinformatics sections).

Cheers.

SIMPLE LINUX, UBUNTU, LINUS & COMPIZ

Personal, Science, Technical — acosta @ 9:05 pm

Quite a title eh? Well this is sort of a random stream-of-consciousness kind of post. So be prepared. But this place was getting a little dull recently so I thought I’d rehash some of the things I’ve done to my machines recently and perhaps review them a bit. So here goes.

I think I read (though I can’t seem to find the reference anywhere) an interview with Linus Torvalds recently in which he said something like the following (if you know the reference feel free to let me know, I’m pretty sure it was on Kernel Trap this year sometime):

I don’t use Debian or any other ‘low-level’ Linux flavors because I feel like Linux should be easy to use and manageable for day-to-day work, etc.

Those of you that know me well probably know that I have made nothing short of a career in the past 3 years going exactly in the opposite direction here. Recently however, I decided to take the plunge for a number of reasons. They are, briefly: 1) I’ve got way too many machines to take care of these days, 2) I love Debian but on laptops I find it a bit annoying to have to configure dynamic things every time I move, 3) recently I screwed up a bunch of my machines and decided it was time to reinstall them, and 4) being ridiculously OCD, I needed to have all my machines running the same software and they all basically need to look the same. Lastly, and definitely most importantly, Ion3 was really having trouble running a lot of the software I needed to run, including Fluent (ANSYS), Gambit, Matlab, etc. So all these things together, along with my acquisition of a brand new laptop, made me decide to take the plunge and reinstall all my machines with … (drum-roll), 64-bit Ubuntu.

Generally I’m pretty happy with my choice. I loved the Ion3 window manager and Debian in general, but Ubuntu is basically Debian with some fancy crap built on top of it. So the backend is basically the same. Plus the update cycle is way better in Ubuntu … well, at least faster. As far as using Gnome, I’m not completely sold yet. I sort of like it … I guess, and I’m getting used to it. But I do miss the simplicity of Ion3. I don’t, however, miss configuring everything manually in Debian for my laptop or the huge number of problems I had with applications really not liking the Ion3 windowing model.

Oddly, Ubuntu Gutsy’s (7.10) compositing window manager (Compiz Fusion 0.5.2) is pretty annoying. There are no real benefits to it so far as I can tell, other than Aero/Aqua-type effects. And there are plenty of annoyances. When I first got back to reinstalling my systems, basically everything that didn’t work well with Ion3 also didn’t work with Compiz, so I had to disable it out of the box on all of my machines. Annoying.

Alec turned me on to ‘unison’ as a nice little remote folder syncing utility, which is quite wonderful. I use it now to sync my document tree between my 3 work machines (work laptop, work desktop and home desktop). It’s designed for just 2 machines but it works equally well with 3.

I also got a nice new laptop recently, a Dell Latitude D430, which is their ultra-portable business machine. I’ve used it extensively already and generally I’m quite happy with it. Ubuntu runs great on it — I haven’t really been able to detect even the slightest hitch yet — it’s got fantastic battery life and the performance sacrifices due to ultra-portability and long battery life really don’t affect me in the slightest. It’s really going to be brilliant to be able to work on a plane or while traveling, not to mention when I just need to get out of lab for any number of reasons (there are lots of them).

I’m not sure I have much else to say. Lots of real work to get done since my OP is over, as my boss wants to publish pretty soon and I really don’t have enough done yet to do that. Hopefully I’ll be writing paper #2 in February. I doubt anyone will really care about this post but it’s here for you if you like; I had to write something, this place is dead. Hey you … write something for vdov.

Cheers.

MLB.COM BASEBALL ON THE IPHONE

Personal, Sports, Technical — acosta @ 6:46 pm

Dear MLB.com,

As evidenced by your almost immediate response to the release of the iPhone on your mobile updates page for real-time info, stats and pitch-by-pitch play, some non-trivial number of your customers must have iPhones by now. Anyone who reads this site with any regularity would know that I waited in line to get an iPhone on the day it was released, and was pleased by your response. Even though most of the places I watch or check baseball have some sort of Wi-Fi, I still prefer the low-bandwidth version a large part of the time. However, recently you have decided to put a banner ad across the top of most of these pages. This means that on any of the real-time game stat and pitch-by-pitch windows, the most important stats are now obscured, as there is no longer enough real estate on the iPhone screen and no real way to scale the image. This is pretty much me just whining; however, it has resulted in me not using that page very much anymore. If it is necessary to use ads at all, I am sure there is a more logical way to place that ad so the content doesn’t suffer.

Thank you,
acosta

FTP HACK

Technical — acosta @ 1:56 pm

The good people over at The New Criterion have seen their fair share of problems recently. As some of you might know, I had done some consulting work for them and it was just one thing after another. None of them were anyone’s particular fault, though the accumulation of a wide range of issues resulted in near-constant problems and downtime, including multiple root-level and apache-level compromises. I’ll avoid describing the evolution of those issues, but this last one is just funny, mostly because I haven’t ever really seen this type of stuff before on one of my own machines. (more…)

WHEN SOFTWARE MODELS COLLIDE

Technical — acosta @ 3:35 pm

Doesn’t this seem fundamentally wrong somehow?


pts/4 root@enskog:/etc/apt # apt-cache policy iceweasel
iceweasel:
  Installed: 2.0.0.5-0etch1+lenny1
  Candidate: 2.0.0.5-0etch1+lenny1
  Version table:
     2.0.0.6-1 0
        500 http://debian.osuosl.org unstable/main Packages
     2.0.0.6-0etch1 0
        600 http://security.debian.org stable/updates/main Packages
 *** 2.0.0.5-0etch1+lenny1 0
        900 http://security.debian.org testing/updates/main Packages
        100 /var/lib/dpkg/status
     2.0.0.3-1 0
        600 http://debian.osuosl.org stable/main Packages
        900 http://debian.osuosl.org testing/main Packages

I just thought it was a little funny. If you use Debian or Ubuntu and Co., you might get it.
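For anyone who doesn’t get it, here is a rough sketch of why the candidate stays at 2.0.0.5 even though 2.0.0.6 is sitting right there. It assumes the selection rule described in apt_preferences (highest pin priority wins, the newer version breaks ties) and uses a crude comparison of the upstream version numbers as a stand-in for dpkg’s real version ordering. The names `versions`, `upstream_key` and `pick_candidate` are made up for the illustration.

```python
import re

# Rows transcribed from the apt-cache policy output above:
# (version, highest pin priority among the sources offering it).
versions = [
    ("2.0.0.6-1", 500),
    ("2.0.0.6-0etch1", 600),
    ("2.0.0.5-0etch1+lenny1", 900),
    ("2.0.0.3-1", 900),
]

def upstream_key(version):
    """Crude stand-in for Debian version comparison: numeric upstream parts only."""
    upstream = version.split("-")[0]
    return tuple(int(n) for n in re.findall(r"\d+", upstream))

def pick_candidate(rows):
    # Highest priority first; among equal priorities, the newer upstream version.
    return max(rows, key=lambda row: (row[1], upstream_key(row[0])))[0]

candidate = pick_candidate(versions)
newest_upstream = max(upstream_key(v) for v, _ in versions)
print(candidate)
print(newest_upstream)
```

The pin on testing (900) beats the pin on stable (600), so the security fix that stable already has loses to the older build in testing.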

SORT OF CLEVER

Technical — acosta @ 9:44 am

I found this little kernel-init application called bootchart today (OK so it was on some blog that I get in my RSS feed), and it’s sort of clever. It just loads as you load your kernel and stores a bunch of data in memory (as generally one grabs /boot in read-only) then writes it out to a file at /var/log/bootchart.tgz. They’ve written a little Java application that interprets all the data it gathered and pushes out an image of everything it knew about during the boot process. I thought it was sort of clever. If you install it and are running grub, it’s pretty simple to set up. Just edit /boot/grub/menu.lst and add something like this (this is mine):


title Debian GNU/Linux, kernel 2.6.21-2-amd64 (bootchart mode)
root (hd1,0)
kernel /boot/vmlinuz-2.6.21-2-amd64 root=/dev/sdb1 ro init=/sbin/bootchartd
initrd /boot/initrd.img-2.6.21-2-amd64
savedefault

For most of you the only real difference is the addition of ‘init=/sbin/bootchartd’ to the kernel line. I added a new selection to my grub menu for this but I suppose you could run it at every boot … though I can’t imagine this being that fun more than once or twice. Here’s the image of what I ended up with on my machine at work (8 processor Xeon, 16 gigs of RAM and 15K disks). (more…)

IPHONE FIRST IMPRESSIONS

Personal, Technical — acosta @ 2:51 pm

I told myself initially that I wouldn’t write a post about my impressions of the iPhone, but after having used it for what is now approaching 20 hours and having done some investigatory work on some of the phone’s network services, I thought I would at least write something, albeit short.

Commercials don’t do this thing justice. It is just simply fun to use. I haven’t really found a single glitch so far. The notes and calculator applications are a little ugly and look rushed, but I imagine in time these will be fixed or people will get used to them. Their functionality isn’t bad … just a little ugly. I could also use some more information about the phone. I, like many people, filter connections to my wireless access points and I needed to know its MAC address. That information wasn’t available, so I had to put it on another hub and arp the thing, which was a little annoying.

It does, however, work flawlessly as advertised. Interestingly though, I started doing some traffic dumps from the WiFi interface of the phone, and found that all of those network applications actually get all their data from Apple, not Yahoo (weather) or Google (maps). In fact all application data comes from wu.apple.com. Based on its IP it appears to actually be an Apple machine. So … am I to understand that all Google’s map data is specifically reorganized for Apple and sent to them? Can the application actually interpret the raw data from Google’s machines, or are there a fair number of processing steps between Google’s data and the device data? A simple scan of the device from another machine on my local network indicates that it’s pretty locked down. Nothing is open. On another note: EDGE is surprisingly zippy on this device … it certainly was NOT on my last EDGE device. I don’t know if this is from the rumored EDGE network upgrades or the iPhone itself. Regardless, I’m pretty happy with this thing.

Cheers.

FLASH 9 ON X64

Personal, Technical — acosta @ 12:36 pm

Like many people I have been consistently frustrated with the lack of Flash support on AMD64/X64. I’m no Flash fan, and if I had my way about it we would just eliminate it entirely from all aspects of the web. But unfortunately I don’t have that option. The vast majority of solutions to this problem involve running a chroot-ed X86 environment within your AMD64 build in order to get Flash to play. Yes, major system overhaul for a simple plugin … seems a bit much. But today I found a good solution that actually works (OK Miriam found it and told me). This is a simple nspluginwrapper issue. Whatever your distro, install nspluginwrapper and then go download the 32 bit version of Flash 9. Extract the libflashplayer.xpt file and the libflashplayer.so file and put them in the appropriate location (in my case /usr/lib/mozilla/plugins/). Then, just run,

nspluginwrapper -i /usr/lib/mozilla/plugins/libflashplayer.so

which will create the file npwrapper.libflashplayer.so in that plugins directory. That’s it. You’re done. Flash 9 works. Brilliant. Now I can waste away time watching YouTube at work, which is really all I was looking for anyway.

vdov.net is an anthony costa production. ownership of the content provided is retained by the author and by vdov.net.