MATLAB IS INFURIATING BUT HERE’S SOME CODE

Personal, Science, Technical — acosta @ 11:42 am

I’ve had to do a lot of work in Matlab recently, and not because I want to work in Matlab or learn a new (albeit very contrived) language. The only reason is that I’d rather not rewrite the huge sections of Matlab code that already do a lot of the important work in my bioinformatics applications. Yes, I could write my own principal component engine, my own Savitzky-Golay smoothing, my own normalization and plotting code, my own peak discovery and alignment code, but hell … why would I do all of that, especially since this application is not particularly computationally expensive? Knowing that all these functions already exist in Matlab, I thought maybe this would be a one-day project. Little did I know that Matlab totally sucks.

Let me give an example. Let’s say you want to plot a bunch of points from some matrix of data, and some of those points come from group 1, some from group 2, etc. You’d think in something like Matlab this would be obvious. And indeed, to a first approximation it is. You just use the command ‘hold on’, which holds the plot so that you can successively add data points without each new plot command wiping out what you already added. In practice this looks something like the following (don’t worry about the other names such as plothash_a; they are hashes associated with each experiment so that the data gets plotted with the groups of points correctly distinguished):

hold on;
for k = 1:numfiles
  for l = 1:numexpt
    % plot file k with the marker and color of the experiment it belongs to
    if (isequal(char(grp(k)),expt(l).name))
      pplot(l) = plot(P(k,compa),P(k,compb),plothash_a{l}, ...
        'MarkerSize',10,'MarkerEdgeColor','k','MarkerFaceColor',plothash_c{l});
    end
  end
end

Indeed, this works very well. Now let’s say I want to plot in 3D instead, so I use the command ‘plot3’ in place of ‘plot’. One would expect this to be very simple. The part that counts looks like:

hold on;
[...]
    if (isequal(char(grp(k)),expt(l).name))
      pplot(l) = plot3(P(k,compa),P(k,compb),P(k,compc),plothash_a{l}, ...
        'MarkerSize',10,'MarkerEdgeColor','k','MarkerFaceColor',plothash_c{l});
[...]

Even though plot3 is the correct command, this produces a 2D plot showing only the P(k,compa),P(k,compb) part of the data. What the hell? It turns out that if you call ‘hold on’ before anything has been plotted, Matlab sets up the new axes for a 2D plot. Then, when you try to plot in 3D, Matlab decides it is smarter than you are, concludes that your choice of a 2D plot outweighs your decision to use the ‘plot3’ command, and plots in 2D anyway without throwing an error. Why would ‘plot3’ tell me nothing??? I realize this is a pretty trivial complaint, and there are plenty of other great examples of ridiculous crap in Matlab that make no sense.
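For what it’s worth, a commonly used workaround is to set the 3D view explicitly once the axes are held. The sketch below uses made-up data: P is a hypothetical 20-by-3 matrix standing in for the score matrix above, and the marker style is arbitrary. As far as I can tell, the points are stored with all three coordinates either way; only the camera stays in the 2D view unless ‘view(3)’ is called.

```matlab
% Minimal sketch with made-up data; P is a hypothetical 20x3 matrix.
P = rand(20,3);

figure;
hold on;      % a fresh held axes starts out in the 2D view
view(3);      % switch the held axes to the default 3D view

for k = 1:size(P,1)
  % same call pattern as in the post, with fixed marker and color
  plot3(P(k,1),P(k,2),P(k,3),'o', ...
    'MarkerSize',10,'MarkerEdgeColor','k','MarkerFaceColor','r');
end

grid on;      % grid lines make the depth easier to read
hold off;
```

Calling ‘view(3)’ after the loop instead should work just as well, since it only changes the camera and not the plotted data.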

Anyway, done complaining. In a ton of Matlab data processing demos, the program asks you to import a series of files into one data matrix, and does it with some very clumsy code that requires you to manually change the program every time you move to a new data set. Not really my style. Let’s say you have a bunch of data vectors organized in a series of directories (happens all the time), where each directory represents some data group that should be accessible as a unit. How about something like this:

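% each directory named *.enabled in the current folder is one experiment group
repository = pwd;
expt = dir('*.enabled');
numexpt = size(expt,1);
for i = 1:numexpt
  repo{i} = strcat(repository,'/',expt(i).name,'/');  % full path to the group
  file(i,:) = dir([repo{i} '*.csv']);                 % listing of its data files
  num(i,:) = numel(file(i,:));                        % number of files in the group
  files(i,:) = strcat(repo{i},{file(i,:).name});      % full path to each file
end
% note: the row assignments above need the same number of .csv files in every group
expt = transpose(expt);
file = transpose(file);
num = transpose(num);
files = transpose(files);
numfiles = numel(files);
% read every file; X is overwritten on each pass, Y gets one column per file
for k = 1:numfiles
  [X,Y(:,k)] = textread(files{k});
end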

I use the transposes just because they are convenient later in my code; they are certainly not required. I am no Matlab programmer, and I know some of you out there are, so any suggestions for better file import mechanisms would be greatly appreciated. Short of that, though, this is a million times better and far more general than the crap they put you through in the Matlab demos (specifically anything in the bioinformatics sections).
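In that spirit, here is one possible variation, offered as a sketch rather than a tested replacement. It assumes the same layout as above (directories named *.enabled holding numeric two-column .csv files of equal length with no header row), uses ‘fullfile’ for path building and ‘dlmread’ in place of ‘textread’, and keeps one flat list of paths so the groups need not contain the same number of files. The names ‘listing’, ‘paths’ and ‘data’ are introduced here purely for illustration; ‘grp’ is built to mirror the group labels used in the plotting loop.

```matlab
% Sketch only: assumes *.enabled directories of headerless, two-column CSV files.
repository = pwd;
expt = dir(fullfile(repository,'*.enabled'));
expt = expt([expt.isdir]);            % keep directories only

files = {};                           % flat list of full file paths
grp = {};                             % group (directory) name for each file
for i = 1:numel(expt)
  d = fullfile(repository,expt(i).name);
  listing = dir(fullfile(d,'*.csv'));
  paths = cellfun(@(n) fullfile(d,n), {listing.name}, 'UniformOutput', false);
  files = [files, paths];                               % append this group's files
  grp = [grp, repmat({expt(i).name},1,numel(paths))];   % and their labels
end

numfiles = numel(files);
for k = 1:numfiles
  data = dlmread(files{k});           % numeric matrix, delimiter detected from file
  X = data(:,1);                      % overwritten on each pass, as in the original
  Y(:,k) = data(:,2);                 % one column of Y per file
end
```

The trade-off is that the per-group matrices (file, num) are gone; if they are needed later, ‘grp’ can be used to pick out the columns of Y belonging to each group.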

Cheers.

5 Comments »

  1. Did you ever look into R/Bioconductor? I know that is what the programmers here use for a lot of similar types of applications.

    Comment by afischer — 1/28/2008 @ 1:19 pm
  2. interesting. i would have much preferred to write in R and i didn’t know about bioconductor. the types of things i need to do that would be included in packages like R/BC i’ve already done, and for development from here on out i might as well now use matlab since i have learned it pretty well at this point. at least i can get around. probably a pretty solid find for code examples though thanks. i should never say this of course, but there are some nice things about working in such a high level language for basic informatics … certainly i would be shooting myself in the foot if i wrote anything in matlab to do my main project calculations (finite volume/finite element and MD)

    Comment by acosta — 1/28/2008 @ 2:35 pm
  3. that being said, i just had to write this … this is when i realize how inelegant matlab can be …

    grp(counter:counter+size(file(:,i),1)-1,1) = repmat({expt(i).name},size(file(:,i)));

    but it works!

    Comment by acosta — 1/28/2008 @ 2:39 pm
  4. What caught my eye was the ‘hold on;’ – I saw that and read it as advising the compiler about the quality of upcoming code. I think some language should have that construct.

    Comment by Alec — 1/28/2008 @ 4:58 pm
  5. haha that’s awesome. i would much prefer matlab have something like that than its stupid plotting subsystem.

    Comment by acosta — 1/29/2008 @ 11:26 am

vdov.net is an anthony costa production. ownership of the content provided is retained by the author and by vdov.net.