I’ve had to do a lot of work in Matlab recently, not because I want to work in Matlab or learn a new (albeit very contrived) language, but because I’d rather not rewrite the huge sections of Matlab code that already do a lot of the important work in my bioinformatics applications. Yes, I could write my own principal component engine, my own Savitzky-Golay smoothing, my own normalizations and plotting code, my own peak discovery and alignment code, but hell … why would I do all of that, especially since this application is not particularly computationally expensive? Knowing that all these functions already exist in Matlab, I thought maybe this would be a one-day project. Little did I know that Matlab totally sucks.
Let me give an example. Let’s say you want to plot a bunch of points from some matrix of data, and some of those points come from group 1, some from group 2, etc. You’d think in something like Matlab this would be obvious. And indeed, to a first approximation it is. You just use the command ‘hold on’, which holds the plot so that you can successively add data points without each new plot command deleting the stuff you already added. In theory this looks something like the following (don’t worry about the plothash_* variables; they are cell arrays acting as hashes, one entry per experiment, so that the data gets plotted with the groups of points correctly distinguished):
hold on;
for k = 1:numfiles
    for l = 1:numexpt
        if (isequal(char(grp(k)),expt(l).name)) pplot(l) = ...
            plot(P(k,compa),P(k,compb),plothash_a{l}, ...
            'MarkerSize',10,'MarkerEdgeColor','k','MarkerFaceColor',plothash_c{l});
        end
    end
end
Indeed, this works very well. Now let’s say I want to plot in 3D instead, so I use the command ‘plot3’ in place of ‘plot’. Of course, one would expect this to be very simple. The part that counts looks like:
hold on;
[...]
        if (isequal(char(grp(k)),expt(l).name)) pplot(l) = ...
            plot3(P(k,compa),P(k,compb),P(k,compc),plothash_a{l}, ...
            'MarkerSize',10,'MarkerEdgeColor','k','MarkerFaceColor',plothash_c{l});
[...]
Even though plot3 is the correct command, this produces a 2D plot showing only the P(k,compa),P(k,compb) part of the data. What the hell? It turns out that if you set up a new plot with ‘hold on’, Matlab creates the axes with a 2D view. Then, when you try to plot in 3D, Matlab decides it is smarter than you are, that clearly your (implicit) choice of a 2D plot outweighs your decision to use the ‘plot3’ command, and shows everything in 2D anyway without throwing an error. (The z-coordinates are still stored in the plot; you are just looking at them from straight above.) Why would ‘plot3’ tell me nothing??? I realize this is a pretty trivial complaint, and there are plenty of other great examples of ridiculous crap in Matlab that makes no sense.
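For what it’s worth, one workaround that seems to do the trick is to switch the held axes to a 3D view explicitly with ‘view(3)’. Here is a minimal sketch, assuming made-up data: the matrix P below is just random numbers standing in for the real principal component scores, and the marker settings are arbitrary.

```matlab
% Hypothetical data: 20 random points in three components.
P = rand(20, 3);

figure;
hold on;     % the axes are created here, with the default 2D view
h = plot3(P(:,1), P(:,2), P(:,3), 'o', ...
    'MarkerSize', 10, 'MarkerEdgeColor', 'k', 'MarkerFaceColor', 'r');
view(3);     % switch the existing axes to the default 3D view
grid on;     % optional, makes the depth easier to read
```

Calling ‘view(3)’ once, anywhere after ‘hold on’, should be enough; it does not need to be repeated for every point inside the loop.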
Anyway, done complaining. In a ton of data processing Matlab demos, the program asks you to import a series of files into one data matrix, and does it with some very clumsy code that requires you to manually change the program every time you move to a new data set. Not really my style. Let’s say you have a bunch of data vectors organized in a series of directories (happens all the time), where each directory represents a data group that should be accessible as a unit. How about something like this:
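repository = pwd;
% every directory named *.enabled is one experiment (data group)
expt = dir('*.enabled');
numexpt = size(expt,1);
for i = 1:numexpt
    repo{i} = strcat(repository,'/',expt(i).name,'/');
    % note: the row assignments below only work if every directory
    % holds the same number of .csv files
    file(i,:) = dir([repo{i} '*.csv']);
    num(i,:) = numel(file(i,:));
    files(i,:) = strcat(repo{i},{file(i,:).name});
end
expt = transpose(expt);
file = transpose(file);
num = transpose(num);
% after this transpose, files{k} walks through one experiment at a time
files = transpose(files);
numfiles = numel(files);
for k = 1:numfiles
    % X is overwritten on every pass; Y collects one column per file
    [X,Y(:,k)] = textread(files{k});
end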
I use the transposes just because they are nice later in my code; they are mostly not required (though transposing files does change the order in which files{k} visits the files). I am no Matlab programmer, and I know some of you out there are, so any suggestions for better file import mechanisms would be greatly appreciated. Short of that, though, this is a million times better and far more general than the crap they put you through in the Matlab demos (specifically anything in the bioinformatics sections).
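One variation I can imagine, for the case where the directories do not all hold the same number of files, is to build one flat list of file names plus a group label for each file. This is only a sketch under assumptions: the directory names, the file names and the two-column numeric CSV layout are all made up for the demo, grp is assumed to hold one group name per file, and I have swapped in dlmread in place of textread. The first half just writes a tiny fake data tree to a temp folder so the thing runs on its own; point repository at your own data instead.

```matlab
% Hypothetical demo data: two group directories with different file counts.
repository = tempname;
groups = {'control.enabled', 'treated.enabled'};   % made-up group names
counts = [2 3];                                     % files per group
for i = 1:numel(groups)
    mkdir(fullfile(repository, groups{i}));
    for j = 1:counts(i)
        data = [(1:4)', rand(4,1)];                 % column 1: x, column 2: y
        dlmwrite(fullfile(repository, groups{i}, sprintf('run%d.csv', j)), data);
    end
end

% The import itself: a flat list of files and a group label per file.
expt = dir(fullfile(repository, '*.enabled'));
files = {};
grp = {};
for i = 1:numel(expt)
    d = fullfile(repository, expt(i).name);
    f = dir(fullfile(d, '*.csv'));
    for j = 1:numel(f)
        files{end+1} = fullfile(d, f(j).name);
        grp{end+1} = expt(i).name;
    end
end

% Read every file; assumes two numeric columns of equal length in each.
numfiles = numel(files);
Y = [];
for k = 1:numfiles
    M = dlmread(files{k}, ',');
    X = M(:,1);          % overwritten on every pass, as in the code above
    Y(:,k) = M(:,2);     % one column of Y per file
end
```

With the labels kept in grp, the group of column k of Y is simply grp{k}, so nothing depends on how many files each directory happens to contain.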
Cheers.