grep biotech | more

A blog about my experiences with bioinformatics, operating systems, and random other technologies and bits.

Wednesday, March 28, 2012

Biological Alignment Software, Used for Finding Common Phrases

Recently I was asked to compare a second manuscript to one we'd previously written on a very similar subject, to make sure there wasn't too much "copy and paste". While this could probably be done by eye, I like to do things the easy (hard? but at least fun) way, so I wondered if diff or some other UNIX utility could help me out. Apparently not, or at least not easily. The next thing to try in my repertoire was something from the massive array of biological sequence alignment software.

While there are probably better (and more expensive) options for finding similarities between documents, primarily for detecting plagiarism, I wondered if local alignment software could also do the job. Here's an example using fairly lenient gap penalties with the 'water' program from EMBOSS.

Using a simple script to convert the two text files into two FASTA files for input into your aligner of choice, I got this output from water:
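The conversion script itself isn't reproduced here. What follows is a minimal sketch of one, not the script used for this post. It assumes the aligner accepts only the letters A-Z, so digits, punctuation and whitespace are simply dropped. `text_to_fasta` and `convert_file` are hypothetical names.

```python
import re


def text_to_fasta(text: str, name: str, width: int = 60) -> str:
    """Turn free text into a single FASTA record.

    Assumption: the aligner accepts only A-Z, so every non-letter
    (digits, punctuation, whitespace) is dropped and letters are upper-cased.
    """
    letters = re.sub(r"[^A-Za-z]", "", text).upper()
    # Wrap the sequence at `width` characters per line, as FASTA files usually are.
    lines = [letters[i:i + width] for i in range(0, len(letters), width)]
    return ">" + name + "\n" + "".join(line + "\n" for line in lines)


def convert_file(in_path: str, out_path: str, name: str) -> None:
    """Read a plain-text file and write it back out as one FASTA record."""
    with open(in_path, encoding="utf-8") as src:
        text = src.read()
    with open(out_path, "w", encoding="ascii") as dst:
        dst.write(text_to_fasta(text, name))
```

For example, `convert_file("first.txt", "first.fa", "first")` and the same for the second manuscript give the two files that water takes as its two input sequences (the file names are placeholders). Dropping spaces makes word boundaries vanish. That is harmless for spotting copied phrases but does allow matches that straddle words.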

I also tried making an English alphabet substitution matrix (an identity matrix), but it didn't seem to matter much. Because some characters have special meanings, and the aligners I have tested so far accept only English alphabet characters, it would be best to modify the alignment package's source code for more serious use. Also, water was somewhat slow on this alignment, so a heuristic package like SOAP or BLAST might be better for long documents.
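water is EMBOSS's Smith-Waterman local alignment program. Filling the full dynamic-programming table is why it gets slow as documents grow.

Below is a minimal sketch of the idea on plain strings, not the EMBOSS code. It assumes an identity-style score (match +2, mismatch -1) and a simple linear gap penalty of -2. These values are illustrative, not water's defaults, and water itself uses separate gap-open and gap-extend penalties. `smith_waterman` is a hypothetical name.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of a and b: returns (score, aligned_a, aligned_b).

    Uses a linear gap penalty. The (len(a)+1) x (len(b)+1) table makes this
    O(len(a) * len(b)) in both time and memory.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best_score, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below 0, so a match can start anywhere.
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best_score:
                best_score, best_i, best_j = h[i][j], i, j

    # Trace back from the best-scoring cell until the score falls to 0.
    i, j = best_i, best_j
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best_score, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The quadratic table is exactly what heuristic tools like BLAST avoid. They first look for short exact "seed" matches and only extend around those.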

Monday, August 23, 2010

ksh93 for bioinformatics

From #illumos on Freenode:
alanc: I suspect anyone using ksh93 for bioinformatics is likely to be the thing under study
* gdamore looks at roland.
yep, richlowe was right.

Monday, April 12, 2010

My favorite $TERM

The default in OpenSolaris, xterm, was not nice. I copied a terminfo file:

pfexec cp /usr/local/share/terminfo/x/xterm-256color /usr/share/lib/terminfo/x/

Then add the following line to ~/.bashrc:
export TERM=xterm-256color

If you don't have this file (I'm not sure how I installed it), you can probably copy it from a Linux installation. Now back to work, in color!
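Setting TERM only tells programs such as vim what the terminal claims to support. A quick way to check that the terminal emulator really renders 256 colors is to print the whole palette. The xterm escape sequence ESC[48;5;Nm sets the background to palette entry N, and ESC[0m resets it.

This is a minimal sketch that assumes a terminal understanding these xterm-style sequences. `palette_256` is a hypothetical name.

```python
def palette_256(cols: int = 16) -> str:
    """Build a string that shows all 256 xterm background colors, `cols` per row."""
    cells = []
    for n in range(256):
        # ESC[48;5;<n>m selects background color n; ESC[0m resets attributes.
        cells.append(f"\x1b[48;5;{n}m{n:>4}\x1b[0m")
        if (n + 1) % cols == 0:
            cells.append("\n")
    return "".join(cells)


if __name__ == "__main__":
    print(palette_256(), end="")
```

If every cell shows a distinct color, the emulator handles 256 colors. If you see only a handful of repeated colors, the problem is the emulator rather than TERM.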

Sunday, September 27, 2009

If you are trying to build libSBML, you may run into a problem similar to this.
As the post seems to indicate, building gcc with --disable-concept-checks should fix the problem, and it did for libSBML. Disabled concept checks appear to be the default for gcc; I am not sure why my gcc build had them enabled originally (I suppose enabling them is a good idea if you only have to worry about your own code).

Tuesday, July 21, 2009

Monitor hardware faults in Solaris

I found the following script:

which makes use of the Solaris Fault Management Architecture (FMA):

Really, it would be nice to have email notifications set up and easily configurable. I didn't want to mess with syslog-ng, as I wasn't sure where to start (I certainly didn't want everything it finds emailed to me).

As always, you'll want to test that your mail gets past spam filters; you can do this with mailx from the command line. I edited the script to use mailx -r my_reply_address to keep it consistent with my testing (since Gmail initially marked my test messages as spam). I also tested the script in VirtualBox to verify that it works when a hard drive that is part of a zpool goes missing.
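The script mentioned above isn't reproduced here. As a rough illustration of the same idea (not that script), here is a minimal sketch that runs `fmadm faulty` and mails any output with `mailx -s ... -r ...`.

It rests on three assumptions:
- `fmadm faulty` prints nothing when no faults are active.
- It runs with enough privilege (root or pfexec) to read the fault list.
- mailx is already configured to send mail.

`build_alert` and `check_and_notify` are hypothetical names.

```python
import socket
import subprocess


def build_alert(fmadm_output: str, host: str):
    """Return (subject, body) if fmadm reported anything, else None.

    Assumption: `fmadm faulty` prints nothing when there are no active faults.
    """
    report = fmadm_output.strip()
    if not report:
        return None
    return f"FMA fault report on {host}", report + "\n"


def check_and_notify(recipient: str, reply_to: str) -> bool:
    """Run `fmadm faulty` and mail its output, if any. Returns True if mail was sent."""
    # fmadm usually needs root (or pfexec) to read the fault list.
    result = subprocess.run(["fmadm", "faulty"], capture_output=True, text=True, check=False)
    alert = build_alert(result.stdout, socket.gethostname())
    if alert is None:
        return False
    subject, body = alert
    # -s sets the subject; -r passes the sender/reply address, as in the post.
    subprocess.run(["mailx", "-s", subject, "-r", reply_to, recipient],
                   input=body, text=True, check=True)
    return True
```

Run from cron, this mails the full report on every run while a fault persists. You may want to record what was already reported before sending again.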

Saturday, December 20, 2008

Blender for Solaris X64 (Core2Duo)

Download: Blender Bundle

This is a rough build (no packages) of Blender 2.48a. Furthermore, it may only run on Intel Core 2 Duo systems, since I used -march=core2 and -mtune=core2 when building with gcc. This build was done without OpenAL; if you need it, let me know. All dependencies were installed with "make install" into /usr/local. Blender itself does not appear to have a make install target; instead, copy its directory to the desired location and create a launcher script (say, /usr/local/blender) like the following:

#!/bin/sh
LD_LIBRARY_PATH=/usr/gnu/lib/amd64:/usr/local/lib/amd64 /usr/local/blender-2.48a/obj/solaris-2.11-x86_64/bin/blender "$@"

You will also need to have (64bit) on your system; see the post below on building gcc if you need this. If you use this or have any suggestions, please let me know.

Friday, September 5, 2008

Remove advertisement text from searched content

Web search engines, Google's included, should (by default, anyway) exclude any text that appears to come from an advertisement on a page from that page's searchable text. I could say more about why, but I think it's obvious.

Wednesday, July 23, 2008

Equality of elements should imply equality of types

Unfortunately, few things are perfect. I'll leave it to someone else to tell me I'm wrong (or right?) here, but the following seems less than ideal:

sage: a = int('1')
sage: b = Integer('1')
sage: a == b
True
sage: type(a) == type(b)
False

I love Sage, so maybe I'll come back to this later when I have more time. Right now, I think that if type(a) is contained in type(b) *and* a == b, then type(a) == type(b) should return True. But this raises interesting questions about what types are... I'll revise this entry once I've read up on type theory, category theory, and the design of Sage/Python type handling (this could be a while!).
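Plain Python shows the same pattern without Sage. bool is a subclass of int, so True == 1 holds and type(True) is, in a sense, contained in type(1), yet the two types still compare unequal. Python's usual way of asking the containment question is isinstance (or issubclass) rather than type equality.

This is a small sketch; `compare` is a hypothetical helper.

```python
from fractions import Fraction


def compare(a, b) -> dict:
    """Report value equality, exact type equality, and whether a's type is b's type or a subtype of it."""
    return {
        "equal": a == b,
        "same_type": type(a) == type(b),
        # isinstance(a, type(b)) is True when type(a) is type(b) or a subclass of it.
        "a_subtype_of_b": isinstance(a, type(b)),
    }


if __name__ == "__main__":
    for a, b in [(True, 1), (1, 1.0), (Fraction(2, 1), 2)]:
        print(repr(a), repr(b), compare(a, b))
```

All three pairs compare equal while having different types. Only the (True, 1) pair satisfies the "contained in" condition, and even there type equality is False.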

Saturday, August 4, 2007

Update 08/03/07

I've been doing a fair amount of coding lately, though I don't feel like going into details just yet. One project is CthughaNix, the resurrection of a great music visualization tool. Speaking of bringing back things from the past, I started reading The Wheel of Time saga again, since only the last book remains to be written. This was most likely a bad decision on my part, as it is hard for me to put a book down and I'm fairly busy right now. The other project is work-related: it is eventually intended to be parametric alignment software, but right now I'm finalizing regular global alignment and integrating it with EMBOSS.
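"Regular global alignment" means Needleman-Wunsch-style alignment. It scores two sequences end to end instead of searching for the best-matching substring, as a local aligner does.

Below is a minimal score-only sketch, not the code from the project described here. It assumes a linear gap penalty and illustrative scores (match +1, mismatch -1, gap -1), and keeps only two rows of the dynamic-programming table. `needleman_wunsch_score` is a hypothetical name.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of a and b with a linear gap penalty (score only, O(len(b)) memory)."""
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: a[:i] against an empty prefix of b costs i gaps.
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                         prev[j] + gap,     # gap in b
                         cur[j - 1] + gap)  # gap in a
        prev = cur
    return prev[-1]
```

Unlike the local version, there is no floor at 0 and the answer is always the bottom-right cell, so both sequences must be aligned in full.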

Here's a screenshot of CthughaNix running on Solaris, though screenshots don't really do it justice:

Tuesday, July 3, 2007

GMP 4.2.1 on OS X 10.3.9

You don't want the same company that makes the iPhone to make your enterprise servers. Regardless, it seems I'm stuck with a small OS X 10.3.9 cluster. We can't upgrade easily (I won't get into the reasons, though some of them are partly Apple's fault, of course), but suffice it to say that despite having such a nice machine, we are stuck in the 32-bit land of the not-so-ancient OS X 10.3 and can't even get Java JDK 1.5 or 1.6 (an added incentive from Apple to upgrade to OS X 10.4...).

At any rate, if you also have these problems (unlikely, but I know some of you are out there), here are the options I used to build GMP 4.2.1 with GCC 4.2.0 (prerelease):

export CFLAGS="-O3 -m32 -mcpu=7450 -mpowerpc -maltivec"
export CXXFLAGS="$CFLAGS"

make distclean && ./configure --enable-cxx ABI=32 && make && make check