Archive for the ‘General’ Category

Finding text similarities with fuzzy hashes – Duplicate code for example

April 24th, 2011 1 comment

How would an email server go about identifying spam email? The problem is an interesting one. The challenges in identifying spam are:

1. Scaling any solution to thousands of emails
2. Identifying spam even when there are small changes to the spam content
3. Reducing false positives

One solution is to compute a hash of a known spam message and compare it with the hash of each new message. The problem with this approach is that a minute change in a message produces a completely different hash. A fuzzy hash, or context triggered piecewise hash (CTPH), solves this by hashing the text in pieces: a trigger point in the text marks where one piece ends and the next begins, and a hash value is calculated for each piece. A small change then alters only the hash of the piece that contains it. For example, the triggers for the following text could be ‘a’ and ‘or’:

why would a lazy sunday be greeted with sleep. or was I wrong? You were not sleeping ?
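
The idea can be sketched in a few lines of Python. This is a simplified illustration and not a real CTPH implementation: it assumes the triggers are the whole words ‘a’ and ‘or’ from the example, whereas tools such as ssdeep derive their trigger points from a rolling hash. The function names, the choice of SHA-256 and the similarity score are all assumptions made for the sketch.

```python
import hashlib
import re

def piecewise_hashes(text, triggers=("a", "or")):
    """Cut the text after every trigger word and hash each piece on its own."""
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in triggers) + r")\b"
    pieces, start = [], 0
    for match in re.finditer(pattern, text):
        pieces.append(text[start:match.end()])   # piece ends right after the trigger
        start = match.end()
    if start < len(text):
        pieces.append(text[start:])               # whatever follows the last trigger
    return [hashlib.sha256(p.encode("utf-8")).hexdigest() for p in pieces]

def similarity(a, b):
    """Share of piece hashes that two texts have in common (0.0 to 1.0)."""
    ha, hb = set(piecewise_hashes(a)), set(piecewise_hashes(b))
    return len(ha & hb) / max(len(ha | hb), 1)

original = "why would a lazy sunday be greeted with sleep. or was I wrong? You were not sleeping ?"
modified = "why would a lazy monday be greeted with sleep. or was I wrong? You were not sleeping ?"

# One changed word only disturbs the piece that contains it.
print(similarity(original, modified))
```

A single hash over the whole message would differ completely between the two texts, while the piecewise hashes still agree on every piece except the one containing the changed word.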

Categories: General

Using Greasemonkey to prototype your web UI

March 18th, 2011 No comments

When mock-ups and screenshots fail to deliver the idea that you are trying to convey, a prototype can deliver a strong impact. But why should you prototype with Greasemonkey when you can check out code from source control and mock a web page with a few shabbily scribbled lines of JavaScript?

I often find myself comparing Greasemonkey with the NetBeans Swing builder for thick clients. It's great because you can deliver a quick UI without worrying about the underlying functionality. But where Greasemonkey differentiates itself is in its ability to mock a live web page. By live I mean one that is already deployed in production. Never underestimate the impact of mocking a feature on a live website. Changing the innerHTML of a few HTML elements and hacking out some JS can get you quick results.

Categories: General

Find open files in Linux using lsof

December 29th, 2010 No comments

Deleting a file that is still open in another process on Linux does not free up disk space. The df and du commands will then report conflicting figures, because df counts the blocks still held by the deleted file while du only sees files that still have a directory entry. Closing or killing the process that holds the file open releases the space on disk. The lsof command can help you track down, say, the ten largest open files on your system. If you ever run into trouble with large open files, use the following command:

Top ten open files:
lsof / | awk '{if($7 > 1048576) print $7/1048576 "MB" " " $9 }' | sort -n -u | tail

Output:

3.8054MB /usr/lib/libgtk-x11-2.0.so.0.2200.0
4.28024MB /usr/share/icons/hicolor/icon-theme.cache
8.17912MB /usr/lib/locale/locale-archive
8.86022MB /var/lib/apt/lists/lk.archive.ubuntu.com_ubuntu_dists_maverick_main_binary-i386_Packages
11.4047MB /usr/lib/flashplugin-installer/libflashplayer.so
14.6893MB /usr/lib/firefox-3.6.10/libxul.so
15.6504MB /var/cache/apt/pkgcache.bin
27.4744MB /var/lib/apt/lists/lk.archive.ubuntu.com_ubuntu_dists_maverick_universe_binary-i386_Packages
34.6615MB /usr/share/icons/gnome/icon-theme.cache
44.1719MB /home/user/.mozilla/firefox/tnrqzpro.default/urlclassifier3.sqlite

You can also look up open files by process ID (lsof -p <pid>) or by port number (lsof -i :<port>). I hope the script saves you some time, should you ever find yourself in this situation.
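
To see why the space is not released, here is a minimal Python sketch that reproduces the situation on a small scale. It assumes a POSIX system such as Linux, and the function name and the returned dictionary keys are made up for illustration. It deletes a temporary file while a descriptor to it is still open and then asks the kernel what it still holds.

```python
import os
import tempfile

def deleted_file_still_held(size_bytes=1024 * 1024):
    """Delete a file while it is open and report what the kernel still keeps."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * size_bytes)
        os.unlink(path)              # the directory entry is gone ...
        info = os.fstat(fd)          # ... but the open descriptor still works
        return {
            "path_exists": os.path.exists(path),
            "link_count": info.st_nlink,
            "size_still_held": info.st_size,
        }
    finally:
        os.close(fd)                 # only after this can the blocks be freed

print(deleted_file_still_held())
```

The path no longer exists and the link count drops to zero, yet the data stays allocated until the descriptor is closed. On Linux, lsof marks such files as (deleted) in its output.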

Categories: General

Installing Ubuntu – An adventure

October 20th, 2010 6 comments

Having switched to SUSE Linux a while back, I am enjoying the ride but for a few gripes. The UI does not load as smoothly as it should under certain circumstances. The keyboard also acts all crazy without warning. The new Ubuntu release, 10.10, is here, so I wanted to give that a shot, hoping the experience would be better than what SUSE had to offer. Maybe this was a KDE vs. GNOME problem. Perhaps 32-bit installations are less troublesome than 64-bit ones. Well, I won't know unless I try.

And it begins:

After downloading the Ubuntu ISO, I wrote it to a CD and started the install process. The installation was riddled with error messages. Selecting partition X on hard disk 1 made the installer hate me. It complained, saying ‘Either the hard disk or the CD has some sort of media related problem’. ‘Hmmm… it's probably the hard disk, since there were no errors when the CD was written,’ I thought.

Categories: General

Crypt DES and 8 character truncated passwords

May 18th, 2010 1 comment

Many passwords on Linux are hashed using the crypt() function. Users are usually not aware of the difference between a traditional crypt hash and an MD5 hash. Well, it can turn out to be important, especially if crypt() uses its default DES-based scheme.

The problem with crypt() and traditional DES is that it truncates the password to 8 characters. Users are usually not aware of this and assume that the entire password has been hashed and stored. Take the Apache tool htpasswd, for example. It uses crypt() to hash passwords (it may also use its own MD5 routine) and writes them to a password file. The following command creates a new user in a password file:

htpasswd password_file new_user
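
The effect of the truncation can be illustrated with a short Python sketch. It does not call crypt() or compute a real hash. It only models the key derivation step, assuming the traditional scheme described in the crypt(3) manual page: the lowest 7 bits of each of the first eight characters form the 56-bit DES key. The function name and the exact bit placement are assumptions made for illustration.

```python
def des_crypt_key(password):
    """Model the 8 key bytes that traditional DES crypt() derives from a password."""
    raw = password.encode("utf-8")[:8]            # everything past 8 bytes is ignored
    raw = raw.ljust(8, b"\0")                     # short passwords are zero padded
    return bytes((b & 0x7F) << 1 for b in raw)    # 7 bits per byte, 56 bits in total

# Two different passwords that share their first eight characters
# end up with the same key, and therefore the same hash for a given salt.
print(des_crypt_key("password") == des_crypt_key("password123"))
print(des_crypt_key("password") == des_crypt_key("passwor"))
```

Under this model, anything typed after the eighth character makes no difference to the stored value, which is why a long password protected this way is only as strong as its first eight characters.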

Categories: General