
Jul 3, 2011 | WordNet Similarity bundle


After toiling away at setting up my own WordNet Similarity system (see link), and realizing that such a good project hasn’t been very active since 2008, I decided to give it a stab and make it easier for other people to do the same.

What’s nice about WordNet Similarity is that it implements a variety of measures (path, lch, wup, res, lin, jcn, hso, lesk and vector, among others) for judging the semantic relatedness of two terms. Each measure draws on a different part of how WordNet organizes words into synsets: the is-a hierarchy, the other links between synsets, or the text of their glosses. Once you have your own system running, it’s easy to build a simple API that returns these scores for individual terms.
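To make the idea of hierarchy-based measures a bit more concrete, here is a toy Python sketch. It is not WordNet::Similarity’s code. It builds a hypothetical five-node is-a taxonomy and computes the textbook path measure (one over the number of nodes on the shortest path between two synsets) and the Wu & Palmer measure (twice the depth of the lowest common subsumer, divided by the sum of the two synsets’ depths). The real modules work over the full WordNet hierarchy and differ in details such as how they treat the root.

```python
"""Toy versions of two hierarchy-based similarity measures.

The taxonomy is hypothetical and tiny; WordNet::Similarity runs these
kinds of measures over WordNet's real noun and verb hierarchies.
"""

# child synset -> parent (hypernym) synset; "entity" is the root
HYPERNYM = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(synset):
    """Return [synset, parent, grandparent, ..., root]."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def depth(synset):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(synset))


def lowest_common_subsumer(a, b):
    """Deepest synset that is an ancestor of (or equal to) both a and b."""
    ancestors_of_b = set(path_to_root(b))
    for node in path_to_root(a):  # walks upward from a, so the first hit is deepest
        if node in ancestors_of_b:
            return node
    raise ValueError("synsets share no common ancestor")


def path_similarity(a, b):
    """1 / (number of nodes on the shortest path between a and b)."""
    lcs = lowest_common_subsumer(a, b)
    edges = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))
    return 1.0 / (edges + 1)


def wup_similarity(a, b):
    """Wu & Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


for pair in [("dog", "cat"), ("dog", "car")]:
    print(pair, round(path_similarity(*pair), 3), round(wup_similarity(*pair), 3))
# ('dog', 'cat') 0.333 0.667
# ('dog', 'car') 0.2 0.333
```

Dog and cat meet at animal, one level below the root, so both scores are higher than for dog and car, which only meet at entity.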

The bundled package contains:

Digest SHA1 v2.13
Text Similarity v0.08
WordNet v3.0
WordNet QueryData v1.9
WordNet Similarity v2.05

Together, that’s everything you need to deploy it on your own machine. I use a Mac, but most of these steps should carry over to a Windows machine too.

  1. Unzip each of the packages.
  2. Install WordNet (if its folder doesn’t contain a Makefile yet, run ./configure first):
    make install

    On a Mac/Unix system this installs WordNet into /usr/local/WordNet-3.0/. If you get permission errors, prefix the command with sudo (sudo make install).

  3. Install the other dependencies by running these two commands inside each package’s folder, leaving WordNet Similarity for last:

    perl Makefile.PL
    make install

  4. To deploy the web-based tool (like this one), you’ll need a web server such as Apache. I’ve found MAMP useful, although a Mac already ships with Apache, so no extra download is strictly necessary. If you use the built-in Apache, you’ll need to modify /etc/apache2/httpd.conf so it will run Perl/CGI scripts (typically by loading mod_cgi and enabling ExecCGI for the directory the scripts live in). There are plenty of guides online to help you with that.
  5. Next, build word vectors from the WordNet glosses (the vector measure needs them). Open the utils folder inside the WordNet Similarity folder and run a command similar to this one, adjusting the stoplist path to match your own:

    perl wordVectors.pl vectors.dat --stopfile /Users/yourname/Downloads/packages/WordNet-Similarity-2.01/samples/stoplist.txt

    The --stopfile option tells the word-vector generator to skip over stopwords. The result should be a vectors.dat file of about 37.1 MB.

  6. Open the web folder in the WordNet Similarity directory and copy its contents somewhere your web server can reach. You only really need the cgi-bin folder, but the doc folder is useful too.
  7. Once you’ve copied it over to your web server space, copy over the vectors.dat file you created, as well as the stoplist.txt file from the samples folder of WordNet Similarity.
  8. Edit similarity_server.conf so that the lock file and error log paths both point to the folder you’re working in (where you just copied everything), since the original settings didn’t work for me. Then change vectordb to point to vectors.dat.
  9. On a Mac, you’ll need to change the Perl path in the first line of similarity.cgi, wps.cgi and similarity_server.pl from #!/usr/local/bin/perl -wT to #!/usr/bin/perl -wT, since that’s where the system Perl lives (run which perl to check yours).
  10. Fire up the server with the following command, and keep both your fingers and toes crossed:

    perl -T similarity_server.pl &

    You should see something along the lines of:

    Lock file = /Users/myname/Sites/wn/similarity.lock
    Error log = /Users/myname/Sites/wn/similarity.log
    Word vectors = vectors.dat
    Stoplist = stoplist.txt
    Loading modules... done.
    Started server... accepting requests.

  11. Fire up your browser and navigate to the similarity.cgi page!
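Steps 2 and 3 boil down to running the same few commands inside each unpacked folder, in a fixed order. Here is a small Python sketch that prints those commands (or runs them when given --run). The folder names in PACKAGE_DIRS are hypothetical, so rename them to match what your unzip produced, and the ./configure fallback assumes the WordNet source folder arrives without a ready-made Makefile.

```python
"""Print (or run) the install commands for each unpacked package, in order.

PACKAGE_DIRS is hypothetical: rename the entries to match your unzipped folders.
"""
import subprocess
import sys
from pathlib import Path

# Install order: WordNet first, WordNet-Similarity last.
PACKAGE_DIRS = [
    "WordNet-3.0",
    "Digest-SHA1-2.13",
    "Text-Similarity-0.08",
    "WordNet-QueryData-1.9",
    "WordNet-Similarity-2.05",
]


def build_commands(pkg_dir):
    """Return the commands that install one unpacked package."""
    if (pkg_dir / "Makefile.PL").exists():
        # Perl module: generate a Makefile, then build and install.
        return [["perl", "Makefile.PL"], ["make", "install"]]
    commands = []
    if not (pkg_dir / "Makefile").exists():
        # Autoconf-style tree (such as the WordNet sources): configure first.
        commands.append(["./configure"])
    commands.append(["make", "install"])
    return commands


def install_all(root, dry_run=True):
    for name in PACKAGE_DIRS:
        pkg_dir = root / name
        for cmd in build_commands(pkg_dir):
            print(f"[{name}] $ {' '.join(cmd)}")
            if not dry_run:
                # check=True stops at the first failing step. Run the whole
                # script with sudo if `make install` hits permission errors.
                subprocess.run(cmd, cwd=pkg_dir, check=True)


if __name__ == "__main__":
    # Usage: python install_bundle.py /path/to/unzipped/bundle [--run]
    bundle = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    install_all(bundle, dry_run="--run" not in sys.argv)
```

Defaulting to a dry run means you can eyeball the order before anything is installed.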
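For step 4, one quick way to see whether Apache is set up for CGI is to look for the relevant directives in httpd.conf. The Python sketch below is a heuristic of my own, not an Apache tool: it only checks that a LoadModule line for mod_cgi (or mod_cgid), an AddHandler cgi-script line covering .cgi, and an Options line containing ExecCGI appear uncommented somewhere in the file. It doesn’t understand Directory scoping, and a cgi-bin mapped with ScriptAlias can work without AddHandler or ExecCGI, so treat its output as hints rather than a verdict.

```python
"""Heuristic check that an Apache httpd.conf is set up to run .cgi scripts."""
import re
from pathlib import Path

# Each pattern must match an uncommented line (a leading "#" prevents a match).
CGI_DIRECTIVES = {
    "LoadModule for mod_cgi/mod_cgid": re.compile(
        r"^[ \t]*LoadModule[ \t]+cgid?_module\b", re.M),
    "AddHandler cgi-script for .cgi": re.compile(
        r"^[ \t]*AddHandler[ \t]+cgi-script\b.*\.cgi\b", re.M),
    "Options with ExecCGI": re.compile(
        r"^[ \t]*Options\b.*\bExecCGI\b", re.M),
}


def missing_cgi_directives(conf_text):
    """Names of the directives above that never appear uncommented."""
    return [name for name, pattern in CGI_DIRECTIVES.items()
            if not pattern.search(conf_text)]


if __name__ == "__main__":
    conf = Path("/etc/apache2/httpd.conf")  # the Mac's built-in Apache
    if conf.exists():
        missing = missing_cgi_directives(conf.read_text(errors="replace"))
        print("Missing: " + ", ".join(missing) if missing else "CGI directives found.")
```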
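Step 5 can feel like a black box, so here is a rough Python illustration of the idea behind gloss word vectors: count which words appear together in the same gloss (skipping stopwords), represent a text as the sum of its words’ vectors, and compare two texts by cosine. It is a simplified sketch with made-up glosses and stopwords; wordVectors.pl and the vector measure have their own options, weighting and file format, so this won’t reproduce vectors.dat.

```python
"""Toy co-occurrence word vectors built from gloss text.

Conceptual sketch only: the glosses and stopwords are made up, and
wordVectors.pl has its own options and file format.
"""
import math
import re
from collections import Counter, defaultdict
from itertools import permutations


def tokens(text, stopwords):
    """Lowercase alphabetic tokens, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]


def build_word_vectors(glosses, stopwords):
    """word -> Counter of the other words it shares a gloss with."""
    vectors = defaultdict(Counter)
    for gloss in glosses:
        words = set(tokens(gloss, stopwords))  # count each pair once per gloss
        for word, other in permutations(words, 2):
            vectors[word][other] += 1
    return vectors


def gloss_vector(text, vectors, stopwords):
    """Represent a text as the sum of its words' co-occurrence vectors."""
    total = Counter()
    for word in tokens(text, stopwords):
        total.update(vectors.get(word, Counter()))
    return total


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


STOPWORDS = {"a", "an", "the", "that", "with", "of"}
GLOSSES = [
    "a domesticated animal that barks",
    "a feline animal that purrs",
    "a motor vehicle with four wheels",
]
vectors = build_word_vectors(GLOSSES, STOPWORDS)
print(cosine(gloss_vector("barks", vectors, STOPWORDS),
             gloss_vector("purrs", vectors, STOPWORDS)))
```

"barks" and "purrs" never share a gloss, yet they get a nonzero score because both co-occur with "animal". That indirect, second-order overlap is what makes gloss vectors useful for relatedness.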
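Steps 6 and 7 are plain file copying, which can be scripted too. This Python sketch assumes the layout implied by the server output in step 10: the contents of web/cgi-bin, vectors.dat and stoplist.txt all end up in one folder (for example ~/Sites/wn), with the doc folder copied alongside if present. The function name deploy and its arguments are my own, so adjust the paths to your setup.

```python
"""Copy the WordNet Similarity web front end and its data into one folder."""
import shutil
from pathlib import Path


def deploy(similarity_src, vectors_file, target):
    """Copy web/cgi-bin's contents, web/doc, vectors.dat and stoplist.txt.

    similarity_src: the unpacked WordNet Similarity folder
    vectors_file:   the vectors.dat built with wordVectors.pl
    target:         a folder your web server can reach (assumed layout)
    """
    target.mkdir(parents=True, exist_ok=True)
    web = similarity_src / "web"
    # Scripts and config (similarity.cgi, wps.cgi, similarity_server.pl, ...).
    # copytree copies files with copy2, so execute bits are preserved.
    shutil.copytree(web / "cgi-bin", target, dirs_exist_ok=True)
    if (web / "doc").is_dir():
        shutil.copytree(web / "doc", target / "doc", dirs_exist_ok=True)
    # Data files the server loads at startup.
    shutil.copy2(vectors_file, target / "vectors.dat")
    shutil.copy2(similarity_src / "samples" / "stoplist.txt", target / "stoplist.txt")


if __name__ == "__main__":
    # Example paths only; change them to match your machine.
    src = Path("~/Downloads/packages/WordNet-Similarity-2.05").expanduser()
    if src.is_dir():
        deploy(src, src / "utils" / "vectors.dat", Path("~/Sites/wn").expanduser())
```

dirs_exist_ok needs Python 3.8 or later; it lets you rerun the script after tweaking files in the source folder.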
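Step 9 is the same one-line edit in three files, so a small Python helper makes it repeatable. It only touches a script whose first line is exactly #!/usr/local/bin/perl -wT and leaves everything else alone. Check where your Perl actually lives with which perl before trusting NEW_SHEBANG.

```python
"""Swap the /usr/local/bin/perl shebang for /usr/bin/perl in the web scripts."""
import sys
from pathlib import Path

OLD_SHEBANG = "#!/usr/local/bin/perl -wT"
NEW_SHEBANG = "#!/usr/bin/perl -wT"
SCRIPTS = ("similarity.cgi", "wps.cgi", "similarity_server.pl")


def fix_shebang(script):
    """Replace line 1 if it is OLD_SHEBANG; return True when the file changed."""
    text = script.read_text()
    first, newline, rest = text.partition("\n")
    if first.strip() != OLD_SHEBANG:
        return False
    # Writing into the existing file keeps its permissions (e.g. the execute bit).
    script.write_text(NEW_SHEBANG + newline + rest)
    return True


if __name__ == "__main__":
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    for name in SCRIPTS:
        path = folder / name
        if path.exists():
            print(name, "updated" if fix_shebang(path) else "left as-is")
```

Running it twice is harmless: the second pass finds the new shebang and reports "left as-is".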

This entry was posted on Sunday, July 3rd, 2011 at 7:10 pm, EST under the category of Coding.