Sunday, December 20, 2015

Easy installation of Gensim/word2vec in Python


1. Install Anaconda

Go to https://www.continuum.io/downloads, download the installer, and run it. I tried both the Windows 64-bit and the Linux 64-bit versions.

Note that easy_install (from https://pypi.python.org/pypi/setuptools) is already included.

2. Install gensim
This is mainly based on https://radimrehurek.com/gensim/install.html, simplified.
To install gensim, type
easy_install --upgrade gensim
in the Anaconda Prompt on Windows, or in a terminal on Ubuntu.
Another easy way to install gensim is to type the following in the Anaconda Prompt:
conda install gensim

I tried pip and other methods for installing gensim but ran into problems (see below), so the methods above are recommended.

To check the installed packages, type "conda list" and make sure gensim is listed.
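As an alternative to "conda list", a short Python script can report the installed versions of gensim and two packages it builds on (numpy and scipy). This is a sketch assuming Python 3.8 or later, where importlib.metadata is available; the package list and the helper name installed_versions are only examples.

```python
# Sketch: report installed versions, similar to checking "conda list".
# Assumes Python 3.8+ (importlib.metadata); the package list is illustrative.
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["gensim", "numpy", "scipy"]).items():
        print(f"{name:10s} {ver or 'NOT INSTALLED'}")
```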

Other ways to install Python and gensim may be more complicated, possibly because they need a C compiler or the BLAS/LAPACK libraries.
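To see whether a C compiler is already available, a sketch like the following searches PATH for a few common compiler executables. The names gcc, cc and cl (the MSVC compiler) are assumptions about what your toolchain installs, and cl is usually only on PATH inside a Visual Studio developer prompt. find_compilers is a hypothetical helper, not part of gensim.

```python
# Sketch: look for common C compilers on PATH.
# The executable names are assumptions: gcc (MinGW or Linux), cc (Unix),
# cl (MSVC, usually only on PATH inside a Visual Studio developer prompt).
import shutil

CANDIDATE_COMPILERS = ("gcc", "cc", "cl")


def find_compilers(names=CANDIDATE_COMPILERS):
    """Return {name: full path or None} for each candidate compiler."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    found = {n: p for n, p in find_compilers().items() if p}
    if found:
        for name, path in found.items():
            print(f"found {name}: {path}")
    else:
        print("no C compiler found on PATH")
```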

3. Open Spyder to test.

Type "from gensim.models.word2vec import Word2Vec" in the IPython Console in the lower left corner. If no error is generated, you are ready for gensim and word2vec.
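The same import test can also be run as a script that reports success or failure instead of stopping with an error. This is a minimal sketch; can_import is a hypothetical helper name, and the module and class names are the ones from the import line above.

```python
# Sketch: the import test as a script that reports instead of raising.
import importlib


def can_import(module_name, attr_name=None):
    """Return True if module_name imports (and has attr_name, if given)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers a missing parent package
        return False
    return attr_name is None or hasattr(module, attr_name)


if __name__ == "__main__":
    ok = can_import("gensim.models.word2vec", "Word2Vec")
    print("gensim/word2vec ready" if ok else "gensim/word2vec NOT available")
```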


If an older gensim version is needed (e.g., because of the recent gensim change from LabeledSentence to TaggedDocument), you may want to revert to an old version:
pip uninstall gensim

pip install gensim==0.10.3
or
pip install gensim-0.10.3.tar.gz  # you need to download the package first
conda install gensim-0.10.3.tar.gz
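Before downgrading, you may want to check whether the installed version is actually newer than the one you need. A sketch, assuming plain dotted version numbers: comparing them as strings is wrong ("0.10.3" sorts before "0.9.1"), so the helpers below (hypothetical names) compare integer tuples and simply drop suffixes such as "rc1".

```python
# Sketch: compare dotted version numbers as integer tuples.
# Simplifying assumption: non-numeric suffixes like "rc1" are ignored.
import re


def version_tuple(text):
    """'0.12.4' -> (0, 12, 4); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_downgrade(installed, pinned="0.10.3"):
    """True if the installed version is newer than the pinned one."""
    return version_tuple(installed) > version_tuple(pinned)
```

For example, needs_downgrade("0.12.4") is True and needs_downgrade("0.10.3") is False. On Python 3.8 or later, the installed version string can be read with importlib.metadata.version("gensim").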

4. Gensim fast version
In Spyder, you can check whether the fast (compiled) version of gensim is available. The fast version can give a 70x speedup, but it needs a C compiler.
Type
import gensim
gensim.models.word2vec.FAST_VERSION
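Outside the IPython console, the same check can be wrapped in a small script. This sketch assumes the gensim convention that FAST_VERSION is -1 when the compiled routines could not be loaded (slow NumPy fallback) and 0 or greater when they were; fast_version and describe are hypothetical helper names.

```python
# Sketch: read gensim's FAST_VERSION without crashing if gensim is missing.
# Assumption: -1 means the slow fallback; any value >= 0 means compiled code.
def fast_version():
    """Return gensim.models.word2vec.FAST_VERSION, or None if unavailable."""
    try:
        from gensim.models import word2vec
    except ImportError:
        return None
    return getattr(word2vec, "FAST_VERSION", None)


def describe(flag):
    """Turn the flag into a short human-readable verdict."""
    if flag is None:
        return "gensim (or its FAST_VERSION flag) is not available"
    if flag < 0:
        return "slow version: install a C compiler and reinstall gensim"
    return f"fast version available (FAST_VERSION = {flag})"


if __name__ == "__main__":
    print(describe(fast_version()))
```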
If you get 1 (more generally, any value other than -1), you have it. Otherwise, install a C compiler: MinGW or MSVC on Windows (for MSVC, select Visual C++ after installing Visual Studio 2015 Community), or gcc-dev on Ubuntu. MinGW's directory must be added to the system or user PATH; likewise for MSVC.
Then reinstall gensim: run "conda uninstall gensim" (or "pip uninstall gensim"), then "conda install gensim" (or "pip install gensim"). After this, try
import gensim
gensim.models.word2vec.FAST_VERSION
and check whether you get 1. If not, I found that it can help to add the following:
[blas]
library_dirs = C:\BLAS
blas_libs = libblas
[lapack]
library_dirs = C:\BLAS
lapack_libs = liblapack
OR
[blas]
library_dirs = C:\BLAS
blas_libs = libblas3
[lapack]
library_dirs = C:\BLAS
lapack_libs = liblapack3
depending on which BLAS/LAPACK files you have, to the file
C:\Users\A\Anaconda3\Lib\site-packages\numpy\distutils\site.cfg
Then try again, and you should get 1.
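If you need to repeat this on several machines, the edit can be scripted. The sketch below picks libblas/liblapack or libblas3/liblapack3 depending on whether any libblas3* file exists in the BLAS folder, then merges the two sections into site.cfg with configparser, keeping other sections. The C:\BLAS default, the file-name test and the helper names are assumptions based on the setup above; configparser does not preserve comments, so back up site.cfg first.

```python
# Sketch: write the [blas]/[lapack] sections into numpy's site.cfg.
# Assumptions: libraries live in one folder (default C:\BLAS) and the
# "3" variants are recognised by file names starting with "libblas3".
import configparser
from pathlib import Path


def choose_suffix(file_names):
    """Return "3" if any libblas3* file is present, else ""."""
    return "3" if any(n.startswith("libblas3") for n in file_names) else ""


def add_blas_sections(config, blas_dir, suffix):
    """Set (overwrite) the [blas] and [lapack] sections on a ConfigParser."""
    config["blas"] = {"library_dirs": blas_dir, "blas_libs": "libblas" + suffix}
    config["lapack"] = {"library_dirs": blas_dir, "lapack_libs": "liblapack" + suffix}
    return config


def update_site_cfg(cfg_path, blas_dir=r"C:\BLAS"):
    """Merge the two sections into cfg_path, keeping its other sections."""
    suffix = choose_suffix(p.name for p in Path(blas_dir).iterdir())
    config = configparser.ConfigParser(interpolation=None)
    config.read(cfg_path)  # a missing file is treated as empty
    add_blas_sections(config, blas_dir, suffix)
    with open(cfg_path, "w") as fh:
        config.write(fh)
```

For example: update_site_cfg(r"C:\Users\A\Anaconda3\Lib\site-packages\numpy\distutils\site.cfg"), adjusting the user name and Anaconda folder to your own.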
