The easiest way to get started with LexVec is to download the binary release. We only distribute amd64 binaries for Linux.
If you are using Windows, OS X, 32-bit Linux, or any other OS, follow the instructions below to build from source.
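The choice between the binary release and a source build can be sketched as a small shell check. The function name `lexvec_install_method` is hypothetical and not part of LexVec; the sketch also assumes that `uname -m` reports amd64 as `x86_64`, which is the case on Linux.

```shell
# Decide between the prebuilt binary and a source build.
# lexvec_install_method is a hypothetical helper, not shipped with LexVec.
# Arguments default to the current machine as reported by uname.
lexvec_install_method() {
  os=${1:-$(uname -s)}
  arch=${2:-$(uname -m)}
  # Only Linux on amd64 (x86_64 in uname terms) has a binary release.
  if [ "$os" = "Linux" ] && [ "$arch" = "x86_64" ]; then
    echo "binary"
  else
    echo "source"
  fi
}
```

Calling `lexvec_install_method` without arguments inspects the current machine; passing an OS and architecture explicitly lets you check another target.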
Building from source
- Install the Go compiler
- Make sure your $GOPATH is set
- Execute the following commands in your terminal:
go get github.com/alexandres/lexvec
cd $GOPATH/src/github.com/alexandres/lexvec
go build
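Before running the build commands, it may help to confirm that a Go toolchain is available and to see where the source will be checked out. The following is a minimal sketch: `have_go`, `lexvec_src_dir`, and the `GO_BIN` override are hypothetical names, and the sketch assumes a single-entry GOPATH that falls back to `$HOME/go` when unset (the default since Go 1.8).

```shell
# Report whether a Go toolchain is on PATH.
# GO_BIN is a hypothetical override used only by this sketch.
have_go() {
  command -v "${GO_BIN:-go}" >/dev/null 2>&1
}

# Print the directory that `go get` checks the LexVec source out into.
# Assumption: GOPATH holds a single entry; if it is unset, $HOME/go is used.
lexvec_src_dir() {
  printf '%s\n' "${GOPATH:-$HOME/go}/src/github.com/alexandres/lexvec"
}
```

For example, `have_go && cd "$(lexvec_src_dir)"` changes into the checkout only when Go is installed.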
In-memory (default, faster)
To get started, run
which trains a model using the small text8 corpus (100MB from Wikipedia).
Basic usage of LexVec is:
$ ./lexvec -corpus somecorpus -output someoutputdirectory/vectors
For a full list of options, run:
$ ./lexvec -h
Additionally, we provide a script that implements exactly the same interface as the word2vec package, should you want to test LexVec using existing scripts.
By default, LexVec stores the sparse matrix being factorized in memory. This can be a problem if your training corpus is large and your system memory is limited. We suggest you first try the in-memory implementation. If you run into out-of-memory errors, try this External Memory approximation:
env OUTPUTDIR=output ./external_memory_lexvec.sh -corpus somecorpus -dim 300 ...exactsameoptionsasinmemory
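The advice to try the in-memory implementation first and switch to External Memory only on failure can be sketched as a wrapper function. This is an illustration under stated assumptions, not part of LexVec: `train_with_fallback`, `LEXVEC_BIN`, and `LEXVEC_EM_SCRIPT` are hypothetical names, and the sketch treats any non-zero exit status of the in-memory run as a reason to retry, since how LexVec signals an out-of-memory condition is not stated here.

```shell
# Train in memory first; fall back to the external-memory script on failure.
# LEXVEC_BIN and LEXVEC_EM_SCRIPT are hypothetical overrides for this sketch.
train_with_fallback() {
  corpus=$1
  outdir=$2
  shift 2   # remaining arguments are passed unchanged to both implementations

  if "${LEXVEC_BIN:-./lexvec}" -corpus "$corpus" -output "$outdir/vectors" "$@"; then
    echo "in-memory"
    return 0
  fi

  # Assumption: any non-zero exit above is treated as an out-of-memory failure.
  if env OUTPUTDIR="$outdir" "${LEXVEC_EM_SCRIPT:-./external_memory_lexvec.sh}" -corpus "$corpus" "$@"; then
    echo "external-memory"
    return 0
  fi
  return 1
}
```

A call such as `train_with_fallback somecorpus output -dim 300` prints which implementation succeeded as its last line of output. Where the external-memory script places its results inside `OUTPUTDIR` is not specified above, so check that directory after a fallback run.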
Salle, A., Idiart, M., & Villavicencio, A. (2016). Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations. arXiv preprint arXiv:1606.00819.
Salle, A., Idiart, M., & Villavicencio, A. (2016). Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory. arXiv preprint arXiv:1606.01283.
Copyright (c) 2016 Salle, Alexandre firstname.lastname@example.org. All work in this package is distributed under the MIT License.