This is based purely on https://districtdatalabs.silvrback.com/modern-methods-for-sentiment-analysis#disqus_thread and the comments on that page. A few small changes were needed, so I captured the updates here.
Download 7z if you don't have it yet from http://www.7-zip.org/download.html . Download the pre-trained Google News word2vec model from https://code.google.com/p/word2vec/ and extract it with 7z.
As suggested on the original webpage, go to http://www.enchantedlearning.com/wordlist/ and collect words for food, sports, and weather, and put the words in food_words.txt, sports_words.txt, and weather_words.txt.
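The code below reads these files with readlines(), so each file should contain one word per line; for example, food_words.txt might start like this (my own illustration):

apple
bacon
bread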
2. Test 1
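The original Test 1 loads the pre-trained Google News model, and the later code assumes a variable named model. A minimal sketch, assuming the archive was extracted to GoogleNews-vectors-negative300.bin and the load_word2vec_format API of these older gensim versions:

from gensim.models.word2vec import Word2Vec
# loading the 300-dimensional Google News vectors takes a few minutes and several GB of RAM
model = Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
# quick sanity check: nearest neighbors of a common word
print(model.most_similar('good'))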
3. Test 2 (a continuation of Test 1)
import numpy as np
with open('food_words.txt', 'r') as infile:
    food_words = infile.readlines()
with open('sports_words.txt', 'r') as infile:
    sports_words = infile.readlines()
with open('weather_words.txt', 'r') as infile:
    weather_words = infile.readlines()
def getWordVecs(words):
    vecs = []
    for word in words:
        word = word.replace('\n', '')
        try:
            vecs.append(model[word].reshape((1,300)))
        except KeyError:
            continue
    vecs = np.concatenate(vecs)
    return np.array(vecs, dtype='float') #TSNE expects float type values
food_vecs = getWordVecs(food_words)
sports_vecs = getWordVecs(sports_words)
weather_vecs = getWordVecs(weather_words)
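As a quick sanity check (my addition), each array should have shape (number of words found in the model's vocabulary, 300):

print(food_vecs.shape, sports_vecs.shape, weather_vecs.shape)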
If you run into errors reading the text files (which I encountered on some systems but not others), add an explicit encoding:
import numpy as np
with open('food_words.txt', 'r', encoding='utf8') as infile:
    food_words = infile.readlines()
with open('sports_words.txt', 'r', encoding='utf8') as infile:
    sports_words = infile.readlines()
with open('weather_words.txt', 'r', encoding='utf8') as infile:
    weather_words = infile.readlines()
Then:
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
ts = TSNE(2)
reduced_vecs = ts.fit_transform(np.concatenate((food_vecs, sports_vecs, weather_vecs)))
#color points by word group to see if Word2Vec can separate them
for i in range(len(reduced_vecs)):
    if i < len(food_vecs):
        #food words colored blue
        color = 'b'
    elif i >= len(food_vecs) and i < (len(food_vecs) + len(sports_vecs)):
        #sports words colored red
        color = 'r'
    else:
        #weather words colored green
        color = 'g'
    plt.plot(reduced_vecs[i,0], reduced_vecs[i,1], marker='o', color=color, markersize=8)
plt.show()
You should then see a plot with three clusters of colored dots.
4. Test 3
This is modified from the original Twitter-data-based test. However, since we don't have the Twitter data, we substitute the pos.txt and neg.txt from the IMDB review data, so this is just for the sake of testing the code.
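If you need to build pos.txt, neg.txt (and unsup.txt for Test 4) yourself, here is a hypothetical sketch assuming the Stanford aclImdb directory layout, in which each review is a separate .txt file; it writes one review per line so that readlines() below works:

import glob, os

def combine(folder, outfile):
    with open(outfile, 'w', encoding='utf8') as out:
        for path in sorted(glob.glob(os.path.join(folder, '*.txt'))):
            with open(path, 'r', encoding='utf8') as f:
                # strip embedded newlines so each review stays on one line
                out.write(f.read().replace('\n', ' ') + '\n')

combine('aclImdb/train/pos', 'pos.txt')
combine('aclImdb/train/neg', 'neg.txt')
combine('aclImdb/train/unsup', 'unsup.txt')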
from sklearn.cross_validation import train_test_split
from gensim.models.word2vec import Word2Vec
with open('pos.txt', 'r', encoding='utf8') as infile:
    pos_tweets = infile.readlines()
with open('neg.txt', 'r', encoding='utf8') as infile:
    neg_tweets = infile.readlines()
#use 1 for positive sentiment, 0 for negative
y = np.concatenate((np.ones(len(pos_tweets)), np.zeros(len(neg_tweets))))
x_train, x_test, y_train, y_test = train_test_split(np.concatenate((pos_tweets, neg_tweets)), y, test_size=0.2)
#Do some very minor text preprocessing
def cleanText(corpus):
    corpus = [z.lower().replace('\n','').split() for z in corpus]
    return corpus
x_train = cleanText(x_train)
x_test = cleanText(x_test)
n_dim = 300
#Initialize model and build vocab
imdb_w2v = Word2Vec(size=n_dim, min_count=10)
imdb_w2v.build_vocab(x_train)
#Train the model over train_reviews (this may take several minutes)
imdb_w2v.train(x_train)
I got an output of 8684307, presumably the number of words processed during training.
#Build word vector for training set by using the average value of all word vectors in the tweet, then scale
def buildWordVector(text, size):
    vec = np.zeros(size).reshape((1, size))
    count = 0.
    for word in text:
        try:
            vec += imdb_w2v[word].reshape((1, size))
            count += 1.
        except KeyError:
            continue
    if count != 0:
        vec /= count
    return vec
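A quick check of the averaging (my own example): each review should produce a single 300-dimensional row vector.

print(buildWordVector(x_train[0], n_dim).shape)   # (1, 300)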
from sklearn.preprocessing import scale
train_vecs = np.concatenate([buildWordVector(z, n_dim) for z in x_train])
train_vecs = scale(train_vecs)
#Train word2vec on test tweets
imdb_w2v.train(x_test)
I got:
WARNING:gensim.models.word2vec:supplied example count (10000) did not equal expected count (40000)
Out[11]: 2172554
The warning is expected here: the model's expected example count was set over the 40000 training reviews, while train() is now being called on the 10000 test reviews.
#Build test tweet vectors then scale
test_vecs = np.concatenate([buildWordVector(z, n_dim) for z in x_test])
test_vecs = scale(test_vecs)
#Use classification algorithm (i.e. Stochastic Logistic Regression) on training set, then assess model performance on test set
from sklearn.linear_model import SGDClassifier
lr = SGDClassifier(loss='log', penalty='l1')
lr.fit(train_vecs, y_train)
print( 'Test Accuracy: %.2f'%lr.score(test_vecs, y_test))
I got
Test Accuracy: 0.72
Note that I needed to add parentheses to the print statement (Python 3) for the last line to run correctly.
#Create ROC curve
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
pred_probas = lr.predict_proba(test_vecs)[:,1]
fpr,tpr,_ = roc_curve(y_test, pred_probas)
roc_auc = auc(fpr,tpr)
plt.plot(fpr,tpr,label='area = %.2f' %roc_auc)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.legend(loc='lower right')
plt.show()
5. Test 4
import gensim
LabeledSentence = gensim.models.doc2vec.LabeledSentence
from sklearn.cross_validation import train_test_split
import numpy as np
with open('pos.txt','r') as infile:
    pos_reviews = infile.readlines()
with open('neg.txt','r') as infile:
    neg_reviews = infile.readlines()
with open('unsup.txt','r') as infile:
    unsup_reviews = infile.readlines()
#use 1 for positive sentiment, 0 for negative
y = np.concatenate((np.ones(len(pos_reviews)), np.zeros(len(neg_reviews))))
x_train, x_test, y_train, y_test = train_test_split(np.concatenate((pos_reviews, neg_reviews)), y, test_size=0.2)
#Do some very minor text preprocessing
def cleanText(corpus):
    punctuation = """.,?!:;(){}[]"""
    corpus = [z.lower().replace('\n','') for z in corpus]
    corpus = [z.replace('<br />', ' ') for z in corpus]
    #treat punctuation as individual words
    for c in punctuation:
        corpus = [z.replace(c, ' %s '%c) for z in corpus]
    corpus = [z.split() for z in corpus]
    return corpus
x_train = cleanText(x_train)
x_test = cleanText(x_test)
unsup_reviews = cleanText(unsup_reviews)
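To see what cleanText does, a quick example of my own: punctuation becomes separate tokens and <br /> tags disappear.

print(cleanText(['Great movie!<br />Loved it.\n']))
# [['great', 'movie', '!', 'loved', 'it', '.']]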
#Gensim's Doc2Vec implementation requires each document/paragraph to have a label associated with it.
#We do this by using the LabeledSentence method. The format will be "TRAIN_i" or "TEST_i" where "i" is
#a dummy index of the review.
def labelizeReviews(reviews, label_type):
    labelized = []
    for i,v in enumerate(reviews):
        label = '%s_%s'%(label_type,i)
        labelized.append(LabeledSentence(v, [label]))
    return labelized
x_train = labelizeReviews(x_train, 'TRAIN')
x_test = labelizeReviews(x_test, 'TEST')
unsup_reviews = labelizeReviews(unsup_reviews, 'UNSUP')
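To see the structure of a labeled review (my own check), note that the attribute name depends on the gensim version, which matters later in this test:

# older gensim (e.g. 0.10.3):
print(x_train[0].words[:5], x_train[0].labels)   # first five tokens, ['TRAIN_0']
# newer gensim (e.g. 0.12.3) renamed labels to tags:
# print(x_train[0].words[:5], x_train[0].tags)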
import random
size = 400
#instantiate our DM and DBOW models
model_dm = gensim.models.Doc2Vec(min_count=1, window=10, size=size, sample=1e-3, negative=5, workers=3)
model_dbow = gensim.models.Doc2Vec(min_count=1, window=10, size=size, sample=1e-3, negative=5, dm=0, workers=3)
#build vocab over all reviews
model_dm.build_vocab(np.concatenate((x_train, x_test, unsup_reviews)))
#you may run into an error here: "Python int too large to convert to C long." If this occurs, change the hashfxn in the Word2Vec constructor __init__ from
self.cbow_mean = int(cbow_mean)
self.hashfxn = hashfxn
self.iter = iter
to
self.cbow_mean = int(cbow_mean)
#self.hashfxn = hashfxn
def hash32(value):
    return hash(value) & 0xffffffff
self.hashfxn = hash32
self.iter = iter
This fix is from https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/11197/gensim-word2vec-cython-on-windows/93787 .
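Instead of patching the library source, the same workaround can, I believe, be passed in from user code, since the Word2Vec/Doc2Vec constructors in these gensim versions accept a hashfxn argument; a sketch:

def hash32(value):
    return hash(value) & 0xffffffff

model_dm = gensim.models.Doc2Vec(min_count=1, window=10, size=size, sample=1e-3, negative=5, workers=3, hashfxn=hash32)
model_dbow = gensim.models.Doc2Vec(min_count=1, window=10, size=size, sample=1e-3, negative=5, dm=0, workers=3, hashfxn=hash32)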
#build the vocab for both models over all reviews, this time as a plain list
X = x_train + x_test + unsup_reviews
model_dm.build_vocab(X)
model_dbow.build_vocab(X)
On one system (Ubuntu 12.04), I hit an error running
#Get training set vectors from our models
def getVecs(model, corpus, size):
    vecs = [np.array(model[z.labels[0]]).reshape((1, size)) for z in corpus]
    return np.concatenate(vecs)
train_vecs_dm = getVecs(model_dm, x_train, size)
The error was: AttributeError: 'LabeledSentence' object has no attribute 'labels'
Checking the other system, on which it worked: there, x_train[0].labels = ['TRAIN_0'], which is why it worked. But on this system the object has tags=['TRAIN_0'] instead (newer gensim renamed labels to tags). So I changed to
#Get training set vectors from our models
def getVecs(model, corpus, size):
    vecs = [np.array(model[z.tags[0]]).reshape((1, size)) for z in corpus]
    return np.concatenate(vecs)
train_vecs_dm = getVecs(model_dm, x_train, size)
However, this generated another error:
model_dm[x_train[0]]
File "/home/anaconda3/lib/python3.5/site-packages/gensim-0.12.3-py3.5-linux-x86_64.egg/gensim/models/word2vec.py", line 1293, in <listcomp>
return vstack([self.syn0[self.vocab[word].index] for word in words])
Reverting to gensim 0.10.3 seems to resolve this problem temporarily.
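For example, one way to pin the version (assuming pip manages your gensim install):

pip install gensim==0.10.3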
#We pass through the data set multiple times, shuffling the training reviews each time to improve accuracy.
all_train_reviews = np.concatenate((x_train, unsup_reviews))
#if this is too slow, you may need to change it to range(1) or even range(0), but accuracy would be reduced
for epoch in range(10):
    perm = np.random.permutation(all_train_reviews.shape[0])
    model_dm.train(all_train_reviews[perm])
    model_dbow.train(all_train_reviews[perm])
#Get training set vectors from our models
def getVecs(model, corpus, size):
    vecs = [np.array(model[z.labels[0]]).reshape((1, size)) for z in corpus]
    return np.concatenate(vecs)
train_vecs_dm = getVecs(model_dm, x_train, size)
train_vecs_dbow = getVecs(model_dbow, x_train, size)
train_vecs = np.hstack((train_vecs_dm, train_vecs_dbow))
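As a sanity check (my addition), the combined vectors should be 800-dimensional, 400 from the DM model plus 400 from the DBOW model:

print(train_vecs.shape)   # (number of training reviews, 800)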
#train over test set
x_test = np.array(x_test)
for epoch in range(10):
    perm = np.random.permutation(x_test.shape[0])
    model_dm.train(x_test[perm])
    model_dbow.train(x_test[perm])
#Construct vectors for test reviews
test_vecs_dm = getVecs(model_dm, x_test, size)
test_vecs_dbow = getVecs(model_dbow, x_test, size)
test_vecs = np.hstack((test_vecs_dm, test_vecs_dbow))
from sklearn.linear_model import SGDClassifier
lr = SGDClassifier(loss='log', penalty='l1')
lr.fit(train_vecs, y_train)
print( 'Test Accuracy: %.2f'%lr.score(test_vecs, y_test))
#Create ROC curve
from sklearn.metrics import roc_curve, auc
%matplotlib inline
import matplotlib.pyplot as plt
pred_probas = lr.predict_proba(test_vecs)[:,1]
fpr,tpr,_ = roc_curve(y_test, pred_probas)
roc_auc = auc(fpr,tpr)
plt.plot(fpr,tpr,label='area = %.2f' %roc_auc)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.legend(loc='lower right')
plt.show()