Sunday, January 10, 2016

NLP cheatsheet

POS tagging: (https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html)


Number  Tag    Description
1.      CC     Coordinating conjunction
2.      CD     Cardinal number
3.      DT     Determiner
4.      EX     Existential there
5.      FW     Foreign word
6.      IN     Preposition or subordinating conjunction
7.      JJ     Adjective
8.      JJR    Adjective, comparative
9.      JJS    Adjective, superlative
10.     LS     List item marker
11.     MD     Modal
12.     NN     Noun, singular or mass
13.     NNS    Noun, plural
14.     NNP    Proper noun, singular
15.     NNPS   Proper noun, plural
16.     PDT    Predeterminer
17.     POS    Possessive ending
18.     PRP    Personal pronoun
19.     PRP$   Possessive pronoun
20.     RB     Adverb
21.     RBR    Adverb, comparative
22.     RBS    Adverb, superlative
23.     RP     Particle
24.     SYM    Symbol
25.     TO     to
26.     UH     Interjection
27.     VB     Verb, base form
28.     VBD    Verb, past tense
29.     VBG    Verb, gerund or present participle
30.     VBN    Verb, past participle
31.     VBP    Verb, non-3rd person singular present
32.     VBZ    Verb, 3rd person singular present
33.     WDT    Wh-determiner
34.     WP     Wh-pronoun
35.     WP$    Possessive wh-pronoun
36.     WRB    Wh-adverb

(see also Table 1.1  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.9.8216&rep=rep1&type=pdf)
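
As a quick illustration of how these tags are attached to words, here is a minimal Python sketch. The sentence and its tags are assigned by hand for illustration (no tagger is run), and `PENN_TAGS` and `describe` are hypothetical names that cover only a few rows of the table.

```python
# A few rows of the Penn Treebank tag table above, as a lookup dict.
PENN_TAGS = {
    "DT": "Determiner",
    "JJ": "Adjective",
    "NN": "Noun, singular or mass",
    "VBZ": "Verb, 3rd person singular present",
    "IN": "Preposition or subordinating conjunction",
}

# Hand-tagged sentence (an assumption for illustration, not tagger output).
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ"),
    ("over", "IN"), ("the", "DT"), ("dog", "NN"),
]


def describe(tagged_tokens):
    """Attach the tag description to each (word, tag) pair."""
    return [(word, tag, PENN_TAGS.get(tag, "unknown tag"))
            for word, tag in tagged_tokens]


for word, tag, description in describe(tagged):
    print(f"{word:<7}{tag:<5}{description}")
```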

Chunking: (highlighted are more commonly used)

ADJP: Adjective phrase 
ADVP: Adverb phrase 
NP: Noun phrase 
PP: Prepositional phrase 
S: Simple declarative clause 
SBAR: Subordinate clause 
SBARQ: Direct question introduced by wh-element 
SINV: Declarative sentence with subject-aux inversion 
SQ: Yes/no questions and subconstituent of SBARQ excluding wh-element 
VP: Verb phrase 
WHADVP: Wh-adverb phrase 
WHNP: Wh-noun phrase 
WHPP: Wh-prepositional phrase 
X: Constituent of unknown or uncertain category 
*: “Understood” subject of infinitive or imperative 
0: Zero variant of that in subordinate clauses 
T: Trace of wh-Constituent 

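IOB (or IOB2) format: each token is tagged as inside, outside, or beginning of a chunk. The IOBES variant adds E and S.
B: beginning
I: inside
O: outside, not in any chunk (punctuation, "and", etc.)
E: ending (IOBES only)
S: standalone, a single-token chunk (IOBES only)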
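
To make the tagging scheme concrete, here is a minimal Python sketch that groups tokens back into chunks from IOB2 tags. It assumes tags of the form `B-NP`, `I-NP`, and `O`; the function name `iob2_to_chunks` and the hand-tagged sentence are illustrative, not part of any library.

```python
def iob2_to_chunks(tokens, tags):
    """Group tokens into (chunk_type, text) pairs from IOB2 tags."""
    chunks = []
    current_type, current_words = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")  # "O" gives ("O", "", "")
        # A chunk starts at B, or at an I whose type differs from the open chunk.
        starts_new = prefix == "B" or (prefix == "I" and label != current_type)
        if prefix == "O" or starts_new:
            if current_words:  # close the chunk that was open
                chunks.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        if prefix in ("B", "I"):
            if not current_words:
                current_type = label
            current_words.append(token)
    if current_words:  # close a chunk that runs to the end of the sentence
        chunks.append((current_type, " ".join(current_words)))
    return chunks


# Hand-tagged example (an assumption for illustration).
tokens = ["He", "reckons", "the", "current", "account", "deficit", "will", "narrow", "."]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP", "O"]
print(iob2_to_chunks(tokens, tags))
```

Tokens tagged O, such as the final period, end any open chunk and belong to none.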


NER

PER: person
ORG: organization
LOC: location
MISC: miscellaneous
Sometimes also includes times and quantities (ordinals and numbers)

B: beginning 
I: inside
E: ending
S: single (an entity consisting of one token)
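
The B/I/E/S prefixes above can be derived from plain IOB2 tags by looking one token ahead. The following is a minimal sketch, assuming well-formed IOB2 input such as `B-PER`, `I-PER`, `O`; the function name `iob2_to_iobes` and the example tag sequence are hypothetical.

```python
def iob2_to_iobes(tags):
    """Convert IOB2 tags to IOBES by checking whether the entity continues."""
    iobes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            iobes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"  # same entity goes on
        if prefix == "B":
            # A one-token entity becomes S, otherwise it stays B.
            iobes.append(f"B-{label}" if continues else f"S-{label}")
        else:
            # The last token of a multi-token entity becomes E.
            iobes.append(f"I-{label}" if continues else f"E-{label}")
    return iobes


# Hand-written tag sequence (an assumption for illustration).
ner_tags = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-LOC", "I-LOC", "I-LOC"]
print(iob2_to_iobes(ner_tags))
```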

Semantic Role Labeling (http://www.cs.upc.edu/~srlconll/)

A semantic role in language is the relationship that a syntactic constituent has with a predicate. Typical semantic arguments include Agent, Patient, Instrument, etc., as well as adjunctive arguments indicating Locative, Temporal, Manner, Cause, etc. aspects. Recognizing and labeling semantic arguments is a key task for answering "Who", "When", "What", "Where", "Why", etc. questions in Information Extraction, Question Answering, Summarization, and, in general, in all NLP tasks in which some kind of semantic interpretation is needed.
The following sentence, taken from the PropBank corpus, exemplifies the annotation of semantic roles:
[A0 He ] [AM-MOD would ] [AM-NEG n't ] [V accept ] [A1 anything of value ] from [A2 those he was writing about ] .
Here, the roles for the predicate accept (that is, the roleset of the predicate) are defined in the PropBank Frames scheme as:
V: verb
A0: acceptor
A1: thing accepted
A2: accepted-from
A3: attribute
AM-MOD: modal
AM-NEG: negation
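
The bracketed annotation above is easy to read programmatically. Here is a minimal Python sketch that extracts (role, text) pairs with a regular expression; it assumes the flat `[ROLE words ]` layout of the example sentence (no nested brackets), and `parse_roles` is a hypothetical helper name.

```python
import re

ANNOTATED = ("[A0 He ] [AM-MOD would ] [AM-NEG n't ] [V accept ] "
             "[A1 anything of value ] from [A2 those he was writing about ] .")


def parse_roles(annotated):
    """Return (role, text) pairs for every [ROLE text ] span, in order."""
    pattern = re.compile(r"\[(\S+)\s+([^\]]*)\]")
    return [(m.group(1), m.group(2).strip()) for m in pattern.finditer(annotated)]


roles = parse_roles(ANNOTATED)
for role, text in roles:
    print(f"{role:<8}{text}")

# Answering "who accepts?" and "what is accepted?" from the labels.
by_role = dict(roles)
print(by_role["A0"], "/", by_role["A1"])
```

Words outside any bracket, such as "from" and the final period, carry no role and are skipped.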

Parsing 

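Parsing recovers the structure of a sentence, generally as a tree labeled with the tags above: POS tags sit directly over the words, and phrase tags such as NP and VP label the larger constituents.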
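
Parse trees are commonly written as bracketed strings. As a rough sketch, the Python below reads such a string into nested (label, children) tuples and recovers the words; the example tree is written by hand, and `parse_tree` and `leaves` are hypothetical helper names, not a library API.

```python
def parse_tree(text):
    """Read a bracketed tree like "(S (NP ...) (VP ...))" into nested tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "a subtree must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return read()


def leaves(tree):
    """Collect the words of a tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


# Hand-written example tree (an assumption for illustration).
tree = parse_tree("(S (NP (DT The) (NN dog)) (VP (VBZ barks)))")
print(tree[0], leaves(tree))
```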