Monday, January 24, 2011

NLTK corpus functions

From: http://blog.ynada.com/tag/nltk

fileids() The files of the corpus
fileids([categories]) The files of the corpus corresponding to these categories
categories() The categories of the corpus
categories([fileids]) The categories of the corpus corresponding to these files
raw() The raw content of the corpus
raw(fileids=[f1,f2,f3]) The raw content of the specified files
raw(categories=[c1,c2]) The raw content of the specified categories
words() The words of the whole corpus
words(fileids=[f1,f2,f3]) The words of the specified fileids
words(categories=[c1,c2]) The words of the specified categories
sents() The sentences of the whole corpus
sents(fileids=[f1,f2,f3]) The sentences of the specified fileids
sents(categories=[c1,c2]) The sentences of the specified categories
abspath(fileid) The location of the given file on disk
encoding(fileid) The encoding of the file (if known)
open(fileid) Open a stream for reading the given corpus file
root The path to the root of the locally installed corpus (an attribute, so no parentheses)
readme() The contents of the README file of the corpus
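
A minimal sketch of a few of these calls, assuming NLTK is installed. To avoid depending on a downloaded corpus, it builds a tiny throwaway corpus in a temporary directory and reads it with CategorizedPlaintextCorpusReader; the file names, categories, and text are made up for illustration. sents() is left out because it needs the Punkt tokenizer data to be downloaded first.

```python
import os
import tempfile

from nltk.corpus.reader import CategorizedPlaintextCorpusReader

# Build a tiny throwaway corpus on disk (hypothetical files and categories).
corpus_dir = tempfile.mkdtemp()
samples = {
    "news/a.txt": "Stocks rose today.",
    "news/b.txt": "Rain is expected tomorrow.",
    "fiction/c.txt": "The dragon slept.",
}
for relpath, text in samples.items():
    path = os.path.join(corpus_dir, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

# Each file's category is taken from its top-level directory name.
reader = CategorizedPlaintextCorpusReader(
    corpus_dir, r".*\.txt", cat_pattern=r"(\w+)/.*", encoding="utf-8"
)

all_files = reader.fileids()                              # every file
news_files = reader.fileids(categories=["news"])          # files in a category
all_categories = reader.categories()                      # every category
cats_of_c = reader.categories(fileids=["fiction/c.txt"])  # categories of a file
raw_c = reader.raw(fileids=["fiction/c.txt"])             # raw text of a file
news_words = list(reader.words(categories=["news"]))      # tokens of a category
path_of_c = str(reader.abspath("fiction/c.txt"))          # location on disk

print(all_files)
print(all_categories)
print(news_words)
```

The same calls work on the readers shipped in nltk.corpus (for example the Brown corpus), once the corresponding data has been fetched with nltk.download().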
