jxtract
Class Corpus

java.lang.Object
  extended by jxtract.Corpus

public class Corpus
extends java.lang.Object

This class represents one or more text files that make up a corpus.

Author:
Adam Goforth

Constructor Summary
Corpus(java.lang.String filename_)
          Constructor for the Corpus.
 
Method Summary
 boolean closeFile()
          Close the file if it was open.
 void countLines()
          Count all the lines in the corpus and print to System.out.
 java.util.Vector getFrequentWords(int minFrequency)
           
 java.util.Vector getSentencesWith(java.lang.String word_)
          This returns a Vector of Strings, where each String is a sentence in the Corpus that contains the specified word.
 java.util.Vector getSentencesWith(java.lang.String w1, java.lang.String w2, int distance)
          This returns a Vector of Strings, where each String is a sentence in the Corpus that contains the specified words, with w2 being distance words away from w1.
 boolean openFile()
          Open the file for reading.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Corpus

public Corpus(java.lang.String filename_)
Constructor for the Corpus. Currently only a single file is supported; support for a directory containing multiple files is intended but not yet available.

Parameters:
filename_ - The text file that contains the corpus.

Method Detail

openFile

public boolean openFile()
Open the file for reading.

Returns:
true if the open succeeded, false if the open failed.

closeFile

public boolean closeFile()
Close the file if it was open.

Returns:
true if the file was already closed or was closed by this call; false if the file is still open after the call.
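
The return values of openFile() and closeFile() can be modelled with a small stand-in class. This is only a sketch of the documented contract, not the jxtract implementation: the class name FileLifecycleSketch, the use of BufferedReader, and the choice to treat a second openFile() call as a success are all assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical stand-in that mirrors the documented return values only.
final class FileLifecycleSketch {
    private final String filename;
    private BufferedReader reader; // null while the file is closed

    FileLifecycleSketch(String filename) {
        this.filename = filename;
    }

    // true if the open succeeded, false if the open failed.
    boolean openFile() {
        if (reader != null) {
            return true; // already open (a choice made for this sketch)
        }
        try {
            reader = Files.newBufferedReader(Paths.get(filename));
            return true;
        } catch (IOException e) {
            reader = null;
            return false;
        }
    }

    // true if the file was already closed or this call closed it,
    // false if the file is still open afterwards.
    boolean closeFile() {
        if (reader == null) {
            return true; // nothing to close
        }
        try {
            reader.close();
            reader = null;
            return true;
        } catch (IOException e) {
            return false; // close failed, so the reader is kept
        }
    }

    public static void main(String[] args) {
        FileLifecycleSketch sketch = new FileLifecycleSketch("no-such-dir/no-such-file.txt");
        System.out.println(sketch.openFile());  // false: the file does not exist
        System.out.println(sketch.closeFile()); // true: it was never open
    }
}
```

Because closing an already closed file also returns true, a caller can call closeFile() unconditionally during cleanup and needs to act only on a false result.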

getSentencesWith

public java.util.Vector getSentencesWith(java.lang.String word_)
This returns a Vector of Strings, where each String is a sentence in the Corpus that contains the specified word.

Parameters:
word_ - The word that is searched for in the corpus.
Returns:
The Vector of Strings with the sentences.
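
As an illustration of the kind of lookup described above, the following sketch filters a list of sentences by a single word. It is not the jxtract code: the class SentenceFilterSketch, whitespace tokenisation, case-sensitive matching, and the use of List instead of Vector are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not the jxtract implementation.
final class SentenceFilterSketch {

    // Returns the sentences that contain `word` as a whole token.
    // Assumptions: sentences are already split, tokens are separated by
    // whitespace, and matching is case-sensitive.
    static List<String> sentencesWith(List<String> sentences, String word) {
        List<String> matches = new ArrayList<>();
        for (String sentence : sentences) {
            String[] tokens = sentence.trim().split("\\s+");
            if (Arrays.asList(tokens).contains(word)) {
                matches.add(sentence);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
                "the cat sat on the mat",
                "a dog barked",
                "the category is wrong");
        // "category" is not a whole-token match for "cat".
        System.out.println(sentencesWith(corpus, "cat")); // prints [the cat sat on the mat]
    }
}
```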

getSentencesWith

public java.util.Vector getSentencesWith(java.lang.String w1,
                                         java.lang.String w2,
                                         int distance)
This returns a Vector of Strings, where each String is a sentence in the Corpus that contains the specified words, with w2 being distance words away from w1.

Parameters:
w1 - The first word
w2 - The second word
distance - The distance between the two words, counted in words. Valid values are -5 to -1 and 1 to 5; 0 is not valid.
Returns:
A Vector of Strings with one matched sentence per string.
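
The distance parameter can be pictured with the sketch below. It is an assumption-laden illustration, not the jxtract implementation: the class DistanceMatchSketch is hypothetical, a positive distance is assumed to mean that w2 follows w1 (negative: w2 precedes w1), tokens are split on whitespace, and throwing IllegalArgumentException for an out-of-range distance is a choice made here because the documentation does not say what happens in that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not the jxtract implementation.
final class DistanceMatchSketch {

    // True if `distance` lies in the documented ranges -5..-1 or 1..5.
    static boolean isValidDistance(int distance) {
        return distance != 0 && Math.abs(distance) <= 5;
    }

    // True if some occurrence of w1 has w2 exactly `distance` tokens away.
    // Assumption: positive means w2 comes after w1, negative means before.
    static boolean matches(String sentence, String w1, String w2, int distance) {
        String[] tokens = sentence.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            int j = i + distance;
            if (tokens[i].equals(w1) && j >= 0 && j < tokens.length
                    && tokens[j].equals(w2)) {
                return true;
            }
        }
        return false;
    }

    // Collects every sentence in which the pair occurs at the given distance.
    static List<String> sentencesWith(List<String> sentences,
                                      String w1, String w2, int distance) {
        if (!isValidDistance(distance)) {
            // Sketch-only behaviour; the real method's reaction is not documented.
            throw new IllegalArgumentException("distance must be in -5..-1 or 1..5");
        }
        List<String> result = new ArrayList<>();
        for (String sentence : sentences) {
            if (matches(sentence, w1, w2, distance)) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
                "strong tea is served hot",
                "tea that is strong");
        System.out.println(sentencesWith(corpus, "strong", "tea", 1));  // prints [strong tea is served hot]
        System.out.println(sentencesWith(corpus, "strong", "tea", -3)); // prints [tea that is strong]
    }
}
```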

countLines

public void countLines()
Count all the lines in the corpus and print to System.out.


getFrequentWords

public java.util.Vector getFrequentWords(int minFrequency)
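
No description is given for getFrequentWords, so the following sketch is only a guess based on the method name and its minFrequency parameter: it returns every word that occurs at least minFrequency times. The class FrequentWordsSketch, the inclusive threshold, whitespace tokenisation, and the first-seen ordering of the result are all assumptions, and the actual method may behave differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not the jxtract implementation.
final class FrequentWordsSketch {

    // Returns the words that occur at least `minFrequency` times
    // (inclusive threshold is an assumption), in first-seen order.
    static List<String> frequentWords(List<String> sentences, int minFrequency) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sentence : sentences) {
            for (String token : sentence.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        List<String> frequent = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minFrequency) {
                frequent.add(entry.getKey());
            }
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("the cat saw the dog", "the dog ran");
        System.out.println(frequentWords(corpus, 2)); // prints [the, dog]
    }
}
```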