scone.util
Class DocumentParser

java.lang.Object
  extended by scone.util.DocumentParser

public class DocumentParser
extends java.lang.Object

Transforms tokens into database objects.
HtmlTokens that represent links are transformed into LinkToken objects. The following keys and values are added to the meta data:

"baseNode" - the NetNode
"htmlDocument" - the HtmlNode
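The link transformation described above can be sketched in plain Java. This is a minimal, self-contained illustration, not the Scone implementation: the HtmlToken and LinkToken classes below are hypothetical stand-ins for the real Scone types, a link is assumed to be an <a> tag with an href attribute, the meta data is assumed to be a simple String-keyed map, and plain strings stand in for the NetNode and HtmlNode objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LinkTransformSketch {
    // Hypothetical stand-in for an HtmlToken: a tag name plus its attributes.
    static class HtmlToken {
        final String tag;
        final Map<String, String> attributes;

        HtmlToken(String tag, Map<String, String> attributes) {
            this.tag = tag;
            this.attributes = attributes;
        }
    }

    // Hypothetical stand-in for a LinkToken that exposes the link target.
    static class LinkToken extends HtmlToken {
        LinkToken(HtmlToken source) {
            super(source.tag, source.attributes);
        }

        String href() {
            return attributes.get("href");
        }
    }

    // Replaces every <a href=...> token with a LinkToken and records the
    // "baseNode" and "htmlDocument" entries in the given meta data map.
    static List<HtmlToken> transform(List<HtmlToken> tokens, Map<String, Object> metaData,
                                     Object baseNode, Object htmlDocument) {
        metaData.put("baseNode", baseNode);
        metaData.put("htmlDocument", htmlDocument);
        List<HtmlToken> out = new ArrayList<>();
        for (HtmlToken t : tokens) {
            boolean isLink = "a".equals(t.tag) && t.attributes.containsKey("href");
            out.add(isLink ? new LinkToken(t) : t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<HtmlToken> tokens = List.of(
            new HtmlToken("p", Map.of()),
            new HtmlToken("a", Map.of("href", "http://example.org/")));
        Map<String, Object> metaData = new HashMap<>();
        List<HtmlToken> out = transform(tokens, metaData, "netNode", "htmlNode");
        System.out.println(((LinkToken) out.get(1)).href() + " " + metaData.keySet());
    }
}
```

Non-link tokens pass through unchanged, so downstream consumers see the same token sequence with link tokens upgraded to the richer type.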

Author:
Harald Weinreich, Volkert Buchmann

Field Summary
static int CALCFINGERPRINT
           
static int CONSIDERINCLUSIONS
           
static int CONSIDERKEYWORDS
           
static int CONSIDERLINKS
           
static java.lang.String COPYRIGHT
           
static int MAX_BODYTEXT
           
static int MAX_SOURCECODE
           
static int PARSEDOCUMENT
           
static int POSTDATA
           
static int SAVEBODYTEXT
           
static int SAVESOURCECODE
           
 
Constructor Summary
DocumentParser(int requirements)
          Creates the initial instance.
DocumentParser(int requirements, boolean showRequirements)
          Creates the initial instance.
 
Method Summary
 void parse(TokenInputStream in, TokenOutputStream out)
          Parses the document and collects data for NetNode and HtmlNode objects: number of links, number of images, language, etc.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COPYRIGHT

public static final java.lang.String COPYRIGHT
See Also:
Constant Field Values

CONSIDERLINKS

public static final int CONSIDERLINKS
See Also:
Constant Field Values

CONSIDERINCLUSIONS

public static final int CONSIDERINCLUSIONS
See Also:
Constant Field Values

PARSEDOCUMENT

public static final int PARSEDOCUMENT
See Also:
Constant Field Values

CONSIDERKEYWORDS

public static final int CONSIDERKEYWORDS
See Also:
Constant Field Values

SAVEBODYTEXT

public static final int SAVEBODYTEXT
See Also:
Constant Field Values

SAVESOURCECODE

public static final int SAVESOURCECODE
See Also:
Constant Field Values

CALCFINGERPRINT

public static final int CALCFINGERPRINT
See Also:
Constant Field Values

POSTDATA

public static final int POSTDATA
See Also:
Constant Field Values

MAX_BODYTEXT

public static final int MAX_BODYTEXT
See Also:
Constant Field Values

MAX_SOURCECODE

public static final int MAX_SOURCECODE
See Also:
Constant Field Values

Constructor Detail

DocumentParser

public DocumentParser(int requirements)
Creates the initial instance.


DocumentParser

public DocumentParser(int requirements,
                      boolean showRequirements)
Creates the initial instance.

Parameters:
requirements - a bit array. See scone.Plugin for more information.
showRequirements - whether the requirements should be displayed.
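Because requirements is a bit array, flags are combined with bitwise OR and tested with bitwise AND. The sketch below illustrates that pattern under stated assumptions: the flag values are hypothetical (the real values are listed under "Constant Field Values" and are not reproduced here), it assumes each flag occupies its own bit, and describe() is only a guess at what showRequirements might display.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RequirementsSketch {
    // Hypothetical single-bit values; the real DocumentParser constants
    // may differ. Other flags would follow the same pattern.
    static final int CONSIDERLINKS = 1 << 0;
    static final int CONSIDERINCLUSIONS = 1 << 1;
    static final int PARSEDOCUMENT = 1 << 2;
    static final int CONSIDERKEYWORDS = 1 << 3;

    // Flag names in a fixed order, used for display.
    static final Map<String, Integer> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put("CONSIDERLINKS", CONSIDERLINKS);
        FLAGS.put("CONSIDERINCLUSIONS", CONSIDERINCLUSIONS);
        FLAGS.put("PARSEDOCUMENT", PARSEDOCUMENT);
        FLAGS.put("CONSIDERKEYWORDS", CONSIDERKEYWORDS);
    }

    // True if every bit of flag is set in requirements.
    static boolean has(int requirements, int flag) {
        return (requirements & flag) == flag;
    }

    // Hypothetical display of the requested flags, comma-separated.
    static String describe(int requirements) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : FLAGS.entrySet()) {
            if (has(requirements, e.getValue())) {
                if (sb.length() > 0) {
                    sb.append(", ");
                }
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int requirements = CONSIDERLINKS | PARSEDOCUMENT;
        System.out.println(describe(requirements)); // CONSIDERLINKS, PARSEDOCUMENT
    }
}
```

With the real class, the same pattern would presumably read new DocumentParser(DocumentParser.CONSIDERLINKS | DocumentParser.PARSEDOCUMENT, true), assuming the constants are distinct bit masks as the "bit array" description suggests.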

Method Detail

parse

public void parse(TokenInputStream in,
                  TokenOutputStream out)
Parses the document and collects data for NetNode and HtmlNode objects: number of links, number of images, language, etc.
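A parse pass of this kind can be pictured as a filter that forwards every token while counting what it sees. The following is a minimal sketch, not the actual parse implementation: a Queue and a List stand in for TokenInputStream and TokenOutputStream, and the Token record and Stats class are hypothetical. It assumes links are <a> tags with an href, images are <img> tags, and the language comes from the lang attribute of the <html> tag.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class ParseSketch {
    // Hypothetical token: a tag name plus its attributes.
    record Token(String tag, Map<String, String> attrs) {}

    // Hypothetical summary of the data collected for one document.
    static final class Stats {
        int links;
        int images;
        String language;
    }

    // Forwards every token from in to out unchanged while collecting statistics.
    static Stats parse(Queue<Token> in, List<Token> out) {
        Stats stats = new Stats();
        Token t;
        while ((t = in.poll()) != null) {
            switch (t.tag()) {
                case "a" -> {
                    if (t.attrs().containsKey("href")) {
                        stats.links++; // anchors without href are not links
                    }
                }
                case "img" -> stats.images++;
                case "html" -> stats.language = t.attrs().get("lang");
                default -> { }
            }
            out.add(t);
        }
        return stats;
    }

    public static void main(String[] args) {
        Queue<Token> in = new ArrayDeque<>(List.of(
            new Token("html", Map.of("lang", "de")),
            new Token("a", Map.of("href", "/index.html")),
            new Token("img", Map.of("src", "logo.png")),
            new Token("a", Map.of("name", "top"))));
        List<Token> out = new ArrayList<>();
        Stats s = parse(in, out);
        System.out.println(s.links + " links, " + s.images + " images, lang=" + s.language);
    }
}
```

Passing tokens through unchanged keeps the parser usable as one stage in a longer token pipeline; the collected counts are a side result rather than a transformation of the stream.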