scone.robot
Class Robot

java.lang.Object
  extended by scone.robot.Robot

public class Robot
extends java.lang.Object

The central class of the scone.robot package.
Use the static method instance() to obtain the robot object.
Objects that use services of the robot package need to communicate only with the robot object. A class that uses the robot must implement the RobotUser interface.

The following example shows typical use of the robot:

public class RobotTest implements RobotUser { // All classes that use the robot must implement this interface
...
Robot robot = Robot.instance(); // Get a reference to the Scone robot
SimpleUri startUri = new SimpleUri("http://www.informatik.uni-hamburg.de"); // The robot starts with this URI
RobotTask rt = new RobotTask(startUri, 3, RobotTask.ALL, this); // Create a new RobotTask
rt.setMaxPageSize(5000); // Set some properties for this task
...
robot.scan(rt); // Hand the task to the robot for execution

Author:
Frank Wollenweber

Method Summary
 int getJobsInPageLoaderPool()
          The number of jobs in the threadpool.
 int getMaxConcurrentRequests()
          The maximum number of concurrent requests the robot will send to one server
 int getMaxIdleTime()
          The number of milliseconds a thread is allowed to be idle before it is killed
 int getMaxNumberOfThreads()
          The maximum number of threads in the threadpool
static int getNextRobotTaskId()
          Get the next unique RobotTask ID
 int getNumberOfPendingQueueEntries(RobotTask robotTask)
          Get the number of QueueEntries from this robotTask
 int getNumberOfPendingQueueEntries(SimpleUri uri)
          Get the number of QueueEntries for this uri
 int getNumberOfRobotTasks()
          Get the number of robot tasks the robot currently has to handle.
 PageLoaderPoolStats getPageLoaderPoolStatus()
          Get the status of the threadpool
 java.util.Vector getPendingQueueEntries(RobotTask robotTask)
          Get all pending QueueEntries which belong to robotTask
 java.util.Vector getPendingQueueEntries(SimpleUri uri)
          Use this method to get the Vector of QueueEntries for this uri
 QueueEntry getPendingURL(SimpleUri uri, RobotTask robotTask)
          Get the QueueEntry for this uri that belongs to robotTask from the URLQueue
 java.lang.String getRobotName()
          Get the name the robot sends in the User-Agent field of its HTTP requests
 java.util.Vector getRobotTasks()
          Get all robot tasks the robot currently has to handle.
 int getTimeout()
          The timeout for the http connections
static Robot instance()
          Use this method to get an instance of the robot.
 boolean isPendingURL(SimpleUri uri, RobotTask robotTask)
          Check whether a URL is pending in the execution of a robotTask
 boolean isValidFileExtension(java.lang.String file)
          Checks whether the file's extension is one of the extensions specified in robot.xml
 void printPageLoaderPoolStatus()
          Print the status of the threadpool
 void scan(RobotTask robotTask)
          Advise the robot to execute the specified task.
 void stopRobotTask(int robotTaskId)
          Advise the robot to stop the task with the given ID.
 void stopRobotTask(RobotTask robotTask)
          Advise the robot to stop the specified task.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

instance

public static Robot instance()
Returns the robot instance. If no robot is active, one is created with the robot's default constructor.
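
The behaviour described for instance() resembles a lazily created singleton. The following is a minimal sketch of that pattern, not the scone.robot source; the class name LazyRobotSketch and its private constructor are assumptions made for illustration.

```java
// Hypothetical stand-in class; it is not part of scone.robot.
final class LazyRobotSketch {
    // The single shared instance, created on first request.
    private static LazyRobotSketch instance;

    // Private constructor so that callers must go through instance().
    private LazyRobotSketch() {
    }

    // Returns the active instance, creating it with the default
    // constructor if none exists yet. The method is synchronized so
    // that two threads cannot create two instances.
    static synchronized LazyRobotSketch instance() {
        if (instance == null) {
            instance = new LazyRobotSketch();
        }
        return instance;
    }
}
```

Every caller of instance() in this sketch receives the same object, which is why users of the package only ever talk to one robot.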


scan

public void scan(RobotTask robotTask)
Advise the robot to execute the specified task.

Parameters:
robotTask - specifies what the robot should do.

stopRobotTask

public void stopRobotTask(RobotTask robotTask)
Advise the robot to stop the specified task. All entries of this task in the URL queue are deleted. Pages that are currently downloading are allowed to finish.

Parameters:
robotTask - specifies the task to stop.

stopRobotTask

public void stopRobotTask(int robotTaskId)
Advise the robot to stop the task with the given ID. All entries of this task in the URL queue are deleted. Pages that are currently downloading are allowed to finish.

Parameters:
robotTaskId - the ID of the robotTask to stop.
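
The stop semantics described for both stopRobotTask variants (queued entries are deleted, running downloads finish) can be sketched with a simple queue. This is an illustration under stated assumptions, not the robot's implementation: StopTaskSketch, Entry, enqueue and stopTask are hypothetical names, and the sketch is single-threaded.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a URL queue; not the scone.robot implementation.
final class StopTaskSketch {
    // One queued URL together with the ID of the task that requested it.
    static final class Entry {
        final String url;
        final int taskId;

        Entry(String url, int taskId) {
            this.url = url;
            this.taskId = taskId;
        }
    }

    private final List<Entry> queue = new ArrayList<>();

    void enqueue(String url, int taskId) {
        queue.add(new Entry(url, taskId));
    }

    // Removes every queued entry of the given task and returns how many
    // were removed. Downloads already in progress are no longer in the
    // queue, so they are left to finish.
    int stopTask(int taskId) {
        int removed = 0;
        for (Iterator<Entry> it = queue.iterator(); it.hasNext();) {
            if (it.next().taskId == taskId) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return queue.size();
    }
}
```

Entries of other tasks stay in the queue, so stopping one task in this sketch does not disturb the rest.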

getRobotTasks

public java.util.Vector getRobotTasks()
Get all robot tasks the robot currently has to handle.

Returns:
The vector containing all robotTasks

getNumberOfRobotTasks

public int getNumberOfRobotTasks()
Get the number of robot tasks the robot currently has to handle.

Returns:
number of robotTasks

getRobotName

public java.lang.String getRobotName()
Get the name the robot sends in the User-Agent field of its HTTP requests.

Returns:
name of the robot

getMaxConcurrentRequests

public int getMaxConcurrentRequests()
The maximum number of concurrent requests the robot will send to one server

Returns:
number of concurrent requests

getTimeout

public int getTimeout()
The timeout for the http connections

Returns:
timeout

getMaxNumberOfThreads

public int getMaxNumberOfThreads()
The maximum number of threads in the threadpool

Returns:
max number of threads

getMaxIdleTime

public int getMaxIdleTime()
The number of milliseconds a thread is allowed to be idle before it is killed

Returns:
idle time

printPageLoaderPoolStatus

public void printPageLoaderPoolStatus()
Print the status of the threadpool


getPageLoaderPoolStatus

public PageLoaderPoolStats getPageLoaderPoolStatus()
Get the status of the threadpool

Returns:
status object

getJobsInPageLoaderPool

public int getJobsInPageLoaderPool()
The number of jobs in the threadpool. These jobs may belong to many robotTasks

Returns:
number of jobs in the threadpool

isPendingURL

public boolean isPendingURL(SimpleUri uri,
                            RobotTask robotTask)
Check whether a URL is pending in the execution of a robotTask.

Parameters:
uri - the URI to check
robotTask - only QueueEntries of this task are considered during the search
Returns:
true, if the url is pending

getPendingURL

public QueueEntry getPendingURL(SimpleUri uri,
                                RobotTask robotTask)
Get the QueueEntry for this uri that belongs to robotTask from the URLQueue.

Parameters:
uri - the URI to check
robotTask - only QueueEntries of this task are considered during the search
Returns:
the queueEntry which fits to the parameters
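
The relationship between isPendingURL and getPendingURL can be sketched as a lookup restricted to one task. The types below are hypothetical stand-ins, not the scone.robot classes, and returning null when nothing matches is an assumption; this page does not state what getPendingURL returns in that case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; not the scone.robot classes.
final class PendingLookupSketch {
    // One queued URI together with the ID of the task that requested it.
    static final class Entry {
        final String uri;
        final int taskId;

        Entry(String uri, int taskId) {
            this.uri = uri;
            this.taskId = taskId;
        }
    }

    private final List<Entry> queue = new ArrayList<>();

    void enqueue(String uri, int taskId) {
        queue.add(new Entry(uri, taskId));
    }

    // Returns the first queued entry with this URI that belongs to the
    // given task, or null if there is none (an assumption of this sketch).
    Entry getPending(String uri, int taskId) {
        for (Entry e : queue) {
            if (e.taskId == taskId && e.uri.equals(uri)) {
                return e;
            }
        }
        return null;
    }

    // True if the task still has this URI waiting in the queue.
    boolean isPending(String uri, int taskId) {
        return getPending(uri, taskId) != null;
    }
}
```

The same URI queued by a different task is ignored here, which mirrors the note that only QueueEntries of the given task are considered.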

getPendingQueueEntries

public java.util.Vector getPendingQueueEntries(SimpleUri uri)
Use this method to get the Vector of QueueEntries for this uri

Parameters:
uri - the URI to check
Returns:
the Vector of queueEntries for this uri

getNumberOfPendingQueueEntries

public int getNumberOfPendingQueueEntries(SimpleUri uri)
Get the number of QueueEntries for this uri

Parameters:
uri - the URI to check
Returns:
number of QueueEntries for this Uri

getPendingQueueEntries

public java.util.Vector getPendingQueueEntries(RobotTask robotTask)
Get all pending QueueEntries which belong to robotTask

Parameters:
robotTask - get entries from this robotTask
Returns:
Vector of queueEntries that belong to robotTask

getNumberOfPendingQueueEntries

public int getNumberOfPendingQueueEntries(RobotTask robotTask)
Get the number of QueueEntries from this robotTask

Parameters:
robotTask - count the entries from this robotTask
Returns:
number of queueEntries

isValidFileExtension

public boolean isValidFileExtension(java.lang.String file)
Checks whether the file's extension is one of the extensions specified in robot.xml.

Parameters:
file - the file to check
Returns:
true, if the extension is valid
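
An extension check of this kind could look like the sketch below. It is not the robot's code: ExtensionCheckSketch is a hypothetical class, the allowed extensions are passed to the constructor instead of being read from robot.xml, and both the case-insensitive comparison and the rejection of names without an extension are assumptions.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of scone.robot.
final class ExtensionCheckSketch {
    private final Set<String> allowed = new HashSet<>();

    // Extensions are given without the leading dot, for example "html".
    ExtensionCheckSketch(String... extensions) {
        for (String ext : extensions) {
            allowed.add(ext.toLowerCase(Locale.ROOT));
        }
    }

    // True if the text after the last dot of the file name is one of
    // the allowed extensions. Dots in directory names are ignored.
    boolean isValid(String file) {
        String name = file.substring(file.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return false; // the name has no extension
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return allowed.contains(ext);
    }
}
```

With the extensions "html" and "htm" configured, this sketch accepts "/docs/index.HTML" and rejects "/img/logo.png".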

getNextRobotTaskId

public static int getNextRobotTaskId()
Get the next unique RobotTask ID

Returns:
id
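
One common way to hand out unique IDs from a static method is an atomic counter. The sketch below shows that technique only; TaskIdSketch and nextId are hypothetical names, the starting value 0 is an assumption, and this page does not say how Robot generates its IDs.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ID source; not the scone.robot implementation.
final class TaskIdSketch {
    // Shared counter; the starting value 0 is an assumption.
    private static final AtomicInteger NEXT_ID = new AtomicInteger(0);

    private TaskIdSketch() {
    }

    // Returns the current counter value and advances it atomically, so
    // concurrent callers never receive the same ID.
    static int nextId() {
        return NEXT_ID.getAndIncrement();
    }
}
```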