Workflow engine

DataScraper's core is a proprietary workflow engine that schedules the processors declared in a data and clue extraction workflow file. A global context container holds the data and commands that the processors exchange.

Every processor has the same structure and exposes the following external interfaces:

  • interface for retrieving data generated by the system or other processors from the context container;
  • interface for outputting data to the context container for use by other processors;
  • interface for retrieving commands issued by the system or by other processors. A command may interrupt the whole workflow, skip the current processor, or carry a token that directs how the current processor runs. Tokens are the most common commands: each processor must check its token to decide how to run and how to pass tokens on to succeeding processors.
  • interface for sending a command to another processor.
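
The four interfaces above can be illustrated with a small Python sketch. It is not DataScraper's actual implementation, whose internals are proprietary and not described here. Every name in it (Context, Command, CommandKind, Processor, run_workflow, and the three example processors) is hypothetical, and it assumes that processors run sequentially in declared order and that commands are delivered through per-processor mailboxes kept in the context container.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, Deque, Dict, Iterator, List


class CommandKind(Enum):
    INTERRUPT = auto()  # stop the whole workflow
    SKIP = auto()       # skip the receiving processor
    TOKEN = auto()      # direct how the receiving processor runs


@dataclass
class Command:
    kind: CommandKind
    payload: Any = None


@dataclass
class Context:
    """Hypothetical global container shared by all processors."""
    data: Dict[str, Any] = field(default_factory=dict)
    mailboxes: Dict[str, Deque[Command]] = field(default_factory=dict)


class Processor:
    """Common structure: two data interfaces and two command interfaces."""

    def __init__(self, name: str, context: Context) -> None:
        self.name = name
        self.context = context

    def get_data(self, key: str, default: Any = None) -> Any:
        return self.context.data.get(key, default)       # retrieve data

    def put_data(self, key: str, value: Any) -> None:
        self.context.data[key] = value                   # output data

    def receive_commands(self) -> Iterator[Command]:
        box = self.context.mailboxes.get(self.name)      # retrieve commands
        while box:
            yield box.popleft()

    def send_command(self, target: str, command: Command) -> None:
        self.context.mailboxes.setdefault(target, deque()).append(command)

    def run(self, token: Any) -> None:
        raise NotImplementedError


def run_workflow(processors: List[Processor]) -> List[str]:
    """Run processors in declared order; return the names that actually ran."""
    ran: List[str] = []
    for proc in processors:
        token, skip = None, False
        for cmd in proc.receive_commands():
            if cmd.kind is CommandKind.INTERRUPT:
                return ran                               # abort everything
            if cmd.kind is CommandKind.SKIP:
                skip = True
            elif cmd.kind is CommandKind.TOKEN:
                token = cmd.payload
        if not skip:
            proc.run(token)
            ran.append(proc.name)
    return ran


class Fetch(Processor):
    def run(self, token: Any) -> None:
        self.put_data("page", "<html>hello</html>")
        # Pass a token to the next processor and tell a later one to skip.
        self.send_command("extract", Command(CommandKind.TOKEN, "strip-tags"))
        self.send_command("store", Command(CommandKind.SKIP))


class Extract(Processor):
    def run(self, token: Any) -> None:
        page = self.get_data("page", "")
        if token == "strip-tags":                        # the token decides how to run
            page = page.replace("<html>", "").replace("</html>", "")
        self.put_data("text", page)


class Store(Processor):
    def run(self, token: Any) -> None:
        self.put_data("stored", True)


def demo():
    context = Context()
    workflow = [Fetch("fetch", context), Extract("extract", context),
                Store("store", context)]
    return run_workflow(workflow), context
```

Calling demo() runs the fetch and extract processors and skips store, because fetch sent it a skip command. Keeping the mailboxes inside the context container mirrors the statement that the container holds both data and commands; a real engine could just as well route commands through a separate channel.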