Periodic Tasks

Docspell Documentation

Context and Problem Statement🔗

Currently there is a Scheduler that consumes tasks off a queue in the database. This allows multiple job executors running in parallel racing for the next job to execute. This is for executing tasks immediately – as long as there are enough resource.

What is missing, is a component that maintains periodic tasks. The reason for this is to have house keeping tasks that run regularily and clean up stale or unused data. Later, users should be able to create periodic tasks, for example to read e-mails from an inbox or to be notified of due items.

The problem is again, that it must work with multiple job executor instances running at the same time. This is the same pattern as with the Scheduler: it must be ensured that only one task is used at a time. Multiple job exectuors must not schedule a perdiodic task more than once. If a periodic tasks takes longer than the time between runs, it must wait for the next interval.

Considered Options🔗

  1. Adding a timer and nextrun field to the current job table
  2. Creating a separate table for periodic tasks

Decision Outcome🔗

The 2. option.

For internal housekeeping tasks, it may suffice to reuse the existing job queue by adding more fields such that a job may be considered periodic. But this conflates with what the Scheduler is doing now (executing tasks as soon as possible while being bound to some resource limits) with a completely different subject.

There will be a new PeriodicScheduler that works on a new table in the database that is representing periodic tasks. This table will share fields with the job table to be able to create RJob records. This new component is only taking care of periodically submitting jobs to the job queue such that the Scheduler will eventually pick it up and run it. If the tasks cannot run (for example due to resource limitation), the periodic scheduler can't do nothing but wait and try next time.

CREATE TABLE "periodic_task" (
  "id" varchar(254) not null primary key,
  "enabled" boolean not null,
  "task" varchar(254) not null,
  "group_" varchar(254) not null,
  "args" text not null,
  "subject" varchar(254) not null,
  "submitter" varchar(254) not null,
  "priority" int not null,
  "worker" varchar(254),
  "marked" timestamp,
  "timer" varchar(254) not null,
  "nextrun" timestamp not null,
  "created" timestamp not null
);

Preparing for other features, at some point periodic tasks will be created by users. It should be possible to disable/enable them. The next 6 properties are needed to insert jobs into the job table. The worker field (and marked) are used to mark a periodic job as "being worked on by a job executor".

The timer is the schedule, which is a systemd-like calendar event string. This is parsed by this library. The nextrun field will store the timestamp of the next time the task would need to be executed. This is needed to query this table for the newest task.

The PeriodicScheduler works roughly like this:

On startup:

  • Remove stale worker values. If the process has been killed, there may be marked tasks which must be cleared now.

Main-Loop: 0. Cancel current scheduled notify (see 4. below)

  1. get next (= earliest & enabled) periodic job
  2. if none: stop
  3. if triggered (= nextrun <= 'now'):
  • Mark periodic task. On fail: goto 1.
  • Submit new job into the jobqueue:
    • Update nextrun field
    • Check for non-final jobs of that name. This is required to not run the same periodic task multiple times concurrently.
      • if exist: goto 4.
      • if not exist: submit job
  • Unmark periodic task
  1. if future
  • schedule notify: notify self to run again next time the task schedule triggers