Joex
Introduction
Joex is short for Job Executor and it is the component managing long-running tasks in docspell. One of these long-running tasks is the file processing task.
One joex component handles the processing of all files of all collectives/users. It requires many more resources than the rest server component. Therefore the number of jobs that can run in parallel is limited, according to the hardware it is running on.
For larger installations, it is probably better to run several joex components on different machines. That works out of the box, as long as all components point to the same database and use different `app-id`s (see configuring docspell).
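As a minimal sketch (assuming the `app-id` and `jdbc.url` keys of the default configuration file; the database URL and ids below are placeholders), two joex instances sharing one database could look like this:

```
# joex on machine A
docspell.joex {
  app-id = "joex1"
  jdbc.url = "jdbc:postgresql://db.example.com:5432/docspell"
}

# joex on machine B: same database, different app-id
docspell.joex {
  app-id = "joex2"
  jdbc.url = "jdbc:postgresql://db.example.com:5432/docspell"
}
```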
When files are submitted to docspell, they are stored in the database and all known joex components are notified about new work. They then compete for the next job in the queue. After a job finishes and no job is waiting in the queue, joex will sleep until notified again. It also periodically notifies itself as a fallback.
Task vs Job
Just for the sake of this document, a task denotes the code that has to be executed, or the thing that has to be done. It becomes a job once it is submitted into the queue, from where it will eventually be picked up and executed. A job maintains state and other data, while a task is just code.
Scheduler and Queue
The scheduler is the part that runs and monitors the long running jobs. It works together with the job queue, which defines what job to take next.
To create a somewhat fair distribution among multiple collectives, a collective is first chosen in a simple round-robin way. Then a job from this collective is chosen by priority.
There are only two priorities: low and high. A simple counting scheme determines whether a low or high priority job is selected next. The default is 4, 1, meaning to first select 4 high priority jobs and then 1 low priority job, then start over. If no job of the selected priority exists, it falls back to the other priority.
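With the default 4, 1 scheme, the selection order is thus H H H H L H H H H L … (H = high, L = low); whenever no job of the selected priority is waiting, one of the other priority is picked instead.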
The priority can be set on a Source (see uploads). Uploading through the web application always uses priority high. The idea is that jobs submitted while logged in are more important than those submitted when not logged in.
Scheduler Config
The relevant part of the config file regarding the scheduler is shown below with some explanations.
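As a sketch, assuming the settings live under `docspell.joex.scheduler` as in the default configuration file (the values are only illustrative, not necessarily the defaults):

```
docspell.joex.scheduler {
  # how many jobs may run in parallel on this joex instance
  pool-size = 2

  # how to choose between high and low priority jobs
  counting-scheme = "4,1"

  # how often a failed (stuck) job is retried before it finally fails
  retries = 2

  # initial delay before the first retry; grows with every further retry
  retry-delay = "1 minute"

  # size of the in-memory buffer for job log events
  log-buffer-size = 500

  # interval at which joex notifies itself to look for waiting jobs
  wakeup-period = "30 minutes"
}
```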
The `pool-size` setting determines how many jobs run in parallel. You need to play with this setting on your machine to find an optimal value.
The `counting-scheme` determines for all collectives how to select between high and low priority jobs, as explained above. It is currently not possible to define that per collective.
If a job fails, it is set to the stuck state and retried by the scheduler. The `retries` setting defines how many times a job is retried until it enters the final failed state. The scheduler waits some time before running the next try. This delay is given by `retry-delay`. It is the initial delay, the time until the first retry (the second attempt). The delay increases exponentially with the number of retries.
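For example, assuming the delay doubles with every attempt (the exact growth factor is an implementation detail), a `retry-delay` of 1 minute means waiting roughly 1 minute before the second attempt, 2 minutes before the third and 4 minutes before the fourth.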
Jobs log what they do; these log events are picked up and stored in the database asynchronously. They are buffered in a queue, and another thread consumes this queue and stores the events in the database. The `log-buffer-size` setting determines the size of this queue.
Finally, there is a `wakeup-period` that determines at what interval the joex component notifies itself to look for new jobs. If jobs get stuck and joex is not notified externally, it could miss retrying them. Also, since networks are not reliable, a notification may not reach a joex component. This periodic wakeup simply ensures that jobs are eventually run.
Periodic Tasks
The job executor can run tasks periodically. These tasks are stored in the database so that they can be submitted into the job queue. Multiple job executors can run at once, but only one ever works on a given periodic task, so a periodic task is never submitted twice. It is also not submitted if a previous run has not finished yet.
Starting on demand
The job executor and rest server can be started multiple times. This is especially useful for the job executor. For example, when submitting a lot of files in a short time, you can simply start up more job executors on other computers on your network. Maybe use your laptop to help with processing for a while.
You have to make sure that all of them connect to the same database and that all have unique `app-id`s.
Once the files have been processed, you can stop the additional executors.
Shutting down
If a job executor is sleeping and not executing any jobs, you can just quit using SIGTERM or Ctrl-C when running in a terminal. But if there are jobs currently executing, it is advisable to initiate a graceful shutdown. The job executor will then stop taking new jobs from the queue, but it will wait until all running jobs have completed before shutting down.
This can be done by sending an HTTP POST request to the API of this job executor:
curl -XPOST "http://localhost:7878/api/v1/shutdownAndExit"
If joex receives this request it will immediately stop taking new jobs and it will quit when all running jobs are done.
If a job executor gets terminated while there are running jobs, those jobs remain in their current state, marked to be executed by this job executor. To fix this, start the job executor again: it will find all jobs marked with its id and put them back into the waiting state. Then send a graceful shutdown request as shown above.