The two components have one prerequisite in common: they both require Java to run. While this is the only requirement for the REST server, the Joex component requires some additional external programs.
The rest server and joex components are not required to "see" each other, though it is recommended.
Very often, Java is already installed. You can check this by opening a terminal and typing java -version. Otherwise install Java using your package manager or see this site for other options.
It is enough to install the JRE. The JDK is required only if you want to build docspell from source.
Docspell has been tested with Java version 1.8 (sometimes referred to as JRE 8 or JDK 8). The pre-built packages are also built using JDK 8, but a later version of Java should work as well.
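As a rough sketch of the version check described above, the following reads the version string from java -version and reduces it to a major version, so that the legacy 1.8 scheme and newer schemes such as 11.0.2 can be compared. The function name java_major is hypothetical and not part of docspell.

```shell
# Reduce a Java version string to its major version.
# "1.8.0_292" becomes 8, "11.0.2" becomes 11, "17" stays 17.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy scheme: drop the leading "1."
  esac
  echo "${v%%[._-]*}"     # keep everything before the first ".", "_" or "-"
}

# Only query Java if it is actually on the PATH.
if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, hence the redirect.
  raw=$(java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
  echo "Java major version: $(java_major "$raw")"
else
  echo "Java not found; install a JRE first"
fi
```

A result of 8 or higher matches what the text above describes as tested or expected to work.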
The next tools are only required on machines running the Joex component.
Ghostscript (the gs command) is used to extract/convert PDF files into images that are then fed to OCR. It is available on most GNU/Linux distributions.
The performance of unoconv can be improved by starting unoconv -l in a separate process. This runs a libreoffice/openoffice listener and therefore avoids starting one each time unoconv is called.
On Debian this should install all joex requirements:
sudo apt-get install ghostscript tesseract-ocr tesseract-ocr-deu tesseract-ocr-eng unpaper unoconv wkhtmltopdf ocrmypdf
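After installing, it can help to confirm that the external programs are actually on the PATH of the user running Joex. The sketch below is one way to do that. The helper name missing_tools is hypothetical, and the command names are assumed to be the ones the packages above provide (gs for ghostscript, tesseract for tesseract-ocr).

```shell
# Print every given command name that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Commands assumed to be provided by the packages installed above.
missing_tools gs tesseract unpaper unoconv wkhtmltopdf ocrmypdf
```

No output means every listed command was found.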
SOLR is used to provide the fulltext search feature. This feature can be disabled, so installing SOLR is optional. But without it, there is no fulltext search.
When installing manually (i.e. not via docker), just install solr and create a core as described in the solr documentation. That will provide you with the connection url (the last part is the core name).
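To illustrate how the connection URL relates to the core name, here is a small sketch. The host, port 8983 (Solr's default) and the core name docspell are assumptions for this example. The core itself would be created with Solr's own tooling, shown here only as a comment.

```shell
# Assumed values for illustration; adjust to your installation.
SOLR_HOST="localhost"
SOLR_PORT="8983"
CORE_NAME="docspell"

# Creating the core is done from the Solr install directory, for example:
#   bin/solr create -c "$CORE_NAME"

# The connection URL ends with the core name.
SOLR_URL="http://${SOLR_HOST}:${SOLR_PORT}/solr/${CORE_NAME}"
echo "$SOLR_URL"
```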
When using the provided docker-compose.yml setup, SOLR is already set up.
SOLR must be reachable from all joex and all rest server components.
Both components must have access to a SQL database. The SQL database contains all data (including binary files) and is the central component of docspell. Docspell supports these databases: H2, PostgreSQL and MariaDB.
The H2 database is an interesting option for personal and mid-size setups, as it requires no additional work. It is integrated into docspell and works really well out of the box. It is also configured as the default database.
When using H2, make sure that all components access the same database: the JDBC URL must point to the same file. Then, it is important to add the options ;MODE=PostgreSQL;DATABASE_TO_LOWER=TRUE;AUTO_SERVER=TRUE at the end of the URL. See the config page for an example.
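As a minimal sketch of how such a URL is put together, the snippet below appends the required options to a file path. The path /var/lib/docspell/docspell.db is a hypothetical location, and the exact URL prefix should be checked against the config page. What matters is that both components are given the identical value.

```shell
# Hypothetical database file; every component must use the same path.
DB_FILE="/var/lib/docspell/docspell.db"

# Options that the text above says must be appended to the URL.
H2_OPTS=";MODE=PostgreSQL;DATABASE_TO_LOWER=TRUE;AUTO_SERVER=TRUE"

JDBC_URL="jdbc:h2://${DB_FILE}${H2_OPTS}"
echo "$JDBC_URL"
```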
For large installations, PostgreSQL or MariaDB is recommended. Create a database and a user with enough privileges (read, write, create table) to that database.
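For PostgreSQL, the setup could look roughly like the sketch below, which only prints the SQL statements so they can be reviewed first. The helper name pg_setup_sql and the database name, user name and password are placeholders for this example. Making the user the owner of the database is one simple way to grant the privileges mentioned above.

```shell
# Print SQL that creates a user and a database owned by that user.
# Arguments: database name, user name, password.
pg_setup_sql() {
  db="$1"; user="$2"; password="$3"
  cat <<EOF
CREATE USER ${user} WITH PASSWORD '${password}';
CREATE DATABASE ${db} OWNER ${user};
EOF
}

# Placeholder values; choose your own names and a real password.
pg_setup_sql docspell docspell change-me

# To apply, pipe the output into psql as a PostgreSQL superuser, e.g.:
#   pg_setup_sql docspell docspell change-me | psql -U postgres
```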