djehuty
djehuty is the data repository system developed by and for 4TU.ResearchData. The name finds its inspiration in Thoth, the Egyptian entity that introduced the idea of writing.
The source code can be downloaded from the Releases page. Make sure to download the djehuty-0.0.1.tar.gz file.
Alternatively, download the tarball directly from the command line:
After obtaining the tarball, it can be unpacked using the tar command:
The djehuty program needs Python (version 3.6 or higher) and Git to be installed. Additionally, a couple of Python packages need to be installed. The following sections describe installing the prerequisites on various GNU/Linux distributions. To put the software in the context of its environment, figure 1.1 displays the complete run-time dependencies from djehuty to glibc.
Figure 1.1: Run-time references when constructed with the packages from GNU Guix.
The web service of djehuty stores its information in a SPARQL 1.1 (“SPARQL 1.1 Overview”, 2013) endpoint. We recommend either Blazegraph or Virtuoso open-source edition.
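To make the role of the endpoint concrete, the following minimal sketch builds a query request following the SPARQL 1.1 Protocol using only Python's standard library. It is not how djehuty itself talks to the endpoint. The endpoint URL and the helper name build_sparql_request are assumptions made for illustration; substitute the address of your own Blazegraph or Virtuoso instance.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint URL (an assumption for illustration): adjust it to
# wherever your Blazegraph or Virtuoso instance actually listens.
ENDPOINT = "http://localhost:8890/sparql"


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol query request (POST with URL-encoded parameters)."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_sparql_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5")

# Sending is left commented out so the sketch runs without a live endpoint:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

If the endpoint is reachable, uncommenting the last lines should print a JSON document with up to five triples, which is a quick way to check that the store is up before pointing djehuty at it.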
The Python packages shipped with Enterprise Linux version 7 or higher appear to be too old for djehuty, so installing the prerequisites involves two steps.
The first step involves installing system-wide packages for Python and Git.
The second step involves using Python’s venv module to install the Python packages in a virtual environment:
After obtaining the source code (see section 1.1 ‘Obtaining the source code’) and installing the required tools (see section 1.2 ‘Installing the prerequisites’), building involves running the following commands:
Running the make install command may require superuser privileges. To avoid this, pass a --prefix to the configure script so that the tools are installed to a user-writable location.
After installation, the djehuty program will be available.
djehuty processes its information using the Resource Description Framework (Lassila, 1999). This chapter describes the parts that make up the data model of djehuty.
Throughout this chapter, abbreviated references to ontologies are used. Table 2.1 lists these abbreviations.
Abbreviation | Ontology URI |
djht | |
rdf | |
rdfs | |
xsd | |
In addition to abbreviating ontologies by their prefix, we introduce a few shorthand notations to communicate the structure of the RDF graph used by djehuty effectively.
When the object in a triple is typed, the shorthand shows only the type of the object rather than its actual value. Figure 2.1 displays this for URIs, and figure 2.2 displays this for literals.
Figure 2.1: Shorthand notation for triples with an rdf:type, which features a hollow predicate arrow and a colored type specifier with rounded corners.
Literals are depicted by rectangles (with sharp edges) in contrast to URIs which are depicted as rectangles with rounded edges.
Figure 2.2: Shorthand notation for triples with a literal, which features a hollow predicate arrow and a colored rectangular type specifier.
When the subject of a triple is drawn as a shorthand type, the subject is not the type itself but a resource that has that type.
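One way to read the shorthand is to expand each edge back into the triple patterns it stands for. The sketch below does this as SPARQL pattern text. The function expand_shorthand and the predicate names djht:title and djht:container are hypothetical and serve only as illustration; djht:Dataset and djht:DatasetContainer are the types described in this chapter.

```python
def expand_shorthand(subject_type, predicate, object_type, object_is_literal):
    """Expand one shorthand edge into the SPARQL triple patterns it stands for."""
    patterns = [
        # The subject is a resource that has the given type, not the type itself.
        f"?subject rdf:type {subject_type} .",
        f"?subject {predicate} ?object .",
    ]
    if object_is_literal:
        # Literals carry a datatype rather than an rdf:type.
        patterns.append(f"FILTER (datatype(?object) = {object_type})")
    else:
        patterns.append(f"?object rdf:type {object_type} .")
    return patterns


# "djht:title" and "djht:container" are hypothetical predicate names.
for line in expand_shorthand("djht:Dataset", "djht:title", "xsd:string", True):
    print(line)
for line in expand_shorthand("djht:Dataset", "djht:container", "djht:DatasetContainer", False):
    print(line)
```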
To preserve the order in which lists were formed, the data model makes use of rdf:List with numeric indexes. This pattern will be abbreviated in the remainder of the figures as displayed in figure 2.3.
Figure 2.3: Shorthand notation for rdf:List with numeric indexes, which features a hollow double-arrow.
Lists have arbitrary lengths, and the numeric indexes use 1-based indexing.
The hollow double-arrow depicts the use of an rdf:List with numeric indexes.
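As an illustration of what the double-arrow abbreviates, the following sketch writes out the triples for a small rdf:List whose nodes also carry a 1-based index. The predicate name djht:index, the blank node labels, and both helper functions are assumptions made for this example and are not taken from the djehuty source.

```python
def indexed_list_triples(head, items):
    """Return (subject, predicate, object) triples for an rdf:List whose
    nodes also carry a 1-based numeric index."""
    triples = []
    nodes = [head] + [f"_:node{i}" for i in range(2, len(items) + 1)]
    for position, (node, item) in enumerate(zip(nodes, items), start=1):
        # The last node points at rdf:nil, every other node at its successor.
        rest = nodes[position] if position < len(items) else "rdf:nil"
        triples.append((node, "rdf:first", item))
        triples.append((node, "rdf:rest", rest))
        # "djht:index" is a hypothetical predicate name for the numeric index.
        triples.append((node, "djht:index", position))
    return triples


def items_in_order(triples):
    """Recover the items in their original order by sorting on the index."""
    index_of = {s: o for s, p, o in triples if p == "djht:index"}
    first_of = {s: o for s, p, o in triples if p == "rdf:first"}
    return [first_of[node] for node in sorted(index_of, key=index_of.get)]


example = indexed_list_triples("_:node1", ["author-a", "author-b", "author-c"])
print(items_in_order(example))
```

Storing the index next to rdf:first and rdf:rest means the order can be recovered with a simple sort on the index, without walking the rdf:rest chain node by node.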
Datasets play a central role in the repository system because every other type links to them in one way or another. The user submits files along with data about those bytes as a single record, which we call a djht:Dataset. Figure 2.4 shows how the remaining types in this chapter relate to a djht:Dataset.
Figure 2.4: The RDF pattern for a djht:Dataset. For a full overview of djht:Dataset properties, use the exploratory from the administration panel.
Datasets are versioned records. The data and metadata can differ between versions, but all versions of a dataset share an identifier. We use djht:DatasetContainer to describe the version-unspecific properties of a set of versioned datasets.
Figure 2.5: The RDF pattern for a djht:DatasetContainer. All versions of a dataset share a djht:dataset_id and a UUID in the container URI.
The data model follows a natural expression of published versions as a linked list. Figure 2.5 further reveals that the view, download, share and citation counts are stored in a version-unspecific way.
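The idea of a container that holds version-unspecific counters and points into a linked list of published versions can be sketched as follows. This is a plain-Python analogy and not djehuty's implementation: the class names, field names, and the direction of the links (newest to oldest) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetVersion:
    version: int
    title: str
    # Link to the preceding published version (direction is an assumption).
    previous: Optional["DatasetVersion"] = None


@dataclass
class DatasetContainer:
    dataset_id: int                          # shared by all versions
    latest: Optional[DatasetVersion] = None
    total_views: int = 0                     # stored once, not per version

    def publish(self, title):
        """Prepend a new published version to the linked list."""
        number = self.latest.version + 1 if self.latest else 1
        self.latest = DatasetVersion(number, title, previous=self.latest)
        return self.latest

    def versions(self):
        """Walk the linked list from newest to oldest."""
        node = self.latest
        while node is not None:
            yield node
            node = node.previous


container = DatasetContainer(dataset_id=1)
container.publish("Initial release")
container.publish("Corrected metadata")
container.total_views += 1  # counted on the container, whichever version was viewed
print([v.version for v in container.versions()])
```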
djehuty uses an external identity provider, but stores an e-mail address, full name, and preferences for categories.
When a djht:Dataset originated from a funded project, the funders can be listed using djht:Funding. Figure 2.7 displays the details for this structure.
Lassila, O. (1999, February). Resource description framework (RDF) model and syntax specification [W3C Recommendation]. (http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/)
SPARQL 1.1 overview [W3C Recommendation]. (2013, March). (http://www.w3.org/TR/2013/REC-sparql11-overview-20130321/)