Indexing

Step 3: Indexing

The final step bridges the gap between a complex graph and a fast, searchable web interface.

The QLever Mirror: Before indexing begins, an automated job exports the Virtuoso graph and loads it into QLever. This gives the query-heavy mapping stage a high-performance, read-optimized SPARQL endpoint.
Graph-to-Relational Mapping: A dedicated repository queries QLever (instead of Virtuoso) to take advantage of its faster read performance. It uses the internal data model to flatten complex graph connections (e.g., links between Documents, People, and Organizations) into flat documents stored in an OpenSearch index.
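
The flattening step can be illustrated with a minimal sketch. It assumes the indexer receives results in the standard SPARQL JSON results format and that each row binds the variables `s`, `p`, and `o`. The function name `flatten_bindings`, the variable names, the example URIs, and the output field layout are all hypothetical and are not taken from the actual repository.

```python
from collections import defaultdict


def flatten_bindings(sparql_json):
    """Group SPARQL JSON result rows (?s ?p ?o) into one flat document per subject.

    Hypothetical helper: the real indexer's data model and field names may differ.
    """
    docs = defaultdict(lambda: defaultdict(list))
    for row in sparql_json["results"]["bindings"]:
        subject = row["s"]["value"]
        # Use the local name of the predicate URI as the field name,
        # e.g. "https://schema.org/author" -> "author".
        predicate = row["p"]["value"].rstrip("/#")
        field = predicate.rsplit("/", 1)[-1].rsplit("#", 1)[-1]
        value = row["o"]["value"]
        # Keep every distinct value, so multi-valued links survive flattening.
        if value not in docs[subject][field]:
            docs[subject][field].append(value)
    return [{"id": subject, **fields} for subject, fields in docs.items()]


# Illustrative input only: these URIs are made up for the example.
sample = {
    "head": {"vars": ["s", "p", "o"]},
    "results": {
        "bindings": [
            {
                "s": {"type": "uri", "value": "https://example.org/doc/1"},
                "p": {"type": "uri", "value": "https://schema.org/name"},
                "o": {"type": "literal", "value": "A sample article"},
            },
            {
                "s": {"type": "uri", "value": "https://example.org/doc/1"},
                "p": {"type": "uri", "value": "https://schema.org/author"},
                "o": {"type": "uri", "value": "https://example.org/person/7"},
            },
            {
                "s": {"type": "uri", "value": "https://example.org/doc/1"},
                "p": {"type": "uri", "value": "https://schema.org/author"},
                "o": {"type": "uri", "value": "https://example.org/person/9"},
            },
        ]
    },
}

documents = flatten_bindings(sample)
```

Each resulting dictionary is a self-contained document that could then be sent to OpenSearch, for instance through its bulk API. Storing every field as a list is a design choice of this sketch, made so that multi-valued relations such as several authors need no special handling.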

Categorization Logic

To keep the UI organized, the indexer sorts graph nodes into eight primary categories based on their internal model types. A graph node that matches none of the listed types remains in the Knowledge Graph but is not indexed for the UI.

Category	Model Types / Vocabularies
Datasets	`Dataset`
Documents	`Article`, `ScholarlyArticle`, `Book`, `Chapter`, `Text`, `Periodical`, `Thesis`, `Report`
Software	`SoftwareApplication`, `SoftwareSourceCode`
Organizations	`Organization`
Institutions	`Organization`, `ResearchOrganization`, `EducationalOrganization`, `MedicalOrganization`, `FundingAgency`, `Corporation`, `ArchiveOrganization`, `NGO`, `GovernmentOrganization`
Experts	`Person`
Events	`Event`
Instruments	Specialized URIs: `SociologicalInstrument` (DDI-Discovery) and `PhysicalInstrument` (DataCite)

Note: While most categories are derived from Schema.org, the Instruments category uses specialized external vocabularies (DDI-Discovery and DataCite) to ensure domain-specific precision. Types such as `CreativeWork` or `Intangible` exist within the Knowledge Graph but are not currently indexed for the UI.
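
The categorization rule from the table above can be sketched as a simple lookup. This is a sketch under stated assumptions, not the indexer's actual code: the names `CATEGORY_TYPES` and `categorize` are hypothetical, types are compared by local name rather than by full URI (the Instruments types are matched by full URI in practice, and those URIs are not reproduced here), and a node whose type appears in more than one row, such as `Organization`, is assumed to be assigned to every matching category.

```python
# Category -> model types, copied from the table above (local names only).
CATEGORY_TYPES = {
    "Datasets": {"Dataset"},
    "Documents": {
        "Article", "ScholarlyArticle", "Book", "Chapter",
        "Text", "Periodical", "Thesis", "Report",
    },
    "Software": {"SoftwareApplication", "SoftwareSourceCode"},
    "Organizations": {"Organization"},
    "Institutions": {
        "Organization", "ResearchOrganization", "EducationalOrganization",
        "MedicalOrganization", "FundingAgency", "Corporation",
        "ArchiveOrganization", "NGO", "GovernmentOrganization",
    },
    "Experts": {"Person"},
    "Events": {"Event"},
    "Instruments": {"SociologicalInstrument", "PhysicalInstrument"},
}


def categorize(node_types):
    """Return the UI categories whose type set overlaps the node's types.

    An empty list means the node stays in the Knowledge Graph
    but would not be indexed for the UI.
    """
    types = set(node_types)
    return [
        category
        for category, allowed in CATEGORY_TYPES.items()
        if types & allowed
    ]


experts = categorize(["Person"])
skipped = categorize(["CreativeWork"])
thesis = categorize(["Thesis", "CreativeWork"])
```

Here `experts` contains only the Experts category, `thesis` contains only Documents because `CreativeWork` is not a listed type, and `skipped` is empty, which corresponds to a node that is kept in the graph but left out of the index.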
