Step 3: Indexing
The final step bridges the gap between a complex graph and a fast, searchable web interface.
- The QLever Mirror: Before indexing begins, an automated job exports the graph from Virtuoso and loads it into QLever. This gives the read-intensive SPARQL queries of the next stage a dedicated, high-performance query engine.
- Graph-to-Relational Mapping: A specialized repository queries QLever rather than Virtuoso to take advantage of its faster reads. Using the internal data model, it flattens complex graph connections (e.g., links between Documents, People, and Organizations) into flat records in an OpenSearch index.
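The flattening step can be pictured with a short Python sketch. It is not the project's repository code: the query, the schema.org predicates (`schema:author`, `schema:sourceOrganization`), the record fields, and the index name `ui-documents` are all hypothetical. It assumes QLever returns results in the standard SPARQL 1.1 JSON results format, collapses the one-row-per-combination output of a SELECT query into one record per document, and serializes the records as an OpenSearch `_bulk` request body.

```python
import json

# Hypothetical QLever query: one row per (document, author, organization) combination.
# The schema.org predicates and the https://schema.org/ namespace are assumptions.
QUERY = """
PREFIX schema: <https://schema.org/>
SELECT ?doc ?title ?authorName ?orgName WHERE {
  ?doc a schema:ScholarlyArticle ;
       schema:name ?title .
  OPTIONAL { ?doc schema:author ?author . ?author schema:name ?authorName }
  OPTIONAL { ?doc schema:sourceOrganization ?org . ?org schema:name ?orgName }
}
"""


def flatten_bindings(results: dict) -> list[dict]:
    """Collapse SPARQL JSON result rows into one flat record per document URI."""
    docs: dict[str, dict] = {}
    for row in results["results"]["bindings"]:
        uri = row["doc"]["value"]
        doc = docs.setdefault(
            uri,
            {"id": uri, "title": row["title"]["value"], "authors": [], "organizations": []},
        )
        # OPTIONAL variables are simply absent from a row when they are unbound.
        for var, field in (("authorName", "authors"), ("orgName", "organizations")):
            if var in row and row[var]["value"] not in doc[field]:
                doc[field].append(row[var]["value"])
    return list(docs.values())


def to_bulk_ndjson(docs: list[dict], index_name: str) -> str:
    """Serialize records as an OpenSearch _bulk body: an action line plus a source line each."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the _bulk API requires a trailing newline


# Sample response in the SPARQL 1.1 JSON results format (invented data).
SAMPLE = {
    "head": {"vars": ["doc", "title", "authorName", "orgName"]},
    "results": {"bindings": [
        {"doc": {"type": "uri", "value": "https://example.org/doc/1"},
         "title": {"type": "literal", "value": "Survey Methods"},
         "authorName": {"type": "literal", "value": "A. Author"},
         "orgName": {"type": "literal", "value": "Example Institute"}},
        {"doc": {"type": "uri", "value": "https://example.org/doc/1"},
         "title": {"type": "literal", "value": "Survey Methods"},
         "authorName": {"type": "literal", "value": "B. Author"},
         "orgName": {"type": "literal", "value": "Example Institute"}},
        {"doc": {"type": "uri", "value": "https://example.org/doc/2"},
         "title": {"type": "literal", "value": "Untitled Draft"}},
    ]},
}

if __name__ == "__main__":
    records = flatten_bindings(SAMPLE)
    # "ui-documents" is a hypothetical index name.
    print(to_bulk_ndjson(records, "ui-documents"))
```

Running it prints an action line and a source line for each of the two sample documents; the second document keeps empty author and organization lists because its OPTIONAL patterns did not match. Deduplicating in Python keeps the query simple; under the same assumptions, aggregating in SPARQL with GROUP_CONCAT would be an alternative.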
Categorization Logic
To keep the UI organized, the indexer assigns graph nodes to eight primary categories based on their internal model types. A node whose type matches none of these remains in the Knowledge Graph but is not indexed for the UI.
| Category | Model Types / Vocabularies |
|---|---|
| Datasets | Dataset |
| Documents | Article, ScholarlyArticle, Book, Chapter, Text, Periodical, Thesis, Report |
| Software | SoftwareApplication, SoftwareSourceCode |
| Organizations | Organization |
| Institutions | Organization, ResearchOrganization, EducationalOrganization, MedicalOrganization, FundingAgency, Corporation, ArchiveOrganization, NGO, GovernmentOrganization |
| Experts | Person |
| Events | Event |
| Instruments | Specialized URIs: SociologicalInstrument (DDI Discovery) and PhysicalInstrument (DataCite) |
Note: While most categories are derived from Schema.org, the Instruments category uses specialized external vocabularies (DDI Discovery and DataCite) to ensure domain-specific precision. Types such as CreativeWork or Intangible exist within the Knowledge Graph but are not currently indexed for the UI.
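The category table can be expressed as a simple lookup. The Python sketch below assumes each node's rdf:type values arrive as full URIs and matches them by local name (the part after the last `#` or `/`); that matching rule, and treating SociologicalInstrument and PhysicalInstrument as local names, are assumptions rather than the indexer's documented behavior. A node that matches no category yields an empty set and would be skipped, mirroring how types such as CreativeWork stay in the Knowledge Graph without being indexed.

```python
from collections.abc import Iterable

# Category -> accepted type local names, copied from the table above.
CATEGORY_TYPES: dict[str, frozenset[str]] = {
    "Datasets": frozenset({"Dataset"}),
    "Documents": frozenset({"Article", "ScholarlyArticle", "Book", "Chapter",
                            "Text", "Periodical", "Thesis", "Report"}),
    "Software": frozenset({"SoftwareApplication", "SoftwareSourceCode"}),
    "Organizations": frozenset({"Organization"}),
    "Institutions": frozenset({"Organization", "ResearchOrganization",
                               "EducationalOrganization", "MedicalOrganization",
                               "FundingAgency", "Corporation", "ArchiveOrganization",
                               "NGO", "GovernmentOrganization"}),
    "Experts": frozenset({"Person"}),
    "Events": frozenset({"Event"}),
    # Assumption: the DDI Discovery / DataCite types are matched by these local names.
    "Instruments": frozenset({"SociologicalInstrument", "PhysicalInstrument"}),
}


def local_name(type_uri: str) -> str:
    """Return the fragment after the last '#' or '/' (assumed matching rule)."""
    return type_uri.rstrip("/#").rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def categorize(type_uris: Iterable[str]) -> set[str]:
    """Return every UI category whose type list shares a local name with the node."""
    names = {local_name(uri) for uri in type_uris}
    return {category for category, types in CATEGORY_TYPES.items() if names & types}


def should_index(type_uris: Iterable[str]) -> bool:
    """Nodes with no matching category stay in the graph but are not indexed."""
    return bool(categorize(type_uris))


if __name__ == "__main__":
    print(categorize(["https://schema.org/Book"]))
    print(should_index(["https://schema.org/CreativeWork"]))
```

Because the table lists Organization under both Organizations and Institutions, the sketch returns every matching category instead of choosing one; how the actual indexer resolves that overlap is not stated here.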
