Indexing

Step 3: Indexing

The final step converts the complex Knowledge Graph into a flat search index that backs a fast, searchable web interface.

  • The QLever Mirror: Before indexing begins, an automated job dumps the Virtuoso graph into QLever. This provides a high-performance environment for the intensive SPARQL queries required for the next stage.
  • Graph-to-Relational Mapping: A dedicated repository runs its SPARQL queries against QLever rather than Virtuoso, because QLever offers better read performance. Guided by the internal data model, it flattens complex graph connections (e.g., links between Documents, People, and Organizations) into documents in an OpenSearch index.
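
As a rough illustration of the flattening step, the sketch below groups rows in the standard SPARQL 1.1 JSON results format (a list of bindings, each mapping a variable to an object with a "value" key) into one flat record per graph node. The variable names (s, name, author, org) and the output field names are assumptions made for this example; they are not taken from the actual indexer or its data model.

```python
# Hypothetical mapping from SPARQL variable names to multi-valued
# fields in the flattened document. These names are illustrative only.
LINK_FIELDS = {"author": "authors", "org": "organizations"}


def flatten_bindings(bindings):
    """Group SPARQL JSON result rows by subject into one flat dict per node.

    A graph query returns one row per (subject, linked entity) combination,
    so a node with several links appears in several rows. A search index
    wants a single document per node with the links collected into lists.
    """
    docs = {}
    for row in bindings:
        uri = row["s"]["value"]
        doc = docs.setdefault(
            uri,
            {"uri": uri, "name": None, "authors": [], "organizations": []},
        )
        if "name" in row:
            doc["name"] = row["name"]["value"]
        for var, field in LINK_FIELDS.items():
            # Optional variables are simply absent from a row when unbound.
            if var in row and row[var]["value"] not in doc[field]:
                doc[field].append(row[var]["value"])
    return list(docs.values())


# Example rows as they might appear under "results" -> "bindings".
rows = [
    {"s": {"type": "uri", "value": "ex:doc1"},
     "name": {"type": "literal", "value": "A Report"},
     "author": {"type": "uri", "value": "ex:alice"}},
    {"s": {"type": "uri", "value": "ex:doc1"},
     "author": {"type": "uri", "value": "ex:bob"},
     "org": {"type": "uri", "value": "ex:uni"}},
]
for document in flatten_bindings(rows):
    print(document)
```

Each resulting dictionary is a self-contained record that could then be written to the search index, for instance through the OpenSearch bulk API.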

Categorization Logic

To keep the UI organized, the indexer sorts graph nodes into eight primary categories based on their internal model types. A graph node that matches none of these types remains in the Knowledge Graph but is not indexed for the UI.

Category       Model Types / Vocabularies
Datasets       Dataset
Documents      Article, ScholarlyArticle, Book, Chapter, Text, Periodical, Thesis, Report
Software       SoftwareApplication, SoftwareSourceCode
Organizations  Organization
Institutions   Organization, ResearchOrganization, EducationalOrganization, MedicalOrganization, FundingAgency, Corporation, ArchiveOrganization, NGO, GovernmentOrganization
Experts        Person
Events         Event
Instruments    Specialized URIs: SociologicalInstrument (DDI-Discovery) and PhysicalInstrument (DataCite)

Note: While most categories are derived from Schema.org, the Instruments category uses specialized external vocabularies (DDI-Discovery and DataCite) to ensure domain-specific precision. Types such as CreativeWork or Intangible exist within the Knowledge Graph but are not currently indexed for the UI.
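
The categorization rules above can be sketched as a simple lookup table. This is a minimal illustration, not the indexer's actual code: it assumes types are compared by their local name (the part of the URI after the last "/" or "#"), and the function names are hypothetical. Because Organization is listed under both Organizations and Institutions, the sketch returns both categories for such a node; how the real indexer resolves that overlap is not stated here.

```python
# Category -> model type local names, transcribed from the table above.
# Full vocabulary URIs for the Instruments types are not given in the
# source, so only their local names are used here (an assumption).
CATEGORY_TYPES = {
    "Datasets": {"Dataset"},
    "Documents": {"Article", "ScholarlyArticle", "Book", "Chapter",
                  "Text", "Periodical", "Thesis", "Report"},
    "Software": {"SoftwareApplication", "SoftwareSourceCode"},
    "Organizations": {"Organization"},
    "Institutions": {"Organization", "ResearchOrganization",
                     "EducationalOrganization", "MedicalOrganization",
                     "FundingAgency", "Corporation", "ArchiveOrganization",
                     "NGO", "GovernmentOrganization"},
    "Experts": {"Person"},
    "Events": {"Event"},
    "Instruments": {"SociologicalInstrument", "PhysicalInstrument"},
}


def local_name(type_uri):
    """Return the part of a type URI after the last '/' or '#'."""
    return type_uri.replace("#", "/").rstrip("/").rsplit("/", 1)[-1]


def categories_for(type_uris):
    """Return the UI categories a node belongs to, in table order.

    An empty list means the node stays in the Knowledge Graph
    but is not indexed for the UI.
    """
    names = {local_name(t) for t in type_uris}
    return [cat for cat, types in CATEGORY_TYPES.items() if names & types]


print(categories_for(["https://schema.org/Dataset"]))
print(categories_for(["https://schema.org/CreativeWork"]))
```

A node typed only as CreativeWork or Intangible yields an empty list, which corresponds to "kept in the graph, skipped by the indexer".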