Skip to main content

Activity 1: Collaborative extension of the HKG data model

Description

During this activity, participants collaboratively proposed extensions to the HKG data model in a two-step process. First, they identified relevant use cases and requirements (user: “what would you want to search for?" / data provider: "what of your information should be in the graph?"); second, they suggested how these could be implemented within the model ("what required detail needs to be added to the HKG data model?"). The model was divided into its main components and distributed across the room, allowing participants to move between stations and contribute where they saw potential for extension. With 13 participants, this resulted in 59 extension suggestions and 30 proposals for new concepts, with most contributions focusing on the Dataset component (19).

Presentation slide

Below you can re-visit the slides presented as instructions during the first group activity.

Slides: Activity - Data Model Extension

This browser cannot show the PDF inline. Open Slides: Activity - Data Model Extension.

Results

The following table presents the transcribed results from the activity. The table is structured to preserve relationships between inputs across phases while keeping the representation concise.

Data - Data Model Extension
Model CategoryQuestion 1 (what would you want to know?)Question 2 (how could it be implemented?) Comment
Document1Dataset or Software link ?!!
2Research field / areasystemic overview by re3data!
3discipline
4As a user: What type of document is it?
5fundingvalues: private personell public
6Get me all the papers published by scientists at my organization
7preprint link if already peer reviewed publication
8open accessHow to get access
9short summary of document
0keywords
organizations of co-authors
FAIRness score also for software and dataset
FAIR
AI readinesss
Person1Affiliation - when where
2age religionlast accessnot sure what was meant by this association
3role of the person
4expertise
5family name changes etc.
6how to contact the person?
7Research field
8what scientific outputs that persons contributed to?
9research profile – research directions – research interests
10Top 10 global collaborators
11affiliation role (steward PhD Student…)
12As a provider: how to respect GDPR when I share persons data?
12As a user: present and past affiliations
13As a user: how to contact
How is personal data handled?stand alone entry on pink post it
Event1time
2place
3topic
4vel. Sub event
5description
6registration deadline
7trustable conference?
Instrument1Any type of PID (ISIN at least) where to use/how (land/atmosph for…)
2what can be measured with the instrument?# keywords
year of production
firmware version
3as a user: know specifications of the instrument and the mode it was/is operated
4what is measured
5location
6software version
7what can be measured?
8method of measurement
9ability to accesspoint of contact email
which instruments are recorded?
Organization1Location info of the organization
2PID ROR
3Year of funding
4Funding (Public/Private %)
other1alternative (P)IDs
Dataset1FormatSoftware to access the data
2Provenance! Where does the data come from / how was it then collected?Synthetic data?
3Funding / Affiliation Number of citations use in Helmholtz
4Software needed to analyze
5Provider
6Link to full metadata record
7Quality?
8Time Interval?
9User: variable names in the datasetResearch field
quality of measurements/variables
units of measurements
10Anonym?
AI readinesss
As a user I want to know how the dataset was created . Maybe add example github repo of the corresponding software
As a user I want to know the format but I might also to know conversion tools for other formats – list of conversion tools for the dataset
Possibilty to suggest changes
DataCatalog1How / If I can find „live“ data e.g. time series databasesA field that clarifies if it is a finished dataset or an growing dataset
2What the interface for harvesting?
3What kind of queries I can do
4what metadata schema is used? What information can I retrieve
5What is the scope?
6What data does the catalog accept?
7kind of data stored in the catalog
8Which formats are stored in the catalog?
What happens if data(sets) is detected at the sources?
Research field
Data Formats
covered research field
Software1How to handle changes old versions?
2changelog
3License? GitlabOverview: What am I allowed to do with this SW?
4Project information – if it was developed for a project etcLinked publication (paper)
5Programming language
6Is it linked with Helmholtz SW Indicator
7Use of SoftwareEnv: What are the dependencies to run the software?
Size: Personell use / small institution use / big / in between
8Input / Output
9Contributor
10Is that SW still maintained?modification (Edit: time last modified?)unclear how 2nd relates to 1st
An indicator that SW is alive and in use
info
  • Column 1 lists the existing data model categories.
  • Column 3 (Question 1 / Phase 1) contains initial inputs and ideas.
  • Column 4 (Question 2 / Phase 2) shows refined or connected concepts. Entries in the are in the same row as a Q1 item, if they were placed in a connected way on the poster. Where multiple contributions were made for the same Q1 item, it is listed only once.
  • Column 5 includes additional comments or clarifications from the transcription process.

Synopsis

What worked well: We are very grateful for the active engagement of all participants. Despite the complexity of discussing a semantic data model with a diverse audience, contributions were thoughtful and constructive throughout. Most input focused on areas of the data model that are already comparatively well-developed. This underlines their importance for the Helmholtz KG while helping to distinguish core elements from more accessory components.

What was challenging: The second part of the activity - modeling specific extensions within the data model - proved challenging for many participants. This reflects the inherent difficulty of balancing generality with usability at this level. Nevertheless, the contributions are valuable as they can help identify broader conceptual connections and inform more specific extensions.

Aspects already being considered in the HKG strategy and development: Several topics raised during the activity align with ongoing development directions of the Helmholtz KG. These include the modeling of Helmholtz research fields (e.g. for deriving system-level indicators), the representation of instruments and their scientific purpose, and the structuring of data catalogs and dataset access as part of future model extensions.

New perspectives and future directions: The activity also highlighted aspects that have so far received less attention, including extending organizational information beyond ROR and more detailed representations of software, such as development histories or changelogs.