Activity 1: Collaborative extension of the HKG data model
Description
During this activity, participants collaboratively proposed extensions to the HKG data model in a two-step process. First, they identified relevant use cases and requirements (user: “what would you want to search for?" / data provider: "what of your information should be in the graph?"); second, they suggested how these could be implemented within the model ("what required detail needs to be added to the HKG data model?"). The model was divided into its main components and distributed across the room, allowing participants to move between stations and contribute where they saw potential for extension. With 13 participants, this resulted in 59 extension suggestions and 30 proposals for new concepts, with most contributions focusing on the Dataset component (19).
Presentation slide
Below you can re-visit the slides presented as instructions during the first group activity.
Slides: Activity - Data Model Extension
Results
The following table presents the transcribed results from the activity. The table is structured to preserve relationships between inputs across phases while keeping the representation concise.
Data - Data Model Extension
| Model Category | Question 1 (what would you want to know?) | Question 2 (how could it be implemented?) | Comment | |
|---|---|---|---|---|
| Document | 1 | Dataset or Software link ?!! | ||
| 2 | Research field / area | systemic overview by re3data! | ||
| 3 | discipline | |||
| 4 | As a user: What type of document is it? | |||
| 5 | funding | values: private personell public | ||
| 6 | Get me all the papers published by scientists at my organization | |||
| 7 | preprint link if already peer reviewed publication | |||
| 8 | open access | How to get access | ||
| 9 | short summary of document | |||
| 0 | keywords | |||
| organizations of co-authors | ||||
| FAIRness score also for software and dataset | ||||
| FAIR | ||||
| AI readinesss | ||||
| Person | 1 | Affiliation - when where | ||
| 2 | age religion | last access | not sure what was meant by this association | |
| 3 | role of the person | |||
| 4 | expertise | |||
| 5 | family name changes etc. | |||
| 6 | how to contact the person? | |||
| 7 | Research field | |||
| 8 | what scientific outputs that persons contributed to? | |||
| 9 | research profile – research directions – research interests | |||
| 10 | Top 10 global collaborators | |||
| 11 | affiliation role (steward PhD Student…) | |||
| 12 | As a provider: how to respect GDPR when I share persons data? | |||
| 12 | As a user: present and past affiliations | |||
| 13 | As a user: how to contact | |||
| How is personal data handled? | stand alone entry on pink post it | |||
| Event | 1 | time | ||
| 2 | place | |||
| 3 | topic | |||
| 4 | vel. Sub event | |||
| 5 | description | |||
| 6 | registration deadline | |||
| 7 | trustable conference? | |||
| Instrument | 1 | Any type of PID (ISIN at least) where to use/how (land/atmosph for…) | ||
| 2 | what can be measured with the instrument? | # keywords | ||
| year of production | ||||
| firmware version | ||||
| 3 | as a user: know specifications of the instrument and the mode it was/is operated | |||
| 4 | what is measured | |||
| 5 | location | |||
| 6 | software version | |||
| 7 | what can be measured? | |||
| 8 | method of measurement | |||
| 9 | ability to access | point of contact email | ||
| which instruments are recorded? | ||||
| Organization | 1 | Location info of the organization | ||
| 2 | PID ROR | |||
| 3 | Year of funding | |||
| 4 | Funding (Public/Private %) | |||
| other | 1 | alternative (P)IDs | ||
| Dataset | 1 | Format | Software to access the data | |
| 2 | Provenance! Where does the data come from / how was it then collected? | Synthetic data? | ||
| 3 | Funding / Affiliation | Number of citations use in Helmholtz | ||
| 4 | Software needed to analyze | |||
| 5 | Provider | |||
| 6 | Link to full metadata record | |||
| 7 | Quality? | |||
| 8 | Time Interval? | |||
| 9 | User: variable names in the dataset | Research field | ||
| quality of measurements/variables | ||||
| units of measurements | ||||
| 10 | Anonym? | |||
| AI readinesss | ||||
| As a user I want to know how the dataset was created . Maybe add example github repo of the corresponding software | ||||
| As a user I want to know the format but I might also to know conversion tools for other formats – list of conversion tools for the dataset | ||||
| Possibilty to suggest changes | ||||
| DataCatalog | 1 | How / If I can find „live“ data e.g. time series databases | A field that clarifies if it is a finished dataset or an growing dataset | |
| 2 | What the interface for harvesting? | |||
| 3 | What kind of queries I can do | |||
| 4 | what metadata schema is used? What information can I retrieve | |||
| 5 | What is the scope? | |||
| 6 | What data does the catalog accept? | |||
| 7 | kind of data stored in the catalog | |||
| 8 | Which formats are stored in the catalog? | |||
| What happens if data(sets) is detected at the sources? | ||||
| Research field | ||||
| Data Formats | ||||
| covered research field | ||||
| Software | 1 | How to handle changes old versions? | ||
| 2 | changelog | |||
| 3 | License? Gitlab | Overview: What am I allowed to do with this SW? | ||
| 4 | Project information – if it was developed for a project etc | Linked publication (paper) | ||
| 5 | Programming language | |||
| 6 | Is it linked with Helmholtz SW Indicator | |||
| 7 | Use of Software | Env: What are the dependencies to run the software? | ||
| Size: Personell use / small institution use / big / in between | ||||
| 8 | Input / Output | |||
| 9 | Contributor | |||
| 10 | Is that SW still maintained? | modification (Edit: time last modified?) | unclear how 2nd relates to 1st | |
| An indicator that SW is alive and in use |
- Column 1 lists the existing data model categories.
- Column 3 (Question 1 / Phase 1) contains initial inputs and ideas.
- Column 4 (Question 2 / Phase 2) shows refined or connected concepts. Entries in the are in the same row as a Q1 item, if they were placed in a connected way on the poster. Where multiple contributions were made for the same Q1 item, it is listed only once.
- Column 5 includes additional comments or clarifications from the transcription process.

Synopsis
What worked well: We are very grateful for the active engagement of all participants. Despite the complexity of discussing a semantic data model with a diverse audience, contributions were thoughtful and constructive throughout. Most input focused on areas of the data model that are already comparatively well-developed. This underlines their importance for the Helmholtz KG while helping to distinguish core elements from more accessory components.
What was challenging: The second part of the activity - modeling specific extensions within the data model - proved challenging for many participants. This reflects the inherent difficulty of balancing generality with usability at this level. Nevertheless, the contributions are valuable as they can help identify broader conceptual connections and inform more specific extensions.
Aspects already being considered in the HKG strategy and development: Several topics raised during the activity align with ongoing development directions of the Helmholtz KG. These include the modeling of Helmholtz research fields (e.g. for deriving system-level indicators), the representation of instruments and their scientific purpose, and the structuring of data catalogs and dataset access as part of future model extensions.
New perspectives and future directions: The activity also highlighted aspects that have so far received less attention, including extending organizational information beyond ROR and more detailed representations of software, such as development histories or changelogs.
