Activity 1: Collaborative extension of the HKG data model

Description

During this activity, participants collaboratively proposed extensions to the HKG data model in a two-step process. First, they identified relevant use cases and requirements (user: “what would you want to search for?" / data provider: "what of your information should be in the graph?"); second, they suggested how these could be implemented within the model ("what required detail needs to be added to the HKG data model?"). The model was divided into its main components and distributed across the room, allowing participants to move between stations and contribute where they saw potential for extension. With 13 participants, this resulted in 59 extension suggestions and 30 proposals for new concepts, with most contributions focusing on the Dataset component (19).

Presentation slide

Below you can re-visit the slides presented as instructions during the first group activity.

Slides: Activity - Data Model Extension

Results

The following table presents the transcribed results from the activity. The table is structured to preserve relationships between inputs across phases while keeping the representation concise.

Data - Data Model Extension

Model Category		Question 1 (what would you want to know?)	Question 2 (how could it be implemented?)	Comment
Document	1	Dataset or Software link ?!!
	2	Research field / area	systemic overview by re3data!
	3	discipline
	4	As a user: What type of document is it?
	5	funding	values: private personell public
	6	Get me all the papers published by scientists at my organization
	7	preprint link if already peer reviewed publication
	8	open access	How to get access
	9	short summary of document
	0	keywords
			organizations of co-authors
			FAIRness score also for software and dataset
			FAIR
			AI readinesss
Person	1	Affiliation - when where
	2	age religion	last access	not sure what was meant by this association
	3	role of the person
	4	expertise
	5	family name changes etc.
	6	how to contact the person?
	7	Research field
	8	what scientific outputs that persons contributed to?
	9	research profile – research directions – research interests
	10	Top 10 global collaborators
	11	affiliation role (steward PhD Student…)
	12	As a provider: how to respect GDPR when I share persons data?
	12	As a user: present and past affiliations
	13	As a user: how to contact
			How is personal data handled?	stand alone entry on pink post it
Event	1	time
	2	place
	3	topic
	4	vel. Sub event
	5	description
	6	registration deadline
	7	trustable conference?
Instrument	1	Any type of PID (ISIN at least) where to use/how (land/atmosph for…)
	2	what can be measured with the instrument?	# keywords
			year of production
			firmware version
	3	as a user: know specifications of the instrument and the mode it was/is operated
	4	what is measured
	5	location
	6	software version
	7	what can be measured?
	8	method of measurement
	9	ability to access	point of contact email
			which instruments are recorded?
Organization	1	Location info of the organization
	2	PID ROR
	3	Year of funding
	4	Funding (Public/Private %)
other	1	alternative (P)IDs
Dataset	1	Format	Software to access the data
	2	Provenance! Where does the data come from / how was it then collected?	Synthetic data?
	3	Funding / Affiliation	Number of citations use in Helmholtz
	4	Software needed to analyze
	5	Provider
	6	Link to full metadata record
	7	Quality?
	8	Time Interval?
	9	User: variable names in the dataset	Research field
			quality of measurements/variables
			units of measurements
	10	Anonym?
			AI readinesss
			As a user I want to know how the dataset was created . Maybe add example github repo of the corresponding software
			As a user I want to know the format but I might also to know conversion tools for other formats – list of conversion tools for the dataset
			Possibilty to suggest changes
DataCatalog	1	How / If I can find „live“ data e.g. time series databases	A field that clarifies if it is a finished dataset or an growing dataset
	2	What the interface for harvesting?
	3	What kind of queries I can do
	4	what metadata schema is used? What information can I retrieve
	5	What is the scope?
	6	What data does the catalog accept?
	7	kind of data stored in the catalog
	8	Which formats are stored in the catalog?
			What happens if data(sets) is detected at the sources?
			Research field
			Data Formats
			covered research field
Software	1	How to handle changes old versions?
	2	changelog
	3	License? Gitlab	Overview: What am I allowed to do with this SW?
	4	Project information – if it was developed for a project etc	Linked publication (paper)
	5	Programming language
	6	Is it linked with Helmholtz SW Indicator
	7	Use of Software	Env: What are the dependencies to run the software?
			Size: Personell use / small institution use / big / in between
	8	Input / Output
	9	Contributor
	10	Is that SW still maintained?	modification (Edit: time last modified?)	unclear how 2nd relates to 1st
			An indicator that SW is alive and in use

info

Column 1 lists the existing data model categories.
Column 3 (Question 1 / Phase 1) contains initial inputs and ideas.
Column 4 (Question 2 / Phase 2) shows refined or connected concepts. Entries in the are in the same row as a Q1 item, if they were placed in a connected way on the poster. Where multiple contributions were made for the same Q1 item, it is listed only once.
Column 5 includes additional comments or clarifications from the transcription process.

Synopsis

What worked well: We are very grateful for the active engagement of all participants. Despite the complexity of discussing a semantic data model with a diverse audience, contributions were thoughtful and constructive throughout. Most input focused on areas of the data model that are already comparatively well-developed. This underlines their importance for the Helmholtz KG while helping to distinguish core elements from more accessory components.

What was challenging: The second part of the activity - modeling specific extensions within the data model - proved challenging for many participants. This reflects the inherent difficulty of balancing generality with usability at this level. Nevertheless, the contributions are valuable as they can help identify broader conceptual connections and inform more specific extensions.

Aspects already being considered in the HKG strategy and development: Several topics raised during the activity align with ongoing development directions of the Helmholtz KG. These include the modeling of Helmholtz research fields (e.g. for deriving system-level indicators), the representation of instruments and their scientific purpose, and the structuring of data catalogs and dataset access as part of future model extensions.

New perspectives and future directions: The activity also highlighted aspects that have so far received less attention, including extending organizational information beyond ROR and more detailed representations of software, such as development histories or changelogs.

Description​

Presentation slide​

Results​

Synopsis​

Description

Presentation slide

Results

Synopsis