Quality Assurance through Data Curation

Every data set and software publication on DaRUS undergoes a data curation process to ensure quality and connectivity.

Quality Criteria in the Curation Process

Each dataset undergoes a formal quality assurance process, during which the DaRUS team checks the data, metadata and license, and revises and enriches them together with the authors to ensure the findability, accessibility, interoperability and reusability of the data, software and workflows.


Is the structure of the data set sufficiently described? Will it be clear to subsequent users which files in the data set have which role?

  • Check the descriptive metadata (description, notes) and file metadata (folder structure, file descriptions) for elements that provide and explain the structure

Is the content of the data set sufficiently described? Is there information on the context in which the data was created, the research object under consideration and, if applicable, the variables collected?

  • Check the descriptive metadata (description), the process and engineering metadata, the linking metadata and descriptive files for elements that explain the content
  • Transfer of unstructured information (e.g. from README files) into structured metadata fields

Are longer texts in the descriptive metadata fields (description, notes) sufficiently structured so that they are easy to read and use?

  • Use of HTML elements to make links clickable, structure bulleted lists and make text breaks visible
  • Check for spelling
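
As an illustration of the HTML structuring mentioned above, the following minimal sketch turns a plain-text description into simple HTML with paragraphs, line breaks, bulleted lists and clickable links. The function names and the "- " list convention are assumptions made for this example, and it assumes the metadata field accepts the tags `<p>`, `<br>`, `<ul>`, `<li>` and `<a>`.

```python
import html
import re

# A URL ends at whitespace or "<"; the last character must not be punctuation,
# so a sentence-final "." is not swallowed into the link.
URL_PATTERN = re.compile(r"https?://[^\s<]+[^\s<.,;:!?)]")


def linkify(escaped_text):
    """Wrap every URL in already-escaped text in an anchor element."""
    return URL_PATTERN.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', escaped_text
    )


def description_to_html(text):
    """Convert blank-line separated blocks into <p> or <ul> elements."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    parts = []
    for block in blocks:
        lines = block.splitlines()
        if all(line.lstrip().startswith("- ") for line in lines):
            # Every line starts with "- ": treat the block as a bulleted list.
            items = "".join(
                f"<li>{linkify(html.escape(line.lstrip()[2:]))}</li>"
                for line in lines
            )
            parts.append(f"<ul>{items}</ul>")
        else:
            # Ordinary paragraph: keep the line breaks visible with <br>.
            body = "<br>".join(linkify(html.escape(line)) for line in lines)
            parts.append(f"<p>{body}</p>")
    return "\n".join(parts)
```

Escaping the text before inserting tags keeps characters such as `<` and `&` from being interpreted as markup.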


Are persistent identifiers specified for persons and publications?

  • Addition of ORCID or GND number for authors
  • Addition of DOIs for publications
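
A formal check of an ORCID iD before it is entered can catch typing errors. The sketch below verifies the format and the check character, which ORCID computes with the ISO 7064 MOD 11-2 algorithm. The function names are chosen for this example only. The check says nothing about whether the iD is registered or belongs to the named author, and GND numbers are not covered.

```python
import re

ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")
ORCID_URL_PREFIX = "https://orcid.org/"


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the string is a well-formed ORCID iD with a correct check character."""
    candidate = orcid.strip()
    if candidate.lower().startswith(ORCID_URL_PREFIX):
        candidate = candidate[len(ORCID_URL_PREFIX):]
    candidate = candidate.upper()
    if not ORCID_PATTERN.match(candidate):
        return False
    digits = candidate.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]
```

For example, `is_valid_orcid("0000-0002-1825-0097")` is true, while changing the last character to `8` makes the check fail.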

Are terms from controlled vocabularies, terminology or classifications used in the subject indexing information (keywords, subject classification)?

  • Linking of subject headings with controlled vocabularies (using subject-specific ontology portals such as the terminology services of the NFDI consortia, Wikidata and library classifications such as LCSH, GND)

Does the data set belong to a text publication? Is it related to other data publications?

  • Linking with text and data publications in a human- and machine-readable way via persistent identifiers (usually DOI) to support the idea of a PID graph
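
Machine-readable linking works best when a DOI is always written in the same form. The following sketch, whose function name and prefix list are assumptions for this example, normalizes common spellings of a DOI to the `https://doi.org/` URL form. The pattern is a widely used heuristic for modern DOIs and does not check whether the DOI is actually registered.

```python
import re

# Heuristic pattern for modern DOIs: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Return the DOI as an https://doi.org/ URL, or None if it does not look like a DOI."""
    value = raw.strip()
    lowered = value.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(value):
        return None
    return "https://doi.org/" + value
```

With this, `doi:10.1000/182` and `https://dx.doi.org/10.1000/182` both become `https://doi.org/10.1000/182`.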

Do all links used resolve and point to a valid resource?

  • Check all existing links
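
A link check of this kind can be partly automated. The sketch below collects the URLs from a metadata text and reports those that cannot be reached or that answer with an HTTP error status. The function names and the `status_of` parameter are assumptions for this example. Some servers reject HEAD requests, so reported links should still be reviewed by hand.

```python
import re
import urllib.error
import urllib.request

# A URL ends at whitespace, quotes or angle brackets; trailing punctuation is excluded.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+[^\s<>\"'.,;:!?)]")


def extract_links(text):
    """Return the distinct URLs found in the text, sorted alphabetically."""
    return sorted(set(URL_PATTERN.findall(text)))


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if the server is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except OSError:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return None


def find_broken_links(text, status_of=http_status):
    """Return (url, status) pairs for links that are unreachable or return status >= 400."""
    broken = []
    for url in extract_links(text):
        status = status_of(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Passing the status lookup as a parameter makes it possible to test the logic without network access.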


Is information available on which steps must be taken to reproduce the data in the data set?

  • Check the descriptive metadata fields (description, notes) and README files

Are the quantities contained in the data documented with the units used (e.g. by specifying them in the engineering metadata, in the description or in a README)? Is the tabular data available in a standardized archive format (such as .csv, .hdf5) for which a previewer is available?

  • Checking the files for tabular data
  • Support for conversion to archive formats
  • Checking the file metadata
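
Part of the check on tabular files can be sketched in code. The example below reads a CSV file and reports columns whose header carries no unit and rows whose number of fields differs from the header. The convention of writing units in square brackets in the header, as in `pressure [Pa]`, is an assumption made for this example and not a DaRUS requirement. Dimensionless columns would need to be exempted.

```python
import csv
import io
import re

# Assumed convention: the unit is given in square brackets at the end of the column name.
UNIT_IN_HEADER = re.compile(r"\[[^\]]+\]\s*$")


def check_table(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    for name in header:
        if not UNIT_IN_HEADER.search(name):
            problems.append(f"column '{name}' has no unit in its header")
    # Data rows start on line 2 of the file.
    for number, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {number} has {len(row)} fields, expected {len(header)}"
            )
    return problems
```

An empty result list means that neither of the two checks found a problem. It does not show that the units themselves are correct.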

Does the data set contain source code of research software or code components?

  • Check details of the programming language used, dependencies or further information in the CodeMeta metadata block
  • Archive and link the code with Software Heritage, if the software is also available in a publicly accessible software repository
  • Check the issued license for compatibility
  • Support for linking code repositories back to the data set
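
If the software ships with a `codemeta.json` file, a first completeness check can be scripted. The sketch below lists which of a few CodeMeta terms are missing or empty. The selection of terms and the function name are assumptions for this example. The CodeMeta metadata block on DaRUS may use different field names.

```python
import json

# CodeMeta terms covering language, dependencies, license and repository link.
RECOMMENDED_TERMS = (
    "name",
    "programmingLanguage",
    "softwareRequirements",
    "license",
    "codeRepository",
)


def missing_codemeta_terms(codemeta_json):
    """Return the recommended terms that are absent or empty in the CodeMeta record."""
    record = json.loads(codemeta_json)
    return [term for term in RECOMMENDED_TERMS if not record.get(term)]
```

A record that names only the software, its programming language and its license would be reported as lacking `softwareRequirements` and `codeRepository`.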

Are details of the methods, instruments, environments and software used available?

  • Check and, if necessary, supplement the descriptive process metadata


Is the data stored in a file format that is open, long-term readable and/or common in the community?

  • Support for converting proprietary formats into archive formats (e.g. csv files or hdf5 for tabular data, TIFF for images, pdf for documents)
  • Preparation of data for optimal use with previewers

Is a long-term contact address provided?

  • Check the contact address(es) provided
  • Recommendation for suitable contact addresses (functional addresses, long-term employees)

Legal Security

Is an existing and correctly linked license assigned?

  • Advice on selecting a suitable license for the data contained (e.g. CC licenses for data or software licenses for software)
  • Check for existing license information and license conflicts

The information provided by the DaRUS team does not constitute legal advice.

Do the title and/or description indicate that the data/software may be subject to export control?

  • Note on possible export control issues
  • Referral to the export control officer of the University of Stuttgart

The information provided by the DaRUS team does not constitute legal advice.

Do the title, description and/or data suggest that the data has a personal reference? If so, has it been clarified and documented in the privacy metadata block whether the data has been sufficiently anonymized/pseudonymized and released for publication?

  • Check the files and metadata
  • If necessary, refer to the central data protection authority Zendas

The information provided by the DaRUS team does not constitute legal advice.

