# Using DaRUS via API

This notebook introduces how DaRUS can be accessed through Application Programming Interfaces (APIs). This is helpful when work with DaRUS needs to be automated, and it allows other programs and scripts to be connected to the data repository.

We will look at two ways of working with the APIs: one using [curl](https://curl.se/), the other using a Python library called [pyDataverse](https://github.com/gdcc/pyDataverse).

## Preparation

Assuming that curl is already installed on the system, we need some environment variables for later use. If you don't have an [API token](https://guides.dataverse.org/en/latest/api/auth.html) yet, you can create one: after logging in to DemoDaRUS / DaRUS, click on your account name in the navbar and select "API Token" from the dropdown menu. In this tab, click "Create Token". A common mistake is that the server and the API token do not match, so make sure you use the API token that belongs to the DaRUS instance you are working with.

We also set these variables in Python. Since pyDataverse is not part of this Jupyter installation, it has to be installed first.

curl

In [1]:
# for production use https://darus.uni-stuttgart.de
%env SERVER_URL=https://demodarus.izus.uni-stuttgart.de
# the API token represents your login (password), so keep it secret
%env API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx


env: SERVER_URL=https://demodarus.izus.uni-stuttgart.de
env: API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx


pyDataverse

In [4]:
SERVER_URL="https://demodarus.izus.uni-stuttgart.de"
API_TOKEN="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
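
Because the token acts like a password, it can be preferable not to type it into a cell at all. The following is a minimal sketch, assuming the variables `SERVER_URL` and `API_TOKEN` were already set in the environment (for example with the `%env` magic shown above); the helper name `load_darus_config` is hypothetical and not part of pyDataverse.

```python
import os


def load_darus_config(env=None):
    """Return (server_url, api_token) read from a mapping of environment variables.

    Hypothetical helper: defaults to os.environ, strips a trailing slash
    from the server URL, and fails early if a value is missing.
    """
    env = os.environ if env is None else env
    server_url = env.get("SERVER_URL", "").rstrip("/")
    api_token = env.get("API_TOKEN", "")
    if not server_url or not api_token:
        raise RuntimeError("Set SERVER_URL and API_TOKEN before calling the API.")
    return server_url, api_token


# Example with an explicit mapping (placeholder values, not a real token)
example_env = {
    "SERVER_URL": "https://demodarus.izus.uni-stuttgart.de/",
    "API_TOKEN": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
}
example_server, example_token = load_darus_config(example_env)
```

Calling `load_darus_config()` without arguments reads from the process environment, which should also cover variables set with `%env` in the same kernel.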

In [3]:
! pip install pyDataverse

Collecting pyDataverse
 Downloading pyDataverse-0.3.1-py3-none-any.whl (32 kB)
Installing collected packages: pyDataverse
Successfully installed pyDataverse-0.3.1


## Creating new datasets

### Option 1: Based on a simple example JSON file
The Dataverse documentation provides a simple JSON file that can serve as a basis for creating datasets via the API. You can download the file with wget:

In [2]:
! wget https://guides.dataverse.org/en/5.5/_downloads/fc56af1c414df69fd4721ce3629f0c03/dataset-finch1.json

--2022-05-30 07:53:04-- https://guides.dataverse.org/en/5.5/_downloads/fc56af1c414df69fd4721ce3629f0c03/dataset-finch1.json
Resolving guides.dataverse.org (guides.dataverse.org)... 18.213.227.1
Connecting to guides.dataverse.org (guides.dataverse.org)|18.213.227.1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2346 (2.3K) [application/json]
Saving to: ‘dataset-finch1.json’


2022-05-30 07:53:05 (359 MB/s) - ‘dataset-finch1.json’ saved [2346/2346]



Inspect it with an editor and change it as you like. For automation you can use, e.g., [jq](https://stedolan.github.io/jq/) in bash or the json library in Python.
Once the file is ready, upload it to the server. For that, you need the id (alias) of the dataverse in which the dataset should reside. You can read it from the URL of the dataverse.
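
As a small illustration of reading the dataverse id from its URL, here is a sketch that assumes the page URL has the form `<server>/dataverse/<id>`. The function name `dataverse_alias_from_url` is hypothetical, and the example URL simply combines the demo server with the id used in this notebook.

```python
from urllib.parse import urlparse


def dataverse_alias_from_url(url):
    """Return the dataverse id from a URL of the assumed form <server>/dataverse/<id>."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) == 2 and parts[0] == "dataverse":
        return parts[1]
    raise ValueError(f"Not a recognised dataverse URL: {url}")


example_alias = dataverse_alias_from_url(
    "https://demodarus.izus.uni-stuttgart.de/dataverse/fokus_hod"
)
```

The resulting value is what the cells below store in `PARENT`.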



curl

In [2]:
%%bash
export PARENT=fokus_hod
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" -X POST "$SERVER_URL/api/dataverses/$PARENT/datasets" --upload-file dataset-finch1.json


{"status":"OK","data":{"id":8038,"persistentId":"doi:10.15770/darus-1315"}}

 % Total % Received % Xferd Average Speed Time Time Time Current
 Dload Upload Total Spent Left Speed
100 2380 100 75 100 2305 83 2567 --:--:-- --:--:-- --:--:-- 2650


pyDataverse

In [5]:
from pyDataverse.api import NativeApi
dataset_example = open("dataset-finch1.json").read()
PARENT="fokus_hod"
api = NativeApi(SERVER_URL, API_TOKEN)
resp = api.create_dataset(PARENT, dataset_example)
resp.json()

Dataset with pid 'doi:10.15770/darus-1316' created.


{'status': 'OK',
 'data': {'id': 8039, 'persistentId': 'doi:10.15770/darus-1316'}}

In [6]:
# an example of changing the title programmatically
import json
metadata = json.load(open("dataset-finch1.json"))
fields = metadata["datasetVersion"]["metadataBlocks"]["citation"]["fields"]
for field in fields:
    if field["typeName"] == 'title':
        field["value"] = 'The title of a dataset should be as specific as possible'
        break
resp = api.create_dataset(PARENT, json.dumps(metadata))
resp.json()

Dataset with pid 'doi:10.15770/darus-1317' created.


{'status': 'OK',
 'data': {'id': 8040, 'persistentId': 'doi:10.15770/darus-1317'}}
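
In scripts it is usually worth checking the response before continuing. The sketch below works on the parsed JSON (the dictionary returned by `resp.json()`) and relies on the response shape shown in the outputs above. The function name `created_dataset_pid` is hypothetical, and the `message` key for failed requests is an assumption about the error format.

```python
def created_dataset_pid(payload):
    """Return the persistent identifier from a parsed 'create dataset' response.

    Raises RuntimeError if the status is not "OK".
    """
    if payload.get("status") != "OK":
        # Assumption: error responses carry a human-readable "message"
        raise RuntimeError(f"Dataset creation failed: {payload.get('message', payload)}")
    return payload["data"]["persistentId"]


# Example using the response shown above
example_payload = {
    "status": "OK",
    "data": {"id": 8040, "persistentId": "doi:10.15770/darus-1317"},
}
example_pid = created_dataset_pid(example_payload)
```

In the notebook this could be called as `created_dataset_pid(resp.json())`, and the result reused as the identifier in later cells.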

### Option 2: Based on an already existing dataset
If there is already a well-described dataset that you want to use as a basis, you can start from there. Download its JSON representation, modify it, and create a new dataset as done above.

curl

In [15]:
%%bash
export PERSISTENT_IDENTIFIER=doi:10.15770/darus-1312
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" "$SERVER_URL/api/datasets/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER" | json_pp > "existing_dataset.json" 

 % Total % Received % Xferd Average Speed Time Time Time Current
 Dload Upload Total Spent Left Speed
100 1614 100 1614 0 0 24437 0 --:--:-- --:--:-- --:--:-- 24830


Extract the "latestVersion" element from the downloaded JSON file, rename it to "datasetVersion", and delete every child element except "license", "metadataBlocks" and "termsOfUse". **Note:** the contact email is removed during export. Since it is a required metadata field for every dataset, you have to add it again inside the contact field of the citation metadata block.

```json
"datasetContactEmail" : {
    "multiple" : false,
    "typeClass" : "primitive",
    "typeName" : "datasetContactEmail",
    "value" : "your-email@example.com"
}
```

In [9]:
%%bash
export PARENT=fokus_hod
curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/dataverses/$PARENT/datasets" --upload-file existing_dataset.json

{"status":"OK","data":{"id":8041,"persistentId":"doi:10.15770/darus-1318"}}

 % Total % Received % Xferd Average Speed Time Time Time Current
 Dload Upload Total Spent Left Speed
100 3284 100 75 100 3209 69 2982 0:00:01 0:00:01 --:--:-- 3054


pyDataverse

In [16]:
import json
metadata = json.load(open("existing_dataset.json"))
latestVersion = metadata["data"]["latestVersion"]
new_metadata = {"datasetVersion":
    {"license": latestVersion["license"],
     "metadataBlocks": latestVersion["metadataBlocks"],
     "termsOfUse": latestVersion["termsOfUse"]}}
fields = new_metadata["datasetVersion"]["metadataBlocks"]["citation"]["fields"]
for field in fields:
    if field["typeName"] == 'datasetContact':
        field["value"][0]["datasetContactEmail"] = {
            "multiple" : False,
            "typeClass" : "primitive",
            "typeName" : "datasetContactEmail",
            "value" : "your-email@example.com"}
    elif field["typeName"] == 'title':
        field["value"] = 'Example dataset for joint simulation and experimental data created with pyDataverse'
 
resp = api.create_dataset(PARENT, json.dumps(new_metadata))
resp.json()

Dataset with pid 'doi:10.15770/darus-1319' created.


{'status': 'OK',
 'data': {'id': 8042, 'persistentId': 'doi:10.15770/darus-1319'}}

## Add metadata

It is also possible to add or update the metadata of an existing dataset. This can be useful when several people are working on a dataset: one person might create the dataset in the web interface using a template, while another adds specific metadata programmatically, for example based on log file information.

The JSON file you need has a simpler structure than the full representation of a dataset. Include only the fields you want to add to the dataset, e.g.,
```json
{"fields": [
    {"typeName" : "processSoftware",
     "value" : [
        {"processSoftwareName" : {
            "typeName" : "processSoftwareName",
            "value" : "Acquisition software X"}
        }]
    }]
}
```
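
The curl call in this section uploads a file named `dataset-edit-metadata-sample.json`. One possible way to produce that file from Python is sketched here, assuming it should contain exactly the structure shown above.

```python
import json
from pathlib import Path

# Same structure as the JSON example above
additional_metadata = {
    "fields": [
        {
            "typeName": "processSoftware",
            "value": [
                {
                    "processSoftwareName": {
                        "typeName": "processSoftwareName",
                        "value": "Acquisition software X",
                    }
                }
            ],
        }
    ]
}

# File name taken from the curl command in this section
out_path = Path("dataset-edit-metadata-sample.json")
out_path.write_text(json.dumps(additional_metadata, indent=2), encoding="utf-8")
```

Building the structure as a Python dictionary and serialising it with `json.dumps` avoids hand-written syntax errors such as missing brackets.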

curl

In [None]:
%%bash
export PERSISTENT_IDENTIFIER=doi:10.15770/darus-XXXX
curl -H "X-Dataverse-key: $API_TOKEN" -X PUT "$SERVER_URL/api/datasets/:persistentId/editMetadata/?persistentId=$PERSISTENT_IDENTIFIER" --upload-file dataset-edit-metadata-sample.json

pyDataverse

In [None]:
import json
additional_metadata = {"fields": [
    {"typeName" : "processSoftware",
     "value" : [
        {"processSoftwareName" : {
            "typeName" : "processSoftwareName",
            "value" : "Simulation software X"}
        }]
    }]
}
PID="doi:10.15770/darus-XXXX"
resp = api.edit_dataset_metadata(PID, json.dumps(additional_metadata))
resp.json()

## Upload files

Finally, let's add data to a dataset. The approach shown here may not work for files larger than 10 GB. If you run into problems, get in touch with the DaRUS team.
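
A script that uploads many files could check the file size beforehand. This is only a sketch: the helper name `fits_simple_upload` is hypothetical, and interpreting "10 GB" as 10 * 1024**3 bytes is an assumption, since the exact limit is not specified here.

```python
import os

# Assumption: "10 GB" interpreted as 10 * 1024**3 bytes
SIZE_LIMIT_BYTES = 10 * 1024**3


def fits_simple_upload(path, limit=SIZE_LIMIT_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

Files for which this returns `False` are candidates for contacting the DaRUS team before trying the upload.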


curl

In [None]:
%%bash

export PERSISTENT_ID=doi:10.15770/darus-xxxx
export FILENAME=simulation.dat
curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F "file=@$FILENAME" -F 'jsonData={"description":"Simulation raw data of a random process using the os.urandom function in Python","directoryLabel":"data/","categories":["Simulation"], "restrict":"false"}' "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_ID"

pyDataverse

In [None]:
import json
PID = "doi:10.15770/darus-xxxx"
filename="random-structure.png"
metadata = {"description": "By jaacker on Pixabay", "restrict": False}
resp = api.upload_datafile(PID, filename, json_str=json.dumps(metadata), is_pid=True)
resp.json()

In [None]:
# pyDataverse does not send the correct MIME type, so let Dataverse redetect it using the dataFile id from the last output
resp = api.redetect_file_type(identifier='7609', is_pid=False, dry_run=False)
resp.json()
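
Instead of copying the dataFile id by hand, it could be read from the parsed upload response. The sketch below assumes the response lists the uploaded files under `data` → `files`, each with a `dataFile` object containing an `id`. The function name `uploaded_file_ids` and the example payload are illustrative, not real server output.

```python
def uploaded_file_ids(payload):
    """Return the dataFile ids from a parsed 'add file' response.

    Assumes the layout {"status": "OK", "data": {"files": [{"dataFile": {"id": ...}}]}}.
    """
    if payload.get("status") != "OK":
        raise RuntimeError(f"Upload failed: {payload.get('message', payload)}")
    return [entry["dataFile"]["id"] for entry in payload["data"]["files"]]


# Illustrative payload, reduced to the keys used here
example_upload = {
    "status": "OK",
    "data": {"files": [{"label": "random-structure.png", "dataFile": {"id": 7609}}]},
}
example_ids = uploaded_file_ids(example_upload)
```

The first id from `uploaded_file_ids(resp.json())`, taken right after the upload cell, could then be passed as `identifier` to `redetect_file_type`.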

## Further reading
The topics and examples in this tutorial are only the tip of the iceberg of what you can do with Dataverse's APIs. You should now be familiar with how they work in principle and can study the [official documentation](https://guides.dataverse.org/en/5.5/api/index.html) further. Make sure that the version of the documentation matches the version of DaRUS. You can see the current version in the footer on the right of each DaRUS page.
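
The version can also be queried through the API at `$SERVER_URL/api/info/version`. The sketch below builds the matching documentation link from such a response. It assumes the parsed response carries the version string under `data` → `version` and that the guides follow the URL pattern of the link above; the function name `api_guide_url` and the example payload are hypothetical.

```python
def api_guide_url(version_payload):
    """Build the API guide URL for the version reported by the server.

    Assumes a parsed response of the form {"status": "OK", "data": {"version": "5.5"}}.
    """
    version = version_payload["data"]["version"]
    return f"https://guides.dataverse.org/en/{version}/api/index.html"


# Illustrative payload, not real server output
example_guide = api_guide_url({"status": "OK", "data": {"version": "5.5"}})
```

If the server reports a version without a matching guide, the link will not resolve, so it is worth opening it once to check.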

Besides further API calls, there is also a list of other [client libraries](https://guides.dataverse.org/en/5.5/api/client-libraries.html) that might be of interest to you.