My goal is to create a copy of all the DELPHI data for Open Science II. I am using cernopendata-client to handle the downloads, and I ran into two errors that I believe are not caused by my setup.
First error:
$ cernopendata-client download-files --doi 10.7483/OPENDATA.DELPHI.6LIH.7UJA --verify --protocol xrootd
==> Downloading file 1 of 4
-> File: ./None/hzha03pyth6156_hattbb_206.5_70_90_22432.xsdst
==> ERROR: Please provide at least one of the following arguments: (recid, doi, title)
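Note that the target directory is ./None/, which suggests the record ID is not resolved from the DOI. I can work around this by passing the recid instead, which brings me to the second error: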
$ cernopendata-client download-files --recid 93773 --verify --protocol xrootd
==> Downloading file 1 of 4
-> File: ./93773/hzha03pyth6156_hattbb_206.5_70_90_22432.xsdst
==> Verifying file hzha03pyth6156_hattbb_206.5_70_90_22432.xsdst…
-> Expected size 29291520, found 29291520
-> Expected checksum adler32:03d9681c, found adler32:3d9681c
==> ERROR: File checksum does not match.
This error comes from cernopendata_client/verifier.py (commit fc54c028682d149cb81b30532fddaa0bdef5a3ac of the cernopendata/cernopendata-client repository on GitHub), where the locally computed checksum is formatted with Python's hex() function. hex() drops leading zeros, but the checksum reported by the server keeps them (03d9681c versus 3d9681c above). An Adler-32 value is written as 8 hex digits, and its leading digit is 0 for roughly 1 value in 16, so, assuming the checksums are roughly uniformly distributed, about 6% of all files will fail verification. I am not sure whether this behaviour is intended for other use cases; if not, I would be happy to open a pull request.
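A minimal sketch of the fix I have in mind, assuming the comparison is done on the "adler32:<hex>" strings shown in the log. The function names here are hypothetical and not taken from verifier.py: the idea is simply to zero-pad the digest to 8 hex digits (f"{value:08x}") instead of using hex(), or to compare the digests as integers so leading zeros do not matter.

```python
import zlib
from typing import Iterable


def adler32_of_chunks(chunks: Iterable[bytes]) -> str:
    """Adler-32 over a stream of byte chunks, as 8 zero-padded lowercase hex digits."""
    value = 1  # Adler-32 starts from 1, which is also zlib's default
    for chunk in chunks:
        value = zlib.adler32(chunk, value)
    # Zero-pad to 8 digits so that e.g. 0x03d9681c stays "03d9681c"
    return f"{value & 0xFFFFFFFF:08x}"


def adler32_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Checksum a (possibly large) file without loading it into memory at once."""
    with open(path, "rb") as handle:
        return adler32_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def checksum_matches(expected: str, actual_hex: str) -> bool:
    """Compare an "adler32:<hex>" string against a locally computed hex digest."""
    algorithm, _, digest = expected.partition(":")
    if algorithm != "adler32":
        raise ValueError(f"unsupported checksum algorithm: {algorithm!r}")
    # Comparing as integers also tolerates a digest whose leading zeros were dropped
    return int(digest, 16) == int(actual_hex, 16)


if __name__ == "__main__":
    value = 0x03D9681C  # the checksum from the log above
    print(hex(value)[2:])   # 3d9681c  (leading zero lost)
    print(f"{value:08x}")   # 03d9681c (same form as the server's checksum)
    print(checksum_matches("adler32:03d9681c", "3d9681c"))  # True
```

Either change on its own would make the file above pass; zero-padding keeps the printed "found" value identical to the server's format, which makes the log easier to compare by eye.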
I wanted to use the CERN Open Data Portal to retrieve all the recids, but both the website and the API behind it are limited to 10 000 results. For example, the request https://opendata.cern.ch/api/records/?q=&sort=mostrecent&size=50&from=10000&experiment=DELPHI&type=Dataset returns HTTP status 400. Could you suggest a workaround or an alternative way to list all DELPHI record IDs?
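To make the limit concrete, here is a small sketch that only builds the paginated URLs (it does not contact the server). It assumes, as is the default for Elasticsearch/OpenSearch search windows, that the cap applies to from + size; the 10 000 value and the query parameters are taken from the request above, and page_urls is a hypothetical helper.

```python
from urllib.parse import urlencode

API_URL = "https://opendata.cern.ch/api/records/"
# Assumption: the server rejects any request where from + size exceeds this value
MAX_RESULT_WINDOW = 10_000


def page_urls(total: int, size: int = 50, **filters: str) -> list[str]:
    """Build search URLs page by page, stopping once from + size would exceed the window."""
    urls = []
    for start in range(0, total, size):
        if start + size > MAX_RESULT_WINDOW:
            break  # every later page would be rejected (e.g. with HTTP 400)
        params = {"q": "", "sort": "mostrecent", "size": size, "from": start, **filters}
        urls.append(f"{API_URL}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    urls = page_urls(20_000, experiment="DELPHI", type="Dataset")
    print(len(urls))  # 200 pages reachable, the last one at from=9950
    print(urls[-1])
```

With size=50, the last reachable page starts at from=9950, so the request at from=10000 in the URL above is the first one outside the window, regardless of how many DELPHI datasets exist.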