Unable to download simulated data files from CERN opendata

Hello,
When I try to download any data file directly from the CERN opendata website, I get a “504 Gateway Time-out. The server didn’t respond in time.” error. I also tried downloading the files from my command prompt using the command: cernopendata-client download-files --recid (record number of file), which works for some files, such as the Jpsi data. However, it does not work for datasets such as: Simulated dataset ttGG_8TeV-whizard-pythia6 in AODSIM format for 2012 collision data; Simulated dataset TTGJets_8TeV-madgraph in AODSIM format for 2012 collision data; Simulated dataset tGamma_8TeV_madgraph in AODSIM format for 2012 collision data. What should I do to fix this?
Thanks in advance


Hi! It looks like there is a problem with downloads from the portal in general. As the service is maintained by CERN IT, we need to wait until the lab opens and the people who can fix it can have a look.

Thanks for your patience!

Hi @aarushi-tiwari

The download troubles that you observed were intermittent and I hope that you managed to download all the files of interest by now. (Apologies for my late reply!)

I’d like to share two pieces of news on this topic:

  1. We deployed an improvement to the file serving infrastructure earlier this week, so you should be seeing far fewer gateway timeout problems than in December.

  2. If you do encounter further download troubles, you may be interested in using our command-line client (https://cernopendata-client.readthedocs.io/), which automates downloads of large datasets with file checksum verification etc.

Most notably, the client allows you to easily use the XRootD protocol for downloads, which is usually more robust than the HTTPS protocol due to larger available bandwidth.

Hello, I am also getting the same error for dataset 6042. The files with a size of 92 bytes have this error:

504 Gateway Time-out

The server didn't respond in time.

I have used cernopendata-client with and without XRootD and get the same errors. I am seeing the same error with many other datasets. Is some kind of rate limit on parallel requests being applied?

Is there a better way to automatically retry the errored files?

Thanks


drwxr-xr-x 3 slg slg          3 May 13 17:15 ../
-rw-r--r-- 1 slg slg         92 May 16 12:42 001F37AE-9F90-E211-A64D-0026189438DC.root
-rw-r--r-- 1 slg slg         92 May 16 00:48 0020A31F-E68E-E211-9DEF-00304867918E.root
-rw-r--r-- 1 slg slg  141424048 May 16 14:59 0023C1C8-9989-E211-AA2E-002618FDA248.root
-rw-r--r-- 1 slg slg         92 May 15 12:53 00254B58-D38E-E211-9F3D-0025905938AA.root
-rw-r--r-- 1 slg slg         92 May 15 12:54 004DB2FC-DB8E-E211-99BF-0026189437F9.root
-rw-r--r-- 1 slg slg         92 May 16 00:48 00553ED9-E38E-E211-B2C5-0026189437FA.root
-rw-r--r-- 1 slg slg         92 May 15 12:54 0059BF8F-DF8E-E211-8B2B-003048678BE8.root
-rw-r--r-- 1 slg slg         92 May 16 00:48 005D4A39-D98E-E211-B9F8-003048FFCB6A.root
-rw-r--r-- 1 slg slg  590354158 May 14 22:09 006CA42E-DA8E-E211-9DD6-002618943934.root
-rw-r--r-- 1 slg slg 4277659092 May 13 17:52 00709DF3-9C87-E211-A1CC-00248C0BE018.root
-rw-r--r-- 1 slg slg         92 May 14 22:06 0077D2F0-AC89-E211-9E50-00304867BF9A.root
-rw-r--r-- 1 slg slg   54964738 May 14 22:09 00780BFC-DA8E-E211-A467-00261894394A.root
-rw-r--r-- 1 slg slg 4023351065 May 13 17:52 0082F5DC-7F88-E211-85BC-002618943971.root
-rw-r--r-- 1 slg slg         92 May 16 12:42 0092DDCA-A090-E211-BDB3-003048679010.root
-rw-r--r-- 1 slg slg         92 May 16 00:50 009E6308-D38E-E211-82E4-0025905938AA.root
-rw-r--r-- 1 slg slg 4048270134 May 13 17:50 00A0BA54-A788-E211-94B1-0026189438D9.root
-rw-r--r-- 1 slg slg  203713336 May 14 22:13 00A9CFCC-B089-E211-B357-002618943986.root
-rw-r--r-- 1 slg slg         92 May 16 12:42 00B14741-8890-E211-911A-003048678FE4.root
-rw-r--r-- 1 slg slg         92 May 15 12:54 00BE7B79-D48E-E211-84D7-0026189437F5.root
-rw-r--r-- 1 slg slg   77690626 May 14 22:11 00CBCC95-D78E-E211-A36F-002618943947.root
-rw-r--r-- 1 slg slg         92 May 15 12:56 00D48D2B-DC8E-E211-AE82-0026189438C1.root
-rw-r--r-- 1 slg slg  126314464 May 14 22:13 00D65790-D38E-E211-8ECE-0025905938AA.root
-rw-r--r-- 1 slg slg         92 May 16 00:51 00D9E3E6-E38E-E211-AB1A-002590593920.root
-rw-r--r-- 1 slg slg         92 May 16 00:52 00DE3B5F-D78E-E211-94EB-002618943880.root
-rw-r--r-- 1 slg slg         92 May 15 12:57 00E46728-E68E-E211-85B7-0026189438A0.root
-rw-r--r-- 1 slg slg 2318739085 May 13 17:45 00E51259-9887-E211-AF3E-00261894386E.root
-rw-r--r-- 1 slg slg 4008633457 May 13 17:51 00F481E1-A488-E211-8535-003048678BAA.root
-rw-r--r-- 1 slg slg         92 May 15 12:58 00F8FA80-B489-E211-AE8F-002618943956.root
-rw-r--r-- 1 slg slg 4176819149 May 13 17:58 00FEAEEA-CE87-E211-B450-003048678F84.root
-rw-r--r-- 1 slg slg         92 May 15 12:58 00FFAF1E-D38E-E211-8F77-003048FFCBFC.root
-rw-r--r-- 1 slg slg         92 May 16 00:52 0200FD1F-E18E-E211-A4B7-003048678DD6.root
-rw-r--r-- 1 slg slg         92 May 15 12:58 02069067-D38E-E211-A856-003048FFCC18.root
-rw-r--r-- 1 slg slg         92 May 16 12:44 021CA8CE-7890-E211-9113-00304867BF9A.root
-rw-r--r-- 1 slg slg         92 May 15 13:00 022167F7-D88E-E211-9C5F-00261894396E.root
-rw-r--r-- 1 slg slg 3441129040 May 13 18:02 022590AC-8E88-E211-9659-0026189437EB.root
-rw-r--r-- 1 slg slg 4151941050 May 13 18:05 0227E1FA-D487-E211-B83E-002590593902.root
-rw-r--r-- 1 slg slg         92 May 15 13:00 0228431B-E18E-E211-836E-003048D15E14.root
-rw-r--r-- 1 slg slg  171011320 May 14 22:13 022DF9C5-B888-E211-928D-003048679168.root
-rw-r--r-- 1 slg slg         92 May 15 13:01 022E1005-E88E-E211-80B5-0026189438E1.root
-rw-r--r-- 1 slg slg         92 May 16 00:52 02378FAA-DD8E-E211-B3E5-003048D15DB6.root
-rw-r--r-- 1 slg slg         92 May 16 12:44 0247F992-A690-E211-9EAC-0026189438EB.root
-rw-r--r-- 1 slg slg   68565880 May 14 22:11 024BD127-DD8E-E211-BB28-002354EF3BDF.root
-rw-r--r-- 1 slg slg 2476189136 May 13 18:02 024D5430-9396-E211-AA8A-0026189437FA.root
-rw-r--r-- 1 slg slg         92 May 15 13:02 024EB1B9-AE89-E211-A288-00261894394D.root
-rw-r--r-- 1 slg slg         92 May 16 12:46 02520948-A990-E211-A85D-003048678E24.root
-rw-r--r-- 1 slg slg         92 May 16 00:54 02556488-DF8E-E211-BE75-003048678A78.root
-rw-r--r-- 1 slg slg         92 May 16 00:54 025DCA9F-D78E-E211-9AAA-00261894394D.root
-rw-r--r-- 1 slg slg         92 May 16 00:55 02621F4F-E28E-E211-ADEF-00304867901A.root
-rw-r--r-- 1 slg slg         92 May 16 00:55 02686EDB-D38E-E211-8893-003048FFD740.root
-rw-r--r-- 1 slg slg         92 May 16 00:55 026A3250-2A8F-E211-8746-0030486791DC.root
-rw-r--r-- 1 slg slg  121619980 May 14 22:13 027207A2-B089-E211-847F-002618943925.root
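
In the listing above, the failed downloads stand out as 92-byte files next to real files of tens of megabytes or more. A minimal Python sketch for listing such stubs so they can be removed and fetched again might look like this. The function name and the 1024-byte threshold are hypothetical choices for illustration, not part of cernopendata-client:

```python
from pathlib import Path

# Assumption: a failed download leaves a tiny stub (92 bytes in the listing
# above), while genuine ROOT files in this dataset are far larger.
STUB_MAX_BYTES = 1024  # hypothetical threshold, tune it for your dataset


def find_stub_files(directory, max_bytes=STUB_MAX_BYTES):
    """Return the sorted *.root files in `directory` no larger than `max_bytes`."""
    return sorted(
        path
        for path in Path(directory).glob("*.root")
        if path.is_file() and path.stat().st_size <= max_bytes
    )
```

For example, `for path in find_stub_files("6042"): print(path)` would print the suspect files in a directory named after the record, assuming that is where the files were saved.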

Hi @amughal, the best download bandwidth is offered by the XRootD protocol. It should not produce any HTTP 504 gateway timeout errors, since the HTTP gateway is not involved in the data serving chain at all.

Moreover, if you have used cernopendata-client, please note that the download-files command offers two options, --retry-limit and --retry-sleep, that let you customise the number of download retries and the sleep time between them in case something goes wrong.

Here is an example of how to download all files attached to record 5500 using the XRootD protocol, with up to 20 retries in case something goes wrong:

$ cernopendata-client download-files --recid 5500 --protocol xrootd --retry-limit 20 --verify

Thank you for these options, appreciated. Some datasets have thousands of files. Is there an option to improve throughput by running downloads in parallel?

Unfortunately, there is no option for parallel downloads. You could script something around getting the file locations via cernopendata-client get-file-locations --recid 5500 --protocol xrootd and passing them on to an xrdcp loop. Otherwise we can look into adding such an option; please open an issue in the client's issue tracker on GitHub (cernopendata/cernopendata-client).
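
The scripting idea above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it assumes that get-file-locations prints one XRootD URL per line and that xrdcp is installed and on the PATH. The helper names, the thread pool and the default of 4 workers are hypothetical choices. The thread does not say how much concurrency the servers tolerate, so a small worker count is a cautious guess.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def get_file_locations(recid):
    """Ask cernopendata-client for a record's XRootD URLs (assumed one per line)."""
    result = subprocess.run(
        ["cernopendata-client", "get-file-locations",
         "--recid", str(recid), "--protocol", "xrootd"],
        check=True, capture_output=True, text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def copy_one(url, dest_dir, run=subprocess.run):
    """Copy one file with xrdcp; return (url, True) on a zero exit code."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    completed = run(["xrdcp", url, str(dest)])
    return url, completed.returncode == 0


def download_all(urls, dest_dir, workers=4, run=subprocess.run):
    """Run xrdcp for every URL on a small thread pool; return the URLs that failed."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda url: copy_one(url, dest_dir, run), urls))
    return [url for url, ok in results if not ok]
```

A call such as `failed = download_all(get_file_locations(5500), "5500")` would return the URLs whose copy did not succeed, and that list can be passed to `download_all` again as a simple retry. Unlike download-files with --verify, this sketch performs no checksum verification.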

Thanks

Hello, please see the following output for a dataset … Why does Progress not reach 100%? Thank you.

cernopendata-client download-files --verify --protocol http --recid 737

==> Downloading file 1 of 408
→ File: ./737/002CAD35-22B2-E311-A18B-003048CFCBB2.root
→ Progress: 1061919/3213076 KiB (33%)
==> Downloading file 2 of 408
→ File: ./737/0035492F-8EB1-E311-8C02-0025901D4A0E.root
→ Progress: 1056479/2798961 KiB (37%)
==> Downloading file 3 of 408
→ File: ./737/00639082-0BB2-E311-9873-0025907FD430.root
→ Progress: 244970/2431478 KiB (10%)
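
The output above does not show why the downloads stop short. To at least see which files ended below their expected size, a saved copy of the log can be parsed with a small Python sketch like the one below. It assumes the log keeps exactly the "File:" and "Progress:" line format shown above. The function name is hypothetical.

```python
import re

# Patterns match the "File:" and "Progress:" lines of the log format shown above.
FILE_LINE = re.compile(r"File:\s*(\S+)")
PROGRESS_LINE = re.compile(r"Progress:\s*(\d+)/(\d+)\s*KiB")


def incomplete_files(log_text):
    """Map each file path to (done_kib, total_kib) if its last Progress line is short."""
    last_progress = {}
    current = None
    for line in log_text.splitlines():
        file_match = FILE_LINE.search(line)
        if file_match:
            current = file_match.group(1)
            continue
        progress_match = PROGRESS_LINE.search(line)
        if progress_match and current is not None:
            done, total = (int(value) for value in progress_match.groups())
            last_progress[current] = (done, total)
    return {path: pair for path, pair in last_progress.items() if pair[0] < pair[1]}
```

Applied to the three files in the output above, this would report all of them as incomplete, which gives a list of candidates to download again.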