Who to contact about errors in a dataset?

I am wondering whether there is anywhere to seek help with a published dataset. The documentation for the dataset listed here (CERN Open Data Portal) says that column 10 should contain only certain values, but many files contain values outside the given set…

For one concrete example, the following file contains values outside of (-99, -11, 11, 0), which the above link lists as the possible values in that column. There are many such files, which prevents any model from learning correctly. I think the model trained in the paper on the above-linked page must have used a correct version, and errors were probably introduced somewhere when the data was uploaded to the portal.

Is there any way I can get help with this dataset?

50d9d051-6f3e-43f4-883f-20642f8c1c8d_nevts1_evtid00000093_graphcnn_2l_3j.csv

Hi @jeffwillette - thanks for reporting this issue. You’re referring to an ATLAS open data set. I don’t know who is responsible on their side, but I hope @tiborsimko can help you with that.

Hello, I’m sorry about the confusion here. These are b- and c-jet labels, pid == 5 and pid == 4 respectively. We were experimenting with these labels and did not intend to include them. For the analysis in the note, we set their training label to 0 (same as a jet). If you do this, training should converge. Let me know if you have more questions.
Taylor
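
The relabeling described above can be sketched as follows. This is a minimal sketch, not the code used for the note: it assumes the label sits at 0-based index 10 of each CSV row (the thread only says "column 10", so adjust if the count is 1-based), that the stray labels appear as 4 or 5 (possibly written as 4.0 or 5.0), and the names `relabel_rows`, `LABEL_COLUMN`, and `REMAP_TO_ZERO` are hypothetical.

```python
import csv
import io

LABEL_COLUMN = 10            # assumption: 0-based index of "column 10"
REMAP_TO_ZERO = {4.0, 5.0}   # c-jet (pid == 4) and b-jet (pid == 5) labels


def relabel_rows(rows, label_column=LABEL_COLUMN):
    """Yield copies of rows with b-/c-jet labels in label_column set to "0"."""
    for row in rows:
        row = list(row)
        if len(row) > label_column:
            try:
                value = float(row[label_column])
            except ValueError:
                # Header or non-numeric cell: leave the row untouched.
                yield row
                continue
            if value in REMAP_TO_ZERO:
                row[label_column] = "0"
        yield row


# Small in-memory demonstration with made-up rows (not real dataset content).
sample = "0,1,2,3,4,5,6,7,8,9,5\n0,1,2,3,4,5,6,7,8,9,11\n0,1,2,3,4,5,6,7,8,9,4\n"
fixed = list(relabel_rows(csv.reader(io.StringIO(sample))))
labels = [row[LABEL_COLUMN] for row in fixed]
print(labels)
```

To apply this to a file from the dataset, pass `csv.reader(open(path, newline=""))` to `relabel_rows` and write the result with `csv.writer`. Only the label column is changed; other columns that happen to contain 4 or 5 are left alone.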
