ATLAS open data HZZAnalysis notebook broken

Hello,

I tried to run the HZZAnalysis Jupyter notebook, but found that the cell that loads data from a ROOT file throws an error. The cell in question is:

# Accessing the file from the online directory (":mini" opens the tree in a desired manner)
with uproot.open(data_A_path + ":mini") as t:
    tree = t

# There are 39 entries in the tree
print(tree.num_entries)

# We can view all the information stored in the tree using the .keys() method.
print(tree.keys())

# We can also view the entire tree using the .arrays() method
# This generates a 39-entry list of dictionaries
print(tree.arrays()) 

The error is pasted below. Moving the print(tree.arrays()) statement into the with block solves it, so it looks like the file still needs to be open when the arrays are read.

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[10], line 13
      9 print(tree.keys())
     11 # We can also view the entire tree using the .arrays() method
     12 # This generates a 39-entry list of dictionaries
---> 13 print(tree.arrays()) 

File /srv/conda/envs/notebook/lib/python3.11/site-packages/uproot/behaviors/TBranch.py:888, in HasBranches.arrays(self, expressions, cut, filter_name, filter_typename, filter_branch, aliases, language, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library, ak_add_doc, how)
    885                 ranges_or_baskets.append((branch, basket_num, range_or_basket))
    887 interp_options = {"ak_add_doc": ak_add_doc}
--> 888 _ranges_or_baskets_to_arrays(
    889     self,
    890     ranges_or_baskets,
    891     branchid_interpretation,
    892     entry_start,
    893     entry_stop,
    894     decompression_executor,
    895     interpretation_executor,
    896     library,
    897     arrays,
    898     False,
    899     interp_options,
    900 )
    902 # no longer needed; save memory
    903 del ranges_or_baskets

File /srv/conda/envs/notebook/lib/python3.11/site-packages/uproot/behaviors/TBranch.py:3099, in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays, update_ranges_or_baskets, interp_options)
   3092     if (
   3093         isinstance(library, uproot.interpretation.library.Awkward)
   3094         and isinstance(interpretation, uproot.interpretation.objects.AsObjects)
   3095         and cache_key in branchid_to_branch
   3096     ):
   3097         branchid_to_branch[cache_key]._awkward_check(interpretation)
-> 3099 hasbranches._file.source.chunks(ranges, notifications=notifications)
   3101 def replace(ranges_or_baskets, original_index, basket):
   3102     branch, basket_num, range_or_basket = ranges_or_baskets[original_index]

File /srv/conda/envs/notebook/lib/python3.11/site-packages/uproot/source/fsspec.py:136, in FSSpecSource.chunks(self, ranges, notifications)
    108 """
    109 Args:
    110     ranges (list of tuple[int, int] 2-tuples): Intervals to fetch
   (...)
    133 chunks to be filled.
    134 """
    135 if self.closed:
--> 136     raise OSError(f"file {self._file_path!r} is closed")
    138 self._num_requests += 1
    139 self._num_requested_chunks += len(ranges)

OSError: file 'https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/4lep/Data/data_A.4lep.root' is closed
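
For completeness, this is the rearrangement described above, with all of the reads (including print(tree.arrays())) kept inside the with block:

# Accessing the file from the online directory (":mini" selects the tree named "mini")
with uproot.open(data_A_path + ":mini") as tree:
    # There are 39 entries in the tree
    print(tree.num_entries)

    # View the branch names stored in the tree
    print(tree.keys())

    # Reading the arrays works while the file is still open
    print(tree.arrays())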

Thank you for the report! I also see conflicting package versions earlier in the notebook:

Installing collected packages: numpy
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
coffea 0.7.15 requires awkward<2,>=1.5.1, but you have awkward 2.8.1 which is incompatible.
coffea 0.7.15 requires numpy<1.22,>=1.16.0, but you have numpy 2.2.4 which is incompatible.
numba 0.61.0 requires numpy<2.2,>=1.24, but you have numpy 2.2.4 which is incompatible.

@gguerrie, this looks like it came from your version update; any clues?

Best,
Zach

Hey @kaastran,
Thanks for the report indeed.
First question: are you running the notebook on your own machine, on Binder, or on some other infrastructure?

I see that the setup cell in this notebook has not yet been updated to match the package list in binder/environment.yml, hence the clash during installation. We need to fix this.

Could you try to replace the cell in the "First time package installation on your computer" section with the following?

import yaml
import subprocess
import sys

# Path to your binder/environment.yml file
environment_file = "../../binder/environment.yml"

# Packages you want to install
required_packages = ['uproot', 'awkward', 'vector', 'numpy', 'matplotlib']

# Load the environment.yml file
with open(environment_file, 'r') as file:
    environment_data = yaml.safe_load(file)

# Extract dependencies
dependencies = environment_data.get('dependencies', [])

# Create a list to hold the packages with versions
install_packages = []

# Find the versions for the required packages
for dep in dependencies:
    # Check if the dependency is a string (package name)
    if isinstance(dep, str):
        for package in required_packages:
            if dep.startswith(package):
                install_packages.append(dep)

# Install packages using pip
if install_packages:
    print(f"Installing packages: {install_packages}")
    subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", "--user"] + install_packages)
else:
    print("No matching packages found in environment.yml.")

At the moment I cannot reproduce your issue (OSError: file 'https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/4lep/Data/data_A.4lep.root' is closed) by running locally (i.e. it works for me), but I can see that our automated check actually returns the same error as you.

Let me investigate a bit more, it could be an issue related to our storage.

I tried downgrading NumPy to <2.2, but the problem persisted. I'm not much of a ROOT guy, and I couldn't figure out a way to load the entire tree into memory. Based on that, I changed the code to use

tree = uproot.open(data_A_path + ":mini")

instead of the with statement, and that solves the problem (though it keeps a file open throughout the notebook, and I don't know whether that could be problematic).

Oh, you posted just as I did! I was running on Binder and have not tried locally. I realise I did run the setup cell even though I'm on Binder; perhaps that broke something?

Edit: I relaunched the notebook and skipped the first cell, but I'm still getting the same error.

If you are on Binder, all the packages are already there in principle, so you just need to import them.
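
For example, assuming the notebook only needs the packages listed in required_packages above, the imports would be something like:

import uproot
import awkward as ak
import vector
import numpy as np
import matplotlib.pyplot as plt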

Good to know that it works with the tree = uproot.open(data_A_path + ":mini") statement (actually, not very good, but at least you can use the notebook).
We’ll fix the setup cell and possibly the accessibility issue soon.
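
In the meantime, a possible alternative (just an untested sketch, not the official fix) that avoids keeping the file open is to read the branches into memory inside a with block and then work with the resulting awkward Array afterwards:

# Read everything into memory while the file is still open;
# you can also pass a list of branch names to arrays() to limit what is read
with uproot.open(data_A_path + ":mini") as tree:
    data = tree.arrays(library="ak")

# The file is closed at this point, but 'data' is an in-memory awkward Array
print(len(data))
print(data.fields)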

Just for your information, https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020 is a placeholder. We are planning to upgrade the filepaths to enable direct reading from the Open Data portal storage, and this will solve a number of issues (including this one).

Will keep you posted as soon as we have a Pull Request in place. If you need any further assistance, please let us know in this thread!

And thanks again for pointing this out :smiley:
