Help for EVTGEN dataset naming convention and physics_short explanation

Could someone please point me to where I can find an explanation of the file naming conventions for the EVTGEN datasets (the physics_short info)? I can no longer find it.

Hi @rakhi ,

Absolutely, thanks for asking! There’s a little bit here:

But I think we should add some more to the metadata page here:

which I think is where most people would expect it (MR pending now). The updated text on that page will be:

**Physics short**: Short name with information regarding the content of the dataset. Each dataset should have a unique physics short. It begins with a very compact description of the [event generators](/docs/documentation/monte_carlo/simulation_tools) used to create the dataset (e.g. Py8 for Pythia8), normally followed by the PDF and tune, and then some compact description of the physics (e.g. ttbar or Zll). Often when parameters are varied from nominal or in BSM model spaces, the varying parameter values will be included to distinguish datasets (e.g. multiple Z’ datasets with different Z’ masses).
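To make the structure concrete, here is a hypothetical sketch of splitting a physics_short into its underscore-separated tokens. The field names and the example string are illustrative only; there is no official schema, and datasets vary in how many tokens they carry:

```python
# Hypothetical sketch: split a physics_short into its underscore-separated
# tokens. The field names are illustrative, not an official schema.
def split_physics_short(physics_short):
    tokens = physics_short.split("_")
    return {
        "generator": tokens[0],    # compact generator tag, e.g. "Py8EG"
        "descriptors": tokens[1:]  # tune/PDF, process, filters, variations
    }

parts = split_physics_short("Py8EG_A14NNPDF23LO_jetjet_JZ2")
print(parts["generator"])    # Py8EG
print(parts["descriptors"])  # ['A14NNPDF23LO', 'jetjet', 'JZ2']
```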

Let me know if there’s more you think should be added there!

Best,

Zach

Thanks for the swift response, Zach. I agree the general metadata explanation should be added to the metadata page, but I still think it’s not enough to understand what goes into a specific sample, and whether the events in the sample are inclusive or complementary to other samples. For example, for these datasets: (‘513094’, ‘MGPy8EG_Wenu_FxFx3jHT2bias_SW_CFilterBVeto’) and (‘513133’, ‘MGPy8EG_Wenu_FxFx3jHT2bias_SW_105M_CFilterBVeto’), the metadata doesn’t give me enough information to understand:

  • what the HT2 bias means and which definition of HT2 is being used, what SW means, and what exactly CFilterBVeto means.
    The info in the job path doesn’t help either.

Hi @rakhi ,

Yep, I agree, the details are hard to work out. I fear it’s hopeless for me to try to write the gory details for all 6000 samples, so the best we’re going to be able to do is answer questions, rely on the search functionality in this forum, and help teach people how to read the job options. For the sample you mentioned:

You’ve got a brief description of the job at the top, “aMcAtNlo Wenu+0,1,2,3j NLO FxFx HT2-biased CFilterBVeto Py8 ShowerWeights”. That isn’t enough to answer your specific questions, though. At the bottom of the file you can see:

```python
include("MadGraphControl_Vjets_FxFx_shower.py")
```

include here is an ATLAS invention that literally reads in the specified file, as though its contents were part of the same file. For a filename with no directory attached, the first place to look is the same directory as the job options file:

and sure enough we have:

which links to:

Now at the very bottom of that file you can see the precise version of CFilterBVeto (for example):

You have the definition of “SW” in there, which is “Shower weights” (I expect that’s going to be Pythia8 weights for Vincia showers):
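For intuition, the include mechanism described above can be sketched in a few lines of plain Python. To be clear, this is an illustration of the idea, not the actual Athena implementation (which, among other differences, executes in the caller's namespace automatically):

```python
import os

# Illustrative sketch of an include() mechanism (NOT the actual Athena
# implementation): execute the named file in the given namespace, as
# though its contents were pasted in place at the call site.
def include(filename, namespace, search_dirs=(".",)):
    for d in search_dirs:  # first match on the search path wins
        path = os.path.join(d, filename)
        if os.path.exists(path):
            with open(path) as f:
                exec(compile(f.read(), path, "exec"), namespace)
            return
    raise FileNotFoundError(filename)
```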

As for the HT2 biasing: most of our colleagues just take it for granted that it’s working and doing something sensible :smiley: Digging out exactly what is happening is relatively non-trivial (even for an ATLAS member). In this case, we have in the MG run card:

```text
 bias = event_norm    ! valid settings: average, sum, bias
 1.0  = jetalgo       ! FastJet jet algorithm (1=kT, 0=C/A, -1=anti-kT)
 1.0  = jetradius     ! The radius parameter for the jet algorithm
 8    = ptj           ! Min jet transverse momentum
 -1.0 = etaj          ! Max jet abs(pseudo-rap) (a value .lt.0 means no cut)
 5    = maxjetflavor
```

Here’s the actual configuration of the fortran cuts file doing the biasing:

The biasing should just affect the event weights, to give us a more uniform statistical distribution across the HT spectrum.
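As a toy illustration of that last point: a weight-based bias oversamples one region of the spectrum, and each event's weight is divided by the bias so the weighted distribution is unchanged. The bias function and spectrum below are entirely made up (the real biasing lives in the fortran cuts file), but the bookkeeping is the same:

```python
import random

# Toy illustration (NOT the actual MadGraph bias code): sample HT with a
# bias toward large values, then compensate in the event weight so the
# weighted spectrum is unchanged while high-HT statistics improve.
def bias(ht):
    return 1.0 + ht / 100.0  # made-up bias function favouring high HT

def biased_event(rng):
    # accept/reject sampling of a falling toy HT spectrum, tilted by bias()
    while True:
        ht = rng.uniform(0.0, 1000.0)
        falling = 1.0 - ht / 1000.0  # toy "physics" spectrum
        if rng.random() < falling * bias(ht) / bias(1000.0):
            return ht, 1.0 / bias(ht)  # weight undoes the bias
```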

Not easy to track down! And I hope that makes clear why I don’t want to do that for all the samples :slight_smile: But again, I’m happy to answer as many questions about specific samples as you have, and we can build up the body of knowledge here.

Best,
Zach

Hi Zach, thanks for the comprehensive response. I guess you’re just advising me to poke around in the opendata directories looking for info. I understand (and agree!) that it’s not worth your time to give detailed documentation of the specifics of implementation for all the datasets, but it would be super helpful to simply have a dictionary of physics_short abbreviations, something like what is given here: 13 TeV 2025 Data — Beta | ATLAS Open Data.

E.g. SW = ‘shower weights’, 105M = ?, maxHTpTV2 = ?, mW_105_ECMS = ?. Ideally, this dictionary would include links to the python files containing more details. It would save everyone a lot of time!
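The dictionary you describe could start out as simple as the sketch below. The entries are illustrative: only tokens already explained in this thread are filled in, the unknowns above are deliberately left out, and the link slots are placeholders for the job-options links you suggest:

```python
# Hypothetical sketch of the suggested physics_short glossary. Entries are
# illustrative; the second slot is reserved for a link to the relevant
# job options (None until filled in).
PHYSICS_SHORT_GLOSSARY = {
    "Py8": ("Pythia8", None),
    "SW": ("Shower weights", None),
    "FxFx": ("FxFx NLO multijet merging", None),
    # "105M": not yet documented -- to be added as questions are answered
}

def explain(token):
    meaning, link = PHYSICS_SHORT_GLOSSARY.get(token, ("unknown", None))
    return meaning
```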