Do you happen to have samples with ttbar → mu + nu + jets at 13.6 TeV? I’m questioning the wisdom of downloading dataset 601229 (PhPy8EG_A14_ttbar_hdamp258p75_SingleLep), for which I would need to request more events than are currently available anyway, only to then cut away the two-thirds that don’t contain a muon, versus generating 20M of my own ttbar events.
Thanks Zach. Okay, I guess I’m just going to bite the bullet and ask for the full dataset then. I’m trying to estimate the fake dimuon rate for 300 fb^-1 of lumi at 13.6 TeV, so good stats are crucial.
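For context on why the stats matter, here is a quick back-of-envelope for how many mu+jets ttbar events 300 fb^-1 actually contains. The cross section and branching ratios below are round illustrative assumptions, not official values:

```python
# Back-of-envelope: how many ttbar -> mu+nu+jets events does 300 fb^-1 hold?
# All numbers below are rough assumptions for illustration.
sigma_ttbar_pb = 900.0   # assumed inclusive ttbar cross section at 13.6 TeV, O(900) pb
lumi_fb = 300.0          # target integrated luminosity, fb^-1
br_w_munu = 0.1086       # approximate BR(W -> mu nu)
br_w_had = 0.674         # approximate BR(W -> hadrons)

n_ttbar = sigma_ttbar_pb * 1e3 * lumi_fb   # convert pb -> fb, then multiply by lumi
br_mu_jets = 2 * br_w_munu * br_w_had      # either W can supply the muon
n_mu_jets = n_ttbar * br_mu_jets

print(f"~{n_ttbar:.1e} ttbar events, ~{n_mu_jets:.1e} in the mu+jets channel")
```

With these assumed inputs that is a few times 10^7 mu+jets events in data, so a 20M-event MC sample is not overkill.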
@zmarshal / openEvtGen team, not trying to be pushy, but is it possible to get higher stats for the ttbar semileptonic dataset with muon filter, dataset ID 601229, and on what sort of timeframe? I’m trying to weigh waiting against generating my own. Thanks!
How many events are you after? Looks like we’ve processed to HepMC (internally) 10M and released them all. We have (internally) something like 1B, but of course releasing that is a lot of disk space (even if it saves you a lot of compute).
Just checking: you’re trying to get after fakes from heavy flavor, or from ‘true’ ttbar events, or from something else?
Putting on my ‘what approximations are probably good enough’ hat: if you’re trying to get after fakes from b fragmentation and decay, I would have no objection to combining all the single-lepton ttbar samples we have available (e.g. 601398, 601414, 601497, 604468, 604470, 604472, 604474, 604476, 604478, 604480, 604482). Those have slightly different setups for hadronization and fragmentation (maybe that’s even a useful thing), and end up with slightly different random seeds for the fragmentation and decay of the b-quarks, so the events will be sufficiently different for you to use them all. If you were going to use those samples to set a systematic uncertainty from fragmentation and hadronization, of course, that won’t work, but then I’d worry you’d need higher statistics for lots of samples anyway.
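To merge those DSIDs into one statistical pool, each sample needs a per-event weight so they all scale to the same target luminosity. A minimal sketch, where the DSID list comes from the thread but the `sample_weight` helper and the example cross section / sum-of-weights numbers are hypothetical placeholders for whatever your metadata lookup provides:

```python
# Single-lepton ttbar DSIDs listed above, to be combined into one pool.
dsids = [601398, 601414, 601497, 604468, 604470, 604472,
         604474, 604476, 604478, 604480, 604482]

target_lumi_fb = 300.0  # target integrated luminosity, fb^-1

def sample_weight(cross_section_pb, sum_of_weights, lumi_fb=target_lumi_fb):
    """Per-event weight scaling one sample to the target luminosity.

    cross_section_pb : effective (filtered) cross section of the sample, pb
    sum_of_weights   : total generated sum of event weights for the sample
    """
    return cross_section_pb * 1e3 * lumi_fb / sum_of_weights  # pb -> fb

# Example with made-up metadata: a 450 pb filtered sample, 10M weighted events.
w = sample_weight(450.0, 1.0e7)
print(f"per-event weight: {w:.2f}")
```

Each merged event then enters your histograms with its sample’s weight, so samples with very different generated statistics don’t bias the combined shape.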
Hi @zach, it looks like 2.5M ttbar events that I generate use 280 GB of storage, whereas the same number of MC events downloaded from your dataset requires 1.3 TB. I need 20M MC events and am storage-limited in the short term, so I think that swings it in favour of generating events myself. Any idea what makes your datasets so much larger? Do you think it’s the specific generator (e.g. MGaMC vs Sherpa)? Or the hadronization settings (I have MPI turned off)?
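For concreteness, the per-event sizes implied by the numbers quoted above (simple arithmetic on the figures in this message, nothing measured independently):

```python
# Per-event size comparison from the numbers quoted in the thread.
n_events = 2.5e6
own_gb = 280.0        # locally generated sample
released_gb = 1300.0  # 1.3 TB of the released dataset

own_kb_per_evt = own_gb * 1e6 / n_events        # GB -> kB per event
released_kb_per_evt = released_gb * 1e6 / n_events
ratio = released_gb / own_gb

print(f"own: {own_kb_per_evt:.0f} kB/event, "
      f"released: {released_kb_per_evt:.0f} kB/event, ratio ~{ratio:.1f}x")
```

That ~4.6x gap per event is large enough that it plausibly reflects event content (e.g. MPI and full underlying event on vs off) rather than just file-format overhead.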