Procedure for adding tags to an open data file assembled with CMSSW_7_6_7

I’ve noticed that many of the TUNE files appear not to contain any tag information.
For example: CERN Open Data Portal
In this file all tag information is filled in with the placeholder value of -1000.

I have added the appropriate RecoBTag package to my open data CMSSW_7_6_7 container but I’m unsure of how to actually fill in the btag information and make sure the tags are assigned properly.

I’m particularly interested in the slimmedJetsPuppi branch of each file in that record, which seems to have slots for the tags pfImpactParameter, pfSecondaryVertex, pfInclusiveSecondaryVertexFinder, softPFMuons, softPFElectrons, and pfInclusiveSecondaryVertexFinderCvsL, while other pat::Jet branches seem to have placeholders for other tags. I have confirmed that all of these are available in the RecoBTag python configuration file, but I’m unsure how to actually add them.

Is there an existing tool/script that adds these tags, and/or a tutorial that demonstrates how to add the appropriate tags to an entire ROOT file or a specific branch for this particular CMSSW_7_6_7 release? Most of the demos I’ve found seem to relate to specific problems or to other CMSSW releases, and it’s hard to tell what is relevant to this problem.

This source provides an example for a 7_4 release and seems to show a way to remake the pat::Jet objects to add tags but I’m not sure what parts are directly usable.
https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideCMSDataAnalysisSchool2015BTaggingExercise#Part_II_Remaking_b_tag_discrimin

Hi,

So the two collections of interest for “standard” radius jets are “slimmedJets” and “slimmedJetsPuppi”. Both are built from the same inputs to the anti-kt jet clustering algorithm, but they use different pileup mitigation techniques.

Which b tag are you looking for? If you’re interested in the standard CMS algorithm, here’s an example of how to access it with our Physics Object Extractor Tool software:
https://github.com/cms-opendata-analyses/PhysObjectExtractorTool/blob/2015MiniAOD/PhysObjectExtractor/src/JetAnalyzer.cc#L410
A tutorial that shows how to use this software is here:
General Physics Objects and POET – Pre-Learning: CMS Physics Objects

In my experience, the most likely reason to get a “-999” when you were expecting real values is a misspelling in the name of the b discriminator.

It would be a surprise to me if that standard tagger was really missing from these files, though I haven’t processed literally this sample through a test analysis like POET. But perhaps if you’re interested in some less standard things we’ll need to set up a configuration to add them. The examples from that b-tagging exercise look good, but I would use the “updateJetCollection” option to just take in slimmedJets or slimmedJetsPuppi and add new b-taggers. The details of updateJetCollection are here:

Let me know if using the method shown in POET allows you to access the standard CSVv2 tagger, and we can go from there if needed.
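
For orientation, here is a rough sketch of what an updateJetCollection configuration fragment might look like. It is untested and rests on assumptions: the process name is hypothetical, the vertex collection names are the usual MiniAOD ones, and the argument names should be checked against PhysicsTools/PatAlgos/python/tools/jetTools.py in your CMSSW_7_6_7 area. A real configuration also needs the usual source, conditions, and output pieces.

```python
# Configuration sketch only: requires a CMSSW environment and is untested here.
import FWCore.ParameterSet.Config as cms
from PhysicsTools.PatAlgos.tools.jetTools import updateJetCollection

process = cms.Process("BTAG")  # hypothetical process name

# Assumed argument names; verify them against jetTools.py in your release.
updateJetCollection(
    process,
    jetSource=cms.InputTag("slimmedJets"),
    # Usual MiniAOD vertex collections (assumption):
    pvSource=cms.InputTag("offlineSlimmedPrimaryVertices"),
    svSource=cms.InputTag("slimmedSecondaryVertices"),
    # CHS jet corrections; PUPPI jets would need the PUPPI payload instead.
    jetCorrections=("AK4PFchs",
                    cms.vstring(["L1FastJet", "L2Relative", "L3Absolute"]),
                    "None"),
    # Discriminators to (re)compute and attach to the updated jets.
    btagDiscriminators=["pfCombinedInclusiveSecondaryVertexV2BJetTags"],
)
```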

Regards,
Julie


Thanks for the help. I was able to get POET working, and it turns out the files did contain the b-tags after all, up to and including pfCombinedInclusiveSecondaryVertexV2BJetTags. The placeholder values I originally saw were, as you suspected, the result of passing the wrong name to bDiscriminator: I was passing the labels from tagInfoLabels. Those labels are different strings from the discriminator names I get when I list all b-tag names and values directly with getPairDiscri. When I use a name from getPairDiscri, e.g. bDiscriminator(“pfCombinedInclusiveSecondaryVertexV2BJetTags”), I can access the values properly in ROOT and also in FWLite.
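
In case it helps anybody else, here is a toy illustration of the mix-up in plain Python. ToyJet is a hypothetical stand-in, not the real pat::Jet class; it only mimics the three accessors mentioned above, and the discriminator value 0.87 is made up.

```python
# Toy stand-in for a pat::Jet; this is NOT the CMSSW class.
PLACEHOLDER = -1000.0  # placeholder value seen when the name is not found


class ToyJet:
    def __init__(self, pair_discri, tag_info_labels):
        self._pair_discri = list(pair_discri)
        self._tag_info_labels = list(tag_info_labels)

    def getPairDiscri(self):
        """Return the stored (discriminator name, value) pairs."""
        return list(self._pair_discri)

    def tagInfoLabels(self):
        """Return the tag-info labels, which are not discriminator names."""
        return list(self._tag_info_labels)

    def bDiscriminator(self, name):
        """Look up a discriminator by name; unknown names give the placeholder."""
        for label, value in self._pair_discri:
            if label == name:
                return value
        return PLACEHOLDER


jet = ToyJet(
    pair_discri=[("pfCombinedInclusiveSecondaryVertexV2BJetTags", 0.87)],  # made-up value
    tag_info_labels=["pfSecondaryVertex"],
)

# Wrong: a tag-info label is not a discriminator name.
wrong = jet.bDiscriminator(jet.tagInfoLabels()[0])
# Right: use a name exactly as listed by getPairDiscri.
right = jet.bDiscriminator(jet.getPairDiscri()[0][0])
print(wrong, right)
```

Listing the names from getPairDiscri first and copying one of them verbatim is the easiest way to avoid the placeholder.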

I may also be interested in adding more recent taggers such as DeepCSV or DeepFlavour, since those generally seem to be stronger discriminators. Do you know if adding either of those is supported by the updateJetCollection tool in this particular CMSSW release?

Also, a small note for anybody wanting to use POET: it needs to go into the CMSSW_X_Y_Z/src folder, or you get all sorts of file path issues.

Hi Eric,

That’s great news!

On your message about the POET config settings:

  1. Everything on “jec” and “jer” is general in terms of b-tagging, BUT there are different files for CHS jets (“slimmedJets”) vs PUPPI jets. It was not common in CMS to use PUPPI for the AK4 jets in Run 2, so we likely don’t have those files by default in POET. You can find the PUPPI versions here for MC, and in the corresponding folder one level up for data:
    https://github.com/cms-jet/JECDatabase/tree/master/textFiles/Fall15_25nsV2_MC

  2. For 2015 MiniAOD processing, JetAnalyzer is the only one that exists since the MiniAOD files contain pat::Jets already. So the JetAnalyzer in the POET 2015MiniAOD branch is the best example for this year.

Regarding DeepCSV and DeepFlavour, I don’t believe either discriminator was ever trained for 2015, and I’m not sure whether the code required to produce them was ever backported to CMSSW_7_6_7. I’ll reach out to the b-tagging folks and see what they have to say about this. Both taggers will certainly be available in the future 2016 data releases.

Regards,
Julie
