Fund Project: Project supported by the National Natural Science Foundation of China (Grant Nos. 11675033, 12075043)
Received Date: 18 September 2020
Accepted Date: 13 November 2020
Available Online: 06 March 2021
Published Online: 20 March 2021
Abstract: The jet tagging task in high-energy physics is to distinguish signals of interest from the background, which is of great importance for the discovery of new particles or new processes at the Large Hadron Collider. The energy deposition generated in the calorimeter can be viewed as an image, and with this notion, tagging jets initiated by different processes becomes a classic image classification task in the computer vision field. We use jet images built from high-dimensional low-level information, the energy-momentum four-vectors, as input to explore the potential of convolutional neural networks (CNNs). Four models of different depths are designed to make the best use of the underlying features of jet images. A traditional multivariate method, the boosted decision tree (BDT), is used as a baseline against which the networks are measured. Four observables are fed into the BDTs: the mass and transverse momentum of the fat jet, the distance between the leading and subleading subjets, and the N-subjettiness. Three kinds of BDTs with different numbers of trees are built to obtain classifiers of varying capacity. After training and testing, the results show that CNN 3 is the most compact and efficient network among the stacked-convolution designs: deepening the model improves performance only up to a point. The performances of all BDTs are almost identical, likely because of the small number of input observables. The performance metrics show that the CNNs outperform the BDTs: the background rejection increases by up to 150% at a signal efficiency of 50%. Moreover, by inspecting the best and the worst classified samples, we summarize the characteristics of jets initiated by different processes: jets from Z boson decays tend to concentrate at the center of the jet image or to have a clearly distinguishable substructure, whereas the substructures of jets from generic quantum chromodynamics processes take more random forms and are not limited to two subjets. Finally, the confusion matrix of CNN 3 indicates that the network is somewhat conservative in its predictions; finding the balance between conservative and aggressive tagging is the goal of future work.
Keywords: decays of Z bosons / quarks / gluons / neural network
3.1. Convolutional neural networks (CNNs)
The convolutional block (ConvBlock) used in this work consists of a convolutional layer, a batch normalization layer, and a max pooling layer. To keep the spatial size of the input unchanged, the padding of the convolutional layers is set to 1 and the kernel size to 3 (with unit stride), so that the convolution itself preserves the input size. Under this design the CNN can be made deeper. To prevent the overfitting that an overly complex model would bring, a dropout layer is appended at the end of the convolutional block, dropping each feature map connected to it with a probability of 50%. Four CNN architectures are explored in total, containing 2, 3, 4, and 5 convolutional blocks and named CNN 1, CNN 2, CNN 3, and CNN 4, respectively. Figure 2 shows the architecture of CNN 3, which contains four convolutional blocks. All architectures consist of stacked convolutional blocks followed by a fully connected classification layer that produces the output. As the layers deepen, the intermediate feature maps gain more channels while shrinking in size, down to a single-pixel map at the end. Besides this kind of architecture, Refs. [9, 17] have explored different designs. Training uses the Adam optimization algorithm with a learning rate of 0.001. Early stopping is also adopted against overfitting: if the loss on the validation set does not decrease within 20 epochs, training is terminated. The cross-entropy loss function is used. The models are built with PyTorch and trained with Skorch, a high-level wrapper of PyTorch.
Figure 2. Architecture of CNN 3. This figure was generated by adapting the code from https://github.com/gwding/draw_convnet.
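The setup above can be sketched in PyTorch with Skorch along the following lines. Only the hyper-parameters quoted in the text (size-preserving 3×3 convolutions, 50% feature-map dropout, Adam with a learning rate of 0.001, cross-entropy loss, early stopping with a patience of 20 epochs) are taken from the paper; the channel counts, the ReLU activation, the 25×25 image size, and the epoch cap are illustrative assumptions.

```python
import torch
import torch.nn as nn
from skorch import NeuralNetClassifier
from skorch.callbacks import EarlyStopping

def conv_block(in_ch, out_ch):
    """Conv -> BatchNorm -> ReLU -> MaxPool -> Dropout; the 3x3 convolution preserves size."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),       # halves the spatial size
        nn.Dropout2d(0.5),     # drops each feature map with 50% probability
    )

class CNN3(nn.Module):
    """Four stacked ConvBlocks followed by a fully connected classification layer."""
    def __init__(self):
        super().__init__()
        # channel counts are assumptions; an assumed 25x25 input shrinks to 1x1 here
        self.features = nn.Sequential(
            conv_block(1, 32),
            conv_block(32, 64),
            conv_block(64, 128),
            conv_block(128, 256),
        )
        self.fc = nn.Linear(256, 2)  # two classes: background (0), signal (1)

    def forward(self, x):
        x = self.features(x)
        return self.fc(torch.flatten(x, 1))

net = NeuralNetClassifier(
    CNN3,
    criterion=nn.CrossEntropyLoss,
    optimizer=torch.optim.Adam,
    lr=0.001,
    max_epochs=200,                          # assumed cap; early stopping usually triggers first
    callbacks=[EarlyStopping(patience=20)],  # stop if validation loss stalls for 20 epochs
)
# net.fit(X, y)  # X: float32 array (N, 1, 25, 25) of jet images, y: int64 labels
```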
3.2. Boosted decision trees (BDTs)
To gauge the discrimination power of the CNNs, boosted decision trees are taken as the baseline. Their inputs are the mass and transverse momentum of the clustered fat jet, the ${\Delta }R$ between the leading and subleading subjets, and the jet shape variable $ {\tau }_{21} $ of N-subjettiness; Fig. 3(a)-(d) shows their distributions. We adopt the gradient boosted decision tree (GBDT) from Sklearn, with the learning rate set to 0.1, the fraction of samples used to train each tree set to 0.9, and the maximum depth of each tree set to 3. For the number of trees, 100, 200, and 300 are tried to find the best setting. Note that these do not exhaust the possible settings; other configurations might yield better models, which will be explored in future work.
Figure 3. (a) Mass distribution of fat jets; (b) transverse momentum distribution of fat jets; (c) distribution of the distance between the leading and subleading subjets; (d) distribution of the N-subjettiness $ {\tau }_{21} $.
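A minimal sketch of this GBDT baseline with Sklearn is given below. The hyper-parameters (learning rate 0.1, subsample fraction 0.9, maximum depth 3, and 100/200/300 trees) are those quoted above; the feature matrix is a random placeholder standing in for the four observables of Fig. 3.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder data standing in for the four observables shown in Fig. 3:
# columns = [fat-jet mass, fat-jet pT, Delta R(leading, subleading), tau_21]
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = rng.integers(0, 2, size=1000)   # 0 = background, 1 = signal

for n_trees in (100, 200, 300):     # the three tree counts tried in the text
    bdt = GradientBoostingClassifier(
        learning_rate=0.1,
        subsample=0.9,              # fraction of samples used to fit each tree
        max_depth=3,
        n_estimators=n_trees,
    )
    bdt.fit(X, y)
    print(n_trees, bdt.score(X, y))
```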
Here $ i $ denotes the input class represented by the output neuron, with 0 for the background and 1 for the signal, and $ o $ denotes the raw output of the neuron itself. The signal neuron is selected to examine the output distributions obtained from inputs of the two classes, as shown in Fig. 5. The signal outputs mostly concentrate near 1 while the background outputs cluster between 0 and 0.3, so the model separates the two classes well.
Figure 5. Distribution of the signal-neuron output of CNN 3 for signal (orange) and background (blue) samples.
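A distribution like that of Fig. 5 can be produced along the following lines, assuming the trained Skorch classifier `net` from the sketch in Sec. 3.1 and hypothetical test arrays `X_test`/`y_test`; with a cross-entropy criterion, Skorch's `predict_proba` returns the softmax of the logits.

```python
import matplotlib.pyplot as plt

# X_test, y_test: held-out jet images and true labels (hypothetical names)
proba = net.predict_proba(X_test)   # class probabilities, shape (N, 2)
signal_score = proba[:, 1]          # output of the signal neuron (i = 1)

plt.hist(signal_score[y_test == 1], bins=50, alpha=0.6, label="signal")
plt.hist(signal_score[y_test == 0], bins=50, alpha=0.6, label="background")
plt.xlabel("signal-neuron output")
plt.ylabel("events")
plt.legend()
plt.show()
```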
Figure 7. The best and the worst background jet images.
Figure 8. Confusion matrix of CNN 3 on the test set. The true label is on the vertical axis, and the predicted label is on the horizontal axis.
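A confusion matrix like that of Fig. 8 can be computed with Sklearn, again assuming the trained classifier `net` and the hypothetical test arrays from the earlier sketches.

```python
from sklearn.metrics import confusion_matrix

pred = net.predict(X_test)           # predicted classes
cm = confusion_matrix(y_test, pred)  # rows: true label, columns: predicted label
print(cm)
```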