Classifying cosmic-ray proton and light groups in LHAASO-KM2A experiment with graph neural network

HTML

--> --> -->

3.Graph neural network

3.1.Graph neural network overview

-->

3.1.Graph neural network overview

GNN architectures are specialized to effectively analyze graph-structured data. Many of them adopt the concept from convolution networks and design their graph convolution operations. In comparison with different graph convolution schemes, most GNN models are classified into two categories, including the spectral and spatial domains [31]. The spectral methods are formulated based on the graph signal processing theory [32,33], where the graph convolution is interpreted as filtering the graph signal on a set of weighted Fourier basis functions. Spatial methods explicitly aggregate the information from the neighbor through the weighted edges.
Suppose an undirected, connected, weighted graph is denoted as $ G = \{ V, E, A\} $, which consists of a set of vertices $ V $, a set of edges $ E $, and a weighted adjacency matrix $ A $. The spectral-based approach is defined based on the normalized graph Laplacian, defined as $ L = I - D^{-1/2}AD^{-1/2} $, where $ D $ is the diagonal matrix of $ G $. Because the Laplacian $ L $ is a real symmetric positive semidefinite matrix, it can be factored as $ L = U \Lambda U^T $ through the eigenvalue decomposition algorithm. Hence, the Fourier basis $ F(x) = U^T x $ can be used for the graph filtering, and the spectral graph convolution operation is further simplified as

$ \begin{array}{l} x * G \ g_{\theta} = U g_{\theta} U^T x , \end{array} $

(1)

where $ g_{\theta} = {\rm diag}(U^T g) $ is the learnable filter.
Bruna et al. [34] proposed the first spectral convolution neural network (spectral CNN), with the spectral filter $ g_{\theta} = \Theta^k_{i,j} $ as a set of learnable parameters. Because of the high computation complexity of the Fourier basis U, Defferrard et al. [35] proposed the Chebyshev spectral CNN (ChebNet) by introducing Chebyshov polinomials as the filter, i.e., $ g_{\theta} = \sum^K_{i = 1} \theta_i T_k(\tilde{\Lambda}) $, where $ \tilde{\Lambda} = 2\Lambda / \lambda_{\max} - I_N $. Consequently, the ChebNet can avoid computation of the graph Fourier basis and significantly reduce computation complexity. Furthermore, Kipf et al. [36] simplified the ChebNet as a first-order approximation by assuming the $ K = 1 $ and $ \lambda_{\max} = 2 $. The resulting graph convolution is located entirely in the spatial domain.
The spatial-based graph convolution is defined based on the node's spatial relations. Following the idea of "correlation with template", the graph convolution relies on employing the local system at each node to extract the patches. Masci et al. [37] introduced the geodesic CNN (GCNN) framework, which generalizes the CNN into the non-Euclidean manifolds. Boscaini et al. [38] considered it as the anisotropic diffusion process. Monti et al. [39] generalized those spatial-domain networks and proposed mixture model networks (MoNet), a generic framework deep learning in the non-Euclidean domains. In this framework, a spatial convolution layer is given by a template-matching procedure as

$ \begin{array}{l} (f * g)(x) = \displaystyle\sum\limits^{J}_{j = 1} g_j D_j(x) f . \end{array} $

(2)

The patch operator in Eq. (2) assumes the form

$ \begin{array}{l} D_j(x)f = \displaystyle\sum\limits_{y \in N(x)} \omega_j(u(x,y))f(y), j = 1, \cdots, J \ , \end{array} $

(3)

where J represents the dimension of the extracted graph; $ x $ denotes a point in the graph or the manifold, and $ y \in N(x) $ represents the neighbors of $ x $. $ u(x,y) $ associates the node with the pseudo coordinate, and $ \omega_j(u) $ is the weighting function parameterized by learnable parameters.
The definition of the patch operator associates MoNet with other spatial-based graph convolutional models through the choice of the pseudo coordinate $ u(x,y) $ and the weighting function $ \omega_j(u(x,y)) $. Consequently, those spatial-based methods can be considered as particular instances of the MoNet. In particular, a convenient choice of the weighting function is the Gaussian kernel

$ \begin{array}{l} \omega_j(u) = \exp \left(-\frac12 (u - \mu_j)^T \Sigma^{-1}_j (u - \mu_j)\right), \end{array} $

(4)

where $ \Sigma_j $ and $ \mu_j $ are the learnable $ d \times d $ and $ d \times 1 $ covariance matrix and mean vector of a Gaussian kernel, respectively.
Spectral-based methods have their mathematical foundations in the graph signal processing; however, high computational costs are involved in calculating the Fourier transform. Spatial-based methods are intuitive by directly aggregating information from the neighbors, and they have the potential to handle large graphs. In contrast, as the Laplacian-based representation is required for spectral convolution, a learned model cannot be applied on another different graph, while the spatial-based convolution can be shared across different locations and structures. Because the CR EAS event changes its location, direction, and energy, the spatial-domain method is suitable to analyze the LHAASO-KM2A experiment.
2

3.2.Graph neural network on LHAASO-KM2A

-->

3.2.Graph neural network on LHAASO-KM2A

LHAASO-KM2A detectors can record the arrival time and photoelectron amplitude of shower secondary particles. The distribution of detector photonelectrons with respect to the distance from the shower core roughly obeys the NKG function [40,41] with the most dense region located at the shower core, while the distribution of arrival times can be parameterized as a plane perpendicular to the direction of the shower. Accordingly, we perform the data preprocessing procedure by reconstructing the event to locate the shower core position ($ x_0 $, $ y_0 $) and direction ($ \theta_0 $, $ \phi_0 $). The photoelectrons are normalized to the reconstructed event energy for an energy-invariant representation, denoted as $ pe $. Because the shower geometry is often treated as a slanted symmetric plane around the shower core, we transform the detector positions ($ x_i $, $ y_i $) into the cylinder coordinate ($ r_i $, $ \phi_i $) with the zero point at the shower core. The shower event along the time axis is represented by the detector's time residual $ {d}T_i $, defined as

$ \begin{array}{l} {d}T_i = T_i - \dfrac{{{ r}_i} \cdot {{ r}_0}}{c \lVert {{ r}_0} \rVert} - T_0 ,\end{array} $

(5)

where $ T_i $ is the recording time by the detector, and $ T_0 $ is the reference time defined as the earliest time along the arrival direction surrounding the shower core within 15 m. $ r_i $ and $ r_0 $ represent vectors of the node position and shower direction, respectively, and $ c $ is the speed of light.
The ED and MD detectors are constructed as independent, weighted, and undirected dense graphs, with each node containing a three-dimensional vector $ [pe_i $, $ dT_i $, $ r_i] $. The collection of these vectors depicts the topology of the event showers. An event graph is shown in Fig. 3 for illustration. As mentioned above, heavier nuclei may interact at a higher altitudes, thus the secondary particles will suffer more Compton scattering and result in the flatter shower fronts than lighter nuclei. The relations are illustrated as $ pe - r $ and $ dT - r $ distributions in Fig. 4, based on the simulation from Section 4. The three-dimensional feature is normalized for each channel independently. We construct the GNN model similar to Refs. [17,39]. The $ n \ \times \ n $ adjacency matrix A is defined by applying the Gaussian kernel to the pairwise distance $ \lVert x_i - x_j \rVert $ between the activated detectors, as follows

Figure3. (color online) Graph-structured LHAASO-KM2A detectors activated by a 500-TeV EAS event, where red dots represent EDs, and blue dots represent MDs. Dot size depicts logarithmic scale of recorded photoelectrons.

Figure4. (color online) Relations among three-dimensional vectors. Left panel: $pe - r$ distribution. Right panel: $dT - r$ distribution. CR groups (P (red), He (violet), CNO (blue), MgAlSi (yellow), Fe (green)) are shown for comparison.

$ \begin{array}{l} d_{ij} = {\rm e}^{- \frac12 (\lVert x_i - x_j \rVert - \mu_t)^2 / \sigma_t^2} , \end{array} $

(6)

$ \begin{array}{l} a_{ij} = \frac{d_{ij}}{\displaystyle\sum\limits_{k \in N} d_{ik}}. \end{array} $

(7)

In Eq. (7), $ a_{ij} $ is the normalized weight element in the adjacency matrix, and $ N $ represents the set of adjacent detectors with respect to the detector $ i $. The $ \mu_t $ and $ \sigma_t $ are learnable parameters, which define the locality of the convolutional kernel. In addition, the diagonal elements in the matrix A are set to zero.
Before implementing the graph convolution layers, we extract the higher-dimensional features from the input vectors through the learnable function, as shown in Eq. (8), where the $ n \ \times \ 3 $ vertex matrix $ v $ converts into the $ n \ \times \ d^{(0)} $ matrix $ x^{(0)} $.

$ \begin{array}{l} x^{(0)} = {\rm ReLu}(W^{(0)} v + b^{(0)}). \end{array} $

(8)

Then, we define a sequence of T convolution layers, as shown in Eq. (10). Each convolution layer $ t $ first aggregates the neighbors by multiplication with the adjacency matrix A and expands the vector from $ d^{(t)} $- to the $ 2 d^{(t)} $- dimension. Subsequently, the weighting function is applied to update the vector into the $ d^{(t+1)} $- dimension. The nonlinear activation function $ {\rm ReLu} $ is employed, except in the last convolution layer $ T $.

$ \begin{array}{l} G{\rm Conv}(x^{(t)}) = W^{(t)} [ x^{(t)}, A x^{(t)} ] + b^{(t)}, \end{array} $

(9)

$ \begin{array}{l} {x^{(t+1)} = \begin{cases} {\rm ReLu}(G{\rm Conv}(x^{(t)})), & t+1 < T \\ G{\rm Conv}(x^{(t)}), & t+1 = T \end{cases}} \end{array} $

(10)

The graph structure is preserved during convolutional operations. In the last convolution layer, i.e., Tth layer, we add a global pooling layer to collect features across the entire graph and compress the graph into a size-invariant representation. The $ n \times d^{(T)} $ feature matrix is averaged and converted into a $ 1 \times d^{(T)} $-dimensional matrix. The definition of the global pooling layer is

$ \begin{aligned} x_i^{\rm (pool)} = \frac1N \sum\limits_{n \in N} x_{ni}^{(T)}. \end{aligned} $

(11)

At the last layer, we employ a linear layer, and the logistic regression is applied to evaluate the event score as the classifier,

$ \begin{array}{l} y = {\rm sigmoid}( W^{\rm (pool)} x^{\rm (pool)} + b^{\rm (pool)} ) , \end{array} $

(12)

where $ x^{\rm (pool)} $ is the $ d^{(T)} $-dimensional feature from the global pooling layer, and $ y $ is the voting score. The activation function $ {\rm sigmoid} $ ensures that the score y spreads within the range $ [0,1] $, where the signal-like or background-like event approaches 1 or 0, respectively.
We construct the GNNs for ED and MD independently, and fuse their outputs through the linear layer in Eq. (12) with the $ x^{\rm (pool)} $ as a $ 2d^{(T)} $-dimensional vector. Independent GNN models for ED and MD are preserved for comparison. The entire GNN architecture is illustrated in Fig. 5.

Figure5. (color online) KM2A GNN model. Upper red network represents GNN ED model, and lower blue network represents GNN MD model. Right-most rectangle contains the fusion operation of the two models (GNN ED+MD) and their independent outputs.

4.Experiment

We employ the Monte Carlo simulation to generate event data for training and evaluating KM2A GNN performance. The primary EAS events are generated by the CORSIKA package with the hadronic model QGSJETII [42]. The KM2A detector simulation is performed based on the Geant4 framework [43, 44]. We generate major CR groups including the Proton (P), Helium (He), medium group (CNO), heavy group (MgAlSi), and Iron (Fe). Total events are generated into four energy fragments, including 10 ~ 100 TeV, 100 ~ 1 PeV, 1 ~ 10 PeV, 10 ~ 100 PeV, with the spectral index of –2.7. Reconstructed energies from 100 TeV to 10 PeV are considered, which cover most of the CR knee region. For each task, these groups are divided into independent signal and background groups, where only P belongs to the signal for the P task, and P&He forms the signal for the L task.
After reconstruction of the simulated events [26], we further select events according to their reconstructed locations and directions. The reconstructed shower core spread inside the KM2A array within the distance 200 ~ 500 m from the array center is selected. We ignore the inner circular area (within 200 m) to suppress the disturbance from the WCDA for the KM2A reconstruction. Further, the reconstructed zenith angle below $ 35^{\circ} $ is also required. Consequently, 105732 events remain for the following analysis. We split the selected events into train, test, and evaluation data sets. In consideration for the data balance, the group ratios for each data set are readjusted to maintain roughly $ 1:1 $ signal-to-noise ratio (SNR). The readjusted data sets for each task are listed in Table 1. The dataset ratio between the two major energy fragments, with 100 TeV ~ 1 PeV and 1 ~ 10 PeV, is around $ 2:1 $.

data set	P		L
data set	signal	background	signal	background
train	14635	14595	24358	23733
test	2875	2831	4754	4713
evaluation	24921	22994	24921	22994

Table1.Number of signal and background events for each dataset.

To train the GNN models, we employ supervised learning techniques with the mean square error (MSE) as the loss function. For each training epoch, the loss is calculated on the test dataset to avoid overfitting. The Adam [45] optimizer is used to optimize the model parameters based on adaptive estimation of low-order moments. The training procedure includes two steps, (i) two independent trainings for the GNN ED and MD models with the learning rate 0.001, and (ii) a subsequent fine-tuning procedure fuses the ED and MD model together with the learning rate 0.0001. It runs over a total of 80 epochs with the model already converged. All code is written in Python using the open-source deep learning framework PyTorch with GPU acceleration. For each model training, four identical candidates with different randomized weights are trained, and the one with the best performance is selected for further processing, which helps suppress the local optimization.

	P	L
baseline	0.836	0.904
GNN MD	0.847	0.93
GNN ED	0.861	0.936
GNN ED+MD	0.878	0.959

	Purity (%) (+stat.+sys.)		Aperture (${\rm m^2 \cdot sr}$) (+stat.+sys.)
handcraft (hybrid) [27]	~90	~95		~1.5e3	~4e3
GBDT (hybrid) [28]	~90	~97	~3.6e3	~7.2e3
baseline (KM2A)	73.4±2.5±2.4	93.20.9±1.1	3.2e5±1.3e3±1.0e4	6.3e5±2.7e3±7.6e3
CNN (KM2A)	75.4±2.5±2.4	93.3±0.9±1.1	3.2e5±1.3e3±1.0e4	6.3e5±2.7e3±7.6e3
GNN MD (KM2A)	77.1±2.3±2.5	95.9±0.6±1.2	3.2e5±1.3e3±1.0e4	6.3e5±2.7e3±7.6e3
GNN ED (KM2A)	82.8±1.9±2.6	96.6±0.6±1.2	3.2e5±1.3e3±1.0e4	6.3e5±2.7e3±7.6e3
GNN ED+MD (KM2A)	84 ±1.9±2.7	98.2±0.4±1.2	3.2e5±1.3e3±1.0e4	6.3e5 ±2.7e3±7.6e3

6.Conclusion

Deep learning has contributed extensively to significant progress in numerous fields. Therefore, we leverage this technology to improve classification performance in the LHAASO -KM2A experiment. We propose a fused GNN model, which constructs independent networks for the KM2A ED and MD arrays, and fuse their outputs for classification. This model is demonstrated to be effective, and its performance outperforms the traditional physics-based method as well as the CNN-based method over the entire energy range. Furthermore, we compare the performance of the GNN framework for independent ED and MD arrays. The ED array is found to behave better than the MD array. We attribute this to the higher density configuration of the ED array. Moreover, in comparison with the LHAASO hybrid detection method, our KM2A GNN model exhibits competitive classification performance. Owing to the large area and full duty cycle of the KM2A array, it can acquire statistics on the order of ~ 870× higher than the hybrid detection.
We thank the LHAASO Collaboration for their support on this project.

本站小编 Free考研考试/2022-01-01

Corresponding author: Chao Jin, jinchao@mail.ihep.ac.cn

HTML

3.1.Graph neural network overview

3.2.Graph neural network on LHAASO-KM2A

相关话题/Classifying cosmic proton

领限时大额优惠券,享本站正版考研考试资料!

Erratum and Addendum: Empirical pairing gaps and neutron-proton correlations (Chin. Phys. C, 43(1):

The first excited single-proton resonance in 15F by complex-scaled Green's function method

Shielding effects in fusion reactions with a proton-halo nucleus

Thermodynamics and weak cosmic censorship conjecture of an AdS black hole with a monopole in the ext

Antiproton production in heavy-ion collisions at subthreshold energies

Isotopic cross-sections in proton induced spallation reactions based on the Bayesian neural network

Weak cosmic censorship conjecture in BTZ black holes with scalar fields

Weak cosmic censorship conjecture and thermodynamics in quintessence AdS black hole under charged pa

Isospin-sensitive observables as a probe of proton transition momentum in the HMT

Exact recession velocity and cosmic redshift based on cosmological principle and Yang-Mills gravity