1.Key Laboratory of Particle Astrophysics, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China 2.University of Chinese Academy of Science, Beijing 100049, China 3.Institut für Astronomie und Astrophysik, Eberhard Karls Universit?t, Tübingen 72076, Germany 4.Kunming University, Kunming 650214, China 5.Institut de Physique Nucléaire d’Orsay, IN2P3-CNRS, Université Paris-Sud, Université, Paris-Saclay, Orsay Cedex 91406, France 6.China Nuclear Power Engineering Co., Ltd., Beijing 100084, China Received Date:2019-03-18 Available Online:2019-07-01 Abstract:The Large High Altitude Air Shower Observatory (LHAASO) is a composite cosmic ray observatory consisting of three detector arrays: kilometer square array (KM2A), which includes the electromagnetic detector array and muon detector array, water Cherenkov detector array (WCDA) and wide field-of-view Cherenkov telescope array (WFCTA). One of the main scientific objectives of LHAASO is to precisely measure the cosmic rays energy spectrum of individual components from $ 10^{14} $ eV to $ 10^{18} $ eV. The hybrid observation will be employed by the LHAASO experiment, in which the lateral and longitudinal distributions of extensive air shower can be observed simultaneously. Thus, many kinds of parameters can be used for primary nuclei identification. In this paper, high purity cosmic ray simulation samples of the light nuclei component are obtained using multi-variable analysis. The apertures of 1/4 LHAASO array for pure proton and mixed proton and helium (H&He) samples are $ 900 \rm\ m^{2}Sr $ and $ 1800 \rm\ m^{2}Sr $ , respectively. Prospect of obtaining proton and H&He spectra from 100 TeV to 4 PeV is discussed.
HTML
--> --> --> -->
3.1.Simulation
The cascade processes of primary cosmic rays in the atmosphere are simulated with the CORSIKA [20] program, version 6990. The EGS4 model is chosen for electromagnetic interactions. The QGSJET02 and GHEISHA models are chosen for the high and low energy hadronic processes, respectively. The information about Cherenkov photons and secondary particles at the level of the observatory are recorded to simulate hybrid observation. Five components, protons, helium, CNO, MgAlSi, and iron are generated with energies from 10 TeV to 10 PeV according to a power law spectrum with an index of ?2.7. The directions of the showers are from $ 24^{\circ} $ to $ 38^{\circ} $ in zenith, and from $ 77^{\circ} $ to $ 103^{\circ} $ in azimuth. The shower core position is evenly distributed in an area of 260 m$ \times $260 m. The primary energy spectrum is normalized to the exponent of ?2.7 from 10 TeV to 10 PeV, as shown in Fig. 2. Different components are normalized to that of protons. Figure2. (color online) Energy distribution of all simulation events after normalization.
In addition, the five components are also normalized according to the H?randel model. The relationship between the original proton spectrum and the two normalized proton spectra is that the number of original proton events, the number of proton events with the exponent ?2.7 and the number of proton events with the exponent from the H?randel model are the same at 100 TeV. The event number in the other energy bins and other components are normalized in proportion. The normalized statistics of each component are equivalent to the observation data for one year with 10% duty cycle according to the H?randel model [21]. The simulation of the response of three LHAASO detector arrays to EAS is performed separately. In the simulation, the actual coordinates of the detector arrays are transferred to the CORSIKA coordinate system. The program of photon ray-tracing is used for WFCTA simulation. The trigger pattern of the telescope is set to 7, namely seven neighboring pixels should be triggered. The fast simulation program [22] is used for the simulation of WCDA++ and MD. Only photoelectrons are sampled according to the energy, direction and particle id of the secondary particles in EAS. The time response is not included. The simulated results of different detectors are then merged before reconstruction. 23.2.Event reconstruction -->
3.2.Event reconstruction
According to the actual data acquisition, as long as the Cherenkov telescope is triggered, the events measured by three detector arrays are reconstructed. First, the shower core position is given by WCDA++ using NKG fitting [23]. Due to the lack of time response information in the WCDA++ fast simulation program, there is no information about the shower fronts. Therefore, the arrival direction of the shower is obtained by Gauss sampling with an angular resolution of $0.3 ^{\circ} $. Secondly, the perpendicular distance between the telescope and the shower axis ($ R_p $) is calculated. The Cherenkov image cleaning is then performed. The first step is to remove the fired pixels which contain less than 30 photoelectrons in the image. The second step is to find a fired pixel with the largest number of photoelectrons, and use it as the center for projecting the whole image outward. Then, all isolated fired pixels are removed. After the image cleaning, the Hillas parameters [24], are obtained, as well as the total number of photoelectrons ($ N^{pe} $) in the image. $ N^{pe} $ is a good energy estimator. Before energy reconstruction, $ N^{pe} $ should be normalized to $ R_p = 0 $ and $ \alpha = 0 $, namely
where $ \alpha $ is the space angle between the shower direction and the optical axis of the telescope, and the parameters $ a $ and $ b $ depend on the primary component of cosmic rays. Finally, the lateral distribution of muons is fitted by an empirical formula Eq. (2) using maximum likelihood fitting [9]. The normalized muon size is given by
where $ k_1\cong1.4 $, $ k_2\cong1.0 $, $ r_G\cong220\;{\rm m} $. These three parameters are reconfirmed according to the layout of the MD array. 23.3.Event selection -->
3.3.Event selection
There are three basic principles for event selection: first, the shower core position must fall in WCDA++; second, the Cherenkov image is complete; and third, the muon lateral distribution is well fitted. Specifically, the reconstructed shower core located within $ 130\;{\rm m}\times 130\;{\rm m} $ around WCDA++ is selected. For the Cherenkov image, the number of fired pixels must be more than 10, and the angular distance between the weighted center of the image and the center of the camera plane less than $ 6^{\circ} $. For muon lateral distribution fitting, the fitted normalized muon size ($ k_G N_{\mu} $) should be more than $ 10^{-7} $. The number of events in each energy bin after selection is shown in Fig. 3 (left). The dotted line represents the number of proton events in the simulation, which is consistent with the black line in Fig. 2. The solid lines represent the number of events after selection, and different colors represent different components. Fig. 3 (left) shows that the hybrid observation becomes nearly fully efficient above 100 TeV for all components. Figure3. (color online) Left: energy distribution after event selection. Right: detection efficiency for protons and H&He.
The detection efficiency is defined as the ratio of the number of selected events to the number of simulation events in each energy bin, i.e. the black solid line divided by the blue dotted line in Fig. 3 (left). In our simulations, the detection efficiency is about 15% above 100 TeV for all components, as shown in Fig. 3 (right). The red dots show the detection efficiency of H&He and the black dots show the the detection efficiency for protons. It is obvious that the error bars of the last two points in Fig. 3 (right) are large due to small statistics. Therefore the results for $ \lg(E/{\rm TeV}) > 3.6 $ are masked in the subsequent analysis. -->
4.1.Component sensitive parameters
34.1.1.Parameters from WFCTA -->
4.1.1.Parameters from WFCTA
It is known that the iron induced air showers are larger, develop faster and have a smaller interaction mean free path than the primary particles, which travel in the atmosphere [9]. Thus, the proton induced shower develops its maximum later than the iron shower. The atmospheric depth of the shower maximum, $ X_{\max} $, is a traditional mass sensitive parameter. We use $ {\Delta}{\theta} $, the parameter used to reconstruct $ X_{\max} $, instead of the reconstructed $ X_{\max} $. $ {\Delta}{\theta} $ is the angular distance between the direction of the arriving shower and the gravity center of the Cherenkov image. It can not be used directly to classify the primary particles because of the $ R_{p} $ and shower energy ($ N_{0}^{pe} $) dependence. After normalization, the structure of the mass sensitive parameter $ P_{X} $ is as follows:
Moreover, the proton induced showers exhibit an elongated elliptic shape [24]. The ratio of the length and width of the Cherenkov image is also a traditional effective parameter:
The distributions of $ P_{X} $ and $ P_{C} $ for proton and iron showers are shown in Fig. 4. Figure4. (color online) Distributions of mass sensitive parameters $P_{X}$ (left) and $P_{C}$ (right) for the proton (red line) and iron (blue) induced showers.
34.1.2.Parameters from MD -->
4.1.2.Parameters from MD
The muon size $ N_{\mu} $ of a shower heavily depends on the atomic number ($ A $) of the primary particle: $ N_{\mu}^{A}/N_{\mu}^{p} \approx A^{(1-\eta )} $, where $ \eta $ is approximately 0.9 [9] , and $ p $ indicates the proton shower. Therefore, the fitted muon size ($ N_{\mu}^{F} $), the total number of detected muons ($ N_{\mu}^{M} $) and the number of fired MDs (NMD) are significant variables for identification of the primary particles:
$ Rlp $ is the perpendicular distance between the center of the MD array and the shower axis. The distributions of the three parameters $ P_{\mu1} $, $ P_{\mu2} $ and $ P_{\mu3} $ for proton and iron showers are shown in Fig. 5. Figure5. (color online) Distributions of mass sensitive parameters $P_{\mu1}$ (up-left), $P_{\mu2}$ (up-right), $P_{\mu3}$ (middle-left), $P_{F2}$ (middle-right), $P_{F3}$ (bottom-left) and $P_{F4}$ (bottom-right) for the proton (red line) and iron (blue) induced showers.
34.1.3.Parameters from WCDA -->
4.1.3.Parameters from WCDA
It is also known that the iron induced shower is more extensive at the same observation level. This means that the lateral extension of secondary particles in iron induced air shower is larger. WCDA++, the fully covered detector array, can detect the lateral distribution of secondary particles well. The formulas for the average lateral distribution $ <ER> $ and its $ {\rm RMS} $ are given using Eq. (2) to Eq. (5) as
where $ R_{i} $ is the distance between the shower core position and the $ ith $ fired cell in WCDA++ and $ Pe_{i} $ are the photoelectrons measured by the $ ith $ fired cell, and
The distributions of the three parameters $ P_{F2} $, $ P_{F3} $ and $ P_{F4} $ for proton and iron showers are shown in Fig. 5. In addition, near the shower core axis, the particle density of the iron-shower is lower than that of the proton shower [6]. Hence, the number of photoelectrons in the brightest cell ($ N_{\max} $) measured by WCDA++ is sensitive to components:
The distributions of the two parameters $ P_{F} $ and $ P_{E} $ for proton and iron showers are shown in Fig. 6. Figure6. (color online) Distributions of mass sensitive parameters $P_{F}$ (left) and $P_{E}$ (right) for the proton (red line) and iron (blue) induced showers.
Other parameters have also been studied, such as the photoelectrons measured by the brightest pixel in air Cherenkov image ($ P_{C1} $), the total photoelectrons located 45 meters away from the shower core in WCDA++ ($ P_{F1} $), the muon density detected at 80 m to 100 m away from the shower core position ($ P_{\mu4} $), etc. Nevertheless, the particle classification of these variables is weak. After tuning of the parameters, the above ten parameters are used as input for the BDTG classifiers. 34.1.4.Correlation of parameters -->
4.1.4.Correlation of parameters
The parameters described above are not independent. There is a correlation between some of them. Parameters provided by the same detector array are highly correlated, such as $ P_{F} $ and $ P_{F2} $, $ P_{\mu1} $ and $ P_{\mu3} $. The correlations of parameters from WCDA and MD for five components of cosmic rays are shown in Fig. 7. The parameters $ P_{F} $ and $ P_{F2} $ have negative correlation (left plot), while $ P_{\mu1} $ and $ P_{\mu3} $ are position correlated (right plot). According to the development characteristics of EAS, it is easy to understand that as the lateral distribution of the secondary particles becomes more extensive there are less particles near the shower core. Also, as more muon detectors are fired there is a larger muon content in the shower. Figure7. (color online) Correlations of mass sensitive parameters from WCDA (left) and MD (right).
The correlation of parameters provided by the different detector arrays is weak, such as $ P_{F} $ and $ P_{X} $, and $ P_{\mu1} $ and $ P_{F4} $. The correlation of parameters from different detector arrays for five components are shown in Fig. 8. The distributions are approximately circular. However, the correlation between parameters of different components is slightly different. The parameters for protons tend to be more correlated, as shown by black dots in Fig. 8. Figure8. (color online) Correlations of mass sensitive parameters from different detector arrays.
The correlation matrix of the ten variables for proton induced showers is shown in Fig. 9. The correlation coefficient of 100% means a linear positive correlation, and ?100% means linear negative correlation. A zero correlation coefficient means no correlation among parameters. It should be noted that the parameter $ P_{C} $ is basically independent of other parameters, as shown in the third column or in the eighth row in Fig. 9. This is because $ P_{C} $ reflects the overall development characteristic, not just the lateral or longitude distribution of the air shower. Figure9. (color online) Linear correlation matrix for the input variables of proton events.
After removing the highly correlated parameters (97% or 98% correlation coefficient), six parameters, $ P_{C} $, $ P_{X} $, $ P_{F} $, $ P_{F4} $, $ P_{\mu2} $ and $ P_{\mu3} $, are used for TMVA training and analysis. 24.2.TMVA analysis -->
4.2.TMVA analysis
Drawing on the experience of ARGO-YBJ/WFCT hybrid detection, two parameters are used for particle identification [27] in the event-by-event cut. However, having too many variables is not suitable for particle identification when using event-by-event cuts because it is complicated to get an optimal cut for each parameter. Therefore, MVA is used. As an important branch of statistics, multivariate analysis has been applied in most disciplines. TMVA is specially developed for high energy physics based on the ROOT integrated environment. It is powerful for signal and background classification. In accelerator physics, it can effectively screen out the b-tagging signals from a large number of background particles in a jet [28]. Likewise, it enables to identify the components of cosmic rays [29]. TMVA is based on machine learning. It integrates multiple advanced algorithms, such as Boosted Decision Trees (BDT), Artificial Neural Networks (ANN), Support Vector Machine (SVM), etc. Users need to input variable samples and select the algorithm, and the machine training and testing are then carried out. The result is one variable which is used by the user to select signals. Here, Boosted Decision Trees with Gradient (BDTG) is chosen, which is the most widely used algorithm. The classification of protons and H&He from other mass components is carried out independently. The selected hybrid events are divided into two equal parts. One is used for TMVA training and testing and the other is used as data. The statistics of the signal (proton or H&He) and background (others) are shown in Table 1.
NO. of events
proton
H&He
signal
40215
61098
background
91330
70447
Table1.The number of events for BDTG classifier training and testing.
To avoid overtraining, several parameters are adjusted to achieve best performance of the classifier. Parameters of BDTG classifiers for the separation of protons from the other nuclei are as follows: -Number of trees in the forest is 380; -Minimum training events required in the leaf node are 30; -Maximum depth of the decision tree is 2. Parameters of BDTG classifiers for the separation of H&He from the other nuclei are as follow: -Number of trees in the forest is 300; -Minimum training events required in the leaf node are 50; -Maximum depth of the decision tree is 2. The other parameters are set at their default values. The training results are shown in Fig. 10. $ {\rm H\&He} $ can be well separated from the other components. However, separation of protons from the other nuclei is barely satisfactory. The background rejection versus signal efficiency ("ROC curve") is obtained by cuts in the BDTG classifier outputs for proton and H&He samples, as shown in Fig. 11. It is obvious that for the same signal efficiency, the background rejection of the other heavy nuclei (H&He vs. other nuclei) is higher. Figure10. (color online) Training results of BDTG classifier for protons (left) and H&He (right).
Figure11. (color online) ROC curves for BDTG classifier for protons (red line) and H&He (black line).
24.3.Results -->
4.3.Results
The training results are applied to the other half of the data sample. Generally, signal events can be selected according to the best cut points provided by TMVA. However, because our final goal is to obtain selection efficiency with different energy bins, the distribution of the output BDTG classifier versus reconstructed energy is studied first. As shown in Fig. 12, the separation of the output BDTG classifiers between H&He and heavy nuclei becomes larger as the primary energy increases. The mean value of the signal (red dots) in each energy bin is slightly higher than the background (black dots), and the RMS of the signal gradually decreases. The reason for larger separation is that the performance of the input parameters gets better at higher energies. Figure12. (color online) Distribution of the BDTG output parameter as a function of reconstructed energy. The error bars show the RMS in each bin.
To get a coincident selection efficiency, the cut values are scaled following Eq. (19) and Eq. (20):
$\rm Cut~ Value (H) = 0.09\times lg(Energy/TeV)+0.55, $
(19)
$\rm Cut~ Value (H\&He) = 0.33\times lg(Energy/TeV)-0.26. $
(20)
After the cut described above, the aperture and contamination of hybrid observation are calculated. The aperture for a pure proton is about $ 900\;{\rm m}^{2}\rm Sr $ , and the aperture for H&He is about $ 1800\;{\rm m}^{2}\rm Sr $, as shown in Fig. 13 (left). The contamination of the proton sample is about 10% , and the contamination of H&He sample is less than 3% according to the H?randel model, as shown in Fig. 13 (right). Figure13. (color online) The final selected aperture (left) and contamination (right) for protons and H&He.
Considering the uncertainty due to the hadronic interaction models, two batches of data, QGSJET-FLUKA and EPOS-FLUKA, were used for error analysis. It is found that the aperture for H&He is consistent within $ \pm 5 $%.