2 School of Mathematical Sciences, Shanghai Key Laboratory of PMMP, Shanghai Key Laboratory of Trustworthy Computing
3 College of Mathematics and Systems Science
4 Department of Physics

Received: 2020-07-31; Revised: 2020-10-01; Accepted: 2020-10-21; Published online: 2020-12-18
Cite this article: Jun Li (李军), Yong Chen (陈勇). A physics-constrained deep residual network for solving the sine-Gordon equation. Communications in Theoretical Physics, 2021, 73(1): 015001. doi:10.1088/1572-9494/abc3ad
1. Introduction
Solving nonlinear evolution equations computationally plays a very important role in physics and engineering. Approximating the solutions of these equations with deep learning rather than traditional numerical methods has been studied [1–3]. Han et al [1] introduced a deep learning approach that approximates the gradient of the unknown solution, whereas Raissi et al [2, 3] approximate the latent solution itself with deep neural networks. Recently, we have also explored applications to other evolution equations such as the Burgers equation (strictly speaking, we only studied kink-type solitary wave solutions of this equation), the Korteweg–de Vries (KdV) equation, the modified KdV equation, and the Sharma–Tasso–Olver equation [4, 5], where these integrable equations have exact, explicit solutions against which the accuracy of the network solutions can be tested. Such data-driven solutions are closed-form, differentiable and easy to use in subsequent calculations compared with traditional numerical approximations. Moreover, unlike traditional numerical approaches, the deep learning method does not require a discretization of the spatial and temporal domains.

Some experiments show that the model in [2] cannot approximate very well the solutions of equations containing highly nonlinear source terms such as $\sin (u)$, and that the model in [3] often fails to represent the solution dynamics with certain activation functions. It is therefore of interest to improve the network architecture to solve, or at least alleviate, these problems. In this paper, we propose a physics-constrained deep residual network that combines the deep residual network [6, 7] with the underlying laws of physics. To our knowledge, this is the first framework that combines a residual network with the underlying physical laws. This architecture is easier to optimize than a classical feedforward network. Moreover, the skip connections increase gradient flow and thereby alleviate the vanishing/exploding gradient problems, and they make it possible to train very deep networks with improved performance. Here we use only a simple identity connection; more complex connections between the layers are also possible [8].
Specifically, in this paper, we consider the sine-Gordon equation [9–11], which includes a highly nonlinear source term:
$$u_{tt} - u_{xx} + \sin(u) = 0. \qquad (1)$$
The paper is organized as follows. In section 2, we describe the proposed physics-constrained deep residual network. Section 3 presents numerical results for the antikink solution of the sine-Gordon equation, and section 4 concludes with a discussion.
2. Method
The residual block usually adds the output of a series of layers directly to the input of the block. Formally, in this work, we consider a residual block defined as
$$\mathbf{x}_{k+1} = \mathbf{x}_k + \mathcal{F}(\mathbf{x}_k), \qquad (2)$$
where $\mathbf{x}_k$ denotes the input of the $k$-th block and $\mathcal{F}$ is the residual mapping (a small stack of fully connected layers with nonlinear activations) to be learned (see figure 1).
Figure 1. This figure sketches how the residual block works.
In addition, this residual structure is just a special case of the Euler forward scheme [12], $\mathbf{x}_{k+1} = \mathbf{x}_k + h f(\mathbf{x}_k)$, with step size $h = 1$.
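To make this concrete, the following is a minimal sketch (our illustration, not code from the original work) of such a residual block in PyTorch; the two-layer residual mapping and the sinusoidal activation (motivated below) are assumptions made for the example.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block x_{k+1} = x_k + F(x_k): an identity skip connection
    around a small stack of fully connected layers, i.e. one Euler forward
    step with unit step size h = 1."""

    def __init__(self, width: int):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # residual mapping F(x): two linear layers with smooth sin activations,
        # so that the second derivatives needed by the PDE residual exist
        f = torch.sin(self.fc1(x))
        f = torch.sin(self.fc2(f))
        return x + f  # identity skip connection
```

Because the skip connection contributes an identity term to the Jacobian, gradients can flow through many stacked blocks without vanishing, which is what makes deep variants of this construction trainable.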
In this work, we approximate the unknown solution u directly with a deep residual network composed of three residual blocks, and accordingly define a physics-constrained network $f := u_{tt} - u_{xx} + \sin(u)$, whose derivatives are obtained from the u-network by automatic differentiation.
From equation (1), the shared parameters of the networks u and f can be learned by minimizing a loss that combines a data mismatch term over the Nu training points with a physics residual term over the Nf collocation points. Instead of the usual squared error, we measure both terms with the $\mathrm{logcosh}$ function, $\mathrm{logcosh}(x) = \log(\cosh(x))$, which behaves like $x^2/2$ for small errors but grows only linearly, like $|x| - \log 2$, for large ones; figure 2 compares it with some common loss functions.
Figure 2. The comparison of some common loss functions.
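As an illustration of the loss just described, here is a hedged sketch in PyTorch; the network interface u_net(t, x) and the tensor names are our assumptions, and logcosh is written in a numerically stable form.

```python
import math
import torch
import torch.nn.functional as F

def logcosh(x: torch.Tensor) -> torch.Tensor:
    # stable log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log 2;
    # ~ x^2/2 near zero, ~ |x| - log 2 for large |x|
    return x.abs() + F.softplus(-2.0 * x.abs()) - math.log(2.0)

def pinn_loss(u_net, t_u, x_u, u_data, t_f, x_f):
    # data term over the Nu training points
    loss_u = logcosh(u_net(t_u, x_u) - u_data).mean()

    # physics term: f := u_tt - u_xx + sin(u) at the Nf collocation points,
    # with all derivatives obtained by automatic differentiation
    t_f = t_f.clone().requires_grad_(True)
    x_f = x_f.clone().requires_grad_(True)
    u = u_net(t_f, x_f)
    u_t, u_x = torch.autograd.grad(u.sum(), (t_f, x_f), create_graph=True)
    u_tt = torch.autograd.grad(u_t.sum(), t_f, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x_f, create_graph=True)[0]
    f = u_tt - u_xx + torch.sin(u)

    return loss_u + logcosh(f).mean()
```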
Some studies indicate that orthogonal initializations can preserve unit variance and thus improve the training procedure and the performance of the architecture, but for most applications the Xavier initialization [17] used in this paper is sufficient. The activation function is what makes a neural network nonlinear, and this nonlinearity makes the network more expressive. In this paper, the second-order derivatives of the solution with respect to the spatial variable x and the temporal variable t are needed, so ReLU ($\max \{0,x\}$), whose second derivative vanishes almost everywhere, evidently does not work. Furthermore, the data are usually rescaled to [−1, 1], whereas the sigmoid function ($\sigma$, $1/(1+\exp (-x))$) restricts its output to [0, 1], so this function also does not work well. Moreover, the derivatives of S-shaped functions such as $\sigma$ and tanh tend to 0 over most of their domain, which loses information and leads to the vanishing gradient problem to some extent, and ReLU cannot represent complicated, fine-grained details. Numerical experiments confirm that these activations indeed fail to recover the solution dynamics correctly. We note that other choices of weight initialization and data normalization may also affect the selection of the activation function. In this paper, we therefore choose periodic functions, namely the sinusoid, as the activation functions [18]. The function $\sin (x)$ is zero centered, and its derivative $\cos (x)$ is just a shifted sine function.
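For completeness, a minimal sketch of applying Xavier initialization to every linear layer of such a model in PyTorch (the zero-bias choice is our assumption):

```python
import torch.nn as nn

def init_xavier(module: nn.Module) -> None:
    # Xavier (Glorot) initialization keeps the variance of activations
    # roughly constant across layers at the start of training
    if isinstance(module, nn.Linear):
        nn.init.xavier_normal_(module.weight)
        nn.init.zeros_(module.bias)

# usage: model.apply(init_xavier) visits all sub-modules recursively
```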
All experiments in this work are conducted on a MacBook Pro with a 2.4 GHz dual-core Intel Core i5 processor.
3. Numerical results
Soliton phenomena arise widely in physics, biology, communication and other scientific disciplines. The soliton solution (namely, a kink or antikink) of the sine-Gordon equation considered here is distinctly different from the bell-shaped KdV soliton: it represents a topologically invariant quantity of the system. Specifically, the exact one-antikink solution, obtainable via the Bäcklund transformation, is
$$u(x,t) = 4\arctan\left(\exp\left(-\frac{x - ct}{\sqrt{1 - c^{2}}}\right)\right), \qquad (3)$$
where $c$ ($|c| < 1$) is the velocity of the travelling wave. For the antikink solution (3), we generate the training set by randomly sub-sampling Nu data points from the exact solution on the initial and boundary data, together with Nf collocation points inside the spatiotemporal domain.
From figure 3 we see that the $\mathrm{logcosh}$ function as the loss objective is significantly better than the square function. Moreover, the optimization takes far fewer iterations to reach its optimal performance with the former; that is, the $\mathrm{logcosh}$ function accelerates the convergence of the optimization algorithm.
Figure 3. The comparison of the loss curves (natural logarithm of the loss) when the physics-constrained deep residual network is trained with different loss functions.
Figure 4 shows the antikink evolution of the sine-Gordon equation. The top panel of figure 4 compares the exact dynamics with the predicted spatiotemporal behavior. The model achieves a relative ${{\mathbb{L}}}_{2}$ error of 6.09e-04 with a runtime of about 10 min, where the error is defined as $\| u_{\mathrm{true}} - u_{\mathrm{pred}} \|_{2} / \| u_{\mathrm{true}} \|_{2}$. More detailed assessments are presented in the bottom panel of figure 4, where we compare the exact and predicted solutions at the three instants t=1.24, 3.74, 8.76. The results indicate that the model accurately captures the antikink dynamics of the sine-Gordon equation. The reconstructed single-antikink motion can be observed in more detail in figure 5.
Figure 4. The antikink solution to the sine-Gordon equation. Top: an exact antikink compared with the predicted solution of the learned model (right panel). The model correctly captures the dynamics and accurately reproduces the solution with a relative ${{\mathbb{L}}}_{2}$ error of 6.09e-04. Bottom: comparison of the predicted and exact solutions at the three temporal snapshots depicted by the white vertical lines in the top panel.
Figure 5. The antikink solution to the sine-Gordon equation. (a) The spatiotemporal behavior of the reconstructed antikink; (b) the spatiotemporal dynamics of the corresponding potential, where the potential is given by $v = -u_x$.
Additionally, a large number of numerical experiments show that our model is robust to different random initializations (see table 1).
Table 1. Relative ${{\mathbb{L}}}_{2}$ errors under different random seeds.
1.79e-03 | 2.02e-03 | 1.77e-03 | 7.87e-04 |
1.88e-03 | 3.20e-03 | 1.61e-03 | 3.45e-04 |
Next, we examine the performance of the framework in the presence of noise. We perturb the training data with small amounts of zero-mean Gaussian noise whose standard deviation is a given percentage (the noise level) of the standard deviation of the data, and compare the resulting prediction errors (see figure 6).
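A minimal sketch of the perturbation assumed here (scaling zero-mean Gaussian noise by a percentage of the data's standard deviation, a common convention in this literature):

```python
import numpy as np

def add_noise(u_train: np.ndarray, noise_level: float, seed: int = 0) -> np.ndarray:
    # e.g. noise_level = 0.01 corresponds to the 1% column of table 2
    rng = np.random.default_rng(seed)
    return u_train + noise_level * np.std(u_train) * rng.standard_normal(u_train.shape)
```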
Figure 6. The comparison of the relative ${{\mathbb{L}}}_{2}$ errors of the prediction results under small perturbations with different noise levels.
As the noise level increases, however, the accuracy of the predicted solution decreases and the training time grows noticeably. For noisy data, we can increase the number of training points to decrease the relative error and thus improve the accuracy of the predicted solution. Table 2 compares the influence of different numbers of sub-sampled data points on the prediction under different noise levels.
Table 2. Relative ${{\mathbb{L}}}_{2}$ errors for different numbers of data points under the distortion of different noise levels.

Noise 0%:

Nu \ Nf | 1000 | 5000 | 10 000 | 20 000
---|---|---|---|---
10 | 3.67e-01 | 1.47e-01 | 1.02e-01 | 2.27e-01
50 | 6.98e-03 | 2.12e-03 | 1.14e-03 | 1.68e-03
100 | 2.91e-03 | 1.48e-03 | 1.37e-03 | 1.23e-03
200 | 1.07e-03 | 1.21e-03 | 9.73e-04 | 6.09e-04

Noise 1%:

Nu \ Nf | 1000 | 5000 | 10 000 | 20 000
---|---|---|---|---
10 | 5.17e-01 | 1.41e-01 | 8.58e-02 | 5.34e-02
50 | 1.35e-02 | 5.71e-03 | 5.81e-03 | 3.75e-03
100 | 9.52e-03 | 5.95e-03 | 6.92e-03 | 3.55e-03
200 | 3.60e-03 | 1.42e-03 | 2.59e-03 | 2.03e-03
Generally speaking, a model with more layers and more neurons performs better [22]. We therefore carry out extensive numerical experiments to check this empirical result; see table 3 for a comparison over different numbers of hidden layers and neurons per hidden layer. More detailed theoretical analyses can be found in [23, 24].
Table 3. Relative ${{\mathbb{L}}}_{2}$ errors for different numbers of hidden layers and neurons per hidden layer under a fixed random seed.

Hidden layers \ Neurons | 10 | 20 | 40 | 80
---|---|---|---|---
4 | 5.09e-01 | 7.01e-04 | 4.98e-04 | 1.21e-03
8 | 1.76e-03 | 1.47e-03 | 9.65e-04 | 8.43e-04
12 | 7.91e-04 | 7.87e-04 | 6.09e-04 | 7.05e-03
16 | 6.65e-04 | 3.38e-04 | 3.70e-04 | 2.16e-03
Last, we also train the model with the Adam optimizer [25] with its default parameter settings to approximate the antikink solution of the sine-Gordon equation. The training takes approximately 2.5 h with Nu=100 and Nf=10 000 over 30 000 epochs, and the relative ${{\mathbb{L}}}_{2}$ error between the exact and predicted solutions is 1.47e-02. This experiment shows that the L-BFGS algorithm is much faster than Adam in this case and yields a more accurate solution, although L-BFGS sometimes suffers from convergence issues.
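For reference, a hedged sketch of the two optimizers as they are typically driven in PyTorch; `model` and `compute_loss` (e.g. the logcosh objective sketched above) are assumed, and the L-BFGS iteration count is illustrative.

```python
import torch

# Adam: first-order updates, one gradient step per epoch
adam = torch.optim.Adam(model.parameters())  # default settings (lr = 1e-3)
for epoch in range(30_000):
    adam.zero_grad()
    compute_loss(model).backward()
    adam.step()

# L-BFGS: quasi-Newton updates; requires a closure that re-evaluates the loss
lbfgs = torch.optim.LBFGS(model.parameters(), line_search_fn="strong_wolfe")

def closure():
    lbfgs.zero_grad()
    loss = compute_loss(model)
    loss.backward()
    return loss

for _ in range(500):  # illustrative iteration budget
    lbfgs.step(closure)
```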
4. Conclusions and discussion
In this paper, we propose a new architecture that combines a deep residual network with the underlying physical laws to extract the soliton dynamics of the sine-Gordon equation from spatiotemporal data. This architecture can easily be used to train very deep networks and thus alleviates the exploding and vanishing gradient problems. Moreover, we use the $\mathrm{logcosh}$ function rather than the square function in the objective in order to accelerate training and improve the performance of the network. The numerical results show that the model reconstructs the solution behaviors of the equation very accurately, and that it is robust under small disturbances.

Despite this progress, we are still at an early stage in understanding the capabilities and limitations of such deep learning models. Other advanced frameworks, for example generative adversarial networks, recurrent neural networks and networks with numerical schemes embedded, will be considered in future research.
Acknowledgments
The first author would like to express his sincere thanks to Dr Yuqi Li and Dr Xiaoen Zhang for their valuable comments and excellent suggestions on this work. The authors gratefully acknowledge the support of the National Natural Science Foundation of China (Grant No. 11675054), the Shanghai Collaborative Innovation Center of Trustworthy Software for Internet of Things (Grant No. ZF1213) and the Science and Technology Commission of Shanghai Municipality (Grant No. 18dz2271000).

References
doi:10.1073/pnas.1718942115
doi:10.1016/j.jcp.2018.10.045
Commun. Theor. Phys. 72, doi:10.1088/1572-9494/aba243
Commun. Theor. Phys. 72, doi:10.1088/1572-9494/abb7c8
doi:10.1007/s10851-019-00922-y
doi:10.1103/PhysRevLett.30.1262
doi:10.1063/1.1286770
doi:10.1016/j.chaos.2004.05.003
doi:10.13189/ms.2017.050101
doi:10.1007/BF01589116
doi:10.1080/00401706.1987.10488205
doi:10.1103/PhysRevB.15.1578
doi:10.1016/j.physleta.2005.10.079
doi:10.1007/s10955-017-1836-5