
A deep learning method for solving third-order nonlinear evolution equations


Jun Li (李军)1, Yong Chen (陈勇)2,3,4
1 Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, 200062, China
2 School of Mathematical Sciences, Shanghai Key Laboratory of PMMP, Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, 200062, China
3 College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, 266590, China
4 Department of Physics, Zhejiang Normal University, Jinhua, 321004, China

Received: 2020-05-15; Revised: 2020-07-29; Accepted: 2020-07-29; Online: 2020-10-20


Abstract
Solving nonlinear evolution equations analytically remains difficult. In this paper, we present a deep learning method for recovering the intrinsic nonlinear dynamics directly from spatiotemporal data. Specifically, the model uses a deep neural network constrained by the given governing equations to learn the optimal parameters. In particular, numerical experiments on several third-order nonlinear evolution equations, including the Korteweg–de Vries (KdV) equation, modified KdV equation, KdV–Burgers equation and Sharma–Tasso–Olver equation, demonstrate that the presented method is able to uncover the solitons and their interaction behaviors fairly well.
Keywords: deep learning; nonlinear evolution equations; soliton interaction; nonlinear dynamics


Cite this article
Jun Li (李军), Yong Chen (陈勇). A deep learning method for solving third-order nonlinear evolution equations. Communications in Theoretical Physics[J], 2020, 72(11): 115003. doi:10.1088/1572-9494/abb7c8

1. Introduction

Nonlinear evolution equations, which depend on certain space–time signatures, have a multitude of important applications across broad disciplines including physics, finance and biology. Certain special solutions to such equations can exhibit soliton behaviors, that is, they do not disperse and thus preserve their original forms after collisions [1]. Moreover, interaction between solitons is one of the most fascinating features of many soliton phenomena [2].

While direct numerical solutions to some evolution equations are computationally expensive, the revival of deep learning has attracted much interest in the development of more efficient data-driven solutions to nonlinear evolution equations [3–5]. As a branch of machine learning, deep learning methods are able to learn feature representations effectively from raw data [6–10]. However, to our knowledge, previous works focus mainly on some simple solutions to the given equations, which may fail to uncover soliton behaviors in some circumstances. Thus, we propose to combine a neural network framework with the underlying physical laws to reconstruct soliton solutions.

In many physical systems, nonlinear and dispersive processes compete while dissipation can be neglected. Therefore, in this paper, we study nonlinear time-dependent partial differential equations that each contain a dispersive term in addition to other partial derivatives. These equations play important roles in many scientific applications and physical phenomena. Specifically, we consider the (1+1)-dimensional third-order nonlinear evolution equations of the form$ \begin{eqnarray}{u}_{t}={ \mathcal N }(u,{u}_{x},{u}_{{xx}},{u}_{{xxx}}),\end{eqnarray}$ in order to obtain their soliton solutions, where the subscripts t and x denote partial derivatives with respect to time and space, and ${ \mathcal N }$ is a nonlinear function of the solution u and its partial derivatives with respect to the spatial variable x (in this work, the highest order is three).

Specifically, we approximate the latent solution u with a deep neural network [11–13] and then compute the derivatives of the network approximation u with respect to time t and space x with the help of automatic differentiation [14, 15].

Consequently, define the residual network$ \begin{eqnarray}f:= {u}_{t}-{ \mathcal N }(u,{u}_{x},{u}_{{xx}},{u}_{{xxx}}),\end{eqnarray}$ and then the solution network is trained to satisfy the residual constraint (2), which acts as a regularizer and is embedded into the mean-squared objective function [16]$ \begin{eqnarray}L=\displaystyle \frac{1}{{N}_{u}}\displaystyle \sum _{i=1}^{{N}_{u}}| u({t}_{u}^{i},{x}_{u}^{i})-{u}^{i}{| }^{2}+\displaystyle \frac{1}{{N}_{f}}\displaystyle \sum _{j=1}^{{N}_{f}}| f({t}_{f}^{j},{x}_{f}^{j}){| }^{2}.\end{eqnarray}$
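The residual (2) and objective (3) can be sketched with automatic differentiation; the paper does not name a framework, so the following is a minimal PyTorch sketch for the KdV case ${ \mathcal N }=-6{{uu}}_{x}-{u}_{{xxx}}$ (the function names are ours):

```python
import torch

def kdv_residual(net, t, x):
    """f = u_t - N = u_t + 6 u u_x + u_xxx for the KdV form of equation (2),
    with all derivatives obtained by automatic differentiation."""
    t = t.clone().requires_grad_(True)
    x = x.clone().requires_grad_(True)
    u = net(torch.stack([t, x], dim=-1)).squeeze(-1)

    def grad(out, inp):
        # create_graph=True keeps the graph so higher derivatives can be taken
        return torch.autograd.grad(out, inp, torch.ones_like(out),
                                   create_graph=True)[0]

    u_t, u_x = grad(u, t), grad(u, x)
    u_xxx = grad(grad(u_x, x), x)
    return u_t + 6 * u * u_x + u_xxx

def objective(net, t_u, x_u, u_data, t_f, x_f):
    """Mean-squared loss (3): data misfit plus PDE-residual penalty."""
    u_pred = net(torch.stack([t_u, x_u], dim=-1)).squeeze(-1)
    f_pred = kdv_residual(net, t_f, x_f)
    return ((u_pred - u_data) ** 2).mean() + (f_pred ** 2).mean()
```

For another member of the family (1), only the return line of `kdv_residual` changes.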

In this work, we choose the network architecture in a consistent fashion [17]. Specifically, we learn the unknown solution u by using a 13-layer feedforward network with 40 neurons per hidden layer. For the choice of activation function, we have conducted many experiments with different functions such as tanh, sin, sigmoid (σ) and rectified linear units (ReLU), for different numbers of layers and neurons. We find that the tanh function is a little unstable. Moreover, the results indicate that the σ and ReLU functions cannot represent the data in the current settings. We therefore select sin as the activation function in most cases. In addition, we tune all network parameters against the objective (3) using the L-BFGS method [18]. More modern and efficient algorithms, for example Adam [19], a variant of stochastic gradient descent, can be adopted for larger-scale data. All numerical examples reported here are run on a MacBook Pro with a 2.4 GHz dual-core Intel Core i5 processor and 8 GB of memory.
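A minimal sketch of such a network in PyTorch, reading the "13 layers" as an input layer, 11 sin-activated hidden layers of 40 neurons, and a linear output layer (this reading, and the default initialization, are our assumptions):

```python
import torch
import torch.nn as nn

class Sin(nn.Module):
    """sin activation, the choice reported to work best in most cases here."""
    def forward(self, x):
        return torch.sin(x)

def make_net(n_hidden=11, width=40):
    # (t, x) -> u: 2 inputs, n_hidden sin-activated hidden layers of `width`
    # neurons, and one linear output neuron
    layers = [nn.Linear(2, width), Sin()]
    for _ in range(n_hidden - 1):
        layers += [nn.Linear(width, width), Sin()]
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)
```

The parameters could then be optimized with `torch.optim.LBFGS(net.parameters())`, matching the paper's choice of L-BFGS [18].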

The outline of this paper is as follows. In section 2, we reconstruct the one-soliton and two-soliton solutions to the KdV equation from data collected from simulations. We then recover the one-soliton and breather solutions to the mKdV equation in section 3. In section 4, we consider the kink solution to the KdV–Burgers equation. In section 5, we focus mainly on the soliton fusion and fission phenomena of the STO equation. Finally, some concluding discussion and remarks are contained in section 6.

2. The KdV equation

The KdV equation [20, 21] is a canonical model describing the unidirectional propagation of shallow water waves of small amplitude and long wavelength. It is also one of the earliest equations known to possess soliton solutions. The KdV equation can be regarded as a dispersive modification of the Burgers equation, which itself can be linearized by the Cole–Hopf transformation. The dispersion and nonlinearity of this equation balance each other, which leads to wave propagation without loss of energy. However, the manifestation of this balance may vary from system to system, so other evolution equations can have soliton forms different from the bell-shaped solitons of the KdV equation.

In this section, we consider the KdV equation along with Dirichlet boundary conditions [2224] given by$ \begin{eqnarray}\left\{\begin{array}{l}{u}_{t}+6{{uu}}_{x}+{u}_{{xxx}}=0,x\in [-20,20],t\in [-5,5],\\ u({t}_{0},x)={u}_{0}(x),\\ u(t,-20)=u(t,20)=0,\end{array}\right.\end{eqnarray}$ where u0 (x) is an arbitrary real-valued function. In this case, ${ \mathcal N }=-6{{uu}}_{x}-{u}_{{xxx}}$ .

Note that this equation and the mKdV equation, which will be considered in the next section, are both special cases of the generalized KdV equation
$ \begin{eqnarray*}{u}_{t}+{u}_{{xxx}}+{\left({u}^{p}\right)}_{x}=0,\end{eqnarray*}$
where the case p = 2 corresponds (up to a rescaling of u) to the KdV equation and p = 3 to the mKdV equation. Both equations are completely integrable.

2.1. One-soliton solution

Here, we first consider the one-soliton problem. Some exact soliton solutions to such nonlinear evolution equations can be expressed in terms of elementary functions; such solutions are very important for better understanding the nonlinearity of these systems. Meanwhile, they are also useful for testing the performance and accuracy of numerical methods. Applying some analytic methods [25, 26], one can show that the exact one-soliton solution to equation (4) admits the explicit expression
$ \begin{eqnarray*}u(t,x)=\displaystyle \frac{c}{2}{{\rm{sech}} }^{2}\left(\displaystyle \frac{\sqrt{c}}{2}(x-{x}_{0}-{ct})\right).\end{eqnarray*}$
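This closed form can be checked numerically by substituting it into equation (4); a NumPy sketch with central finite differences (the step sizes, grid, and c = 3, x0 = 0 are arbitrary illustrative choices):

```python
import numpy as np

def soliton(t, x, c=3.0, x0=0.0):
    """Exact KdV one-soliton u = (c/2) sech^2( sqrt(c)/2 (x - x0 - c t) )."""
    return 0.5 * c / np.cosh(0.5 * np.sqrt(c) * (x - x0 - c * t)) ** 2

h = 1e-3                      # finite-difference step
x = np.linspace(-5.0, 5.0, 2001)
t = 0.1
u = soliton(t, x)
u_t = (soliton(t + h, x) - soliton(t - h, x)) / (2 * h)
u_x = (soliton(t, x + h) - soliton(t, x - h)) / (2 * h)
u_xxx = (soliton(t, x + 2 * h) - 2 * soliton(t, x + h)
         + 2 * soliton(t, x - h) - soliton(t, x - 2 * h)) / (2 * h ** 3)
residual = u_t + 6 * u * u_x + u_xxx   # vanishes (to truncation error)
```

The residual stays at the size of the finite-difference truncation error, confirming the profile solves the PDE.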

Specifically, we set c = 3 for convenience. Taking the initial displacement x0 = −15 then gives the initial condition$ \begin{eqnarray}{u}_{0}(x)=\displaystyle \frac{3}{2}{{\rm{sech}} }^{2}\left(\displaystyle \frac{\sqrt{3}}{2}(x+15)\right).\end{eqnarray}$

We simulate equation (4) using a conventional spectral method to obtain the data. Specifically, starting from the initial condition (5), we use the Chebfun package [27] with a Fourier spatial discretization with 512 modes and a 4th-order explicit Runge–Kutta (RK) integrator with time-step size 1 × 10−4, and integrate the equation up to the final instant t = 5. The solution is saved every Δt = 0.05, giving a total of 201 snapshots. We generate a smaller training dataset out of this data by randomly sub-sampling Nu = 100 initial-boundary data and Nf = 10 000 collocation points generated by the Latin hypercube sampling method [28].
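The collocation-point sampling can be sketched with SciPy's quasi-Monte Carlo module (the use of `scipy.stats.qmc` and the fixed seed are our illustrative choices, not the paper's):

```python
import numpy as np
from scipy.stats import qmc

# N_f = 10 000 Latin hypercube collocation points over (t, x) in [-5, 5] x [-20, 20]
sampler = qmc.LatinHypercube(d=2, seed=0)
unit = sampler.random(n=10_000)                       # stratified points in [0, 1]^2
tx_f = qmc.scale(unit, l_bounds=[-5.0, -20.0], u_bounds=[5.0, 20.0])
```

By construction, each coordinate is stratified: every one of the 10 000 equal sub-intervals of [0, 1] receives exactly one sample.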

Figure 1 demonstrates our result for the data-driven one-soliton solution to the KdV equation (4). Specifically, given a set of initial and boundary data points, we learn the latent solution u (t, x) by tuning all learnable parameters of the network using the loss function (3). The top panel of figure 1 compares the exact solution with the predicted spatiotemporal solution. The model achieves a relative ${{\mathbb{L}}}_{2}$ error of 3.44 × 10−3 in a runtime of approximately three and a half minutes. A more detailed assessment is given in the bottom panel of figure 1, where we present a comparison between the exact and predicted solutions at times t = −3.75, −1.25, 3.75. The algorithm accurately reconstructs the one-soliton solution to the KdV equation.

Figure 1. The KdV equation. Top: a one-soliton solution to the KdV equation (left panel) is compared to the corresponding predicted solution to the learned equation (right panel). The network correctly captures the dynamical behavior and accurately reproduces the soliton solution with a relative ${{\mathbb{L}}}_{2}$ error of 3.44 × 10−3. Bottom: the comparison of the predicted and exact soliton solutions at the three temporal snapshots depicted by the white vertical lines in the top panel.


From figure 2, we can observe the motion of the reconstructed solitary wave more clearly.

Figure 2. The spatiotemporal behavior of a one-soliton solution to the learned KdV equation.


2.2. Two-soliton solutions

Many non-integrable equations also possess localized shape-preserving traveling waves that resemble soliton solutions. For example, a single traveling-wave solution of the one-way wave equation
$ \begin{eqnarray*}{u}_{t}+{u}_{x}=0\end{eqnarray*}$
would be indistinguishable in profile from a KdV soliton.

However, only integrable equations have the universal property of possessing exact multi-soliton solutions that describe perfectly elastic nonlinear interactions between individual solitons. Thus, we now consider the two-soliton problem [26] as an example. Using similar analytical methods, a two-soliton solution to equation (4), exact in the limit of well-separated solitons, is given by
$ \begin{eqnarray*}\begin{array}{rcl}u(t,x) & = & \displaystyle \frac{{c}_{1}}{2}{{\rm{sech}} }^{2}\left(\displaystyle \frac{\sqrt{{c}_{1}}}{2}(x-{x}_{1}-{c}_{1}t)\right)\\ & & +\displaystyle \frac{{c}_{2}}{2}{{\rm{sech}} }^{2}\left(\displaystyle \frac{\sqrt{{c}_{2}}}{2}(x-{x}_{2}-{c}_{2}t)\right),\end{array}\end{eqnarray*}$
where c1 and c2 denote the speeds of the two individual solitons. From this expression, we know that the width of a KdV soliton is inversely proportional to the square root of its speed. Assuming ${c}_{1}\gg {c}_{2}$ without loss of generality, if the two solitons are well separated with the taller (and thus narrower) to the left of the shorter, then the taller soliton travels faster to the right and will interact nonlinearly and collide elastically with the shorter one [1, 29, 30].

As an example, an initial solution is given explicitly by$ \begin{eqnarray}{u}_{0}(x)=\displaystyle \frac{3}{2}{{\rm{sech}} }^{2}\left(\displaystyle \frac{\sqrt{3}}{2}(x+15)\right)+\displaystyle \frac{1}{2}{{\rm{sech}} }^{2}\left(\displaystyle \frac{1}{2}(x+3)\right).\end{eqnarray}$

Using the same spectral method as above, starting from the initial condition (6), we use the Chebfun package [27] with a Fourier spatial discretization with 512 modes and a 4th-order explicit RK integrator with time-step size 1 × 10−4, and integrate the equation up to the final instant t = 5. The solution is saved every Δt = 0.05, giving a total of 201 snapshots. We generate a smaller training dataset out of this data by randomly sub-sampling Nu = 100 initial-boundary data and Nf = 10 000 collocation points.

Figure 3 demonstrates the evolution of two KdV solitons with different amplitudes, which enables the unique determination of the governing equation [31]. Specifically, given a set of initial and boundary data points, we learn the unknown solution u (t, x) by training the network using the loss function (3). The top panel of figure 3 compares the exact dynamics with the predicted solution. Initially, we have two clearly separated solitons. They then partially lose their identities and merge into a composite structure during the interaction. Numerical simulations of the process show that a lower wave hump is formed in the interaction region, indicating a nonlinear superposition of shifted counterparts that distinguishes solitons from other simple solitary traveling waves. After a while, the two solitons emerge from the interaction again. The model achieves a relative ${{\mathbb{L}}}_{2}$ error of 7.39% in a runtime of approximately half an hour. A more detailed assessment of the predicted solution is given in the bottom panel of figure 3, where we present a comparison between the exact and predicted solutions at instants t = −3.75, −1.25, 3.75. The wave patterns produced match the exact solutions well.

Figure 3. The KdV equation. Top: a two-soliton solution to the KdV equation (left panel) is compared to the corresponding predicted solution to the learned equation (right panel). The model correctly exhibits the dynamical behavior and accurately reproduces the solution with a relative ${{\mathbb{L}}}_{2}$ error of 7.39 × 10−2. Bottom: the comparison of the predicted and exact solutions at the three temporal snapshots.


From figure 4, we can observe the elastic collision of the two individual solitons with different amplitudes more clearly.

Figure 4. The spatiotemporal behavior of a two-soliton solution to the learned KdV equation.


In addition, if the speeds of the two solitons are close, i.e. $0\lt \tfrac{{c}_{1}-{c}_{2}}{{c}_{1}+{c}_{2}}\ll 1$, the solitons will exchange their sizes and speeds over a rather long distance and consequently avoid a direct collision [32]. For instance, we consider the initial condition$ \begin{eqnarray}\begin{array}{rcl}{u}_{0}(x) & = & \displaystyle \frac{1.01}{2}{{\rm{sech}} }^{2}\left(\displaystyle \frac{\sqrt{1.01}}{2}(x+12)\right)\\ & & +\displaystyle \frac{1}{2}{{\rm{sech}} }^{2}\left(\displaystyle \frac{1}{2}(x-2)\right),\end{array}\end{eqnarray}$ and adopt the same data generation and sampling method.

Then, from figure 5, we can observe that the two solitons never cross, but rather repel each other at a distance. However, the detailed process may be difficult to observe numerically. See, e.g. [29, 33] for more analytical details.

Figure 5. The KdV equation. Top: another two-soliton solution to the KdV equation (left panel) is compared to the predicted solution to the learned equation. The model correctly exhibits the dynamical behavior and accurately reproduces the solution with a relative ${{\mathbb{L}}}_{2}$ error of 2.53 × 10−2. Bottom: the comparison of the predicted and exact solutions. The model training took about 7.5 min.


3. The mKdV equation

The mKdV equation, which can be regarded as the KdV equation with a cubic nonlinearity, is also an integrable model that possesses most of the properties of the KdV equation [34–39] and even has a richer family of solutions, including breathers. Moreover, it is related to the KdV equation by the Miura transformation.

3.1. One-soliton solution

First, we consider the one-soliton solution to the mKdV equation with Dirichlet boundary conditions:$ \begin{eqnarray}\left\{\begin{array}{l}{u}_{t}+6{u}^{2}{u}_{x}+{u}_{{xxx}}=0,x\in [-20,20],t\in [-5,5],\\ u({t}_{0},x)={u}_{0}(x)=\sqrt{3}{\rm{sech}} \left(\sqrt{3}(x+15)\right),\\ u(t,-20)=u(t,20)=0.\end{array}\right.\end{eqnarray}$

In this case, ${ \mathcal N }=-6{u}^{2}{u}_{x}-{u}_{{xxx}}$.

To obtain the training and testing data, we simulate equation (8) using the spectral method. Starting from the initial condition, we use the Chebfun package [27] with a Fourier spatial discretization with 512 modes and a 4th-order explicit RK integrator with time-step size 1 × 10−4, and integrate the equation up to the final instant t = 5. The solution is saved every Δt = 0.05, giving a total of 201 snapshots. We generate a smaller training subset by randomly sub-sampling Nu = 100 initial and boundary data and Nf = 10 000 collocation points.

Specifically, given a set of initial and boundary data, we parameterize the solution u (t, x) by training the network using the loss function (3). In figure 6, we show the wave profile of a one-soliton solution to the mKdV equation (8). The top panel of figure 6 compares the exact dynamics with the predicted solution. The model achieves a relative ${{\mathbb{L}}}_{2}$ error of 4.57% in a runtime of about 13 minutes. In terms of training time, the mKdV equation is evidently more expensive than the KdV equation. A more detailed assessment is given in the bottom panel of figure 6, where we present a comparison between the exact and predicted solutions at times t = −3.75, −1.25, 3.75.

Figure 6. The mKdV equation. Top: a one-soliton solution to the mKdV equation (left panel) is compared to the predicted solution to the learned equation. The model correctly exhibits the dynamical behavior and accurately reproduces the solution with a relative ${{\mathbb{L}}}_{2}$ error of 4.57 × 10−2. Bottom: the comparison of the predicted and exact solutions at the three temporal snapshots.


3.2. Breather solution

Now, we consider the breather solution to the mKdV equation, which is not only spatially localized but also time periodic:$ \begin{eqnarray}\left\{\begin{array}{l}{u}_{t}+6{u}^{2}{u}_{x}+{u}_{{xxx}}=0,x\in [-20,20],t\in [-0.3,0.3],\\ u({t}_{0},x)={u}_{0}(x),\\ u(t,-20)=u(t,20)=0.\end{array}\right.\end{eqnarray}$

One could obtain the exact breather solution using some analytical methods [40]:
$ \begin{eqnarray*}\begin{array}{l}u(t,x)=2{\partial }_{x}\left[\arctan \left(\displaystyle \frac{\beta }{\alpha }\displaystyle \frac{\sin (\alpha (x+\delta t))}{\cosh (\beta (x+\gamma t))}\right)\right]\\ \qquad =\ 2\beta {\rm{sech}} (\beta (x+\gamma t))\\ \times \ \left[\displaystyle \frac{\cos (\alpha (x+\delta t))-(\beta /\alpha )\sin (\alpha (x+\delta t))\tanh (\beta (x+\gamma t))}{1+{\left(\beta /\alpha \right)}^{2}{\sin }^{2}(\alpha (x+\delta t)){{\rm{sech}} }^{2}(\beta (x+\gamma t))}\right],\end{array}\end{eqnarray*}$
with $\delta ={\alpha }^{2}-3{\beta }^{2}$ and $\gamma =3{\alpha }^{2}-{\beta }^{2}$, where α and β are arbitrary constants.
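The equivalence of the two forms above (the $2{\partial }_{x}\arctan$ form and the explicit expression) can be checked numerically for the parameters used below, α = 1.5 and β = 1.0 (a NumPy sketch; grid and step sizes are arbitrary choices):

```python
import numpy as np

a, b = 1.5, 1.0                  # alpha and beta used in this section
delta = a ** 2 - 3 * b ** 2      # delta = alpha^2 - 3 beta^2
gamma = 3 * a ** 2 - b ** 2      # gamma = 3 alpha^2 - beta^2

def phi(t, x):
    """Potential whose x-derivative, times 2, is the breather."""
    return np.arctan((b / a) * np.sin(a * (x + delta * t))
                     / np.cosh(b * (x + gamma * t)))

def breather(t, x):
    """Explicit breather expression from the text."""
    X, Y = a * (x + delta * t), b * (x + gamma * t)
    num = np.cos(X) - (b / a) * np.sin(X) * np.tanh(Y)
    den = 1 + (b / a) ** 2 * np.sin(X) ** 2 / np.cosh(Y) ** 2
    return 2 * b / np.cosh(Y) * num / den

h, t = 1e-4, 0.1
x = np.linspace(-5.0, 5.0, 1001)
u_fd = (2 * phi(t, x + h) - 2 * phi(t, x - h)) / (2 * h)   # 2 d/dx arctan(...)
```

The finite-difference derivative of the arctan potential agrees with the explicit formula to truncation error.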

When α = 1.5 and β = 1.0, we generate the data of 201 snapshots directly on the regular space–time grid every Δt = 0.003. We generate a smaller training subset scattered in space and time by randomly sub-sampling Nu = 100 initial data and Nf = 10 000 collocation points. Specifically, given a set of initial and boundary data, we learn the solution u (t, x) by training all learnable parameters of the network. Figure 7 demonstrates the evolution of the breather solution to equation (9) over about one temporal period. The top panel of figure 7 compares the exact dynamics with the predicted solution. The model achieves a relative ${{\mathbb{L}}}_{2}$ error of 1.05% in a runtime of about 2.2 h. A more detailed assessment is given in the bottom panel of figure 7, where we present a comparison between the exact and predicted solutions at times t = −0.22, 0, 0.23. From figure 7, we observe that the model exactly reproduces the breather pattern.

Figure 7. The mKdV equation. Top: a breather solution to the mKdV equation (left panel) is compared to the predicted solution to the learned equation. The model correctly exhibits the dynamical behavior and accurately reproduces the solution with a relative ${{\mathbb{L}}}_{2}$ error of 1.05 × 10−2. Bottom: the comparison of the predicted and exact solutions.


4. The KdV–Burgers equation

The KdV–Burgers equation is often used to model a large number of nonlinear systems because it contains both damping and dispersion terms [41–43]. Specifically, we consider the KdV–Burgers equation with Dirichlet boundary conditions given by$ \begin{eqnarray}\left\{\begin{array}{l}{u}_{t}+{{uu}}_{x}-\alpha {u}_{{xx}}-\beta {u}_{{xxx}}=0,x\in [-40,40],t\in [-5,5],\\ u({t}_{0},x)={u}_{0}(x),\\ u(t,-40)=u(t,40)=0,\end{array}\right.\end{eqnarray}$ where α and β are constants. In this case, ${ \mathcal N }=-{{uu}}_{x}\,+\alpha {u}_{{xx}}+\beta {u}_{{xxx}}$.

The exact one-soliton solution, which is actually a kink, reads
$ \begin{eqnarray*}u(t,x)=\displaystyle \frac{3{\alpha }^{2}}{25\beta }\left(2-2\tanh \displaystyle \frac{z}{2}-{{\rm{sech}} }^{2}\displaystyle \frac{z}{2}\right),\end{eqnarray*}$
with $z=\tfrac{\alpha }{5\beta }\left(x-\tfrac{6{\alpha }^{2}}{25\beta }t\right)$; direct substitution shows that this profile satisfies equation (10) exactly.
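As a sanity check, the kink profile can be substituted into the PDE of equation (10) with finite differences; note that the residual vanishes for the sign convention $2-2\tanh (z/2)-{{\rm{sech}} }^{2}(z/2)$ (a NumPy sketch, using the paper's parameters α = 1.0, β = −1.0):

```python
import numpy as np

alpha, beta = 1.0, -1.0
A = 3 * alpha ** 2 / (25 * beta)      # amplitude scale 3 alpha^2 / (25 beta)
v = 6 * alpha ** 2 / (25 * beta)      # kink speed 6 alpha^2 / (25 beta)

def kink(t, x):
    z = alpha / (5 * beta) * (x - v * t)
    return A * (2 - 2 * np.tanh(z / 2) - 1 / np.cosh(z / 2) ** 2)

h, t = 1e-2, 0.5
x = np.linspace(-30.0, 30.0, 1201)
u = kink(t, x)
u_t = (kink(t + h, x) - kink(t - h, x)) / (2 * h)
u_x = (kink(t, x + h) - kink(t, x - h)) / (2 * h)
u_xx = (kink(t, x + h) - 2 * u + kink(t, x - h)) / h ** 2
u_xxx = (kink(t, x + 2 * h) - 2 * kink(t, x + h)
         + 2 * kink(t, x - h) - kink(t, x - 2 * h)) / (2 * h ** 3)
residual = u_t + u * u_x - alpha * u_xx - beta * u_xxx   # ~ 0
```

The profile connects the two constant states 0 and $\tfrac{12{\alpha }^{2}}{25\beta }$, i.e. twice the wave speed, as expected for a KdV–Burgers shock.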

When α = 1.0 and β = −1.0, we generate the data. In this case, we sample the data on the regular space–time grid every Δt = 0.05, obtaining a total of 201 snapshots. Out of this data, we generate a smaller training subset by randomly sub-sampling Nu = 100 initial data and Nf = 10 000 collocation points. Figure 8 summarizes our result for the kink solution to the KdV–Burgers equation. The top panel of figure 8 compares the exact dynamics with the predicted spatiotemporal solution; the resulting prediction error is measured at 8.08 × 10−3 in the relative ${{\mathbb{L}}}_{2}$-norm with a runtime of about one and a half minutes. A more detailed assessment is given in the bottom panel of figure 8, where we present a comparison between the exact and predicted solutions at times t = −3.75, −1.25, 3.75. The model accurately captures the kink dynamics of the KdV–Burgers equation.

Figure 8. The KdV–Burgers equation. Top: a one-kink solution to the KdVB equation (left panel) is compared to the predicted solution to the learned equation. The model correctly exhibits the dynamical behavior and accurately reproduces the solution with a relative ${{\mathbb{L}}}_{2}$ error of 8.08 × 10−3. Bottom: the comparison of the predicted and exact solutions.


5. The STO equation

The STO equation has important applications in many scientific areas. It has been investigated using different analytic methods, such as the Cole–Hopf transformation and Hirota's bilinear method. Here, we consider the STO equation [44] with Dirichlet boundary conditions given by:$ \begin{eqnarray}\left\{\begin{array}{l}{u}_{t}+3\alpha {u}_{x}^{2}+3\alpha {u}^{2}{u}_{x}+3\alpha {{uu}}_{{xx}}+\alpha {u}_{{xxx}}=0,x\in [-40,40],t\in [-5,5],\\ u({t}_{0},x)={u}_{0}(x),\\ u(t,-40)=a,u(t,40)=b,\end{array}\right.\end{eqnarray}$ where α is an arbitrary constant, and a, b are fixed values that are easily obtained from a given initial condition. In this case, ${ \mathcal N }=-3\alpha {u}_{x}^{2}-3\alpha {u}^{2}{u}_{x}-3\alpha {{uu}}_{{xx}}-\alpha {u}_{{xxx}}$.

An exact soliton solution is obtained using the methods mentioned above:
$ \begin{eqnarray*}u(t,x)=\displaystyle \frac{{k}_{1}{e}^{{k}_{1}(x-\alpha {k}_{1}^{2}t)}+{k}_{2}{e}^{{k}_{2}(x-\alpha {k}_{2}^{2}t)}}{1+{e}^{{k}_{1}(x-\alpha {k}_{1}^{2}t)}+{e}^{{k}_{2}(x-\alpha {k}_{2}^{2}t)}}.\end{eqnarray*}$
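This expression, which is $u={\partial }_{x}\mathrm{ln}(1+{e}^{{\theta }_{1}}+{e}^{{\theta }_{2}})$ with ${\theta }_{i}={k}_{i}(x-\alpha {k}_{i}^{2}t)$, can likewise be validated by evaluating the STO residual with finite differences (a NumPy sketch using the fusion parameters of the next subsection, α = 1.0, k1 = −1.8, k2 = 1.0):

```python
import numpy as np

alpha, k1, k2 = 1.0, -1.8, 1.0   # fusion parameters from the text

def sto(t, x):
    """Two-soliton STO solution (k1 e^t1 + k2 e^t2) / (1 + e^t1 + e^t2)."""
    e1 = np.exp(k1 * (x - alpha * k1 ** 2 * t))
    e2 = np.exp(k2 * (x - alpha * k2 ** 2 * t))
    return (k1 * e1 + k2 * e2) / (1 + e1 + e2)

h, t = 1e-3, 0.3
x = np.linspace(-8.0, 8.0, 801)
u = sto(t, x)
u_t = (sto(t + h, x) - sto(t - h, x)) / (2 * h)
u_x = (sto(t, x + h) - sto(t, x - h)) / (2 * h)
u_xx = (sto(t, x + h) - 2 * u + sto(t, x - h)) / h ** 2
u_xxx = (sto(t, x + 2 * h) - 2 * sto(t, x + h)
         + 2 * sto(t, x - h) - sto(t, x - 2 * h)) / (2 * h ** 3)
residual = (u_t + 3 * alpha * u_x ** 2 + 3 * alpha * u ** 2 * u_x
            + 3 * alpha * u * u_xx + alpha * u_xxx)
```

The check also illustrates the boundary values: as $x\to -\infty $ the solution approaches k1, and as $x\to +\infty $ it approaches 0 for these parameters.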

5.1. Soliton fusion

The soliton fusion phenomenon is a resonance-like inelastic interaction in which two or more solitons fuse into a single structure or into fewer solitons; that is to say, the total number of solitons is not conserved.

Specifically, when α = 1.0, k1 = −1.8 and k2 = 1.0, we obtain the solution data. In this case, we sample the data on the regular space–time grid every Δt = 0.05, obtaining a total of 201 snapshots. Out of this data, we generate a smaller training subset by randomly sub-sampling Nu = 100 initial-boundary data and Nf = 10 000 collocation points. Given a set of initial and boundary data, we learn the solution u (t, x) by tuning all parameters of the network. Figure 9 shows the evolution of the soliton fusion phenomenon of the STO equation (11). The top panel of figure 9 compares the exact dynamics with the predicted spatiotemporal solution. The model achieves a relative ${{\mathbb{L}}}_{2}$ error of 1.61% in a runtime of approximately 10 minutes. More detailed assessments are presented in the middle and bottom panels of figure 9, where we present a comparison between the exact and predicted solutions at times t = −3.75, −1.25, 3.75.

Figure 9. The soliton fusion phenomenon of the STO equation. Top: a solution to the STO equation (left panel) is compared to the predicted solution to the learned equation. The model correctly exhibits the dynamical behavior and accurately reproduces the solution with a relative ${{\mathbb{L}}}_{2}$ error of 1.61 × 10−2. Middle: the comparison of the predicted and exact solutions. Bottom: the comparison of the corresponding predicted and exact solutions of the potential −ux.


Figure 10. The soliton fusion pattern of the STO equation. (a) The spatiotemporal behavior of the reconstructed solution; (b) the spatiotemporal dynamics of the corresponding potential.


From figure 10, we can more clearly observe that two solitons with different speeds fuse into a single soliton with a larger amplitude.

5.2. Soliton fission

Now, we consider a sort of inverse of the fusion process, namely, one or several solitons may split into two or more solitons.

Note that we reset $x\in [-60,20]$ and $t\in [0,4]$ in this case. When α = −1.0, k1 = −1.8 and k2 = −1.0, we obtain the data. In this case, we sample the data on the regular grid every Δt = 0.008 from t = 0 up to the final instant t = 4, obtaining a total of 501 snapshots. Out of this data, we generate a smaller training subset by randomly sub-sampling Nu = 200 initial-boundary data and Nf = 20 000 collocation points.

For this soliton fission case, the sin(x) activation often performs poorly, so we choose the $\tanh (x)$ function as the activation. Specifically, given a set of initial and boundary data, we fit the solution u (t, x) by training the network using the loss function (3). Figure 11 shows the evolution of the soliton fission process of the STO equation (11). The top panel of figure 11 compares the exact dynamics with the predicted spatiotemporal solution. The model achieves a relative ${{\mathbb{L}}}_{2}$ error of 2.41% in a runtime of approximately 11 min. More detailed assessments are presented in the middle and bottom panels of figure 11, where we present a comparison between the exact and predicted solutions at times t = 0.5, 1.5, 3.5.

Figure 11. The soliton fission phenomenon of the STO equation. Top: a solution to the STO equation (left panel) is compared to the predicted solution to the learned equation. The model approximately exhibits the dynamical behavior and reproduces the solution with a relative ${{\mathbb{L}}}_{2}$ error of 2.41 × 10−2. Middle: the comparison of the predicted and exact solutions. Bottom: the comparison of the corresponding predicted and exact solutions of the potential.


This model approximately reconstructs the exact solution from the coarse-grained sampled data. However, as the middle and bottom panels of figure 11 show, it clearly fails to resolve the vicinity of the wave humps well. One could devise more sophisticated sampling strategies to enable adaptive refinement, for instance by tracking the curvature of the solution. This will be investigated further in future research.

6. Remarks and discussion

Deep learning offers a quite different approach to modeling these dynamical behaviors: it uses the training data to parameterize the solution manifold itself; in other words, it learns both the intrinsic features and their interactions from data collected from experiments and simulations. In this paper, we present a neural network framework for extracting the soliton dynamics of evolution equations from spatiotemporal data. The framework provides a universal treatment of (1+1)-dimensional third-order nonlinear evolution equations. Specifically, we outline how different categories of soliton solutions (e.g. general soliton solutions, breathers and kinks) arise from different choices of initial and boundary data. The results show that the model can recover the different soliton behaviors of these equations fairly well.

Note that a low loss value is a necessary but not sufficient condition for stable training and accurate prediction. For the soliton fission case in the previous section, in particular, the model attains a low training loss yet exhibits relatively poor stability and prediction accuracy. In addition, soliton behaviors under small perturbations have been studied to some extent [30, 45, 46]. Correspondingly, it would be very interesting to study the stability of solitons by training the neural network on noisy data. These remain important areas for future work.

Acknowledgments

The first author would like to express his sincere thanks to Tao Xu for his valuable comments and excellent suggestions on this work. The authors gratefully acknowledge the support of the National Natural Science Foundation of China (No. 11675054), the Shanghai Collaborative Innovation Center of Trustworthy Software for Internet of Things (Grant No. ZF1213) and the Science and Technology Commission of Shanghai Municipality (No. 18dz2271000).


Reference By original order
By published year
By cited within times
By Impact factor

[1] Zabusky N J and Kruskal M D 1965 Phys. Rev. Lett. 15 240–3 DOI: 10.1103/PhysRevLett.15.240
[2] Craig W, Guyenne P, Hammack J, Henderson D and Sulem C 2006 Phys. Fluids 18 057106 DOI: 10.1063/1.2205916
[3] Bongard J and Lipson H 2007 Proc. Natl Acad. Sci. USA 104 9943–8 DOI: 10.1073/pnas.0609476104
[4] Raissi M, Perdikaris P and Karniadakis G E 2017 J. Comput. Phys. 348 683–93 DOI: 10.1016/j.jcp.2017.07.050
[5] Raissi M and Karniadakis G E 2018 J. Comput. Phys. 357 125–41 DOI: 10.1016/j.jcp.2017.11.039
[6] Lagaris I E, Likas A and Fotiadis D I 1998 IEEE Trans. Neural Networks 9 987–1000 DOI: 10.1109/72.712178
[7] Yadav N, Yadav A and Kumar M 2015 An Introduction to Neural Network Methods for Differential Equations (Berlin: Springer)
[8] Sirignano J and Spiliopoulos K 2018 J. Comput. Phys. 375 1339–64 DOI: 10.1016/j.jcp.2018.08.029
[9] Han J, Jentzen A and Weinan E 2018 Proc. Natl Acad. Sci. USA 115 8505–10 DOI: 10.1073/pnas.1718942115
[10] Bar-Sinai Y, Hoyer S, Hickey J and Brenner M P 2019 Proc. Natl Acad. Sci. USA 116 15344–9 DOI: 10.1073/pnas.1814058116
[11] Raissi M 2018 J. Mach. Learn. Res. 19 932–55
[12] Raissi M, Perdikaris P and Karniadakis G E 2019 J. Comput. Phys. 378 686–707 DOI: 10.1016/j.jcp.2018.10.045
[13] Lu L, Meng X, Mao Z and Karniadakis G E 2019 DeepXDE: a deep learning library for solving differential equations arXiv:1907.04502
[14] Abadi M et al 2016 Proc. 12th USENIX Symp. on Operating Systems Design and Implementation (OSDI 16) pp 265–83
[15] Baydin A G, Pearlmutter B A, Radul A A and Siskind J M 2018 J. Mach. Learn. Res. 18 1–43
[16] Choromanska A, Henaff M, Mathieu M, Arous G B and LeCun Y 2015 Proc. 18th Int. Conf. on Artificial Intelligence and Statistics (PMLR vol 38) pp 192–204
[17] Raghu M, Poole B, Kleinberg J, Ganguli S and Sohl-Dickstein J 2017 Proc. 34th Int. Conf. on Machine Learning (PMLR vol 70) pp 2847–54
[18] Liu D C and Nocedal J 1989 Math. Program. 45 503–28 DOI: 10.1007/BF01589116
[19] Kingma D P and Ba J 2015 Int. Conf. on Learning Representations (ICLR)
[20] Korteweg D J and de Vries G 1895 Phil. Mag. 39 422–43 DOI: 10.1080/14786449508620739
[21] Gardner C S, Greene J M, Kruskal M D and Miura R M 1967 Phys. Rev. Lett. 19 1095–7 DOI: 10.1103/PhysRevLett.19.1095
[22] Hirota R 1971 Phys. Rev. Lett. 27 1192–4 DOI: 10.1103/PhysRevLett.27.1192
[23] Bona J L and Smith R 1975 Phil. Trans. R. Soc. A 278 555–601 DOI: 10.1098/rsta.1975.0035
[24] Eckhaus W and Schuur P 1983 Math. Methods Appl. Sci. 5 97–116 DOI: 10.1002/mma.1670050108
[25] Wadati M and Toda M 1972 J. Phys. Soc. Japan 32 1403–11 DOI: 10.1143/JPSJ.32.1403
[26] Gardner C S, Greene J M, Kruskal M D and Miura R M 1974 Commun. Pure Appl. Math. 27 97–133 DOI: 10.1002/cpa.3160270108
[27] Driscoll T A, Hale N and Trefethen L N 2014 Chebfun Guide (Oxford: Pafnuty Publications)
[28] Stein M L 1987 Technometrics 29 143–51 DOI: 10.1080/00401706.1987.10488205
[29] Lax P D 1968 Commun. Pure Appl. Math. 21 467–90 DOI: 10.1002/cpa.3160210503
[30] Tao T 2009 Bull. Am. Math. Soc. 46 1–33 DOI: 10.1090/S0273-0979-08-01228-7
[31] Rudy S H, Brunton S L, Proctor J L and Kutz J N 2017 Sci. Adv. 3 e1602614 DOI: 10.1126/sciadv.1602614
[32] Martel Y 2019 Proc. Int. Congress of Mathematicians (ICM 2018) pp 2439–66
[33] LeVeque R 1987 SIAM J. Appl. Math. 47 254–62 DOI: 10.1137/0147017
[34] Wadati M 1972 J. Phys. Soc. Japan 32 1681 DOI: 10.1143/JPSJ.32.1681
[35] Hirota R 1972 J. Phys. Soc. Japan 33 1456–8 DOI: 10.1143/JPSJ.33.1456
[36] Wadati M 1973 J. Phys. Soc. Japan 34 1289–96 DOI: 10.1143/JPSJ.34.1289
[37] Fonseca G, Linares F and Ponce G 1999 Commun. PDE 24 683–705 DOI: 10.1080/03605309908821438
[38] Hayashi N and Naumkin P 2001 Math. Phys. Anal. Geom. 4 197–201 DOI: 10.1023/A:1012953917956
[39] Germain P, Pusateri F and Rousset F 2016 Adv. Math. 299 271–330 DOI: 10.1016/j.aim.2016.04.023
[40] Alejo M A and Muñoz C 2013 Commun. Math. Phys. 324 233–62 DOI: 10.1007/s00220-013-1792-0
[41] Johnson R S 1970 J. Fluid Mech. 42 49–60 DOI: 10.1017/S0022112070001064
[42] Canosa J and Gazdag J 1977 J. Comput. Phys. 23 393–403 DOI: 10.1016/0021-9991(77)90070-5
[43] Ahmad H, Seadawy A R and Khan T A 2020 Phys. Scr. 95 045210 DOI: 10.1088/1402-4896/ab6070
[44] Wang S, Tang X and Lou S 2004 Chaos Solitons Fractals 21 231–9 DOI: 10.1016/j.chaos.2003.10.014
[45] Benjamin T B 1972 Proc. R. Soc. A 328 153–83 DOI: 10.1098/rspa.1972.0074
[46] Bona J 1975 Proc. R. Soc. A 344 363–74 DOI: 10.1098/rspa.1975.0106
