Abstract In this paper, based on physics-informed neural networks (PINNs), a deep learning neural network framework that can effectively solve nonlinear evolution partial differential equations (PDEs) and other types of nonlinear physical models, we study the nonlinear Schrödinger equation (NLSE) with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential, an important physical model in many fields of nonlinear physics. Firstly, we choose three different initial values together with the same Dirichlet boundary conditions and solve the NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential via the PINN deep learning method, comparing the obtained results with those derived by traditional numerical methods. Then, we investigate the effects of two factors (optimization steps and activation functions) on the performance of the PINN deep learning method for this equation. Ultimately, the data-driven coefficients of the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential, or of the dispersion and nonlinear terms of the NLSE with this potential, can be approximately ascertained by using the PINN deep learning method. Our results may be meaningful for further investigation of the nonlinear Schrödinger equation with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential via deep learning. Keywords: nonlinear Schrödinger equation; generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential; physics-informed neural networks; deep learning; initial value and Dirichlet boundary conditions; data-driven coefficient discovery
Jiaheng Li, Biao Li. Solving forward and inverse problems of the nonlinear Schrödinger equation with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential via PINN deep learning. Communications in Theoretical Physics, 2021, 73(12): 125001. doi:10.1088/1572-9494/ac2055
1. Introduction
With the rapid development of modern science and technology, many scientific fields and areas of everyday life are producing data at an exponential rate. At present, and for a long time to come, extracting effective information from these huge amounts of data is a major and significant challenge. Recently, some scholars have used machine learning (deep learning) and data analysis to deal with such data, and good results have been achieved in fields including but not limited to data mining, image recognition, cognitive science, genetics, product recommendation systems, natural language processing (NLP), autonomous driving, stock market trading [1–6] and solving equations [7–19]. Different scientific fields may each have unique challenges, but obtaining good results with a small amount of data, or with noisy data, has long been a common problem across all fields that has not been effectively solved. As is well known, accurate and adequate data are critical for deep learning training; a lack of sample data, or the use of erroneous data, leads to poor robustness and usually makes the results of deep learning unsatisfactory. In addition, for solving higher-dimensional nonlinear partial differential equations (PDEs) [18, 19], conventional numerical methods have long been a challenge: finite difference methods become infeasible in higher dimensions due to the explosion in the number of grid points and the demand for reduced time step size. These limitations therefore urge us to develop better deep learning neural network frameworks to improve the accuracy and robustness of solving such linear or nonlinear mathematical physical models.
In recent years, thanks to continuous research by many scholars, new deep learning neural network frameworks [16, 18] and their optimizations [20–22] for solving PDEs have been put forward. The prospect of using structured prior information to build data-efficient, physics-informed machine learning models has been preliminarily demonstrated [9–11]. Using Gaussian process regression to design a functional representation for a given linear operator allows us to accurately infer solutions and provide uncertainty estimates for several classical problems in mathematical physics [12]. An extension to nonlinear problems involving inference and system identification has been proposed [13, 14]. Despite the flexibility and mathematical elegance of Gaussian processes in encoding prior information, their treatment of nonlinear problems introduces two important limitations. Firstly, the user has to locally linearize any potential nonlinear term in time, which limits the applicability of the method in the discrete time domain and affects the accuracy of predictions in strongly nonlinear regions. Secondly, the Bayesian nature of Gaussian process regression requires certain prior assumptions, which may limit the model's representation ability and cause robustness problems, especially for nonlinear problems. The PINN deep learning method [16], however, does a good job of avoiding these limitations. Additionally, it is worth noting that, based on some excellent properties of PINNs, interesting results have been achieved by several groups, mainly on nonlinear wave solutions of nonlinear partial differential equations [23–29].
Generally, the PINN deep learning method can be used to learn the latent (hidden) solution $\hat{w}(t,x)$ of nonlinear evolution PDEs of the following general form [16]:$\begin{eqnarray}{w}_{t}-{ \mathcal N }[x,w;\lambda ]=0,\quad (t,x)\in [0,T]\times {\rm{\Omega }}.\end{eqnarray}$Specifically, in this paper, we consider nonlinear evolution PDEs of the general form with initial value and boundary conditions as follows:$\begin{eqnarray}\left\{\begin{array}{ll}{\rm{i}}{w}_{t}={ \mathcal N }[x,w;{\lambda }_{0}], & (t,x)\in [0,T]\times {\rm{\Omega }},\\ { \mathcal I }[w(t,x)]{| }_{t=0}={w}_{{ \mathcal I }}(x), & x\in {\rm{\Omega }}\ ({\rm{initial}}\ {\rm{value}}\ {\rm{conditions}}),\\ { \mathcal B }[w(t,x)]{| }_{x\in \partial {\rm{\Omega }}}={w}_{{ \mathcal B }}(t), & t\in [0,T]\ ({\rm{boundary}}\ {\rm{conditions}}),\end{array}\right.\end{eqnarray}$where 0 and T represent the lower and upper bounds of the time variable t, respectively, Ω stands for the domain of the spatial variable x, ∂Ω is the boundary of the spatial domain Ω, and the latent solution $\hat{w}(t,x)$ is of course unknown. ${ \mathcal N }[\bullet ;{\lambda }_{0}]$ is a combination of linear and nonlinear operators parameterized by the vector λ0, ${ \mathcal I }[\bullet ]$ and ${ \mathcal B }[\bullet ]$ are the initial value and boundary operators, and ${ \mathcal I }[w(t,x)]{| }_{t\,=\,0}={w}_{{ \mathcal I }}(x)$ and ${ \mathcal B }[w(t,x)]{| }_{x\in \partial {\rm{\Omega }}}={w}_{{ \mathcal B }}(t)$ stand for the initial value conditions and the boundary conditions, respectively. In this paper, we set up a complex-valued (multi-output) physics model f(t, x) as follows:$\begin{eqnarray}f(t,x)={\rm{i}}{w}_{t}-{ \mathcal N }[x,w;{\lambda }_{0}].\end{eqnarray}$
Using the automatic differentiation technique [30], a chain-rule-based derivative technique widely used in the back propagation (BP) [31] of feed-forward neural networks, the derivatives of the latent solution $\hat{w}(t,x)$ with respect to the time variable t and the space variable x can be obtained quickly within the neural network. It is worth noting that this technique for calculating derivatives is superior to traditional numerical or symbolic differentiation. To ensure that the BP, automatic differentiation and related operations in the complex-valued PINN deep learning method perform well, we use TensorFlow [32], a relatively mature, mainstream, open-source deep learning library for scientific computing. Based on the deep learning experimental results (shown in Section 4.2), we select the hyperbolic tangent (tanh) as our nonlinear activation function:$\begin{eqnarray}{H}_{j}=\tanh ({\omega }_{j}\cdot {H}_{j-1}+{b}_{j}),\end{eqnarray}$where the weight ωj is a dim(Hj) × dim(Hj−1) matrix, the bias bj is a dim(Hj)-dimensional vector, Hj is the output vector of the jth layer, and the subscript j runs over the layers, j = 1, 2, 3, ⋯, n.
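The layer recursion (4) is straightforward to sketch outside of TensorFlow; the following minimal NumPy version (layer sizes and weight values here are illustrative, not taken from the paper's network) shows how each hidden state is produced from the previous one:

```python
import numpy as np

def forward(H0, weights, biases):
    """Forward pass of a fully connected tanh network: H_j = tanh(w_j . H_{j-1} + b_j)."""
    H = H0
    for w, b in zip(weights, biases):
        H = np.tanh(w @ H + b)
    return H

# a tiny 2 -> 3 -> 2 network with random fixed weights, for illustration only
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((2, 3))]
biases = [np.zeros(3), np.zeros(2)]
out = forward(np.array([0.5, -0.5]), weights, biases)
```

Since tanh maps into (−1, 1), every component of each hidden state (and of the final output) is strictly bounded in magnitude by 1; in the actual PINNs the inputs H0 are the sampled (t, x) pairs.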
Moreover, the latent solution $\hat{w}(t,x)$ can be learned by minimizing the mean squared error loss:$\begin{eqnarray}\begin{array}{rcl}L(\hat{w})&=&\displaystyle \frac{1}{{N}_{{ \mathcal I }}}\sum _{j=1}^{{N}_{{ \mathcal I }}}{\left|\ { \mathcal I }[\hat{w}({t}_{{ \mathcal I }},{x}_{{ \mathcal I }}^{j})]{| }_{{t}_{{ \mathcal I }}=0}-{w}_{{ \mathcal I }}({x}_{{ \mathcal I }}^{j})\ \right|}^{2}\\ & & +\ \displaystyle \frac{1}{{N}_{{ \mathcal B }}}\sum _{j=1}^{{N}_{{ \mathcal B }}}{\left|\ { \mathcal B }[\hat{w}({t}_{{ \mathcal B }}^{j},{x}_{{ \mathcal B }})]{| }_{{x}_{{ \mathcal B }}\in \partial {\rm{\Omega }}}-{w}_{{ \mathcal B }}({t}_{{ \mathcal B }}^{j})\ \right|}^{2}\\ & & +\ \displaystyle \frac{1}{{N}_{{ \mathcal C }}}\sum _{j=1}^{{N}_{{ \mathcal C }}}{\left|\ f({t}_{{ \mathcal C }}^{j},{x}_{{ \mathcal C }}^{j})\ \right|}^{2},\end{array}\end{eqnarray}$where $\{{x}_{{ \mathcal I }}^{j},{w}_{{ \mathcal I }}^{j}\}{}_{j=1}^{{N}_{{ \mathcal I }}}$ and $\{{t}_{{ \mathcal B }}^{j},{w}_{{ \mathcal B }}^{j}\}{}_{j=1}^{{N}_{{ \mathcal B }}}$ stand for the initial and boundary value training sets, respectively, and $\{{t}_{{ \mathcal C }}^{j},{x}_{{ \mathcal C }}^{j},f({x}_{{ \mathcal C }}^{j},{t}_{{ \mathcal C }}^{j})\}{}_{j=1}^{{N}_{{ \mathcal C }}}$ represents the collocation points of f(t, x). The loss function $L(\hat{w})$ measures how well the latent solution $\hat{w}(t,x)$ satisfies the initial value conditions and boundary conditions, and penalizes the PDE for not being satisfied on the collocation points. In addition, all sample points are generated by the classical space-filling Latin hypercube sampling (LHS) technique [33], and the optimization method for all loss functions is the L-BFGS algorithm [34], a mainstream full-batch gradient descent optimization method.
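A compact sketch of how the three mean-squared terms in (5) are assembled (pure NumPy, with placeholder arrays rather than the paper's training data):

```python
import numpy as np

def pinn_loss(w_pred_I, w_data_I, w_pred_B, w_data_B, f_pred_C):
    """Mean squared error loss (5): initial-value misfit + boundary misfit + PDE residual.
    Arrays may be complex; np.abs handles the modulus in each term."""
    loss_I = np.mean(np.abs(w_pred_I - w_data_I) ** 2)
    loss_B = np.mean(np.abs(w_pred_B - w_data_B) ** 2)
    loss_C = np.mean(np.abs(f_pred_C) ** 2)
    return loss_I + loss_B + loss_C

z = np.zeros(8)
perfect = pinn_loss(z, z, z, z, z)       # exact data fit and zero residual
off = pinn_loss(np.ones(8), z, z, z, z)  # a unit initial-value misfit
```

In the actual method these three terms are evaluated on the $N_{\mathcal I}$ initial points, $N_{\mathcal B}$ boundary points and $N_{\mathcal C}$ collocation points, and the sum is handed to the L-BFGS optimizer.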
Using various techniques and strategies, our ultimate goal is to obtain the latent solution $\hat{w}(t,x)$ by driving the loss function $L(\hat{w})$ as close to zero as possible.
In summary, the main steps of the PINN deep learning scheme [16] for solving nonlinear evolution PDEs with initial value and boundary conditions comprise three broad ingredients: define the architecture of the feed-forward neural network by setting its depth (number of layers), width (number of neurons per layer) and activation function (4); prepare the initial value and boundary conditions as the training data sets and select random collocation point sets by the space-filling LHS algorithm [33]; and, after initializing the network parameters via Xavier initialization [35], minimize the loss function $L(\hat{w})$ (5) using the L-BFGS algorithm [34] and BP with the automatic differentiation technique [30] to determine the optimal parameters, namely the weights and biases $(\hat{\omega },\hat{b})$.
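The sampling and initialization ingredients can be sketched as follows. This is a hand-rolled NumPy stand-in for the LHS routine of [33] and for Xavier initialization [35] (the reference PINN code typically calls a library such as pyDOE for LHS); the point counts and layer sizes below are illustrative:

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """Space-filling LHS: one random point in each of n equal strata per dimension."""
    strata = rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T  # (n, d) permutations
    return (strata + rng.random((n, d))) / n   # points in [0, 1)^d; rescale to [0, T] x Omega

def xavier_init(n_in, n_out, rng):
    """Xavier (Glorot) initialization: zero-mean normal with std sqrt(2 / (n_in + n_out))."""
    std = np.sqrt(2.0 / (n_in + n_out))
    return std * rng.standard_normal((n_in, n_out))

rng = np.random.default_rng(1)
pts = latin_hypercube(100, 2, rng)   # e.g. 100 collocation points in the (t, x) plane
W = xavier_init(50, 50, rng)         # one 50 x 50 hidden-layer weight matrix
```

The stratification is what distinguishes LHS from plain uniform sampling: projected onto either coordinate, the 100 points land in 100 distinct equal-width bins.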
Since ${ \mathcal P }{ \mathcal T }$ symmetry was proposed in quantum mechanics [36], the study of linear and nonlinear physical models with ${ \mathcal P }{ \mathcal T }$ symmetry has been continuous [37–43]. Recently, the investigation of nonlinear wave equations with the ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential has been an important subject [38, 40, 41]. To our knowledge, deep learning methods have not been used to study nonlinear wave equations with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential. Therefore, in this paper, we investigate the nonlinear Schrödinger equation (NLSE) with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential via the PINN deep learning method.
The rest of the paper is arranged as follows. In Section 2, we introduce the scheme for solving the NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential via the PINN deep learning method. In Section 3, we use the PINNs to study the NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential by setting diverse initial value conditions and the same Dirichlet boundary conditions. In Section 4, we investigate the effects of two factors (optimization steps and activation functions) on the performance of the PINN deep learning method for this equation. In Section 5, we use the PINN deep learning method for data-driven coefficient discovery of the ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential, or of the dispersion and nonlinear terms of the NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential. Finally, our conclusions and discussions are presented in Section 6.
2. The scheme of the PINNs for solving the NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential
The nonlinear Schrödinger equation (NLSE) plays an important role both in integrable system theory and in many physical fields, including but not limited to matter physics, plasma physics, and nonlinear optics [44–47]. In 1997, Bang et al found bright spatial solitons in defocusing Kerr media supported by cascaded nonlinearities [48]. Then, Musslimani et al studied optical solitons in ${ \mathcal P }{ \mathcal T }$ periodic potentials [38], Shi et al obtained bright spatial solitons in defocusing Kerr media with ${ \mathcal P }{ \mathcal T }$-symmetric potentials [40], and Yan et al investigated spatial solitons and their stability in self-focusing and defocusing Kerr nonlinear media with generalized parity-time-symmetric Scarf-II potentials [41]. The NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential reads as follows [42]:$\begin{eqnarray}{\rm{i}}\displaystyle \frac{\partial \psi }{\partial t}=-\left[m\displaystyle \frac{{\partial }^{2}}{\partial {x}^{2}}+V(x)+{\rm{i}}W(x)+g| \psi {| }^{2}\right]\psi ,\end{eqnarray}$where ψ = ψ(t, x) denotes a complex field, m is a non-zero constant mass, and g characterizes the self-focusing (g > 0) or defocusing (g < 0) Kerr nonlinearity. V(x) and W(x) are the real and imaginary components of the complex ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential, where V(x) is an even function and W(x) is odd. Physically, V(x) is associated with index guiding while W(x) represents the gain or loss distribution of the optical potential [40].
In this paper, we use the complex-valued PINN deep learning method shown above to study the data-driven solutions of the NLSE with the complex generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential and the initial-boundary value conditions:$\begin{eqnarray}\begin{array}{l}{\rm{i}}{\psi }_{t}=-{\psi }_{{xx}}-[V(x)+{\rm{i}}W(x)]\psi -g| \psi {| }^{2}\psi ,\\ (t,x)\in [0,T]\times {\rm{\Omega }},\\ \psi (0,x)={\psi }_{{ \mathcal I }}(x),\\ x\in {\rm{\Omega }}\ ({\rm{initial}}\ {\rm{value}}\ {\rm{conditions}}),\\ \psi (t,-x)=\psi (t,x),\\ (t,x)\in [0,T]\times \partial {\rm{\Omega }}\ ({\rm{boundary}}\ {\rm{conditions}}),\end{array}\end{eqnarray}$where ψ = ψ(t, x) is a complex field, and V(x) and W(x) are real-valued functions of the space variable x. Based on the previous description, the real and imaginary components of the ${ \mathcal P }{ \mathcal T }$-symmetric potential satisfy the relations V(−x) = V(x) and W(−x) = −W(x), respectively. Additionally, we choose the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential, which is important in nonlinear optical beam dynamics in ${ \mathcal P }{ \mathcal T }$-symmetric complex potentials [38]:$\begin{eqnarray}\begin{array}{l}V(x)={V}_{0}{{\rm{sech}} }^{2}(x),\\ W(x)={W}_{0}{\rm{sech}} (x)\tanh (x),\end{array}\end{eqnarray}$where V0 and W0 are the amplitudes of the real and imaginary parts, subject to the constraint condition ${W}_{0}\leqslant {V}_{0}+\tfrac{1}{4}$. The coefficient of the Kerr nonlinearity g can be chosen as g = ± 1. Moreover, both the ${ \mathcal P }{ \mathcal T }$-symmetric complex potential and the constant g have important physical significance for equation (7).
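The stated symmetries of (8) (V even, W odd) and the amplitude constraint are easy to verify numerically; a minimal sketch using the Case 1 amplitudes employed later in the paper:

```python
import numpy as np

V0, W0 = 1.0, 0.5   # Case 1 amplitudes; the constraint W0 <= V0 + 1/4 holds

def V(x):
    """Real part of the Scarf-II potential, V0 * sech^2(x) (even in x)."""
    return V0 / np.cosh(x) ** 2

def W(x):
    """Imaginary part, W0 * sech(x) * tanh(x) (odd in x)."""
    return W0 * np.tanh(x) / np.cosh(x)

x = np.linspace(-5.0, 5.0, 201)
even_ok = np.allclose(V(-x), V(x))    # V(-x) = V(x)
odd_ok = np.allclose(W(-x), -W(x))    # W(-x) = -W(x)
```

The same two-line check applies unchanged to the Case 2 amplitudes (V0 = 2.91, W0 = 0.3).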
Because $\hat{\psi }(t,x)$ is a latent complex-valued solution of equation (7), we set $\hat{\psi }(t,x)=\hat{u}(t,x)+{\rm{i}}\hat{v}(t,x)$, where $\hat{u}(t,x)$ and $\hat{v}(t,x)$ are the real and imaginary parts of $\hat{\psi }(t,x)$, respectively, both real-valued functions of t and x. We use the complex-valued PINNs f(t, x) (3) and let f(t, x) = ifu(t, x) − fv(t, x), where fu(t, x) and − fv(t, x) stand for the imaginary and real parts, respectively; f(t, x), fu(t, x) and fv(t, x) satisfy$\begin{eqnarray}\begin{array}{l}f(t,x)={\rm{i}}{\hat{\psi }}_{t}+{\hat{\psi }}_{{xx}}\\ +[V(x)+{\rm{i}}W(x)]\hat{\psi }+g| \hat{\psi }{| }^{2}\hat{\psi },\\ {f}_{u}(t,x)={\hat{u}}_{t}+{\hat{v}}_{{xx}}\\ +W(x)\hat{u}+V(x)\hat{v}+g({\hat{u}}^{2}+{\hat{v}}^{2})\hat{v},\\ {f}_{v}(t,x)={\hat{v}}_{t}-{\hat{u}}_{{xx}}\\ -V(x)\hat{u}+W(x)\hat{v}-g({\hat{u}}^{2}+{\hat{v}}^{2})\hat{u},\end{array}\end{eqnarray}$and we proceed by placing a complex-valued (multi-output) deep neural network prior on ψ(t, x) = [u(t, x), v(t, x)]. To this end, [u(t, x), v(t, x)] can be simply defined as follows.
# To obtain the real and imaginary parts of psi(t, x)=[u(t, x), v(t, x)]
def psi_uv(t, x):
    # forward pass of the feed-forward network (4); its two output neurons
    # are taken as the real part u and the imaginary part v of psi
    uv = neural_net(tf.concat([t, x], 1), weights, biases)
    u = uv[:, 0:1]
    v = uv[:, 1:2]
    return u, v
Correspondingly, the complex-valued (multi-output) PINNs f(t, x) = [fu(t, x), fv(t, x)] take the form
# To obtain the real and imaginary parts of f(t, x)=[f_u, f_v]
def f_uv(t, x, W, V, g):
    u, v = psi_uv(t, x)
    u_t = tf.gradients(u, t)[0]
    u_x = tf.gradients(u, x)[0]
    u_xx = tf.gradients(u_x, x)[0]
    v_t = tf.gradients(v, t)[0]
    v_x = tf.gradients(v, x)[0]
    v_xx = tf.gradients(v_x, x)[0]
    f_u = u_t + v_xx + W * u + V * v + g * (u ** 2 + v ** 2) * v
    f_v = v_t - u_xx - V * u + W * v - g * (u ** 2 + v ** 2) * u
    return f_u, f_v
The shared parameters of the complex-valued PINNs f(t, x) and latent solution $\hat{\psi }(t,x)$ can be learned by minimizing the loss function (5)$\begin{eqnarray}L(\hat{\psi })={L}_{{ \mathcal I }}(\hat{\psi })+{L}_{{ \mathcal B }}(\hat{\psi })+{L}_{{ \mathcal C }}(\hat{\psi }),\end{eqnarray}$where ${L}_{{ \mathcal I }},{L}_{{ \mathcal B }},{L}_{{ \mathcal C }}$ are defined as follows:$\begin{eqnarray*}\begin{array}{l}{L}_{{ \mathcal I }}(\hat{\psi })=\displaystyle \frac{1}{{N}_{{ \mathcal I }}}\sum _{j=1}^{{N}_{{ \mathcal I }}}{\left|{ \mathcal I }[\hat{\psi }({t}_{{ \mathcal I }},{x}_{{ \mathcal I }}^{j})]{| }_{{t}_{{ \mathcal I }}=0}-{\psi }_{{ \mathcal I }}({x}_{{ \mathcal I }}^{j})\right|}^{2}\\ =\displaystyle \frac{1}{{N}_{{ \mathcal I }}}\sum _{j=1}^{{N}_{{ \mathcal I }}}\left({\left|\hat{u}(0,{x}_{0}^{j})-{u}_{0}({x}_{0}^{j})\right|}^{2}+{\left|\hat{v}(0,{x}_{0}^{j})-{v}_{0}({x}_{0}^{j})\right|}^{2}\right),\\ {L}_{{ \mathcal B }}(\hat{\psi })=\displaystyle \frac{1}{{N}_{{ \mathcal B }}}\sum _{j=1}^{{N}_{{ \mathcal B }}}{\left|{ \mathcal B }[\hat{\psi }({t}_{{ \mathcal B }}^{j},{x}_{{ \mathcal B }})]{| }_{{x}_{{ \mathcal B }}\in \partial {\rm{\Omega }}}-{\psi }_{{ \mathcal B }}({t}_{{ \mathcal B }}^{j})\right|}^{2}\\ =\displaystyle \frac{1}{{N}_{{ \mathcal B }}}\sum _{j=1}^{{N}_{{ \mathcal B }}}\left({\left|\hat{u}({t}_{{ \mathcal B }}^{j},-x)-\hat{u}({t}_{{ \mathcal B }}^{j},x)\right|}_{x\in \partial {\rm{\Omega }}}^{2}+{\left|\hat{v}({t}_{{ \mathcal B }}^{j},-x)-\hat{v}({t}_{{ \mathcal B }}^{j},x)\right|}_{x\in \partial {\rm{\Omega }}}^{2}\right),\\ {L}_{{ \mathcal C }}(\hat{\psi })=\displaystyle \frac{1}{{N}_{{ \mathcal C }}}\sum _{j=1}^{{N}_{{ \mathcal C }}}{\left|f({t}_{{ \mathcal C }}^{j},{x}_{{ \mathcal C }}^{j})\right|}^{2}=\displaystyle \frac{1}{{N}_{{ \mathcal C }}}\sum _{j=1}^{{N}_{{ \mathcal C }}}\left({\left|{f}_{u}({t}_{{ \mathcal C }}^{j},{x}_{{ \mathcal C }}^{j})\right|}^{2}+{\left|{f}_{v}({t}_{{ \mathcal C }}^{j},{x}_{{ \mathcal C }}^{j})\right|}^{2}\right),\end{array}\end{eqnarray*}$with $\hat{\psi }(t,x)$ standing for the approximate latent solution of equation (7), ${\left\{{x}_{{ \mathcal I }}^{j},{u}_{0}^{j},{v}_{0}^{j}\right\}}_{j=1}^{{N}_{{ \mathcal I }}}$ denoting the initial data sets (${\psi }_{{ \mathcal I }}(x)={u}_{0}(x)+{\rm{i}}{v}_{0}(x)$), ${\left\{{t}_{{ \mathcal B }}^{j},u{(\pm x,{t}_{{ \mathcal B }}^{j})}_{x\in \partial {\rm{\Omega }}},v{(\pm x,{t}_{{ \mathcal B }}^{j})}_{x\in \partial {\rm{\Omega }}}\right\}}_{j\,=\,1}^{{N}_{{ \mathcal B }}}$ denoting the boundary data sets, and $\left\{{t}_{{ \mathcal C }}^{j},{x}_{{ \mathcal C }}^{j},{f}_{u}({t}_{{ \mathcal C }}^{j},{x}_{{ \mathcal C }}^{j}),{f}_{v}({t}_{{ \mathcal C }}^{j},{x}_{{ \mathcal C }}^{j})\right\}$ standing for the collocation points of f(t, x). Consequently, ${L}_{{ \mathcal I }}(\hat{\psi })$ and ${L}_{{ \mathcal B }}(\hat{\psi })$ represent the losses on the initial and boundary data sets, respectively, and ${L}_{{ \mathcal C }}(\hat{\psi })$ penalizes the NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential for not being satisfied on the collocation points.
Before training the complex-valued PINNs, we need to generate and initialize the training data sets for the initial and boundary value conditions by using a spectral Fourier discretization with 256 modes and a fourth-order explicit Runge–Kutta temporal integrator with 201 temporal sampling points on the same space/time interval, and select collocation points of ψ(t, x) = u(t, x) + iv(t, x) on a 201 × 256 grid as the input data sets. All sample points are generated by the space-filling LHS algorithm. Moreover, all the codes in this paper are based on Python 3.7 and TensorFlow 1.14, and the numerical experiments were run on an ACER Aspire E5-571G laptop with a 2.20 GHz i5-5200U CPU (two cores, four threads).
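The workhorse of the spectral Fourier discretization is the FFT-based spatial derivative; a minimal NumPy sketch with 256 modes (the test function below is illustrative; the actual reference solver applies the same operator to the complex field ψ inside the Runge–Kutta time loop):

```python
import numpy as np

N, L = 256, 10.0                             # 256 Fourier modes on x in [-5, 5)
x = -L / 2 + L * np.arange(N) / N            # periodic grid, endpoint excluded
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)   # angular wavenumbers

def d2x(u):
    """Spectral second x-derivative: multiply by (ik)^2 in Fourier space."""
    return np.real(np.fft.ifft(-k ** 2 * np.fft.fft(u)))

# sanity check on a function the grid resolves exactly:
# (sin(2*pi*x/L))'' = -(2*pi/L)^2 sin(2*pi*x/L)
u = np.sin(2 * np.pi * x / L)
err = np.max(np.abs(d2x(u) + (2 * np.pi / L) ** 2 * u))
```

For grid-resolved functions the error is at the level of floating-point round-off, which is why a modest 256-mode grid suffices for the high-resolution reference data.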
3. Data-driven solutions of the NLSE with the different initial value and same Dirichlet boundary conditions
In this section, we use the PINN deep learning method described above to study the soliton solutions of the NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential under different initial value conditions and the same Dirichlet boundary conditions.
3.1. The initial value condition comes from an optical soliton
We can obtain the exact optical soliton from equation (7) with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential (8):$\begin{eqnarray}\psi (t,x)=K{\rm{sech}} (x)\exp \left\{{\rm{i}}\left[\mu {\tan }^{-1}\left(\sinh (x)\right)+\rho t\right]\right\},\end{eqnarray}$where K is a non-zero constant and$\begin{eqnarray}K=\sqrt{\displaystyle \frac{1}{g}\left(\displaystyle \frac{{W}_{0}^{2}}{9}+2-{V}_{0}\right)},\quad \mu =\displaystyle \frac{{W}_{0}}{3},\quad \rho =1.\end{eqnarray}$Moreover, when ∣x∣ → ∞ , ∣ψ(t, x)∣ → 0 and ${\int }_{-\infty }^{\infty }| \psi (t,x){| }^{2}{\rm{d}}x\,=2{K}^{2}$ [42].
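Solution (11) can be verified directly: with the time dependence ${{\rm{e}}}^{{\rm{i}}\rho t}$, equation (7) reduces to the stationary equation ρφ = φ″ + [V(x) + iW(x)]φ + g∣φ∣²φ for $\varphi (x)=K{\rm{sech}} (x)\exp [{\rm{i}}\mu {\tan }^{-1}(\sinh (x))]$. A finite-difference check for the Case 1 parameters below (the grid and tolerances are illustrative choices, not the paper's):

```python
import numpy as np

V0, W0, g, rho = 1.0, 0.5, 1.0, 1.0          # Case 1 parameters
K = np.sqrt((W0 ** 2 / 9 + 2 - V0) / g)      # amplitude (12); here K = sqrt(37)/6
mu = W0 / 3

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
phi = K / np.cosh(x) * np.exp(1j * mu * np.arctan(np.sinh(x)))
V = V0 / np.cosh(x) ** 2
W = W0 * np.tanh(x) / np.cosh(x)

# second derivative by central differences on the interior points
phi_xx = (phi[2:] - 2 * phi[1:-1] + phi[:-2]) / dx ** 2
res = (phi_xx + (V + 1j * W)[1:-1] * phi[1:-1]
       + g * np.abs(phi[1:-1]) ** 2 * phi[1:-1] - rho * phi[1:-1])
max_res = np.max(np.abs(res))                # should be O(dx^2)

# the soliton power: integral of |phi|^2 dx = 2 K^2 (Riemann sum)
norm = np.sum(np.abs(phi) ** 2) * dx
```

The residual vanishes up to the O(dx²) discretization error, and the computed power reproduces the stated value $2{K}^{2}$.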
Case 1 : Selecting the coefficient of the Kerr nonlinearity (self-focusing) g = 1 and the ${ \mathcal P }{ \mathcal T }$-symmetric potential amplitudes V0 = 1, W0 = 0.5.
Considering that the initial value condition ${\psi }_{{ \mathcal I }}(x)$ of equation (7) comes from the optical soliton (11), we use the complex-valued PINN deep learning method to learn equation (7) with the initial value condition$\begin{eqnarray}{\psi }_{{ \mathcal I }}(x)=K{\rm{sech}} (x)\exp \left\{{\rm{i}}\mu {\tan }^{-1}\left[\sinh (x)\right]\right\},\quad x\in {\rm{\Omega }},\end{eqnarray}$where $K=\tfrac{\sqrt{37}}{6},\mu =\tfrac{1}{6}$, and the Dirichlet boundary condition ψ(t, − x) = ψ(t, x), x ∈ ∂Ω. We simulate equation (7) with high-resolution data sets generated by the conventional spectral method. In the data-driven setting, we can take all measurements $\{{x}_{{ \mathcal I }}^{j},{u}_{0}^{j},{v}_{0}^{j}\}{}_{j=1}^{{N}_{{ \mathcal I }}}$ of equation (11) at time t = 0 and ${\left\{{t}_{{ \mathcal B }}^{j},u{(\pm x,{t}_{{ \mathcal B }}^{j})}_{x\in \partial {\rm{\Omega }}},v{(\pm x,{t}_{{ \mathcal B }}^{j})}_{x\in \partial {\rm{\Omega }}}\right\}}_{j=1}^{{N}_{{ \mathcal B }}}$ on the Dirichlet boundaries. Specifically, the training data sets include ${N}_{{ \mathcal I }}=50$ random sample points from ${\psi }_{{ \mathcal I }}(x)$ and ${N}_{{ \mathcal B }}=100$ random sample points from the Dirichlet boundary conditions. Moreover, we generate ${N}_{{ \mathcal C }}=10000$ random sample collocation points to better learn equation (7) inside the solution domain. Here, all random sample points are generated by a space-filling LHS strategy [33].
Based on the training data sets of the initial value and Dirichlet boundary conditions, we approximate the latent solution $\hat{\psi }(t,x)=\hat{u}(t,x)+{\rm{i}}\hat{v}(t,x)$ via the deep-layer complex-valued PINNs with 12 layers, 50 neurons per hidden layer and a hyperbolic tangent activation function, and we set the space domain Ω = [−5, 5] and the time range [0, 2] (i.e., T = 2).
After training the PINNs, the results for approximating the latent solution of the NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential (8) are summarized in Figure 1. Specifically, the top panel of Figure 1 shows the magnitude of the exact optical soliton solution and the approximate predicted solution $| \hat{\psi }(t,x)| =\sqrt{{\hat{u}}^{2}(t,x)+{\hat{v}}^{2}(t,x)}$ with the locations of the initial and boundary training data sets, and the absolute error between the exact solution ψ and the predicted solution $\hat{\psi }$, respectively. The relative ${{\mathbb{L}}}_{2}$-norm errors of {ψ(t, x), u(t, x), v(t, x)} are {1.068591e − 03, 1.040790e − 03, 1.465346e − 03}, and the absolute error is very small. The whole training time of this case is 20 637.650 s. The bottom panel of Figure 1 shows a more detailed comparison between the exact solution and the predicted solution at the time instants t = 0.5, 1.0 and 1.5, respectively. Obviously, the complex-valued PINNs can accurately capture the complicated nonlinear dynamic behavior of equation (7) with only a small amount of initial data.
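The error metric quoted here and throughout is the relative ${{\mathbb{L}}}_{2}$-norm error, $\parallel \hat{\psi }-\psi {\parallel }_{2}/\parallel \psi {\parallel }_{2}$ evaluated over all grid points; a one-function sketch (the sample arrays are illustrative):

```python
import numpy as np

def rel_l2_error(pred, exact):
    """Relative L2-norm error between predicted and exact fields on a grid.
    Works for real or complex arrays (np.linalg.norm takes the modulus)."""
    return np.linalg.norm(pred - exact) / np.linalg.norm(exact)

exact = np.array([1.0, 2.0, 2.0])            # ||exact||_2 = 3
pred = exact + np.array([0.0, 0.0, 0.3])     # a single-point perturbation
err = rel_l2_error(pred, exact)              # 0.3 / 3 = 0.1
```

For the 2-D fields in the paper one would flatten the 201 × 256 grid before applying the same formula.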
Figure 1. The self-focusing NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential and initial value condition (13). Top: the magnitude of the exact solution ∣ψ(t, x)∣ and the predicted solution $| \hat{\psi }(t,x)| $ with ${N}_{{ \mathcal I }}=50$ random sample points from the initial value condition and ${N}_{{ \mathcal B }}=100$ random sample points from the Dirichlet boundary conditions; ${N}_{{ \mathcal C }}=10000$ collocation points are generated by a space-filling LHS strategy, and the absolute error between the exact and predicted solutions is also shown. The relative ${{\mathbb{L}}}_{2}$-norm error for this case is 1.068591e-03. Bottom: comparisons of the exact and predicted solutions at the three temporal snapshots marked by the dotted black lines in the top panel, corresponding to time instants t = 0.5, 1.0 and 1.5, respectively.
Case 2 : Selecting the coefficient of the Kerr nonlinearity (defocusing) g = −1 and the ${ \mathcal P }{ \mathcal T }$-symmetric potential amplitudes V0 = 2.91, W0 = 0.3.
Similar to Case 1 , we use equation (13) with $K=\sqrt{0.9},\mu =0.1$ and the Dirichlet boundary conditions ψ(t, − x) = ψ(t, x), x ∈ ∂Ω. Specifically, the training data sets include ${N}_{{ \mathcal I }}=50$ random sample points at ψ(0, x) = {u0(x), v0(x)} and ${N}_{{ \mathcal B }}=100$ random sample points from the Dirichlet boundary conditions ${\left\{u(t,\pm x),v(t,\pm x)\right\}}_{x\in \partial {\rm{\Omega }}}$, and we generate ${N}_{{ \mathcal C }}=10000$ random sample collocation points as input data to better learn equation (7) inside the solution domain. Moreover, based on the initial value and boundary conditions described above, we approximate the latent solution $\hat{\psi }(t,x)=\hat{u}(t,x)+{\rm{i}}\hat{v}(t,x)$ via the deep-layer complex-valued PINNs with 12 layers, 50 neurons per hidden layer and a hyperbolic tangent activation function, and we set the space domain Ω = [−5, 5] and the time range [0, 2] (i.e., T = 2).
After training the PINNs, the results for approximating the latent solution of the NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential (8) are summarized in Figure 2. Specifically, the top panel of Figure 2 shows the magnitude of the exact optical soliton solution and the approximate predicted solution $| \hat{\psi }(t,x)| =\sqrt{{\hat{u}}^{2}(t,x)+{\hat{v}}^{2}(t,x)}$ with the locations of the initial and boundary training data sets, and the absolute error between the exact solution ψ and the predicted solution $\hat{\psi }$, respectively. The relative ${{\mathbb{L}}}_{2}$-norm errors of {ψ(t, x), u(t, x), v(t, x)} are {2.439456e − 03, 1.407065e − 03, 8.380674e − 04}. The whole training time of this case is 20 656.014 s. The bottom panel of Figure 2 presents a more detailed comparison between the exact solution and the predicted solution at the time instants t = 0.5, 1.0 and 1.5, respectively.
Figure 2. The defocusing NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential and initial value condition (13). Top: the magnitude of the exact solution ∣ψ(t, x)∣ and the predicted solution $| \hat{\psi }(t,x)| $ with ${N}_{{ \mathcal I }}=50$ random sample points from the initial value condition and ${N}_{{ \mathcal B }}=100$ random sample points from the Dirichlet boundary conditions; ${N}_{{ \mathcal C }}=10000$ collocation points are generated by a space-filling LHS strategy, and the absolute error between the exact and predicted solutions is also shown. The relative ${{\mathbb{L}}}_{2}$-norm error for this case is 2.439456e-03. Bottom: comparisons of the exact and predicted solutions at the three temporal snapshots marked by the dotted black lines in the top panel, corresponding to time instants t = 0.5, 1.0 and 1.5.
3.2. The initial value condition comes from an unstable solution
In this subsection, we consider equation (7) with a new initial value condition that does not correspond to a stable solution of equation (7):$\begin{eqnarray}\begin{array}{l}{\psi }_{{ \mathcal I }}(x)=K{\rm{sech}} (x)\exp \left\{-{\rm{i}}\mu {\tan }^{-1}\left[\sinh (x)\right]\right\},\\ x\in {\rm{\Omega }},\end{array}\end{eqnarray}$where K is given by equation (12) with parameters V0 = 1, W0 = 0.6 and g = 1, so that $K=\sqrt{1.04}$ and we obtain$\begin{eqnarray}\begin{array}{l}{\psi }_{{ \mathcal I }}(x)=\sqrt{1.04}{\rm{sech}} (x)\exp \left\{-0.2{\rm{i}}\,{\tan }^{-1}\left[\sinh (x)\right]\right\},\\ x\in {\rm{\Omega }},\end{array}\end{eqnarray}$with the Dirichlet boundary conditions ψ(t, − x) = ψ(t, x), x ∈ ∂Ω. Moreover, in order to study equation (7) with the initial value and boundary conditions shown above via the complex-valued PINNs, we need to obtain the corresponding data sets, including ${N}_{{ \mathcal I }}=50$ randomly distributed points with the initial value condition ${\psi }_{{ \mathcal I }}(x)$ on the space domain Ω = [−5, 5], ${N}_{{ \mathcal B }}=100$ randomly distributed points with the Dirichlet boundary conditions on the time domain t ∈ [0, 2], and ${N}_{{ \mathcal C }}=10000$ random collocation points generated by a space-filling LHS strategy.
In this case, we use the complex-valued PINN deep learning method with five hidden layers, 100 neurons per hidden layer and a hyperbolic tangent activation function to approximate the latent solution $\hat{\psi }(t,x)=\hat{u}(t,x)+{\rm{i}}\hat{v}(t,x)$. After training the PINNs with the above-mentioned data sets for about 20 371.696 s, we obtain the results shown in Figure 3. Specifically, the top panel of Figure 3 shows the magnitude of the approximate predicted solution $| \hat{\psi }(t,x)| =\sqrt{{\hat{u}}^{2}(t,x)+{\hat{v}}^{2}(t,x)}$ with the locations of the initial and boundary training data sets. The relative ${{\mathbb{L}}}_{2}$-norm errors of {ψ(t, x), u(t, x), v(t, x)} are {9.409916e − 03, 1.017411e − 02, 1.311260e − 02}, respectively. The bottom panel of Figure 3 presents a more detailed comparison between the exact solution and the predicted solution at the time instants t = 1.5, 3.0 and 4.5, respectively.
Figure 3.
Figure 3. The defocusing NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential and initial value condition (15). Top: the magnitude of the predicted solution $| \hat{\psi }(t,x)| $ with ${N}_{{ \mathcal I }}=50$ random sample points with the initial value condition and ${N}_{{ \mathcal B }}=100$ random sample points with the Dirichlet boundary conditions. ${N}_{{ \mathcal C }}=10000$ collocation points are generated by a space-filling LHS strategy. The relative ${{\mathbb{L}}}_{2}$-norm error for this case is 9.409916e-03. Bottom: comparisons of the exact and predicted solutions at the three temporal snapshots marked by the three dotted black lines in the top panel, corresponding to time instants t = 1.5, 3.0 and 4.5, respectively.
3.3. The initial value condition comes from a hyperbolic secant function and one constant
In this example, we use equation (7) with the sum of a hyperbolic secant function and a constant as the initial value condition:$\begin{eqnarray}{\psi }_{{ \mathcal I }}(x)=1+{\rm{sech}} (x),\quad x\in {\rm{\Omega }}=[-5.0,5.0],\end{eqnarray}$with the Dirichlet boundary conditions ψ(t, − 5.0) = ψ(t, 5.0), t ∈ [0, 1.4], and the other parameters g = 1, V0 = 1.0, W0 = 0.5 in (12). With these parameters, random collocation points and the initial and boundary value conditions, we obtain high-precision data sets to numerically simulate equation (7).
We use the complex-valued PINNs with five hidden layers, 100 neurons per hidden layer and a hyperbolic tangent activation function to approximate the latent solution ψ(t, x) = u(t, x) + iv(t, x), where u(t, x) and v(t, x) stand for the real and imaginary parts of ψ(t, x), respectively. After training the PINNs with the above-mentioned data sets for about 26 350.433 s, we obtain the results shown in Figure 4. Specifically, the top panel of Figure 4 shows the magnitude of the approximate predicted solution $| \hat{\psi }(t,x)| =\sqrt{{\hat{u}}^{2}(t,x)+{\hat{v}}^{2}(t,x)}$ together with the locations of the initial and boundary training data. The relative ${{\mathbb{L}}}_{2}$-norm errors of {ψ(t, x), u(t, x), v(t, x)} are {3.110669e − 02, 3.166569e − 02, 4.096653e − 02}, respectively. The bottom panel of Figure 4 gives a more detailed comparison between the exact and predicted solutions presented in the top panel at the time instants t = 0.35, 0.70 and 1.05, respectively.
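The relative ${{\mathbb{L}}}_{2}$-norm errors quoted throughout this section can be computed as follows; this is a sketch with a toy "exact" field of our own choosing (the initial profile (16) on a uniform grid), not the paper's test data.

```python
import numpy as np

# Relative L2-norm error: ||pred - exact||_2 / ||exact||_2 on the test grid.
def rel_l2_error(pred, exact):
    return np.linalg.norm(pred - exact) / np.linalg.norm(exact)

x = np.linspace(-5.0, 5.0, 201)
exact = 1.0 + 1.0 / np.cosh(x)   # profile (16) as a toy "exact" field
pred = exact * 1.01              # a prediction with a uniform 1% deviation
err = rel_l2_error(pred, exact)  # a uniform 1% deviation gives err = 0.01
```

For complex fields such as $\hat{\psi }(t,x)$, the same formula applies with the complex modulus inside the norm, which is how a single scalar error for ψ can coexist with separate errors for its real and imaginary parts.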
Figure 4.
Figure 4. The self-focusing NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential and initial value condition (16). Top: the magnitude of the predicted solution $| \hat{\psi }(t,x)| $ with ${N}_{{ \mathcal I }}=50$ random sample points with the initial value condition and ${N}_{{ \mathcal B }}=100$ random sample points with the Dirichlet boundary conditions. ${N}_{{ \mathcal C }}=10000$ collocation points are generated by a space-filling LHS strategy. The relative ${{\mathbb{L}}}_{2}$-norm error for this case is 3.110669e-02. Bottom: comparisons of the exact and predicted solutions at the three temporal snapshots marked by the three black dotted lines in the top panel, corresponding to time instants t = 0.35, 0.70 and 1.05, respectively.
4. Comparisons of the influencing factors for the PINN deep learning method in the NLSE
In this section, we discuss the influence of two factors on the learning ability of the complex-valued PINNs.
4.1. The influence of optimization steps on the learning ability of the complex-valued PINNs
In the following, we study the influence of optimization steps on the learning ability of the complex-valued PINNs in the defocusing NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential. To simplify the problem, we choose the same example given in Section 3.2 and study the nonlinear evolution equation (7) with the initial and boundary conditions (14) under four different numbers of optimization steps: N = {20 000, 30 000, 40 000, 50 000}. Here, we use a 10-hidden-layer deep complex-valued PINN with 50 neurons per layer and a hyperbolic tangent activation function to approximate the latent solution, and the training is optimized by the L-BFGS algorithm.
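The 10-hidden-layer, 50-neuron architecture can be sketched as a plain-NumPy forward pass. This is only a structural illustration under our own assumptions (Xavier-style initialization, helper names ours); the paper trains such a network in TensorFlow with L-BFGS, which is not reproduced here.

```python
import numpy as np

# Xavier-style initialization for a fully connected network.
def init_params(layers, seed=0):
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layers[:-1], layers[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), (n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

# tanh hidden layers, linear output layer mapping (t, x) -> (u, v).
def forward(params, tx):
    H = tx
    for W, b in params[:-1]:
        H = np.tanh(H @ W + b)
    W, b = params[-1]
    return H @ W + b

layers = [2] + [50] * 10 + [2]   # 10 hidden layers, 50 neurons each
params = init_params(layers)
out = forward(params, np.zeros((5, 2)))   # five (t, x) inputs -> (u, v) pairs
```

The two real outputs are then recombined as $\hat{\psi }=\hat{u}+{\rm{i}}\hat{v}$, and the PDE residuals are formed from derivatives of this map via automatic differentiation.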
In Figure 5, the first column describes the magnitude of the predicted solution $| \hat{\psi }(t,x)| =\sqrt{{\hat{u}}^{2}(t,x)+{\hat{v}}^{2}(t,x)}$, and the other three columns show the comparisons between the exact and predicted solutions presented in the first column at the time instants t = 1.5, 3.0 and 4.5, respectively. The rows, from first to last, correspond to the optimization steps N = {20 000, 30 000, 40 000, 50 000}, and the training times of these four cases are {8671.709, 12 704.724, 16 640.617, 20 642.990} s, respectively. All of these numerical experiments are summarized in Table 1. As Figure 5 and Table 1 show, the learning ability of the complex-valued PINN deep learning method improves as the number of optimization steps increases up to 50 000. Whether it would improve further with more optimization steps is left for future work; we do not carry out such a detailed study in this paper.
Figure 5.
Figure 5. The influence of optimization steps on the learning ability of the complex-valued PINNs in the defocusing NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential and the initial and boundary value conditions in Section 3.2: the deep learning results for four different optimization steps N = {20 000, 30 000, 40 000, 50 000} from the first row to the last row. Left column: the magnitude of the approximate predicted solution $| \hat{\psi }(t,x)| $ with the initial and Dirichlet boundary training data and 10 000 collocation points. Right three columns: comparisons of the exact and predicted solutions at the three temporal snapshots marked by the three dotted black lines in the first column, corresponding to time instants t = 1.5, 3.0 and 4.5.
Table 1. Comparison of the ${{\mathbb{L}}}_{2}$-norm errors of the predicted solution $\hat{\psi }(t,x)$ for different optimization steps.
4.2. The influence of activation functions on the learning ability of the complex-valued PINN
In this subsection, we will study the influence of activation functions on the complex-valued PINN's learning ability in the self-focusing NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential. To simplify the problem, we use the same example as Case 1 given in Section 3.1. We compare four common activation functions, ReLU, Leaky ReLU, Sigmoid and Tanh (4), each with 50 000 L-BFGS optimization steps:$\begin{eqnarray}\begin{array}{rcl}{H}_{j}&=&{\rm{relu}}({\omega }_{j}\cdot {H}_{j-1}+{b}_{j}),\\ {H}_{j}&=&{\rm{leaky}}\_{\rm{relu}}({\omega }_{j}\cdot {H}_{j-1}+{b}_{j}),\\ {H}_{j}&=&{\rm{sigmoid}}({\omega }_{j}\cdot {H}_{j-1}+{b}_{j}).\end{array}\end{eqnarray}$Note that the activation functions mentioned above are implemented under the same names in the TensorFlow library and can be used directly. Moreover, we use a complex-valued PINN with 10 hidden layers and 50 neurons per hidden layer to approximate the latent solution.
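For reference, the four activation functions compared here are, elementwise (a NumPy sketch; the slope alpha = 0.2 matches TensorFlow's default for `tf.nn.leaky_relu`, and Tanh is simply `np.tanh`):

```python
import numpy as np

def relu(z):
    # max(0, z)
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.2):
    # z for z > 0, alpha * z otherwise (alpha = 0.2 is TensorFlow's default)
    return np.where(z > 0, z, alpha * z)

def sigmoid(z):
    # 1 / (1 + e^{-z}), bounded in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Tanh, bounded in (-1, 1) and smooth, is np.tanh.
```

The smooth, bounded, zero-centered shape of Tanh is often cited as one reason it suits PINNs, whose loss requires differentiating the network several times; ReLU's second derivative vanishes almost everywhere.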
In Figure 6, the first column describes the magnitude of the predicted solution $| \hat{\psi }(t,x)| =\sqrt{{\hat{u}}^{2}(t,x)+{\hat{v}}^{2}(t,x)}$, and the other three columns show the detailed comparisons between the exact and predicted solutions presented in the first column at the time instants t = 0.5, 1.0 and 1.5, respectively. The rows, from first to last, correspond to the activation functions {ReLU, Leaky ReLU, Sigmoid, Tanh}, and the training times of these four cases are {6487.622, 6498.930, 24 081.133, 21 082.208} s, respectively. All of these numerical experiments are summarized in Table 2. As Figure 6 and Table 2 show, the learning ability of the complex-valued PINN deep learning method with the Tanh activation function is far superior to ReLU and Leaky ReLU in ${{\mathbb{L}}}_{2}$-norm error, and slightly better than Sigmoid in both ${{\mathbb{L}}}_{2}$-norm error and training time.
Figure 6.
Figure 6. The influence of the activation function on the learning ability of the complex-valued PINN in the self-focusing NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential and the initial and boundary value conditions of Case 1 in Section 3.1: the deep learning results for four different activation functions {ReLU, Leaky ReLU, Sigmoid, Tanh} from the first row to the last row. Left column: the magnitude of the approximate predicted solution $| \hat{\psi }(t,x)| $ with the initial and Dirichlet boundary training data and 10 000 collocation points. Right three columns: comparisons of the exact and predicted solutions at the three temporal snapshots marked by the three dotted black lines in the first column, corresponding to time instants t = 0.5, 1.0 and 1.5.
Table 2. Comparison of the ${{\mathbb{L}}}_{2}$-norm errors of the predicted solution $\hat{\psi }(t,x)$ for different activation functions.
5. Data-driven coefficient discovery of the defocusing NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential
In order to explore the coefficients of the defocusing (g = −1) NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential (6), we use the data-driven discovery method of PDEs that comes from the PINN deep learning framework [16]. The scheme of data-driven coefficient discovery of PDEs is similar to that for obtaining the data-driven solutions of PDEs.
5.1. Explore the coefficients of the dispersive and nonlinear items
In this subsection, we will explore the unknown parameters [m, g], which are the coefficients of the second-order dispersive item and defocusing Kerr nonlinearity, respectively, in the defocusing NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential:$\begin{eqnarray}\begin{array}{l}{\rm{i}}{\psi }_{t}=-m{\psi }_{{xx}}-[V(x)+{\rm{i}}W(x)]\psi -g| \psi {| }^{2}\psi ,\\ (t,x)\in [0,T]\times {\rm{\Omega }},\end{array}\end{eqnarray}$where$\begin{eqnarray*}\begin{array}{l}V(x)=2.91{{\rm{sech}} }^{2}(x),\\ W(x)=0.3{\rm{sech}} (x)\tanh (x).\end{array}\end{eqnarray*}$
Here, the latent function $\hat{\psi }(t,x)=\hat{u}(t,x)+{\rm{i}}\hat{v}(t,x)$, where $[\hat{u}(t,x),\hat{v}(t,x)]$ are the real and imaginary parts, respectively. We use the deep complex-valued PINNs f(t, x) = ifu(t, x) − fv(t, x) with f(t, x), fu(t, x) and fv(t, x) satisfying$\begin{eqnarray}\begin{array}{l}f(t,x)={\rm{i}}{\hat{\psi }}_{t}+m{\hat{\psi }}_{{xx}}+[2.91{{\rm{sech}} }^{2}(x)\\ +\,0.3{\rm{i}}\ {\rm{sech}} (x)\tanh (x)]\hat{\psi }+g| \hat{\psi }{| }^{2}\hat{\psi },\\ {f}_{u}(t,x)={\hat{u}}_{t}+m{\hat{v}}_{{xx}}+0.3{\rm{sech}} (x)\tanh (x)\ \hat{u}\\ +\,2.91{{\rm{sech}} }^{2}(x)\ \hat{v}+g({\hat{u}}^{2}+{\hat{v}}^{2})\hat{v},\\ {f}_{v}(t,x)={\hat{v}}_{t}-m{\hat{u}}_{{xx}}-2.91{{\rm{sech}} }^{2}(x)\ \hat{u}\\ +\,0.3{\rm{sech}} (x)\tanh (x)\ \hat{v}-g({\hat{u}}^{2}+{\hat{v}}^{2})\hat{u}.\end{array}\end{eqnarray}$We train on the input data sets [u(t, x), v(t, x)], treating [m, g] as trainable parameters, and the deep complex-valued PINNs approximate the undiscovered solution $\hat{\psi }(t,x)=\hat{u}(t,x)\,+{\rm{i}}\hat{v}(t,x)$ by minimizing the loss function$\begin{eqnarray}\begin{array}{l}L(\hat{\psi })=\displaystyle \frac{1}{{N}_{S}}\sum _{j=1}^{{N}_{S}}\left({\left|\ \hat{u}({t}^{j},{x}^{j})-u({t}^{j},{x}^{j})\ \right|}^{2}\right.\\ +\,{\left|\ \hat{v}({t}^{j},{x}^{j})-v({t}^{j},{x}^{j})\ \right|}^{2}+{\left|{f}_{u}({t}_{S}^{j},{x}_{S}^{j})\right|}^{2}\\ \left.+\,{\left|{f}_{v}({t}_{S}^{j},{x}_{S}^{j})\right|}^{2}\right).\end{array}\end{eqnarray}$
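Given arrays for the network outputs and residuals at the sample points, the loss (20) reduces to a single mean. The sketch below uses placeholder arrays of our own; in the paper, $[\hat u, \hat v]$ come from the network and $[f_u, f_v]$ from automatic differentiation of (19).

```python
import numpy as np

# Loss (20): data misfit on [u, v] plus PDE residuals f_u, f_v,
# averaged over the N_S sample points.
def inverse_loss(u_hat, u, v_hat, v, f_u, f_v):
    return np.mean((u_hat - u) ** 2 + (v_hat - v) ** 2
                   + f_u ** 2 + f_v ** 2)

N_S = 10000
zeros = np.zeros(N_S)
# A perfect fit with vanishing residuals gives zero loss.
loss = inverse_loss(zeros, zeros, zeros, zeros, zeros, zeros)
```

Because [m, g] enter only through the residual terms $f_u, f_v$, minimizing this single scalar simultaneously fits the data and identifies the equation coefficients.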
In order to obtain the unknown parameters [m, g] of equation (18), we randomly choose NS = 10000 sample points from the exact solution ψ(t, x) = u(t, x) + iv(t, x) with m = 1, g = − 1 (Case 2 in Section 3.1) in the spatiotemporal domain [0, 2] × [ − 5, 5]; the data [u(tj, xj), v(tj, xj)] appearing in the loss function (20) are taken from this exact solution ψ(t, x). We then train 10-hidden-layer PINNs with 50 neurons per hidden layer, a hyperbolic tangent activation function and 50 000 L-BFGS optimization steps on these data sets to explore the unknown parameters [m, g] by minimizing the loss function (20).
After training the complex-valued PINNs for 20 609.737 s, we obtain the unknown parameters [m, g]. All of the results are shown in Table 3, including the correct NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential and the identified equations obtained with clean data and with 1% noisy data, together with the relative errors; V(x) and W(x) are given by (18).
Table 3. The correct NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential, the equations identified by learning m and g, and the relative errors.
5.2. Explore the coefficients of the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential
In the following, we will explore the coefficients of the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential [V0, W0] in the defocusing NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential:$\begin{eqnarray}\begin{array}{l}{\rm{i}}{\psi }_{t}=-{\psi }_{{xx}}-[{V}_{0}{{\rm{sech}} }^{2}(x)\\ +\,{\rm{i}}{W}_{0}{\rm{sech}} (x)\tanh (x)]\psi +| \psi {| }^{2}\psi ,\\ (t,x)\in [0,T]\times {\rm{\Omega }},\end{array}\end{eqnarray}$where [V0, W0] are the unknown real-valued parameters.
Here, the latent function $\hat{\psi }(t,x)=\hat{u}(t,x)+{\rm{i}}\hat{v}(t,x)$, where $[\hat{u}(t,x),\hat{v}(t,x)]$ are the real and imaginary parts, respectively. We use the complex-valued PINNs f(t, x) = ifu(t, x) − fv(t, x) with f(t, x), fu(t, x) and fv(t, x) satisfying$\begin{eqnarray}\begin{array}{l}f(t,x)={\rm{i}}{\hat{\psi }}_{t}+{\hat{\psi }}_{{xx}}+[{V}_{0}{{\rm{sech}} }^{2}(x)\\ +\,{\rm{i}}{W}_{0}\ {\rm{sech}} (x)\tanh (x)]\hat{\psi }-| \hat{\psi }{| }^{2}\hat{\psi },\\ {f}_{u}(t,x)={\hat{u}}_{t}+{\hat{v}}_{{xx}}+{W}_{0}{\rm{sech}} (x)\tanh (x)\ \hat{u}\\ +\,{V}_{0}{{\rm{sech}} }^{2}(x)\ \hat{v}-({\hat{u}}^{2}+{\hat{v}}^{2})\hat{v},\\ {f}_{v}(t,x)={\hat{v}}_{t}-{\hat{u}}_{{xx}}-{V}_{0}{{\rm{sech}} }^{2}(x)\ \hat{u}\\ +\,{W}_{0}{\rm{sech}} (x)\tanh (x)\ \hat{v}+({\hat{u}}^{2}+{\hat{v}}^{2})\hat{u}.\end{array}\end{eqnarray}$In order to obtain the unknown parameters [V0, W0] of equation (21), we randomly choose NS = 10000 sample points from the exact solution ψ(t, x) = u(t, x) + iv(t, x) with V0 = 2.91, W0 = 0.3 (given by Case 2 in Section 3.1) in the spatiotemporal domain [0, 2] × [ − 5, 5]; the data [u(tj, xj), v(tj, xj)] appearing in the loss function (20) are taken from this exact solution ψ(t, x). We then train 10-hidden-layer PINNs with 50 neurons per hidden layer, a hyperbolic tangent activation function and 50 000 L-BFGS optimization steps on these data sets to explore the unknown parameters [V0, W0] by minimizing the loss function (20).
After training the deep complex-valued PINNs for 20 782.791 s, we obtain the unknown parameters [V0, W0]. All of these results are shown in Table 4, including the correct NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential and the identified equations obtained with clean data and with 1% noisy data, together with the relative errors.
Table 4. The correct NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential, the equations identified by learning V0 and W0, and the relative errors.
In conclusion, we have solved the nonlinear Schrödinger equation (NLSE) with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential via the complex-valued PINN deep learning method under the same Dirichlet boundary conditions and different initial value conditions, including an optical soliton, an unstable solution and a hyperbolic secant function plus a constant. In particular, when selecting the exact optical soliton solution as the initial value condition, we find that the complex-valued PINNs can better learn the nonlinear dynamic behaviors and structures of the corresponding solutions. Moreover, we also examined the learning ability of the complex-valued PINNs for solving the NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential under different optimization steps or activation functions, and we find that the complex-valued PINN deep learning method learns better with more optimization steps and a hyperbolic tangent activation function. We also investigated the data-driven coefficient discovery of the defocusing NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential; the results suggest that the complex-valued PINN deep learning method can also be used to learn other nonlinear wave equations in many fields of nonlinear science.
As can be seen from the figures and tables presented here, with a small number of training data the complex-valued PINNs can effectively learn the NLSE with the generalized ${ \mathcal P }{ \mathcal T }$-symmetric Scarf-II potential. However, some questions remain: (1) if we set more hidden layers and more neurons per hidden layer, will the learning ability of the complex-valued PINN deep learning method improve? (2) If more steps are optimized, will the learning ability of the complex-valued PINN deep learning method be stronger? (3) Is there a more generalized loss function for other classical nonlinear partial differential equations? (4) Is there a more effective physical constraint for the complex-valued PINN deep learning method? Meaningful discussions and optimizations of the neural network model will be undertaken in future work.
Conflict of interest
The authors declare that they have no conflict of interest.
Data availability statements
All data generated or analyzed during this study are included in this article.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant Nos. 11775121 and 11435005, and the K. C. Wong Magna Fund of Ningbo University.
Emanuello J et al 2021 Proceedings—AI/ML for Cybersecurity: Challenges, Solutions, and Novel Ideas at SIAM Data Mining arXiv:2104.13254
Krizhevsky A, Sutskever I and Hinton G E 2012 ImageNet classification with deep convolutional neural networks Adv. Neural Inf. Process. Syst. 25 1097–1105 DOI:10.1145/3065386
Lake B M, Salakhutdinov R and Tenenbaum J B 2015 Human-level concept learning through probabilistic program induction Science 350 1332–1338 DOI:10.1126/science.aab3050
Alipanahi B, Delong A, Weirauch M T and Frey B J 2015 Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning Nat. Biotechnol. 33 831–838 DOI:10.1038/nbt.3300
Raissi M, Perdikaris P and Karniadakis G E 2017 Inferring solutions of differential equations using noisy multi-fidelity data J. Comput. Phys. 335 736–746 DOI:10.1016/j.jcp.2017.01.060
Li J and Chen Y 2020 A deep learning method for solving third-order nonlinear evolution equations Commun. Theor. Phys. 72 115003 DOI:10.1088/1572-9494/aba243
Li J and Chen Y 2021 A physics-constrained deep residual network for solving the sine-Gordon equation Commun. Theor. Phys. 73 015001 DOI:10.1088/1572-9494/abc3ad
Zhou Z and Yan Z 2021 Solving forward and inverse problems of the logarithmic nonlinear Schrödinger equation with ${ \mathcal P }{ \mathcal T }$-symmetric harmonic potential via deep learning Phys. Lett. A 387 127010 DOI:10.1016/j.physleta.2020.127010
Wang L and Yan Z 2021 Data-driven rogue waves and parameter discovery in the defocusing nonlinear Schrödinger equation with a potential using the PINN deep learning Phys. Lett. A 404 127408 DOI:10.1016/j.physleta.2021.127408
Zhou Z and Yan Z 2021 Deep learning neural networks for the third-order nonlinear Schrödinger equation: solitons, breathers, and rogue waves arXiv:2104.14809
Wang L and Yan Z 2021 Data-driven peakon and periodic peakon travelling wave solutions of some nonlinear dispersive equations via deep learning arXiv:2101.04371
Baydin A G, Pearlmutter B A, Radul A A and Siskind J M 2018 Automatic differentiation in machine learning: a survey J. Mach. Learn. Res. 18 1–43
Abadi M et al 2016 TensorFlow: a system for large-scale machine learning 12th USENIX Symposium on Operating Systems Design and Implementation pp 265–283
Makris K G, El-Ganainy R, Christodoulides D N and Musslimani Z H 2011 ${ \mathcal P }{ \mathcal T }$-symmetric periodic optical potentials Int. J. Theor. Phys. 50 1019–1041 DOI:10.1007/s10773-010-0625-6
Shi Z, Jiang X, Zhu X and Li H 2011 Bright spatial solitons in defocusing Kerr media with ${ \mathcal P }{ \mathcal T }$-symmetric potentials Phys. Rev. A 84 053855 DOI:10.1103/PhysRevA.84.053855
Yan Z, Wen Z and Hang C 2015 Spatial solitons and stability in self-focusing and defocusing Kerr nonlinear media with generalized parity-time-symmetric Scarf-II potentials Phys. Rev. E 92 022913 DOI:10.1103/PhysRevE.92.022913
Chen Y et al 2017 Families of stable solitons and excitations in the ${ \mathcal P }{ \mathcal T }$-symmetric nonlinear Schrödinger equations with position-dependent effective masses Sci. Rep. 7 DOI:10.1038/s41598-017-01401-3
Yan Z and Chen Y 2017 The nonlinear Schrödinger equation with generalized nonlinearities and ${ \mathcal P }{ \mathcal T }$-symmetric potentials: stable solitons, interactions, and excitations Chaos 27 073114 DOI:10.1063/1.4995363
Peregrine D H 1983 Water waves, nonlinear Schrödinger equations and their solutions J. Aust. Math. Soc. B 25 16–43 DOI:10.1017/S0334270000003891
Zabusky N J and Kruskal M D 1965 Interaction of solitons in a collisionless plasma and the recurrence of initial states Phys. Rev. Lett. 15 240–243 DOI:10.1103/PhysRevLett.15.240
Hasegawa A and Tappert F 1973 Transmission of stationary nonlinear optical pulses in dispersive dielectric fibers: I. Anomalous dispersion Appl. Phys. Lett. 23 142–144 DOI:10.1063/1.1654836
Bang O, Kivshar Y S and Buryak A V 1997 Bright spatial solitons in defocusing Kerr media supported by cascaded nonlinearities Opt. Lett. 22 1680–1682 DOI:10.1364/OL.22.001680