Received: 2020-04-13; Revised: 2020-11-06; Accepted: 2020-11-13; Published online: 2021-01-15
Cite this article: Fei Fang, Zhou Yang, Sheng-Jun Wang. Power law decay of stored pattern stability in sparse Hopfield neural networks. Communications in Theoretical Physics, 2021, 73(2): 025601. doi:10.1088/1572-9494/abcfb0
1. Introduction
The Hopfield attractor neural network [1, 2] is a well-known Ising-like model of associative memory [3-15]. Trained by the Hebbian rule, a number of patterns can be stored and become attractors of the neural network. When a pattern is presented to the trained attractor neural network, the network state converges to the stored pattern that most closely resembles the input. Therefore Hopfield neural networks can function as a form of associative memory [1]. When only a few patterns are stored, these patterns are stable attractors of the Hopfield neural network. As the number of patterns increases, the quality of recognition decreases [16]. When a stored pattern is presented to the network, retrieval errors occur because of the crosstalk between stored patterns, that is, the stability of stored patterns decreases [3].

The performance of the Hopfield model on scale-free (SF) networks has been studied. In SF networks the connection degrees, i.e. the numbers of connections of the nodes, are heterogeneous and follow a power law distribution. The heterogeneous degrees affect the behavior of the Hopfield model; for example, retrieval errors are not distributed uniformly among the nodes [3]. In particular, a power law relation between the performance of the network and the number of stored patterns has been found on SF networks [16].
The power law relation has an important consequence for the network's properties: the stability of patterns declines only slowly as the number of stored patterns increases, which is a beneficial feature for the dynamics of associative memory. In fact, power law behaviors pervade neural activity [17], and experiments have shown that memory exhibits power law forgetting [18-20]. Using a mean-field neural network model with slow plasticity, an elegant analysis has shown that neural networks can exhibit power law forgetting, i.e. universal power law relaxation of the mean synaptic strength along a critical manifold or at the tricritical point [18]. However, how the power law behavior emerges in Hopfield neural networks is still an open question.
In this work we present the mechanism behind the power law relation through an analytical treatment. We simulate the Hopfield model on SF networks and on homogeneous Erdös-Rényi (ER) random networks, and find that the overlap decays as a power law towards a constant which depends on the network size and connection density. Using signal-to-noise analysis, we show that when the network is sparse the relation between the overlap and the number of stored patterns can be described by a power law function, with the overlap decaying to 0.
2. Model
In the Hopfield neural network model [1, 2], node states $s_i=\pm 1$, $i\in\{1,2,\ldots,N\}$, are binary. The dynamical equation of the system is [1, 3]

$s_i(t+1)=\mathrm{sign}\Big(\sum_{j} w_{ij}\,s_j(t)\Big),$

where the Hebbian couplings are restricted to the edges of the network, $w_{ij}=\sum_{\alpha=1}^{n}\xi_i^{\alpha}\xi_j^{\alpha}$ if nodes i and j are connected and $w_{ij}=0$ otherwise, with $\xi_i^{\alpha}=\pm 1$ the stored patterns.

When the number n of stored patterns is small, these patterns are stable attractors of the Hopfield neural network. When a stored pattern or a blurred pattern is presented to the neural network, the output of the Hopfield neural network coincides with the attractor, $s_i=\xi_i^{\alpha}$. When the number n of stored patterns is large, stored patterns cannot be completely recovered [1]: some nodes in the network flip their states $s_i$ and the network state differs from the attractor. However, the network state can still be close to one of the stored patterns, so the network can still recognize stored patterns [16]. To quantify the computational performance of Hopfield attractor neural networks, we employ the stability of stored patterns [3]. A stored pattern is presented to the neural network, and the overlap between the output state and the initial state is computed as the measure of stability. The overlap is defined as

$\phi=\dfrac{1}{N}\sum_{i=1}^{N}\xi_i^{\alpha}s_i.$
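To make the model concrete, the following is a minimal sketch of the update rule and of the overlap measurement on a sparse network, assuming Hebbian couplings restricted to existing edges and serial updates in node order; the toy random graph, sizes and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def hebbian_weights(adj, patterns):
    """w_ij = sum_alpha xi_i^alpha xi_j^alpha, kept only on existing edges."""
    # patterns: (n, N) array of +/-1 entries; adj: (N, N) 0/1 adjacency matrix
    return adj * (patterns.T @ patterns)

def serial_update(weights, state, sweeps=1):
    """Update the nodes one by one in numerical order, s_i <- sign(sum_j w_ij s_j)."""
    s = state.copy()
    for _ in range(sweeps):
        for i in range(len(s)):
            h = weights[i] @ s            # local field of node i
            if h != 0:
                s[i] = 1 if h > 0 else -1
    return s

def overlap(state, pattern):
    """phi = (1/N) sum_i xi_i^alpha s_i."""
    return np.mean(state * pattern)

# Example: store n random patterns on a toy sparse graph and present pattern 0.
rng = np.random.default_rng(0)
N, n = 1000, 50
patterns = rng.choice([-1, 1], size=(n, N))
adj = (rng.random((N, N)) < 0.01).astype(int)
adj = np.triu(adj, 1)
adj = adj + adj.T                         # symmetric adjacency, no self-loops
w = hebbian_weights(adj, patterns)
s_out = serial_update(w, patterns[0], sweeps=1)
print("overlap with the stored pattern:", overlap(s_out, patterns[0]))
```

Presenting a stored pattern and measuring the overlap after the update sweep reproduces the stability measure studied in section 3.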
We use the ER random network model and the SF network model in the simulations. ER random networks are generated as follows: starting with N disconnected nodes, each of the possible edges is added independently with probability 0<p<1. The degree distribution is a Poisson distribution [11],

$P(k)=\mathrm{e}^{-\langle k\rangle}\dfrac{\langle k\rangle^{k}}{k!},$

where $\langle k\rangle=p(N-1)$ is the average degree.
SF networks are generated as follows: starting with m0 fully connected nodes, at each time step a new node with m≤m0 edges is added by linking the new node j to m different nodes already present in the system. The probability that the new node j is linked to an existing node i is linearly proportional to the degree of i,

$\Pi(i)=\dfrac{k_i}{\sum_{l}k_l}.$
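For reference, both topologies can be generated with standard routines; the sketch below uses the networkx generators, with parameter values chosen only for illustration.

```python
import networkx as nx

N = 10_000

# ER random network: each possible edge is added with probability p,
# giving average degree <k> = p (N - 1) and a Poisson degree distribution.
k_mean = 4
er_graph = nx.erdos_renyi_graph(N, k_mean / (N - 1), seed=1)

# SF network: Barabasi-Albert growth with preferential attachment; each new node
# attaches to m existing nodes with probability proportional to their degree,
# giving <k> close to 2m and P(k) ~ k^-3.
m = 2
sf_graph = nx.barabasi_albert_graph(N, m, seed=1)

print("ER mean degree:", 2 * er_graph.number_of_edges() / N)
print("SF mean degree:", 2 * sf_graph.number_of_edges() / N)
```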
3. Results
On SF networks, the power law relation between the overlap and the number of stored patterns was obtained using computer simulations in [16]. So we first analyze the stability of patterns on SF networks. Then we also analyze the behavior of the Hopfield neural network model on ER random networks.

3.1. SF networks
The relation between the overlap and the number of stored patterns has been derived in [11, 21]. First, we review the formula for the overlap on SF networks.

If a Hopfield neural network consists of N nodes and stores n patterns, the condition for a pattern $\xi_i^{\alpha}$ to be stable can be expressed by N inequalities, one for each node i:

$\xi_i^{\alpha}\sum_{j}w_{ij}\,\xi_j^{\alpha}>0.$

Separating the contribution of pattern α itself, which equals the degree $k_i$ (the signal), from the crosstalk of the other n−1 patterns (the noise term $T_{\mathrm{noise}}$), the noise is a sum of $k_i(n-1)$ random ±1 terms. Treating it as Gaussian, the probability that a node of degree k is unstable is

$U(k)=\dfrac{1}{2}\left[1-\mathrm{erf}\left(\sqrt{\dfrac{k}{2(n-1)}}\right)\right].$
The stability of stored patterns in the network can be predicted from the probability U(k). For a network which has converged into a stable state, the overlap order parameter can be rewritten as $\phi =\tfrac{1}{N}(N-2{N}_{\mathrm{flip}})$, where Nflip is the number of flipped nodes [11]. For SF networks, assuming a sharp cutoff of the distribution at low k, the degree distribution is normalized as [12]

$P(k)=(\gamma-1)\,k_{\min}^{\,\gamma-1}\,k^{-\gamma},\qquad k\ge k_{\min},$

with γ=3 for the Barabási-Albert networks used here.
The number of flipped nodes is

$N_{\mathrm{flip}}=N\displaystyle\int_{k_{\min}}^{\infty}P(k)\,U(k)\,\mathrm{d}k,$

so that the overlap reads $\phi=1-2\int_{k_{\min}}^{\infty}P(k)\,U(k)\,\mathrm{d}k$.
Based on this result for the overlap, we carry out a further analytic treatment to interpret the power law relation between the overlap and the number of patterns. The power series expansion of the error function has the form

$\mathrm{erf}(x)=\dfrac{2}{\sqrt{\pi}}\sum_{m=0}^{\infty}\dfrac{(-1)^{m}x^{2m+1}}{m!\,(2m+1)}=\dfrac{2}{\sqrt{\pi}}\left(x-\dfrac{x^{3}}{3}+\dfrac{x^{5}}{10}-\cdots\right).$
For a very small average degree k and a large number of patterns n, $x=\sqrt{\tfrac{k}{2(n-1)}}$ is small. Keeping only the first term of the expansion, $\mathrm{erf}(x)\approx \tfrac{2}{\sqrt{\pi}}x$, in computing the overlap on SF networks gives

$U(k)\approx\dfrac{1}{2}-\dfrac{1}{\sqrt{\pi}}\sqrt{\dfrac{k}{2(n-1)}}.$
Then

$\phi=1-2\displaystyle\int_{k_{\min}}^{\infty}P(k)\,U(k)\,\mathrm{d}k\approx\sqrt{\dfrac{2}{\pi(n-1)}}\displaystyle\int_{k_{\min}}^{\infty}P(k)\sqrt{k}\,\mathrm{d}k=\sqrt{\dfrac{2}{\pi(n-1)}}\,\big\langle\sqrt{k}\big\rangle\;\propto\;(n-1)^{-1/2}.$
Therefore, the relation between the overlap and the number of stored patterns follows a power law when the network is sparse and the number of stored patterns is large. The theoretical analysis gives the exponent −1/2.
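The scaling can be checked numerically. The sketch below evaluates the analytic overlap φ = 1 − 2 Σk P(k)U(k) for an illustrative power-law degree distribution with a sharp low-k cutoff and prints φ for increasing n; the cutoff values k_min and k_max are assumptions for this example, not parameters taken from the paper.

```python
import numpy as np
from math import erf, sqrt

def unstable_prob(k, n):
    """U(k) = (1/2)[1 - erf(sqrt(k / (2(n-1))))]: probability that a degree-k node is unstable."""
    return 0.5 * (1.0 - erf(sqrt(k / (2.0 * (n - 1)))))

def overlap_theory(degree_probs, n):
    """phi = 1 - 2 * sum_k P(k) U(k) for a given degree distribution P(k)."""
    return 1.0 - 2.0 * sum(p * unstable_prob(k, n) for k, p in degree_probs.items())

# Illustrative power-law degree distribution P(k) ~ k^-3 with a sharp low-k cutoff.
k_min, k_max = 4, 1000
ks = np.arange(k_min, k_max + 1)
pk = ks.astype(float) ** -3
pk /= pk.sum()
degree_probs = dict(zip(ks.tolist(), pk.tolist()))

# Once n is much larger than the typical degree, phi should fall off roughly as n^(-1/2).
for n in (10, 100, 1000, 10000):
    print(n, overlap_theory(degree_probs, n))
```

For n well above the typical degree, φ shrinks by roughly a factor of √10 per decade of n, i.e. the n^(−1/2) behavior derived above.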
It was obtained in simulations in [16] that the overlap φ varies roughly as (φ−φ0)∝n−τ. Following [16], we measure the overlap difference (φ−φ0) for SF networks in simulations. The network is built using the SF network model [22, 23]. The size of the networks is N=10 000, and one of the stored patterns is used as the network's initial state. Here we recover the relation reported in [16]. The simulation results are shown in figures 1 and 2. As the average degree increases, the value of φ0 decreases: for average degrees 〈k〉=4, 10, 20 and 50, the final overlap φ0 is 0.24, 0.23, 0.21 and 0.19, respectively. In figure 1, for networks with 〈k〉=4, the relation between the overlap difference and the number of patterns follows a power law. By fitting the simulation results to a power law function, we obtain an exponent of −0.541±0.001, which is close to the theoretical exponent −1/2.
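For reference, the exponents quoted here and below can be extracted with an ordinary least-squares fit on a log-log scale; a minimal sketch, assuming φ0 is taken as the measured plateau value (the synthetic data below are only for illustration):

```python
import numpy as np

def fit_power_law_exponent(n_values, phi_values, phi0):
    """Least-squares fit of log(phi - phi0) versus log(n); returns the exponent tau."""
    x = np.log(np.asarray(n_values, dtype=float))
    y = np.log(np.asarray(phi_values, dtype=float) - phi0)
    tau, _intercept = np.polyfit(x, y, 1)
    return tau

# Illustration with synthetic data that obeys phi - phi0 = n^(-0.5) exactly.
n_values = np.arange(2, 2501)
phi0 = 0.24
phi_values = phi0 + n_values.astype(float) ** -0.5
print(fit_power_law_exponent(n_values, phi_values, phi0))   # close to -0.5
```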
Figure 1. The overlap difference φ−φ0 versus the number of patterns n on scale-free networks on a log-log scale. The blue dashed line is the analytic result, squares are the simulation results, and the red solid line is the fit; data points from n=2 to 2500 are used in the fit. The parameters are N=10 000 and average degree 〈k〉=4. Inset: the overlap φ versus the number of patterns n on scale-free networks on a log-log scale. The final overlap φ0 is 0.24. Each data point is averaged over 1000 realizations.
The limit on the number of patterns that can be stored with small error is called the storage capacity [9]. We compute the storage capacity using an overlap threshold of 0.95. Below the capacity the overlap stays close to 1, so the relation between the overlap and the number of patterns cannot be a power law decay. For networks with N=10 000 and 〈k〉=4, the capacity of SF networks is n=2. Therefore the whole curve shown in figure 1 can exhibit a power law decay. The value of φ0 is 0.24.
When the average degree of the network is large, the capacity becomes large and the relation is not a power law for small numbers of patterns. However, our analysis shows that as the ratio k/2(n−1) becomes small, the relation turns into a power law. The results for networks with average degree 〈k〉=50 are shown in figure 2. The capacity of SF networks with 〈k〉=50 is n=10. When the number of patterns is larger than 10, the relation changes into a power law. For networks with 〈k〉=50, the final overlap φ0 decreases to 0.19, as shown in the inset of figure 2. The exponent is −0.689±0.003.
Figure 2. The overlap difference φ−φ0 versus the number of patterns n on scale-free networks on a log-log scale. The blue dashed line is the analytic result, squares are the simulation results, and the red solid line is the fit; data points from n=40 to 2500 are used in the fit. The parameters are N=10 000 and average degree 〈k〉=50. Inset: the overlap φ versus the number of patterns n on scale-free networks on linear coordinates. The final overlap φ0 is 0.19. Each data point is averaged over 1000 realizations.
The change of the relation can be explained by the signal-to-noise ratio (SNR) analysis. The power law relation is based on the expansion of the error function under the condition that the connection degree is relatively small compared with the number of patterns, k/n ≪ 1. When the network becomes dense, the condition k/n ≪ 1 is not satisfied and the relation between φ−φ0 and n cannot be expressed as a power law function, as seen in figure 2. As the number of stored patterns increases, the neural network satisfies the condition and the relation turns into a power law function.
There is a difference between the above simulation and the analytic results, as shown in the inset of figure 1. In the analytic results the overlap decays to 0, i.e. φ∝n−0.5, and φ0 cannot be obtained from the analysis. The difference between simulation and analytic results may be due to two factors. Firstly, in a network of finite size, the distribution of the noise term of the nodes deviates from the exact Gaussian distribution. We calculated the distribution of the noise term of the nodes in one realization of the SF network, as shown in figure 3; the distribution is not exactly Gaussian. In the region Tnoise<−〈k〉 the probability obtained by simulation is lower than the analytic result, i.e. extreme values of the noise term appear rarely. We computed the unstable probability of nodes using both simulation and analysis, shown in figure 4. In a network of finite size, the deviation from the exact Gaussian distribution makes the unstable probability smaller.
Figure 3. The distribution of the noise term Tnoise in scale-free networks. The dashed line is the exact Gaussian distribution of the analytic results; the curve is the numerical simulation result. The parameters are N=10 000, pattern number n=100, and average degree 〈k〉=50.
Figure 4. The unstable probability U(k) versus the degree k in scale-free networks. The red dashed line is the analytic result; the black solid line is the numerical simulation result. The parameters are N=10 000, pattern number n=100, and average degree 〈k〉=4. Each data point is averaged over 10 000 realizations.
Secondly, the simulation is performed with the nodes updated in serial order. When a node is flipped by the update, the local fields of its neighbors change. We use the initial network state to calculate the distribution of the noise term Tnoise of the nodes in SF networks. For a given network realization, nodes i=5001, ..., 10 000 are used in computing the distribution. The distribution is shown by the dashed line in figure 5; the center of the curve is at Tnoise=0. In the simulation we then update the nodes i=1, …, 5000 and use the nodes i=5001, ..., 10 000 to compute the distribution again. The result is shown by the solid line in figure 5. The distribution of the noise term Tnoise has shifted to the right: the noise term of un-updated nodes becomes larger, so their unstable probability becomes smaller.
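The shift can be reproduced directly in a small simulation. The sketch below measures the noise term of the second half of the nodes before and after serially updating the first half, using the decomposition T_noise = ξ_i h_i − k_i implied by the signal-to-noise analysis; the toy random graph and the sizes are illustrative, not the SF network of figure 5.

```python
import numpy as np

def noise_terms(weights, adj, state, pattern, nodes):
    """T_noise for the selected nodes, using the decomposition implied by the
    SNR analysis: T_i = xi_i * (local field of i) - k_i."""
    degrees = adj.sum(axis=1)
    return np.array([pattern[i] * (weights[i] @ state) - degrees[i] for i in nodes])

# Toy setup (the paper uses N = 10 000 SF networks; here a small random graph).
rng = np.random.default_rng(0)
N, n = 2000, 200
patterns = rng.choice([-1, 1], size=(n, N))
adj = (rng.random((N, N)) < 0.02).astype(int)
adj = np.triu(adj, 1)
adj = adj + adj.T
weights = adj * (patterns.T @ patterns)

state = patterns[0].copy()
second_half = range(N // 2, N)
t_before = noise_terms(weights, adj, state, patterns[0], second_half)

# Serially update the first half of the nodes only, as described in the text.
for i in range(N // 2):
    h = weights[i] @ state
    if h != 0:
        state[i] = 1 if h > 0 else -1

t_after = noise_terms(weights, adj, state, patterns[0], second_half)
print("mean T_noise before:", t_before.mean(), " after:", t_after.mean())
```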
Figure 5. The distribution of the noise term Tnoise of nodes in one realization of the scale-free network. The dashed line is the result obtained at the initial state; the solid line is the result obtained when 5000 nodes have been updated. The parameters are N=10 000, pattern number n=10 000, and average degree 〈k〉=50. The nodes i=5001, ..., 10 000 are used in computing both results.
We provide an understanding of this change of the noise. A node A flips only when its noise term is large and negative,

$T_{\mathrm{noise}}^{A}<-k_A,$

i.e. the crosstalk between A and its neighbors is strongly negative. When A flips, the sign of its contribution to the local fields of its neighbors is reversed, so this negative crosstalk is removed from the fields of the un-updated neighbors; on average their noise terms shift towards positive values, as seen in figure 5.
Moreover, we compare the fraction R of flipped nodes predicted by the analysis with the simulation. The fraction R of flipped nodes is computed in a moving window of 500 nodes. Using the noise obtained at the initial state of a given realization, the fraction R is close to 0.47, as shown by the dashed line in figure 6. In the simulation of the same realization we update the node states in numerical order, i.e. from node i=1 to node i=N; as a node is flipped by the update, the local fields of its neighbors change. When node i is updated, we compute the fraction R of flipped nodes among the nodes from i−499 to i. The fraction R of flipped nodes among updated nodes decreases during the updating process, as shown by the solid line in figure 6. Because the local fields change during the simulation, the noise term of un-updated nodes becomes larger, their unstable probability becomes smaller, and the fraction R of flipped nodes decreases. Therefore, the simulation results deviate from the theoretical results.
Figure 6. The fraction R of flipped nodes, computed in a moving window of 500 nodes. The dashed line is the result obtained at the initial state. The solid line is the result obtained by comparing the signal and noise terms for each node during the updating process in the simulation. The parameters are N=10 000, pattern number n=10 000, and average degree 〈k〉=50.
The difference between the local field and the mean field also causes the simulation results to deviate from the theoretical results. Both of these effects can be weakened by enlarging the network size and the connection density.
We also simulated SF networks of different sizes with average degree 〈k〉=50. For network sizes of N=10 000, 20 000 and 40 000 nodes, the final overlap φ0 is 0.19, 0.18 and 0.18, respectively. As the network size N gets larger, the final overlap φ0 decreases slightly. Figure 8 shows the results for different network sizes.
To separate the effects of the finite network size and the connection density on the final overlap, we keep the connection density 〈k〉/N constant and check whether φ0 approaches zero as N becomes large. With 〈k〉/N=0.01 and network sizes of N=5000, 10 000, 20 000, 30 000, 40 000 and 80 000 nodes, the final overlap φ0 is 0.202, 0.182, 0.167, 0.161, 0.155 and 0.146, respectively. At constant connection density, the final overlap φ0 decreases with the network size, as shown in figure 7.
Figure 7. The final overlap φ0 in scale-free networks. The density of connections is 〈k〉/N=0.01. The average degrees are 〈k〉=50, 100, 200, 300, 400 and 800, and the corresponding network sizes are N=5000, 10 000, 20 000, 30 000, 40 000 and 80 000.
We fitted the simulation results for the relation between φ−φ0 and n shown in figure 8. For network sizes of N=10 000, 20 000 and 40 000 nodes, the slope is −0.689±0.003, −0.679±0.005 and −0.659±0.001, respectively. As the network size increases, the slope approaches −0.5.
Figure 8. The overlap difference φ−φ0 versus the number of patterns n on scale-free networks on a log-log scale for different network sizes. The average degree is 〈k〉=50. The network sizes are N=10 000 (squares), 20 000 (circles) and 40 000 (upper triangles). Inset: the overlap φ versus the number of patterns n on scale-free networks on linear coordinates. Each data point is averaged over 1000 realizations.
3.2. ER random networks
For ER random networks [22, 23], the degree of the nodes follows a Poisson distribution, and the number of flipped nodes is

$N_{\mathrm{flip}}=N\sum_{k}P(k)\,U(k).$

We also compute the relation between the stability and the number of stored patterns on ER random networks using computer simulations. The size of the networks is N=10 000, and one of the stored patterns is used as the network's initial state. The simulation results are shown in figures 9 and 10. The overlap φ varies roughly as (φ−φ0)∝n−τ. As in the SF networks, a constant φ0 appears in the relation. As the average degree increases, the final overlap φ0 decreases: for average degrees 〈k〉=4, 10, 20 and 50, the final overlap φ0 is 0.27, 0.24, 0.22 and 0.19, respectively.
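As noted above, the analytic curve for ER networks follows from the same SNR formula with a Poisson degree distribution; a short numerical sketch (with illustrative parameters) is:

```python
from math import erf, exp, sqrt

def unstable_prob(k, n):
    """U(k) = (1/2)[1 - erf(sqrt(k / (2(n-1))))] from the SNR analysis."""
    return 0.5 * (1.0 - erf(sqrt(k / (2.0 * (n - 1)))))

def overlap_er(k_mean, n, k_max=200):
    """phi = 1 - 2 * sum_k P(k) U(k) with a Poisson degree distribution P(k)."""
    phi = 1.0
    p_k = exp(-k_mean)                 # P(0); higher P(k) built iteratively to avoid overflow
    for k in range(k_max + 1):
        if k > 0:
            p_k *= k_mean / k          # P(k) = P(k-1) * <k> / k
        phi -= 2.0 * p_k * unstable_prob(k, n)
    return phi

for n in (10, 100, 1000, 10000):
    print(n, overlap_er(4, n))
```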
Figure 9. The overlap difference φ−φ0 versus the number of patterns n on ER random networks on a log-log scale. The blue dashed line is the analytic result, squares are the simulation results, and the red solid line is the fit; data points from n=2 to 6500 are used in the fit. The parameters are N=10 000 and average degree 〈k〉=4. Inset: the overlap φ versus the number of patterns n on random networks on linear coordinates. The final overlap φ0 is 0.27. Each data point is averaged over 1000 realizations.
Our simulation results show that the power law behavior appears not only on SF networks but also on homogeneous ER random networks. For networks with N=10 000 and 〈k〉=4, the capacity of the random networks is n=2. In figure 9 the power law behavior agrees with the theoretical prediction over the whole range of n. The final overlap φ0 is 0.27, as shown in the inset of figure 9. The slope is τ=−0.527±0.004, close to the theoretical result.
As the average degree increases, the capacity increases; the capacity of the random networks with 〈k〉=50 is n=13. The relation again changes from φ=1 to the power law relation when n is larger than the capacity, as seen in figure 10. The final overlap φ0 decreases to 0.19. The slope of the right-hand side of the curve is −0.687±0.002. According to the SNR analysis, when the network becomes dense the condition k/n ≪ 1 is not satisfied and the relation between φ−φ0 and n cannot be expressed as a power law function. As the number of stored patterns increases, the neural network satisfies the condition and the relation turns into a power law, but the exponent is smaller than that of sparser networks.
Figure 10. The overlap difference φ−φ0 versus the number of patterns n on ER random networks on a log-log scale. The blue dashed line is the analytic result, squares are the simulation results, and the red solid line is the fit; data points from n=40 to 6500 are used in the fit. The parameters are N=10 000 and average degree 〈k〉=50. Inset: the overlap φ versus the number of patterns n on random networks on linear coordinates. The final overlap φ0 is 0.19. Each data point is averaged over 1000 realizations.
This discrepancy is related to the finite size (N = 10 000) of the simulated network. We simulated larger ER random networks with average degree 〈k〉=50. The inset of figure 11 shows the results without subtracting φ0 for different network sizes. For network sizes of N=10 000, 20 000, 40 000, 80 000 and 160 000 nodes, the final overlap φ0 is 0.19, 0.19, 0.18, 0.18 and 0.18, respectively. Figure 11 shows the results for φ−φ0 for the different network sizes.
Figure 11. The overlap difference φ−φ0 versus the number of patterns n on ER random networks on a log-log scale. The average degree is 〈k〉=50. The network sizes are N=10 000 (squares), 20 000 (circles), 40 000 (upper triangles), 80 000 (lower triangles) and 160 000 (diamonds). Inset: the overlap φ versus the number of patterns n on random networks on linear coordinates. Each data point is averaged over 1000 realizations.
For ER random networks with 〈k〉/N=0.01 and network sizes of N=5000, 10 000, 20 000, 40 000, 80 000 and 160 000 nodes, the final overlap φ0 is 0.207, 0.187, 0.172, 0.159, 0.149 and 0.140, respectively. At constant connection density, the final overlap φ0 decreases with the network size, similarly to the results for the SF networks.
We fitted the φ−φ0 curves in figure 11. For network sizes of N=10 000, 20 000, 40 000, 80 000 and 160 000, the slope is −0.687±0.002, −0.670±0.008, −0.563±0.006, −0.555±0.005 and −0.525±0.008, respectively. As the network size N gets larger, the exponent approaches the theoretical result −0.5.
3.3. Networks with random degree distributions
The SNR analysis suggests that the appearance of the power law decay of the overlap does not depend on the network degree distribution. We use P(k) to represent an arbitrary degree distribution. The number of flipped nodes is

$N_{\mathrm{flip}}=N\sum_{k}P(k)\,U(k).$

When $x=\sqrt{\tfrac{k}{2(n-1)}}$ is fairly small, we again use the first-term expansion of the error function, so the overlap again decays as φ∝(n−1)−1/2, independently of the specific form of P(k).
We consider networks in which the node degree k follows a randomly generated histogram whose mean value 〈k〉 is controlled. The histogram is generated as follows: for a given value of the average degree 〈k〉, the range of the degree k is
By computer simulation we obtain the relation between the overlap and the number of stored patterns for two average degrees, 〈k〉=25 and 50. Figure 12 shows that the overlap turns from a constant into the power law decay as the number of stored patterns increases. For the average degree 〈k〉=25, the exponent of the power law decay is −0.658±0.003. As the average degree increases, the capacity of the network increases, and the crossover from a constant to the power law decay shifts to the right as the average degree increases from 25 to 50. For the average degree 〈k〉=50, the exponent of the power law decay is −0.683±0.004. These results are qualitatively consistent with the simulation results above. Therefore, the SNR analysis also applies to networks with random degree distributions. These results suggest that, under the condition of sparse connection density, the power law decay can occur on various kinds of networks.
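The construction of the networks from the random degree histogram is not fully specified above (the degree range is missing from the text), so the following is only one possible realization, assuming a configuration-model wiring and an illustrative degree range; it is a sketch, not the authors' procedure.

```python
import networkx as nx
import numpy as np

def random_histogram_network(N, k_mean, seed=0):
    """Draw node degrees from a randomly generated histogram over an assumed degree
    range around k_mean, then wire the nodes with the configuration model."""
    rng = np.random.default_rng(seed)
    ks = np.arange(1, 2 * k_mean)            # assumed degree range; keeps the mean near k_mean
    weights = rng.random(len(ks))            # random histogram over that range
    probs = weights / weights.sum()
    degrees = rng.choice(ks, size=N, p=probs)
    if degrees.sum() % 2:                    # the configuration model needs an even total degree
        degrees[0] += 1
    g = nx.configuration_model(degrees.tolist(), seed=seed)
    g = nx.Graph(g)                          # collapse parallel edges
    g.remove_edges_from(list(nx.selfloop_edges(g)))
    return g

g = random_histogram_network(10_000, 25)
print("mean degree:", 2 * g.number_of_edges() / g.number_of_nodes())
```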
Figure 12. The overlap difference φ−φ0 versus the number of patterns n on networks with a random degree distribution on a log-log scale, for two average degrees 〈k〉=25 (squares) and 〈k〉=50 (circles). Data points from n=21 to 10 000 (squares) and from n=59 to 10 000 (circles) are used in the fits. The size of the networks is N=10 000. Inset: the overlap φ versus the number of patterns n on linear coordinates. The final overlap φ0 is 0.21 (squares) and 0.19 (circles). Each data point is averaged over 1000 realizations.
4. Conclusions
We analytically studied the relation between the stability and the number of stored patterns in Hopfield neural networks. Using the SNR method, we analyzed the stability of patterns on SF networks and ER random networks. We showed that the stability can be expressed by the error function. Under the condition that the network is sparse and the number of stored patterns is large, the stability can be approximated by a power law function with exponent −0.5.

We performed computer simulations on both SF networks and ER random networks. In previous work the power law relation between the stability and the number of stored patterns was obtained on SF networks. Here we show that when the network is sparse enough, both the SF and the random topology exhibit the power law relation with an exponent close to −0.5, in agreement with the theoretical result. On dense networks, the relation has a power law tail in the simulation results.
There are differences between the analytic and simulation results. First, the overlap in the simulations decays to a constant which decreases with the connection density, whereas the analytic overlap decays to 0. This difference may be due to the fact that the noise term in the simulations deviates from the exact Gaussian distribution used in the SNR analysis. Furthermore, node states are updated in serial order, so the local field of a node changes as its neighbors flip. The difference manifests itself in finite-size networks. Second, the exponent of the power law decay is smaller than −0.5 on dense networks of finite size; as the network size increases, the exponent approaches −0.5.
Besides ER and SF networks, we also numerically checked networks with random degree distributions. We showed that the power law decay with an exponent close to −0.5 also appears in these networks, which implies the universality of this phenomenon regardless of the network architecture.
Different from forgetting [18-20], the power law behavior here is not a function of time. When considering memory accumulation in a neural network, it is natural to assume that the memory is steadily filled up [10]. Then the relation between the overlap and the number of stored patterns is potentially helpful for understanding power law forgetting.
Acknowledgments
This work was supported by NSFC (Grant No. 11675096) and FPALAB-SNNU (Grant No. 16QNGG007).

References
[1] doi:10.1073/pnas.79.8.2554
[2] doi:10.1007/BF00365229
[3] doi:10.1103/PhysRevE.68.047102
[4] doi:10.1103/PhysRevE.67.061902
[5] doi:10.1143/JPSJ.73.867
[6] doi:10.1103/PhysRevE.69.045101
[7] doi:10.1103/PhysRevE.72.066111
[8] doi:10.1103/PhysRevE.75.046113
[9] doi:10.1103/PhysRevE.76.036114
[10] doi:10.1103/PhysRevE.85.041925
[11] doi:10.1140/epjb/e2013-30960-3
[12] doi:10.1073/pnas.0400673101
[13] doi:10.1142/S0129183118500766
[14] doi:10.1088/1674-1056/27/1/010202
[15] doi:10.1088/1572-9494/ab5452
[16] doi:10.1140/epjb/e2003-00114-7
[17] doi:10.1016/j.tics.2010.02.005
[18] doi:10.1103/PhysRevE.90.032709
[19] doi:10.3758/BF03211316
[20] doi:10.1111/j.1467-9280.1991.tb00175.x
[21] doi:10.1103/PhysRevE.95.012309
[22] doi:10.1103/RevModPhys.74.47
[23] doi:10.1016/j.physrep.2005.10.009