Received: 2020-04-13; Revised: 2020-11-06; Accepted: 2020-11-13; Published online: 2021-01-15
Cite this article: Fei Fang, Zhou Yang, Sheng-Jun Wang. Power law decay of stored pattern stability in sparse Hopfield neural networks. Communications in Theoretical Physics, 2021, 73(2): 025601. doi:10.1088/1572-9494/abcfb0
1. Introduction
The Hopfield attractor neural network [1, 2] is a well-known Ising-like model of associative memory [3-15]. Trained by the Hebbian rule, a number of patterns can be stored and become attractors of the neural network. When a pattern is presented to the trained attractor neural network, the network state converges to the stored pattern that most closely resembles the input. Therefore Hopfield neural networks can function as a form of associative memory [1]. When only a few patterns are stored, these patterns are stable attractors of the Hopfield neural network. As the number of patterns increases, the quality of recognition decreases [16]. When a stored pattern is presented to the network, retrieval errors occur because of the crosstalk between stored patterns, that is, the stability of stored patterns decreases [3].

The performance of the Hopfield model on scale-free (SF) networks has been studied. In SF networks the connection degrees, i.e. the numbers of connections of the nodes, are heterogeneous and follow a power law distribution. The heterogeneous degrees affect the behavior of the Hopfield model; for example, retrieval errors are not distributed uniformly among the nodes [3]. In particular, a power law relation between the performance of the network and the number of stored patterns has been found on SF networks [16].
The power law relation has an important consequence for the network's properties: the stability of patterns declines only slowly as the number of stored patterns increases, which is a beneficial feature for the dynamics of associative memory. In fact, power law behaviors pervade neural activity [17], and experiments have shown that memory exhibits power law forgetting [18-20]. Using a mean-field neural network model with slow plasticity, an elegant analysis has shown that neural networks can exhibit power law forgetting, i.e. universal power law relaxation of the mean synaptic strength along a critical manifold or at the tricritical point [18]. However, how the power law behavior emerges in Hopfield neural networks is still an open question.
In this work we present the mechanism behind the power law relation through an analytical treatment. We simulate the Hopfield model on SF networks and on homogeneous Erdös-Rényi (ER) random networks, and find that the overlap decays as a power law towards a constant which depends on the network size and connection density. Using signal-to-noise analysis, we show that when the network is sparse the relation between the overlap and the number of stored patterns can be described by a power law function, with the overlap decaying to 0.
2. Model
In the Hopfield neural network model [1, 2], node states $s_i=\pm 1$, $i\in\{1,2,\ldots,N\}$, are binary. The dynamical equation of the system is [1, 3]

$s_i(t+1)=\mathrm{sign}\Big(\sum_{j} w_{ij}\,s_j(t)\Big),$

where the Hebbian couplings are restricted to the edges of the network, $w_{ij}=\sum_{\alpha=1}^{n}\xi_i^{\alpha}\xi_j^{\alpha}$ if nodes i and j are connected and $w_{ij}=0$ otherwise, with $\xi_i^{\alpha}=\pm 1$ the stored patterns.

When the number n of stored patterns is small, these patterns are stable attractors of the Hopfield neural network. When a stored pattern or a blurred pattern is presented to the neural network, the output of the Hopfield neural network coincides with the attractor, $s_i=\xi_i^{\alpha}$. When the number n of stored patterns is large, stored patterns cannot be completely recovered [1]: some nodes in the network flip their states $s_i$ and the network state differs from the attractor. However, the network state can still be close to one of the stored patterns, so the network can still recognize stored patterns [16]. To quantify the computational performance of Hopfield attractor neural networks, we employ the stability of stored patterns [3]. A stored pattern is presented to the neural network, and the overlap between the output state and the initial state is computed as the measure of stability. The overlap is defined as

$\phi=\dfrac{1}{N}\sum_{i=1}^{N}\xi_i^{\alpha}s_i.$
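To make the model concrete, the following is a minimal sketch of the update rule and of the overlap measurement on a sparse network, assuming Hebbian couplings restricted to existing edges and serial updates in node order; the toy random graph, sizes and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def hebbian_weights(adj, patterns):
    """w_ij = sum_alpha xi_i^alpha xi_j^alpha, kept only on existing edges."""
    # patterns: (n, N) array of +/-1 entries; adj: (N, N) 0/1 adjacency matrix
    return adj * (patterns.T @ patterns)

def serial_update(weights, state, sweeps=1):
    """Update the nodes one by one in numerical order, s_i <- sign(sum_j w_ij s_j)."""
    s = state.copy()
    for _ in range(sweeps):
        for i in range(len(s)):
            h = weights[i] @ s            # local field of node i
            if h != 0:
                s[i] = 1 if h > 0 else -1
    return s

def overlap(state, pattern):
    """phi = (1/N) sum_i xi_i^alpha s_i."""
    return np.mean(state * pattern)

# Example: store n random patterns on a toy sparse graph and present pattern 0.
rng = np.random.default_rng(0)
N, n = 1000, 50
patterns = rng.choice([-1, 1], size=(n, N))
adj = (rng.random((N, N)) < 0.01).astype(int)
adj = np.triu(adj, 1)
adj = adj + adj.T                         # symmetric adjacency, no self-loops
w = hebbian_weights(adj, patterns)
s_out = serial_update(w, patterns[0], sweeps=1)
print("overlap with the stored pattern:", overlap(s_out, patterns[0]))
```

Presenting a stored pattern and measuring the overlap after the update sweep reproduces the stability measure studied in section 3.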
We use the ER random network model and the SF network model in the simulations. ER random networks are generated as follows: starting with N disconnected nodes, each of the possible edges is added independently with probability 0<p<1. The degree distribution is a Poisson distribution [11],

$P(k)=\mathrm{e}^{-\langle k\rangle}\dfrac{\langle k\rangle^{k}}{k!},$

where $\langle k\rangle=p(N-1)$ is the average degree.
SF networks are generated as follows: starting with m0 fully connected nodes, at each time step a new node with m≤m0 edges is added by linking the new node j to m different nodes already present in the system. The probability that the new node j is linked to an existing node i is linearly proportional to the degree of i,

$\Pi(i)=\dfrac{k_i}{\sum_{l}k_l}.$
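For reference, both topologies can be generated with standard routines; the sketch below uses the networkx generators, with parameter values chosen only for illustration.

```python
import networkx as nx

N = 10_000

# ER random network: each possible edge is added with probability p,
# giving average degree <k> = p (N - 1) and a Poisson degree distribution.
k_mean = 4
er_graph = nx.erdos_renyi_graph(N, k_mean / (N - 1), seed=1)

# SF network: Barabasi-Albert growth with preferential attachment; each new node
# attaches to m existing nodes with probability proportional to their degree,
# giving <k> close to 2m and P(k) ~ k^-3.
m = 2
sf_graph = nx.barabasi_albert_graph(N, m, seed=1)

print("ER mean degree:", 2 * er_graph.number_of_edges() / N)
print("SF mean degree:", 2 * sf_graph.number_of_edges() / N)
```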
3. Results
On SF networks, the power law relation between the overlap and the number of stored patterns was obtained using computer simulations in [16]. So we first analyze the stability of patterns on SF networks. Then we also analyze the behavior of the Hopfield neural network model on ER random networks.

3.1. SF networks
The relation between the overlap and the number of stored patterns has been derived in [11, 21]. First, we review the formula for the overlap on SF networks.

If a Hopfield neural network consists of N nodes and stores n patterns, the condition for a pattern $\xi_i^{\alpha}$ to be stable can be expressed by N inequalities, one for each node i:

$\xi_i^{\alpha}\sum_{j}w_{ij}\,\xi_j^{\alpha}>0.$

Separating the contribution of pattern α itself, which equals the degree $k_i$ (the signal), from the crosstalk of the other n−1 patterns (the noise term $T_{\mathrm{noise}}$), the noise is a sum of $k_i(n-1)$ random ±1 terms. Treating it as Gaussian, the probability that a node of degree k is unstable is

$U(k)=\dfrac{1}{2}\left[1-\mathrm{erf}\left(\sqrt{\dfrac{k}{2(n-1)}}\right)\right].$
The stability of stored patterns in the network can be predicted from the probability U(k). For a network which has converged into a stable state, the overlap order parameter can be rewritten as $\phi =\tfrac{1}{N}(N-2{N}_{\mathrm{flip}})$, where Nflip is the number of flipped nodes [11]. For SF networks, assuming a sharp cutoff of the distribution at low k, the degree distribution is normalized as [12]

$P(k)=(\gamma-1)\,k_{\min}^{\,\gamma-1}\,k^{-\gamma},\qquad k\ge k_{\min},$

with γ=3 for the Barabási-Albert networks used here.
The number of flipped nodes is

$N_{\mathrm{flip}}=N\displaystyle\int_{k_{\min}}^{\infty}P(k)\,U(k)\,\mathrm{d}k,$

so that the overlap reads $\phi=1-2\int_{k_{\min}}^{\infty}P(k)\,U(k)\,\mathrm{d}k$.
Based on this result for the overlap, we carry out a further analytic treatment to interpret the power law relation between the overlap and the number of patterns. The power series expansion of the error function has the form

$\mathrm{erf}(x)=\dfrac{2}{\sqrt{\pi}}\sum_{m=0}^{\infty}\dfrac{(-1)^{m}x^{2m+1}}{m!\,(2m+1)}=\dfrac{2}{\sqrt{\pi}}\left(x-\dfrac{x^{3}}{3}+\dfrac{x^{5}}{10}-\cdots\right).$
For a very small average degree k and a large number of patterns n, $x=\sqrt{\tfrac{k}{2(n-1)}}$ is small. Keeping only the first term of the expansion, $\mathrm{erf}(x)\approx \tfrac{2}{\sqrt{\pi}}x$, in computing the overlap on SF networks gives

$U(k)\approx\dfrac{1}{2}-\dfrac{1}{\sqrt{\pi}}\sqrt{\dfrac{k}{2(n-1)}}.$
Then

$\phi=1-2\displaystyle\int_{k_{\min}}^{\infty}P(k)\,U(k)\,\mathrm{d}k\approx\sqrt{\dfrac{2}{\pi(n-1)}}\displaystyle\int_{k_{\min}}^{\infty}P(k)\sqrt{k}\,\mathrm{d}k=\sqrt{\dfrac{2}{\pi(n-1)}}\,\big\langle\sqrt{k}\big\rangle\;\propto\;(n-1)^{-1/2}.$
Therefore, the relation between the overlap and the number of stored patterns follows a power law when the network is sparse and the number of stored patterns is large. The theoretical analysis gives the exponent −1/2.
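The scaling can be checked numerically. The sketch below evaluates the analytic overlap φ = 1 − 2 Σk P(k)U(k) for an illustrative power-law degree distribution with a sharp low-k cutoff and prints φ for increasing n; the cutoff values k_min and k_max are assumptions for this example, not parameters taken from the paper.

```python
import numpy as np
from math import erf, sqrt

def unstable_prob(k, n):
    """U(k) = (1/2)[1 - erf(sqrt(k / (2(n-1))))]: probability that a degree-k node is unstable."""
    return 0.5 * (1.0 - erf(sqrt(k / (2.0 * (n - 1)))))

def overlap_theory(degree_probs, n):
    """phi = 1 - 2 * sum_k P(k) U(k) for a given degree distribution P(k)."""
    return 1.0 - 2.0 * sum(p * unstable_prob(k, n) for k, p in degree_probs.items())

# Illustrative power-law degree distribution P(k) ~ k^-3 with a sharp low-k cutoff.
k_min, k_max = 4, 1000
ks = np.arange(k_min, k_max + 1)
pk = ks.astype(float) ** -3
pk /= pk.sum()
degree_probs = dict(zip(ks.tolist(), pk.tolist()))

# Once n is much larger than the typical degree, phi should fall off roughly as n^(-1/2).
for n in (10, 100, 1000, 10000):
    print(n, overlap_theory(degree_probs, n))
```

For n well above the typical degree, φ shrinks by roughly a factor of √10 per decade of n, i.e. the n^(−1/2) behavior derived above.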
It was obtained in simulations in [16] that the overlap φ varies roughly as (φ−φ0)∝n−τ. Following [16], we measure the overlap difference (φ−φ0) for SF networks in simulations. The network is built using the SF network model [22, 23]. The size of the networks is N=10 000, and one of the stored patterns is used as the network's initial state. Here we recover the relation reported in [16]. The simulation results are shown in figures 1 and 2. As the average degree increases, the value of φ0 decreases: for average degrees 〈k〉=4, 10, 20 and 50, the final overlap φ0 is 0.24, 0.23, 0.21 and 0.19, respectively. In figure 1, for networks with 〈k〉=4, the relation between the overlap difference and the number of patterns follows a power law. By fitting the simulation results to a power law function, we obtain an exponent of −0.541±0.001, which is close to the theoretical exponent −1/2.
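For reference, the exponents quoted here and below can be extracted with an ordinary least-squares fit on a log-log scale; a minimal sketch, assuming φ0 is taken as the measured plateau value (the synthetic data below are only for illustration):

```python
import numpy as np

def fit_power_law_exponent(n_values, phi_values, phi0):
    """Least-squares fit of log(phi - phi0) versus log(n); returns the exponent tau."""
    x = np.log(np.asarray(n_values, dtype=float))
    y = np.log(np.asarray(phi_values, dtype=float) - phi0)
    tau, _intercept = np.polyfit(x, y, 1)
    return tau

# Illustration with synthetic data that obeys phi - phi0 = n^(-0.5) exactly.
n_values = np.arange(2, 2501)
phi0 = 0.24
phi_values = phi0 + n_values.astype(float) ** -0.5
print(fit_power_law_exponent(n_values, phi_values, phi0))   # close to -0.5
```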
Figure 1. The overlap difference φ−φ0 versus the number of patterns n on scale-free networks on a log-log scale. The blue dashed line is the analytic result, squares are the simulation results, and the red solid line is the fit; data points from n=2 to 2500 are used in the fit. The parameters are N=10 000 and average degree 〈k〉=4. Inset: the overlap φ versus the number of patterns n on scale-free networks on a log-log scale. The final overlap φ0 is 0.24. Each data point is averaged over 1000 realizations.
The limit on the number of patterns that can be stored with small error is called the storage capacity [9]. We compute the storage capacity using an overlap threshold of 0.95. Below the capacity the overlap stays close to 1, so the relation between the overlap and the number of patterns cannot be a power law decay. For networks with N=10 000 and 〈k〉=4, the capacity of SF networks is n=2. Therefore the whole curve shown in figure 1 can exhibit a power law decay. The value of φ0 is 0.24.
When the average degree of the network is large, the capacity becomes large and the relation is not a power law for small numbers of patterns. However, our analysis shows that as the ratio k/2(n−1) becomes small, the relation turns into a power law. The results for networks with average degree 〈k〉=50 are shown in figure 2. The capacity of SF networks with 〈k〉=50 is n=10. When the number of patterns is larger than 10, the relation changes into a power law. For networks with 〈k〉=50, the final overlap φ0 decreases to 0.19, as shown in the inset of figure 2. The exponent is −0.689±0.003.
Figure 2. The overlap difference φ−φ0 versus the number of patterns n on scale-free networks on a log-log scale. The blue dashed line is the analytic result, squares are the simulation results, and the red solid line is the fit; data points from n=40 to 2500 are used in the fit. The parameters are N=10 000 and average degree 〈k〉=50. Inset: the overlap φ versus the number of patterns n on scale-free networks on linear coordinates. The final overlap φ0 is 0.19. Each data point is averaged over 1000 realizations.
The change of the relation can be explained by the signal-to-noise ratio (SNR) analysis. The power law relation is based on the expansion of the error function under the condition that the connection degree is relatively small compared with the number of patterns, k/n ≪ 1. When the network becomes dense, the condition k/n ≪ 1 is not satisfied and the relation between φ−φ0 and n cannot be expressed as a power law function, as seen in figure 2. As the number of stored patterns increases, the neural network satisfies the condition and the relation turns into a power law function.
There is a difference between the above simulation and the analytic results, as shown in the inset of figure 1. In the analytic results the overlap decays to 0, i.e. φ∝n−0.5, and φ0 cannot be obtained from the analysis. The difference between simulation and analytic results may be due to two factors. Firstly, in a network of finite size, the distribution of the noise term of the nodes deviates from the exact Gaussian distribution. We calculated the distribution of the noise term of the nodes in one realization of the SF network, as shown in figure 3; the distribution is not exactly Gaussian. In the region Tnoise<−〈k〉 the probability obtained by simulation is lower than the analytic result, i.e. extreme values of the noise term appear rarely. We computed the unstable probability of nodes using both simulation and analysis, shown in figure 4. In a network of finite size, the deviation from the exact Gaussian distribution makes the unstable probability smaller.
Figure 3. The distribution of the noise term Tnoise in scale-free networks. The dashed line is the exact Gaussian distribution of the analytic results; the curve is the numerical simulation result. The parameters are N=10 000, pattern number n=100, and average degree 〈k〉=50.
Figure 4. The unstable probability U(k) versus the degree k in scale-free networks. The red dashed line is the analytic result; the black solid line is the numerical simulation result. The parameters are N=10 000, pattern number n=100, and average degree 〈k〉=4. Each data point is averaged over 10 000 realizations.
Secondly, the simulation is performed with the nodes updated in serial order. When a node is flipped by the update, the local fields of its neighbors change. We use the initial network state to calculate the distribution of the noise term Tnoise of the nodes in SF networks. For a given network realization, nodes i=5001, ..., 10 000 are used in computing the distribution. The distribution is shown by the dashed line in figure 5; the center of the curve is at Tnoise=0. In the simulation we then update the nodes i=1, …, 5000 and use the nodes i=5001, ..., 10 000 to compute the distribution again. The result is shown by the solid line in figure 5. The distribution of the noise term Tnoise has shifted to the right: the noise term of un-updated nodes becomes larger, so their unstable probability becomes smaller.
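The shift can be reproduced directly in a small simulation. The sketch below measures the noise term of the second half of the nodes before and after serially updating the first half, using the decomposition T_noise = ξ_i h_i − k_i implied by the signal-to-noise analysis; the toy random graph and the sizes are illustrative, not the SF network of figure 5.

```python
import numpy as np

def noise_terms(weights, adj, state, pattern, nodes):
    """T_noise for the selected nodes, using the decomposition implied by the
    SNR analysis: T_i = xi_i * (local field of i) - k_i."""
    degrees = adj.sum(axis=1)
    return np.array([pattern[i] * (weights[i] @ state) - degrees[i] for i in nodes])

# Toy setup (the paper uses N = 10 000 SF networks; here a small random graph).
rng = np.random.default_rng(0)
N, n = 2000, 200
patterns = rng.choice([-1, 1], size=(n, N))
adj = (rng.random((N, N)) < 0.02).astype(int)
adj = np.triu(adj, 1)
adj = adj + adj.T
weights = adj * (patterns.T @ patterns)

state = patterns[0].copy()
second_half = range(N // 2, N)
t_before = noise_terms(weights, adj, state, patterns[0], second_half)

# Serially update the first half of the nodes only, as described in the text.
for i in range(N // 2):
    h = weights[i] @ state
    if h != 0:
        state[i] = 1 if h > 0 else -1

t_after = noise_terms(weights, adj, state, patterns[0], second_half)
print("mean T_noise before:", t_before.mean(), " after:", t_after.mean())
```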
Figure 5. The distribution of the noise term Tnoise of nodes in one realization of the scale-free network. The dashed line is the result obtained at the initial state; the solid line is the result obtained when 5000 nodes have been updated. The parameters are N=10 000, pattern number n=10 000, and average degree 〈k〉=50. The nodes i=5001, ..., 10 000 are used in computing both results.
We provide an understanding of this change of the noise. A node A flips only when its noise term is large and negative,

$T_{\mathrm{noise}}^{A}<-k_A,$

i.e. the crosstalk between A and its neighbors is strongly negative. When A flips, the sign of its contribution to the local fields of its neighbors is reversed, so this negative crosstalk is removed from the fields of the un-updated neighbors; on average their noise terms shift towards positive values, as seen in figure 5.
Moreover, we compare the fraction R of flipped nodes predicted by the analysis with the simulation. The fraction R of flipped nodes is computed in a moving window of 500 nodes. Using the noise obtained at the initial state of a given realization, the fraction R is close to 0.47, as shown by the dashed line in figure 6. In the simulation of the same realization we update the node states in numerical order, i.e. from node i=1 to node i=N; as a node is flipped by the update, the local fields of its neighbors change. When node i is updated, we compute the fraction R of flipped nodes among the nodes from i−499 to i. The fraction R of flipped nodes among updated nodes decreases during the updating process, as shown by the solid line in figure 6. Because the local fields change during the simulation, the noise term of un-updated nodes becomes larger, their unstable probability becomes smaller, and the fraction R of flipped nodes decreases. Therefore, the simulation results deviate from the theoretical results.
Figure 6. The fraction R of flipped nodes, computed in a moving window of 500 nodes. The dashed line is the result obtained at the initial state. The solid line is the result obtained by comparing the signal and noise terms for each node during the updating process in the simulation. The parameters are N=10 000, pattern number n=10 000, and average degree 〈k〉=50.
The difference between the local field and the mean field also causes the simulation results to deviate from the theoretical results. Both of these effects can be weakened by enlarging the network size and the connection density.
We also simulated SF networks of different sizes with average degree 〈k〉=50. For network sizes of N=10 000, 20 000 and 40 000 nodes, the final overlap φ0 is 0.19, 0.18 and 0.18, respectively. As the network size N gets larger, the final overlap φ0 decreases slightly. Figure 8 shows the results for different network sizes.
To separate the effects of the finite network size and the connection density on the final overlap, we keep the connection density 〈k〉/N constant and check whether φ0 approaches zero as N becomes large. With 〈k〉/N=0.01 and network sizes of N=5000, 10 000, 20 000, 30 000, 40 000 and 80 000 nodes, the final overlap φ0 is 0.202, 0.182, 0.167, 0.161, 0.155 and 0.146, respectively. At constant connection density, the final overlap φ0 decreases with the network size, as shown in figure 7.
Figure 7. The final overlap φ0 in scale-free networks. The density of connections is 〈k〉/N=0.01. The average degrees are 〈k〉=50, 100, 200, 300, 400 and 800, and the corresponding network sizes are N=5000, 10 000, 20 000, 30 000, 40 000 and 80 000.
We fitted the simulation results for the relation between φ−φ0 and n shown in figure 8. For network sizes of N=10 000, 20 000 and 40 000 nodes, the slope is −0.689±0.003, −0.679±0.005 and −0.659±0.001, respectively. As the network size increases, the slope approaches −0.5.
Figure 8. The overlap difference φ−φ0 versus the number of patterns n on scale-free networks on a log-log scale for different network sizes. The average degree is 〈k〉=50. The network sizes are N=10 000 (squares), 20 000 (circles) and 40 000 (upper triangles). Inset: the overlap φ versus the number of patterns n on scale-free networks on linear coordinates. Each data point is averaged over 1000 realizations.
3.2. ER random networks
For ER random networks [22, 23], the degree of the nodes follows a Poisson distribution, and the number of flipped nodes is

$N_{\mathrm{flip}}=N\sum_{k}P(k)\,U(k).$

We also compute the relation between the stability and the number of stored patterns on ER random networks using computer simulations. The size of the networks is N=10 000, and one of the stored patterns is used as the network's initial state. The simulation results are shown in figures 9 and 10. The overlap φ varies roughly as (φ−φ0)∝n−τ. As in the SF networks, a constant φ0 appears in the relation. As the average degree increases, the final overlap φ0 decreases: for average degrees 〈k〉=4, 10, 20 and 50, the final overlap φ0 is 0.27, 0.24, 0.22 and 0.19, respectively.
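As noted above, the analytic curve for ER networks follows from the same SNR formula with a Poisson degree distribution; a short numerical sketch (with illustrative parameters) is:

```python
from math import erf, exp, sqrt

def unstable_prob(k, n):
    """U(k) = (1/2)[1 - erf(sqrt(k / (2(n-1))))] from the SNR analysis."""
    return 0.5 * (1.0 - erf(sqrt(k / (2.0 * (n - 1)))))

def overlap_er(k_mean, n, k_max=200):
    """phi = 1 - 2 * sum_k P(k) U(k) with a Poisson degree distribution P(k)."""
    phi = 1.0
    p_k = exp(-k_mean)                 # P(0); higher P(k) built iteratively to avoid overflow
    for k in range(k_max + 1):
        if k > 0:
            p_k *= k_mean / k          # P(k) = P(k-1) * <k> / k
        phi -= 2.0 * p_k * unstable_prob(k, n)
    return phi

for n in (10, 100, 1000, 10000):
    print(n, overlap_er(4, n))
```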
Figure 9. The overlap difference φ−φ0 versus the number of patterns n on ER random networks on a log-log scale. The blue dashed line is the analytic result, squares are the simulation results, and the red solid line is the fit; data points from n=2 to 6500 are used in the fit. The parameters are N=10 000 and average degree 〈k〉=4. Inset: the overlap φ versus the number of patterns n on random networks on linear coordinates. The final overlap φ0 is 0.27. Each data point is averaged over 1000 realizations.
Our simulation results show that the power law behavior appears not only on SF networks but also on homogeneous ER random networks. For networks with N=10 000 and 〈k〉=4, the capacity of the random networks is n=2. In figure 9 the power law behavior agrees with the theoretical prediction over the whole range of n. The final overlap φ0 is 0.27, as shown in the inset of figure 9. The slope is τ=−0.527±0.004, close to the theoretical result.
As the average degree increases, the capacity increases; the capacity of the random networks with 〈k〉=50 is n=13. The relation again changes from φ=1 to the power law relation when n is larger than the capacity, as seen in figure 10. The final overlap φ0 decreases to 0.19. The slope of the right-hand side of the curve is −0.687±0.002. According to the SNR analysis, when the network becomes dense the condition k/n ≪ 1 is not satisfied and the relation between φ−φ0 and n cannot be expressed as a power law function. As the number of stored patterns increases, the neural network satisfies the condition and the relation turns into a power law, but the exponent is smaller than that of sparser networks.
Figure 10. The overlap difference φ−φ0 versus the number of patterns n on ER random networks on a log-log scale. The blue dashed line is the analytic result, squares are the simulation results, and the red solid line is the fit; data points from n=40 to 6500 are used in the fit. The parameters are N=10 000 and average degree 〈k〉=50. Inset: the overlap φ versus the number of patterns n on random networks on linear coordinates. The final overlap φ0 is 0.19. Each data point is averaged over 1000 realizations.
This discrepancy is related to the finite size (N = 10 000) of the simulated network. We simulated larger ER random networks with average degree 〈k〉=50. The inset of figure 11 shows the results without subtracting φ0 for different network sizes. For network sizes of N=10 000, 20 000, 40 000, 80 000 and 160 000 nodes, the final overlap φ0 is 0.19, 0.19, 0.18, 0.18 and 0.18, respectively. Figure 11 shows the results for φ−φ0 for the different network sizes.
Figure 11. The overlap difference φ−φ0 versus the number of patterns n on ER random networks on a log-log scale. The average degree is 〈k〉=50. The network sizes are N=10 000 (squares), 20 000 (circles), 40 000 (upper triangles), 80 000 (lower triangles) and 160 000 (diamonds). Inset: the overlap φ versus the number of patterns n on random networks on linear coordinates. Each data point is averaged over 1000 realizations.
For ER random networks with 〈k〉/N=0.01 and network sizes of N=5000, 10 000, 20 000, 40 000, 80 000 and 160 000 nodes, the final overlap φ0 is 0.207, 0.187, 0.172, 0.159, 0.149 and 0.140, respectively. At constant connection density, the final overlap φ0 decreases with the network size, similarly to the results for the SF networks.
We fitted the φ−φ0 curves in figure 11. For network sizes of N=10 000, 20 000, 40 000, 80 000 and 160 000, the slope is −0.687±0.002, −0.670±0.008, −0.563±0.006, −0.555±0.005 and −0.525±0.008, respectively. As the network size N gets larger, the exponent approaches the theoretical result −0.5.
3.3. Networks with random degree distributions
The SNR analysis suggests that the appearance of the power law decay of the overlap does not depend on the network degree distribution. We use P(k) to represent an arbitrary degree distribution. The number of flipped nodes is

$N_{\mathrm{flip}}=N\sum_{k}P(k)\,U(k).$

When $x=\sqrt{\tfrac{k}{2(n-1)}}$ is fairly small, we again use the first-term expansion of the error function, so the overlap again decays as φ∝(n−1)−1/2, independently of the specific form of P(k).
We consider networks in which the node degree k follows a randomly generated histogram whose mean value 〈k〉 is controlled. The histogram is generated as follows: for a given value of the average degree 〈k〉, the range of the degree k is
By computer simulation we obtain the relation between the overlap and the number of stored patterns for two average degrees, 〈k〉=25 and 50. Figure 12 shows that the overlap turns from a constant into the power law decay as the number of stored patterns increases. For the average degree 〈k〉=25, the exponent of the power law decay is −0.658±0.003. As the average degree increases, the capacity of the network increases, and the crossover from a constant to the power law decay shifts to the right as the average degree increases from 25 to 50. For the average degree 〈k〉=50, the exponent of the power law decay is −0.683±0.004. These results are qualitatively consistent with the simulation results above. Therefore, the SNR analysis also applies to networks with random degree distributions. These results suggest that, under the condition of sparse connection density, the power law decay can occur on various kinds of networks.
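The construction of the networks from the random degree histogram is not fully specified above (the degree range is missing from the text), so the following is only one possible realization, assuming a configuration-model wiring and an illustrative degree range; it is a sketch, not the authors' procedure.

```python
import networkx as nx
import numpy as np

def random_histogram_network(N, k_mean, seed=0):
    """Draw node degrees from a randomly generated histogram over an assumed degree
    range around k_mean, then wire the nodes with the configuration model."""
    rng = np.random.default_rng(seed)
    ks = np.arange(1, 2 * k_mean)            # assumed degree range; keeps the mean near k_mean
    weights = rng.random(len(ks))            # random histogram over that range
    probs = weights / weights.sum()
    degrees = rng.choice(ks, size=N, p=probs)
    if degrees.sum() % 2:                    # the configuration model needs an even total degree
        degrees[0] += 1
    g = nx.configuration_model(degrees.tolist(), seed=seed)
    g = nx.Graph(g)                          # collapse parallel edges
    g.remove_edges_from(list(nx.selfloop_edges(g)))
    return g

g = random_histogram_network(10_000, 25)
print("mean degree:", 2 * g.number_of_edges() / g.number_of_nodes())
```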
Figure 12. The overlap difference φ−φ0 versus the number of patterns n on networks with a random degree distribution on a log-log scale, for two average degrees 〈k〉=25 (squares) and 〈k〉=50 (circles). Data points from n=21 to 10 000 (squares) and from n=59 to 10 000 (circles) are used in the fits. The size of the networks is N=10 000. Inset: the overlap φ versus the number of patterns n on linear coordinates. The final overlap φ0 is 0.21 (squares) and 0.19 (circles). Each data point is averaged over 1000 realizations.
4. Conclusions
We analytically studied the relation between the stability and the number of stored patterns in Hopfield neural networks. Using the SNR method, we analyzed the stability of patterns on SF networks and ER random networks. We showed that the stability can be expressed by the error function. Under the condition that the network is sparse and the number of stored patterns is large, the stability can be approximated by a power law function with exponent −0.5.

We performed computer simulations on both SF networks and ER random networks. In previous work the power law relation between the stability and the number of stored patterns was obtained on SF networks. Here we show that when the network is sparse enough, both the SF and the random topology exhibit the power law relation with an exponent close to −0.5, in agreement with the theoretical result. On dense networks, the relation has a power law tail in the simulation results.
There are differences between the analytic and simulation results. First, the overlap in the simulations decays to a constant which decreases with the connection density, whereas the analytic overlap decays to 0. This difference may be due to the fact that the noise term in the simulations deviates from the exact Gaussian distribution used in the SNR analysis. Furthermore, node states are updated in serial order, so the local field of a node changes as its neighbors flip. The difference manifests itself in finite-size networks. Second, the exponent of the power law decay is smaller than −0.5 on dense networks of finite size; as the network size increases, the exponent approaches −0.5.
Besides ER and SF networks, we also numerically checked networks with random degree distributions. We showed that the power law decay with an exponent close to −0.5 also appears in these networks, which implies the universality of this phenomenon regardless of the network architecture.
Different from forgetting [18-20], the power law behavior here is not a function of time. When considering memory accumulation in a neural network, it is natural to assume that the memory is steadily filled up [10]. Then the relation between the overlap and the number of stored patterns is potentially helpful for understanding power law forgetting.
Acknowledgments
This work was supported by NSFC (Grant No. 11675096) and FPALAB-SNNU (Grant No. 16QNGG007).

References
[1] doi:10.1073/pnas.79.8.2554
[2] doi:10.1007/BF00365229
[3] doi:10.1103/PhysRevE.68.047102
[4] doi:10.1103/PhysRevE.67.061902
[5] doi:10.1143/JPSJ.73.867
[6] doi:10.1103/PhysRevE.69.045101
[7] doi:10.1103/PhysRevE.72.066111
[8] doi:10.1103/PhysRevE.75.046113
[9] doi:10.1103/PhysRevE.76.036114
[10] doi:10.1103/PhysRevE.85.041925
[11] doi:10.1140/epjb/e2013-30960-3
[12] doi:10.1073/pnas.0400673101
[13] doi:10.1142/S0129183118500766
[14] doi:10.1088/1674-1056/27/1/010202
[15] doi:10.1088/1572-9494/ab5452
[16] doi:10.1140/epjb/e2003-00114-7
[17] doi:10.1016/j.tics.2010.02.005
[18] doi:10.1103/PhysRevE.90.032709
[19] doi:10.3758/BF03211316
[20] doi:10.1111/j.1467-9280.1991.tb00175.x
[21] doi:10.1103/PhysRevE.95.012309
[22] doi:10.1103/RevModPhys.74.47
[23] doi:10.1016/j.physrep.2005.10.009