Let \(\Theta = \mathbb{R}^p\) and \(\mathcal{S} = \mathbb{R}^n\), and let \(E: \Theta \times \mathcal{S} \to \mathbb{R}\) and \(C: \mathcal{S} \to \mathbb{R}\) be two scalar functions. Consider the problem of minimizing
\begin{equation}
\mathcal{L}(\theta) := C(s(\theta))
\end{equation}
with respect to \(\theta \in \Theta\), subject to the constraint that
\begin{equation}
s(\theta) := \underset{s \in \mathcal{S}}{\arg\min} \, E(\theta,s).
\end{equation}
We wish to optimize \(\mathcal{L}(\theta)\) by gradient descent. Equilibrium propagation (EP) achieves this by defining a family of equilibrium states \(s_\theta^\beta\), parameterized by \(\theta \in \Theta\) and a scalar \(\beta \in \mathbb{R}\):
\begin{equation}
s_\theta^\beta = \underset{s \in \mathcal{S}}{\arg\min} \left[ E(\theta,s) + \beta \, C(s) \right].
\end{equation}
Assuming that the functions \(E\) and \(C\) are continuously differentiable, for fixed \(\theta \in \Theta\) there exists a continuous mapping \(\beta \mapsto s_\theta^\beta\) satisfying \(s_\theta^0 = s(\theta)\) and the above equation for every \(\beta \in \mathbb{R}\). The EP formula relates the gradient of the loss to the partial derivatives of the function \(E\):
\begin{equation}
\nabla_\theta \mathcal{L}(\theta) = \left. \frac{d}{d\beta} \right|_{\beta=0} \partial_\theta E\left(\theta, s_\theta^\beta\right).
\end{equation}
In this expression, \(\partial_\theta E(\theta,s)\) denotes the partial derivative of \(E\) with respect to its first argument, and \(\left. \frac{d}{d\beta} \right|_{\beta=0}\) denotes the total derivative with respect to \(\beta\) (through \(s_\theta^\beta\)) at the point \(\beta = 0\).
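Before proving this identity, note that it can be checked numerically on a toy instance. The following Python sketch uses illustrative quadratic choices of \(E\) and \(C\) (these, the target vector, and the finite-difference step are assumptions for the example, not part of the setup above): it computes the equilibria by numerical minimization, estimates the right-hand side of the EP formula by a symmetric finite difference in \(\beta\), and compares it with the closed-form gradient of \(\mathcal{L}\).
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

# Toy instance (illustrative choices, not part of the setup above):
# E(theta, s) = 0.5*||s - theta||^2 and C(s) = 0.5*||s - target||^2,
# so that s(theta) = theta and grad L(theta) = theta - target in closed form.
rng = np.random.default_rng(0)
p = n = 4
theta = rng.standard_normal(p)
target = rng.standard_normal(n)

def E(theta, s):
    return 0.5 * np.sum((s - theta) ** 2)

def C(s):
    return 0.5 * np.sum((s - target) ** 2)

def dE_dtheta(theta, s):
    # Partial derivative of E with respect to its first argument.
    return theta - s

def s_beta(theta, beta):
    # Equilibrium state: minimize the augmented energy E + beta*C over s.
    res = minimize(lambda s: E(theta, s) + beta * C(s), x0=np.copy(theta),
                   method="BFGS", options={"gtol": 1e-10})
    return res.x

# EP gradient estimate: symmetric finite difference, in beta, of
# beta -> dE/dtheta(theta, s_theta^beta), evaluated around beta = 0.
beta = 1e-3
g_ep = (dE_dtheta(theta, s_beta(theta, beta))
        - dE_dtheta(theta, s_beta(theta, -beta))) / (2 * beta)

g_true = theta - target  # closed-form gradient of L for this toy instance
print(np.max(np.abs(g_ep - g_true)))  # small: O(beta^2) bias + solver tolerance
\end{verbatim}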
Proof of the EP formula.
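The identity follows from the classical envelope-theorem argument; a short version, assuming in addition that the function \(F\) defined below is twice continuously differentiable, goes as follows. Define
\begin{equation}
F(\theta, \beta) := E\left(\theta, s_\theta^\beta\right) + \beta \, C\left(s_\theta^\beta\right) = \min_{s \in \mathcal{S}} \left[ E(\theta,s) + \beta \, C(s) \right].
\end{equation}
Since \(s_\theta^\beta\) minimizes the bracketed objective, the gradient of that objective with respect to \(s\) vanishes at \(s_\theta^\beta\); hence, when differentiating \(F\) through \(s_\theta^\beta\), the indirect terms drop out and
\begin{equation}
\frac{\partial F}{\partial \theta}(\theta, \beta) = \partial_\theta E\left(\theta, s_\theta^\beta\right), \qquad \frac{\partial F}{\partial \beta}(\theta, \beta) = C\left(s_\theta^\beta\right).
\end{equation}
By equality of mixed partial derivatives (Schwarz's theorem),
\begin{equation}
\frac{\partial}{\partial \beta} \frac{\partial F}{\partial \theta} = \frac{\partial}{\partial \theta} \frac{\partial F}{\partial \beta}.
\end{equation}
Evaluated at \(\beta = 0\), the left-hand side is \(\left. \frac{d}{d\beta} \right|_{\beta=0} \partial_\theta E\left(\theta, s_\theta^\beta\right)\), while the right-hand side is \(\nabla_\theta \left[ C\left(s_\theta^0\right) \right] = \nabla_\theta \left[ C(s(\theta)) \right] = \nabla_\theta \mathcal{L}(\theta)\), which proves the EP formula.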
As a consequence of this identity, the EP algorithm, which estimates the right-hand side with a finite difference at a small nonzero \(\beta\), performs approximate gradient descent on the loss, with an approximation error that vanishes as \(\beta \to 0\).
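Concretely, approximating the derivative with a one-sided finite difference at a small \(\beta > 0\) yields the update (with \(\eta > 0\) a learning rate; the notation here is ours)
\begin{equation}
\theta \leftarrow \theta - \eta \, \frac{1}{\beta} \left[ \partial_\theta E\left(\theta, s_\theta^\beta\right) - \partial_\theta E\left(\theta, s_\theta^0\right) \right],
\end{equation}
whose bias relative to the true gradient is \(O(\beta)\); replacing the one-sided difference with the symmetric difference between the equilibria at \(+\beta\) and \(-\beta\) reduces the bias to \(O(\beta^2)\).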