Functional Gradient Descent through Directional Derivatives
Abstract
Many problems in science and engineering can be understood as risk minimization over a suitable linear space, such as regression tasks in statistics, or finding solutions to partial differential equations and inverse problems; see [1] for further examples. The natural spaces in which the solutions to these problems live are often infinite dimensional, which leads to tractability issues. In response, many solution approaches involve, in one way or another, a reformulation within a parametric, finite-dimensional setting. In boundary value problems for PDEs, for example, finite element methods use a weak formulation of the PDE over a suitable discretization of the domain to arrive at a finite-dimensional linear system. Even more modern ideas, such as the Physics-Informed Neural Networks (PINNs) put forward in [2], represent the solution to the PDE through a parametric function in the form of a neural network, although the number of parameters can be quite high.

In this work, we wish to tackle the problem of risk minimization directly in an infinite-dimensional space, and our strategy is to perform (stochastic) gradient descent within this space. The main difficulty comes from the fact that, in our problems of interest, the (stochastic) gradient cannot be computed exactly, so we must employ approximation strategies.

Let (H, ⟨·, ·⟩) be a Hilbert space. The problem we want to solve is

    min_{h ∈ H} R(h),     (1)

where R : H → ℝ is a risk functional which we assume to be Fréchet differentiable, with derivative DR : H → H*. Note that, for h ∈ H, DR(h) : H → ℝ is a continuous linear functional on H and, by the Riesz Representation Theorem, admits a representation as the inner product with a member of H, which we call the gradient of R at h and denote by ∇R(h). Our intent is to tackle problem (1) through gradient descent; however, as mentioned above, we are not able to compute (stochastic) gradients exactly. The object we can compute is the directional derivative of R at h ∈ H in any direction v ∈ H, which we denote by DR(h)(v). Fortunately, this is enough to obtain useful gradient estimators: [...]
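The abstract truncates before the estimator itself is stated. As an illustration only, one standard construction (not necessarily the one used in the paper) draws a random direction v with zero mean and identity covariance, so that E[DR(h)(v) v] = E[⟨∇R(h), v⟩ v] = ∇R(h); hence DR(h)(v) v is an unbiased estimate of the gradient even though only directional derivatives are evaluated. The Python sketch below runs this scheme on a toy quadratic risk over a finite-dimensional stand-in for H; the risk R, the operator A, the step size, and the sampling scheme are all illustrative assumptions, not the authors' method.

```python
import numpy as np

# Toy finite-dimensional stand-in for H: h is the coefficient vector of a
# truncated orthonormal basis expansion, so <.,.> is the Euclidean inner product.
rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((30, d))   # illustrative linear operator
b = rng.standard_normal(30)        # illustrative data

def risk(h):
    # Quadratic stand-in for the risk functional: R(h) = 0.5 * ||A h - b||^2.
    return 0.5 * np.sum((A @ h - b) ** 2)

def directional_derivative(h, v):
    # DR(h)(v) = <A h - b, A v>: the only quantity assumed to be computable.
    return (A @ h - b) @ (A @ v)

def gradient_estimate(h):
    # Random-direction estimator: for v ~ N(0, I),
    # E[DR(h)(v) v] = E[v v^T] grad R(h) = grad R(h), so DR(h)(v) v is unbiased.
    v = rng.standard_normal(d)
    return directional_derivative(h, v) * v

h = np.zeros(d)
step = 2e-4
for _ in range(50_000):
    h = h - step * gradient_estimate(h)

print("risk after descent:", risk(h))
print("least-squares risk: ", risk(np.linalg.lstsq(A, b, rcond=None)[0]))
```

In this toy example the noise in the estimate is multiplicative (it vanishes where ∇R(h) = 0), so a small constant step size already drives the iterates close to the least-squares solution; in the infinite-dimensional settings targeted by the paper, the choice of directions and step sizes is precisely where the approximation strategies come in.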
References
[1] Y. R. Fonseca and Y. F. Saporito. Statistical Learning and Inverse Problems: A Stochastic Gradient Approach. 2022. arXiv: 2209.14967 [stat.ML].
[2] M. Raissi, P. Perdikaris, and G. Karniadakis. Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations. 2017. arXiv: 1711.10561 [cs.AI].