International Conference on Machine Learning 2026 (ICML 2026)
Abstract: Modern neural networks have shown promise for solving partial differential equations over surfaces, often by discretizing the surface as a mesh and learning with a mesh-aware graph neural network. However, graph neural networks suffer from oversmoothing, where a node's features become increasingly similar to those of its neighbors. Unitary graph convolutions, which are mathematically constrained to preserve smoothness, have been proposed to address this issue. Despite this, in many physical systems, such as diffusion processes, smoothness naturally increases and unitarity may be overconstraining. In this paper, we systematically study the smoothing effects of different GNNs for dynamics modeling and prove that unitary convolutions hurt performance for such tasks. We propose relaxed unitary convolutions that balance smoothness preservation with the natural smoothing required for physical systems. We also generalize unitary and relaxed unitary convolutions from graphs to meshes. In experiments on PDEs such as the heat and wave equations over complex meshes and on weather forecasting, we find that our method outperforms several strong baselines, including mesh-aware transformers and equivariant neural networks.
Given an undirected graph \( \mathcal{G}=(V,E) \) and given a function from nodes to features \( s \colon V\to\mathbb{C}^d \), the Rayleigh quotient is defined as:
\[ R_{\mathcal{G}}(\mathbf{X}) = \frac{1}{2}\frac{{\sum_{(u, v) \in E}} \left\| \frac{s(u)}{\sqrt{d_u}} - \frac{s(v)}{\sqrt{d_v}}\right\|^2}{{\sum_{w \in V}} || s(w) ||^2} \]
Intuitively, the Rayleigh quotient measures the mean difference in node features for adjacent nodes. A graph with identical degree-weighted node features has a Rayleigh quotient of zero. The smoother the signal defined over the graph, the lower the Rayleigh quotient.
The Rayleigh quotient can also be defined in matrix form:
\[ R_{\mathcal{G}}(\mathbf{X}) = \frac{\text{Tr}{(\mathbf{X^\dagger(I - \tilde{A})X})}}{||\mathbf{X}||^2} \]
where \( \mathbf{A} \) is the adjacency matrix, \( \mathbf{D} \) is a diagonal matrix with the node degrees on the diagonal. Denote by \( \tilde{\mathbf{A}}= \mathbf{D}^{-1/2}\mathbf{AD}^{-1/2} \) the normalized adjacency matrix and \( \mathbf{X} \in \mathbb{C}^{n \times d} \) a matrix with the \( i \)-th row set to feature vector \( s(i) \).
Unitary Convolution [1] preserves the Rayleigh quotient! There are two unitary convolution variants, separable unitary convolution:
\[ f_{\rm UniConv}^{\mathrm{sep}}(\mathbf{X; A}) = \mathbf{\exp(iAt)XU}, \quad \mathbf{U^\dagger U =I} \]
and Lie unitary convolution
\[ f_{\rm UniConv}^{\mathrm{Lie}}(\mathbf{X;A}) = \mathbf{\exp(AXW)}, \quad \mathbf{W = - W^\dagger}. \]
[1] proves the following proposition:
Proposition: Unitary Convolution Preserves Smoothness
Both Lie or separable unitary convolutions preserves the Rayleigh quotient: \( R_{\mathcal{G}}(\mathbf{X}) = R_{\mathcal{G}}(f(\mathbf{X})) \)
Constraining our models to always preserve smoothness using unitary convolutions is an assumption that is sometimes overconstraining. We prove this theoretically and show this empirically via a motivating experiment.
In this experiment, we are looking to model heat diffusion over a grid graph. In this particular task, the signals defined on the graphs actually become smoother over time. We benchmark three models:
Our results show that the strictly unitary framework is overconstraining. It is unable to adapt to the smoothness (measured by the Rayleigh quotient) of the true dynamics. Similarly we show that a vanilla graph convolutional neural network does indeed oversmooth. Our relaxed unitary convolution network outperforms both the overconstrained and unconstrained counterparts.
To understand when unitarity is overconstraining, we derive an approximation error bound for unitary functions. We see that unitarity is especially overconstraining (and relaxations are the most useful) when the norm of the true function has a high angular variance.
Theorem: Approximation bound for unitary functions
Let \( F \) be a fundamental domain of \( \mathrm{SU}(n) \) in \( Z \), e.g. \( F = \{ t e \colon t \in \mathbb{R}_+\} \) where \( e \) is a standard basis vector of \( \mathbb{C}^n \). Let \(u \) be a unitary function. The approximation error of \( u \) of \( f \) has lower bound:
\[ \int_{Z}p(z)\lVert u(z) - f(z)\rVert_2^2dz \geq \int_{F} p(\lVert te \rVert) \mathbb{V}_{Gz}[\rVert f\lVert]dz. \]
A simple way to relax unitary convolutions is early truncation of the Taylor series defining the exponential map. If the expected energy dissipation of the target PDE is known, then Taylor’s theorem can be used as a model selection criterion to select the appropriate \( \mathbf{T_{\rm max}} \) required to match smoothness without finetuning or grid search! This is the method that we used for our motivating experiment.
To make our insights on relaxed unitary convolutions more practical to real engineering domains, we generalize unitary convolutions, relaxed unitary convolutions, and the Rayleigh quotient from graphs to meshes. We also provide a second unitary convolution relaxation strategy that trades off fine-grained control for scalability.
To generalize the framework from graphs to meshes, we replace the standard degree normalized Laplacian in the Rayleigh quotient with the symmetric cotangent Laplacian as follows:
\[ \mathcal{W}_{ij} = \begin{cases} \frac{1}{2}\left(\cot \alpha_{ij} + \cot \beta_{ij}\right), & j \in \mathcal{N}(i)\\ -\displaystyle\sum_{k \in \mathcal{N}(i)} \mathcal{W}_{ik}, & i = j\\ 0, & \text{Otherwise.} \end{cases} \]
\[ R_{\mathcal{M}}(\mathbf{X}) = \frac{1}{2}\frac{\displaystyle\sum_{(u, v) \in \mathcal{E}} \mathcal{W}_{uv}\left\| \frac{s(u)}{\sqrt{d_u}} - \frac{s(v)}{\sqrt{d_v}}\right\|^2}{\displaystyle\sum_{w \in V} \left\| s(w) \right\|^2} = \frac{\text{Tr}(\mathbf{X}^\dagger \tilde{L} \mathbf{X})}{\|\mathbf{X}\|^2_F} \]
This definition of the Rayleigh quotient takes into account the curvature of the underlying manifold in the edge weights. We define unitary mesh convolutions using the same edge weights in the adjacency matrix:
\[ f_{\rm UniMeshConv}^{\mathrm{Sep}}(\mathbf{X;A}, \mathcal{W}) = \exp(i\, \tilde{\mathbf{A}})\mathbf{X U} \]
\[ f_{\rm UniMeshConv}^{\mathrm{Lie}}(\mathbf{X;A}, \mathcal{W}) = \exp(\tilde{\mathbf{A}}\, \mathbf{X W}) \]
For a sufficiently “nice” mesh (a Delaunay mesh), we prove that unitary mesh convolution preserves the mesh Rayleigh quotient:
Corollary: Unitary mesh convolution preserves smoothness
Given a mesh \( \mathcal{M} \) with normalized adjacency matrix \( \mathbf{\tilde{A}} = \mathbf{D^{-1/2}}(\mathcal{W} \odot \mathbf{A})\mathbf{D^{-1/2}} \), the mesh Rayleigh quotient is invariant under normalized unitary or orthogonal graph convolution, i.e. \( R_{\mathcal{M}} (\mathbf{X}) = R_\mathcal{M} (f_{\rm UniMeshConv} (\mathbf{X})) \) where \( f_{\rm UniMeshConv} \) is either separable or Lie.
We next create an architecture by coupling these convolution layers with an unconstrained readout. Crucially, we use zero padding to increase the node feature dimension instead of using a learned embedding. Since unitary convolutions are constrained to preserve the feature dimension, this step is necessary to increase the number of parameters without making the network deeper. Zero padding also is useful because it preserves smoothness. We confirm the importance of this through ablation.
Our method is competitive with and exceeds strong baselines, especially for diffusive dynamics!
@misc{berman2026smoothnesserrorsdynamicsmodels,
title={Smoothness Errors in Dynamics Models and How to Avoid Them},
author={Edward Berman and Luisa Li and Jung Yeon Park and Robin Walters},
year={2026},
eprint={2602.05352},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2602.05352},
}
come see us at the ICML poster session on july 9th, 2:30 PM – 4:15 PM, hall A, poster 63683