- Probability, valuations, hyperspace: Three monads on Top and the support as a morphism We consider three monads on Top, the category of topological spaces, which formalize topological aspects of probability and possibility in categorical terms. The first one is the Hoare hyperspace monad H, which assigns to every space its space of closed subsets equipped with the lower Vietoris topology. The second is the monad V of continuous valuations, also known as the extended probabilistic powerdomain. We construct both monads in a unified way in terms of double dualization. This reveals a close analogy between them, and allows us to prove that the operation of taking the support of a continuous valuation is a morphism of monads from V to H. In particular, this implies that every H-algebra (topological complete semilattice) is also a V-algebra. Third, we show that V can be restricted to a submonad of tau-smooth probability measures on Top. By composing these two morphisms of monads, we obtain that taking the support of a tau-smooth probability measure is also a morphism of monads. 3 authors · Oct 8, 2019
- On Characterizing the Capacity of Neural Networks using Algebraic Topology The learnability of different neural architectures can be characterized directly by computable measures of data complexity. In this paper, we reframe the problem of architecture selection as understanding how data determines the most expressive and generalizable architectures suited to that data, beyond inductive bias. After suggesting algebraic topology as a measure for data complexity, we show that the power of a network to express the topological complexity of a dataset in its decision region is a strictly limiting factor in its ability to generalize. We then provide the first empirical characterization of the topological capacity of neural networks. Our empirical analysis shows that at every level of dataset complexity, neural networks exhibit topological phase transitions. This observation allowed us to connect existing theory to empirically driven conjectures on the choice of architectures for fully-connected neural networks. 2 authors · Feb 12, 2018
- The Edge of Orthogonality: A Simple View of What Makes BYOL Tick Self-predictive unsupervised learning methods such as BYOL or SimSiam have shown impressive results, and counter-intuitively, do not collapse to trivial representations. In this work, we aim at exploring the simplest possible mathematical arguments towards explaining the underlying mechanisms behind self-predictive unsupervised learning. We start with the observation that those methods crucially rely on the presence of a predictor network (and stop-gradient). With simple linear algebra, we show that when using a linear predictor, the optimal predictor is close to an orthogonal projection, and propose a general framework based on orthonormalization that enables to interpret and give intuition on why BYOL works. In addition, this framework demonstrates the crucial role of the exponential moving average and stop-gradient operator in BYOL as an efficient orthonormalization mechanism. We use these insights to propose four new closed-form predictor variants of BYOL to support our analysis. Our closed-form predictors outperform standard linear trainable predictor BYOL at 100 and 300 epochs (top-1 linear accuracy on ImageNet). 6 authors · Feb 9, 2023
- Private Frequency Estimation Via Residue Number Systems We present ModularSubsetSelection (MSS), a new algorithm for locally differentially private (LDP) frequency estimation. Given a universe of size k and n users, our varepsilon-LDP mechanism encodes each input via a Residue Number System (RNS) over ell pairwise-coprime moduli m_0, ldots, m_{ell-1}, and reports a randomly chosen index j in [ell] along with the perturbed residue using the statistically optimal SubsetSelection (SS) (Wang et al. 2016). This design reduces the user communication cost from Θbigl(ωlog_2(k/ω)bigr) bits required by standard SS (with ωapprox k/(e^varepsilon+1)) down to lceil log_2 ell rceil + lceil log_2 m_j rceil bits, where m_j < k. Server-side decoding runs in Θ(n + r k ell) time, where r is the number of LSMR (Fong and Saunders 2011) iterations. In practice, with well-conditioned moduli (i.e., constant r and ell = Θ(log k)), this becomes Θ(n + k log k). We prove that MSS achieves worst-case MSE within a constant factor of state-of-the-art protocols such as SS and ProjectiveGeometryResponse (PGR) (Feldman et al. 2022) while avoiding the algebraic prerequisites and dynamic-programming decoder required by PGR. Empirically, MSS matches the estimation accuracy of SS, PGR, and RAPPOR (Erlingsson, Pihur, and Korolova 2014) across realistic (k, varepsilon) settings, while offering faster decoding than PGR and shorter user messages than SS. Lastly, by sampling from multiple moduli and reporting only a single perturbed residue, MSS achieves the lowest reconstruction-attack success rate among all evaluated LDP protocols. 1 authors · Nov 14
- Green functions of Energized complexes If h is a ring-valued function on a simplicial complex G we can define two matrices L and g, where the matrix entries are the h energy of homoclinic intersections. We know that the sum over all h values on G is equal to the sum of the Green matrix entries g(x,y). We also have already seen that that the determinants of L or g are both the product of the h(x). In the case where h(x) is the parity of dimension, the sum of the energy values was the standard Euler characteristic and the determinant was a unit. If h(x) was the unit in the ring then L,g are integral quadratic forms which are isospectral and inverse matrices of each other. We prove here that the quadratic energy expression summing over all pairs h(x)^* h(y) of intersecting sets is a signed sum of squares of Green function entries. The quadratic energy expression is Wu characteristic in the case when h is dimension parity. For general h, the quadratic energy expression resembles an Ising Heisenberg type interaction. The conjugate of g is the inverse of L if h takes unit values in a normed ring or in the group of unitary operators in an operator algebra. 1 authors · Oct 18, 2020
- Analytic Solution for the Helicity Evolution Equations at Small $x$ and Large $N_c\&N_f$ We construct an exact analytic solution of the revised small-x helicity evolution equations, where the contributions of the quark-to-gluon and gluon-to-quark transition operators were newly included. These evolution equations are written in the large-N_c&N_f limit and are double-logarithmic, resumming powers of alpha_sln^2(1/x). Here N_c and N_f are the numbers of quark colors and flavors, while alpha_s is the strong coupling constant and x is the Bjorken-x variable. Using our solution, we obtain analytic expressions for the flavor singlet quark and gluon helicity parton distribution functions (PDFs) and for the g_1 structure function as double-inverse Laplace transforms. We also extract analytic expressions for the four DGLAP polarized anomalous dimensions Delta gamma_{qq}, Delta gamma_{qG}, Delta gamma_{Gq}, and Delta gamma_{GG}: these expressions resum powers of alpha_s/omega^2 to all orders at large-N_c&N_f (with omega the Mellin moment variable). We extract the leading small-x growth of the helicity distributions, align \Delta\Sigma(x,Q^2) \sim \Delta G(x,Q^2)\sim g_1(x,Q^2) \sim \left(1{x}\right)^{\alpha_h}, align where the intercept alpha_h satisfies an algebraic equation. We determine alpha_h numerically for various values of N_c and N_f. We further obtain the explicit asymptotic expressions for the helicity distributions, which yield numerical values for the ratio of the gluon helicity PDF to the flavor singlet quark helicity PDF in the small-x asymptotic limit (for different N_f/N_c). We find that all our predictions for polarized DGLAP anomalous dimensions are fully consistent with the existing finite-order calculations. Similar to the large-N_c case, our intercept alpha_h exhibits a very slight disagreement with the predictions made within the infrared evolution equations framework. 2 authors · Jul 31