Hyperbolic band theory through Higgs bundles

Hyperbolic lattices underlie a new form of quantum matter, with potential applications to quantum computing and simulation, that has to date been engineered artificially. A corresponding hyperbolic band theory has emerged, extending 2-dimensional Euclidean band theory in a natural way to higher-genus configuration spaces. Attempts to develop the hyperbolic analogue of Bloch's theorem have revealed an intrinsic role for algebro-geometric moduli spaces, notably those of stable bundles on a curve. We expand this picture to include Higgs bundles, which enjoy natural interpretations in the context of band theory. First, their spectral data encodes a crystal lattice and momentum, providing a framework for symmetric hyperbolic crystals. Second, they act as a complex analogue of crystal momentum. As an application, we elicit a new perspective on Euclidean band theory.
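As background for where the moduli spaces enter, the abelian version of the hyperbolic Bloch condition fits in one line (our notation, not the paper's; see the references therein for the precise setup). Bloch-like states on a genus-g unit cell transform under a U(1) character of the Fuchsian translation group:

\psi(\gamma z) = \chi(\gamma)\,\psi(z), \qquad \gamma \in \Gamma_g, \qquad \chi \colon \Gamma_g \to U(1),

so the role of the Brillouin zone is played by the 2g-dimensional torus of such characters, and higher-rank stable (and now Higgs) bundles generalize these rank-one characters.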

https://arxiv.org/pdf/2201.12689.pdf
🔥1
Category Theory in Machine Learning

Over the past two decades machine learning has permeated almost every realm of technology. At the same time, many researchers have begun using category theory as a unifying language, facilitating communication between different scientific disciplines. It is therefore unsurprising that there is a burgeoning interest in applying category theory to machine learning. We aim to document the motivations, goals and common themes across these applications. We touch on gradient-based learning, probability, and equivariant learning.
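One concrete pattern the survey's gradient-based-learning section revolves around is the "parametric lens" view: a layer is a forward map together with a backward map, and backpropagation is lens composition. A toy sketch of that compositional structure (the class name, the tuple-of-parameters convention, and the example layer are ours, not the paper's):

from dataclasses import dataclass
from typing import Callable

@dataclass
class Lens:
    # forward: (params, x) -> y;  backward: (params, x, dy) -> (dparams, dx)
    forward: Callable
    backward: Callable

    def compose(self, other: "Lens") -> "Lens":
        # run self first, then other; gradients flow back in reverse order
        def fwd(params, x):
            p1, p2 = params
            return other.forward(p2, self.forward(p1, x))
        def bwd(params, x, dy):
            p1, p2 = params
            mid = self.forward(p1, x)
            dp2, dmid = other.backward(p2, mid, dy)
            dp1, dx = self.backward(p1, x, dmid)
            return (dp1, dp2), dx
        return Lens(fwd, bwd)

# a scaling "layer" y = w * x with hand-written gradients
scale = Lens(lambda w, x: w * x, lambda w, x, dy: (dy * x, dy * w))
model = scale.compose(scale)
print(model.forward((2.0, 3.0), 1.5))        # 9.0
print(model.backward((2.0, 3.0), 1.5, 1.0))  # ((4.5, 3.0), 6.0)

Composing more layers just nests the parameter tuples, which is exactly the associativity that the categorical treatment makes precise.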

https://arxiv.org/pdf/2106.07032.pdf
🔥3
Bringing Your Own View: Graph Contrastive Learning without Prefabricated Data Augmentations

Self-supervision has recently surged to a new frontier: graph learning. It yields graph representations that benefit downstream tasks, but its success can hinge on domain knowledge for handcrafting augmentations or on often expensive trial and error. Even its state-of-the-art representative, graph contrastive learning (GraphCL), is not completely free of those needs, as GraphCL uses a prefabricated prior reflected in the ad hoc manual selection of graph data augmentations. Our work aims at advancing GraphCL by answering the following questions: How can the space of graph augmented views be represented? What principle can be relied upon to learn a prior in that space? And what framework can be constructed to learn the prior in tandem with contrastive learning?
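For context, the objective GraphCL-style methods optimize over two augmented views of the same graphs is an NT-Xent / InfoNCE contrastive loss; the questions above are about where the views come from, not about this loss. A simplified sketch (cross-view negatives only; the function name and temperature are our choices):

import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    # z1, z2: [batch, dim] graph embeddings of the same graphs under two augmentations;
    # row i of z1 and row i of z2 form the positive pair, all other rows are negatives
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau            # temperature-scaled cosine similarities
    labels = torch.arange(z1.size(0))     # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

# usage: z1 = gnn(view_a(graphs)); z2 = gnn(view_b(graphs)); loss = nt_xent(z1, z2)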
🔥1
GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation

GeoDiff treats each atom as a particle and learns to directly reverse the diffusion process (i.e., transforming from a noise distribution to stable conformations) as a Markov chain. Modeling such a generation process is, however, very challenging, as the likelihood of conformations should be roto-translationally invariant. We theoretically show that Markov chains evolving with equivariant Markov kernels can induce an invariant distribution by design, and further propose building blocks for the Markov kernels to preserve the desirable equivariance property. The whole framework can be efficiently trained in an end-to-end fashion by optimizing a weighted variational lower bound to the (conditional) likelihood. Experiments on multiple benchmarks show that GeoDiff is superior or comparable to existing state-of-the-art approaches, especially on large molecules.
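The invariance argument behind this design fits in one display (our notation, and we glide over the zero-center-of-mass handling of translations): rigid motions g have unit Jacobian, so if the noise prior is invariant and every reverse kernel is equivariant, a change of variables gives

p_\theta(g \cdot x_0)
= \int p(g \cdot x_T) \prod_{t=1}^{T} p_\theta(g \cdot x_{t-1} \mid g \cdot x_t)\,\mathrm{d}x_{1:T}
= \int p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)\,\mathrm{d}x_{1:T}
= p_\theta(x_0),

i.e. the modeled likelihood of a conformation does not depend on how it is rotated or translated.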

https://openreview.net/pdf?id=PzcvxEMzvQC
🔥2
Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs

Cellular sheaves equip graphs with “geometrical” structure by assigning vector spaces and linear maps to nodes and edges. Graph Neural Networks (GNNs) implicitly assume a graph with a trivial underlying sheaf. This choice is reflected in the structure of the graph Laplacian operator, the properties of the associated diffusion equation, and the characteristics of the convolutional models that discretise this equation. In this paper, we use cellular sheaf theory to show that the underlying geometry of the graph is deeply linked with the performance of GNNs in heterophilic settings and their oversmoothing behaviour. By considering a hierarchy of increasingly general sheaves, we study how the ability of the sheaf diffusion process to achieve linear separation of the classes in the infinite time limit expands.
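For concreteness, here is a toy construction (our own code, not the authors') of the sheaf Laplacian that replaces the ordinary graph Laplacian in this picture: every node and edge carries a d-dimensional stalk, every node-edge incidence carries a restriction map, and L = delta^T delta.

import numpy as np

def sheaf_laplacian(edges, restriction, n_nodes, d):
    # restriction[(u, e)] is the d x d map F_{u <= e} from the stalk at node u
    # into the stalk over edge e; L is assembled block-wise from these maps
    L = np.zeros((n_nodes * d, n_nodes * d))
    for e, (u, v) in enumerate(edges):
        Fu, Fv = restriction[(u, e)], restriction[(v, e)]
        bu, bv = slice(u * d, (u + 1) * d), slice(v * d, (v + 1) * d)
        L[bu, bu] += Fu.T @ Fu
        L[bv, bv] += Fv.T @ Fv
        L[bu, bv] -= Fu.T @ Fv
        L[bv, bu] -= Fv.T @ Fu
    return L

edges = [(0, 1), (1, 2)]
restriction = {}
for e, (u, v) in enumerate(edges):
    restriction[(u, e)] = np.eye(2)   # identity maps recover the usual Laplacian
    restriction[(v, e)] = np.eye(2)
L = sheaf_laplacian(edges, restriction, n_nodes=3, d=2)

With identity restriction maps this is the ordinary graph Laplacian tensored with the identity; learning non-trivial maps changes the fixed points of the diffusion dX/dt = -LX, which is what drives the heterophily and oversmoothing results.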

https://arxiv.org/pdf/2202.04579v1.pdf

https://www.youtube.com/watch?v=yEmuc4Pmf0M&ab_channel=AppliedAlgebraicTo
🤔1
Schubert varieties and distances between subspaces of different dimensions

We resolve a basic problem on subspace distances that often arises in applications: how can the usual Grassmann distance between equidimensional subspaces be extended to subspaces of different dimensions? We show that a natural solution is given by the distance of a point to a Schubert variety within the Grassmannian. This distance reduces to the Grassmann distance when the subspaces are equidimensional and does not depend on any embedding into a larger ambient space. Furthermore, it has a concrete expression involving principal angles and is efficiently computable in numerically stable ways. Our results are largely independent of the Grassmann distance: if desired, it may be substituted by any other common distance between subspaces.
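A small numerical sketch of the principal-angle expression mentioned above, as we read it (see the paper for the exact Schubert-variety formulation): orthonormalize both bases, read off the principal angles from the singular values of the cross-Gram matrix, and take their 2-norm; only min(k, l) angles are involved.

import numpy as np

def subspace_distance(A: np.ndarray, B: np.ndarray) -> float:
    # span(A) is n x k, span(B) is n x l, and k and l may differ
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)   # cosines of principal angles
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))

plane = np.eye(4)[:, :2]                             # span{e1, e2} in R^4
print(subspace_distance(np.eye(4)[:, :1], plane))    # line inside the plane: 0.0
print(subspace_distance(np.eye(4)[:, 3:], plane))    # orthogonal line: pi/2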

https://arxiv.org/abs/1407.0900

http://helper.ipam.ucla.edu/publications/glws1/glws1_15465.pdf

https://web.ma.utexas.edu/users/vandyke/notes/deep_learning_presentation/presentation.pdf
🔥1
What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models

An extremely curious paper that analyzes the BERT architecture on language modeling tasks through its linguistic failures. Surprisingly, the authors show that BERT copes poorly with negated sentences. (One more for the collection of papers criticizing stochastic parrots 🦜🦜🦜)

In this paper we introduce a suite of diagnostics drawn from human language experiments, which allow us to ask targeted questions about the information used by language models for generating predictions in context. As a case study, we apply these diagnostics to the popular BERT model, finding that it can generally distinguish good from bad completions involving shared category or role reversal, albeit with less sensitivity than humans, and it robustly retrieves noun hypernyms, but it struggles with challenging inferences and role-based event prediction...
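A quick way to poke at the negation finding yourself, sketched with the Hugging Face fill-mask pipeline (the prompts below are in the spirit of the paper's stimuli, not its exact test items):

from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for prompt in ["A robin is a [MASK].", "A robin is not a [MASK]."]:
    top = fill(prompt, top_k=3)
    print(prompt, [(r["token_str"], round(r["score"], 3)) for r in top])

The reported failure mode is that the negated prompt often yields nearly the same top completions as the affirmative one.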

https://arxiv.org/pdf/1907.13528v1.pdf
🔥1
Molecular Contrastive Learning of Representations via Graph Neural Networks

Molecular Machine Learning (ML) bears promise for efficient molecule property prediction and drug discovery. However, labeled molecule data can be expensive and time-consuming to acquire. Due to the limited labeled data, it is a great challenge for supervised-learning ML models to generalize to the giant chemical space. In this work, we present MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks (GNNs), a self-supervised learning framework that leverages large unlabeled data (∼10M unique molecules). In MolCLR pre-training, we build molecule graphs and develop GNN encoders to learn differentiable representations. Three molecule graph augmentations are proposed: atom masking, bond deletion, and subgraph removal. A contrastive estimator maximizes the agreement of augmentations from the same molecule while minimizing the agreement of different molecules.
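A rough sketch of two of the three augmentations (atom masking and bond deletion) on a toy molecular graph; this is our own illustration of the idea, not the authors' implementation, and subgraph removal is omitted:

import random

def atom_mask(atoms, ratio=0.25, mask_token="[MASK]"):
    # randomly hide a fraction of atom types
    atoms = list(atoms)
    for i in random.sample(range(len(atoms)), k=max(1, int(ratio * len(atoms)))):
        atoms[i] = mask_token
    return atoms

def bond_delete(bonds, ratio=0.25):
    # randomly drop a fraction of bonds (edges)
    keep = max(1, int((1 - ratio) * len(bonds)))
    return random.sample(bonds, k=keep)

# toy ethanol-like graph: atom list plus (i, j) bonds
atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]
view_a = (atom_mask(atoms), bonds)
view_b = (atoms, bond_delete(bonds))
# a GNN encoder embeds both views; the contrastive loss pulls them together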

https://arxiv.org/pdf/2102.10056.pdf
🔥2
On the capacity of deep generative networks

We study the efficacy and efficiency of deep generative networks for approximating probability distributions. We prove that neural networks can transform a low-dimensional source distribution into a distribution that is arbitrarily close to a high-dimensional target distribution, when closeness is measured by Wasserstein distances and maximum mean discrepancy. Upper bounds on the approximation error are obtained in terms of the width and depth of the neural network. Furthermore, it is shown that the approximation error in Wasserstein distance grows at most linearly in the ambient dimension and that the approximation order depends only on the intrinsic dimension of the target distribution. In contrast, when f-divergences are used as metrics between distributions, the approximation property is different. We show that in order to approximate the target distribution in f-divergence, the dimension of the source distribution cannot be smaller than the intrinsic dimension of the target distribution.
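In symbols, the setting is a pushforward approximation problem (our paraphrase): given a low-dimensional source \nu on \mathbb{R}^m and a target \mu on \mathbb{R}^d, find a network g_\theta \colon \mathbb{R}^m \to \mathbb{R}^d with

W_1\big((g_\theta)_{\#}\nu,\ \mu\big) \le \varepsilon,

with the required width and depth controlled in terms of \varepsilon and the intrinsic dimension of \mu. The f-divergence obstruction has a one-line reason: the pushforward (g_\theta)_{\#}\nu is supported on an at-most-m-dimensional set, so if m is below the intrinsic dimension of \mu the two measures are mutually singular and f-divergences stay maximal.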
🔥1
Sequence to Sequence Learning with Neural Networks (test of time)

Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM’s BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences.
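For readers who want the architecture in one screen, a bare-bones encoder-decoder sketch (toy sizes; the paper itself used deep 4-layer LSTMs and found reversing the source word order helpful):

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, num_layers=layers, batch_first=True)
        self.proj = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # the encoder's final (hidden, cell) state is the fixed-size summary vector
        _, state = self.encoder(self.src_emb(src_ids))
        # teacher forcing: decode the target sequence conditioned on that summary
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.proj(dec_out)          # logits over the target vocabulary

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(0, 1000, (4, 7)), torch.randint(0, 1000, (4, 9)))
print(logits.shape)   # torch.Size([4, 9, 1000])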

https://arxiv.org/abs/1409.3215
👍1🔥1🫡1
Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers

This paper presents the first large-scale meta-evaluation of machine translation (MT). We annotated MT evaluations conducted in 769 research papers published from 2010 to 2020. Our study shows that practices for automatic MT evaluation have changed dramatically during the past decade and follow concerning trends. An increasing number of MT evaluations rely exclusively on differences between BLEU scores to draw conclusions, without performing any kind of statistical significance testing or human evaluation, while at least 108 metrics claiming to be better than BLEU have been proposed. MT evaluations in recent papers tend to copy and compare automatic metric scores from previous work to claim the superiority of a method or an algorithm without confirming that exactly the same training, validation, and test data were used or that the metric scores are comparable.
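Since the paper's central complaint is the lack of significance testing, here is a rough sketch of the standard paired bootstrap test over BLEU (our simplified code; it assumes the sacrebleu package, and recent sacrebleu releases ship a maintained paired-bootstrap test of their own):

import random
import sacrebleu

def paired_bootstrap(sys_a, sys_b, refs, n_resamples=1000, seed=0):
    # resample the test set with replacement and count how often system A beats B
    rng = random.Random(seed)
    n, wins = len(refs), 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        a = sacrebleu.corpus_bleu([sys_a[i] for i in idx], [[refs[i] for i in idx]]).score
        b = sacrebleu.corpus_bleu([sys_b[i] for i in idx], [[refs[i] for i in idx]]).score
        wins += a > b
    return wins / n_resamples   # fraction of resamples in which A scores higher than B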

https://arxiv.org/abs/2106.15195
🔥1
Questions about the paper "Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers":

1) How does the fact that most machine translation work relies on the BLEU metric affect the quality of the papers, given that the paper shows the correlation between BLEU and human judgments is not the highest among all metrics? Why is the scientific consensus still built around BLEU? Is it simply because BLEU is easier to compute?

2) Do different metrics turn out to suit different language families better? Could one design a meta-metric that reflects translation quality as objectively as possible across the full diversity of the world's languages?
🔥2
MathBERT: A Pre-Trained Model for Mathematical Formula Understanding

Large-scale pre-trained models like BERT have achieved great success in various Natural Language Processing (NLP) tasks, while it is still a challenge to adapt them to math-related tasks. Current pre-trained models neglect the structural features of formulas and the semantic correspondence between a formula and its context. To address these issues, we propose a novel pre-trained model, namely MathBERT, which is jointly trained on mathematical formulas and their corresponding contexts. In addition, in order to further capture the semantic-level structural features of formulas, a new pre-training task is designed to predict the masked formula substructures extracted from the Operator Tree (OPT), the semantic structural representation of formulas. We conduct various experiments on three downstream tasks to evaluate the performance of MathBERT.
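A toy illustration of the Operator Tree idea and of masking a substructure (the tuple encoding and the helper below are our own, not the paper's serialization):

# OPT for the formula  x^2 + 1 : operators are internal nodes, operands are leaves
opt = ("+", [("^", [("x", []), ("2", [])]),
             ("1", [])])

def mask_subtree(node, target):
    # replace the subtree rooted at operator `target` with a [MASK] leaf
    label, children = node
    if label == target:
        return ("[MASK]", [])
    return (label, [mask_subtree(c, target) for c in children])

print(mask_subtree(opt, "^"))   # ('+', [('[MASK]', []), ('1', [])])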

https://arxiv.org/pdf/2105.00377.pdf
🔥2
The Geometry of Deep Generative Image Models and its Applications

Generative adversarial networks (GANs) have emerged as a powerful unsupervised method to model the statistical patterns of real-world data sets, such as natural images. These networks are trained to map random inputs in their latent space to new samples representative of the learned data. However, the structure of the latent space is hard to intuit due to its high dimensionality and the non-linearity of the generator, limiting the usefulness of the models. Understanding the latent space requires a way to identify input codes for existing real-world images (inversion), and a way to identify directions with known image transformations (interpretability). Here, we use a geometric framework to address both issues simultaneously. We develop an architecture-agnostic method to compute the Riemannian metric of the image manifold created by GANs.
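The central object is the pullback metric on latent space, H(z) = J(z)^T J(z) with J the generator Jacobian. A generic autograd sketch (ours; the paper additionally works with a perceptual image distance and Hessian-vector tricks rather than forming J explicitly on large GANs):

import torch

def pullback_metric(generator, z):
    # Jacobian of the flattened generator output, then H = J^T J
    flat = lambda latent: generator(latent).reshape(-1)
    J = torch.autograd.functional.jacobian(flat, z)    # [output_dim, latent_dim]
    return J.T @ J                                     # [latent_dim, latent_dim]

toy_gen = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh(),
                              torch.nn.Linear(64, 3 * 16 * 16))
H = pullback_metric(toy_gen, torch.randn(8))
eigvals, eigvecs = torch.linalg.eigh(H)
print(eigvals[-3:])   # the top eigenvectors are the latent directions that change the image most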

https://github.com/Animadversio/GAN-Geometry

https://arxiv.org/abs/2101.06006
🔥1
The theoretical research of generative adversarial networks: an overview

Generative adversarial networks (GANs) have received great attention and made great progress since their emergence in 2014. In this paper, we focus on the theoretical achievements of GANs and discuss them in detail for readers who wish to know more about GANs. Based on the number of implemented network architectures, we categorize the improved methods into two groups: GAN variants, which are composed of two networks and improve performance by adding regularization terms to the loss function; and hybrid GANs, which are usually combined with other generative models to improve training stability. For GAN variants, we discuss the theoretical results on distribution divergences, training dynamics, and various improved methods. For hybrid GANs, we introduce improved methods that combine GANs with an encoder, an autoencoder, or a VAE.
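As a reference point for the divergence results the survey organizes, the original minimax objective and its divergence interpretation (standard material, not specific to this survey):

\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))],

and at the optimal discriminator the generator objective equals 2\,\mathrm{JS}(p_{\mathrm{data}} \,\|\, p_G) - \log 4. Much of the "GAN variants" line of work replaces this divergence (Wasserstein, general f-divergences) or regularizes D to tame the training dynamics.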

https://sci-hub.ru/https://doi.org/10.1016/j.neucom.2020.12.114
🔥1
Intrinsic persistent homology via density-based metric learning

We address the problem of estimating intrinsic distances in a manifold from a finite sample. We prove that the metric space defined by the sample endowed with a computable metric known as sample Fermat distance converges a.s. in the sense of Gromov–Hausdorff. The limiting object is the manifold itself endowed with the population Fermat distance, an intrinsic metric that accounts for both the geometry of the manifold and the density that produces the sample. This result is applied to obtain intrinsic persistence diagrams, which are less sensitive to the particular embedding of the manifold in the Euclidean space. We show that this approach is robust to outliers and deduce a method for pattern recognition in signals, with applications in real data.
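The sample Fermat distance itself is easy to compute in a sketch: the cheapest path through the sample in which an edge (x_i, x_j) costs |x_i - x_j|^p, so for p > 1 paths prefer to hop through dense regions. A minimal version over the complete graph (our code; the paper also treats k-NN graphs and the rescaling needed for the Gromov-Hausdorff limit):

import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

def sample_fermat_distance(points, p=2.0):
    costs = squareform(pdist(points)) ** p        # edge cost |x_i - x_j|^p
    return shortest_path(costs, method="D", directed=False)

pts = np.random.default_rng(0).normal(size=(200, 2))
D = sample_fermat_distance(pts, p=3.0)   # feed D as a distance matrix to a persistence package (e.g. ripser)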

https://arxiv.org/abs/2012.07621
🔥1
Forwarded from DL in NLP (Vlad Lialin)
Applied DL and rigorous math are, unfortunately (or fortunately), still quite far apart. But since Phystech taught me to love mathematics, here are a couple of interesting and fairly introductory materials on matrix convexity, concentration inequalities, KL divergence, and other things useful for theoretical DL. Quantifiers and nice animations included.

1. Playing with positive definite matrices – I: matrix monotony and convexity
2. Playing with positive definite matrices – II: entropy edition

И пара более специфичных для DL постов из того же блога:

1. Gradient descent for wide two-layer neural networks – I : Global convergence
2. Gradient descent for wide two-layer neural networks – II: Generalization and implicit bias
Learning Theory from First Principles.pdf
3.9 MB
Learning Theory from First Principles, Francis Bach.
🔥2