In Search of Projectively Equivariant Neural Networks

Equivariance of linear neural network layers is well studied. In this work, we relax the equivariance condition to only hold in a projective sense. In particular, we study the relation between projective and ordinary equivariance and show that for important examples the two problems are in fact equivalent. An important such example is the rotation group in 3D acting projectively on the projective plane. We experimentally study the practical importance of rotation equivariance when designing networks for filtering 2D-2D correspondences. Fully equivariant models perform poorly, and while a simple addition of invariant features to a strong baseline yields improvements, this does not seem to be due to improved equivariance.

https://arxiv.org/pdf/2209.14719.pdf
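To make the relaxed condition concrete, here is a toy numpy check (my illustration, not code from the paper): a map f is projectively equivariant if f(g·x) = λ·(g·f(x)) for some nonzero scalar λ, with ordinary equivariance being the special case λ = 1. Below, the group element acts as -I on the input and trivially on the output, so any linear layer is projectively, but not ordinarily, equivariant.

```python
# Toy check of projective vs ordinary equivariance (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))            # an arbitrary linear layer
f = lambda v: W @ v

x = rng.normal(size=3)
g_in = -np.eye(3)                      # group element acting as -I on the input
g_out = np.eye(3)                      # ...and trivially on the output space

lhs = f(g_in @ x)                      # f(g . x)
rhs = g_out @ f(x)                     # g . f(x)

lam = np.vdot(rhs, lhs) / np.vdot(rhs, rhs)   # best scalar in f(g.x) = lam * g.f(x)
print("ordinary equivariance error  :", np.linalg.norm(lhs - rhs))        # large
print("projective equivariance error:", np.linalg.norm(lhs - lam * rhs))  # ~0, lam = -1
```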
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning

Metric learning aims to learn a highly discriminative model, encouraging the embeddings of similar classes to be close in the chosen metric and those of dissimilar classes to be pushed apart. The common recipe is to use an encoder to extract embeddings and a distance-based loss function to match the representations; usually, the Euclidean distance is utilized. An emerging interest in learning hyperbolic data embeddings suggests that hyperbolic geometry can be beneficial for natural data. Following this line of work, we propose a new hyperbolic-based model for metric learning. At the core of our method is a vision transformer with output embeddings mapped to hyperbolic space. These embeddings are directly optimized using a modified pairwise cross-entropy loss. We evaluate the proposed model with six different formulations on four datasets, achieving new state-of-the-art performance.

https://arxiv.org/abs/2203.10833
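A minimal sketch of the pipeline described above (my own implementation of the exponential map, the Poincaré distance, and a pairwise cross-entropy-style loss; batch size, temperature and curvature are made-up values, and this is not the authors' code):

```python
import torch
import torch.nn.functional as F

def expmap0(v, c=1.0, eps=1e-6):
    """Exponential map at the origin of the Poincare ball with curvature c."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_dist(x, y, c=1.0, eps=1e-6):
    """Geodesic distance on the Poincare ball (unit-curvature formula rescaled by 1/sqrt(c))."""
    sqrt_c = c ** 0.5
    x, y = sqrt_c * x, sqrt_c * y
    num = 2 * (x - y).pow(2).sum(-1)
    den = (1 - x.pow(2).sum(-1)).clamp_min(eps) * (1 - y.pow(2).sum(-1)).clamp_min(eps)
    return torch.acosh((1 + num / den).clamp_min(1 + eps)) / sqrt_c

def pairwise_hyperbolic_ce(z, labels, temperature=0.2):
    """Cross-entropy over pairwise similarities, with similarity = -hyperbolic distance."""
    h = expmap0(z)                                      # encoder output -> Poincare ball
    d = poincare_dist(h.unsqueeze(1), h.unsqueeze(0))   # (B, B) pairwise distances
    eye = torch.eye(len(z), dtype=torch.bool)
    logits = (-d / temperature).masked_fill(eye, -1e9)  # exclude self-pairs
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float().masked_fill(eye, 0.0)
    pos = pos / pos.sum(1, keepdim=True).clamp_min(1.0) # uniform weight over same-class pairs
    return -(pos * F.log_softmax(logits, dim=1)).sum(1).mean()

z = torch.randn(8, 16, requires_grad=True)              # stand-in for ViT output embeddings
labels = torch.randint(0, 3, (8,))
pairwise_hyperbolic_ce(z, labels).backward()
```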
Do Vision Transformers See Like Convolutional Neural Networks?

Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This raises a central question: how are Vision Transformers solving these tasks? Are they acting like convolutional networks, or learning entirely different visual representations? Analyzing the internal representation structure of ViTs and CNNs on image classification benchmarks, we find striking differences between the two architectures, such as ViT having more uniform representations across all layers. We explore how these differences arise, finding crucial roles played by self-attention, which enables early aggregation of global information, and ViT residual connections, which strongly propagate features from lower to higher layers. We study the ramifications for spatial localization, demonstrating ViTs successfully preserve input spatial information, with noticeable effects from different classification methods. Finally, we study the effect of (pretraining) dataset scale on intermediate features and transfer learning, and conclude with a discussion on connections to new architectures such as the MLP-Mixer.

https://arxiv.org/pdf/2108.08810.pdf
Learning Where To Look – Generative NAS is Surprisingly Efficient

The efficient, automated search for well-performing neural architectures (NAS) has drawn increasing attention in recent years. The predominant research objective is to reduce the need for costly evaluations of neural architectures while efficiently exploring large search spaces. To this end, surrogate models embed architectures in a latent space and predict their performance, while generative models for neural architectures enable optimization-based search within the latent space the generator draws from. Both surrogate and generative models aim to facilitate query-efficient search in a well-structured latent space. In this paper, we further improve the trade-off between query efficiency and promising architecture generation by leveraging the advantages of both efficient surrogate models and generative design. To this end, we propose a generative model, paired with a surrogate predictor, that iteratively learns to generate samples from increasingly promising latent subspaces. This approach leads to very effective and efficient architecture search while keeping the query amount low. In addition, our approach makes it straightforward to jointly optimize for multiple objectives such as accuracy and hardware latency. We show the benefit of this approach not only for optimizing architectures for the highest classification accuracy but also under hardware constraints, and we outperform state-of-the-art methods on several NAS benchmarks for single and multiple objectives. We also achieve state-of-the-art performance on ImageNet.

https://arxiv.org/pdf/2203.08734.pdf
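To show the shape of the loop ("iteratively learn to generate samples from increasingly promising latent subspaces"), here is a deliberately tiny stand-in, not the paper's method: the "generator" is just a diagonal Gaussian refit on elite latent points, the surrogate is ridge regression, and only a few expensive oracle evaluations are spent per round.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
DIM = 8

def oracle(z):                                   # stand-in for "train and evaluate the architecture"
    return -np.sum((z - 0.3) ** 2, axis=-1)      # unknown to the search procedure

mu, sigma = np.zeros(DIM), np.ones(DIM)          # "generator" parameters (diagonal Gaussian)
X_seen, y_seen = [], []

for _ in range(10):
    cand = rng.normal(mu, sigma, size=(256, DIM))            # sample from the generator
    if X_seen:                                               # surrogate-guided pre-selection
        surrogate = Ridge(alpha=1.0).fit(np.array(X_seen), np.array(y_seen))
        cand = cand[np.argsort(-surrogate.predict(cand))[:32]]
    batch = cand[:8]                                         # only a few expensive evaluations
    X_seen += list(batch)
    y_seen += list(oracle(batch))
    elite = np.array(X_seen)[np.argsort(-np.array(y_seen))[:16]]
    mu, sigma = elite.mean(0), elite.std(0) + 1e-3           # refit the generator on elites

print("best score found:", max(y_seen))
```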
On Predicting Generalization using GANs

Research on generalization bounds for deep networks seeks to give ways to predict test error using just the training dataset and the network parameters. While generalization bounds can give many insights about architecture design, training algorithms etc., what they do not currently do is yield good predictions for actual test error. A recently introduced Predicting Generalization in Deep Learning competition (Jiang et al., 2020) aims to encourage discovery of methods to better predict test error. The current paper investigates a simple idea: can test error be predicted using synthetic data, produced using a Generative Adversarial Network (GAN) that was trained on the same training dataset? Upon investigating several GAN models and architectures, we find that this turns out to be the case. In fact, using GANs pre-trained on standard datasets, the test error can be predicted without requiring any additional hyperparameter tuning. This result is surprising because GANs have well-known limitations (e.g. mode collapse) and are known to not learn the data distribution accurately. Yet the generated samples are good enough to substitute for test data. Several additional experiments are presented to explore reasons why GANs do well at this task. In addition to a new approach for predicting generalization, the counter-intuitive phenomena presented in our work may also call for a better understanding of GANs’ strengths and limitations.

https://arxiv.org/pdf/2111.14212.pdf

http://www.offconvex.org/2022/06/06/PGDL/
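The recipe from the abstract, condensed into a sketch (the class-conditional generator interface generator(z, y) and the logits-returning classifier are my assumptions, not the authors' code): sample labelled images from a GAN trained on the same training set and report the classifier's error on them as the predicted test error.

```python
import torch

@torch.no_grad()
def predicted_test_error(classifier, generator, num_classes, n=10_000,
                         z_dim=128, batch=256, device="cpu"):
    """Error of `classifier` on labelled samples drawn from a class-conditional `generator`."""
    wrong, total = 0, 0
    while total < n:
        b = min(batch, n - total)
        y = torch.randint(num_classes, (b,), device=device)   # sample labels uniformly
        z = torch.randn(b, z_dim, device=device)
        x_fake = generator(z, y)                              # assumed conditional-GAN interface
        pred = classifier(x_fake).argmax(dim=1)               # classifier assumed to return logits
        wrong += (pred != y).sum().item()
        total += b
    return wrong / total   # compare this number against the true held-out test error
```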
https://www.youtube.com/watch?v=AjghL908gQc

This is a survey. Its main subject is the homotopical or homological nature of certain structures which appear in classical problems about groups, Lie rings and group rings. It is well known that the (generalized) dimension subgroups have complicated combinatorial theories. In this paper we show that, in certain cases, the complexity of these theories is based on homotopy theory. The derived functors of non-additive functors, homotopy groups of spheres, group homology, etc., appear naturally in problems formulated in purely group-theoretical terms. The variety of structures appearing in the considered context is very rich. In order to illustrate it, we present this survey as a trip passing through examples having a similar nature.
Forwarded from DLStories
It turns out there was an interesting paper at ICLR 2022: the authors showed that the way Transformers work (with a small addition) is similar to how the hippocampus and the entorhinal cortex of the human brain work.
(The author of the paper, for what it's worth, holds a Ph.D. in computational/theoretical neuroscience from Stanford and Oxford, and knows what they are talking about.)

In more detail:
The hippocampus and the entorhinal cortex together are responsible for memory and for the perception of time and space. The entorhinal cortex is the "gateway" to the hippocampus: it processes the information entering and leaving the hippocampus. The hippocampus, in turn, processes and structures all kinds of memory: short-term, long-term and spatial.
In other words, the "hippocampus + entorhinal cortex" pair (EC-hippocampus) plays an important role when a person solves tasks involving spatial perception.

How they showed that the Transformer "resembles" the EC-hippocampus: the authors took a Transformer and trained it on a simple task in which the answer has to be produced while keeping track of the current spatial position. The architecture was a standard Transformer with a couple of small changes to the attention formula and to the position encodings; the computation of the position encodings was modified so that they became learnable.

After training the model, the researchers looked at the "spatial map of the position-encoding weights". The map is built simply: for each spatial location in the task the Transformer was trained on, the average activation of the position encodings is computed. It turned out that this map is structurally similar to the one obtained from neuron activations in the EC-hippocampus.
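Roughly, such a map can be computed like this (my reading of the description above, not the authors' code): for every location visited during the task, average a chosen position-encoding unit's activation over all timesteps spent at that location, giving one 2D "rate map" per unit to compare with neural rate maps.

```python
import numpy as np

def rate_map(locations, activations, grid_shape):
    """locations: (T, 2) integer grid coordinates; activations: (T,) values of one PE unit."""
    total = np.zeros(grid_shape)
    count = np.zeros(grid_shape)
    for (i, j), a in zip(locations, activations):
        total[i, j] += a                    # accumulate the unit's activation at this location
        count[i, j] += 1
    return total / np.maximum(count, 1)     # mean activation per visited location

# e.g. one map per unit: maps = [rate_map(locs, acts[:, u], (11, 11)) for u in range(acts.shape[1])]
```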

But that is not all: such a "similarity" between the activation maps of the brain and of the model is not convincing enough on its own. The authors also showed the following: the Transformer architecture is equivalent to a mathematical model of the EC-hippocampus that neuroscientists built fairly recently and actively use. This model is called TEM (the Tolman-Eichenbaum Machine), and it describes the main processes taking place in the EC-hippocampus well. TEM is a trainable model that, during training, is supposed to imitate the processes occurring in the EC-hippocampus.

So the modified Transformer mentioned above turns out to have a structure analogous to TEM. The authors call this Transformer TEM-t. In the paper they draw analogies between individual components of the Transformer and of TEM. In particular, TEM's "memory model" turns out to be equivalent to the Transformer's self-attention.
Moreover, the authors claim that TEM-t can serve as a more efficient model of the EC-hippocampus than the existing TEM: it trains much faster and has a larger memory capacity (it can "store" and "retrieve" more bits of memory). Another plus is that the spatial map of the Transformer's position-encoding weights resembles the corresponding map from the brain (as discussed above).

More on how TEM and TEM-t are built, on the experiments, and on what this means for neuroscience can be found in the paper. It also describes how the Transformer architecture could be implemented on biological neurons. Who knows, maybe some parts of our brain really are transformers?

One more link: a Quanta Magazine article about this work.

P.S. I hope I haven't gotten anything badly wrong; when it comes to how the brain works and the like, I am an amateur. Feel free to correct me in the comments.
#ai_inside
Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth

A key factor in the success of deep neural networks is the ability to scale models to improve performance by varying the architecture depth and width. This simple property of neural network design has resulted in highly effective architectures for a variety of tasks. Nevertheless, there is limited understanding of the effects of depth and width on the learned representations. In this paper, we study this fundamental question. We begin by investigating how varying depth and width affects model hidden representations, finding a characteristic block structure in the hidden representations of larger capacity (wider or deeper) models. We demonstrate that this block structure arises when model capacity is large relative to the size of the training set, and is indicative of the underlying layers preserving and propagating the dominant principal component of their representations. This discovery has important ramifications for features learned by different models, namely, representations outside the block structure are often similar across architectures with varying widths and depths, but the block structure is unique to each model. We analyze the output predictions of different model architectures, finding that even when the overall accuracy is similar, wide and deep models exhibit distinctive error patterns and variations across classes.

https://arxiv.org/pdf/2010.15327.pdf
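The layer-to-layer comparisons behind the "block structure" are, to my knowledge, based on linear centered kernel alignment (CKA). A minimal sketch of linear CKA between two activation matrices, which can be used to build the layer-by-layer similarity heatmaps in which the block appears as a large bright square:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n, p1) and Y (n, p2) for the same n examples."""
    X = X - X.mean(axis=0, keepdims=True)        # centre each feature over examples
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Layer-by-layer similarity heatmap for one model; the block structure shows up
# as a large square of mutually similar layers in high-capacity models.
# acts = [layer_1, ..., layer_L], each of shape (n_examples, n_features)
# heatmap = np.array([[linear_cka(a, b) for b in acts] for a in acts])
```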
Even Western specialists in graph neural networks quote Solzhenitsyn

https://towardsdatascience.com/a-new-computational-fabric-for-graph-neural-networks-280ea7e3ed1a

“Topology! The stratosphere of human thought! In the twenty-fourth century, it might possibly be of use to someone.” — Aleksandr Solzhenitsyn, In the First Circle (1968)
How Do Vision Transformers Work?

Global and local aspects consistently show that MSAs flatten loss landscapes. Left: loss landscape visualizations show that ViT has a flatter loss than ResNet. Right: the magnitude of the Hessian eigenvalues of ViT is smaller than that of ResNet during training. Since the Hessian represents local curvature, this also suggests that the loss landscape of ViT is flatter than that of ResNet.

https://arxiv.org/pdf/2202.06709.pdf
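For reference, a claim like "the magnitude of the Hessian eigenvalues is smaller" can be checked with power iteration on Hessian-vector products. A generic sketch (not the authors' code; the model/criterion names in the usage comment are placeholders):

```python
import torch

def top_hessian_eigenvalue(loss_fn, params, iters=20):
    """Estimate the largest Hessian eigenvalue of loss_fn() w.r.t. params by power iteration."""
    params = [p for p in params if p.requires_grad]
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((vi ** 2).sum() for vi in v))
        v = [vi / norm for vi in v]
        gv = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)      # Hessian-vector product H v
        eig = sum((hvi * vi).sum() for hvi, vi in zip(hv, v)).item() # Rayleigh quotient v^T H v
        v = [hvi.detach() for hvi in hv]
    return eig

# Usage (placeholder names):
# lam = top_hessian_eigenvalue(lambda: criterion(model(x), y), model.parameters())
```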
TopoAct: Visually Exploring the Shape of Activations in Deep Learning

Deep neural networks such as GoogLeNet, ResNet, and BERT have achieved impressive performance in tasks such as image and text classification. To understand how such performance is achieved, we probe a trained deep neural network by studying neuron activations, i.e., combinations of neuron firings, at various layers of the network in response to a particular input. With a large number of inputs, we aim to obtain a global view of what neurons detect by studying their activations. In particular, we develop visualizations that show the shape of the activation space, the organizational principle behind neuron activations, and the relationships of these activations within a layer. Applying tools from topological data analysis, we present TopoAct, a visual exploration system to study topological summaries of activation vectors. We present exploration scenarios using TopoAct that provide valuable insights into learned representations of neural networks. We expect TopoAct to give a topological perspective that enriches the current toolbox of neural network analysis, and to provide a basis for network architecture diagnosis and data anomaly detection.

https://arxiv.org/pdf/1912.06332.pdf

You can play with the application itself here: https://tdavislab.github.io/TopoAct/single-layer-view.html
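The topological summaries here are, as far as I understand, mapper-style graphs over activation vectors. A tiny generic mapper sketch (my simplification with arbitrary parameters, not the TopoAct implementation): a 1-D PCA lens, overlapping intervals covering its range, per-interval clustering, and an edge whenever two clusters share points.

```python
import numpy as np
from itertools import combinations
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

def mapper_graph(acts, n_intervals=10, overlap=0.3, eps=3.0):
    """acts: (n_inputs, n_neurons) activation vectors of one layer."""
    lens = PCA(n_components=1).fit_transform(acts).ravel()     # 1-D lens
    lo, hi = lens.min(), lens.max()
    width = (hi - lo) / n_intervals
    nodes = []                                                 # each node = indices of one cluster
    for k in range(n_intervals):
        a, b = lo + k * width - overlap * width, lo + (k + 1) * width + overlap * width
        idx = np.where((lens >= a) & (lens <= b))[0]
        if len(idx) == 0:
            continue
        labels = DBSCAN(eps=eps, min_samples=3).fit_predict(acts[idx])
        nodes += [idx[labels == lab] for lab in set(labels) - {-1}]
    edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
             if np.intersect1d(nodes[i], nodes[j]).size > 0]   # clusters sharing points
    return nodes, edges
```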
An algorithmic framework for the optimization of deep neural networks architectures and hyperparameters

The authors show that the NAS (Neural Architecture Search) problem can be split into two subproblems: determining the structure (searching for the DAG itself, the space 𝒜) and searching for the hyperparameters (choosing the operations in the DAG nodes, the hyperparameter space Λ(a)). The overall search space then decomposes into a product of spaces, Ω = 𝒜 × {Λ(a), a ∈ 𝒜}, where a is an architecture. A NAS procedure based on an evolutionary algorithm is then run over this space. The method is applied to searching for DNNs for time-series forecasting.

https://arxiv.org/pdf/2303.12797.pdf
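An illustrative toy encoding of that decomposition (my sketch, not the paper's code): a point of Ω is a pair (a, λ), where a is a DAG skeleton from 𝒜 and λ is an assignment from the architecture-dependent space Λ(a); an evolutionary step mutates either component.

```python
import random
from dataclasses import dataclass

OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]   # made-up operation set

@dataclass
class Architecture:
    edges: list                       # DAG skeleton: list of (src, dst) node pairs

def lambda_space(a: Architecture):
    """Lambda(a): one operation choice per edge of this particular DAG."""
    return {edge: OPS for edge in a.edges}

def sample_omega(n_nodes=4):
    """Draw (a, lam) from Omega = A x {Lambda(a), a in A}."""
    edges = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)
             if random.random() < 0.5]
    a = Architecture(edges)
    lam = {e: random.choice(ops) for e, ops in lambda_space(a).items()}
    return a, lam

def mutate(a, lam):
    """One evolutionary step: mutate either the hyperparameters or the structure."""
    if lam and random.random() < 0.5:
        e = random.choice(list(lam))
        lam = {**lam, e: random.choice(OPS)}
    else:
        a = Architecture(a.edges[:-1] if a.edges else [(0, 1)])
        lam = {e: lam.get(e, random.choice(OPS)) for e in a.edges}
    return a, lam

a, lam = sample_omega()
print(a.edges, lam)
```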
Forwarded from MAA — САП
https://arxiv.org/abs/2305.02023 Classical poker contains a 4-dimensional sphere.

We examine the complexity of the "Texas Hold'em" variant of poker from a topological perspective. We show that there exists a natural simplicial complex governing the multi-way winning probabilities between various hands, and that this simplicial complex contains 4-dimensional spheres as induced subcomplexes. We deduce that evaluating the strength of a pair of cards in Texas Hold'em is an intricate problem, and that even the notion of who is bluffing against whom is ill-defined in some situations.
Towards Universal Fake Image Detectors that Generalize Across Generative Models

The authors tackle the problem of detecting fake images. They use the ViT from CLIP to extract features from (fake/real) images and then train a plain logistic regression for binary classification on those features. Their solution generalizes across domains: trained on data from the ProGAN generative model, the resulting classifier also works on data from other sources (other GANs such as StyleGAN and BigGAN, as well as diffusion models). Using the pretrained ViT from CLIP works better than fine-tuning a regular ViT for classification.

https://arxiv.org/pdf/2302.10174.pdf
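A condensed sketch of that recipe (hedged: the exact CLIP checkpoint and the data loading are placeholders, not the authors' code): frozen CLIP ViT image features followed by a plain logistic regression for real-vs-fake classification.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from sklearn.linear_model import LogisticRegression

name = "openai/clip-vit-large-patch14"            # assumed checkpoint
model = CLIPModel.from_pretrained(name).eval()
processor = CLIPProcessor.from_pretrained(name)

@torch.no_grad()
def clip_features(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    return model.get_image_features(**inputs).cpu().numpy()   # frozen CLIP ViT features

# train_paths / train_labels: e.g. ProGAN fakes (label 1) and real images (label 0)
# X = clip_features(train_paths)
# clf = LogisticRegression(max_iter=1000).fit(X, train_labels)
# The same classifier is then applied to images from unseen generators:
# p_fake = clf.predict_proba(clip_features(test_paths))[:, 1]
```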
AttentionViz: A Global View of Transformer Attention

Transformer models are revolutionizing machine learning, but their inner workings remain mysterious. In this work, we present a new visualization technique designed to help researchers understand the self-attention mechanism in transformers that allows these models to learn rich, contextual relationships between elements of a sequence. The main idea behind our method is to visualize a joint embedding of the query and key vectors used by transformer models to compute attention. Unlike previous attention visualization techniques, our approach enables the analysis of global patterns across multiple input sequences. We create an interactive visualization tool, AttentionViz (demo: http://attentionviz.com), based on these joint query-key embeddings, and use it to study attention mechanisms in both language and vision transformers. We demonstrate the utility of our approach in improving model understanding and offering new insights about query-key interactions through several application scenarios and expert feedback.
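The core idea, sketched with assumed shapes (not the AttentionViz code, and with t-SNE standing in for whatever projector the tool uses): collect the query and key vectors of one attention head over many inputs and embed them in a single shared 2-D map, so that query-key proximity reflects potential attention.

```python
import numpy as np
from sklearn.manifold import TSNE

def joint_qk_embedding(queries, keys):
    """queries, keys: (n_tokens, head_dim) vectors collected from one attention head."""
    joint = np.concatenate([queries, keys], axis=0)                # one shared space
    xy = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(joint)
    return xy[:len(queries)], xy[len(queries):]                    # 2-D query and key points
```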
Architectures of Topological Deep Learning: A Survey on Topological Neural Networks

The natural world is full of complex systems characterized by intricate relations between their components: from social interactions between individuals in a social network to electrostatic interactions between atoms in a protein. Topological Deep Learning (TDL) provides a comprehensive framework to process and extract knowledge from data associated with these systems, such as predicting the social community to which an individual belongs or predicting whether a protein can be a reasonable target for drug development. TDL has demonstrated theoretical and practical advantages that hold the promise of breaking ground in the applied sciences and beyond. However, the rapid growth of the TDL literature has also led to a lack of unification in notation and language across Topological Neural Network (TNN) architectures. This presents a real obstacle for building upon existing works and for deploying TNNs to new real-world problems. To address this issue, we provide an accessible introduction to TDL, and compare the recently published TNNs using a unified mathematical and graphical notation. Through an intuitive and critical review of the emerging field of TDL, we extract valuable insights into current challenges and exciting opportunities for future development.

https://arxiv.org/pdf/2304.10031.pdf
Data Topology-Dependent Upper Bounds of Neural Network Widths

Our primary contribution is to introduce data topology-dependent upper bounds on the network width. Specifically, we first show that a three-layer neural network, applying a ReLU activation function and max pooling, can be designed to approximate an indicator function over a compact set, one that is encompassed by a tight convex polytope. This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure. Further, we calculate upper bounds in relation to the Betti numbers of select topological spaces. Finally, we prove the universal approximation property of three-layer ReLU networks using our topological approach. We also verify that gradient descent converges to the network structure proposed in our study.

https://arxiv.org/pdf/2305.16375.pdf
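One standard way such a three-layer ReLU + max-pooling construction can look (my reading of the abstract, not necessarily the paper's exact proof): for the polytope {x : Ax ≤ b}, max-pool the per-face margins and squash with ReLUs, so the output tends to the indicator as the slope k grows.

```python
import numpy as np

def polytope_indicator_net(A, b, k=1e4):
    """Approximate indicator of {x : A x <= b} with one linear layer, max pooling and ReLUs."""
    def f(x):
        margins = A @ x - b                        # layer 1: one unit per face
        m = np.max(margins)                        # max pooling: inside iff m <= 0
        return max(0.0, 1.0 - k * max(0.0, m))     # ReLUs: -> 1 inside, -> 0 outside as k grows
    return f

# Unit square [0, 1]^2 written as {x >= 0, x <= 1}
A = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)
b = np.array([1, 1, 0, 0], dtype=float)
f = polytope_indicator_net(A, b)
print(f(np.array([0.5, 0.5])), f(np.array([2.0, 0.5])))   # ~1.0 inside, 0.0 outside
```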
Riemannian Geometry of Symmetric Positive Definite Matrices via Cholesky Decomposition

We present a new Riemannian metric, termed the Log-Cholesky metric, on the manifold of symmetric positive definite (SPD) matrices via the Cholesky decomposition. We first construct a Lie group structure and a bi-invariant metric on Cholesky space, the collection of lower triangular matrices whose diagonal elements are all positive. This group structure and metric are then pushed forward to the space of SPD matrices via the inverse of the Cholesky decomposition, which is a bijective map between Cholesky space and SPD matrix space. The new Riemannian metric and Lie group structure fully circumvent the swelling effect, in the sense that the determinant of the Fréchet average of a set of SPD matrices under the presented metric, called the Log-Cholesky average, lies between the minimum and the maximum of the determinants of the original SPD matrices. Compared to existing metrics such as the affine-invariant metric and the Log-Euclidean metric, the presented metric is simpler, more computationally efficient and numerically more stable. In particular, parallel transport along geodesics under the Log-Cholesky metric is given in a closed and easy-to-compute form.

Data Analysis with the Riemannian Geometry of Symmetric Positive-Definite Matrices
http://www.ipam.ucla.edu/abstract/?tid=15457&pcode=GLWS3
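A compact sketch of the Log-Cholesky geometry described above (my implementation of the distance and of the closed-form average, not the authors' code): the distance compares the strictly lower parts of the Cholesky factors directly and the diagonals in log scale, and the Fréchet (Log-Cholesky) average has a closed form with no swelling of the determinant.

```python
import numpy as np

def _chol_parts(P):
    L = np.linalg.cholesky(P)              # lower triangular with positive diagonal
    return np.tril(L, -1), np.diag(L)      # strictly lower part, diagonal

def log_cholesky_dist(P1, P2):
    S1, d1 = _chol_parts(P1)
    S2, d2 = _chol_parts(P2)
    return np.sqrt(np.linalg.norm(S1 - S2, "fro") ** 2
                   + np.linalg.norm(np.log(d1) - np.log(d2)) ** 2)

def log_cholesky_mean(Ps):
    parts = [_chol_parts(P) for P in Ps]
    S = np.mean([s for s, _ in parts], axis=0)                       # average strict lower parts
    d = np.exp(np.mean([np.log(di) for _, di in parts], axis=0))     # geometric-mean diagonal
    L = S + np.diag(d)
    return L @ L.T                                                   # back to an SPD matrix

# No swelling: the determinant of the average stays between the extremes.
rng = np.random.default_rng(0)
Ps = [(lambda A: A @ A.T + np.eye(3))(rng.normal(size=(3, 3))) for _ in range(5)]
dets = [np.linalg.det(P) for P in Ps]
print(min(dets), np.linalg.det(log_cholesky_mean(Ps)), max(dets))
```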
Neural Networks are Decision Trees

In this manuscript, we show that any neural network with any activation function can be represented as a decision tree. The representation is an equivalence, not an approximation, and thus keeps the accuracy of the neural network exactly as is. We believe that this work provides a better understanding of neural networks and paves the way to tackling their black-box nature. We share equivalent trees of some neural networks and show that, besides providing interpretability, the tree representation can also achieve some computational advantages for small networks. The analysis holds for both fully connected and convolutional networks, which may or may not also include skip connections and/or normalizations.

https://www.youtube.com/watch?v=_okxGdHM5b8
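A tiny illustration of the flavor of this equivalence (a sketch for the ReLU case, not the paper's construction): the on/off pattern of the hidden units acts like a decision path, and conditional on that pattern the network is a fixed affine "leaf" model.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)   # 2 -> 3 hidden ReLU units
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)   # 3 -> 1 output

def net(x):
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2

def tree_view(x):
    pattern = W1 @ x + b1 > 0                       # the "decisions" along the path
    D = np.diag(pattern.astype(float))              # which units survive the ReLU
    W_leaf, b_leaf = W2 @ D @ W1, W2 @ D @ b1 + b2  # fixed affine model at this leaf
    return pattern, W_leaf @ x + b_leaf

x = rng.normal(size=2)
pattern, y_tree = tree_view(x)
print(pattern, y_tree, net(x))                      # identical outputs
```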