Mashkka про Data Science@mashkka_ds P.1950

Notice: file_put_contents(): Write of 9606 bytes failed with errno=28 No space left on device in /var/www/tgoop/post.php on line 50

Warning: file_put_contents(): Only 12288 of 21894 bytes written, possibly out of free disk space in /var/www/tgoop/post.php on line 50
Mashkka про Data Science@mashkka_ds P.1950

MASHKKA_DS Telegram 1950

Mashkka про Data Science

Forwarded from Kali Novskaya

🌸Подборка NeurIPS: LLM-статьи 🌸
#nlp #про_nlp #nlp_papers

Вот и прошёл NeurIPS 2024, самая большая конференция по машинному обучению. Ниже — небольшая подборка статей, которые мне показались наиболее интересными. Про некоторые точно стоит сделать отдельный обзор.

Агенты
🟣StreamBench: Towards Benchmarking Continuous Improvement of Language Agents arxiv
🟣SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering arxiv
🟣AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents arxiv

🟣DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents arxiv

Бенчмарки
🟣DevBench: A multimodal developmental benchmark for language learning arxiv
🟣CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark arxiv
🟣LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages arxiv
🟣CLUE - Cross-Linked Unified Embedding for cross-modality representation learning arxiv
🟣EmoBench: Evaluating the Emotional Intelligence of Large Language Models arxiv

LLM
🟣The PRISM Alignment dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models arxiv
🟣UniGen: A Unified Framework for Textual Dataset Generation via Large Language Models arxiv
🟣A Watermark for Black-Box Language Models arxiv

Please open Telegram to view this post

VIEW IN TELEGRAM

StreamBench: Towards Benchmarking Continuous Improvement of Language Agents

Recent works have shown that large language model (LLM) agents are able to improve themselves from experience, which is an important ability for continuous enhancement post-deployment. However,...

👍5🤡5

www.tgoop.com/mashkka_ds/1950

1.28K viewsDec 17, 2024 at 16:18

tgoop.com/mashkka_ds/1950

Create: 2024-12-17
Last Update: 2025-07-30 09:55:43

🌸Подборка NeurIPS: LLM-статьи 🌸
#nlp #про_nlp #nlp_papers

Вот и прошёл NeurIPS 2024, самая большая конференция по машинному обучению. Ниже — небольшая подборка статей, которые мне показались наиболее интересными. Про некоторые точно стоит сделать отдельный обзор.

Агенты
🟣StreamBench: Towards Benchmarking Continuous Improvement of Language Agents arxiv
🟣SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering arxiv
🟣AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents arxiv

🟣DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents arxiv

Бенчмарки
🟣DevBench: A multimodal developmental benchmark for language learning arxiv
🟣CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark arxiv
🟣LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages arxiv
🟣CLUE - Cross-Linked Unified Embedding for cross-modality representation learning arxiv
🟣EmoBench: Evaluating the Emotional Intelligence of Large Language Models arxiv

LLM
🟣The PRISM Alignment dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models arxiv
🟣UniGen: A Unified Framework for Textual Dataset Generation via Large Language Models arxiv
🟣A Watermark for Black-Box Language Models arxiv

BY Mashkka про Data Science

Share with your friend now:
tgoop.com/mashkka_ds/1950

Open in Telegram

Telegram News

Date: 2025-07-30|

Don’t publish new content at nighttime. Since not all users disable notifications for the night, you risk inadvertently disturbing them. Activate up to 20 bots Some Telegram Channels content management tips Clear To delete a channel with over 1,000 subscribers, you need to contact user support
from us

Telegram Mashkka про Data Science
FROM American