Mini thread on LLM benchmarks
https://fixupx.com/evgeniyzhe/status/1862903906025496704
🧵 Thread • FxTwitter / FixupX
Evgenii Zheltonozhskii (@evgeniyzhe)
Good LLM benchmark has no public answers or is live updated, has deterministic eval and has result far from 0/100% for current LLMs. The first is why I'm not optimistic regarding @DanHendrycks humanity's last exam long-term, and the last is why I use @lmarena_ai…
Forwarded from Хроники Непуганых Идиотов (Larisa M)
We've shaken up HashCode and AtCoder a bit here.
https://x.com/PetarV_93/status/1863542737552748984
🔥6
NLTS Hamiltonians from good quantum codes https://arxiv.org/abs/2206.13228
arXiv.org
NLTS Hamiltonians from good quantum codes
The NLTS (No Low-Energy Trivial State) conjecture of Freedman and Hastings [2014] posits that there exist families of Hamiltonians with all low energy states of non-trivial complexity (with...
Can ChatGPT pass a physics degree? Making a case for reformation of assessment of undergraduate degrees https://arxiv.org/abs/2412.01312
arXiv.org
Can ChatGPT pass a physics degree? Making a case for reformation...
The emergence of conversational natural language processing models presents a significant challenge for Higher Education. In this work, we use the entirety of a UK physics undergraduate (BSc with...
👍3🤣3
AI Benchmarking Hub https://epoch.ai/data/ai-benchmarking-dashboard
Epoch AI
AI Benchmarking Dashboard
Our database of benchmark results, featuring the performance of leading AI models on challenging tasks. It includes results from benchmarks evaluated internally by Epoch AI as well as data collected from external sources. The dashboard tracks AI progress…
👍1
YOLOv8 on tinygrad powered by WebGPU
https://github.com/wpmed92/yolov8-webgpu-tinygrad?tab=readme-ov-file
doesn't seem to work with Firefox tho
👍4
Anti-topological crystal and non-Abelian liquid in twisted semiconductor bilayers https://arxiv.org/abs/2411.19898
arXiv.org
Anti-topological crystal and non-Abelian liquid in twisted...
We show that electron crystals compete closely with non-Abelian fractional Chern insulators in the half-full second moiré band of twisted bilayer MoTe$_2$. Depending on the twist angle and...
Mastering Board Games by External and Internal Planning with Language Models https://storage.googleapis.com/deepmind-media/papers/SchultzAdamek24Mastering/SchultzAdamek24Mastering.pdf
🔥9👍1
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark https://arxiv.org/abs/2411.19941
arXiv.org
Perception Test 2024: Challenge Summary and a Novel Hour-Long...
Following the successful 2023 edition, we organised the Second Perception Test challenge as a half-day workshop alongside the IEEE/CVF European Conference on Computer Vision (ECCV) 2024, with the...
Forwarded from Hacker News
Waymo announces Miami as its next ride hailing city (🔥 Score: 154+ in 2 hours)
Link: https://readhacker.news/s/6j3Lv
Comments: https://readhacker.news/c/6j3Lv
🔥2
Forwarded from Hacker News
Willow, Our Quantum Chip (🔥 Score: 157+ in 1 hour)
Link: https://readhacker.news/s/6jg7b
Comments: https://readhacker.news/c/6jg7b
Google
Meet Willow, our state-of-the-art quantum chip
Our new quantum chip demonstrates error correction and performance that paves the way to a useful, large-scale quantum computer.
Stochastic Taylor Derivative Estimator: Efficient amortization for arbitrary differential operators https://openreview.net/forum?id=J2wI2rCG2u
openreview.net
Stochastic Taylor Derivative Estimator: Efficient amortization for...
Optimizing neural networks with loss that contain high-dimensional and high-order differential operators is expensive to evaluate with back-propagation due to $\mathcal{O}(d^{k})$ scaling of the...
👌1
How to Build a Quantum Supercomputer: Scaling Challenges and Opportunities https://arxiv.org/abs/2411.10406
arXiv.org
How to Build a Quantum Supercomputer: Scaling from Hundreds to...
In the span of four decades, quantum computation has evolved from an intellectual curiosity to a potentially realizable technology. Today, small-scale demonstrations have become possible for...
Konwinski Prize
$1M for the AI that can close 90% of new GitHub issues
https://www.kaggle.com/competitions/konwinski-prize
Kaggle
Konwinski Prize
$1M for the AI that can close 90% of new GitHub issues
👍8
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data https://arxiv.org/abs/2412.07762
arXiv.org
Efficient Online Reinforcement Learning Fine-Tuning Need Not...
The modern paradigm in machine learning involves pre-training on diverse data, followed by task-specific fine-tuning. In reinforcement learning (RL), this translates to learning via offline RL on...
👍7
When a Crystal Ball Isn’t Enough to Make You Rich https://elmwealth.com/crystal-ball/
Elm Wealth
When a Crystal Ball Isn't Enough to Make You Rich - Elm Wealth
Elm: making sense of your wealth. Take the guesswork out of wealth management with our sensible-yet-sophisticated approach to investing.
1❤4😁3
Set of samples from Veo 2
https://fixupx.com/medhini_n/status/1868773280121077804
https://fixupx.com/doomie/status/1868747266368192543
https://fixupx.com/joeybab3/status/1868791012954718528
https://fixupx.com/ai_for_success/status/1868851632567693625
https://fixupx.com/babaeizadeh/status/1868841586739822638
https://fixupx.com/emollick/status/1868900000463528054
https://fixupx.com/hhm/status/1868773180032290997
https://fixupx.com/noonescente/status/1868761667041202449
https://fixupx.com/hhm/status/1868779762162057481
https://fixupx.com/joeybab3/status/1868792639568724384
https://fixupx.com/MayorKingAI/status/1868748484268187677
🧵 Thread • FxTwitter / FixupX
Medhini Narasimhan (@medhini_n)
@iandanforth @doomie
👍3👻1
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation https://arxiv.org/abs/2412.09754
arXiv.org
ViCaS: A Dataset for Combining Holistic and Pixel-level Video...
Recent advances in multimodal large language models (MLLMs) have expanded research in video understanding, primarily focusing on high-level tasks such as video captioning and question-answering....