TLDR AI 2024-04-17

Stanford AI Index Report 📃, Google's Infinite Context LLMs 📚, Megalodon Efficient Transformer Pretraining 🌐

🚀
Headlines & Launches

Stanford HAI Releases 2024 AI Index Report (Website)

The Stanford Institute for Human-Centered AI has released its seventh annual AI Index report. This year's report covers the rise of multimodal foundation models, major cash investments into generative AI, new performance benchmarks, shifting global opinions, and new major regulations.

Apple's iOS 18 AI Will Be On-Device (1 minute read)

Apple's upcoming AI features in iOS 18 are rumored to focus on privacy. The initial set of enhancements is expected to run entirely on-device, with no internet connection or cloud-based processing required, powered by the company's in-house large language model known internally as "Ajax."

Google's New Technique Gives LLMs Infinite Context (5 minute read)

Google researchers have introduced Infini-attention, a technique that enables LLMs to work with text of infinite length while keeping memory and compute requirements constant.
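
Under the hood, Infini-attention pairs ordinary softmax attention within each segment with a compressive memory carried across segments, so the state stays fixed-size no matter how long the input grows. Below is a rough NumPy sketch of that idea; the feature map, gate, and exact update rule here are illustrative stand-ins, not the paper's verbatim formulation.

```python
# Illustrative sketch only: local attention per segment + a constant-size
# compressive memory (M, z) that accumulates key-value information.
import numpy as np

def elu1(x):
    return np.where(x > 0, x + 1.0, np.exp(x))  # ELU(x) + 1, keeps features positive

def infini_attention_segment(Q, K, V, M, z, beta=0.5):
    """Process one segment: local softmax attention + retrieval from memory.

    Q, K, V: (seg_len, d) projections for the current segment
    M:       (d, d) compressive memory carried across segments
    z:       (d,)   normalization term carried across segments
    beta:    gate mixing memory retrieval and local attention (learned in practice)
    """
    # Local (within-segment) softmax attention.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    local = (probs / probs.sum(-1, keepdims=True)) @ V

    # Retrieve from the memory built over all previous segments.
    sQ = elu1(Q)
    retrieved = (sQ @ M) / (sQ @ z + 1e-6)[:, None]

    # Fold the current segment into the memory; its size never grows.
    sK = elu1(K)
    M = M + sK.T @ V
    z = z + sK.sum(axis=0)

    return beta * retrieved + (1.0 - beta) * local, M, z

# Usage: stream segments through with O(d^2) state regardless of total length.
d, seg = 64, 128
M, z = np.zeros((d, d)), np.zeros(d)
for _ in range(10):  # ten segments of 128 tokens, memory stays (d, d)
    Q, K, V = (np.random.randn(seg, d) for _ in range(3))
    out, M, z = infini_attention_segment(Q, K, V, M, z)
```
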
🧠
Research & Innovation

Compression Represents Intelligence Linearly (18 minute read)

Most modern AI is built around the idea of compressing a training dataset into a model: the better the compression, the better the model. This paper examines that relationship rigorously and finds that benchmark scores correlate strongly, and roughly linearly, with a model's ability to compress novel text.
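The compression measure involved is just a language model's held-out loss re-expressed as a code length: under arithmetic coding, mean cross-entropy in bits per byte is the rate at which the model compresses that text. Here is a minimal sketch of computing it, using GPT-2 from Hugging Face purely as a stand-in model (the paper's models and corpora differ).

```python
# Sketch: a model's cross-entropy on held-out text, expressed as bits per byte,
# is its compression rate under arithmetic coding.
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def bits_per_byte(model, tokenizer, text: str) -> float:
    """Cross-entropy of `text` under `model`, expressed as bits per UTF-8 byte."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    n_pred = enc["input_ids"].size(1) - 1                 # positions actually predicted
    total_bits = out.loss.item() * n_pred / math.log(2)   # loss is mean nats per position
    return total_bits / len(text.encode("utf-8"))

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
print(bits_per_byte(lm, tok, "Some held-out text the model has never seen before."))
```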

Megalodon Efficient Transformer Pretraining (17 minute read)

Another long context paper - this time, a new architecture that uses two novel weight-updating schemes. It outperforms Llama 2 when trained on the same 2T tokens. It also scales to infinite context length at inference time.

Feedback in Transformers (24 minute read)

TransformerFAM provides a feedback mechanism that allows Transformers to attend to their own latent representations. This can, in theory, introduce recurrence into the model for processing extremely long inputs in context.
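
Below is a rough PyTorch sketch of the general feedback pattern, illustrative only rather than the paper's exact architecture: a block produces a small set of memory activations on one segment and consumes them again alongside the next segment's tokens, so the model attends to its own latent summary of what it has already processed.

```python
# Toy feedback-memory block (an assumption-laden sketch, not TransformerFAM itself).
import torch
import torch.nn as nn

class FeedbackBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_mem=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.mem_init = nn.Parameter(torch.randn(1, n_mem, d_model) * 0.02)
        self.n_mem = n_mem

    def forward(self, x, mem=None):
        # x: (batch, seg_len, d_model); mem: feedback state from the previous segment
        if mem is None:
            mem = self.mem_init.expand(x.size(0), -1, -1)
        h = torch.cat([mem, x], dim=1)                 # tokens attend to fed-back latents
        h = h + self.attn(h, h, h, need_weights=False)[0]
        h = h + self.ff(h)
        new_mem, out = h[:, :self.n_mem], h[:, self.n_mem:]
        return out, new_mem                            # new_mem is fed back next segment

# Usage: stream a long input segment by segment, carrying the feedback state.
block, mem = FeedbackBlock(), None
for segment in torch.randn(10, 2, 128, 256):           # 10 segments of 128 tokens
    out, mem = block(segment, mem)
```
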
πŸ‘¨β€πŸ’»
Engineering & Resources

Enhanced Vision-Language Model (GitHub Repo)

Vision-language models (VLMs) often struggle with processing multiple queries per image and with identifying when objects are absent. This study introduces a new query format to tackle these issues and incorporates semantic segmentation into the training process.

AI system that creates detailed, cited reports with retrieval (GitHub Repo)

Stanford has released a neat research system called STORM that uses retrieval-guided language models to create reports on specific topics.

Road Line Segmentation for Autonomous Driving (16 minute read)

Accurately segmenting road lines and markings is crucial for autonomous driving but challenging due to occlusions caused by vehicles, shadows, and glare. The Homography Guided Fusion (HomoFusion) module uses video frames to identify and classify obscured road lines by leveraging a novel surface normal estimator and a pixel-to-pixel attention mechanism.
🎁
Miscellaneous

Qwen Coder (12 minute read)

CodeQwen1.5 is a new set of 7B models trained on 3T tokens of code-related data. It performs well on HumanEval, with a non-zero score on SWE-bench. The chat variant in particular shows promise for long-context retrieval tasks up to 64k tokens.

1-bit Quantization (7 minute read)

Extreme low-bit quantization of small pre-trained models, like Llama2-7B, is challenging, but fine-tuning just 0.65% of parameters significantly improves performance. Newly fine-tuned 1-bit models outperform 2-bit QuIP# models, while 2-bit models with specialized data can exceed their full-precision counterparts. This research suggests that proper fine-tuning and quantization can enhance efficiency without compromising model quality, potentially shifting focus from training smaller models to optimizing larger, quantized ones.
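For intuition, here is a toy NumPy sketch of sign (1-bit) quantization with a per-row scale. In schemes like this, the handful of floating-point scales (plus whatever small parameter subset the paper actually selects, which this sketch does not reproduce) are the natural candidates for the tiny fine-tuned fraction.

```python
# Toy 1-bit quantization: weights become {-1, +1} plus one fp scale per row.
import numpy as np

def quantize_1bit(W):
    """Quantize a weight matrix to sign bits with a per-output-row scale."""
    scale = np.abs(W).mean(axis=1, keepdims=True)   # per-row scale (trainable later)
    signs = np.sign(W)
    signs[signs == 0] = 1.0
    return signs, scale

def dequantize(signs, scale):
    return signs * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(4096, 4096)).astype(np.float32)
signs, scale = quantize_1bit(W)
W_hat = dequantize(signs, scale)

err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
# Fine-tuning only `scale` (roughly 0.02% of the parameters in this toy example)
# is the kind of tiny, targeted update the post describes.
```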

Accelerating AI: Harnessing Intel(R) Gaudi(R) 3 with Ray 2.10 (5 minute read)

Anyscale's latest release of Ray, Ray 2.10, adds support for Intel Gaudi 3. Developers can now spin up and manage their own Ray Clusters, provision Ray Core Tasks and Actors on a Gaudi fleet directly through the Ray Core APIs, tap into Ray Serve on Gaudi through the Ray Serve APIs for a higher-level experience, and configure Intel Gaudi accelerator infrastructure for use at the Ray Train layer.
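
As a flavor of the Ray Core path, here is a minimal sketch of scheduling tasks onto Gaudi-backed workers. It assumes the cluster advertises Gaudi devices under the "HPU" resource name; confirm the exact resource key for your deployment with ray.cluster_resources().

```python
# Minimal sketch: request Gaudi capacity from Ray Core via a custom resource.
# The "HPU" resource name is an assumption about how the cluster is configured.
import ray

ray.init()  # connect to the running Ray cluster

@ray.remote(resources={"HPU": 1})
def run_on_gaudi(shard_id: int) -> str:
    # Framework-specific device setup (e.g. Habana's PyTorch bridge) would go
    # here; this stub just reports where the task landed.
    return f"shard {shard_id} scheduled on a Gaudi-backed worker"

print(ray.get([run_on_gaudi.remote(i) for i in range(4)]))
```
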
⚡️
Quick Links

Limitless (Product)

Personalized AI app and wearable powered by what you've seen, said, and heard.

Introducing ALOHA Unleashed (2 minute video)

Google DeepMind's ALOHA Unleashed is a program that pushes the boundaries of dexterity with low-cost robots and AI.

Four steps for founders integrating AI (4 minute read)

There is immense pressure right now to have a plan for factoring AI into existing products. This short step-by-step guide will help you get started.