Interesting Content in AI, Software, Business, and Tech- 02/14/2024

Content to help you keep up with Machine Learning, Deep Learning, Data Science, Software Engineering, Finance, Business, and more

11 min readFeb 14, 2024

A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and I remembered throughout the week). These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 02/14/2024. If you missed last week’s readings, you can find it here.

Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf.

Community Spotlight: Ravindranath Nemani

Ravindranath Nemani is a data scientist at IBM. If you’re looking to go deep into technical research, you should check out his profile. He shares a ton of resources, notes, and publications on a variety of topics. If you are looking for ideas that might define the next 2–3 decades, then his profile is a great place to go looking.

If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments/by reaching out to me. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.

Previews

Curious about what what articles I’m working on? Here are the previews for the next planned articles-

Tech Made Simple

AI Made Simple

Highly Recommended

These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.

Microsoft’s New Future of Work Report

There are many good resources on software engineering productivity, but Abi Noda is one of them. His newsletter is a cut above the generic, “10x your career by stealing my email template” style of career gurus. He goes into research into what defines high-performers to give you useful advice (his newsletter was how I found the excellent Microsoft paper, “What distinguishes great software engineers?”). In my top 5 resources for Software Engineers and Engineering managers.

“There have been many excellent papers published about the ways in which work may change as LLMs, such as GitHub’s Copilot, are integrated. Microsoft’s report synthesizes some of the most important or emerging themes from this research.

In the future, I’ll cover papers on AI and software development in more depth. This issue gives an overview of the most important ideas and emerging research themes related to AI in the workplace.”

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Found this gem while doing some market research on how to to handle really large documents. Based on some early experiments, the results don’t carry over perfectly (you need to do a lot of tinkering with good design to meaningfully process large documents), but this is a great foundation to build upon.

“LLMs are seeing growing use for applications such as document analysis and summarization which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference. Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurately in ultra-low precisions, such as sub-4-bit. In this work, we present KVQuant, which addresses this problem by incorporating novel methods for quantizing cached KV activations, including: (i) Per-Channel Key Quantization, where we adjust the dimension along which we quantize the Key activations to better match the distribution; (ii) Pre-RoPE Key Quantization, where we quantize Key activations before the rotary positional embedding to mitigate its impact on quantization; (iii) Non-Uniform KV Cache Quantization, where we derive per-layer sensitivity-weighted non-uniform datatypes that better represent the distributions; (iv) Per-Vector Dense-and-Sparse Quantization, where we isolate outliers separately for each vector to minimize skews in quantization ranges; and (v) Q-Norm, where we normalize quantization centroids in order to mitigate distribution shift, providing additional benefits for 2-bit quantization. By applying our method to the LLaMA, LLaMA-2, and Mistral models, we achieve <0.1 perplexity degradation with 3-bit quantization on both Wikitext-2 and C4, outperforming existing approaches. Our method enables serving the LLaMA-7B model with a context length of up to 1 million on a single A100–80GB GPU and up to 10 million on an 8-GPU system. “

Why Green Skyscrapers are a Terrible Idea

Green Skyscrapers are a very strong example of greenwashing. This is a great video on all the logistical issues of green skyscrapers, and proposes alternatives that would be cheaper, better, and more eco-friendly.

Diffusion World Model

As someone who has dealt with the nightmare of multi-step time forecasting on more than one occasion, this is an interesting approach. Will have to experiment more before drawing conclusions, but the idea is worth checking out.

“We introduce Diffusion World Model (DWM), a conditional diffusion model capable of predicting multistep future states and rewards concurrently. As opposed to traditional one-step dynamics models, DWM offers long-horizon predictions in a single forward pass, eliminating the need for recursive queries. We integrate DWM into model-based value estimation, where the short-term return is simulated by future trajectories sampled from DWM. In the context of offline reinforcement learning, DWM can be viewed as a conservative value regularization through generative modeling. Alternatively, it can be seen as a data source that enables offline Q-learning with synthetic data. Our experiments on the D4RL dataset confirm the robustness of DWM to long-horizon simulation. In terms of absolute performance, DWM significantly surpasses one-step dynamics models with a 44% performance gain, and achieves state-of-the-art performance.”

Compare GPT-4 and Gemini Ultra, side-by-side

Our boy Adam Binks put a lot of work into this comparison. While comparisons like this are a dime a dozen, Adam has put a lot of effort into designing creative prompts and building a website that makes it easy to search through multiple criteria.

LLM Paper Reading Notes — January 2024

Any time an expert like Jean David Ruvini shares their notes on a topic, you want to listen. Seeing what kinds of things they focus on is very useful as a proxy for identifying underappreciated but very important metrics.

“Sharing short notes about LLM research papers I came across in December. These notes, intended for my future self, differ in their level of detail and precision. I hope they’re still useful in piquing your curiosity and helping you breathe under the waterfall. At the current pace of AI, it takes the power of all of us to keep up.”

Blink: The Power of Thinking Without Thinking

I wish I found this talk before writing my article on the limitations of data. This is a great overview of how our data collection methods/analysis can introduce lots of flaws and hidden assumptions.

“How do we make decisions — good and bad — and why are some people so much better at it than others? Utilizing case studies as diverse as speed dating, pop music, and the shooting of Amadou Diallo, Gladwell reveals that what we think of as decisions made in the blink of an eye are much more complicated than assumed. Drawing on cutting-edge neuroscience and psychology, he shows how the difference between good decision-making and bad has nothing to do with how much information we can process quickly, but on the few particular details on which we focus. Gladwell reveals how we can become better decision makers — in our homes, our offices, and in everyday life. Never again will you think about thinking the same way.”

How Taco Bell Crippled KFC & Pizza Hut

A brilliant case-study on how disadvantages can become strengths and

“Taco Bell is an extraordinary outlier by every measure. It’s a fast food chain that boasts a deeply passionate fanbase, enjoys a reputation for reliability, speed, and accuracy, and when it comes to business — Taco Bell has grown at such a breakneck pace over the past 20 years that it outperforms giants like McDonald’s, Burger King, and KFC in per-store earnings. The Mexican chain is so popular that it’s one of the few companies whose per-store earnings have stayed ahead of inflation.

Remarkably, Taco Bell boasts some of the highest ever profit margins ever reported in not just fast food, but also in the restaurant industry. Taco Bell’s renaissance is a miracle in an era where fast food chains all follow the same cookie-cutter playbook of cost-cutting and international expansion to cover up domestic decline like KFC, McDonald’s, and Starbucks.

Business is a zero-sum game where every decision is connected, every action has a cause and effect, and the rise of one brand contributes to the fall of another. In this episode, we’ll cover the rise of Taco Bell, their strategy that’s made them so successful, and why Taco Bell is the missing puzzle piece behind the downfall of KFC and Pizza Hut and how their struggles in fried chicken and pizza have molded Taco Bell to what it is today, for better and for worse.”

AI Content

Weak-to-Strong Jailbreaking on Large Language Models

S/o to Ujjawal Panchal for this fantastic find.

Large language models (LLMs) are vulnerable to jailbreak attacks — resulting in harmful, unethical, or biased text generations. However, existing jailbreaking methods are computationally costly. In this paper, we propose the weak-to-strong jailbreaking attack, an efficient method to attack aligned LLMs to produce harmful text. Our key intuition is based on the observation that jailbroken and aligned models only differ in their initial decoding distributions. The weak-to-strong attack’s key technical insight is using two smaller models (a safe and an unsafe one) to adversarially modify a significantly larger safe model’s decoding probabilities. We evaluate the weak-to-strong attack on 5 diverse LLMs from 3 organizations. The results show our method can increase the misalignment rate to over 99% on two datasets with just one forward pass per example. Our study exposes an urgent safety issue that needs to be addressed when aligning LLMs. As an initial attempt, we propose a defense strategy to protect against such attacks, but creating more advanced defenses remains challenging. The code for replicating the method is available at this https URL

The boundary of neural network trainability is fractal

Another argument for Complex NNs. When CVNNs take over, remember where you heard about them. Credit to Lior Sinclair for this find (he’s another great resource for keeping in touch with AI).

Some fractals — for instance those associated with the Mandelbrot and quadratic Julia sets — are computed by iterating a function, and identifying the boundary between hyperparameters for which the resulting series diverges or remains bounded. Neural network training similarly involves iterating an update function (e.g. repeated steps of gradient descent), can result in convergent or divergent behavior, and can be extremely sensitive to small changes in hyperparameters. Motivated by these similarities, we experimentally examine the boundary between neural network hyperparameters that lead to stable and divergent training. We find that this boundary is fractal over more than ten decades of scale in all tested configurations.

Replicability and stability in learning

Have to learn some math to really make assessments, but the first read seems pretty damming.

Replicability is essential in science as it allows us to validate and verify research findings. Impagliazzo, Lei, Pitassi and Sorrell (`22) recently initiated the study of replicability in machine learning. A learning algorithm is replicable if it typically produces the same output when applied on two i.i.d. inputs using the same internal randomness. We study a variant of replicability that does not involve fixing the randomness. An algorithm satisfies this form of replicability if it typically produces the same output when applied on two i.i.d. inputs (without fixing the internal randomness). This variant is called global stability and was introduced by Bun, Livni and Moran (’20) in the context of differential privacy.

Impagliazzo et al. showed how to boost any replicable algorithm so that it produces the same output with probability arbitrarily close to 1. In contrast, we demonstrate that for numerous learning tasks, global stability can only be accomplished weakly, where the same output is produced only with probability bounded away from 1. To overcome this limitation, we introduce the concept of list replicability, which is equivalent to global stability. Moreover, we prove that list replicability can be boosted so that it is achieved with probability arbitrarily close to 1. We also describe basic relations between standard learning-theoretic complexity measures and list replicable numbers. Our results, in addition, imply that besides trivial cases, replicable algorithms (in the sense of Impagliazzo et al.) must be randomized.

The proof of the impossibility result is based on a topological fixed-point theorem. For every algorithm, we are able to locate a “hard input distribution” by applying the Poincaré-Miranda theorem in a related topological setting. The equivalence between global stability and list replicability is algorithmic.