Interesting Content in AI, Software, Business, and Tech- 08/07/2024
Content to help you keep up with Machine Learning, Deep Learning, Data Science, Software Engineering, Finance, Business, and more
A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc. I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered) throughout the week. These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are interesting readings/viewings for 08/07/2024. If you missed last week’s readings, you can find them here.
Reminder- We started an AI Made Simple Subreddit. Come join us over here- https://www.reddit.com/r/AIMadeSimple/. If you’d like to stay on top of community events and updates, join the discord for our cult here: https://discord.com/invite/EgrVtXSjYf. Lastly, if you’d like to get involved in our many fun discussions, you should join the Substack Group Chat Over here.
Community Spotlight: Emergent Garden
Emergent Garden puts out very interesting videos on Life simulations, neural networks, cellular automata, and other emergent programs. They’re more “interesting” and less “informational” than a lot of the other sources I share, but I often find myself wanting to learn much more about an idea after watching one of EG’s videos. It’s a pretty good way to engage with a lot of the more advanced ideas in AI without having to deal with the load of learning 50 new theorems and ideas. Also, EG covers Evolutionary Algos a lot more than other AI creators, so I feel a sense of kinship with him.
If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments or reach out to me directly. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, some personal challenge you’re working on, ask me to promote your company/product, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in our chocolate milk cult. No costs/obligations are attached.
Previews
Curious about what articles I’m working on? Here are the previews for the next planned articles-
The Economics of ESports
Here’s a teaser. See if you can guess the topic (FT is fine-tuning).
Highly Recommended
These are pieces that I feel are particularly well done. If you don’t have much time, make sure you at least catch these works.
LLM Paper Reading Notes — August 2024
Every month, Jean David Ruvini posts his notes on LLM/NLP-related papers, and every month I share his notes here. JD has been doing cutting-edge NLP at scale for a while, so his insights are very valuable. Given how many people want to stay cutting edge in NLP- and how hard it is to know what to focus on- domain-specific sources like JD’s are a godsend-
Sharing short notes (from myself and others) about LLM research papers I came across in July. These notes differ in their level of detail and precision. I hope they’re still useful in piquing your curiosity and helping you breathe under the waterfall. At the current pace of AI, it takes the power of all of us to keep up.
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
A lot of you have seen discussions of this paper, so this is your reminder to read it.
Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and LC across various public datasets using three latest LLMs. Results reveal that when resourced sufficiently, LC consistently outperforms RAG in terms of average performance. However, RAG’s significantly lower cost remains a distinct advantage. Based on this observation, we propose Self-Route, a simple yet effective method that routes queries to RAG or LC based on model self-reflection. Self-Route significantly reduces the computation cost while maintaining a comparable performance to LC. Our findings provide a guideline for long-context applications of LLMs using RAG and LC.
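To make the routing idea concrete, here is a minimal sketch of what a Self-Route-style decision could look like at inference time. This is my own illustration rather than the paper’s code, and the ask_llm callable is an assumed stand-in for whatever LLM client you actually use:

```python
# Minimal sketch of the Self-Route idea: try the cheap RAG path first; if the model
# says the retrieved chunks aren't enough, fall back to the expensive long-context path.
# `ask_llm` is an assumed callable (prompt -> answer string), not part of the paper.

def self_route(question: str, retrieved_chunks: list[str], full_context: str, ask_llm) -> str:
    rag_prompt = (
        "Answer the question using only the provided chunks. "
        "If they are insufficient, reply exactly 'unanswerable'.\n\n"
        + "\n\n".join(retrieved_chunks)
        + f"\n\nQuestion: {question}"
    )
    rag_answer = ask_llm(rag_prompt)

    if "unanswerable" not in rag_answer.lower():
        return rag_answer  # cheap path: the retrieved chunks were enough

    # Expensive path: give the model the entire long context.
    return ask_llm(f"{full_context}\n\nQuestion: {question}")
```

The savings come from how often the cheap branch succeeds- per the abstract, this keeps performance comparable to pure long-context while significantly cutting compute.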
How to Use Benchmarks to Build Successful Machine Learning Systems
I’ve discussed Goodhart’s Law and how any sufficiently powerful system significantly changes its environment (and why this makes our way of training AI incompatible with AGI). Logan Thorneloe takes a slightly different approach, explaining how overfitting to benchmarks leads to system issues. Logan is at his best exploring the intersection of ML and Software Engineering, and this is one such case.
Tl;dr: Software engineers building applications using machine learning need to test models in real-world scenarios before choosing which model performs best. Benchmarks are good preliminary measures but don’t reflect the complexities of real-world scenarios.
How GitHub uses GitHub Actions and Actions larger runners to build and test GitHub.com [Technique Tuesdays]
A piece by yours truly for our sister publication, Tech Made Simple, on GitHub Actions and how they are used to speed up workflows. This is the kind of engineering that often gets overlooked in discussions of AI and its utility, which I think is a bummer b/c there’s so much really awesome shit happening all over. It would be nice if people could stop being so tribal about every development and just marvel at the cool things being built.
Recently, I came across a very interesting piece called “How GitHub uses GitHub Actions and Actions larger runners to build and test GitHub.com”, an overview of using GitHub Actions for CI/CD (learn more about what it is and how it enables smoothness and collaboration across large, diverse teams here). It’s always good to study different software engineering tools to see how we can improve our own work experiences-
we run 15,000 CI jobs within an hour across 150,000 cores of compute
This article will be my overview + analysis of that piece, looking at how GitHub achieves speed, efficiency, and reliability at massive scale. To follow along, it’s helpful to first understand GitHub Actions and Action Runners.
If you like this article, please consider becoming a premium subscriber to my newsletter AI Made Simple so I can spend more time researching and sharing information on truly important topics. We have a pay-what-you-can model, which lets you support my efforts to bring high-quality technical Education to everyone for less than the price of a cup of coffee.
I provide various consulting and advisory services. If you’d like to explore how we can work together, reach out to me through any of my socials over here or reply to this email.
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
As mentioned, I’m going to do a piece on RAG soon, so I’m doing a lot of research on the topic. This was an interesting find. I’ll have to dig into it more, but if the paper says what I think it’s saying- it will change the way I do RAG.
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG). In this work, we propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG. In particular, the instruction-tuned LLMs work surprisingly well by adding a small fraction of ranking data into the training blend, and outperform existing expert ranking models, including the same LLM exclusively fine-tuned on a large amount of ranking data. For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-sourced model with the state-of-the-art performance on RAG benchmarks. Specifically, our Llama3-RankRAG significantly outperforms Llama3-ChatQA-1.5 and GPT-4 models on nine knowledge-intensive benchmarks. In addition, it also performs comparably to GPT-4 on five RAG benchmarks in the biomedical domain without instruction fine-tuning on biomedical data, demonstrating its superb capability for generalization to new domains.
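For intuition on what unifying ranking and generation buys you, here is a rough sketch of a retrieve-rerank-generate loop where the same model scores contexts and then answers. It’s an illustration of the pattern, not the paper’s implementation (RankRAG’s contribution is instruction-tuning a single LLM so it does both jobs well); ask_llm and the scoring prompt are assumptions:

```python
# Rough sketch of a retrieve -> rerank -> generate loop where the *same* LLM scores
# context relevance and then answers- the pattern RankRAG instruction-tunes one model for.
# `ask_llm` is a placeholder callable (prompt -> string); the scoring prompt is illustrative.

def rank_then_generate(question: str, candidate_contexts: list[str], ask_llm, top_n: int = 5) -> str:
    def relevance(ctx: str) -> float:
        prompt = (
            f"Question: {question}\nContext: {ctx}\n"
            "Rate how relevant this context is to the question from 0 to 10. Reply with a number only."
        )
        reply = ask_llm(prompt).strip()
        try:
            return float(reply.split()[0])
        except (ValueError, IndexError):
            return 0.0  # treat unparseable replies as irrelevant

    # Keep only the contexts the model itself ranks highest.
    best = sorted(candidate_contexts, key=relevance, reverse=True)[:top_n]

    answer_prompt = "\n\n".join(best) + f"\n\nQuestion: {question}\nAnswer:"
    return ask_llm(answer_prompt)
```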
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Another interesting example of the benefits of good data decisions. The quantification of how far vocab sizes have lagged behind model sizes is pretty interesting, and I wonder what other performance gains we’re leaving on the table when we rush to scale model size.
Research on scaling large language models (LLMs) has primarily focused on model parameters and training data size, overlooking the role of vocabulary size. We investigate how vocabulary size impacts LLM scaling laws by training models ranging from 33M to 3B parameters on up to 500B characters with various vocabulary configurations. We propose three complementary approaches for predicting the compute-optimal vocabulary size: IsoFLOPs analysis, derivative estimation, and parametric fit of the loss function. Our approaches converge on the same result that the optimal vocabulary size depends on the available compute budget and that larger models deserve larger vocabularies. However, most LLMs use too small vocabulary sizes. For example, we predict that the optimal vocabulary size of Llama2-70B should have been at least 216K, 7 times larger than its vocabulary of 32K. We validate our predictions empirically by training models with 3B parameters across different FLOPs budgets. Adopting our predicted optimal vocabulary size consistently improves downstream performance over commonly used vocabulary sizes. By increasing the vocabulary size from the conventional 32K to 43K, we improve performance on ARC-Challenge from 29.1 to 32.0 with the same 2.3e21 FLOPs. Our work emphasizes the necessity of jointly considering model parameters and vocabulary size for efficient scaling.
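Since the headline example is Llama2-70B’s 32K vocabulary versus a predicted optimum of at least 216K, here is a quick back-of-the-envelope sketch (mine, not the paper’s method) of what those vocabulary sizes actually cost in parameters:

```python
# Back-of-the-envelope arithmetic (my illustration, not the paper's method): the input
# embedding and output unembedding matrices cost roughly 2 * vocab_size * d_model
# parameters (assuming no weight tying).

def vocab_param_share(vocab_size: int, d_model: int, total_params: float) -> float:
    return (2 * vocab_size * d_model) / total_params

# Llama2-70B-ish numbers: hidden size 8192, ~70B total parameters.
for vocab in (32_000, 216_000):
    print(f"vocab={vocab:>7,}: ~{vocab_param_share(vocab, 8192, 70e9):.2%} of parameters")

# Prints roughly 0.75% at 32K and ~5% at 216K- even the much larger vocabulary the
# authors recommend stays a modest slice of a 70B model's parameter budget.
```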
Confabulation: The Surprising Value of Large Language Model Hallucinations
I’ve been saying this for a while, but hallucinations are not the devil people paint them to be. Yes, they are a problem if you can’t plan for them, but what we call hallucinations are just a natural by-product of the way autoregressive models work. This paper presents an interesting inversion of the way we think about hallucinations (note this doesn’t mean that you don’t need to account for hallucinations in your product, just that they’re a risk of doing business and should be treated accordingly).
This paper presents a systematic defense of large language model (LLM) hallucinations or ‘confabulations’ as a potential resource instead of a categorically negative pitfall. The standard view is that confabulations are inherently problematic and AI research should eliminate this flaw. In this paper, we argue and empirically demonstrate that measurable semantic characteristics of LLM confabulations mirror a human propensity to utilize increased narrativity as a cognitive resource for sense-making and communication. In other words, it has potential value. Specifically, we analyze popular hallucination benchmarks and reveal that hallucinated outputs display increased levels of narrativity and semantic coherence relative to veridical outputs. This finding reveals a tension in our usually dismissive understandings of confabulation. It suggests, counter-intuitively, that the tendency for LLMs to confabulate may be intimately associated with a positive capacity for coherent narrative-text generation.
An Update on Cloud Markets and AI Value Creation
If it seems like I’ve been fan-boying Eric Flaningam recently, it’s because it’s true. Super glad I found his newsletter, b/c it’s my favorite for understanding the business/investor side of things. His articles + ModernMBA’s YouTube deep dives are a must for any technical person who wants to better understand the money side of the industry.
I like to provide a quarterly update on the hyperscalers as they give us the best gauge on technology markets as a whole. We get data across infra, cloud, and applications. For those interested in AI adoption, they also give us the best insight into AI adoption (Capex, AI cloud revenue, and AI app adoption).
For background info, I published a primer on the cloud here, providing a breakdown of its history, technology, and markets.
This article will be structured as follows:
- Background on the Hyperscalers’ AI Strategy
- The Capex Story
- Market Share Data
- Azure Quarterly Update
- AWS Quarterly Update
- GCP Quarterly Update
Turkey’s comeback, Russia’s overheating economy & more
Another economics-related resource I absolutely love is Joeri Schasfoort’s videos for Money and Macro. This video is a great way to stay in touch with important economic developments and debates happening around the world.
I haven’t read as much Alejandro Piad Morffis as I should have, and that’s totally on me. I found this article in my bookmarks, and it is a masterpiece. Do yourself a favor and subscribe to his newsletter because it will make your journey into Computer Science and AI so much easier. This piece is a great introduction to Graphs and some of the core algorithms that drive everything else.
The Paltry Economics of Esports
I’m not super into Gaming and Esports, so this video was an eye-opener for me on so many levels. Learning about business models is one of my favorite things to do, and it has done a lot of good for me.
Other Good Content
The Meme that gave me Imposter Syndrome
An interesting overview of type attributes in iOS development.
The best way to understand type attributes is by checking how they’re implemented in the Swift source code.
Ha, just kidding, I’ll explain them here.
Type attributes provide the compiler with additional information about a type. When used in a function, they enforce constraints on the type they are applied to — in many cases, this type is a closure: () -> Void.
3 Hours to 3 Minutes: How Mobile reCell Is Importing Customer Data 60x Faster
Mobile reCell is the pioneer of software-driven recovery for corporate-owned IT assets, such as laptops, tablets, and smartphones from employees. Using Mobile reCell’s platform, customers — like a leading US airline — can initiate workflows like replacing the iPads in every cockpit of their fleet.
Mobile reCell’s enterprise customers manage the hardware recovery of tens of thousands of devices. On a daily basis, each of their customers initiates hundreds of workflows to request employee device returns, send return kits or QR codes, and more through their platform.
Why Airbnb moved away from a monolithic architecture
In 2018, Airbnb began its migration to a service-oriented architecture, as the Ruby on Rails “monorail” started becoming hard to maintain and was a single point of failure.
The main difference between SOA and microservices has to do with the architecture scope.
While Airbnb didn’t necessarily need to move to a SOA, they chose to as it made sense for their organizational needs.
In a recent 2023 talk, they outlined four lessons:
- Invest in shared infrastructure early
- Simplify service dependencies
- Centralize data hydration (fetching and transformation)
- Separate UI logic from backend logic
In this article, I summarize the talk and make architecture comparisons with other large tech companies, like Meta, Google, and Uber.
If you liked this article and wish to share it, please refer to the following guidelines.
I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, please consider becoming a paid subscriber to this newsletter. Doing so helps me put more effort into writing/research, reach more people, and supports my crippling chocolate milk addiction. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly.
PS- We follow a “pay what you can” model, which allows you to support within your means. Check out this post for more details and to find a plan that works for you.
I regularly share mini-updates on what I read on the Microblogging sites X(https://twitter.com/Machine01776819), Threads(https://www.threads.net/@iseethings404), and TikTok(https://www.tiktok.com/@devansh_ai_made_simple)- so follow me there if you’re interested in keeping up with my learnings.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- https://artificialintelligencemadesimple.substack.com/
My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819