Interesting Content in AI, Software, Business, and Tech- 7/5/2023

Content to help you keep up with Machine Learning, Deep Learning, Data Science, Software Engineering, Finance, Business, and more

Devansh
5 min read · Jul 6, 2023

A lot of people reach out to me for reading recommendations. I figured I’d start sharing whatever AI papers/publications, interesting books, videos, etc. I came across each week. Some will be technical, others not really. I will add whatever content I found really informative (and remembered throughout the week). These won’t always be the most recent publications- just the ones I’m paying attention to this week. Without further ado, here are the interesting readings/viewings for 7/5/2023. If you missed last week’s readings, you can find them here.

Join 35K+ tech leaders and get insights on the most important ideas in AI straight to your inbox through my free newsletter- AI Made Simple

Reader Spotlight- Jean-David Ruvini

Large Language Models (LLMs) have the potential to revolutionize the way we engage with textual information. In a short but exciting video (https://youtu.be/uhZkG7lCVRc), Jean-David showcases how LLMs can enable users to zoom in and out of text, much like we commonly do with images. This approach also facilitates dynamic alterations such as replacing “backgrounds”, removing or adding objects, and much more.

The code (about a hundred lines) leverages ChatGPT, LangChain, and Gradio, and can be found at https://github.com/jean-david-ruvini/textzoom.
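I haven’t reproduced Jean-David’s code here, but the core trick is easy to sketch: ask an LLM to compress or expand a passage depending on the “zoom level”. The snippet below is my own minimal illustration (the zoom function, prompts, and scaling percentages are invented for the example), assuming the 2023-era LangChain chat interface and an OpenAI API key in the environment.

```python
# A minimal sketch of the "text zoom" idea -- NOT Jean-David's actual code.
# Assumes the 2023-era LangChain chat API and OPENAI_API_KEY set in the environment.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

def zoom(text: str, level: int) -> str:
    """Zoom out (level < 0) compresses the text; zoom in (level > 0) expands it."""
    if level < 0:
        instruction = f"Summarize the following text to roughly {100 // (2 ** -level)}% of its length:"
    elif level > 0:
        instruction = f"Rewrite the following text with roughly {2 ** level}x more detail, staying faithful to it:"
    else:
        return text
    reply = llm([HumanMessage(content=f"{instruction}\n\n{text}")])
    return reply.content

# Example: zoom(article_text, -2) gives a heavily "zoomed out" summary of the article.
```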

If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments or reach out to me directly. There are no rules- you could talk about a paper you’ve written, an interesting project you’ve worked on, a personal challenge you’re tackling, your content platform, or anything else you consider important. The goal is to get to know you better, and possibly connect you with interesting people in the community. No costs/obligations attached.

AI Papers/Writeups

D3: An Automated System to Detect Data Drifts

How Uber detects data drift. From the introduction (written by the company):

Data powers almost all critical, customer-facing flows at Uber. Bad data quality impacts our ML models, leading to a bad user experience (incorrect fares, ETAs, products, etc.) and revenue loss.

Still, many data issues are manually detected by users weeks or even months after they start. Data regressions are hard to catch because the most impactful ones are generally silent. They do not impact metrics and ML models in an obvious way until someone notices something is off, which finally unearths the data issue. But by that time, bad decisions are already made, and ML models have already underperformed.

This makes it critical to monitor data quality thoroughly so that issues are caught proactively.
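The paper covers how D3 does this at Uber scale. As a much smaller illustration of the underlying idea (this is not Uber’s system; the function name and threshold are my own), here is a sketch of flagging drift in a single numeric column by comparing today’s values against a trusted reference window with a two-sample Kolmogorov-Smirnov test:

```python
# Generic illustration of column-level drift detection, not Uber's D3.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, current: np.ndarray, alpha: float = 0.01) -> bool:
    """Flag drift when a two-sample KS test rejects 'same distribution' at level alpha."""
    statistic, p_value = ks_2samp(reference, current)
    return p_value < alpha

# Example with synthetic fares: the current batch has silently shifted upward.
rng = np.random.default_rng(0)
reference_fares = rng.normal(loc=12.0, scale=3.0, size=10_000)
current_fares = rng.normal(loc=15.0, scale=3.0, size=10_000)
print(detect_drift(reference_fares, current_fares))  # True -> raise an alert
```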

Deep learning in a bilateral brain with hemispheric specialization

Thank you to William Lambos for this exceptional recommendation.

Chandramouli Rajagopalan, David Rawlinson, Elkhonon Goldberg, Gideon Kowadlo

The brains of all bilaterally symmetric animals on Earth are divided into left and right hemispheres. The anatomy and functionality of the hemispheres have a large degree of overlap, but there are asymmetries and they specialize to possess different attributes. Several studies have used computational models to mimic hemispheric asymmetries with a focus on reproducing human data on semantic and visual processing tasks. In this study, we aimed to understand how dual hemispheres could interact in a given task. We propose a bilateral artificial neural network that imitates a lateralization observed in nature: that the left hemisphere specializes in specificity and the right in generalities. We used two ResNet-9 convolutional neural networks with different training objectives and tested the combined network on an image classification task. Our analysis found that the hemispheres represent complementary features that are exploited by a network head which implements a type of weighted attention. The bilateral architecture outperformed a range of baselines of similar representational capacity that don’t exploit differential specialization. The results demonstrate the efficacy of bilateralism, contribute to an understanding of bilateralism in biological brains, and show that the architecture can serve as an inductive bias when designing new AI systems.
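For intuition, here is my own rough sketch of the bilateral idea, not the authors’ implementation (it substitutes torchvision’s ResNet-18 for their ResNet-9, and the gating head is a guess at “a type of weighted attention”): two backbones trained with different objectives, whose features are blended by a small learned gate before a shared classification head.

```python
# Rough sketch of a bilateral network: two "hemispheres" fused by a learned gate.
import torch
import torch.nn as nn
from torchvision.models import resnet18  # stand-in for the paper's ResNet-9

class BilateralNet(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 512):
        super().__init__()
        self.left = resnet18(num_classes=feat_dim)    # e.g. pretrained on fine-grained labels
        self.right = resnet18(num_classes=feat_dim)   # e.g. pretrained on coarse labels
        self.gate = nn.Sequential(nn.Linear(2 * feat_dim, 2), nn.Softmax(dim=-1))
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        l, r = self.left(x), self.right(x)
        w = self.gate(torch.cat([l, r], dim=-1))   # per-sample hemisphere weights
        fused = w[:, :1] * l + w[:, 1:] * r        # weighted blend of the two views
        return self.head(fused)

logits = BilateralNet(num_classes=10)(torch.randn(4, 3, 32, 32))
```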

A Survey on Multimodal Large Language Models

Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen

Multimodal Large Language Models (MLLMs) have recently emerged as a rising research hotspot, using powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLM, such as writing stories based on images and OCR-free math reasoning, are rare in traditional methods, suggesting a potential path to artificial general intelligence. In this paper, we aim to trace and summarize the recent progress of MLLM. First of all, we present the formulation of MLLM and delineate its related concepts. Then, we discuss the key techniques and applications, including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning (M-ICL), Multimodal Chain of Thought (M-CoT), and LLM-Aided Visual Reasoning (LAVR). Finally, we discuss existing challenges and point out promising research directions. In light of the fact that the era of MLLM has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub link collecting the latest papers is available at this https URL.
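As a concrete (and entirely hypothetical) example of the multimodal instruction tuning (M-IT) formulation the survey covers: a single training sample typically pairs a visual input with an instruction and a target response. The exact schema varies across the papers surveyed.

```python
# Hypothetical shape of one multimodal instruction-tuning (M-IT) sample.
mit_sample = {
    "image": "receipt_001.png",  # visual input (path or image tensor)
    "instruction": "Read the total amount on this receipt and explain how you found it.",
    "response": "The total is $42.17; it appears on the last line, after the tax entry.",
}
```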

Hyena Hierarchy: Towards Larger Convolutional Language Models

Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré

Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting the amount of context accessible. Existing subquadratic methods based on low-rank and sparse approximations need to be combined with dense attention layers to match Transformers, indicating a gap in capability. In this work, we propose Hyena, a subquadratic drop-in replacement for attention constructed by interleaving implicitly parametrized long convolutions and data-controlled gating. In recall and reasoning tasks on sequences of thousands to hundreds of thousands of tokens, Hyena improves accuracy by more than 50 points over operators relying on state-spaces and other implicit and explicit methods, matching attention-based models. We set a new state-of-the-art for dense-attention-free architectures on language modeling in standard datasets (WikiText103 and The Pile), reaching Transformer quality with a 20% reduction in training compute required at sequence length 2K. Hyena operators are twice as fast as highly optimized attention at sequence length 8K, and 100x faster at sequence length 64K.
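To make the “subquadratic” claim concrete, here is a toy sketch (mine, not the paper’s operator) of the two ingredients Hyena interleaves: a long convolution computed with FFTs in O(L log L) rather than attention’s O(L²), and element-wise data-controlled gating. The random filter below just stands in for Hyena’s implicitly parametrized filters.

```python
# Toy illustration of FFT-based long convolution plus data-controlled gating.
import torch

def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Linear convolution of a (batch, length) signal with a length-L filter via FFT."""
    L = u.shape[-1]
    U = torch.fft.rfft(u, n=2 * L)   # zero-pad to avoid circular wrap-around
    K = torch.fft.rfft(k, n=2 * L)
    return torch.fft.irfft(U * K, n=2 * L)[..., :L]

def hyena_like_block(x: torch.Tensor, k: torch.Tensor, gate_proj: torch.nn.Linear) -> torch.Tensor:
    gate = torch.sigmoid(gate_proj(x.unsqueeze(-1))).squeeze(-1)  # data-controlled gate
    return gate * fft_long_conv(x, k)

L = 1024
x = torch.randn(2, L)
k = torch.randn(L) / L  # placeholder for an implicitly parametrized long filter
out = hyena_like_block(x, k, torch.nn.Linear(1, 1))
```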

Cool Vids-

Logarithmic nature of the brain 💡| Artem Kirsanov

Random walks in 2D and 3D are fundamentally different (Markov chains approach)

How Shopify’s engineering improved database writes by 50% with ULID

System Design: Why is single-threaded Redis so fast?

I’ll catch y’all with more of these next week. In the meantime, if you’d like to find me, here are my social links-

Reach out to me

Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.

Small Snippets about Tech, AI and Machine Learning over here

AI Newsletter- https://artificialintelligencemadesimple.substack.com/

My grandma’s favorite Tech Newsletter- https://codinginterviewsmadesimple.substack.com/

Check out my other articles on Medium: https://rb.gy/zn1aiu

My YouTube: https://rb.gy/88iwdd

Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y

My Instagram: https://rb.gy/gmvuy9

My Twitter: https://twitter.com/Machine01776819
