Squid: A Breakthrough On-Device Language Model

In the rapidly evolving landscape of artificial intelligence, the demand for efficient, accurate, and resource-friendly language models has never been higher. Nexa AI rises to this challenge with Squid, a language model designed to process long texts on resource-constrained devices. Squid delivers impressive performance enhancements, including a 10x improvement in energy efficiency, a 5x reduction in processing time, and high accuracy across various tasks. Let’s delve into Squid’s innovative architecture, training process, and the significant impact it holds for the future of on-device AI.

Introduction

  • Growing demand for enhanced privacy, reduced latency, and offline functionality is driving the deployment of language models directly on devices.
  • Mobile devices, with their limited computational resources and battery life, pose significant challenges. Processing long contexts demands substantial memory and computational power, which can quickly drain the battery and slow response times, problems that are particularly acute for real-time applications like voice assistants and interactive chatbots.
  • Squid addresses these issues, maintaining accuracy while reducing the energy footprint of long-context processing on resource-constrained devices.

Squid: A New Approach to Handling Long Texts

Viewing Long Text as a New Type of Data

  • Traditional language models process text linearly, which can be inefficient and resource-intensive, especially when dealing with lengthy documents.
  • Inspired by models that handle both images and text, Squid treats long pieces of text as a distinct kind of data.

Two-Part Model Architecture

Squid’s architecture is a key factor in its efficiency and performance. It consists of two interconnected models that work in tandem to manage and process long texts effectively:

1. Small Model

  • Size: 0.5 billion parameters
  • Role: Compresses long texts into concise representations

The small model’s primary task is to distill long texts into essential summaries. By doing so, it significantly reduces the amount of data the larger model needs to handle, thereby conserving computational resources and energy.

2. Large Model

  • Size: 7 billion parameters
  • Role: Generates accurate responses to the user’s request using the compressed summary from the small model

The large model leverages the concise summaries produced by the small model to deliver precise and contextually aware responses. This division of labor ensures that Squid can handle complex queries and generate detailed outputs without the large model being bogged down by processing vast amounts of text data directly.

Memory Tokens

To facilitate effective communication between the small and large models, Squid introduces memory tokens that capture critical information from the long text. The small model generates these memory tokens, which the large model then uses to understand the context and generate appropriate responses. Memory tokens ensure that essential details are preserved.
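
To make this concrete, here is a minimal sketch of how a memory-token encoder could work, assuming a standard Transformer encoder underneath; the vocabulary size, hidden size, number of memory tokens, and layer count are illustrative assumptions, not Squid’s published configuration.

```python
import torch
import torch.nn as nn

class MemoryTokenEncoder(nn.Module):
    """Sketch of a small model that compresses a long token sequence
    into a fixed number of learned memory tokens. All hyperparameters
    are illustrative assumptions, not Squid's actual values."""

    def __init__(self, vocab_size=32000, hidden=512, n_memory=16,
                 n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        # Learned memory-token embeddings, appended after the text.
        self.memory = nn.Parameter(torch.randn(n_memory, hidden) * 0.02)
        layer = nn.TransformerEncoderLayer(hidden, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.n_memory = n_memory

    def forward(self, input_ids):
        # input_ids: (batch, seq_len) token ids of the long context.
        x = self.embed(input_ids)
        mem = self.memory.unsqueeze(0).expand(x.size(0), -1, -1)
        h = self.encoder(torch.cat([x, mem], dim=1))
        # The hidden states at the memory-token positions are the
        # compressed representation handed to the large model.
        return h[:, -self.n_memory:, :]

encoder = MemoryTokenEncoder()
summary = encoder(torch.randint(0, 32000, (1, 2048)))
print(summary.shape)  # torch.Size([1, 16, 512]): 2,048 tokens -> 16 vectors
```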

Bridging the Models

A pivotal component of Squid’s architecture is the projector, which translates information between the small and large models. The projector ensures that the compressed summaries and memory tokens created by the small model are in a format that the large model can seamlessly interpret and use. This translation layer is crucial for maintaining coherence and accuracy across the two models.
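
A minimal sketch of such a projector follows, assuming a simple two-layer MLP; both dimensions are illustrative placeholders rather than Squid’s actual hidden sizes.

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Sketch of the bridge between the two models: maps memory-token
    states from the small model's hidden space into the large model's
    embedding space. Both dimensions are illustrative assumptions."""

    def __init__(self, small_dim=512, large_dim=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(small_dim, large_dim),
            nn.GELU(),
            nn.Linear(large_dim, large_dim),
        )

    def forward(self, memory_states):
        # memory_states: (batch, n_memory, small_dim) from the small model.
        # Output: (batch, n_memory, large_dim), ready to be prepended to
        # the large model's input embeddings alongside the user's query.
        return self.net(memory_states)
```

Chaining this with the encoder sketched earlier shows the payoff: a long context collapses to a handful of positions in the large model’s input, which is where the compute and energy savings come from.

```python
# Reuses MemoryTokenEncoder from the earlier sketch.
ctx = Projector()(MemoryTokenEncoder()(torch.randint(0, 32000, (1, 2048))))
print(ctx.shape)  # torch.Size([1, 16, 4096])
```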

Training Process of Squid

1. Restoration Training

  • The large language model is trained to reconstruct the original long text from the compressed summaries generated by the small model.
  • Restoration training ensures that the compression process retains all critical information, maintaining the integrity and accuracy of the data.
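
Assuming restoration training works like standard teacher-forced causal language modeling, the objective might look like the sketch below; the function and tensor names are illustrative, not Squid’s actual training code.

```python
import torch
import torch.nn.functional as F

def restoration_loss(logits, original_ids, n_memory):
    """Sketch of a restoration objective. The large model sees
    [projected memory tokens | original text] and is scored with
    next-token cross-entropy on the text positions only.
    logits:       (batch, n_memory + text_len, vocab_size)
    original_ids: (batch, text_len)"""
    # Shift so the state at position t predicts the token at t + 1.
    shift_logits = logits[:, :-1, :]
    # Labels: ignore the memory-token positions, supervise the text.
    ignore = torch.full((original_ids.size(0), n_memory), -100,
                        dtype=original_ids.dtype)
    labels = torch.cat([ignore, original_ids], dim=1)[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )
```

If this loss can be driven low, the memory tokens demonstrably carry enough information to reproduce the source text, which is exactly the guarantee restoration training is meant to provide.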

2. Continual Training

  • The models are trained on text segments to predict the next segment, enhancing their ability to generate coherent and contextually relevant continuations.
  • This is particularly valuable for tasks like story generation or extending conversations, where maintaining a logical flow is essential for a natural user experience.
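
As a sketch of how training pairs for this stage could be assembled, assuming fixed-size segments (the segment length and pairing scheme are illustrative assumptions):

```python
def make_continuation_pairs(token_ids, segment_len=512):
    """Sketch of continual-training data prep: split a long document
    into fixed-size segments and pair each segment (to be compressed
    by the small model) with the segment that follows (to be predicted
    by the large model). Segment length is an illustrative assumption."""
    segments = [token_ids[i:i + segment_len]
                for i in range(0, len(token_ids), segment_len)]
    return list(zip(segments[:-1], segments[1:]))

pairs = make_continuation_pairs(list(range(2048)))
print(len(pairs))  # 3 (context, continuation) pairs from 4 segments
```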

3. Instruction Fine-Tuning

  • The model is fine-tuned on a large dataset of question-answer pairs so that it better understands and responds to user queries.
  • Instruction fine-tuning ensures that Squid provides helpful and relevant responses across various topics, improving its overall utility and reliability.
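
As a final illustration, one fine-tuning example might be laid out as follows, assuming a simple chat template; the markers and field names are hypothetical placeholders, not Squid’s actual format.

```python
def format_instruction_example(question, answer, long_context_ids):
    """Sketch of one instruction-tuning example. The long context is
    routed to the small model for compression, while the question and
    answer are ordinary tokens for the large model. The chat markers
    below are hypothetical placeholders."""
    return {
        "context_ids": long_context_ids,  # compressed into memory tokens
        "prompt": f"<|user|>\n{question}\n<|assistant|>\n",
        "target": f"{answer}<|end|>",     # loss computed on these tokens
    }

example = format_instruction_example(
    "What deadline does the contract set?",
    "Delivery is due within 30 days of signing.",
    long_context_ids=list(range(4096)),  # stands in for a long document
)
```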

Why Squid Matters

  • Powers advanced AI capabilities on phones without depleting the battery.
  • Allows smartwatches and other wearable devices to utilize AI capabilities.
  • Keeps sensitive data private.
  • Works reliably in areas with poor connectivity.
