Top 20 Most Influential AI Research Papers of 2024

Here are the 20 most influential AI papers of 2024:

Mixtral of Experts (January 2024) [paper]

  • This paper describes Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) model.
  • It uses 8 expert networks per layer but only activates 2 experts per token.
  • Mixtral 8x7B outperformed Llama 2 70B and GPT-3.5 on a variety of benchmarks.
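The sparse routing described above can be sketched in a few lines of plain Python. This is a hypothetical toy illustration of top-2 expert routing, not Mixtral's actual implementation (which routes inside every transformer FFN layer with learned gates):

```python
import math

def top2_moe(x, gate_weights, experts):
    """Route input vector x through the 2 highest-scoring of n experts.

    gate_weights: one weight vector per expert (dot product gives the
    gating logit); experts: one callable per expert. Toy sketch only.
    """
    # Gating scores: one logit per expert.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # Keep only the top-2 experts (this is the sparse activation).
    top2 = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    # Softmax over just the selected logits to get mixing weights.
    exps = [math.exp(logits[i]) for i in top2]
    probs = [e / sum(exps) for e in exps]
    # Weighted sum of the two expert outputs.
    outputs = [experts[i](x) for i in top2]
    dim = len(outputs[0])
    return [sum(p * out[d] for p, out in zip(probs, outputs)) for d in range(dim)]
```

Because only 2 of the 8 experts run per token, the active parameter count per forward pass is far smaller than the total parameter count.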

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model (January 2024) [paper]

  • Introduces Vision Mamba (Vim), which applies bidirectional state-space models (SSMs) to computer vision tasks as a more efficient alternative to transformer-based architectures.
  • Achieves competitive performance with linear complexity, making it suitable for real-time applications like robotics and AR/VR systems.
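The linear-complexity claim comes from the recurrent form of an SSM, which can be sketched as a toy scalar scan. Vision Mamba itself uses learned, input-dependent parameters and scans the patch sequence in both directions; the constants below are illustrative:

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Minimal 1-D linear state-space model:
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    Each step touches the state exactly once, so the cost is linear in
    sequence length, unlike attention's quadratic pairwise comparisons.
    """
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x   # state update
        ys.append(c * h)    # readout
    return ys
```

A bidirectional variant, as in Vim, would run a second scan over `reversed(xs)` and combine the two output sequences.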

Genie: Generative Interactive Environments (February 2024) [paper]

  • This paper introduces a generative model capable of creating interactive virtual environments.
  • Developed by Google DeepMind, Genie was trained on over 200,000 hours of gameplay videos.
  • Genie can generate diverse environments that users can explore interactively.

DoRA: Weight-Decomposed Low-Rank Adaptation (February 2024) [paper]

  • This paper extends LoRA, a popular method for parameter-efficient LLM finetuning.
  • DoRA decomposes a pretrained weight matrix into a magnitude vector and a directional matrix.
  • This decomposition lets finetuning adjust direction and magnitude independently, which more closely mirrors the learning behavior of full finetuning.
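The decomposition at the heart of DoRA can be illustrated with a small sketch. The column-wise normalization follows the paper's idea, but this plain-list code is a hypothetical toy, not the reference implementation (which then trains a LoRA-style low-rank delta on the direction part):

```python
import math

def dora_decompose(W):
    """Split weight matrix W into a per-column magnitude vector m and a
    direction matrix V with unit-norm columns, so W = m * V column-wise."""
    n_rows, n_cols = len(W), len(W[0])
    # Column-wise L2 norms become the magnitude vector.
    m = [math.sqrt(sum(W[r][c] ** 2 for r in range(n_rows))) for c in range(n_cols)]
    # Direction matrix: each column normalized to unit length.
    V = [[W[r][c] / m[c] for c in range(n_cols)] for r in range(n_rows)]
    return m, V

def dora_recompose(m, V):
    """Rebuild the weight matrix: scale each unit column of V by m."""
    return [[m[c] * V[r][c] for c in range(len(m))] for r in range(len(V))]
```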

Simple and Scalable Strategies to Continually Pre-train Large Language Models (March 2024) [paper]

  • This 24-page paper reports on numerous experiments on the continued pretraining of LLMs.
  • The paper finds that re-warming and then re-decaying the learning rate improves performance on new data.
  • Mixing a small fraction of the original pretraining data into the new dataset also helps mitigate catastrophic forgetting.
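One way to picture the re-warm/re-decay idea is a linear warmup back to a peak learning rate followed by a cosine decay. The hyperparameter values below are illustrative assumptions, not those of the paper:

```python
import math

def rewarmed_cosine_lr(step, total_steps, warmup_steps=100,
                       peak_lr=3e-4, min_lr=3e-5):
    """Re-warm the learning rate linearly from min_lr to peak_lr,
    then re-decay it with a cosine schedule back down to min_lr."""
    if step < warmup_steps:
        # Linear re-warming phase.
        return min_lr + (peak_lr - min_lr) * step / warmup_steps
    # Cosine re-decay phase.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```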

Gemma: Open Models Based on Gemini Research and Technology (March 2024) [paper]

  • This paper presents Gemma, a family of open models from Google available in 2 billion and 7 billion parameter sizes.
  • The models outperform similarly sized models in almost 70% of the investigated language tasks.

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study (April 2024) [paper]

  • This paper compares Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO) for aligning LLMs with human preferences.
  • The authors concluded that PPO tends to outperform DPO.
  • DPO's performance degrades notably when there is a distribution shift between the preference data and the model's outputs.
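For reference, the per-pair DPO objective the study compares against PPO can be written directly from log-probabilities, with no reward model or rollouts:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta=0.1):
    """DPO loss for one preference pair: maximize the margin between the
    policy's and reference model's log-probability gap on the chosen vs.
    rejected response. beta controls the strength of the KL-like penalty."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```

When the policy matches the reference model, the margin is zero and the loss is log 2; a larger margin in favor of the chosen response lowers the loss.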

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (April 2024) [paper]

  • This paper, which won the Best Paper Award at NeurIPS 2024, presents a new approach to image generation called Visual AutoRegressive (VAR) modeling.
  • VAR modeling predicts images in stages, ranging from coarse to fine resolutions, resulting in more efficient training and enhanced performance.
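The coarse-to-fine idea can be sketched as a resolution schedule: instead of predicting the next token, each step predicts the whole token map at the next resolution, conditioned on all coarser maps. The power-of-two doubling below is an illustrative assumption rather than the paper's exact scale list:

```python
def next_scale_schedule(final_res, base_res=1):
    """Build the sequence of token-map resolutions for next-scale
    prediction, from a single coarse token up to the final grid."""
    res, scales = base_res, []
    while res <= final_res:
        scales.append((res, res))  # predict a res x res token map at this step
        res *= 2
    return scales
```

Each autoregressive step emits an entire grid rather than one token, which is why training and sampling are far cheaper than token-by-token raster-order generation.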

Vision Transformers Need Registers (April 2024) [paper]

  • This paper won an Outstanding Paper Award at the International Conference on Learning Representations (ICLR 2024).
  • The authors add learnable “register tokens” to the input sequence of vision transformers to improve performance.
  • The register tokens absorb the high-norm outlier tokens that otherwise appear in low-information areas of an image, such as the background.
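A minimal sketch of the mechanism, assuming a list-of-vectors token representation (zeros stand in for the learned register embeddings):

```python
def with_registers(patch_tokens, num_registers=4, dim=8):
    """Prepend register tokens to a ViT's patch-token sequence. They give
    the model dedicated slots for global scratch computation, so outlier
    values are no longer dumped into background patch tokens."""
    registers = [[0.0] * dim for _ in range(num_registers)]  # learned in practice
    return registers + patch_tokens

def strip_registers(tokens, num_registers=4):
    """After the transformer runs, the registers are simply discarded;
    only the patch tokens (the image representation) are kept."""
    return tokens[num_registers:]
```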

KAN: Kolmogorov-Arnold Networks (April 2024) [paper]

  • Replaces the fixed node activations of multilayer perceptrons with learnable activation functions on edges, inspired by the Kolmogorov-Arnold representation theorem.
  • Offers strong accuracy and interpretability on small-scale tasks such as function fitting, symbolic regression, and scientific discovery.
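A toy sketch of the idea, using piecewise-linear functions as a stand-in for the B-spline activations KANs place on each edge:

```python
def piecewise_linear(x, knots, values):
    """A learnable 1-D function parameterized by its values at fixed knots;
    in a real KAN this would be a trainable B-spline."""
    if x <= knots[0]:
        return values[0]
    if x >= knots[-1]:
        return values[-1]
    for k in range(len(knots) - 1):
        if knots[k] <= x <= knots[k + 1]:
            t = (x - knots[k]) / (knots[k + 1] - knots[k])
            return (1 - t) * values[k] + t * values[k + 1]

def kan_layer(xs, edge_values, knots):
    """One KAN layer: every (input, output) edge applies its own learnable
    1-D function, and each output node simply sums its incoming edges.
    edge_values[j][i] holds the function parameters for edge i -> j."""
    return [sum(piecewise_linear(x, knots, edge_values[j][i])
                for i, x in enumerate(xs))
            for j in range(len(edge_values))]
```

The contrast with an MLP: here the nonlinearity lives on the edges and is trained, while the nodes only sum; an MLP has fixed nonlinearities on nodes and trains only the linear edge weights.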

Why Larger Language Models Do In-context Learning Differently? (May 2024) [paper]

  • This highly cited study found that small language models (SLMs) are more robust to noise than larger language models (LLMs).
  • SLMs are “less easily distracted” because they focus on a narrower range of hidden features.

Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3 (May 2024) [paper]

  • This paper introduces the new AlphaFold 3 (AF3) model.
  • This model can predict the joint structures of various biomolecular complexes.
  • AF3 outperforms previous specialized tools in predicting protein-ligand and protein-nucleic acid interactions.

LoRA Learns Less and Forgets Less (May 2024) [paper]

  • This empirical study compares LoRA with full finetuning for LLMs.
  • LoRA learns less than full finetuning, especially on target domains such as code and math.
  • LoRA consistently forgets less than full finetuning.
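For context, LoRA's low-rank update can be sketched in plain Python: only the small matrices A and B are trained while W stays frozen, which is the mechanism behind both "learns less" and "forgets less." Toy matrix math, not an optimized implementation:

```python
def lora_forward(x, W, A, B, alpha=16, r=2):
    """LoRA forward pass: y = (W + (alpha / r) * B @ A) @ x,
    where W is the frozen d_out x d_in pretrained weight,
    A is r x d_in and B is d_out x r (the only trained matrices)."""
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    y = []
    for o in range(d_out):
        # Row o of the low-rank update B @ A.
        delta_row = [sum(B[o][k] * A[k][i] for k in range(r)) for i in range(d_in)]
        # Apply the effective weight (frozen W plus scaled update) to x.
        y.append(sum((W[o][i] + scale * delta_row[i]) * x[i] for i in range(d_in)))
    return y
```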

The Llama 3 Herd of Models (July 2024) [paper]

  • This paper introduces Meta’s Llama 3 family of models, whose flagship is a 405B-parameter multilingual language model.
  • The paper also reports experiments integrating multimodal capabilities, enabling competitive performance on speech, image, and video recognition tasks.
  • The release includes Llama Guard 3 for input and output safety moderation.

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (August 2024) [paper]

  • This paper introduces the phi-3 series, which includes phi-3-mini, phi-3-small, and phi-3-medium.
  • The phi-3-mini is a 3.8 billion parameter language model trained on 3.3 trillion tokens.
  • Despite its compact size, phi-3-mini can be deployed on devices like smartphones.

Qwen2 Technical Report (September 2024) [paper]

  • This paper introduces the Qwen2 series, including models from 0.5 to 72 billion parameters.
  • The series outperforms its predecessor, Qwen1.5, as well as many other open-weight models.
  • Qwen2 supports approximately 30 languages.

Movie Gen: A Cast of Media Foundation Models (October 2024) [paper]

  • This paper introduces foundation models for generating high-quality videos with audio.
  • The models excel in text-to-video synthesis, video personalization, video editing, and text-to-audio generation.
  • The largest model can generate 16-second videos at 16 frames per second.

Byte Latent Transformer: Patches Scale Better Than Tokens (December 2024) [paper]

  • This paper introduces the Byte Latent Transformer (BLT), which matches the performance of tokenization-based LLMs while improving inference efficiency.
  • BLT encodes data into dynamically sized patches, allocating more computation where complexity increases.
  • BLT models are trained on raw bytes instead of fixed vocabularies.
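A crude sketch of dynamic patching, using byte-difference "surprise" as a stand-in for the small learned entropy model BLT actually uses to place patch boundaries:

```python
def entropy_patches(data: bytes, threshold=2):
    """Group raw bytes into variable-length patches, starting a new patch
    when the next byte is 'surprising' (here: differs from the previous
    byte by more than a threshold). Predictable runs form long, cheap
    patches; complex regions get many short patches and thus more compute."""
    if not data:
        return []
    patches, current = [], [data[0]]
    for prev, cur in zip(data, data[1:]):
        if abs(cur - prev) > threshold:   # high "surprise": start a new patch
            patches.append(bytes(current))
            current = [cur]
        else:                             # low surprise: extend current patch
            current.append(cur)
    patches.append(bytes(current))
    return patches
```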

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory-Efficient, and Long Context Fine-Tuning and Inference (December 2024) [paper]

  • This paper introduces ModernBERT, which improves the BERT architecture.
  • Trained on 2 trillion tokens, ModernBERT achieves state-of-the-art results across a wide range of evaluations.
  • ModernBERT offers superior efficiency for inference on common GPUs.

DeepSeek-V3 Technical Report (December 2024) [paper]

  • This paper introduces DeepSeek-V3, a 671B-parameter Mixture-of-Experts (MoE) model that activates 37B parameters per token.
  • DeepSeek-V3 utilizes Multi-head Latent Attention (MLA) and DeepSeekMoE architectures.
  • DeepSeek-V3 outperforms other open-source models and rivals leading closed-source models.
