ARTIFICIAL INTELLIGENCE (48) – Computer Vision (4) Regular Fine-Tuning and LoRA Fine-Tuning

Regular Fine‑Tuning

In standard fine‑tuning, the model starts from pretrained weights W. During training, it learns a full weight update ΔW, and the effective weight matrix becomes: Wnew = W + ΔW.

  • W (pretrained weights): large matrix of dimension d×d.
  • ΔW (weight update): same size as W.
  • Both W and ΔW contribute directly to the outputs.
  • All (or most) parameters are updated during training.

Consequences

Regular fine‑tuning offers high flexibility, but it is very memory‑intensive, slow to train, and requires storing a full copy of the model per task. For large language models, ΔW can contain billions of parameters, making regular fine‑tuning expensive and impractical in many settings.
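The full update can be sketched in a few lines of NumPy. The dimension below is illustrative, not from the source; the point is that ΔW has exactly as many trainable entries as W itself.

```python
import numpy as np

d = 512                                    # illustrative hidden size
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))                # pretrained weights, d x d
delta_W = rng.normal(size=(d, d)) * 0.01   # full learned update, same size as W

W_new = W + delta_W                        # effective weights after fine-tuning

# Every entry of W is eligible to change: the update has d*d parameters.
print(delta_W.size)                        # 262144 trainable values for d = 512
```

Even at this toy size the update already has a quarter of a million entries; at LLM scale (d in the thousands, across many layers) this is what makes full fine‑tuning expensive.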

LoRA

LoRA (Low‑Rank Adaptation) is based on a key observation: The task‑specific weight update ΔW often lies in a low‑dimensional subspace. Instead of learning the full ΔW, LoRA approximates it using two much smaller matrices.

Weight Update in LoRA

Instead of learning ΔW directly, LoRA parameterizes it as:

ΔW≈BA

Where:

  • A is a matrix of shape r×d
  • B is a matrix of shape d×r
  • r is the rank, and r≪d

The pretrained weights W remain frozen.

Effective weight during inference

Weffective=W+BA

This is functionally similar to regular fine‑tuning, but vastly cheaper.
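A minimal NumPy sketch of this parameterization, with illustrative dimensions (initializing B to zero so the model starts out identical to the pretrained one is a common LoRA convention):

```python
import numpy as np

d, r = 512, 8                  # r << d
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))    # frozen pretrained weights
A = rng.normal(size=(r, d))    # trainable, shape r x d
B = np.zeros((d, r))           # trainable, shape d x r (often zero-initialized)

W_effective = W + B @ A        # shape (d, d), same role as W + delta_W

print(W_effective.shape)       # (512, 512)
print(A.size + B.size)         # 8192 trainable values, versus 262144 for full d x d
```

Because B starts at zero, B @ A is initially the zero matrix, so training begins exactly at the pretrained behavior and gradually learns the task‑specific adjustment.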

Pretrained weights (W)

  • Large matrix.
  • Frozen during LoRA training.
  • Represents general language knowledge.

LoRA matrices (A and B)

  • Small, trainable matrices.
  • Capture task‑specific adaptation.
  • Together approximate the full update ΔW.

Rank r

  • A hyperparameter.
  • Controls the trade‑off between:
      • Expressiveness.
      • Parameter efficiency.

Typical values: 4, 8, 16, 32.
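The trade‑off is easy to see numerically: the adapter's parameter count grows linearly with r. A quick sketch, with an assumed hidden size:

```python
d = 1024                       # illustrative hidden size, not from the source
for r in (4, 8, 16, 32):       # the typical rank values listed above
    # Each adapted d x d matrix adds one r x d and one d x r matrix.
    print(r, 2 * d * r)        # trainable parameters per adapted matrix
```

Doubling r doubles the adapter size (and its expressiveness budget), while even r = 32 stays far below the d*d = 1,048,576 parameters of a full update at this size.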

[Image © Shaoni Mukherjee]

Why LoRA Is So Much More Efficient

Parameter count comparison

Let original weight size be d×d:

  • Regular fine‑tuning:

    d2 trainable parameters

  • LoRA fine‑tuning:

    2dr trainable parameters

Since r≪d, LoRA reduces parameters by orders of magnitude.
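Plugging in values typical of a large model makes the gap concrete (d = 4096 is an assumed, representative hidden size):

```python
d, r = 4096, 8                 # assumed LLM-scale hidden size and a typical rank

full = d * d                   # regular fine-tuning: one full d x d update
lora = 2 * d * r               # LoRA: one r x d matrix plus one d x r matrix

print(full)                    # 16777216
print(lora)                    # 65536
print(full // lora)            # 256x fewer trainable parameters
```

And this ratio, d / (2r), only improves as models get larger while r stays small.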

Practical Benefits of LoRA

  • Massive memory savings.
  • Faster training.
  • Multiple adapters per base model.
  • Easy to switch or combine task behaviors.
  • Ideal for large LLMs.

This is why LoRA is widely used for:

  • Domain adaptation.
  • Instruction tuning.
  • Personalization.
  • On‑device or low‑resource training.
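The "multiple adapters per base model" benefit can be sketched as follows: one frozen weight matrix is shared, and each task contributes only a small (A, B) pair. The task names and dimensions here are hypothetical, and the adapter is applied without merging it into W.

```python
import numpy as np

d, r = 256, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))            # one frozen base weight, shared by all tasks

# Hypothetical per-task adapters: each is just a small (A, B) pair.
adapters = {
    "summarize": (rng.normal(size=(r, d)), rng.normal(size=(d, r)) * 0.01),
    "translate": (rng.normal(size=(r, d)), rng.normal(size=(d, r)) * 0.01),
}

def forward(x, task):
    A, B = adapters[task]
    # x @ W.T plus the low-rank correction, equivalent to x @ (W + B @ A).T
    return x @ W.T + (x @ A.T) @ B.T

x = rng.normal(size=(1, d))
y1 = forward(x, "summarize")
y2 = forward(x, "translate")
print(y1.shape)                        # (1, 256)
```

Switching tasks means swapping a few kilobytes of adapter weights, not reloading a new multi‑gigabyte model.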

Comparison

| Aspect               | Regular Fine‑Tuning | LoRA             |
|----------------------|---------------------|------------------|
| Pretrained weights   | Updated             | Frozen           |
| Trainable parameters | Full ΔW             | Low‑rank A and B |
| Memory usage         | Very high           | Very low         |
| Training stability   | Medium              | High             |
| Task modularity      | Low                 | High             |
| Deployment cost      | High                | Low              |

The difference is:

  • Regular Fine-Tuning
    The model rewrites its knowledge for each new task.
  • LoRA
    The model keeps its original knowledge and learns a small adjustment layer that nudges behavior in the right direction.

You can think of LoRA as:

“Adding a small, task‑specific steering mechanism rather than rebuilding the entire engine.”

In short:

  • Regular fine‑tuning updates all model weights, which is expensive.
  • LoRA freezes pretrained weights and learns a low‑rank update.
  • ΔW is approximated as B × A.
  • The rank r determines efficiency vs expressiveness.
  • LoRA enables scalable, modular, and cost‑effective fine‑tuning.

This is why LoRA has become a standard technique for adapting large language models.

 

References:

LoRA: Low-Rank Adaptation of Large Language Models Explained


Creative Commons License © Yolanda Muriel. Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)
