viva_glyph/codebook
Codebook - Learned vocabulary for vector quantization
A codebook is a set of prototype vectors (centroids). Each vector in latent space is mapped to its nearest centroid.
Theory
Based on Vector Quantization (VQ) from signal processing:
- K centroids partition the space into Voronoi cells
- Encoding: find nearest centroid index
- Decoding: lookup centroid by index
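The encode and decode steps listed above can be sketched in a few lines of Gleam. This is a minimal illustration, not the module's implementation: the function names `squared_distance`, `encode` and `decode` are hypothetical, and squared Euclidean distance is an assumed metric, since the distance used by this module is not stated here.

```gleam
import gleam/io
import gleam/list
import gleam/string

/// Squared Euclidean distance (assumed metric, for illustration only).
/// list.zip stops at the shorter list, so vectors should have equal length.
pub fn squared_distance(a: List(Float), b: List(Float)) -> Float {
  list.zip(a, b)
  |> list.fold(0.0, fn(acc, pair) {
    let #(x, y) = pair
    let d = x -. y
    acc +. d *. d
  })
}

/// Encoding: index of the centroid nearest to `input`.
/// Returns Error(Nil) when there are no centroids.
pub fn encode(
  centroids: List(List(Float)),
  input: List(Float),
) -> Result(Int, Nil) {
  case centroids {
    [] -> Error(Nil)
    [first, ..rest] ->
      Ok(scan(rest, input, 1, 0, squared_distance(first, input)))
  }
}

// Walk the remaining centroids, keeping the best index seen so far.
fn scan(
  remaining: List(List(Float)),
  input: List(Float),
  index: Int,
  best_index: Int,
  best_distance: Float,
) -> Int {
  case remaining {
    [] -> best_index
    [centroid, ..rest] -> {
      let d = squared_distance(centroid, input)
      case d <. best_distance {
        True -> scan(rest, input, index + 1, index, d)
        False -> scan(rest, input, index + 1, best_index, best_distance)
      }
    }
  }
}

/// Decoding: look up the centroid stored at `index`.
pub fn decode(
  centroids: List(List(Float)),
  index: Int,
) -> Result(List(Float), Nil) {
  case index < 0 {
    True -> Error(Nil)
    False -> centroids |> list.drop(index) |> list.first
  }
}

pub fn main() {
  let centroids = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
  // Squared distances are roughly 2.25, 0.05 and 11.05, so index 1 wins.
  io.println(string.inspect(encode(centroids, [0.9, 1.2])))
}
```

Each input falls into the Voronoi cell of exactly one centroid, so encoding reduces a vector to a single integer and decoding replaces that integer with the cell's prototype.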
References
- Gray (1984). Vector Quantization. IEEE ASSP Magazine.
- van den Oord et al. (2017). Neural Discrete Representation Learning (VQ-VAE). NeurIPS.
Types
Codebook: collection of prototype vectors
pub type Codebook {
  Codebook(
    centroids: List(List(Float)),
    dimension: Int,
    size: Int,
  )
}
Constructors
- Codebook(centroids: List(List(Float)), dimension: Int, size: Int)
Arguments
- centroids: Prototype vectors (centroids)
- dimension: Dimension of each vector
- size: Number of centroids (vocabulary size)
Result of quantization: index and reconstruction error
pub type QuantizeResult {
  QuantizeResult(index: Int, error: Float)
}
Constructors
- QuantizeResult(index: Int, error: Float)
Arguments
- index: Index of nearest centroid
- error: Distance to nearest centroid (quantization error)
Values
pub fn dequantize(codebook: Codebook, index: Int) -> List(Float)
Dequantize: get centroid vector from index
pub fn from_vectors(
  vectors: List(List(Float)),
) -> option.Option(Codebook)
Create codebook from list of vectors
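A hedged usage sketch for `from_vectors`, based only on the signature above. The example vectors are made up, and the conditions under which `None` is returned (an empty list or vectors of unequal length would be plausible candidates) are an assumption, not documented behavior.

```gleam
import gleam/int
import gleam/io
import gleam/option.{None, Some}
import viva_glyph/codebook

pub fn main() {
  // Illustrative 2-dimensional prototype vectors.
  let vectors = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]

  // The result is an Option, so both cases must be handled.
  case codebook.from_vectors(vectors) {
    Some(book) ->
      io.println("codebook size: " <> int.to_string(book.size))
    None ->
      io.println("could not build a codebook from these vectors")
  }
}
```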
pub fn get(
  codebook: Codebook,
  index: Int,
) -> option.Option(List(Float))
Get centroid by index
pub fn init_deterministic(
  dimension: Int,
  size: Int,
  seed: Int,
) -> Codebook
Initialize a codebook with pseudo-random values derived from seed. The generator is deterministic, so results are reproducible.
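The reproducibility described above can be checked with a small sketch. The dimension, size and seed values are arbitrary choices for illustration, and arguments are passed positionally in the order shown in the signature.

```gleam
import gleam/io
import viva_glyph/codebook

pub fn main() {
  // Arbitrary example values: 4-dimensional vectors, 16 centroids, seed 42.
  let a = codebook.init_deterministic(4, 16, 42)
  let b = codebook.init_deterministic(4, 16, 42)

  // Gleam's == is structural, so this compares every centroid value.
  case a == b {
    True -> io.println("same seed, same codebook")
    False -> io.println("codebooks differ")
  }
}
```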
pub fn quantize(
  codebook: Codebook,
  input: List(Float),
) -> QuantizeResult
Find the nearest centroid to the input vector. Returns its index and the quantization error.
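A possible round trip through `quantize` and `dequantize`, sketched from the signatures in this module. The vectors are invented for illustration. If the module measures Euclidean distance (an assumption), the input below should map to the second centroid, but the printed values are not guaranteed here.

```gleam
import gleam/float
import gleam/int
import gleam/io
import gleam/option.{None, Some}
import gleam/string
import viva_glyph/codebook

pub fn main() {
  case codebook.from_vectors([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]) {
    None -> io.println("could not build codebook")
    Some(book) -> {
      // Encode: nearest centroid index plus the distance to it.
      let result = codebook.quantize(book, [0.9, 1.2])
      // Decode: recover the prototype vector for that index.
      let reconstruction = codebook.dequantize(book, result.index)

      io.println("index: " <> int.to_string(result.index))
      io.println("error: " <> float.to_string(result.error))
      io.println("reconstruction: " <> string.inspect(reconstruction))
    }
  }
}
```

Storing only `result.index` is what makes the codebook a compression scheme: the reconstruction is the centroid, and `result.error` reports how far the original input was from it.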