Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

blog

VITA: Variational Inference Transformer for Asymmetric Data

A decoder-free variational pretraining framework for time-series forecasting under feature asymmetry. VITA learns latent representations that bridge the gap between rich training features and limited deployment features, using a transformer encoder with a seasonal prior. Applied to agricultural yield forecasting, VITA achieves R² = 0.726 on extreme weather years, training in <2.5 hours on a single GPU.

Linear Regression as a System of Linear Equations

A systematic treatment of linear regression from the perspective of solving linear systems. We characterize when exact solutions exist based on rank conditions and derive the closed-form solution $(X^T X)^{-1} X^T y$. In the regression setting with noise, this formula emerges as both the least-squares minimizer and the maximum likelihood estimator.
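As an illustrative sketch (not code from the post), the closed-form solution can be checked numerically on a small synthetic, noise-free system, where the recovered weights should match the generating ones exactly:

```python
# Hedged sketch: least squares via the normal equations on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))           # 20 samples, 3 features (full column rank a.s.)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                         # noise-free, so the linear system is consistent

# Closed-form solution (X^T X)^{-1} X^T y, solved without forming the inverse
w = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(w, true_w)
```

Using `np.linalg.solve` rather than explicitly inverting $X^T X$ is the standard numerically stabler choice.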

Schema Search MCP Server

Natural language search for relational schemas with millisecond latency.

Geometric Interpretation of the Cauchy–Schwarz Inequality

How Pythagoras and positive definiteness geometrically prove the Cauchy–Schwarz inequality. The inequality simply states that projecting a vector onto another cannot increase its length. This geometric perspective identifies the Cauchy–Schwarz ratio with the cosine of the angle between vectors, connecting it to the law of cosines and Pearson correlation.
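The projection statement is easy to verify numerically; the following sketch (with arbitrary illustrative vectors, not taken from the post) checks both the projection form and the usual inner-product form:

```python
# Projecting u onto v never increases length -- the geometric content
# of Cauchy-Schwarz. Vectors here are arbitrary illustrative choices.
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, 2.0])
proj = (u @ v) / (v @ v) * v                      # projection of u onto v
assert np.linalg.norm(proj) <= np.linalg.norm(u)  # ||proj_v(u)|| <= ||u||

# Equivalent inner-product form: |<u, v>| <= ||u|| ||v||
assert abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v)
```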

When Does No Correlation Imply Independence?

Zero correlation does not always imply independence. However, if two random variables have exponentially decaying tails and all their mixed polynomial covariances vanish, then they must be independent. The proof uses moment generating functions to extend polynomial factorization to the full distribution.
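The first claim has a classic counterexample, sketched below with Monte Carlo (illustrative, not from the post): for symmetric $X$, $Y = X^2$ is uncorrelated with $X$ yet completely determined by it:

```python
# Zero correlation without independence: Y = X^2 for symmetric X.
# Cov(X, X^2) = E[X^3] = 0, yet Y is a deterministic function of X.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200_000)
y = x ** 2
corr = np.corrcoef(x, y)[0, 1]
print(corr)  # close to zero despite full dependence
```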

Why Does Normal Distribution Show Up in Central Limit Theorem?

The Central Limit Theorem states that sums of i.i.d. random variables with finite mean and variance converge in distribution to a Gaussian. But why the normal distribution specifically? This post builds intuition by showing that the CLT is a direct consequence of the exponential limit $(1 + x/n)^n \to e^x$.
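The exponential limit itself can be checked numerically in a few lines; this sketch (with an arbitrary illustrative value of $x$) shows the gap shrinking as $n$ grows:

```python
# Numeric check of (1 + x/n)^n -> e^x as n grows; x = 0.7 is illustrative.
import math

x = 0.7
for n in (10, 1_000, 100_000):
    approx = (1 + x / n) ** n
    print(n, approx, math.exp(x))  # the two columns agree to more digits as n grows
```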

Adam and AdamW: How They Work and When They Fail

Adam is an adaptive optimizer that rescales gradients coordinate-wise and maintains momentum, addressing two major problems with Stochastic Gradient Descent. This post discusses why it is so effective and under what circumstances it can fail.
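The two mechanisms (momentum and coordinate-wise rescaling) can be seen in a minimal single-step sketch of the standard Adam update, assuming the usual default hyperparameters; this is an illustration, not code from the post:

```python
# Minimal sketch of one Adam update on a parameter vector w with gradient g.
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step: momentum (m) plus per-coordinate gradient rescaling (v)."""
    m = b1 * m + (1 - b1) * g            # first moment: exponential moving average (momentum)
    v = b2 * v + (1 - b2) * g ** 2       # second moment: per-coordinate scale
    m_hat = m / (1 - b1 ** t)            # bias correction for zero initialization
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# At t = 1 the bias-corrected step reduces to roughly -lr * sign(g) per coordinate.
w, m, v = adam_step(np.zeros(2), np.array([1.0, -2.0]), np.zeros(2), np.zeros(2), t=1)
```

Note the step size is nearly invariant to gradient magnitude at the first step, which is one way to see the coordinate-wise rescaling at work.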

portfolio

publications

talks

teaching