Thursday, October 30, 2014

Vanishing component analysis: Speculating about a vanishing ideal

Last year at ICML 2013, Ron Livni, et al. [1] presented a paper called Vanishing Component Analysis. This introduced me to this task of finding the vanishing ideal of a set. Formally, given a set of points $S = \left\{ x_1, \ldots, x_m \right \} \subset \textbf{R}^d$, the vanishing ideal $I(S)$ of $S$ is the set of $d$-variate polynomials that vanish on $S$ (that is, for any polynomial $f \in I(S)$, $f(x_i) = 0 \forall x_i \in S$). An ideal $I$  is a set of polynomials which is a sub-group with respect to addition (closed under addition, negation and contains the zero polynomial) and it "absorbs multiplication" (that is, if $f \in I$ and $g$ is a $d$-variate finite degree polynomial over reals, then $fg \in I$, where $f \cdot g(x) = f(x) \cdot g(x)$). Another way of talking about $g$ is that $g$ belongs to polynomial ring in $d$ variables over $\mathbb{R}$. Ron Livni, et al., 2013 [1] proposed a way of finding the set of generators for the vanishing ideal $I(S)$ of any set $S$. Given an ideal $I$, the set of generators is a set of polynomials $\left \{ f_1, \ldots, f_k \right \} \subset I $ such that, $\forall f \in I$, $\exists g_1, \ldots, g_k$ in the $d$-variate polynomial ring and $f = \sum_i g_i \cdot f_i$. Let us call this set of generators (of the vanishing ideal) of $S$ $F(S)$. So $F(S)$ is kind of the "lowest degree" subset of the vanishing ideal. 

This interests me for several reasons (just that some mathematics like this exists and can be done is exciting in itself), but I would like to note two specific things here:

  • the usability and the interpretability of the (set of generators of the) vanishing ideal of a set (which is what I talk about in the following blob of text)
  • the computational challenges for the proposed algorithms for large data sets (which I wish to get to in a future post)


Naively thinking about the set of the generators of the vanishing ideal $I(S)$, it seems to me that higher the degree of the polynomials in the set of generators, more "non-homogeneous" the set $S$ is. I am sure there is some proper interpretation of the degrees of the generators in algebraic geometry. So anything I say after this is pure speculation (mostly originating from my lack of knowledge in algebraic geometry) ...

Think of the situation where $d = 1$ so $S \subset \mathbb{R}$. So if we think of the points in $S$ lying on the $x$-axis, the vanishing ideal $I(S)$ of $S$ is the set of polynomials which intersects with the $x$-axis at every $x_i \in S$. More irregular and weird (for a lack of more mathematical terms) the points in $S$, the higher the degree of the polynomial(s) in $F(S)$ (or subsequently in $I(S)$) since higher degree polynomials can (almost) fit any arbitrary set of points on the $x$-axis. I am hoping that such a behaviour transfers (probably with the help of complicated mathematics I am yet to comprehend) to higher dimensional sets ($d > 1$). 

Given the set of generators of the vanishing ideal $F(S)$, we can define a notion of complexity of a data set $S$ which roughly corresponds to the degrees of the polynomials resulting in $F(S)$.  For example, the complexity $\mathcal{C}(S)$ of the set $S$ can be defined (pardon my lack of foresight here) as: 
$$ \mathcal{C}(S) = \sum_{f \in F(S)} \mbox{ max-degree of a monomial in } f.$$
Then this roughly captures the following fact: more complex/non-homogeneous $S$, higher the degrees of the polynomials in $F(S)$, hence higher $\mathcal{C}(S)$. I am sure there is a well defined notion of some such complexity already in literature. Another interesting characteristic of $F(S)$ would be the dimension wise statistic, such as:
$$\mathcal{C}(S, d) = \sum_{f \in F(S)} \mbox{ max degree of the } d^{\mbox{th}} \mbox{ variable over all monomials in } f.$$ 
The hope (and I emphasize on hope) is that this statistic indicates the level of complexity added to the data set by this dimension. 

Note. In both the definition of $\mathcal{C}(S)$ and $\mathcal{C}(S, d)$ the sum and the max operator can be interchanged to get some notion of complexity (if there is no established notion already). 

If the above happens to be somewhat true, it is very neat because it gives an actual number which corresponds to the level of complexity/non-homogeneity infused into the whole data set by a particular dimension -- a variable importance of sorts, which could be a tool for feature selection (if and when required). Related interesting (to me) questions also arise:

  1. What kinds of polynomials $f \in F(S)$ and numbers $\mathcal{C}(S, d)$ should we expect if all the dimensions are independent? Do we see polynomials which depends only on a single dimension at a time?
  2. What happens when some of the dimensions are correlated? For example, if the dimension $i$ and $j$ are correlated, would be see a polynomial of the form $f = x[i] - x[j]^c + constant$, where $c$ is an integer?
I will stop at this for now, but this mathematical problem generated a lot of questions in my mind which I thought I should document lest I forget. I have a feeling that most of my questions/ideas would be very easily answered/refuted/validated if I open a book on algebraic geometry (any recommendations for a novice?) ... 


[1] R. Livni, D. Lehavi, S. Schein, H. Nachlieli, S. Shalev-Schwartz & A. Globerson, Vanishing Component Analysis, ICML 2013