Jul 7 – 15, 2016
SISSA main building
Europe/Rome timezone

Minimal Complexity Extreme Learning Machines

Not scheduled
20m
Meeting room (7th floor) (SISSA main building)

via Bonomea 265, 34136, Trieste, Italy
Student

Speaker

Mr Sumit Soman (PhD Candidate)

Description

Learning sparse representations and minimizing model complexity have gained much interest recently. Parsimonious models are expected to generalize well, are easier to implement, and lead to shorter test times. The recently proposed Minimal Complexity Machine (MCM) showed that, for training data $X = \lbrace (x_i, y_i) \mid x_i \in \mathbb{R}^n, y_i \in \mathbb{R}, i = 1, 2, ..., M \rbrace$, minimizing $h^2$, where
\begin{equation}
h = \frac{\max_{i = 1, 2, ..., M} |u^T x_i + v|}{\min_{i = 1, 2, ..., M} |u^T x_i + v|},
\end{equation}
leads to a hyperplane classifier $u^T x + v = 0$ with a small VC dimension. This task was shown to be equivalent to
\begin{equation}
\min_{w, b, h, q} \; h + C \cdot \sum_{i = 1}^{M} q_i
\end{equation}
subject to
\begin{equation}
h \geq y_i \cdot [w^T x_i + b] + q_i, \quad i = 1, 2, ..., M,
\end{equation}
\begin{equation}
y_i \cdot [w^T x_i + b] + q_i \geq 1, \quad i = 1, 2, ..., M,
\end{equation}
\begin{equation}
q_i \geq 0, \quad i = 1, 2, ..., M.
\end{equation}

Models such as the Extreme Learning Machine (ELM) and the Random Vector Functional Link Network (RVFLN) have been adapted to a number of applications and offer several advantages. Typically, the ELM solves
\begin{equation}
\min_{\beta, \xi} \; \frac{1}{2} \|\beta\|^2 + \frac{1}{2} C \sum_{i = 1}^{M} \xi_i^2
\end{equation}
subject to
\begin{equation}
h(x_i) \beta = y_i - \xi_i, \quad i = 1, 2, ..., M.
\end{equation}
The last layer of the ELM network conventionally involves the computation of a pseudo-inverse: the output weights $\beta$ are obtained by solving $H \beta = Y$, where $H$ is the hidden layer output matrix with entries $H_{ij} = g(w_j \cdot x_i + b_j)$ for $i = 1, 2, ..., M$ and $j = 1, 2, ..., \hat{n}$, $\beta_j = [\beta_{j1}, \beta_{j2}, ..., \beta_{jm}]^T$ is the weight vector connecting the $j^{th}$ hidden node to the output nodes, $w_j = [w_{j1}, w_{j2}, ..., w_{jn}]^T$ is the weight vector connecting the input nodes to the $j^{th}$ hidden node, and $Y$ is the vector of targets $y_i$.

We propose combining the ELM with the MCM. This allows us to build classifiers and regressors with lower complexity in terms of the VC dimension, and induces sparsity in the connections between the neurons of the final layer of the network. This has been shown not only to improve generalization, but also to produce sparser networks that are closer to models of human cognition. Numerical stability issues associated with the calculation of the pseudo-inverse are also avoided.
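
For illustration only, the conventional ELM training step described above can be written in a few lines of Python with NumPy. In the sketch below the names elm_train, elm_predict and n_hidden, the choice of a tanh activation, and the Gaussian initialisation are illustrative assumptions, not taken from the abstract; the hidden layer parameters are drawn at random and the output weights are recovered by solving $H\beta = Y$ with a pseudo-inverse.

import numpy as np

def elm_train(X, Y, n_hidden=50, seed=0):
    # Hypothetical helper: standard ELM training via the pseudo-inverse.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input-to-hidden weights w_j
    b = rng.normal(size=n_hidden)                 # random hidden biases b_j
    H = np.tanh(X @ W + b)                        # hidden layer output matrix, H_ij = g(w_j . x_i + b_j)
    beta = np.linalg.pinv(H) @ Y                  # output weights solving H beta = Y
    return W, b, beta

def elm_predict(X, W, b, beta):
    # Apply the same random hidden layer, then the learned output weights.
    return np.tanh(X @ W + b) @ beta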
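
The proposed MCM-ELM combination is not specified in full detail in this abstract, but the linear program given above suggests one plausible reading: apply the MCM formulation to the hidden-layer features $h(x_i)$ in place of the raw inputs, so that the LP determines the output weights $\beta$ instead of the pseudo-inverse. The sketch below follows that assumption for binary classification with labels $y_i \in \{-1, +1\}$, using scipy.optimize.linprog; mcm_output_weights is a hypothetical helper name, and the authors' actual formulation may differ.

import numpy as np
from scipy.optimize import linprog

def mcm_output_weights(H, y, C=1.0):
    # Hypothetical sketch: solve the MCM linear program over the hidden-layer
    # features H (shape M x n_h, e.g. the matrix built in the ELM sketch above)
    # for labels y in {-1, +1}.
    # Decision variables, in order: beta (n_h), bias (1), h (1), slacks q (M).
    M, n_h = H.shape
    c = np.concatenate([np.zeros(n_h + 1), [1.0], C * np.ones(M)])  # minimise h + C * sum(q)

    yH = y[:, None] * H
    I = np.eye(M)
    # y_i (beta^T h(x_i) + bias) + q_i - h <= 0   (i.e. h >= y_i (...) + q_i)
    A1 = np.hstack([yH, y[:, None], -np.ones((M, 1)), I])
    # -y_i (beta^T h(x_i) + bias) - q_i <= -1     (i.e. y_i (...) + q_i >= 1)
    A2 = np.hstack([-yH, -y[:, None], np.zeros((M, 1)), -I])
    A_ub = np.vstack([A1, A2])
    b_ub = np.concatenate([np.zeros(M), -np.ones(M)])

    # beta and bias are free; h and the slacks q are non-negative.
    bounds = [(None, None)] * (n_h + 1) + [(0, None)] + [(0, None)] * M
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    beta, bias = res.x[:n_h], res.x[n_h]
    return beta, bias

Entries of the returned beta that come out at or near zero correspond to hidden neurons that can be disconnected from the output node, which is the kind of sparsity in the final-layer connections that the abstract refers to.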

Primary authors

Prof. Jayadeva (Department of Electrical Engineering, Indian Institute of Technology, Delhi, India); Mr Sumit Soman (PhD Candidate)
