Refs:

Optimization refers to the case where we have variables \(x_1, \ldots, x_n\) that we can assign values and we want to minimize or maximize a certain objective function \(f(x_1, \ldots, x_n)\)

Unconstrained optimization

In this case there is no restriction for the values of \(x_i\).

A typical solution is to compute the gradient vector of the objective function [\(\delta f/\delta x_1, \ldots, \delta f/\delta x_n\)] and set it to [\(0, \ldots, 0\)]. Solve this equation and output the result \(x_1, \ldots, x_n\) which will give the local maximum.

Eg in R to find the minimum of function \(f(x_1,x_2) = (x_1-5)^2 + (x_2-6)^2\):

f <- function(x) { (x[1] - 5)^2 + (x[2] - 6)^2 }
initial_x <- c(10, 11)
x_optimal <- optim(initial_x, f, method="CG") # performs minimization
x_min <- x_optimal$par
x_min
## [1] 5 6

Using simulated annealing to find the minimum of a wild function with a global minimum at about -15.81515:

fw <- function (x) { 10*sin(0.3*x)*sin(1.3*x^2) + 0.00001*x^4 + 0.2*x+80 }
plot(fw, -50, 50, n = 1000, main = "optim() minimising 'wild function'")
abline(v=-15.81515,lty=3,col="red")

res <- optim(50, fw, method = "SANN", control = list(maxit = 20000, temp = 20, parscale = 20))
res$par
## [1] -15.81478

Function optim() has lots of options. Check its help file.

Equality constraint optimization

Now \(x_1, \ldots, x_n\) are not independent in some particular way: + \(g_1(x_1, \ldots, x_n) = 0\) + \(\ldots\) + \(g_k(x_1, \ldots, x_n) = 0\)

This can be solved by Linear Programming (check below)

Another way is to transform the objective function into \[f^*(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_k) = f(x_1, \ldots, x_n) + \lambda_1 g_1(x_1, \ldots, x_n) + \ldots + \lambda_k g_k(x_1, \ldots, x_n)\]

making it an unconstrained optimization problem using Lagrange multipliers and to solve it for

\[[\delta f/\delta x_1, \ldots, \delta f/\delta x_n, \delta f/\delta \lambda_1, \ldots, \delta f/\delta \lambda_k] = [0, \ldots, 0]\]

Inequality constraint optimization

We cannot use the Lagrange multiplier technique because it requires equality constraint. There is no general solution for arbitrary inequality constraints.

However, we can put some restriction in the form of constraint. In the following, if we restrict the constraints and the objective function to be linear functions of the variables then the problem can be solved again by Linear Programming

Linear Programming

Linear Programming (LP) works when the objective function is a linear function. The constraint functions are also linear combination of variables.

The first part of the next code sets this problem:

\[ \left\{ \begin{array}{rl} 6c_1 + 2c_2 + 4c_3 & \leq 150 \\ c_1 + c_2 + 6c_3 & \geq 0 \\ 4c_1 + 5c_2 + 4c_3 & = 40 \end{array} \right. \]

with the following objective function:

\[minimize: -3c_1 -4c_2 -3c_3\]

The result should be the following:

\[c_1 = 0 \wedge c_2 = 8 \wedge c_3 = 0\]

library(lpSolveAPI)
## Warning: package 'lpSolveAPI' was built under R version 3.1.2
lps.model <- make.lp(0, 3) # define 3 variables, the constraints are added below
add.constraint(lps.model, c(6,2,4), "<=", 150)
add.constraint(lps.model, c(1,1,6), ">=",   0)
add.constraint(lps.model, c(4,5,4), "=" ,  40)
# set objective function (default: find minimum)
set.objfn(lps.model, c(-3,-4,-3))  
# write model to a file
write.lp(lps.model,'model.lp',type='lp')

# these commands defines the model 
# /* Objective function */
#   min: -3 C1 -4 C2 -3 C3;
# 
# /* Constraints */
# +6 C1 +2 C2 +4 C3 <= 150;
# +  C1 +  C2 +6 C3 >=   0;
# +4 C1 +5 C2 +4 C3  =  40;
#
# writing it in the text file named 'model.lp'
solve(lps.model)
## [1] 0
# Retrieve the var values from a solved linear program model 
get.variables(lps.model)  # check with the solution above!
## [1] 0 8 0
# another eg
lps.model2 <- make.lp(0, 3)
add.constraint(lps.model2, c(1, 2, 3), "<=", 14)
add.constraint(lps.model2, c(3,-1,-6), ">=",  0)
add.constraint(lps.model2, c(1,-1, 0), "<=",  2)
set.objfn(lps.model2, c(3,4), indices = c(1,2)) # does not use C3
lp.control(lps.model2,sense='max')     # changes to max: 3 C1 + 4 C2 
## $anti.degen
## [1] "fixedvars" "stalling" 
## 
## $basis.crash
## [1] "none"
## 
## $bb.depthlimit
## [1] -50
## 
## $bb.floorfirst
## [1] "automatic"
## 
## $bb.rule
## [1] "pseudononint" "greedy"       "dynamic"      "rcostfixing" 
## 
## $break.at.first
## [1] FALSE
## 
## $break.at.value
## [1] 1e+30
## 
## $epsilon
##       epsb       epsd      epsel     epsint epsperturb   epspivot 
##      1e-10      1e-09      1e-12      1e-07      1e-05      2e-07 
## 
## $improve
## [1] "dualfeas" "thetagap"
## 
## $infinite
## [1] 1e+30
## 
## $maxpivot
## [1] 250
## 
## $mip.gap
## absolute relative 
##    1e-11    1e-11 
## 
## $negrange
## [1] -1e+06
## 
## $obj.in.basis
## [1] TRUE
## 
## $pivoting
## [1] "devex"    "adaptive"
## 
## $presolve
## [1] "none"
## 
## $scalelimit
## [1] 5
## 
## $scaling
## [1] "geometric"   "equilibrate" "integers"   
## 
## $sense
## [1] "maximize"
## 
## $simplextype
## NULL
## 
## $timeout
## [1] 0
## 
## $verbose
## [1] "neutral"
write.lp(lps.model2,'model2.lp',type='lp')
solve(lps.model2)
## [1] 0
get.variables(lps.model2)
## [1] 6 4 0

It is possible to restrict the type of values, namely to integers which makes it a ILP (Integer Linear Programming), binary/boolean values (BLP) or even mixed types, known as Mixed Integer Liner Programming (MILP).

Some egs:

lps.model <- make.lp(0, 3)
add.constraint(lps.model, c(6,2,4), "<=", 150)
add.constraint(lps.model, c(1,1,6), ">=", 0)
add.constraint(lps.model, c(4,5,4), "=", 40)
set.objfn(lps.model, c(-3,-4,-3))

set.type(lps.model, 2, "binary")
set.type(lps.model, 3, "integer")
get.type(lps.model) # This is Mixed Integer Linear Programming (MILP)
## [1] "real"    "integer" "integer"
set.bounds(lps.model, lower=-5, upper=5, columns=c(1))

# give names to columns and restrictions
dimnames(lps.model) <- list(c("R1","R2","R3"), c("C1","C2","C3")) 

print(lps.model)
## Model name: 
##             C1    C2    C3         
## Minimize    -3    -4    -3         
## R1           6     2     4  <=  150
## R2           1     1     6  >=    0
## R3           4     5     4   =   40
## Kind       Std   Std   Std         
## Type      Real   Int   Int         
## Upper        5     1   Inf         
## Lower       -5     0     0
solve(lps.model)
## [1] 0
get.objective(lps.model)
## [1] -30.25
get.variables(lps.model)
## [1] 4.75 1.00 4.00
get.constraints(lps.model)
## [1] 46.50 29.75 40.00
lps.model <- make.lp(0, 3)
add.constraint(lps.model, c(1,2,4), "<=", 5)
add.constraint(lps.model, c(1,1,6), ">=", 2)
add.constraint(lps.model, c(1,1,1), "=",  2)
set.objfn(lps.model, c(2,1,2))

set.type(lps.model, 1, "binary")
set.type(lps.model, 2, "binary")
set.type(lps.model, 3, "binary")

print(lps.model)
## Model name: 
##            C1   C2   C3       
## Minimize    2    1    2       
## R1          1    2    4  <=  5
## R2          1    1    6  >=  2
## R3          1    1    1   =  2
## Kind      Std  Std  Std       
## Type      Int  Int  Int       
## Upper       1    1    1       
## Lower       0    0    0
solve(lps.model)
## [1] 0
get.variables(lps.model)
## [1] 1 1 0

Quadratic Programming

Quadratic Programming (QP) works when the objective function is a quadratic function, ie, contains up to two ter products. Here the constraint functions are still linear combination of variables.

We can express the problem in matrix form.

Minize objective: \[\frac{1}{2} X^T D X - d^T X\] where \(X\) is the vector \([x_1,\ldots,x_n]^T\), \(D\) is the matrix of weights of each par \(x_ix_j\) and \(d\) are the weights for each \(x_i\). The \(\frac{1}{2}\) comes from the fact that \(D\) is simmetric and so, each \(x_ix_j\) is counted twice.

with constraints: \[A^T X [ = | \geq ]~b\], where the first \(k\) operators are equality, the others are \(\geq\) and \(b\) the values the constraints should be equal to.

An eg of a QP objective function: \[f(x_1, x_2, x_3) = 2.x_1^2 - x_1x_2 - + 2 x_2^2 + x_2x_3 + 2x_3^2 - 5.x_2 + 3.x_3\] Subject to constraints: + \(-4x_1 + -3x_2 = -8\) + \(2x_1 + x_2 = 2\) + \(-2x_2 + x_3 \geq 0\)

In R:

library(quadprog)
## Warning: package 'quadprog' was built under R version 3.1.1
Dmat       <- matrix(c( 2,-1, 0,
                       -1, 2,-1,
                        0,-1, 2),3,3)
dvec       <- c(0,-5,3)
Amat       <- matrix(c(-4,-3, 0,
                        2, 1, 0,
                        0,-2, 1),3,3)
bvec       <- c(-8,2,0)
n.eqs      <- 2 # the first two constraints are equalities
sol <- solve.QP(Dmat,dvec,Amat,bvec=bvec,meq=2)
sol$solution
## [1] -1  4  8
sol$value
## [1] 49

So, the solution is \(x_1=-1\), \(x_2=4\) and \(x_3=8\) with a minimum of \(49\).

In QP if \(D\) is a definitive positive matrix (ie, \(X^T D X \gt 0\), for all non-zero \(X\)) the problem is solved in polinomial time. if not QP is NP-Hard. If \(D\) has only one negative eigenvalue, the problem is NP-hard.

Function solve.QP() expects a definitive positive matrix \(D\).

General Non-linear Optimization

Package Rsolnp provides function solnp() which solves the general nonlinear programming problem:

\[min f(x)\]

such that

\[g(x)=0\] \[l_h \leq h(x) \leq u_h\] \[l_x \leq x \leq u_x\]

where \(f(x), g(x), h(x)\) are smooth functions.

Let’s see some example of use (egs from here and here).

library(Rsolnp)

fn <- function(x) { # f(x,y) = 5x-3y
  5*x[1] - 3*x[2]
}

# constraint z1: x^2+y^2=136
eqn <- function(x) { 
  z1=x[1]^2 + x[2]^2
  return(c(z1))
}
constraints = c(136)

x0 <- c(1, 1) # setup init values
sol1 <- solnp(x0, fun = fn, eqfun = eqn, eqB = constraints)
## 
## Iter: 1 fn: 37.4378   Pars:  30.55472 38.44528
## Iter: 2 fn: -147.9181     Pars:  -6.57051 38.35517
## Iter: 3 fn: -154.7345     Pars:  -20.10545  18.06907
## Iter: 4 fn: -96.4033  Pars:  -14.71366   7.61165
## Iter: 5 fn: -72.4915  Pars:  -10.49919   6.66517
## Iter: 6 fn: -68.1680  Pars:  -10.04485   5.98124
## Iter: 7 fn: -68.0006  Pars:   -9.99999   6.00022
## Iter: 8 fn: -68.0000  Pars:  -10.00000   6.00000
## Iter: 9 fn: -68.0000  Pars:  -10.00000   6.00000
## solnp--> Completed in 9 iterations
sol1$pars
## [1] -10   6
fn <- function(x) {  # f(x,y) = 4x^2 + 10y^2
  4*x[1]^2 + 10*x[2]^2
}

# constraint z1: x^2+y^2 <= 4
ineq <- function(x) { 
  z1=x[1]^2 + x[2]^2
  return(c(z1))
}

lh <- c(0)
uh <- c(4)

x0 = c(1, 1) # setup init values
sol1 <- solnp(x0, fun = fn, ineqfun = ineq, ineqLB = lh, ineqUB=uh)
## 
## Iter: 1 fn: 2.8697    Pars:  0.68437 0.31563
## Iter: 2 fn: 0.6456    Pars:  0.39701 0.03895
## Iter: 3 fn: 0.1604    Pars:  0.200217 0.002001
## Iter: 4 fn: 0.04009   Pars:  0.10011818 0.00005323
## Iter: 5 fn: 0.01002   Pars:  0.0500591342 0.0000006785
## Iter: 6 fn: 0.002506  Pars:   0.02502959503 -0.00000004495
## Iter: 7 fn: 0.0006265     Pars:   0.01251488111 -0.00000004998
## Iter: 8 fn: 0.0001566     Pars:   0.00625751 -0.00000005
## Iter: 9 fn: 0.00003916    Pars:   0.00312878 -0.00000005
## Iter: 10 fn: 0.000009791  Pars:   0.00156452 -0.00000005
## Iter: 11 fn: 0.000002448  Pars:   0.00078235 -0.00000005
## Iter: 12 fn: 0.0000006137     Pars:   0.00039171 -0.00000005
## Iter: 13 fn: 0.0000001564     Pars:   0.00019772 -0.00000005
## Iter: 14 fn: 0.00000004006    Pars:   0.00010007 -0.00000005
## Iter: 15 fn: 0.00000001307    Pars:   0.00005716 -0.00000005
## Iter: 16 fn: 0.000000006833   Pars:   0.00004133 -0.00000005
## solnp--> Completed in 16 iterations
sol1$pars
## [1]  4.133035e-05 -5.000052e-08

The result is quite close to \((0,0)\).

We can give some extra controls tot he procedure, like TOL which defines the tolerance for optimality (which impacts on the convergence steps) or trace=0 is switches off the printing of the major iterations. Eg:

ctrl <- list(TOL=1e-15, trace=0)
sol2 <- solnp(x0, fun = fn, ineqfun = ineq, ineqLB = lh, ineqUB=uh, control=ctrl)
sol2$pars
## [1] -1.957183e-08 -5.000000e-08
fn <- function(x,...){
  -x[1]*x[2]*x[3]
}

eqn <- function(x,...){
    4*x[1]*x[2]+2*x[2]*x[3]+2*x[3]*x[1]
}
constraints = c(100)

lx <- rep(1,3)
ux <- rep(10,3)

pars <- c(1.1,1.1,9) # tricky setup
ctrl <- list(TOL=1e-6, trace=0)
sol3 <- solnp(pars, fun=fn, eqfun=eqn, eqB = constraints, LB=lx, UB=ux, control=ctrl)
sol3$pars
## [1] 2.886751 2.886751 5.773503

The initial parameters can be sensible if the objective function is not smooth or there are many local minima. Check function gosolnp() that generates initial parameters (see manual for more info).

fn <- function(x)  # f(x,y,z) = 4y-2z
{
  4*x[2] - 2*x[3]
}

# constraint z1: 2x-y-z  = 2 
# constraint z2: x^2+y^2 = 1
eqn <- function(x){ 
  z1=2*x[1] - x[2] - x[3]
  z2=x[1]^2 + x[2]^2
  
  return(c(z1,z2))
}
constraints <- c(2,1)

x0 <- c(1, 1, 1)
ctrl <- list(trace=0)
sol4 <- solnp(x0, fun = fn, eqfun = eqn, eqB = constraints, control=ctrl)
sol4$pars
## [1]  0.55470020 -0.83205029 -0.05854931

Using CVX

This next section uses a low tech solution implemented by Jacob Bien. It needs Matlab and an installation of CVX. Here’s a local copy of the package.

For information about CVX check here.

Regression with Least-square

The next eg finds a least-squares regression for a cubic polynomial, ie, it minimizes the \(L_2\) norm:

library(CVXfromR) # http://faculty.bscb.cornell.edu/~bien/cvxfromr.html

df <- data.frame(X=c(1,3,4,5,7,8),  # the data
                 Y=c(4,6,5,3,5,6))

degree <- 3                         # cubic polynomial

# creating matrix A and vector b containing the constraints
A <- matrix(rep(NA,(degree+1)*nrow(df)), nrow=nrow(df))
for(i in 1:nrow(df))  # for each data point
  for (d in 1:(degree+1)) # for each degree
    A[i,d] <- df$X[i]^(d-1)
b <- df$Y

cvxcode <- "
   variables x(n);
   minimize( norm(A*x-b) );
  "

# it takes sometime to run a matlab session
opt.vals <- CallCVX(cvxcode, const.vars=list(n=degree+1, A=A, b=b),
                    opt.var.names="x", setup.dir="C:\\Users\\jpn\\Documents\\Software\\cvx")

# returns a polynomial predictor based on a CVX optimization result
# vector 'as' keeps coefficients a_i of polynomial a_0 + a_1 x + ... + a_i x^i + ... + a_n x^n
get.poly.predictor <- function(as) {
  n <- length(as)
  function(x) {
    vals <- rep(NA,n)
    for(i in 1:n)
      vals[i] = x^(i-1)
    sum(vals * as)
  }
}

# get the predictor for the dataset
predictor.L2 <- get.poly.predictor(opt.vals$x)

plot(df, pch=19, xlim=c(0,8), ylim=c(0,8)) # plot points and prediction
xs  <- seq(0,8,len=101)
fit.L2 <- vapply(xs, predictor.L2, 0)
lines(xs, fit.L2, type='l', col="red", lwd=2)

Regression with \(L_1\) norm

Using the same dataset, but now with the \(L_1\) norm, ie, minimizing the sum of the absolute values of the residuals:

n   <- degree+1
m   <- length(df$X)
one <- rep(1,m)

cvxcode <- "
   variables x(n) y(m);
   minimize( one' * y );
   subject to
     A*x - b <=  y;
     A*x - b >= -y;
  "

opt.vals <- CallCVX(cvxcode, const.vars=list(n=n, m=m, A=A, b=b, one=one),
                    opt.var.names="x", setup.dir="C:\\Users\\jpn\\Documents\\Software\\cvx")

predictor.L1 <- get.poly.predictor(opt.vals$x)

plot(df, pch=19, xlim=c(0,8), ylim=c(0,8)) # plot points and prediction
lines(xs, fit.L2, type='l', col="red",   lwd=2)
fit.L1 <- vapply(xs, predictor.L1, 0)
lines(xs, fit.L1, type='l', col="green", lwd=2)
legend("bottomright",c("L2 norm","L1 norm"), col=c("red","green"), lty=1, lwd=2) 

Regression with \(L_\infty\) norm

Again with the Chebyshev norm, \(L_\infty\), ie, the minimization of the maximum residual:

n   <- degree+1
m   <- length(df$X)
one <- rep(1,m)

cvxcode <- "
   variables x(n) t;
   minimize( t );
   subject to
     A*x - b <=  one * t;
     A*x - b >= -one * t;
  "

opt.vals <- CallCVX(cvxcode, const.vars=list(n=n, A=A, b=b, one=one),
                    opt.var.names="x", setup.dir="C:\\Users\\jpn\\Documents\\Software\\cvx")

predictor.LInf <- get.poly.predictor(opt.vals$x)

plot(df, pch=19, xlim=c(0,8), ylim=c(0,8)) # plot points and prediction
lines(xs, fit.L2, type='l', col="red",   lwd=2)
lines(xs, fit.L1, type='l', col="green", lwd=2)
fit.LInf <- vapply(xs, predictor.LInf, 0)
lines(xs, fit.LInf, type='l', col="blue", lwd=2)

legend("bottomright",c("L2 norm","L1 norm","LInf norm"), col=c("red","green","blue"), lty=1, lwd=2)

Robust Regression with Huber loss

And with the Huber loss for robust regression:

df <- data.frame(X=c(1,3,4,5,6,7,8),  # including an outlier
                 Y=c(4,6,5,3,20,5,6))

degree <- 3                         # cubic polynomial
n <- degree+1

# creating matrix A and vector b containing the constraints
A <- matrix(rep(NA,n*nrow(df)), nrow=nrow(df))
for(i in 1:nrow(df))  # for each data point
  for (d in 1:n) # for each degree
    A[i,d] <- df$X[i]^(d-1)
b <- df$Y

# First compute standard least-squares:
cvxcode <- "
   variables x(n);
   minimize( norm(A*x-b) );
  "

# it takes sometime to run a matlab session
opt.vals <- CallCVX(cvxcode, const.vars=list(n=degree+1, A=A, b=b),
                    opt.var.names="x", setup.dir="C:\\Users\\jpn\\Documents\\Software\\cvx")

predictor.L2 <- get.poly.predictor(opt.vals$x)

# Second compute with the Huber loss:
cvxcode <- "
   variables x(n);
   minimize( sum(huber(A*x-b)) );
  "

opt.vals <- CallCVX(cvxcode, const.vars=list(n=n, A=A, b=b),
                    opt.var.names="x", setup.dir="C:\\Users\\jpn\\Documents\\Software\\cvx")

predictor.Huber <- get.poly.predictor(opt.vals$x)

plot(df, pch=19, xlim=c(0,8), ylim=c(0,20)) # plot points and prediction
fit.L2 <- vapply(xs, predictor.L2, 0)
lines(xs, fit.L2, type='l', col="red", lwd=2)
fit.Huber <- vapply(xs, predictor.Huber, 0)
lines(xs, fit.Huber, type='l', col="blue", lwd=2)

legend("topleft",c("L2 norm","Huber Loss (robust)"), col=c("red","blue"), lty=1, lwd=2)

De-Noising Data

Given \(x_{\text{corrupt}}\) with \(n\) noisy data points, produce a similar yet smoother dataset \(x\). Smoother means that the difference between neighboring data points should be smaller.

The objective goal is to minimize

\[\| x - x_{\text{corrupt}} \|^2 + \mu \sum_{k=1}^{n-1} (x_{k+1} - x_k)^2\]

where \(\mu\) is a smoothness parameter, where if \(\mu \rightarrow 0 \implies x \rightarrow x_{\text{corrupt}}\).

Let’s make some noisy dataset:

set.seed(101)

n  <- 201
xs <- seq(0,8,len=n)

x_corrupt <- sin(xs)^2/(1.5+cos(xs)) + rnorm(n,0,0.1)

plot(xs,x_corrupt, type='l')

Let’s specify the CVX problem:

cvxcode <- "
    variable x(n)
    minimize( norm(x-corrupt) + mu*norm( x(2:n)-x(1:n-1) ) );
  "

To apply the optimization, we must define a value for \(\mu\). Different values give different smoothing results:

mu <- 0.75

opt.vals <- CallCVX(cvxcode, const.vars=list(n=n, mu=mu, corrupt=x_corrupt),
                    opt.var.names="x", setup.dir="C:\\Users\\jpn\\Documents\\Software\\cvx")

x_smooth <- opt.vals$x

plot(xs,x_corrupt, type='l')
lines(xs,x_smooth, col="red", lwd=2, type='l')

mu <- 2

opt.vals <- CallCVX(cvxcode, const.vars=list(n=n, mu=mu, corrupt=x_corrupt),
                    opt.var.names="x", setup.dir="C:\\Users\\jpn\\Documents\\Software\\cvx")

x_smooth <- opt.vals$x

plot(xs,x_corrupt, type='l')
lines(xs,x_smooth, col="red", lwd=2, type='l')

mu <- 5

opt.vals <- CallCVX(cvxcode, const.vars=list(n=n, mu=mu, corrupt=x_corrupt),
                    opt.var.names="x", setup.dir="C:\\Users\\jpn\\Documents\\Software\\cvx")

x_smooth <- opt.vals$x

plot(xs,x_corrupt, type='l')
lines(xs,x_smooth, col="red", lwd=2, type='l')

Margin Classifiers

df <- data.frame(X1=c(1,2,3,4,5,6,8),  # the data
                 X2=c(4,5,6,5,3,5,6),
                 Y =c(-1,-1,-1,-1,1,1,1) )

plot(df$X1, df$X2, col=df$Y+3, pch=19)

# separate both classes and place the coordinates into matrices

n <- 2 # number of dimensions (herein, X1 and X2)
plus1  <- matrix(c(df$X1[df$Y== 1], df$X2[df$Y== 1]), ncol=2)
minus1 <- matrix(c(df$X1[df$Y==-1], df$X2[df$Y==-1]), ncol=2)

cvxcode <- "
   variables a(n) b(1) u(R) v(Q);
   minimize( ones(1,R)*u + ones(1,Q)*v );
   P1 * a - b >= 1-u;
   P2 * a - b <= -(1-v);
   u >= 0;
   v >= 0;
  "

Variables u and v are slack variables that allow the algorithm to fit non-separable sets. The values 1 and -1 at the constraints are the initial ‘width’ of the hyper-plane that separates the two datasets. In two dimensions, vector a consists of two coordinates (X1 and X2) while b is just a number, ie, a 2D hyperplane is just a line.

opt.vals <- CallCVX(cvxcode, 
                    const.vars=list(n=n, P1=plus1, R=nrow(plus1), P2=minus1, Q=nrow(minus1)),
                    opt.var.names=c("a","b"), 
                    setup.dir="C:\\Users\\jpn\\Documents\\Software\\cvx")


a <- x_smooth <- opt.vals$a # the separating line is given by  a_1 x_1 + a_2 x_2 + b = 0
b <- x_smooth <- opt.vals$b

plot(df$X1, df$X2, col=df$Y+3, pch=19)
abline(-b/a[2], -a[1]/a[2], col="purple", lwd=2, lty=2)

Max Margin with Outliers

Let’s introduce one outlier and some extra points:

df <- data.frame(X1=c(1,2,2,3,3,3.5,4,5,6,8,6,5,3.5),  # the data
                 X2=c(4,4.5,5,4.5,4,6,5,3,5,6,4,4,4.5),
                 Y =c(-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1) )

n <- 2 # number of dimensions (herein, X1 and X2)
plus1  <- matrix(c(df$X1[df$Y== 1], df$X2[df$Y== 1]), ncol=2)
minus1 <- matrix(c(df$X1[df$Y==-1], df$X2[df$Y==-1]), ncol=2)

plot(df$X1, df$X2, col=df$Y+3, pch=19)

With outliers we should try to keep the slacks as small as possible, but allowing some weight to the margin size that separates both classes. The margin size is proportional to the euclidean norm of a, and so the minimization can be expressed as

\[\min \|a\| + \gamma (1^Tu + 1^Tv)\]

where \(\gamma\) is a parameter that weights the importance of the slack sizes. usually, the user should test with different \(\gamma\) values to find a suitable one. Herein we’ll use \(\gamma=2.0\).

cvxcode <- "
   variables a(n) b(1) u(R) v(Q);
   minimize( norm(a) + 2*(ones(1,R)*u + ones(1,Q)*v) );
   P1 * a - b >= 1-u;
   P2 * a - b <= -(1-v);
   u >= 0;
   v >= 0;
  "

opt.vals <- CallCVX(cvxcode, 
                    const.vars=list(n=n, P1=plus1, R=nrow(plus1), P2=minus1, Q=nrow(minus1)),
                    opt.var.names=c("a","b"), 
                    setup.dir="C:\\Users\\jpn\\Documents\\Software\\cvx")


a <- x_smooth <- opt.vals$a # the separating line is given by  a_1 x_1 + a_2 x_2 + b = 0
b <- x_smooth <- opt.vals$b

plot(df$X1, df$X2, col=df$Y+3, pch=19)
abline(-b/a[2], -a[1]/a[2], col="purple", lwd=2, lty=2)