svm (parameters selection)

This file shows different parameter selection techniques and how they pertain to the choice of kernel parameters

Documentation
Set rng
Concept
Simple 2D problem
Parameter selection strategy
References

Documentation

The documentation for the svm class can be found here.

Set rng

Set random number generator seed:

rng(0)

Concept

When training a meta-model, one typically tries to do so such that it minimizes some metrics. In classification problems, that metric is often the generalization error, i.e., the integral of the misclassification over the domain

$err_{gen}=\int\mathcal{I}_{l(\mathbf{x})\neq \hat{l}(\mathbf{x})}\mathrm d\mathbf{x}$

where $l(\mathbf{x})$ is the true class and $\hat{l}(\mathbf{x})$ is the estimated class of $\mathbf{x}$ . One can show that a Leave One Out (LOO) procedure yields an almost unbiased estimator of the generalization error, Luntz and Brailovsky (1969). However, an LOO procedure may be numerically intractable. A Cross-Validation (CV), typically 10-fold, is usually faster and preferred as it is an estimator of the LOO error.

Specifically for Support Vector Machine, Vapnik (2000) drew a link between the number of support vectors and the generalization capability of the model. Later on, Chapelle et al. (2002) develeped an estimate of the LOO error based on the span of the support vectors.

In certain cases, such as highly unbalanced data, the generalization error is ill defined and the balanced generalization error makes more sense, Jiang and Missoum (2014). The Area Under the Curve (AUC), Metz (1978), is also widely used.

Finally, over the years, heuristics have been developed to quickly guess a suitable value for the kernel parameters. Such heuristics can be cross kernel or kernel specific. The heuristic referred to as 'stiffest' look for the most general case such that there is no misclassification error (Basudhar and Missoum, 2010). The heuristic referred to as 'fast', defined for the Gaussian kernel, defines the kernel parameter as the mean of the pairwise distances between +1 and -1 samples (Jaakkola et al., 1999).

Simple 2D problem

Consider a simple 2D classification problem:

f=@(x)x(:,2)-sin(10*x(:,1))/4-0.2;
x=CODES.sampling.cvt(200,2);
y=f(x);
[X,Y]=meshgrid(linspace(0,1,100));
Z=reshape(f([X(:) Y(:)]),100,100);
figure('Position',[200 200 500 500])
contour(X,Y,Z,[0 0],'k')
hold on
plot(x(y<=0,1),x(y<=0,2),'ro')
plot(x(y>0,1),x(y>0,2),'bo')
axis equal

For varying theta, train an SVM and compare the different aforementioned metrics:

svm=CODES.fit.svm(x,y);
[theta_lb,theta_ub,theta_mean]=svm.point_dist;  % Pairwise distances between +1 and -1 samples
thetas=linspace(theta_lb,theta_ub,20)';
loo=zeros(length(thetas),2);
cv=loo;Nsv=loo;chapelle=loo;auc=loo;
true_misc=loo;                               % True misclassification
X_t=[X(:) Y(:)];
Y_t=Z(:);

parfor i=1:length(thetas)
    svm_tmp=CODES.fit.svm(x,y,'theta',thetas(i),'C',1e4);
    loo(i,:)=[svm_tmp.loo svm_tmp.loo('use_balanced',true)];
    cv(i,:)=[svm_tmp.cv svm_tmp.cv('use_balanced',true)];
    Nsv(i,:)=[svm_tmp.N_sv_ratio svm_tmp.N_sv_ratio(true)];
    chapelle(i,:)=[svm_tmp.chapelle svm_tmp.chapelle(true)];
    auc(i,:)=100-[svm_tmp.auc(svm_tmp.X,svm_tmp.Y) svm_tmp.cv('metric','AUC')];
    true_misc(i,:)=[svm_tmp.me(X_t,Y_t) svm_tmp.me(X_t,Y_t)];
end

Plot a comparison of these metrics:

figure('Position',[200 200 750 500])
plot(thetas,true_misc(:,1),'k-.')
hold on
plot(thetas,loo(:,1),'r-')
plot(thetas,cv(:,1),'b-')
plot(thetas,Nsv(:,1),'c-')
plot(thetas,chapelle(:,1),'m-')
plot([theta_mean theta_mean],[0 max(max([loo;cv;Nsv;chapelle]))],'k--')
plot([theta_lb theta_lb],[0 max(max([loo;cv;Nsv;chapelle]))],'g--')
plot([theta_ub theta_ub],[0 max(max([loo;cv;Nsv;chapelle]))],'g--')
leg=legend('True','loo','cv','Nsv','Chapelle','Fast','Location','bestoutside');
set(leg,'interpreter','latex','FontSize',20)
set(gca,'xscale','log')
axis([0.99*theta_lb 1.01*theta_ub 0 40])
xlabel('$\theta$','interpreter','latex','FontSize',20)
ylabel('$err_{gen}$','interpreter','latex','FontSize',20)

figure('Position',[200 200 750 500])
plot(thetas,true_misc(:,2),'k-.')
hold on
plot(thetas,loo(:,2),'r-')
plot(thetas,cv(:,2),'b-')
plot(thetas,Nsv(:,2),'c-')
plot(thetas,chapelle(:,2),'m-')
plot(thetas,auc(:,1),'g-')
plot(thetas,auc(:,2),'g--')
plot([theta_mean theta_mean],[0 max(max([loo;cv;Nsv;chapelle]))],'k--')
plot([theta_lb theta_lb],[0 max(max([loo;cv;Nsv;chapelle]))],'g--')
plot([theta_ub theta_ub],[0 max(max([loo;cv;Nsv;chapelle]))],'g--')
leg=legend('True (bal)','loo (bal)','cv (bal)','Nsv (bal)','Chapelle (bal)','AUC','AUC (CV)','Fast','Location','bestoutside');
set(leg,'interpreter','latex','FontSize',20)
set(gca,'xscale','log')
axis([0.99*theta_lb 1.01*theta_ub 0 40])
xlabel('$\theta$','interpreter','latex','FontSize',20)
ylabel('$err_{gen}^{bal}$','interpreter','latex','FontSize',20)

Parameter selection strategy

Train different SVMs using different metrics and compare them:

f=@(x)x(:,2)-sin(10*x(:,1))/4-0.5;
x=CODES.sampling.cvt(200,2);
y=f(x);

[X,Y]=meshgrid(linspace(0,1,100));
Z=reshape(f([X(:) Y(:)]),100,100);

tic;svm=CODES.fit.svm(x,y);time_fast=toc;
fast_err=svm.me([X(:) Y(:)],Z(:));
tic;svm1=CODES.fit.svm(x,y,'param_select','loo','UseParallel',true);time_loo=toc;
loo_err=svm1.me([X(:) Y(:)],Z(:));
tic;svm2=CODES.fit.svm(x,y,'param_select','chapelle');time_chapelle=toc;
chapelle_err=svm2.me([X(:) Y(:)],Z(:));
tic;svm3=CODES.fit.svm(x,y,'param_select','auc');time_auc=toc;
auc_err=svm3.me([X(:) Y(:)],Z(:));

disp(['Using fast,     theta=' num2str(svm.theta,'%5.3e') ', C=' num2str(svm.C,'%5.3e') ', err_gen=' num2str(fast_err,'%5.2f') '%, time=' CODES.common.time(time_fast)])
disp(['Using loo,      theta=' num2str(svm1.theta,'%5.3e') ', C=' num2str(svm1.C,'%5.3e') ', err_gen=' num2str(loo_err,'%5.2f') '%, time=' CODES.common.time(time_loo)])
disp(['Using chapelle, theta=' num2str(svm2.theta,'%5.3e') ', C=' num2str(svm2.C,'%5.3e') ', err_gen=' num2str(chapelle_err,'%5.2f') '%, time=' CODES.common.time(time_chapelle)])
disp(['Using auc,      theta=' num2str(svm3.theta,'%5.3e') ', C=' num2str(svm3.C,'%5.3e') ', err_gen=' num2str(auc_err,'%5.2f') '%, time=' CODES.common.time(time_auc)])

[X,Y]=meshgrid(linspace(0,1,100));
Z=reshape(f([X(:) Y(:)]),100,100);
figure('Position',[200 200 500 500])
contour(X,Y,Z,[0 0],'k')
hold on
svm.isoplot('legend',false,'samples',false,'sv',false,'bcol','r')
svm1.isoplot('legend',false,'samples',false,'sv',false,'bcol','b')
svm2.isoplot('legend',false,'samples',false,'sv',false,'bcol','m')
svm3.isoplot('legend',false,'samples',false,'sv',false,'bcol','g')
legend('True','fast','loo','chapelle','auc')

Using fast,     theta=6.501e-01, C=1.000e+04, err_gen=2.90%, time=5.9445e-02 s
Using loo,      theta=2.298e-01, C=1.000e+02, err_gen=2.56%, time=43s
Using chapelle, theta=3.545e-01, C=1.000e+02, err_gen=2.98%, time=3s
Using auc,      theta=3.250e-01, C=1.000e+02, err_gen=3.05%, time=3s

References

Luntz and Brailovsky (1969): Luntz A., Brailovsky V., (1969) On estimation of characters obtained in statistical procedure of recognition. Technicheskaya Kibernetica 3(6):6-12
Metz (1978): Metz C. E., (1978) Basic principles of ROC analysis. Seminars in Nuclear Medicine. Technicheskaya Kibernetica 8(4):283-298 - DOI
Jaakkola et al. (1999): Jaakkola T., Diekhans M., Haussler D., (1999) Using the Fisher kernel method to detect remote protein homologies. In: International Conference on Intelligent Systems for Molecular Biology. 149-158 - PMID
Vapnik (2000): Vapnik V., (2000) The nature of statistical learning theory. Springer
Chapelle et al. (2002): Chapelle O., Vapnik V., Bousquet O., Mukherjee S., (2002) Choosing multiple parameters for support vector machines. Machine Learning 46(1-3):131-159 - DOI
Basudhar and Missoum (2010): Basudhar A., Missoum S., (2010) An improved adaptive sampling scheme for the construction of explicit boundaries. Structural and Multidisciplinary Optimization 42(4):517-529 - DOI
Jiang and Missoum (2014): Jiang P., Missoum S., (2014) Optimal SVM parameter selection for non-separable and unbalanced datasets. Structural and Multidisciplinary Optimization 50(4):523-535 - DOI

Computational Optimal Design of
Engineering Systems