matlab - Remove highly correlated components


I have a problem removing highly correlated components. Can I ask how to do this?

For example, I have 40 instances with 20 features (randomly created). Features 2 and 18 are highly correlated with feature 4, and feature 6 is highly correlated with feature 10. How do I remove the highly correlated (redundant) features such as 2, 18, and 10? Essentially, I need the index of the remaining features: 1, 3, 4, 5, 6, ..., 9, 11, ..., 17, 19, 20.

MATLAB code:

    x = randn(40,20);
    x(:,2) = 2.*x(:,4);
    x(:,18) = 3.*x(:,4);
    x(:,6) = 100.*x(:,10);
    x_corr = corr(x);
    size(x_corr)
    figure, imagesc(x_corr), colorbar

The correlation matrix x_corr looks like this:

[Figure: correlation matrix x_corr]

Edit:

I worked out a way:

    x_corr = x_corr - diag(diag(x_corr));
    [x_corrx, x_corry] = find(x_corr>0.8);
    for i = 1:size(x_corrx,1)
        xx = find(x_corry == x_corrx(i));
        x_corrx(xx,:) = 0;
        x_corry(xx,:) = 0;
    end
    x_corrx = unique(x_corrx);
    x_corrx = x_corrx(2:end);
    im = setxor(x_corrx, (1:20)');

Am I right? Or if you have a better idea, please post it. Thanks.

Edit 2: Is this method the same as using PCA?

It seems quite clear that this idea of yours, to simply remove highly correlated variables from the analysis, is not the same as PCA. PCA is a good way to do rank reduction of what seems to be a complicated problem, but one that turns out to have only a few independent things happening. PCA uses an eigenvalue (or SVD) decomposition to achieve that goal.
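
To make the "eigenvalue (or SVD)" remark concrete, here is a minimal sketch, not part of the original answer, showing that both routes give the same component variances. The data matrix, the seed, and the names xc, ev, s and sv are made up for illustration.

```matlab
% Sketch: the eigenvalues of the covariance matrix equal the squared
% singular values of the centered data divided by (n - 1).
rng(1);                                  % arbitrary seed, for repeatability
x  = randn(40, 5);                       % hypothetical data: 40 rows, 5 variables
xc = bsxfun(@minus, x, mean(x, 1));      % center each column

ev = sort(eig(cov(x)), 'descend');       % eigenvalue route
s  = svd(xc);                            % singular values, already descending
sv = s.^2 / (size(x, 1) - 1);            % SVD route, converted to variances

disp(max(abs(ev - sv)))                  % difference at round-off level
```

Either decomposition ranks directions in the data by variance. Neither one returns a list of original columns to drop, which is the distinction being drawn here.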

Anyway, you might have a problem. For example, suppose that variable a is highly correlated with b, and b is highly correlated with c. However, it need not be true that a and c are highly correlated. Since correlation can be viewed as a measure of the angle between those vectors in the corresponding high-dimensional vector space, this can easily be made to happen.
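
The angle picture can be checked directly. The following is a rough sketch, not from the original answer: it builds three zero-mean vectors in one plane, at angles of 0, 30 and 60 degrees, so the correlations should come out as cos(30) for the neighbouring pairs and cos(60) for the outer pair. The helper names e1 and e2 and the chosen angles are assumptions for illustration.

```matlab
% Sketch: correlation equals the cosine of the angle between centered vectors.
rng(0);                                  % arbitrary seed
n  = 50;
e1 = randn(n,1); e1 = e1 - mean(e1); e1 = e1 / norm(e1);
e2 = randn(n,1); e2 = e2 - mean(e2);
e2 = e2 - (e1' * e2) * e1;               % make e2 orthogonal to e1 (still zero mean)
e2 = e2 / norm(e2);

a = e1;                                  % angle 0
b = cosd(30) * e1 + sind(30) * e2;       % 30 degrees from a
c = cosd(60) * e1 + sind(60) * e2;       % 30 degrees from b, 60 from a

r = corrcoef([a, b, c]);                 % corrcoef is in base MATLAB
disp(r)
```

With a cutoff of 0.8, the pairs (a,b) and (b,c) at cos(30), about 0.866, would count as highly correlated, while (a,c) at cos(60) = 0.5 would not.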

As a trivial example, I'll create two variables, a and b, that are correlated at a "moderate" level.

    n = 50;
    a = rand(n,1);
    b = a + randn(n,1)/2;
    corr([a,b])
    ans =
                1      0.55443
          0.55443            1

So here 0.55 is the correlation. Now I'll create c to be virtually the average of a and b. It will be highly correlated by your definition.

    c = [a + b]/2 + randn(n,1)/100;
    corr([a,b,c])
    ans =
                1      0.55443      0.80119
          0.55443            1      0.94168
          0.80119      0.94168            1

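Clearly c is the bad guy here. But if one were to simply look at the pair [a,c] and remove a from the analysis, then do the same with the pair [b,c] and remove b, we would have made the wrong choices. And this is a trivially constructed example.

In fact, it is true that the eigenvalues of the correlation matrix might be of interest.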

    [v,d] = eig(corr([a,b,c]))
    v =
         -0.53056     -0.78854       -0.311
         -0.57245      0.60391     -0.55462
         -0.62515      0.11622       0.7718
    d =
           2.5422            0            0
                0      0.45729            0
                0            0   0.00046204

The fact that d has two significant diagonal elements and one tiny one tells us that this is really a two-variable problem. What PCA will not easily tell us, though, is which vector to remove. And the problem becomes less clear with more variables, with many interactions between all of them.
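
For the simple case in the question, where the redundant columns are exact multiples of another column, one possible heuristic is a greedy pass over the columns. This is only a sketch under that assumption and not the method of the answer above: it keeps a column only if its absolute correlation with every column kept so far is at or below the cutoff. The names thresh, keep and removed are illustrative, and the result depends on column order, which is exactly the caveat raised earlier.

```matlab
% Sketch: greedy correlation filter on the data from the question.
rng(0);                          % arbitrary seed, for repeatability
x = randn(40, 20);
x(:,2)  = 2   .* x(:,4);
x(:,18) = 3   .* x(:,4);
x(:,6)  = 100 .* x(:,10);

thresh = 0.8;                    % same cutoff as in the question
r = abs(corrcoef(x));            % absolute correlations between columns
keep = [];                       % indices of the columns kept so far
for j = 1:size(x, 2)
    % keep column j only if no already-kept column is too correlated with it
    if isempty(keep) || all(r(j, keep) <= thresh)
        keep(end+1) = j; %#ok<AGROW>
    end
end
removed = setdiff(1:size(x, 2), keep);
disp(removed)
```

Because columns are visited in order, this keeps 2 and 6 and should drop 4, 10 and 18 (barring a chance correlation above the cutoff among the random columns, which is very unlikely with 40 rows). That is a different but equivalent choice from the one requested in the question. Reordering the columns, for example visiting column 4 before column 2, changes which member of each group survives.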

