numpy - Python - How to reduce the number of entries per row of a symmetric matrix by keeping the K largest values


I have a symmetric similarity matrix, and I want to keep only the k largest values in each row.
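A minimal sketch of the operation as stated, assuming `s` is a dense NumPy array. It uses a per-row threshold rather than sorted indices: `np.partition` finds the k-th largest value of each row, and every entry below it is set to zero. The helper name `keep_top_k_threshold` and the rounded 4×4 sample matrix are illustrative only. Unlike index-based approaches, ties at the threshold can leave more than k nonzero entries in a row.

```python
import numpy as np

def keep_top_k_threshold(s, k):
    """Hypothetical helper: return a copy of s keeping, in each row,
    only the entries >= that row's k-th largest value.

    Ties at the threshold may keep more than k entries.
    """
    s = np.asarray(s)
    # after np.partition, position -k of each row holds its k-th largest value
    kth_largest = np.partition(s, -k, axis=1)[:, -k]
    # broadcast the per-row threshold across the columns; zero everything below it
    return np.where(s >= kth_largest[:, None], s, 0)

s = np.array([[1.0, 0.61, 0.82, 0.92],
              [0.61, 1.0, 0.94, 0.72],
              [0.82, 0.94, 1.0, 0.87],
              [0.92, 0.72, 0.87, 1.0]])
print(keep_top_k_threshold(s, 3))
```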

Here's code that does what I want, but I'm wondering if there's a better way. In particular, the flatten/reshape bit is clumsy. Thanks in advance.

Note that nrows (below) will have to scale to tens of thousands.
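At that scale memory becomes the main constraint: a dense float64 matrix with 10,000 rows already takes about 800 MB, while the result keeps at most k entries per row. One possible approach, sketched here and not taken from the question, computes the similarities one block of rows at a time and stores only the top k per row in a `scipy.sparse` CSR matrix. It assumes every row of `a` has nonzero norm and swaps `pdist(a, 'cosine')` for dot products of row-normalised vectors, which gives the same cosine similarity up to rounding. The function name `top_k_cosine_sparse` and the `chunk` parameter are hypothetical.

```python
import numpy as np
from scipy import sparse

def top_k_cosine_sparse(a, k, chunk=1000):
    """Hypothetical helper: cosine similarity between the rows of `a`,
    keeping only the k largest values in each row.

    At most `chunk` rows of the dense similarity matrix exist at a time;
    the result is an (n, n) CSR matrix with k stored entries per row.
    """
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    # normalise rows so that a dot product equals cosine similarity
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    rows, cols, vals = [], [], []
    for start in range(0, n, chunk):
        block = an[start:start + chunk] @ an.T  # (rows in chunk, n) similarities
        # column indices of the k largest values in each row (in no particular order)
        idx = np.argpartition(block, -k, axis=1)[:, -k:]
        r = np.arange(start, start + block.shape[0])
        rows.append(np.repeat(r, k))  # row index repeated once per kept column
        cols.append(idx.ravel())
        vals.append(np.take_along_axis(block, idx, axis=1).ravel())
    return sparse.csr_matrix(
        (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
        shape=(n, n),
    )

rng = np.random.default_rng(1)
a = rng.random((4, 4))
s_top = top_k_cosine_sparse(a, k=3, chunk=2)
print(s_top.toarray())
```

The chunk size trades peak memory (`chunk * n` floats) against the number of matrix products; `np.argpartition` avoids sorting each full row.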

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

np.random.seed(1)
nrows = 4
a = np.random.rand(nrows, nrows)

# generate symmetric similarity matrix (cosine similarity between rows of a)
s = 1 - squareform(pdist(a, 'cosine'))
print("start with:\n", s)

# generate sorted indices: column indices of each row, largest value first
ss = np.argsort(s.view(np.ndarray), axis=1)[:, ::-1]
# offset each row's column indices so they index into the flattened matrix
s2 = ss + (np.arange(ss.shape[0]) * ss.shape[1])[:, None]

# zero out everything after the k largest entries in each row
k = 3  # number of top values to keep, per row
s = s.flatten()
s[s2[:, k:].flatten()] = 0
print("desired output:\n", s.reshape(nrows, nrows))
```

This gives:

```
start with:
[[ 1.          0.61103296  0.82177072  0.92487807]
 [ 0.61103296  1.          0.94246304  0.7212526 ]
 [ 0.82177072  0.94246304  1.          0.87247418]
 [ 0.92487807  0.7212526   0.87247418  1.        ]]
desired output:
[[ 1.          0.          0.82177072  0.92487807]
 [ 0.          1.          0.94246304  0.7212526 ]
 [ 0.          0.94246304  1.          0.87247418]
 [ 0.92487807  0.          0.87247418  1.        ]]
```
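Assuming NumPy 1.15 or later, `np.put_along_axis` accepts per-row column indices directly, so neither the flat-index offset nor the flatten/reshape round trip in the question's code is needed. A minimal sketch that rebuilds the same example and zeroes the entries in place:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

np.random.seed(1)
nrows = 4
a = np.random.rand(nrows, nrows)
s = 1 - squareform(pdist(a, 'cosine'))  # symmetric similarity matrix

k = 3  # number of top values to keep, per row
# per-row column indices, largest value first (same ordering as the question)
ss = np.argsort(s, axis=1)[:, ::-1]
# write 0 into every column after the first k, row by row, modifying s in place
np.put_along_axis(s, ss[:, k:], 0, axis=1)
print(s)
```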

Not a considerable improvement, but to avoid the flatten and reshape you can use np.put:

```python
# s is the original 2-D similarity matrix (not flattened)
# generate sorted indices
ss = np.argsort(s.view(np.ndarray), axis=1)[:, ::-1]
ss += (np.arange(ss.shape[0]) * ss.shape[1])[:, None]  # add in place, a trivial improvement

k = 3
np.put(s, ss[:, k:], 0)  # or: s.flat[ss[:, k:]] = 0
print(s)
```

```
[[ 1.          0.          0.82177072  0.92487807]
 [ 0.          1.          0.94246304  0.7212526 ]
 [ 0.          0.94246304  1.          0.87247418]
 [ 0.92487807  0.          0.87247418  1.        ]]
```
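Since only the k largest entries matter, a full `argsort` of every row is more work than necessary. A sketch of the same `np.put` approach using `np.argpartition`, a partial sort, to find the entries to drop; the helper name `zero_all_but_top_k` and the rounded sample matrix are hypothetical. It assumes `1 <= k <= s.shape[1]`, and which entry is dropped among tied values is unspecified, so ties may be resolved differently than with `argsort`.

```python
import numpy as np

def zero_all_but_top_k(s, k):
    """Hypothetical helper: zero all but the k largest entries of each row of s, in place."""
    n_rows, n_cols = s.shape
    # with kth = n_cols - k, the first n_cols - k positions of each row
    # hold the column indices of that row's smallest values
    drop = np.argpartition(s, n_cols - k, axis=1)[:, :n_cols - k]
    # same flat-index offset as the answer above, then a single np.put call
    flat = drop + (np.arange(n_rows) * n_cols)[:, None]
    np.put(s, flat, 0)
    return s

s = np.array([[1.0, 0.61, 0.82, 0.92],
              [0.61, 1.0, 0.94, 0.72],
              [0.82, 0.94, 1.0, 0.87],
              [0.92, 0.72, 0.87, 1.0]])
print(zero_all_but_top_k(s, 3))
```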
