numpy - Python - How to reduce the number of entries per row of a symmetric matrix by keeping the K largest values
I have a symmetric similarity matrix, and I want to keep only the k largest values in each row.
Here's code that does what I want, but I'm wondering if there's a better way. In particular, the flatten/reshape bit is clumsy. Thanks in advance.
Note that nrows (below) has to scale to tens of thousands.
```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

np.random.seed(1)
nrows = 4
a = np.random.rand(nrows, nrows)

# generate symmetric similarity matrix
s = 1 - squareform(pdist(a, 'cosine'))
print("start with:\n", s)

# generate sorted indices (columns of each row, largest value first)
ss = np.argsort(s.view(np.ndarray), axis=1)[:, ::-1]
# turn per-row column indices into indices into the flattened matrix
s2 = ss + (np.arange(ss.shape[0]) * ss.shape[1])[:, None]

# zero out everything after the k largest entries in each row
k = 3  # number of top values to keep, per row
s = s.flatten()
s[s2[:, k:].flatten()] = 0
print("desired output:\n", s.reshape(nrows, nrows))
```
This gives:
```
start with:
[[ 1.          0.61103296  0.82177072  0.92487807]
 [ 0.61103296  1.          0.94246304  0.7212526 ]
 [ 0.82177072  0.94246304  1.          0.87247418]
 [ 0.92487807  0.7212526   0.87247418  1.        ]]
desired output:
[[ 1.          0.          0.82177072  0.92487807]
 [ 0.          1.          0.94246304  0.7212526 ]
 [ 0.          0.94246304  1.          0.87247418]
 [ 0.92487807  0.          0.87247418  1.        ]]
```
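The `s2` line is what makes the flatten step work: in a C-ordered (row-major) array with `ncols` columns, element `(i, j)` sits at flat position `i*ncols + j`, so adding `i*ncols` to row `i` of the sorted column indices turns them into positions in the flattened array. A small worked sketch of that arithmetic on a hypothetical 2×3 array `x`:

```python
import numpy as np

# hypothetical 2x3 example array (C-ordered, the NumPy default)
x = np.array([[10, 30, 20],
              [60, 40, 50]])

# column indices of each row, sorted by descending value
order = np.argsort(x, axis=1)[:, ::-1]
# row i starts at flat position i * ncols, so add that offset to each row
flat_idx = order + (np.arange(x.shape[0]) * x.shape[1])[:, None]

# indexing the flattened array with flat_idx gives each row's values, largest first
ranked = x.ravel()[flat_idx]
print(order)
print(flat_idx)
print(ranked)
```

Here `order` is `[[1, 2, 0], [0, 2, 1]]`, the offsets shift the second row by 3 to give `[[1, 2, 0], [3, 5, 4]]`, and `ranked` is `[[30, 20, 10], [60, 50, 40]]`.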
Not a considerable improvement, but to avoid the flatten and reshape you can use np.put:
```python
# s is the 2-D similarity matrix from the question (before any flatten)
# generate sorted indices
ss = np.argsort(s.view(np.ndarray), axis=1)[:, ::-1]
ss += (np.arange(ss.shape[0]) * ss.shape[1])[:, None]  # add in place, trivial improvement
k = 3
np.put(s, ss[:, k:], 0)  # or: s.flat[ss[:, k:]] = 0
print(s)
```

```
[[ 1.          0.          0.82177072  0.92487807]
 [ 0.          1.          0.94246304  0.7212526 ]
 [ 0.          0.94246304  1.          0.87247418]
 [ 0.92487807  0.          0.87247418  1.        ]]
```
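If the flat-index bookkeeping is still the clumsy part, NumPy 1.15+ provides `np.put_along_axis`, which takes per-row column indices directly, and `np.argpartition` finds the smallest (or largest) entries of each row without fully sorting it. Since nrows has to reach tens of thousands, the sketch below also works through the rows in blocks, so the integer index array it allocates is only `block_rows × ncols` rather than as large as `s`. This is a minimal sketch, not a benchmarked solution: `keep_top_k_inplace` and `block_rows` are hypothetical names, it modifies `s` in place, and ties at the k-th largest value are broken arbitrarily.

```python
import numpy as np

def keep_top_k_inplace(s, k, block_rows=1024):
    """Zero all but the k largest entries of each row of s, in place.

    Hypothetical helper: s must be a writable 2-D ndarray; rows are handled
    in blocks of block_rows so the temporary index array stays small.
    """
    n = s.shape[1]
    if k >= n:
        return s  # nothing to drop
    for start in range(0, s.shape[0], block_rows):
        block = s[start:start + block_rows]  # a view, so writes land in s
        # argpartition puts the column indices of the (n - k) smallest entries
        # of each row in the first n - k positions (in no particular order)
        drop = np.argpartition(block, n - k - 1, axis=1)[:, :n - k]
        np.put_along_axis(block, drop, 0, axis=1)
    return s


# usage on the 4x4 matrix from the question
s = np.array([[1.0, 0.61103296, 0.82177072, 0.92487807],
              [0.61103296, 1.0, 0.94246304, 0.7212526],
              [0.82177072, 0.94246304, 1.0, 0.87247418],
              [0.92487807, 0.7212526, 0.87247418, 1.0]])
keep_top_k_inplace(s, k=3, block_rows=2)
print(s)
```

With these inputs it should zero the same four entries as the desired output. As in that output, the result is generally no longer symmetric: row 0 keeps column 2, while row 2 drops column 0.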