Japanese Word Similarity and Association Norm (JWSAN)

The file "jwsan-2145.csv" includes the following data.

----------
pairID,word1,word2,POS,similarity,association,n_sim,n_asso,JWSAN_1400
p0001,うら悲しい,物憂い,3,2.85,3.49,150,150,0
p0002,おっかない,酷い,3,1.56,2.78,170,140,0
...
p2145,麒麟,花瓶,1,0.57,1.24,150,150,0
----------

The descriptions of the columns are as follows:

pairID: The identification number of a word pair (IDs are shared by JWSAN-2145 and JWSAN-1400)
word1, word2: The two Japanese words of a pair
POS: Part of speech (1=noun, 2=verb, 3=adjective)
similarity: The mean similarity rating (0-6)
association: The mean association rating (0-6)
n_sim: The number of similarity ratings (= the number of raters who rated the degree of similarity)
n_asso: The number of association ratings (= the number of raters who rated the degree of association)
JWSAN_1400: 1=pair included in JWSAN-1400, 0=pair not included in JWSAN-1400

The file "jwsan-1400.csv" is generated by selecting the pairs with 1 in the JWSAN_1400 column and then removing the JWSAN_1400 column.
(We used the following Unix command to generate jwsan-1400.csv. In the pattern, '1$' matches the rows whose JWSAN_1400 value is 1, and '00$' matches the header line, which ends in "JWSAN_1400".)

>> egrep '1$|00$' jwsan-2145.csv | cut -f 1-8 -d ',' > jwsan-1400.csv
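
For readers without egrep and cut, the same selection can be sketched in Python using only the standard csv module. This is a minimal sketch and not the procedure that was used to build jwsan-1400.csv. The function name select_jwsan_1400 is hypothetical, and the column names are assumed to match the header line shown above.

```python
import csv
import io


def select_jwsan_1400(csv_text):
    """Return the JWSAN-1400 subset of JWSAN-2145 CSV text.

    Keeps the header and every row whose JWSAN_1400 value is 1,
    then drops the JWSAN_1400 column itself.
    (Hypothetical helper; assumes the header line shown in this README.)
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep every column except the JWSAN_1400 flag.
    kept = [name for name in reader.fieldnames if name != "JWSAN_1400"]

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Select only the pairs flagged as part of JWSAN-1400.
        if row["JWSAN_1400"].strip() == "1":
            writer.writerow(row)
    return out.getvalue()
```

To apply it, read the contents of jwsan-2145.csv into a string, pass the string to the function, and write the returned text to jwsan-1400.csv. The file encoding is not stated in this README; UTF-8 is an assumption that should be checked against the actual file.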