Japanese Word Similarity and Association Norm (JWSAN)

The file "jwsan-2145.csv" contains data in the following format (excerpt).

----------
pairID,word1,word2,POS,similarity,association,n_sim,n_asso,JWSAN_1400
p0001,うら悲しい,物憂い,3,2.85,3.49,150,150,0
p0002,おっかない,酷い,3,1.56,2.78,170,140,0
...
p2145,麒麟,花瓶,1,0.57,1.24,150,150,0
----------

The descriptions of the columns are as follows:

pairID: The identification number of a word pair (IDs are shared by JWSAN-2145 and JWSAN-1400)
word1, word2: Two Japanese words of a pair
POS: Part of speech (1=noun, 2=verb, 3=adjective)
similarity: The mean similarity rating (0-6)
association: The mean association rating (0-6)
n_sim: The number of similarity ratings (= the number of raters who rated the degree of similarity)
n_asso: The number of association ratings (= the number of raters who rated the degree of association)
JWSAN_1400: 1=pair included in JWSAN-1400, 0=pair not included in JWSAN-1400
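
As an illustration of the column layout described above, the following is a minimal Python sketch that parses the rows into typed values. The function name load_jwsan and the POS_LABELS table are our own illustrative choices and are not part of the dataset. The sample rows are the ones shown in the excerpt above. Reading the real file as UTF-8 is an assumption.

```python
import csv
import io

# Labels for the POS codes listed in the column description.
POS_LABELS = {1: "noun", 2: "verb", 3: "adjective"}


def load_jwsan(lines):
    """Parse JWSAN rows from an iterable of CSV lines (e.g. an open file)."""
    rows = []
    for rec in csv.DictReader(lines):
        row = {
            "pairID": rec["pairID"],
            "word1": rec["word1"],
            "word2": rec["word2"],
            "POS": POS_LABELS[int(rec["POS"])],
            "similarity": float(rec["similarity"]),    # mean rating, 0-6
            "association": float(rec["association"]),  # mean rating, 0-6
            "n_sim": int(rec["n_sim"]),
            "n_asso": int(rec["n_asso"]),
        }
        # jwsan-1400.csv has no JWSAN_1400 column, so treat it as optional.
        if rec.get("JWSAN_1400") is not None:
            row["JWSAN_1400"] = rec["JWSAN_1400"] == "1"
        rows.append(row)
    return rows


# The three data rows shown in the excerpt of jwsan-2145.csv.
SAMPLE = """pairID,word1,word2,POS,similarity,association,n_sim,n_asso,JWSAN_1400
p0001,うら悲しい,物憂い,3,2.85,3.49,150,150,0
p0002,おっかない,酷い,3,1.56,2.78,170,140,0
p2145,麒麟,花瓶,1,0.57,1.24,150,150,0
"""

rows = load_jwsan(io.StringIO(SAMPLE))
print(rows[0]["word1"], rows[0]["POS"], rows[0]["similarity"])

# For the real file (assuming it is UTF-8 encoded):
#   with open("jwsan-2145.csv", encoding="utf-8", newline="") as f:
#       rows = load_jwsan(f)
```

The same function should also work for jwsan-1400.csv, since the JWSAN_1400 column is treated as optional.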

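The file "jwsan-1400.csv" was generated by selecting the pairs whose JWSAN_1400 value is 1 and then removing the JWSAN_1400 column.
(We used the following Unix command to generate jwsan-1400.csv. The pattern '1$' selects the rows ending in 1, and '00$' keeps the header line, which ends in "1400".)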

>> egrep '1$|00$' jwsan-2145.csv | cut -f 1-8 -d ',' > jwsan-1400.csv
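
For readers without a Unix shell, the selection above can be sketched in Python as follows. This is not the procedure we used, only a rough equivalent under the assumption that the flag is stored as the string "1". The function name select_jwsan_1400 is illustrative, and the sample row "x0001" is a made-up placeholder, not real data.

```python
import csv
import io


def select_jwsan_1400(src, dst):
    """Copy the header and the rows flagged JWSAN_1400=1, dropping that column."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    flag = header.index("JWSAN_1400")
    writer.writerow(header[:flag] + header[flag + 1:])
    for row in reader:
        # Skip blank or short lines; keep only pairs flagged with 1.
        if len(row) > flag and row[flag] == "1":
            writer.writerow(row[:flag] + row[flag + 1:])


# Rows p0001 and p0002 are taken from jwsan-2145.csv. Row "x0001" is a
# made-up placeholder (NOT real data) so the sample has one flagged pair.
SAMPLE = """pairID,word1,word2,POS,similarity,association,n_sim,n_asso,JWSAN_1400
p0001,うら悲しい,物憂い,3,2.85,3.49,150,150,0
x0001,wordA,wordB,1,0.0,0.0,0,0,1
p0002,おっかない,酷い,3,1.56,2.78,170,140,0
"""

out = io.StringIO()
select_jwsan_1400(io.StringIO(SAMPLE), out)
print(out.getvalue(), end="")

# For the real files (assuming UTF-8 encoding):
#   with open("jwsan-2145.csv", encoding="utf-8", newline="") as src, \
#        open("jwsan-1400.csv", "w", encoding="utf-8", newline="") as dst:
#       select_jwsan_1400(src, dst)
```

Unlike the egrep pattern, this sketch looks up the JWSAN_1400 column by name, so it does not depend on that column being the last one.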