WO2018029791A1 - Keyword extraction system, keyword extraction method and program

Info

Publication number: WO2018029791A1
Application number: PCT/JP2016/073492
Authority: WO
Inventors: 容朱鄭
Original assignee: 楽天株式会社
Priority date: 2016-08-09
Filing date: 2016-08-09
Publication date: 2018-02-15
Also published as: JPWO2018029791A1; JP6457153B2

Abstract

The purpose of the present invention is to extract a keyword expressing a characteristic of an object from a sentence relating to the object. Provided is a keyword extraction system that: acquires a sentence relating to an object; identifies a plurality of adjective words and a plurality of noun words from the acquired sentence; calculates a score, for each of the plurality of identified noun words, indicating the frequency with which each of the noun words is used with an adjective word; and selects, on the basis of the calculated score, one or more words from among the plurality of noun words as a word expressing a characteristic of the object.

Description

Keyword extraction system, keyword extraction method and program

The present invention relates to a keyword extraction system, a keyword extraction method, and a program.

There is a technique for extracting a word having a correlation with a theme from a sentence described with respect to a theme based on the appearance frequency of the word in the sentence.

Non-Patent Document 1 discloses that a word having a correlation with a political party is extracted by statistically processing the appearance frequency of words in documents related to the two political parties.

For example, it is conceivable to extract a keyword (for example, “taste” or “texture”) representing the characteristics of the object from the review text of the object (for example, “candy”). If keywords are simply extracted using the appearance frequency, keywords that do not represent the characteristics of the object (for example, “shipping”) are also extracted, which is not effective.

The present invention has been made in view of the above problems, and an object of the present invention is to provide a technique capable of extracting a keyword representing a characteristic of an object from a sentence related to the object.

In order to solve the above problems, a keyword extraction system according to the present invention includes: acquisition means for acquiring a sentence related to an object; identification means for identifying a plurality of adjective words and a plurality of noun words from the acquired sentence; score calculation means for calculating, for each of the identified plurality of noun words, a score indicating the frequency with which that noun word is used together with an adjective word; and characteristic selection means for selecting, based on the calculated score, one or more of the plurality of noun words as words representing the characteristics of the object.

Further, the keyword extraction method according to the present invention includes: a step of acquiring a sentence related to an object; a step of identifying a plurality of adjective words and a plurality of noun words from the acquired sentence; a step of calculating, for each of the identified plurality of noun words, a score indicating the frequency with which that noun word is used together with an adjective word; and a step of selecting, based on the calculated score, one or more of the plurality of noun words as words representing the characteristics of the object.

Further, the program according to the present invention causes a computer to function as: acquisition means for acquiring a sentence related to an object; identification means for identifying a plurality of adjective words and a plurality of noun words from the acquired sentence; score calculation means for calculating, for each of the identified plurality of noun words, a score indicating the frequency with which that noun word is used together with an adjective word; and characteristic selection means for selecting, based on the calculated score, one or more of the plurality of noun words as words representing the characteristics of the object.

According to the present invention, a keyword representing a characteristic of an object can be extracted from a sentence related to the object.

In one aspect of the present invention, the score calculation means may calculate the score indicating the frequency with which each noun word is used together with an adjective word based on the relative position between each of the identified plurality of noun words and each of the identified plurality of adjective words.

In one aspect of the present invention, the score calculation means may calculate the score indicating the frequency with which each noun word is used together with an adjective word based on adjective words whose distance from each of the identified plurality of noun words is smaller than a predetermined value.

In one aspect of the present invention, the score calculation means may calculate the score based on the distance between each of the identified plurality of noun words and an adjective word whose distance from that noun word is smaller than a predetermined value.

In one aspect of the present invention, the score calculation means may calculate the score based on the distance between each of the identified plurality of noun words and an adjective word whose distance from that noun word is smaller than a predetermined value, and on whether or not that adjective word is behind the noun word.

In one aspect of the present invention, the score calculating means may calculate the score further based on a group to which an adjective word used together with the respective noun word belongs.

In one aspect of the present invention, the keyword extraction system may further include an analysis unit that acquires another sentence describing the object and detects any selected word that is not included in the acquired other sentence.

FIG. 1 is a diagram showing an example of an analysis system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the hardware configuration of the analysis server.
FIG. 3 is a block diagram showing functions realized by the analysis server.
FIG. 4 is a flowchart showing an example of processing of the object explanation unit and the user text collection unit.
FIG. 5 is a diagram showing an example of information stored in the object information storage unit.
FIG. 6 is a diagram showing an example of a user text input screen.
FIG. 7 is a diagram showing an example of data stored in the user text storage unit.
FIG. 8 is a flowchart showing an example of processing of the user text reading unit, the user text analysis unit, and the explanation sentence analysis unit.
FIG. 9 is a diagram showing an example of words divided from text and whose parts of speech have been identified.
FIG. 10 is a flowchart showing an example of processing of the score calculation unit.
FIG. 11 is a flowchart showing another example of processing of the score calculation unit.
FIG. 12 is a diagram showing an example of noun words that are included in text about a certain object and were selected by the characteristic selection unit.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. Redundant description of components denoted by the same reference signs is omitted. The present embodiment describes, as a keyword extraction system, an analysis system that receives a review text about an object such as a product from a user, analyzes the received text, and extracts keywords indicating characteristics (attributes) of the object. Note that an object such as an organization or a service provider may be processed instead of a product.

FIG. 1 is a diagram showing an example of an analysis system according to an embodiment of the present invention. This analysis system includes an analysis server 1 and a user terminal 2. These are connected via the network 3. The network 3 is, for example, a local area network or the Internet.

The analysis server 1 is a server computer. The analysis server 1 executes a web server program (such as httpd), receives information via the Internet from the user terminal 2 executing a browser program, and outputs information that causes the user terminal 2 to display an image (screen) including buttons and character strings. The analysis server 1 also receives text about an object input by the user and stores it in a database. The analysis server 1 analyzes the received text and extracts words indicating the characteristics of the object.

The user terminal 2 is, for example, a personal computer or a smartphone. The user terminal 2 transmits information input by the user to the analysis server 1 or the like, receives information from the analysis server 1 or the like, and controls the display output device so that it displays an image corresponding to the information.

FIG. 2 is a diagram illustrating an example of a hardware configuration of the analysis server 1. Each of the analysis server 1 and the user terminal 2 includes a processor 11, a storage unit 12, a communication unit 13, and an input / output unit 14.

The processor 11 operates according to a program stored in the storage unit 12. The processor 11 controls the communication unit 13 and the input / output unit 14. The program may be provided via the Internet or the like, or may be provided by being stored in a computer-readable storage medium such as a flash memory or a DVD-ROM.

The storage unit 12 includes a memory element such as a RAM and a flash memory and an external storage device such as a hard disk drive. The storage unit 12 stores the program. The storage unit 12 stores information input from each unit and calculation results.

The communication unit 13 realizes a function of communicating with other devices, and is configured by, for example, a wired LAN integrated circuit. Based on the control of the processor 11, the communication unit 13 inputs information received from another device to the processor 11 or the storage unit 12 and transmits the information to the other device.

The input / output unit 14 includes a video controller that controls the display output device, a controller that acquires data from the input device, and the like. Examples of input devices include a keyboard, a mouse, and a touch panel. Based on the control of the processor 11, the input / output unit 14 outputs display data to the display output device, and acquires data input by the user operating the input device. The display output device is, for example, a display device connected to the outside.

The user terminal 2 includes a processor 11, a storage unit 12, a communication unit 13, an input / output unit 14, and the like, similar to the analysis server 1. The user terminal 2 realizes a function of presenting a screen based on data received from the analysis server 1 or the like and a function of transmitting information such as text input by the user on the screen to the analysis server 1. These functions are realized, for example, when the processor 11 or the like included in the user terminal 2 executes a program such as a browser and performs processing according to data received from the analysis server 1 or the like. These functions may be realized by a dedicated application program installed in the user terminal 2 instead of the browser.

Next, functions and processes realized by the analysis server 1 according to the embodiment of the present invention will be described. FIG. 3 is a block diagram illustrating functions realized by the analysis server 1. The analysis server 1 functionally includes an object explanation unit 51, a user text collection unit 52, a user text reading unit 53, a user text analysis unit 54, an explanation sentence analysis unit 55, an object information storage unit 71, and a user text storage unit 72. The user text analysis unit 54 functionally includes a part of speech identification unit 57, a score calculation unit 58, and a characteristic selection unit 59. These functions are realized by the processor 11 included in the analysis server 1 executing a program stored in the storage unit 12 and controlling the communication unit 13 and the like.

The object information storage unit 71 is mainly realized by the storage unit 12. The object information storage unit 71 stores, for each object such as a product, object ID, name, category in which the object is included, and information on an introductory sentence input by an administrator such as a store.

The user text storage unit 72 is mainly realized by the storage unit 12. The user text storage unit 72 stores text about the object input by the user. Here, the user text storage unit 72 stores a review of a product that is an object as text input by the user. Here, the object information storage unit 71 and the user text storage unit 72 may be arranged on a server different from the analysis server 1. For example, the object information storage unit 71 and the user text storage unit 72 may be arranged in a database management system installed on another server.

The object explanation unit 51 is realized mainly by the processor 11 executing a program and controlling the storage unit 12 and the communication unit 13. The object explanation unit 51 acquires text describing the object from the object information storage unit 71, and transmits data including the text to the user terminal 2 using the communication unit 13.

The user text collection unit 52 is realized mainly by the processor 11 executing a program and controlling the storage unit 12 and the communication unit 13. The user text collection unit 52 acquires a sentence input by the user for a certain object and stores it in the user text storage unit 72.

The user text reading unit 53 is realized mainly by the processor 11 executing a program and controlling the storage unit 12. The user text reading unit 53 acquires the text stored in the user text storage unit 72 and associated with the object.

The user text analysis unit 54 analyzes the acquired text and extracts a word indicating the characteristic (attribute) of the object.

The part-of-speech identifying unit 57 is realized mainly by the processor 11 executing a program and controlling the storage unit 12. The part-of-speech identifying unit 57 identifies a plurality of adjective words and a plurality of noun words from the text acquired by the user text reading unit 53.

The score calculation unit 58 is realized mainly by the processor 11 executing a program and controlling the storage unit 12. The score calculation unit 58 calculates, for each of the plurality of identified noun words, a score value indicating the frequency with which each noun word is used together with the adjective word.

The characteristic selection unit 59 is realized mainly by the processor 11 executing a program and controlling the storage unit 12. The characteristic selection unit 59 selects one or more of the plural noun words as words representing the characteristic (attribute) of the object based on the calculated score.

The explanation sentence analysis unit 55 acquires a text different from the text that was read by the user text reading unit 53 and analyzed. This text is an explanatory text describing the object. The explanation sentence analysis unit 55 then detects, among the words representing the characteristics of the object, any word that is not contained in that explanatory text.

Next, processing of the object explanation unit 51 and the user text collection unit 52 will be described. FIG. 4 is a flowchart showing an example of processing of the object explanation unit 51 and the user text collection unit 52.

First, the object explanation unit 51 acquires object information from the object information storage unit 71 (step S101). The object whose information is to be acquired may be an object that is selected in advance by the user and received by the communication unit 13 from the user terminal 2.

FIG. 5 is a diagram illustrating an example of information stored in the object information storage unit 71. The object information storage unit 71 stores object information for each of a plurality of objects. The object information for one object includes an object ID for identifying the object, a name, a category to which the object belongs, an introduction sentence, and an administrator ID indicating the person who input the introduction sentence. The introduction sentence is mainly input by an administrator who manages the object (for example, an administrator of a store that sells the object). There may be a plurality of administrators, each identified by an administrator ID. The object information may be a mixture of Japanese and English.

Next, the object explanation unit 51 transmits information for explaining the object and information for allowing the user to input text (sentence) related to the object to the user terminal 2 (step S102). The information describing the object includes the name and description of the object, and the information for inputting the text includes, for example, HTML indicating an input field.

FIG. 6 is a diagram showing an example of a user text input screen. The user text input screen is a screen generated by the user terminal 2 that has received the information transmitted from the object explanation unit 51. The user text input screen includes an explanatory text area 31 in which an explanatory text transmitted from the object explanation unit 51 is arranged, an input area 32 for a user to input a review text, and an input button 33. When the user inputs text and presses the input button 33, the user terminal 2 transmits the input text to the analysis server 1.

When the text is transmitted from the user terminal 2, the user text collection unit 52 receives the text about the object from the user terminal 2 (step S103). Then, the user text collection unit 52 associates the received text with the object and stores it in the user text storage unit 72 (step S104).

FIG. 7 is a diagram illustrating an example of data stored in the user text storage unit 72. The user text storage unit 72 stores a plurality of records, and each record includes the text, the object ID of the object that the text is about, and the user ID of the user who input the text. One record corresponds to one review input by the user.

Here, processing of the user text reading unit 53, the user text analysis unit 54, and the explanation sentence analysis unit 55 will be described. FIG. 8 is a flowchart showing an example of processing of the user text reading unit 53, the user text analysis unit 54, and the explanatory sentence analysis unit 55. The process illustrated in FIG. 8 may be performed for a predetermined object or may be repeatedly performed for each object. In the process illustrated in FIG. 8, a word representing characteristics of a certain object is extracted, but a word representing characteristics of a plurality of objects belonging to a certain category may be extracted.

In the process shown in FIG. 8, first, the user text reading unit 53 obtains text data associated with the object to be processed from the user text storage unit 72 (step S201). More specifically, the user text reading unit 53 extracts the records having the object ID of the object to be processed from the user text records stored in the storage unit 12 and acquires the text included in those records. If the user text records are stored by a database management system, the records are extracted by searching the records stored in the database management system using the object ID as a search key.
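As an illustration of step S201, the following is a minimal sketch of selecting review texts by object ID. The record fields follow FIG. 7 as described above; the class name `UserTextRecord`, the function name `read_texts`, and the sample data are hypothetical and not taken from the original.

```python
from dataclasses import dataclass


@dataclass
class UserTextRecord:
    """One review record: object ID, user ID, and the review text."""
    object_id: str
    user_id: str
    text: str


def read_texts(records, object_id):
    """Return the texts of all records that carry the given object ID."""
    return [r.text for r in records if r.object_id == object_id]


# Hypothetical sample data for illustration only.
records = [
    UserTextRecord("A001", "u1", "The fabric is soft."),
    UserTextRecord("B002", "u2", "Shipping was slow."),
    UserTextRecord("A001", "u3", "Nice color."),
]
texts = read_texts(records, "A001")
```

With a database management system, the same selection would typically be a query keyed on the object ID rather than an in-memory filter.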

When the text data associated with the object is acquired, the part-of-speech identifying unit 57 breaks down the text into a plurality of words, and identifies each part-of-speech of the broken-down word (step S202). A specific method for decomposing a text into words and identifying the part of speech of the word is generally known as morphological analysis and the like, and thus will not be described. Through this process, noun words and adjective words included in the text are identified. Here, the part-of-speech identifying unit 57 assigns sequence numbers to the divided words. The sequence number of the word located at the beginning of the document is 1.
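The output of step S202 can be pictured with the sketch below. It is not a morphological analyzer: the part of speech is looked up in a tiny hypothetical lexicon (`TOY_LEXICON`), which is an assumption made only so that the example is self-contained. What it shows is the shape of the result, namely words numbered from 1 and tagged "n" or "adj" as in FIG. 9.

```python
import re

# Hypothetical toy lexicon; a real system would use a morphological analyzer.
TOY_LEXICON = {"fabric": "n", "color": "n", "soft": "adj", "nice": "adj"}


def identify_parts_of_speech(text):
    """Split text into words and return (sequence_number, word, pos) tuples.

    Sequence numbers start at 1 for the first word of the text.
    Words missing from the toy lexicon get the tag "other".
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    return [(i, w, TOY_LEXICON.get(w, "other"))
            for i, w in enumerate(words, start=1)]


tagged = identify_parts_of_speech("The fabric is soft")
```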

FIG. 9 is a diagram showing an example of words that have been divided from text and whose parts of speech have been identified. The upper side shows an example of the words into which a Japanese sentence is decomposed, and the lower side shows an example of the words into which an English sentence is decomposed. For each example, the first line shows the sequence number of each word counted from the beginning, and the second line shows the divided words. In the example of FIG. 9, the words are shown by dividing the sentence with “/”. The mark on the third line indicates the part of speech of the word immediately above, where “n” corresponds to a noun and “adj” corresponds to an adjective.

When the word part of speech is identified, the score calculation unit 58 calculates, for each noun word, a score value indicating the frequency with which the noun word is used together with the adjective (step S203).

The process in which the score calculation unit 58 calculates the score value will be described in more detail. FIG. 10 is a flowchart showing an example of processing of the score calculation unit 58.

In the process of FIG. 10, the score calculation unit 58 first creates a list of noun words present in the text (step S301). More specifically, a list obtained by removing duplication of noun words identified by the part-of-speech identifying unit 57 is created, and the noun words for which score values are to be calculated are specified. Then, the score calculation unit 58 sets 0 as the score value of each noun word included in the list (step S302).

Since the score value is calculated for each distinct noun word, when the same word appears at a plurality of positions in the text, the sum of the score elements calculated for each occurrence becomes the score value of that word.

Next, the score calculation unit 58 acquires the word of the first noun from the plurality of words included in the text, and sets the sequence number (position) of the word of the noun as the variable i (step S303). Then, the score calculation unit 58 detects an adjective word whose distance from the noun word is smaller than a predetermined value. More specifically, the score calculation unit 58 detects an adjective word from the (i−2) th to the (i + 2) th word (step S304). Note that the range of acquired words may be changed based on experimental results or the like.
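The window search of step S304 can be sketched as follows. The function name `adjectives_near` and the list-of-tags input are assumptions for illustration; the window of two words on each side follows the description above and is exposed as a parameter because the text notes that the range may be changed.

```python
def adjectives_near(tags, i, window=2):
    """Return the 1-based positions of adjectives within `window` words of i.

    `tags` is a list of part-of-speech tags; tags[0] belongs to the word
    with sequence number 1. Position i itself is skipped.
    """
    lo = max(1, i - window)
    hi = min(len(tags), i + window)
    return [j for j in range(lo, hi + 1) if j != i and tags[j - 1] == "adj"]


tags = ["adj", "n", "other", "adj", "other", "adj"]
near = adjectives_near(tags, 2)  # the noun has sequence number 2
```

Here the adjective at position 6 lies outside the window of the noun at position 2 and is therefore not detected.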

Then, the score calculation unit 58 calculates a score element for each of the selected adjective words (step S305), and adds the score element to the score value for the acquired noun word (step S306).

Here, when the score element for one noun word is set as Variety, the processing of Step S304 and Step S305 is expressed by a mathematical expression as follows.
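The expression itself is not reproduced in this text. Assuming only what the description states (a sum, over the adjective words within the window, of a monotonically increasing function f of the signed offset dist), it may be written in the following form. The exponential shown for f is one possible choice and its exact form is an assumption.

```latex
\mathrm{Variety} = \sum_{\text{adjective}} f(\mathrm{dist}),
\qquad \mathrm{dist} \in \{-2, -1, 1, 2\},
\qquad \text{e.g. } f(\mathrm{dist}) = e^{\mathrm{dist}}
```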

Here, “adjective” indicates that calculation is performed for all adjectives, and words of other parts of speech are not subject to calculation. dist indicates the relative position of the adjective relative to the position of the noun. In the example of FIG. 10, -2, -1, 1, and 2 are taken as the values of dist, and adjectives outside that range are not processed. If there is an adjective word after the noun, dist is positive, and if there is an adjective word before the noun, dist is negative. The function f is a monotonically increasing function.

According to this mathematical formula, a value weighted according to the relative position between the adjective word and the noun word becomes the score element, and the score value, which is the sum of the score elements, reflects the relative position. The weight is a function of the distance between the noun word and the adjective word, and it is larger when the adjective word follows the noun word than when the adjective word precedes it. The greater the weight, the greater the score value.

For example, in the sentence shown in the upper example of FIG. 9, the fifth word (noun) is modified by the seventh word (adjective), whereas the third word (adjective) describes the first word (noun) and does not describe the fifth word (noun). In Japanese, for example, an adjective is more likely to describe a preceding noun, so this formula keeps adjectives that are not actually used with a noun from being excessively reflected in its score. Note that the score calculation unit 58 may calculate the score element using a linear monotonically increasing function instead of the exponential function.

When the score element is calculated, the score calculation unit 58 determines whether or not a next noun word exists in the text after the currently acquired noun word (step S307). If the next noun word is present in the text (Y in step S307), the next noun word is acquired, the sequence number (position) of that noun word in the text is set in the variable i (step S308), and the process from step S304 is repeated. When the next noun word does not exist in the text (N in step S307), the score value obtained for each distinct noun word is output to the storage unit 12 as a calculation result (step S309).
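The loop of FIG. 10 (steps S301 to S309) can be sketched as below. This is an illustration under stated assumptions, not the implementation of the embodiment: the input is a list of (word, part of speech) pairs, the function name `score_nouns` is hypothetical, and `math.exp` is used as the monotonically increasing weight because the description mentions an exponential function without giving its exact form.

```python
import math


def score_nouns(tagged, window=2, weight=math.exp):
    """Accumulate, per distinct noun, weighted counts of nearby adjectives.

    `tagged` is a list of (word, pos) pairs in text order.
    `weight` maps the signed offset dist (adjective position minus noun
    position) to a score element; math.exp is an assumed choice.
    """
    scores = {w: 0.0 for w, pos in tagged if pos == "n"}    # S301, S302
    for i, (noun, pos) in enumerate(tagged):                 # S303, S307, S308
        if pos != "n":
            continue
        for dist in range(-window, window + 1):              # S304
            j = i + dist
            if dist == 0 or j < 0 or j >= len(tagged):
                continue
            if tagged[j][1] == "adj":
                scores[noun] += weight(dist)                 # S305, S306
    return scores                                            # S309


tagged = [("fabric", "n"), ("is", "other"), ("soft", "adj"),
          ("and", "other"), ("nice", "adj"), ("color", "n")]
scores = score_nouns(tagged)
```

In this sample, "fabric" is followed by an adjective two words later and "color" is preceded by an adjective one word earlier, so "fabric" receives the larger score under the assumed weight.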

Note that the score calculation unit 58 may calculate the score element with the same weight for an adjective word before the noun and an adjective word after the noun. It may also simply count the number of adjective words around the noun. When the end of a sentence is clearly indicated by a punctuation mark or the like, the score calculation unit 58 may exclude words in a sentence adjacent to the sentence containing the noun word from the score value calculation.
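The two variations above (equal weighting and exclusion of adjacent sentences) can be combined in a short sketch. The tag "eos" for a sentence-ending token and the function name `count_adjectives_same_sentence` are assumed conventions for this example only.

```python
def count_adjectives_same_sentence(tagged, window=2):
    """Count adjectives near each noun, ignoring words across a sentence end.

    `tagged` is a list of (word, pos) pairs; the tag "eos" marks a
    sentence-ending punctuation token (an assumed convention).
    """
    counts = {w: 0 for w, pos in tagged if pos == "n"}
    for i, (noun, pos) in enumerate(tagged):
        if pos != "n":
            continue
        for step in (-1, 1):                     # scan backward, then forward
            for k in range(1, window + 1):
                j = i + step * k
                if j < 0 or j >= len(tagged) or tagged[j][1] == "eos":
                    break                        # stop at text or sentence end
                if tagged[j][1] == "adj":
                    counts[noun] += 1            # same weight in both directions
    return counts


tagged = [("great", "adj"), (".", "eos"), ("fabric", "n"),
          ("feels", "other"), ("soft", "adj")]
counts = count_adjectives_same_sentence(tagged)
```

The adjective "great" belongs to the previous sentence and is therefore not counted for "fabric".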

Further, the score calculation unit 58 may calculate the score element based further on whether the adjective word near the noun word belongs to a group of words having a positive meaning or a group of words having a negative meaning. For example, the calculation may be performed such that a positive score element is obtained for a word having a positive meaning and a negative score element is obtained for a word having a negative meaning. This allows a person analyzing the data to infer, from nouns with high score values, the reasons for purchasing a product (one kind of object), or to infer, from nouns with low score values, the items of the object that need improvement. The score calculation unit 58 may also calculate a score value for each combination of a noun and a group.
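A minimal sketch of group-based scoring follows. The members of the positive and negative groups are hypothetical (the description does not list them), and score elements of +1 and -1 are an assumed simplification.

```python
# Hypothetical word groups; the description does not list their members.
POSITIVE = {"soft", "nice"}
NEGATIVE = {"rough", "thin"}


def polarity_scores(tagged, window=2):
    """Add +1 for each nearby positive adjective and -1 for each negative one."""
    scores = {w: 0 for w, pos in tagged if pos == "n"}
    for i, (noun, pos) in enumerate(tagged):
        if pos != "n":
            continue
        lo, hi = max(0, i - window), min(len(tagged), i + window + 1)
        for j in range(lo, hi):
            word, tag = tagged[j]
            if j == i or tag != "adj":
                continue
            if word in POSITIVE:
                scores[noun] += 1
            elif word in NEGATIVE:
                scores[noun] -= 1
    return scores


tagged = [("fabric", "n"), ("is", "other"), ("soft", "adj"),
          ("but", "other"), ("the", "other"),
          ("seams", "n"), ("feel", "other"), ("rough", "adj")]
scores = polarity_scores(tagged)
```

Under these assumptions a noun with a negative total, such as "seams" here, would point to an item that may need improvement.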

In the processing of FIG. 10, the loop processing is performed for the noun word, but the loop processing may be performed for the adjective word. FIG. 11 is a flowchart illustrating another example of the processing of the score calculation unit 58.

In the process of FIG. 11, the score calculation unit 58 first creates a list of noun words present in the text (step S401), and sets 0 as the score value of each noun word included in the list (step S402). These processes are the same as in the example of FIG. 10.

Then, the score calculation unit 58 acquires the first adjective word among the plurality of words included in the text, and sets the sequence number (position) of the adjective word to the variable i (step S403). Then, the score calculation unit 58 detects noun words from the (i−2) th to (i + 2) th words (step S404). Then, the score calculation unit 58 calculates a score element for each of the detected noun words (step S405), and adds the score element to the score value for each noun (step S406). The score calculation unit 58 calculates the score element for each pair of a detected noun word and the adjective word, weighted according to the relative positions of the noun word and the adjective word. This calculation formula may use an exponential function as in step S304. Alternatively, a fixed value may simply be set as the score element without weighting. In this case, the adjectives around the nouns are counted.

When the score element is calculated, the score calculation unit 58 determines whether a next adjective word is present in the text after the currently acquired adjective word (step S407). If the next adjective word is present in the text (Y in step S407), the next adjective word is acquired, the sequence number (position) of that adjective word in the text is set to the variable i (step S408), and the process from step S404 is repeated. If the next adjective word does not exist in the text (N in step S407), the score value obtained for each distinct noun word is output to the storage unit 12 as a calculation result (step S409).

Even if the score value is calculated by sequentially obtaining the adjective words in this way, the score value in the noun word can be obtained. Moreover, since there are generally fewer adjective words than noun words, the computational burden can be reduced.
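The adjective-driven loop of FIG. 11 can be sketched in the same style. As before, the (word, part of speech) input, the function name `score_nouns_by_adjective`, and the use of `math.exp` as the weight are assumptions for illustration.

```python
import math


def score_nouns_by_adjective(tagged, window=2, weight=math.exp):
    """Loop over adjectives and credit every noun found within the window."""
    scores = {w: 0.0 for w, pos in tagged if pos == "n"}    # S401, S402
    for i, (_, pos) in enumerate(tagged):                    # S403, S407, S408
        if pos != "adj":
            continue
        lo, hi = max(0, i - window), min(len(tagged), i + window + 1)
        for j in range(lo, hi):                              # S404
            noun, tag = tagged[j]
            if tag == "n":
                # dist is the adjective position relative to the noun.
                scores[noun] += weight(i - j)                # S405, S406
    return scores                                            # S409


tagged = [("fabric", "n"), ("is", "other"), ("soft", "adj"),
          ("and", "other"), ("nice", "adj"), ("color", "n")]
scores = score_nouns_by_adjective(tagged)
```

Because each noun-adjective pair within the window is visited exactly once either way, this loop yields the same totals as iterating over nouns with the same weight function.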

The processing after the score value of the noun word in the text is calculated will be described. When the score value of the noun word in the text is calculated, the characteristic selection unit 59 selects a word indicating the characteristic of the object to be processed based on the calculated score value. The word selected here is a noun. The characteristic selection unit 59 selects, as a word indicating the characteristic of the object, for example, a word whose score value is higher than a predetermined threshold, or a word whose rank when the words are sorted by score value is higher than a predetermined rank. In this embodiment, the greater the score value, the more frequently the noun is used together with adjectives.
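Both selection rules mentioned above (a score threshold and a rank cutoff) can be sketched as follows. The function name `select_characteristics`, its parameters, and the sample scores are hypothetical.

```python
def select_characteristics(scores, threshold=None, top_k=None):
    """Select nouns by score threshold, by rank, or both.

    Nouns are sorted by descending score; `threshold` keeps scores strictly
    above it, and `top_k` keeps at most that many highest-ranked nouns.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(w, s) for w, s in ranked if s > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [w for w, _ in ranked]


# Hypothetical score values for illustration only.
scores = {"fabric": 7.4, "color": 3.1, "shipping": 0.2}
by_threshold = select_characteristics(scores, threshold=1.0)
by_rank = select_characteristics(scores, top_k=1)
```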

The word indicating the characteristics of the object is a word indicating the attribute of the object and corresponds to the evaluation item of the object. Through the processing described so far, items that are important in the evaluation of an object can be acquired from a user's review of the object.

FIG. 12 is a diagram illustrating an example of noun words that are included in text about an object and are selected by the characteristic selection unit 59. FIG. 12 shows an example where the object is a “shirt”. In the example of this figure, words that are used to describe the shirt and that indicate its attributes are selected as words indicating the characteristics of the object. Since noun words used together with adjectives are extracted, the selection of words such as “package” or “shipping” that are not related to an attribute of the object is also suppressed.

When the words indicating the characteristics of the object are selected, the explanation sentence analysis unit 55 acquires other text about the object. Specifically, the other text is the explanatory text included in the object information. The explanation sentence analysis unit 55 then detects, among the words selected as indicating the characteristics of the object, any word that is not contained in the explanatory text of the object (step S205).

Through the above processing, the explanation sentence analysis unit 55 can detect characteristics of the object that have not been explained. By revising the explanatory text with respect to the words (characteristics) detected by the explanation sentence analysis unit 55, an explanatory text that is easier for users to understand can be created.
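The detection in step S205 can be sketched as a containment check. The case-insensitive substring test is an assumed simplification, since the description does not specify how containment is decided; the function name `find_unexplained` and the sample text are hypothetical.

```python
def find_unexplained(selected_words, description):
    """Return the selected words that do not occur in the description text.

    Matching is a case-insensitive substring test, an assumed simplification;
    the description does not specify how containment is checked.
    """
    text = description.lower()
    return [w for w in selected_words if w.lower() not in text]


# Hypothetical sample data for illustration only.
selected = ["fabric", "color", "fit"]
description = "A cotton shirt with a soft fabric, available in one color."
missing = find_unexplained(selected, description)
```

A production system would more likely compare against the words obtained by morphological analysis of the explanatory text, so that partial-word matches are not mistaken for mentions.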

In the embodiment described so far, the analysis server 1 is assumed to have functions from the object explanation unit 51 to the explanation sentence analysis unit 55, but some of the functions may be implemented in another computer. For example, the functions of the object explanation unit 51 and the user text collection unit 52 may be implemented in different servers.

1 analysis server, 2 user terminal, 3 network, 11 processor, 12 storage unit, 13 communication unit, 14 input/output unit, 31 description sentence area, 32 input area, 33 input button, 51 object explanation unit, 52 user text collection unit, 53 user text reading unit, 54 user text analysis unit, 55 explanation sentence analysis unit, 57 part of speech identification unit, 58 score calculation unit, 59 characteristic selection unit, 71 object information storage unit, 72 user text storage unit.

Claims

A keyword extraction system comprising:
acquisition means for acquiring text about an object;
identification means for identifying a plurality of adjective words and a plurality of noun words from the acquired text;
score calculation means for calculating, for each of the identified plurality of noun words, a score indicating a frequency with which the noun word is used together with an adjective word; and
characteristic selection means for selecting, based on the calculated scores, one or more of the plurality of noun words as words representing a characteristic of the object.
The keyword extraction system according to claim 1,
wherein the score calculation means calculates the score indicating the frequency with which each noun word is used together with an adjective word, based on the relative positions of the identified plurality of noun words and the identified plurality of adjective words.
The keyword extraction system according to claim 2,
wherein the score calculation means calculates the score indicating the frequency with which each noun word is used together with an adjective word, based on adjective words whose distance from each of the identified plurality of noun words is smaller than a predetermined value.
The keyword extraction system according to claim 3,
wherein the score calculation means calculates the score based on the distance between each of the identified plurality of noun words and an adjective word whose distance from that noun word is smaller than a predetermined value.
The keyword extraction system according to claim 4,
wherein the score calculation means calculates the score based on the distance between each of the identified plurality of noun words and an adjective word whose distance from that noun word is smaller than a predetermined value, and on whether that adjective word is located after the noun word.
The keyword extraction system according to claim 4,
wherein the score calculation means calculates the score further based on a group to which the adjective word used together with each noun word belongs.
The keyword extraction system according to any one of claims 1 to 6,
further comprising analysis means for acquiring other text that describes the object and detecting, among the selected words, a word that is not included in the acquired other text.
A keyword extraction method comprising:
acquiring text about an object;
identifying a plurality of adjective words and a plurality of noun words from the acquired text;
calculating, for each of the identified plurality of noun words, a score indicating a frequency with which the noun word is used together with an adjective word; and
selecting, based on the calculated scores, one or more of the plurality of noun words as words representing a characteristic of the object.
A program for causing a computer to function as:
acquisition means for acquiring text about an object;
identification means for identifying a plurality of adjective words and a plurality of noun words from the acquired text;
score calculation means for calculating, for each of the identified plurality of noun words, a score indicating a frequency with which the noun word is used together with an adjective word; and
characteristic selection means for selecting, based on the calculated scores, one or more of the plurality of noun words as words representing a characteristic of the object.