WO2018029791A1 - Keyword extraction system, keyword extraction method and program - Google Patents


Info

Publication number
WO2018029791A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
noun
words
score
adjective
Application number
PCT/JP2016/073492
Other languages
French (fr)
Japanese (ja)
Inventor
容朱 鄭
Original Assignee
楽天株式会社 (Rakuten, Inc.)
Application filed by 楽天株式会社 (Rakuten, Inc.)
Priority to PCT/JP2016/073492 (WO2018029791A1)
Priority to JP2018516205A (JP6457153B2)
Publication of WO2018029791A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities

Definitions

  • The present invention relates to a keyword extraction system, a keyword extraction method, and a program.
  • Non-Patent Document 1 discloses extracting words correlated with each of two political parties by statistically processing the appearance frequency of words in documents related to those parties.
  • For example, one may want to extract keywords (for example, “taste” or “texture”) that represent the characteristics of an object (for example, “candy”) from review texts about it. If keywords are extracted simply by appearance frequency, keywords that do not represent the characteristics of the object (for example, “shipping”) are also extracted, so this approach is not effective.
  • The present invention has been made in view of the above problem, and its object is to provide a technique capable of extracting keywords that represent a characteristic of an object from sentences related to the object.
  • A keyword extraction system according to the present invention includes: acquisition means for acquiring a sentence related to an object; identification means for identifying a plurality of adjective words and a plurality of noun words in the acquired sentence; score calculation means for calculating, for each of the identified noun words, a score indicating the frequency with which that noun word is used together with adjective words; and characteristic selection means for selecting, based on the calculated scores, one or more of the noun words as words representing the characteristics of the object.
  • A keyword extraction method according to the present invention includes the steps of: acquiring a sentence related to an object; identifying a plurality of adjective words and a plurality of noun words in the acquired sentence; calculating, for each of the identified noun words, a score indicating the frequency with which that noun word is used together with adjective words; and selecting, based on the calculated scores, one or more of the noun words as words representing the characteristics of the object.
  • A program according to the present invention causes a computer to function as: acquisition means for acquiring a sentence related to an object; identification means for identifying a plurality of adjective words and a plurality of noun words in the acquired sentence; score calculation means for calculating, for each of the identified noun words, a score indicating the frequency with which that noun word is used together with adjective words; and characteristic selection means for selecting, based on the calculated scores, one or more of the noun words as words representing the characteristics of the object.
  • According to the present invention, keywords representing a characteristic of an object can be extracted from sentences related to the object.
  • The score calculation means may calculate the score indicating the frequency with which each noun word is used together with adjective words based on the relative positions of each identified noun word and the identified adjective words.
  • The score calculation means may calculate the score indicating the frequency with which each noun word is used together with adjective words based on the adjective words whose distance from that noun word is smaller than a predetermined value.
  • The score calculation means may calculate the score based on the distance between each noun word and each adjective word whose distance from that noun word is smaller than a predetermined value.
  • The score calculation means may calculate the score based on the distance between each noun word and each adjective word whose distance from that noun word is smaller than a predetermined value, and on whether that adjective word is located after the noun word.
  • The score calculation means may calculate the score further based on the group to which an adjective word used together with each noun word belongs.
  • The keyword extraction system may further include analysis means for acquiring another sentence describing the object and detecting any selected word that is not included in that other sentence.
  • As an embodiment of the keyword extraction system, an analysis system is described that receives review texts about an object such as a product from users, analyzes the received texts, and extracts keywords indicating characteristics (attributes) of the object. An object other than a product, such as an organization or a service provider, may be processed instead.
  • FIG. 1 is a diagram showing an example of an analysis system according to an embodiment of the present invention.
  • This analysis system includes an analysis server 1 and a user terminal 2. These are connected via the network 3.
  • The network 3 is, for example, a local area network or the Internet.
  • The analysis server 1 is a server computer.
  • The analysis server 1 executes a web server program (such as httpd), receives information via the Internet from the user terminal 2 running a browser program, and sends the user terminal 2 information for displaying an image (screen) containing buttons and character strings.
  • The analysis server 1 receives text about an object entered by a user and stores it in a database.
  • The analysis server 1 analyzes the received text and extracts words indicating the characteristics of the object.
  • The user terminal 2 is, for example, a personal computer or a smartphone.
  • The user terminal 2 transmits information entered by the user to the analysis server 1 or the like, receives information from the analysis server 1 or the like, and controls the display output device so that it displays an image corresponding to that information.
  • FIG. 2 is a diagram illustrating an example of a hardware configuration of the analysis server 1.
  • Each of the analysis server 1 and the user terminal 2 includes a processor 11, a storage unit 12, a communication unit 13, and an input / output unit 14.
  • The processor 11 operates according to a program stored in the storage unit 12.
  • The processor 11 controls the communication unit 13 and the input/output unit 14.
  • The program may be provided via the Internet or the like, or stored on and provided via a computer-readable storage medium such as a flash memory or a DVD-ROM.
  • The storage unit 12 includes memory elements such as RAM and flash memory, and an external storage device such as a hard disk drive.
  • The storage unit 12 stores the program.
  • The storage unit 12 also stores information input from each unit and calculation results.
  • The communication unit 13 provides the function of communicating with other devices and consists of, for example, a wired LAN integrated circuit. Under the control of the processor 11, the communication unit 13 inputs information received from other devices to the processor 11 or the storage unit 12, and transmits information to other devices.
  • The input/output unit 14 includes a video controller that controls the display output device, a controller that acquires data from input devices, and the like.
  • Examples of input devices include a keyboard, a mouse, and a touch panel.
  • Under the control of the processor 11, the input/output unit 14 outputs display data to the display output device and acquires data entered by the user through the input device.
  • The display output device is, for example, an externally connected display device.
  • Like the analysis server 1, the user terminal 2 includes a processor 11, a storage unit 12, a communication unit 13, an input/output unit 14, and the like.
  • The user terminal 2 provides a function of presenting screens based on data received from the analysis server 1 or the like, and a function of transmitting information such as text entered by the user on those screens to the analysis server 1.
  • These functions are realized, for example, by the processor 11 or the like of the user terminal 2 executing a program such as a browser and performing processing according to data received from the analysis server 1 or the like.
  • These functions may instead be realized by a dedicated application program installed on the user terminal 2 rather than a browser.
  • FIG. 3 is a block diagram illustrating functions realized by the analysis server 1.
  • Functionally, the analysis server 1 includes an object explanation unit 51, a user text collection unit 52, a user text reading unit 53, a user text analysis unit 54, an explanation sentence analysis unit 55, an object information storage unit 71, and a user text storage unit 72.
  • The user text analysis unit 54 functionally includes a part-of-speech identification unit 57, a score calculation unit 58, and a characteristic selection unit 59. These functions are realized by the processor 11 of the analysis server 1 executing a program stored in the storage unit 12 and controlling the communication unit 13 and the like.
  • The object information storage unit 71 is realized mainly by the storage unit 12.
  • The object information storage unit 71 stores, for each object such as a product, the object ID, the name, the category to which the object belongs, and an introduction text entered by an administrator such as a store.
  • The user text storage unit 72 is realized mainly by the storage unit 12.
  • The user text storage unit 72 stores text about objects entered by users.
  • Specifically, the user text storage unit 72 stores reviews of products, which are the objects, as the text entered by users.
  • The object information storage unit 71 and the user text storage unit 72 may be located on a server other than the analysis server 1.
  • The object information storage unit 71 and the user text storage unit 72 may also be located in a database management system installed on another server.
  • The object explanation unit 51 is realized mainly by the processor 11 executing a program and controlling the storage unit 12 and the communication unit 13.
  • The object explanation unit 51 acquires text describing the object from the object information storage unit 71 and transmits data including that text to the user terminal 2 via the communication unit 13.
  • The user text collection unit 52 is realized mainly by the processor 11 executing a program and controlling the storage unit 12 and the communication unit 13.
  • The user text collection unit 52 acquires a sentence entered by a user about an object and stores it in the user text storage unit 72.
  • The user text reading unit 53 is realized mainly by the processor 11 executing a program and controlling the storage unit 12.
  • The user text reading unit 53 acquires the text that is stored in the user text storage unit 72 and associated with the object.
  • The user text analysis unit 54 analyzes the acquired text and extracts words indicating the characteristics (attributes) of the object.
  • The part-of-speech identification unit 57 is realized mainly by the processor 11 executing a program and controlling the storage unit 12.
  • The part-of-speech identification unit 57 identifies a plurality of adjective words and a plurality of noun words in the text acquired by the user text reading unit 53.
  • The score calculation unit 58 is realized mainly by the processor 11 executing a program and controlling the storage unit 12.
  • The score calculation unit 58 calculates, for each identified noun word, a score value indicating the frequency with which that noun word is used together with adjective words.
  • The characteristic selection unit 59 is realized mainly by the processor 11 executing a program and controlling the storage unit 12.
  • The characteristic selection unit 59 selects, based on the calculated scores, one or more of the noun words as words representing the characteristics (attributes) of the object.
  • The explanation sentence analysis unit 55 acquires a text different from the text read by the user text reading unit 53 and analyzed; this other text is an explanatory text describing the object. The explanation sentence analysis unit 55 then detects, among the words representing the characteristics of the object, those that are not contained in the explanatory text.
  • FIG. 4 is a flowchart showing an example of processing of the object explanation unit 51 and the user text collection unit 52.
  • First, the object explanation unit 51 acquires object information from the object information storage unit 71 (step S101).
  • The object whose information is acquired may be one that the user selected in advance, with the selection received from the user terminal 2 by the communication unit 13.
  • FIG. 5 is a diagram illustrating an example of information stored in the object information storage unit 71.
  • The object information storage unit 71 stores object information for each of a plurality of objects.
  • The object information for one object includes an object ID identifying the object, a name, the category to which the object belongs, an introduction text, and an administrator ID identifying the person who entered the introduction text.
  • The introduction text is mainly entered by an administrator who manages the object (for example, an administrator of a store that sells it). There may be multiple administrators, each identified by an administrator ID.
  • The object information may contain a mixture of Japanese and English.
  • Next, the object explanation unit 51 transmits to the user terminal 2 information describing the object and information that lets the user enter text (sentences) about the object (step S102).
  • The information describing the object includes its name and description, and the information for entering text includes, for example, HTML defining an input field.
  • FIG. 6 is a diagram showing an example of a user text input screen.
  • The user text input screen is generated by the user terminal 2 upon receiving the information transmitted from the object explanation unit 51.
  • The user text input screen includes an explanatory text area 31, in which the explanatory text transmitted from the object explanation unit 51 is placed, an input area 32 in which the user enters a review text, and an input button 33.
  • The user terminal 2 transmits the entered text to the analysis server 1.
  • The user text collection unit 52 receives the text about the object from the user terminal 2 (step S103), associates the received text with the object, and stores it in the user text storage unit 72 (step S104).
  • FIG. 7 is a diagram illustrating an example of data stored in the user text storage unit 72.
  • The user text storage unit 72 stores a plurality of records, each containing the text, the object ID of the object the text is about, and the user ID of the user who entered it. One record corresponds to one review entered by a user.
  • FIG. 8 is a flowchart showing an example of processing by the user text reading unit 53, the user text analysis unit 54, and the explanation sentence analysis unit 55.
  • The process shown in FIG. 8 may be performed for a predetermined object, or repeated for each object.
  • Although words representing the characteristics of a single object are extracted here, words representing the characteristics of multiple objects belonging to a certain category may be extracted instead.
  • First, the user text reading unit 53 acquires the text data associated with the object being processed from the user text storage unit 72 (step S201). More specifically, the user text reading unit 53 extracts, from the user text records stored in the storage unit 12, the records that have the object ID of the object being processed, and acquires the text contained in those records. If the user text records are managed by a database management system, the records are extracted by searching it with the object ID as the search key.
  • Next, the part-of-speech identification unit 57 breaks the text down into words and identifies the part of speech of each word (step S202).
  • Methods for decomposing text into words and identifying their parts of speech are generally known, for example as morphological analysis, so they are not described here. This process identifies the noun words and adjective words in the text.
  • The part-of-speech identification unit 57 also assigns sequence numbers to the resulting words, with the word at the beginning of the document numbered 1.
  • FIG. 9 is a diagram showing an example of words that are divided from text and part of speech is identified.
  • The upper part shows words obtained by decomposing a Japanese sentence, and the lower part shows words obtained by decomposing an English sentence.
  • The first row shows each word's sequence number counted from the beginning, and the second row shows the split words.
  • Word boundaries are marked with “/”.
  • The mark in the third row indicates the part of speech of the word directly above it: “n” denotes a noun and “adj” an adjective.
  • Next, the score calculation unit 58 calculates, for each noun word, a score value indicating the frequency with which that noun word is used together with adjectives (step S203).
  • FIG. 10 is a flowchart showing an example of processing of the score calculation unit 58.
  • The score calculation unit 58 first creates a list of the noun words present in the text (step S301). More specifically, it removes duplicates from the noun words identified by the part-of-speech identification unit 57, which fixes the set of noun words whose score values are to be calculated. It then sets the score value of each noun word in the list to 0 (step S302).
  • The score calculation unit 58 then takes the first noun word among the words in the text and sets its sequence number (position) in the variable i (step S303). It then detects adjective words whose distance from that noun word is smaller than a predetermined value; more specifically, it detects adjective words among the (i - 2)th to (i + 2)th words (step S304). The range of words examined may be changed based on experimental results or the like.
  • The score calculation unit 58 calculates a score element for each detected adjective word (step S305) and adds it to the score value of the acquired noun word (step S306).
  • Steps S304 and S305 can be expressed as score(w) ← score(w) + Σ f(dist), where w is the noun word at position i and the sum runs over the adjective words found at positions i + dist.
  • Here, dist is the position of the adjective relative to the position of the noun.
  • dist takes the values -2, -1, 1, and 2; adjectives outside this range are not processed. dist is positive when the adjective word comes after the noun and negative when it comes before.
  • The function f is monotonically increasing.
  • Each score element is thus a value weighted according to the relative position of the adjective word and the noun word, and the score value, which is the sum of the score elements, reflects these relative positions.
  • This weight is a function of the signed distance between the noun word and the adjective word: an adjective word that follows the noun word receives a larger weight than one that precedes it, and the larger the weight, the larger the score value.
  • For example, the fifth word (a noun) is modified by the seventh word (an adjective), whereas the third word (an adjective) describes the first word (a noun), not the fifth word.
  • Statistically, adjectives are more often used to describe the noun that precedes them, so this formula keeps adjectives that are not actually used with a noun from being excessively reflected in that noun's score.
  • The score calculation unit 58 may calculate the score element using a linear monotonically increasing function instead of the exponential function.
  • The score calculation unit 58 then determines whether a next noun word exists in the text after the currently acquired noun word (step S307). If it does (Y in step S307), the unit acquires that noun word, sets its sequence number (position) in the text in the variable i (step S308), and repeats the processing from step S304. If no next noun word exists (N in step S307), the score value obtained for each noun word is output to the storage unit 12 as the calculation result (step S309).
  • The score calculation unit 58 may instead calculate the score element with the same weight for adjective words before and after the noun, in which case it simply counts the adjective words around the noun. When the ends of sentences are clearly marked by punctuation or the like, the score calculation unit 58 may exclude words in sentences adjacent to the sentence containing the noun word from the score value calculation.
  • The score calculation unit 58 may also calculate the score element based on whether the adjective word near the noun word belongs to a group of words with a positive meaning or a group of words with a negative meaning. For example, a word with a positive (good) meaning may yield a positive score element and a word with a negative (bad) meaning a negative score element. This lets the person analyzing the data infer reasons for purchasing a product, which is a kind of object, from nouns with high score values, and infer aspects of the object that need improvement from nouns with low score values. The score calculation unit 58 may also calculate a score value for each combination of noun and group.
  • FIG. 11 is a flowchart illustrating another example of the processing of the score calculation unit 58.
  • The score calculation unit 58 first creates a list of the noun words present in the text (step S401) and sets the score value of each noun word in the list to 0 (step S402). These steps are the same as in the example of FIG. 10.
  • The score calculation unit 58 then takes the first adjective word among the words in the text and sets its sequence number (position) in the variable i (step S403). It then detects noun words among the (i - 2)th to (i + 2)th words (step S404), calculates a score element for each detected noun word (step S405), and adds the score element to that noun's score value (step S406). The score elements for the nouns before and after the adjective are weighted according to the relative positions of the noun word and the adjective word; this calculation may use an exponential function as in steps S304 and S305. Alternatively, a fixed value may simply be used as the score element without weighting, in which case the adjectives around each noun are counted.
  • The score calculation unit 58 then determines whether a next adjective word exists in the text after the currently acquired adjective word (step S407). If it does (Y in step S407), the unit acquires that adjective word, sets its sequence number (position) in the text in the variable i (step S408), and repeats the processing from step S404. If no next adjective word exists (N in step S407), the score value obtained for each noun word is output to the storage unit 12 as the calculation result (step S409).
  • Next, the characteristic selection unit 59 selects words indicating the characteristics of the object being processed based on the calculated score values.
  • The words selected here are nouns.
  • For example, the characteristic selection unit 59 selects, as words indicating the characteristics of the object, words whose score value exceeds a predetermined threshold, or words whose rank after sorting by score falls within a predetermined threshold.
  • The larger the score value, the more frequently the noun is used together with adjectives.
  • A word indicating a characteristic of the object indicates an attribute of the object and corresponds to an evaluation item for it.
  • FIG. 12 is a diagram illustrating an example of noun words that appear in text about an object and are selected by the characteristic selection unit 59.
  • FIG. 12 shows an example where the object is a “shirt”.
  • Words that describe the shirt and indicate its attributes are selected as words indicating the characteristics of the object. Because noun words used together with adjectives are extracted, words unrelated to the object's attributes, such as “package” or “shipping”, are less likely to be extracted.
  • Next, the explanation sentence analysis unit 55 acquires other text about the object.
  • Specifically, this other text is the explanatory text included in the object information.
  • The explanation sentence analysis unit 55 then detects, among the words selected as indicating the characteristics of the object, those not contained in the object's explanatory text (step S205).
  • Through this processing, the explanation sentence analysis unit 55 can detect characteristics of the object that are not yet explained. Revising the explanatory text to cover the detected words (characteristics) makes it possible to create an explanatory text that users understand more easily.
  • Although the analysis server 1 is described as having all the functions from the object explanation unit 51 to the explanation sentence analysis unit 55, some of these functions may be implemented on another computer.
  • The functions of the object explanation unit 51 and the user text collection unit 52 may be implemented on different servers.
  • 1 analysis server, 2 user terminal, 3 network, 11 processor, 12 storage unit, 13 communication unit, 14 input/output unit, 31 explanatory text area, 32 input area, 33 input button, 51 object explanation unit, 52 user text collection unit, 53 user text reading unit, 54 user text analysis unit, 55 explanation sentence analysis unit, 57 part-of-speech identification unit, 58 score calculation unit, 59 characteristic selection unit, 71 object information storage unit, 72 user text storage unit.
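The noun-centric scoring of FIG. 10 (steps S301 to S309), followed by threshold-based characteristic selection and the step S205 check against the explanatory text, can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the published implementation: the input is assumed to be already tokenized and tagged as in FIG. 9, and since the exact weight formula is not given in this text, f(dist) = exp(dist) is a hypothetical stand-in that only satisfies the stated properties (exponential, monotonically increasing). The function names and the sample review are also illustrative.

```python
from __future__ import annotations

import math

# Tokens are (word, part_of_speech) pairs as in FIG. 9: "n" marks a noun and
# "adj" an adjective; any other tag is ignored. Morphological analysis is
# assumed to have been done already.
WINDOW = 2  # examine the (i - 2)th to (i + 2)th words, as in step S304


def f(dist: int) -> float:
    # Hypothetical monotonically increasing weight; exp(dist) is an assumption,
    # not the formula from the publication.
    return math.exp(dist)


def score_nouns(tokens: list[tuple[str, str]]) -> dict[str, float]:
    # S301/S302: deduplicated list of nouns, each score starting at 0
    scores = {word: 0.0 for word, pos in tokens if pos == "n"}
    # S303/S307/S308: visit every noun occurrence from left to right
    for i, (noun, pos) in enumerate(tokens):
        if pos != "n":
            continue
        # S304: adjectives at relative positions dist = -2, -1, 1, 2
        for dist in range(-WINDOW, WINDOW + 1):
            j = i + dist
            if dist != 0 and 0 <= j < len(tokens) and tokens[j][1] == "adj":
                # S305/S306: add the position-weighted score element
                scores[noun] += f(dist)
    return scores  # S309


def select_characteristics(scores: dict[str, float], threshold: float) -> list[str]:
    # Characteristic selection: nouns scoring above the threshold, best first
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, score in ranked if score > threshold]


def missing_from_description(selected: list[str], description_words: list[str]) -> list[str]:
    # Step S205: selected characteristic words absent from the explanatory text
    present = set(description_words)
    return [word for word in selected if word not in present]


# Hypothetical pre-tagged review of a candy
review = [("taste", "n"), ("is", "v"), ("sweet", "adj"), (",", "p"),
          ("texture", "n"), ("soft", "adj"), (",", "p"),
          ("shipping", "n"), ("on", "p"), ("Monday", "n")]
scores = score_nouns(review)
keywords = select_characteristics(scores, threshold=1.0)
print(keywords)  # ['taste', 'texture']
print(missing_from_description(keywords, "this candy has a rich taste".split()))  # ['texture']
```

In this toy example “shipping” receives only the small element f(-2) because its only nearby adjective precedes it, which illustrates how the position weighting keeps nouns unrelated to the object's attributes below the threshold.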

Abstract

The purpose of the present invention is to extract a keyword expressing a characteristic of an object from a sentence relating to the object. Provided is a keyword extraction system that: acquires a sentence relating to an object; identifies a plurality of adjective words and a plurality of noun words from the acquired sentence; calculates a score, for each of the plurality of identified noun words, indicating the frequency with which each of the noun words is used with an adjective word; and selects, on the basis of the calculated score, one or more words from among the plurality of noun words as a word expressing a characteristic of the object.

Description

Keyword extraction system, keyword extraction method and program
The present invention relates to a keyword extraction system, a keyword extraction method, and a program.
There is a known technique that extracts, from sentences written about a certain theme, words correlated with that theme based on the appearance frequency of the words in those sentences.
Non-Patent Document 1 discloses extracting words correlated with each of two political parties by statistically processing the appearance frequency of words in documents related to those parties.
For example, one may want to extract keywords representing the characteristics of an object (for example, “taste” or “texture”) from review texts about the object (for example, “candy”). If keywords are extracted simply by appearance frequency, keywords that do not represent the characteristics of the object (for example, “shipping”) are also extracted, which is not effective.
The present invention has been made in view of the above problem, and its object is to provide a technique capable of extracting keywords that represent a characteristic of an object from sentences related to the object.
To solve the above problem, a keyword extraction system according to the present invention includes: acquisition means for acquiring a sentence related to an object; identification means for identifying a plurality of adjective words and a plurality of noun words in the acquired sentence; score calculation means for calculating, for each of the identified noun words, a score indicating the frequency with which that noun word is used together with adjective words; and characteristic selection means for selecting, based on the calculated scores, one or more of the noun words as words representing the characteristics of the object.
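The four means of the system paragraph above can be wired together as in the following Python sketch. It is illustrative only: the tiny part-of-speech lexicon is a hypothetical stand-in for a real morphological analyzer, the two-word window is an assumed value for the "predetermined" distance, scoring simply counts nearby adjectives (the unweighted variant), and selection keeps the top-k nouns with a positive score. All names are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for morphological analysis: a tiny part-of-speech
# lexicon. A real system would use a morphological analyzer instead.
POS_LEXICON = {
    "taste": "n", "texture": "n", "box": "n", "monday": "n",
    "sweet": "adj", "soft": "adj",
}
WINDOW = 2  # assumed "predetermined" co-occurrence distance


def identify(sentence: str) -> list:
    """Identification means: tag each whitespace-separated word as n, adj or other."""
    return [(w, POS_LEXICON.get(w, "other")) for w in sentence.lower().split()]


def score(tagged: list) -> Counter:
    """Score calculation means: count adjectives within WINDOW words of each noun."""
    scores = Counter()
    for i, (word, pos) in enumerate(tagged):
        if pos != "n":
            continue
        lo, hi = max(0, i - WINDOW), min(len(tagged), i + WINDOW + 1)
        # The assignment also records nouns that never co-occur (score 0)
        scores[word] += sum(1 for j in range(lo, hi) if j != i and tagged[j][1] == "adj")
    return scores


def extract_keywords(reviews: list, k: int) -> list:
    """Acquisition plus characteristic selection: top-k nouns with a positive score."""
    total = Counter()
    for review in reviews:  # each review is scored separately, then summed
        total.update(score(identify(review)))
    return [word for word, s in total.most_common(k) if s > 0]


reviews = [
    "the taste is sweet and the texture is soft",
    "taste so sweet",
    "the box arrived on monday",
]
print(extract_keywords(reviews, k=3))  # ['taste', 'texture']
```

Scoring each review separately keeps the co-occurrence window from spanning two unrelated reviews; “box” and “monday” appear often enough to be frequent but never near an adjective, so they are not selected.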
A keyword extraction method according to the present invention includes the following steps:

- acquiring text related to an object;
- identifying a plurality of adjective words and a plurality of noun words in the acquired text;
- calculating, for each of the identified noun words, a score indicating how frequently that noun word is used together with adjective words; and
- selecting, based on the calculated scores, one or more of the noun words as words representing characteristics of the object.
A program according to the present invention causes a computer to function as the following means:

- acquisition means for acquiring text related to an object;
- identification means for identifying a plurality of adjective words and a plurality of noun words in the acquired text;
- score calculation means for calculating, for each of the identified noun words, a score indicating how frequently that noun word is used together with adjective words; and
- characteristic selection means for selecting, based on the calculated scores, one or more of the noun words as words representing characteristics of the object.
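The following Python sketch is a rough illustration of how the four means fit together. It does not reproduce the claimed scoring. It assumes that a morphological analyzer has already split the text into (word, part-of-speech) pairs. The tag strings "n" and "adj", the two-word window, the plain co-occurrence count and the top-k cut-off are all assumptions made here for illustration.

```python
from collections import Counter

NOUN, ADJ = "n", "adj"  # hypothetical POS tags, mirroring FIG. 9

def identify(tokens):
    """Identification: (position, word) for each noun, and the set of adjective positions."""
    nouns = [(i, w) for i, (w, pos) in enumerate(tokens) if pos == NOUN]
    adjectives = {i for i, (_, pos) in enumerate(tokens) if pos == ADJ}
    return nouns, adjectives

def calculate_scores(tokens, window=2):
    """Score calculation: count adjectives within `window` words of each noun occurrence."""
    nouns, adjectives = identify(tokens)
    scores = Counter()
    for i, word in nouns:
        nearby = range(i - window, i + window + 1)
        scores[word] += sum(1 for j in nearby if j != i and j in adjectives)
    return scores

def select_characteristics(scores, k=2):
    """Characteristic selection: the k nouns with the highest scores."""
    return [word for word, _ in scores.most_common(k)]

# Hypothetical pre-tagged review text (acquisition and tagging are out of scope here).
review = [("taste", "n"), ("is", "v"), ("rich", "adj"), ("and", "conj"),
          ("texture", "n"), ("is", "v"), ("crisp", "adj"), (".", "sym"),
          ("the", "det"), ("shipping", "n"), ("took", "v"), ("three", "num"),
          ("days", "n")]
print(select_characteristics(calculate_scores(review)))  # ['texture', 'taste']
```

In this toy input, "shipping" is never near an adjective and scores 0. That is the behaviour the claims aim for, in contrast to frequency-only extraction.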
According to the present invention, keywords representing characteristics of an object can be extracted from text related to that object.
In one aspect of the present invention, the score calculation means may calculate the score indicating how frequently each noun word is used together with adjective words based on the relative position between each of the identified noun words and each of the identified adjective words.
In one aspect of the present invention, the score calculation means may calculate that score based on the adjective words whose distance from each identified noun word is smaller than a predetermined value.
In one aspect of the present invention, the score calculation means may calculate the score based on the distance between the noun word and each adjective word whose distance from that noun word is smaller than the predetermined value.
In one aspect of the present invention, the score calculation means may calculate the score based on two factors for each adjective word within the predetermined distance of the noun word: its distance from the noun word, and whether it comes after the noun word.
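The distance-and-direction aspects above can be pictured as a per-adjective weight. In this minimal sketch, `max_distance` plays the role of the "predetermined value". The exponential default for `f` is an assumption, borrowed from the embodiment described later.

```python
import math

def adjective_weight(noun_pos, adj_pos, max_distance=3, f=math.exp):
    """Contribution of one adjective to one noun occurrence (illustrative).

    dist is signed: positive when the adjective follows the noun, negative
    when it precedes it. Adjectives at least `max_distance` words away
    contribute nothing.
    """
    dist = adj_pos - noun_pos
    if dist == 0 or abs(dist) >= max_distance:
        return 0.0
    # f is increasing, so an adjective after the noun outweighs one at the
    # same distance before it.
    return f(dist)
```

With `max_distance=3`, only dist values of -2, -1, 1 and 2 count, which matches the window used in the FIG. 10 example.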
In one aspect of the present invention, the score calculation means may calculate the score further based on the group to which the adjective word used together with each noun word belongs.
In one aspect of the present invention, the keyword extraction system may further include analysis means for acquiring another text that describes the object and detecting any selected word that is not contained in that other text.
FIG. 1 is a diagram showing an example of an analysis system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the hardware configuration of the analysis server.
FIG. 3 is a block diagram showing the functions implemented by the analysis server.
FIG. 4 is a flowchart showing an example of the processing of the object description unit and the user text collection unit.
FIG. 5 is a diagram showing an example of the information stored in the object information storage unit.
FIG. 6 is a diagram showing an example of a user text input screen.
FIG. 7 is a diagram showing an example of the data stored in the user text storage unit.
FIG. 8 is a flowchart showing an example of the processing of the user text reading unit, the user text analysis unit, and the explanatory text analysis unit.
FIG. 9 is a diagram showing examples of words segmented from text with their parts of speech identified.
FIG. 10 is a flowchart showing an example of the processing of the score calculation unit.
FIG. 11 is a flowchart showing another example of the processing of the score calculation unit.
FIG. 12 is a diagram showing an example of noun words that are contained in text about an object and selected by the characteristic selection unit.
Embodiments of the present invention are described below with reference to the drawings. Repeated descriptions of components given the same reference numerals are omitted. In this embodiment, the keyword extraction system is an analysis system. It receives review text from users about an object such as a product, analyzes the received text, and extracts keywords indicating characteristics (attributes) of the object. Instead of a product, the object to be processed may be something else, such as an organization or a service provider.
FIG. 1 is a diagram showing an example of the analysis system according to the embodiment of the present invention. The analysis system includes an analysis server 1 and a user terminal 2, which are connected via a network 3. The network 3 is, for example, a local area network or the Internet.
The analysis server 1 is a server computer. It runs a web server program (such as httpd) and receives information via the Internet from the user terminal 2, which runs a browser program. It outputs information that causes the user terminal 2 to display images (screens) containing buttons, character strings, and the like. The analysis server 1 also receives text about objects entered by users and stores it in a database. It then analyzes the received text and extracts words indicating characteristics of the objects.
The user terminal 2 is, for example, a personal computer or a smartphone. It transmits information entered by the user to the analysis server 1 and other servers, and receives information from them. It then controls a display output device to display images corresponding to the received information.
FIG. 2 is a diagram showing an example of the hardware configuration of the analysis server 1. Each of the analysis server 1 and the user terminal 2 includes a processor 11, a storage unit 12, a communication unit 13, and an input/output unit 14.
The processor 11 operates according to a program stored in the storage unit 12 and controls the communication unit 13 and the input/output unit 14. The program may be provided via the Internet or the like. Alternatively, it may be provided on a computer-readable storage medium such as a flash memory or a DVD-ROM.
The storage unit 12 consists of memory elements such as RAM and flash memory, and an external storage device such as a hard disk drive. It stores the program, as well as information input from each unit and calculation results.
The communication unit 13 implements the function of communicating with other devices. It consists of, for example, a wired LAN integrated circuit. Under the control of the processor 11, the communication unit 13 passes information received from other devices to the processor 11 or the storage unit 12, and transmits information to other devices.
The input/output unit 14 consists of a video controller that controls the display output device, a controller that acquires data from input devices, and the like. Input devices include a keyboard, a mouse, and a touch panel. Under the control of the processor 11, the input/output unit 14 outputs display data to the display output device. It also acquires the data the user enters by operating an input device. The display output device is, for example, an externally connected display.
Like the analysis server 1, the user terminal 2 includes a processor 11, a storage unit 12, a communication unit 13, an input/output unit 14, and so on. The user terminal 2 implements two functions:

- presenting screens based on data received from the analysis server 1 and other servers; and
- transmitting information the user enters on those screens, such as text, to the analysis server 1.

The processor 11 of the user terminal 2 implements these functions, for example, by executing a program such as a browser and processing the data received from the analysis server 1 and other servers. Alternatively, a dedicated application program installed on the user terminal 2 may implement them instead of a browser.
Next, the functions and processing implemented by the analysis server 1 according to the embodiment are described. FIG. 3 is a block diagram showing the functions implemented by the analysis server 1. Functionally, the analysis server 1 includes the following units:

- an object description unit 51;
- a user text collection unit 52;
- a user text reading unit 53;
- a user text analysis unit 54;
- an explanatory text analysis unit 55;
- an object information storage unit 71; and
- a user text storage unit 72.

The user text analysis unit 54 in turn includes a part-of-speech identification unit 57, a score calculation unit 58, and a characteristic selection unit 59. These functions are implemented by the processor 11 of the analysis server 1, which executes a program stored in the storage unit 12 and controls the communication unit 13 and other components.
The object information storage unit 71 is implemented mainly by the storage unit 12. For each object, such as a product, it stores:

- the object ID;
- the name;
- the category the object belongs to; and
- an introductory text entered by an administrator, such as a store.
The user text storage unit 72 is implemented mainly by the storage unit 12. It stores text about objects entered by users. Here, that text is a review of a product, which is the object. The object information storage unit 71 and the user text storage unit 72 may be located on a server other than the analysis server 1, for example in a database management system installed on another server.
The object description unit 51 is implemented mainly by the processor 11 executing a program and controlling the storage unit 12 and the communication unit 13. It acquires text describing an object from the object information storage unit 71. It then uses the communication unit 13 to transmit data containing that text to the user terminal 2.
The user text collection unit 52 is implemented mainly by the processor 11 executing a program and controlling the storage unit 12 and the communication unit 13. It acquires text entered by a user about an object and stores it in the user text storage unit 72.
The user text reading unit 53 is implemented mainly by the processor 11 executing a program and controlling the storage unit 12. It acquires the text that is stored in the user text storage unit 72 and associated with an object.
The user text analysis unit 54 analyzes the acquired text and extracts words indicating characteristics (attributes) of the object.
The part-of-speech identification unit 57 is implemented mainly by the processor 11 executing a program and controlling the storage unit 12. It identifies a plurality of adjective words and a plurality of noun words in the text acquired by the user text reading unit 53.
The score calculation unit 58 is implemented mainly by the processor 11 executing a program and controlling the storage unit 12. For each identified noun word, it calculates a score value indicating how frequently that noun word is used together with adjective words.
The characteristic selection unit 59 is implemented mainly by the processor 11 executing a program and controlling the storage unit 12. Based on the calculated scores, it selects one or more of the noun words as words representing characteristics (attributes) of the object.
The explanatory text analysis unit 55 acquires a text other than the one read by the user text reading unit 53 and analyzed. This other text is an explanatory text describing the object. The explanatory text analysis unit 55 then detects which of the words representing the object's characteristics are not contained in that explanatory text.
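A minimal sketch of this detection step follows. The function name is hypothetical, and so is the use of case-insensitive substring matching. Substring matching happens to work for Japanese text without spaces, but it can give false matches in English (e.g. "tea" inside "steam"). Comparing against the tokenized explanatory text would be stricter.

```python
def missing_characteristics(selected_words, explanatory_text):
    """Return selected characteristic words that do not appear in the explanatory text.

    Matching is case-insensitive substring search (an assumption made for
    illustration, not the document's stated method).
    """
    text = explanatory_text.casefold()
    return [word for word in selected_words if word.casefold() not in text]
```

For a product whose description mentions only its taste, characteristics such as "texture" would be reported as absent from the description.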
Next, the processing of the object description unit 51 and the user text collection unit 52 is described. FIG. 4 is a flowchart showing an example of that processing.
First, the object description unit 51 acquires the information of an object from the object information storage unit 71 (step S101). The object may be one that the user has selected beforehand, with the selection received from the user terminal 2 by the communication unit 13.
FIG. 5 is a diagram showing an example of the information stored in the object information storage unit 71. The object information storage unit 71 stores object information for each of a plurality of objects. The object information for one object includes:

- an object ID identifying the object;
- a name;
- the category the object belongs to;
- an introductory text; and
- an administrator ID indicating who entered the introductory text.

The introductory text is mainly entered by an administrator who manages the object, for example the administrator of a store that sells it. There may be multiple administrators, each identified by an administrator ID. Object information may contain a mix of Japanese and English.
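A minimal sketch of one FIG. 5 row as a Python data class, with a lookup by object ID as in step S101. The field names, the dictionary-based store and the sample values are hypothetical illustrations, not data from the document.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectInfo:
    """One row of object information (field names are illustrative)."""
    object_id: str
    name: str
    category: str
    introduction: str  # introductory text entered by an administrator
    admin_id: str      # identifies who entered the introductory text

def get_object(store, object_id):
    """Step S101 sketch: look up an object's information by its object ID."""
    return store[object_id]

# Hypothetical sample data keyed by object ID.
catalog = {
    "obj-001": ObjectInfo("obj-001", "Butter cookies", "Sweets",
                          "Crisp cookies baked with butter.", "admin-01"),
}
```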
Next, the object description unit 51 transmits two kinds of information to the user terminal 2 (step S102). The first describes the object and includes its name and explanatory text. The second lets the user enter text about the object and includes, for example, HTML for an input field.
FIG. 6 is a diagram showing an example of a user text input screen. The user terminal 2 generates this screen on receiving the information transmitted from the object description unit 51. The screen includes:

- an explanatory text area 31, which displays the explanatory text transmitted from the object description unit 51;
- an input area 32, in which the user enters review text; and
- an input button 33.

When the user enters text and presses the input button 33, the user terminal 2 transmits the entered text to the analysis server 1.
When the user terminal 2 transmits the text, the user text collection unit 52 receives the text about the object (step S103). It then stores the received text in the user text storage unit 72 in association with the object (step S104).
FIG. 7 is a diagram showing an example of the data stored in the user text storage unit 72. The user text storage unit 72 stores a plurality of records. Each record contains the text, the object ID of the object the text is about, and the user ID of the user who entered the text. One record corresponds to one review entered by a user.
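The record layout can be sketched in memory as follows, covering both storing a review with its object (step S104) and reading back the texts for one object (step S201). The class, method and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class UserText:
    """One record: a single review (field names are illustrative)."""
    text: str
    object_id: str
    user_id: str

@dataclass
class UserTextStore:
    """In-memory stand-in for the user text storage unit (an assumption)."""
    records: List[UserText] = field(default_factory=list)

    def add(self, text, object_id, user_id):
        """Step S104 sketch: store a received review associated with its object."""
        self.records.append(UserText(text, object_id, user_id))

    def texts_for(self, object_id):
        """Step S201 sketch: texts of all records whose object ID matches."""
        return [r.text for r in self.records if r.object_id == object_id]
```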
Next, the processing of the user text reading unit 53, the user text analysis unit 54, and the explanatory text analysis unit 55 is described. FIG. 8 is a flowchart showing an example of that processing. The processing may be performed for a predetermined object, or repeated for each object. As shown, it extracts words representing the characteristics of a single object. It may instead extract words representing the characteristics of a plurality of objects belonging to one category.
In the processing shown in FIG. 8, the user text reading unit 53 first acquires, from the user text storage unit 72, the text data associated with the object to be processed (step S201). More specifically, it extracts the user text records in the storage unit 12 that have the object ID of the object being processed, and acquires the text they contain. If a database management system stores the user text records, the records are extracted by searching the database with the object ID as the search key.
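When the records live in a database management system, step S201 amounts to a keyed query. This sketch uses Python's built-in `sqlite3` module as a stand-in DBMS. The table name, column names and index are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per review, indexed by object ID for fast lookup.
SCHEMA = """
CREATE TABLE IF NOT EXISTS user_texts (
    text      TEXT NOT NULL,
    object_id TEXT NOT NULL,
    user_id   TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_user_texts_object ON user_texts (object_id);
"""

def open_store(path=":memory:"):
    """Create (or open) the user-text table."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def fetch_texts(conn, object_id):
    """Step S201 sketch: search records using the object ID as the key."""
    rows = conn.execute(
        "SELECT text FROM user_texts WHERE object_id = ? ORDER BY rowid",
        (object_id,),  # parameter substitution, never string formatting
    )
    return [text for (text,) in rows]
```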
Once the text data associated with the object has been acquired, the part-of-speech identification unit 57 splits the text into words and identifies the part of speech of each word (step S202). Techniques for splitting text into words and identifying their parts of speech are generally known, for example as morphological analysis, and are not described here. This processing identifies the noun words and adjective words in the text. The part-of-speech identification unit 57 also assigns a sequence number to each word. The first word of the document has sequence number 1.
FIG. 9 is a diagram showing examples of words segmented from text with their parts of speech identified. The upper example shows a Japanese sentence split into words, and the lower example shows an English sentence split into words. Each example has three rows:

- the first row gives the sequence number assigned to each word, counting from the beginning;
- the second row shows the segmented words, separated by "/"; and
- the third row marks the part of speech of the word directly above it, where "n" denotes a noun and "adj" an adjective.
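The following sketch converts the FIG. 9 rows into numbered tokens. It assumes segmentation and tagging have already been done by an external morphological analyzer (for Japanese, e.g. MeCab). The function name and the empty-string tag for other parts of speech are hypothetical conventions.

```python
def number_tokens(segmented, tags):
    """Convert FIG. 9-style rows into (sequence_number, word, tag) triples.

    segmented: the second row, with words separated by "/"
    tags: the third row, one mark per word ("n", "adj", or "" for other parts of speech)
    Sequence numbers start at 1, as in the description.
    """
    words = segmented.split("/")
    if len(words) != len(tags):
        raise ValueError(f"{len(words)} words but {len(tags)} tags")
    return [(seq, word, tag)
            for seq, (word, tag) in enumerate(zip(words, tags), start=1)]
```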
Once the parts of speech have been identified, the score calculation unit 58 calculates, for each noun word, a score value indicating how frequently that noun word is used together with adjectives (step S203).
The processing by which the score calculation unit 58 calculates score values is described in more detail below. FIG. 10 is a flowchart showing an example of the processing of the score calculation unit 58.
In the processing of FIG. 10, the score calculation unit 58 first creates a list of the noun words present in the text (step S301). More specifically, it takes the noun words identified by the part-of-speech identification unit 57, removes duplicates, and so determines the noun words for which score values are calculated. It then sets the score value of each noun word in the list to 0 (step S302).
Score values are calculated per distinct word. When the same word occurs several times, its score value is therefore the sum of the score elements calculated for each occurrence.
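Steps S301 and S302 and the per-word summation can be sketched as follows. The tokens are assumed to be (word, pos) pairs with "n" marking nouns. The helper names are hypothetical.

```python
def init_noun_scores(tokens):
    """Steps S301-S302: one entry per distinct noun (duplicates removed), each 0.

    A dict comprehension keeps the first-occurrence order of the nouns.
    """
    return {word: 0.0 for word, pos in tokens if pos == "n"}

def add_element(scores, noun, element):
    """Accumulate a score element; repeated occurrences of a noun sum into one value."""
    scores[noun] += element
```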
Next, the score calculation unit 58 takes the first noun word in the text and sets the variable i to its sequence number (position) (step S303). It then detects the adjective words whose distance from that noun word is smaller than a predetermined value. More specifically, it looks for adjective words among the (i−2)th through (i+2)th words (step S304). The range of words examined may be changed based on experimental results or the like.
The score calculation unit 58 then calculates a score element for each detected adjective word (step S305). It adds each score element to the score value of the current noun word (step S306).
Denote the score element for one noun word by Variety. The processing of steps S304 and S305 can then be expressed as follows.
$$\mathrm{Variety} = \sum_{\mathrm{adjective}} f(\mathrm{dist})$$
Here, the summation over "adjective" means the calculation is performed for every adjective. Words of other parts of speech are not included. dist denotes the position of the adjective relative to the noun. In the example of FIG. 10, dist takes the values -2, -1, 1, and 2, and adjectives outside that range are not processed. dist is positive when the adjective follows the noun and negative when it precedes the noun. The function f is monotonically increasing; in this example, it is an exponential function.
By this formula, each score element is weighted according to the relative position of the adjective word and the noun word. The score value, being the sum of these elements, therefore also reflects the relative positions. The weight depends on the distance between the noun word and the adjective word. Moreover, an adjective word that follows the noun word gives that noun word a larger weight than one that precedes it. The larger the weight, the larger the score value.
Consider, for example, the sentence in the upper part of FIG. 9. The fifth word (a noun) is modified by the seventh word (an adjective). The third word (an adjective), however, describes the first word (a noun), not the fifth. In Japanese, for instance, an adjective statistically tends to describe the noun that precedes it. This formula therefore keeps adjectives that are not used with a given noun from being over-weighted in that noun's score. The score calculation unit 58 may also calculate the score element using a linear, monotonically increasing function instead of the exponential function.
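As a worked example, assume $f(\mathrm{dist}) = e^{\mathrm{dist}}$. The document says only that f is exponential, so the base e is an assumption. Take a noun with one adjective immediately before it (dist = -1) and one adjective two words after it (dist = 2). Its score element is:

$$\mathrm{Variety} = f(-1) + f(2) = e^{-1} + e^{2} \approx 0.368 + 7.389 = 7.757$$

Mirror the two adjectives (dist = 1 and dist = -2) and the element falls to $e^{1} + e^{-2} \approx 2.854$. This shows how following adjectives dominate the score.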
After the score elements are calculated, the score calculation unit 58 checks whether the text contains another noun word after the current one (step S307). If so (Y in step S307), it takes that next noun word and sets the variable i to its sequence number (position) in the text (step S308). It then repeats the processing from step S304. If not (N in step S307), it outputs the score value obtained for each distinct noun word to the storage unit 12 as the calculation result (step S309).
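The FIG. 10 loop can be sketched in Python as follows. This is not presented as the patented implementation. It assumes (word, pos) tokens tagged "n"/"adj" as in FIG. 9, with `f` defaulting to `math.exp`. Positions are 0-based here rather than starting at 1, which does not change dist because dist is a difference.

```python
import math
from typing import Callable, Dict, List, Tuple

NOUN, ADJ = "n", "adj"  # hypothetical POS tags, as in FIG. 9

def noun_scores(tokens: List[Tuple[str, str]], window: int = 2,
                f: Callable[[int], float] = math.exp) -> Dict[str, float]:
    """FIG. 10 sketch: loop over nouns and sum f(dist) for nearby adjectives."""
    # S301-S302: distinct nouns, each starting at 0
    scores = {word: 0.0 for word, pos in tokens if pos == NOUN}
    # S303/S307/S308: visit every noun occurrence from the start of the text
    for i, (word, pos) in enumerate(tokens):
        if pos != NOUN:
            continue
        # S304: adjectives among the (i - window)th .. (i + window)th words
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j][1] == ADJ:
                dist = j - i             # positive if the adjective follows the noun
                scores[word] += f(dist)  # S305-S306: add the score element
    return scores  # S309: one score value per distinct noun
```

For the linear variant, pass a different increasing function, for example `f=lambda d: d + 3`. The offset of 3 is an arbitrary choice that keeps every weight in the window positive.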
Alternatively, the score calculation unit 58 may give adjective words before and after the noun the same weight when calculating score elements. It may also simply count the adjective words around the noun. When sentence ends are explicitly marked, for example by periods, the score calculation unit 58 may exclude words in sentences adjacent to the one containing the noun word from the score value calculation.
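Here is a sketch of the equal-weight counting variant with the optional sentence-boundary rule. The set of sentence-final marks, the tag strings and the function names are assumptions made for illustration.

```python
SENTENCE_END = {".", "!", "?", "。", "！", "？"}  # assumed sentence-final marks

def sentence_ids(tokens):
    """Assign a sentence index to each (word, pos) token; a final mark closes its sentence."""
    ids, current = [], 0
    for word, _ in tokens:
        ids.append(current)
        if word in SENTENCE_END:
            current += 1
    return ids

def count_scores(tokens, window=2, same_sentence_only=True):
    """Count adjectives within `window` words of each noun, optionally
    ignoring adjectives that fall in a neighbouring sentence."""
    sent = sentence_ids(tokens)
    scores = {word: 0 for word, pos in tokens if pos == "n"}
    for i, (word, pos) in enumerate(tokens):
        if pos != "n":
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j == i or tokens[j][1] != "adj":
                continue
            if same_sentence_only and sent[j] != sent[i]:
                continue  # adjective belongs to another sentence
            scores[word] += 1
    return scores
```

In "taste is good. box arrived", the adjective "good" lies within two words of "box". It is counted for "box" only when the sentence rule is switched off.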
Furthermore, the score calculation unit 58 may also base the score element on the polarity of the adjective word near the noun word, that is, on whether it belongs to a group of words with a positive meaning or a group with a negative meaning. For example, an adjective with a positive (good) meaning may yield a positive score element, and one with a negative (bad) meaning a negative score element. An analyst can then infer, from nouns with high scores, why customers buy a product (one kind of object). Conversely, nouns with low scores point to aspects of the object that need improvement. The score calculation unit 58 may also calculate a score for each combination of noun and group.
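The polarity-aware variant might look like the sketch below. It returns both a signed score per noun and an unsigned score per (noun, group) combination. The `POLARITY` dictionary is a hypothetical stand-in for a real sentiment lexicon, and adjectives missing from it are ignored. The exponential distance weight is also an assumption.

```python
import math
from collections import defaultdict

# Hypothetical polarity lexicon: +1 = positive group, -1 = negative group.
POLARITY = {"柔らかい": 1, "可愛い": 1, "薄い": -1, "きつい": -1}


def signed_scores(tokens, window=2):
    """Return (signed score per noun, unsigned score per (noun, group))."""
    total = defaultdict(float)
    by_group = defaultdict(float)
    for i, (noun, pos) in enumerate(tokens):
        if pos != "noun":
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            adj, tag = tokens[j]
            if tag != "adj" or adj not in POLARITY:
                continue
            w = math.exp(-abs(j - i))      # assumed distance weight
            sign = POLARITY[adj]
            total[noun] += sign * w        # positive adds, negative subtracts
            group = "positive" if sign > 0 else "negative"
            by_group[(noun, group)] += w   # score per noun/group combination
    return dict(total), dict(by_group)
```

A high signed score suggests a purchase reason, and a low one suggests an improvement point.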
In the processing of FIG. 10 the loop runs over noun words, but it may instead run over adjective words. FIG. 11 is a flowchart showing another example of the processing of the score calculation unit 58.
In the process of FIG. 11, the score calculation unit 58 first creates a list of the noun words present in the text (step S401). It then sets the score of each noun word in the list to 0 (step S402). These steps are the same as in the example of FIG. 10.
Next, the score calculation unit 58 acquires the first adjective word in the text and sets the variable i to its sequence number (position) (step S403). It then detects noun words among the (i−2)th through (i+2)th words (step S404). It calculates a score element for each detected noun word (step S405) and adds that score element to the noun's score (step S406). Each score element, for a nearby noun word paired with the current adjective, is weighted according to the relative position of the two words. As in step S304, the formula may use an exponential function. Alternatively, a fixed value may be used as the score element without weighting, in which case the score simply counts the adjectives around each noun.
Once the score elements have been calculated, the score calculation unit 58 determines whether another adjective word exists in the text after the current adjective word (step S407). If one exists (Y in step S407), the unit acquires that next adjective word and sets the variable i to its sequence number (position) in the text (step S408). It then repeats the process from step S404. If no further adjective word exists (N in step S407), the unit outputs the score obtained for each noun to the storage unit 12 as the calculation result (step S409).
Iterating over the adjective words in order in this way still yields a score for each noun word. Moreover, adjective words are generally fewer than noun words, so this approach can also reduce the computational load.
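To illustrate that the two loop orders give the same result, the sketch below implements the adjective-driven loop of FIG. 11 next to a noun-driven loop, using one shared weight function. The `(surface, pos)` input format and the exponential `weight` with its `after_factor` are hypothetical. Both loops visit every noun-adjective pair within ±2 words exactly once, so they produce the same scores up to floating-point summation order. The adjective loop has fewer outer iterations when adjectives are rarer.

```python
import math


def weight(offset, after_factor=2.0):
    """Hypothetical weight; offset = adjective position - noun position."""
    return math.exp(-abs(offset)) * (after_factor if offset > 0 else 1.0)


def score_by_adjectives(tokens, window=2):
    """Adjective-driven loop (cf. FIG. 11)."""
    scores = {w: 0.0 for w, pos in tokens if pos == "noun"}   # S401-S402
    for i, (_, pos) in enumerate(tokens):                    # S403/S408
        if pos != "adj":
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):  # S404
            noun, tag = tokens[j]
            if tag == "noun":
                scores[noun] += weight(i - j)                # S405-S406
    return scores                                            # S409


def score_by_nouns(tokens, window=2):
    """Noun-driven loop (cf. FIG. 10), for comparison."""
    scores = {w: 0.0 for w, pos in tokens if pos == "noun"}
    for i, (noun, pos) in enumerate(tokens):
        if pos != "noun":
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if tokens[j][1] == "adj":
                scores[noun] += weight(j - i)
    return scores
```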
The processing performed after the scores of the noun words in the text have been calculated is described next. Once these scores are available, the characteristic selection unit 59 uses them to select words indicating the characteristics of the object being processed. The selected words are nouns. For example, the characteristic selection unit 59 selects words whose score exceeds a predetermined threshold. It may instead select words whose rank, after sorting by score, is higher than a predetermined threshold. In this embodiment, a higher score means that the noun is used together with adjectives more frequently.
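Both selection rules, a score threshold and a rank cutoff, fit in a few lines. The function name and the choice to allow combining the two rules are illustrative assumptions. The scores are assumed to be a plain `{noun: score}` mapping such as the loops above produce.

```python
def select_characteristics(scores, threshold=None, top_k=None):
    """Pick nouns scoring above `threshold` and/or the `top_k` highest (sketch)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(w, s) for w, s in ranked if s > threshold]  # score rule
    if top_k is not None:
        ranked = ranked[:top_k]                                # rank rule
    return [w for w, _ in ranked]
```

With scores such as `{"生地": 3.2, "色": 2.5, "サイズ": 1.8, "荷物": 0.1}`, a threshold of 1.0 drops 「荷物」, and `top_k=2` keeps only 「生地」 and 「色」.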
A word indicating a characteristic of the object is a word indicating an attribute of the object, and corresponds to an evaluation item for the object. The processing described so far makes it possible to obtain, from user reviews and similar text about an object, the items that matter most in evaluating it.
FIG. 12 shows an example of noun words that appear in text about an object and are selected by the characteristic selection unit 59. In this example the object is a "shirt". The words selected as characteristics of the object are words that are used to describe the shirt and that also indicate its attributes. Because the method extracts noun words used together with adjectives, it also suppresses the extraction of words unrelated to the object's attributes, such as "package" or "shipping".
When words indicating the characteristics of an object have been selected, the description analysis unit 55 acquires other text about the object, specifically the description text included in the object information. The description analysis unit 55 then detects, among the words selected as characteristics of the object, those that do not appear in the object's description text (step S205).
Through this processing, the description analysis unit 55 can detect characteristics of the object that the description does not explain. Revising the description text to cover the words (characteristics) detected by the description analysis unit 55 produces a description that is easier for users to understand.
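Step S205 amounts to a set difference between the selected characteristic words and the description text. The sketch below uses plain substring matching, which suits Japanese text without word spacing. This is an assumption: a real system might compare morphological-analyzer tokens instead, to avoid partial matches such as 「色」 inside 「色々」.

```python
def missing_characteristics(selected, description):
    """Return the selected words that never appear in the description (sketch)."""
    return [w for w in selected if w not in description]
```

For instance, if 「生地」, 「色」, and 「サイズ」 were selected from reviews but the seller's description mentions only fabric and colour, 「サイズ」 is reported as unexplained.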
In the embodiment described so far, the analysis server 1 provides all the functions from the object description unit 51 to the description analysis unit 55. Some of these functions, however, may be implemented on another computer. For example, the functions of the object description unit 51 and the user text collection unit 52 may be implemented on a separate server.
1 analysis server, 2 user terminal, 3 network, 11 processor, 12 storage unit, 13 communication unit, 14 input/output unit, 31 description text area, 32 input area, 33 input button, 51 object description unit, 52 user text collection unit, 53 user text reading unit, 54 user text analysis unit, 55 description analysis unit, 57 part-of-speech identification unit, 58 score calculation unit, 59 characteristic selection unit, 71 object information storage unit, 72 user text storage unit.

Claims (9)

  1.  A keyword extraction system comprising:
     acquisition means for acquiring text about an object;
     identification means for identifying a plurality of adjective words and a plurality of noun words in the acquired text;
     score calculation means for calculating, for each of the identified noun words, a score indicating how frequently that noun word is used together with an adjective word; and
     characteristic selection means for selecting, based on the calculated scores, one or more of the noun words as words representing characteristics of the object.
  2.  The keyword extraction system according to claim 1, wherein
     the score calculation means calculates the score indicating how frequently each noun word is used together with an adjective word based on the relative positions of each of the identified noun words and the identified adjective words.
  3.  The keyword extraction system according to claim 2, wherein
     the score calculation means calculates the score indicating how frequently each noun word is used together with an adjective word based on the adjective words whose distance from that noun word is less than a predetermined value.
  4.  The keyword extraction system according to claim 3, wherein
     the score calculation means calculates the score based on the distance between each noun word and each adjective word whose distance from that noun word is less than the predetermined value.
  5.  The keyword extraction system according to claim 4, wherein
     the score calculation means calculates the score based on the distance between each noun word and each adjective word whose distance from that noun word is less than the predetermined value, and on whether that adjective word follows the noun word.
  6.  The keyword extraction system according to claim 4, wherein
     the score calculation means calculates the score further based on a group to which the adjective word used together with each noun word belongs.
  7.  The keyword extraction system according to any one of claims 1 to 6, further comprising
     analysis means for acquiring another text describing the object and detecting, among the selected words, any word not included in the acquired other text.
  8.  A keyword extraction method comprising:
     a step of acquiring text about an object;
     a step of identifying a plurality of adjective words and a plurality of noun words in the acquired text;
     a step of calculating, for each of the identified noun words, a score indicating how frequently that noun word is used together with an adjective word; and
     a step of selecting, based on the calculated scores, one or more of the noun words as words representing characteristics of the object.
  9.  A program for causing a computer to function as:
     acquisition means for acquiring text about an object;
     identification means for identifying a plurality of adjective words and a plurality of noun words in the acquired text;
     score calculation means for calculating, for each of the identified noun words, a score indicating how frequently that noun word is used together with an adjective word; and
     characteristic selection means for selecting, based on the calculated scores, one or more of the noun words as words representing characteristics of the object.
PCT/JP2016/073492 2016-08-09 2016-08-09 Keyword extraction system, keyword extraction method and program WO2018029791A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2016/073492 WO2018029791A1 (en) 2016-08-09 2016-08-09 Keyword extraction system, keyword extraction method and program
JP2018516205A JP6457153B2 (en) 2016-08-09 2016-08-09 Keyword extraction system, keyword extraction method and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/073492 WO2018029791A1 (en) 2016-08-09 2016-08-09 Keyword extraction system, keyword extraction method and program

Publications (1)

Publication Number Publication Date
WO2018029791A1 true WO2018029791A1 (en) 2018-02-15

Family

ID=61161820

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/073492 WO2018029791A1 (en) 2016-08-09 2016-08-09 Keyword extraction system, keyword extraction method and program

Country Status (2)

Country Link
JP (1) JP6457153B2 (en)
WO (1) WO2018029791A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002297659A (en) * 2001-03-30 2002-10-11 Just Syst Corp Device, method, and program for subjective feature element generation
WO2009123260A1 (en) * 2008-04-01 2009-10-08 日本電気株式会社 Cooccurrence dictionary creating system and scoring system
JP2015022708A (en) * 2013-07-23 2015-02-02 株式会社ギックス Marketing support system, marketing support method, program and computer storage medium

Also Published As

Publication number Publication date
JPWO2018029791A1 (en) 2018-08-09
JP6457153B2 (en) 2019-01-23

Legal Events

Code Title Description
ENP Entry into the national phase: ref document number 2018516205; country of ref document JP; kind code of ref document A
121 EP: the EPO has been informed by WIPO that EP was designated in this application: ref document number 16912672; country of ref document EP; kind code of ref document A1
NENP Non-entry into the national phase: ref country code DE
122 EP: PCT application non-entry in European phase: ref document number 16912672; country of ref document EP; kind code of ref document A1