WO2012127968A1 - Event analysis device, event analysis method, and computer-readable recording medium - Google Patents
Event analysis device, event analysis method, and computer-readable recording medium
- Publication number
- WO2012127968A1 (PCT/JP2012/054222)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- event
- expression
- degree
- analysis
- sharing
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/109—Time management, e.g. calendars, reminders, meetings or time accounting
Definitions
- the present invention relates to an event analysis apparatus, and more particularly to an event analysis apparatus used for analyzing an event that is a hot topic in the world, and further relates to an event analysis method and a computer-readable recording medium.
- an “event” here refers to any of the various happenings that occur in the world, and is not necessarily limited to organized events or accidents.
- events include, for example, an event held somewhere, a natural phenomenon that occurs at a specific location, and a behavior of a specific person.
- Web documents describe various phenomena and many are published.
- the content of Web documents is not limited to the content handled in news reporting by news media. Therefore, Web documents include a lot of information that is meaningless to many people. For this reason, in order to use Web documents to analyze events that have become a hot topic in the world, that is, events that are commonly picked up by many people, some means is needed for extracting information about such events from miscellaneous information that is not appropriate as a topic.
- Non-Patent Document 1 discloses an example of a conventional technique for analyzing an event that has become a hot topic in the world.
- In this technique, the appearance frequency of keywords is counted from a plurality of Web documents on the Internet, such as blogs and electronic bulletin boards, and a sudden increase in the number of documents in a certain period is evaluated. Based on this evaluation, a burst degree indicating the strength of the topic in that period is assigned to each keyword.
- In Non-Patent Document 1, a keyword having a high burst degree is extracted, and the extracted keyword is determined to indicate a hot topic. As described above, the technique disclosed in Non-Patent Document 1 yields one or more keywords that may be related to a topic attracting attention in a specific period, and is therefore expected to make it possible to analyze events.
- However, Non-Patent Document 1 does not consider the background against which each keyword appears in a burst in a certain period. For this reason, with the technique disclosed in Non-Patent Document 1, when the frequency of occurrence of a keyword in a specific period happens to increase, even keywords that are not related to the topic of interest end up being extracted. As a result, even when the technique disclosed in Non-Patent Document 1 is used, events cannot be analyzed with high accuracy. This will be specifically described below.
- keywords such as “train” or “car” frequently appear in a document group on a website such as a blog, a microblog, an electronic bulletin board, and a diary site on the Internet during one hour of a certain morning.
- Documents that contain descriptions of unspecified trains are not necessarily attributable to a single common event, such as a specific incident or accident; many of them are more likely to have been written because of individual, unrelated events.
- With the technique of Non-Patent Document 1, when a time zone in which many people commute to work or school is analyzed, the keyword “train” is always presented.
- In this case, the keyword does not refer to a topic that is attracting attention, but to various separate events.
- Non-Patent Document 1 does not consider at all whether events are common in this sense. That is, the technique disclosed in Non-Patent Document 1 only counts and uses the frequency of keywords in documents written in a specific period. Consequently, even when different events are merely expressed by the same keyword, that keyword is processed as a keyword with a high burst degree.
- In Non-Patent Document 1, if a plurality of documents describing different events happen to include many of the same keywords, all of these keywords are extracted in the same way as keywords related to a topical event.
- An object of the present invention is to solve the above-mentioned problems and to provide an event analysis device, an event analysis method, and a computer-readable recording medium that, in event analysis using documents, can perform analysis that takes into account whether an event is commonly noticed among a plurality of people.
- an event analysis apparatus for analyzing events described in a document to be analyzed is characterized by including: a component specifying unit that specifies a description related to an event from the document to be analyzed, and specifies, from the specified description, a situation expression representing a situation and an expression corresponding to the situation expression; and a shareability analysis unit that obtains, based on the situation expression and the corresponding expression specified from the description, a degree of sharing indicating the possibility that the event related to the description is shared by a plurality of people.
- an event analysis method for analyzing events described in a document to be analyzed is characterized by including: (A) a step of specifying a description related to an event from the document to be analyzed, and specifying, from the specified description, a situation expression representing a situation and an expression corresponding to the situation expression; and (B) a step of obtaining, based on the situation expression and the corresponding expression specified from the description, a degree of sharing indicating the possibility that the event related to the description is shared by a plurality of people.
- a computer-readable recording medium records a program for causing a computer to analyze events described in a document to be analyzed, the program including instructions that cause the computer to execute: (A) a step of specifying a description related to an event from the document to be analyzed, and specifying, from the specified description, a situation expression representing a situation and an expression corresponding to the situation expression; and (B) a step of obtaining, based on the situation expression and the corresponding expression specified from the description, a degree of sharing indicating the possibility that the event related to the description is shared by a plurality of people.
- FIG. 1 is a block diagram showing a schematic configuration of an event analysis apparatus according to Embodiment 1 of the present invention.
- FIG. 2 is a flowchart showing the operation of the event analysis apparatus according to Embodiment 1 of the present invention.
- FIG. 3 shows an example of the situation expression specified from the event description and the corresponding expression corresponding to the situation expression in the first embodiment of the present invention.
- FIG. 4 is a diagram showing an example of rules used when obtaining the degree of sharing in Embodiment 1 of the present invention.
- FIG. 5 is a block diagram showing a schematic configuration of the event analysis apparatus according to Embodiment 2 of the present invention.
- FIG. 6 is a flowchart showing the operation of the event analysis apparatus according to Embodiment 2 of the present invention.
- FIG. 7 is a block diagram illustrating an example of a computer that implements the event analysis apparatus according to the first and second embodiments of the present invention.
- (Embodiment 1)
- an event analysis apparatus and an event analysis method according to Embodiment 1 of the present invention will be described with reference to FIGS.
- Although Embodiment 1 of the present invention is described below, the present invention is not limited to Embodiment 1.
- FIG. 1 is a block diagram showing a schematic configuration of an event analysis apparatus according to Embodiment 1 of the present invention.
- the event analysis apparatus 100 is an apparatus for analyzing an event described in a document to be analyzed.
- the event analysis device 100 includes a component specifying unit 101 and a shareability analysis unit 102.
- the component specifying unit 101 receives a document to be analyzed from the outside, and specifies a description related to the event (hereinafter referred to as “event description”) from the document. Further, the component specifying unit 101 specifies, from the specified event description, a situation expression representing the situation and an expression corresponding to the situation expression (hereinafter referred to as “corresponding expression”) as the constituent elements of the event description.
- the shareability analysis unit 102 obtains, based on the situation expression and the corresponding expression specified from the event description, the possibility that the event related to the event description is shared by multiple people, that is, a degree of sharing indicating event shareability.
- In this way, the degree of sharing is obtained for the event described in the document. If the degree of sharing is high, there is a high possibility that the target event is shared by multiple people; if the degree of sharing is low, that possibility is also low. Therefore, according to the event analysis apparatus 100, in the analysis of events using documents, it is possible to perform analysis that takes into account whether or not an event is commonly noticed among a plurality of people.
- the component specifying unit 101 specifies, for example, a part indicating an action, an action, or a state included in the event description as a situation expression.
- the component specifying unit 101 specifies, for example, an expression related to a situation expression and corresponding to any of time, place, subject, and object as a corresponding expression.
- the shareability analysis unit 102 can obtain the degree of sharing by applying the situation expression and the correspondence expression to the set rules.
- the rule includes a rule (see FIG. 4) that defines the degree of sharing for each combination of an assumed situation expression and a character string assumed as a corresponding expression.
- the rule may further specify a case for a character string assumed as a corresponding expression.
- the shareability analysis unit 102 applies the rule when the correspondence expression matches the case specified by the rule.
- the shareability analysis unit 102 can also obtain a first degree indicating the possibility that the target of the situation expression is shared by a plurality of people and a second degree indicating the possibility that the corresponding expression is related to the event, and obtain the degree of sharing from the first degree and the second degree.
- the event analysis apparatus 100 includes an analysis result output unit 103.
- the analysis result output unit 103 outputs the obtained sharing degree and information related to the event for which the sharing degree is obtained.
- Information about the event includes a situation expression and a correspondence expression.
- examples of the information related to the event include a sentence including a situation expression and a correspondence expression.
- FIG. 2 is a flowchart showing the operation of the event analysis apparatus according to Embodiment 1 of the present invention.
- In the following description, FIG. 1 is referred to as appropriate.
- the event analysis method is implemented by operating the event analysis apparatus 100. Therefore, the following description of the operation of the event analysis apparatus 100 also serves as the description of the event analysis method in Embodiment 1.
- the component specifying unit 101 receives an input of a document to be analyzed (step A1). If there are a plurality of documents accepted in step A1, the subsequent steps are executed for each document.
- the component specifying unit 101 specifies one or more descriptions (event descriptions) related to the events included in each document for each received document (step A2).
- the component specifying unit 101 specifies, among the components included in each event description, the component that is a situation expression, and further specifies from the event description the component corresponding to it, that is, the corresponding expression (step A3).
- the shareability analysis unit 102 obtains a degree of sharing indicating the shareability of the event based on the situation expression and the corresponding expression specified from the event description (step A4).
- the degree of sharing is obtained for each event included in the input document.
- the analysis result output unit 103 outputs, for each event, the degree of sharing obtained by the shareability analysis unit 102 and information about the event (for example, the situation expression and the corresponding expression) as an event shareability analysis result (step A5).
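Steps A1 to A5 above can be sketched as a simple loop. This is only an illustrative skeleton: the function name `analyze` and the three callables passed to it are hypothetical stand-ins for the component specifying unit and the shareability analysis unit, and the demo values are made up.

```python
def analyze(documents, find_event_descriptions, extract_components, score):
    """Run steps A1-A5 over a list of documents (all callables are hypothetical stand-ins)."""
    results = []
    for document in documents:                                          # step A1
        for description in find_event_descriptions(document):           # step A2
            situation, corresponding = extract_components(description)  # step A3
            degree = score(situation, corresponding)                    # step A4
            results.append({                                            # step A5
                "description": description,
                "situation": situation,
                "corresponding": corresponding,
                "sharing_degree": degree,
            })
    return results


# Toy stand-ins so the skeleton can be run; a real system would plug in
# morphological analysis, parsing, and a rule or dictionary lookup here.
demo = analyze(
    ["I went to Osaka Music Festival"],
    find_event_descriptions=lambda doc: [doc],
    extract_components=lambda text: ("go", {"object": "Osaka Music Festival"}),
    score=lambda situation, corresponding: 0.92,
)
print(demo)
```

Keeping the three stages as separate callables mirrors the separation between units 101, 102, and 103, so each stage can be replaced independently.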
- In step A1, the component specifying unit 101 receives an input of a document to be analyzed.
- the input document may be a document set.
- a set of Web pages may be input as a document set.
- steps A2 to A4 are executed for each document as described above.
- the component specifying unit 101 specifies an event description included in each document for each input document.
- the event description can be specified by, for example, specifying a description portion including at least a situation expression based on a part-of-speech and part-of-speech string pattern obtained by morphological analysis of text in a document.
- examples of the situation expression include a part indicating an action, an action, or a state, and specifically, a verb, an adjective verb, a sa-variant noun, an action noun that is a noun derived from a verb, and the like.
- In step A3, the component specifying unit 101 specifies a situation expression as a component of the event description for each event description specified in step A2, and further specifies, from the event description, the corresponding expression that corresponds to this situation expression.
- the correspondence expression corresponding to the situation expression includes a noun string close to the situation expression.
- the component specifying unit 101 may parse the text in the document in step A2 and specify, as a situation expression, the part indicating an act, an action, or a state from the verb, adjective verb, or action noun included in the predicate.
- In this case, in step A3 the component specifying unit 101 extracts a case element corresponding to the predicate from the dependency relationship, and extracts, as a corresponding expression, an expression including a noun string, a proper noun, or a named entity included in the case element.
- the component specifying unit 101 can also classify the components specified as the corresponding expressions into components such as place, subject, and object.
- FIG. 3 shows an example of the situation expression specified from the event description and the corresponding expression corresponding to the situation expression in the first embodiment of the present invention.
- a corresponding expression such as a place, a subject, and an object is illustrated.
- one event ID is assigned to one event description, and each event ID is associated with a place, a subject, an object, and a situation expression. Furthermore, each event ID may be associated with document metadata, description contents, transmission date and time, and the like.
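One way to hold a row of a FIG. 3-style table in code is a small record type. The sketch below is an assumption about layout only: the class name `EventRecord`, its field names, and the sample values are illustrative and are not taken from the figure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventRecord:
    """One event description and its components (field names are illustrative)."""
    event_id: int
    situation: str                      # situation expression
    place: Optional[str] = None         # corresponding expression: place
    subject: Optional[str] = None       # corresponding expression: subject
    obj: Optional[str] = None           # corresponding expression: object
    metadata: dict = field(default_factory=dict)  # e.g. sender, transmission date and time


record = EventRecord(
    event_id=1,
    situation="go",
    subject="I",
    obj="Osaka Music Festival",
    metadata={"sent_at": "2012-02-22T09:00"},  # made-up value for illustration
)
print(record)
```

Optional fields default to `None` because a given event description need not contain every kind of corresponding expression.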
- the situation expression is shown with the notation of the verb, adjective verb, or action noun converted to its base form.
- Corresponding expressions related to place, subject, and object can be extracted by using a particle as a clue from an expression including a noun string close to the situation expression.
- Corresponding expressions related to place, subject, and object can also be extracted from arguments that stand in a relationship with the predicate, such as a dependency relationship, by using the expressions, parts of speech, named entities, and the like included in each argument as clues.
- For example, the component specifying unit 101 extracts the place from “Mt. Fuji” and the subject from “Taro Tanaka”; the object is extracted in the same manner.
- This can be realized, for example, by applying an existing technique for predicate-argument structure analysis. Specifically, from the result of analyzing the predicate-argument structure, the predicate can be used as the situation expression and each argument as a corresponding expression. Since one or more arguments are obtained as a result of the analysis, each of them can be used as a corresponding expression. If the subject cannot be identified, or if the subject is a pronoun such as “I”, the component specifying unit 101 can also identify the sender of the document, obtained from the document metadata, as the subject.
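The mapping from a predicate-argument analysis result to components can be sketched as follows. The sketch assumes the analysis has already been done by some external tool and is given as (term, case particle) pairs; the particle-to-role table, the pronoun list, the predicates, and the sender name are all hypothetical choices made for illustration.

```python
# Hypothetical mapping from Japanese case particles to component types.
CASE_TO_ROLE = {"ga": "subject", "de": "place", "wo": "object"}
# Hypothetical list of pronouns that trigger the sender fallback.
PRONOUNS = {"I", "me", "we"}


def to_components(predicate, arguments, sender=None):
    """Turn a predicate and its (term, case) arguments into event components."""
    components = {"situation": predicate, "place": None, "subject": None, "object": None}
    for term, case in arguments:
        role = CASE_TO_ROLE.get(case)
        if role is not None and components[role] is None:
            components[role] = term
    # Fall back to the document sender when the subject is missing or a pronoun.
    subject = components["subject"]
    if sender is not None and (subject is None or subject in PRONOUNS):
        components["subject"] = sender
    return components


example = to_components("climb", [("Taro Tanaka", "ga"), ("Mt. Fuji", "de")])
fallback = to_components("see", [("I", "ga"), ("concert", "wo")], sender="user_123")
print(example)
print(fallback)
```

The first argument found for each role wins, which is a simplification; a real analyzer may return several candidates per role.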
- In step A4, the shareability analysis unit 102 obtains a degree of sharing indicating event shareability for each event description based on the situation expression and the corresponding expression specified in step A3.
- the shareability analysis unit 102 refers to a rule defining the degree of sharing for a specific combination of a situation expression and a corresponding expression corresponding to the situation expression, and obtains the degree of event sharing.
- FIG. 4 is a diagram showing an example of rules used when obtaining the degree of sharing in Embodiment 1 of the present invention.
- the rule ID, the situation expression, the pattern of the correspondence expression corresponding to the situation expression, and the sharing degree are associated with each other to constitute one rule.
- the situation expression is represented by a combination of its base form and part of speech, as in the example of FIG. 3.
- the correspondence expression corresponding to the situation expression is represented by a combination of an asterisk symbol “*” and a character string.
- An asterisk symbol “*” indicates that an arbitrary word or character string is entered.
- Each rule may further specify a case for a character string assumed as a corresponding expression. That is, each rule may include, as a requirement, whether the expression matches case information such as a surface case or a deep case. For example, a rule with “* (wo)” in the corresponding expression column requires that the expression match the Japanese “wo” case.
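A rule lookup of this kind can be sketched with wildcard matching. The rules below are invented for illustration and do not reproduce FIG. 4; the tuple layout and the function name `lookup_sharing_degree` are assumptions. Python's `fnmatch.fnmatchcase` is used for the “*” wildcard; note that it also treats `?` and `[...]` as special characters, which the rules described above do not mention.

```python
from fnmatch import fnmatchcase

# Illustrative rules only:
# (rule ID, situation expression, pattern, required case or None, sharing degree)
RULES = [
    (1, "hold", "*festival", None, 0.9),
    (2, "eat", "*", "wo", 0.1),
]


def lookup_sharing_degree(situation, expression, case=None, rules=RULES, default=None):
    """Return the sharing degree of the first rule that matches, else `default`."""
    for _rule_id, rule_situation, pattern, rule_case, degree in rules:
        if rule_situation != situation:
            continue
        # A rule that specifies a case applies only when the case matches.
        if rule_case is not None and rule_case != case:
            continue
        if fnmatchcase(expression, pattern):
            return degree
    return default


print(lookup_sharing_degree("hold", "music festival"))
print(lookup_sharing_degree("eat", "lunch", case="wo"))
print(lookup_sharing_degree("eat", "lunch"))
```

Returning the first matching rule is a design choice of this sketch; the text does not say how conflicts between rules are resolved.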
- the degree of sharing is a measure indicating the possibility that an event is shared by a plurality of people, that is, “event sharing” as described above.
- a score indicating the degree of possibility that an event is shared by a plurality of people, that is, the strength of event sharing is used as a numerical value.
- the degree of sharing may be expressed by a binary value of 1 or 0, for example, or may be expressed by a real value from 0 to 1.
- the degree of sharing of each rule can be determined in advance from dictionary information on the situation expression and corresponding expression required by the rule, or from how they are used in an actual document corpus.
- When the sharing degree is binary, it indicates whether or not the event is shared. When it is a real value, the closer the sharing degree is to 1, the stronger the sharing of the event corresponding to the rule, and the closer it is to 0, the weaker the sharing.
- Next, another specific example of step A4 will be described.
- the shareability analysis unit 102 can also obtain a first degree indicating the possibility that the target of the situation expression is shared by a plurality of people, and a second degree indicating the possibility that each of the place, subject, and object is related to the event, and obtain the final “sharing degree” based on both.
- the shareability analysis unit 102 obtains the second degree for each of the place, the subject, and the object, and specifies the maximum value among these. Then, the shareability analysis unit 102 can multiply the maximum value of the second degree by the first degree and take the resulting product as the degree of sharing.
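The combination just described (maximum of the second degrees multiplied by the first degree) is small enough to write out directly. The function name and the handling of an empty input are assumptions of this sketch.

```python
def sharing_degree(first_degree, second_degrees):
    """Multiply the first degree by the largest second degree.

    second_degrees maps a component type to its second degree,
    e.g. {"place": 1.0, "subject": 0.0, "object": 0.2}.
    """
    if not second_degrees:
        # Assumption: with no corresponding expression there is nothing to share.
        return 0.0
    return first_degree * max(second_degrees.values())


score = sharing_degree(1.0, {"place": 1.0, "subject": 0.0, "object": 0.0})
print(score)
```

Because the maximum is used, a single strongly event-related component (for example a specific place) is enough to keep the sharing degree high, while a first degree near 0 suppresses it regardless of the components.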
- the first degree and the second degree will be described using specific examples.
- the first degree can be obtained by, for example, checking a situation expression indicating an act, an action, or a state against a dictionary created in advance.
- the dictionary in this case can be created by setting a first-degree value for each situation expression.
- For example, expressions such as “eat, make, cook, buy, sleep, and get up” describe acts or states whose target is difficult for a specific entity to share with other entities; they are exclusive in nature. Therefore, since there is a low possibility that the target of such an expression is shared by a plurality of people, a value close to 0 is assigned to such an expression in the dictionary.
- As a method of creating the dictionary, each action expression appearing in an actual document corpus can be associated with the subjects involved using existing language analysis technology, and the degree of sharing of each action can be obtained by counting the number of subjects involved in it.
- the usage of each expression may be obtained from lexicographic information, and the degree of sharing may be estimated therefrom.
- the degree of sharing of each expression may be obtained from the frequency of co-occurrence or dependency with those clue expressions.
- On the other hand, expressions such as “meet, see, go to see, join, come, hold, open, performed, gather, and entertain” are considered to be expressions whose target of the action or state is easy for a particular entity to share with other entities. In general, it is presumed that the degree of sharing is high for an expression related to viewing of a subject and for an action that is not repeated on a daily basis. Therefore, a value close to 1 is assigned to such an expression. The degree of sharing of such an expression may be obtained from the frequency of co-occurrence or dependency between the expression and an expression indicating the same target event related to different actors in an actual document corpus.
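As a rough illustration of the corpus-based approach, the sketch below estimates a first degree per action from (action, subject, target) triples. The scoring formula, namely the fraction of an action's targets that are mentioned by more than one distinct subject, is an assumption made for this example and is not a formula given in the text; the triples are made up.

```python
from collections import defaultdict


def estimate_first_degrees(observations):
    """Estimate a first degree for each action from (action, subject, target) triples."""
    # Collect the distinct subjects seen for every (action, target) pair.
    subjects = defaultdict(set)
    for action, subject, target in observations:
        subjects[(action, target)].add(subject)

    shared = defaultdict(int)
    total = defaultdict(int)
    for (action, _target), who in subjects.items():
        total[action] += 1
        if len(who) > 1:        # the same target involves several subjects
            shared[action] += 1
    return {action: shared[action] / total[action] for action in total}


corpus = [
    ("see", "A", "fireworks"), ("see", "B", "fireworks"),
    ("eat", "A", "lunch"), ("eat", "B", "sandwich"),
]
degrees = estimate_first_degrees(corpus)
print(degrees)
```

A string match on the target is a crude proxy for "the same target event": two people who each write "lunch" would be counted as sharing it, so a practical version would need a stricter notion of target identity.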
- the second degree can be obtained by matching the corresponding expression with a dictionary created in advance.
- the dictionary in this case can be created by setting a value having the second degree in advance for each corresponding expression.
- the second degree may be obtained from the frequency of co-occurrence or dependency between the expression and the expression indicating the same target event in an actual document corpus.
- the second degree is set to 0.
- the first degree is set to 1.
- the second degree is set to zero.
- For example, when the corresponding expression of the place is the word “Mt. Fuji”, Mt. Fuji is a specific mountain that multiple subjects can share at a specific time, so it is highly likely to be related to the event, and the second degree is set to 1.
- the second degree is set to a value close to 0.
- When the place is limited, such as “Yokohama Station” or “Yokohama Port”, the second degree is set to a value close to 1 because the expression is highly likely to be related to a specific event.
- the second degree can be determined based on the area or volume.
- the second degree is set to a value close to zero.
- When the expression can include a plurality of entities, such as an organization or a group, the second degree is set to a value close to 1 because there is a high possibility of it being related to an event.
- When the corresponding expression is a clue expression that suggests an action by a plurality of subjects, such as “together”, “all together”, or “in a group”, a value close to 1 is assigned to it.
- In step A5, the analysis result output unit 103 outputs, as the analysis result, the information about the event and the sharing degree obtained in step A4.
- examples of the information related to the event include a situation expression and a corresponding expression.
- For example, for the event description “I went to Osaka Music Festival” in a document, the analysis result output unit 103 lists the situation expression, the corresponding expression, and the sharing degree, and outputs “Situation expression: went, Component: Osaka Music Festival, Sharing degree: 0.92”.
- examples of information related to events include sentences containing situation expressions and correspondence expressions.
- the analysis result output unit 103 can output the sentence and the degree of sharing as an analysis result, such as “I went to Osaka Music Festival: 0.92.”
- the analysis result output unit 103 can output the presence / absence of sharing as the degree of sharing.
- the analysis result output unit 103 can also output, as an analysis result, a sentence (event description) that is information about the event together with the presence or absence of sharing, such as “I went to the Osaka Music Festival: Sharing”.
- the analysis result output unit 103 can output each item name together with the contents of the place, subject, object, and situation expression as information about the event.
- For example, the analysis result output unit 103 can list the contents and item names as a set and output them as the analysis result, such as “Place: Osaka, Subject: I, Object: Osaka Music Festival, Situation: went, Sharing degree: 0.92”.
- the analysis result output unit 103 may be configured to output information about an event as an analysis result only when the sharing degree is 1 or when the sharing degree is greater than or equal to a threshold value. In this case, information regarding the event is not output for an event with a low degree of sharing.
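The "sentence: sharing degree" output format with an optional threshold can be sketched as below. The function name, the dictionary keys, and the second sample sentence with its score are assumptions for illustration; only the first sentence and the value 0.92 come from the example above.

```python
def format_results(results, threshold=None):
    """Format results as 'sentence: degree', skipping events below the threshold."""
    lines = []
    for result in results:
        if threshold is not None and result["sharing_degree"] < threshold:
            continue  # events with a low sharing degree are not output
        lines.append(f'{result["sentence"]}: {result["sharing_degree"]:.2f}')
    return lines


results = [
    {"sentence": "I went to Osaka Music Festival", "sharing_degree": 0.92},
    {"sentence": "I ate lunch", "sharing_degree": 0.05},  # made-up example
]
for line in format_results(results, threshold=0.5):
    print(line)
```

With a binary sharing degree, the same filter reduces to keeping only events whose degree is 1.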
- As described above, in Embodiment 1, the degree of sharing obtained for an event described in a document is larger when the possibility that the event is shared by a plurality of people is high, and smaller when that possibility is low. For this reason, according to the event analysis apparatus 100, whether an event attracts attention among a plurality of people can be taken into account based on the sharing degree. As a result, it becomes easy to distinguish the case where expressions related to various different events merely happen to match, so that at first glance multiple people seem to be picking up a common topic, from the case where multiple people are actually talking about a certain event, which makes it easy to analyze events.
- (Embodiment 2) Next, an event analysis apparatus and an event analysis method according to Embodiment 2 of the present invention will be described with reference to FIGS. Although Embodiment 2 of the present invention is described below, the present invention is not limited to Embodiment 2.
- FIG. 5 is a block diagram showing a schematic configuration of the event analysis apparatus according to Embodiment 2 of the present invention.
- the event analysis apparatus 200 includes a component specifying unit 201, a shareability analysis unit 202, an analysis result output unit 203, a document acquisition unit 204, and a document database (hereinafter referred to as “document DB”) 205.
- the document acquisition unit 204 receives input of analysis conditions, and acquires one or more documents that match the input analysis conditions from a document set prepared in advance.
- the analysis condition may include one or more keywords or a specific period.
- the document set is prepared in the document DB 205.
- the component specifying unit 201 sets the document acquired by the document acquisition unit 204 as an analysis target.
- the component specifying unit 201 operates in the same manner as the component specifying unit 101 shown in FIG. 1 except that it analyzes the one or more documents acquired by the document acquisition unit 204. Therefore, the component specifying unit 201 also specifies the event description, and further specifies the situation expression and the corresponding expression.
- the shareability analysis unit 202 operates in the same manner as the shareability analysis unit 102 of Embodiment 1. That is, the shareability analysis unit 202 obtains a degree of sharing indicating event shareability based on the situation expression and the corresponding expression specified by the component specifying unit 201.
- the analysis result output unit 203 outputs the analysis condition in addition to the degree of sharing and information about the event. Also, depending on the analysis conditions received by the document acquisition unit 204, the analysis result output unit 203 can perform ranking based on the degree of sharing, as will be described later.
- the analysis result output unit 203 can also operate in the same manner as the analysis result output unit 103 of Embodiment 1.
- FIG. 6 is a flowchart showing the operation of the event analysis apparatus according to Embodiment 2 of the present invention.
- the event analysis method is implemented by operating the event analysis apparatus 200. Therefore, the following description of the operation of the event analysis apparatus 200 also serves as the description of the event analysis method in Embodiment 2.
- the document acquisition unit 204 searches the document DB 205 based on the analysis conditions, and acquires one or more documents that match the analysis conditions (step B1). Further, the document acquisition unit 204 inputs the acquired one or more documents to the component specifying unit 201.
- In step B1, one or more keywords can be given as analysis conditions.
- In this case, each input keyword is a word representing a feature of the documents to be acquired (hereinafter also referred to as a “feature word”). The document acquisition unit 204 then acquires documents for each feature word using that feature word.
- the analysis condition includes a specific period.
- the document set acquisition unit 204 accepts a target period as an input instead of a keyword. That is, the document set acquisition unit 204 receives a period specified by the transmission date and time as an analysis condition.
- the document set acquisition unit 204 accepts, as analysis conditions, conditions that specify the start date and time to the end date or conditions that specify the start date and time and the length of the period. Then, the document set acquisition unit 204 acquires a document that meets the condition for the specified period from the document DB 205.
- the document set acquisition unit 204 determines one or more characteristic keywords as “character words” based on the input period, and for each determined characteristic word, A document relating to the feature word can be acquired from the document DB 205 using the feature word.
- For example, the document acquisition unit 204 calculates an index, such as the frequency of each word or its tf-idf value, from the set of documents transmitted in each fixed period (for example, every hour). It then compares the index of each word with the index of the same word in the periods immediately before and after, and determines whether the difference or the rate of increase exceeds a specific threshold. A word that exceeds the threshold is judged to be a characteristic keyword whose use has suddenly increased, and is used as a feature word.
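The sudden-increase check for feature words can be sketched as follows. This is only an illustration under stated assumptions: documents are already tokenized and grouped into consecutive time windows, raw word frequency is used as the index (not tf-idf), and each window is compared only with the immediately preceding one. The function name `bursting_words` and the parameters `min_increase` and `min_count` are hypothetical and do not come from the source.

```python
from collections import Counter
from typing import List


def bursting_words(
    windows: List[List[List[str]]],
    min_increase: float = 3.0,
    min_count: int = 2,
) -> List[List[str]]:
    """Return, for each window after the first, the words whose frequency jumped.

    windows: one entry per time window (e.g. one per hour), oldest first;
             each entry is a list of tokenized documents.
    """
    def count_words(docs: List[List[str]]) -> Counter:
        counter: Counter = Counter()
        for tokens in docs:
            counter.update(tokens)
        return counter

    results: List[List[str]] = []
    previous = count_words(windows[0]) if windows else Counter()
    for docs in windows[1:]:
        current = count_words(docs)
        burst = []
        for word, n in current.items():
            # Add-one smoothing keeps the ratio finite for previously unseen words.
            rate = n / (previous.get(word, 0) + 1)
            # Assumed thresholds: a minimum count and a minimum increase rate.
            if n >= min_count and rate >= min_increase:
                burst.append(word)
        results.append(sorted(burst))
        previous = current
    return results
```

Swapping the frequency for a tf-idf value, or comparing the difference instead of the ratio, only changes how `rate` is computed; the thresholding step stays the same.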
- each document is preferably stored in the document DB 205 together with the transmission date and time.
- the document is stored in the document DB 205 as a document with the date and time attached.
- When searching for documents, the document acquisition unit 204 may acquire the transmission date and time in addition to the search results. It may also search only the documents transmitted during a specific period and process only that set. Further, it may accept as input a logical AND of a keyword and a specific period.
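A minimal sketch of document acquisition under a keyword condition, a period condition, or the logical AND of both. The in-memory list standing in for the document DB 205, the `Document` class, and the function `acquire_documents` with its parameters are assumptions made for illustration; the source does not specify a storage format or a search method (simple substring matching is used here).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class Document:
    text: str
    sent_at: datetime  # transmission date and time stored with the document


def acquire_documents(
    db: Iterable[Document],
    keyword: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    length: Optional[timedelta] = None,
) -> List[Document]:
    """Return documents matching every condition that was supplied (logical AND)."""
    # A period may be given as start + end, or as start + length.
    if start is not None and end is None and length is not None:
        end = start + length
    hits = []
    for doc in db:
        if keyword is not None and keyword not in doc.text:
            continue
        if start is not None and doc.sent_at < start:
            continue
        if end is not None and doc.sent_at > end:
            continue
        hits.append(doc)
    return hits
```

Passing only `keyword`, only a period, or both covers the three kinds of analysis condition mentioned in the text.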
- The component specifying unit 201 receives the analysis condition and the acquired documents from the document acquisition unit 204, and identifies one or more event descriptions contained in each received document (step B2). Subsequently, the component specifying unit 201 identifies a situation expression and a corresponding expression from each event description (step B3). Steps B2 and B3 are the same as steps A2 and A3 shown in FIG. 2, respectively.
- Step B4 is the same as step A4 shown in FIG.
- The analysis result output unit 203 receives the degree of sharing and the information about the event from the shareability analysis unit 202, receives the analysis condition from the document acquisition unit 204, and outputs these to the outside as the shareability analysis result for the event (step B5).
- For example, suppose that the keyword "Osaka Music Festival" is input as the analysis condition, that the component specifying unit 201 identifies n event descriptions in response, and that the shareability analysis unit 202 obtains a degree of sharing for each event description.
- In this case, the analysis result output unit 203 outputs the keyword (feature word), the information about the n event descriptions, and the degree of sharing of each. In other words, the analysis result output unit 203 executes step A5 of the first embodiment, shown in FIG. 2, for each event description.
- When a plurality of keywords serving as feature words are input in step B1, or when a plurality of feature words are determined from the input period, the analysis result output unit 203 can also output the analysis result for each feature word.
- the analysis result output unit 203 can rank each feature word based on the degree of sharing for each feature word, and can output the ranking result and each feature word.
- For the ranking, a score is calculated from the degrees of sharing, and the feature words are ranked in descending order of score.
- the analysis result output unit 203 can also calculate the score by adding the degree of sharing for each feature word, and output the obtained score and each feature word.
- the analysis result output unit 203 may specify the maximum value of the degree of sharing instead of adding up, and use the specified maximum value as the score.
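The two scoring options, summing the degrees of sharing or taking their maximum, and the descending ranking can be sketched as below. The function name `rank_feature_words`, the `mode` parameter, and the dictionary input are hypothetical; the source only states that the score is the sum or the maximum and that ranking is in descending order of score.

```python
from typing import Dict, List, Tuple


def rank_feature_words(
    degrees_by_word: Dict[str, List[float]], mode: str = "sum"
) -> List[Tuple[str, float]]:
    """Score each feature word from its degrees of sharing and rank by score."""
    if mode == "sum":
        score_fn = sum
    elif mode == "max":
        score_fn = max
    else:
        raise ValueError("mode must be 'sum' or 'max'")
    scores = {
        word: score_fn(degrees)
        for word, degrees in degrees_by_word.items()
        if degrees  # skip feature words that have no event descriptions
    }
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The two modes can disagree: a feature word with many weakly shared event descriptions can outrank one with a single strongly shared description under `sum`, but not under `max`.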
- FIG. 7 is a block diagram illustrating an example of a computer that implements the event analysis apparatus according to the first and second embodiments of the present invention.
- The computer device 300 includes a CPU (central processing unit) 301, a RAM (random access memory) 302, a storage device 303, an input interface circuit (input I/F) 304, a display controller 305, a data reader/writer 306, and a communication interface circuit (communication I/F) 307.
- the storage device 303 is a large-capacity storage device such as a magnetic disk storage device or an SSD (solid state drive).
- an input device 400 such as a keyboard and a mouse is connected to the input interface circuit 304.
- another computer is connected to the communication interface circuit 307 via a communication network.
- a display device 500 is connected to the display controller 305.
- the data reader / writer 306 inputs and outputs data with the external recording medium 600.
- When a program that executes steps A1 to A5 shown in FIG. 2 is installed in the computer device 300 and executed, the computer device 300 realizes the event analysis apparatus 100 according to the first embodiment.
- In this case, the CPU 301 functions as the component specifying unit 101, the shareability analysis unit 102, and the analysis result output unit 103 to perform processing.
- Likewise, when a program that executes steps B1 to B5 shown in FIG. 6 is installed and executed, the computer device 300 realizes the event analysis apparatus 200 according to the second embodiment.
- In this case, the CPU 301 functions as the component specifying unit 201, the shareability analysis unit 202, the analysis result output unit 203, and the document acquisition unit 204 to perform processing.
- The storage device 303 functions as the document DB 205.
- The document DB 205 may instead be realized by mounting a recording medium 600 capable of storing a large number of electronic documents on a reading device. Further, the document DB 205 may be realized by another computer device connected to the computer device 300 via a network.
- a program that causes the computer apparatus 300 to execute steps A1 to A5 shown in FIG. 2 and a program that causes the computer apparatus 300 to execute steps B1 to B5 shown in FIG. 6 are stored in, for example, a computer-readable recording medium 600.
- the program stored in the recording medium 600 is installed in the computer device 300 via the reader / writer 306 that is a reading device such as an optical drive device.
- These programs may be distributed on the Internet connected via the communication interface circuit 307.
- the input interface circuit 304 and the communication interface circuit 307 function as input means for the component specifying unit 101 or 201. Further, the display controller 305 and the communication interface circuit 307 function as output means when the analysis result output unit 103 or 203 outputs data to the outside.
- a part of the storage areas of the RAM 302 and the storage device 303 is used as a temporary storage area for intermediate results of each processing step executed by the event analysis apparatus 100 or 200. Further, a part of the storage area of the RAM 302 and the storage device 303 may be used as a data storage area of the document DB 205.
- Specific examples of the computer-readable recording medium 600 include general-purpose semiconductor storage devices such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), magnetic storage media such as a flexible disk, and optical storage media such as a CD-ROM (Compact Disk Read Only Memory).
- (Appendix 1) An apparatus for analyzing an event described in a document to be analyzed, comprising: a component specifying unit that identifies a description related to an event from the document to be analyzed, and identifies, from the identified description, a situation expression representing a situation and an expression corresponding to the situation expression; and a shareability analysis unit that obtains, based on the situation expression and the corresponding expression identified from the description, a degree of sharing indicating the possibility that the event to which the description relates is shared by a plurality of people.
- (Appendix 2) The event analysis apparatus according to Appendix 1, further comprising an analysis result output unit that outputs the degree of sharing and information about the event for which the degree of sharing was obtained.
- (Appendix 3) The event analysis apparatus according to Appendix 1 or 2, wherein the component specifying unit identifies, as the situation expression, a portion of the identified description that indicates an action, an act, or a state, and further identifies, as the corresponding expression, an expression that is related to the situation expression and corresponds to any of time, place, subject, and object.
- (Appendix 4) The event analysis apparatus according to any one of Appendices 1 to 3, wherein the shareability analysis unit obtains the degree of sharing by applying the situation expression and the corresponding expression identified from the description to preset rules, and the rules define a degree of sharing for each combination of an assumed situation expression and a character string assumed as an expression corresponding to that situation expression.
- (Appendix 5) The event analysis apparatus according to Appendix 4, wherein the rules further define a grammatical case for the character string assumed as the expression corresponding to the situation expression, and the shareability analysis unit applies a rule when the corresponding expression matches the case defined by that rule.
- (Appendix 6) The event analysis apparatus according to any one of Appendices 1 to 3, wherein the shareability analysis unit obtains a first degree indicating the possibility that the target of the situation expression is shared by a plurality of people and a second degree indicating the possibility that the corresponding expression is related to the event, and obtains the degree of sharing from the first degree and the second degree.
- (Appendix 7) The event analysis apparatus according to Appendix 2, wherein the analysis result output unit outputs, as the information about the event for which the degree of sharing was obtained, the situation expression and the corresponding expression, or a sentence containing the situation expression and the corresponding expression.
- (Appendix 8) The event analysis apparatus according to Appendix 2, further comprising a document acquisition unit that accepts input of an analysis condition and acquires, from a document set prepared in advance, one or more documents that match the input analysis condition, wherein the component specifying unit takes the documents acquired by the document acquisition unit as the analysis target, and the analysis result output unit outputs the analysis condition in addition to the degree of sharing and the information about the event for which the degree of sharing was obtained.
- (Appendix 9) The event analysis apparatus according to Appendix 8, wherein one or more keywords or a specific period is input as the analysis condition.
- (Appendix 10) The event analysis apparatus according to Appendix 8, wherein the document acquisition unit determines feature words based on the input analysis condition and acquires the documents for each determined feature word, the shareability analysis unit obtains the degree of sharing for each feature word, and the analysis result output unit, when there are two or more feature words, either outputs a value obtained by summing the degrees of sharing for each feature word together with each feature word, or ranks the feature words based on the degree of sharing for each feature word and outputs the ranking result together with each feature word.
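- (Appendix 11) A method for analyzing an event described in a document to be analyzed, comprising: (a) a step of identifying a description related to an event from the document to be analyzed, and identifying, from the identified description, a situation expression representing a situation and an expression corresponding to the situation expression; and (b) a step of obtaining, based on the situation expression and the corresponding expression identified from the description, a degree of sharing indicating the possibility that the event to which the description relates is shared by a plurality of people.
- (Appendix 12) The event analysis method according to Appendix 11, further comprising (c) a step of outputting the degree of sharing and information about the event for which the degree of sharing was obtained.
- (Appendix 13) The event analysis method according to Appendix 11 or 12, wherein, in step (a), a portion of the identified description that indicates an action, an act, or a state is identified as the situation expression, and an expression that is related to the situation expression and corresponds to any of time, place, subject, and object is identified as the corresponding expression.
- (Appendix 14) The event analysis method according to any one of Appendices 11 to 13, wherein, in step (b), the degree of sharing is obtained by applying the situation expression and the corresponding expression identified from the description to preset rules, and the rules define a degree of sharing for each combination of an assumed situation expression and a character string assumed as an expression corresponding to that situation expression.
- (Appendix 15) The event analysis method according to Appendix 14, wherein the rules further define a grammatical case for the character string assumed as the expression corresponding to the situation expression, and, in step (b), a rule is applied when the corresponding expression matches the case defined by that rule.
- (Appendix 16) The event analysis method according to any one of Appendices 11 to 13, wherein, in step (b), a first degree indicating the possibility that the target of the situation expression is shared by a plurality of people and a second degree indicating the possibility that the corresponding expression is related to the event are obtained, and the degree of sharing is obtained from the first degree and the second degree.
- (Appendix 17) The event analysis method according to Appendix 12, wherein, in step (c), the situation expression and the corresponding expression, or a sentence containing the situation expression and the corresponding expression, is output as the information about the event for which the degree of sharing was obtained.
- (Appendix 18) The event analysis method according to Appendix 12, further comprising (d) a step of accepting input of an analysis condition and acquiring, from a document set prepared in advance, one or more documents that match the input analysis condition, wherein, in step (a), the documents acquired in step (d) are taken as the analysis target, and, in step (c), the analysis condition is output in addition to the degree of sharing and the information about the event for which the degree of sharing was obtained.
- (Appendix 19) The event analysis method according to Appendix 18, wherein, in step (d), input of one or more keywords or a specific period is accepted as the analysis condition.
- (Appendix 20) The event analysis method according to Appendix 18, wherein, in step (d), feature words are determined based on the input analysis condition and the documents are acquired for each determined feature word; in step (b), the degree of sharing is obtained for each feature word; and, in step (c), when there are two or more feature words, either a value obtained by summing the degrees of sharing for each feature word is output together with each feature word, or the feature words are ranked based on the degree of sharing for each feature word and the ranking result is output together with each feature word.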
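- (Appendix 21) A computer-readable recording medium on which is recorded a program for analyzing, by a computer, an event described in a document to be analyzed, the program containing instructions that cause the computer to execute: (a) a step of identifying a description related to an event from the document to be analyzed, and identifying, from the identified description, a situation expression representing a situation and an expression corresponding to the situation expression; and (b) a step of obtaining, based on the situation expression and the corresponding expression identified from the description, a degree of sharing indicating the possibility that the event to which the description relates is shared by a plurality of people.
- (Appendix 22) The computer-readable recording medium according to Appendix 21, further causing the computer to execute (c) a step of outputting the degree of sharing and information about the event for which the degree of sharing was obtained.
- (Appendix 23) The computer-readable recording medium according to Appendix 21 or 22, wherein, in step (a), a portion of the identified description that indicates an action, an act, or a state is identified as the situation expression, and an expression that is related to the situation expression and corresponds to any of time, place, subject, and object is identified as the corresponding expression.
- (Appendix 24) The computer-readable recording medium according to any one of Appendices 21 to 23, wherein, in step (b), the degree of sharing is obtained by applying the situation expression and the corresponding expression identified from the description to preset rules, and the rules define a degree of sharing for each combination of an assumed situation expression and a character string assumed as an expression corresponding to that situation expression.
- (Appendix 25) The computer-readable recording medium according to Appendix 24, wherein the rules further define a grammatical case for the character string assumed as the expression corresponding to the situation expression, and, in step (b), a rule is applied when the corresponding expression matches the case defined by that rule.
- (Appendix 26) The computer-readable recording medium according to any one of Appendices 21 to 23, wherein, in step (b), a first degree indicating the possibility that the target of the situation expression is shared by a plurality of people and a second degree indicating the possibility that the corresponding expression is related to the event are obtained, and the degree of sharing is obtained from the first degree and the second degree.
- (Appendix 27) The computer-readable recording medium according to Appendix 22, wherein, in step (c), the situation expression and the corresponding expression, or a sentence containing the situation expression and the corresponding expression, is output as the information about the event for which the degree of sharing was obtained.
- (Appendix 28) The computer-readable recording medium according to Appendix 22, further causing the computer to execute (d) a step of accepting input of an analysis condition and acquiring, from a document set prepared in advance, one or more documents that match the input analysis condition, wherein, in step (a), the documents acquired in step (d) are taken as the analysis target, and, in step (c), the analysis condition is output in addition to the degree of sharing and the information about the event for which the degree of sharing was obtained.
- (Appendix 29) The computer-readable recording medium according to Appendix 28, wherein, in step (d), input of one or more keywords or a specific period is accepted as the analysis condition.
- (Appendix 30) The computer-readable recording medium according to Appendix 28, wherein, in step (d), feature words are determined based on the input analysis condition and the documents are acquired for each determined feature word; in step (b), the degree of sharing is obtained for each feature word; and, in step (c), when there are two or more feature words, either a value obtained by summing the degrees of sharing for each feature word is output together with each feature word, or the feature words are ranked based on the degree of sharing for each feature word and the ranking result is output together with each feature word.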
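Appendices 6, 16, and 26 obtain the degree of sharing from a first degree (how likely the target of the situation expression is shared by a plurality of people) and a second degree (how likely the corresponding expression is related to the event), but this section does not state how the two are combined. The sketch below assumes both degrees lie in [0, 1] and multiplies them; the product and the function name `combine_degrees` are assumptions chosen for illustration only.

```python
def combine_degrees(first_degree: float, second_degree: float) -> float:
    """Combine the two degrees into one degree of sharing.

    The product is an assumed combination rule, not one taken from the source.
    """
    for value in (first_degree, second_degree):
        if not 0.0 <= value <= 1.0:
            raise ValueError("degrees are assumed to lie in [0, 1]")
    # The result is high only when both degrees are high.
    return first_degree * second_degree
```

With a product, an event description scores high only when the target is widely shared and the corresponding expression is also strongly tied to the event; other monotone combinations (for example, a minimum) would serve the same purpose.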
- According to the present invention, when events are analyzed using documents, the analysis can take into account whether an event is attracting attention among a plurality of people.
- The present invention is applicable to uses such as an event information extraction device that extracts information about events from information on the Internet, an event analysis device that analyzes information about extracted events, and an information search device that can search for topical events.
- the present invention can also be applied to applications such as a clustering device that clusters topics for each common event, and a clustering device that clusters documents that contain related event descriptions.
- In such a clustering apparatus, for example, a keyword in an event description identified according to the present invention, or a feature word output in the second embodiment, is used as a clustering feature.
- the present invention can also be applied to a process of assigning weights to clustering features in such a clustering apparatus.
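One way to picture the weighting of clustering features is to scale each feature count by the degree of sharing of that feature before comparing documents. The sketch below is an assumption-laden illustration: the names `weighted_vector` and `cosine`, the use of cosine similarity, and the default weight for unknown features are all hypothetical and are not described in the source.

```python
import math
from typing import Dict


def weighted_vector(
    feature_counts: Dict[str, float],
    sharing_degree: Dict[str, float],
    default_weight: float = 0.0,
) -> Dict[str, float]:
    """Scale each feature count by the degree of sharing of that feature."""
    return {
        feature: count * sharing_degree.get(feature, default_weight)
        for feature, count in feature_counts.items()
    }


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(value * v.get(feature, 0.0) for feature, value in u.items())
    norm_u = math.sqrt(sum(value * value for value in u.values()))
    norm_v = math.sqrt(sum(value * value for value in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Under this weighting, two documents that share a highly shared feature word are pulled closer together, while features with a low degree of sharing contribute little to the similarity.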
- 100 Event analysis apparatus (Embodiment 1)
- 101 Component specifying unit (Embodiment 1)
- 102 Shareability analysis unit (Embodiment 1)
- 103 Analysis result output unit (Embodiment 1)
- 200 Event analysis apparatus (Embodiment 2)
- 201 Component specifying unit (Embodiment 2)
- 202 Shareability analysis unit (Embodiment 2)
- 203 Analysis result output unit (Embodiment 2)
- 204 Document acquisition unit
- 205 Document database
- 300 Computer device
- 301 CPU
- 302 RAM
- 303 Storage device
- 304 Input interface circuit (input I/F)
- 305 Display controller
- 306 Data reader/writer
- 307 Communication interface circuit (communication I/F)
- 400 Input device
- 500 Display device
- 600 Recording medium
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Economics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
Description
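An object of the present invention is to solve the above problem and to provide an event analysis apparatus, an event analysis method, and a computer-readable recording medium that, in analyzing events using documents, can take into account whether an event is attracting common attention among a plurality of people.
An apparatus for analyzing an event described in a document to be analyzed,
characterized by comprising: a component specifying unit that identifies a description related to an event from the document to be analyzed, and identifies, from the identified description, a situation expression representing a situation and an expression corresponding to the situation expression; and
a shareability analysis unit that obtains, based on the situation expression and the corresponding expression identified from the description, a degree of sharing indicating the possibility that the event to which the description relates is shared by a plurality of people.
A method for analyzing an event described in a document to be analyzed,
characterized by comprising: (a) a step of identifying a description related to an event from the document to be analyzed, and identifying, from the identified description, a situation expression representing a situation and an expression corresponding to the situation expression; and
(b) a step of obtaining, based on the situation expression and the corresponding expression identified from the description, a degree of sharing indicating the possibility that the event to which the description relates is shared by a plurality of people.
A recording medium on which is recorded a program for analyzing, by a computer, an event described in a document to be analyzed,
characterized in that the recorded program contains instructions that cause the computer to execute:
(a) a step of identifying a description related to an event from the document to be analyzed, and identifying, from the identified description, a situation expression representing a situation and an expression corresponding to the situation expression; and
(b) a step of obtaining, based on the situation expression and the corresponding expression identified from the description, a degree of sharing indicating the possibility that the event to which the description relates is shared by a plurality of people.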
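Hereinafter, an event analysis apparatus and an event analysis method according to Embodiment 1 of the present invention will be described with reference to FIGS. 1 to 4. Although Embodiment 1 is described below, the present invention is not limited to Embodiment 1.
First, the configuration of the event analysis apparatus according to Embodiment 1 will be described with reference to FIG. 1. FIG. 1 is a block diagram showing a schematic configuration of the event analysis apparatus according to Embodiment 1 of the present invention.
Next, the operation of the event analysis apparatus 100 according to Embodiment 1 will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the operation of the event analysis apparatus according to Embodiment 1 of the present invention. In the following description, FIG. 1 is referred to as appropriate. In Embodiment 1, the event analysis method is carried out by operating the event analysis apparatus 100. Therefore, the following description of the operation of the event analysis apparatus 100 also serves as the description of the event analysis method in Embodiment 1.
Next, steps A1 to A5 described above will be explained in detail with specific examples. The explanation proceeds step by step with reference to FIGS. 3 and 4 in addition to FIGS. 1 and 2.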
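In step A1, the component specifying unit 101 accepts input of a document to be analyzed. The input may be a document set; for example, a set of Web pages may be input as the document set. When a plurality of documents are input, the subsequent steps A2 to A4 are executed for each document, as described above.
In step A2, the component specifying unit 101 identifies, for each input document, the event descriptions contained in it. An event description can be identified, for example, by finding a descriptive portion that contains at least a situation expression, based on the parts of speech and part-of-speech sequence patterns obtained by morphological analysis of the text in the document. A situation expression is a portion that indicates an action, an act, or a state; specific examples include verbs, adjectival nouns, sahen (suru-verb) nouns, and action nouns derived from verbs.
In step A3, for each event description identified in step A2, the component specifying unit 101 identifies a situation expression as a component of the event description, and further identifies, within the event description, a corresponding expression that corresponds to this situation expression. For example, a noun sequence adjacent to the situation expression can serve as the corresponding expression.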
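Steps A2 and A3 can be sketched on tokens that have already been through morphological analysis. The sketch assumes the analyzer output is a list of `(surface, part_of_speech)` pairs with simplified English tag names; the tag set, the function name `extract_components`, and the rule "take the noun sequence and case particle immediately to the left" are illustrative assumptions, not the patterns actually used in the source.

```python
from typing import List, Tuple

# Parts of speech treated as situation expressions (assumed tag names).
SITUATION_POS = {"verb", "adjectival_noun", "verbal_noun"}


def extract_components(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (situation expression, corresponding expression) pairs."""
    results = []
    for i, (surface, pos) in enumerate(tokens):
        if pos not in SITUATION_POS:
            continue
        # Attach trailing auxiliaries so that "行っ" + "た" becomes "行った".
        situation = surface
        j = i + 1
        while j < len(tokens) and tokens[j][1] == "auxiliary":
            situation += tokens[j][0]
            j += 1
        # Walk left: an optional case particle, then a contiguous noun sequence.
        k = i - 1
        particle = ""
        if k >= 0 and tokens[k][1] == "particle":
            particle = tokens[k][0]
            k -= 1
        nouns: List[str] = []
        while k >= 0 and tokens[k][1] == "noun":
            nouns.insert(0, tokens[k][0])
            k -= 1
        if nouns:
            results.append((situation, "".join(nouns) + particle))
    return results
```

For the tagged sentence 私 / は / 大阪 / 音楽祭 / へ / 行っ / た, this pairs the situation expression 行った with the adjacent noun sequence plus particle 大阪音楽祭へ.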
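In step A4, the shareability analysis unit 102 obtains, for each event description, a degree of sharing that indicates the shareability of the event, based on the situation expression and the corresponding expression identified in step A3. For example, the shareability analysis unit 102 obtains the degree of sharing of the event by referring to rules that define a degree of sharing for specific combinations of a situation expression and its corresponding expression.
In step A5, the analysis result output unit 103 outputs, as the analysis result, the result obtained in step A4, that is, the information about the event and the degree of sharing obtained. Examples of information about the event include the situation expression and the corresponding expression. Specifically, for the event description 「私は大阪音楽祭へ行った」 ("I went to the Osaka Music Festival") in a document, the analysis result output unit 103 lists the situation expression, the corresponding expression, and the degree of sharing, and outputs, for example, "situation expression: 行った (went), component: 大阪音楽祭へ (to the Osaka Music Festival), degree of sharing: 0.92".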
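The rule lookup of step A4 and the output of step A5 can be sketched as a small table-driven function. The rule table below is hypothetical: only the value 0.92 for the Osaka Music Festival example appears in the source, while the second rule, the default degree of 0.0, and the names `RULES`, `degree_of_sharing`, and `format_result` are assumptions made for illustration. The case check (matching the trailing particle) follows the idea that a rule may also define a grammatical case for the corresponding expression.

```python
from typing import List, Tuple

# (situation expression, string expected in the corresponding expression,
#  case particle, degree of sharing) -- hypothetical rule table.
RULES: List[Tuple[str, str, str, float]] = [
    ("行った", "音楽祭", "へ", 0.92),
    ("行った", "コンビニ", "へ", 0.05),
]
DEFAULT_DEGREE = 0.0  # assumed value when no rule applies


def degree_of_sharing(situation: str, corresponding: str) -> float:
    """Step A4: look up the degree of sharing for one event description."""
    for rule_situation, rule_text, rule_case, degree in RULES:
        if (
            situation == rule_situation
            and rule_text in corresponding
            and corresponding.endswith(rule_case)  # case must match the rule
        ):
            return degree
    return DEFAULT_DEGREE


def format_result(situation: str, corresponding: str, degree: float) -> str:
    """Step A5: list the situation expression, component, and degree."""
    return (
        f"situation expression: {situation}, "
        f"component: {corresponding}, "
        f"degree of sharing: {degree:.2f}"
    )
```

A usage example for the event description in the text: `format_result("行った", "大阪音楽祭へ", degree_of_sharing("行った", "大阪音楽祭へ"))`.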
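As described above, in Embodiment 1, a degree of sharing is obtained for an event described in a document; the degree is larger when the event is more likely to be shared by a plurality of people, and smaller when it is less likely. The event analysis apparatus 100 can therefore take into account, based on the degree of sharing, whether an event is attracting attention among a plurality of people. As a result, it becomes easier to distinguish the case where expressions about miscellaneous, individually different events merely happen to coincide, so that a plurality of people only appear to be taking up the same event, from the case where a plurality of people are actually talking about one specific event. Events can thus be analyzed accurately.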
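Next, an event analysis apparatus and an event analysis method according to Embodiment 2 of the present invention will be described with reference to FIGS. 5 and 6. Although Embodiment 2 is described below, the present invention is not limited to Embodiment 2.
First, the configuration of the event analysis apparatus according to Embodiment 2 will be described with reference to FIG. 5. FIG. 5 is a block diagram showing a schematic configuration of the event analysis apparatus according to Embodiment 2 of the present invention.
Next, the operation of the event analysis apparatus 200 according to Embodiment 2 will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the operation of the event analysis apparatus according to Embodiment 2 of the present invention. In the following description, FIG. 5 is referred to as appropriate. In Embodiment 2, the event analysis method is carried out by operating the event analysis apparatus 200. Therefore, the following description of the operation of the event analysis apparatus 200 also serves as the description of the event analysis method in Embodiment 2.
As described above, in Embodiment 2, a specific keyword or a specific period is input as an analysis condition, and the analysis result is output for the event descriptions obtained for that analysis condition. Events with high shareability in relation to the analysis condition are therefore analyzed. Embodiment 2 also makes it possible to compare degrees of sharing across a plurality of feature words. Furthermore, ranking makes it possible to filter out events and feature words with low shareability. The same effects as in Embodiment 1 are also obtained when Embodiment 2 is used.
Next, the program in Embodiments 1 and 2 will be described. A computer capable of executing the program in Embodiments 1 and 2 will also be described with reference to FIG. 7. FIG. 7 is a block diagram showing an example of a computer that realizes the event analysis apparatus according to Embodiments 1 and 2 of the present invention.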
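(Appendix 1) An apparatus for analyzing an event described in a document to be analyzed, comprising: a component specifying unit that identifies a description related to an event from the document to be analyzed, and identifies, from the identified description, a situation expression representing a situation and an expression corresponding to the situation expression; and a shareability analysis unit that obtains, based on the situation expression and the corresponding expression identified from the description, a degree of sharing indicating the possibility that the event to which the description relates is shared by a plurality of people.
(Appendix 2) The event analysis apparatus according to Appendix 1, further comprising an analysis result output unit that outputs the degree of sharing and information about the event for which the degree of sharing was obtained.
(Appendix 3) The event analysis apparatus according to Appendix 1 or 2, wherein the component specifying unit identifies, as the situation expression, a portion of the identified description that indicates an action, an act, or a state, and further identifies, as the corresponding expression, an expression that is related to the situation expression and corresponds to any of time, place, subject, and object.
(Appendix 4) The event analysis apparatus according to any one of Appendices 1 to 3, wherein the shareability analysis unit obtains the degree of sharing by applying the situation expression and the corresponding expression identified from the description to preset rules, and the rules define a degree of sharing for each combination of an assumed situation expression and a character string assumed as an expression corresponding to that situation expression.
(Appendix 5) The event analysis apparatus according to Appendix 4, wherein the rules further define a grammatical case for the character string assumed as the expression corresponding to the situation expression, and the shareability analysis unit applies a rule when the corresponding expression matches the case defined by that rule.
(Appendix 6) The event analysis apparatus according to any one of Appendices 1 to 3, wherein the shareability analysis unit obtains a first degree indicating the possibility that the target of the situation expression is shared by a plurality of people and a second degree indicating the possibility that the corresponding expression is related to the event, and obtains the degree of sharing from the first degree and the second degree.
(Appendix 7) The event analysis apparatus according to Appendix 2, wherein the analysis result output unit outputs, as the information about the event for which the degree of sharing was obtained, the situation expression and the corresponding expression, or a sentence containing the situation expression and the corresponding expression.
(Appendix 8) The event analysis apparatus according to Appendix 2, further comprising a document acquisition unit that accepts input of an analysis condition and acquires, from a document set prepared in advance, one or more documents that match the input analysis condition, wherein the component specifying unit takes the documents acquired by the document acquisition unit as the analysis target, and the analysis result output unit outputs the analysis condition in addition to the degree of sharing and the information about the event for which the degree of sharing was obtained.
(Appendix 9) The event analysis apparatus according to Appendix 8, wherein one or more keywords or a specific period is input as the analysis condition.
(Appendix 10) The event analysis apparatus according to Appendix 8, wherein the document acquisition unit determines feature words based on the input analysis condition and acquires the documents for each determined feature word, the shareability analysis unit obtains the degree of sharing for each feature word, and the analysis result output unit, when there are two or more feature words, either outputs a value obtained by summing the degrees of sharing for each feature word together with each feature word, or ranks the feature words based on the degree of sharing for each feature word and outputs the ranking result together with each feature word.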
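(Appendix 11) A method for analyzing an event described in a document to be analyzed, comprising: (a) a step of identifying a description related to an event from the document to be analyzed, and identifying, from the identified description, a situation expression representing a situation and an expression corresponding to the situation expression; and (b) a step of obtaining, based on the situation expression and the corresponding expression identified from the description, a degree of sharing indicating the possibility that the event to which the description relates is shared by a plurality of people.
(Appendix 12) The event analysis method according to Appendix 11, further comprising (c) a step of outputting the degree of sharing and information about the event for which the degree of sharing was obtained.
(Appendix 13) The event analysis method according to Appendix 11 or 12, wherein, in step (a), a portion of the identified description that indicates an action, an act, or a state is identified as the situation expression, and an expression that is related to the situation expression and corresponds to any of time, place, subject, and object is identified as the corresponding expression.
(Appendix 14) The event analysis method according to any one of Appendices 11 to 13, wherein, in step (b), the degree of sharing is obtained by applying the situation expression and the corresponding expression identified from the description to preset rules, and the rules define a degree of sharing for each combination of an assumed situation expression and a character string assumed as an expression corresponding to that situation expression.
(Appendix 15) The event analysis method according to Appendix 14, wherein the rules further define a grammatical case for the character string assumed as the expression corresponding to the situation expression, and, in step (b), a rule is applied when the corresponding expression matches the case defined by that rule.
(Appendix 16) The event analysis method according to any one of Appendices 11 to 13, wherein, in step (b), a first degree indicating the possibility that the target of the situation expression is shared by a plurality of people and a second degree indicating the possibility that the corresponding expression is related to the event are obtained, and the degree of sharing is obtained from the first degree and the second degree.
(Appendix 17) The event analysis method according to Appendix 12, wherein, in step (c), the situation expression and the corresponding expression, or a sentence containing the situation expression and the corresponding expression, is output as the information about the event for which the degree of sharing was obtained.
(Appendix 18) The event analysis method according to Appendix 12, further comprising (d) a step of accepting input of an analysis condition and acquiring, from a document set prepared in advance, one or more documents that match the input analysis condition, wherein, in step (a), the documents acquired in step (d) are taken as the analysis target, and, in step (c), the analysis condition is output in addition to the degree of sharing and the information about the event for which the degree of sharing was obtained.
(Appendix 19) The event analysis method according to Appendix 18, wherein, in step (d), input of one or more keywords or a specific period is accepted as the analysis condition.
(Appendix 20) The event analysis method according to Appendix 18, wherein, in step (d), feature words are determined based on the input analysis condition and the documents are acquired for each determined feature word; in step (b), the degree of sharing is obtained for each feature word; and, in step (c), when there are two or more feature words, either a value obtained by summing the degrees of sharing for each feature word is output together with each feature word, or the feature words are ranked based on the degree of sharing for each feature word and the ranking result is output together with each feature word.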
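(Appendix 21) A computer-readable recording medium on which is recorded a program for analyzing, by a computer, an event described in a document to be analyzed, the program containing instructions that cause the computer to execute: (a) a step of identifying a description related to an event from the document to be analyzed, and identifying, from the identified description, a situation expression representing a situation and an expression corresponding to the situation expression; and (b) a step of obtaining, based on the situation expression and the corresponding expression identified from the description, a degree of sharing indicating the possibility that the event to which the description relates is shared by a plurality of people.
(Appendix 22) The computer-readable recording medium according to Appendix 21, further causing the computer to execute (c) a step of outputting the degree of sharing and information about the event for which the degree of sharing was obtained.
(Appendix 23) The computer-readable recording medium according to Appendix 21 or 22, wherein, in step (a), a portion of the identified description that indicates an action, an act, or a state is identified as the situation expression, and an expression that is related to the situation expression and corresponds to any of time, place, subject, and object is identified as the corresponding expression.
(Appendix 24) The computer-readable recording medium according to any one of Appendices 21 to 23, wherein, in step (b), the degree of sharing is obtained by applying the situation expression and the corresponding expression identified from the description to preset rules, and the rules define a degree of sharing for each combination of an assumed situation expression and a character string assumed as an expression corresponding to that situation expression.
(Appendix 25) The computer-readable recording medium according to Appendix 24, wherein the rules further define a grammatical case for the character string assumed as the expression corresponding to the situation expression, and, in step (b), a rule is applied when the corresponding expression matches the case defined by that rule.
(Appendix 26) The computer-readable recording medium according to any one of Appendices 21 to 23, wherein, in step (b), a first degree indicating the possibility that the target of the situation expression is shared by a plurality of people and a second degree indicating the possibility that the corresponding expression is related to the event are obtained, and the degree of sharing is obtained from the first degree and the second degree.
(Appendix 27) The computer-readable recording medium according to Appendix 22, wherein, in step (c), the situation expression and the corresponding expression, or a sentence containing the situation expression and the corresponding expression, is output as the information about the event for which the degree of sharing was obtained.
(Appendix 28) The computer-readable recording medium according to Appendix 22, further causing the computer to execute (d) a step of accepting input of an analysis condition and acquiring, from a document set prepared in advance, one or more documents that match the input analysis condition, wherein, in step (a), the documents acquired in step (d) are taken as the analysis target, and, in step (c), the analysis condition is output in addition to the degree of sharing and the information about the event for which the degree of sharing was obtained.
(Appendix 29) The computer-readable recording medium according to Appendix 28, wherein, in step (d), input of one or more keywords or a specific period is accepted as the analysis condition.
(Appendix 30) The computer-readable recording medium according to Appendix 28, wherein, in step (d), feature words are determined based on the input analysis condition and the documents are acquired for each determined feature word; in step (b), the degree of sharing is obtained for each feature word; and, in step (c), when there are two or more feature words, either a value obtained by summing the degrees of sharing for each feature word is output together with each feature word, or the feature words are ranked based on the degree of sharing for each feature word and the ranking result is output together with each feature word.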
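101 Component specifying unit (Embodiment 1)
102 Shareability analysis unit (Embodiment 1)
103 Analysis result output unit (Embodiment 1)
200 Event analysis apparatus (Embodiment 2)
201 Component specifying unit (Embodiment 2)
202 Shareability analysis unit (Embodiment 2)
203 Analysis result output unit (Embodiment 2)
204 Document acquisition unit
205 Document database
300 Computer device
301 CPU
302 RAM
303 Storage device
304 Input interface circuit (input I/F)
305 Display controller
306 Data reader/writer
307 Communication interface circuit (communication I/F)
400 Input device
500 Display device
600 Recording medium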
Claims (10)
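1. An event analysis apparatus for analyzing an event described in a document to be analyzed, comprising: a component specifying unit that identifies a description related to an event from the document to be analyzed, and identifies, from the identified description, a situation expression representing a situation and an expression corresponding to the situation expression; and a shareability analysis unit that obtains, based on the situation expression and the corresponding expression identified from the description, a degree of sharing indicating the possibility that the event to which the description relates is shared by a plurality of people.
2. The event analysis apparatus according to claim 1, further comprising an analysis result output unit that outputs the degree of sharing and information about the event for which the degree of sharing was obtained.
3. The event analysis apparatus according to claim 1 or 2, wherein the component specifying unit identifies, as the situation expression, a portion of the identified description that indicates an action, an act, or a state, and further identifies, as the corresponding expression, an expression that is related to the situation expression and corresponds to any of time, place, subject, and object.
4. The event analysis apparatus according to any one of claims 1 to 3, wherein the shareability analysis unit obtains the degree of sharing by applying the situation expression and the corresponding expression identified from the description to preset rules, and the rules define a degree of sharing for each combination of an assumed situation expression and a character string assumed as an expression corresponding to that situation expression.
5. The event analysis apparatus according to claim 4, wherein the rules further define a grammatical case for the character string assumed as the expression corresponding to the situation expression, and the shareability analysis unit applies a rule when the corresponding expression matches the case defined by that rule.
6. The event analysis apparatus according to any one of claims 1 to 3, wherein the shareability analysis unit obtains a first degree indicating the possibility that the target of the situation expression is shared by a plurality of people and a second degree indicating the possibility that the corresponding expression is related to the event, and obtains the degree of sharing from the first degree and the second degree.
7. The event analysis apparatus according to claim 2, further comprising a document acquisition unit that accepts input of an analysis condition and acquires, from a document set prepared in advance, one or more documents that match the input analysis condition, wherein the component specifying unit takes the documents acquired by the document acquisition unit as the analysis target, and the analysis result output unit outputs the analysis condition in addition to the degree of sharing and the information about the event for which the degree of sharing was obtained.
8. The event analysis apparatus according to claim 7, wherein the document acquisition unit determines feature words based on the input analysis condition and acquires the documents for each determined feature word, the shareability analysis unit obtains the degree of sharing for each feature word, and the analysis result output unit, when there are two or more feature words, either outputs a value obtained by summing the degrees of sharing for each feature word together with each feature word, or ranks the feature words based on the degree of sharing for each feature word and outputs the ranking result together with each feature word.
9. An event analysis method for analyzing an event described in a document to be analyzed, comprising: (a) a step of identifying a description related to an event from the document to be analyzed, and identifying, from the identified description, a situation expression representing a situation and an expression corresponding to the situation expression; and (b) a step of obtaining, based on the situation expression and the corresponding expression identified from the description, a degree of sharing indicating the possibility that the event to which the description relates is shared by a plurality of people.
10. A computer-readable recording medium on which is recorded a program for analyzing, by a computer, an event described in a document to be analyzed, the program containing instructions that cause the computer to execute: (a) a step of identifying a description related to an event from the document to be analyzed, and identifying, from the identified description, a situation expression representing a situation and an expression corresponding to the situation expression; and (b) a step of obtaining, based on the situation expression and the corresponding expression identified from the description, a degree of sharing indicating the possibility that the event to which the description relates is shared by a plurality of people.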
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2013505854A JP5435249B2 (ja) | 2011-03-23 | 2012-02-22 | イベント分析装置、イベント分析方法、およびプログラム |
US14/006,810 US20140012803A1 (en) | 2011-03-23 | 2012-02-22 | Event analysis apparatus, event analysis method, and computer-readable recording medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011-063766 | 2011-03-23 | ||
JP2011063766 | 2011-03-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012127968A1 true WO2012127968A1 (ja) | 2012-09-27 |
Family
ID=46879130
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/054222 WO2012127968A1 (ja) | 2011-03-23 | 2012-02-22 | イベント分析装置、イベント分析方法、およびコンピュータ読み取り可能な記録媒体 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20140012803A1 (ja) |
JP (1) | JP5435249B2 (ja) |
WO (1) | WO2012127968A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5622969B1 (ja) * | 2014-02-04 | 2014-11-12 | 株式会社Ubic | 文書分析システム、文書分析方法、および、文書分析プログラム |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016115175A1 (en) * | 2015-01-12 | 2016-07-21 | KYMA Medical Technologies, Inc. | Systems, apparatuses and methods for radio frequency-based attachment sensing |
US10433184B2 (en) * | 2015-12-31 | 2019-10-01 | Motorola Mobility Llc | Method and apparatus for directing an antenna beam based on a location of a communication device |
US10425837B2 (en) * | 2017-10-02 | 2019-09-24 | The Invention Science Fund I, Llc | Time reversal beamforming techniques with metamaterial antennas |
CN113868381B (zh) * | 2021-11-22 | 2022-03-22 | 中国矿业大学(北京) | 一种煤矿瓦斯爆炸事故信息抽取方法及系统 |
CN114445646A (zh) * | 2021-12-31 | 2022-05-06 | 深圳云天励飞技术股份有限公司 | 人员关联度的分析方法、装置、电子设备及存储介质 |
CN114625804B (zh) * | 2022-03-30 | 2022-11-08 | 深圳唯爱智云科技有限公司 | 基于大数据的用户行为数据处理方法、系统及云平台 |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006139718A (ja) * | 2004-11-15 | 2006-06-01 | Nippon Telegr & Teleph Corp <Ntt> | 話題語結合方法及び話題語結合・代表語抽出方法及び装置及びプログラム |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6978274B1 (en) * | 2001-08-31 | 2005-12-20 | Attenex Corporation | System and method for dynamically evaluating latent concepts in unstructured documents |
-
2012
- 2012-02-22 US US14/006,810 patent/US20140012803A1/en not_active Abandoned
- 2012-02-22 WO PCT/JP2012/054222 patent/WO2012127968A1/ja active Application Filing
- 2012-02-22 JP JP2013505854A patent/JP5435249B2/ja active Active
Non-Patent Citations (1)
Title |
---|
TAKAO KAWAI: "Web Bunsho no Jikeiretsu Bunseki ni Motozuku Iken Henka Event no Chushutsu", PROCEEDINGS OF THE 17TH ANNUAL MEETING OF THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING TUTORIAL HONKAIGI WORKSHOP, 7 March 2011 (2011-03-07), pages 264 - 267 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5622969B1 (ja) * | 2014-02-04 | 2014-11-12 | 株式会社Ubic | Document analysis system, document analysis method, and document analysis program |
Also Published As
Publication number | Publication date |
---|---|
JP5435249B2 (ja) | 2014-03-05 |
JPWO2012127968A1 (ja) | 2014-07-24 |
US20140012803A1 (en) | 2014-01-09 |
Similar Documents
Publication | Title |
---|---|
Bharti et al. | Sarcastic sentiment detection in tweets streamed in real time: a big data approach |
Dimitrov et al. | TweetsCOV19: a knowledge base of semantically annotated tweets about the COVID-19 pandemic |
Gonçalves et al. | Comparing and combining sentiment analysis methods |
Thelwall et al. | Sentiment strength detection for the social web |
JP5435249B2 (ja) | Event analysis device, event analysis method, and program |
Ratkiewicz et al. | Detecting and tracking the spread of astroturf memes in microblog streams |
US9558267B2 (en) | Real-time data mining |
US8898163B2 (en) | Real-time information mining |
US9207831B2 (en) | Management of data on related websites |
Yıldırım et al. | Identifying topics in microblogs using Wikipedia |
Hossny et al. | Feature selection methods for event detection in Twitter: a text mining approach |
Kaushik et al. | Sociopedia: an interactive system for event detection and trend analysis for Twitter data |
US8037403B2 (en) | Apparatus, method, and computer program product for extracting structured document |
WO2016067396A1 (ja) | Sentence reordering method and computer |
Ng et al. | Linguistic characteristics of censorable language on Sina Weibo |
US10795926B1 (en) | Suppressing personally objectionable content in search results |
Thakkar | Twitter sentiment analysis using hybrid naive Bayes |
Mokhberi et al. | Development of a COVID-19–related anti-Asian tweet data set: Quantitative study |
US20120047128A1 (en) | Open class noun classification |
Chimmalgi | Controversy trend detection in social media |
Guimaraes et al. | Analysis and detection of unreliable users in Twitter: Two case studies |
Toraman | Early prediction of public reactions to news events using microblogs |
Mars et al. | A new big data framework for customer opinions polarity extraction |
Yada et al. | Identification of tweets that mention books |
KR102625347B1 (ko) | Method for extracting food menu nouns using parts of speech such as verbs and adjectives, method for updating a food dictionary using the same, and system therefor |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 12760420; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2013505854; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
WWE | WIPO information: entry into national phase | Ref document number: 14006810; Country of ref document: US |
122 | EP: PCT application non-entry in European phase | Ref document number: 12760420; Country of ref document: EP; Kind code of ref document: A1 |