WO2014203659A1 - Text matching device and method, and text classification device and method - Google Patents
Text matching device and method, and text classification device and method
- Publication number
- WO2014203659A1 (application PCT/JP2014/062912)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- information
- matching
- classification
- noun
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Definitions
- The present invention relates to a technique for organizing information on a network such as the Internet and presenting it to a user. In particular, it relates to a system that matches items of information posted on various systems on the network and efficiently provides the necessary information to a user who needs it.
- The Internet has become widespread, and various systems, such as social network services (SNS), operate on it.
- Such systems are used to exchange information between users and enable a distribution of information that was unthinkable before the spread of the Internet.
- A huge amount of information constantly flows through such systems, so when an individual looks for an answer to a certain problem, it is very likely that the answer exists somewhere. However, it is practically impossible to find the information one is looking for among such a large amount of information.
- A typical information search service continuously collects a large amount of information from the Internet, builds a database, and indexes it.
- When it receives an information search request (for example, a request specifying a keyword), the computer(s) of the site providing the service retrieve related information by index search using the keyword, assign each item a score defined by the information retrieval technique (a numerical value indicating how well the item answers the search request), and return the data to the requester in descending order of score.
- The information is thus presented to the user in the order deemed appropriate as an answer to the keyword search request.
- By connecting to such an information search service and performing a search, the user can obtain the information he or she needs to some extent.
- The most important issue at the time of a disaster is how to support victims at an early stage, and how to connect the support that victims need with the support offered by support groups and the like in a sustainable and efficient manner. If communication between victims and support organizations goes smoothly, victims may be supported quickly. However, as described above, communication between victims and support organizations becomes extremely difficult during a disaster. As time passes, the volume of information grows and necessary information becomes harder to find. As a result, appropriate relief supplies may fail to reach, in a timely manner, the victims who need some kind of support.
- The prior art identifies and collects problem reports for specific disasters (e.g., wildfires) or infectious diseases (e.g., influenza).
- The problem was that carbon monoxide poisoning occurred because rooms were kept closed and ventilation was neglected.
- It is not sufficient to collect problem reports only in specific categories; problem reports must be identified and collected without such restrictions. If that is possible, not only problems in specific categories but also secondary problems derived from them can be identified and collected.
- A message classifier created by supervised learning is disclosed in “May All Your Wishes Come True: A Study of Wishes and How to Recognize Them,” in Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the ACL, page 271. Similarly, a related technique is disclosed in Hiroshi Kanayama and Tetsuya Nasukawa, 2008, “Textual demand analysis: Detection of users' wants and needs from opinions,” in Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 409-416, Manchester, UK, August, Coling 2008 Organizing Committee. These methods use so-called machine learning.
- It is desirable that problem reports and the support information that solves the reported problems be collected on the network and automatically matched, and likewise that requests and the support information that satisfies them be collected and matched.
- At present, no technology can be found that automatically collects appropriate support information on a network and matches it.
- Doing so requires a technology that identifies, from the expressions in a problem report or request, what support information is needed, or that identifies, from the expressions in support information, which problem report or request it corresponds to. This has been considered a difficult task until now.
- An object of the present invention is to provide a text matching device and a text classification device that can automatically and reliably collect information belonging to certain categories, such as general problem reports, requests, and support information that solves the reported problems or satisfies the requests, and match that information appropriately and in a timely manner.
- The text matching device associates, within a set of texts each classified into one of a first and a second category, texts of the second category with texts of the first category.
- Each text in the set is accompanied by the one or more morphemes constituting the text, dependency information on those morphemes, and a nucleus, that is, a combination of a noun included in the text and the predicate the noun depends on.
- The text matching device includes: storage means for storing texts of the first category and texts of the second category separately from each other; text pair generating means for generating, from the storage means, pairs each consisting of a text of the first category and a text of the second category; matching feature generating means for generating, for each pair produced by the text pair generating means, matching features that include the features used when the texts in the pair were classified by the text classifier; and matching means for determining, using the matching features generated by the matching feature generating means, whether the two texts constituting the pair match each other.
- The matching means includes a machine learning model trained in advance on matching learning data to determine, based on the matching features, whether a text pair matches.
- The matching features further include n-grams on the dependency subtree that contains the noun in the nucleus, determined for each of the texts in the pair.
- Any of the n-grams may include time information, regional information, morphemes representing the modality of each text, or any combination of these.
- One of the first and second categories consists of texts representing problem reports, and the other consists of texts representing support information for solving problems.
- Alternatively, one of the first and second categories may consist of texts requesting the solution of a problem, and the other of texts representing support information for solving the problem.
- The text classification device is preferably used together with the text matching device according to the first aspect, and classifies text into a specific category related to reporting or solving a problem.
- This device includes: morphological analysis means for morphologically analyzing text and outputting a morpheme sequence annotated with part-of-speech information; dependency analysis means for analyzing the dependencies among the morphemes in the sequence output by the morphological analysis means and outputting dependency information representing the dependency structure of the text; and classifying means for classifying the text into the specific category or other categories using a combination of the noun classification and the predicate classification in the nucleus.
- The classifying means includes: nucleus specifying means for specifying, based on the morpheme sequence and the dependency relations of the text, the nucleus of a sentence, that is, a combination of a noun included in the text and the predicate the noun depends on; noun classifying means for classifying the noun in the specified nucleus as a trouble noun, which relates to the occurrence of a problem, or a non-trouble noun, which does not; predicate classifying means for classifying the predicate in the specified nucleus as a predicate expressing activation of the function of the thing denoted by the related noun, or as a predicate expressing its inactivation; and means for classifying the text into the specific category or other categories based on the combination of the noun classification result and the predicate classification result for the specified nucleus.
- The means for classifying includes a machine-learning-based determination unit that determines whether a given text belongs to the specific category, using as a feature at least information representing the combination of the noun classification result and the predicate classification result for the nucleus specified by the nucleus specifying means.
- The features may further include n-grams on the dependency subtree containing the noun in the nucleus, determined for each text. Any of the n-grams may include time information, regional information, morphemes representing the modality of the text, or any combination of these.
- The computer program according to the third aspect of the present invention, when executed by a computer, causes the computer to function as all the means of any of the text classification devices or text matching devices described above.
- The text matching method associates texts of the second category with texts of the first category in a set of texts each classified into one of the first and second categories.
- Each text in the set is accompanied by the one or more morphemes constituting the text, dependency information on those morphemes, and a combination of a noun included in the text and the predicate the noun depends on.
- The method includes storing texts of the first category and texts of the second category in a storage device separately from each other, and generating, from the storage device, pairs each consisting of a text of the first category and a text of the second category.
- The matching step determines whether the two texts constituting a pair match each other, using a machine learning model trained in advance on matching learning data to judge, based on the matching features, whether a text pair matches.
- The text classification method classifies text into a specific category related to reporting or solving a problem.
- This method includes: a morphological analysis step of analyzing the text and outputting a morpheme sequence annotated with part-of-speech information; a dependency analysis step of analyzing the dependencies among the morphemes in the sequence output by the morphological analysis step and outputting dependency information representing the dependency structure of the text; a step of specifying, based on the morpheme sequence and the dependency relations of the text, the nucleus of a sentence consisting of a combination of a noun included in the text and the predicate the noun depends on; and a step of classifying the text into the specific category or other categories using a combination of the noun classification and the predicate classification in the nucleus.
- FIG. 8 is a block diagram showing a hardware configuration of the computer system shown in FIG. 7.
- A new method called the nucleus matrix method is used to identify and collect problem reports, request messages, and support information.
- This method subdivides the dependency relations between nouns and predicates expressed in text according to the combination of noun classification and predicate polarity.
- Nouns are classified into trouble expressions and non-trouble expressions, and predicate polarity is classified as active or inactive. According to this polarity, each predicate is classified as an active template or an inactive template.
- A trouble expression is a noun that denotes a problem or burden, for example “failure”, “influenza”, “mistake”, “sloppiness”, and “atopy”.
- A non-trouble expression is an expression that does not generally denote a problem or burden, such as “bath”, “medical product”, or “food”.
- An active template is an expression pattern that contains a variable X and combines a particle and a predicate indicating that the function of the thing denoted by X is turned on (the function is activated).
- Each such template expresses that the function of what X denotes is being exerted.
- An inactive template is an expression pattern that contains a variable X and indicates that the function of the thing denoted by X is turned off (the function is inactivated). Examples include “Prevent X”, “Discard X”, “Reduce X”, “Destroy X”, and “X becomes impossible”.
- The dependency relation corresponding to problem nucleus (1) indicates that the function of the problem or burden denoted by the trouble expression is turned on, and tends to indicate that the problem exists and is having an effect.
- The dependency relation corresponding to problem nucleus (2) indicates that the function of the thing denoted by the non-trouble expression is turned off, and tends to indicate that a countermeasure or support action is not functioning.
- The dependency relation corresponding to support nucleus (1) indicates that the function of the problem or burden denoted by the trouble expression is turned off, and tends to indicate that the problem is solved or mitigated.
- The dependency relation corresponding to support nucleus (2) indicates that the function of the thing denoted by the non-trouble expression is turned on, and tends to indicate that a countermeasure or support action is being carried out or prepared.
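The four combinations described above can be sketched as a small lookup. This is a minimal illustration rather than the patent's implementation: the dictionary contents, the active templates “Cause X” and “Provide X”, and the function name `classify_nucleus` are assumptions introduced for the example, and a noun missing from the trouble dictionary is simply treated as a non-trouble expression.

```python
from typing import Optional

# Hypothetical stand-ins for the trouble expression dictionary and polarity dictionary.
TROUBLE_NOUNS = {"failure", "influenza", "mistake", "atopy"}
ACTIVE_TEMPLATES = {"Cause X", "Provide X"}  # assumed examples, not from the source
INACTIVE_TEMPLATES = {"Prevent X", "Discard X", "Reduce X", "Destroy X",
                      "X becomes impossible"}


def classify_nucleus(noun: str, template: str) -> Optional[str]:
    """Map a (noun, predicate template) nucleus to one cell of the nucleus matrix."""
    if template in ACTIVE_TEMPLATES:
        active = True
    elif template in INACTIVE_TEMPLATES:
        active = False
    else:
        return None  # polarity unknown: the nucleus cannot be placed in the matrix

    if noun in TROUBLE_NOUNS:
        # Trouble turned on -> problem exists; trouble turned off -> problem mitigated.
        return "problem nucleus (1)" if active else "support nucleus (1)"
    # Non-trouble turned on -> support is working; turned off -> support is failing.
    return "support nucleus (2)" if active else "problem nucleus (2)"


print(classify_nucleus("influenza", "Prevent X"))
print(classify_nucleus("food", "Reduce X"))
```

Keeping the two dictionaries separate from the lookup mirrors the description: the noun classification and the predicate polarity are decided independently, and only their combination determines the type of nucleus.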
- A request nucleus is a support nucleus to which a request marker is attached.
- The request marker is the same as that used in the prior art for collecting wishes. There are the following two types of request nuclei.
- Request nucleus (1): the dependency relation of this type requests that the function of the problem or burden denoted by the trouble expression be turned off, and tends to indicate a request that the problem be solved or mitigated.
- Request nucleus (2): the dependency relation of this type requests that the function of the thing denoted by the non-trouble expression be turned on, and tends to indicate a request that a countermeasure or support action be carried out or prepared.
- Problem reports tend to include one of the two problem nuclei.
- For example, the problem report “There is not enough powdered milk for allergies in ○○ City” includes an expression corresponding to problem nucleus (1). Therefore, whether a problem nucleus is present in the text is used as a feature of the classifier. According to experiments described later, using this feature improves the performance of problem report identification and collection compared with not using it.
- Support information tends to include one of the two support nuclei.
- <Action and effect of the request nucleus> Request messages tend to include one of the two request nuclei.
- For example, the request message “Please deliver powdered milk for allergic infants to ○○ City!” contains an expression corresponding to request nucleus (2). Therefore, in this embodiment, whether a request nucleus is present in the text is used as a feature of the classifier.
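As a rough sketch of the definition given earlier (a request nucleus is a support nucleus carrying a request marker), the check could look as follows. The English marker list and the names `REQUEST_MARKERS`, `has_request_marker`, and `request_nucleus_type` are hypothetical. The source only says that request markers are collected manually, so plain substring matching is used here as a stand-in.

```python
from typing import Optional

# Hypothetical request markers; the real dictionary is collected manually.
REQUEST_MARKERS = ("please", "we need", "want")


def has_request_marker(text: str) -> bool:
    """Return True if any known request marker occurs in the text."""
    lowered = text.lower()
    return any(marker in lowered for marker in REQUEST_MARKERS)


def request_nucleus_type(support_nucleus_type: Optional[int], text: str) -> Optional[int]:
    """Return 1 or 2 for request nucleus (1)/(2), or None if there is none.

    support_nucleus_type is the type (1 or 2) of the support nucleus already
    found in the text, or None when the text has no support nucleus.
    """
    if support_nucleus_type in (1, 2) and has_request_marker(text):
        return support_nucleus_type
    return None


# "deliver powdered milk" is a non-trouble noun with an active predicate,
# i.e. support nucleus (2); the marker "please" turns it into a request nucleus.
print(request_nucleus_type(2, "Please deliver powdered milk for allergic infants!"))
```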
- <Co-occurrence pair matrix method> A new method called the “co-occurrence pair matrix method” is used to match problem reports with support information and request messages with support information.
- This method applies to a pair consisting of a problem report and support information when, for example, the problem nucleus in the problem report and the support nucleus in the support information share the same noun. The same applies to a pair consisting of a request message and support information.
- This method subdivides the types of these pairs according to the classification of the nuclei they contain and the polarity of the predicates.
- Table 2 below shows the matrix used by the co-occurrence pair matrix method to classify the types of nuclei and the polarities of the predicates they contain.
- The co-occurrence pair matrix method distinguishes the following two types of pairs.
- Problem nucleus-support nucleus pair: a pair consisting of a problem nucleus and a support nucleus in which the polarity of the predicate is opposite between the two. That is, a pair of problem nucleus (1) and support nucleus (1), or of problem nucleus (2) and support nucleus (2), qualifies. Pairs of nuclei in any other relation do not. Examples of problem nucleus-support nucleus pairs are shown in Table 3 below. The two nuclei are assumed to share the same noun.
- Request nucleus-support nucleus pair: a pair consisting of a request nucleus and a support nucleus in which the polarity of the predicate is the same in both. That is, a pair of request nucleus (1) and support nucleus (1), or of request nucleus (2) and support nucleus (2), qualifies. Pairs of nuclei in any other relation do not. Examples of request nucleus-support nucleus pairs are shown in Table 4 below. Again, the two nuclei are assumed to share the same noun.
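The two pair types can be checked mechanically once each nucleus carries its kind, its number (1 or 2), and its noun. The sketch below assumes that representation; the `Nucleus` class and `pair_type` function are illustrative names, not part of the source. Because nucleus (1) always involves a trouble noun and nucleus (2) a non-trouble noun, requiring equal numbers is enough to enforce the polarity conditions stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nucleus:
    kind: str    # "problem", "support", or "request"
    number: int  # 1 or 2, as in problem nucleus (1) / (2)
    noun: str    # the noun contained in the nucleus


def pair_type(first: Nucleus, second: Nucleus) -> Optional[str]:
    """Classify a (first, second) pair, where second must be a support nucleus."""
    if second.kind != "support":
        return None
    # Both pair types presuppose a shared noun and matching nucleus numbers.
    if first.noun != second.noun or first.number != second.number:
        return None
    if first.kind == "problem":
        return "problem nucleus-support nucleus pair"  # opposite predicate polarity
    if first.kind == "request":
        return "request nucleus-support nucleus pair"  # same predicate polarity
    return None


shortage = Nucleus("problem", 2, "powdered milk")
distribution = Nucleus("support", 2, "powdered milk")
print(pair_type(shortage, distribution))
```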
- Such combinations of information tend to include a request nucleus-support nucleus pair.
- For example, matching a message with the support information “Distribute powdered milk for allergic infants at the city hall” is considered appropriate in the sense that the problem is solved by the support information, or that the latter information contributes to solving the former problem.
- This combination of information includes an expression corresponding to the request nucleus-support nucleus pair “deliver powdered milk” and “distribute powdered milk”.
- The information matching system 30 collects various information (hereinafter referred to as “messages”), including problem reports, request messages, and support information, from the Internet 40.
- The information matching system 30 includes: a morphological analysis unit 52 that performs morphological analysis on each message collected and formatted by the information collection unit 50 and outputs it as a morpheme sequence annotated with information such as parts of speech; a dependency analysis unit 54 that performs dependency analysis on the morphologically analyzed message output by the morphological analysis unit 52 and attaches the dependency relations among the morphemes to the message; a place name / location specifying unit 58 that, for each message annotated with dependency relations by the dependency analysis unit 54, identifies the place name or location from which the message was sent and attaches it to the message; and a place name / location dictionary storage device 56 that stores the place name / location dictionary used by the place name / location specifying unit 58 to identify the source of a message.
- A message sent from a mobile phone or the like, such as a Twitter message, may carry the latitude and longitude of the sending location instead of a place name.
- The place name / location specifying unit 58 therefore also has a function of identifying the place name or location from such latitude / longitude information.
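One simple way to turn latitude / longitude into a place name is a nearest-neighbour lookup against a gazetteer. The source does not say how unit 58 does this, so the following is only a sketch under assumptions: the place names, coordinates, the 50 km cut-off, and the function names are all hypothetical, and the great-circle distance uses the standard haversine formula.

```python
import math
from typing import Optional

# Hypothetical place name / location dictionary: name -> (latitude, longitude).
PLACE_DICTIONARY = {
    "City A": (38.0, 141.0),
    "City B": (35.0, 139.0),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_place(lat: float, lon: float, max_km: float = 50.0) -> Optional[str]:
    """Return the closest dictionary entry, or None if none lies within max_km."""
    name, distance = min(
        ((place, haversine_km(lat, lon, *coords)) for place, coords in PLACE_DICTIONARY.items()),
        key=lambda item: item[1],
    )
    return name if distance <= max_km else None


print(nearest_place(38.1, 141.05))
```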
- The information matching system 30 further includes an information storage unit 60 that stores messages collected by the information collection unit 50 and processed by the morphological analysis unit 52, the dependency analysis unit 54, and the place name / location specifying unit 58, and a storage device 62 that stores the information (classification feature generation data) needed to compute the several types of features used by the machine learning models that classify the messages.
- The information matching system 30 further includes a problem report collection device 64 that uses the data stored in the storage device 62 to compute features for each message stored in the information storage unit 60 and, using a machine learning model trained on the same features, classifies the messages into texts belonging to the problem report category and other texts, thereby collecting problem report texts; and a problem report storage unit 70 that accumulates the problem reports collected by the problem report collection device 64.
- Likewise, a support information collection device 66 and a request message collection device 68 use the data stored in the storage device 62 to compute predetermined features for each message stored in the information storage unit 60, and classify the messages using machine learning models trained in advance on learning data.
- The problem report collection device 64, the support information collection device 66, and the request message collection device 68 classify texts in advance in preparation for information matching in the information matching system 30. They use the same set of machine learning features for classification, and the same features are used during training and during actual classification. During training, however, teacher labels are added manually.
- The problem report collection device 64, the support information collection device 66, and the request message collection device 68 are realized by machine learning using the data stored in the storage device 62.
- The machine learning features are: the type of nucleus (problem nucleus, support nucleus, or request nucleus) in the message, determined according to the nucleus matrix from the noun classification (trouble / non-trouble) and the predicate polarity (active / inactive); the evaluation expressions in the message, together with their evaluation type and evaluation polarity, obtained using an evaluation expression dictionary (not shown); the semantic class of the noun in the nucleus; and n-grams, taken from the dependency relations of the message, that include the noun contained in the nucleus. As described later, these n-grams bring the time information in the message, the regional information related to the message, and the modality of the message into the features used for matching.
- Message modality refers to the subjective meaning conveyed by how a message is written. For example, when the pair “no water” and “water arrives” appears in the following three contexts (A) to (C), modality is what distinguishes (A) from (B) and (C).
- There are various ways of classifying modalities, but broadly there are two types: modality expressing the writer's judgment about the message content, and modality expressing the writer's attitude toward the reader. The former is further divided into modality of truth judgment and modality of value judgment. These can be determined using expressions attached to predicates in the message (modality elements) as clues. For example, the modality of truth judgment includes assertion, conjecture, judgment, hearsay, explanation, and the like.
- The words “descendant” and “expect” indicate that this sentence is a kind of conjecture or hearsay, which shows that it is unsuitable as a matching target in applications such as this embodiment.
- When the predicate of a sentence ends in the final (conclusive) form of a verb or the like, as in (A), the sentence states a fact and is therefore appropriate as a matching target.
- Words that determine modality are often located at positions that depend on the nucleus of a message. Therefore, by using as features the n-grams in the message's dependency relations that include the noun contained in the nucleus, messages can be matched in a way that takes their modality into account.
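To make the idea of n-grams over dependency relations concrete, the sketch below walks from each token toward the root of the dependency tree and keeps every chain of length n that contains the nucleus noun. The source does not define the exact extraction procedure, so this head-ward walk, the token and head arrays, and the name `dependency_ngrams` are all assumptions chosen for illustration.

```python
from typing import List, Set, Tuple


def dependency_ngrams(tokens: List[str], heads: List[int],
                      noun_index: int, n: int) -> Set[Tuple[str, ...]]:
    """Collect n-grams along head-ward dependency chains that include the noun.

    heads[i] is the index of the token that token i depends on, or -1 for the root.
    """
    ngrams = set()
    for start in range(len(tokens)):
        chain = []
        node = start
        # Follow dependency links toward the root, at most n tokens.
        while node != -1 and len(chain) < n:
            chain.append(node)
            node = heads[node]
        if len(chain) == n and noun_index in chain:
            ngrams.add(tuple(tokens[i] for i in chain))
    return ngrams


# Toy analysis of "tomorrow water arrives reportedly": "water" is the nucleus noun,
# "arrives" its predicate, and "reportedly" a hearsay marker attached to the predicate.
tokens = ["tomorrow", "water", "arrives", "reportedly"]
heads = [2, 2, 3, -1]
print(dependency_ngrams(tokens, heads, noun_index=1, n=3))
```

In this toy example the trigram reaches the hearsay marker attached to the predicate, which is how a modality clue near the nucleus ends up in the feature set.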
- The information stored in the storage device 62 includes a trouble expression dictionary (not shown) that stores the noun classification (trouble / non-trouble), a polarity dictionary (not shown) that stores predicate polarity (active / inactive), an evaluation expression dictionary used to identify evaluation expressions in a message, a request expression dictionary (not shown) used to determine whether a message contains a request expression, and a noun semantic class dictionary for identifying the semantic class to which each noun belongs.
- The evaluation expression dictionary is used to determine whether a message contains an evaluation of something. Evaluation expression determination takes a text as input and uses machine learning to determine, for each sentence in the text, whether it contains an opinion, reputation, or evaluation of some matter (hereinafter collectively referred to as “evaluation information”). If a sentence is found to contain evaluation information, the process extracts the expression representing it (evaluation expression extraction), classifies its meaning (evaluation type classification), and determines whether it carries a positive or a negative nuance (evaluation polarity determination).
- The trouble expression dictionary is used to determine whether the noun in a nucleus denotes trouble, and contains nouns related to illness, disasters, breakdowns, and the like.
- The polarity dictionary is used to determine whether a combination of a particle and a predicate in a nucleus is active, inactive, or neither.
- It records various predicate expressions together with manually assigned judgments of whether the polarity of each predicate is active or inactive.
- The request expression dictionary is used to determine whether a predicate in a message or in a nucleus carries a request marker, and records manually collected request markers.
- The semantic class dictionary records words grouped into classes (semantic classes) of semantically similar words. For example, “influenza” and “atopic dermatitis” are registered in the same semantic class.
- The information matching system 30 further includes a problem report / support information matching device 76, which creates pairs by taking one problem report from the problem report storage unit 70 and one item of support information from the support information storage unit 72 at a time.
- The problem report / support information matching device 76 matches problem reports and support information using a predetermined feature set that includes features obtained from each of them based on the co-occurrence pair matrix method.
- The information matching system 30 further includes a request message / support information matching device 78 that creates pairs by taking one item each from the support information stored in the support information storage unit 72 and the request messages stored in the request message storage unit 74, matches the information in each pair using a machine learning model, and outputs match information associating the matched support information with the request message.
- The request message / support information matching device 78 likewise matches request messages and support information using features obtained from each of them based on the co-occurrence pair matrix method.
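Pair creation of this kind amounts to enumerating the Cartesian product of the two stores. The sketch below additionally keeps only pairs whose nuclei share a noun, on the assumption that this premise of the co-occurrence pair matrix method is applied at generation time. The record layout (`id`, `nucleus_nouns`), the sample records, and the function name are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple


def generate_candidate_pairs(first_store: List[Dict],
                             second_store: List[Dict]) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (first id, second id, shared nouns) for every pair sharing a nucleus noun."""
    for first, second in product(first_store, second_store):
        shared = sorted(set(first["nucleus_nouns"]) & set(second["nucleus_nouns"]))
        if shared:
            yield first["id"], second["id"], shared


# Hypothetical contents of the problem report and support information stores.
problem_reports = [
    {"id": "P1", "nucleus_nouns": ["powdered milk"]},
    {"id": "P2", "nucleus_nouns": ["water"]},
]
support_information = [
    {"id": "S1", "nucleus_nouns": ["powdered milk", "city hall"]},
    {"id": "S2", "nucleus_nouns": ["blanket"]},
]

pairs = list(generate_candidate_pairs(problem_reports, support_information))
print(pairs)
```

Each surviving pair would then be turned into matching features and passed to the trained model. Filtering early keeps the number of pairs to score well below the full product.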
- The information matching system 30 further includes a related information DB 80 that stores the match information output by the problem report / support information matching device 76 and the request message / support information matching device 78, and a storage device 82 that stores the data for generating matching features (matching feature generation data) used in the matching performed by those two devices.
- The related information DB 80 is a database from which, for example, any match information can be retrieved by the identifier of an item of information, match information containing a message with a specific keyword can be retrieved, or match information containing any message can be retrieved using a location related to a specific place name as the key.
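The three access paths mentioned above (by identifier, by keyword, by place) can be sketched with an in-memory SQLite table. The schema, column names, sample rows, and helper functions are assumptions made for this example. The source does not specify how the related information DB 80 is organized.

```python
import sqlite3


def build_related_info_db(rows):
    """Create an in-memory table holding one row per matched pair of messages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE match_info ("
        "first_id TEXT, first_text TEXT, second_id TEXT, second_text TEXT, place TEXT)"
    )
    conn.executemany("INSERT INTO match_info VALUES (?, ?, ?, ?, ?)", rows)
    return conn


SELECT_IDS = "SELECT first_id, second_id FROM match_info WHERE "
ORDER = " ORDER BY first_id, second_id"


def find_by_identifier(conn, message_id):
    """Match information in which the given message takes part."""
    query = SELECT_IDS + "first_id = ? OR second_id = ?" + ORDER
    return conn.execute(query, (message_id, message_id)).fetchall()


def find_by_keyword(conn, keyword):
    """Match information in which either message contains the keyword."""
    pattern = "%" + keyword + "%"
    query = SELECT_IDS + "first_text LIKE ? OR second_text LIKE ?" + ORDER
    return conn.execute(query, (pattern, pattern)).fetchall()


def find_by_place(conn, place):
    """Match information associated with the given place name."""
    return conn.execute(SELECT_IDS + "place = ?" + ORDER, (place,)).fetchall()


# Hypothetical match information.
db = build_related_info_db([
    ("P1", "Not enough powdered milk", "S1", "Powdered milk is distributed at the city hall", "City A"),
    ("P2", "No water", "S2", "Water trucks arrive at noon", "City B"),
])
print(find_by_keyword(db, "water"))
```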
- the problem report / support information matching device 76 and the request contact / support information matching device 78 have the same configuration, and both use an SVM (Support Vector Machine), which is one example of a machine learning method, in this embodiment. However, the data used for SVM learning differs between the two.
- SVM Support Vector Machine
- the information matching system 30 further includes a web server 86 connected to the Internet, and an output generation unit 84 made up of a program for information search using the related information DB 80.
- the output generation unit 84 uses each part of the information matching system 30 described above to classify the messages included in an inquiry into problem reports, support information, and request communications, and accumulates them.
- the output generation unit 84 also matches the message included in the inquiry with the existing problem report, support information, and request message, and stores the matching result in the related information DB 80.
- the output generation unit 84 further reads out, from the related information DB 80, match information that includes the message contained in the query and satisfies the search condition contained in the query, formats it, and sends the formatted output data to the requesting party via the web server 86.
- once the related information DB 80 has been generated, matched pairs of problem report and support information, or of request communication and support information, are extracted from the related information DB 80 and either displayed on a terminal or provided as data to other devices.
- This process is performed by a program executed by the output generation unit 84.
- a process for returning messages that match an input message is performed by the output generation unit 84.
- the geographical information about the position where the message is transmitted can be used as a matching element.
- the information matching system 30 is realized by computer hardware having a communication function, a computer program executed by the hardware, and data necessary for output generation when the computer program is executed.
- the information collection unit 50, the morphological analysis unit 52, the dependency analysis unit 54, and the place name / location specifying unit 58 shown in FIG. 1 can be easily realized by conventional techniques. Therefore, the following describes the processing for collecting problem reports, support information, and request communications from messages collected from the Internet 40, the processing for generating and storing match information by matching these, and the processing for providing useful information using the match information.
- FIG. 2 shows the configuration of the problem report collection apparatus 64 shown in FIG. 1 in a block diagram format.
- the configuration of the support information collection device 66 and the request communication collection device 68 shown in FIG. 1 is the same as that of the problem report collection device 64. Therefore, the configuration of the problem report collection device 64 will be mainly described below.
- the problem report collection device 64 includes: a feature calculation unit 100 that reads a new message from the information storage unit 60 and, based on information such as the dependency relations and morpheme sequence of the input message, calculates predetermined features using the data stored in the storage device 62; an SVM 102, trained in advance on learning data, that determines from the feature vector built from the features calculated by the feature calculation unit 100 whether or not the message is a problem report, and outputs the determination result together with a score; and a selection unit 104 that selects the messages determined by the SVM 102 to be problem reports, attaches the score of the SVM 102 to each, and stores them in the problem report storage unit 70.
- the SVM 102 has learned a large number of messages by using learning data composed of the above-described feature sets obtained from them and a flag (correct data) indicating whether or not the message is a problem report.
- the present embodiment is characterized in that it uses the features (noun classification and predicate polarity) obtained from the nucleus composition matrix described above.
- the support information collection device 66 and the request communication collection device 68 have the same configuration as the problem report collection device 64. They differ from the problem report collection device 64 only in the learning of the SVM 102: the support information collection device 66 is trained on learning data with a flag indicating whether or not each message is support information, and the request communication collection device 68 on learning data with a flag indicating whether or not each message is a request communication.
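The flow through a collection device (feature calculation, scoring, selection) could be sketched roughly as below. This is a toy illustration, not the patented implementation: a hand-set linear scorer stands in for the trained SVM 102, and the feature names, weights, and message fields are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# Hypothetical weights standing in for a trained SVM's decision function.
WEIGHTS = {"trouble_noun": 1.0, "active_predicate": 0.5}
BIAS = -1.2

def calc_features(message):
    # Stand-in for the feature calculation unit 100. The message is assumed
    # to already carry analysis results (noun class, predicate polarity).
    return {
        "trouble_noun": 1.0 if message["noun_is_trouble"] else 0.0,
        "active_predicate": 1.0 if message["predicate_is_active"] else 0.0,
    }

def score(features):
    # Stand-in for the SVM 102: a signed score, positive meaning
    # "judged to be a problem report".
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())

@dataclass
class ProblemReportStore:
    # Stand-in for the problem report storage unit 70.
    records: list = field(default_factory=list)

def collect(message, store):
    # Stand-in for the selection unit 104: keep only positively judged
    # messages, each stored together with its score.
    s = score(calc_features(message))
    if s > 0:
        store.records.append({"message": message, "score": s})
    return s
```

Storing the score with each record matters because, as described for the matching device, that score is later reused as a matching feature.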
- FIG. 3 is a simplified block diagram of the problem report / support information matching device 76 shown in FIG.
- the request contact / support information matching device 78 basically has the same configuration as the problem report / support information matching device 76. Therefore, only the configuration of the problem report / support information matching device 76 will be described below.
- the problem report / support information matching device 76 includes: a feature calculation unit 130 that reads out messages one by one from both the problem report storage unit 70 and the support information storage unit 72, calculates predetermined features from each pair of messages using the data stored in the storage device 82, and outputs them as a feature vector; an SVM 132, trained in advance, that determines from the feature vector output by the feature calculation unit 130 whether the problem report and the support information being processed match; and a selection unit 134 that, based on the output of the SVM 132, selects the combinations of problem report and support information processed by the feature calculation unit 130 and stores them in the related information DB 80.
- the features calculated by the feature calculation unit 130 are based on the co-occurrence pair matrix method described above and include the presence or absence of a common word among the nouns constituting each “noun + predicate” pair and the presence or absence of a common semantic class, as well as the score output by the SVM 102 (see FIG. 2) of the problem report collection device 64 at the time of its determination.
- the SVM 132 is trained, using the same features as those calculated by the feature calculation unit 130, on learning data that includes problem reports and support information determined in advance to match.
- the configuration of the request communication / support information matching device 78 is the same as that of the problem report / support information matching device 76. However, it differs from the case of the problem report / support information matching device 76 in that the learning data when learning the SVM 132 is related to a combination of request communication and support information.
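The pair features named above (common noun, common semantic class, and the collection-stage score) could be computed along the following lines. This is a sketch under assumptions: the dictionary keys `nouns`, `classes`, and `score` and the semantic class labels are hypothetical, and the full feature set of the embodiment is larger than these three values.

```python
def pair_features(problem, support):
    # Each message is assumed to carry the nouns of its "noun + predicate"
    # nuclei, the semantic classes of those nouns, and (for the problem
    # report) the score given by the collection-stage classifier.
    common_words = set(problem["nouns"]) & set(support["nouns"])
    common_classes = set(problem["classes"]) & set(support["classes"])
    return [
        1.0 if common_words else 0.0,    # a common noun is present
        1.0 if common_classes else 0.0,  # a common semantic class is present
        problem["score"],                # collection-stage classifier score
    ]
```

The resulting list would then be passed to the trained matcher as the feature vector for that pair.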
- the output generation unit 84 has a function of outputting information that matches a message input by the user.
- typical input messages include problem reports such as “There is not enough infant formula for babies with allergies” and support information such as “Infant formula for babies with allergies is being distributed at XX city hall”.
- the output generation unit 84 searches the related information DB 80 for information that matches the input message and displays it as a list, according to which of these categories the input message falls into. An example of a program for realizing the output generation unit 84 and its input / output screens are described below.
- the input screen 220, displayed when a remote terminal accesses the output generation unit 84 via a web browser, includes a message input field 230, a date information input panel 232 for the search, an input panel 234 for geographic conditions, and a search button 236 that triggers transmission of a search request to the information matching system 30.
- the date information input panel 232 and the geographical condition input panel 234 are used to narrow down, by specific conditions, the information that matches the message entered in the message input field 230.
- here the information is narrowed down by date and geographical conditions, but other items (for example, keyword, sender, transmission time, or whether handling has been completed) may also be used as narrowing conditions.
- FIG. 5 shows an example of a screen returned from the information matching system 30 after inputting some message on the screen shown in FIG. 4 and transmitting it to the information matching system 30.
- the screen 250 includes an input display area 260 that displays the message input by the user for confirmation, a matching information display panel 262 that displays information matching the input message, a map panel 264 that displays, as pins 266 on a map, the transmission area or related location of each message displayed on the matching information display panel 262, a search condition display panel 268 that displays the search conditions input by the user for confirmation, and a re-search button 270 operated by the user to search again with changed conditions.
- when the re-search button 270 is clicked, the input screen 220 shown in FIG. 4 is displayed.
- FIG. 6 shows a control structure of a program for returning information matching a message input from the user to the user terminal using the screens shown in FIGS. 4 and 5 as an example of a program for realizing the output generation unit 84.
- the output generation unit 84 can be realized as various forms of web applications using the related information DB 80.
- this program is started when web server 86 receives a search request from a user terminal and passes it to output generation unit 84.
- a message input by the user, a search condition regarding date, and a geographical search condition are passed to this program.
- the GPS information of the terminal that issued this request may be passed to this program in addition to the geographical information. In this example, it is assumed that such GPS information is passed as an argument to the output generation unit 84.
- morphological analysis is performed on the message (step 290), and a morpheme string is output.
- dependency analysis is performed on this morpheme string (step 292), and the location from which the message was sent is identified, based on the geographical information or GPS information attached to the message, using the place name / location dictionary storage device 56 shown in FIG. 1 (step 294); the result is attached to the message, which is then added to the information storage unit 60.
- in step 298, a feature set is calculated from the input message according to the information stored in the storage device 62 and a predetermined feature calculation method, and a feature vector is formed.
- the SVMs 102 (see FIG. 2) of the problem report collection device 64, the support information collection device 66, and the request communication collection device 68 judge whether the message corresponds to any of these categories. If the message is determined to be a problem report, it is stored in the problem report storage unit 70; if it is support information, it is stored in the support information storage unit 72; and if it is a request message, it is stored in the request message storage unit 74 (step 302).
- the reason why the input message is classified and stored as any of problem report, support information, and request communication is to add this message as a matching target with a message input later.
- in step 304, the classification result is examined. If the input message is a problem report or a request message, it is matched against the support information in step 306; if it is support information, it is matched against the problem reports and request messages in step 308. Then it is determined whether the matching in step 306 or 308 found any information that matches the message (step 309). If so, the matched information is associated with this message and added to the related information DB 80 (step 310).
- an HTML document corresponding to the screen 250 shown in FIG. 5 is generated in step 312 and sent back to the terminal that sent the message, and processing for the message ends.
- in step 314, a screen indicating that no information matches the input message is output, and the process ends.
- the text “There was no information that matched the input message. Do you want to be notified when matched information is found in the future?” is displayed on the screen 250, together with buttons for selecting whether or not to be notified and fields, buttons, etc. for entering the information (e-mail address etc.) needed for notification.
- to provide such notification, it is necessary to search the related information DB 80 for match information that includes this message and, if any is found, to send it to the recorded e-mail address.
- since this information transmission process is not directly related to the essential part of the present invention, its details are not described here.
- in step 316, a message prompting the user to enter the message again with changed conditions, such as a different wording from the previous input, is displayed on the screen 250, and the process ends.
- when the re-search button 270 is pressed, the input screen 220 shown in FIG. 4 is displayed. The user can run the search again after changing the message wording or search conditions such as the date and time.
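The branching in steps 304 to 314 could be sketched as follows. This is only an illustration under stated assumptions: `is_match` is a hypothetical stand-in for the trained matcher (here it merely checks for a shared word), the stores are plain lists, and the `"reenter"` branch for a message that fits no category is an assumed reading of when step 316 applies.

```python
def is_match(a, b):
    # Hypothetical stand-in for the trained matcher: two messages "match"
    # when they share at least one word.
    return bool(set(a.split()) & set(b.split()))

def handle_message(category, message, stores, related_db):
    # category is the classification result examined in step 304.
    if category in ("problem", "request"):
        candidates = stores["support"]                      # step 306
    elif category == "support":
        candidates = stores["problem"] + stores["request"]  # step 308
    else:
        return "reenter"  # assumed handling when no category applies
    matched = [c for c in candidates if is_match(message, c)]
    if not matched:                                         # step 309
        return "no_match"                                   # step 314
    related_db.extend((message, c) for c in matched)        # step 310
    return "results"                                        # step 312
```

The return value names the screen to render; building the HTML for the screen 250 is left out of the sketch.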
- the information matching system 30 described above operates as follows. It is assumed that, prior to this, the SVMs of the problem report collection device 64, the support information collection device 66, the request contact collection device 68, the problem report / support information matching device 76, and the request contact / support information matching device 78 shown in FIG. 1 have completed learning with appropriate learning data.
- the information collection unit 50 of the information matching system 30 first collects various information existing on the Internet 40 and provides it to the morpheme analysis unit 52.
- information transmitted on a system that transmits a problem report, request notification, and support information in a relatively short sentence such as Twitter is mainly collected.
- the morpheme analysis unit 52 performs morpheme analysis on each information
- the dependency analysis unit 54 further performs dependency analysis, and attaches dependency information of each sentence to each information.
- the place name / place specifying unit 58 gives information related to a related area or a transmitted area.
- these pieces of information are stored in the information storage unit 60. Note that the information collected by the information collecting unit 50 is usually given the date and time when the information was transmitted.
- the feature calculation unit 100 (see FIG. 2) of the problem report collection device 64 reads information from the information storage unit 60, uses the feature generation data stored in the storage device 62 to extract from the information the features for determining whether it is a problem report, and generates a feature vector.
- the SVM 102 receives this feature vector, determines whether or not the information corresponding to the feature vector is a problem report, and outputs a determination result. If the determination is affirmative (the information is a problem report), the selection unit 104 adds this information to the problem report storage unit 70. If the determination is negative, nothing is done with respect to this information, and the problem report collection device 64 proceeds to processing the next information.
- the support information collection device 66 and the request communication collection device 68 operate in the same manner as the problem report collection device 64. However, since each SVM performs learning using learning data different from the SVM of the problem report collection device 64, each of the SVMs determines whether the input information is support information and whether it is a request message. Other than that, there is no difference in the operation of the problem report collection device 64, the support information collection device 66, and the request communication collection device 68.
- the problem report storage unit 70, the support information storage unit 72, and the request communication storage unit 74 thus store the problem reports, support information, and request communications, respectively.
- the problem report / support information matching device 76 performs matching processing each time new information is stored in the problem report storage unit 70 or the support information storage unit 72. If the new information is a problem report, it is matched against all of the support information stored in the support information storage unit 72. If the new information is support information, it is matched against all of the problem reports stored in the problem report storage unit 70.
- the operation of the problem report / support information matching device 76 when a problem report is newly added to the problem report storage unit 70 will be described.
- when the feature calculation unit 130 reads out a new problem report from the problem report storage unit 70, it reads out the support information stored in the support information storage unit 72 and pairs each item with the new problem report to generate combinations of a problem report and support information.
- the feature calculation unit 130 further calculates features for all of these combinations using data stored in the storage device 82, and generates feature vectors.
- these features are based on the co-occurrence pair matrix method described above and include the presence or absence of a common word among the nouns constituting each “noun + predicate” pair and the presence or absence of a common semantic class, and further include the score output by the SVM 102 (see FIG. 2) of the problem report collection device 64 at the time of its determination.
- the SVM 132 receives the feature vector generated by the feature calculation unit 130, determines whether the problem report and support information in the combination corresponding to the feature vector match each other, and outputs the determination result.
- the selection unit 134 adds the combination in which the determination of the SVM 132 is affirmative to the related information DB 80, and does nothing otherwise.
- when the information newly read by the problem report / support information matching device 76 is support information, the device performs the operation described above with the roles of support information and problem report exchanged.
- the problem report / support information matching device 76 accumulates the problem report and the support information that match each other in the related information DB 80.
- the operation of the request communication / support information matching device 78 is the same. Therefore, details of the operation of the request contact / support information matching device 78 will not be repeated.
- the features used by the SVM of the request contact / support information matching device 78 are the same as the features used by the problem report / support information matching device 76 in this embodiment.
- match information consisting of problem reports and support information that match each other and match information consisting of support information and request communication are accumulated. If this match information can be accumulated, this information can be used in various ways later.
- the process executed by the output generation unit 84 in this embodiment is merely an example of a method for using match information. There are many other ways to use this information.
- to use the information matching system 30, the user displays the input screen 220. For example, this screen is displayed when the URL for using the information matching system 30 is accessed with a browser.
- the user enters in the message input field 230 a message describing a problem he / she has encountered, support he / she wants to provide, something he / she wants, or the like, and enters search conditions in the input panel 232 and the input panel 234 as necessary.
- when the search button 236 is pressed, a search request is transmitted to the web server 86 of the information matching system 30 with the message text and the entered search conditions as parameters.
- the web server 86 upon receiving this search request, passes the message text and the input search condition to the output generation unit 84 as parameters.
- the output generation unit 84 starts the program by passing parameters to the program whose control structure is shown in FIG.
- the output generation unit 84 performs morphological analysis (step 290), dependency analysis (step 292), and place identification processing (step 294) on the input message, and then stores the input message together with the information obtained in steps 290, 292 and 294 in the information storage unit 60. When search conditions are entered, they are added to the message in the form of “month, day” or “in”.
- each time new information is accumulated in the information storage unit 60, the problem report collection device 64, the support information collection device 66, and the request communication collection device 68 collect problem reports, support information, and request communications and accumulate them in the problem report storage unit 70, the support information storage unit 72, and the request communication storage unit 74, respectively.
- the problem report / support information matching device 76 searches the support information storage unit 72 and the problem report storage unit 70 for support information or a problem report that matches the new information, and stores information associating the matching items with each other in the related information DB 80.
- the request communication / support information matching device 78 likewise reads out from the request communication storage unit 74 and the support information storage unit 72 a request communication or support information that matches the new information, and stores information associating the matching items in the related information DB 80.
- the output generation unit 84 searches the related information DB 80, extracts information associated with the input message, and displays the information in a list on the matching information display panel 262 of FIG. If there is a lot of associated information, the matching information display panel 262 can be scrolled.
- the output generation unit 84 further displays, for each item shown on the matching information display panel 262, a pin 266 or the like on the map panel 264 at the position from which the item was transmitted or at the position related to it, based on the geographical information attached to the item.
- it is desirable that problem reports that have already been resolved, support information whose supplies have already been fully distributed, and the like be removed from the display. For this purpose, for example, after a support destination has been determined based on the screen shown in FIG. 5 and the support provider contacted, a flag indicating completion may be entered on the screen of FIG. 5 for problems solved by that action, support information whose supplies have been exhausted, requests that have been satisfied, and the like.
- This embodiment solves such a problem.
- technology that identifies and collects problem reports, request communications, and support information prevents necessary information from being buried, makes it easier for victims to obtain support information, and helps support organizations grasp the problems and requests that victims have.
- the problem report-support information or request communication-support information matching technique makes it possible to find and reply to support information directly related to a problem report transmitted by a disaster victim.
- by distinguishing problem reports or request contacts that have been matched with support information from those for which no match was found, it becomes possible to get an overview of which problems or requests have been dealt with and which have not. This helps reduce the waste of support groups' resources and time.
- the problem report storage unit 70, the support information storage unit 72, and the request communication storage unit 74 have been described as being separate devices, but these may be stored in a single storage device. It is possible to store them all in the same file. In short, it is only necessary that information belonging to these different categories can be distinguished from each other. For example, information indicating the category may be attached to each record in the file.
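Keeping all three categories in one store, with a category label attached to each record, could look like the sketch below. The record layout (`category` and `text` keys) and the function name are assumptions made for illustration only.

```python
from collections import defaultdict

def group_by_category(records):
    # Records of all categories live in one list; the attached category
    # label is what keeps them distinguishable.
    groups = defaultdict(list)
    for record in records:
        groups[record["category"]].append(record["text"])
    return dict(groups)
```

For example, a mixed list of problem reports and support information is split back into per-category lists by a single pass over the records.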
- question answering systems offered through voice on smartphones are attracting attention, but to solve a problem with a question answering system, one must work out what kind of question to ask in order to solve it. Expert knowledge is often required to come up with appropriate questions.
- the above embodiment is significant in that it makes it possible to search for support information directly from a problem.
- the “embodiment” uses the following features, following the nucleus composition matrix: the classification (trouble / non-trouble) of the noun of each nucleus (problem nucleus, support nucleus, request nucleus) in the message; the polarity (active / inactive) of the predicate; the evaluation expressions in the message (evaluation type, evaluation polarity, etc.); the semantic classes of the nouns; and N-grams, over the dependency relations of the message, that include the nouns contained in the nucleus.
- Comparative Example 1 uses the same method as the embodiment, but without using the noun classification (trouble / non-trouble) and the polarity (active / inactive) of the nucleus predicate as SVM features; that is, it is the result of an experiment conducted without the features related to the nucleus composition matrix.
- Comparative Example 2 is the same method as that used in the embodiment, but the determination is made without using the feature obtained using the evaluation expression dictionary.
- Comparative Example 3 is the same method as that used in the embodiment, but the determination is made without using the word semantic class as a feature.
- the information matching system 30 according to the above embodiment can be realized by computer hardware and the above-described computer program executed on the computer hardware.
- FIG. 7 shows the external appearance of the computer system 330
- FIG. 8 shows the internal configuration of the computer system 330.
- a computer system 330 includes a computer 340 having a memory port 352 and a DVD (Digital Versatile Disc) drive 350, a keyboard 346, a mouse 348, and a monitor 342.
- DVD Digital Versatile Disc
- in addition to the memory port 352 and the DVD drive 350, the computer 340 includes a CPU (Central Processing Unit) 356, a bus 366 connected to the CPU 356, the memory port 352, and the DVD drive 350, a read-only memory (ROM) 358 for storing a boot program and the like, a random access memory (RAM) 360 connected to the bus 366 for storing program instructions, system programs, work data, etc., and a hard disk 354.
- the computer system 330 further includes a network interface (I / F) 344 that provides a connection to a network 368 that allows communication with other terminals.
- I / F network interface
- a computer program for causing the computer system 330 to function as each functional unit of the information matching system 30 according to the above-described embodiment is stored on the DVD 362 or the removable memory 364 mounted in the DVD drive 350 or the memory port 352, and is further transferred to the hard disk 354.
- the program may be transmitted to the computer 340 through the network 368 and stored in the hard disk 354.
- the program is loaded into the RAM 360 when executed.
- the program may be loaded into the RAM 360 directly from the DVD 362, from the removable memory 364, or via the network 368.
- This program includes an instruction sequence including a plurality of instructions for causing the computer 340 to function as each functional unit of the information matching system 30 according to the above embodiment.
- Some of the basic functions necessary for the computer 340 to perform this operation are provided by an operating system or third party program running on the computer 340, or by various programming toolkits or program libraries installed on the computer 340. Therefore, this program itself does not necessarily include all functions necessary for realizing the system and method of this embodiment.
- This program need only include instructions that realize the functions of the system described above by calling appropriate functions or appropriate program tools in a programming tool kit in a controlled manner so as to obtain a desired result. Of course, all necessary functions may be provided by the program alone.
- the information storage unit 60, the storage device 62, the problem report storage unit 70, the support information storage unit 72, the request communication storage unit 74, the storage device 82, and the like are realized by the RAM 360 or the hard disk 354. These values may be further stored in a removable memory 364 such as a USB memory, or may be transmitted to another computer via a communication medium such as a network.
- the related information DB 80 is also realized by the RAM 360, the hard disk 354, and a database management program executed by the CPU 356.
- as the database management program, a so-called open source database management program can be used in addition to a commercially available program.
- the present invention provides a service that matches information posted on various systems on a network and efficiently delivers necessary information to users who need it, and can therefore be used in industries that provide such facilities.
- Information Matching System; 40 Internet; 50 Information Gathering Unit; 52 Morphological Analysis Unit; 54 Dependency Analysis Unit; 56 Place Name / Place Dictionary Storage Device; 58 Place Name / Place Identification Unit; 60 Information Storage Unit; 62 Data Storage Device for Classification Feature Generation; 64 Problem Report Collection Device; 66 Support Information Collection Device; 68 Request Communication Collection Device; 70 Problem Report Storage Unit; 72 Support Information Storage Unit; 74 Request Communication Storage Unit; 76 Problem Report / Support Information Matching Device; 78 Request Communication / Support Information Matching Device; 80 Related Information DB; 82 Matching Feature Generation Data Storage Device; 84 Output Generation Unit; 86 Web Server; 100, 130 Feature Calculation Units; 102, 132 SVM; 104, 134 Selection Units
Abstract
Description
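First, the nucleus composition matrix method (Table 1), which is used to identify and collect texts belonging to the three categories of problem reports, request communications, and support information, is described, followed by its operation and effects. Next, the co-occurrence pair matrix method (Table 2), developed for matching problem reports with support information and request communications with support information, is described, again followed by its operation and effects. In the following, texts belonging to the problem report category, the request communication category, and the support information category may be referred to simply as problem reports, request communications, and support information, respectively.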
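Problem nucleus (1) is a noun + predicate dependency relation in which the noun is a trouble expression and the predicate is an excitatory template. An example is "sludge accumulates" (ヘドロが蓄積する; sludge = trouble expression, "X accumulates" = excitatory template). A dependency relation corresponding to problem nucleus (1) expresses that the function of the problem or burden denoted by the trouble expression is turned on, and tends to indicate that a problem exists, that it has an effect, and so on.
Problem nucleus (2) is a dependency relation in which the noun is a non-trouble expression and the predicate is an inhibitory template. An example is "be in need of a bath" (お風呂に困る; bath = non-trouble expression, "be troubled for X" = inhibitory template). In a disaster, being unable to take a bath because of a water or power outage has been a problem, and this expression is likely to appear in such situations. A dependency relation corresponding to problem nucleus (2) expresses that the function of the thing denoted by the non-trouble expression is turned off, and tends to indicate that countermeasures or support actions are not functioning.
Support nucleus (1) is a dependency relation in which the noun is a trouble expression and the predicate is an inhibitory template. An example is "remove sludge" (ヘドロを除去する; sludge = trouble expression, "remove X" = inhibitory template). A dependency relation corresponding to support nucleus (1) expresses that the function of the problem or burden denoted by the trouble expression is turned off, and thus tends to indicate that a problem is solved or weakened.
Support nucleus (2) is a dependency relation in which the noun is a non-trouble expression and the predicate is an excitatory template. An example is "open the bath to the public" (お風呂を開放する; bath = non-trouble expression, "open X" = excitatory template). A dependency relation corresponding to support nucleus (2) expresses that the function of the thing denoted by the non-trouble expression is turned on, and tends to indicate that countermeasures or support actions are being carried out or prepared.
Request nucleus (1) is support nucleus (1) with a request marker attached. An example is "please clear away the sludge" (ヘドロを片付けてください; sludge = trouble expression, "clear away X" = inhibitory template, "please" (ください) = request marker). A dependency relation of this type requests that the function of the problem or burden denoted by the trouble expression be turned off, and tends to indicate a request for the problem to be solved or weakened.
Request nucleus (2) is support nucleus (2) with a request marker attached. An example is "I want a bath to be provided" (お風呂を提供してほしい; bath = non-trouble expression, "provide X" = excitatory template, "I want" (ほしい) = request marker). A dependency relation of this type requests that the function of the thing denoted by the non-trouble expression be turned on, and tends to indicate a request for countermeasures or support actions to be carried out or prepared.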
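The six nucleus types defined above can be read as a small lookup over three inputs: the noun class, the polarity of the predicate template, and whether a request marker is present. The following Python sketch is only an illustration of that reading. The function name, the string labels, and the decision to ignore a request marker on problem nuclei are assumptions made here, not part of the patent text.

```python
def classify_nucleus(noun_is_trouble: bool, template: str,
                     has_request_marker: bool = False) -> str:
    """Map (noun class, template polarity, request marker) to a nucleus type.

    `template` is "excitatory" or "inhibitory". The returned labels such as
    "problem(1)" are hypothetical names chosen for this sketch.
    """
    if template not in ("excitatory", "inhibitory"):
        raise ValueError("template must be 'excitatory' or 'inhibitory'")
    # Problem nuclei: a trouble is switched on, or a non-trouble is switched off.
    if noun_is_trouble and template == "excitatory":
        return "problem(1)"
    if not noun_is_trouble and template == "inhibitory":
        return "problem(2)"
    # Remaining combinations: trouble + inhibitory -> (1),
    # non-trouble + excitatory -> (2).
    index = "1" if noun_is_trouble else "2"
    # Assumption: a request marker turns a support nucleus into a request nucleus.
    kind = "request" if has_request_marker else "support"
    return f"{kind}({index})"


# "sludge accumulates": trouble noun + excitatory template
print(classify_nucleus(True, "excitatory"))
# "I want a bath to be provided": non-trouble noun + excitatory template + marker
print(classify_nucleus(False, "excitatory", has_request_marker=True))
```

Deciding whether a noun is a trouble expression and whether a template is excitatory or inhibitory would require dictionaries that are outside the scope of this sketch.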
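Problem reports tend to contain one of the two problem nuclei. For example, the problem report "We hear that 〇〇 City is short of powdered milk for allergies." contains the expression "powdered milk is in short supply" (粉ミルクが足りない), which corresponds to problem nucleus (2): powdered milk is a non-trouble expression and "X is in short supply" is an inhibitory template. Accordingly, whether or not a text contains a problem nucleus is used as a feature of the classifier. The experiments described later showed that using this feature improves the performance of identifying and collecting problem reports compared with not using it.
Support information tends to contain one of the two support nuclei. For example, the support information "Powdered milk for infants with allergies will be distributed at 〇〇 City Hall" contains the expression "distribute powdered milk" (粉ミルクを配布する; powdered milk = non-trouble expression, "distribute X" = excitatory template), which corresponds to support nucleus (2). Accordingly, in this embodiment, whether or not a text contains a support nucleus is used as a feature of the classifier. As described later, experiments showed that using this feature improves the performance of identifying and collecting support information compared with not using it.
Request communications tend to contain one of the two request nuclei. For example, the request communication "Please deliver powdered milk for infants with allergies to 〇〇 City!" contains the expression "please deliver powdered milk" (粉ミルクを届けてください; powdered milk = non-trouble expression, "deliver X" = excitatory template, "please" (ください) = request marker), which corresponds to request nucleus (2). Accordingly, in this embodiment, whether or not a text contains a request nucleus is used as a feature of the classifier. Experiments showed that using this feature improves the performance of identifying and collecting request communications compared with not using it.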
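The "nucleus present or absent" features described above can be sketched as three binary values per text. The sketch below assumes that the nuclei of a text have already been extracted and labelled with strings such as "problem(2)". The label format and the feature names are hypothetical and are not taken from the patent.

```python
def nucleus_presence_features(nucleus_types):
    """Return binary features stating which kinds of nucleus occur in a text.

    `nucleus_types` is a list of labels such as "problem(2)" or "request(1)"
    (a hypothetical label format used only in this sketch).
    """
    # Strip the "(1)" / "(2)" suffix so that only the kind of nucleus remains.
    kinds = {label.split("(")[0] for label in nucleus_types}
    return {
        f"has_{kind}_nucleus": int(kind in kinds)
        for kind in ("problem", "support", "request")
    }


# A text whose only nucleus is "powdered milk is in short supply".
print(nucleus_presence_features(["problem(2)"]))
```

In the embodiment these features are combined with others, such as morphemes and dependency information, before being passed to the classifier. Only the nucleus features are shown here.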
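This embodiment uses a novel method, called the "co-occurrence pair matrix method", for matching problem reports with support information and request communications with support information. The method can be applied to a pair consisting of a problem report and a piece of support information when, for example, the problem nucleus contained in the problem report and the support nucleus contained in the support information share the same noun. The same applies to a pair of a request communication and support information. The method subdivides the types of these pairs according to the classes of the nuclei they contain and the polarities of their predicates. Table 2 below shows, in matrix form, the classification by nucleus type and by the polarity of the predicate contained in the nucleus.
A problem nucleus-support nucleus pair is a pair whose nuclei are a problem nucleus and a support nucleus and in which the polarity of the predicate is opposite between the problem nucleus and the support nucleus. That is, a pair of problem nucleus (1) and support nucleus (1), or a pair of problem nucleus (2) and support nucleus (2), qualifies. Pairs whose nuclei stand in any other relation do not qualify. Examples of problem nucleus-support nucleus pairs are shown in Table 3 below. It is assumed that the two nuclei share the same noun.
A request nucleus-support nucleus pair is a pair whose nuclei are a request nucleus and a support nucleus and in which the polarity of the predicate is the same in the request nucleus and the support nucleus. That is, a pair of request nucleus (1) and support nucleus (1), or a pair of request nucleus (2) and support nucleus (2), qualifies. Pairs whose nuclei stand in any other relation do not qualify. Examples of request nucleus-support nucleus pairs are shown in Table 4 below. Here too, it is assumed that the two nuclei share the same noun.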
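The two pair definitions above reduce to a check on three things: the kinds of the two nuclei, whether they share a noun, and whether their predicate polarities are opposite (problem-support) or equal (request-support). A minimal Python sketch follows. The `Nucleus` class, its field names, and the returned labels are assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nucleus:
    noun: str      # surface form of the noun in the nucleus
    kind: str      # "problem", "support" or "request"
    polarity: str  # "excitatory" or "inhibitory" (polarity of the predicate)


def pair_type(first: Nucleus, second: Nucleus) -> Optional[str]:
    """Return the pair type for (first, second), or None if no pair is formed."""
    # Both definitions require a support nucleus sharing the same noun.
    if second.kind != "support" or first.noun != second.noun:
        return None
    # Problem-support pair: predicate polarities must be opposite.
    if first.kind == "problem" and first.polarity != second.polarity:
        return "problem-support"
    # Request-support pair: predicate polarities must be the same.
    if first.kind == "request" and first.polarity == second.polarity:
        return "request-support"
    return None


milk_short = Nucleus("粉ミルク", "problem", "inhibitory")      # 粉ミルクが足りない
milk_please = Nucleus("粉ミルク", "request", "excitatory")     # 粉ミルクを届けてください
milk_distribute = Nucleus("粉ミルク", "support", "excitatory")  # 粉ミルクを配布する
print(pair_type(milk_short, milk_distribute))
print(pair_type(milk_please, milk_distribute))
```

Because both nuclei share the same noun, they necessarily share the same noun class, so the polarity check alone is enough to restrict the result to (1)-(1) or (2)-(2) combinations. Exact string equality of nouns is a simplification made for this sketch.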
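When a problem report and a piece of support information can be matched appropriately, the combination tends to contain a problem nucleus-support nucleus pair. For example, the problem report "We hear that 〇〇 City is short of powdered milk for infants with allergies" and the support information "Powdered milk for infants with allergies will be distributed at 〇〇 City Hall" form an appropriate match in the sense that the problem in the former is solved by the support information in the latter, or that the latter contributes to solving the problem in the former. This pair contains the expression "powdered milk is in short supply ⇔ distribute powdered milk", which corresponds to a problem nucleus-support nucleus pair. Accordingly, in this embodiment, whether or not a pair of a problem report and support information contains a problem nucleus-support nucleus pair is used as a feature of the classifier. As described later, experiments showed that using this feature improves the performance of matching appropriate problem report-support information pairs compared with not using it.
When a request communication and a piece of support information can be matched appropriately, the combination tends to contain a request nucleus-support nucleus pair. For example, the request communication "Please deliver powdered milk for infants with allergies to 〇〇 City!" and the support information "Powdered milk for infants with allergies will be distributed at 〇〇 City Hall" form an appropriate match in the sense that the request in the former is met by the support information in the latter, or that the latter contributes to solving the problem behind the former. This combination contains the expression "please deliver powdered milk ⇔ distribute powdered milk", which corresponds to a request nucleus-support nucleus pair. Accordingly, in the embodiment described below, whether or not a pair of a request communication and support information contains a request nucleus-support nucleus pair is used as a feature of the classifier. Experiments showed that using this feature improves the performance of matching appropriate request communication-support information pairs compared with not using it.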
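Referring to FIG. 1, the information matching system 30 according to this embodiment includes: an information gathering unit 50 that collects, from the Internet 40, various kinds of information including problem reports, request communications, and support information (hereinafter called "messages") and shapes them into a format suitable for subsequent processing; a morphological analysis unit 52 that performs morphological analysis on the messages collected and shaped by the information gathering unit 50 and outputs them as morpheme sequences annotated with information such as parts of speech; a dependency analysis unit 54 that performs dependency analysis on the morphologically analyzed messages output by the morphological analysis unit 52 and outputs the messages annotated with the dependency relations between morphemes; a place name / place identification unit 58 that identifies the place name or location name from which each message annotated by the dependency analysis unit 54 originated and attaches it to the message; and a place name / place dictionary storage device 56 that stores the place name / place dictionary used by the place name / place identification unit 58 to identify the origin of a message. Messages such as Twitter posts sent from mobile phones and the like may carry the latitude and longitude of the sending location instead of a place name. The place name / place identification unit 58 also has a function of identifying a place name or location name from such latitude and longitude information.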
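The text does not say how the place name / place identification unit 58 turns latitude and longitude into a place name. One simple possibility, shown here purely as an assumption, is to return the nearest entry of a place dictionary by great-circle distance. The function names, the dictionary layout, and the rounded coordinates below are illustrative and are not taken from the patent.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_place(lat, lon, gazetteer):
    """Return the name of the dictionary entry closest to (lat, lon)."""
    return min(gazetteer,
               key=lambda name: haversine_km(lat, lon, *gazetteer[name]))


# Hypothetical place dictionary: name -> (latitude, longitude), rounded values.
GAZETTEER = {"Sendai": (38.27, 140.87), "Tokyo": (35.68, 139.77)}

print(nearest_place(38.3, 141.0, GAZETTEER))
```

A real place dictionary would be far larger and would probably need a distance cut-off, so that a message sent from far away is not assigned to an unrelated place.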
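(B) "There is no water" / "That water will arrive is a false rumor"
(C) "There is no water" / "I expect that water will arrive"
There are various views on how to classify modality, but broadly there are two kinds: modality that expresses the writer's judgment about the content of the message, and modality that expresses the writer's attitude toward the reader. The former is further divided into modality of truth judgment and modality of value judgment. These can be determined by using expressions attached to the predicate in the message (modality elements) as clues. For example, modalities of truth judgment include assertion, conjecture, judgment, hearsay, and explanation. In the examples above, the words "false rumor" (デマ) and "I expect" (予想してます) show that the sentence is a kind of conjecture or hearsay, and hence that it is unsuitable as a target of matching in an application such as this embodiment. On the other hand, when the predicate of a sentence ends in the terminal form of a verb or the like, as in (A), the sentence states a fact and is therefore suitable as a target of matching. Words used to judge modality are often placed at positions related to the nucleus of the message. Therefore, by using as features the n-grams that contain the noun of the nucleus within the dependency relations of the message, messages can be matched in a way that takes their modality into account.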
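As a rough illustration of the n-gram features mentioned above, the following sketch enumerates all n-grams that cover the nucleus noun along one path of the dependency structure. Representing the subtree as a single list of words is a simplification made here, and the function name and parameters are hypothetical.

```python
def ngrams_containing(path, noun_index, max_n=3):
    """Return every n-gram (n <= max_n) over `path` that covers `noun_index`.

    `path` is a list of words along one dependency path, and `noun_index`
    is the position of the nucleus noun in that list.
    """
    grams = []
    for n in range(1, max_n + 1):
        for start in range(len(path) - n + 1):
            # Keep only n-grams whose span includes the nucleus noun.
            if start <= noun_index < start + n:
                grams.append(tuple(path[start:start + n]))
    return grams


# Simplified content words on the path 水 (water) -> 届く (arrive) -> デマ (false rumor)
for gram in ngrams_containing(["水", "届く", "デマ"], noun_index=0):
    print(gram)
```

In this example the 3-gram contains the word デマ, so a classifier trained on such features has a chance to learn that the message is a rumor rather than a statement of fact.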
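The information matching system 30 described above operates as follows. Referring to FIG. 1, it is assumed that, before operation, the SVMs of the problem report collection device 64, the support information collection device 66, the request communication collection device 68, the problem report / support information matching device 76, and the request communication / support information matching device 78 have completed training with appropriate training data.
The SVM 102 receives this feature vector, determines whether the information corresponding to the feature vector is a problem report, and outputs the determination result. If the determination is positive (the information is a problem report), the selection unit 104 adds the information to the problem report storage unit 70. If the determination is negative, nothing is done with the information, and the problem report collection device 64 moves on to the next piece of information.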
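The division of labour between the SVM 102 and the selection unit 104 can be sketched as a decision function followed by a filter. The sketch below stands in for a trained SVM with a linear decision function whose weights are set by hand. The weights, the bias, the feature names, and the function names are all assumptions for illustration, and the text does not state which kernel the embodiment uses.

```python
def linear_decision(weights, bias, features):
    """Linear decision value w·x + b over sparse, named features."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items()) + bias


def select_problem_reports(messages, weights, bias):
    """Keep only the messages whose decision value is positive."""
    store = []  # plays the role of the problem report storage unit 70
    for text, features in messages:
        if linear_decision(weights, bias, features) > 0:
            store.append(text)  # positive determination: add to the store
        # negative determination: do nothing and go on to the next message
    return store


# Hand-set stand-ins for a trained model (not learned from data).
WEIGHTS = {"has_problem_nucleus": 1.0}
BIAS = -0.5
MESSAGES = [
    ("message 1", {"has_problem_nucleus": 1}),
    ("message 2", {"has_problem_nucleus": 0}),
]
print(select_problem_reports(MESSAGES, WEIGHTS, BIAS))
```

In practice the weights would come from training on labelled messages with an SVM library, and the feature vector would contain many more features than the single nucleus indicator used here.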
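One of the use cases in which the system according to this embodiment is most effective is facilitating communication between disaster victims and support organizations during a large-scale disaster. As noted earlier, during a large-scale disaster, victims send problem reports and request communications via Twitter and similar services, but this information tends to be buried in the large volume of tweets being sent. The same is true of support information sent by support organizations and the like. As a result, victims find support information hard to obtain even though they need it. For support organizations, on the other hand, it becomes impossible to find those who need support most. Even if support organizations can recognize the victims' requests and problems, when it is unclear which problems have already been addressed, multiple support organizations may respond to the same request or problem, and resources and time are wasted as a result.
Several experiments were conducted to verify the effects of the above embodiment. The results are shown below together with comparative examples. Tables 5, 6, and 7 show the experimental results on the accuracy of identifying problem reports, support information, and request communications, respectively.
The information matching system 30 according to the above embodiment can be realized by computer hardware and the above-described computer program executed on that computer hardware. FIG. 7 shows the external appearance of this computer system 330, and FIG. 8 shows the internal configuration of the computer system 330.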
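40 Internet
50 Information gathering unit
52 Morphological analysis unit
54 Dependency analysis unit
56 Place name / place dictionary storage device
58 Place name / place identification unit
60 Information storage unit
62 Data storage device for classification feature generation
64 Problem report collection device
66 Support information collection device
68 Request communication collection device
70 Problem report storage unit
72 Support information storage unit
74 Request communication storage unit
76 Problem report / support information matching device
78 Request communication / support information matching device
80 Related information DB
82 Data storage device for matching feature generation
84 Output generation unit
86 Web server
100, 130 Feature calculation units
102, 132 SVM
104, 134 Selection units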
Claims (10)
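- A text matching device that, in a set of texts each classified into one of a first category and a second category, associates texts of the second category with texts of the first category, wherein
the texts included in the set have been classified into the first and second categories by a text classification device using machine learning that uses, as features, one or more morphemes constituting the text, dependency information of the one or more morphemes, and a combination of the class of a noun and the class of a predicate in a nucleus of a sentence, the nucleus consisting of a combination of a noun included in the text and the predicate on which the noun depends,
the text matching device comprising:
storage means for storing the texts of the first category and the texts of the second category so that they are distinguished from each other;
text pair generation means for generating, from the storage means, a pair of texts consisting of a text of the first category and a text of the second category;
matching feature generation means for generating, from the pair, matching features that include the features used when the texts in the pair generated by the text pair generation means were classified by the text classification device; and
matching means for determining, using the matching features generated by the matching feature generation means, whether the two texts constituting the pair match each other,
wherein the matching means includes a machine learning model trained in advance, using training data for matching, to determine from the matching features whether a pair of texts matches.
- The text matching device according to claim 1, wherein the matching features further include n-grams on a subtree of the dependency relations that contains the noun in the nucleus, obtained for each of the texts in the pair, and
any of the n-grams includes time information, location information, a morpheme representing the modality of the respective text, or any combination of these.
- The text matching device according to claim 1 or claim 2, wherein one of the first and second categories is a category consisting of texts representing reports of problems, and the other is a category consisting of texts representing support information for solving problems.
- The text matching device according to claim 1 or claim 2, wherein one of the first and second categories consists of texts requesting the solution of problems, and the other consists of texts representing support information for solving problems.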
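- A text classification device for classifying a text into a specific category related to the report or solution of a problem, comprising:
morphological analysis means for performing morphological analysis on the text and outputting a morpheme sequence annotated with part-of-speech information;
dependency analysis means for analyzing the dependencies between morphemes in the morpheme sequence output by the morphological analysis means and outputting dependency information representing the dependency relations of the text; and
classification means for identifying, based on the morpheme sequence and the dependency relations of the text, a nucleus of a sentence consisting of a combination of a noun included in the text and the predicate on which the noun depends, and for classifying the text into the specific category or other categories using the combination of the class of the noun and the class of the predicate in the nucleus.
- The text classification device according to claim 5, wherein the classification means includes:
nucleus identification means for identifying, based on the morpheme sequence and the dependency relations of the text, a nucleus of a sentence consisting of a combination of a noun included in the text and the predicate on which the noun depends;
noun classification means for classifying the noun in the nucleus identified by the nucleus identification means as a problem-type noun related to the occurrence of a problem or a non-problem-type noun not related to the occurrence of a problem;
predicate classification means for classifying the predicate in the nucleus identified by the nucleus identification means as a predicate expressing that the function of the thing denoted by the noun depending on it is activated or as a predicate expressing that it is deactivated; and
means for classifying the text into the specific category or other categories from the combination, for the nucleus identified by the nucleus identification means, of the result of classifying the noun in the nucleus by the noun classification means and the result of classifying, by the predicate classification means, the predicate on which the noun depends in the nucleus.
- The text classification device according to claim 6, wherein the means for classifying includes determination means based on machine learning that determines whether a given text belongs to the specific category, using as a feature at least information representing the combination, for the nucleus identified by the nucleus identification means, of the result of classifying the noun in the nucleus by the noun classification means and the result of classifying, by the predicate classification means, the predicate on which the noun depends in the nucleus.
- The text classification device according to claim 7, wherein the features further include n-grams on a subtree of the dependency relations that contains the noun in the nucleus, obtained for each of the texts, and
any of the n-grams includes time information, location information, a morpheme representing the modality of the respective text, or any combination of these.
- A text matching method that, in a set of texts each classified into one of a first category and a second category, associates texts of the second category with texts of the first category, wherein
the texts included in the set have been classified into the first and second categories by a text classification device using machine learning that uses, as features, one or more morphemes constituting the text, dependency information of the one or more morphemes, and a combination of the class of a noun and the class of a predicate in a nucleus of a sentence, the nucleus consisting of a combination of a noun included in the text and the predicate on which the noun depends,
the text matching method comprising:
a step of storing, in a storage device, the texts of the first category and the texts of the second category so that they are distinguished from each other;
a text pair generation step of generating, from the storage device, a pair of texts consisting of a text of the first category and a text of the second category;
a matching feature generation step of generating, from the pair, matching features that include the features used when the texts in the pair generated in the text pair generation step were classified by the text classification device; and
a matching step of determining, using the matching features generated in the matching feature generation step, whether the two texts constituting the pair match each other,
wherein the matching step includes a step of determining whether the two texts constituting the pair match each other using a machine learning model trained in advance, using training data for matching, to determine from the matching features whether a pair of texts matches.
- A text classification method for classifying a text into a specific category related to the report or solution of a problem, comprising:
a morphological analysis step of performing morphological analysis on the text and outputting a morpheme sequence annotated with part-of-speech information;
a dependency analysis step of analyzing the dependencies between morphemes in the morpheme sequence output in the morphological analysis step and outputting dependency information representing the dependency relations of the text; and
a classification step of identifying, based on the morpheme sequence and the dependency relations of the text, a nucleus of a sentence consisting of a combination of a noun included in the text and the predicate on which the noun depends, and classifying the text into the specific category or other categories using the combination of the class of the noun and the class of the predicate in the nucleus.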
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/898,565 US10803103B2 (en) | 2013-06-19 | 2014-05-15 | Text matching device and method, and text classification device and method |
KR1020157035100A KR102188292B1 (ko) | 2013-06-19 | 2014-05-15 | 텍스트 매칭 장치와 방법 및 텍스트 분류 장치와 방법 |
EP14813194.9A EP3012746A4 (en) | 2013-06-19 | 2014-05-15 | Text matching device and method, and text classification device and method |
CN201480034989.6A CN105339936B (zh) | 2013-06-19 | 2014-05-15 | 文本匹配装置以及方法、和文本分类装置以及方法 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2013128454A JP6206840B2 (ja) | 2013-06-19 | 2013-06-19 | テキストマッチング装置、テキスト分類装置及びそれらのためのコンピュータプログラム |
JP2013-128454 | 2013-06-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014203659A1 (ja) | 2014-12-24 |
Family
ID=52104401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/062912 WO2014203659A1 (ja) | 2013-06-19 | 2014-05-15 | テキストマッチング装置および方法、並びにテキスト分類装置および方法 |
Country Status (6)
Country | Link |
---|---|
US (1) | US10803103B2 (ja) |
EP (1) | EP3012746A4 (ja) |
JP (1) | JP6206840B2 (ja) |
KR (1) | KR102188292B1 (ja) |
CN (1) | CN105339936B (ja) |
WO (1) | WO2014203659A1 (ja) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5825676B2 (ja) * | 2012-02-23 | 2015-12-02 | 国立研究開発法人情報通信研究機構 | ノン・ファクトイド型質問応答システム及びコンピュータプログラム |
KR20180111979A (ko) | 2016-02-11 | 2018-10-11 | 이베이 인크. | 의미론적 카테고리 분류법 |
US10706044B2 (en) * | 2016-04-06 | 2020-07-07 | International Business Machines Corporation | Natural language processing based on textual polarity |
US10635727B2 (en) | 2016-08-16 | 2020-04-28 | Ebay Inc. | Semantic forward search indexing of publication corpus |
US10565242B2 (en) * | 2017-01-10 | 2020-02-18 | International Business Machines Corporation | Method of label transform for managing heterogeneous information |
US20200151591A1 (en) * | 2017-01-31 | 2020-05-14 | Mocsy Inc. | Information extraction from documents |
JP6805927B2 (ja) | 2017-03-28 | 2020-12-23 | 富士通株式会社 | インデックス生成プログラム、データ検索プログラム、インデックス生成装置、データ検索装置、インデックス生成方法、及びデータ検索方法 |
US20180285775A1 (en) * | 2017-04-03 | 2018-10-04 | Salesforce.Com, Inc. | Systems and methods for machine learning classifiers for support-based group |
JP6649318B2 (ja) * | 2017-05-30 | 2020-02-19 | 株式会社ソケッツ | 言語情報分析装置および方法 |
CN108305050B (zh) * | 2018-02-08 | 2023-04-07 | 贵州小爱机器人科技有限公司 | 报案信息及服务需求信息的提取方法、装置、设备及介质 |
CN108549723B (zh) * | 2018-04-28 | 2022-04-05 | 北京神州泰岳软件股份有限公司 | 一种文本概念分类方法、装置及服务器 |
CN108763402B (zh) * | 2018-05-22 | 2021-08-27 | 广西师范大学 | 基于依存关系、词性和语义词典的类中心向量文本分类法 |
CN108804591A (zh) * | 2018-05-28 | 2018-11-13 | 杭州依图医疗技术有限公司 | 一种病历文本的文本分类方法及装置 |
US11698921B2 (en) | 2018-09-17 | 2023-07-11 | Ebay Inc. | Search system for providing search results using query understanding and semantic binary signatures |
US11657102B2 (en) * | 2019-04-29 | 2023-05-23 | Ip.Com I, Llc | Automating solution prompts based upon semantic representation |
JP7390708B2 (ja) * | 2019-12-24 | 2023-12-04 | Jcc株式会社 | 情報掲示システムおよび情報掲示方法 |
US11694025B2 (en) * | 2020-05-04 | 2023-07-04 | Kyndryl Inc. | Cognitive issue description and multi-level category recommendation |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005071229A (ja) * | 2003-08-27 | 2005-03-17 | Fujitsu Ltd | 文章分類プログラム、文章分類方法および文章分類装置 |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7606714B2 (en) * | 2003-02-11 | 2009-10-20 | Microsoft Corporation | Natural language classification within an automated response system |
KR100701044B1 (ko) * | 2004-07-20 | 2007-03-29 | 황상석 | 온라인망을 기반으로 하는 위급상황 처리 시스템 |
EP1669896A3 (en) * | 2004-12-03 | 2007-03-28 | Panscient Pty Ltd. | A machine learning system for extracting structured records from web pages and other text sources |
JP2008225582A (ja) * | 2007-03-08 | 2008-09-25 | Mazda Motor Corp | テキスト分類装置及びプログラム |
US20100063797A1 (en) | 2008-09-09 | 2010-03-11 | Microsoft Corporation | Discovering question and answer pairs |
EP2406738A4 (en) * | 2009-03-13 | 2012-08-15 | Invention Machine Corp | SYSTEM AND METHOD FOR RESPONSE TO QUESTIONS THAT INVOLVE THE APPOSITION OF SEMANTIC MARKS ON TEXT DOCUMENTS AND USER QUESTIONS |
KR101173561B1 (ko) | 2010-10-25 | 2012-08-13 | 한국전자통신연구원 | 질문 형태 및 도메인 인식 장치 및 그 방법 |
US8560567B2 (en) | 2011-06-28 | 2013-10-15 | Microsoft Corporation | Automatic question and answer detection |
US10372741B2 (en) * | 2012-03-02 | 2019-08-06 | Clarabridge, Inc. | Apparatus for automatic theme detection from unstructured data |
- 2013-06-19: JP application JP2013128454A, patent JP6206840B2 (ja), Active
- 2014-05-15: US application US14/898,565, patent US10803103B2 (en), Active
- 2014-05-15: KR application KR1020157035100A, patent KR102188292B1 (ko), IP Right Grant
- 2014-05-15: CN application CN201480034989.6A, patent CN105339936B (zh), Active
- 2014-05-15: WO application PCT/JP2014/062912, publication WO2014203659A1 (ja), Application Filing
- 2014-05-15: EP application EP14813194.9A, publication EP3012746A4 (en), Withdrawn
Non-Patent Citations (9)
Title |
---|
ANDREW B. GOLDBERG; NATHANAEL FILLMORE; DAVID ANDRZEJEWSKI ZHITING XU; BRYAN GIBSON; XIAOJIN ZHU: "Human Language Technologies: The 2009 Annual Conference of the North American", May 2009, ACM, article "All Your Wishes Come True: A Study of Wishes and How to Recognize Them", pages: 263 - 271 |
ARON CULOTTA: "Lightweight methods to estimate influenza rates and alcohol sales volume from twitter messages", LANGUAGE RESOURCES AND EVALUATION, 2012, pages 1 - 22 |
HIROSHI KANAYAMA; TETSUYA NASUKAWA: "Textual demand analysis: Detection of users' wants and needs from opinions.", PROCEEDINGS OF THE 22ND INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS, 2008, pages 409 - 416 |
JUN GOTO: "A Disaster Information Analysis System Based on Question Answering", JOURNAL OF NATURAL LANGUAGE PROCESSING, vol. 20, no. 3, 14 June 2013 (2013-06-14), pages 367 - 404, XP055299433 * |
RIKI HASHIMOTO: "Mohitotsu no Imiteki Kyokusei 'Kassei/Fukassei' to Chishiki Kakutoku eno Oyo", PROCEEDINGS OF THE 18TH ANNUAL MEETING OF THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING TUTORIAL HONKAIGI, 31 March 2012 (2012-03-31), pages 93 - 96, XP008182327 * |
ROBERT MUNRO: "Proceedings of the Fifteenth Conference on Computational Natural Language Learning", 2011, ACM, article "Subword and spatiotemporal models for identifying actionable information in Haitian Kreyol", pages: 68 - 77 |
SARAH VIEWEG; AMANDA L. HUGHES; KATE STARBIRD; LEYSIA PALEN: "Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '10", 2010, ACM, article "Microblogging during two natural hazards events: what twitter may contribute to situational awareness", pages: 1079 - 1088 |
See also references of EP3012746A4 |
YAYOI TANAKA: "Shuji Unit Bunseki kara Mita Q&A Site no Gengoteki Tokucho", PROCEEDINGS OF THE 17TH ANNUAL MEETING OF THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING TUTORIAL HONKAIGI WORKSHOP, 7 March 2011 (2011-03-07), pages 248 - 251, XP008182329 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2021047803A (ja) * | 2019-09-20 | 2021-03-25 | 博之 宮▲崎▼ | コメント共有方法、コメント共有システム及びコメント共有プログラム |
CN112818668A (zh) * | 2021-02-05 | 2021-05-18 | 上海市气象灾害防御技术中心(上海市防雷中心) | 气象灾情数据语义识别分析方法和系统 |
CN112818668B (zh) * | 2021-02-05 | 2024-03-29 | 上海市气象灾害防御技术中心(上海市防雷中心) | 气象灾情数据语义识别分析方法和系统 |
Also Published As
Publication number | Publication date |
---|---|
EP3012746A1 (en) | 2016-04-27 |
JP2015005027A (ja) | 2015-01-08 |
US20160140217A1 (en) | 2016-05-19 |
EP3012746A4 (en) | 2017-02-15 |
CN105339936A (zh) | 2016-02-17 |
KR102188292B1 (ko) | 2020-12-08 |
JP6206840B2 (ja) | 2017-10-04 |
US10803103B2 (en) | 2020-10-13 |
KR20160021110A (ko) | 2016-02-24 |
CN105339936B (zh) | 2019-06-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 201480034989.6; Country of ref document: CN |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14813194; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 20157035100; Country of ref document: KR; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 14898565; Country of ref document: US |
| | WWE | Wipo information: entry into national phase | Ref document number: 2014813194; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |