WO2019194124A1 - Tag assignment model generation device, tag assignment device, and methods and program therefor - Google Patents
Tag assignment model generation device, tag assignment device, and methods and program therefor
- Publication number
- WO2019194124A1 PCT/JP2019/014467 JP2019014467W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- word
- tag
- related information
- text
- information
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/268—Morphological analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Definitions
- the present invention relates to a technique for generating a model for adding a tag to each word of text, or a technique for adding a tag to each word of text using the generated model.
- A named entity (specific expression) extraction technique described in Non-Patent Document 1 is known as a technique for attaching a tag such as "place" to each word of a text.
- The technique of Non-Patent Document 1 extracts named entities using a CRF (Conditional Random Field).
- Proper nouns are assumed as the named entities to be extracted.
- Character types (kanji, hiragana, katakana, Latin characters) are used as the features for learning the tagging model, and a change of character type (for example, from kanji to kana) serves as a basic tag boundary.
- By using the technique of Non-Patent Document 1, it is possible to extract the place named entities "Japan" and "Mt. Fuji" from the text "Mt. Fuji in Japan". In other words, with the technique of Non-Patent Document 1, the tag "place name" can be assigned to "Japan" and "Mt. Fuji".
- Likewise, the named entity "Tokyo Tower" can be extracted from the text "I climbed Tokyo Tower"; that is, the tag "place name" can be assigned to "Tokyo Tower".
- Similarly, the named entity "Tokyo" can be extracted from the text "I climbed a tower in Tokyo"; that is, the tag "place name" can be assigned to "Tokyo".
- However, with the technique of Non-Patent Document 1, when the text "I climbed a tower in Tokyo" is entered, "a tower in Tokyo" cannot be recognized as a place as a whole. In other words, the tag "place" cannot be assigned to "a tower in Tokyo".
- That is, with the technique of Non-Patent Document 1, a tag cannot be assigned in consideration of a phrase based on word dependency.
- An object of the present invention is therefore to provide a tagging model generation device that generates a tagging model for assigning tags in consideration of phrases based on word dependency, a tag assignment device that assigns tags in consideration of phrases based on word dependency using the generated tagging model, and methods and a program therefor.
- In a tagging model generation device according to the present invention, the text-related information, which is information related to a text, includes word-related information, which is information related to each word contained in the text and includes at least part-of-speech information, together with a tag that is associated with the word-related information of each word and that takes into account phrases based on word dependency; and the learning data is a plurality of pieces of text-related information respectively corresponding to a plurality of texts.
- The device comprises a learning unit that uses the learning data to generate a tagging model including probability-related information, which is information related to the probability of each tag being associated with each piece of word-related information, and connection-probability-related information, which is information related to the connection probability, that is, the probability of each tag appearing in consideration of the appearance frequency of a plurality of consecutive tags respectively associated with the word-related information of a plurality of consecutive words in each text; and a storage unit that stores the generated tagging model.
- A tag assignment device according to the present invention comprises a tag assignment unit that assigns a plausible tag to each word of an input text using the tagging model generated by the tagging model generation device and the word-related information, which is information related to each word contained in the input text, and an output unit that outputs a phrase composed of a plurality of consecutive words to which a predetermined tag has been assigned by the tag assignment unit, or the text in which a tag has been assigned to each word by the tag assignment unit.
- A tagging model for assigning tags in consideration of phrases based on word dependency can thereby be generated.
- Tags can also be assigned in consideration of phrases based on word dependency using the generated tagging model.
- FIG. 1 is a diagram illustrating an example of the functional configuration of the tagging model generation device.
- FIG. 2 is a diagram illustrating an example of the processing procedure of the tagging model generation method.
- FIG. 3 is a diagram illustrating an example of the functional configuration of the tag assignment device.
- FIG. 4 is a diagram illustrating an example of the processing procedure of the tag assignment method.
- FIG. 5 is a diagram illustrating examples of word-related information and correct tags.
- FIG. 6 is a diagram illustrating examples of probability related information and connection probability related information.
- FIG. 7 is a diagram illustrating an example of a path representing the assignment of each tag to the word related information of each word.
- FIG. 8 is a diagram illustrating an example of classification of place ambiguity labels.
- Various tags can be assigned by the tagging model generation device and method and by the tag assignment device and method; in the following, the case where a "place" tag is assigned is described as an example.
- The tagging model generation device includes, for example, a learning data generation unit 1, a learning unit 2, and a storage unit 3.
- the learning data generation unit 1 includes, for example, a separation unit 11, a word related information generation unit 12, and a correct tag assignment unit 13.
- The word-related information generation unit 12 includes a morphological analysis unit 121 and a part-of-speech assignment unit 122.
- The tagging model generation method is realized by each unit of the tagging model generation device performing the processing of steps S1 to S3, described below and illustrated in FIG. 2.
- A plurality of texts in which phrase locations based on word dependency are indicated are input to the learning data generation unit 1.
- The texts may be transcriptions of actual chat utterances, text obtained by a speech recognition system, or any other text data such as chat data, monologues, or stories.
- A phrase location based on word dependency represents a location to be tagged, and can be specified in advance by hand, for example.
- As the phrase location based on word dependency, a portion that can be read as a coherent unit related to the tag is selected.
- Specifically, a group of words that is as inclusive as possible, including the modifiers preceding each particle, is selected as the phrase location based on the dependency of the words.
- For example, for the text "Is there a sport you often do at your travel destination?", suppose that "travel destination" is selected as the phrase location based on word dependency. Note that, as in this example, a single word itself may be selected as the phrase location based on word dependency, rather than a phrase composed of a plurality of words.
- Expressions that limit an area, such as "around XX" or "a place where one can do XX", are also selected as phrase locations based on word dependency. For example, for the text "Is there a place where I can play sports near my home?", instead of "home", "near", or "place" alone, the phrases "near my home" and "a place where I can play sports" are selected as the phrase locations based on word dependency.
- Suppose, for example, that the text "I went to {the ramen shop in front of the station} today" is input to the learning data generation unit 1.
- Here, {the ramen shop in front of the station} indicates the phrase location based on word dependency.
- The learning data generation unit 1 uses the plurality of input texts in which phrase locations based on word dependency are indicated to generate learning data, which is a plurality of pieces of text-related information respectively corresponding to the plurality of texts (step S1).
- The text-related information is information related to the text: for example, word-related information, which is information related to each word contained in the text and includes its part-of-speech information, and a tag that is associated with the word-related information of each word and that takes a phrase based on word dependency into account.
- However, the word-related information need not include part-of-speech information, as long as a phrase can be identified.
- the generated learning data is output to the learning unit 2.
- The separation unit 11 of the learning data generation unit 1 receives the plurality of texts in which phrase locations based on word dependency are indicated, and separates each of the input texts into a text sentence, which is the information of the text body, and the phrase location (step S11).
- the separated text sentence is output to the word related information generation unit 12.
- The separated phrase location is output to the correct tag assignment unit 13.
- For example, the separation unit 11 separates the text "I went to {the ramen shop in front of the station} today" into the text sentence "I went to the ramen shop in front of the station today" and the phrase location "the ramen shop in front of the station".
- the word related information generation unit 12 uses the text sentence to generate word related information that is information related to each word including at least part of speech information included in the text (step S12).
- the generated word related information for each word is output to the correct tag assignment unit 13.
- the word-related information includes at least part-of-speech information that is information about the part-of-speech of the word, for example.
- the word related information may include the word itself.
- The word-related information may also include the word itself, that is, the word to which the tag taking into account a phrase based on word dependency will be attached, in addition to the information related to each word including at least part-of-speech information.
- For example, the word-related information generation unit 12 performs morphological analysis using an existing morphological analysis engine such as MeCab to generate the word-related information. For details on MeCab, see "http://taku910.github.io/mecab/".
- In this case, the morphological analysis unit 121 of the word-related information generation unit 12 divides the text sentence into words by performing morphological analysis on it, and the part-of-speech assignment unit 122 of the word-related information generation unit 12 assigns a part of speech to each divided word. Here, for example, each divided word and the part-of-speech information attached to it constitute the word-related information.
- the morphological analysis unit 121 of the word-related information generation unit 12 performs a morphological analysis on a text sentence “I went to a ramen shop in front of the station today”.
- "<S>" is a symbol indicating the beginning of the sentence, and "</S>" is a symbol indicating the end of the sentence.
- As in this example, the word string obtained by morphological analysis may include symbols other than words, such as "<S>" and "</S>".
- Then, the part-of-speech assignment unit 122 of the word-related information generation unit 12 assigns a part of speech (POS) to each word, as shown in the middle column of the table of FIG. 5.
- BOS is an acronym for "Beginning Of Sentence" and is a label indicating the beginning of the sentence; EOS is an acronym for "End Of Sentence" and is a label indicating the end of the sentence.
- To a symbol that is not a word, such a label (for example, "BOS" or "EOS") may be assigned instead of a part of speech.
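- As a concrete illustration of step S12 (morphological analysis and part-of-speech assignment), the following is a minimal sketch in Python using MeCab, the engine named above. It is not part of the patent: a configured default dictionary is assumed, and the exact feature layout of MeCab's output depends on that dictionary, so the parsing of the POS field is also an assumption.

```python
import MeCab  # third-party binding; pip install mecab-python3 (plus a dictionary)

def word_related_info(sentence: str):
    """Split a text sentence into words and attach part-of-speech information,
    framed by the <S>/BOS and </S>/EOS symbols of FIG. 5."""
    tagger = MeCab.Tagger()
    rows = [("<S>", "BOS")]
    for line in tagger.parse(sentence).splitlines():
        if line == "EOS" or not line.strip():
            continue
        surface, features = line.split("\t")
        pos = features.split(",")[0]  # first feature field is the coarse POS
        rows.append((surface, pos))
    rows.append(("</S>", "EOS"))
    return rows

print(word_related_info("今日は駅前のラーメン屋さんに行った"))
```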
- The correct tag assignment unit 13 assigns a correct tag to the word-related information of each word using the phrase location and the word-related information of each word (step S13), and the word-related information of each word with the correct tag assigned is output to the learning unit 2.
- In the example of FIG. 5, the correct tag assignment unit 13 assigns the tag [START] to the word-related information of "<S>", the tag [B-LOC] to the word-related information of "Ekimae" (in front of the station), the tag [I-LOC] to the word-related information of "no", "ramen", and "ya", the tag [END] to the word-related information of "</S>", and the tag [NL] to the word-related information of the other words.
- A word or symbol other than the phrase location based on word dependency may thus be assigned a tag indicating that it is not part of such a phrase location.
- [START] is a tag that indicates the beginning of the sentence
- [END] is a tag that indicates the end of the sentence
- [B-LOC] is a tag that indicates the first word of the phrase indicating “location”
- [I-LOC] is a tag that represents a word following the word tagged [B-LOC] within a phrase indicating "location".
- [NL] is a tag that represents that a word is not part of a "location" phrase.
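- The correct tag assignment of step S13 can be pictured with the following minimal sketch (an illustration, not the patent's implementation); it assumes that the phrase location is tokenized in the same way as the text sentence.

```python
def assign_correct_tags(words, phrase_words):
    """Emit the [START]/[B-LOC]/[I-LOC]/[NL]/[END] tag sequence for a word
    sequence framed by <S> and </S>, given the separated phrase location."""
    tags, i = [], 0
    while i < len(words):
        if words[i] == "<S>":
            tags.append("[START]"); i += 1
        elif words[i] == "</S>":
            tags.append("[END]"); i += 1
        elif words[i:i + len(phrase_words)] == phrase_words:
            tags.append("[B-LOC]")                            # first word of the phrase
            tags.extend("[I-LOC]" for _ in phrase_words[1:])  # its continuation
            i += len(phrase_words)
        else:
            tags.append("[NL]"); i += 1                       # not part of a "location"
    return tags

words = ["<S>", "今日", "は", "駅前", "の", "ラーメン", "屋", "さん", "に", "行った", "</S>"]
phrase = ["駅前", "の", "ラーメン", "屋", "さん"]
print(list(zip(words, assign_correct_tags(words, phrase))))
```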
- The learning unit 2 uses the learning data to generate a tagging model that includes probability-related information, which is information related to the probability of each tag being associated with each piece of word-related information, and connection-probability-related information, which is information related to the connection probability, that is, the probability of each tag appearing in consideration of the appearance frequency of a plurality of consecutive tags respectively associated with the word-related information of a plurality of consecutive words in each text (step S2).
- the learning unit 2 generates a tagging model by a sequence labeling technique such as CRF.
- the learning unit 2 may generate the tagging model by other methods such as a sequence labeling method using deep learning.
- the generated tagging model is output to the storage unit 3.
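- As an illustration of the learning unit 2 using CRF, the sequence labeling technique named above, the following minimal sketch uses the third-party sklearn-crfsuite package. The patent does not name a library or a feature set, so the package choice, the feature function, and the tiny learning data are all assumptions.

```python
import sklearn_crfsuite  # third-party package; pip install sklearn-crfsuite

def features(words, i):
    # Word-related information as a feature dict: the word itself and its
    # neighbors, standing in for the part-of-speech features of FIG. 5.
    return {
        "word": words[i],
        "prev": words[i - 1] if i > 0 else "<S>",
        "next": words[i + 1] if i < len(words) - 1 else "</S>",
    }

# Tiny hypothetical learning data: tokenized sentences and the correct
# tags produced in step S13.
train_sentences = [["Ekimae", "no", "ramen", "ya", "ni", "itta"]]
train_tags = [["[B-LOC]", "[I-LOC]", "[I-LOC]", "[I-LOC]", "[NL]", "[NL]"]]

X = [[features(s, i) for i in range(len(s))] for s in train_sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, train_tags)
print(crf.predict(X))
```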
- The probability-related information may be a value that becomes larger as the probability of each tag being associated with each piece of word-related information becomes larger, or a value that becomes larger as that probability becomes smaller.
- An example of a value that becomes larger as the probability of each tag being associated with each piece of word-related information becomes smaller is the cost of associating each tag with each piece of word-related information: the larger the cost, the lower the probability.
- For example, where p(y, x) is the appearance frequency of the word-related information x to which the tag y is assigned in the learning data, and p(x) is the appearance frequency of the word-related information x in the learning data, the cost can be obtained as the reciprocal of p(y|x) = p(y, x) / p(x).
- the probability related information may be the probability that each tag is associated with each word related information.
- the probability related information is calculated based on the appearance frequency of each tag associated with each word related information in the learning data.
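- A minimal sketch of computing such costs from appearance frequencies, under the definitions above (the cost as the reciprocal of p(y|x)); the data layout is an assumption.

```python
from collections import Counter

def emission_costs(tagged_data):
    """tagged_data: list of (word_related_info, tag) pairs from the learning
    data. Returns the cost 1 / p(y|x) for each observed (x, y) pair."""
    pair_counts = Counter(tagged_data)             # counts of (x, y)
    x_counts = Counter(x for x, _ in tagged_data)  # counts of x
    return {
        (x, y): x_counts[x] / count                # 1 / p(y|x) = p(x) / p(y, x)
        for (x, y), count in pair_counts.items()
    }
```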
- connection probability related information may be a value that takes a larger value as the connection probability is higher, or may be a value that takes a larger value as the connection probability is lower.
- An example of a value that becomes larger as the connection probability becomes smaller is the connection cost, that is, the cost when each tag appears in consideration of the appearance frequency of a plurality of consecutive tags associated with the word-related information of a plurality of consecutive words in each text.
- Assuming the tags are BOS (beginning of sentence), B-LOC (first word of a location tag), I-LOC (continuation of a location tag), NL (not a location tag), and EOS (end of sentence), an example of calculating the connection cost is as follows.
- The connection probability is the probability that BOS, B-LOC, and I-LOC are assigned to the column x_{t-2}, x_{t-1}, x_t, calculated from the appearance frequency of that pattern in the learning data as a whole.
- The connection cost can then be obtained as the reciprocal of the connection probability.
- connection probability related information may be the connection probability itself.
- In either case, the connection-probability-related information is calculated based on the appearance frequency, in the learning data, of the consecutive tags associated with the word-related information.
- In other words, the connection-probability-related information may be information related to the connection probability, which is the probability of each tag appearing in consideration of both the word-related information of a plurality of consecutive words in each text and the appearance frequency of the plurality of consecutive tags associated with that word-related information.
- Here, the word-related information of a plurality of consecutive words is the word-related information of a plurality of consecutive words including the word corresponding to the tag being processed.
- For example, it is the word-related information of the words constituting a string of n+1 words, from the word n words before the word corresponding to the tag being processed up to the word corresponding to that tag.
- Alternatively, it may be the word-related information of the words constituting a string of 2n+1 words, from the word n words before the word corresponding to the tag being processed up to the word n words after it.
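- A minimal sketch of computing connection costs from appearance frequencies in the learning data (here n = 3 consecutive positions, matching the x_{t-2}, x_{t-1}, x_t example above); the data layout is an assumption.

```python
from collections import Counter

def connection_costs(sequences, n=3):
    """sequences: list of lists of (word_related_info, tag) pairs.
    Returns, for each window of n consecutive words, the reciprocal of the
    probability that its n consecutive tags appear, as in the
    BOS/B-LOC/I-LOC example above."""
    ngram_counts, window_counts = Counter(), Counter()
    for seq in sequences:
        for t in range(n - 1, len(seq)):
            window = tuple(x for x, _ in seq[t - n + 1 : t + 1])
            tags = tuple(y for _, y in seq[t - n + 1 : t + 1])
            ngram_counts[(window, tags)] += 1
            window_counts[window] += 1
    return {
        key: window_counts[key[0]] / count  # 1 / connection probability
        for key, count in ngram_counts.items()
    }
```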
- FIG. 6 shows an example in which the learning unit 2 generates a tagging model from the learning data of the texts "I went to the station in Kyoto", "I went to the station in Shinagawa", "I went to the station", and "I went to the ramen shop in front of the station".
- In this example, the cost of associating each tag with each piece of word-related information is used as the probability-related information, and the connection cost when each tag appears in consideration of the appearance frequency of a plurality of consecutive tags associated with the word-related information of a plurality of consecutive words in each text is used as the connection-probability-related information.
- the underlined numbers in FIG. 6 are costs that are probability-related information
- the numbers that are not underlined in FIG. 6 are connection costs that are connection probability-related information.
- the storage unit 3 stores the input tagging model (step S3).
- The tagging model generation device and method described above can generate a tagging model for assigning tags in consideration of phrases based on word dependency.
- The tag assignment device includes, for example, the storage unit 3, the word-related information generation unit 12, a tag assignment unit 4, and an output unit 5.
- The tag assignment method is realized by each unit of the tag assignment device performing the processing of steps S4 to S5, described below and illustrated in FIG. 4.
- the storage unit 3 is the same as the storage unit 3 of the tagging model generation device.
- the storage unit 3 stores a tagging model generated by the tagging model generation device.
- The word-related information generation unit 12 is the same as the word-related information generation unit 12 of the tagging model generation device, except that it processes the input text rather than a text sentence separated from learning data. Redundant description of similar parts is omitted below.
- the text is input to the word related information generation unit 12.
- the word related information generation unit 12 generates word related information, which is information related to each word including at least part-of-speech information included in the text, using the text (step S12).
- The generated word-related information of each word is output to the tag assignment unit 4.
- The tagging model read from the storage unit 3 and the word-related information of each word are input to the tag assignment unit 4.
- The tag assignment unit 4 assigns a plausible tag to each word of the input text using the tagging model and the word-related information of each word (step S4).
- The text with a tag assigned to each word is output to the output unit 5.
- For example, the tag assignment unit 4 assigns a plausible tag to each word of the input text as follows.
- That is, the tag assignment unit 4 uses the tagging model and the word-related information of each word to assign a plausible tag to each word of the input text so that the score obtained when each tag is assigned to the word-related information of each word is minimized.
- Here, the larger the score, the less plausible the assignment.
- An example of the score is the sum of the cost when each tag is assigned to the word-related information of each word and the connection cost when each tag is assigned to a plurality of consecutive words including each word. The cost and the connection cost are obtained by inputting the word-related information of each word and each candidate tag to the tagging model.
- FIG. 7 shows an example of paths, each representing an assignment of tags to the word-related information of each word.
- A certain path is associated with the set of tags corresponding to that path, and a different path is associated with a different set of tags.
- Selecting a path means assigning the tag set corresponding to that path to the word-related information of the words on the path.
- The underlined numbers in FIG. 7 are the costs when the tag corresponding to each number is assigned to the word-related information of the corresponding word, and the numbers that are not underlined in FIG. 7 are the connection costs when tags are assigned to a plurality of consecutive words including the corresponding word, based on the tag set corresponding to the path. These costs and connection costs are obtained by referring to the tagging model.
- The tag assignment unit 4 may assign a plausible tag to each word of the input text using such paths.
- In that case, the tag assignment unit 4 uses the tagging model and the word-related information of each word to calculate the score of each path based on the costs and connection costs determined by the tag set corresponding to that path.
- Here, the larger the score, the less plausible the path.
- An example of the score is the sum of the costs and connection costs along the selected path.
- The tag assignment unit 4 finally selects the path with the smallest score and assigns the tag set corresponding to the finally selected path, thereby assigning a plausible tag to each word of the input text.
- In the example of FIG. 7, the path represented by the bold line is finally selected as the path with the lowest score among the plurality of paths starting with "BOS" and ending with "EOS".
- As a result, the tag [B-LOC] is assigned to "Ekimae" (in front of the station), the tag [I-LOC] to the following words of "the ramen shop", and the tag [NL] to "I went".
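- The path selection described above can be sketched as the following Viterbi-style search minimizing the summed costs. This is an illustration, not the patent's implementation: for brevity it uses bigram connection costs rather than the trigram costs of FIG. 6, and combinations absent from the tagging model fall back to a large default cost.

```python
def decode(words, tags, cost, conn_cost, big=1e9):
    """Select the minimum-score path through the tag lattice.
    cost[(x, y)]: cost of assigning tag y to word-related info x;
    conn_cost[(y_prev, y)]: bigram connection cost; `big` penalizes
    combinations absent from the tagging model. Returns (score, path)."""
    best = {"[START]": (0.0, ["[START]"])}  # tag -> (score, path so far)
    for x in words:
        nxt = {}
        for y in tags:
            c = cost.get((x, y), big)
            nxt[y] = min(
                (s + conn_cost.get((yp, y), big) + c, path + [y])
                for yp, (s, path) in best.items()
            )
        best = nxt
    return min(
        (s + conn_cost.get((yp, "[END]"), big), path + ["[END]"])
        for yp, (s, path) in best.items()
    )
```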
- The text with a tag assigned to each word is input to the output unit 5.
- The output unit 5 outputs a phrase composed of a plurality of consecutive words to which a predetermined tag has been assigned by the tag assignment unit 4, or outputs as-is the text in which a tag has been assigned to each word by the tag assignment unit 4 (step S5).
- For example, when a phrase representing a place is to be detected, the output unit 5 concatenates the sequence of words from a word tagged [B-LOC] through the last following word tagged [I-LOC], and outputs it as a phrase representing a place.
- the output unit 5 outputs “a ramen shop in front of the station” as a phrase representing a place.
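- A minimal sketch of this phrase detection by the output unit 5 (joining without spaces assumes Japanese text):

```python
def extract_place_phrases(words, tags):
    """Concatenate each run of words from a [B-LOC] tag through the
    following [I-LOC] tags into a phrase representing a place."""
    phrases, current = [], None
    for word, tag in zip(words, tags):
        if tag == "[B-LOC]":
            current = [word]
        elif tag == "[I-LOC]" and current is not None:
            current.append(word)
        else:
            if current:
                phrases.append("".join(current))  # joined without spaces (Japanese)
            current = None
    if current:
        phrases.append("".join(current))
    return phrases
```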
- By assigning tags in consideration of phrases based on word dependency, the tagging model generation device and method and the tag assignment device and method described above can extract as a "location" even an expression such as "a tower in Tokyo" that is not registered as a "location" in a dictionary.
- When the tag assignment device and method described above are used in, for example, an automatic response system or a dialogue system, more appropriate utterances can be generated. It is also possible to strengthen the user's impression that the automatic response system or dialogue system understands his or her utterances.
- In an experiment, the tagging model generation device and method and the tag assignment device and method described above improved the accuracy of detecting place phrases in chat from 30% to 75%.
- The detection accuracy here is the rate at which the word and the phrase as a whole are correctly detected, for data in which "place phrases including modifiers" and "places without modifiers" are tagged as the correct location phrases. The data used for the test is not included in the learning data.
- The tagging model generation device and method and the tag assignment device and method can also be used to assign tags other than "location", for example the tags "time", "subject", "action", "means", and "reason" of the so-called 5W1H, or tags used for grounding.
- Here, grounding means associating a selected portion (a word or phrase) of a sentence with a semantic label indicating what that portion means.
- Tags may be defined hierarchically.
- For example, the large category tag "animal" may have the two small category tags "animals that can be pets" and "animals that cannot be pets".
- Alternatively, the large category tag "animal" may have the two small category tags "feline" and "canine".
- Likewise, the large category tag "impression" may have the two small category tags "happy feelings" and "sad feelings".
- When tags are defined hierarchically, either a large category tag or a small category tag may be used as the tag.
- For example, with the tagging model generation device and method and the tag assignment device and method described above, the tag "Food & Drink: Beverages" can be assigned to "my favorite tea" in the text "I'm full, but I loved my favorite tea."
- The category hierarchy is not limited to two levels and may have three or more.
- A plurality of tags may also be used in combination. In that case, the tagging model generation device and method and the tag assignment device and method perform the processing for each of the plurality of tags.
- The number of types of word-related information for each word when the amount of learning data is at most a predetermined reference value may be made smaller than the number of types used otherwise. That is, the types of word-related information may be adjusted according to the amount of learning data. This is because, when the types of learning data are few, the data becomes sparse, and there is a concern that many patterns will have no applicable data or that overfitting will occur.
- Whether the amount of learning data is at most the predetermined reference value is determined, for example, by whether the learning data satisfies a predetermined criterion. For example, if even one pattern among the patterns required as learning data, such as "the sequences of word-related information for word strings viewed n words back (n+1 consecutive words) when obtaining connection probabilities", is covered by only one piece of learning data, it is determined that the amount of learning data is at most the predetermined reference value.
- When the amount of learning data is at most the predetermined reference value, for example, only representative parts of speech such as "noun" and "case particle" are used as the word-related information; when the amount of learning data is larger than the predetermined reference value, finer parts of speech such as "noun:continuous use" may be used as the word-related information.
- Here, ":" represents concatenation of parts of speech, and "noun:continuous use" treats the combination of the part of speech "noun" and the part of speech "continuous use" as a single part of speech.
- By concatenating parts of speech in this way, the number of part-of-speech types can be increased, and the types of word-related information can be increased accordingly, for example by using these enriched parts of speech.
- In other words, when the amount of learning data is larger than the predetermined reference value, the number of pieces of word-related information for each word may be increased beyond the number used otherwise, by including small-category part-of-speech information within the part-of-speech information contained in the word-related information of each word.
- the adjustment of the type of word related information according to the amount of learning data may be performed by the word related information generation unit 12 or may be performed by the learning unit 2.
- When the adjustment is performed by the word-related information generation unit 12, the word-related information generation unit 12 generates the word-related information of each word using fewer types of word-related information when the amount of learning data is at most the predetermined reference value than otherwise.
- When the adjustment is performed by the learning unit 2, the learning unit 2 learns using fewer types of word-related information when the amount of learning data is at most the predetermined reference value than otherwise.
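- A minimal sketch of this adjustment (the reference-value check follows the criterion described above; the exact pattern inventory and the coarse/fine part-of-speech names are assumptions):

```python
from collections import Counter

def amount_below_reference(tag_ngram_patterns):
    """True if even one required pattern is covered by only one piece of
    learning data, per the criterion described above."""
    return any(c <= 1 for c in Counter(tag_ngram_patterns).values())

def pos_feature(coarse_pos, fine_pos, small_data):
    # Coarse POS only ("noun") when data is scarce; the concatenated,
    # finer POS ("noun:continuous use") otherwise.
    return coarse_pos if small_data else f"{coarse_pos}:{fine_pos}"
```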
- the program describing the processing contents can be recorded on a computer-readable recording medium.
- As the computer-readable recording medium, any recording medium such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory may be used.
- this program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM in which the program is recorded. Furthermore, the program may be distributed by storing the program in a storage device of the server computer and transferring the program from the server computer to another computer via a network.
- A computer that executes such a program first stores, in its own storage device, the program recorded on a portable recording medium or transferred from the server computer. When executing the processing, the computer reads the program stored in its own storage device and executes processing according to the read program.
- As another execution form, the computer may read the program directly from the portable recording medium and execute processing according to it, or may sequentially execute processing according to the received program each time a program is transferred to it from the server computer.
- Alternatively, without transferring the program from the server computer to the computer, the above processing may be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions only through execution instructions and result acquisition.
- The program in this embodiment includes information that is used for processing by an electronic computer and that conforms to a program (such as data that is not a direct command to the computer but has the property of defining the processing of the computer).
- the present apparatus is configured by executing a predetermined program on a computer.
- at least a part of these processing contents may be realized by hardware.
Abstract
The present invention relates to a technique for generating a tagging model for assigning tags in consideration of phrases based on word dependency. The tagging model generation device of the present invention comprises: a learning unit 2 that uses input learning data to generate a tagging model including probability-related information, which is related to the probability of each tag being associated with each piece of word-related information, and connection-probability-related information, which is related to the connection probability of each tag appearing in consideration of the appearance frequency of a plurality of consecutive tags, each tag being associated with the word-related information of each of a plurality of consecutive words in each text; and a storage unit 3 that stores the generated tagging model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/044,334 US11531806B2 (en) | 2018-04-03 | 2019-04-01 | Tag assignment model generation apparatus, tag assignment apparatus, methods and programs therefor using probability of a plurality of consecutive tags in predetermined order |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2018071308A JP7172101B2 (ja) | 2018-04-03 | 2018-04-03 | タグ付与モデル生成装置、タグ付与装置、これらの方法及びプログラム |
JP2018-071308 | 2018-04-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019194124A1 true WO2019194124A1 (fr) | 2019-10-10 |
Family
ID=68100545
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2019/014467 WO2019194124A1 (fr) | Tag assignment model generation device, tag assignment device, and methods and program therefor | 2018-04-03 | 2019-04-01 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11531806B2 (fr) |
JP (1) | JP7172101B2 (fr) |
WO (1) | WO2019194124A1 (fr) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8060357B2 (en) * | 2006-01-27 | 2011-11-15 | Xerox Corporation | Linguistic user interface |
US8001122B2 (en) * | 2007-12-12 | 2011-08-16 | Sun Microsystems, Inc. | Relating similar terms for information retrieval |
US9311299B1 (en) * | 2013-07-31 | 2016-04-12 | Google Inc. | Weakly supervised part-of-speech tagging with coupled token and type constraints |
US10963497B1 (en) * | 2016-03-29 | 2021-03-30 | Amazon Technologies, Inc. | Multi-stage query processing |
US10664540B2 (en) * | 2017-12-15 | 2020-05-26 | Intuit Inc. | Domain specific natural language understanding of customer intent in self-help |
- 2018
  - 2018-04-03 JP JP2018071308A patent/JP7172101B2/ja active Active
- 2019
  - 2019-04-01 WO PCT/JP2019/014467 patent/WO2019194124A1/fr active Application Filing
  - 2019-04-01 US US17/044,334 patent/US11531806B2/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008146583A1 (fr) * | 2007-05-23 | 2008-12-04 | Nec Corporation | Dictionary registration system, dictionary registration method, and dictionary registration program |
Non-Patent Citations (3)
Title |
---|
FUJIWARA, ISAMU ET AL.: "Statistical hierarchical phrase machine translation using syntactic tags", PROCEEDINGS OF THE EIGHTEENTH ANNUAL MEETING OF THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING - TUTORIAL PLENARY SESSION, 31 March 2012 (2012-03-31), pages 255 - 258 * |
KATO, AKIHIKO ET AL.: "Establishment and analysis of MWE-based dependency structure corpus in consideration of the intrinsic expressions and compound function words", PROCEEDINGS OF THE TWENTY-THIRD ANNUAL MEETING OF THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING, 6 March 2017 (2017-03-06), pages 42 - 45 *
NAKANO, KEIGO ET AL.: "Japanese Named Entity Extraction with Bunsetsu Features", IPSJ JOURNAL, vol. 45, no. 3, 15 March 2004 (2004-03-15), pages 934 - 941 * |
Also Published As
Publication number | Publication date |
---|---|
JP2019185153A (ja) | 2019-10-24 |
US11531806B2 (en) | 2022-12-20 |
US20210081597A1 (en) | 2021-03-18 |
JP7172101B2 (ja) | 2022-11-16 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19782450; Country of ref document: EP; Kind code of ref document: A1
 | NENP | Non-entry into the national phase | Ref country code: DE
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 19782450; Country of ref document: EP; Kind code of ref document: A1