JP6605105B1 - Sentence symbol insertion apparatus and method

Info

Publication number: JP6605105B1
Application number: JP2018194615A
Authority: JP
Inventors: 績央渡邊; 上林航
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2018-10-15
Filing date: 2018-10-15
Publication date: 2019-11-13
Anticipated expiration: 2038-10-15
Also published as: JP2020064370A

Abstract

[Problem] Character string text data sometimes lacks properly inserted sentence symbols such as periods and commas. [Solution] A period insertion device 300 comprises: a learning unit 340 that forms a learning model by learning teacher data consisting of input data and output data, where the input data are the morphemes and grammatical information of a target morpheme pair in training text data in which sentence symbols are properly inserted, and the output data indicate whether a sentence symbol is inserted between the morphemes of that pair; an estimation unit 350 that inputs the morphemes and grammatical information of a target morpheme pair in the text data to be estimated into the learning model and, based on the output data obtained, determines whether a sentence symbol should be inserted between the morphemes of the pair; and a period insertion unit 360 that inserts sentence symbols into the text data to be estimated based on the estimation result of the estimation unit 350. [Selected drawing] FIG. 1

Description

The present invention relates to a sentence symbol insertion device that inserts sentence symbols such as periods and commas.

A wide variety of speech recognition engines are currently available that take the voice data of one user, or of several users, as input and output speech recognition text data. Some engines, however, have no function for inserting periods, and those that do often provide only a simple one, such as inserting a period whenever the user has been silent for at least a predetermined time.

In contrast, Patent Document 1 discloses a spoken-language processing unit conversion device intended to convert free input units appropriately into sentence-level language processing units. Based on morphologically analyzed text data, the device computes in advance scores representing how likely a sequence of words and parts of speech is to be a clause boundary corresponding to a period, and stores them as a statistical model. Also based on morphologically analyzed text data, it extracts in advance empirical knowledge about the prosodic information at period insertion points and stores it as empirical rules. Referring to the statistical model and the empirical rules, and additionally detecting silent intervals, the device then processes the recognized word sequence: it splits the speech recognition result of one input unit into multiple language processing units, and performs a joining process that joins the speech recognition results of multiple input units into one language processing unit.

Patent Document 1: Japanese Patent No. 3009642

With the spoken-language processing unit conversion device described above, the calculation formula of the statistical model has to be created in advance. Patent Document 1 itself presents several formulas, so one must either try them on test data and adopt the best-performing one, or improve a formula by trial and error.

The present invention has been made in view of these problems, and its object is to provide a function that inserts sentence symbols such as periods into text more appropriately.

The sentence symbol insertion device according to the present invention comprises: a learning unit that forms a learning model by learning teacher data consisting of input data and output data, where the input data are the morphemes and grammatical information of a target morpheme pair (a morpheme being processed and the morpheme that follows it) in character string text data to be learned, and the output data indicate whether a sentence symbol is inserted between the morphemes of the target morpheme pair; a morpheme processing unit that morphologically analyzes character string text data and outputs the resulting morphemes together with their grammatical information; an estimation unit that, from the morphemes and grammatical information obtained when the morpheme processing unit analyzes the character string text data to be estimated, inputs the morphemes and grammatical information of a target morpheme pair (a morpheme being processed and the morpheme that follows it) into the learning model and determines, based on the output data returned, whether a sentence symbol should be inserted between the morphemes of the pair; and a sentence symbol insertion unit that inserts sentence symbols into the character string text data to be estimated based on the estimation result of the estimation unit.

According to the present invention, sentence symbols can be inserted appropriately into the character string text data to be estimated, based on the sentence symbols present in the teacher data, without the user having to create a calculation formula for inserting them.

FIG. 1 is a configuration diagram of the dialogue management system according to the first embodiment of the present invention.
FIG. 2 is an example of dialogue speech recognition data according to the first embodiment.
FIG. 3 is an example of a morphological analysis result according to the first embodiment.
FIG. 4 is an example of teacher data according to the first embodiment.
FIG. 5 is an example of speech recognition data to be estimated according to the first embodiment.
FIG. 6 is an example of the morphological analysis result for the example speech recognition data to be estimated according to the first embodiment.
FIG. 7 is an explanatory diagram of the estimation process according to the first embodiment.
FIG. 8 is an example in which periods have been inserted into the example speech recognition data to be estimated according to the first embodiment.
FIG. 9 is a sequence diagram of the learning process according to the first embodiment.
FIG. 10 is a sequence diagram of the estimation process according to the first embodiment.
FIG. 11 is an explanatory diagram of wildcard settings according to the second embodiment.
FIG. 12 is a sequence diagram of the learning process according to the third embodiment.
FIG. 13 is an example of speech recognition data separated by speaker type according to another embodiment.
FIG. 14 is a comparison diagram of example speech recognition data with periods inserted according to another embodiment.

(First Embodiment)

In the following, identical or equivalent components, members, and processes shown in the drawings are given the same reference numerals, and duplicate description is omitted where appropriate. Members that are not important to the explanation are partly omitted from the drawings.

FIG. 1 is a configuration diagram of the dialogue management system according to this embodiment. The dialogue management system 1 is built in a call center or similar facility and manages dialogues between clients and operators. It records each dialogue, converts the recording into text, inserts periods into the text data, and stores the result. The dialogue management system 1 consists of a dialogue recording device 100, a speech recognition device 200, and a period insertion device 300, each connected to a network by wire or wirelessly so that they can communicate with one another. Although in this embodiment the dialogue recording device 100, the speech recognition device 200, and the period insertion device 300 are built on separate hardware, they can also be implemented on a single computer, and each device can be further divided by function across multiple computers.

The dialogue recording device 100 records the dialogue between one speaker and another as dialogue recording data, which is voice data, and transmits it to the speech recognition device 200. Typically, a client uses a mobile or fixed-line phone to talk with a call center operator over the public telephone network, and the device records that call. The dialogue recording data need not be a single continuous recording of the dialogue; it may be separate voice data for each speaker. The client's voice data, which reaches the call center over the public telephone network, and the operator's voice data, which is typically spoken into a telephone connected to the call center's internal network, can each be recorded as separate data. Even when there is only a single stream of voice data, well-known speech separation techniques that separate audio by speaker can be used to obtain per-speaker voice data.

The speech recognition device 200 receives the recorded dialogue recording data from the dialogue recording device 100, converts it into text data (a character string) using well-known speech recognition technology, records the result as speech recognition data, and transmits it to the period insertion device 300. Speech recognition techniques broadly include those that use statistical methods, hidden Markov models, and the like. Some speech recognition devices only convert dialogue audio into a character string and have no function for inserting periods or commas, while others insert periods or commas by detecting speaker changes and pauses between utterances.

The period insertion device 300 inserts periods appropriately into speech recognition data, which is character string text data, records the resulting period-inserted speech recognition data, and outputs it in response to a request from a device used by a system user. It is one kind of sentence symbol insertion device, the sentence symbol in this case being the period. The period insertion device 300 consists of a preprocessing unit 310, a morpheme processing unit 320, a teacher data generation unit 330, a learning unit 340, an estimation unit 350, and a period insertion unit 360. The wildcard processing unit 370 in FIG. 1 is not used in the first embodiment but in the second, and is therefore drawn with a dotted line.

The preprocessing unit 310 removes predetermined symbols from the speech recognition data to be processed. In this embodiment the predetermined symbols include the period, although they need not. Consequently, even if the speech recognition device 200 has a period insertion function, the preprocessing unit 310 removes those periods from the speech recognition data to be processed, and the period insertion device 300 inserts periods afresh. In this embodiment, the preprocessing unit 310 does not remove periods from the training data to be learned.
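The preprocessing step can be illustrated with a minimal sketch. The symbol set `SYMBOLS_TO_STRIP`, the function name `preprocess`, and the `is_training` flag are assumptions made for illustration; the source states only that the predetermined symbols include the period in this embodiment and that periods are kept in training data.

```python
# Hypothetical set of "predetermined symbols". The source only says that the
# period ("。") is included in this embodiment; the other members are assumed.
SYMBOLS_TO_STRIP = frozenset("。？！")


def preprocess(text, symbols=SYMBOLS_TO_STRIP, is_training=False):
    """Remove the predetermined symbols from recognized text.

    For training data the period is kept, because it is the label source.
    """
    drop = symbols - {"。"} if is_training else symbols
    return "".join(ch for ch in text if ch not in drop)


if __name__ == "__main__":
    print(preprocess("はい。そうです。"))
    print(preprocess("はい。そうです。", is_training=True))
```

Keeping the period only on the training path mirrors the text: recognizer-inserted periods are discarded before estimation, while manually inserted periods survive so that teacher data can be derived from them.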

The morpheme processing unit 320 performs morphological analysis on the preprocessed speech recognition data using existing morphological analysis technology. A morpheme is the smallest unit of language that carries meaning, and morphological analysis means dividing natural-language text data into morphemes, based on information such as grammar and the parts of speech of words, and determining the part of speech of each resulting morpheme. In this embodiment, the base form, part of speech, and conjugation form are determined and attached to each morpheme in the text data.

The teacher data generation unit 330 converts the morphological analysis data into teacher data for the learning unit 340 and outputs that teacher data to the learning unit 340. FIG. 2 shows an example of speech recognition data for a dialogue between a customer and an operator; because it is for learning, periods have been inserted appropriately by hand. FIG. 3 shows the result of morphologically analyzing this training speech recognition data. FIG. 4 shows the teacher data obtained by converting that morphological analysis data. Each row consists of a number assigned in ascending morpheme order; the base form, part of speech, and conjugation form of the target morpheme; and a period-presence flag indicating whether a period is inserted between the target morpheme and the morpheme that follows it. In other words, the conversion identifies the periods among the morphemes of FIG. 3, sets the period-presence flag of the morpheme immediately before each period to "present", sets the flag of every other morpheme to "absent", and removes the period morphemes.
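The conversion described above might be sketched as follows. The class names `Morpheme` and `TeacherRow`, the function `to_teacher_rows`, and the sample morphemes are hypothetical; they are not taken from FIG. 3 or FIG. 4, and a period is assumed to appear as a morpheme whose base form is "。".

```python
from dataclasses import dataclass


@dataclass
class Morpheme:
    base: str   # base form
    pos: str    # part of speech
    conj: str   # conjugation form ("" when not applicable; an assumption)


@dataclass
class TeacherRow:
    number: int
    base: str
    pos: str
    conj: str
    period_after: bool  # True = "present", False = "absent"


def to_teacher_rows(morphemes, period="。"):
    """Drop period morphemes and flag the morpheme just before each one."""
    rows = []
    for m in morphemes:
        if m.base == period:
            if rows:  # a leading period has no preceding morpheme to flag
                rows[-1].period_after = True
            continue
        rows.append(TeacherRow(len(rows) + 1, m.base, m.pos, m.conj, False))
    return rows


if __name__ == "__main__":
    sample = [
        Morpheme("はい", "感動詞", ""),
        Morpheme("。", "記号", ""),
        Morpheme("そう", "副詞", ""),
        Morpheme("です", "助動詞", "基本形"),
        Morpheme("。", "記号", ""),
    ]
    for row in to_teacher_rows(sample):
        print(row)
```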

The learning unit 340 learns the teacher data it receives and forms a learning model. The input data of the teacher data are the base form, part of speech, and conjugation form of each of the following: the target morpheme, the morpheme that follows it (the target morpheme and the following morpheme together are called the target morpheme pair), and the two morphemes adjacent to the target morpheme pair. The output data of the teacher data is the period-presence flag of the target morpheme, in other words, information on whether there is a period between the morphemes of the target morpheme pair. FIG. 4 shows one example of input data in the teacher data and the corresponding output data. The two morphemes adjacent to the target morpheme pair are called the first adjacent morpheme pair; the two morphemes adjacent to the first adjacent morpheme pair on the side away from the target morpheme pair are called the second adjacent morpheme pair; and, in general, the two morphemes that are n-th adjacent as seen from the target morpheme pair are called the n-th adjacent morpheme pair. This embodiment uses the base form of each morpheme as input data rather than the surface morpheme itself, because a morpheme can appear in different conjugated forms and using the surface forms would increase the number of variations to be learned. The surface morpheme may be used instead of the base form, or in addition to it. Various learning methods have been proposed in artificial intelligence, machine learning, and deep learning, and any of them may be used.
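One way to assemble the input data for a single target morpheme is sketched below. The function name `window_features`, the `PAD` filler used at the start and end of the data, and the ordering of the features are assumptions; the source does not say how boundaries are handled or how the values are encoded for the learner.

```python
# Hypothetical filler for positions before the first or after the last
# morpheme; the source does not describe boundary handling.
PAD = ("<pad>", "<pad>", "<pad>")


def window_features(rows, i, n_adjacent=1):
    """Collect (base, pos, conj) for the target pair and its neighbours.

    rows: list of (base, pos, conj) tuples in morpheme order.
    i: index of the target morpheme; the target pair is rows[i], rows[i + 1].
    n_adjacent: how many adjacent morpheme pairs to include (1 = first
    adjacent pair, i.e. rows[i - 1] and rows[i + 2]).
    """
    feats = []
    for j in range(i - n_adjacent, i + 2 + n_adjacent):
        feats.extend(rows[j] if 0 <= j < len(rows) else PAD)
    return tuple(feats)


if __name__ == "__main__":
    rows = [("はい", "感動詞", ""), ("そう", "副詞", ""), ("です", "助動詞", "基本形")]
    print(window_features(rows, 0))
```

With `n_adjacent=1` the result has twelve values (four morphemes times three attributes); setting `n_adjacent=0` restricts the input to the target morpheme pair alone.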

The estimation unit 350 obtains the input data to be estimated via the preprocessing unit 310 and the morpheme processing unit 320, feeds the input data into the learning model to obtain output data, and records the output data in association with the input data. The preprocessing unit 310 removes the predetermined symbols from the speech recognition data to be estimated, an example of which is shown in FIG. 5. The morpheme processing unit 320 performs morphological analysis on the preprocessed speech recognition data using existing morphological analysis technology and obtains the morphological analysis data shown in FIG. 6. From this morphological analysis data, the estimation unit 350 inputs into the learning model the base form, part of speech, and conjugation form of each of the two morphemes of the target morpheme pair and the two morphemes of the first adjacent morpheme pair, obtains the output data, determines whether a period should be inserted between the morphemes of the target morpheme pair, and records the result in the period-presence flag of the target morpheme (see FIG. 7). As a concrete example, the output data of the learning model consists of a value indicating the degree to which a period should be inserted (for example, a value from 0 to 1, where a value closer to 1 means a period should be inserted) and a value indicating the degree to which a period should not be inserted (for example, a value from 0 to 1, where a value closer to 1 means a period should not be inserted). In this example, period presence is decided from the two values, for instance by comparing them: if the value for inserting a period is larger, a period is judged to be present, and if the value for not inserting a period is larger, a period is judged to be absent.
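The comparison rule in the concrete example above can be written in a few lines. The function name `decide_periods` and the handling of ties (treated here as "absent") are assumptions; the source describes only the two cases where one value is larger than the other.

```python
def decide_periods(scores):
    """Turn model outputs into period-presence flags.

    scores: list of (insert_score, no_insert_score) pairs, one per target
    morpheme, each value assumed to lie between 0 and 1.
    Returns a list of booleans (True = period "present").
    Ties are treated as "absent", which is an assumption of this sketch.
    """
    return [insert > no_insert for insert, no_insert in scores]


if __name__ == "__main__":
    print(decide_periods([(0.9, 0.1), (0.2, 0.8)]))
```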

During morphological analysis, the morpheme processing unit 320 also records the position of each morpheme in the speech recognition data to be estimated as it was before analysis. For each morpheme whose period-presence flag is "present", the period insertion unit 360 inserts a period at the corresponding position in the speech recognition data to be estimated (see FIG. 8).
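A minimal sketch of position-based insertion follows. It assumes that the recorded position of each morpheme is a `(start, end)` character offset into the preprocessed text, which the source does not specify; `insert_periods` is a hypothetical name.

```python
def insert_periods(text, spans, flags, mark="。"):
    """Insert `mark` after every morpheme whose flag is True.

    text: the speech recognition text that was analyzed.
    spans: (start, end) character offsets of each morpheme in `text`
           (an assumed representation of the recorded positions).
    flags: period-presence decisions, aligned with `spans`.
    """
    out = text
    # Work from the end of the text so earlier offsets stay valid.
    for (start, end), flag in sorted(zip(spans, flags), reverse=True):
        if flag:
            out = out[:end] + mark + out[end:]
    return out


if __name__ == "__main__":
    print(insert_periods("はいそうです", [(0, 2), (2, 4), (4, 6)], [True, False, True]))
```

Inserting from right to left avoids having to shift the remaining offsets after each insertion.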

Next, the operation of the period insertion device according to this embodiment is described: the learning operation with reference to FIG. 9 and the estimation operation with reference to FIG. 10.

As shown in FIG. 9, the preprocessing unit 310 preprocesses the speech recognition data to be learned, which consists of a large number of dialogue data items (step 105). The morpheme processing unit 320 performs morphological analysis on all of the preprocessed speech recognition data (step 110). The teacher data generation unit 330 converts all of the morphological analysis data into teacher data (step 115). The learning unit 340 takes one item of teacher data from the whole set, takes one pair of input data and output data from that item, and performs learning (step 120). The learning unit 340 then checks whether this was the last pair of input data and output data (step 125); if not, it takes the next pair of input data and output data from the current teacher data (step 130) and executes step 120 again. If it was the last pair, the learning unit 340 checks whether the current dialogue data is the last dialogue data (step 135); if not, it takes the next dialogue data (step 140) and executes step 120. If it is the last dialogue data, the learning operation ends.
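The two nested loops of steps 120 to 140 can be sketched as below. The source leaves the learning method open, so `CountModel` is only a stand-in chosen to keep the example self-contained: it stores relative frequencies per input and is not the model of the embodiment. All names, and the fallback for unseen inputs, are assumptions.

```python
from collections import defaultdict


class CountModel:
    """Stand-in learner: relative frequency of a period per input tuple."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])  # [absent, present]

    def learn(self, features, period_after):
        self.counts[features][int(period_after)] += 1

    def predict(self, features):
        """Return (insert_score, no_insert_score), each between 0 and 1."""
        absent, present = self.counts.get(features, (0, 0))
        total = absent + present
        if total == 0:
            return 0.0, 1.0  # unseen input: assume no period (an assumption)
        return present / total, absent / total


def train(model, dialogues):
    """dialogues: one teacher data set per dialogue, each a list of
    (features, period_after) pairs."""
    for teacher_data in dialogues:                  # steps 135 and 140
        for features, period_after in teacher_data:  # steps 120 to 130
            model.learn(features, period_after)
    return model


if __name__ == "__main__":
    data = [[(("です", "はい"), True)], [(("です", "はい"), True), (("は", "い"), False)]]
    print(train(CountModel(), data).predict(("です", "はい")))
```

Any learner that accepts one input/output pair at a time, or the whole set at once, could take the place of `CountModel` without changing the loop structure.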

As shown in FIG. 10, the preprocessing unit 310 preprocesses the speech recognition data to be estimated, which is dialogue data (step 205). The morpheme processing unit 320 performs morphological analysis on the preprocessed speech recognition data (step 210). The estimation unit 350 inputs the first input data from the morphological analysis data into the learning model and obtains output data (step 215). Based on the output data, the estimation unit 350 determines whether a period should be inserted after the target morpheme and records the result in association with the target morpheme (step 220). The estimation unit 350 then checks whether the current input data is the last input data (step 225); if not, it takes the next input data (step 230) and returns to step 215. If it is the last input data, the period insertion unit 360 inserts periods into the speech recognition data to be estimated, based on the period-presence decisions for all of the input data (step 235).

Thus, according to the dialogue management system of this embodiment, simply preparing and inputting speech recognition data for learning causes a learning model matching that data to be formed, without any statistical formula having to be worked out in advance. At estimation time, this learning model is used to determine whether a period should be inserted between each pair of morphemes, and the required periods are inserted appropriately.

In this embodiment, learning uses as input data the base form, part of speech, and conjugation form of the two morphemes of the target morpheme pair and the two morphemes of the first adjacent morpheme pair; the same attributes of those four morphemes are then input into the resulting learning model, and the output data is used to determine whether a period should be inserted between the morphemes of the target morpheme pair. In the same way as for the first adjacent morpheme pair, learning and estimation may additionally use the base form, part of speech, and conjugation form of the two morphemes of the second adjacent morpheme pair, or may use the target morpheme pair together with the first through n-th adjacent morpheme pairs. Alternatively, learning may use only the base forms and other attributes of the target morpheme pair in the speech recognition data to be learned, with estimation likewise using only those of the target morpheme pair in the speech recognition data to be estimated.

(Second Embodiment)

The dialogue management system 1 according to the second embodiment is the same as that of the first embodiment, except that the period insertion device 300 additionally has a wildcard processing unit 370.

The wildcard processing unit 370 refers to wildcard setting information and, wherever the morphological analysis data for learning or for estimation contains matching data, replaces the relevant value with a wildcard symbol such as "*" (asterisk). In the wildcard setting information, the person configuring the system specifies in advance the combinations of morpheme, part of speech, and conjugation form that are to be wildcarded. Even if a large amount of speech recognition data from past dialogues is prepared, it is difficult to cover every combination of morpheme, part of speech, and conjugation form that may occur in dialogue. It is therefore desirable to wildcard, among the combinations not covered by the current training dialogue data, those morphemes, parts of speech, and conjugation forms that are likely to appear frequently in the future. Any one of the morpheme, part of speech, and conjugation form can be wildcarded, and two of the three can also be wildcarded together. As a concrete example of the processing, FIG. 11 shows a wildcard setting whose morpheme, part of speech, and conjugation form are "難しい" ("difficult"), "adjective", and "*". The training data in FIG. 3 contains "難しい", "adjective", "連用形‐促音便" (continuative form with geminate sound change), which matches this setting, so the conjugation form is changed from "連用形‐促音便" to "*", as shown in FIG. 11. Similarly, the estimation data shown in FIG. 6 contains a morpheme, part of speech, and conjugation form of "様" (the honorific suffix "sama"), "suffix", and "(null)"; this matches another wildcard setting, "*", "suffix", "(null)", so the morpheme "様" in that data is converted to "*". After processing by the wildcard processing unit 370, operation is the same as in the first embodiment: at learning time, teacher data is generated from the wildcard-processed morphological analysis data and learned by the learning unit 340, and at estimation time, the estimation unit 350 performs estimation on the wildcard-processed morphological analysis data using the learning model already formed.
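The matching and replacement described above might look like the following sketch. The function name `apply_wildcards`, the use of an empty string for "(null)", and the rule that the first matching setting wins are assumptions; the two settings mirror the examples given in the text.

```python
WILDCARD = "*"


def apply_wildcards(row, settings):
    """Replace wildcarded fields of one (morpheme, pos, conj) row.

    settings: list of (morpheme, pos, conj) tuples in which "*" marks a
    wildcarded field. A row matches a setting when every non-wildcard field
    is equal. The first matching setting is applied (an assumption).
    """
    for setting in settings:
        if all(s == WILDCARD or s == v for s, v in zip(setting, row)):
            return tuple(WILDCARD if s == WILDCARD else v
                         for s, v in zip(setting, row))
    return row


if __name__ == "__main__":
    settings = [("難しい", "形容詞", "*"), ("*", "接尾辞", "")]
    print(apply_wildcards(("難しい", "形容詞", "連用形‐促音便"), settings))
    print(apply_wildcards(("様", "接尾辞", ""), settings))
```

Because the same function is applied to both the training data and the estimation data, a combination unseen in training can still map onto an input the learning model has encountered.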

Thus, according to the dialogue management system of this embodiment, wildcards are set for morphemes, parts of speech, and conjugated forms that are absent from the learning data. Even if the estimation data contains a combination of morpheme, part of speech, and conjugated form that never appeared in the learning data, the combination is covered by the wildcard setting, so estimation with the learning model is carried out appropriately.

In this embodiment, the system administrator sets the wildcard setting information in advance and processing is performed with reference to it. Alternatively, wildcards can be designated directly in the speech recognition data to be learned without using wildcard setting information. In that case, the speech recognition data to be estimated is wildcarded with reference to the wildcard designations in the speech recognition data to be learned, and estimation is then performed.

In the above description, the system administrator registers in advance the combinations of morpheme, part of speech, and conjugated form to be wildcarded. Besides setting wildcards at any time the administrator sees fit, the system may, for example, display to a user such as the system administrator a list of the input data of the teacher data in FIG. 4 aggregated by frequency of occurrence (sorted in ascending or descending order of frequency as needed) and accept wildcard settings from that list. The user can also apply a cutoff on the occurrence count and set wildcards for vocabulary that occurs infrequently.
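The frequency list and cutoff described above could be produced along the following lines. This is a sketch under assumptions: each input row of the teacher data is taken to be a hashable (morpheme, part of speech, conjugated form) tuple, and `frequency_list` and `rare_entries` are hypothetical helper names.

```python
from collections import Counter

def frequency_list(inputs, descending=False):
    """Aggregate input rows by occurrence count, sorted by that count."""
    counts = Counter(inputs)
    return sorted(counts.items(), key=lambda item: item[1], reverse=descending)

def rare_entries(inputs, min_count):
    """Rows occurring fewer than min_count times: candidates for wildcarding."""
    return [entry for entry, count in frequency_list(inputs) if count < min_count]

# Illustrative data only.
inputs = [
    ("難しい", "形容詞", "連用形-促音便"),
    ("様", "接尾辞", "(null)"),
    ("様", "接尾辞", "(null)"),
    ("様", "接尾辞", "(null)"),
]
candidates = rare_entries(inputs, min_count=2)
```

The list returned by `rare_entries` would be shown to the user for confirmation rather than applied automatically, in line with the description that the system accepts wildcard settings from the user.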

(Third embodiment)

The dialogue management system 1 according to the third embodiment is the same as that of the first embodiment, except that the process of learning and estimating with the two morphemes of the target morpheme set and the two morphemes of each of the first through n-th adjacent morpheme sets is carried out for each value of n, from an initial value of 1 up to a predetermined threshold, and the learning model for the value of n that gives the best test result is used for subsequent estimation. The resulting learning models are tested with test data that has the same structure as the teacher data; for example, about 70% of the teacher data may be used for learning and the remaining 30% used as test data. Comparing the output data obtained by feeding the input data of the test data into a learning model with the output data of the test data shows whether each answer is correct. The correct answer rate is the number of correct answers divided by the number of test data items, and the learning model with the highest correct answer rate among the n learning models becomes the model used in actual operation.

Next, the operation of the dialogue management system 1 according to this embodiment is described with reference to FIG. 12. Steps bearing the same step numbers as in the first embodiment operate as in the first embodiment, and their description is omitted. The predetermined threshold of n in this embodiment is 3. After the teacher data is created in step 115, n is set to its initial value of 1 and the subsequent operations are executed (step 305). That is, the learning process of steps 120 to 140 is executed on the target morpheme set and the first adjacent morpheme set, forming the learning model for n = 1. In step 315, n is compared with the predetermined threshold of 3; since n is 1, it is incremented by 1 and the process returns to step 120, where the learning process is executed on the target morpheme set, the first adjacent morpheme set, and the second adjacent morpheme set, forming the learning model for n = 2. The process again reaches step 315, n becomes 3, and the learning process is executed to form the learning model for n = 3. Since n is now 3, the decision block of step 315 proceeds to the end. Learning models for n = 1, n = 2, and n = 3 have thus been formed. Running a test with the test data yields the correct answer rate of each learning model, and the model with the highest correct answer rate is adopted for subsequent estimation.
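The loop over n and the selection by correct answer rate can be sketched as follows. Several points are assumptions rather than the specification's method: the k-th adjacent morpheme set is taken to be the two morphemes k positions to the left and right of the target pair, the learner is a simple lookup table standing in for whatever model the learning unit 340 actually uses, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

PAD = ("<pad>", "<pad>", "<pad>")  # placeholder beyond either end of the text

def window(tokens, i, n):
    """Target pair (i, i+1) plus the 1st..n-th adjacent pairs (assumed layout)."""
    return tuple(tokens[j] if 0 <= j < len(tokens) else PAD
                 for j in range(i - n, i + 2 + n))

def make_examples(tokens, labels, n):
    """labels[i] is 1 when a sentence symbol sits between tokens i and i+1."""
    return [(window(tokens, i, n), labels[i]) for i in range(len(tokens) - 1)]

def train(examples):
    """Stand-in learner: remember the majority label for each input."""
    votes = defaultdict(Counter)
    for features, label in examples:
        votes[features][label] += 1
    return {f: c.most_common(1)[0][0] for f, c in votes.items()}

def correct_answer_rate(model, examples):
    """Unseen inputs default to 0 (no insertion)."""
    correct = sum(model.get(f, 0) == label for f, label in examples)
    return correct / len(examples)

def select_model(tokens, labels, max_n=3, train_ratio=0.7):
    """Train one model per n = 1..max_n and keep the best-scoring one."""
    best = None
    for n in range(1, max_n + 1):
        examples = make_examples(tokens, labels, n)
        cut = int(len(examples) * train_ratio)  # about 70% for learning
        model = train(examples[:cut])
        rate = correct_answer_rate(model, examples[cut:])
        if best is None or rate > best[1]:
            best = (n, rate, model)
    return best

# Illustrative data: a repeating three-morpheme pattern with a symbol after "c".
tokens = [(w, "pos", "form") for w in ["a", "b", "c"] * 10]
labels = [1 if tokens[i][0] == "c" else 0 for i in range(len(tokens) - 1)]
best_n, best_rate, best_model = select_model(tokens, labels)
```

Ties are resolved in favor of the smaller n, which is a design choice of this sketch; the specification only states that the model with the highest correct answer rate is used.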

Thus, according to the dialogue management system of this embodiment, as many learning models as the threshold of n are formed and the one with the highest correct answer rate is adopted for subsequent estimation, so that estimation best suited to the input data becomes possible.

(Other embodiments)

In each of the above embodiments, even when the utterance subjects or utterance subject types differ, preprocessing and morphological analysis are applied to the entire utterance content of all utterance subjects or types, without distinguishing between them, and the learning model formed from that learning is used for estimation. Learning and estimation may instead be performed separately for each utterance subject or utterance subject type. The utterance subject or type can be identified, for example, while the utterance is being recorded. When a customer is talking with a call center operator, the customer and the operator use different call channels. Operator identification information (an operator ID assigned to each operator, or utterance subject type identification information indicating the operator role) is attached to the audio data output from the operator's channel, and customer identification information (a customer ID assigned to each customer, or utterance subject type identification information indicating the customer role) is attached to the audio data output from the customer's channel. Using this identification information at speech recognition time makes it possible to distinguish and store the operator's speech recognition text data separately from the customer's. Other methods of identifying the utterance subject exist. In any case, it suffices that the punctuation insertion device 300 can perform learning and estimation for each utterance subject when speech recognition data is input from the speech recognition device 200; the speech recognition text data may be input to the punctuation insertion device 300 separately for each utterance subject, or, as shown in FIG. 13, as speech recognition data that can be separated by utterance subject type. In the operator and customer example, the operator's speech recognition text data is learned to form an operator learning model; at estimation time, the operator's speech recognition text data to be processed is input, output data indicating where periods should be inserted is obtained, and the periods are inserted accordingly.
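Learning and estimating separately per utterance subject amounts to keeping one model per identifier and routing each input to the matching model. The sketch below assumes each record carries a speaker label, uses a lookup table as a stand-in learner, and uses invented sample records; none of the names come from the specification.

```python
from collections import Counter, defaultdict

def train_lookup(examples):
    """Stand-in learner: remember the majority label for each input."""
    votes = defaultdict(Counter)
    for features, label in examples:
        votes[features][label] += 1
    return {f: c.most_common(1)[0][0] for f, c in votes.items()}

def train_per_speaker(records):
    """records: (speaker, features, label) triples; returns {speaker: model}."""
    grouped = defaultdict(list)
    for speaker, features, label in records:
        grouped[speaker].append((features, label))
    return {speaker: train_lookup(ex) for speaker, ex in grouped.items()}

def should_insert(models, speaker, features):
    """Route the input to the model of its utterance subject."""
    return models[speaker].get(features, 0) == 1

# Illustrative records: the same morpheme pair is labeled differently per role.
records = [
    ("operator", ("ます", "か"), 0),
    ("operator", ("か", "はい"), 1),
    ("customer", ("か", "はい"), 0),
    ("customer", ("です", "あの"), 1),
]
models = train_per_speaker(records)
```

Keying the dictionary by role ("operator", "customer") corresponds to the utterance subject type; keying it by an individual operator or customer ID would give one model per person instead.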

In each of the above embodiments, the base form, part of speech, and conjugated form of the two morphemes of the target morpheme set and of the two morphemes of the first adjacent morpheme set are used as input data, output data indicating whether a period is present between the two morphemes of the target morpheme set is learned as learning data, and the same kind of input data is fed to the resulting learning model to decide from its output whether a period should be inserted between the morphemes of the target morpheme set. The same configuration may be used for commas: the input data is the same, the output data learned indicates whether a comma is present between the two morphemes of the target morpheme set, and the output of the resulting learning model is used to decide whether a comma should be inserted between them. The same applies to other sentence symbols, such as line breaks, in place of commas.

In the embodiments, speech recognition data of a dialogue is processed, but the data is not limited to dialogue: it may be speech recognition data of a speech or lecture by a single speaker, and a dialogue is not limited to two people but may be a discussion among three or more. Furthermore, the data need not be speech recognition data at all and may be text written with word processing software. Speech recognition data has the problem that the speech recognition engine does not insert sentence symbols appropriately, but text typed by a writer can likewise lack appropriately inserted sentence symbols. For example, even when commas are placed in a grammatically acceptable way, there may be editorial criteria established by convention. By training a comma insertion device on teacher data that satisfies those criteria and then estimating with it, commas that satisfy the criteria can be inserted.

In each of the above embodiments, a value indicating the degree to which a period should be inserted and a value indicating the degree to which it should not be inserted were given as concrete examples of the output data of the learning model, but only one of them may be used. For example, the configuration may decide that a period should be inserted when the value indicating the degree to which a period should be inserted is at or above a predetermined threshold. The user may be allowed to set this threshold; the speech recognition data to be estimated is then recorded with periods inserted for each threshold, and the data produced with one threshold is compared with the data produced with another to highlight where period insertion differs. For example, lowering the threshold makes periods easier to insert, as shown in FIG. 14. Compared with the speech recognition data produced with the standard threshold, added periods may be highlighted with underlining and deleted periods with a double strikethrough.
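The threshold comparison can be sketched as below. The assumptions are that `scores[i]` is the model's value for inserting a period after morpheme i, and that the plain-text markers `[+。]` and `[-。]` stand in for the underline and double strikethrough, since the rendering method is not specified. The function names and sample values are hypothetical.

```python
def insertion_points(scores, threshold):
    """Positions whose insertion score is at or above the threshold."""
    return {i for i, score in enumerate(scores) if score >= threshold}

def render_diff(morphemes, scores, standard, adjusted, mark="。"):
    """Text punctuated at the adjusted threshold, marked against the standard one."""
    base = insertion_points(scores, standard)
    new = insertion_points(scores, adjusted)
    parts = []
    for i, morpheme in enumerate(morphemes):
        parts.append(morpheme)
        if i in new and i in base:
            parts.append(mark)           # present at both thresholds
        elif i in new:
            parts.append(f"[+{mark}]")   # added (underline in the description)
        elif i in base:
            parts.append(f"[-{mark}]")   # deleted (double strikethrough)
    return "".join(parts)

# Illustrative values only.
morphemes = ["はい", "そう", "です"]
scores = [0.4, 0.1, 0.9]
lowered = render_diff(morphemes, scores, standard=0.5, adjusted=0.3)
raised = render_diff(morphemes, scores, standard=0.5, adjusted=0.95)
```

With the lowered threshold the sketch marks one added period after the first morpheme, and with the raised threshold it marks the final period as deleted.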

The present invention is suitably applicable to a sentence symbol insertion apparatus that appropriately inserts sentence symbols into character string text data.

Dialogue recording device 100
Speech recognition device 200
Punctuation insertion device 300
Preprocessing unit 310
Morpheme processing unit 320
Teacher data generation unit 330
Learning unit 340
Estimation unit 350
Punctuation insertion unit 360

Claims

A sentence symbol insertion apparatus comprising:
a learning unit that forms a learning model by learning, as teacher data, input data consisting of the morphemes and grammatical information of a target morpheme set, the target morpheme set consisting of a morpheme to be processed and the morpheme following it in character string text data to be learned, together with output data indicating whether a sentence symbol is inserted between the morphemes of the target morpheme set;
a morpheme processing unit that performs morphological analysis on character string text data and outputs the resulting morphemes and the grammatical information of each morpheme;
an estimation unit that inputs into the learning model, as input data, the morphemes and grammatical information of a target morpheme set consisting of a morpheme to be processed and the morpheme following it, among the morphemes obtained by morphological analysis of character string text data to be estimated, and determines, based on the resulting output data, whether a sentence symbol should be inserted between the morphemes of the target morpheme set; and
a sentence symbol insertion unit that inserts sentence symbols into the character string text data to be estimated based on the estimation result of the estimation unit,
wherein the character string text data is given identification information for each utterance subject,
the learning unit learns for each utterance subject using the identification information for each utterance subject and forms a learning model for each,
the estimation unit determines whether to insert a sentence symbol using the learning model for the utterance subject indicated by the identification information,
the learning unit uses as input data, in addition to the target morpheme set in the character string text data to be learned, the morphemes and grammatical information of the first adjacent morpheme set, which is adjacent to the target morpheme set, through the n-th adjacent morpheme set, which is the n-th adjacent to the target morpheme set,
the estimation unit inputs into the learning model as input data, in addition to the target morpheme set in the character string text data to be estimated, the morphemes and grammatical information of the first adjacent morpheme set through the n-th adjacent morpheme set, and performs estimation,
the learning unit, with n starting at an initial value of 1, performs learning for each value of n up to a predetermined threshold and forms a learning model for each, and
the estimation unit runs a test using test data on each of the formed learning models and uses the learning model with the highest correct answer rate in the test for subsequent estimation.

The sentence symbol insertion apparatus according to claim 1, further comprising a preprocessing unit that removes predetermined sentence symbols from the character string text data to be estimated.

The sentence symbol insertion apparatus according to claim 1 or 2, wherein the grammatical information output by the morpheme processing unit includes at least one of a base form, a part of speech, and a conjugated form.

The sentence symbol insertion apparatus according to claim 1 or 2, wherein the grammatical information output by the morpheme processing unit includes at least a base form, and
the base form of each morpheme is used in place of the morpheme itself for learning by the learning unit and estimation by the estimation unit.

The sentence symbol insertion apparatus according to any one of claims 1 to 4, further comprising a wildcard processing unit that refers to wildcard setting information or to the wildcard designation of the input data of the learning data and replaces the input data to be estimated with a wildcard symbol,
wherein the estimation unit uses the input data to be estimated after it has been replaced with the wildcard by the wildcard processing unit.

A sentence symbol insertion method comprising:
a learning step of forming a learning model by learning, as teacher data, input data consisting of the morphemes and grammatical information of a target morpheme set, the target morpheme set consisting of a morpheme to be processed and the morpheme following it in character string text data to be learned, together with output data indicating whether a sentence symbol is inserted between the morphemes of the target morpheme set;
a step of performing morphological analysis on character string text data and outputting the resulting morphemes and the grammatical information of each morpheme;
an estimation step of inputting into the learning model, as input data, the morphemes and grammatical information of a target morpheme set consisting of a morpheme to be processed and the morpheme following it, among the morphemes obtained by morphological analysis of character string text data to be estimated, and determining, based on the resulting output data, whether a sentence symbol should be inserted between the morphemes of the target morpheme set; and
a sentence symbol insertion step of inserting sentence symbols into the character string text data to be estimated based on the estimation result of the estimation step,
wherein the character string text data is given identification information for each utterance subject,
in the learning step, learning is performed for each utterance subject using the identification information for each utterance subject and a learning model is formed for each,
in the estimation step, whether to insert a sentence symbol is determined using the learning model for the utterance subject indicated by the identification information,
in the learning step, in addition to the target morpheme set in the character string text data to be learned, the morphemes and grammatical information of the first adjacent morpheme set, which is adjacent to the target morpheme set, through the n-th adjacent morpheme set, which is the n-th adjacent to the target morpheme set, are used as input data,
in the estimation step, in addition to the target morpheme set in the character string text data to be estimated, the morphemes and grammatical information of the first adjacent morpheme set through the n-th adjacent morpheme set are input into the learning model as input data and estimation is performed,
in the learning step, with n starting at an initial value of 1, learning is performed for each value of n up to a predetermined threshold and a learning model is formed for each, and
in the estimation step, a test using test data is run on each of the formed learning models and the learning model with the highest correct answer rate in the test is used for subsequent estimation.