JP2022028436A - Information processing apparatus and information processing program

Info

Publication number: JP2022028436A
Application number: JP2020131827A
Authority: JP
Inventors: 敏郎大櫃; Toshiro Obitsu
Original assignee: Fujitsu Client Computing Ltd
Current assignee: Fujitsu Client Computing Ltd
Priority date: 2020-08-03
Filing date: 2020-08-03
Publication date: 2022-02-16
Anticipated expiration: 2040-08-03
Also published as: JP6836094B1

Abstract

[Problem] To appropriately process unclear words that may occur during speech recognition. [Solution] An information processing device as an example of the present disclosure includes: an acquisition processing unit that acquires morphemes included in text data obtained by speech recognition as data indicating the content of voice data, together with a recognition rate indicating the certainty of speech recognition for each morpheme; and a conversion processing unit that, when an unclear word whose pronunciation is similar to that of another word is identified from among the morphemes whose recognition rate is below a threshold, acquires, from the unclear word, a converted word preset as a word having the same meaning as the unclear word. [Selected drawing] FIG. 1

Description

本開示は、情報処理装置および情報処理プログラムに関する。 The present disclosure relates to information processing devices and information processing programs.

従来から、ユーザの発話内容を示す音声データに音声認識による解析を実行し、当該解析の結果に応じて、ユーザの発話内容に対する応答を出力する技術について検討されている。このような従来の技術として、話題の一貫性または単語のつながりを考慮して適切な応答の出力を図る構成が知られている。 Conventionally, a technique has been studied in which an analysis by voice recognition is performed on voice data indicating a user's utterance content, and a response to the user's utterance content is output according to the result of the analysis. As such a conventional technique, a configuration is known in which an appropriate response is output in consideration of the consistency of topics or the connection of words.

International Publication No. 2019/162242

Japanese Unexamined Patent Publication No. 2014-145842

However, the morphemes obtained as a result of the speech recognition analysis described above may include an unclear word, that is, a word whose pronunciation is similar to that of another word. In the conventional techniques described above, as long as there is no problem with topical consistency or word connections, a response is output based on the misrecognized unclear word even when the unclear word has been misrecognized. It is therefore desirable to process unclear words appropriately so as to suppress their misrecognition.

そこで、本開示の課題の一つは、音声認識の際に発生しうる不明瞭語を適切に処理することが可能な情報処理装置および情報処理プログラムを提供することである。 Therefore, one of the problems of the present disclosure is to provide an information processing apparatus and an information processing program capable of appropriately processing unclear words that may occur during speech recognition.

An information processing device as an example of the present disclosure includes: an acquisition processing unit that acquires morphemes included in text data obtained by speech recognition as data indicating the content of voice data, together with a recognition rate indicating the certainty of speech recognition for each morpheme; and a conversion processing unit that, when an unclear word whose pronunciation is similar to that of another word is identified from among the morphemes whose recognition rate is below a threshold, acquires, from the unclear word, a converted word preset as a word having the same meaning as the unclear word.

本開示の一例としての情報処理装置によれば、音声認識の際に発生しうる不明瞭語を適切に処理することができる。 According to the information processing apparatus as an example of the present disclosure, it is possible to appropriately process unclear words that may occur during speech recognition.

FIG. 1 is an exemplary and schematic block diagram showing the configuration of an information processing system according to an embodiment.
FIG. 2 is an exemplary and schematic diagram showing an example of a conversion database according to the embodiment.
FIG. 3 is an exemplary and schematic diagram showing an example of a user voice database according to the embodiment.
FIG. 4 is an exemplary flowchart showing a series of processes executed by the server device according to the embodiment in response to receiving voice data from the terminal device.
FIG. 5 is an exemplary and schematic diagram showing an example of a conversation carried out between the user of the terminal device and the server device in the embodiment.
FIG. 6 is an exemplary and schematic diagram showing another example, different from FIG. 5, of a conversation carried out between the user of the terminal device and the server device in the embodiment.
FIG. 7 is an exemplary and schematic block diagram showing an example of the hardware configuration of a computer constituting the server device according to the embodiment.

以下、本開示の実施形態を図面に基づいて説明する。以下に記載する実施形態の構成、ならびに当該構成によってもたらされる作用および効果は、あくまで一例であって、以下の記載内容に限られるものではない。 Hereinafter, embodiments of the present disclosure will be described with reference to the drawings. The configuration of the embodiment described below, and the actions and effects brought about by the configuration are merely examples, and are not limited to the contents described below.

図１は、実施形態にかかる情報処理システムの構成を示した例示的かつ模式的なブロック図である。 FIG. 1 is an exemplary and schematic block diagram showing a configuration of an information processing system according to an embodiment.

Here, the information processing system according to the embodiment is applied to a technique that analyzes, by speech recognition, voice data indicating the content of a user's utterance and outputs a response to the utterance according to the result of the analysis. As a configuration for implementing such a technique, a configuration that aims to output an appropriate response in consideration of topical consistency or word connections is conventionally known.

However, the morphemes obtained as a result of the speech recognition analysis described above may include an unclear word, that is, a word whose pronunciation is similar to that of another word. In the conventional techniques described above, as long as there is no problem with topical consistency or word connections, a response is output based on the misrecognized unclear word even when the unclear word has been misrecognized. It is therefore desirable to process unclear words more appropriately so as to suppress their misrecognition.

そこで、実施形態にかかる情報処理システムは、以下に説明するような構成および処理により、音声認識の際に発生しうる不明瞭語をより適切に処理することを実現する。 Therefore, the information processing system according to the embodiment can more appropriately process unclear words that may occur during voice recognition by the configuration and processing as described below.

図１に示されるように、実施形態にかかる情報処理システムは、端末装置１００と、サーバ装置２００と、を備えている。端末装置１００およびサーバ装置２００は、ネットワーク（不図示）を介して互いに通信可能に接続されている。サーバ装置２００は、本開示の「情報処理装置」の一例である。 As shown in FIG. 1, the information processing system according to the embodiment includes a terminal device 100 and a server device 200. The terminal device 100 and the server device 200 are communicably connected to each other via a network (not shown). The server device 200 is an example of the "information processing device" of the present disclosure.

端末装置１００は、通信処理部１１０と、入力処理部１２０と、出力処理部１３０と、を備えている。 The terminal device 100 includes a communication processing unit 110, an input processing unit 120, and an output processing unit 130.

通信処理部１１０は、端末装置１００と他の装置（図１に示される例ではサーバ装置２００）との間で実行されうる通信を司る。 The communication processing unit 110 controls communication that can be executed between the terminal device 100 and another device (server device 200 in the example shown in FIG. 1).

入力処理部１２０は、端末装置１００のユーザの発話に応じた音声データの入力を受け付け、当該音声データのサーバ装置２００への送信を通信処理部１１０に実行させる。 The input processing unit 120 accepts the input of voice data according to the utterance of the user of the terminal device 100, and causes the communication processing unit 110 to transmit the voice data to the server device 200.

出力処理部１３０は、サーバ装置２００が音声データに応じて作成した応答が通信処理部１１０により受信された場合に、当該応答を端末装置１００のユーザに音声または画像で通知する。 When the communication processing unit 110 receives the response created by the server device 200 in response to the voice data, the output processing unit 130 notifies the user of the terminal device 100 of the response by voice or image.

The server device 200 includes a communication processing unit 210, an analysis processing unit 220, an analysis database (DB) 221, a conversion processing unit 230, a conversion database 231, a user voice database 232, a response processing unit 240, and a search processing unit 250. The analysis processing unit 220 is an example of the "acquisition processing unit" of the present disclosure.

通信処理部２１０は、サーバ装置２００と他の装置（図１に示される例では端末装置１００）との間で実行されうる通信を司る。 The communication processing unit 210 controls communication that can be executed between the server device 200 and another device (terminal device 100 in the example shown in FIG. 1).

解析処理部２２０は、通信処理部２１０が端末装置１００から受信した音声データに音声認識による解析を実行する。より具体的に、解析処理部２２０は、音声データの内容を示すデータとして音声認識により得られるテキストデータに含まれる形態素を、当該形態素ごとの音声認識の確からしさを示す認識率とともに取得する。実施形態は、解析の手法として、従来から知られている形態素解析の手法を利用しうる。このような解析に必要な各種のデータは、解析データベース２２１に予め設定されている。 The analysis processing unit 220 executes analysis by voice recognition on the voice data received from the terminal device 100 by the communication processing unit 210. More specifically, the analysis processing unit 220 acquires morphemes included in the text data obtained by voice recognition as data indicating the contents of voice data, together with a recognition rate indicating the certainty of voice recognition for each morpheme. As the embodiment, a conventionally known morphological analysis method can be used as the analysis method. Various data necessary for such analysis are preset in the analysis database 221.

変換処理部２３０は、上記の認識率が閾値を下回る形態素から、発音が他の語と類似している不明瞭語を特定する。そして、変換処理部２３０は、不明瞭語が特定された場合、特定された不明瞭語から、当該不明瞭語と同一の意味を持つ語として予め設定された変換語を取得する。さらに、変換処理部２３０は、変換語が取得された場合、変換語を補足する語として予め設定された補足語をさらに取得する。 The conversion processing unit 230 identifies an unclear word whose pronunciation is similar to that of other words from the morpheme whose recognition rate is below the threshold value. Then, when the unclear word is specified, the conversion processing unit 230 acquires a converted word preset as a word having the same meaning as the unclear word from the specified unclear word. Further, when the conversion word is acquired, the conversion processing unit 230 further acquires a complement word preset as a word that supplements the conversion word.
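The first step described above, selecting the morphemes whose recognition rate is below the threshold, can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `Morpheme` type, the romanized surface forms, the recognition rates, and the threshold value of 0.6 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    surface: str             # recognized text of the morpheme (romanized here)
    recognition_rate: float  # certainty of recognition, assumed to lie in [0, 1]


def low_confidence_morphemes(morphemes, threshold=0.6):
    """Return the morphemes whose recognition rate is below the threshold.

    The threshold value is hypothetical; the source does not state one.
    """
    return [m for m in morphemes if m.recognition_rate < threshold]


# Hypothetical analysis result for the utterance "ichigatsu no shukujitsu".
utterance = [
    Morpheme("ichigatsu", 0.42),
    Morpheme("no", 0.97),
    Morpheme("shukujitsu", 0.91),
]
candidates = low_confidence_morphemes(utterance)
```

Only the morphemes returned here would be examined further as possible unclear words; the others are treated as recognized.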

不明瞭語の特定と、当該不明瞭語に応じた変換語および補足語の取得とは、次の図２に示されるような変換データベース２３１に基づいて実行される。 The identification of the obscure word and the acquisition of the converted word and the complement word corresponding to the unclear word are executed based on the conversion database 231 as shown in FIG. 2 below.

図２は、実施形態にかかる変換データベース２３１の例を例示的かつ模式的な図である。 FIG. 2 is an exemplary and schematic diagram of an example of a conversion database 231 according to an embodiment.

As shown in FIG. 2, the conversion database 231 holds a preset correspondence among "No", "category", "unclear word", "converted word", "supplementary word", "misconverted word", and "voice data". The information set in each of these columns in the example shown in FIG. 2 is merely an example. In the embodiment, therefore, information different from the example shown in FIG. 2 may be set in each column of the conversion database 231.

「不明瞭語」の欄には、サーバ装置２００を運用する事業者などにより予め決められた不明瞭語が設定される。図２に示される例では、「不明瞭語」の欄に、４つの不明瞭語が設定されている。 In the "unclear word" column, a predetermined unclear word is set by a business operator or the like that operates the server device 200. In the example shown in FIG. 2, four unclear words are set in the "unclear word" column.

In the "converted word" column, converted words are set. Each converted word is a word predetermined by, for example, the business operator that operates the server device 200 as having the same meaning as the unclear word set in the "unclear word" column. In the example shown in FIG. 2, four converted words corresponding to the four unclear words above are set in the "converted word" column. As shown in FIG. 2, in the embodiment, an unclear word and its converted word may be the same word.

In the example shown in FIG. 2, unclear words and converted words are set on a one-to-one basis, but in the embodiment a plurality of converted words may be set for one unclear word. In this case, the conversion processing unit 230 may use the converted words one at a time, for example according to a predetermined priority order.

また、「補足語」の欄には、「変換語」を補足する語としてサーバ装置２００を運用する事業者などにより予め決められた補足語が設定される。図２に示されるように、実施形態では、特定の語を示すデータのみならず、補足語の有無を表すだけのデータも、補足語として設定されうる。 Further, in the "supplementary word" column, a supplementary word predetermined by a business operator operating the server device 200 or the like is set as a supplementary word to the "converted word". As shown in FIG. 2, in the embodiment, not only the data indicating a specific word but also the data indicating the presence or absence of a complement word can be set as the complement word.

In the "voice data" column, predetermined voice data indicating a typical pronunciation of the unclear word is set. In the embodiment, the conversion processing unit 230 can identify the unclear word corresponding to an unclear section according to the degree of similarity between the unclear section and the predetermined voice data set in the "voice data" column. The unclear section is the part of the voice data acquired by the server device 200 from the terminal device 100 that corresponds to a morpheme whose recognition rate is below the threshold.
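The similarity-based identification can be sketched as below. The source does not specify how voice data is represented or how similarity is measured, so this example assumes that each piece of voice data has already been reduced to a fixed-length feature vector and uses cosine similarity as a stand-in measure. The function names, the vectors, and the minimum similarity of 0.9 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_unclear_word(section, references, min_similarity=0.9):
    """Return the unclear word whose reference voice data is most similar
    to the unclear section, or None if no reference is similar enough."""
    best_word, best_score = None, min_similarity
    for word, reference in references.items():
        score = cosine_similarity(section, reference)
        if score >= best_score:
            best_word, best_score = word, score
    return best_word


# Hypothetical reference vectors standing in for the "voice data" column.
references = {
    "ichigatsu": [1.0, 0.0, 0.2],
    "shichigatsu": [0.0, 1.0, 0.2],
}
```

With these toy vectors, a section close to the first reference resolves to "ichigatsu", while a section lying between the two references resolves to nothing and would leave the unclear word unidentified.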

In the "misconverted word" column, a misconverted word is set, that is, a word whose pronunciation is similar to that of the unclear word set in the "unclear word" column. The misconverted word can be used in place of the converted word when the next response is created, for example when the user judges that the response output based on the converted word set in the "converted word" column is wrong.

なお、「Ｎｏ」の欄には、便宜上割り当てられた管理番号が設定される。また、「カテゴリ」の欄には、「不明瞭語」の欄に設定された不明瞭語の品詞が設定される。 A control number assigned for convenience is set in the "No" column. Further, in the "category" column, the part of speech of the unclear word set in the "unclear word" column is set.
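One way to picture the columns of the conversion database 231 is as a record type keyed by unclear word. The sketch below is only an illustration: the field names are English renderings chosen for this example, the two entries reuse the converted and supplementary words that appear in the conversation examples of FIG. 5 and FIG. 6, and the category, misconverted word, and voice data values are placeholders that are not taken from FIG. 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConversionEntry:
    no: int                           # management number ("No" column)
    category: str                     # part of speech of the unclear word
    unclear_word: str
    converted_word: str               # word with the same meaning as the unclear word
    supplementary_word: Optional[str] # None when no supplementary word is set
    misconverted_word: Optional[str]  # similar-sounding word (placeholder values)
    voice_data: bytes                 # typical pronunciation (placeholder bytes)


_ENTRIES = (
    ConversionEntry(1, "noun", "ichigatsu", "ichigatsu", "Mutsuki",
                    "shichigatsu", b"<generic ichigatsu audio>"),
    ConversionEntry(2, "noun", "shichigatsu", "nanagatsu", "Fumizuki",
                    "ichigatsu", b"<generic shichigatsu audio>"),
)

# Index by unclear word so that a lookup after identification is a dict access.
CONVERSION_DB = {entry.unclear_word: entry for entry in _ENTRIES}
```

Indexing by unclear word reflects how the table is used in the description: the unclear word is identified first, and the converted and supplementary words are then read from the matching row.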

As described above, in the embodiment, when the voice data analyzed by the analysis processing unit 220 includes an unclear section, the conversion processing unit 230 refers to the conversion database 231 to identify the unclear word corresponding to the unclear section and the converted word and supplementary word corresponding to that unclear word. That is, when the voice data analyzed by the analysis processing unit 220 includes an unclear section, the conversion processing unit 230 refers to the conversion database 231, identifies the unclear word corresponding to the unclear section based on the degree of similarity between the unclear section and the predetermined voice data, and acquires the converted word and supplementary word corresponding to the unclear word.

Here, in the embodiment, to reduce the load of the above processing using the conversion database 231, the conversion processing unit 230 can refer to a user voice database 232 such as that shown in FIG. 3 before referring to the conversion database 231.

図３は、実施形態にかかるユーザ音声データベース２３２の例を示した例示的かつ模式的な図である。 FIG. 3 is an exemplary and schematic diagram showing an example of the user voice database 232 according to the embodiment.

図３に示されるように、ユーザ音声データベース２３２には、「ユーザＩＤ」と「音声データ」と「不明瞭語」との対応関係が予め設定されている。なお、図３に示される例において、「ユーザＩＤ」と「音声データ」と「不明瞭語」との各欄に設定された情報は、あくまで一例である。したがって、実施形態では、ユーザ音声データベース２３２の各欄に図３に示される例とは異なる情報が設定されていてもよい。 As shown in FIG. 3, the correspondence relationship between the "user ID", the "voice data", and the "unclear word" is preset in the user voice database 232. In the example shown in FIG. 3, the information set in each column of "user ID", "voice data", and "unclear word" is only an example. Therefore, in the embodiment, information different from the example shown in FIG. 3 may be set in each column of the user voice database 232.

「ユーザＩＤ」の欄には、端末装置１００のユーザを識別するための情報としてのユーザＩＤが設定される。ユーザＩＤは、ユーザごとに適宜割り当てられる。 In the "user ID" field, a user ID as information for identifying the user of the terminal device 100 is set. The user ID is appropriately assigned to each user.

「不明瞭語」の欄には、「ユーザＩＤ」の欄に設定されたユーザＩＤで識別されるユーザが過去に発話した不明瞭語が設定される。図３に示される例では、「不明瞭語」の欄に、４つの不明瞭語が設定されている。 In the "unclear word" column, an unclear word spoken in the past by the user identified by the user ID set in the "user ID" column is set. In the example shown in FIG. 3, four unclear words are set in the "unclear word" column.

また、「音声データ」の欄には、不明瞭語のユーザごとの発音を示すユーザ音声データが設定される。ユーザ音声データは、変換データベース２３１に予め設定された所定の音声データと異なり、ユーザの発話履歴に基づいている。 Further, in the "voice data" column, user voice data indicating the pronunciation of the unclear word for each user is set. The user voice data is different from predetermined voice data preset in the conversion database 231 and is based on the user's utterance history.

In the embodiment, when the voice data analyzed by the analysis processing unit 220 includes an unclear section, the conversion processing unit 230 attempts to identify the unclear word using the user voice database 232 described above before identifying it using the conversion database 231. That is, when the voice data analyzed by the analysis processing unit 220 includes an unclear section, the conversion processing unit 230 refers to the user voice database 232 before the conversion database 231 so as to identify the unclear word based on the degree of similarity between the unclear section and the user voice data. When an unclear word is identified from the user voice database 232, the conversion processing unit 230 acquires the converted word and supplementary word from the conversion database 231 based on that unclear word.
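The two-stage lookup described above (user voice database first, conversion database second) can be sketched as follows. This is a simplified illustration under stated assumptions: voice data is represented by plain strings, the matcher is a toy exact comparison rather than a similarity measure, and the function names, user IDs, and dictionary layout are hypothetical.

```python
def exact_match(section, references):
    """Toy matcher: return the word whose reference equals the section, else None.

    A real system would compare audio by similarity; equality keeps the
    sketch short and deterministic.
    """
    for word, reference in references.items():
        if reference == section:
            return word
    return None


def resolve(section, user_id, user_voice_db, conversion_db, match=exact_match):
    """Identify the unclear word for a section and return its
    (converted word, supplementary word), or None if nothing matches."""
    # Stage 1: try the user's own utterance history first.
    word = match(section, user_voice_db.get(user_id, {}))
    if word is None:
        # Stage 2: fall back to the predetermined voice data.
        generic = {w: e["voice_data"] for w, e in conversion_db.items()}
        word = match(section, generic)
    entry = conversion_db.get(word)
    if entry is None:
        return None  # the flow would then ask the user to speak again
    return entry["converted_word"], entry["supplementary_word"]


conversion_db = {
    "ichigatsu": {"voice_data": "generic-ichi",
                  "converted_word": "ichigatsu", "supplementary_word": "Mutsuki"},
    "shichigatsu": {"voice_data": "generic-shichi",
                    "converted_word": "nanagatsu", "supplementary_word": "Fumizuki"},
}
user_voice_db = {"user-001": {"shichigatsu": "user-001-shichi"}}
```

Consulting the per-user references first means that a speaker's habitual pronunciation, once recorded, can be resolved without scanning the generic table, which is the load reduction the description mentions.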

図１に戻り、応答処理部２４０は、変換処理部２３０により変換語が取得された場合、当該変換語を不明瞭語の代替として用いて、サーバ装置２００が端末装置１００から受信した音声データに対する応答を出力する。また、応答処理部２４０は、変換処理部２３０により変換語とともに補足語も取得された場合、変換語に加えて補足語をさらに用いて、応答を出力する。 Returning to FIG. 1, when the conversion processing unit 230 acquires the conversion word, the response processing unit 240 uses the conversion word as a substitute for the obscure word for the voice data received from the terminal device 100 by the server device 200. Output the response. Further, when the conversion processing unit 230 acquires the supplementary word together with the conversion word, the response processing unit 240 outputs a response by using the supplementary word in addition to the conversion word.

なお、検索処理部２５０は、応答処理部２４０が応答を出力するために必要な情報を検索する。検索処理部２５０による検索の結果に基づいて応答処理部２４０により出力された応答は、通信処理部２１０によって端末装置１００に送信される。 The search processing unit 250 searches for information necessary for the response processing unit 240 to output a response. The response output by the response processing unit 240 based on the search result by the search processing unit 250 is transmitted to the terminal device 100 by the communication processing unit 210.

以上の構成に基づき、実施形態にかかるサーバ装置２００は、端末装置１００からの音声データの受信に応じて、次の図４に示されるような一連の処理を実行する。 Based on the above configuration, the server device 200 according to the embodiment executes a series of processes as shown in FIG. 4 below in response to receiving voice data from the terminal device 100.

図４は、実施形態にかかるサーバ装置２００が端末装置１００からの音声データの受信に応じて実行する一連の処理を示した例示的なフローチャートである。 FIG. 4 is an exemplary flowchart showing a series of processes executed by the server device 200 according to the embodiment in response to receiving voice data from the terminal device 100.

図４に示されるように、実施形態では、まず、Ｓ４０１において、解析処理部２２０は、通信処理部２１０が端末装置１００から受信した音声データを取得する。 As shown in FIG. 4, in the embodiment, first, in S401, the analysis processing unit 220 acquires the voice data received from the terminal device 100 by the communication processing unit 210.

そして、Ｓ４０２において、解析処理部２２０は、Ｓ４０１で取得された音声データに対して解析データベース２２１を用いた解析を実行し、音声データのうち認識率が閾値を下回る形態素を示す不明瞭区間が存在するか否かを判定する。なお、不明瞭区間は、複数存在しうる。 Then, in S402, the analysis processing unit 220 executes an analysis using the analysis database 221 on the voice data acquired in S401, and there is an unclear section indicating a morpheme whose recognition rate is below the threshold value in the voice data. Determine whether or not to do so. There may be a plurality of unclear sections.

Ｓ４０２において、不明瞭区間が存在しないと判定された場合、そのまま処理が終了する。しかしながら、Ｓ４０２において、不明瞭区間が存在すると判定された場合、Ｓ４０３に処理が進む。 If it is determined in S402 that there is no unclear section, the process ends as it is. However, if it is determined in S402 that an unclear section exists, the process proceeds to S403.

そして、Ｓ４０３において、変換処理部２３０は、不明瞭区間に基づいてユーザ音声データベース２３２を参照する。 Then, in S403, the conversion processing unit 230 refers to the user voice database 232 based on the unclear section.

Then, in S404, the conversion processing unit 230 determines whether user voice data similar to the unclear section, that is, user voice data whose degree of similarity to the unclear section is at or above a certain level, exists in the user voice database 232.

Ｓ４０４において、不明瞭区間と類似したユーザ音声データがユーザ音声データベース２３２内に存在すると判定された場合、Ｓ４０５に処理が進む。そして、Ｓ４０５において、変換処理部２３０は、不明瞭区間と類似したユーザ音声データに対応した不明瞭語をユーザ音声データベース２３２から特定する。 If it is determined in S404 that the user voice data similar to the unclear section exists in the user voice database 232, the process proceeds to S405. Then, in S405, the conversion processing unit 230 identifies an unclear word corresponding to the user voice data similar to the unclear section from the user voice database 232.

一方、Ｓ４０４において、不明瞭区間と類似したユーザ音声データがユーザ音声データベース２３２内に存在しないと判定された場合、Ｓ４０６に処理が進む。そして、Ｓ４０６において、変換処理部２３０は、ユーザ音声データをユーザ音声データベース２３２に新たに追加するように、ユーザ音声データベース２３２を更新する。 On the other hand, if it is determined in S404 that the user voice data similar to the unclear section does not exist in the user voice database 232, the process proceeds to S406. Then, in S406, the conversion processing unit 230 updates the user voice database 232 so as to newly add the user voice data to the user voice database 232.

Ｓ４０５またはＳ４０６の処理が完了すると、Ｓ４０７に処理が進む。そして、Ｓ４０７において、変換処理部２３０は、Ｓ４０５またはＳ４０６の処理の結果に基づいて、変換データベース２３１を参照する。 When the processing of S405 or S406 is completed, the processing proceeds to S407. Then, in S407, the conversion processing unit 230 refers to the conversion database 231 based on the processing result of S405 or S406.

そして、Ｓ４０８において、変換処理部２３０は、該当する不明瞭語が変換データベース２３１内に存在するか否かを判定する。より具体的に、変換処理部２３０は、Ｓ４０５を経たＳ４０８においては、Ｓ４０５で特定された不明瞭語と一致する不明瞭語が変換データベース２３１内に存在するか否かを判定し、Ｓ４０６を経たＳ４０８においては、不明瞭区間と類似した音声データに対応した不明瞭語が変換データベース２３１内に存在するか否かを判定する。 Then, in S408, the conversion processing unit 230 determines whether or not the corresponding unclear word exists in the conversion database 231. More specifically, the conversion processing unit 230 determines whether or not an unclear word matching the unclear word specified in S405 exists in the conversion database 231 in S408 that has passed through S405, and has passed through S406. In S408, it is determined whether or not an unclear word corresponding to the voice data similar to the unclear section exists in the conversion database 231.

Ｓ４０８において、該当する不明瞭語が存在しないと判定された場合、Ｓ４０１で取得された音声データの意味を適切に解釈できないので、端末装置１００に適切な応答を返すことができない。したがって、この場合、応答処理部２４０は、ユーザの再発話を促す通知を端末装置１００への応答として出力する。応答は、通信処理部２１０を介して端末装置１００に送信され、端末装置１００の出力処理部１３０を介してユーザに出力される。そして、処理が終了する。 If it is determined in S408 that the corresponding unclear word does not exist, the meaning of the voice data acquired in S401 cannot be properly interpreted, and therefore an appropriate response cannot be returned to the terminal device 100. Therefore, in this case, the response processing unit 240 outputs a notification prompting the user to speak again as a response to the terminal device 100. The response is transmitted to the terminal device 100 via the communication processing unit 210, and is output to the user via the output processing unit 130 of the terminal device 100. Then, the process ends.

一方、Ｓ４０８において、該当する不明瞭語が存在すると判定された場合、Ｓ４０１で取得された音声データの意味を適切に解釈できるので、端末装置１００に適切な応答を返すことができると見込まれる。したがって、この場合、そのまま処理が終了することなく、Ｓ４１０に処理が進む。 On the other hand, when it is determined in S408 that the corresponding unclear word exists, the meaning of the voice data acquired in S401 can be appropriately interpreted, and it is expected that an appropriate response can be returned to the terminal device 100. Therefore, in this case, the process proceeds to S410 without ending the process as it is.

そして、Ｓ４１０において、変換処理部２３０は、該当する不明瞭語に対応した変換語および補足語を変換データベース２３１から取得する。 Then, in S410, the conversion processing unit 230 acquires the conversion word and the complement word corresponding to the corresponding unclear word from the conversion database 231.

そして、Ｓ４１１において、変換処理部２３０は、全ての不明瞭区間に対応した全ての不明瞭語が特定済みであるか否かを判定する。 Then, in S411, the conversion processing unit 230 determines whether or not all the unclear words corresponding to all the unclear sections have been specified.

Ｓ４１１において、一部の不明瞭語が特定されていない判定された場合、次の不明瞭語の特定のため、Ｓ４０３に処理が戻る。しかしながら、Ｓ４１１において、全ての不明瞭語が特定済みであると判定された場合、Ｓ４１２に処理が進む。 If it is determined in S411 that some unclear words have not been specified, the process returns to S403 to specify the next unclear word. However, if it is determined in S411 that all the unclear words have been specified, the process proceeds to S412.

Then, in S412, the response processing unit 240 outputs a response to the voice data acquired in S401 using the converted word and supplementary word acquired in S410. More specifically, the response processing unit 240 outputs the response by using the converted word as a substitute for the unclear word and by using the supplementary word as an addition to the converted word. At this time, the response processing unit 240 may cause the search processing unit 250 to execute a search as necessary and output the response using the result of the search. The response output by the response processing unit 240 is transmitted to the terminal device 100 via the communication processing unit 210 and is output to the user via the output processing unit 130 of the terminal device 100.
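The composition of the response in S412 can be illustrated with a small sketch. The sentence template, the parenthesized placement of the supplementary word, and the function names are assumptions for this example; the source only states that the converted word replaces the unclear word and that the supplementary word is added to it.

```python
def compose_subject(converted_word, supplementary_word=None):
    """Place the supplementary word next to the converted word, if one is set."""
    if supplementary_word:
        return f"{converted_word} ({supplementary_word})"
    return converted_word


def compose_response(converted_word, supplementary_word, holidays):
    """Build a response sentence from the converted word, the supplementary
    word, and search results (here, a list of holiday names)."""
    subject = compose_subject(converted_word, supplementary_word)
    return f"The holidays in {subject} are {' and '.join(holidays)}."


# Hypothetical search results, as if returned by the search processing unit.
response = compose_response("January", "Mutsuki",
                            ["New Year's Day", "Coming of Age Day"])
```

Repeating the month under its traditional name gives the user a second, differently sounding cue, so a misrecognition is easier to notice from the response alone.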

そして、Ｓ４１３において、応答処理部２４０は、Ｓ４１２で出力された応答に対する端末装置１００のユーザからの訂正の要求が通信処理部２１０を介して受信されたか否かを判定する。 Then, in S413, the response processing unit 240 determines whether or not the request for correction from the user of the terminal device 100 to the response output in S412 is received via the communication processing unit 210.

If it is determined in S413 that a correction request has been received, the process proceeds to S414. In S414, the response processing unit 240 refers to, for example, the "misconverted word" column of the conversion database 231 and outputs the next response according to the user's correction. The process then returns to S413.

On the other hand, if it is determined in S413 that no correction request has been received, the process proceeds to S415. In S415, the conversion processing unit 230 updates the user voice database 232 based on the unclear word used to create the current response and the user voice data. The process then ends.
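The correction loop of S413 to S415 can be sketched as follows. This is a simplified model under stated assumptions: the user's feedback is represented by a callback, the conversion database is a plain dictionary, the misconverted-word values are placeholders, and the guard against revisiting an already tried word is an addition of this sketch that the source does not describe.

```python
def run_correction_loop(initial_word, conversion_db, correction_requested):
    """Return (accepted unclear word or None, list of words tried).

    correction_requested(word) models S413: it returns True when the user
    asks for a correction of the response built from that word.
    """
    word = initial_word
    tried = [word]
    while correction_requested(word):                           # S413
        alternative = conversion_db[word]["misconverted_word"]  # S414
        if alternative is None or alternative in tried or alternative not in conversion_db:
            return None, tried  # no further interpretation to offer
        word = alternative
        tried.append(word)
    # S415 would update the user voice database with the accepted word here.
    return word, tried


# Placeholder table: each word names the other as its similar-sounding word.
conversion_db = {
    "shichigatsu": {"misconverted_word": "ichigatsu"},
    "ichigatsu": {"misconverted_word": "shichigatsu"},
}

# The user meant "ichigatsu" and asks for a correction of anything else.
accepted, tried = run_correction_loop(
    "shichigatsu", conversion_db, lambda w: w != "ichigatsu")
```

In this run the first interpretation is rejected once, the misconverted word is tried next, and the loop ends when no further correction is requested.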

以上の処理に基づき、実施形態では、端末装置１００のユーザとサーバ装置２００との間で、たとえば次の図５に示されるような会話が実行されうる。 Based on the above processing, in the embodiment, a conversation as shown in FIG. 5, for example, can be executed between the user of the terminal device 100 and the server device 200.

図５は、実施形態において端末装置１００のユーザとサーバ装置２００との間で実行される会話の一例を示した例示的かつ模式的な図である。 FIG. 5 is an exemplary and schematic diagram showing an example of a conversation performed between a user of a terminal device 100 and a server device 200 in an embodiment.

図５に示される例では、まず、端末装置１００のユーザにより、「１月の祝日をおしえて」という発話が実行される。この発話は、「１月」という、不明瞭語となりうる不明瞭区間を含んでいる。 In the example shown in FIG. 5, first, the user of the terminal device 100 executes the utterance "Tell me a holiday in January". This utterance contains an obscure section of "January," which can be an obscure word.

Here, consider the case where the server device 200 identifies the unclear section "1月" (January) as the unclear word "ichigatsu". In this case, based on the conversion database 231 (see FIG. 2), the server device 200 acquires the converted word "ichigatsu" and the supplementary word "Mutsuki" (睦月). The server device 200 therefore gives the response "The holidays in January, Mutsuki, are New Year's Day and seijin-shiki" (with "1月" pronounced "ichigatsu"). The information "New Year's Day" (元日) and "seijin-shiki" (成人式, the Coming-of-Age ceremony) is acquired based on the result of the search by the search processing unit 250.

The example shown in FIG. 5 is one in which the first interpretation by the server device 200 is correct. In this case, the user indicates to the server device 200 that nothing is wrong with the conversation by not responding (or by giving some response that indicates approval). The conversation then ends.

一方、実施形態では、次の図６に示される例のような、サーバ装置２００の最初の解釈が正しくない例も想定される。 On the other hand, in the embodiment, an example in which the initial interpretation of the server device 200 is incorrect, such as the example shown in FIG. 6 below, is also assumed.

FIG. 6 is an exemplary and schematic diagram showing another example, different from that of FIG. 5, of a conversation performed between the user of the terminal device 100 and the server device 200 in the embodiment.

In the example shown in FIG. 6, the user of the terminal device 100 first utters "１月の祝日をおしえて" ("Tell me the holidays in January"). This utterance contains the unclear section "１月" (January). Up to this point, the example is the same as that shown in FIG. 5.

Here, in the example shown in FIG. 6, unlike the example shown in FIG. 5, the unclear section "１月" is identified as the unclear word "しちがつ" (shichigatsu, July). In this case, the server device 200 acquires the conversion word "なながつ" (nanagatsu) and the supplementary word "文月" (Fumizuki, the traditional name for July) based on the conversion database 231 (see FIG. 2). The server device 200 therefore responds "７月文月の祝日は海の日とスポーツの日です" ("The holidays in July, Fumizuki, are Marine Day and Sports Day"), with "７月" pronounced "nanagatsu". The information "海の日" (Marine Day) and "スポーツの日" (Sports Day) is acquired from the result of the search by the search processing unit 250.

In the above conversation, the user said "１月" (January), not "７月" (July). The user therefore makes an utterance indicating that the interpretation by the server device 200 is incorrect, such as "７月は違う" ("Not July"), and requests the server device 200 to correct the response.

The server device 200 then acquires the misconversion word "いちがつ" (ichigatsu) based on the conversion database 231 (see FIG. 2). Using this misconversion word as the unclear word, the server device 200 acquires a conversion word and a supplementary word from the conversion database 231. As a result, the server device 200 responds "申し訳ありませんでした。１月睦月の祝日は元日と成人式です。" ("I apologize. The holidays in January, Mutsuki, are New Year's Day and seijin-shiki."), with "１月" pronounced "ichigatsu". The information "元日" (New Year's Day) and "成人式" (seijin-shiki) is acquired from the result of the search by the search processing unit 250.

In the above conversation, the second response by the server device 200 is correct. In this case, the user indicates to the server device 200 that nothing is wrong with the conversation by not responding (or by giving some response that indicates approval). The conversation then ends.
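The lookups in the FIG. 5 and FIG. 6 conversations can be sketched as a small table keyed by unclear word. This is a minimal illustration, not the schema of the conversion database 231: the names `ConversionEntry`, `CONVERSION_DB`, `lookup`, and `reinterpret` are hypothetical, and the misconversion word registered for "いちがつ" is an assumption (the text only states that "いちがつ" is the misconversion word for "しちがつ").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionEntry:
    conversion_word: str     # word with the same meaning, used in the response
    complement_word: str     # supplementary word spoken alongside it
    misconversion_word: str  # unclear word to retry with after a correction


# Hypothetical contents, keyed by unclear word.
CONVERSION_DB = {
    "いちがつ": ConversionEntry("いちがつ", "睦月", "しちがつ"),  # misconversion assumed
    "しちがつ": ConversionEntry("なながつ", "文月", "いちがつ"),
}


def lookup(unclear_word):
    """Return (conversion word, supplementary word) for an unclear word."""
    entry = CONVERSION_DB[unclear_word]
    return entry.conversion_word, entry.complement_word


def reinterpret(unclear_word):
    """On a correction request, retry using the misconversion word
    as the new unclear word and look up its conversion data."""
    retry_word = CONVERSION_DB[unclear_word].misconversion_word
    return retry_word, lookup(retry_word)
```

In the FIG. 6 flow, `lookup("しちがつ")` corresponds to the first (incorrect) response and `reinterpret("しちがつ")` to the corrected one.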

最後に、実施形態にかかるサーバ装置２００のハードウェア構成について説明する。実施形態にかかるサーバ装置２００は、たとえば次の図７に示されるようなハードウェア構成を有するコンピュータ７００として構成される。 Finally, the hardware configuration of the server device 200 according to the embodiment will be described. The server device 200 according to the embodiment is configured as, for example, a computer 700 having a hardware configuration as shown in FIG. 7 below.

図７は、実施形態にかかるサーバ装置２００を構成するコンピュータ７００のハードウェア構成の例を示した例示的かつ模式的なブロック図である。 FIG. 7 is an exemplary and schematic block diagram showing an example of the hardware configuration of the computer 700 constituting the server device 200 according to the embodiment.

As shown in FIG. 7, the computer 700 includes a processor 710, a memory 720, a storage 730, an input/output interface (I/F) 740, and a communication interface (I/F) 750. These hardware components are connected to a bus 760.

プロセッサ７１０は、たとえばＣＰＵ（Central Processing Unit）として構成され、コンピュータ７００の各部の動作を統括的に制御する。 The processor 710 is configured as, for example, a CPU (Central Processing Unit), and controls the operation of each part of the computer 700 in an integrated manner.

The memory 720 includes, for example, a ROM (Read Only Memory) and a RAM (Random Access Memory). It provides volatile or non-volatile storage of various data, such as programs executed by the processor 710, and provides a work area in which the processor 710 executes programs.

ストレージ７３０は、たとえばＨＤＤ（Hard Disk Drive）またはＳＳＤ（Solid State Drive）を含み、各種のデータを不揮発的に記憶する。 The storage 730 includes, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and stores various data in a non-volatile manner.

The input/output interface 740 controls, for example, the input of data to the computer 700 from input devices (not shown) such as a keyboard and a mouse, and the output of data from the computer 700 to output devices (not shown) such as a display and a speaker.

通信インターフェース７５０は、コンピュータ７００が他の装置と通信を実行することを可能にする。 The communication interface 750 allows the computer 700 to perform communication with other devices.

In the embodiment, each component of the server device 200 shown in FIG. 1 is realized as a functional module through cooperation between hardware and software, as a result of the processor 710 executing an information processing program stored in the memory 720, the storage 730, or the like. However, in the embodiment, at least some of the functional modules shown in FIG. 1 may be realized by dedicated hardware alone.

Note that the information processing program according to the above-described embodiment does not necessarily have to be stored in advance in the memory 720 or the storage 730. For example, the program may be provided as a computer program product recorded, in an installable or executable format, on a computer-readable recording medium such as a magnetic disk (for example, a flexible disk (FD)) or an optical disk (for example, a DVD (Digital Versatile Disk)).

また、上述した実施形態にかかる情報処理プログラムは、インターネットなどのネットワーク経由で提供または配布されてもよい。すなわち、上述した実施形態にかかる情報処理プログラムは、インターネットなどのネットワークに接続されたコンピュータ上に格納された状態で、ネットワーク経由でのダウンロードを受け付ける、といった形で提供されてもよい。 Further, the information processing program according to the above-described embodiment may be provided or distributed via a network such as the Internet. That is, the information processing program according to the above-described embodiment may be provided in a state of being stored on a computer connected to a network such as the Internet and accepting downloads via the network.

Note that, in the embodiment, the terminal device 100 can also be configured as a computer 700 having the hardware configuration shown in FIG. 7, like the server device 200. Accordingly, as a modification of the embodiment, a configuration in which at least some of the functional modules of the server device 200 are implemented in the terminal device 100 is conceivable. In such modifications, the terminal device 100 may correspond to the "information processing device" of the present disclosure, or an information processing system combining the terminal device 100 and the server device 200 may correspond to it.

As described above, the server device 200 according to the embodiment includes the analysis processing unit 220 and the conversion processing unit 230. The analysis processing unit 220 acquires morphemes contained in text data, obtained by speech recognition as data indicating the content of the voice data received from the terminal device 100, together with a recognition rate indicating the certainty of the speech recognition for each morpheme. When an unclear word whose pronunciation is similar to that of another word is identified from a morpheme whose recognition rate is below a threshold, the conversion processing unit 230 acquires, from the unclear word, a conversion word preset as a word having the same meaning as the unclear word.

With this configuration, unclear words that may arise during speech recognition can be handled appropriately by using a conversion word that has the same meaning as the unclear word.
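The threshold step described above can be sketched as a simple filter over per-morpheme recognition rates. The `Morpheme` type, the function name, and the default threshold of 0.8 are assumptions made for illustration; the disclosure does not give a concrete threshold value or data layout.

```python
from typing import NamedTuple


class Morpheme(NamedTuple):
    surface: str             # text of the morpheme from speech recognition
    recognition_rate: float  # certainty of recognition, assumed to be in [0, 1]


def find_unclear_candidates(morphemes, threshold=0.8):
    """Return morphemes whose recognition rate is below the threshold.

    These are only candidates: an unclear word is identified from them
    in a later step, so this filter alone does not decide the outcome.
    """
    return [m for m in morphemes if m.recognition_rate < threshold]
```

For example, given morphemes for "１月", "の", "祝日" with rates 0.55, 0.98, and 0.93, only "１月" would be passed on for unclear-word identification.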

Here, the server device 200 according to the embodiment further includes a response processing unit 240 that, when a conversion word is acquired, outputs a response to the voice data using the conversion word as a substitute for the unclear word. With this configuration, an appropriate response based on the conversion word can be output.

Further, in the embodiment, when a conversion word is acquired, the conversion processing unit 230 further acquires a supplementary word preset as a word that supplements the conversion word. The response processing unit 240 then outputs the response using the supplementary word in addition to the conversion word. With this configuration, a more appropriate response based on both the conversion word and the supplementary word can be output.
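How a response might combine the two words can be sketched as below. The function name, the `{term}` placeholder, and the choice to concatenate the supplementary word directly after the conversion word (mirroring the "７月文月" pattern in the FIG. 6 example) are assumptions; the disclosure does not specify how the response text is assembled.

```python
def build_response(template, conversion_word, complement_word=None):
    """Fill a response template with the conversion word, followed by
    the supplementary word when one is available.

    The template must contain a single "{term}" placeholder.
    """
    term = conversion_word if complement_word is None else conversion_word + complement_word
    return template.format(term=term)
```

Calling `build_response("{term}の祝日は海の日とスポーツの日です", "なながつ", "文月")` yields the reading of the FIG. 6 response, with "なながつ" spoken in place of the easily confused "しちがつ".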

More specifically, the server device 200 according to the embodiment includes a conversion database 231 in which a correspondence among an unclear word, a conversion word, a supplementary word, and predetermined voice data indicating the pronunciation of the unclear word is set in advance. The conversion processing unit 230 refers to the conversion database 231 and identifies the unclear word corresponding to an unclear section, which is the portion of the voice data received from the terminal device 100 that represents a morpheme whose recognition rate is below the threshold, based on the similarity between the unclear section and the predetermined voice data set in the conversion database 231. It then acquires the conversion word and the supplementary word corresponding to that unclear word. With this configuration, the unclear word can be identified, and the corresponding conversion word and supplementary word acquired, easily on the basis of the conversion database 231.
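The similarity-based identification can be sketched as a nearest-match search. The disclosure does not name a similarity measure or an audio representation, so everything below is an assumption for illustration: audio is represented as a plain feature vector, cosine similarity is swapped in as the measure, and `min_similarity` is a hypothetical cut-off.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_unclear_word(section_features, reference_features, min_similarity=0.9):
    """Return the unclear word whose reference voice data is most similar
    to the unclear section, or None if nothing reaches min_similarity.

    reference_features maps each unclear word to the feature vector of
    its predetermined voice data (a stand-in for the database contents).
    """
    best_word, best_score = None, min_similarity
    for word, reference in reference_features.items():
        score = cosine_similarity(section_features, reference)
        if score >= best_score:
            best_word, best_score = word, score
    return best_word
```

Returning `None` when no entry is similar enough keeps a low-confidence section from being forced onto an unrelated unclear word.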

Further, the server device 200 according to the embodiment includes a user voice database 232 in which a correspondence between an unclear word and user voice data indicating each user's pronunciation of that unclear word is set in advance. The conversion processing unit 230 refers to the user voice database 232 before the conversion database 231, so as to identify the unclear word based on the similarity between the unclear section and the user voice data. When an unclear word is identified from the user voice database 232, the conversion processing unit 230 acquires the conversion word and the supplementary word from the conversion database 231 based on that unclear word. With this configuration, the unclear word can be identified, and the corresponding conversion word and supplementary word acquired, even more easily on the basis of the two databases, the conversion database 231 and the user voice database 232.
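The lookup order (user voice data first, generic reference data second) can be sketched as follows. This is a toy model under loud assumptions: kana strings stand in for audio, `difflib.SequenceMatcher` stands in for an acoustic similarity measure, and the function names and the `min_similarity` cut-off are hypothetical.

```python
from difflib import SequenceMatcher


def toy_similarity(a, b):
    """String similarity in [0, 1]; a stand-in for comparing audio."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(section, references, min_similarity):
    """Return the best-matching unclear word in one database, or None."""
    best_word, best_score = None, min_similarity
    for word, reference in references.items():
        score = toy_similarity(section, reference)
        if score >= best_score:
            best_word, best_score = word, score
    return best_word


def identify_with_user_priority(section, user_refs, generic_refs, min_similarity=0.9):
    """Consult the user's own pronunciations before the generic ones."""
    for references in (user_refs, generic_refs):
        word = best_match(section, references, min_similarity)
        if word is not None:
            return word
    return None
```

With this ordering, a user whose "しちがつ" habitually sounds like "いちがつ" is matched against their own recorded pronunciation before the generic reference data is considered.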

以上、本開示の実施形態を説明したが、上述した実施形態は、例として提示したものであり、発明の範囲を限定することは意図していない。上述した新規な実施形態は、その他の様々な形態で実施されることが可能であり、発明の要旨を逸脱しない範囲で、種々の省略、置き換え、変更を行うことができる。上述した実施形態およびその変形は、発明の範囲や要旨に含まれるとともに、特許請求の範囲に記載された発明とその均等の範囲に含まれる。 Although the embodiments of the present disclosure have been described above, the above-described embodiments are presented as examples and are not intended to limit the scope of the invention. The novel embodiment described above can be implemented in various other embodiments, and various omissions, replacements, and changes can be made without departing from the gist of the invention. The above-described embodiments and modifications thereof are included in the scope and gist of the invention, and are also included in the scope of the invention described in the claims and the equivalent scope thereof.

200 Server device (information processing device)
220 Analysis processing unit (acquisition processing unit)
230 Conversion processing unit
231 Conversion database
232 User voice database
240 Response processing unit

An information processing device as an example of the present disclosure includes an acquisition processing unit, a conversion processing unit, a response processing unit, and a conversion database. The acquisition processing unit acquires morphemes contained in text data, obtained by speech recognition as data indicating the content of voice data, together with a recognition rate indicating the certainty of the speech recognition for each morpheme. When an unclear word whose pronunciation is similar to that of another word is identified from a morpheme whose recognition rate is below a threshold, the conversion processing unit acquires, from the unclear word, a conversion word preset as a word having the same meaning as the unclear word and, when the conversion word is acquired, further acquires a supplementary word preset as a word that supplements the conversion word. When the conversion word is acquired, the response processing unit outputs a response to the voice data using the conversion word as a substitute for the unclear word and additionally using the supplementary word. The conversion database holds a preset correspondence among the unclear word, the conversion word, the supplementary word, and predetermined voice data indicating the pronunciation of the unclear word. The conversion processing unit refers to the conversion database, identifies the unclear word corresponding to an unclear section, which is the portion of the voice data representing a morpheme whose recognition rate is below the threshold, based on the similarity between the unclear section and the predetermined voice data set in the conversion database, and acquires the conversion word and the supplementary word corresponding to that unclear word.

Claims

An information processing device comprising:
an acquisition processing unit that acquires morphemes contained in text data, obtained by speech recognition as data indicating the content of voice data, together with a recognition rate indicating the certainty of the speech recognition for each morpheme; and
a conversion processing unit that, when an unclear word whose pronunciation is similar to that of another word is identified from a morpheme whose recognition rate is below a threshold, acquires, from the unclear word, a conversion word preset as a word having the same meaning as the unclear word.

The information processing device according to claim 1, further comprising a response processing unit that, when the conversion word is acquired, outputs a response to the voice data using the conversion word as a substitute for the unclear word.

The information processing device according to claim 2, wherein:
when the conversion word is acquired, the conversion processing unit further acquires a supplementary word preset as a word that supplements the conversion word; and
the response processing unit outputs the response using the supplementary word in addition to the conversion word.

The information processing device according to claim 3, further comprising a conversion database in which a correspondence among the unclear word, the conversion word, the supplementary word, and predetermined voice data indicating the pronunciation of the unclear word is set in advance,
wherein the conversion processing unit refers to the conversion database, identifies the unclear word corresponding to an unclear section, which is the portion of the voice data representing the morpheme whose recognition rate is below the threshold, based on the similarity between the unclear section and the predetermined voice data set in the conversion database, and acquires the conversion word and the supplementary word corresponding to the unclear word.

The information processing device according to claim 4, further comprising a user voice database in which a correspondence between the unclear word and user voice data indicating each user's pronunciation of the unclear word is set in advance,
wherein the conversion processing unit refers to the user voice database before the conversion database so as to identify the unclear word based on the similarity between the unclear section and the user voice data and, when the unclear word is identified from the user voice database, acquires the conversion word and the supplementary word from the conversion database based on the unclear word identified from the user voice database.

An information processing program that causes a computer to execute:
acquiring morphemes contained in text data, obtained by speech recognition as data indicating the content of voice data, together with a recognition rate indicating the certainty of the speech recognition for each morpheme; and
when an unclear word whose pronunciation is similar to that of another word is identified from a morpheme whose recognition rate is below a threshold, acquiring, from the unclear word, a conversion word preset as a word having the same meaning as the unclear word.