JP6568813B2

JP6568813B2 - Information processing apparatus, voice recognition method, and program

Info

Publication number: JP6568813B2
Application number: JP2016032218A
Authority: JP
Inventors: 一比良松井; 誠司河村; 町田　健一; 健一町田
Original assignee: NTT TechnoCross Corp
Current assignee: NTT TechnoCross Corp
Priority date: 2016-02-23
Filing date: 2016-02-23
Publication date: 2019-08-28
Anticipated expiration: 2036-02-23
Also published as: JP2017151210A

Description

The present invention relates to an information processing apparatus, a speech recognition method, and a program.

In call centers such as contact centers and customer centers, conversations between operators and customers are converted into text, and the converted text is used to analyze and monitor how operators are handling calls. In addition, in some operations a manager (supervisor) monitors conversations between operators and customers in real time and takes appropriate action in real time according to how an operator is handling a call.

As a conventional technique related to call centers, there is, for example, the technique disclosed in Patent Document 1.

Patent Document 1: Japanese Patent Application Laid-Open No. 2015-211403

The number of customers handled by operators at a call center is not constant. For example, the number of inquiries from customers tends to be higher on weekdays than at night, and higher on holidays than on weekdays. The load on the information processing apparatus that converts conversations between operators and customers into text therefore varies with the date and time.

If the resources of the information processing apparatus run short, conversations between operators and customers may not be converted in real time, and the call center manager may be unable to take appropriate action. One conceivable solution is to provision enough resources for the information processing apparatus to match the peak number of customer inquiries, but this increases cost because it involves adding hardware and the like.

The present invention has been made in view of the above, and its object is to provide a technique capable of suppressing an increase in the processing load of an information processing apparatus that performs speech conversion processing.

An information processing apparatus according to an embodiment of the present invention is an information processing apparatus having a plurality of speech recognition modules that perform processing for converting speech into characters, and includes: input means for receiving speech input; selection means for selecting, when speech input is received by the input means, a predetermined speech recognition module from among the plurality of speech recognition modules based on the processing load of the information processing apparatus; and output means for outputting characters into which the speech received by the input means has been converted by the predetermined speech recognition module.

According to the embodiment of the present invention, a technique is provided that can suppress an increase in the processing load of an information processing apparatus that performs speech conversion processing.

FIG. 1 is a diagram showing an example of the functional configuration of the information processing apparatus according to the present embodiment.
FIG. 2 is a diagram showing the structure of a DNN.
FIG. 3 is a diagram showing an example of selection information.
FIG. 4 is a flowchart showing the processing procedure performed by the information processing apparatus according to the embodiment.
FIG. 5 is a diagram showing how the selected speech recognition module changes according to the degree of congestion of the call center.
FIG. 6 is a diagram showing an example of selection information (modification).

Embodiments of the present invention are described below with reference to the drawings. The embodiment described below is only an example, and embodiments to which the present invention is applied are not limited to it. The following embodiment is described using, as an example, an information processing apparatus that converts speech into text in a call center, but embodiments of the present invention are not limited to this and can be applied to information processing apparatuses in general that convert speech into text.

<Overview, functional configuration>
The information processing apparatus 10 according to the present embodiment has a plurality of speech recognition modules that convert speech into text. When a conversation between an operator and a customer starts, the information processing apparatus 10 selects an appropriate predetermined speech recognition module from among the plurality of speech recognition modules based on its own processing load. The information processing apparatus 10 then converts the conversation (speech) between the operator and the customer into text using the selected speech recognition module.

FIG. 1 is a diagram showing an example of the functional configuration of the information processing apparatus according to the present embodiment. As shown in FIG. 1, the information processing apparatus 10 includes an input unit 101, a load monitoring unit 102, a selection unit 103, a storage unit 104, a conversion processing unit 105, and an output unit 107. The conversion processing unit 105 includes a plurality of speech recognition modules 106. Each of these units is realized by processing that one or more programs installed in the information processing apparatus 10 cause the CPU of the information processing apparatus 10 to execute. The input unit 101, the load monitoring unit 102, the selection unit 103, the storage unit 104, the conversion processing unit 105, and the output unit 107 may each be realized on a different computer, or the computers may be distributed in even finer units. That is, the information processing apparatus 10 may be realized using one or more computers. The one or more computers may be virtual servers that use virtualization technology, or virtual servers implemented on a cloud.

In the present embodiment, DNNs (deep neural networks, a deep learning technique) are used as the speech recognition modules 106 to perform conversion with high accuracy (recognition rate). The speech recognition modules 106 differ from one another in speech recognition rate and conversion speed. FIG. 1 shows four speech recognition modules 106: DNN(1) 106_1, DNN(2) 106_2, DNN(3) 106_3, and DNN(4) 106_4. DNN(1) 106_1 has the highest speech recognition rate (and, conversely, the slowest conversion speed), and the speech recognition rate decreases (and the conversion speed increases) in the order DNN(1) 106_1 through DNN(4) 106_4. The higher the speech recognition rate of a DNN (and the slower its conversion speed), the higher the processing load required for speech conversion. In other words, the processing load decreases in the order DNN(1) 106_1 through DNN(4) 106_4. Each of the speech recognition modules 106 is assumed to have at least a certain level of speech conversion accuracy. Although FIG. 1 shows four DNNs, the number of DNNs is not limited, and five or more may be provided. Hereinafter, DNN(1) 106_1 through DNN(4) 106_4 are referred to as "DNN 106" when they need not be distinguished.

As shown in FIG. 2, each of DNN(1) 106_1 through DNN(4) 106_4 according to the present embodiment is composed of multiple layers: an input layer that receives speech, an output layer that outputs text, and hidden layers located between the input layer and the output layer. Each layer is composed of a plurality of units.
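To make the layered structure concrete, the following Python sketch counts multiply-accumulate operations (MACs) for one forward pass through a network of this shape. It assumes fully connected layers of equal width; the function name, the 440 input features, and the 2000 output classes are hypothetical values chosen for illustration and are not taken from the embodiment.

```python
def dense_macs(input_dim: int, hidden_layers: int, units: int, output_dim: int) -> int:
    """Multiply-accumulate count for one forward pass of a fully connected net."""
    if hidden_layers < 1:
        raise ValueError("at least one hidden layer is required")
    first = input_dim * units                      # input layer -> first hidden layer
    between = (hidden_layers - 1) * units * units  # hidden layer -> hidden layer
    last = units * output_dim                      # last hidden layer -> output layer
    return first + between + last


# Hypothetical sizes: 440 input features, 8 hidden layers, 2000 output classes.
full = dense_macs(440, 8, 2048, 2000)
half = dense_macs(440, 8, 1024, 2000)
print(f"full width: {full:,} MACs, half width: {half:,} MACs, ratio: {half / full:.2f}")
```

Under these assumptions the hidden-to-hidden term grows with the square of the unit count, so narrowing the layers reduces the amount of calculation faster than linearly.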

Basically, reducing the number of units in each layer reduces the amount of calculation, and the processing load required for the processing falls accordingly. A smaller number of units per layer would be expected to lower the speech recognition rate as well, but experience has shown that the degradation in the speech recognition rate is small relative to the reduction in the number of units per layer (for example, even when the amount of calculation is halved, the speech recognition rate degrades by only a few percent). The present embodiment exploits this property: when the processing load of the information processing apparatus 10 itself is high, a speech recognition module 106 with a small amount of calculation is selected, which makes it possible to suppress the rise in the load on the information processing apparatus 10 while maintaining a certain level of speech conversion accuracy. Returning to FIG. 1, the description continues.

The input unit 101 has a function of receiving the speech of a conversation when the conversation starts between an operator and a customer. When receiving speech input, the input unit 101 asks the selection unit 103 which speech recognition module 106 (one of DNN(1) 106_1 through DNN(4) 106_4) the speech data should be passed to, and transmits the speech data to the speech recognition module 106 indicated by the selection unit 103. Once the input unit 101 has started transmitting speech data to a speech recognition module 106, it continues to transmit speech data to the same speech recognition module 106 until the conversation between the operator and the customer ends (that is, until the speech session ends). In other words, in the information processing apparatus 10 according to the present embodiment, a speech recognition module 106, once selected, is not changed until the conversation between the operator and the customer ends.

The load monitoring unit 102 has a function of monitoring the processing load of the information processing apparatus 10 itself. In response to an inquiry from the selection unit 103, the load monitoring unit 102 reports the processing load of the information processing apparatus 10.

In response to an inquiry from the input unit 101, the selection unit 103 indicates which speech recognition module 106 (in the example of FIG. 1, one of DNN(1) 106_1 through DNN(4) 106_4) the speech data should be passed to. More specifically, on receiving an inquiry from the input unit 101, the selection unit 103 asks the load monitoring unit 102 for the processing load of the information processing apparatus 10 itself. The selection unit 103 then selects the speech recognition module 106 corresponding to the processing load reported by the load monitoring unit 102, in accordance with the selection information described later, and indicates the selected speech recognition module 106 to the input unit 101.

The storage unit 104 stores the selection information. The storage unit 104 can be realized using a storage device (memory, HDD, or the like) included in the information processing apparatus 10, a storage device connectable to the information processing apparatus 10 via a network, or the like.

FIG. 3 is a diagram showing an example of selection information. The selection information associates ranges of the processing load of the information processing apparatus 10 with the speech recognition module 106 to be selected for each range. FIG. 3 shows an example of selection information for the case where the CPU usage rate is used as the processing load of the information processing apparatus 10. Specifically, in the example of FIG. 3, DNN(1) 106_1 should be selected when the CPU usage rate is 70% or less (or less than 70%), DNN(2) 106_2 when it is between 70% and 80%, DNN(3) 106_3 when it is between 80% and 90%, and DNN(4) 106_4 when it is 90% or more (or more than 90%).

"Conversion speed" is the conversion speed per DNN 106; a larger number means that speech is converted into text faster. "Speech recognition rate" means the speech recognition rate (accuracy of conversion from speech to text) of the DNN 106. "Layers, units" indicates the structure of the DNN 106 (how many layers it is composed of and how many units each layer has). For example, DNN(1) 106_1 is composed of 10 layers and 2048 units. The numbers shown under "layers, units" are only examples, and any numbers may be used in the present embodiment. Because "conversion speed", "speech recognition rate", and "layers, units" are reference information about the DNNs 106, they need not be included in the selection information. The selection information shown in FIG. 3 is an example. When the conversion processing unit 105 includes five or more DNNs 106, the ranges of the processing load of the information processing apparatus 10 set in the selection information may be divided more finely according to the number of DNNs 106.
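The CPU-usage ranges of the FIG. 3 example can be sketched as a small lookup table. This is a minimal Python illustration, not the apparatus's actual implementation: the names `CPU_BOUNDS`, `MODULES`, and `select_module` are hypothetical, and assigning a value that lies exactly on a boundary to the lower-load range is an assumption, since the description leaves the boundary handling open.

```python
from bisect import bisect_left

# Upper bounds (percent) of the first three CPU-usage ranges in the FIG. 3
# example, and the module chosen for each of the four resulting ranges.
CPU_BOUNDS = [70.0, 80.0, 90.0]
MODULES = ["DNN(1)", "DNN(2)", "DNN(3)", "DNN(4)"]


def select_module(cpu_percent: float) -> str:
    """Return the module name for the given CPU usage rate."""
    if not 0.0 <= cpu_percent <= 100.0:
        raise ValueError("cpu_percent must be between 0 and 100")
    # bisect_left puts a value equal to a bound into the lower-load range
    # (assumption: "70% or less" selects DNN(1), "more than 90%" selects DNN(4)).
    return MODULES[bisect_left(CPU_BOUNDS, cpu_percent)]


print(select_module(60.0))
print(select_module(95.0))
```

Supporting five or more modules would only require extending the two lists with additional bounds and names.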

The conversion processing unit 105 converts the speech received from the input unit 101 into text. As described above, the conversion processing unit 105 includes a plurality of speech recognition modules 106 (DNN(1) 106_1 through DNN(4) 106_4). A speech recognition module 106 may convert speech into text by, for example, analyzing the speech data received from the input unit 101 to extract acoustic features and converting the extracted acoustic features into sentences.

The output unit 107 has a function of outputting the text converted by the speech recognition module 106.

<Operation example>
FIG. 4 is a flowchart showing the processing procedure performed by the information processing apparatus according to the embodiment. The processing procedure performed by the information processing apparatus 10 according to the present embodiment is described with reference to FIG. 4.

In step S101, when an inquiry from a customer occurs and a conversation starts between an operator and the customer, the input unit 101 receives the speech of the conversation.

In step S102, the input unit 101 asks the selection unit 103 which speech recognition module 106 the speech data should be transmitted to. The selection unit 103 then asks the load monitoring unit 102 for the processing load of the information processing apparatus 10 itself.

In step S103, the selection unit 103 uses the selection information to select the speech recognition module 106 (one of DNN(1) 106_1 through DNN(4) 106_4) corresponding to the processing load of the information processing apparatus 10 reported by the load monitoring unit 102. For example, when the information shown in FIG. 3 is stored as the selection information and the processing load reported by the load monitoring unit 102 is 60%, the selection unit 103 selects DNN(1) 106_1 as the speech recognition module 106. Likewise, when the processing load reported by the load monitoring unit 102 is 95%, the selection unit 103 selects DNN(4) 106_4 as the speech recognition module 106.

In step S104, the selection unit 103 notifies the input unit 101 of the speech recognition module 106 selected in step S103. The input unit 101 transmits the speech data to the notified speech recognition module 106. The speech recognition module 106 analyzes the speech data received from the input unit 101 and converts it into text.

In step S105, the output unit 107 outputs the text converted by the speech recognition module 106.

The processing procedure of steps S101 through S105 described above is repeated each time an inquiry from a customer occurs and a conversation starts between an operator and the customer, that is, each time the input unit 101 newly receives speech input.
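Steps S101 through S105 can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the class `Apparatus`, its method `handle_conversation`, the module names "accurate" and "light", and the stand-in recognizers are all hypothetical and only mirror the roles of the units described above.

```python
from typing import Callable, Dict, Iterable, List

Recognizer = Callable[[bytes], str]


class Apparatus:
    """Hypothetical stand-in for the information processing apparatus 10."""

    def __init__(self, recognizers: Dict[str, Recognizer],
                 read_load: Callable[[], float],
                 select: Callable[[float], str]) -> None:
        self.recognizers = recognizers  # role of the conversion processing unit 105
        self.read_load = read_load      # role of the load monitoring unit 102
        self.select = select            # role of the selection unit 103

    def handle_conversation(self, audio_chunks: Iterable[bytes]) -> List[str]:
        # S102-S103: the load is queried once, when the conversation starts.
        name = self.select(self.read_load())
        recognizer = self.recognizers[name]
        # S104-S105: the same module converts every chunk until the session ends.
        return [recognizer(chunk) for chunk in audio_chunks]


# Stand-in recognizers that only report which module handled how many bytes.
loads = iter([60.0, 95.0])
app = Apparatus(
    recognizers={"accurate": lambda c: f"accurate:{len(c)}",
                 "light": lambda c: f"light:{len(c)}"},
    read_load=lambda: next(loads),
    select=lambda load: "accurate" if load <= 70.0 else "light",
)
print(app.handle_conversation([b"ab", b"cde"]))  # conversation started at 60% load
print(app.handle_conversation([b"f"]))           # conversation started at 95% load
```

Reading the load only at the start of each conversation reflects the rule that a module, once selected, is kept until the session ends.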

FIG. 5 is a diagram showing how the selected speech recognition module changes according to the degree of congestion of the call center.

When the number of inquiries from customers is small, such as at night (the state on the left side of FIG. 5), the processing load of the information processing apparatus 10 is low, so DNN(1) 106_1, the speech recognition module 106 with the highest speech recognition rate, is selected for converting each conversation into text.

Next, when the number of inquiries from customers increases (the state in the center of FIG. 5), the processing load of the information processing apparatus 10 gradually rises, so for conversations arising from new inquiries, DNN(2) 106_2 or DNN(3) 106_3 is selected; these are speech recognition modules 106 that require less calculation than DNN(1) 106_1 (a lower speech recognition rate and a higher processing speed than DNN(1) 106_1). Note that the center example of FIG. 5 also shows DNN(1) 106_1 among the selected speech recognition modules 106. As described above, a speech recognition module 106, once selected, is not changed until the conversation between the operator and the customer ends, so a DNN(1) 106_1 selected in the state on the left side of FIG. 5 may continue to operate after the transition to the state in the center of FIG. 5.

Next, when the number of inquiries from customers increases further, such as on holidays or during the daytime (the state on the right side of FIG. 5), the processing load of the information processing apparatus 10 rises further, so for conversations arising from new inquiries, DNN(4) 106_4 is selected; this is a speech recognition module 106 that requires less calculation than DNN(3) 106_3 (a lower speech recognition rate and a higher processing speed than DNN(3) 106_3).

The selection unit 103 may associate the combination of selected DNNs 106 (for example, as in the center of FIG. 5, two instances of DNN(1) 106_1, three of DNN(2) 106_2, and one of DNN(3) 106_3) with the processing load of the information processing apparatus 10 itself and save the pair in a history as needed. In addition, when the change in the processing load of the information processing apparatus 10 after newly selecting one of the DNNs 106 is small (for example, less than a predetermined threshold), the selection unit 103 may select a DNN 106 with a higher speech recognition rate the next time it selects a DNN 106. Conversely, when the change in the processing load of the information processing apparatus 10 after newly selecting one of the DNNs 106 is large (for example, equal to or greater than a predetermined threshold), the selection unit 103 may select a DNN 106 with a lower speech recognition rate the next time it selects a DNN 106. In this way, the selection unit 103 can learn a variety of correspondences between combinations of DNNs 106 and processing loads, which makes a more accurate DNN selection scheme possible.
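The threshold rule described above might be sketched as follows in Python. The function name `next_module_index`, the use of the signed load increase as the "change", and the clamping at both ends of the module list are assumptions made for illustration; the history of module combinations is omitted.

```python
def next_module_index(current: int, load_before: float, load_after: float,
                      threshold: float, module_count: int = 4) -> int:
    """Choose the index of the module to use for the next conversation.

    Index 0 is the most accurate module (DNN(1)); higher indices are lighter.
    """
    change = load_after - load_before  # assumption: signed increase in load
    if change < threshold:
        step = -1  # small change: try a more accurate module next time
    else:
        step = 1   # large change: fall back to a lighter module next time
    # Stay within the available modules.
    return min(max(current + step, 0), module_count - 1)


print(next_module_index(1, load_before=70.0, load_after=72.0, threshold=5.0))
print(next_module_index(1, load_before=70.0, load_after=80.0, threshold=5.0))
```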

As described above, the information processing apparatus 10 according to the present embodiment operates so that, as the number of conversations taking place simultaneously between operators and customers (the number of customers being handled at the same time) increases, speech conversion is performed using speech recognition modules 106 that require less calculation (a lower speech recognition rate and a higher processing speed). The information processing apparatus 10 according to the present embodiment can thereby suppress the rise in its own processing load while maintaining a certain level of speech conversion accuracy. In addition to suppressing the rise in its own processing load, the information processing apparatus 10 according to the present embodiment can keep the average processing speed of the speech conversion it performs at or above a certain speed.

<Modification>
In the embodiment described above, the CPU usage rate is used as the processing load of the information processing apparatus 10, but another parameter may be used instead. For example, the information processing apparatus 10 may select the speech recognition module 106 based on the number of DNN 106 processes instead of the CPU usage rate. The information processing apparatus 10 according to the present embodiment is assumed to start one DNN 106 process for each conversation between an operator and a customer, so the number of processes can be regarded as equal to the number of customers being handled at the same time.

In this modification, as shown in FIG. 6(a), the selection information stores the number of processes instead of the CPU usage rate. The load monitoring unit 102 according to this modification monitors the number of DNN 106 processes instead of the CPU usage rate. The selection unit 103 according to this modification uses the selection information to select the speech recognition module 106 corresponding to the number of processes reported by the load monitoring unit 102.

The CPU usage rate and another parameter may also be used together as the processing load of the information processing apparatus 10. For example, the information processing apparatus 10 may select the speech recognition module 106 based on both the CPU usage rate and the number of DNN 106 processes.

In this case, as shown in FIG. 6(b), the selection information stores both the CPU usage rate and the number of processes. The load monitoring unit 102 according to this modification monitors both the CPU usage rate and the number of processes. The selection unit 103 according to this modification uses the selection information to select the speech recognition module 106 corresponding to the CPU usage rate reported by the load monitoring unit 102 and the speech recognition module 106 corresponding to the number of processes reported by the load monitoring unit 102. Of these two speech recognition modules 106, it then selects either the one that requires less calculation (a lower speech recognition rate and a higher processing speed) or the one that requires more calculation (a higher speech recognition rate and a lower processing speed).
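A combined selection of this kind could be sketched as below. The process-count bounds in `PROCESS_BOUNDS` are hypothetical, because the values of FIG. 6 are not reproduced in the text, and the function name `select_combined` and the `prefer_lighter` flag are likewise illustrative assumptions.

```python
from bisect import bisect_left

MODULES = ["DNN(1)", "DNN(2)", "DNN(3)", "DNN(4)"]  # index 0 = most accurate
CPU_BOUNDS = [70.0, 80.0, 90.0]  # percent, as in the FIG. 3 example
PROCESS_BOUNDS = [10, 20, 30]    # hypothetical bounds on the number of processes


def select_combined(cpu_percent: float, process_count: int,
                    prefer_lighter: bool = True) -> str:
    """Pick one module from the two candidates given by each parameter."""
    by_cpu = bisect_left(CPU_BOUNDS, cpu_percent)
    by_processes = bisect_left(PROCESS_BOUNDS, process_count)
    # A larger index means less calculation, so max() prefers the lighter
    # candidate and min() prefers the more accurate one.
    if prefer_lighter:
        index = max(by_cpu, by_processes)
    else:
        index = min(by_cpu, by_processes)
    return MODULES[index]


print(select_combined(60.0, 25))
print(select_combined(60.0, 25, prefer_lighter=False))
```

Preferring the lighter candidate favors load suppression, while preferring the heavier one favors recognition accuracy; the description allows either choice.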

<Supplement to the embodiment>
The present invention is not limited to the embodiment described above, and various modifications and applications are possible within the scope of the claims. The order of the processing procedures described in the embodiment may be changed as long as no contradiction arises.

As described above, each functional unit of the information processing apparatus 10 according to the embodiment can be realized by executing a program corresponding to the processing performed by the information processing apparatus 10, using hardware resources such as the CPU and memory that the apparatus includes. The program can also be stored in a storage medium.

DESCRIPTION OF SYMBOLS
10 Information processing apparatus
101 Input unit
102 Load monitoring unit
103 Selection unit
104 Storage unit
105 Conversion processing unit
106 Speech recognition module
107 Output unit

Claims

An information processing apparatus having a plurality of speech recognition modules that perform processing for converting speech into characters, the apparatus comprising:
input means for receiving speech input;
selection means for selecting, when the input means receives speech input, a speech recognition module from the plurality of speech recognition modules based on the processing load of the information processing apparatus; and
output means for outputting characters obtained by converting the speech received by the input means with the speech recognition module selected by the selection means,
wherein the selection means:
selects a speech recognition module with a higher speech recognition rate as the next speech recognition module if the change in the processing load of the information processing apparatus when the speech is converted by the selected speech recognition module is less than a predetermined threshold; and
selects a speech recognition module with a lower speech recognition rate as the next speech recognition module if the change in the processing load of the information processing apparatus when the speech is converted by the selected speech recognition module is equal to or greater than the predetermined threshold.
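
The threshold rule recited above can be sketched in Python as follows. This is only an illustration under stated assumptions: the module names, the ascending ordering by speech recognition rate, the definition of the load change as the load during conversion minus the load before it, and the clamping at both ends of the list are all hypothetical and are not specified in the claim.

```python
# Hypothetical modules, ordered from lowest to highest speech recognition rate.
MODULES_BY_RATE = ["low_rate_fast", "mid_rate", "high_rate_slow"]


def next_module_index(current, load_before, load_during, threshold):
    """Return the index of the module to use for the next conversion.

    The change in load is assumed to be load_during - load_before.
    A change below the threshold moves one step toward a higher recognition
    rate; a change at or above the threshold moves one step toward a lower
    recognition rate. The index is clamped at both ends (an assumption).
    """
    change = load_during - load_before
    if change < threshold:
        return min(current + 1, len(MODULES_BY_RATE) - 1)
    return max(current - 1, 0)
```

For example, with a threshold of 10 points, a rise from 40% to 45% while the middle module is converting would move the next conversion to the higher-rate module, whereas a rise from 40% to 55% would move it to the lower-rate module.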

The information processing apparatus according to claim 1, wherein
the plurality of speech recognition modules perform conversion processing at mutually different speech recognition rates and conversion speeds, and
the selection means selects a speech recognition module that performs conversion processing at a speech recognition rate and conversion speed corresponding to the range into which the level of the processing load falls.

The information processing apparatus according to claim 2, wherein the selection means selects a speech recognition module that performs conversion processing at a speech recognition rate and conversion speed corresponding to the range into which the level of the processing load falls and to the number of processes in which speech recognition modules are performing conversion processing.

The information processing apparatus according to any one of claims 1 to 3, wherein the speech recognition module performs the processing of converting speech into characters using a DNN (Deep Neural Network).

A speech recognition method executed by an information processing apparatus having a plurality of speech recognition modules that perform processing for converting speech into characters, the method comprising:
an input step of receiving speech input;
a selection step of selecting, when speech input is received in the input step, a speech recognition module from the plurality of speech recognition modules based on the processing load of the information processing apparatus; and
an output step of outputting characters obtained by converting the speech received in the input step with the speech recognition module selected in the selection step,
wherein the selection step includes:
selecting a speech recognition module with a higher speech recognition rate as the next speech recognition module if the change in the processing load of the information processing apparatus when the speech is converted by the selected speech recognition module is less than a predetermined threshold; and
selecting a speech recognition module with a lower speech recognition rate as the next speech recognition module if the change in the processing load of the information processing apparatus when the speech is converted by the selected speech recognition module is equal to or greater than the predetermined threshold.

A program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 4.