JP2016536652A

JP2016536652A - Real-time speech evaluation system and method for mobile devices

Info

Publication number: JP2016536652A
Application number: JP2016550920A
Authority: JP
Inventors: 翌王; 暉林; 哲人胡
Original assignee: Shanghai Liulishuo Information Technology Co ltd
Current assignee: Shanghai Liulishuo Information Technology Co ltd
Priority date: 2013-10-30
Filing date: 2014-10-28
Publication date: 2016-11-24
Anticipated expiration: 2034-10-28
Also published as: US20160253923A1; CN104599680B; EP3065119A1; EP3065119A4; WO2015062465A1; JP6541673B2; CN104599680A

Abstract

Disclosed are a real-time speech evaluation system and method for a mobile device. The system comprises: a collection module (110) for collecting the voice data of speech to be evaluated; an identification module (130) for recognizing the voice data collected by the collection module (110) as text data; a matching module (150) for matching the text data recognized by the identification module (130) against the text data of the speech samples in a speech sample library to obtain a matching result; and an evaluation module (170) for obtaining and outputting, based on a predefined evaluation policy and the matching result obtained by the matching module (150), a pronunciation score for at least one character or character string in the speech to be evaluated and/or a pronunciation score for the speech to be evaluated. Because the speech evaluation system runs entirely on the mobile device, the device depends less on the network and the user receives real-time feedback on the evaluation, which improves the user experience.

Description

The present invention relates to the field of computer technology, and in particular to a real-time speech evaluation system and method for a mobile device.

Most conventional speech evaluation systems use a computer as the client: the user records through a microphone connected to the computer, the voice data is uploaded to a server over the network, and an algorithm running on the server performs the evaluation. The evaluation algorithm thus runs on a server machine with relatively ample computing resources (CPU, memory, and storage).

As mobile devices have become widespread, users have begun moving from computer clients to mobile device clients. When the client of an evaluation system is ported to a mobile device, the following approach is commonly adopted: the mobile device client collects the voice data and sends it over the network to a server, a speech evaluation algorithm running on the server evaluates it, and the result is returned to the mobile device client over the network.

This prior art depends on a network connection. On the one hand, sending voice data over the network consumes data traffic; on the other hand, a mobile device is not connected to a reliable network at all times. Both factors readily degrade the user experience of the speech evaluation system, and building and maintaining the evaluation server adds further cost.

The present invention has been made in view of the above problems, and provides a real-time speech evaluation system and method for a mobile device that solve, or at least partly solve, those problems.

Running the speech evaluation system entirely on the mobile device reduces the system's dependence on the network and cuts the data traffic consumed by messages between the mobile device and a server; it also gives the user real-time feedback on the evaluation. The user can therefore practice speaking with the system anytime and anywhere, which improves the user experience.

According to one embodiment of the present invention, a real-time speech evaluation system for a mobile device is provided. The system comprises: a collection module for collecting the voice data of speech to be evaluated; an identification module for recognizing the voice data collected by the collection module as text data; a matching module for matching the text data recognized by the identification module against the text data of the speech samples in a speech sample library to obtain a matching result; and an evaluation module for obtaining and outputting, based on a predefined evaluation policy and the matching result obtained by the matching module, a pronunciation score for at least one character or character string in the speech to be evaluated and/or a pronunciation score for the speech to be evaluated. Here, the speech to be evaluated contains the speech of at least one character or character string.

Preferably, the system further comprises a display module for displaying the text data of the speech samples in the speech sample library.

The collection module is further used to collect, as the speech to be evaluated, the voice data that the user inputs based on the speech sample text data displayed by the display module.

Preferably, the system further comprises: a score comparison module for comparing the pronunciation score of the speech to be evaluated, and/or the pronunciation score of at least one character or character string in it, as output by the evaluation module, with a predefined pronunciation score threshold; and a marking module that, when the pronunciation score of the speech to be evaluated is lower than the predefined threshold, marks in the text data displayed by the display module the text data whose pronunciation score is lower than the threshold, and/or, when the pronunciation score of a character or character string in the speech to be evaluated is lower than the predefined threshold, marks in the displayed text data the characters or character strings whose pronunciation score is lower than the threshold.

Preferably, the matching module further matches the text data recognized by the identification module against the text data of the speech samples in the speech sample library using the Levenshtein distance (edit distance) algorithm, and obtains a matching result.

Preferably, the predefined evaluation policy is as follows: when the recognized text data matches the text data of a speech sample in the speech sample library, the posterior probability of each character or character string in the text data recognized from the voice data is taken as the pronunciation score of that character or character string in the speech to be evaluated, and the average of the pronunciation scores of all characters or character strings in the speech is taken as the pronunciation score of the speech to be evaluated.

Preferably, the system further comprises a storage module for storing the speech sample library, where the speech sample library contains at least one speech sample.

According to another embodiment of the present invention, a real-time speech evaluation method for a terminal device is further provided. The method comprises the steps of: collecting the voice data of speech to be evaluated; recognizing the collected voice data as text data; matching the recognized text data against the text data of the speech samples in a speech sample library to obtain a matching result; and obtaining and outputting, according to a predefined evaluation policy and the matching result, a pronunciation score for at least one character or character string in the speech to be evaluated and/or a pronunciation score for the speech to be evaluated. Here, the speech to be evaluated contains the speech of at least one character or character string.

Preferably, before the step of collecting the voice data of the speech to be evaluated, the method further comprises the step of displaying the text data of a speech sample in the speech sample library.

Accordingly, the step of collecting the voice data of the speech to be evaluated consists of collecting, as the speech to be evaluated, the voice data that the user inputs based on the displayed text data of the speech sample in the speech sample library.

Preferably, the method further comprises the steps of: comparing the output pronunciation score of the speech to be evaluated, and/or the pronunciation score of at least one character or character string in it, with a predefined pronunciation score threshold; and, when the pronunciation score of the speech to be evaluated is lower than the predefined threshold, marking in the displayed text data the text data whose pronunciation score is lower than the threshold, and/or, when the pronunciation score of at least one character or character string in the speech to be evaluated is lower than the predefined threshold, marking in the displayed text data the characters or character strings whose pronunciation score is lower than the threshold.
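The comparison and marking steps might look like the following minimal sketch. The threshold of 60, the square-bracket notation used as the mark, and the function name `mark_low_scores` are assumptions made for illustration; the specification fixes none of them.

```python
from typing import List, Tuple


def mark_low_scores(word_scores: List[Tuple[str, float]],
                    threshold: float = 60.0) -> str:
    """Return the displayed text with low-scoring words wrapped in [brackets].

    word_scores: (word, pronunciation score) pairs in display order.
    threshold:   hypothetical predefined pronunciation score threshold.
    """
    marked = []
    for word, score in word_scores:
        # Mark only words whose score is strictly below the threshold.
        marked.append(f"[{word}]" if score < threshold else word)
    return " ".join(marked)


# Hypothetical per-word scores for part of the example sentence.
example_scores = [("Welcome", 92.0), ("to", 88.0), ("Liu", 41.0),
                  ("Li", 75.0), ("shuo", 38.5)]
print(mark_low_scores(example_scores))
```

On a real display the mark would more likely be a color or underline; brackets are used here only so the result is visible in plain text.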

Preferably, the step of matching the recognized text data against the text data of the speech samples in the speech sample library to obtain a matching result consists of performing the matching operation using the Levenshtein distance (edit distance) algorithm.

In the embodiments of the present invention, the real-time speech evaluation system on the mobile device collects the voice data of the speech to be evaluated, recognizes the collected voice data as text data, and matches the recognized text data against the text data of the speech samples in the speech sample library to obtain a matching result. Then, based on a predefined evaluation policy and the matching result, it obtains and outputs the pronunciation score of the speech to be evaluated and/or the pronunciation score of at least one character or character string in it. Because the speech evaluation system runs entirely on the client side of the mobile device, the device depends less on the network, less data traffic is consumed by messages between the mobile device and a server, and the user receives real-time feedback on the evaluation. The user can therefore practice speaking with the system anytime and anywhere.

The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features, and advantages of the invention become easier to understand, specific embodiments of the invention are described below.

Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and do not limit the invention. Throughout the drawings, the same reference numerals denote the same parts.
FIG. 1 is a block diagram schematically showing a real-time speech evaluation system 100 for a mobile device according to an embodiment of the present invention.
FIG. 2 is a flowchart schematically showing a real-time speech evaluation method 200 for a mobile device according to an embodiment of the present invention.

Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments, it should be understood that the invention can be implemented in various forms and is not limited to the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope is fully conveyed to those skilled in the art.

It should be understood that those skilled in the art can devise various structures that, although not explicitly described or shown in this specification, embody the invention and fall within its spirit, principles, and scope.

All examples and conditional language in this specification are intended for explanation and teaching, to help the reader understand in depth the principles and concepts the inventors contribute to the art, and should not be construed as limiting the invention to these specific examples and conditions.

Every statement in this specification describing the principles, aspects, and embodiments of the invention, or specific examples of them, is intended to encompass their structural and functional equivalents. Such equivalents include both those currently known and those developed in the future, that is, any developed element that performs the same function regardless of its structure.

Those skilled in the art should understand that the block diagrams in the accompanying drawings are schematic representations of structures or circuits that implement the invention. Similarly, any flowcharts and the like in the accompanying drawings represent processes that may actually be executed by various computers or processors, whether or not such computers or processors are explicitly shown.

In the claims, a module defined as being for performing a specified function is intended to cover any way of performing that function, including, for example, (a) a combination of circuit elements that performs the function, or (b) software in any form, including firmware, microcode, and the like, combined with appropriate circuitry for executing that software to perform the function. Because the functions provided by the various modules are combined in the manner the claims call for, any module, component, or element that can provide those functions should be regarded as equivalent to the modules defined in the claims.

The term "embodiment" in this specification means that the features, structures, and so on described in connection with that embodiment are included in at least one embodiment of the invention. Accordingly, the phrase "in an embodiment" appearing in various places in the specification does not necessarily refer to the same embodiment.

As shown in FIG. 1, the real-time speech evaluation system 100 for a mobile device according to an embodiment of the present invention may mainly comprise a collection module 110, an identification module 130, a matching module 150, and an evaluation module 170. It should be understood that the connections between the modules shown in FIG. 1 are only an example; those skilled in the art may adopt other connections as long as the modules can still realize the functions of the invention.

In this specification, the function of each module may be realized by dedicated hardware, or by hardware capable of executing processing in combination with appropriate software. Such hardware or dedicated hardware may include application-specific integrated circuits (ASICs), various other circuits, various processors, and the like. When realized by a processor, the function may be provided by a single dedicated processor, a single shared processor, or a plurality of independent processors, some of which may be shared. Furthermore, the term processor should not be understood as referring exclusively to hardware capable of executing software; it may implicitly include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage devices.

According to an embodiment of the present invention, the collection module 110 collects the voice data of the speech to be evaluated, which contains the speech of at least one character or character string. Preferably, the speech to be evaluated contains any one or a combination of Chinese words, English words, and Arabic numerals. It should be understood that the embodiments of the invention do not limit the language of the speech to be evaluated.

In an embodiment of the present invention, the collection module 110 records the speech to be evaluated and stores its voice data. Preferably, the collection module 110 may be a conventional microphone through which the user inputs the speech to be evaluated into the system 100. For example, the content of the speech to be evaluated may be the following English text: "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo." Preferably, the system 100 converts the voice data of the speech to be evaluated into an audio file in .wav format via the collection module 110 and saves it; WAV is a waveform audio file format. It should be understood that the embodiments of the invention do not limit the specific structure of the collection module 110.
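Saving captured audio as a .wav file can be sketched with Python's standard `wave` module, as below. The 16 kHz mono 16-bit format, the file name, and the synthetic tone standing in for microphone input are all assumptions; the specification states only that the data is saved in .wav format.

```python
import math
import os
import struct
import tempfile
import wave


def save_pcm_as_wav(path: str, samples: list, sample_rate: int = 16000) -> None:
    """Write 16-bit mono PCM samples (ints in [-32768, 32767]) to a .wav file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # Pack the samples as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))


# Stand-in for microphone input: 0.1 s of a 440 Hz tone at 16 kHz.
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(1600)]
wav_path = os.path.join(tempfile.gettempdir(), "utterance.wav")
save_pcm_as_wav(wav_path, tone)
```

On an actual mobile device the samples would come from the platform's audio recording API rather than from a generated tone.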

According to an embodiment of the present invention, the identification module 130 recognizes the voice data collected by the collection module 110 as text data.

That is, the identification module 130 can recognize the voice data of the speech to be evaluated in the above example as the following text data: WELCOME TO LIU LI SHUO! MY NAME IS PETER. I'M AN ENGLISH TEACHER AT LIU LI SHUO

Preferably, in an embodiment of the present invention, the speech recognition model used by the identification module 130 is a hidden Markov model (HMM) whose output probability distribution is a Gaussian mixture distribution.
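For reference, the output (emission) density of such a model is conventionally written as below. The symbols are standard textbook notation, not the specification's own: b_j is the output density of HMM state j, o_t is the feature vector at frame t, M is the number of mixture components, and c_jm, mu_jm, and Sigma_jm are the weight, mean, and covariance of component m in state j.

```latex
b_j(\mathbf{o}_t) = \sum_{m=1}^{M} c_{jm}\,
  \mathcal{N}\!\left(\mathbf{o}_t;\ \boldsymbol{\mu}_{jm},\ \boldsymbol{\Sigma}_{jm}\right),
\qquad c_{jm} \ge 0,\quad \sum_{m=1}^{M} c_{jm} = 1
```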

The identification module 130 uses fixed-point arithmetic to recognize the voice data collected by the collection module 110 as text data. The fixed-point arithmetic may be carried out, for example, in either of the following ways, although it is not limited to them.

Method 1: Conventional speech recognition algorithms contain many floating-point operations. These can be carried out on a fixed-point DSP (a fixed-point DSP performs integer or fractional arithmetic, its numeric format contains no exponent field, and its data width is typically 16 or 24 bits) by using a number scaling method to convert floating-point numbers into fixed-point numbers. Number scaling means fixing the position of the radix point within the fixed-point number. Q notation is a commonly used scaling method: with x denoting the fixed-point number and y the floating-point number, a floating-point number y is converted to a fixed-point number x by x = (int)(y × 2^Q).

Method 2: (1) define and simplify the structure of the algorithm; (2) identify the key variables in the functions that need to be quantized; (3) collect statistics on the key variables; (4) determine the exact representation of the key variables; (5) determine the fixed-point format of the remaining variables.

In this way, the embodiments of the present invention can replace ordinary floating-point arithmetic with fixed-point arithmetic, and can use integers instead of floating-point numbers to represent the output probabilities of the recognition result. Because fixed-point arithmetic does not require defining as many parameters as floating-point arithmetic, the identification module 130 can complete recognition while occupying few system resources (CPU, memory, and storage). It should be understood that the embodiments of the invention do not limit the specific type of recognition model that the identification module 130 uses for recognition.
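The Q-notation conversion of Method 1, x = (int)(y × 2^Q), can be illustrated with the short sketch below. The function names, the 16-bit word width, the saturation step, and the multiplication helper are assumptions added for illustration; the specification gives only the conversion formula.

```python
def float_to_q(y: float, q: int, bits: int = 16) -> int:
    """Convert float y to a signed fixed-point integer with q fractional bits."""
    x = int(y * (1 << q))                 # x = (int)(y * 2^Q); int() truncates toward zero
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))            # saturate to the word width (added safeguard)


def q_to_float(x: int, q: int) -> float:
    """Inverse conversion: y = x * 2^(-Q)."""
    return x / (1 << q)


def q_multiply(a: int, b: int, q: int) -> int:
    """Multiply two Q-format numbers using integer arithmetic only.

    The raw product carries 2*q fractional bits, so it is shifted right by q.
    """
    return (a * b) >> q


half = float_to_q(0.5, 15)        # Q15 representation of 0.5
quarter = float_to_q(0.25, 15)    # Q15 representation of 0.25
product = q_multiply(half, quarter, 15)
print(half, quarter, product, q_to_float(product, 15))
```

Choosing Q is a trade-off: a larger Q gives finer resolution but a smaller representable range, which is why Method 2 collects statistics on the key variables before fixing their formats.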

According to an embodiment of the present invention, the matching module 150 matches the text data recognized by the identification module 130 against the text data of the speech samples in the speech sample library and obtains a matching result.

Preferably, in an embodiment of the present invention, the text data of a speech sample in the speech sample library may be text data stored in the library in advance. For example, the following text data is stored in the speech sample library in advance: "WELCOME TO LIU LI SHUO! MY NAME IS PETER. I'M AN ENGLISH TEACHER AT LIU LI SHUO".

Preferably, in an embodiment of the present invention, the matching module 150 matches the text data recognized by the identification module 130 against the text data of the speech samples in the speech sample library using the Levenshtein distance (edit distance) algorithm, and obtains a matching result. The matching result may be either that the recognized text data matches the text data of a speech sample in the library, or that it does not. It should be understood that the embodiments of the invention do not limit the matching algorithm used by the matching module 150.
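A word-level Levenshtein match could be sketched as follows. The specification does not say how the distance is turned into a match or no-match decision, so the tokenization in `normalize` and the `max_distance` cutoff in `is_match` are assumptions made for illustration.

```python
import re
from typing import List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))        # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def normalize(text: str) -> List[str]:
    """Upper-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[A-Z']+", text.upper())


def is_match(recognized: str, sample: str, max_distance: int = 0) -> bool:
    """Hypothetical decision rule: match if the word-level distance is small enough."""
    return levenshtein(normalize(recognized), normalize(sample)) <= max_distance


recognized_text = ("WELCOME TO LIU LI SHUO! MY NAME IS PETER. "
                   "I'M AN ENGLISH TEACHER AT LIU LI SHUO")
sample_text = ("Welcome to Liu Li shuo! My name is Peter. "
               "I'm an English teacher at Liu Li shuo.")
print(is_match(recognized_text, sample_text))
```

Raising `max_distance` would tolerate a few misrecognized words at the cost of accepting utterances that differ from the sample.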

According to an embodiment of the present invention, the evaluation module 170 obtains and outputs, based on a predefined evaluation policy and the matching result obtained by the matching module 150, a pronunciation score for at least one character or character string in the speech to be evaluated and/or a pronunciation score for the speech to be evaluated.

Preferably, in an embodiment of the present invention, the predefined evaluation policy is as follows: when the recognized text data matches the text data of a speech sample in the speech sample library, the posterior probability of each character or character string in the recognized text data is taken as the pronunciation score of that character or character string in the speech to be evaluated, and the average of the pronunciation scores of all characters or character strings in the speech is taken as the pronunciation score of the speech to be evaluated.

Preferably, in an embodiment of the present invention, if the posterior probability of a character or character string recognized from the voice data is p (between 0 and 1), the pronunciation score of that character or character string is p × 100.
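The scoring policy (word score = p × 100, utterance score = mean of the word scores) can be written out as the sketch below. The function names and the sample posterior values are hypothetical; in the described system the posteriors would come from the recognizer.

```python
from typing import List


def word_score(posterior: float) -> float:
    """Pronunciation score of one word: its posterior probability times 100."""
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior probability must lie in [0, 1]")
    return posterior * 100


def utterance_score(posteriors: List[float]) -> float:
    """Pronunciation score of the utterance: the mean of its word scores."""
    if not posteriors:
        raise ValueError("at least one word is required")
    scores = [word_score(p) for p in posteriors]
    return sum(scores) / len(scores)


# Hypothetical recognizer posteriors for a four-word utterance.
example_posteriors = [0.9, 0.8, 0.5, 0.7]
print([word_score(p) for p in example_posteriors])
print(utterance_score(example_posteriors))
```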

Taking the English text above as an example, the evaluation module 170 can obtain a pronunciation score for the whole text "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo." and/or a pronunciation score for each word in it. In other words, the embodiments of the invention may use a unigram language model built from sentences and words.

According to an embodiment of the present invention, the real-time voice evaluation system 100 in the mobile device may further comprise one or more optional modules that provide additional functions. These optional modules are not indispensable for achieving the object of the present invention; the real-time voice evaluation system 100 can achieve that object even without them. Although these optional modules are not shown in FIG. 1, a person skilled in the art can readily derive their connections to the modules described above from the teaching below.

Preferably, in an embodiment of the present invention, the system 100 further comprises a display module used to display the text data of a voice sample in the voice sample repository, for example the English text "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo."

Correspondingly, the collection module 110 is further used to collect the voice data that the user inputs, as the speech to be evaluated, based on the text data of the voice sample displayed on the display module.

That is, the collection module 110 collects the voice data of the user reading aloud the English text "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo."

Preferably, in an embodiment of the present invention, the system 100 further comprises a score comparison module and a marking module, wherein:
The score comparison module is used to compare the pronunciation score of the speech to be evaluated output by the evaluation module 170, and/or the pronunciation score of at least one character or character string in the speech to be evaluated, with a predefined pronunciation score threshold. Preferably, the predefined pronunciation score threshold can be set to 60 points; it should be understood that the embodiments of the present invention are not limited to that specific value.

The marking module is used to mark, when the pronunciation score of the speech to be evaluated is lower than the predefined pronunciation score threshold, the text data whose pronunciation score is lower than the threshold within the text data displayed by the display module; and/or to mark, when the pronunciation score of at least one character or character string in the speech to be evaluated is lower than the predefined pronunciation score threshold, the characters or character strings whose pronunciation score is lower than the threshold within the text data displayed by the display module.

Taking the English text given above as an example, if the score comparison module finds that the pronunciation score of "Welcome" is lower than the predefined pronunciation score threshold, "Welcome" can be marked within the full text, preferably by setting the color of "Welcome" to red.
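
The comparison and marking behavior can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: the `[red]...[/red]` tag is a hypothetical stand-in for whatever mechanism the display module uses to color text, and the function name `render` and the sample scores are assumptions. The threshold of 60 is the example value mentioned in the text.

```python
from typing import List, Tuple

THRESHOLD = 60.0  # example threshold from the text; not a fixed value


def render(word_scores: List[Tuple[str, float]], threshold: float = THRESHOLD) -> str:
    """Join the displayed words, wrapping those scoring below the threshold.

    The [red]...[/red] tag is a hypothetical marker standing in for red text.
    """
    parts = []
    for word, score in word_scores:
        if score < threshold:
            parts.append(f"[red]{word}[/red]")  # mark low-scoring word
        else:
            parts.append(word)
    return " ".join(parts)


# Hypothetical per-word scores for the start of the sample sentence.
marked = render([("Welcome", 42.0), ("to", 91.0), ("Liu", 60.0)])
```

A score exactly equal to the threshold is left unmarked here, since the text marks only scores that are lower than the threshold.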

Preferably, in an embodiment of the present invention, the system 100 further comprises a storage module used to store the voice sample repository. The voice sample repository contains at least one voice sample, for example the voice sample "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo."

In the embodiments of the present invention, the voice evaluation system is implemented entirely on the client side of the mobile device. This reduces the mobile device's dependence on the network, reduces the data traffic consumed by message transmission between the mobile device and a server, and gives the user real-time feedback on the voice evaluation. As a result, the user can practice speaking with the voice evaluation system anytime and anywhere.

According to another embodiment of the present invention, corresponding to the real-time voice evaluation system 100 in a mobile device described above, the present invention further provides a real-time voice evaluation method 200 in a terminal device.

FIG. 2 is a flowchart schematically showing a real-time voice evaluation method 200 in a mobile device according to an embodiment of the present invention. As shown in FIG. 2, the method 200 includes steps S210, S230, S250, and S270. The method 200 starts at step S210, in which voice data of the speech to be evaluated is collected. The speech to be evaluated includes speech of at least one character or character string; preferably, it includes any one or a combination of Chinese words, English words, and Arabic numerals. It should be understood that the embodiments of the present invention do not limit the language of the speech to be evaluated.

Preferably, the user can input the speech to be evaluated into the system 100 through a microphone. For example, the content of the speech to be evaluated may be the English text "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo." Preferably, the system 100, via the collection module 110, converts the voice data of the speech to be evaluated into an audio file in .wav format and saves it. WAV is a waveform audio file format.
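
As a rough illustration of the conversion to .wav, the sketch below wraps raw PCM samples in a WAV container using Python's standard `wave` module. The sample rate (16 kHz), mono channel, and 16-bit sample width are assumptions chosen for the example; the specification does not state the recording parameters, and the function name `pcm_to_wav_bytes` is hypothetical.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM audio in a WAV container and return the file contents.

    sample_rate, channels and sample_width (bytes per sample) are assumed
    values for illustration, not parameters given in the specification.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# One second of 16-bit mono silence standing in for microphone input.
wav_data = pcm_to_wav_bytes(b"\x00\x00" * 16000)
```

To save the result to disk, the returned bytes can be written to a file opened in binary mode, or a file path can be passed to `wave.open` in place of the in-memory buffer.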

Thereafter, in step S230, the collected voice data is recognized as text data. That is, through step S230, the voice data of the speech to be evaluated in the example above is recognized as the following text data: WELCOME TO LIU LI SHUO! MY NAME IS PETER. I'M AN ENGLISH TEACHER AT LIU LI SHUO.

Preferably, in an embodiment of the present invention, the speech recognition model employed is a hidden Markov model (HMM) whose output probability distribution is a Gaussian mixture distribution. In addition, in an embodiment of the present invention, fixed-point arithmetic is used in place of conventional floating-point arithmetic, and integers rather than conventional floating-point numbers are used to represent the output probability of the recognition result. It should be understood that the embodiments of the present invention do not limit the specific type of recognition model employed for character recognition.
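
One common way to replace floating-point probabilities with integers is to store scaled log-probabilities in a fixed-point format, so that multiplying probabilities along a recognition path becomes integer addition. The sketch below illustrates that idea only; the use of the log domain, the choice of 10 fractional bits, and all function names are assumptions, since the specification says only that fixed-point arithmetic and integers are used.

```python
import math
from typing import Iterable

SCALE_BITS = 10          # assumed number of fractional bits (hypothetical)
SCALE = 1 << SCALE_BITS  # 1024


def to_fixed(log_prob: float) -> int:
    """Convert a floating-point log-probability to a scaled integer."""
    return round(log_prob * SCALE)


def to_float(fixed: int) -> float:
    """Convert a scaled integer back to a floating-point log-probability."""
    return fixed / SCALE


def path_log_prob(fixed_log_probs: Iterable[int]) -> int:
    """Combine probabilities along a path using integer addition only."""
    return sum(fixed_log_probs)


# Two illustrative output probabilities, 0.5 and 0.25, combined in fixed point.
total = path_log_prob([to_fixed(math.log(0.5)), to_fixed(math.log(0.25))])
approx = to_float(total)  # close to math.log(0.125)
```

Each conversion introduces a rounding error of at most half a unit in the last fractional bit, so more fractional bits trade memory and range for precision.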

Thereafter, in step S250, the recognized text data is matched with the text data of a voice sample in the voice sample repository to obtain a matching result.

Preferably, in an embodiment of the present invention, the text data of a voice sample in the voice sample repository may be text data stored in the repository in advance; for example, the following text data is stored in the voice sample repository in advance: WELCOME TO LIU LI SHUO! MY NAME IS PETER. I'M AN ENGLISH TEACHER AT LIU LI SHUO.

Preferably, in an embodiment of the present invention, in step S250 a matching operation based on the Levenshtein distance (edit distance) algorithm is performed between the recognized text data and the text data of the voice sample in the voice sample repository to obtain the matching result. For example, the matching result may indicate that the recognized text data does not match the text data of the voice sample in the voice sample repository. It should be understood that the embodiments of the present invention are not limited to this matching algorithm.
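
A minimal sketch of matching by Levenshtein distance is given below. It computes the edit distance over word sequences with the standard dynamic-programming recurrence. Two points are assumptions rather than details from the specification: comparing at the word level (instead of character by character), and declaring a match when the distance does not exceed a `max_distance` parameter, which defaults to an exact match.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def matches(recognized: str, sample: str, max_distance: int = 0) -> bool:
    """Word-level match; max_distance is a hypothetical tolerance."""
    return levenshtein(recognized.split(), sample.split()) <= max_distance


sample_text = "WELCOME TO LIU LI SHUO"
exact = matches("WELCOME TO LIU LI SHUO", sample_text)
missing_word = levenshtein("WELCOME TO LI SHUO".split(), sample_text.split())
```

Because `levenshtein` accepts any sequence, the same function also works on plain strings when a character-level comparison is wanted.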

Thereafter, in step S270, based on the predefined evaluation policy and the matching result, a pronunciation score of at least one character or character string in the speech to be evaluated and/or a pronunciation score of the speech to be evaluated is obtained and output.

Preferably, in an embodiment of the present invention, the predefined evaluation policy is as follows: when the recognized text data matches the text data of a voice sample in the voice sample repository, the posterior probability of each character or character string in the recognized text data is taken as the pronunciation score of that character or character string in the speech to be evaluated, and the average of the pronunciation scores of all characters or character strings in the speech to be evaluated is taken as the pronunciation score of the speech to be evaluated.
Preferably, in an embodiment of the present invention, if the posterior probability of a character or character string recognized from the voice data is p (0 to 1), the pronunciation score of that character or character string is p × 100.

Taking the English text given above as an example, step S270 can yield the pronunciation score of the entire text "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo." and/or the pronunciation score of each word in that text. That is, in an embodiment of the present invention, a unigram language model composed of sentences and words may be used.

According to an embodiment of the present invention, the real-time voice evaluation method 200 in the mobile device may further include one or more optional steps that provide additional functions. These optional steps are not indispensable for achieving the object of the present invention; the real-time voice evaluation method 200 can achieve that object even without them. Although these optional steps are not shown in FIG. 2, a person skilled in the art can readily derive their execution order relative to the steps described above from the teaching below. It should be noted that, unless otherwise specified, the execution order of these optional steps and the steps above can be chosen according to actual needs.

Preferably, the method 200 further includes a step of displaying the text data of a voice sample in the voice sample repository, for example displaying the English text "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo."

Correspondingly, the step (S210) of collecting voice data of the speech to be evaluated comprises collecting the voice data that the user inputs, as the speech to be evaluated, based on the displayed voice sample in the voice sample repository.

That is, step S210 collects the voice data of the user reading aloud the English text "Welcome to Liu Li shuo! My name is Peter. I'm an English teacher at Liu Li shuo."

Preferably, the method 200 further includes a step of comparing the output pronunciation score of the speech to be evaluated, and/or the pronunciation score of at least one character or character string in the speech to be evaluated, with a predefined pronunciation score threshold. Preferably, the predefined pronunciation score threshold is set to 60 points. It should be understood that the embodiments of the present invention are not limited to that specific value.

When the pronunciation score of the speech to be evaluated is lower than the predefined pronunciation score threshold, the text data whose pronunciation score is lower than the threshold is marked within the displayed text data; and/or, when the pronunciation score of at least one character or character string in the speech to be evaluated is lower than the predefined pronunciation score threshold, the characters or character strings whose pronunciation score is lower than the threshold are marked within the displayed text data.

Taking the English text given above as an example, if the comparison shows that the pronunciation score of "Welcome" is lower than the predefined pronunciation score threshold, "Welcome" can be marked within the full text, preferably by setting the color of "Welcome" to red.

Since the method embodiments above correspond to the device embodiments above, the method embodiments are not described in further detail.

Many specific details have been described in this specification. It should be understood, however, that embodiments of the present invention can be practiced without these specific details. In some embodiments, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification.

A person skilled in the art will understand that the modules of the device in each embodiment can be adaptively changed and placed in one or more devices different from that embodiment. Several modules in an embodiment can be combined into one module, unit, or assembly, and they can also be divided into a plurality of sub-modules, sub-units, or sub-assemblies. Except where features and/or processes are mutually exclusive, all steps of any method and all modules of any device disclosed in this specification can be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.

The device embodiments of the present invention can be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. A person skilled in the art will understand that, in practice, a microprocessor or a digital signal processor (DSP) can be used to implement some or all of the functions of some or all of the modules of the device in the embodiments of the present invention. The present invention can also be implemented as a device program (for example, a computer program and a computer program product) for executing the method described herein.

It should be noted that the embodiments above illustrate the present invention and do not limit it, and that a person skilled in the art can devise various alternative embodiments without departing from the scope of the appended claims. In the claims, the order in which features are listed does not imply any particular ordering of the features. In particular, the order of the steps in a method claim does not mean that the steps must be performed in that order; rather, the steps may be performed in any suitable order. Likewise, the order in which the modules of a device claim operate is not restricted by the order in which the modules are listed in the claim, and the modules may operate in any suitable order. In the claims, any reference sign placed in parentheses shall not be construed as limiting the claim. The term "comprising" does not exclude the presence of modules or steps not listed in a claim. The word "a" or "one" preceding a module or step does not exclude the presence of a plurality of such modules or steps. The present invention can be implemented by hardware comprising several distinct modules, or by a suitably programmed computer or processor. In a device claim enumerating several modules, several of these modules can be embodied by one and the same hardware module. The terms "first", "second", "third", and so on do not indicate any order and may be interpreted as names. The terms "connected", "coupled", and the like, as used in this specification, are defined as operably connected in any desired form, for example mechanically, electronically, digitally, in analog form, directly, indirectly, or by means of software or hardware.

Claims

A real-time voice evaluation system (100) in a mobile device, comprising:
a collection module (110) used to collect voice data of speech to be evaluated;
an identification module (130) used to recognize the voice data collected by the collection module (110) as text data;
a matching module (150) used to match the text data recognized by the identification module (130) with the text data of a voice sample in a voice sample repository and to obtain a matching result; and
an evaluation module (170) used to obtain and output, based on a predefined evaluation policy and the matching result obtained by the matching module (150), a pronunciation score of at least one character or character string in the speech to be evaluated and/or a pronunciation score of the speech to be evaluated,
wherein the speech to be evaluated includes speech of at least one character or character string.

The real-time voice evaluation system in a mobile device according to claim 1, wherein the system further comprises a display module used to display the text data of a voice sample in the voice sample repository, and
the collection module (110) is further used to collect voice data, as the speech to be evaluated, that a user inputs based on the voice sample in the voice sample repository displayed on the display module.

The real-time voice evaluation system in a mobile device according to claim 2, further comprising:
a score comparison module used to compare the pronunciation score of the speech to be evaluated output by the evaluation module (170), and/or the pronunciation score of at least one character or character string in the speech to be evaluated, with a predefined pronunciation score threshold; and
a marking module used to mark, when the pronunciation score of the speech to be evaluated is lower than the predefined pronunciation score threshold, the text data whose pronunciation score is lower than the threshold within the text data displayed by the display module, and/or to mark, when the pronunciation score of a character or character string in the speech to be evaluated is lower than the predefined pronunciation score threshold, the characters or character strings whose pronunciation score is lower than the threshold within the text data displayed by the display module.

The real-time voice evaluation system in a mobile device according to claim 1, wherein the matching module (150) is further used to perform, based on the Levenshtein distance (edit distance) algorithm, a matching operation between the text data recognized by the identification module (130) and the text data of the voice sample in the voice sample repository, and to obtain the matching result.

The real-time voice evaluation system in a mobile device according to any one of claims 1 to 4, wherein the predefined evaluation policy is: when the recognized text data matches the text data of a voice sample in the voice sample repository, the posterior probability of a character or character string in the text data recognized from the voice data is taken as the pronunciation score of that character or character string in the speech to be evaluated, and the average of the pronunciation scores of all characters or character strings in the speech to be evaluated is taken as the pronunciation score of the speech to be evaluated.

The real-time voice evaluation system in a mobile device according to claim 1, further comprising a storage module used to store the voice sample repository, wherein the voice sample repository includes at least one voice sample.

A real-time voice evaluation method (200) in a terminal device, comprising:
collecting voice data of speech to be evaluated (S210);
recognizing the collected voice data as text data (S230);
matching the recognized text data with the text data of a voice sample in a voice sample repository and obtaining a matching result (S250); and
obtaining and outputting, based on a predefined evaluation policy and the matching result, a pronunciation score of at least one character or character string in the speech to be evaluated and/or a pronunciation score of the speech to be evaluated (S270),
wherein the speech to be evaluated includes speech of at least one character or character string.

The real-time voice evaluation method in a terminal device according to claim 7, further comprising, prior to the step (S210) of collecting voice data of the speech to be evaluated, a step of displaying the text data of a voice sample in the voice sample repository,
wherein the step (S210) of collecting voice data of the speech to be evaluated comprises
collecting voice data, as the speech to be evaluated, that a user inputs based on the displayed voice sample in the voice sample repository.

The real-time voice evaluation method in a terminal device according to claim 8, further comprising: comparing the output pronunciation score of the speech to be evaluated, and/or the pronunciation score of at least one character or character string in the speech to be evaluated, with a predefined pronunciation score threshold; and
marking, when the pronunciation score of the speech to be evaluated is lower than the predefined pronunciation score threshold, the text data whose pronunciation score is lower than the threshold within the displayed text data, and/or marking, when the pronunciation score of at least one character or character string in the speech to be evaluated is lower than the predefined pronunciation score threshold, the characters or character strings whose pronunciation score is lower than the threshold within the displayed text data.

The real-time voice evaluation method in a terminal device according to any one of claims …, wherein the step of matching the recognized text data with the text data of the voice sample in the voice sample repository and obtaining a matching result comprises:
performing, based on the Levenshtein distance (edit distance) algorithm, a matching operation between the recognized text data and the text data of the voice sample in the voice sample repository to obtain the matching result.