WO2017173721A1 - Speech recognition method and device

Info

Publication number: WO2017173721A1
Application number: PCT/CN2016/083516
Authority: WO
Inventors: 潘春岭
Original assignee: 中兴通讯股份有限公司
Priority date: 2016-04-06
Filing date: 2016-05-26
Publication date: 2017-10-12
Also published as: CN107274886B; CN107274886A

Abstract

Provided are a speech recognition method and device. The method comprises: establishing correspondence relationships between pronunciations of words frequently used by a person having a speech impediment and standard pronunciations of the words (S101); and receiving a speech input from the person having a speech impediment, recognizing, according to the established correspondence relationships, a corresponding standard pronunciation, and executing an operation corresponding to the recognized standard pronunciation (S102). Establishment of correspondence relationships between pronunciations of words used by a person having a speech impediment and standard pronunciations realizes accurate recognition of the speech of said person, facilitates accurate expression of their thoughts or purposes, and causes a controlled device to correctly execute a voice command from said person, facilitating their ability to express themselves through speech and enabling them to gain confidence in daily life.

Description

Speech recognition method and device

Technical field

The invention relates to the field of speech recognition technology, in particular to a speech recognition method and device.

Background technique

At present, with the continuous development of voice recognition technology, more and more devices (such as mobile phones, televisions, and air conditioners) can be operated by voice control. For example, when a controlled device detects a voice control command, it performs the corresponding operation according to the detected command. Voice interaction therefore brings considerable convenience to users' daily lives.

In the prior art, for users from different countries or regions, the controlled device can translate their languages or dialects through a plurality of voice translation systems and perform the corresponding operation according to the translated control instruction.

However, existing technology does not serve patients whose speech disorder was caused by an illness acquired later in life, such as stroke. These patients can read simple words and have a strong desire to converse, but current controlled devices cannot accurately recognize their voice for voice interaction, which hinders their recovery and can cause them to lose confidence in life.

Summary of the invention

The following is an overview of the topics detailed in this document. This Summary is not intended to limit the scope of the claims.

The embodiments of the invention provide a voice recognition method and device that can accurately recognize the voice of a speech-impaired person, so that a controlled device can correctly carry out voice interaction.

In a first aspect, an embodiment of the present invention provides a voice recognition method, including:

establishing a correspondence between the voice of a speech-impaired person speaking commonly used everyday language and the standard voice; and

receiving the voice of the speech-impaired person, identifying the corresponding standard voice according to the established correspondence, and performing the operation corresponding to the identified standard voice.

Optionally, establishing the correspondence between the speech-impaired person's everyday-language voice and the standard voice includes:

extracting the voice of each phrase or character from the speech-impaired person's everyday-language voice, and establishing a correspondence with the voice of the same phrase or character in the standard voice.

Optionally, after establishing the correspondence between the speech-impaired person's everyday-language voice and the standard voice, the method further includes:

storing the established correspondence and uploading it to a cloud server for backup;

reviewing the correspondence between the voice of the speech-impaired person and the standard voice, and correcting any correspondence found to be erroneous during the review;

periodically counting the frequency of use of the correspondence between the voice of the speech-impaired person and the standard voice, and updating the database according to the frequency of use.

Optionally, before establishing the correspondence between the speech-impaired person's everyday-language voice and the standard voice, the method further includes: recording the voice of the speech-impaired person reading the commonly used everyday language.

In a second aspect, an embodiment of the present invention provides a voice recognition apparatus, including a voice intelligent processing module and a voice recognition module.

The voice intelligent processing module is configured to establish a correspondence between the speech-impaired person's everyday-language voice and the standard voice.

The voice recognition module is configured to receive the voice of the speech-impaired person, identify the corresponding standard voice according to the established correspondence, and perform the operation corresponding to the identified standard voice.

Optionally, the voice intelligent processing module is specifically configured to: extract the voice of each phrase or character from the speech-impaired person's everyday-language voice, and establish a correspondence with the voice of the same phrase or character in the standard voice.

Optionally, the voice intelligent processing module is further configured to: store the established correspondence and upload it to the cloud server for backup.

Optionally, the voice intelligent processing module is further configured to: review the correspondence between the voice of the speech-impaired person and the standard voice, and correct any correspondence found to be erroneous during the review.

Optionally, the voice intelligent processing module is further configured to: periodically count the frequency of use of the correspondence between the voice of the speech-impaired person and the standard voice, and update the correspondence according to the frequency of use.

Optionally, the device further includes a voice input module configured to record the voice of the speech-impaired person reading the commonly used everyday language.

An embodiment of the invention further provides a computer-readable storage medium storing computer-executable instructions for performing any of the above voice recognition methods.

By establishing the correspondence between the voice of the speech-impaired person and the standard voice, the embodiments of the invention achieve accurate recognition of the speech-impaired person's voice, make it easier for the person to express what they actually mean, and enable the controlled device to carry out voice interaction correctly. This in turn supports the recovery of the patient's language expression and helps build their confidence in life.

Other features and advantages of the embodiments of the invention will be set forth in the description that follows. The objectives and other advantages of the invention may be realized and obtained by means of the structure particularly pointed out in the appended claims.

Other aspects will be apparent upon reading and understanding the drawings and detailed description.

Brief description of the drawings

The drawings provide a further understanding of the embodiments of the present invention and form a part of this application. They serve to explain the invention and are not intended to limit it. In the drawings:

FIG. 1 is a schematic flowchart of Embodiment 1 of a voice recognition method according to an embodiment of the present disclosure;

FIG. 2 is a schematic structural diagram of Embodiment 1 of a voice recognition apparatus according to an embodiment of the present invention.

Preferred embodiment of the invention

The embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that, in the case of no conflict, the features in the embodiments and the embodiments in the present application may be arbitrarily combined with each other.

The steps illustrated in the flowcharts of the figures may be executed in a computer system, for example as a set of computer-executable instructions. Also, although logical sequences are shown in the flowcharts, in some cases the steps shown or described may be performed in a different order from the one described herein.

The method according to the embodiment of the present invention can be applied to people whose speech impairment results from an illness acquired later in life, for example patients with speech disorders caused by stroke. Such people can read simple text and have a strong desire to converse, but existing devices cannot accurately recognize their voice for voice interaction. With a smart device equipped with the voice recognition apparatus, such as a mobile phone, a tablet, or an intelligent robot, the true intention expressed by their voice can be accurately recognized and the corresponding operation performed. The application is not limited to these examples.

The method according to the embodiment of the present invention addresses the technical problem in the prior art that the voice of a speech-impaired person cannot be accurately recognized. As a result, the controlled device cannot correctly carry out voice interaction, the person cannot express what they actually mean, and the patient's recovery is hindered.

The technical solutions of the present invention will be described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be described in some embodiments.

FIG. 1 is a schematic flowchart of Embodiment 1 of a voice recognition method according to an embodiment of the present invention. This embodiment relates to a specific process for accurately recognizing the voice of a speech-impaired person. As shown in FIG. 1, the method includes:

S101. Establish a correspondence between a common life language voice and a standard voice of a voice disabled person.

Specifically, the pronunciation of each phrase or single character in the received everyday-language voice of the speech-impaired person is separated and extracted, and a one-to-one correspondence between the speech-impaired person's voice and the standard voice is established. The correspondences can be assembled into a database, although the embodiment is not limited to this.
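By way of illustration only, the one-to-one correspondence of step S101 can be sketched as a pair of lookup tables. The class name `CorrespondenceDB`, its methods, and the use of text strings as stand-ins for extracted pronunciations are assumptions made for this sketch; the application does not specify a data structure or an acoustic representation.

```python
class CorrespondenceDB:
    """Hypothetical one-to-one table: speaker pronunciation <-> standard phrase."""

    def __init__(self):
        self._to_standard = {}   # speaker pronunciation key -> standard phrase
        self._to_impaired = {}   # standard phrase -> speaker pronunciation key

    def add(self, impaired_key, standard):
        # Enforce the one-to-one property: neither side may already be
        # paired with a different partner.
        existing_standard = self._to_standard.get(impaired_key)
        if existing_standard is not None and existing_standard != standard:
            raise ValueError(f"{impaired_key!r} is already mapped to {existing_standard!r}")
        existing_key = self._to_impaired.get(standard)
        if existing_key is not None and existing_key != impaired_key:
            raise ValueError(f"{standard!r} is already mapped from {existing_key!r}")
        self._to_standard[impaired_key] = standard
        self._to_impaired[standard] = impaired_key

    def lookup(self, impaired_key):
        # Returns None when no correspondence has been established.
        return self._to_standard.get(impaired_key)


db = CorrespondenceDB()
db.add("wa-er", "water")
print(db.lookup("wa-er"))
```

A real system would key the table on acoustic features rather than text; the strings here only make the one-to-one constraint visible.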

S102. Receive a voice of a voice disabled person, identify a corresponding standard voice according to the established correspondence, and perform an operation corresponding to the recognized standard voice.

Specifically, the voice of the speech-impaired person is received, separated, and discriminated, and it is compared against the voices in the established correspondence to identify the corresponding standard voice. The voice of the speech-impaired person is thereby accurately recognized and the corresponding voice operation performed. The standard voice can also be played back, so that what the person actually means is expressed and the controlled device correctly carries out voice interaction; this also makes it easier for family members to communicate with and understand the speech-impaired person.
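As a rough sketch of step S102, the comparison against the stored correspondence and the execution of the matching operation might look as follows. Approximate string matching with `difflib.get_close_matches` is swapped in here for the acoustic comparison, which the application does not detail; the function name, the `cutoff` value, and the `actions` table are hypothetical.

```python
from difflib import get_close_matches


def recognize_and_execute(heard, correspondence, actions, cutoff=0.8):
    """Map a heard pronunciation to its standard phrase and run its action.

    heard          -- text stand-in for the received pronunciation (assumption)
    correspondence -- dict: speaker pronunciation -> standard phrase
    actions        -- dict: standard phrase -> zero-argument callable
    """
    # Pick the single closest stored pronunciation, if any is similar enough.
    matches = get_close_matches(heard, list(correspondence), n=1, cutoff=cutoff)
    if not matches:
        return None
    standard = correspondence[matches[0]]
    action = actions.get(standard)
    if action is not None:
        action()  # the operation corresponding to the standard voice
    return standard


performed = []
correspondence = {"wa-er": "water", "li-on": "light on"}
actions = {"light on": lambda: performed.append("light on")}
result = recognize_and_execute("li-onn", correspondence, actions)
print(result, performed)
```

Returning `None` for an unmatched input leaves it to the caller to decide whether to prompt the speaker again.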

The speech recognition method provided by this embodiment of the present invention achieves accurate recognition of the speech-impaired person's voice by establishing a correspondence between that voice and the standard voice. It makes it easier for the speech-impaired person to express what they actually mean and enables the controlled device to carry out voice interaction correctly, which supports the recovery of the patient's language expression and helps build their confidence in life.

Optionally, on the basis of the foregoing embodiment, before establishing the correspondence between the common life language voice of the voice disabled person and the standard voice, the method further includes:

recording the voice of the speech-impaired person reading commonly used everyday language.

The speech-impaired persons addressed in this application are those whose speech disorder was caused by an illness acquired later in life, for example patients with speech disorders caused by stroke; they can read simple text and have a strong desire to converse. For the commonly used everyday language, articles, short sentences, or phrases totaling about 5,000 characters can be prepared in advance. The content is chosen by screening for words closely related to the daily life of the speech-impaired person, and articles can also be selected according to the common characters (2,500) and secondary common characters (1,000) in the "List of Common Characters in Modern Chinese". According to computer sampling, these common characters cover 99.48% of the language, which is sufficient for the communication needs of speech-impaired persons. The embodiment is not limited to this.
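The coverage figure above is a ratio of character occurrences. A minimal sketch of how such a ratio could be computed for a candidate reading text is shown below; the function name `coverage` and the toy inputs are assumptions, and the 99.48% figure itself comes from the cited list, not from this code.

```python
from collections import Counter


def coverage(sample_text, common_chars):
    """Fraction of non-whitespace character occurrences found in common_chars."""
    counts = Counter(ch for ch in sample_text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for ch, n in counts.items() if ch in common_chars)
    return covered / total


# Toy example: 3 of the 4 non-space characters are in the "common" set.
print(coverage("aab c", {"a", "b"}))  # 0.75
```

The same function could be used to check that a prepared 5,000-character reading text exercises the intended character set before recording begins.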

Recording the speech-impaired person's voice for commonly used everyday language in advance simplifies the subsequent construction of the database and makes it quicker to recognize the person's voice and the intention it expresses.

Optionally, on the basis of the foregoing embodiment, establishing the correspondence between the speech-impaired person's everyday-language voice and the standard voice in step S101 includes:

extracting the voice of each phrase or character from the speech-impaired person's everyday-language voice, and establishing a one-to-one correspondence with the voice of the same phrase or character in the standard voice.

Specifically, the voice of the speech-impaired person is separated and split into sentences and words, the phrases or characters are extracted, and each is placed in one-to-one correspondence with the voice of the same phrase or character in the standard voice. The splitting criteria can be set manually, for example by specifying that the interval between words is a few milliseconds, to ensure the accuracy of the split. The established one-to-one correspondences can be assembled into a database, although the embodiment is not limited to this.

Extracting or splitting out phrases or characters from the speech-impaired person's everyday-language voice makes it easier to place them in one-to-one correspondence with the standard-voice phrases or characters and improves the accuracy of the database.
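One way to picture splitting by a manually set interval is a pause detector over per-frame energy values. The sketch below is an assumption-laden illustration: the frame length, the silence threshold, the minimum gap, and the function name `split_on_pauses` are all hypothetical, and the application does not state how the interval is measured.

```python
def split_on_pauses(energies, frame_ms=10, silence_threshold=0.1, min_gap_ms=30):
    """Return (start, end) frame ranges of voiced segments, end exclusive.

    A segment is closed once at least min_gap_ms of consecutive frames fall
    at or below silence_threshold. All parameter defaults are assumptions.
    """
    min_gap_frames = max(1, min_gap_ms // frame_ms)
    segments = []
    start = None        # first frame of the segment being built
    last_voiced = None  # most recent voiced frame in that segment
    for i, energy in enumerate(energies):
        if energy > silence_threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced >= min_gap_frames:
            segments.append((start, last_voiced + 1))
            start = None
    if start is not None:
        segments.append((start, last_voiced + 1))
    return segments


frames = [0, 0.5, 0.6, 0, 0, 0, 0.7, 0.8, 0, 0.9, 0]
print(split_on_pauses(frames))  # [(1, 3), (6, 10)]
```

In the example, the single quiet frame at index 8 is shorter than the minimum gap, so frames 6 to 9 stay together as one word.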

Optionally, on the basis of the foregoing embodiment, after establishing the correspondence between the common life language voice of the voice disabled person and the standard voice in the step S101, the method further includes:

storing the established correspondence and uploading it to the cloud server for backup.

Specifically, the established correspondence may be stored on the device and uploaded to the cloud server for backup. For example, the established correspondence may be stored on a mobile phone and uploaded to the cloud server through the mobile phone. This makes the correspondence convenient to retrieve and also avoids losing it when the device is replaced.

Storing the established database and uploading it to the cloud server for backup makes it convenient for the user, and the database can be retrieved anytime and anywhere.
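The store-then-back-up step can be sketched as local serialization followed by a hand-off to an upload routine. JSON is chosen here only for illustration, and `upload` is a hypothetical callable standing in for whatever cloud client a real device would use; the application names no file format or cloud service.

```python
import json
from pathlib import Path


def save_correspondence(mapping, path):
    """Write the correspondence table to a local JSON file."""
    text = json.dumps(mapping, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def load_correspondence(path):
    """Read the correspondence table back, e.g. on a replacement device."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def backup(path, upload):
    """Pass the stored file's bytes to `upload` (hypothetical cloud client)."""
    upload(Path(path).read_bytes())
```

Keeping `upload` as a parameter means the same local file can be backed up through any transport without changing the storage code.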

Optionally, on the basis of the foregoing embodiment, after the correspondence is established in step S101, the correspondence between the voice of the speech-impaired person and the standard voice is reviewed, and any correspondence found to be erroneous is corrected.

Specifically, although the pronunciation of a speech-impaired person is abnormal, it follows rules: the pronunciation is not arbitrary, and the manner of pronunciation is essentially fixed. The database is unlikely to be collected successfully in a single pass and needs a process of correction and improvement, so the speech-impaired person or a family member needs to review it. The voice intelligent processing module can separate and extract the voice of the speech-impaired person, find the corresponding standard voice, synthesize it, and play it back for repeated listening to determine whether the correspondence is correct. Where the correspondence between the voice of the speech-impaired person and the standard voice is incorrect, it is corrected, which ensures the correctness of the database. For a correspondence that is repeatedly wrong, the correspondence for that phrase can be established by force.

Reviewing and correcting the correspondence ensures that the voice of the speech-impaired person is correctly matched to the standard voice, and therefore that the person's true intention is recognized more accurately.
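The review pass can be pictured as walking through each stored pair, asking a reviewer whether it is right, and applying a forced correspondence where one is supplied. In this sketch the reviewer's judgement is the hypothetical callback `is_correct`, and setting aside entries that are wrong but have no correction yet is an assumption, not something the application prescribes.

```python
def review(mapping, is_correct, forced):
    """Return (reviewed_mapping, unresolved_keys).

    mapping    -- dict: speaker pronunciation -> standard phrase
    is_correct -- callable(key, standard) -> bool, the reviewer's judgement
                  after listening to the played-back standard voice
    forced     -- dict: key -> standard phrase to establish by force
    """
    reviewed = {}
    unresolved = []
    for key, standard in mapping.items():
        if is_correct(key, standard):
            reviewed[key] = standard
        elif key in forced:
            reviewed[key] = forced[key]   # forcibly established correspondence
        else:
            unresolved.append(key)        # assumption: left for re-recording
    return reviewed, unresolved


mapping = {"wa-er": "water", "do": "dog"}
reviewed, unresolved = review(mapping, lambda k, s: k == "wa-er", {"do": "door"})
print(reviewed, unresolved)
```

Returning a new table rather than editing in place keeps the unreviewed database available until the reviewer accepts the result.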

Optionally, on the basis of the foregoing embodiment, the frequency of use of the correspondence between the voice of the speech-impaired person and the standard voice is periodically counted, and the correspondence is updated according to the frequency of use.

Specifically, as the speech-impaired person's speaking ability recovers, the frequency of use of the correspondence between the person's voice and the standard voice may be counted periodically and the correspondence updated accordingly. This helps the person rebuild their own habitual speech, supports their speech rehabilitation, and makes it easier to convey what they actually mean.

By periodically counting the frequency of use of the correspondence between the voice of the speech-impaired person and the standard voice and updating the database accordingly, the speech rehabilitation training of the speech-impaired person is better supported, and it becomes easier to convey what the person actually means.
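The application does not say what "updating according to the frequency of use" involves. One possible reading, sketched below purely as an assumption, is to count each successful lookup, then at the end of each period order the table by use and set aside entries that were not used. The class name `UsageTracker` and the `min_uses` threshold are hypothetical.

```python
from collections import Counter


class UsageTracker:
    """Counts how often each correspondence is used within a period."""

    def __init__(self):
        self.counts = Counter()

    def record(self, key):
        self.counts[key] += 1

    def periodic_update(self, mapping, min_uses=1):
        """Return (active, stale) and reset the counts for the next period.

        active -- entries used at least min_uses times, most used first
        stale  -- keys used less often, candidates for re-recording or review
        """
        ordered = sorted(mapping, key=lambda k: self.counts[k], reverse=True)
        active = {k: mapping[k] for k in ordered if self.counts[k] >= min_uses}
        stale = [k for k in ordered if self.counts[k] < min_uses]
        self.counts.clear()
        return active, stale


tracker = UsageTracker()
for key in ["b", "a", "b"]:
    tracker.record(key)
active, stale = tracker.periodic_update({"a": "A", "b": "B", "c": "C"})
print(list(active), stale)  # ['b', 'a'] ['c']
```

A pronunciation that drops out of use may simply mean the speaker has recovered a more standard form, which is why the stale entries are reported rather than deleted.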

FIG. 2 is a schematic structural diagram of Embodiment 1 of a voice recognition apparatus according to the present invention. As shown in FIG. 2, the voice recognition apparatus includes a voice intelligent processing module 10 and a voice recognition module 20;

The voice intelligent processing module 10 is configured to establish a correspondence between a common life language voice of the voice disabled person and a standard voice;

The voice recognition module 20 is configured to receive the voice of the voice disabled person, identify the corresponding standard voice according to the established correspondence, and perform the corresponding operation of the standard voice.

The voice recognition device provided by this embodiment of the present invention achieves accurate recognition of the speech-impaired person's voice by establishing a correspondence between that voice and the standard voice. It makes it easier for the speech-impaired person to express what they actually mean and enables the controlled device to carry out voice interaction correctly, which supports the recovery of the patient's language expression and helps build their confidence in life.

Optionally, based on the foregoing embodiment, the device further includes: a voice input module 30;

The voice input module 30 is configured to record the voice of the speech-impaired person reading the commonly used everyday language.

The device provided by the embodiment of the present invention may perform the foregoing method embodiments, and the implementation principles and technical effects thereof are similar, and details are not described herein again.
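The division of work among modules 10, 20, and 30 can be sketched as three small classes. This is only an architectural illustration: the class and method names are hypothetical, pronunciations are represented by text strings, and exact-match lookup stands in for real acoustic recognition.

```python
class VoiceInputModule:
    """Module 30: records the speaker reading prepared everyday language."""

    def __init__(self):
        self.recordings = []  # (speaker pronunciation, prompted standard phrase)

    def record(self, pronunciation, prompt):
        self.recordings.append((pronunciation, prompt))


class IntelligentProcessingModule:
    """Module 10: builds the correspondence from the recordings."""

    def __init__(self):
        self.correspondence = {}

    def build(self, recordings):
        for pronunciation, prompt in recordings:
            self.correspondence[pronunciation] = prompt


class RecognitionModule:
    """Module 20: recognizes the standard phrase and performs its operation."""

    def __init__(self, processing, actions):
        self.processing = processing
        self.actions = actions  # standard phrase -> zero-argument callable

    def handle(self, pronunciation):
        standard = self.processing.correspondence.get(pronunciation)
        if standard in self.actions:
            self.actions[standard]()
        return standard


state = {"tv": "off"}
mic = VoiceInputModule()
mic.record("tee-wee on", "TV on")
processing = IntelligentProcessingModule()
processing.build(mic.recordings)
recognizer = RecognitionModule(processing, {"TV on": lambda: state.update(tv="on")})
print(recognizer.handle("tee-wee on"), state)
```

Because the recognition module only reads the table held by the processing module, a corrected or updated correspondence takes effect without rebuilding the recognizer.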

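Optionally, on the basis of the foregoing embodiment, the voice intelligent processing module is specifically configured to:

extract the voice of each phrase or character from the speech-impaired person's everyday-language voice, and establish a one-to-one correspondence with the voice of the same phrase or character in the standard voice.

Optionally, on the basis of the foregoing embodiment, the voice intelligent processing module is further configured to:

store the established correspondence database and upload it to the cloud server for backup.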

While the embodiments of the present invention have been described above, the described embodiments are provided merely to aid understanding of the invention and are not intended to limit it. Those skilled in the art may make any modification or variation in the form and details of the embodiments without departing from the spirit and scope of the invention, but the scope of protection remains that defined by the appended claims.

Industrial applicability

The voice recognition method and device provided by the embodiments of the present invention comprise: establishing a correspondence between the everyday-language voice of a speech-impaired person and the standard voice; and receiving the voice of the speech-impaired person, identifying the corresponding standard voice according to the established correspondence, and performing the operation corresponding to the identified standard voice. By establishing the correspondence between the voice of the speech-impaired person and the standard voice, the embodiments achieve accurate recognition of that voice, make it easier for the person to express what they actually mean, and enable the controlled device to carry out voice interaction correctly, which supports the recovery of the patient's language expression and helps build their confidence in life.

Claims

1. A speech recognition method, comprising:

establishing a correspondence between the everyday-language voice of a speech-impaired person and the standard voice; and

receiving the voice of the speech-impaired person, identifying the corresponding standard voice according to the established correspondence, and performing the operation corresponding to the identified standard voice.

2. The speech recognition method according to claim 1, wherein establishing the correspondence between the everyday-language voice of the speech-impaired person and the standard voice comprises:

extracting the voice of each phrase or character from the everyday-language voice of the speech-impaired person, and establishing a correspondence with the voice of the same phrase or character in the standard voice.

3. The speech recognition method according to claim 1 or 2, wherein after establishing the correspondence between the everyday-language voice of the speech-impaired person and the standard voice, the method further comprises:

storing the established correspondence and uploading it to a cloud server for backup.

4. The speech recognition method according to claim 1 or 2, wherein after establishing the correspondence between the everyday-language voice of the speech-impaired person and the standard voice, the method further comprises:

reviewing the correspondence between the voice of the speech-impaired person and the standard voice, and correcting any correspondence found to be erroneous during the review.

5. The speech recognition method according to claim 1 or 2, wherein after establishing the correspondence between the everyday-language voice of the speech-impaired person and the standard voice, the method further comprises:

periodically counting the frequency of use of the correspondence between the voice of the speech-impaired person and the standard voice, and updating the correspondence according to the frequency of use.

6. The speech recognition method according to claim 1, wherein before establishing the correspondence between the everyday-language voice of the speech-impaired person and the standard voice, the method further comprises: recording the voice of the speech-impaired person reading the commonly used everyday language.

7. A voice recognition device, comprising a voice intelligent processing module and a voice recognition module, wherein:

the voice intelligent processing module is configured to establish a correspondence between the everyday-language voice of a speech-impaired person and the standard voice; and

the voice recognition module is configured to receive the voice of the speech-impaired person, identify the corresponding standard voice according to the established correspondence, and perform the operation corresponding to the identified standard voice.

8. The speech recognition apparatus according to claim 7, wherein the voice intelligent processing module is specifically configured to:

extract the voice of each phrase or character from the everyday-language voice of the speech-impaired person, and establish a correspondence with the voice of the same phrase or character in the standard voice.

9. The speech recognition apparatus according to claim 7 or 8, wherein the voice intelligent processing module is further configured to:

store the established correspondence and upload it to a cloud server for backup.

10. The speech recognition apparatus according to claim 7 or 8, wherein the voice intelligent processing module is further configured to:

review the correspondence between the voice of the speech-impaired person and the standard voice, and correct any correspondence found to be erroneous during the review.

11. The speech recognition apparatus according to claim 7 or 8, wherein the voice intelligent processing module is further configured to:

periodically count the frequency of use of the correspondence between the voice of the speech-impaired person and the standard voice, and update the database according to the frequency of use.

12. The speech recognition apparatus according to claim 7, further comprising: a voice input module configured to record the voice of the speech-impaired person reading the commonly used everyday language.

13. A computer-readable storage medium storing computer-executable instructions for performing the speech recognition method of any one of claims 1 to 6.