CN111009247B

CN111009247B - Speech recognition correction method, device and storage medium

Info

Publication number: CN111009247B
Application number: CN201911352576.3A
Authority: CN
Inventors: 张毅; 李宝洁; 韦泽垠; 孙铭
Original assignee: Shenzhen TCL Digital Technology Co Ltd
Current assignee: Shenzhen TCL Digital Technology Co Ltd
Priority date: 2019-12-24
Filing date: 2019-12-24
Publication date: 2023-11-14
Anticipated expiration: 2039-12-24
Also published as: WO2021128967A1; CN111009247A

Abstract

The invention discloses a voice recognition correction method. The method comprises the following steps: during voice recognition, when it is detected that the user has deleted the voice recognition result, obtaining the voice information re-input by the user; performing big data comparison on the voice information to obtain a voice recognition result sequence corresponding to the voice information; and displaying the voice recognition result sequence according to a preset rule. The invention also discloses a voice recognition correction device and a computer-readable storage medium. The invention can correct the user's voice recognition result when an error occurs during voice recognition.

Description

Speech recognition correction method, device and storage medium

Technical Field

The present invention relates to the field of television voice interaction, and in particular, to a method and apparatus for correcting voice recognition, and a computer readable storage medium.

Background

Speech recognition (Automatic Speech Recognition, ASR) refers to converting lexical content in human speech into computer-readable inputs, such as keys, binary codes, or character sequences. Speech recognition technology is currently applied in more and more industries such as home appliances, communications, automotive electronics, medical, home services, consumer electronics, etc.

However, speech recognition relies on computer deep learning, so recognition errors often occur because of user accents, the technical limitations of the computer and similar factors, and the speech information the user wants to express cannot be recognized accurately.

Disclosure of Invention

The main object of the invention is to provide a voice recognition correction method, a voice recognition correction device and a computer-readable storage medium, so as to correct the user's voice recognition result when an error occurs during voice recognition.

In order to achieve the above object, the present invention provides a voice recognition correction method, comprising the steps of:

during voice recognition, when it is detected that the user has deleted the voice recognition result, obtaining the voice information re-input by the user;

performing big data comparison on the voice information to obtain a voice recognition result sequence corresponding to the voice information;

and displaying the voice recognition result sequence according to a preset rule.

Optionally, the step of displaying the voice recognition result sequence according to a preset rule includes:

displaying the voice recognition results in the voice recognition result sequence one at a time, in ranked order;

if it is detected that the user has deleted the voice recognition result being displayed, acquiring the next-ranked voice recognition result in the voice recognition result sequence;

and displaying the next-ranked voice recognition result.

Optionally, the step of displaying the voice recognition result sequence according to a preset rule includes:

acquiring a preset number of top-ranked voice recognition results from the voice recognition result sequence;

and displaying the preset number of top-ranked voice recognition results for the user to select from.

Optionally, the step of performing big data comparison on the voice information to obtain a voice recognition result sequence corresponding to the voice information includes:

performing big data comparison on the voice information to obtain a voice recognition result set produced by recognizing the voice information;

and sorting the voice recognition result set by number of uses to obtain the voice recognition result sequence corresponding to the voice information.

Optionally, the step of performing big data comparison on the voice information to obtain a voice recognition result sequence corresponding to the voice information includes:

acquiring address information of the current device;

determining a big data comparison area for the voice information according to the address information;

and performing big data comparison on the voice information within the big data comparison area to obtain the voice recognition result sequence corresponding to the voice information.

Optionally, the step of acquiring address information of the current device includes:

acquiring an IP address of the current device;

and determining the address information of the current device according to the IP address.

Optionally, the step of performing big data comparison on the voice information within the big data comparison area to obtain a voice recognition result sequence corresponding to the voice information includes:

performing big data comparison on the voice information to obtain a voice recognition result set corresponding to the voice information;

and sorting the voice recognition result set according to a preset distance algorithm to obtain the voice recognition result sequence corresponding to the voice information.

Optionally, the step of sorting the voice recognition result set according to a preset distance algorithm to obtain a voice recognition result sequence corresponding to the voice information includes:

acquiring a matrix value for each voice recognition result in the voice recognition result set;

performing a string scan on the matrix values to obtain a final matrix value;

judging whether the final matrix value is greater than a preset threshold;

and if so, sorting the voice recognition results according to their final matrix values.

In addition, to achieve the above object, the present invention also provides a voice recognition correction device, including a memory, a processor and a voice recognition correction program stored on the memory and executable on the processor; when executed by the processor, the voice recognition correction program implements the steps of the voice recognition correction method described above.

In addition, in order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a speech recognition correction program which, when executed by a processor, implements the steps of the above-described speech recognition correction method.

The invention provides a voice recognition correction method, a voice recognition correction device and a computer-readable storage medium. In the method, when it is detected during voice recognition that the user has deleted the voice recognition result, the voice information re-input by the user is obtained; big data comparison is performed on the voice information to obtain a corresponding voice recognition result sequence; and the sequence is displayed according to a preset rule. In this way, whenever the user is found to have deleted a voice recognition result, the text recognized from the user's speech is corrected by big data comparison, which reduces the number of times the user has to repeat the utterance and improves the voice interaction experience.

Drawings

FIG. 1 is a schematic diagram of a device structure of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart of a first embodiment of the voice recognition correction method according to the present invention;

FIG. 3 is a flowchart of a second embodiment of the voice recognition correction method according to the present invention;

FIG. 4 is a flowchart of a third embodiment of the voice recognition correction method according to the present invention;

FIG. 5 is a flowchart of a fourth embodiment of the voice recognition correction method according to the present invention;

FIG. 6 is a flowchart of a fifth embodiment of the voice recognition correction method according to the present invention;

FIG. 7 is a flowchart of a sixth embodiment of the voice recognition correction method according to the present invention;

FIG. 8 is a flowchart of a seventh embodiment of the voice recognition correction method according to the present invention;

FIG. 9 is a flowchart of an eighth embodiment of the voice recognition correction method according to the present invention.

The realization of the objects, functional features and advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to fig. 1, fig. 1 is a schematic diagram of a device structure of a hardware operating environment according to an embodiment of the present invention.

The terminal in the embodiments of the invention may be a PC, or a terminal device with data processing capability such as a smartphone, a tablet computer or a portable computer.

As shown in fig. 1, the terminal may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 enables connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage. The memory 1005 may optionally also be a storage device separate from the processor 1001.

Optionally, the terminal may also include a camera, RF (radio frequency) circuitry, sensors, audio circuitry, a Wi-Fi module and the like. The sensors include, for example, light sensors, motion sensors and other sensors. Specifically, the light sensors may include an ambient light sensor, which can adjust the brightness of the display screen according to the ambient light, and a proximity sensor, which can turn off the display screen and/or the backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and, when the mobile terminal is stationary, the magnitude and direction of gravity; it can be used in applications that recognize the orientation of the mobile terminal (such as landscape/portrait switching, related games and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). The mobile terminal may of course also be equipped with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer or an infrared sensor, which are not described here.

It will be appreciated by those skilled in the art that the terminal structure shown in fig. 1 is not limiting of the terminal and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.

As shown in fig. 1, an operating system, a network communication module, a user interface module, and a voice recognition correction program may be included in the memory 1005, which is one type of computer storage medium.

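In the terminal shown in fig. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user side) and exchange data with it; and the processor 1001 may be configured to call the voice recognition correction program stored in the memory 1005 and perform the following operations:

during voice recognition, when it is detected that the user has deleted the voice recognition result, obtaining the voice information re-input by the user;

performing big data comparison on the voice information to obtain a voice recognition result sequence corresponding to the voice information;

and displaying the voice recognition result sequence according to a preset rule.

Further, the processor 1001 may call the voice recognition correction program stored in the memory 1005 to further perform the following operations:

the step of displaying the voice recognition result sequence according to a preset rule comprises: displaying the results in the sequence one at a time in ranked order and, when the user deletes the result being displayed, acquiring and displaying the next-ranked result; or acquiring a preset number of top-ranked results and displaying them for the user to select from;

the step of performing big data comparison on the voice information to obtain a voice recognition result sequence corresponding to the voice information comprises: obtaining a voice recognition result set by big data comparison and sorting it by number of uses; or acquiring address information of the current device, determining a big data comparison area according to the address information, and performing big data comparison on the voice information within that area;

the step of acquiring the address information of the current device comprises: acquiring the IP address of the current device and determining the address information of the current device according to the IP address;

the step of performing big data comparison on the voice information within the big data comparison area comprises: obtaining a voice recognition result set corresponding to the voice information and sorting it according to a preset distance algorithm;

the step of sorting the voice recognition result set according to a preset distance algorithm comprises: acquiring a matrix value for each voice recognition result, performing a string scan on the matrix values to obtain a final matrix value, judging whether the final matrix value is greater than a preset threshold, and if so, sorting the voice recognition results according to their final matrix values.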

The specific embodiments of the voice recognition correction device of the present invention are substantially the same as the embodiments of the voice recognition correction method described below, and are not repeated here.

Referring to fig. 2, fig. 2 is a flowchart of a first embodiment of a voice recognition correction method according to the present invention, where the voice recognition correction method includes:

Step S100, when it is detected during voice recognition that the user has deleted the voice recognition result, the voice information re-input by the user is obtained;

In this embodiment, voice recognition is used while the user interacts with the television by voice. If it is detected that the user has deleted the voice recognition result, the result was wrong; simply repeating the same recognition operation may produce the same result again, so the user's intended text still cannot be recognized. This embodiment provides a voice recognition correction method for this situation. During voice recognition, if it is detected that the user has deleted the voice recognition result, the voice information re-input by the user is obtained; the most recent voice input from the user can be taken as the re-input voice information.

Step S200, big data comparison is performed on the voice information to obtain a voice recognition result sequence corresponding to the voice information;

After the voice information is acquired, big data comparison and recognition are performed on it to obtain the corresponding voice recognition result sequence. Recognizing a piece of voice information usually yields multiple candidate results, and the sequence can be obtained by ranking them according to big data statistics such as the number of uses and usage within a regional area.

Step S300, the voice recognition result sequence is displayed according to a preset rule.

After the voice recognition result sequence is obtained, it is displayed according to a preset rule. Specifically, a preset number of top-ranked results can be selected from the sequence and displayed on the television for the user to choose from; alternatively, the first-ranked result can be displayed directly, and if the user deletes it, the second-ranked result is displayed, and so on, until the user stops deleting. The sequence may also be displayed according to other preset rules.
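The three steps of the first embodiment can be pictured as a small event-driven session. The sketch below is illustrative only: it assumes the voice information arrives as already-recognized text, represents the server-side big data comparison by a caller-supplied function, and models the preset rule as "show the top preset_count results". The class and method names (CorrectionSession, on_result_deleted, on_voice_input) are hypothetical and do not come from the patent.

```python
from typing import Callable, List

# Hypothetical stand-in for the big data comparison (step S200): takes the
# re-input voice information (here, its recognized text) and returns a
# ranked voice recognition result sequence.
BigDataComparison = Callable[[str], List[str]]


class CorrectionSession:
    """Hypothetical sketch of steps S100 to S300 for one voice input box."""

    def __init__(self, compare: BigDataComparison, preset_count: int = 1):
        self.compare = compare
        self.preset_count = preset_count   # 1 = show only the first-ranked result
        self.correcting = False

    def on_result_deleted(self) -> None:
        # S100: a deletion means the displayed result was wrong, so the next
        # utterance is treated as re-input voice information.
        self.correcting = True

    def on_voice_input(self, voice: str, normal_asr: Callable[[str], str]) -> List[str]:
        if not self.correcting:
            return [normal_asr(voice)]       # ordinary recognition path
        self.correcting = False
        sequence = self.compare(voice)       # S200: ranked result sequence
        return sequence[: self.preset_count] # S300: preset display rule
```

For example, after the user deletes a wrong "whether today", the next utterance is routed through the comparison function instead of the ordinary recognizer, and its first-ranked candidate is shown.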

Referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of a voice recognition correction method according to the present invention.

Based on the above embodiment, in this embodiment, step S300 includes:

Step S310, the voice recognition results in the voice recognition result sequence are displayed one at a time in ranked order;

In this embodiment, after the voice recognition result sequence is obtained, its results can be displayed one at a time in ranked order. For example, the first-ranked result is displayed; if the user deletes it, the second-ranked result is displayed, and so on. If the user does not delete the displayed result, the correction has succeeded.

Step S320, if it is detected that the user has deleted the voice recognition result being displayed, the next-ranked voice recognition result in the voice recognition result sequence is acquired;

After it is detected that the user has deleted the result being displayed, the result ranked immediately after it is acquired. For example, if the user deletes the first-ranked result in the sequence, the second-ranked result is acquired.

Step S330, the next-ranked voice recognition result is displayed.

After the next-ranked result is acquired, it is displayed; for example, once the second-ranked result is acquired, the second-ranked result is displayed. In this way, the user can step through the candidates in order without extra key presses, which is convenient.
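A minimal sketch of the sequential rule in steps S310 to S330, assuming the on-screen display and the deletion check are abstracted into a hypothetical user_deletes callback that returns True when the user deletes the candidate currently shown. The function name is illustrative, not taken from the patent.

```python
from typing import Callable, Iterable, Optional


def sequential_correction(sequence: Iterable[str],
                          user_deletes: Callable[[str], bool]) -> Optional[str]:
    """Show ranked candidates one at a time (S310).

    Whenever the user deletes the displayed result (S320), move on to the
    next-ranked one (S330). Returns the first candidate the user keeps,
    or None if every candidate is deleted.
    """
    for candidate in sequence:
        # In a real system the candidate would be rendered on the TV here;
        # user_deletes is a hypothetical hook for the user's reaction.
        if not user_deletes(candidate):
            return candidate   # not deleted: correction succeeded
    return None
```

Because the loop only advances on a deletion, the user never needs a separate "next" key, which matches the convenience argument above.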

Referring to fig. 4, fig. 4 is a flowchart illustrating a third embodiment of a voice recognition correction method according to the present invention.

Based on the above embodiment, in this embodiment, step S300 includes:

Step S340, a preset number of top-ranked voice recognition results are acquired from the voice recognition result sequence;

In this embodiment, after the voice recognition result sequence is obtained, a preset number of top-ranked results can be acquired directly, for example, the first 10 results in the sequence.

Step S350, the preset number of top-ranked voice recognition results are displayed for the user to select from.

After the preset number of top-ranked results are acquired, they are displayed so that the user can easily make a selection. Presenting several candidates for the user to choose from greatly improves the chance of selecting the correct result and makes voice recognition more efficient. Specifically, the 10 most commonly used words are pushed to the device, ranked by probability, and the user selects the correct one. Alternatively, the input is automatically recognized as the first-ranked word; if that is still wrong, then when the user speaks a third time, it is automatically recognized as the second-ranked word, and so on.
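Both variants of the third embodiment reduce to simple indexing into the ranked sequence. The sketch below assumes the sequence is already ranked and that the device counts how many times the user has re-input the phrase; the function names and the reinput_count convention (1 for the second utterance, 2 for the third) are hypothetical.

```python
from typing import List, Optional


def top_candidates(sequence: List[str], preset_number: int = 10) -> List[str]:
    """S340: take the first preset_number results (10 in the example) for display."""
    return sequence[:preset_number]


def auto_pick(sequence: List[str], reinput_count: int) -> Optional[str]:
    """Alternative rule: the first re-input maps to the first-ranked word,
    the second re-input (the user's third utterance) to the second-ranked
    word, and so on. Returns None once the candidates are exhausted."""
    index = reinput_count - 1
    return sequence[index] if 0 <= index < len(sequence) else None
```

Slicing never raises when the sequence is shorter than preset_number, so short candidate lists are simply shown in full.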

Referring to fig. 5, fig. 5 is a flowchart illustrating a voice recognition correction method according to a fourth embodiment of the invention.

Based on the above embodiment, in this embodiment, step S200 includes:

Step S210, big data comparison is performed on the voice information to obtain a voice recognition result set for the voice information;

In this embodiment, to obtain the voice recognition result sequence corresponding to the voice information, big data comparison is first performed on the voice information to obtain a set of voice recognition results. Big data comparison and recognition of the voice information can yield multiple recognition results, and these form a voice recognition result set containing the possible recognition results.

Step S220, the voice recognition result set is sorted by number of uses to obtain the voice recognition result sequence corresponding to the voice information.

After the voice recognition result set is obtained, it is sorted by number of uses, for example in descending order of how often each result has been used, to obtain the voice recognition result sequence corresponding to the voice information.
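Sorting by number of uses (step S220) amounts to counting how often each candidate appears in a usage log and ordering by that count. The sketch assumes a hypothetical usage_log of previously accepted recognition texts; the patent does not specify how usage is recorded or stored.

```python
from collections import Counter
from typing import Iterable, List


def rank_by_usage(candidate_set: List[str], usage_log: Iterable[str]) -> List[str]:
    """Order candidates by how many times each appears in usage_log (S220).

    usage_log is a hypothetical record of previously accepted recognition
    texts. sorted() stays stable even with reverse=True, so candidates with
    equal counts keep their original relative order.
    """
    counts = Counter(usage_log)   # missing keys count as 0
    return sorted(candidate_set, key=lambda c: counts[c], reverse=True)
```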

Referring to fig. 6, fig. 6 is a flowchart illustrating a voice recognition correction method according to a fifth embodiment of the invention.

Based on the above embodiment, in this embodiment, step S200 includes:

Step S230, address information of the current device is obtained;

In this embodiment, to obtain the voice recognition result sequence corresponding to the voice information, the address information of the current device may also be obtained first, for example, the country, province or administrative region in which the current device is located.

Step S240, a big data comparison area for the voice information is determined according to the address information;

After the address information is acquired, the big data comparison area for the voice information can be determined from it; for example, if the address information indicates Changsha, the big data comparison area may be Hunan.

Step S250, big data comparison is performed on the voice information within the big data comparison area to obtain a voice recognition result sequence corresponding to the voice information.

After the big data comparison area is determined, big data comparison can be performed on the voice information within that area to obtain the corresponding voice recognition result sequence; for example, if the area is Hunan, the voice recognition results are ranked using Hunan data to obtain the sequence. Specifically, the server records user behavior, compares the words the user has repeatedly input against big data, and uses a distance algorithm to find the 10 words most commonly used in voice recognition. At the same time, the user's geographic location can be determined from the IP address of the user's television, and the big data search area can be set according to that location; for example, if the location is in a particular region (such as Hunan), the words with the highest voice recognition rate within that region are searched to obtain the voice recognition result sequence.
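A minimal sketch of steps S240 and S250, assuming the server keeps a hypothetical log of (area, recognized_text) pairs and that the comparison area is looked up from the city in the address information. Ranking by regional usage count is a simplification chosen for illustration; the text above also mentions a distance algorithm, which is not modelled here. The table, the fallback to all records for an unknown city, and the function name are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical mapping from the city in the device's address information to
# its big data comparison area (the text's example: Changsha -> Hunan).
CITY_TO_AREA: Dict[str, str] = {"Changsha": "Hunan", "Shenzhen": "Guangdong"}


def regional_sequence(candidates: List[str],
                      usage_records: List[Tuple[str, str]],
                      city: str,
                      top_n: int = 10) -> List[str]:
    """Rank candidates by how often they were used inside the comparison area.

    usage_records is a hypothetical server log of (area, recognized_text)
    pairs. If the city has no known area, all records are used (assumption).
    """
    area = CITY_TO_AREA.get(city)                            # S240
    counts = Counter(text for rec_area, text in usage_records
                     if area is None or rec_area == area)    # S250: area data only
    ranked = sorted(candidates, key=lambda c: counts[c], reverse=True)
    return ranked[:top_n]                                    # top 10 as in the example
```

The same candidate set can therefore be ordered differently for a television in Changsha and one in Shenzhen, which is the point of restricting the comparison area.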

Referring to fig. 7, fig. 7 is a flowchart illustrating a sixth embodiment of a voice recognition correction method according to the present invention.

Based on the above embodiment, in this embodiment, step S230 includes:

Step S231, the IP address of the current device is obtained;

In this embodiment, the address information of the current device may be obtained by obtaining its IP address. The IP address is the address assigned when the user connects to the network, and the user's address information can be determined from it.

Step S232, the address information of the current device is determined according to the IP address.

After the IP address is obtained, the address information of the current device can be determined from it, for example, by looking up an IP address allocation table.
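The text only says that an IP address allocation table is looked up. The sketch below does this with Python's standard ipaddress module against a hypothetical in-memory table; the ranges are the RFC 5737 documentation prefixes, chosen so that no real regional allocation is implied. A production system would presumably query a maintained IP geolocation database instead.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical allocation table of (network, region) rows. The prefixes are
# RFC 5737 documentation ranges, NOT real allocations for these cities.
ALLOCATION_TABLE: List[Tuple[ipaddress.IPv4Network, str]] = [
    (ipaddress.ip_network("192.0.2.0/24"), "Changsha"),
    (ipaddress.ip_network("198.51.100.0/24"), "Shenzhen"),
]


def address_from_ip(ip: str) -> Optional[str]:
    """S232: map the device's IP address to address information, or None."""
    addr = ipaddress.ip_address(ip)   # raises ValueError for malformed input
    for network, region in ALLOCATION_TABLE:
        if addr in network:           # False for IPv6 vs an IPv4 network
            return region
    return None
```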

Referring to fig. 8, fig. 8 is a flowchart of a seventh embodiment of a voice recognition correction method according to the present invention.

Based on the above embodiment, in this embodiment, step S250 includes:

Step S251, big data comparison is performed on the voice information to obtain a voice recognition result set corresponding to the voice information;

Step S252, the voice recognition result set is sorted according to a preset distance algorithm to obtain a voice recognition result sequence corresponding to the voice information.

After the voice recognition result set is obtained, it is sorted according to a preset distance algorithm to obtain the voice recognition result sequence corresponding to the voice information. The preset distance algorithm is a computer algorithm that works as follows:
1. If str1 or str2 has length 0, return the length of the other string: if (str1.length == 0) return str2.length; if (str2.length == 0) return str1.length;
2. Initialize an (n+1) × (m+1) matrix d, and set the values of the first row and first column to count up from 0.
3. Scan the two strings (n × m): if str1[i] == str2[j], set temp to 0; otherwise set temp to 1. Then assign to d[i][j] the minimum of d[i-1][j] + 1, d[i][j-1] + 1 and d[i-1][j-1] + temp.
4. After scanning, return the last value of the matrix, d[n][m], which is the distance between the two strings.
When the distance is greater than a certain value, the two strings are regarded as unrelated, and the correlation of their feature labels is further calculated.
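The four steps of the preset distance algorithm are the classic Levenshtein (edit) distance. A direct Python transcription follows as a sketch; Python strings are 0-indexed, so the text's str1[i] and str2[j] become str1[i - 1] and str2[j - 1] inside the 1-based matrix loop. The function name edit_distance is illustrative.

```python
def edit_distance(str1: str, str2: str) -> int:
    """Levenshtein distance following steps 1 to 4 of the preset distance algorithm."""
    n, m = len(str1), len(str2)
    # Step 1: if either string is empty, the distance is the other's length.
    if n == 0:
        return m
    if m == 0:
        return n
    # Step 2: (n+1) x (m+1) matrix; first row and column count up from 0.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    # Step 3: scan both strings; temp is 0 on a character match, else 1.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            temp = 0 if str1[i - 1] == str2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + temp)  # match or substitution
    # Step 4: the bottom-right cell is the distance between the strings.
    return d[n][m]
```

For instance, the homophones "weather" and "whether" differ by two substitutions, so their distance is 2, while "kitten" and "sitting" are at distance 3.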

Referring to fig. 9, fig. 9 is a flowchart illustrating a voice recognition correction method according to an eighth embodiment of the invention.

Based on the above embodiment, in this embodiment, step S252 includes:

Step S2521, a matrix value is obtained for each voice recognition result in the voice recognition result set;

In this embodiment, to sort the results according to the preset distance algorithm and obtain the voice recognition result sequence corresponding to the voice information, a matrix value can first be obtained for each voice recognition result in the set; the matrix value reflects the relevant features of each result.

Step S2522, a string scan is performed on the matrix values to obtain a final matrix value;

After the matrix values are obtained, a string scan is performed on them to obtain the final matrix value. The result of this scan is the final value, which reflects the distance value of each voice recognition result in the set.

Step S2523, it is judged whether the final matrix value is greater than a preset threshold;

if yes, step S2524 is performed: the voice recognition results are sorted according to their final matrix values.

After the final matrix value is obtained, it is compared with the preset threshold. If it is greater than the threshold, the voice recognition results whose final values exceed the threshold are sorted by the size of their final values to obtain the voice recognition result sequence. If the final value is smaller than the preset threshold, the corresponding voice recognition result is excluded directly.

In addition, the embodiment of the invention also provides a computer readable storage medium.

The computer-readable storage medium of the present invention stores thereon a speech recognition correction program which, when executed by a processor, implements the steps of the speech recognition correction method as described above.

For the method implemented when the voice recognition correction program running on the processor is executed, refer to the embodiments of the voice recognition correction method of the present invention; details are not repeated here.

It should be noted that, in this document, the terms "comprises," "comprising," and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or system that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article or system that comprises the element.

The numbering of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments.

From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium as described above (e.g., ROM/RAM, magnetic disk or optical disc) and including instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the present invention.

The foregoing describes only preferred embodiments of the present invention and does not limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims

1. A voice recognition correction method, characterized in that the voice recognition correction method comprises the steps of:

displaying the voice recognition result sequence according to a preset rule;

acquiring address information of the current device;

and performing big data comparison on the voice information within the big data comparison area, and searching the big data comparison area for the words with the highest voice recognition rate, so as to obtain a voice recognition result sequence corresponding to the voice information.

2. The voice recognition correction method of claim 1, wherein the step of displaying the voice recognition result sequence according to a preset rule comprises:

3. The voice recognition correction method of claim 1, wherein the step of displaying the voice recognition result sequence according to a preset rule comprises:

4. The voice recognition correction method of claim 1, wherein the step of performing big data comparison on the voice information to obtain a voice recognition result sequence corresponding to the voice information comprises:

5. The voice recognition correction method of claim 1, wherein the step of acquiring address information of the current device comprises:

acquiring an IP address of the current equipment;

6. The voice recognition correction method of claim 1, wherein the step of performing big data comparison on the voice information within the big data comparison area to obtain a voice recognition result sequence corresponding to the voice information comprises:

7. The method of claim 6, wherein the step of sorting the set of speech recognition results according to a predetermined distance algorithm to obtain a sequence of speech recognition results corresponding to the speech information comprises:

8. A voice recognition correction apparatus, characterized in that the voice recognition correction apparatus includes a memory, a processor and a voice recognition correction program stored on the memory and executable on the processor, wherein the voice recognition correction program, when executed by the processor, implements the steps of the voice recognition correction method according to any one of claims 1 to 7.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a voice recognition correction program which, when executed by a processor, implements the steps of the voice recognition correction method according to any one of claims 1 to 7.