CN112908304B - Method and device for improving voice recognition accuracy

Publication number: CN112908304B (application CN202110130305.4A)
Authority: CN (China)
Legal status: Active
Application number: CN202110130305.4A
Other languages: Chinese (zh)
Other versions: CN112908304A
Inventor: 陶贵宾
Current Assignee: Shenzhen Tonglian Financial Network Technology Service Co ltd
Original Assignee: Shenzhen Tonglian Financial Network Technology Service Co ltd
Application filed by Shenzhen Tonglian Financial Network Technology Service Co ltd
Priority to CN202110130305.4A
Publication of CN112908304A
Application granted
Publication of CN112908304B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

The invention discloses a method and a device for improving voice recognition accuracy, wherein the method comprises the following steps: obtaining first voice information of a first user, and further obtaining a first recognition result; obtaining first video information within a first preset distance range of the first user through monitoring equipment; judging whether the first video information includes a second user, and judging whether the first user and the second user have a voice interaction behavior; if the second user is included and the voice interaction behavior exists, obtaining first intimacy information of the first user and the second user; obtaining a preset voice recognition database; determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information; obtaining a first adjustment instruction according to the first recognition context and the first context information set; and adjusting the first recognition result according to the first adjustment instruction. The technical problem that voice cannot be accurately recognized in the prior art is thereby solved.

Description

Method and device for improving voice recognition accuracy
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a method and apparatus for improving speech recognition accuracy.
Background
With the progress of data processing technology and the rapid spread of the mobile internet, computer technology is applied widely across society and massive amounts of data are generated. Voice data in particular is valued more and more, and has gradually entered fields such as industry, home appliances, communication and consumer electronics, giving it a broad development prospect and range of applications in everyday life.
However, in the process of implementing the technical scheme of the invention in the embodiment of the application, the inventor of the application finds that at least the following technical problems exist in the above technology:
due to factors such as the external environment and the limitations of the recognition system itself, voice cannot always be recognized accurately, so the result obtained by voice recognition can differ greatly from the actual voice content.
Disclosure of Invention
The method and the device for improving voice recognition accuracy provided by the embodiments of the application solve the technical problem that voice cannot be accurately recognized in the prior art, and achieve the technical effect of improving voice recognition accuracy by building a rich voice recognition database and combining it with the specific context.
The embodiment of the application provides a method for improving voice recognition accuracy, wherein the method further comprises the following steps: acquiring first voice information of a first user; obtaining a first recognition result according to the first voice information; acquiring first video information within a first preset distance range of the first user through the monitoring equipment; judging whether the first video information comprises a second user or not, and judging whether the first user and the second user have voice interaction behaviors or not; if the first video information comprises the second user and the first user and the second user have voice interaction behaviors, obtaining first intimacy information of the first user and the second user; obtaining a preset voice recognition database; determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information; obtaining a first adjustment instruction according to the first identification context and the first context information set; and adjusting the first identification result according to the first adjustment instruction.
On the other hand, the application further provides a device for improving voice recognition accuracy, wherein the device comprises: a first obtaining unit, configured to obtain first voice information of a first user; a second obtaining unit, configured to obtain a first recognition result according to the first voice information; a third obtaining unit, configured to obtain, through the monitoring equipment, first video information within a first preset distance range of the first user; a first judging unit, configured to judge whether the first video information includes a second user and whether the first user and the second user have a voice interaction behavior; a fourth obtaining unit, configured to obtain first intimacy information of the first user and the second user if the first video information includes the second user and the first user and the second user have a voice interaction behavior; a fifth obtaining unit, configured to obtain a preset voice recognition database; a first determining unit, configured to determine a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information; a sixth obtaining unit, configured to obtain a first adjustment instruction according to the first recognition context and the first context information set; and a first adjusting unit, configured to adjust the first recognition result according to the first adjustment instruction.
One or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:
video information within a preset distance range is obtained from the monitoring equipment, the scene information of the first user's speech is derived from that video information, and the first voice information is adjusted and intelligently recognized in combination with a specific voice recognition database, thereby achieving the technical effect of improving voice recognition accuracy.
The foregoing description is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be more clearly understood and implemented according to the content of the specification, and in order to make the above and other objects, features and advantages of the present application more apparent, the detailed description of the present application is given below.
Drawings
FIG. 1 is a flow chart of a method for improving speech recognition accuracy according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a device for improving accuracy of voice recognition according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an exemplary electronic device according to an embodiment of the present application.
Reference numerals illustrate: the device comprises a first obtaining unit 11, a second obtaining unit 12, a third obtaining unit 13, a first judging unit 14, a fourth obtaining unit 15, a fifth obtaining unit 16, a first determining unit 17, a sixth obtaining unit 18, a first adjusting unit 19, a bus 300, a receiver 301, a processor 302, a transmitter 303, a memory 304 and a bus interface 305.
Detailed Description
According to the method and the device for improving the voice recognition accuracy, the technical problem that the voice cannot be accurately recognized in the prior art is solved, and the technical effect that the voice recognition accuracy is improved by establishing a powerful voice recognition database and combining specific contexts is achieved.
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application and not all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Summary of the application
With the progress of data processing technology and the rapid spread of the mobile internet, computer technology is applied widely across society and massive amounts of data are generated. Voice data in particular is valued more and more, and has gradually entered fields such as industry, home appliances, communication and consumer electronics, giving it a broad development prospect and range of applications in everyday life. However, due to factors such as the external environment and the limitations of the recognition system itself, voice cannot always be recognized accurately, so the result obtained by voice recognition can differ greatly from the actual voice content.
In view of the above technical problems, the overall idea of the technical solution provided by the application is as follows:
the embodiment of the application provides a method for improving voice recognition accuracy, wherein the method further comprises the following steps: acquiring first voice information of a first user; obtaining a first recognition result according to the first voice information; acquiring first video information within a first preset distance range of the first user through the monitoring equipment; judging whether the first video information comprises a second user or not, and judging whether the first user and the second user have voice interaction behaviors or not; if the first video information comprises the second user and the first user and the second user have voice interaction behaviors, obtaining first intimacy information of the first user and the second user; obtaining a preset voice recognition database; determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information; obtaining a first adjustment instruction according to the first identification context and the first context information set; and adjusting the first identification result according to the first adjustment instruction.
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.
Example 1
As shown in fig. 1, an embodiment of the present application provides a method for improving speech recognition accuracy, where the method further includes:
step S100: acquiring first voice information of a first user;
specifically, the first user is a speaking user, and the first voice information is voice information of speaking of the first user, including information such as speaking content, speaking volume, speaking tone and the like, so as to recognize the first voice information.
Step S200: obtaining a first recognition result according to the first voice information;
specifically, the first voice information is known, the first voice information can be identified, the first identification result is the result identified according to the first voice information, and the first identification result comprises the identified content, whether the identified content is accurate or not, and the like.
Step S300: acquiring first video information within a first preset distance range of the first user through the monitoring equipment;
specifically, the monitoring device is installed around the first user, the first video information is the video information within a first preset distance range of the first user, which is further understood as a distance range within five meters from the first user, and the first preset distance range is not described in detail herein according to the specific situation.
Step S400: judging whether the first video information comprises a second user or not, and judging whether the first user and the second user have voice interaction behaviors or not;
specifically, the second user is different from the first user, and whether the first video information includes the second user or not is judged, and whether the first user and the second user have voice interaction behaviors, namely, whether the first user and the second user are simultaneously present in the first video information, whether the first user and the second user are speaking or not, and the like is judged.
Step S500: if the first video information comprises the second user and the first user and the second user have voice interaction behaviors, obtaining first intimacy information of the first user and the second user;
specifically, the first affinity information is affinity information between the first user and the second user, and can be determined according to information such as speaking content, speaking expression, speaking gesture, speaking mood and the like of the first user and the second user, and when the first video information includes the second user and the first user and the second user have voice interaction behaviors, the first affinity information of the first user and the second user can be obtained.
Step S600: obtaining a preset voice recognition database;
specifically, the preset voice recognition database is a voice recognition database for calling in various preset situations and occasions, and further can be understood as that word information such as reporting, summarizing, scheme and the like may be provided when a meeting is taken, and the corresponding atmosphere meeting is more formal.
Step S700: determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information;
specifically, the first recognition context is determined for a scene of a conversation, and is determined to be a meeting or a learning in class, and the first context information set is an information set corresponding to the first recognition context, including various types of information related to the first recognition context, and further can be understood that when the scene is a meeting, the first context information set includes various meeting information sets, a meeting of a minor department, a meeting of a major group, and the like.
Step S800: obtaining a first adjustment instruction according to the first identification context and the first context information set;
specifically, the first adjustment instruction is configured to adjust the first recognition result, and may adjust the first recognition result according to the first recognition context and the first context information set, so that the first recognition result is more accurate and is more fit with the context.
Step S900: and adjusting the first identification result according to the first adjustment instruction.
Specifically, once the first adjustment instruction is known, the first recognition result is adjusted according to it. For example, in the context of a winery enterprise, a word that could otherwise be misheard is understood, according to the specific context, as the name of a wine, and that wine name is output as the recognition result.
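The adjustment chain of steps S500-S900 can be pictured with the following hedged Python sketch. The layout of the preset voice recognition database, the intimacy scale and the context-specific substitution table are all assumptions introduced only for illustration.

```python
# Illustrative sketch of steps S500-S900. The database layout, the intimacy scale
# and the substitution table are assumptions for the example, not defined by the patent.
PRESET_VOICE_RECOGNITION_DATABASE = {
    # intimacy band -> (recognition context, context information set)
    "formal":   ("meeting", {"report", "summary", "plan"}),
    "intimate": ("family chat", {"dinner", "homework", "weekend"}),
}

def determine_context(first_intimacy: float):
    """Step S700: map the first intimacy information to a recognition context."""
    band = "intimate" if first_intimacy > 0.5 else "formal"
    return PRESET_VOICE_RECOGNITION_DATABASE[band]

def build_adjustment_instruction(context: str, context_info_set: set) -> dict:
    """Step S800: derive a substitution rule set from the recognition context."""
    # Hypothetical homophone corrections, enabled only when the target word
    # belongs to the first context information set.
    candidates = {"reprot": "report", "sheme": "scheme", "diner": "dinner"}
    return {wrong: right for wrong, right in candidates.items() if right in context_info_set}

def adjust_recognition_result(first_result: str, instruction: dict) -> str:
    """Step S900: rewrite the first recognition result according to the instruction."""
    return " ".join(instruction.get(w, w) for w in first_result.split())

if __name__ == "__main__":
    context, info_set = determine_context(first_intimacy=0.2)
    instruction = build_adjustment_instruction(context, info_set)
    print(adjust_recognition_result("please send the reprot", instruction))
```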
After the first voice information of the first user is obtained, step S100 further includes:
step S110: acquiring first attribute information of a first microphone in the audio acquisition device;
step S120: according to the first voice information, first voice characteristic information of the first user is obtained;
step S130: obtaining a first matching suitability between the first voice characteristic information and the first attribute information;
step S140: judging whether the first matching suitability meets a preset matching degree threshold value or not;
step S150: if the preset matching degree threshold is not met, after matching operation is carried out according to the preset matching degree threshold, a first operation result and a second adjustment instruction are obtained;
step S160: and according to the second adjustment instruction, adjusting the first attribute information according to the first operation result.
In particular, in order to recognize the first voice information accurately, it must be ensured that the audio acquisition device used when the first user speaks works normally. First attribute information of the first microphone in the audio acquisition device is obtained; the first attribute information includes information such as the pickup speed, volume and tone settings of the first microphone. According to the first voice information, first voice characteristic information of the first user can be obtained; the first voice characteristic information is information such as the volume, pitch and tone of the first user's speech. A first matching suitability between the first voice characteristic information and the first attribute information is then obtained; the first matching suitability is the degree-of-matching information between the first voice characteristic information and the first attribute information. Whether the first matching suitability meets a preset matching degree threshold, that is, whether it reaches the expected matching value, is judged. If it does not meet the preset matching degree threshold, a matching operation is performed against the preset matching degree threshold to obtain a first operation result and a second adjustment instruction, where the first operation result is the result of correcting the first matching suitability against the preset matching degree threshold, and the second adjustment instruction instructs the system to adjust the first attribute information according to the first operation result. As a concrete example, when the first user is a male speaker with a relatively loud voice and the first microphone is set to a high gain and a high volume, phenomena such as clipping and noise can occur when the first user speaks because the volume is too high, which is harmful to audio acquisition and voice recognition. By adjusting the attribute information of the first microphone according to the voice characteristic information of the user, the output voice information is kept clear and intelligible, and the technical effect of recognizing the voice accurately is achieved.
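A minimal sketch of steps S110-S160 follows, under the assumption that the matching suitability is a simple similarity score between the microphone attributes and the speaker's voice features; the fitness formula and the threshold value are illustrative only.

```python
# Illustrative sketch of steps S110-S160. The fitness formula and the 0-1 threshold
# are assumptions for the example; the patent only requires *some* matching operation.
from dataclasses import dataclass

@dataclass
class MicrophoneAttributes:          # first attribute information
    gain_db: float
    pickup_pitch_hz: float

@dataclass
class VoiceFeatures:                 # first voice characteristic information
    volume_db: float
    pitch_hz: float

def matching_fitness(attr: MicrophoneAttributes, feat: VoiceFeatures) -> float:
    """Crude similarity between microphone settings and the speaker's voice."""
    gain_fit = 1.0 - min(abs(attr.gain_db - feat.volume_db) / 60.0, 1.0)
    pitch_fit = 1.0 - min(abs(attr.pickup_pitch_hz - feat.pitch_hz) / 300.0, 1.0)
    return 0.5 * (gain_fit + pitch_fit)

def adjust_attributes(attr: MicrophoneAttributes, feat: VoiceFeatures,
                      threshold: float = 0.8) -> MicrophoneAttributes:
    """If the fitness misses the preset threshold, move the microphone attributes
    toward the speaker's voice features (second adjustment instruction)."""
    if matching_fitness(attr, feat) >= threshold:
        return attr
    return MicrophoneAttributes(gain_db=feat.volume_db, pickup_pitch_hz=feat.pitch_hz)

if __name__ == "__main__":
    mic = MicrophoneAttributes(gain_db=30.0, pickup_pitch_hz=120.0)
    voice = VoiceFeatures(volume_db=65.0, pitch_hz=210.0)
    print(matching_fitness(mic, voice))
    print(adjust_attributes(mic, voice))
```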
The step S130 further includes:
step S131: inputting the first attribute information and the first voice characteristic information into a first training model, wherein the first training model is obtained through training of multiple groups of training data, and each group of training data in the multiple groups of training data comprises: the first attribute information, the first voice characteristic information and the identification information for identifying the matching fitness level;
step S132: and obtaining output information of the first training model, wherein the output information comprises first matching suitability information between the first voice characteristic information and the first attribute information.
Specifically, the first attribute information and the first voice characteristic information are input into the first training model, which is trained continuously so that the output result becomes more accurate. The training model is a neural network model, that is, a neural network model in machine learning. A neural network (NN) is a complex network system formed by a large number of simple processing units (called neurons) that are widely interconnected; it reflects many basic characteristics of the human brain and is a highly complex nonlinear dynamic learning system. The neural network model is described on the basis of a mathematical model of neurons. An artificial neural network (Artificial Neural Networks) is a first-order approximate description of the human brain system; in short, it is a mathematical model. In this embodiment of the application, the first attribute information and the first voice characteristic information, together with the identified matching-fitness level information, are used to train the neural network model.
Further, the process of training the neural network model is essentially a process of supervised learning. Each of the plurality of sets of training data specifically comprises: the first attribute information, the first voice characteristic information, and identification information used to identify the matching fitness level. Given the first attribute information and the first voice characteristic information as input, the neural network model outputs first matching suitability information between the first voice characteristic information and the first attribute information, and this output is checked against the identified matching-suitability level information. If the output is consistent with the identified matching-suitability level information, supervised learning on that group of data is complete and the next group is processed; if the output is inconsistent with it, the neural network learning model adjusts itself until its output is consistent with the identified matching-suitability level information, and then supervised learning continues with the next group of data. The neural network learning model is continuously corrected and optimized through the training data, and the supervised learning process improves the accuracy with which the model processes the information, so that the first matching suitability information between the first voice characteristic information and the first attribute information becomes more accurate.
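The supervised-learning loop described above might look like the following sketch; the single-sigmoid model and the toy training data are assumptions for illustration, since the embodiment only specifies the inputs (attribute and voice characteristic information) and the labelled matching-fitness levels.

```python
# Illustrative sketch of the supervised training described above. The model form
# (a single sigmoid unit) and the toy data are assumptions; the embodiment only
# specifies the inputs and the labelled matching-fitness levels.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def train(samples, epochs: int = 200, lr: float = 0.1):
    """samples: list of (feature_vector, label); the label is the identified
    matching-fitness level scaled to [0, 1]."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = pred - y        # self-adjustment whenever the output disagrees with the label
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

if __name__ == "__main__":
    # features: [mic gain, mic pickup pitch, voice volume, voice pitch], all normalized
    data = [([0.3, 0.1, 0.3, 0.1], 0.9),
            ([0.9, 0.8, 0.2, 0.1], 0.1),
            ([0.5, 0.5, 0.5, 0.5], 0.8)]
    w, b = train(data)
    query = [0.4, 0.2, 0.4, 0.2]
    print(sigmoid(sum(wi * xi for wi, xi in zip(w, query)) + b))
```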
Before the first voice information of the first user is obtained, the embodiment of the application further includes:
step S1010: acquiring surrounding environment information of the first user;
step S1020: judging whether the surrounding environment information meets a first preset condition or not;
step S1030: if the first preset condition is met, network environment information of the voice recognition system is obtained;
step S1040: judging whether the network environment information meets the requirement of receiving the voice signal of the first user or not;
step S1050: if the requirement of receiving the voice signal of the first user is not met, a first overhaul instruction is obtained;
step S1060: and after the network environment is overhauled according to the first overhauling instruction, receiving first voice information input by the first user.
Specifically, before the first voice information of the first user is obtained, it should be ensured that the surroundings of the first user are normal and will not affect the first voice information. Whether the surrounding environment information meets a first preset condition is therefore judged; for example, when there is construction nearby, the noise can affect the first voice information, so it must be ensured that the surroundings will not interfere with the first user's speech. If the first preset condition is met, that is, the surroundings will not interfere with the first user's speech, the network environment information of the voice recognition system is obtained; the network environment information includes information such as whether the network is unobstructed and whether the network speed is high or low. It is further judged whether the network environment information meets the requirement for receiving the voice signal of the first user, that is, whether the network environment can support reception of the voice signal. If the requirement is not met, a first overhaul instruction is obtained, and after the network environment has been overhauled according to the first overhaul instruction, the first voice information input by the first user is received. By ensuring that the voice information of the first user is received accurately and completely, a better voice recognition effect is achieved.
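A hedged sketch of these pre-checks follows; the noise and bandwidth thresholds, and the way the first overhaul instruction is surfaced, are assumptions made only for the example.

```python
# Illustrative sketch of steps S1010-S1060. The noise and bandwidth thresholds and
# the overhaul step are assumptions for the example, not values fixed by the patent.
def environment_ok(ambient_noise_db: float, max_noise_db: float = 55.0) -> bool:
    """First preset condition: the surroundings must not drown out the speaker."""
    return ambient_noise_db <= max_noise_db

def network_ok(bandwidth_kbps: float, min_kbps: float = 64.0) -> bool:
    """The network must be able to carry the first user's voice signal."""
    return bandwidth_kbps >= min_kbps

def ready_to_receive(ambient_noise_db: float, bandwidth_kbps: float) -> bool:
    if not environment_ok(ambient_noise_db):
        return False
    if not network_ok(bandwidth_kbps):
        print("first overhaul instruction: repair the network before recording")
        return False
    return True

if __name__ == "__main__":
    print(ready_to_receive(ambient_noise_db=48.0, bandwidth_kbps=128.0))
```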
In order to improve the voice recognition accuracy of the first user, the embodiment of the application further includes:
step S1110: judging whether the first user uses a first electronic device or not if the second user is not included in the first video information;
step S1120: if the first user uses the first electronic device, after the first electronic device is associated with the voice recognition system, application software use information of the first electronic device is obtained;
step S1130: determining a second recognition context and a second context information set from the preset voice recognition database according to the application software use information;
step S1140: obtaining a second adjustment instruction according to the second identification context and the second context information set;
step S1150: and adjusting the first identification result according to the second adjustment instruction.
Specifically, when judging whether the first video information includes the second user, if it does not, that is, if the first video information includes only the first user, it is judged whether the first user is using a first electronic device, for example whether related voice call software such as conference software or a FaceTime call is in use. If the first user is using the first electronic device, the first electronic device is associated with the voice recognition system, and the application software use information of the first electronic device is obtained; the application software information includes software capable of holding a teleconference. Further, according to the application software use information, a second recognition context and a second context information set are determined from the preset voice recognition database; these are obtained according to the call content of the first user and the preset voice recognition database. A second adjustment instruction is then obtained according to the second recognition context and the second context information set, and the first recognition result is adjusted according to the second adjustment instruction, so that the recognition result better fits the actual conversation and the voice recognition accuracy is improved.
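For illustration only, the mapping from application software use information to a second recognition context (step S1130) could be sketched as below; the application names and context entries are hypothetical.

```python
# Illustrative sketch of step S1130. The mapping from application software to a
# recognition context is an assumption introduced only for this example.
APP_TO_CONTEXT = {
    "teleconference": ("remote meeting", {"agenda", "minutes", "action item"}),
    "video call":     ("casual call",    {"hello", "see you", "miss you"}),
}

def context_from_app_usage(app_usage_info: str):
    """Pick the second recognition context and context information set from the
    database according to which application the first electronic device is running."""
    return APP_TO_CONTEXT.get(app_usage_info, ("unknown", set()))

if __name__ == "__main__":
    print(context_from_app_usage("teleconference"))
```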
In order to ensure complete coherence of the first voice information, embodiments of the present application further include:
step S1210: when the first recognition result is adjusted according to the first adjustment instruction, judging whether each sentence in the first voice information is a continuous sentence or not;
step S1220: if the sentence is a non-coherent sentence, a first calling instruction is obtained;
step S1230: according to the first calling instruction, a first sentence and a second sentence are called, wherein the first sentence is sentence information before the incoherent sentence, and the second sentence is sentence information after the incoherent sentence;
step S1240: and carrying out voice recognition on the non-coherent sentences according to the first sentences and the second sentences to obtain second recognition results corresponding to the non-coherent sentences.
Specifically, in order to ensure that the first voice information is complete and coherent, when the first recognition result is adjusted according to the first adjustment instruction, whether each sentence in the first voice information is a coherent sentence is judged, that is, every sentence needs to be coherent. If a sentence is an incoherent sentence, a first calling instruction is obtained; the first calling instruction calls up the first sentence and the second sentence, where the first sentence is the sentence information before the incoherent sentence and the second sentence is the sentence information after it. Voice recognition is then performed on the incoherent sentence according to the first sentence and the second sentence, that is, the incoherent sentence is completed by analyzing the sentences before and after it, so as to obtain a second recognition result corresponding to the incoherent sentence. Completing the incoherent sentence in this way ensures that the first voice information is complete and coherent, and thereby achieves the technical effect of improving the voice recognition accuracy.
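A minimal sketch of steps S1210-S1240 follows; the coherence test and the gap-filling strategy are assumptions, since the embodiment only requires that the preceding and following sentences be called up and used to re-recognize the incoherent sentence.

```python
# Illustrative sketch of steps S1210-S1240. The coherence test (a word-count check)
# and the repair strategy are assumptions made only for this example.
def is_coherent(sentence: str, min_words: int = 3) -> bool:
    """Very rough stand-in for the coherence judgment."""
    return len(sentence.split()) >= min_words and "<unk>" not in sentence

def repair(prev_sentence: str, broken: str, next_sentence: str) -> str:
    """Second recognition result: fill the gap using the surrounding sentences."""
    context_words = set(prev_sentence.split()) | set(next_sentence.split())
    repaired = [w for w in broken.split() if w != "<unk>"]
    # Hypothetical completion: borrow the most context-relevant (here, longest) word.
    filler = max(context_words, key=len) if context_words else ""
    return " ".join(repaired + [filler]).strip()

if __name__ == "__main__":
    sentences = ["we will review the budget", "the <unk>", "needs approval by friday"]
    for i, s in enumerate(sentences):
        if not is_coherent(s):
            prev_s = sentences[i - 1] if i > 0 else ""
            next_s = sentences[i + 1] if i + 1 < len(sentences) else ""
            print(repair(prev_s, s, next_s))
```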
After the first recognition result is adjusted according to the first adjustment instruction, step S900 further includes:
step S910: judging whether the adjusted first identification result comprises first privacy information of the first user or not;
step S920: if the first private information is included, generating a first verification code according to the first private information, wherein the first verification code is in one-to-one correspondence with the first private information;
step S930: generating a second verification code according to the second privacy information and the first verification code; and so on, generating an Nth verification code according to the Nth privacy information and the N-1 th verification code, wherein N is a natural number larger than 1;
step S940: and respectively copying and storing all the privacy information and the verification codes on M pieces of equipment, wherein M is a natural number larger than 1.
Specifically, in order to protect the privacy of the content of the first voice information, after the first recognition result is adjusted according to the first adjustment instruction, whether the adjusted first recognition result includes first privacy information of the first user is judged. The first privacy information is private content in the first voice information and can be understood to include information such as a bank card number or an account password mentioned by the first user. If the first recognition result includes first privacy information of the first user, blockchain-based encryption processing can be applied to the privacy information, ensuring that the privacy information is stored securely and cannot be tampered with.
A first verification code is generated according to the first privacy information, the first verification code corresponding one-to-one with the first privacy information; a second verification code is generated according to the second privacy information and the first verification code; and so on, an Nth verification code is generated according to the Nth privacy information and the (N-1)th verification code, where N is a natural number larger than 1. All the privacy information and verification codes are then copied and stored on each of M devices, where M is a natural number larger than 1. The privacy information of the first user is thus stored in encrypted form. Each device corresponds to a node, and all the nodes together form a blockchain. The blockchain forms a general ledger that is easy to verify (verifying the hash value of the last block is equivalent to verifying the whole ledger) and cannot be altered (changing any piece of recorded information changes the hash values of all subsequent blocks, so the ledger would fail verification).
The blockchain system adopts a distributed data form, so every participating node obtains a complete backup of the database. Unless 51% of the nodes in the whole system can be controlled at the same time, a modification of the database by a single node is invalid and cannot affect the data content on the other nodes. Therefore, the more nodes participate in the system, the higher the security of the data in the system. Encrypting the privacy information of the first user on the basis of the blockchain effectively ensures that the privacy information is stored securely, and achieves the technical effect of recording and storing the privacy information of the first user safely.
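A hedged sketch of the verification-code chain of steps S910-S940 is given below; SHA-256 is used as the code-generation function purely as an assumption, since the embodiment does not name a specific algorithm.

```python
# Illustrative sketch of steps S910-S940. SHA-256 is used as the verification-code
# function purely as an assumption; the patent does not name a specific hash.
import hashlib
from typing import List, Tuple

def verification_code(privacy_info: str, previous_code: str = "") -> str:
    """Nth verification code = hash of the Nth privacy information chained with
    the (N-1)th verification code."""
    return hashlib.sha256((previous_code + privacy_info).encode("utf-8")).hexdigest()

def build_chain(privacy_items: List[str]) -> List[Tuple[str, str]]:
    chain, prev = [], ""
    for item in privacy_items:
        code = verification_code(item, prev)
        chain.append((item, code))
        prev = code
    return chain

def replicate(chain: List[Tuple[str, str]], m_devices: int = 3) -> List[List[Tuple[str, str]]]:
    """Copy the whole ledger onto each of the M devices (one list per device)."""
    return [list(chain) for _ in range(m_devices)]

if __name__ == "__main__":
    ledger = build_chain(["bank card 6222...", "account password ****"])
    copies = replicate(ledger, m_devices=3)
    print(copies[0][-1][1][:16], "stored on", len(copies), "devices")
```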
In order to make the storage of the private information of the first user more efficient and quick, the embodiment of the application further comprises:
step S840: taking the Nth privacy information and the N-1 th verification code as an Nth block;
step S850: obtaining the recording time of the nth block, wherein the recording time of the nth block represents the time of the nth block needing to be recorded;
step S860: according to the N-th block recording time, obtaining first equipment with strongest operation speed in the M pieces of equipment;
step S870: and transmitting the recording right of the Nth block to the first equipment.
Specifically, when the privacy information of the first user is encrypted on the basis of the blockchain, in order to obtain a more efficient operation and storage rate, the recording time of the Nth block can be obtained, where the recording time of the Nth block represents the time at which the Nth block needs to be recorded. Further, according to the recording time of the Nth block, the first device with the strongest operation speed among the M devices is obtained, and the recording right of the Nth block is sent to the first device. This guarantees the safe, effective and stable operation of the decentralized blockchain system, ensures that the block is recorded on a device quickly and accurately, guarantees information security, and allows the privacy information of the first user to be judged accurately, thereby achieving the technical effect of storing and recording the privacy information of the first user more quickly and efficiently.
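For illustration, granting the recording right of the Nth block to the fastest of the M devices (steps S860-S870) could be sketched as follows; the benchmark numbers are hypothetical.

```python
# Illustrative sketch of steps S840-S870. The way operation speed is measured
# (a benchmark score per device) is an assumption made only for the example.
def pick_recording_device(device_speeds: dict) -> str:
    """Grant the recording right of the Nth block to the device with the
    strongest operation speed among the M devices."""
    return max(device_speeds, key=device_speeds.get)

if __name__ == "__main__":
    speeds = {"device-1": 1.2e9, "device-2": 2.8e9, "device-3": 2.1e9}  # ops/sec, assumed
    print("recording right goes to", pick_recording_device(speeds))
```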
In summary, the method and the device for improving the accuracy of voice recognition provided by the embodiments of the present application have the following technical effects:
1. Video information within a preset distance range is obtained from the monitoring equipment, the scene information of the first user's speech is derived from that video information, and the first voice information is adjusted and intelligently recognized in combination with a specific voice recognition database, thereby achieving the technical effect of improving voice recognition accuracy.
2. The microphone is adjusted according to the voice characteristic information of the first user, so that personalized requirements can be met on the basis of meeting general requirements; combined with real-time network environment information, the first voice signal can be received completely; and incoherent sentences are connected through the semantics of the preceding and following sentences, so that the first voice information flows smoothly, achieving the technical effect of improving the voice recognition accuracy.
Example two
Based on the same inventive concept as the method for improving the accuracy of voice recognition in the foregoing embodiment, the present invention further provides a device for improving the accuracy of voice recognition, as shown in fig. 2, where the device includes:
The first obtaining unit 11: the first obtaining unit 11 is configured to obtain first voice information of a first user;
the second obtaining unit 12: the second obtaining unit 12 is configured to obtain a first recognition result according to the first voice information;
the third obtaining unit 13: the third obtaining unit 13 is configured to obtain, by using the monitoring device, first video information within a first preset distance range of the first user;
the first judgment unit 14: the first determining unit 14 is configured to determine whether a second user is included in the first video information, and whether a voice interaction exists between the first user and the second user;
fourth obtaining unit 15: the fourth obtaining unit 15 is configured to obtain first affinity information of the first user and the second user if the second user is included in the first video information and there is a voice interaction between the first user and the second user;
fifth obtaining unit 16: the fifth obtaining unit 16 is configured to obtain a preset voice recognition database;
the first determination unit 17: the first determining unit 17 is configured to determine a first recognition context and a first context information set from the preset speech recognition database according to the first affinity information;
Sixth obtaining unit 18: the sixth obtaining unit 18 is configured to obtain a first adjustment instruction according to the first recognition context and the first context information set;
the first adjusting unit 19: the first adjusting unit 19 is configured to adjust the first identification result according to the first adjustment instruction.
Further, the device further comprises:
seventh obtaining unit: the seventh obtaining unit is configured to obtain first attribute information of a first microphone in the audio collecting device;
eighth obtaining unit: the eighth obtaining unit is configured to obtain first voice feature information of the first user according to the first voice information;
a ninth obtaining unit: the ninth obtaining unit is configured to obtain a first matching suitability between the first voice feature information and the first attribute information;
a second judgment unit: the second judging unit is used for judging whether the first matching suitability meets a preset matching suitability threshold value or not;
tenth obtaining unit: the tenth obtaining unit is configured to obtain a first operation result and a second adjustment instruction after performing a matching operation according to the preset matching degree threshold if the preset matching degree threshold is not satisfied;
A second adjusting unit: the second adjusting unit is used for adjusting the first attribute information according to the first operation result according to the second adjusting instruction.
Further, the device further comprises:
a first input unit: the first input unit is configured to input the first attribute information and the first speech feature information into a first training model, where the first training model is obtained through training of multiple sets of training data, and each set of training data in the multiple sets of training data includes: the first attribute information, the first voice characteristic information and the identification information for identifying the matching fitness level;
eleventh obtaining unit: the eleventh obtaining unit is configured to obtain output information of the first training model, where the output information includes first matching suitability information between the first speech feature information and the first attribute information.
Further, the device further comprises:
a twelfth obtaining unit: the twelfth obtaining unit is used for obtaining surrounding environment information of the first user;
a third judgment unit: the third judging unit is used for judging whether the surrounding environment information meets a first preset condition or not;
Thirteenth obtaining unit: the thirteenth obtaining unit is configured to obtain network environment information where the voice recognition system is located if the first preset condition is satisfied;
fourth judgment unit: the fourth judging unit is used for judging whether the network environment information meets the requirement of receiving the voice signal of the first user;
fourteenth obtaining unit: the fourteenth obtaining unit is used for obtaining a first maintenance instruction if the requirement of receiving the voice signal of the first user is not met;
a first overhaul unit: the first overhaul unit is used for receiving first voice information input by the first user after overhaul is performed on the network environment according to the first overhaul instruction.
Further, the device further comprises:
fifth judging unit: the fifth judging unit is configured to judge whether the first user uses a first electronic device if the second user is not included in the first video information;
fifteenth obtaining unit: the fifteenth obtaining unit is configured to obtain application software usage information of the first electronic device after associating the first electronic device with the speech recognition system if the first user uses the first electronic device;
A second determination unit: the second determining unit is used for determining a second recognition context and a second context information set from the preset voice recognition database according to the application software use information;
sixteenth obtaining unit: the sixteenth obtaining unit is configured to obtain a second adjustment instruction according to the second recognition context and the second context information set;
a third adjusting unit: the third adjusting unit is used for adjusting the first identification result according to the second adjusting instruction.
Further, the device further comprises:
sixth judgment unit: the sixth judging unit is configured to judge whether each sentence in the first voice information is a consistent sentence when the first recognition result is adjusted according to the first adjustment instruction;
seventeenth obtaining unit: the seventeenth obtaining unit is configured to obtain a first fetch instruction if the seventeenth obtaining unit is a non-consecutive sentence;
a first calling unit: the first calling unit is used for calling a first sentence and a second sentence according to the first calling instruction, wherein the first sentence is sentence information before the non-coherent sentence, and the second sentence is sentence information after the non-coherent sentence;
Eighteenth obtaining unit: the eighteenth obtaining unit is configured to obtain a second recognition result corresponding to the non-coherent sentence after performing speech recognition on the non-coherent sentence according to the first sentence and the second sentence.
Further, the device further comprises:
seventh judgment unit: the seventh judging unit is configured to judge whether the adjusted first identification result includes first privacy information of the first user;
a first generation unit: the first generation unit is used for generating a first verification code according to the first privacy information if the first privacy information is included, wherein the first verification code is in one-to-one correspondence with the first privacy information;
a second generation unit: the second generation unit is used for generating a second verification code according to the second privacy information and the first verification code; and so on, generating an Nth verification code according to the Nth privacy information and the N-1 th verification code, wherein N is a natural number larger than 1;
a first storage unit: the first storage unit is used for respectively copying and storing all privacy information and verification codes on M pieces of equipment, wherein M is a natural number larger than 1.
The foregoing various modifications and embodiments of a method for improving speech recognition accuracy in the first embodiment of fig. 1 are equally applicable to a device for improving speech recognition accuracy in this embodiment, and by the foregoing detailed description of a method for improving speech recognition accuracy, those skilled in the art will clearly know the implementation method of a device for improving speech recognition accuracy in this embodiment, so that the details will not be described again for brevity of description.
Example III
An electronic device of an embodiment of the present application is described below with reference to fig. 3.
Fig. 3 illustrates a schematic structural diagram of an electronic device according to an embodiment of the present application.
Based on the inventive concept of a method for improving speech recognition accuracy as in the previous embodiments, the present invention further provides a system for improving speech recognition accuracy, on which a computer program is stored, which program, when being executed by a processor, implements the steps of any of the methods for improving speech recognition accuracy as described above.
In fig. 3, a bus architecture is represented by bus 300. Bus 300 may comprise any number of interconnected buses and bridges, and links together various circuits, including one or more processors, represented by processor 302, and memory, represented by memory 304. Bus 300 may also link together various other circuits, such as peripheral devices, voltage regulators and power management circuits, which are well known in the art and are therefore not described further herein. Bus interface 305 provides an interface between bus 300 and receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e. a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 302 is responsible for managing the bus 300 and general processing, while the memory 304 may be used to store data used by the processor 302 in performing operations.
The embodiment of the application provides a method for improving voice recognition accuracy, wherein the method further comprises the following steps: acquiring first voice information of a first user; obtaining a first recognition result according to the first voice information; acquiring first video information within a first preset distance range of the first user through the monitoring equipment; judging whether the first video information comprises a second user or not, and judging whether the first user and the second user have voice interaction behaviors or not; if the first video information comprises the second user and the first user and the second user have voice interaction behaviors, obtaining first intimacy information of the first user and the second user; obtaining a preset voice recognition database; determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information; obtaining a first adjustment instruction according to the first identification context and the first context information set; and adjusting the first identification result according to the first adjustment instruction.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (7)

1. A method for improving speech recognition accuracy, the method being applied to a speech recognition system, the speech recognition system having an audio acquisition device and a monitoring device, wherein the method comprises:
Acquiring first voice information of a first user;
obtaining a first recognition result according to the first voice information;
acquiring first video information within a first preset distance range of the first user through the monitoring equipment;
judging whether the first video information comprises a second user or not, and judging whether the first user and the second user have voice interaction behaviors or not;
if the first video information comprises the second user and the first user and the second user have voice interaction behaviors, obtaining first intimacy information of the first user and the second user;
obtaining a preset voice recognition database;
determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information;
obtaining a first adjustment instruction according to the first identification context and the first context information set;
adjusting the first identification result according to the first adjustment instruction;
wherein after the obtaining the first voice information of the first user, the method further comprises:
acquiring first attribute information of a first microphone in the audio acquisition device;
According to the first voice information, first voice characteristic information of the first user is obtained;
obtaining a first matching suitability between the first voice characteristic information and the first attribute information;
judging whether the first matching suitability meets a preset matching degree threshold value or not;
if the preset matching degree threshold is not met, after matching operation is carried out according to the preset matching degree threshold, a first operation result and a second adjustment instruction are obtained;
according to the second adjustment instruction, adjusting the first attribute information according to the first operation result;
wherein the obtaining the first matching suitability between the first voice feature information and the first attribute information includes:
inputting the first attribute information and the first voice characteristic information into a first training model, wherein the first training model is obtained through training of multiple groups of training data, and each group of training data in the multiple groups of training data comprises: the first attribute information, the first voice characteristic information and the identification information for identifying the matching fitness level;
and obtaining output information of the first training model, wherein the output information comprises first matching suitability information between the first voice characteristic information and the first attribute information.
2. The method of claim 1, wherein prior to obtaining the first voice information of the first user, the method further comprises:
acquiring surrounding environment information of the first user;
judging whether the surrounding environment information meets a first preset condition;
if the first preset condition is met, obtaining network environment information of the voice recognition system;
judging whether the network environment information meets a requirement for receiving a voice signal of the first user;
if the requirement for receiving the voice signal of the first user is not met, obtaining a first overhaul instruction;
and receiving the first voice information input by the first user after the network environment is overhauled according to the first overhaul instruction.
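As a rough illustration of the pre-checks in claim 2, the sketch below tests the surrounding environment and the network before accepting speech input; the noise limit and the reachability probe are assumptions, not values taken from the patent.

    # Hypothetical pre-check sketch for claim 2; the threshold and probe host are assumed.
    import socket

    AMBIENT_NOISE_LIMIT_DB = 60        # stand-in for the "first preset condition"

    def environment_ok(ambient_noise_db):
        return ambient_noise_db <= AMBIENT_NOISE_LIMIT_DB

    def network_ok(host="8.8.8.8", port=53, timeout=2.0):
        # Crude reachability probe standing in for "network environment information".
        try:
            socket.create_connection((host, port), timeout=timeout).close()
            return True
        except OSError:
            return False

    def ready_to_receive_voice(ambient_noise_db):
        if not environment_ok(ambient_noise_db):
            return False
        if not network_ok():
            print("first overhaul instruction: repair the network environment before recording")
            return False
        return True

    print(ready_to_receive_voice(ambient_noise_db=45))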
3. The method of claim 1, wherein the method further comprises:
if the second user is not included in the first video information, judging whether the first user uses a first electronic device;
if the first user uses the first electronic device, associating the first electronic device with the voice recognition system and then obtaining application software use information of the first electronic device;
determining a second recognition context and a second context information set from the preset voice recognition database according to the application software use information;
obtaining a second adjustment instruction according to the second recognition context and the second context information set;
and adjusting the first recognition result according to the second adjustment instruction.
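The claim 3 branch, where application software use on an associated device drives the context selection, might look like the toy sketch below; the mapping and the homophone substitution are illustrative assumptions only.

    # Hypothetical app-usage-driven context selection for claim 3.
    APP_CONTEXT_DB = {
        "banking_app": ("financial", {"principle": "principal"}),
        "chat_app":    ("casual", {}),
    }

    def adjust_by_app_usage(first_result, app_in_use):
        context, context_info = APP_CONTEXT_DB.get(app_in_use, ("neutral", {}))
        # "Second adjustment instruction": prefer domain-specific homophones for this context.
        for heard, preferred in context_info.items():
            first_result = first_result.replace(heard, preferred)
        return first_result

    print(adjust_by_app_usage("transfer the principle amount", "banking_app"))
    # -> "transfer the principal amount"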
4. The method of claim 1, wherein the method further comprises:
when the first recognition result is adjusted according to the first adjustment instruction, judging whether each sentence in the first voice information is a coherent sentence;
if a sentence is a non-coherent sentence, obtaining a first calling instruction;
retrieving a first sentence and a second sentence according to the first calling instruction, wherein the first sentence is sentence information before the non-coherent sentence, and the second sentence is sentence information after the non-coherent sentence;
and carrying out voice recognition on the non-coherent sentence according to the first sentence and the second sentence to obtain a second recognition result corresponding to the non-coherent sentence.
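Claim 4's treatment of non-coherent sentences could be prototyped as below; the coherence test and the context-aware re-recognition are toy stand-ins, since the claim only specifies that the preceding and following sentences are used.

    # Hypothetical handling of non-coherent sentences for claim 4.
    def is_coherent(sentence, min_words=3):
        # Toy heuristic: very short fragments are treated as non-coherent.
        return len(sentence.split()) >= min_words

    def rerecognize_with_context(fragment, previous_sentence, next_sentence):
        # Placeholder "second recognition result": a real system would rescore the acoustic
        # hypothesis for the fragment against a language model conditioned on both neighbours.
        return f"[{previous_sentence} | {fragment} | {next_sentence}] rescored"

    def refine(sentences):
        refined = list(sentences)
        for i, sentence in enumerate(sentences):
            if not is_coherent(sentence):
                prev_s = sentences[i - 1] if i > 0 else ""
                next_s = sentences[i + 1] if i + 1 < len(sentences) else ""
                refined[i] = rerecognize_with_context(sentence, prev_s, next_s)
        return refined

    print(refine(["please transfer the funds", "uh five", "to my savings account"]))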
5. The method of claim 1, wherein after adjusting the first recognition result according to the first adjustment instruction, the method further comprises:
judging whether the adjusted first recognition result comprises first privacy information of the first user;
if the first privacy information is included, generating a first verification code according to the first privacy information, wherein the first verification code corresponds one-to-one with the first privacy information;
generating a second verification code according to second privacy information and the first verification code, and so on, until an Nth verification code is generated according to Nth privacy information and the (N-1)th verification code, wherein N is a natural number greater than 1;
and copying and storing all of the privacy information and the verification codes on each of M pieces of equipment, wherein M is a natural number greater than 1.
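One way to realise the chained verification codes of claim 5 is a hash chain replicated to M devices, as in the sketch below; SHA-256 is an assumption, since the claim only requires that each code correspond to its privacy information and the previous code.

    # Hypothetical hash-chain realisation of the claim 5 verification codes.
    import hashlib

    def build_code_chain(privacy_items):
        """Code k is derived from privacy item k and code k-1 (empty for the first item)."""
        codes, previous = [], b""
        for item in privacy_items:
            code = hashlib.sha256(previous + item.encode("utf-8")).hexdigest()
            codes.append(code)
            previous = code.encode("utf-8")
        return codes

    def replicate(privacy_items, codes, devices):
        # Each of the M devices keeps a full copy of all privacy items and their codes.
        return {device: list(zip(privacy_items, codes)) for device in devices}

    items = ["id card 1234", "account 5678"]
    stores = replicate(items, build_code_chain(items), ["device_a", "device_b"])
    print(stores["device_a"][0][1][:16])    # prefix of the first verification code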
6. An apparatus for improving voice recognition accuracy, wherein the apparatus comprises:
a first obtaining unit: the first obtaining unit is used for obtaining first voice information of a first user;
a second obtaining unit: the second obtaining unit is used for obtaining a first recognition result according to the first voice information;
a third obtaining unit: the third obtaining unit is used for obtaining first video information in a first preset distance range of the first user through monitoring equipment;
a first judging unit: the first judging unit is used for judging whether the first video information comprises a second user, and whether the first user and the second user have a voice interaction behavior;
a fourth obtaining unit: the fourth obtaining unit is used for obtaining first intimacy information of the first user and the second user if the first video information comprises the second user and the first user and the second user have a voice interaction behavior;
a fifth obtaining unit: the fifth obtaining unit is used for obtaining a preset voice recognition database;
a first determining unit: the first determining unit is used for determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information;
a sixth obtaining unit: the sixth obtaining unit is used for obtaining a first adjustment instruction according to the first recognition context and the first context information set;
a first adjusting unit: the first adjusting unit is used for adjusting the first recognition result according to the first adjustment instruction;
the apparatus further comprises:
a seventh obtaining unit: the seventh obtaining unit is used for obtaining first attribute information of a first microphone in the audio acquisition device;
an eighth obtaining unit: the eighth obtaining unit is used for obtaining first voice feature information of the first user according to the first voice information;
a ninth obtaining unit: the ninth obtaining unit is used for obtaining a first matching suitability between the first voice feature information and the first attribute information;
a second judging unit: the second judging unit is used for judging whether the first matching suitability meets a preset matching degree threshold;
a tenth obtaining unit: the tenth obtaining unit is used for obtaining a first operation result and a second adjustment instruction after a matching operation is performed according to the preset matching degree threshold if the preset matching degree threshold is not met;
a second adjusting unit: the second adjusting unit is used for adjusting the first attribute information according to the first operation result in response to the second adjustment instruction;
a first input unit: the first input unit is used for inputting the first attribute information and the first voice feature information into a first training model, wherein the first training model is obtained by training with multiple groups of training data, and each group of training data in the multiple groups of training data comprises: the first attribute information, the first voice feature information, and identification information identifying a matching suitability level;
an eleventh obtaining unit: the eleventh obtaining unit is used for obtaining output information of the first training model, wherein the output information comprises first matching suitability information between the first voice feature information and the first attribute information.
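The first training model referenced by the first input unit and the eleventh obtaining unit (and by claim 1) could be any classifier over microphone attribute information and voice feature information; the sketch below assumes a scikit-learn logistic regression with an invented four-dimensional feature encoding and an assumed threshold.

    # Hypothetical matching-suitability model; the features, labels, and threshold are invented.
    from sklearn.linear_model import LogisticRegression

    # Each training sample: microphone attribute information + voice feature information,
    # labelled with a matching-suitability level (1 = suitable, 0 = unsuitable).
    X = [
        [0.9, 0.2, 0.8, 0.1],
        [0.3, 0.7, 0.2, 0.9],
        [0.8, 0.3, 0.7, 0.2],
        [0.2, 0.8, 0.1, 0.8],
    ]
    y = [1, 0, 1, 0]
    model = LogisticRegression().fit(X, y)

    def matching_suitability(mic_attributes, voice_features):
        """Output information of the model: a suitability score for the pair of inputs."""
        return model.predict_proba([mic_attributes + voice_features])[0][1]

    score = matching_suitability([0.85, 0.25], [0.75, 0.15])
    if score < 0.5:                        # assumed preset matching degree threshold
        print("adjust the first attribute information of the microphone")
    else:
        print(f"matching suitability {score:.2f} meets the threshold")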
7. An apparatus for improving voice recognition accuracy, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the program.
CN202110130305.4A 2021-01-29 2021-01-29 Method and device for improving voice recognition accuracy Active CN112908304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110130305.4A CN112908304B (en) 2021-01-29 2021-01-29 Method and device for improving voice recognition accuracy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110130305.4A CN112908304B (en) 2021-01-29 2021-01-29 Method and device for improving voice recognition accuracy

Publications (2)

Publication Number Publication Date
CN112908304A CN112908304A (en) 2021-06-04
CN112908304B true CN112908304B (en) 2024-03-26

Family

ID=76121687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110130305.4A Active CN112908304B (en) 2021-01-29 2021-01-29 Method and device for improving voice recognition accuracy

Country Status (1)

Country Link
CN (1) CN112908304B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108257603A (en) * 2017-12-05 2018-07-06 湖南海翼电子商务股份有限公司 Multimedia volume adjustment device and multimedia volume adjusting method
CN109961780A (en) * 2017-12-22 2019-07-02 深圳市优必选科技有限公司 A kind of man-machine interaction method, device, server and storage medium
CN110047487A (en) * 2019-06-05 2019-07-23 广州小鹏汽车科技有限公司 Awakening method, device, vehicle and the machine readable media of vehicle-mounted voice equipment
CN110245253A (en) * 2019-05-21 2019-09-17 华中师范大学 A kind of Semantic interaction method and system based on environmental information
KR20190106887A (en) * 2019-08-28 2019-09-18 엘지전자 주식회사 Method and device for providing information
CN110890090A (en) * 2018-09-11 2020-03-17 涂悦 Context-based auxiliary interaction control method and system
CN111508482A (en) * 2019-01-11 2020-08-07 阿里巴巴集团控股有限公司 Semantic understanding and voice interaction method, device, equipment and storage medium
CN111611349A (en) * 2020-05-26 2020-09-01 深圳壹账通智能科技有限公司 Voice query method and device, computer equipment and storage medium
CN112035879A (en) * 2020-09-04 2020-12-04 昆明理工大学 Information processing method and system for improving confidentiality of automatic logistics of cell
CN112084509A (en) * 2020-08-19 2020-12-15 喻婷婷 Block chain key generation method and system based on biological identification technology
CN112102833A (en) * 2020-09-22 2020-12-18 北京百度网讯科技有限公司 Voice recognition method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN112908304A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN108962282B (en) Voice detection analysis method and device, computer equipment and storage medium
US10777207B2 (en) Method and apparatus for verifying information
WO2022095380A1 (en) Ai-based virtual interaction model generation method and apparatus, computer device and storage medium
WO2020253128A1 (en) Voice recognition-based communication service method, apparatus, computer device, and storage medium
CN111243574B (en) Voice model adaptive training method, system, device and storage medium
CN111694938A (en) Emotion recognition-based answering method and device, computer equipment and storage medium
WO2022116487A1 (en) Voice processing method and apparatus based on generative adversarial network, device, and medium
CN115409518A (en) User transaction risk early warning method and device
US20110161084A1 (en) Apparatus, method and system for generating threshold for utterance verification
EP3501024A1 (en) Systems, apparatuses, and methods for speaker verification using artificial neural networks
CN112908304B (en) Method and device for improving voice recognition accuracy
CN114065720A (en) Conference summary generation method and device, storage medium and electronic equipment
CN110570877B (en) Sign language video generation method, electronic device and computer readable storage medium
CN111507218A (en) Matching method and device of voice and face image, storage medium and electronic equipment
CN110675865A (en) Method and apparatus for training hybrid language recognition models
CN113873088B (en) Interactive method and device for voice call, computer equipment and storage medium
CN112786041B (en) Voice processing method and related equipment
CN115083426A (en) High-fidelity voice desensitization method and device based on antagonistic sample generation
US8051026B2 (en) Rules collector system and method with user interaction
CN116010563A (en) Multi-round dialogue data analysis method, electronic equipment and storage medium
CN112908314B (en) Intelligent voice interaction method and device based on tone recognition
CN113782022B (en) Communication method, device, equipment and storage medium based on intention recognition model
CN115022395B (en) Service video pushing method and device, electronic equipment and storage medium
KR102241436B1 (en) Learning method and testing method for figuring out and classifying musical instrument used in certain audio, and learning device and testing device using the same
CN117234455B (en) Intelligent control method and system for audio device based on environment perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant