CN112908304B - Method and device for improving voice recognition accuracy

Publication number: CN112908304B (application CN202110130305.4A)
Authority: CN (China)
Legal status: Active
Application number: CN202110130305.4A
Other languages: Chinese (zh)
Other versions: CN112908304A
Inventor: 陶贵宾
Current Assignee: Shenzhen Tonglian Financial Network Technology Service Co ltd
Original Assignee: Shenzhen Tonglian Financial Network Technology Service Co ltd
Application filed by Shenzhen Tonglian Financial Network Technology Service Co ltd
Priority to CN202110130305.4A
Publication of CN112908304A
Application granted
Publication of CN112908304B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

The invention discloses a method and a device for improving voice recognition accuracy, wherein the method comprises the following steps: obtaining first voice information of a first user, and further obtaining a first recognition result; obtaining first video information within a first preset distance range of the first user through monitoring equipment; judging whether the first video information includes a second user, and judging whether the first user and the second user have a voice interaction behavior; if the second user is included and the voice interaction behavior exists, obtaining first intimacy information of the first user and the second user; obtaining a preset voice recognition database; determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information; obtaining a first adjustment instruction according to the first recognition context and the first context information set; and adjusting the first recognition result according to the first adjustment instruction. The technical problem that voice cannot be accurately recognized in the prior art is thereby solved.

Description

Method and device for improving voice recognition accuracy
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a method and apparatus for improving speech recognition accuracy.
Background
With the progress of data processing technology and the rapid spread of the mobile internet, computer technology is applied widely across society and massive amounts of data are generated. Voice data in particular is valued more and more, and has gradually entered fields such as industry, home appliances, communication and consumer electronics, giving it a broad development prospect and range of applications in everyday life.
However, in the process of implementing the technical scheme of the invention in the embodiment of the application, the inventor of the application finds that at least the following technical problems exist in the above technology:
due to factors such as the external environment and the limitations of the recognition system itself, voice cannot always be recognized accurately, so the result obtained by voice recognition can differ greatly from the actual voice content.
Disclosure of Invention
The method and the device for improving voice recognition accuracy provided by the embodiments of the application solve the technical problem that voice cannot be accurately recognized in the prior art, and achieve the technical effect of improving voice recognition accuracy by building a rich voice recognition database and combining it with the specific context.
The embodiment of the application provides a method for improving voice recognition accuracy, wherein the method further comprises the following steps: acquiring first voice information of a first user; obtaining a first recognition result according to the first voice information; acquiring first video information within a first preset distance range of the first user through the monitoring equipment; judging whether the first video information comprises a second user or not, and judging whether the first user and the second user have voice interaction behaviors or not; if the first video information comprises the second user and the first user and the second user have voice interaction behaviors, obtaining first intimacy information of the first user and the second user; obtaining a preset voice recognition database; determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information; obtaining a first adjustment instruction according to the first identification context and the first context information set; and adjusting the first identification result according to the first adjustment instruction.
On the other hand, the application further provides a device for improving voice recognition accuracy, wherein the device comprises: a first obtaining unit, configured to obtain first voice information of a first user; a second obtaining unit, configured to obtain a first recognition result according to the first voice information; a third obtaining unit, configured to obtain, through the monitoring equipment, first video information within a first preset distance range of the first user; a first judging unit, configured to judge whether the first video information includes a second user and whether the first user and the second user have a voice interaction behavior; a fourth obtaining unit, configured to obtain first intimacy information of the first user and the second user if the first video information includes the second user and the first user and the second user have a voice interaction behavior; a fifth obtaining unit, configured to obtain a preset voice recognition database; a first determining unit, configured to determine a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information; a sixth obtaining unit, configured to obtain a first adjustment instruction according to the first recognition context and the first context information set; and a first adjusting unit, configured to adjust the first recognition result according to the first adjustment instruction.
One or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:
video information within a preset distance range is obtained from the monitoring equipment, the scene information of the first user's speech is derived from that video information, and the first voice information is adjusted and intelligently recognized in combination with a specific voice recognition database, thereby achieving the technical effect of improving voice recognition accuracy.
The foregoing description is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be more clearly understood and implemented according to the content of the specification, and in order to make the above and other objects, features and advantages of the present application more apparent, the detailed description of the present application is given below.
Drawings
FIG. 1 is a flow chart of a method for improving speech recognition accuracy according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a device for improving accuracy of voice recognition according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an exemplary electronic device according to an embodiment of the present application.
Reference numerals illustrate: the device comprises a first obtaining unit 11, a second obtaining unit 12, a third obtaining unit 13, a first judging unit 14, a fourth obtaining unit 15, a fifth obtaining unit 16, a first determining unit 17, a sixth obtaining unit 18, a first adjusting unit 19, a bus 300, a receiver 301, a processor 302, a transmitter 303, a memory 304 and a bus interface 305.
Detailed Description
According to the method and the device for improving the voice recognition accuracy, the technical problem that the voice cannot be accurately recognized in the prior art is solved, and the technical effect that the voice recognition accuracy is improved by establishing a powerful voice recognition database and combining specific contexts is achieved.
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application and not all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Summary of the application
With the progress of data processing technology and the rapid spread of the mobile internet, computer technology is applied widely across society and massive amounts of data are generated. Voice data in particular is valued more and more, and has gradually entered fields such as industry, home appliances, communication and consumer electronics, giving it a broad development prospect and range of applications in everyday life. However, due to factors such as the external environment and the limitations of the recognition system itself, voice cannot always be recognized accurately, so the result obtained by voice recognition can differ greatly from the actual voice content.
In view of the above technical problems, the overall idea of the technical solution provided by the application is as follows:
the embodiment of the application provides a method for improving voice recognition accuracy, wherein the method further comprises the following steps: acquiring first voice information of a first user; obtaining a first recognition result according to the first voice information; acquiring first video information within a first preset distance range of the first user through the monitoring equipment; judging whether the first video information comprises a second user or not, and judging whether the first user and the second user have voice interaction behaviors or not; if the first video information comprises the second user and the first user and the second user have voice interaction behaviors, obtaining first intimacy information of the first user and the second user; obtaining a preset voice recognition database; determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information; obtaining a first adjustment instruction according to the first identification context and the first context information set; and adjusting the first identification result according to the first adjustment instruction.
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.
Example 1
As shown in fig. 1, an embodiment of the present application provides a method for improving speech recognition accuracy, where the method further includes:
step S100: acquiring first voice information of a first user;
specifically, the first user is a speaking user, and the first voice information is voice information of speaking of the first user, including information such as speaking content, speaking volume, speaking tone and the like, so as to recognize the first voice information.
Step S200: obtaining a first recognition result according to the first voice information;
specifically, the first voice information is known, the first voice information can be identified, the first identification result is the result identified according to the first voice information, and the first identification result comprises the identified content, whether the identified content is accurate or not, and the like.
Step S300: acquiring first video information within a first preset distance range of the first user through the monitoring equipment;
specifically, the monitoring device is installed around the first user, the first video information is the video information within a first preset distance range of the first user, which is further understood as a distance range within five meters from the first user, and the first preset distance range is not described in detail herein according to the specific situation.
Step S400: judging whether the first video information comprises a second user or not, and judging whether the first user and the second user have voice interaction behaviors or not;
specifically, the second user is different from the first user, and whether the first video information includes the second user or not is judged, and whether the first user and the second user have voice interaction behaviors, namely, whether the first user and the second user are simultaneously present in the first video information, whether the first user and the second user are speaking or not, and the like is judged.
Step S500: if the first video information comprises the second user and the first user and the second user have voice interaction behaviors, obtaining first intimacy information of the first user and the second user;
specifically, the first affinity information is affinity information between the first user and the second user, and can be determined according to information such as speaking content, speaking expression, speaking gesture, speaking mood and the like of the first user and the second user, and when the first video information includes the second user and the first user and the second user have voice interaction behaviors, the first affinity information of the first user and the second user can be obtained.
Step S600: obtaining a preset voice recognition database;
specifically, the preset voice recognition database is a voice recognition database for calling in various preset situations and occasions, and further can be understood as that word information such as reporting, summarizing, scheme and the like may be provided when a meeting is taken, and the corresponding atmosphere meeting is more formal.
Step S700: determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information;
specifically, the first recognition context is determined for a scene of a conversation, and is determined to be a meeting or a learning in class, and the first context information set is an information set corresponding to the first recognition context, including various types of information related to the first recognition context, and further can be understood that when the scene is a meeting, the first context information set includes various meeting information sets, a meeting of a minor department, a meeting of a major group, and the like.
Step S800: obtaining a first adjustment instruction according to the first identification context and the first context information set;
specifically, the first adjustment instruction is configured to adjust the first recognition result, and may adjust the first recognition result according to the first recognition context and the first context information set, so that the first recognition result is more accurate and is more fit with the context.
Step S900: and adjusting the first identification result according to the first adjustment instruction.
Specifically, once the first adjustment instruction is known, the first recognition result is adjusted according to it. For example, in the context of a winery enterprise, a word that could otherwise be misheard is understood, according to the specific context, as the name of a wine, and that wine name is output as the recognition result.
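The adjustment chain of steps S500-S900 can be pictured with the following hedged Python sketch. The layout of the preset voice recognition database, the intimacy scale and the context-specific substitution table are all assumptions introduced only for illustration.

```python
# Illustrative sketch of steps S500-S900. The database layout, the intimacy scale
# and the substitution table are assumptions for the example, not defined by the patent.
PRESET_VOICE_RECOGNITION_DATABASE = {
    # intimacy band -> (recognition context, context information set)
    "formal":   ("meeting", {"report", "summary", "plan"}),
    "intimate": ("family chat", {"dinner", "homework", "weekend"}),
}

def determine_context(first_intimacy: float):
    """Step S700: map the first intimacy information to a recognition context."""
    band = "intimate" if first_intimacy > 0.5 else "formal"
    return PRESET_VOICE_RECOGNITION_DATABASE[band]

def build_adjustment_instruction(context: str, context_info_set: set) -> dict:
    """Step S800: derive a substitution rule set from the recognition context."""
    # Hypothetical homophone corrections, enabled only when the target word
    # belongs to the first context information set.
    candidates = {"reprot": "report", "sheme": "scheme", "diner": "dinner"}
    return {wrong: right for wrong, right in candidates.items() if right in context_info_set}

def adjust_recognition_result(first_result: str, instruction: dict) -> str:
    """Step S900: rewrite the first recognition result according to the instruction."""
    return " ".join(instruction.get(w, w) for w in first_result.split())

if __name__ == "__main__":
    context, info_set = determine_context(first_intimacy=0.2)
    instruction = build_adjustment_instruction(context, info_set)
    print(adjust_recognition_result("please send the reprot", instruction))
```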
After the first voice information of the first user is obtained, step S100 further includes:
step S110: acquiring first attribute information of a first microphone in the audio acquisition device;
step S120: according to the first voice information, first voice characteristic information of the first user is obtained;
step S130: obtaining a first matching suitability between the first voice characteristic information and the first attribute information;
step S140: judging whether the first matching suitability meets a preset matching degree threshold value or not;
step S150: if the preset matching degree threshold is not met, after matching operation is carried out according to the preset matching degree threshold, a first operation result and a second adjustment instruction are obtained;
step S160: and according to the second adjustment instruction, adjusting the first attribute information according to the first operation result.
In particular, in order to recognize the first voice information accurately, it must be ensured that the audio acquisition device used when the first user speaks works normally. First attribute information of the first microphone in the audio acquisition device is obtained; the first attribute information includes information such as the pickup speed, volume and tone settings of the first microphone. According to the first voice information, first voice characteristic information of the first user can be obtained; the first voice characteristic information is information such as the volume, pitch and tone of the first user's speech. A first matching suitability between the first voice characteristic information and the first attribute information is then obtained; the first matching suitability is the degree-of-matching information between the first voice characteristic information and the first attribute information. Whether the first matching suitability meets a preset matching degree threshold, that is, whether it reaches the expected matching value, is judged. If it does not meet the preset matching degree threshold, a matching operation is performed against the preset matching degree threshold to obtain a first operation result and a second adjustment instruction, where the first operation result is the result of correcting the first matching suitability against the preset matching degree threshold, and the second adjustment instruction instructs the system to adjust the first attribute information according to the first operation result. As a concrete example, when the first user is a male speaker with a relatively loud voice and the first microphone is set to a high gain and a high volume, phenomena such as clipping and noise can occur when the first user speaks because the volume is too high, which is harmful to audio acquisition and voice recognition. By adjusting the attribute information of the first microphone according to the voice characteristic information of the user, the output voice information is kept clear and intelligible, and the technical effect of recognizing the voice accurately is achieved.
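A minimal sketch of steps S110-S160 follows, under the assumption that the matching suitability is a simple similarity score between the microphone attributes and the speaker's voice features; the fitness formula and the threshold value are illustrative only.

```python
# Illustrative sketch of steps S110-S160. The fitness formula and the 0-1 threshold
# are assumptions for the example; the patent only requires *some* matching operation.
from dataclasses import dataclass

@dataclass
class MicrophoneAttributes:          # first attribute information
    gain_db: float
    pickup_pitch_hz: float

@dataclass
class VoiceFeatures:                 # first voice characteristic information
    volume_db: float
    pitch_hz: float

def matching_fitness(attr: MicrophoneAttributes, feat: VoiceFeatures) -> float:
    """Crude similarity between microphone settings and the speaker's voice."""
    gain_fit = 1.0 - min(abs(attr.gain_db - feat.volume_db) / 60.0, 1.0)
    pitch_fit = 1.0 - min(abs(attr.pickup_pitch_hz - feat.pitch_hz) / 300.0, 1.0)
    return 0.5 * (gain_fit + pitch_fit)

def adjust_attributes(attr: MicrophoneAttributes, feat: VoiceFeatures,
                      threshold: float = 0.8) -> MicrophoneAttributes:
    """If the fitness misses the preset threshold, move the microphone attributes
    toward the speaker's voice features (second adjustment instruction)."""
    if matching_fitness(attr, feat) >= threshold:
        return attr
    return MicrophoneAttributes(gain_db=feat.volume_db, pickup_pitch_hz=feat.pitch_hz)

if __name__ == "__main__":
    mic = MicrophoneAttributes(gain_db=30.0, pickup_pitch_hz=120.0)
    voice = VoiceFeatures(volume_db=65.0, pitch_hz=210.0)
    print(matching_fitness(mic, voice))
    print(adjust_attributes(mic, voice))
```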
The step S130 further includes:
step S131: inputting the first attribute information and the first voice characteristic information into a first training model, wherein the first training model is obtained through training of multiple groups of training data, and each group of training data in the multiple groups of training data comprises: the first attribute information, the first voice characteristic information and the identification information for identifying the matching fitness level;
step S132: and obtaining output information of the first training model, wherein the output information comprises first matching suitability information between the first voice characteristic information and the first attribute information.
Specifically, the first attribute information and the first voice characteristic information are input into the first training model, which is trained continuously so that the output result becomes more accurate. The training model is a neural network model, that is, a neural network model in machine learning. A neural network (NN) is a complex network system formed by a large number of simple processing units (called neurons) that are widely interconnected; it reflects many basic characteristics of the human brain and is a highly complex nonlinear dynamic learning system. The neural network model is described on the basis of a mathematical model of neurons. An artificial neural network (Artificial Neural Networks) is a first-order approximate description of the human brain system; in short, it is a mathematical model. In this embodiment of the application, the first attribute information and the first voice characteristic information, together with the identified matching-fitness level information, are used to train the neural network model.
Further, the process of training the neural network model is essentially a process of supervised learning. Each of the plurality of sets of training data specifically comprises: the first attribute information, the first voice characteristic information, and identification information used to identify the matching fitness level. Given the first attribute information and the first voice characteristic information as input, the neural network model outputs first matching suitability information between the first voice characteristic information and the first attribute information, and this output is checked against the identified matching-suitability level information. If the output is consistent with the identified matching-suitability level information, supervised learning on that group of data is complete and the next group is processed; if the output is inconsistent with it, the neural network learning model adjusts itself until its output is consistent with the identified matching-suitability level information, and then supervised learning continues with the next group of data. The neural network learning model is continuously corrected and optimized through the training data, and the supervised learning process improves the accuracy with which the model processes the information, so that the first matching suitability information between the first voice characteristic information and the first attribute information becomes more accurate.
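The supervised-learning loop described above might look like the following sketch; the single-sigmoid model and the toy training data are assumptions for illustration, since the embodiment only specifies the inputs (attribute and voice characteristic information) and the labelled matching-fitness levels.

```python
# Illustrative sketch of the supervised training described above. The model form
# (a single sigmoid unit) and the toy data are assumptions; the embodiment only
# specifies the inputs and the labelled matching-fitness levels.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def train(samples, epochs: int = 200, lr: float = 0.1):
    """samples: list of (feature_vector, label); the label is the identified
    matching-fitness level scaled to [0, 1]."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = pred - y        # self-adjustment whenever the output disagrees with the label
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

if __name__ == "__main__":
    # features: [mic gain, mic pickup pitch, voice volume, voice pitch], all normalized
    data = [([0.3, 0.1, 0.3, 0.1], 0.9),
            ([0.9, 0.8, 0.2, 0.1], 0.1),
            ([0.5, 0.5, 0.5, 0.5], 0.8)]
    w, b = train(data)
    query = [0.4, 0.2, 0.4, 0.2]
    print(sigmoid(sum(wi * xi for wi, xi in zip(w, query)) + b))
```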
Before the first voice information of the first user is obtained, the embodiment of the application further includes:
step S1010: acquiring surrounding environment information of the first user;
step S1020: judging whether the surrounding environment information meets a first preset condition or not;
step S1030: if the first preset condition is met, network environment information of the voice recognition system is obtained;
step S1040: judging whether the network environment information meets the requirement of receiving the voice signal of the first user or not;
step S1050: if the requirement of receiving the voice signal of the first user is not met, a first overhaul instruction is obtained;
step S1060: and after the network environment is overhauled according to the first overhauling instruction, receiving first voice information input by the first user.
Specifically, before the first voice information of the first user is obtained, it should be ensured that the surroundings of the first user are normal and will not affect the first voice information. Whether the surrounding environment information meets a first preset condition is therefore judged; for example, when there is construction nearby, the noise can affect the first voice information, so it must be ensured that the surroundings will not interfere with the first user's speech. If the first preset condition is met, that is, the surroundings will not interfere with the first user's speech, the network environment information of the voice recognition system is obtained; the network environment information includes information such as whether the network is unobstructed and whether the network speed is high or low. It is further judged whether the network environment information meets the requirement for receiving the voice signal of the first user, that is, whether the network environment can support reception of the voice signal. If the requirement is not met, a first overhaul instruction is obtained, and after the network environment has been overhauled according to the first overhaul instruction, the first voice information input by the first user is received. By ensuring that the voice information of the first user is received accurately and completely, a better voice recognition effect is achieved.
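A hedged sketch of these pre-checks follows; the noise and bandwidth thresholds, and the way the first overhaul instruction is surfaced, are assumptions made only for the example.

```python
# Illustrative sketch of steps S1010-S1060. The noise and bandwidth thresholds and
# the overhaul step are assumptions for the example, not values fixed by the patent.
def environment_ok(ambient_noise_db: float, max_noise_db: float = 55.0) -> bool:
    """First preset condition: the surroundings must not drown out the speaker."""
    return ambient_noise_db <= max_noise_db

def network_ok(bandwidth_kbps: float, min_kbps: float = 64.0) -> bool:
    """The network must be able to carry the first user's voice signal."""
    return bandwidth_kbps >= min_kbps

def ready_to_receive(ambient_noise_db: float, bandwidth_kbps: float) -> bool:
    if not environment_ok(ambient_noise_db):
        return False
    if not network_ok(bandwidth_kbps):
        print("first overhaul instruction: repair the network before recording")
        return False
    return True

if __name__ == "__main__":
    print(ready_to_receive(ambient_noise_db=48.0, bandwidth_kbps=128.0))
```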
In order to improve the voice recognition accuracy of the first user, the embodiment of the application further includes:
step S1110: judging whether the first user uses a first electronic device or not if the second user is not included in the first video information;
step S1120: if the first user uses the first electronic device, after the first electronic device is associated with the voice recognition system, application software use information of the first electronic device is obtained;
step S1130: determining a second recognition context and a second context information set from the preset voice recognition database according to the application software use information;
step S1140: obtaining a second adjustment instruction according to the second identification context and the second context information set;
step S1150: and adjusting the first identification result according to the second adjustment instruction.
Specifically, when judging whether the first video information includes the second user, if it does not, that is, if the first video information includes only the first user, it is judged whether the first user is using a first electronic device, for example whether related voice call software such as conference software or a FaceTime call is in use. If the first user is using the first electronic device, the first electronic device is associated with the voice recognition system, and the application software use information of the first electronic device is obtained; the application software information includes software capable of holding a teleconference. Further, according to the application software use information, a second recognition context and a second context information set are determined from the preset voice recognition database; these are obtained according to the call content of the first user and the preset voice recognition database. A second adjustment instruction is then obtained according to the second recognition context and the second context information set, and the first recognition result is adjusted according to the second adjustment instruction, so that the recognition result better fits the actual conversation and the voice recognition accuracy is improved.
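For illustration only, the mapping from application software use information to a second recognition context (step S1130) could be sketched as below; the application names and context entries are hypothetical.

```python
# Illustrative sketch of step S1130. The mapping from application software to a
# recognition context is an assumption introduced only for this example.
APP_TO_CONTEXT = {
    "teleconference": ("remote meeting", {"agenda", "minutes", "action item"}),
    "video call":     ("casual call",    {"hello", "see you", "miss you"}),
}

def context_from_app_usage(app_usage_info: str):
    """Pick the second recognition context and context information set from the
    database according to which application the first electronic device is running."""
    return APP_TO_CONTEXT.get(app_usage_info, ("unknown", set()))

if __name__ == "__main__":
    print(context_from_app_usage("teleconference"))
```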
In order to ensure complete coherence of the first voice information, embodiments of the present application further include:
step S1210: when the first recognition result is adjusted according to the first adjustment instruction, judging whether each sentence in the first voice information is a continuous sentence or not;
step S1220: if the sentence is a non-coherent sentence, a first calling instruction is obtained;
step S1230: according to the first calling instruction, a first sentence and a second sentence are called, wherein the first sentence is sentence information before the incoherent sentence, and the second sentence is sentence information after the incoherent sentence;
step S1240: and carrying out voice recognition on the non-coherent sentences according to the first sentences and the second sentences to obtain second recognition results corresponding to the non-coherent sentences.
Specifically, in order to ensure that the first voice information is complete and coherent, when the first recognition result is adjusted according to the first adjustment instruction, whether each sentence in the first voice information is a coherent sentence is judged, that is, every sentence needs to be coherent. If a sentence is an incoherent sentence, a first calling instruction is obtained; the first calling instruction calls up the first sentence and the second sentence, where the first sentence is the sentence information before the incoherent sentence and the second sentence is the sentence information after it. Voice recognition is then performed on the incoherent sentence according to the first sentence and the second sentence, that is, the incoherent sentence is completed by analyzing the sentences before and after it, so as to obtain a second recognition result corresponding to the incoherent sentence. Completing the incoherent sentence in this way ensures that the first voice information is complete and coherent, and thereby achieves the technical effect of improving the voice recognition accuracy.
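A minimal sketch of steps S1210-S1240 follows; the coherence test and the gap-filling strategy are assumptions, since the embodiment only requires that the preceding and following sentences be called up and used to re-recognize the incoherent sentence.

```python
# Illustrative sketch of steps S1210-S1240. The coherence test (a word-count check)
# and the repair strategy are assumptions made only for this example.
def is_coherent(sentence: str, min_words: int = 3) -> bool:
    """Very rough stand-in for the coherence judgment."""
    return len(sentence.split()) >= min_words and "<unk>" not in sentence

def repair(prev_sentence: str, broken: str, next_sentence: str) -> str:
    """Second recognition result: fill the gap using the surrounding sentences."""
    context_words = set(prev_sentence.split()) | set(next_sentence.split())
    repaired = [w for w in broken.split() if w != "<unk>"]
    # Hypothetical completion: borrow the most context-relevant (here, longest) word.
    filler = max(context_words, key=len) if context_words else ""
    return " ".join(repaired + [filler]).strip()

if __name__ == "__main__":
    sentences = ["we will review the budget", "the <unk>", "needs approval by friday"]
    for i, s in enumerate(sentences):
        if not is_coherent(s):
            prev_s = sentences[i - 1] if i > 0 else ""
            next_s = sentences[i + 1] if i + 1 < len(sentences) else ""
            print(repair(prev_s, s, next_s))
```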
After the first recognition result is adjusted according to the first adjustment instruction, step S900 further includes:
step S910: judging whether the adjusted first identification result comprises first privacy information of the first user or not;
step S920: if the first private information is included, generating a first verification code according to the first private information, wherein the first verification code is in one-to-one correspondence with the first private information;
step S930: generating a second verification code according to the second privacy information and the first verification code; and so on, generating an Nth verification code according to the Nth privacy information and the N-1 th verification code, wherein N is a natural number larger than 1;
step S940: and respectively copying and storing all the privacy information and the verification codes on M pieces of equipment, wherein M is a natural number larger than 1.
Specifically, in order to protect the privacy of the content of the first voice information, after the first recognition result is adjusted according to the first adjustment instruction, whether the adjusted first recognition result includes first privacy information of the first user is judged. The first privacy information is private content in the first voice information and can be understood to include information such as a bank card number or an account password mentioned by the first user. If the first recognition result includes first privacy information of the first user, blockchain-based encryption processing can be applied to the privacy information, ensuring that the privacy information is stored securely and cannot be tampered with.
A first verification code is generated according to the first privacy information, the first verification code corresponding one-to-one with the first privacy information; a second verification code is generated according to the second privacy information and the first verification code; and so on, an Nth verification code is generated according to the Nth privacy information and the (N-1)th verification code, where N is a natural number larger than 1. All the privacy information and verification codes are then copied and stored on each of M devices, where M is a natural number larger than 1. The privacy information of the first user is thus stored in encrypted form. Each device corresponds to a node, and all the nodes together form a blockchain. The blockchain forms a general ledger that is easy to verify (verifying the hash value of the last block is equivalent to verifying the whole ledger) and cannot be altered (changing any piece of recorded information changes the hash values of all subsequent blocks, so the ledger would fail verification).
The blockchain system adopts a distributed data form, so every participating node obtains a complete backup of the database. Unless 51% of the nodes in the whole system can be controlled at the same time, a modification of the database by a single node is invalid and cannot affect the data content on the other nodes. Therefore, the more nodes participate in the system, the higher the security of the data in the system. Encrypting the privacy information of the first user on the basis of the blockchain effectively ensures that the privacy information is stored securely, and achieves the technical effect of recording and storing the privacy information of the first user safely.
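A hedged sketch of the verification-code chain of steps S910-S940 is given below; SHA-256 is used as the code-generation function purely as an assumption, since the embodiment does not name a specific algorithm.

```python
# Illustrative sketch of steps S910-S940. SHA-256 is used as the verification-code
# function purely as an assumption; the patent does not name a specific hash.
import hashlib
from typing import List, Tuple

def verification_code(privacy_info: str, previous_code: str = "") -> str:
    """Nth verification code = hash of the Nth privacy information chained with
    the (N-1)th verification code."""
    return hashlib.sha256((previous_code + privacy_info).encode("utf-8")).hexdigest()

def build_chain(privacy_items: List[str]) -> List[Tuple[str, str]]:
    chain, prev = [], ""
    for item in privacy_items:
        code = verification_code(item, prev)
        chain.append((item, code))
        prev = code
    return chain

def replicate(chain: List[Tuple[str, str]], m_devices: int = 3) -> List[List[Tuple[str, str]]]:
    """Copy the whole ledger onto each of the M devices (one list per device)."""
    return [list(chain) for _ in range(m_devices)]

if __name__ == "__main__":
    ledger = build_chain(["bank card 6222...", "account password ****"])
    copies = replicate(ledger, m_devices=3)
    print(copies[0][-1][1][:16], "stored on", len(copies), "devices")
```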
In order to make the storage of the private information of the first user more efficient and quick, the embodiment of the application further comprises:
step S840: taking the Nth privacy information and the N-1 th verification code as an Nth block;
step S850: obtaining the recording time of the nth block, wherein the recording time of the nth block represents the time of the nth block needing to be recorded;
step S860: according to the N-th block recording time, obtaining first equipment with strongest operation speed in the M pieces of equipment;
step S870: and transmitting the recording right of the Nth block to the first equipment.
Specifically, when the privacy information of the first user is encrypted on the basis of the blockchain, in order to obtain a more efficient operation and storage rate, the recording time of the Nth block can be obtained, where the recording time of the Nth block represents the time at which the Nth block needs to be recorded. Further, according to the recording time of the Nth block, the first device with the strongest operation speed among the M devices is obtained, and the recording right of the Nth block is sent to the first device. This guarantees the safe, effective and stable operation of the decentralized blockchain system, ensures that the block is recorded on a device quickly and accurately, guarantees information security, and allows the privacy information of the first user to be judged accurately, thereby achieving the technical effect of storing and recording the privacy information of the first user more quickly and efficiently.
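For illustration, granting the recording right of the Nth block to the fastest of the M devices (steps S860-S870) could be sketched as follows; the benchmark numbers are hypothetical.

```python
# Illustrative sketch of steps S840-S870. The way operation speed is measured
# (a benchmark score per device) is an assumption made only for the example.
def pick_recording_device(device_speeds: dict) -> str:
    """Grant the recording right of the Nth block to the device with the
    strongest operation speed among the M devices."""
    return max(device_speeds, key=device_speeds.get)

if __name__ == "__main__":
    speeds = {"device-1": 1.2e9, "device-2": 2.8e9, "device-3": 2.1e9}  # ops/sec, assumed
    print("recording right goes to", pick_recording_device(speeds))
```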
In summary, the method and the device for improving the accuracy of voice recognition provided by the embodiments of the present application have the following technical effects:
1. Video information within a preset distance range is obtained from the monitoring equipment, the scene information of the first user's speech is derived from that video information, and the first voice information is adjusted and intelligently recognized in combination with a specific voice recognition database, thereby achieving the technical effect of improving voice recognition accuracy.
2. The microphone is adjusted according to the voice characteristic information of the first user, so that personalized requirements can be met on the basis of meeting general requirements; combined with real-time network environment information, the first voice signal can be received completely; and incoherent sentences are connected through the semantics of the preceding and following sentences, so that the first voice information flows smoothly, achieving the technical effect of improving the voice recognition accuracy.
Example two
Based on the same inventive concept as the method for improving the accuracy of voice recognition in the foregoing embodiment, the present invention further provides a device for improving the accuracy of voice recognition, as shown in fig. 2, where the device includes:
The first obtaining unit 11: the first obtaining unit 11 is configured to obtain first voice information of a first user;
the second obtaining unit 12: the second obtaining unit 12 is configured to obtain a first recognition result according to the first voice information;
the third obtaining unit 13: the third obtaining unit 13 is configured to obtain, by using the monitoring device, first video information within a first preset distance range of the first user;
the first judgment unit 14: the first determining unit 14 is configured to determine whether a second user is included in the first video information, and whether a voice interaction exists between the first user and the second user;
fourth obtaining unit 15: the fourth obtaining unit 15 is configured to obtain first affinity information of the first user and the second user if the second user is included in the first video information and there is a voice interaction between the first user and the second user;
fifth obtaining unit 16: the fifth obtaining unit 16 is configured to obtain a preset voice recognition database;
the first determination unit 17: the first determining unit 17 is configured to determine a first recognition context and a first context information set from the preset speech recognition database according to the first affinity information;
Sixth obtaining unit 18: the sixth obtaining unit 18 is configured to obtain a first adjustment instruction according to the first recognition context and the first context information set;
the first adjusting unit 19: the first adjusting unit 19 is configured to adjust the first identification result according to the first adjustment instruction.
Further, the device further comprises:
seventh obtaining unit: the seventh obtaining unit is configured to obtain first attribute information of a first microphone in the audio collecting device;
eighth obtaining unit: the eighth obtaining unit is configured to obtain first voice feature information of the first user according to the first voice information;
a ninth obtaining unit: the ninth obtaining unit is configured to obtain a first matching suitability between the first voice feature information and the first attribute information;
a second judgment unit: the second judging unit is used for judging whether the first matching suitability meets a preset matching suitability threshold value or not;
tenth obtaining unit: the tenth obtaining unit is configured to obtain a first operation result and a second adjustment instruction after performing a matching operation according to the preset matching degree threshold if the preset matching degree threshold is not satisfied;
A second adjusting unit: the second adjusting unit is used for adjusting the first attribute information according to the first operation result according to the second adjusting instruction.
Further, the device further comprises:
a first input unit: the first input unit is configured to input the first attribute information and the first speech feature information into a first training model, where the first training model is obtained through training of multiple sets of training data, and each set of training data in the multiple sets of training data includes: the first attribute information, the first voice characteristic information and the identification information for identifying the matching fitness level;
eleventh obtaining unit: the eleventh obtaining unit is configured to obtain output information of the first training model, where the output information includes first matching suitability information between the first speech feature information and the first attribute information.
Further, the device further comprises:
a twelfth obtaining unit: the twelfth obtaining unit is used for obtaining surrounding environment information of the first user;
a third judgment unit: the third judging unit is used for judging whether the surrounding environment information meets a first preset condition or not;
Thirteenth obtaining unit: the thirteenth obtaining unit is configured to obtain network environment information where the voice recognition system is located if the first preset condition is satisfied;
fourth judgment unit: the fourth judging unit is used for judging whether the network environment information meets the requirement of receiving the voice signal of the first user;
fourteenth obtaining unit: the fourteenth obtaining unit is used for obtaining a first maintenance instruction if the requirement of receiving the voice signal of the first user is not met;
a first overhaul unit: the first overhaul unit is used for receiving first voice information input by the first user after overhaul is performed on the network environment according to the first overhaul instruction.
Further, the device further comprises:
fifth judging unit: the fifth judging unit is configured to judge whether the first user uses a first electronic device if the second user is not included in the first video information;
fifteenth obtaining unit: the fifteenth obtaining unit is configured to obtain application software usage information of the first electronic device after associating the first electronic device with the speech recognition system if the first user uses the first electronic device;
A second determination unit: the second determining unit is used for determining a second recognition context and a second context information set from the preset voice recognition database according to the application software use information;
sixteenth obtaining unit: the sixteenth obtaining unit is configured to obtain a second adjustment instruction according to the second recognition context and the second context information set;
a third adjusting unit: the third adjusting unit is used for adjusting the first identification result according to the second adjusting instruction.
Further, the device further comprises:
sixth judgment unit: the sixth judging unit is configured to judge whether each sentence in the first voice information is a consistent sentence when the first recognition result is adjusted according to the first adjustment instruction;
seventeenth obtaining unit: the seventeenth obtaining unit is configured to obtain a first fetch instruction if the seventeenth obtaining unit is a non-consecutive sentence;
a first calling unit: the first calling unit is used for calling a first sentence and a second sentence according to the first calling instruction, wherein the first sentence is sentence information before the non-coherent sentence, and the second sentence is sentence information after the non-coherent sentence;
Eighteenth obtaining unit: the eighteenth obtaining unit is configured to obtain a second recognition result corresponding to the non-coherent sentence after performing speech recognition on the non-coherent sentence according to the first sentence and the second sentence.
Further, the device further comprises:
seventh judgment unit: the seventh judging unit is configured to judge whether the adjusted first identification result includes first privacy information of the first user;
a first generation unit: the first generation unit is used for generating a first verification code according to the first privacy information if the first privacy information is included, wherein the first verification code is in one-to-one correspondence with the first privacy information;
a second generation unit: the second generation unit is used for generating a second verification code according to the second privacy information and the first verification code; and so on, generating an Nth verification code according to the Nth privacy information and the N-1 th verification code, wherein N is a natural number larger than 1;
a first storage unit: the first storage unit is used for respectively copying and storing all privacy information and verification codes on M pieces of equipment, wherein M is a natural number larger than 1.
The foregoing various modifications and embodiments of a method for improving speech recognition accuracy in the first embodiment of fig. 1 are equally applicable to a device for improving speech recognition accuracy in this embodiment, and by the foregoing detailed description of a method for improving speech recognition accuracy, those skilled in the art will clearly know the implementation method of a device for improving speech recognition accuracy in this embodiment, so that the details will not be described again for brevity of description.
Example III
An electronic device of an embodiment of the present application is described below with reference to fig. 3.
Fig. 3 illustrates a schematic structural diagram of an electronic device according to an embodiment of the present application.
Based on the inventive concept of a method for improving speech recognition accuracy as in the previous embodiments, the present invention further provides a system for improving speech recognition accuracy, on which a computer program is stored, which program, when being executed by a processor, implements the steps of any of the methods for improving speech recognition accuracy as described above.
In fig. 3, a bus architecture is represented by bus 300. Bus 300 may comprise any number of interconnected buses and bridges, and links together various circuits, including one or more processors, represented by processor 302, and memory, represented by memory 304. Bus 300 may also link together various other circuits, such as peripheral devices, voltage regulators and power management circuits, which are well known in the art and are therefore not described further herein. Bus interface 305 provides an interface between bus 300 and receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e. a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 302 is responsible for managing the bus 300 and general processing, while the memory 304 may be used to store data used by the processor 302 in performing operations.
The embodiment of the application provides a method for improving voice recognition accuracy, wherein the method further comprises the following steps: acquiring first voice information of a first user; obtaining a first recognition result according to the first voice information; acquiring first video information within a first preset distance range of the first user through the monitoring equipment; judging whether the first video information comprises a second user or not, and judging whether the first user and the second user have voice interaction behaviors or not; if the first video information comprises the second user and the first user and the second user have voice interaction behaviors, obtaining first intimacy information of the first user and the second user; obtaining a preset voice recognition database; determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information; obtaining a first adjustment instruction according to the first identification context and the first context information set; and adjusting the first identification result according to the first adjustment instruction.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (7)

1. A method for improving speech recognition accuracy, the method being applied to a speech recognition system, the speech recognition system having an audio acquisition device and a monitoring device, wherein the method comprises:
Acquiring first voice information of a first user;
obtaining a first recognition result according to the first voice information;
acquiring first video information within a first preset distance range of the first user through the monitoring equipment;
judging whether the first video information comprises a second user or not, and judging whether the first user and the second user have voice interaction behaviors or not;
if the first video information comprises the second user and the first user and the second user have voice interaction behaviors, obtaining first intimacy information of the first user and the second user;
obtaining a preset voice recognition database;
determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information;
obtaining a first adjustment instruction according to the first identification context and the first context information set;
adjusting the first identification result according to the first adjustment instruction;
wherein after the obtaining the first voice information of the first user, the method further comprises:
acquiring first attribute information of a first microphone in the audio acquisition device;
According to the first voice information, first voice characteristic information of the first user is obtained;
obtaining a first matching suitability between the first voice characteristic information and the first attribute information;
judging whether the first matching suitability meets a preset matching degree threshold value or not;
if the preset matching degree threshold is not met, after matching operation is carried out according to the preset matching degree threshold, a first operation result and a second adjustment instruction are obtained;
according to the second adjustment instruction, adjusting the first attribute information according to the first operation result;
wherein the obtaining the first matching suitability between the first voice feature information and the first attribute information includes:
inputting the first attribute information and the first voice characteristic information into a first training model, wherein the first training model is obtained through training of multiple groups of training data, and each group of training data in the multiple groups of training data comprises: the first attribute information, the first voice characteristic information and the identification information for identifying the matching fitness level;
and obtaining output information of the first training model, wherein the output information comprises first matching suitability information between the first voice characteristic information and the first attribute information.
2. The method of claim 1, wherein prior to obtaining the first voice information of the first user, the method further comprises:
acquiring surrounding environment information of the first user;
judging whether the surrounding environment information meets a first preset condition;
if the first preset condition is met, obtaining network environment information of the voice recognition system;
judging whether the network environment information meets a requirement for receiving a voice signal of the first user;
if the requirement for receiving the voice signal of the first user is not met, obtaining a first overhaul instruction;
and receiving the first voice information input by the first user after the network environment is overhauled according to the first overhaul instruction.
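As a rough illustration of the pre-checks in claim 2, the sketch below tests the surrounding environment and the network before accepting speech input; the noise limit and the reachability probe are assumptions, not values taken from the patent.

    # Hypothetical pre-check sketch for claim 2; the threshold and probe host are assumed.
    import socket

    AMBIENT_NOISE_LIMIT_DB = 60        # stand-in for the "first preset condition"

    def environment_ok(ambient_noise_db):
        return ambient_noise_db <= AMBIENT_NOISE_LIMIT_DB

    def network_ok(host="8.8.8.8", port=53, timeout=2.0):
        # Crude reachability probe standing in for "network environment information".
        try:
            socket.create_connection((host, port), timeout=timeout).close()
            return True
        except OSError:
            return False

    def ready_to_receive_voice(ambient_noise_db):
        if not environment_ok(ambient_noise_db):
            return False
        if not network_ok():
            print("first overhaul instruction: repair the network environment before recording")
            return False
        return True

    print(ready_to_receive_voice(ambient_noise_db=45))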
3. The method of claim 1, wherein the method further comprises:
if the second user is not included in the first video information, judging whether the first user uses a first electronic device;
if the first user uses the first electronic device, associating the first electronic device with the voice recognition system and then obtaining application software use information of the first electronic device;
determining a second recognition context and a second context information set from the preset voice recognition database according to the application software use information;
obtaining a second adjustment instruction according to the second recognition context and the second context information set;
and adjusting the first recognition result according to the second adjustment instruction.
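The claim 3 branch, where application software use on an associated device drives the context selection, might look like the toy sketch below; the mapping and the homophone substitution are illustrative assumptions only.

    # Hypothetical app-usage-driven context selection for claim 3.
    APP_CONTEXT_DB = {
        "banking_app": ("financial", {"principle": "principal"}),
        "chat_app":    ("casual", {}),
    }

    def adjust_by_app_usage(first_result, app_in_use):
        context, context_info = APP_CONTEXT_DB.get(app_in_use, ("neutral", {}))
        # "Second adjustment instruction": prefer domain-specific homophones for this context.
        for heard, preferred in context_info.items():
            first_result = first_result.replace(heard, preferred)
        return first_result

    print(adjust_by_app_usage("transfer the principle amount", "banking_app"))
    # -> "transfer the principal amount"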
4. The method of claim 1, wherein the method further comprises:
when the first recognition result is adjusted according to the first adjustment instruction, judging whether each sentence in the first voice information is a coherent sentence;
if a sentence is a non-coherent sentence, obtaining a first calling instruction;
retrieving a first sentence and a second sentence according to the first calling instruction, wherein the first sentence is sentence information before the non-coherent sentence, and the second sentence is sentence information after the non-coherent sentence;
and carrying out voice recognition on the non-coherent sentence according to the first sentence and the second sentence to obtain a second recognition result corresponding to the non-coherent sentence.
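Claim 4's treatment of non-coherent sentences could be prototyped as below; the coherence test and the context-aware re-recognition are toy stand-ins, since the claim only specifies that the preceding and following sentences are used.

    # Hypothetical handling of non-coherent sentences for claim 4.
    def is_coherent(sentence, min_words=3):
        # Toy heuristic: very short fragments are treated as non-coherent.
        return len(sentence.split()) >= min_words

    def rerecognize_with_context(fragment, previous_sentence, next_sentence):
        # Placeholder "second recognition result": a real system would rescore the acoustic
        # hypothesis for the fragment against a language model conditioned on both neighbours.
        return f"[{previous_sentence} | {fragment} | {next_sentence}] rescored"

    def refine(sentences):
        refined = list(sentences)
        for i, sentence in enumerate(sentences):
            if not is_coherent(sentence):
                prev_s = sentences[i - 1] if i > 0 else ""
                next_s = sentences[i + 1] if i + 1 < len(sentences) else ""
                refined[i] = rerecognize_with_context(sentence, prev_s, next_s)
        return refined

    print(refine(["please transfer the funds", "uh five", "to my savings account"]))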
5. The method of claim 1, wherein after adjusting the first recognition result according to the first adjustment instruction, the method further comprises:
judging whether the adjusted first recognition result comprises first privacy information of the first user;
if the first privacy information is included, generating a first verification code according to the first privacy information, wherein the first verification code corresponds one-to-one with the first privacy information;
generating a second verification code according to second privacy information and the first verification code, and so on, until an Nth verification code is generated according to Nth privacy information and the (N-1)th verification code, wherein N is a natural number greater than 1;
and copying and storing all of the privacy information and the verification codes on each of M pieces of equipment, wherein M is a natural number greater than 1.
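One way to realise the chained verification codes of claim 5 is a hash chain replicated to M devices, as in the sketch below; SHA-256 is an assumption, since the claim only requires that each code correspond to its privacy information and the previous code.

    # Hypothetical hash-chain realisation of the claim 5 verification codes.
    import hashlib

    def build_code_chain(privacy_items):
        """Code k is derived from privacy item k and code k-1 (empty for the first item)."""
        codes, previous = [], b""
        for item in privacy_items:
            code = hashlib.sha256(previous + item.encode("utf-8")).hexdigest()
            codes.append(code)
            previous = code.encode("utf-8")
        return codes

    def replicate(privacy_items, codes, devices):
        # Each of the M devices keeps a full copy of all privacy items and their codes.
        return {device: list(zip(privacy_items, codes)) for device in devices}

    items = ["id card 1234", "account 5678"]
    stores = replicate(items, build_code_chain(items), ["device_a", "device_b"])
    print(stores["device_a"][0][1][:16])    # prefix of the first verification code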
6. An apparatus for improving voice recognition accuracy, wherein the apparatus comprises:
a first obtaining unit: the first obtaining unit is used for obtaining first voice information of a first user;
a second obtaining unit: the second obtaining unit is used for obtaining a first recognition result according to the first voice information;
a third obtaining unit: the third obtaining unit is used for obtaining first video information in a first preset distance range of the first user through monitoring equipment;
a first judging unit: the first judging unit is used for judging whether the first video information comprises a second user, and whether the first user and the second user have a voice interaction behavior;
a fourth obtaining unit: the fourth obtaining unit is used for obtaining first intimacy information of the first user and the second user if the first video information comprises the second user and the first user and the second user have a voice interaction behavior;
a fifth obtaining unit: the fifth obtaining unit is used for obtaining a preset voice recognition database;
a first determining unit: the first determining unit is used for determining a first recognition context and a first context information set from the preset voice recognition database according to the first intimacy information;
a sixth obtaining unit: the sixth obtaining unit is used for obtaining a first adjustment instruction according to the first recognition context and the first context information set;
a first adjusting unit: the first adjusting unit is used for adjusting the first recognition result according to the first adjustment instruction;
the apparatus further comprises:
a seventh obtaining unit: the seventh obtaining unit is used for obtaining first attribute information of a first microphone in the audio acquisition device;
an eighth obtaining unit: the eighth obtaining unit is used for obtaining first voice feature information of the first user according to the first voice information;
a ninth obtaining unit: the ninth obtaining unit is used for obtaining a first matching suitability between the first voice feature information and the first attribute information;
a second judging unit: the second judging unit is used for judging whether the first matching suitability meets a preset matching degree threshold;
a tenth obtaining unit: the tenth obtaining unit is used for obtaining a first operation result and a second adjustment instruction after a matching operation is performed according to the preset matching degree threshold if the preset matching degree threshold is not met;
a second adjusting unit: the second adjusting unit is used for adjusting the first attribute information according to the first operation result in response to the second adjustment instruction;
a first input unit: the first input unit is used for inputting the first attribute information and the first voice feature information into a first training model, wherein the first training model is obtained by training with multiple groups of training data, and each group of training data in the multiple groups of training data comprises: the first attribute information, the first voice feature information, and identification information identifying a matching suitability level;
an eleventh obtaining unit: the eleventh obtaining unit is used for obtaining output information of the first training model, wherein the output information comprises first matching suitability information between the first voice feature information and the first attribute information.
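The first training model referenced by the first input unit and the eleventh obtaining unit (and by claim 1) could be any classifier over microphone attribute information and voice feature information; the sketch below assumes a scikit-learn logistic regression with an invented four-dimensional feature encoding and an assumed threshold.

    # Hypothetical matching-suitability model; the features, labels, and threshold are invented.
    from sklearn.linear_model import LogisticRegression

    # Each training sample: microphone attribute information + voice feature information,
    # labelled with a matching-suitability level (1 = suitable, 0 = unsuitable).
    X = [
        [0.9, 0.2, 0.8, 0.1],
        [0.3, 0.7, 0.2, 0.9],
        [0.8, 0.3, 0.7, 0.2],
        [0.2, 0.8, 0.1, 0.8],
    ]
    y = [1, 0, 1, 0]
    model = LogisticRegression().fit(X, y)

    def matching_suitability(mic_attributes, voice_features):
        """Output information of the model: a suitability score for the pair of inputs."""
        return model.predict_proba([mic_attributes + voice_features])[0][1]

    score = matching_suitability([0.85, 0.25], [0.75, 0.15])
    if score < 0.5:                        # assumed preset matching degree threshold
        print("adjust the first attribute information of the microphone")
    else:
        print(f"matching suitability {score:.2f} meets the threshold")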
7. An apparatus for improving voice recognition accuracy, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the program.
CN202110130305.4A 2021-01-29 2021-01-29 Method and device for improving voice recognition accuracy Active CN112908304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110130305.4A CN112908304B (en) 2021-01-29 2021-01-29 Method and device for improving voice recognition accuracy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110130305.4A CN112908304B (en) 2021-01-29 2021-01-29 Method and device for improving voice recognition accuracy

Publications (2)

Publication Number Publication Date
CN112908304A CN112908304A (en) 2021-06-04
CN112908304B true CN112908304B (en) 2024-03-26

Family

ID=76121687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110130305.4A Active CN112908304B (en) 2021-01-29 2021-01-29 Method and device for improving voice recognition accuracy

Country Status (1)

Country Link
CN (1) CN112908304B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108257603A (en) * 2017-12-05 2018-07-06 湖南海翼电子商务股份有限公司 Multimedia volume adjustment device and multimedia volume adjusting method
CN109961780A (en) * 2017-12-22 2019-07-02 深圳市优必选科技有限公司 A kind of man-machine interaction method, device, server and storage medium
CN110047487A (en) * 2019-06-05 2019-07-23 广州小鹏汽车科技有限公司 Awakening method, device, vehicle and the machine readable media of vehicle-mounted voice equipment
CN110245253A (en) * 2019-05-21 2019-09-17 华中师范大学 A kind of Semantic interaction method and system based on environmental information
KR20190106887A (en) * 2019-08-28 2019-09-18 엘지전자 주식회사 Method and device for providing information
CN110890090A (en) * 2018-09-11 2020-03-17 涂悦 Context-based auxiliary interaction control method and system
CN111508482A (en) * 2019-01-11 2020-08-07 阿里巴巴集团控股有限公司 Semantic understanding and voice interaction method, device, equipment and storage medium
CN111611349A (en) * 2020-05-26 2020-09-01 深圳壹账通智能科技有限公司 Voice query method and device, computer equipment and storage medium
CN112035879A (en) * 2020-09-04 2020-12-04 昆明理工大学 Information processing method and system for improving confidentiality of automatic logistics of cell
CN112084509A (en) * 2020-08-19 2020-12-15 喻婷婷 Block chain key generation method and system based on biological identification technology
CN112102833A (en) * 2020-09-22 2020-12-18 北京百度网讯科技有限公司 Voice recognition method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN112908304A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN108962282B (en) Voice detection analysis method and device, computer equipment and storage medium
US10777207B2 (en) Method and apparatus for verifying information
WO2022095380A1 (en) Ai-based virtual interaction model generation method and apparatus, computer device and storage medium
WO2020253128A1 (en) Voice recognition-based communication service method, apparatus, computer device, and storage medium
CN111243574B (en) Voice model adaptive training method, system, device and storage medium
CN111694938A (en) Emotion recognition-based answering method and device, computer equipment and storage medium
WO2022116487A1 (en) Voice processing method and apparatus based on generative adversarial network, device, and medium
CN115409518A (en) User transaction risk early warning method and device
US20110161084A1 (en) Apparatus, method and system for generating threshold for utterance verification
EP3501024A1 (en) Systems, apparatuses, and methods for speaker verification using artificial neural networks
CN112908304B (en) Method and device for improving voice recognition accuracy
CN114065720A (en) Conference summary generation method and device, storage medium and electronic equipment
CN110570877B (en) Sign language video generation method, electronic device and computer readable storage medium
CN111507218A (en) Matching method and device of voice and face image, storage medium and electronic equipment
CN110675865A (en) Method and apparatus for training hybrid language recognition models
CN113873088B (en) Interactive method and device for voice call, computer equipment and storage medium
CN112786041B (en) Voice processing method and related equipment
CN115083426A (en) High-fidelity voice desensitization method and device based on antagonistic sample generation
US8051026B2 (en) Rules collector system and method with user interaction
CN116010563A (en) Multi-round dialogue data analysis method, electronic equipment and storage medium
CN112908314B (en) Intelligent voice interaction method and device based on tone recognition
CN113782022B (en) Communication method, device, equipment and storage medium based on intention recognition model
CN115022395B (en) Service video pushing method and device, electronic equipment and storage medium
KR102241436B1 (en) Learning method and testing method for figuring out and classifying musical instrument used in certain audio, and learning device and testing device using the same
CN117234455B (en) Intelligent control method and system for audio device based on environment perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant