CN112151022A - Speech recognition optimization method, device, equipment and storage medium - Google Patents

Speech recognition optimization method, device, equipment and storage medium

Info

Publication number
CN112151022A
CN112151022A (application CN202011026486.8A)
Authority
CN
China
Prior art keywords
result
score
recognition result
confidence
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011026486.8A
Other languages
Chinese (zh)
Inventor
臧松
张郁枚
胡津
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011026486.8A priority Critical patent/CN112151022A/en
Publication of CN112151022A publication Critical patent/CN112151022A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/08 — Speech classification or search
    • G10L 15/18 — Speech classification or search using natural language modelling
    • G10L 15/1822 — Parsing for meaning understanding
    • G10L 15/28 — Constructional details of speech recognition systems

Abstract

The application discloses a speech recognition optimization method, device, equipment and storage medium, relating to the fields of speech technology, intelligent transportation, and the like. The specific implementation scheme is as follows: recognizing received speech information to obtain a plurality of candidate recognition results; determining a target recognition result among the candidate recognition results according to the score of each candidate recognition result, wherein the scores are predetermined; performing semantic parsing on the target recognition result to obtain a parsing result; and determining the confidence of the parsing result and updating the score of the target recognition result according to the confidence. The method can solve the prior-art problem that once an erroneous speech recognition result occurs, the correct instruction cannot be parsed. By treating the speech recognition process and the semantic parsing process as a whole and using the parsing result to optimize speech recognition, the accuracy of speech recognition can be improved.

Description

Speech recognition optimization method, device, equipment and storage medium
Technical Field
The application relates to the field of data processing, in particular to the fields of voice technology, intelligent transportation and the like.
Background
At present, when speech recognition is performed on dialects, near-homophones and homophones may be recognized purely by pronunciation. When the recognition result is inaccurate, semantic parsing cannot recover the true intent.
When this occurs during the in-vehicle interaction of automatic driving, an erroneous semantic parse can create potential safety hazards.
Disclosure of Invention
The application provides a speech recognition optimization method, device, equipment and storage medium.
According to an aspect of the present application, there is provided a speech recognition optimization method, which may include the following steps:
recognizing received speech information to obtain a plurality of candidate recognition results;
determining a target recognition result among the candidate recognition results according to the score of each candidate recognition result, wherein the scores are predetermined;
performing semantic parsing on the target recognition result to obtain a parsing result;
and determining the confidence of the parsing result and updating the score of the target recognition result according to the confidence.
According to another aspect of the present application, there is provided a speech recognition optimization apparatus, which may include the following components:
a speech information recognition module, configured to recognize received speech information to obtain a plurality of candidate recognition results;
a target recognition result determining module, configured to determine a target recognition result among the candidate recognition results according to the score of each candidate recognition result, wherein the scores are predetermined;
a semantic parsing module, configured to perform semantic parsing on the target recognition result to obtain a parsing result;
and a score updating module, configured to determine the confidence of the parsing result and update the score of the target recognition result according to the confidence.
In a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method provided by any one of the embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method provided by any one of the embodiments of the present application.
The technology of the present application can solve the prior-art problem that once an erroneous speech recognition result occurs, the correct instruction cannot be parsed. The speech recognition process and the semantic parsing process are treated as a whole, and the parsing result is used to optimize speech recognition, so the accuracy of speech recognition can be improved. In subsequent speech recognition, the candidate recognition result with the highest score can be preferentially selected as the target recognition result, reducing the misrecognition rate of homophones.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flow chart of a method of optimizing speech recognition according to the present application;
FIG. 2 is a flow chart of determining the confidence of a parsing result according to the present application;
FIG. 3 is a flow chart of a method of optimizing speech recognition according to the present application;
FIG. 4 is a flow chart of a method of optimizing speech recognition according to the present application;
FIG. 5 is a schematic diagram of an apparatus for optimizing speech recognition according to the present application;
fig. 6 is a block diagram of an electronic device for implementing the speech recognition optimization method according to the embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in fig. 1, the present application provides a method for optimizing speech recognition, which may include the following steps:
s101: identifying the received voice information to obtain a plurality of candidate identification results;
s102: determining a target recognition result in the candidate recognition results according to the score of each candidate recognition result; wherein the score is predetermined;
s103: performing semantic analysis on the target identification result to obtain an analysis result;
s104: and determining the confidence degree of the analysis result, and updating the score of the target recognition result according to the confidence degree.
In the embodiment of the application, the overall flow may comprise two parts: speech recognition and semantic parsing. The speech recognition part obtains a speech recognition result from the user's speech information. The semantic parsing part parses the speech recognition result to obtain the user's intent.
The execution subject of the overall flow may be the in-vehicle unit of a (semi-)automatic driving vehicle, or a cloud server in communication with the in-vehicle unit. The following examples take the in-vehicle unit as the execution subject. For example, the in-vehicle unit may recognize a driver's or passenger's control command for the vehicle, or a remote-control command for a device such as a smart home appliance, and perform the corresponding control according to the recognition result.
For example, while driving, the driver says "raise the main driving temperature" to the in-vehicle unit. The in-vehicle unit may segment the speech into candidate words such as "raise", "main driving", "home", "pig price", "temperature" and "stability" (near-homophones of the intended words in the source language). A frequency ranking of these segments can be obtained from a pre-trained model, yielding a plurality of candidate recognition results such as "raise the main driving temperature" and "raise the home temperature".
Among the plurality of candidate recognition results, errors often arise in recognizing the word slot corresponding to a keyword, so that a correct parsing result cannot ultimately be obtained. The word slot corresponding to the keyword may be the object of an instruction, or the instruction itself.
Therefore, in the present embodiment, one target recognition result may be determined from the plurality of candidate recognition results according to the score of each candidate. The score of a candidate recognition result may be determined through multiple historical interactions with the driver or vehicle occupants; that is, the scores are determined in advance.
The target recognition result may be parsed using a semantic parsing model or the like, thereby obtaining the driver's intent.
For example, the determined target recognition result is "raise the main driving temperature". From intent recognition, the driver's intent can be parsed as adjusting a temperature. Further, based on the word slot corresponding to the adjustment object, it can be determined that the driver wishes to adjust the temperature of the "main driving" position. The controllable objects are traversed to confirm whether a temperature-control object for the main driving position exists. Since such a control object exists, it can be confirmed that "raise the main driving temperature" is a feasible instruction. Based on this, a higher confidence, for example a full score, can be given to the recognition result "raise the main driving temperature".
For another example, the target recognition result is "raise the home temperature". From intent recognition, the driver's intent can again be parsed as adjusting a temperature, and from the word slot corresponding to the adjustment object, the target is the "home". The controllable objects are traversed to confirm whether a temperature-control object named "home" exists. Since no such control object exists, it can be confirmed that "raise the home temperature" is an infeasible instruction, and a lower confidence, for example a zero score, can be given to that recognition result.
According to the confidence of the parsing result, the target recognition result can be scored. The confidence value may be used directly as the score: where the confidence is a full score, the score of "raise the main driving temperature" is a full score; where the confidence is zero, the score of "raise the home temperature" is zero.
Alternatively, the newly obtained confidence may be combined with the candidate recognition result's historical confidences, and the combined value used as the candidate's score. The combination may include a weighted sum, a variance calculation, and the like.
This scheme solves the prior-art problem that once an erroneous speech recognition result occurs, the correct instruction cannot be parsed. The speech recognition process and the semantic parsing process are treated as a whole, and the parsing result is used to optimize speech recognition, improving its accuracy. In subsequent speech recognition, the candidate recognition result with the highest score can be preferentially selected as the target recognition result, reducing misrecognition of homophones.
In one embodiment, determining the confidence of the parsing result in step S104 may include determining the confidence according to the user's feedback on the parsing result, and may also include the following steps, as shown in fig. 2:
s1041: determining a vertical class corresponding to the analysis result;
s1042: obtaining a sample library corresponding to the verticals;
s1043: and determining the confidence degree according to the matching degree of the analysis result and the content in the sample library.
In this embodiment, determining the confidence from the user's feedback and determining it through steps S1041 to S1043 may be combined in an "and/or" relationship.
The vertical class corresponding to the parsing result may include calling, vehicle control, weather query, smart home remote control, and the like. The vertical class to which the parsing result belongs can be determined from the intent of the parsing result.
For example, in a voice interaction scenario, a multi-round interaction may proceed as follows: the driver says "I want to make a call". The in-vehicle unit feeds back, according to the parsing result, the voice prompt "Whom would you like to call?". The driver answers "xiaowang". In this case, the candidate recognition results may include "Xiao Wang", "Laugh King", "Sales King", and the like. Illustratively, "Xiao Wang" may be a person's name, while "Laugh King" and "Sales King" may be brands of smart appliances (e.g., air conditioners); these examples are for illustration only and do not refer to real brands.
For example, "Xiao Wang" has the highest score among the candidate recognition results, so "Xiao Wang" is determined as the target recognition result.
Where the vertical class determined from the parsing result is calling, the address book is determined as the sample library, and a call target matching "Xiao Wang" is looked up in the address book.
Where the vertical class determined from the parsing result is smart home remote control, the brands or names of smart home devices may be determined as the sample library, and a control object matching "Xiao Wang" is looked up among them.
The confidence of the parsing result is determined according to the degree of matching between the parsing result and the content of the sample library corresponding to the vertical class. For example, when the vertical class is calling, the target recognition result "Xiao Wang" can be successfully matched with a name in the address book, so its confidence under the calling vertical can be high.
For another example, when the vertical class is smart home remote control, "Xiao Wang" cannot be matched in the smart home product catalog, so its confidence under the smart home remote control vertical is low.
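Steps S1041–S1043 amount to a lookup against per-vertical sample libraries. The matching measure is not specified in the application; the sketch below uses difflib's similarity ratio purely as a stand-in, and the library contents are illustrative.

```python
import difflib

def confidence_from_sample_library(parsed_slot, vertical, libraries):
    """S1041-S1043: pick the vertical's sample library and score the parsed
    slot by its best fuzzy match (difflib is a stand-in measure)."""
    library = libraries.get(vertical, [])
    if not library:
        return 0.0  # no sample library for this vertical: nothing matches
    return max(difflib.SequenceMatcher(None, parsed_slot, entry).ratio()
               for entry in library)
```

An exact match (e.g., a name found verbatim in the address book) yields 1.0, while a slot absent from the vertical's library scores low, mirroring the two examples above.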
In addition, the confidence of the parsing result can be determined according to the manual intervention of the user (driver).
The in-vehicle unit finds "Xiao Wang" in the address book and dials Xiao Wang's number; that is, the in-vehicle unit obtains the parsing result with high confidence. Meanwhile, if the driver does not intervene in the command corresponding to the parsing result, it can be assumed by default that the driver (user) accepts the parsing result, and the confidence determined from the user's feedback can likewise be considered high.
Therefore, where the vertical class is calling, "Xiao Wang" is the highest-scoring candidate recognition result for the pronunciation "xiaowang".
A second multi-round interaction may proceed as follows: the driver says "turn on the air conditioner". The in-vehicle unit feeds back the voice prompt "Which air conditioner should be turned on?". The driver answers "xiaowang". Here "xiaowang" may be the pronunciation of an air-conditioner brand, illustratively "Laugh King" or "Sales King", whose pronunciation is similar to that of "Xiao Wang". If "Xiao Wang" is determined as the target recognition result, a correct instruction cannot be obtained, because "Xiao Wang" cannot be matched against the smart home product names ("Laugh King" or "Sales King"). The driver may then intervene manually, for example correcting the parsing result from "Xiao Wang" to "Laugh King" or "Sales King".
Based on this, in subsequent speech recognition, when the driver's intent is recognized as making a call, the confidence of "Xiao Wang" is set highest; when the driver's intent is recognized as controlling the air conditioner, the confidence of "Laugh King" (or "Sales King") is set highest.
In this manner, the confidence of the parsing result can be determined by combining multiple dimensions, such as the sample library corresponding to the parsing result's vertical class and/or the user's feedback on the parsing result. Different candidate recognition results can thus be scored appropriately, so that the target recognition result is hit more accurately when selected.
Referring to fig. 3, in one embodiment, where the confidence of the parsing result is determined in multiple ways, the method of the present application may further include the following steps:
S301: assigning a weight to each way;
S302: obtaining the confidence of the parsing result according to the weight of each way and the confidence determined by each way.
As in the foregoing examples, the confidence of the parsing result may be determined from the executability of the parsing result, and may also be determined from the user's (driver's) feedback on the parsing result.
Different ways may be assigned different weights. For example, a first weight may be assigned to the user's feedback on the parsing result, and a second weight to the executability of the parsing result, where the first weight may be greater than the second weight.
Further, the confidence of the parsing result is obtained according to the weight of each way and the confidence determined by each way. For example, where the target recognition result is executable, the executability of the parsing result is confirmed to be high, so the first confidence, determined from executability, is high.
Where the user does not intervene in the parsing result, the parsing result is taken as approved by the user, so the second confidence, determined from the user's feedback, is high. Conversely, where the user intervenes, the second confidence is low. The second confidence, being based on user feedback, carries the greater weight. The final confidence of the parsing result is obtained by combining the first confidence with its weight and the second confidence with its weight.
In this manner, the confidence can be determined through weight assignment, yielding an objective and accurate confidence.
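Steps S301–S302 amount to a weighted average of the per-way confidences. A minimal sketch, with illustrative way names ("user_feedback", "executability") and weights not taken from the application:

```python
def combine_confidences(confidences, weights):
    """S301-S302: weighted average of the confidence from each way.
    Both arguments are dicts keyed by way name."""
    total = sum(weights[way] for way in confidences)
    return sum(weights[way] * confidences[way] for way in confidences) / total
```

Giving user feedback the larger weight, as suggested above, lets a manual correction pull the combined confidence down even when the instruction was technically executable.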
In one embodiment, step S102 may further include:
where the score of every candidate recognition result is below a threshold, randomly selecting one candidate recognition result as the target recognition result.
The threshold may be set in advance. Where the scores of all candidate recognition results are below the threshold, one of them may be selected at random as the target recognition result.
With this scheme, more candidate recognition results can participate in score evaluation through random selection, so that the target recognition result matching the intent of the speech information can be determined more quickly and comprehensively.
In one embodiment, the present application may further comprise the following step:
where there is a candidate recognition result that has not yet been scored, taking that candidate recognition result as the target recognition result.
Candidate recognition results without scores are typically newly appearing words, such as neologisms or user-defined words. When such words appear, their score evaluation can be prioritized, so that the target recognition result matching the intent of the speech information can be determined more quickly and comprehensively.
In one embodiment, as shown in fig. 4, the present application provides a method for optimizing speech recognition, which may include the following steps:
s401: and receiving voice information.
S402: and determining candidate identification results.
The candidate recognition results are obtained based on the speech information, and there may be a plurality of candidate recognition results.
S403: and judging whether the full score candidate identification result exists or not. If so, executing step S407; otherwise, if not present, step S404 is executed.
The scores of the candidate recognition results may include three types, a full score, a zero score, and an untritten score. Wherein the full score and the zero score may be determined based on historical data in speech recognition. An unrated score may be a newly emerging candidate recognition result.
S404: judging whether an unscored candidate recognition result exists or not, and executing the step S405 under the condition that the unscored candidate recognition result exists; otherwise, if not present, step S406 is executed.
S405: the candidate recognition results that are not scored are selected and the process continues to step 407.
S406: a candidate recognition result is randomly selected.
S407: and determining an analysis result.
And carrying out semantic analysis on the selected candidate recognition result to obtain an analysis result.
S408: and updating the score of the candidate recognition result according to the confidence degree of the analysis result.
The analysis result can be determined by whether the analysis result is executable or not, or can be determined by feedback of a user. According to the analysis result, the score of the candidate recognition result can be updated.
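The selection logic of steps S403–S406 (prefer a full-score candidate, then an unscored one, otherwise pick at random) can be sketched as follows; the encoding of a "full score" as 1.0 is an illustrative assumption.

```python
import random

FULL_SCORE = 1.0  # illustrative encoding of a "full score"

def select_candidate(candidates, scores, rng=random):
    """S403-S406: prefer a full-score candidate, then an unscored one,
    otherwise pick one at random."""
    for c in candidates:            # S403: is there a full-score candidate?
        if scores.get(c) == FULL_SCORE:
            return c
    for c in candidates:            # S404/S405: is there an unscored candidate?
        if c not in scores:
            return c
    return rng.choice(candidates)   # S406: random fallback
```

The chosen candidate then proceeds to parsing (S407) and the score update (S408).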
As shown in fig. 5, the present application provides an apparatus for optimizing speech recognition, which includes the following components:
a speech information recognition module 501, configured to recognize received speech information to obtain a plurality of candidate recognition results;
a target recognition result determining module 502, configured to determine a target recognition result among the candidate recognition results according to the score of each candidate recognition result, wherein the scores are predetermined;
a semantic parsing module 503, configured to perform semantic parsing on the target recognition result to obtain a parsing result;
and a score updating module 504, configured to determine the confidence of the parsing result and update the score of the target recognition result according to the confidence.
In one embodiment, the score update module 504 may further include the following components:
a first confidence determining submodule, configured to determine the vertical class corresponding to the parsing result, obtain the sample library corresponding to the vertical class, and determine the confidence according to the degree of matching between the parsing result and the content of the sample library; and/or
a second confidence determining submodule, configured to determine the confidence according to the user's feedback on the parsing result.
In one embodiment, where the confidence of the parsing result is determined in multiple ways, the speech recognition optimization apparatus may further include:
a weight assignment module, configured to assign a weight to each way;
and a confidence determining module, configured to obtain the confidence of the parsing result according to the weight of each way and the confidence determined by each way.
In an embodiment, the target recognition result determining module 502 may be specifically configured to: where the score of every candidate recognition result is below a threshold, randomly select one candidate recognition result as the target recognition result.
In one embodiment, the target recognition result determining module 502 may be further configured to: where there is a candidate recognition result that has not yet been scored, take that candidate recognition result as the target recognition result.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 610, a memory 620, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Also, multiple electronic devices may be connected, with each device providing a portion of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). One processor 610 is illustrated in fig. 6.
Memory 620 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of optimizing speech recognition provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the optimization method of speech recognition provided herein.
The memory 620, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the speech recognition optimization method in the embodiment of the present application (for example, the speech information recognition module 501, the target recognition result determining module 502, the semantic parsing module 503, and the score updating module 504 shown in fig. 5). The processor 610 executes various functional applications of the server and performs data processing by running the non-transitory software programs, instructions, and modules stored in the memory 620, that is, implements the speech recognition optimization method of the above method embodiments.
The memory 620 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device of the optimization method of voice recognition, and the like. Further, the memory 620 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 620 optionally includes memory located remotely from the processor 610, and these remote memories may be connected over a network to the electronic device of the method of optimizing speech recognition. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for optimizing speech recognition may further include: an input device 630 and an output device 640. The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or other means, such as the bus connection in fig. 6.
The input device 630 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the method of optimizing speech recognition, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, etc. The output device 640 may include a display device, an auxiliary lighting device (e.g., an LED), a haptic feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of traditional physical hosts and VPS services, namely high management difficulty and weak service scalability.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved; the present application is not limited in this respect.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method of optimizing speech recognition, comprising:
recognizing received voice information to obtain a plurality of candidate recognition results;
determining a target recognition result among the candidate recognition results according to a score of each candidate recognition result, wherein the score is predetermined;
performing semantic analysis on the target recognition result to obtain an analysis result; and
determining a confidence of the analysis result, and updating the score of the target recognition result according to the confidence.
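The four steps of claim 1 could be sketched in Python roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the score table, and the blending rule used for the update are all assumptions, since the claim does not specify how the score is revised.

```python
# Illustrative sketch of the claimed optimization loop: pick the
# highest-scoring candidate, and after semantic analysis, blend the
# parse confidence back into that candidate's stored score.

def choose_target(candidates, score_table):
    """Pick the candidate recognition result with the highest predetermined score."""
    return max(candidates, key=lambda c: score_table.get(c, 0.0))

def update_score(score_table, target, confidence, rate=0.5):
    """Blend the analysis confidence into the target's stored score (assumed rule)."""
    old = score_table.get(target, 0.0)
    score_table[target] = (1 - rate) * old + rate * confidence
    return score_table[target]

# Two hypothetical candidates for one utterance:
scores = {"play music": 0.6, "play muse": 0.4}
target = choose_target(["play music", "play muse"], scores)   # "play music"
# Suppose semantic analysis of the target parses cleanly (confidence 0.9):
new_score = update_score(scores, target, confidence=0.9)      # ≈ 0.75
```

After the update, the correctly parsed candidate carries a higher score, so it will be preferred even more strongly the next time the same candidates compete.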
2. The method of claim 1, wherein the determining the confidence of the analysis result comprises:
determining a vertical class corresponding to the analysis result, acquiring a sample library corresponding to the vertical class, and determining the confidence according to a matching degree between the analysis result and content in the sample library; and/or,
determining the confidence according to feedback from the user on the analysis result.
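One plausible reading of the sample-library branch of claim 2 is to measure the matching degree as the fraction of parsed slots whose values occur in the vertical-class sample library. The sketch below rests on that assumption; the slot names and library contents are hypothetical, and the patent does not prescribe this particular matching function.

```python
# Confidence from the vertical-class sample library (claim 2, first branch):
# the more of the parsed slot values that appear in the library, the more
# plausible the analysis result is taken to be.

def library_confidence(parsed_slots, sample_library):
    """Fraction of slot values found in the vertical-class sample library."""
    if not parsed_slots:
        return 0.0
    hits = sum(1 for value in parsed_slots.values() if value in sample_library)
    return hits / len(parsed_slots)

# Hypothetical "music" vertical-class samples and a parsed result:
music_library = {"Jay Chou", "Blue and White Porcelain"}
slots = {"artist": "Jay Chou", "song": "Nunchucks"}
conf = library_confidence(slots, music_library)  # 1 of 2 slots matched -> 0.5
```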
3. The method according to claim 1 or 2, wherein, in a case where the confidence of the analysis result is determined in at least two ways, the method further comprises:
assigning a weight to each of the ways; and
obtaining the confidence of the analysis result according to the weight of each way and the confidence determined in each way.
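The combination in claim 3 reads naturally as a weighted average over the per-way confidences. A short sketch under that assumption follows; the weight values and per-way confidences are illustrative only.

```python
# Combine confidences obtained in different ways (claim 3): each way
# contributes its confidence scaled by its assigned weight, and the
# result is normalized by the total weight.

def combine_confidences(weighted):
    """weighted: list of (weight, confidence) pairs."""
    total = sum(w for w, _ in weighted)
    return sum(w * c for w, c in weighted) / total

# e.g. sample-library matching weighted 0.7, user feedback weighted 0.3:
overall = combine_confidences([(0.7, 0.8), (0.3, 0.4)])  # ≈ 0.68
```

Normalizing by the total weight means the weights need not sum to one, which keeps the combination well defined however many ways are in use.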
4. The method according to claim 1, wherein the determining a target recognition result among the candidate recognition results according to the score of each candidate recognition result comprises:
in a case where the score of each candidate recognition result is lower than a threshold, randomly selecting one candidate recognition result as the target recognition result.
5. The method according to claim 1 or 4, further comprising:
in a case where there is a candidate recognition result for which no score has been obtained, taking that candidate recognition result as the target recognition result.
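Claims 4 and 5 refine how the target is chosen, and the two rules compose naturally with the base selection of claim 1. The combined sketch below is an interpretation, not the claimed implementation: the threshold value, the preference order among several unscored candidates, and the fallback to the maximum score are all assumptions.

```python
import random

def select_target(candidates, scores, threshold=0.5, rng=random):
    """Pick the target recognition result under the rules of claims 1, 4, and 5."""
    # Claim 5: a candidate with no score yet is taken as the target,
    # so that new hypotheses get a chance to earn a score.
    unscored = [c for c in candidates if c not in scores]
    if unscored:
        return unscored[0]
    # Claim 4: if every score falls below the threshold, pick one at random.
    if all(scores[c] < threshold for c in candidates):
        return rng.choice(candidates)
    # Otherwise, fall back to the highest-scoring candidate (claim 1).
    return max(candidates, key=lambda c: scores[c])

prefer_unscored = select_target(["res_a", "res_b"], {"res_a": 0.9})
highest = select_target(["res_a", "res_b"], {"res_a": 0.9, "res_b": 0.2})
low_pick = select_target(["res_a", "res_b"], {"res_a": 0.1, "res_b": 0.2})
```

Preferring an unscored candidate (claim 5) lets a fresh hypothesis be parsed and scored through the confidence feedback loop, rather than being permanently shadowed by candidates that already hold scores.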
6. An apparatus for optimizing speech recognition, comprising:
a voice information recognition module, configured to recognize received voice information to obtain a plurality of candidate recognition results;
a target recognition result determination module, configured to determine a target recognition result among the candidate recognition results according to a score of each candidate recognition result, wherein the score is predetermined;
a semantic analysis module, configured to perform semantic analysis on the target recognition result to obtain an analysis result; and
a score updating module, configured to determine a confidence of the analysis result and update the score of the target recognition result according to the confidence.
7. The apparatus of claim 6, wherein the score update module comprises:
a first confidence determining submodule, configured to determine a vertical class corresponding to the analysis result, acquire a sample library corresponding to the vertical class, and determine the confidence according to a matching degree between the analysis result and content in the sample library; and/or,
a second confidence determining submodule, configured to determine the confidence according to feedback from the user on the analysis result.
8. The apparatus according to claim 6 or 7, wherein, in a case where the confidence of the analysis result is determined in at least two ways, the apparatus further comprises:
a weight assigning module, configured to assign a weight to each of the ways; and
a confidence determining module, configured to obtain the confidence of the analysis result according to the weight of each way and the confidence determined in each way.
9. The apparatus of claim 6, wherein the target recognition result determination module is specifically configured to: in a case where the score of each candidate recognition result is lower than a threshold, randomly select one candidate recognition result as the target recognition result.
10. The apparatus according to claim 6 or 9, wherein the target recognition result determination module is further configured to:
in a case where there is a candidate recognition result for which no score has been obtained, take that candidate recognition result as the target recognition result.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 5.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 5.
CN202011026486.8A 2020-09-25 2020-09-25 Speech recognition optimization method, device, equipment and storage medium Pending CN112151022A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011026486.8A CN112151022A (en) 2020-09-25 2020-09-25 Speech recognition optimization method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN112151022A true CN112151022A (en) 2020-12-29

Family

ID=73897476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011026486.8A Pending CN112151022A (en) 2020-09-25 2020-09-25 Speech recognition optimization method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112151022A (en)


Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014077882A (en) * 2012-10-10 2014-05-01 Nippon Hoso Kyokai <Nhk> Speech recognition device, error correction model learning method and program
US20180330723A1 (en) * 2017-05-12 2018-11-15 Apple Inc. Low-latency intelligent automated assistant
CN109271498A (en) * 2018-09-14 2019-01-25 南京七奇智能科技有限公司 The natural language interaction method and system of Virtual robot
CN110111775A (en) * 2019-05-17 2019-08-09 腾讯科技(深圳)有限公司 A kind of Streaming voice recognition methods, device, equipment and storage medium
CN110148416A (en) * 2019-04-23 2019-08-20 腾讯科技(深圳)有限公司 Audio recognition method, device, equipment and storage medium
WO2019163242A1 (en) * 2018-02-20 2019-08-29 ソニー株式会社 Information processing device, information processing system, information processing method, and program
CN110288985A (en) * 2019-06-28 2019-09-27 北京猎户星空科技有限公司 Voice data processing method, device, electronic equipment and storage medium
CN110473531A (en) * 2019-09-05 2019-11-19 腾讯科技(深圳)有限公司 Audio recognition method, device, electronic equipment, system and storage medium
US20190385599A1 (en) * 2017-06-29 2019-12-19 Tencent Technology (Shenzhen) Company Limited Speech recognition method and apparatus, and storage medium
CN110637339A (en) * 2017-05-15 2019-12-31 苹果公司 Optimizing dialog policy decisions for a digital assistant using implicit feedback
CN110770819A (en) * 2017-06-15 2020-02-07 北京嘀嘀无限科技发展有限公司 Speech recognition system and method
TW202015034A (en) * 2018-10-12 2020-04-16 廣達電腦股份有限公司 Speech correction system and speech correction method


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113157877A (en) * 2021-03-19 2021-07-23 北京百度网讯科技有限公司 Multi-semantic recognition method, device, equipment and medium
CN113223500A (en) * 2021-04-12 2021-08-06 北京百度网讯科技有限公司 Speech recognition method, method for training speech recognition model and corresponding device
CN113223500B (en) * 2021-04-12 2022-02-25 北京百度网讯科技有限公司 Speech recognition method, method for training speech recognition model and corresponding device
CN113327591A (en) * 2021-06-16 2021-08-31 北京有竹居网络技术有限公司 Voice processing method and device
CN113327591B (en) * 2021-06-16 2023-01-17 北京有竹居网络技术有限公司 Voice processing method and device
CN113793604A (en) * 2021-09-14 2021-12-14 思必驰科技股份有限公司 Speech recognition system optimization method and device
CN113793604B (en) * 2021-09-14 2024-01-05 思必驰科技股份有限公司 Speech recognition system optimization method and device
CN115906874A (en) * 2023-03-08 2023-04-04 小米汽车科技有限公司 Semantic parsing method, system, electronic device and storage medium
CN115964498A (en) * 2023-03-08 2023-04-14 小米汽车科技有限公司 Vehicle-mounted semantic analysis model generation method and device, electronic equipment and storage medium
CN116610267A (en) * 2023-07-20 2023-08-18 联想凌拓科技有限公司 Storage management method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112151022A (en) Speech recognition optimization method, device, equipment and storage medium
US20210097861A1 (en) Parking lot recommendation method and navigation server
JP7271497B2 (en) Speech recognition method, device, equipment and readable storage medium
AU2017408797B2 (en) Data source-based service customizing device, method and system, and storage medium
CN111639168A (en) Multi-turn conversation processing method and device, electronic equipment and storage medium
EP3896690B1 (en) Voice interaction method and apparatus, device and computer storage medium
JP2021056991A (en) Recommendation method, device, electronic device, storage medium, and program
US10552426B2 (en) Adaptive conversational disambiguation system
CN109063116B (en) Data identification method and device, electronic equipment and computer readable storage medium
US20210104235A1 (en) Arbitration of Natural Language Understanding Applications
CN112509585A (en) Voice processing method, device and equipment of vehicle-mounted equipment and storage medium
CN110765342A (en) Information query method and device, storage medium and intelligent terminal
CN111090991A (en) Scene error correction method and device, electronic equipment and storage medium
CN112466289A (en) Voice instruction recognition method and device, voice equipment and storage medium
CN111666495A (en) Case recommendation method, device, equipment and storage medium
CN112509566A (en) Voice recognition method, device, equipment, storage medium and program product
CN112966081A (en) Method, device, equipment and storage medium for processing question and answer information
CN114417194A (en) Recommendation system sorting method, parameter prediction model training method and device
KR20210134251A (en) Method and device for updating user image recognition model
KR20200063886A (en) Natural language processing based call center support system and method
CN112597288A (en) Man-machine interaction method, device, equipment and storage medium
CN114141236B (en) Language model updating method and device, electronic equipment and storage medium
US20210389156A1 (en) Map rendering method and apparatus, device, and storage medium
CN113449754B (en) Label matching model training and displaying method, device, equipment and medium
CN112614479B (en) Training data processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211012

Address after: 100176 Room 101, 1st floor, building 1, yard 7, Ruihe West 2nd Road, economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Zhilian (Beijing) Technology Co.,Ltd.

Address before: 2 / F, baidu building, 10 Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.
