CN115662434A

CN115662434A - Vehicle voice recognition method and device and electronic equipment

Info

Publication number: CN115662434A
Application number: CN202211292757.3A
Authority: CN
Inventors: 葛凇志; 高洪伟; 吕贵林; 陈涛; 姜大力; 韩爽; 杨杰; 王烁皓
Original assignee: FAW Group Corp
Current assignee: FAW Group Corp
Priority date: 2022-10-21
Filing date: 2022-10-21
Publication date: 2023-01-31

Abstract

The invention discloses a vehicle voice recognition method and device and electronic equipment. The method comprises the following steps: acquiring a voice to be recognized; performing text recognition on the voice to be recognized to obtain an initial text recognition result corresponding to the voice to be recognized; determining, based on a pre-constructed vehicle question-answer database, a first text recognition result corresponding to the initial text recognition result and a first confidence of the first text recognition result; determining, based on a pre-constructed knowledge graph database, a second text recognition result corresponding to the initial text recognition result and a second confidence of the second text recognition result; and determining a final recognition result of the voice to be recognized according to the first text recognition result, the first confidence, the second text recognition result and the second confidence. The invention solves the technical problems in the related art of low recognition accuracy and poor user experience caused by performing voice question-answer recognition using only a vehicle question-answering service or only a knowledge graph.

Description

Vehicle voice recognition method and device and electronic equipment

Technical Field

The invention relates to the technical field of voice recognition, in particular to a vehicle voice recognition method and device and electronic equipment.

Background

At present, the voice question-answering service in intelligent connected vehicles is mainly either a vehicle question-answering service based on frequently asked questions (FAQ) or a vehicle question-answering service based on a knowledge graph. The former stores historical question-answer records in a database as question-answer pairs and provides generalized search when the user makes a request; however, because the number of question-answer pairs is limited, its precision is not high and its question-answering capability is limited. The latter stores the entities, attributes and relations related to vehicle question answering in a database in the form of a knowledge graph and provides accurate search when the user makes a request, but it performs poorly in cold-start scenarios and on more personalized questions. Both approaches have shortcomings, so the accuracy of voice question-answer recognition is low and the user experience is poor.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiment of the invention provides a vehicle voice recognition method, a vehicle voice recognition device and electronic equipment, which are used for at least solving the technical problems of low recognition accuracy and poor user experience caused by the fact that voice question and answer recognition is only carried out in a vehicle question and answer service or knowledge graph mode in the related technology.

According to an aspect of an embodiment of the present invention, there is provided a vehicle voice recognition method including: acquiring a voice to be recognized; performing text recognition on the voice to be recognized to obtain an initial text recognition result corresponding to the voice to be recognized; determining, based on a pre-constructed vehicle question-answer database, a first text recognition result corresponding to the initial text recognition result and a first confidence of the first text recognition result; determining, based on a pre-constructed knowledge graph database, a second text recognition result corresponding to the initial text recognition result and a second confidence of the second text recognition result; and determining a final recognition result of the voice to be recognized according to the first text recognition result, the first confidence, the second text recognition result and the second confidence.

Optionally, the determining a final recognition result of the voice to be recognized according to the first text recognition result, the first confidence, the second text recognition result and the second confidence includes: determining a confidence difference between the first text recognition result and the second text recognition result based on the first confidence and the second confidence; and determining the final recognition result according to the first text recognition result, the first confidence, the second text recognition result, the second confidence and the confidence difference.

Optionally, the determining the final recognition result according to the first text recognition result, the first confidence, the second text recognition result, the second confidence and the confidence difference includes: judging whether the first confidence or the second confidence is greater than a preset confidence extremum threshold, and judging whether the confidence difference is less than a preset confidence difference threshold; and if the first confidence or the second confidence is greater than the confidence extremum threshold and the confidence difference is less than the confidence difference threshold, taking the knowledge graph information corresponding to the second text recognition result in the knowledge graph database as the final recognition result.

Optionally, the method further includes: if the first confidence or the second confidence is greater than the confidence extremum threshold, the first confidence is greater than the second confidence, and the confidence difference is greater than or equal to the confidence difference threshold, taking the first question-answer information corresponding to the first text recognition result in the vehicle question-answer database as the final recognition result; and if the first confidence or the second confidence is greater than the confidence extremum threshold, the first confidence is less than or equal to the second confidence, and the confidence difference is greater than or equal to the confidence difference threshold, taking the knowledge graph information corresponding to the second text recognition result as the final recognition result.

Optionally, the determining, based on a pre-constructed vehicle question-answer database, a first text recognition result corresponding to the initial text recognition result and a first confidence of the first text recognition result includes: acquiring a plurality of groups of first question-answer information included in the vehicle question-answer database, wherein the plurality of groups of first question-answer information include a plurality of first question sentences and a plurality of corresponding first answer sentences; determining a first semantic similarity between each of the plurality of first question sentences and the initial text recognition result; determining a first number of candidate text recognition results from the plurality of first question sentences by using a coarse-ranking model based on the first semantic similarity corresponding to each of the plurality of first question sentences; and determining the first text recognition result from the first number of candidate text recognition results by using a fine-ranking model based on the first semantic similarity corresponding to the first number of candidate text recognition results, and determining the first confidence between the first text recognition result and the initial text recognition result.

Optionally, the determining, based on a pre-constructed knowledge graph database, a second text recognition result corresponding to the initial text recognition result and a second confidence level of the second text recognition result includes: acquiring a plurality of groups of second question-answer information included in the knowledge graph database, wherein the plurality of groups of second question-answer information include a plurality of second question sentences and a plurality of corresponding second answer sentences; respectively determining second semantic similarity between the plurality of second question sentences and the initial text recognition result; determining a third text recognition result and a target text type corresponding to the third text recognition result from the plurality of second question sentences based on the second semantic similarity, wherein the third text recognition result is the second question sentence with the largest second semantic similarity in the plurality of second question sentences; judging whether a target semantic understanding model corresponding to the target text type exists in a plurality of different pre-trained semantic understanding models, wherein the different semantic understanding models correspond to different text types; and if the target semantic understanding model exists in a plurality of different semantic understanding models, recognizing the third text recognition result by using the target semantic understanding model to obtain the second text recognition result, and determining the second confidence coefficient between the second text recognition result and the initial text recognition result.

Optionally, the method further includes: if the target semantic understanding model does not exist in the plurality of different semantic understanding models, respectively identifying the third text recognition result by adopting the plurality of different semantic understanding models to obtain a plurality of fourth text recognition results; respectively determining third semantic similarity between the plurality of fourth text recognition results and the initial text recognition result; and determining the second text recognition result from the plurality of fourth text recognition results based on the third semantic similarity, and determining the second confidence level between the second text recognition result and the initial text recognition result.

According to another aspect of the embodiments of the present invention, there is also provided a vehicle voice recognition apparatus including: a first acquisition module, configured to acquire the voice to be recognized; a first recognition module, configured to perform text recognition on the voice to be recognized to obtain an initial text recognition result corresponding to the voice to be recognized; a first determining module, configured to determine, based on a pre-constructed vehicle question-answer database, a first text recognition result corresponding to the initial text recognition result and a first confidence of the first text recognition result; a second determining module, configured to determine, based on a pre-constructed knowledge graph database, a second text recognition result corresponding to the initial text recognition result and a second confidence of the second text recognition result; and a third determining module, configured to determine a final recognition result of the voice to be recognized according to the first text recognition result, the first confidence, the second text recognition result and the second confidence.

According to another aspect of the embodiments of the present invention, there is also provided a non-volatile storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform any one of the above-described vehicle voice recognition methods.

According to another aspect of embodiments of the present invention, there is also provided an electronic device, including one or more processors and a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement any one of the vehicle voice recognition methods described above.

In the embodiment of the invention, the voice to be recognized is acquired; text recognition is performed on the voice to be recognized to obtain an initial text recognition result corresponding to the voice to be recognized; a first text recognition result corresponding to the initial text recognition result and a first confidence of the first text recognition result are determined based on a pre-constructed vehicle question-answer database; a second text recognition result corresponding to the initial text recognition result and a second confidence of the second text recognition result are determined based on a pre-constructed knowledge graph database; and the final recognition result of the voice to be recognized is determined according to the first text recognition result, the first confidence, the second text recognition result and the second confidence. This achieves the purpose of combining the vehicle question-answer database with the knowledge graph database and determining the final voice recognition result according to the confidence comparison, thereby achieving the technical effects of improving the accuracy of vehicle voice question-answer recognition and improving the user experience, and solving the technical problems in the related art of low recognition accuracy and poor user experience caused by performing voice question-answer recognition using only a vehicle question-answering service or only a knowledge graph.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flow chart of a vehicle speech recognition method according to an embodiment of the present invention;

FIG. 2 is a flow chart of an alternative vehicle speech recognition method according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a vehicle voice recognition device according to an embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In accordance with an embodiment of the present invention, there is provided an embodiment of a method for vehicle speech recognition, it being noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than presented herein.

FIG. 1 is a flowchart of a vehicle voice recognition method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step S102, acquiring the voice to be recognized.

Optionally, the speech to be recognized is a speech uttered by the user after the speech question-answer service of the vehicle is started, where the speech to be recognized corresponds to at least one piece of question information to be answered.

Step S104, performing text recognition on the voice to be recognized to obtain an initial text recognition result corresponding to the voice to be recognized.

Optionally, text recognition may be performed on the speech to be recognized by using, without limitation, a pre-trained language recognition model to obtain the initial text recognition result corresponding to the speech to be recognized; that is, the question posed by the user's voice is converted into text form.

Optionally, the initial text recognition result is distributed simultaneously to a first vehicle question-answering service based on the FAQ and a second vehicle question-answering service based on the knowledge graph. The vehicle question-answer database corresponding to the first service and the knowledge graph database corresponding to the second service are searched jointly, and a final recognition result with higher accuracy is output after a comprehensive comparison of the confidences of the two recognition results.
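The simultaneous distribution described above can be sketched as follows. This is only an illustration under stated assumptions: the functions faq_service and kg_service are hypothetical stand-ins that return a fixed (answer, confidence) pair, and the use of a thread pool is one possible way to query both services at once, not something the description prescribes.

```python
from concurrent.futures import ThreadPoolExecutor


def faq_service(text):
    # Hypothetical stand-in for the FAQ-based first question-answering service.
    return ("faq answer for: " + text, 0.62)


def kg_service(text):
    # Hypothetical stand-in for the knowledge-graph-based second service.
    return ("kg answer for: " + text, 0.58)


def dispatch(initial_text):
    """Send the same initial text to both services and collect both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        faq_future = pool.submit(faq_service, initial_text)
        kg_future = pool.submit(kg_service, initial_text)
        # Each result is an (answer, confidence) pair.
        return faq_future.result(), kg_future.result()
```

Both results are kept, since the later comparison step needs the two answers together with their confidences.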

Step S106, determining a first text recognition result corresponding to the initial text recognition result and a first confidence coefficient of the first text recognition result based on a pre-constructed vehicle question-answer database.

In an optional embodiment, the determining, based on a pre-constructed vehicle question-answer database, a first text recognition result corresponding to the initial text recognition result and a first confidence of the first text recognition result includes: acquiring a plurality of groups of first question-answer information included in the vehicle question-answer database, wherein the plurality of groups of first question-answer information include a plurality of first question sentences and a plurality of corresponding first answer sentences; determining a first semantic similarity between each of the plurality of first question sentences and the initial text recognition result; determining a first number of candidate text recognition results from the plurality of first question sentences by using a coarse-ranking model based on the first semantic similarity corresponding to each of the plurality of first question sentences; and determining the first text recognition result from the first number of candidate text recognition results by using a fine-ranking model based on the first semantic similarity corresponding to the first number of candidate text recognition results, and determining the first confidence between the first text recognition result and the initial text recognition result.

Optionally, the vehicle question-answer database is the question-answer database corresponding to the FAQ-based first vehicle question-answering service, and multiple groups of first question-answer information (i.e., multiple first question sentences and the corresponding first answer sentences) are stored in it in advance. A first semantic similarity between each of the first question sentences and the initial text recognition result is calculated. Coarse-ranking recall is then performed with a coarse-ranking model based on these first semantic similarities, yielding a first number of candidate text recognition results, namely those first question sentences whose semantic similarity with the initial text recognition result is greater than a preset similarity. Fine ranking is then performed on the first number of candidate text recognition results with a fine-ranking model: the candidate with the highest semantic similarity to the initial text recognition result is determined as the first text recognition result, and the first confidence between the first text recognition result and the initial text recognition result is calculated.
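A minimal sketch of this two-stage lookup is given below. It rests on several assumptions that are not part of the description: token-overlap (Jaccard) similarity stands in for the coarse-ranking model, the standard-library difflib.SequenceMatcher ratio stands in for the fine-ranking model, the fine score is reused as the first confidence, and the names faq_lookup, preset_similarity and first_number are hypothetical.

```python
from difflib import SequenceMatcher


def coarse_score(query, question):
    # Cheap token-overlap (Jaccard) score; a stand-in for the coarse-ranking model.
    q, c = set(query.lower().split()), set(question.lower().split())
    return len(q & c) / len(q | c) if q | c else 0.0


def fine_score(query, question):
    # Character-level similarity; a stand-in for the fine-ranking model.
    return SequenceMatcher(None, query.lower(), question.lower()).ratio()


def faq_lookup(query, faq_pairs, preset_similarity=0.2, first_number=3):
    """Return (first question, first answer, first confidence) for the query."""
    # Coarse recall: keep questions above the preset similarity, best first.
    scored = [(coarse_score(query, q), q, a) for q, a in faq_pairs]
    candidates = sorted(
        (s for s in scored if s[0] > preset_similarity),
        key=lambda s: s[0],
        reverse=True,
    )[:first_number]
    if not candidates:
        return None, None, 0.0
    # Fine ranking: pick the candidate most similar to the query.
    best_q, best_a = max(
        ((q, a) for _, q, a in candidates),
        key=lambda qa: fine_score(query, qa[0]),
    )
    # Assumption: the fine score doubles as the first confidence.
    return best_q, best_a, fine_score(query, best_q)
```

The coarse stage is deliberately cheap so that the more expensive fine stage only has to score a handful of candidates.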

Step S108, determining a second text recognition result corresponding to the initial text recognition result and a second confidence of the second text recognition result based on a pre-constructed knowledge graph database.

In an optional embodiment, the determining, based on a pre-constructed knowledge graph database, a second text recognition result corresponding to the initial text recognition result and a second confidence of the second text recognition result includes: acquiring a plurality of groups of second question-answer information included in the knowledge graph database, wherein the plurality of groups of second question-answer information include a plurality of second question sentences and a plurality of corresponding second answer sentences; respectively determining second semantic similarity between the plurality of second question sentences and the initial text recognition result; determining a third text recognition result and a target text type corresponding to the third text recognition result from the plurality of second question sentences based on the second semantic similarity, wherein the third text recognition result is the second question sentence with the largest second semantic similarity in the plurality of second question sentences; judging whether a pre-trained plurality of different semantic understanding models have a target semantic understanding model corresponding to the target text type, wherein the plurality of different semantic understanding models correspond to different text types; and if the target semantic understanding model exists in a plurality of different semantic understanding models, recognizing the third text recognition result by using the target semantic understanding model to obtain the second text recognition result, and determining the second confidence coefficient between the second text recognition result and the initial text recognition result.

Optionally, the knowledge graph database is the question-answer database corresponding to the second vehicle question-answering service based on the knowledge graph, and multiple groups of second question-answer information (including multiple second question sentences and the corresponding second answer sentences) are stored in it in advance. A second semantic similarity between each of the second question sentences and the initial text recognition result is calculated; a third text recognition result is determined from the second question sentences based on these second semantic similarities, and the target text type corresponding to the third text recognition result is determined. A target semantic understanding model corresponding to the target text type is then selected from the plurality of different semantic understanding models, the third text recognition result is recognized with the target semantic understanding model to obtain the second text recognition result, and the second confidence between the second text recognition result and the initial text recognition result is determined. In this way, the second question sentence semantically closest to the initial text recognition result (namely the third text recognition result) can be accurately screened from the knowledge graph database, and the matching semantic understanding model can be selected according to the text type of the third text recognition result, so that a more standard semantic analysis is obtained, the memory occupied by model processing is reduced, and model processing efficiency is improved.

Optionally, the plurality of different semantic understanding models may include, but are not limited to: a vehicle control semantic understanding model, a vehicle industry knowledge semantic understanding model, an entertainment application knowledge semantic understanding model, a chat semantic understanding model, and the like.

In an optional embodiment, the method further includes: if the target semantic understanding model does not exist in the plurality of different semantic understanding models, respectively recognizing the third text recognition result by using the plurality of different semantic understanding models to obtain a plurality of fourth text recognition results; respectively determining a third semantic similarity between each of the plurality of fourth text recognition results and the initial text recognition result; and determining the second text recognition result from the plurality of fourth text recognition results based on the third semantic similarity, and determining the second confidence between the second text recognition result and the initial text recognition result.

In this way, if no target semantic understanding model corresponding to the text type of the third text recognition result can be found among the plurality of different semantic understanding models, the third text recognition result is input into all of the semantic understanding models for semantic recognition, and the fourth text recognition results output by these models are obtained. The third semantic similarity between each fourth text recognition result and the initial text recognition result is determined, the fourth text recognition result with the largest third semantic similarity is taken as the second text recognition result, and the second confidence between the second text recognition result and the initial text recognition result is then determined. This preserves similarity to the initial text recognition result while also ensuring the accuracy of the semantic analysis.
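The model selection and its fallback can be sketched as below. The sketch makes several assumptions for illustration only: each semantic understanding model is represented by a plain callable from text to text, the models are held in a dictionary keyed by text type, the difflib.SequenceMatcher ratio stands in for semantic similarity and is also used as the second confidence, and the function name understand is hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in for semantic similarity between two texts.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def understand(initial_text, third_text, target_type, models):
    """Return (second text recognition result, second confidence).

    models maps a text type to a callable that takes a text and returns
    that model's recognition result (toy representation, an assumption).
    """
    model = models.get(target_type)
    if model is not None:
        # A model exists for the target text type: use only that model.
        second_text = model(third_text)
    else:
        # Fallback: run every model and keep the output closest to the
        # initial text recognition result.
        fourth_texts = [m(third_text) for m in models.values()]
        second_text = max(fourth_texts, key=lambda t: similarity(t, initial_text))
    return second_text, similarity(second_text, initial_text)
```

Running a single matched model is the cheap path; the fallback trades extra computation for robustness when the text type has no dedicated model.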

Step S110, determining a final recognition result of the speech to be recognized according to the first text recognition result and the first confidence level, and the second text recognition result and the second confidence level.

In an optional embodiment, the determining the final recognition result of the speech to be recognized according to the first text recognition result, the first confidence, the second text recognition result and the second confidence includes: determining a confidence difference between the first text recognition result and the second text recognition result based on the first confidence and the second confidence; and determining the final recognition result according to the first text recognition result, the first confidence, the second text recognition result, the second confidence and the confidence difference. That is, after the first text recognition result has been obtained from the vehicle question-answer database and the second text recognition result has been obtained from the knowledge graph database, the two results are compared further and the better one is used as the final recognition result, which improves the accuracy of the voice question-answer recognition result. Specifically, the final recognition result of the speech to be recognized can be determined by comparing the confidences of the first text recognition result and the second text recognition result with respect to the initial text recognition result, that is, by comparing the first confidence, the second confidence and the confidence difference.

Optionally, the final recognition result is output in voice form or in text form, and the user may customize the output form according to the application scenario.

In an optional embodiment, the determining the final recognition result according to the first text recognition result, the first confidence, the second text recognition result, the second confidence and the confidence difference includes: judging whether the first confidence or the second confidence is greater than a preset confidence extremum threshold, and judging whether the confidence difference is less than a preset confidence difference threshold; and if the first confidence or the second confidence is greater than the confidence extremum threshold and the confidence difference is less than the confidence difference threshold, taking the knowledge graph information corresponding to the second text recognition result in the knowledge graph database as the final recognition result.

Optionally, the condition that the first confidence or the second confidence is greater than the confidence extremum threshold can also be understood as the higher of the first confidence and the second confidence being greater than the confidence extremum threshold. That is, if the higher of the first confidence and the second confidence is greater than the confidence extremum threshold (e.g., 0.5) and the confidence difference between the first confidence and the second confidence is less than the confidence difference threshold (e.g., 0.2), the knowledge graph information corresponding to the second text recognition result in the knowledge graph database is taken as the final recognition result.

Optionally, if both the first confidence level and the second confidence level are smaller than the confidence level extremum threshold, no recognition result is returned.

In an optional embodiment, the method further includes: if the first confidence or the second confidence is greater than the confidence extremum threshold, the first confidence is greater than the second confidence, and the confidence difference is greater than or equal to the confidence difference threshold, taking the first question-answer information corresponding to the first text recognition result in the vehicle question-answer database as the final recognition result; and if the first confidence or the second confidence is greater than the confidence extremum threshold, the first confidence is less than or equal to the second confidence, and the confidence difference is greater than or equal to the confidence difference threshold, taking the knowledge graph information corresponding to the second text recognition result as the final recognition result.

Optionally, if the higher of the first confidence and the second confidence is greater than the confidence extreme value threshold (e.g., 0.5), and the confidence difference between the first confidence and the second confidence is greater than or equal to the confidence difference threshold (e.g., 0.2), the recognition result corresponding to the greater of the first confidence and the second confidence is used as the final recognition result. Specifically: if the first confidence is greater than the confidence extreme value threshold, the first confidence is greater than the second confidence, and the confidence difference between the first confidence and the second confidence is greater than or equal to the confidence difference threshold, the first question-answer information corresponding to the first text recognition result in the vehicle question-answer database is taken as the final recognition result; and if the second confidence is greater than the confidence extreme value threshold, the first confidence is less than or equal to the second confidence, and the confidence difference between the first confidence and the second confidence is greater than or equal to the confidence difference threshold, the knowledge graph information corresponding to the second text recognition result is taken as the final recognition result.
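The decision rule described above can be illustrated with a minimal Python sketch. The function name `decide`, its parameter names, and the use of `None` to stand for "no recognition result is returned" are assumptions made for illustration; the thresholds 0.5 and 0.2 are the example values given in the text. The text does not say how a confidence exactly equal to the extreme value threshold is handled, nor whether the confidence difference is signed, so the sketch assumes that equality returns no result and that the difference is an absolute value.

```python
from typing import Optional

EXTREME_THRESHOLD = 0.5  # example confidence extreme value threshold from the text
DIFF_THRESHOLD = 0.2     # example confidence difference threshold from the text


def decide(faq_answer: str, faq_conf: float,
           kg_answer: str, kg_conf: float,
           extreme: float = EXTREME_THRESHOLD,
           diff: float = DIFF_THRESHOLD) -> Optional[str]:
    """Return the final recognition result, or None when no result is returned."""
    # Neither source exceeds the extreme value threshold: return no result.
    # (Treating equality as "not exceeded" is an assumption.)
    if max(faq_conf, kg_conf) <= extreme:
        return None
    # The two confidences are close: use the knowledge graph information.
    # (Using the absolute difference is an assumption.)
    if abs(faq_conf - kg_conf) < diff:
        return kg_answer
    # Otherwise use the source with the higher confidence;
    # a first confidence less than or equal to the second selects the knowledge graph.
    return faq_answer if faq_conf > kg_conf else kg_answer
```

For example, `decide("faq", 0.9, "kg", 0.6)` selects the FAQ answer because the gap exceeds the difference threshold, whereas `decide("faq", 0.8, "kg", 0.7)` selects the knowledge graph answer because the two confidences are close.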

Through steps S102 to S110, the vehicle question-answer database and the knowledge graph database are integrated, and the final voice recognition result is determined according to the confidence comparison result. This improves the accuracy of vehicle voice question-answer recognition and the user experience, and solves the technical problems in the related art that recognition accuracy is low and user experience is poor when voice question-answer recognition relies only on a vehicle question-answer service or only on a knowledge graph.

Based on the foregoing embodiments and optional embodiments, the present invention provides an optional implementation. Fig. 2 is a flowchart of an optional vehicle speech recognition method according to an embodiment of the present invention. As shown in Fig. 2, the method includes:

Step S1: the voice to be recognized uttered by a user is collected through a voice application, wherein the voice to be recognized includes at least one piece of question information to be answered.

Step S2: a voice recognition service is started, and the voice to be recognized is recognized by a pre-trained voice recognition model to obtain an initial text recognition result corresponding to the voice to be recognized.

Step S3: a decision service is started, and the initial text recognition result is input simultaneously into the FAQ-based vehicle question-answering service (i.e., the first vehicle question-answering service) and the knowledge-graph-based vehicle question-answering service (i.e., the second vehicle question-answering service), wherein:

Step S31: for the first vehicle question-answering service, first semantic similarities between a plurality of first question sentences and the initial text recognition result are calculated based on a plurality of groups of first question-answer information (i.e., a plurality of first question sentences and a plurality of corresponding first answer sentences) pre-stored in the vehicle question-answer database; coarse-ranking recall is performed by a coarse-ranking model based on the first semantic similarities respectively corresponding to the first question sentences to obtain a first number (i.e., the top K) of candidate text recognition results, wherein the candidate text recognition results are those first question sentences whose semantic similarity with the initial text recognition result is greater than a preset similarity; a fine-ranking model is then used to perform fine ranking on the first number of candidate text recognition results, the candidate text recognition result with the highest semantic similarity to the initial text recognition result (i.e., ranked first, top 1) is determined as the first text recognition result, and a first confidence between the first text recognition result and the initial text recognition result is calculated.

Step S32: for the second vehicle question-answering service, second semantic similarities between a plurality of second question sentences and the initial text recognition result are calculated based on a plurality of groups of second question-answer information (including a plurality of second question sentences and a plurality of corresponding second answer sentences) pre-stored in the knowledge graph database; a third text recognition result is determined from the plurality of second question sentences based on the second semantic similarities respectively corresponding to the second question sentences, and a target text type corresponding to the third text recognition result is determined; a target semantic understanding model corresponding to the target text type is then determined from a plurality of different semantic understanding models (e.g., a vehicle control semantic understanding model, a vehicle industry knowledge semantic understanding model, a chat semantic understanding model, and the like), the third text recognition result is recognized by the target semantic understanding model to obtain the second text recognition result, and a second confidence between the second text recognition result and the initial text recognition result is determined.
If no target semantic understanding model corresponding to the text type of the third text recognition result can be determined among the different semantic understanding models, the third text recognition result is input into all of the different semantic understanding models simultaneously for semantic recognition to obtain a plurality of fourth text recognition results output by the different semantic understanding models; semantic similarity matching is performed between the fourth text recognition results and the initial text recognition result, third semantic similarities between the fourth text recognition results and the initial text recognition result are respectively determined, the fourth text recognition result with the maximum third semantic similarity is taken as the second text recognition result, and the second confidence between the second text recognition result and the initial text recognition result is determined.

Step S4: a decision and recognition service is started, and a confidence difference between the first text recognition result and the second text recognition result is determined based on the first confidence and the second confidence; the final recognition result is determined according to the first text recognition result, the first confidence, the second text recognition result, the second confidence and the confidence difference. Specifically: in the case where the higher of the first confidence and the second confidence is greater than the confidence extreme value threshold (e.g., 0.5) and the confidence difference between the first confidence and the second confidence is less than the confidence difference threshold (e.g., 0.2), the knowledge graph information corresponding to the second text recognition result in the knowledge graph database is taken as the final recognition result; in the case where the higher of the first confidence and the second confidence is greater than the confidence extreme value threshold (e.g., 0.5) and the confidence difference between the first confidence and the second confidence is greater than or equal to the confidence difference threshold (e.g., 0.2), the recognition result corresponding to the greater of the first confidence and the second confidence is taken as the final recognition result; and in the case where the first confidence and the second confidence are both less than the confidence extreme value threshold, no recognition result is returned.
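The two-stage recall of step S31 (coarse recall of the top K candidates above a preset similarity, followed by fine ranking to select the top 1) can be sketched as follows. This is only an illustrative sketch: the patent does not disclose the coarse-ranking or fine-ranking models, so a word-overlap (Jaccard) score and the standard-library `difflib.SequenceMatcher` ratio are used here as stand-ins, and the function names, `top_k`, and `min_similarity` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def coarse_score(query: str, question: str) -> float:
    """Cheap stand-in for the coarse-ranking model: Jaccard overlap of word sets."""
    a, b = set(query.lower().split()), set(question.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def fine_score(query: str, question: str) -> float:
    """Stand-in for the fine-ranking model: character-level similarity ratio."""
    return SequenceMatcher(None, query.lower(), question.lower()).ratio()


def faq_recall(query: str, faq: List[Tuple[str, str]],
               top_k: int = 3, min_similarity: float = 0.1
               ) -> Optional[Tuple[str, str, float]]:
    """Return (matched question, answer, confidence), or None if nothing is recalled."""
    # Coarse recall: keep up to top_k questions scoring above the preset similarity.
    scored = [(coarse_score(query, q), q, a) for q, a in faq]
    candidates = sorted((s for s in scored if s[0] > min_similarity),
                        key=lambda s: s[0], reverse=True)[:top_k]
    if not candidates:
        return None
    # Fine ranking: re-score the candidates and keep the top 1.
    conf, question, answer = max((fine_score(query, q), q, a)
                                 for _, q, a in candidates)
    return question, answer, conf
```

In this sketch the fine-ranking score of the selected question doubles as the first confidence; the text does not specify how the confidence is actually computed, so that choice is also an assumption.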

It should be noted that, in the embodiment of the present invention, different semantic understanding models are trained according to the voice request category, and corresponding knowledge graph databases are constructed. A voice scheduling service is placed in front of the different semantic understanding models; it preprocesses voice requests of different types and distributes each request to the corresponding semantic understanding model. If the type of a voice request cannot be distinguished, the request is distributed to all of the semantic understanding models, which compete against one another, and the result with the higher weight value is returned, which improves the accuracy and recall rate of semantic understanding. In addition, the FAQ-based vehicle question-answering service is combined with the knowledge-graph-based vehicle question-answering service: a question-answer decision service is constructed, and a user's question-answer request is distributed simultaneously to both services, so that the vehicle question-answering service has both generalized search capability and precise search capability, which improves the accuracy and recall rate of vehicle question answering.
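The scheduling behaviour described above (route a request to the model for its type, and fall back to running every model when the type cannot be determined) can be sketched as follows. The `Model` interface, which returns a result together with a score, and the function name `dispatch` are hypothetical; the text does not define how the weight value or similarity used for the comparison is produced.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical interface: a semantic understanding model maps text to (result, score).
Model = Callable[[str], Tuple[str, float]]


def dispatch(text: str, text_type: Optional[str],
             models: Dict[str, Model]) -> Tuple[str, float]:
    """Route text to the model for its type, or let all models compete."""
    if not models:
        raise ValueError("at least one semantic understanding model is required")
    model = models.get(text_type) if text_type is not None else None
    if model is not None:
        # A target model exists for this text type: use it directly.
        return model(text)
    # The type is unknown or has no model: run every model, keep the best score.
    return max((m(text) for m in models.values()), key=lambda r: r[1])
```

With two toy models registered under the keys `"vehicle_control"` and `"chat"`, a request typed `"vehicle_control"` goes only to that model, while a request with an unknown or missing type is answered by whichever model returns the higher score.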

It should be noted that for simplicity of description, the above-mentioned method embodiments are shown as a series of combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

In this embodiment, a vehicle voice recognition apparatus is further provided. The apparatus is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated. As used hereinafter, the terms "module" and "apparatus" may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

According to an embodiment of the present invention, there is also provided an apparatus embodiment for implementing the vehicle voice recognition method. Fig. 3 is a schematic structural diagram of a vehicle voice recognition apparatus according to an embodiment of the present invention. As shown in Fig. 3, the vehicle voice recognition apparatus includes: a first obtaining module 300, a first recognition module 302, a first determining module 304, a second determining module 306, and a third determining module 308, wherein:

the first obtaining module 300 is configured to obtain a voice to be recognized;

the first recognition module 302 is connected to the first obtaining module 300, and configured to perform text recognition on the speech to be recognized, so as to obtain an initial text recognition result corresponding to the speech to be recognized;

the first determining module 304 is connected to the first recognition module 302, and configured to determine, based on a pre-constructed vehicle question-answer database, a first text recognition result corresponding to the initial text recognition result and a first confidence of the first text recognition result;

the second determining module 306 is connected to the first determining module 304, and configured to determine, based on a pre-constructed knowledge graph database, a second text recognition result corresponding to the initial text recognition result and a second confidence of the second text recognition result;

the third determining module 308 is connected to the second determining module 306, and configured to determine a final recognition result of the speech to be recognized according to the first text recognition result and the first confidence level, and the second text recognition result and the second confidence level.

In the embodiment of the present invention, the first obtaining module 300, the first recognition module 302, the first determining module 304, the second determining module 306, and the third determining module 308 cooperate as described above, so that the final voice recognition result is determined by integrating the vehicle question-answer database and the knowledge graph database and comparing the confidences. This improves the accuracy of vehicle voice question-answer recognition and the user experience, and solves the technical problems in the related art that recognition accuracy is low and user experience is poor when voice question-answer recognition relies only on a vehicle question-answer service or only on a knowledge graph.

It should be noted that the above modules may be implemented by software or by hardware. For the latter, this may be done as follows: the modules are all located in the same processor; or the modules are located in different processors in any combination.

It should be noted here that the first obtaining module 300, the first identifying module 302, the first determining module 304, the second determining module 306, and the third determining module 308 correspond to steps S102 to S110 in the embodiments, and the modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in the embodiments. It should be noted that the modules described above may be implemented in a computer terminal as part of an apparatus.

It should be noted that, for alternative or preferred embodiments of the present embodiment, reference may be made to the relevant description in the embodiments, and details are not described herein again.

The vehicle voice recognition device may further include a processor and a memory, where the first obtaining module 300, the first recognition module 302, the first determination module 304, the second determination module 306, the third determination module 308, and the like are stored in the memory as program modules, and the processor executes the program modules stored in the memory to implement corresponding functions.
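As a rough illustration of program modules stored in memory and executed in sequence by a processor, the five modules can be modelled as callables wired together in a single class. This is a sketch under stated assumptions, not the apparatus itself: the class name `VehicleVoiceRecognizer`, the field names, and the `Scored` type are hypothetical, and each module body is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Scored = Tuple[str, float]  # (text recognition result, confidence); hypothetical type


@dataclass
class VehicleVoiceRecognizer:
    """Five program modules wired in the order of modules 300 to 308."""
    obtain: Callable[[], bytes]                                  # first obtaining module 300
    recognize: Callable[[bytes], str]                            # first recognition module 302
    determine_faq: Callable[[str], Scored]                       # first determining module 304
    determine_kg: Callable[[str], Scored]                        # second determining module 306
    determine_final: Callable[[Scored, Scored], Optional[str]]   # third determining module 308

    def run(self) -> Optional[str]:
        audio = self.obtain()              # acquire the voice to be recognized
        text = self.recognize(audio)       # initial text recognition result
        first = self.determine_faq(text)   # first result and first confidence
        second = self.determine_kg(text)   # second result and second confidence
        return self.determine_final(first, second)
```

Passing the modules in as callables keeps the sketch independent of any particular speech recognizer or database; a real implementation would supply its own components for each field.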

The processor includes a kernel, which calls the corresponding program modules from the memory; one or more kernels may be provided. The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or nonvolatile memory such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.

According to an embodiment of the present application, there is also provided an embodiment of a non-volatile storage medium. Optionally, in this embodiment, the nonvolatile storage medium includes a stored program, and when the program runs, the apparatus in which the nonvolatile storage medium is located is controlled to execute any one of the vehicle voice recognition methods.

Optionally, in this embodiment, the nonvolatile storage medium may be located in any one of a group of computer terminals in a computer network, or in any one of a group of mobile terminals, and the nonvolatile storage medium includes a stored program.

Optionally, when the program runs, the apparatus in which the non-volatile storage medium is located is controlled to perform the following functions: acquiring a voice to be recognized; performing text recognition on the voice to be recognized to obtain an initial text recognition result corresponding to the voice to be recognized; determining a first text recognition result corresponding to the initial text recognition result and a first confidence of the first text recognition result based on a pre-constructed vehicle question-answer database; determining a second text recognition result corresponding to the initial text recognition result and a second confidence of the second text recognition result based on a pre-constructed knowledge graph database; and determining a final recognition result of the voice to be recognized according to the first text recognition result, the first confidence, the second text recognition result and the second confidence.

According to the embodiment of the application, the embodiment of the processor is also provided. Optionally, in this embodiment, the processor is configured to run a program, where the program executes any one of the vehicle voice recognition methods.

According to an embodiment of the present application, there is also provided an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the steps of any one of the above vehicle speech recognition methods.

Optionally, the computer program product, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: acquiring a voice to be recognized; performing text recognition on the voice to be recognized to obtain an initial text recognition result corresponding to the voice to be recognized; determining a first text recognition result corresponding to the initial text recognition result and a first confidence of the first text recognition result based on a pre-constructed vehicle question-answer database; determining a second text recognition result corresponding to the initial text recognition result and a second confidence of the second text recognition result based on a pre-constructed knowledge graph database; and determining a final recognition result of the voice to be recognized according to the first text recognition result, the first confidence, the second text recognition result and the second confidence.

An embodiment of the present invention provides an electronic device including a processor, a memory, and a program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the program: acquiring a voice to be recognized; performing text recognition on the voice to be recognized to obtain an initial text recognition result corresponding to the voice to be recognized; determining a first text recognition result corresponding to the initial text recognition result and a first confidence of the first text recognition result based on a pre-constructed vehicle question-answer database; determining a second text recognition result corresponding to the initial text recognition result and a second confidence of the second text recognition result based on a pre-constructed knowledge graph database; and determining a final recognition result of the voice to be recognized according to the first text recognition result, the first confidence, the second text recognition result and the second confidence.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into modules is only a division by logical function, and other divisions are possible in actual implementation; for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces or modules, and may be electrical or in another form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.

The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may be stored in a computer-readable non-volatile storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a non-volatile storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned non-volatile storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A vehicle speech recognition method, comprising:

acquiring a voice to be recognized;

performing text recognition on the voice to be recognized to obtain an initial text recognition result corresponding to the voice to be recognized;

determining a first text recognition result corresponding to the initial text recognition result and a first confidence coefficient of the first text recognition result based on a pre-constructed vehicle question-answer database;

determining a second text recognition result corresponding to the initial text recognition result and a second confidence coefficient of the second text recognition result based on a pre-constructed knowledge graph database;

and determining a final recognition result of the voice to be recognized according to the first text recognition result, the first confidence coefficient, the second text recognition result and the second confidence coefficient.

2. The method according to claim 1, wherein the determining a final recognition result of the speech to be recognized according to the first confidence level and the second confidence level comprises:

determining a confidence difference between the first text recognition result and the second text recognition result based on the first confidence and the second confidence;

and determining the final recognition result according to the first text recognition result, the first confidence coefficient, the second text recognition result, the second confidence coefficient and the confidence coefficient difference.

3. The method of claim 2, wherein determining the final recognition result based on the first text recognition result and the first confidence level, the second text recognition result and the second confidence level, and the confidence difference comprises:

judging whether the first confidence coefficient or the second confidence coefficient is larger than a preset confidence coefficient extreme value threshold, and judging whether the confidence coefficient difference is smaller than a preset confidence coefficient difference threshold;

and if the first confidence or the second confidence is greater than the confidence extreme value threshold and the confidence difference is less than the confidence difference threshold, taking the knowledge graph information corresponding to the second text recognition result in the knowledge graph database as the final recognition result.

4. The method of claim 3, further comprising:

if the first confidence degree or the second confidence degree is greater than the confidence degree extreme value threshold, the first confidence degree is greater than the second confidence degree, and the confidence degree difference value is greater than or equal to the confidence degree difference value threshold, taking first question-answer information corresponding to the first text recognition result in the vehicle question-answer database as the final recognition result;

and if the first confidence coefficient or the second confidence coefficient is greater than the confidence coefficient extreme value threshold, the first confidence coefficient is less than or equal to the second confidence coefficient, and the confidence coefficient difference is greater than or equal to the confidence coefficient difference threshold, taking the knowledge-graph information corresponding to the second text recognition result as the final recognition result.

5. The method of claim 1, wherein determining a first text recognition result corresponding to the initial text recognition result and a first confidence of the first text recognition result based on a pre-constructed vehicle question and answer database comprises:

acquiring a plurality of groups of first question-answer information included in the vehicle question-answer database, wherein the plurality of groups of first question-answer information include a plurality of first question sentences and a plurality of corresponding first answer sentences;

determining a first semantic similarity between the plurality of first question sentences and the initial text recognition result;

determining a first number of candidate text recognition results from the plurality of first question sentences by adopting a coarse-ranking model based on the first semantic similarity corresponding to the plurality of first question sentences respectively;

determining the first text recognition result from the first number of candidate text recognition results by using a fine-ranking model based on the first semantic similarity corresponding to the first number of candidate text recognition results, and determining the first confidence between the first text recognition result and the initial text recognition result.

6. The method of claim 1, wherein determining a second text recognition result corresponding to the initial text recognition result and a second confidence level of the second text recognition result based on a pre-constructed knowledge graph database comprises:

acquiring a plurality of groups of second question-answer information included in the knowledge graph database, wherein the plurality of groups of second question-answer information include a plurality of second question sentences and a plurality of corresponding second answer sentences;

respectively determining second semantic similarity between the plurality of second question sentences and the initial text recognition result;

determining a third text recognition result and a target text type corresponding to the third text recognition result from the second question sentences based on the second semantic similarity, wherein the third text recognition result is the second question sentence with the maximum second semantic similarity in the second question sentences;

judging whether a pre-trained target semantic understanding model corresponding to the target text type exists in a plurality of different semantic understanding models, wherein the different semantic understanding models correspond to different text types;

if the target semantic understanding model exists in the plurality of different semantic understanding models, the target semantic understanding model is adopted to identify the third text recognition result, the second text recognition result is obtained, and the second confidence degree between the second text recognition result and the initial text recognition result is determined.

7. The method of claim 6, further comprising:

if the target semantic understanding model does not exist in the different semantic understanding models, respectively identifying the third text recognition result by adopting the different semantic understanding models to obtain a plurality of fourth text recognition results;

respectively determining a third semantic similarity between the plurality of fourth text recognition results and the initial text recognition result;

determining the second text recognition result from the plurality of fourth text recognition results based on the third semantic similarity, and determining the second confidence between the second text recognition result and the initial text recognition result.

8. A vehicle voice recognition apparatus, characterized by comprising:

the first acquisition module is used for acquiring the voice to be recognized;

the first recognition module is used for performing text recognition on the voice to be recognized to obtain an initial text recognition result corresponding to the voice to be recognized;

the first determining module is used for determining a first text recognition result corresponding to the initial text recognition result and a first confidence coefficient of the first text recognition result based on a pre-constructed vehicle question-answer database;

the second determining module is used for determining a second text recognition result corresponding to the initial text recognition result and a second confidence coefficient of the second text recognition result based on a pre-constructed knowledge graph database;

and the third determining module is used for determining a final recognition result of the voice to be recognized according to the first text recognition result, the first confidence coefficient, the second text recognition result and the second confidence coefficient.

9. A non-volatile storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the vehicle speech recognition method of any of claims 1 to 7.

10. An electronic device comprising one or more processors and memory storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the vehicle speech recognition method of any of claims 1-7.