CN115410553A

CN115410553A - Vehicle voice optimization method and device, electronic equipment and storage medium

Info

Publication number: CN115410553A
Application number: CN202211020241.3A
Authority: CN
Inventors: 魏东东; 李阳; 张奇磊; 张轮; 刘文焱; 马东旺; 张�杰
Original assignee: Great Wall Motor Co Ltd
Current assignee: Great Wall Motor Co Ltd
Priority date: 2022-08-24
Filing date: 2022-08-24
Publication date: 2022-11-29

Abstract

The invention provides a vehicle voice optimization method and device, an electronic device and a storage medium. The method comprises: intercepting voice performance data generated when a vehicle-mounted terminal responds to a user's voice at the current moment; determining, from the duration of each processing stage in the terminal's response to the user voice, the processing stage whose voice performance is substandard; and optimizing, according to the user voice, the voice processing result of that processing stage so that its voice performance meets the standard. By first intercepting the voice performance data, then determining from it the stage that needs optimization and the time point at which the problem occurred, and then optimizing in combination with the user voice, the method targets optimization at the stages that actually fall short.

Description

Vehicle voice optimization method and device, electronic equipment and storage medium

Technical Field

The invention relates to the field of voice recognition, in particular to a vehicle voice optimization method and device, electronic equipment and a storage medium.

Background

Traditional cars are developing toward intelligence. At present, in-vehicle speech recognition mostly has to be bound to a fixed third-party software provider, and its speech processing depends entirely on that provider. This has several shortcomings: the speech recognition performance of third-party providers varies; a provider's solution cannot be fully matched to the manufacturer's own vehicles or tailored to them; and, because no good scheme for troubleshooting speech errors currently exists, third-party providers cannot investigate speech errors.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a vehicle voice optimization method and device, an electronic device and a storage medium, aiming to solve at least one of the following problems: the speech recognition performance of third-party software providers varies; third-party solutions cannot be fully matched to a vehicle or tailored to it; and, since no good voice-error troubleshooting scheme currently exists, third-party providers cannot investigate voice errors.

In order to solve the technical problems, the invention provides the following technical scheme:

an embodiment of a first aspect of the present application provides a vehicle voice optimization method, which is executed by a vehicle voice optimization device, and includes:

intercepting voice performance data generated by the vehicle-mounted terminal responding to the user voice at the current moment; the voice performance data is the duration of each processing stage in the process that the vehicle-mounted terminal responds to the user voice;

determining, according to the duration of each processing stage in the process of the vehicle-mounted terminal responding to the user voice, a processing stage whose voice performance is substandard;

and optimizing the voice processing result of the processing stage with the voice performance not reaching the standard according to the voice of the user so as to enable the voice performance of the processing stage to reach the standard.

In an optional embodiment, each processing stage corresponds to a preset duration, and the determining, according to the duration of each processing stage in the process of responding to the user voice by the vehicle-mounted terminal, a processing stage with a voice performance not meeting the standard includes:

and for each processing stage, if the time length corresponding to the processing stage is greater than the preset time length corresponding to the processing stage, determining that the voice performance of the processing stage does not reach the standard.

In an alternative embodiment, the processing stage comprises a speech recognition stage;

the optimizing, according to the user voice, the voice processing result of the processing stage whose voice performance is substandard comprises the following steps:

converting user voice into question sentences, and segmenting the question sentences to obtain a plurality of participles;

performing feature processing on each word segmentation to obtain a feature vector of each word segmentation;

and inputting all the feature vectors into a preset semantic recognition model, and outputting semantic information of the question sentences by the semantic recognition model to obtain the voice processing result.

In an optional embodiment, after outputting the semantic information of the question sentence, the vehicle voice optimization method further includes:

determining user requirements according to semantic information corresponding to the user voice;

according to user requirements, searching a third-party information source corresponding to the user requirements from a corresponding relation table of the user requirements and the third-party information source;

and calling the service of the third-party information source to respond to the user voice.

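In an alternative embodiment, the processing stage comprises a result output stage, and the optimizing, according to the user voice, the voice processing result of the processing stage whose voice performance is substandard comprises: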

selecting an information service channel according to the user voice; each information service channel corresponds to a public cloud server, and the public cloud server is provided with voice search service;

and sending a service calling instruction to the public cloud server, calling the selected voice search service, generating a calling result by combining the user voice, and taking the calling result as the voice processing result.

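In an alternative embodiment, the processing stage comprises a result output stage, the vehicle voice optimization device comprises a network interface through which a plurality of information service channels are accessed, and the optimizing, according to the user voice, the voice processing result of the processing stage whose voice performance is substandard comprises: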

inputting the user voice to each public cloud server through each information service channel, wherein each public cloud server generates a candidate voice processing result based on the user voice; each information service channel corresponds to a public cloud server;

and selecting one of a plurality of candidate voice processing results based on the semantic keyword of the voice of the user to obtain the voice processing result.

An embodiment of a second aspect of the present application provides a vehicle voice optimization device, including:

the intercepting module intercepts voice performance data generated by the vehicle-mounted terminal responding to the user voice at the current moment; the voice performance data is the duration of each processing stage in the process that the vehicle-mounted terminal responds to the user voice;

the determining module is used for determining the processing stage with the voice performance not reaching the standard according to the duration of each processing stage in the process that the vehicle-mounted terminal responds to the voice of the user;

and the optimization module optimizes the voice processing result of the processing stage with the voice performance not reaching the standard according to the voice of the user so as to enable the voice performance of the processing stage to reach the standard.

An embodiment of a third aspect of the present application provides a vehicle voice interaction system, comprising a vehicle-mounted terminal 1, a vehicle voice optimization device 2 and a plurality of cloud servers 3;

the vehicle-mounted terminal is used for picking up user voice and sending a corresponding service calling instruction to at least one of the cloud servers according to the user voice after recognizing the user voice; the cloud servers receive a service calling instruction of the vehicle-mounted terminal, call corresponding services, and send service calling results to the vehicle-mounted terminal so that the vehicle-mounted terminal can respond to the user voice;

the vehicle voice optimization device is used for intercepting voice performance data generated by the vehicle-mounted terminal responding to the user voice at the current moment; determining, according to the duration of each processing stage in the process of the vehicle-mounted terminal responding to the user voice, the processing stage whose voice performance is substandard; and then optimizing, according to the user voice, the voice processing result of that processing stage; wherein the voice performance data is the duration of each processing stage in the process of the vehicle-mounted terminal responding to the user voice.

In yet another aspect of the present invention, an electronic device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the vehicle voice optimization method.

In yet another aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the vehicle voice optimization method.

As can be seen from the above technical solution, in the vehicle voice optimization method and device, electronic device and storage medium provided by the invention, the voice performance data generated when the vehicle-mounted terminal responds to the user voice at the current moment is first intercepted, the stage that needs optimization is then determined from that data, and optimization is then performed in combination with the user voice.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of a vehicle voice optimization method according to an embodiment of the present invention.

Fig. 2 is a schematic flowchart of step S3 in fig. 1 according to an embodiment of the present invention.

Fig. 3a is a schematic view of an application scenario in the embodiment of the present invention.

Fig. 3b is a schematic diagram of a network architecture according to an embodiment of the present invention.

Fig. 3c is a schematic diagram of standard-compliant and substandard processing stages in an embodiment of the present invention.

Fig. 4 is a schematic structural diagram of a vehicle voice optimization device in the embodiment of the present invention.

Fig. 5 is a schematic structural diagram of an electronic device in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.

At present, in-vehicle speech recognition mostly has to be bound to a fixed third-party software provider and depends entirely on that provider for speech processing. On the one hand, the speech recognition performance of third-party providers varies; on the other hand, their recognition cannot be fully matched to the vehicle or tailored to it.

The core concept of the method is as follows: the voice performance data generated when the vehicle-mounted terminal responds to the user voice at the current moment is intercepted in real time or at set time intervals, and the duration of each processing stage is extracted from it. When a duration exceeds its set value, it is judged that speech recognition has encountered a problem, and the response can then be regenerated in combination with the user voice; that is, a better answer is regenerated and stored so that it can be used directly the next time a similar utterance occurs.

Fig. 3a shows an application scenario of the embodiment of the present application. As shown in fig. 3a, the interaction system of the present application specifically includes a vehicle-mounted terminal 1, a vehicle voice optimization device 2 and a plurality of cloud servers 3.

The vehicle-mounted terminal is specifically used for picking up user voice and sending a corresponding service calling instruction to at least one of the cloud servers according to the user voice after recognizing the user voice.

The cloud servers receive a service calling instruction of the vehicle-mounted terminal, call corresponding services, and send service calling results to the vehicle-mounted terminal so that the vehicle-mounted terminal can respond to the user voice;

the vehicle voice optimization device is used for intercepting voice performance data generated by the vehicle-mounted terminal responding to the voice of the user at the current moment; determining the processing stage with the voice performance not reaching the standard according to the duration of each processing stage in the process of responding the voice of the user by the vehicle-mounted terminal; then, optimizing the voice processing result of the processing stage with the voice performance not reaching the standard according to the voice of the user; and the voice performance data is the duration of each processing stage in the process that the vehicle-mounted terminal responds to the voice of the user.

Specifically, in one embodiment the vehicle voice optimization device 2 acts as a relay: as shown in fig. 1, it forwards the cloud services called from each public cloud server to the vehicle-mounted terminal. Further, as can be seen from fig. 3b, the vehicle voice optimization device of the present application may also provide functions such as device authentication and load balancing; see the detailed description in the subsequent embodiments.

The method performed by the vehicle voice optimization device of the present application is described in detail below. The vehicle voice optimization device may be a cloud server, or a processor or controller on the vehicle side.

As shown in fig. 1, an embodiment of an aspect of the present invention provides a vehicle voice optimization method, including:

S1: intercepting the voice performance data generated by the vehicle-mounted terminal responding to the user voice at the current moment, the voice performance data being the duration of each processing stage in the process of the vehicle-mounted terminal responding to the user voice;

S2: determining the processing stage with the voice performance not reaching the standard according to the duration of each processing stage in the process of responding the voice of the user by the vehicle-mounted terminal;

S3: optimizing, according to the user voice, the voice processing result of the processing stage whose voice performance is substandard, so that the voice performance of that processing stage meets the standard.

According to this vehicle voice optimization method, the voice performance data generated when the vehicle-mounted terminal responds to the user voice at the current moment is first intercepted; the stage that needs optimization and the time point at which the problem occurred are then determined from the voice performance data; and optimization is then performed in combination with the user voice.

In the embodiment of the present application, the vehicle-mounted terminal may be a head unit (i.e., the in-vehicle system) mounted on the vehicle, or a mobile terminal connected to the vehicle, such as a mobile phone or a laptop, which is not limited in this application.

The mobile terminal can be connected to the vehicle via short-range wireless communication such as Bluetooth, or via a wired connection, which is not limited in this application.

For example, when the mobile terminal is connected to the vehicle via Bluetooth, it can also lock and unlock the vehicle doors, pre-cool or pre-heat the cabin through the vehicle's air conditioner, and so on.

In the embodiment of the present application, the user voice refers to an instructional utterance by the driver or a passenger, such as "turn on the air conditioner" or "what's the weather like today".

The user voice of the present application generally includes a wake-up name, for example, "xxx, turn on the air conditioner", where xxx is the wake-up name. This reliably prevents functions from being triggered by mistake when occupants talk to each other (for example, when the driver talks with a passenger), without requiring a separate prediction procedure.
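As an illustration of the wake-up-name gate, the following is a minimal sketch that assumes the utterance has already been transcribed to text. The wake word `"xiao xx"` and the function name `extract_instruction` are hypothetical stand-ins for the unnamed "xxx"; the patent does not specify the matching rule.

```python
from typing import Optional

# Hypothetical wake word standing in for the unnamed "xxx" in the text.
WAKE_WORD = "xiao xx"


def extract_instruction(transcript: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the instruction that follows the wake word, or None when the
    utterance does not begin with it (e.g. driver and passenger chatting)."""
    text = transcript.strip().lower()
    if not text.startswith(wake_word):
        return None
    # Drop the wake word plus any separating spaces or punctuation
    # (both ASCII and full-width Chinese commas/colons).
    rest = text[len(wake_word):].lstrip(" ,，:：")
    return rest or None
```

Because only utterances that begin with the wake word are passed on, ordinary cabin conversation never reaches the instruction pipeline; the cost is that the user must always say the wake word.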

Furthermore, once the system is mature, a prediction mode can be used instead of the wake-up-name mode: for example, the semantics of an utterance are recognized, and the semantics are then used to determine whether it is a conversation between occupants or an instruction from the user.

For example, the number of occupants in the cabin can also serve as an influence parameter for this determination: when only the driver is present, an utterance by the driver can be judged to be an instruction, provided that phone calls and singing are excluded.

In this embodiment, the analysis may combine the weight of each influence parameter: a weight or empirical coefficient is assigned to each of the utterance duration, the number of occupants in the cabin, the current phone call status, and the time interval between utterances, and whether the utterance is an instruction is then determined from the weighted result.
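A weighted decision of this kind can be sketched as follows. The four inputs follow the paragraph above; the weights, the 5 s and 3 s cut-offs, the 0.6 threshold and all names are hypothetical placeholders for the empirical coefficients the text leaves unspecified, and the singing exclusion is not modeled.

```python
from dataclasses import dataclass


@dataclass
class CabinContext:
    utterance_seconds: float  # duration of the utterance
    occupants: int            # number of people detected in the cabin
    on_phone_call: bool       # whether a phone call is in progress
    gap_seconds: float        # time since the previous utterance in the cabin


# Hypothetical empirical coefficients and decision threshold.
WEIGHTS = {"short_utterance": 0.3, "alone": 0.4, "no_call": 0.2, "long_gap": 0.1}
THRESHOLD = 0.6


def instruction_score(ctx: CabinContext) -> float:
    """Weighted sum of binary cues that favour 'this is an instruction'."""
    score = 0.0
    if ctx.utterance_seconds <= 5.0:   # commands are assumed to be short
        score += WEIGHTS["short_utterance"]
    if ctx.occupants == 1:             # a lone driver is not chatting with anyone
        score += WEIGHTS["alone"]
    if not ctx.on_phone_call:          # speech during a call is addressed to the caller
        score += WEIGHTS["no_call"]
    if ctx.gap_seconds >= 3.0:         # an isolated utterance, not part of a dialogue
        score += WEIGHTS["long_gap"]
    return score


def is_instruction(ctx: CabinContext) -> bool:
    return instruction_score(ctx) >= THRESHOLD
```

A lone driver who briefly speaks outside a call scores the maximum of 1.0, while a long utterance during a call with several occupants scores 0.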

It can be understood that, during speech recognition, whether the vehicle-mounted terminal responds itself or through a third-party software provider, the process generally includes a speech recognition stage, which recognizes the speech content, and a result output stage, which provides the response content according to the recognized speech content.

By intercepting the voice performance data, the inventors obtain the duration of each processing stage in the process of the vehicle-mounted terminal responding to the user voice, and then determine from these durations whether a processing stage has failed.

It can be understood that the speech processing result is the recognition result or response result produced from the user voice: in the speech recognition stage, the result is the semantic content; in the result output stage, it is the speech response.

The optimization in the present application may consist of modifying the corresponding program code, or of pre-storing a response result, establishing a correspondence between it and the user voice, and retrieving that response result the next time the user utters a similar voice.
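The second option, pre-storing an optimized response and reusing it for similar utterances, could look like the minimal sketch below. Similarity is measured with Python's `difflib.SequenceMatcher` ratio and an assumed 0.85 cut-off; the patent does not define "similar", and the class name `ResponseCache` is hypothetical.

```python
import difflib
from typing import Dict, Optional


class ResponseCache:
    """Stores optimized responses keyed by normalized user utterances and
    returns one when a sufficiently similar utterance recurs."""

    def __init__(self, min_ratio: float = 0.85):
        self.min_ratio = min_ratio          # assumed similarity cut-off
        self._entries: Dict[str, str] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase and collapse whitespace so trivial variations match exactly.
        return " ".join(text.lower().split())

    def store(self, utterance: str, response: str) -> None:
        self._entries[self._normalize(utterance)] = response

    def lookup(self, utterance: str) -> Optional[str]:
        key = self._normalize(utterance)
        best_response, best_ratio = None, 0.0
        for stored, response in self._entries.items():
            ratio = difflib.SequenceMatcher(None, key, stored).ratio()
            if ratio > best_ratio:
                best_response, best_ratio = response, ratio
        return best_response if best_ratio >= self.min_ratio else None
```

The linear scan is adequate for a small on-vehicle cache; a large store would need an index rather than comparing against every entry.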

For each processing stage, if its duration is greater than its preset duration, the voice performance of that processing stage is determined to be substandard.

For example, if the preset duration of a processing stage is 700ms and the intercepted duration of that stage exceeds 700ms, it is determined that the processing stage has a problem.
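The rule in the two paragraphs above reduces to a strict greater-than comparison per stage. In this sketch the function and stage names are hypothetical; only the 700ms preset comes from the example.

```python
from typing import Dict, List


def substandard_stages(durations_ms: Dict[str, float],
                       presets_ms: Dict[str, float]) -> List[str]:
    """Return the stages whose measured duration exceeds their preset duration.

    A stage exactly at its preset still meets the standard, matching the
    "greater than the preset duration" wording above.
    """
    return [stage for stage, duration in durations_ms.items()
            if duration > presets_ms[stage]]


# Worked example: one stage with the 700 ms preset from the text.
presets = {"semantic_analysis": 700}
print(substandard_stages({"semantic_analysis": 750}, presets))  # ['semantic_analysis']
print(substandard_stages({"semantic_analysis": 700}, presets))  # []
```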

As shown in fig. 3c, the processing duration of each processing stage is analyzed from the voice performance data in the embodiment of the present application; the circled portions in the figure are stages whose processing duration exceeds the set duration, i.e., stages determined to be substandard.

Referring to fig. 3c, the specific processing procedure of the present application is described in detail below. As shown in fig. 3c, the vehicle voice optimization device of the present application first intercepts the voice performance data generated by the vehicle-mounted terminal (for example, the head unit) in response to the user voice at the current moment (for example, 12:10:02); the voice performance data may be the processing durations.

For example, the processing stages may include: converting speech into a text sentence; splitting the text sentence into participles for semantic analysis; organizing an answer sentence according to the semantic analysis; and converting the answer sentence into speech.

In some embodiments, the intercepted voice performance data for converting the speech into a text sentence, splitting the text sentence into participles for semantic analysis, organizing the answer sentence according to the semantic analysis, and converting the answer sentence into speech are 500ms, 1000ms, 1200ms and 300ms respectively.

For example, the preset durations for converting speech into a text sentence, semantic analysis of the split participles, answer sentence organization according to the semantic analysis, and converting the answer sentence into speech are given as 700ms, 1000ms and 500ms. It can be seen that the processing durations of the semantic analysis stage and the answer organization stage are greater than their preset durations, so these stages are defined as substandard, while the processing durations of the speech-to-text stage and the answer-to-speech stage are less than their preset durations, so these stages are defined as meeting the standard.

After interception, the stage of semantic analysis of the split participles and the stage of organizing answer sentences according to the semantic analysis can be optimized, for example, by replacing the semantic analysis algorithm, or by replacing the database that stores the correspondence of answer sentences.

It can be understood that, in this embodiment of the present application, the voice processing result is the output of each stage's processing of the voice data: for example, in the stage of converting speech into a text sentence, the voice processing result is the text sentence, and in the stage of splitting the text sentence into participles for semantic analysis, it is the semantic content. Further cases are not enumerated here.

In the example above, the processing stages are defined according to how the data content is processed. In other embodiments, they may instead be defined by the processing state of the voice data, for example: a speech recognition on-screen display stage, a speech recognition completion stage, and a dialog result output stage.

It can be understood that the duration of each processing stage can be intercepted from the vehicle-mounted terminal. The vehicle-mounted terminal may generate a processing log during processing, or emit an identifier of the processing stage; that is, after completing a processing stage, the terminal generates an identifier, instruction or log entry, and the vehicle voice optimization device determines the duration of that stage by intercepting it.
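Assuming the terminal writes one completion marker per stage together with a millisecond timestamp, stage durations follow from the differences between consecutive markers. The `<timestamp_ms> <marker>` record format and the marker names below are hypothetical; the patent only says an identifier, instruction or log entry is produced. The sample timestamps in the check reproduce the 500/1000/1200/300ms example given earlier.

```python
from typing import Dict, List, Tuple

# Each record: (timestamp in ms, marker emitted when a stage completes).
# The first record marks the moment processing started.
Record = Tuple[int, str]


def parse_log(lines: List[str]) -> List[Record]:
    """Parse hypothetical log lines of the form '<timestamp_ms> <marker>'."""
    records = []
    for line in lines:
        ts, marker = line.split(maxsplit=1)
        records.append((int(ts), marker.strip()))
    return records


def stage_durations(records: List[Record]) -> Dict[str, int]:
    """Duration of each stage = its completion time minus the previous marker's time."""
    ordered = sorted(records)  # tolerate out-of-order log lines
    return {marker: ts - prev_ts
            for (prev_ts, _), (ts, marker) in zip(ordered, ordered[1:])}
```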

In an optional embodiment, the processing stage includes a speech recognition stage, and a speech processing result of the speech recognition stage is semantic information;

the optimizing the voice processing result of the processing stage with the voice performance not reaching the standard according to the voice of the user comprises the following steps:

S31: converting the user voice into a question sentence, and segmenting the question sentence to obtain a plurality of participles;

S32: performing feature processing on each participle to obtain a feature vector of each participle;

S33: inputting all the feature vectors into a preset semantic recognition model, which outputs the semantic information of the question sentence.
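Steps S31 to S33 can be illustrated end to end with deliberately simple stand-ins. In this sketch, segmentation is a whitespace split of an English transcript (Chinese input would need a real word segmenter), the feature vectors are one-hot vectors over a fixed vocabulary, and the "semantic recognition model" is a prototype matcher using cosine similarity. All of these choices, and every name below, are assumptions rather than the patent's model.

```python
import math
from typing import Dict, List, Optional


def segment(question: str) -> List[str]:
    """S31 stand-in: lowercase, turn punctuation into spaces, split on whitespace."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in question.lower())
    return cleaned.split()


class OneHotEncoder:
    """S32 stand-in: maps each participle to a one-hot vector over a fixed vocabulary."""

    def __init__(self, vocabulary: List[str]):
        self.index = {word: i for i, word in enumerate(vocabulary)}
        self.dim = len(vocabulary)

    def encode(self, token: str) -> List[float]:
        vec = [0.0] * self.dim
        if token in self.index:          # out-of-vocabulary tokens stay all-zero
            vec[self.index[token]] = 1.0
        return vec


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PrototypeSemanticModel:
    """S33 stand-in: sums the participle vectors and returns the intent whose
    prototype vector is most similar (None if nothing matches at all)."""

    def __init__(self, prototypes: Dict[str, List[float]], dim: int):
        self.prototypes = prototypes
        self.dim = dim

    def predict(self, vectors: List[List[float]]) -> Optional[str]:
        summed = [0.0] * self.dim
        for vec in vectors:
            summed = [s + v for s, v in zip(summed, vec)]
        scores = {intent: _cosine(summed, proto) for intent, proto in self.prototypes.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None


def recognize(question: str, encoder: OneHotEncoder, model: PrototypeSemanticModel) -> Optional[str]:
    participles = segment(question)                      # S31
    vectors = [encoder.encode(p) for p in participles]   # S32
    return model.predict(vectors)                        # S33
```

The three stages are kept as separate objects so that any one of them (for example, the model) can be swapped out when its stage is found to be substandard.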

In this embodiment, a built-in semantic recognition model is used and called to perform semantic recognition. On the one hand, this avoids inaccurate semantic recognition by a third-party software provider; on the other hand, the model can be optimized with the manufacturer's own data and user feedback, so that semantic recognition becomes increasingly accurate.

Further, in an optional embodiment, the semantic recognition model is a neural network model, and the method further includes:

establishing the neural network model;

and training the neural network model with participle groups labeled with semantic information, wherein the neural network model obtained after training converges is the semantic recognition model.

In this embodiment, the semantic recognition model may be a neural network model, specifically an artificial neural network (ANN), a convolutional neural network (CNN) or a probabilistic neural network (PNN), which is not limited in this application.

For example, when the semantic recognition model is a CNN model, each participle combination is first labeled with an answer sentence for training, and the answer sentences are given identifiers such as 1, 2, 3 and 4, so that each participle combination corresponds to one of 1, 2, 3 and 4. The participle combinations are then input into the CNN model, and the network weights are obtained through convolution and pooling. Once training has stabilized, the network can output a specific answer sentence for each participle combination that is input.
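The labeling scheme (each participle combination mapped to an answer-sentence ID) can be illustrated with a deliberately small stand-in. This sketch swaps the CNN for a single-layer softmax classifier over bag-of-words features, trained by plain gradient descent in NumPy. The training groups, learning rate and epoch count are hypothetical, and training runs a fixed number of epochs rather than testing for convergence.

```python
import numpy as np

# Hypothetical training data: participle groups labeled with answer-sentence IDs 1-4.
GROUPS = [["weather", "tomorrow"], ["turn", "on", "air", "conditioner"],
          ["navigate", "home"], ["play", "music"]]
ANSWER_IDS = [1, 2, 3, 4]

VOCAB = sorted({w for g in GROUPS for w in g})
INDEX = {w: i for i, w in enumerate(VOCAB)}


def bag_of_words(participles):
    """Count vector over VOCAB; unknown participles are ignored."""
    x = np.zeros(len(VOCAB))
    for p in participles:
        if p in INDEX:
            x[INDEX[p]] += 1.0
    return x


def train_softmax(X, y, n_classes, lr=0.5, epochs=500, seed=0):
    """Minimize the mean cross-entropy of a linear softmax classifier."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                          # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / len(X)                       # d(loss)/d(logits)
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict_answer_id(W, b, participles):
    """Return the answer-sentence ID (1-based) with the highest score."""
    return int(np.argmax(bag_of_words(participles) @ W + b)) + 1


X = np.stack([bag_of_words(g) for g in GROUPS])
y = np.array([a - 1 for a in ANSWER_IDS])             # class index = ID - 1
W, b = train_softmax(X, y, n_classes=len(ANSWER_IDS))
```

A CNN adds convolution and pooling over ordered participle embeddings, but the input/label contract (participle combination in, answer-sentence ID out) is the same as here.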

In an optional embodiment, the method further comprises: performing a device permission authentication operation on the vehicle-mounted terminal, and sending the voice processing result to the vehicle-mounted terminal if the authentication passes.

In the embodiment, the permission authentication is carried out on the equipment, so that the non-vehicle-mounted terminal equipment is prevented from being accessed.
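The patent does not state how the device is authenticated. One common pattern is an HMAC challenge-response with a per-device secret provisioned in advance; the minimal sketch below assumes that pattern, and the registry, device IDs and function names are hypothetical. It does not track nonces, so replay protection would still have to be added.

```python
import hashlib
import hmac
from typing import Dict, Optional

# Hypothetical registry of provisioned vehicle-mounted terminals and their secrets.
DEVICE_SECRETS: Dict[str, bytes] = {"headunit-0001": b"provisioned-secret"}


def sign(device_id: str, nonce: str, secret: bytes) -> str:
    """HMAC-SHA256 over 'device_id:nonce', as the terminal would compute it."""
    message = f"{device_id}:{nonce}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def authenticate(device_id: str, nonce: str, signature: str) -> bool:
    secret = DEVICE_SECRETS.get(device_id)
    if secret is None:
        return False  # not a registered vehicle-mounted terminal
    expected = sign(device_id, nonce, secret)
    return hmac.compare_digest(expected, signature)  # constant-time comparison


def deliver_result(device_id: str, nonce: str, signature: str,
                   result: str) -> Optional[str]:
    """Release the voice processing result only to an authenticated terminal."""
    return result if authenticate(device_id, nonce, signature) else None
```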

Further, the processing stages of the present application include a result output stage. As shown in Table 1 below, speech recognition on-screen display and speech recognition completion belong to the speech recognition stage, whose standards total 700ms, and dialog result output belongs to the result output stage.

Table 1 Processing stage standards

Stage	Sub-stage	Log field	Standard
Speech recognition	Speech recognition on-screen display	asr_final_elapsed	200ms
Speech recognition	Speech recognition completed	first_asr_text_accessed	500ms
Result output	Dialog result output	dm_final_elapsed	700ms
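Table 1 can be held as configuration and checked both per log field and per processing stage. This sketch assumes each field records the elapsed time of its own sub-stage (so that the two speech recognition standards add up to the 700ms stage total stated above); the function names are hypothetical, and the measured values in the check are made up for illustration.

```python
from typing import Dict, List, Tuple

# Table 1 as data: log field -> (processing stage, standard in ms).
TABLE_1: Dict[str, Tuple[str, int]] = {
    "asr_final_elapsed": ("speech_recognition", 200),
    "first_asr_text_accessed": ("speech_recognition", 500),
    "dm_final_elapsed": ("result_output", 700),
}


def stage_budgets(table: Dict[str, Tuple[str, int]]) -> Dict[str, int]:
    """Sum the sub-stage standards into a budget per processing stage."""
    budgets: Dict[str, int] = {}
    for stage, standard in table.values():
        budgets[stage] = budgets.get(stage, 0) + standard
    return budgets


def check_against_table(measured_ms: Dict[str, int],
                        table: Dict[str, Tuple[str, int]] = TABLE_1
                        ) -> Tuple[List[str], List[str]]:
    """Return (log fields over their own standard, stages over their summed budget)."""
    over_fields = [f for f, ms in measured_ms.items() if ms > table[f][1]]
    totals: Dict[str, int] = {}
    for field, ms in measured_ms.items():
        stage = table[field][0]
        totals[stage] = totals.get(stage, 0) + ms
    budgets = stage_budgets(table)
    over_stages = [s for s, total in totals.items() if total > budgets[s]]
    return over_fields, over_stages
```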

Further, in this embodiment of the application, multiple public clouds on the market may be accessed and their services called. For example, the services of public cloud A include an authentication service, a speech recognition (ASR) service, ASR training, skill central control, DM service central control, a skill semantic service, a skill platform, an information source service, and the like, which are not described further here.

Furthermore, a public cloud B can also be accessed, i.e., a plurality of public clouds can be accessed at the same time. In this case, the processing stage comprises a result output stage;

the vehicle voice optimization device comprises network interfaces through which a plurality of information service channels are accessed;

an information service channel is selected according to the user voice; each information service channel corresponds to a public cloud server that provides a voice search service;

In this way, a public cloud server with stronger speech recognition capability can be selected and its voice search service borrowed; processing and recognizing the voice with the higher processing performance of that public cloud server can further improve the accuracy and speed of speech recognition.

Further, in some embodiments, after obtaining the semantic information, the vehicle voice optimization method further includes:

searching, according to the user requirements, a correspondence table of user requirements and third-party information sources for the third-party information source corresponding to the user requirements;

In this embodiment, based on the semantic information of the user voice and the correspondence table of third-party information sources, an information service channel, i.e., a corresponding public cloud server, is selected according to the user voice, and the corresponding service is looked up on that public cloud server and called. In this way, a variety of public cloud servers can be matched, and for each type of user requirement the public cloud server with the best results can be called. For example, if the best weather forecast service on the market is on public cloud A, the weather forecast service on public cloud A is called whenever the user asks about the weather.

In this embodiment, a plurality of information service channels, such as public clouds, may be accessed, and the terminal device then selects one of them. The selection process is: analyze the semantics of the user voice, match the user requirement, and select an information service channel accordingly, such as a weather forecast service or a road early-warning service.
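The correspondence-table lookup can be sketched as two dictionary steps, assuming the semantic information has already been reduced to participles. The requirement keywords, table entries and channel names (`public_cloud_a.weather_forecast` and so on) are hypothetical; the text only gives weather forecasting on public cloud A as an example. Calling the selected service is left out.

```python
from typing import Iterable, Optional

# Hypothetical correspondence table: user requirement -> information service channel.
REQUIREMENT_TO_SOURCE = {
    "weather": "public_cloud_a.weather_forecast",
    "road_warning": "public_cloud_b.road_alert",
}

# Hypothetical keyword rules used to derive a requirement from the semantic information.
REQUIREMENT_KEYWORDS = {
    "weather": {"weather", "rain", "temperature"},
    "road_warning": {"traffic", "congestion", "accident"},
}


def determine_requirement(participles: Iterable[str]) -> Optional[str]:
    """Return the first requirement whose keywords overlap the participles."""
    tokens = set(participles)
    for requirement, keywords in REQUIREMENT_KEYWORDS.items():
        if tokens & keywords:
            return requirement
    return None


def select_channel(participles: Iterable[str]) -> Optional[str]:
    """Look up the information service channel for the user's requirement."""
    requirement = determine_requirement(participles)
    return REQUIREMENT_TO_SOURCE.get(requirement) if requirement else None
```

Keeping the table as data means a better provider for a requirement can be swapped in by editing one entry, which is the flexibility the paragraph above describes.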

In an alternative embodiment, the processing stages include a result output stage; for example, the preset duration of the result output stage in Table 1 is 700 ms. The vehicle voice optimization apparatus includes a network interface for accessing multiple information service channels, and optimizing, according to the user voice, the voice processing result of the processing stage whose voice performance does not reach the standard includes: inputting the user voice to each public cloud server through the corresponding information service channel, each information service channel generating a voice processing result based on the user voice; and selecting one of the multiple voice processing results based on the semantic keywords of the user voice.

In this embodiment, the voice is sent to all information service channels, each channel generates a voice processing result based on the user voice, and one of these results is then selected based on the semantic keywords of the user voice. In other words, the voice processing results are screened according to how well they match the user voice.
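The fan-out and selection described above can be sketched as follows, assuming each information service channel is a Python callable that returns a text result. The 700 ms budget is the preset result-output duration from Table 1; discarding channels that miss it, and scoring results by simple keyword counts, are illustrative choices not specified in the source. `fan_out`, `keyword_score` and `select_result` are hypothetical names, and `cancel_futures` requires Python 3.9 or later.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, Iterable, Optional

# Preset duration of the result output stage (700 ms, per Table 1).
RESULT_OUTPUT_BUDGET_S = 0.7

# Hypothetical channel type: takes the user utterance, returns a result text.
Channel = Callable[[str], str]


def fan_out(utterance: str, channels: Dict[str, Channel],
            budget_s: float = RESULT_OUTPUT_BUDGET_S) -> Dict[str, str]:
    """Send the utterance to every channel in parallel; keep results that finish in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(channels)))
    futures = {pool.submit(fn, utterance): name for name, fn in channels.items()}
    done, _ = wait(futures, timeout=budget_s)
    # Do not block on late channels; their results are simply discarded.
    pool.shutdown(wait=False, cancel_futures=True)
    return {futures[f]: f.result() for f in done if f.exception() is None}


def keyword_score(result: str, keywords: Iterable[str]) -> int:
    """Count how many semantic keywords of the user voice occur in a result."""
    text = result.lower()
    return sum(1 for kw in keywords if kw.lower() in text)


def select_result(results: Dict[str, str], keywords: Iterable[str]) -> Optional[str]:
    """Return the result that best matches the semantic keywords, or None if empty."""
    if not results:
        return None
    kws = list(keywords)
    best = max(results, key=lambda name: keyword_score(results[name], kws))
    return results[best]
```

With one channel answering "Tomorrow's weather: sunny, 25 C" and another answering "Nearby restaurants found", the keywords `["weather", "tomorrow"]` select the weather answer. Sending to every channel trades extra network load for a lower worst-case latency, which is why a time budget is paired with the fan-out.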

The interaction process of the present application is described in detail below with reference to Figs. 3a and 3b. In Fig. 3b, the vehicle-mounted terminal is the vehicle-mounted terminal of the present application. The whole interaction process is as follows:

Step 1: The driver or a passenger issues a voice command, which can be set to include a specific response (wake) word, for example: "Xiao XX, what's the weather like tomorrow?" Here "Xiao XX" is a response word that the vehicle-mounted terminal can recognize, and the speech that follows it is defined as the speech with which the user wants to interact with the vehicle-mounted terminal.

Step 2: The vehicle-mounted terminal receives "Xiao XX, what's the weather like tomorrow?" and, from the response word "Xiao XX", determines that "What's the weather like tomorrow?" is the instruction the user wants to interact with.

Step 3: The vehicle-mounted terminal displays "What's the weather like tomorrow?" on the display screen, so that the user can correct the utterance or say it again if it was recognized wrongly. At this point, an on-screen display timestamp is generated inside the system and recorded in the log.

Step 4: The voice optimization device acquires the log of the vehicle-mounted terminal in real time. Because a log entry is not generated until the corresponding event has finished, it cannot be acquired earlier; as soon as the vehicle-mounted terminal writes the on-screen display timestamp to the log, the device acquires it and determines the processing duration of the on-screen display stage. This duration is then compared with the preset processing duration to judge whether the stage reaches the standard. If it does, proceed to step 5; if it does not, the algorithm used by the vehicle-mounted terminal in the on-screen display process needs to be optimized, for example by replacing it with a faster and more targeted algorithm.

Step 5: After the vehicle-mounted terminal confirms the user voice "What's the weather like tomorrow?" (that is, the user has not modified the utterance within the specified time), its internal semantic analysis program starts semantic analysis, confirms that the user wants to know tomorrow's weather, and generates a processing log that includes timestamps of the semantic analysis process.

Step 6: The voice optimization device acquires the log of the vehicle-mounted terminal in real time; as before, the log cannot be acquired before the event has finished. After acquiring the log, the device finds the timestamps of the semantic analysis process and judges whether this stage reaches the standard. If it does, proceed to step 7; if it does not, the algorithm used by the vehicle-mounted terminal in the semantic analysis process needs to be optimized, for example by replacing it with a faster and more targeted algorithm.

Step 7: After going through all the processing stages, the vehicle-mounted terminal outputs a response sentence and transmits it to the vehicle voice optimization device.

Step 8: While the vehicle-mounted terminal is outputting the response sentence, the vehicle voice optimization device can call the services of the various public clouds based on the semantic content.
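The log checks in the steps above amount to pairing timestamps from the terminal's log and comparing each stage's duration with its preset duration. The following is a minimal sketch, assuming a hypothetical log line format `<epoch ms> <START|END> <stage>` (the source does not define one). Only the 700 ms result-output value comes from Table 1; the other preset values are placeholders.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical log line format: "<epoch milliseconds> <START|END> <stage name>".
LOG_PATTERN = re.compile(r"^(\d+)\s+(START|END)\s+(\w+)$")

# Preset durations in milliseconds.
PRESET_MS: Dict[str, int] = {
    "screen_display": 300,     # placeholder value
    "semantic_analysis": 500,  # placeholder value
    "result_output": 700,      # from Table 1
}


def stage_durations(lines: Iterable[str]) -> Dict[str, int]:
    """Pair START/END timestamps per stage and return each stage's duration in ms."""
    starts: Dict[str, int] = {}
    durations: Dict[str, int] = {}
    for line in lines:
        m = LOG_PATTERN.match(line.strip())
        if not m:
            continue  # ignore unrelated log lines
        ts, event, stage = int(m.group(1)), m.group(2), m.group(3)
        if event == "START":
            starts[stage] = ts
        elif stage in starts:
            # An END entry only exists once the stage has finished.
            durations[stage] = ts - starts.pop(stage)
    return durations


def failing_stages(durations: Dict[str, int],
                   preset: Dict[str, int] = PRESET_MS) -> List[str]:
    """Return the stages whose measured duration exceeds their preset duration."""
    return [s for s, d in durations.items() if s in preset and d > preset[s]]
```

Fed a log in which semantic analysis runs from 1250 ms to 1900 ms, `stage_durations` reports 650 ms for that stage and `failing_stages` flags it against the 500 ms placeholder, which is the point at which the algorithm for that stage would be optimized or replaced.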

The service invocation performed by the vehicle voice optimization device is explained in detail below.

1. When connecting to a public cloud server, the vehicle voice optimization device authenticates with that server to determine whether it is authorized to connect; likewise, when connecting to the vehicle-mounted terminal, it performs device authentication with the vehicle-mounted terminal.

2. During processing, the vehicle voice optimization device also performs load balancing and flow control over the whole process.

3. The vehicle voice optimization device can act as a relay service: public cloud services are connected to the vehicle-mounted terminal through the information service channels, enabling routing and data forwarding.

4. The public cloud server can provide a plurality of services, first among them an Automatic Speech Recognition (ASR) service, which the vehicle-mounted terminal can call through the vehicle voice optimization device. During speech recognition, or when the vehicle voice optimization device detects that the speech recognition of the vehicle-mounted terminal does not reach the standard, the speech recognition service of the public cloud server can replace the speech recognition algorithm of the vehicle-mounted terminal, so that the recognition result is obtained from the public cloud server and transmitted to the vehicle-mounted terminal.

5. The public cloud server further provides an ASR training service, a model training service, a skill central control service capable of semantic recognition, and skill feedback control after semantic recognition (for example, once a weather request is recognized semantically, the weather forecast skill can be executed), as well as a user registration service, a central control service, a skill semantic service, a skill platform and an information source service; details are not repeated here.
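Item 4 above, replacing the terminal's speech recognition with a public cloud ASR service once the local stage fails its standard, might look like the following sketch. The recognizer callables and the time budget are hypothetical, and the round-robin rotation over cloud recognizers is only a simple stand-in for the load balancing mentioned in item 2; the source does not specify a balancing policy.

```python
import itertools
import time
from typing import Callable, List, Tuple

# Hypothetical recognizer type: audio bytes in, recognized text out.
Recognizer = Callable[[bytes], str]


class AsrFallback:
    """Use on-board ASR until it misses its budget, then switch to cloud ASR.

    Cloud recognizers are rotated round-robin as a simple stand-in for load
    balancing; the budget and recognizers are hypothetical.
    """

    def __init__(self, local: Recognizer, clouds: List[Recognizer], budget_s: float):
        if not clouds:
            raise ValueError("at least one cloud recognizer is required")
        self.local = local
        self.budget_s = budget_s
        self.local_fails = False  # set once the local stage misses its standard
        self._clouds = itertools.cycle(clouds)

    def recognize(self, audio: bytes) -> Tuple[str, str]:
        """Return (text, source), where source is "local" or "cloud"."""
        if not self.local_fails:
            start = time.monotonic()
            text = self.local(audio)
            if time.monotonic() - start > self.budget_s:
                # Below standard: route subsequent requests to the cloud.
                self.local_fails = True
            return text, "local"
        return next(self._clouds)(audio), "cloud"
```

The request that exposes the slow local stage still returns its (already computed) local text; only later requests are routed to the cloud, alternating between the configured cloud recognizers.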

In the vehicle voice optimization method, the voice performance data generated by the vehicle-mounted terminal in responding to the user voice at the current moment is first intercepted; the stage that needs optimization and the time point at which the problem occurred are then determined from the voice performance data; and optimization is then performed in combination with the user voice. At the software level, the present application provides a vehicle voice optimization device, comprising:

an intercepting module 1, which intercepts the voice performance data generated by the vehicle-mounted terminal in responding to the user voice at the current moment, the voice performance data being the duration of each processing stage in the process of the vehicle-mounted terminal responding to the user voice;

a determining module 2, which determines, according to the duration of each processing stage in the process of the vehicle-mounted terminal responding to the user voice, the processing stage whose voice performance does not reach the standard;

and an optimization module 3, which optimizes, according to the user voice, the voice processing result of the processing stage whose voice performance does not reach the standard, so that the voice performance of that processing stage reaches the standard.
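The three software modules can be composed as in the following sketch. The class names mirror the modules above, but the stage names, preset durations, the `source` hook and the substitute processors are hypothetical; a real optimization module would call, for example, a public cloud service rather than a local function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class InterceptingModule:
    """Module 1: intercepts per-stage durations (ms) of the current response."""
    source: Callable[[], Dict[str, int]]  # hypothetical hook, e.g. a log reader

    def intercept(self) -> Dict[str, int]:
        return dict(self.source())


@dataclass
class DeterminingModule:
    """Module 2: flags stages whose duration exceeds their preset duration."""
    preset_ms: Dict[str, int]

    def determine(self, durations_ms: Dict[str, int]) -> List[str]:
        # Stages without a preset duration are never flagged.
        return [stage for stage, ms in durations_ms.items()
                if stage in self.preset_ms and ms > self.preset_ms[stage]]


@dataclass
class OptimizationModule:
    """Module 3: re-processes the user voice for a flagged stage."""
    substitutes: Dict[str, Callable[[str], str]]

    def optimize(self, stage: str, user_voice: str) -> str:
        return self.substitutes[stage](user_voice)


def run_pipeline(user_voice: str, m1: InterceptingModule,
                 m2: DeterminingModule, m3: OptimizationModule) -> Dict[str, str]:
    """Intercept, determine, then optimize every flagged stage that has a substitute."""
    flagged = m2.determine(m1.intercept())
    return {stage: m3.optimize(stage, user_voice)
            for stage in flagged if stage in m3.substitutes}
```

Keeping the three modules separate mirrors the claim structure and lets the same determining logic run whether the device is integrated in the terminal or connected to it externally.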

It can be understood that the apparatus may be integrated inside the vehicle-mounted terminal, and may also be in communication connection with the vehicle-mounted terminal, which is not limited in this application.

According to the vehicle voice optimization device, the voice performance data generated by the vehicle-mounted terminal responding to the voice of the user at the current moment is firstly intercepted, then the stage needing optimization and the time point of occurrence of a problem are determined based on the voice performance data, and then optimization is carried out by combining the voice of the user.

At the hardware level, an embodiment of an electronic device for implementing all or part of the vehicle voice optimization method specifically includes the following:

a processor, a memory, a communications interface and a bus. The processor, the memory and the communications interface communicate with one another through the bus; the communications interface is used to transmit information among related equipment such as servers, devices, distributed message middleware cluster devices, various databases and user terminals. The electronic device may be a desktop computer, a tablet computer, a mobile terminal or the like, but this embodiment is not limited thereto. In this embodiment, the electronic device may be implemented with reference to the embodiments of the vehicle voice optimization method and the vehicle voice optimization apparatus described above, which are incorporated herein; repeated details are not described again.

Fig. 5 is a schematic block diagram of the system configuration of an electronic device 9600 according to an embodiment of the present invention. As shown in Fig. 5, the electronic device 9600 may include a central processor 9100 and a memory 9140, the memory 9140 being coupled to the central processor 9100. Notably, Fig. 5 is exemplary; other types of structures may be used in addition to or in place of this structure to implement telecommunications or other functions.

In one embodiment, vehicle voice optimization functionality may be integrated into central processor 9100.

In another embodiment, the vehicle voice optimization apparatus may be configured separately from the central processor 9100, for example, the vehicle voice optimization apparatus may be configured as a chip connected to the central processor 9100, and the vehicle voice optimization function is realized by the control of the central processor.

As shown in Fig. 5, the electronic device 9600 may further include a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160 and a power supply 9170. It is noted that the electronic device 9600 does not necessarily include all of the components shown in Fig. 5; further, the electronic device 9600 may include components not shown in Fig. 5, for which reference may be made to the prior art.

As shown in Fig. 5, the central processor 9100, sometimes referred to as a controller or operation control, may include a microprocessor or other processor device and/or logic device; the central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.

The memory 9140 may be, for example, one or more of a buffer, a flash memory, a hard drive, removable media, volatile memory, non-volatile memory or other suitable devices. It may store failure-related information as well as programs for processing that information, and the central processor 9100 can execute the programs stored in the memory 9140 to store or process information.

The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.

The memory 9140 may be a solid-state memory, e.g., read-only memory (ROM), random access memory (RAM), a SIM card or the like. It may also be a memory that retains information even when power is off and that can be selectively erased and supplied with more data, an example of which is sometimes called an EPROM. The memory 9140 may also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer) and may include an application/function storage portion 9142, which stores application programs and function programs, or the flow of operations of the electronic device 9600 executed by the central processor 9100.

The memory 9140 can also include a data store 9143, the data store 9143 being used to store data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).

The communication module 9110 is a transmitter/receiver that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in a conventional mobile communication terminal.

A plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, can be provided in the same electronic device based on different communication technologies. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132 to implement general telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.

An embodiment of the present invention also provides a computer-readable storage medium capable of implementing all steps of the vehicle voice optimization method in the above embodiments, the execution subject of which may be a server. The computer-readable storage medium stores a computer program which, when executed by a processor, implements all steps of the vehicle voice optimization method in the above embodiments.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Specific embodiments are used herein to explain the principle and implementation of the invention; the description of the embodiments is only intended to help in understanding the method and core idea of the invention. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make variations in the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A vehicle voice optimization method is applicable to a vehicle voice optimization device and comprises the following steps:

determining the processing stage with the voice performance not reaching the standard according to the duration of each processing stage in the process of responding the voice of the user by the vehicle-mounted terminal;

and optimizing, according to the user voice, the voice processing result of the processing stage whose voice performance does not reach the standard, so that the voice performance of that processing stage reaches the standard.

2. The vehicle voice optimization method according to claim 1, wherein each processing stage corresponds to a preset duration, and determining, according to the duration of each processing stage in the process of the vehicle-mounted terminal responding to the user voice, the processing stage whose voice performance does not reach the standard comprises:

3. The vehicle voice optimization method of claim 2, wherein the processing stage includes a voice recognition stage;

4. The vehicle voice optimization method according to claim 3, wherein after outputting semantic information of the question sentence, the vehicle voice optimization method further comprises:

searching, according to the user requirement, a correspondence table of user requirements and third-party information sources for the third-party information source corresponding to that requirement;

5. The vehicle voice optimization method according to claim 1, wherein optimizing, according to the user voice, the voice processing result of the processing stage whose voice performance does not reach the standard comprises:

6. The vehicle voice optimization method according to claim 2, wherein optimizing, according to the user voice, the voice processing result of the processing stage whose voice performance does not reach the standard comprises:

and selecting one from a plurality of candidate voice processing results based on the semantic keywords of the voice of the user to obtain the voice processing result.

7. A vehicle voice optimization device, comprising:

and an optimization module, which optimizes, according to the user voice, the voice processing result of the processing stage whose voice performance does not reach the standard, so that the voice performance of that processing stage reaches the standard.

8. A vehicle voice interaction system, comprising a vehicle-mounted terminal, a vehicle voice optimization device and a plurality of cloud servers;

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the vehicle voice optimization method of any one of claims 1 to 7.

10. A computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the vehicle voice optimization method according to any one of claims 1 to 7.