CN116403583A - Voice data processing method and device, nonvolatile storage medium and vehicle - Google Patents

Voice data processing method and device, nonvolatile storage medium and vehicle

Info

Publication number
CN116403583A
CN116403583A (application CN202310388680.8A)
Authority
CN
China
Prior art keywords
voice data
voice
control instruction
recognition result
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310388680.8A
Other languages
Chinese (zh)
Inventor
尹鹏
吕贵林
陈涛
陈岩
王建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW Group Corp
Original Assignee
FAW Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FAW Group Corp filed Critical FAW Group Corp
Priority to CN202310388680.8A priority Critical patent/CN116403583A/en
Publication of CN116403583A publication Critical patent/CN116403583A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G10L 15/26 Speech to text systems
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention discloses a voice data processing method and device, a nonvolatile storage medium and a vehicle. The method comprises the following steps: in response to receiving first voice data, performing voice recognition on the first voice data to obtain a first recognition result corresponding to the first voice data; determining, based on the first recognition result, whether the first voice data contains a control instruction; when the first voice data does not contain the control instruction, performing natural language processing on the first recognition result based on a natural semantic processing model to generate second voice data corresponding to the first voice data; and outputting the second voice data. The invention solves the technical problem of low accuracy in voice data processing.

Description

Voice data processing method and device, nonvolatile storage medium and vehicle
Technical Field
The invention relates to the field of intelligent vehicles, in particular to a voice data processing method and device, a nonvolatile storage medium and a vehicle.
Background
At present, vehicle voice control systems are developing rapidly, and a user can control a vehicle by voice. After acquiring the user's voice data, a traditional vehicle operates directly according to the acquired voice content. However, the voice content acquired by the vehicle may deviate from the voice data actually input by the user, so the accuracy of voice data processing is low.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides a voice data processing method and device, a nonvolatile storage medium and a vehicle, which are used for at least solving the technical problem of low voice data processing accuracy.
According to an aspect of an embodiment of the present invention, there is provided a voice data processing method including: in response to receiving the first voice data, performing voice recognition on the first voice data to obtain a first recognition result corresponding to the first voice data; determining whether the first voice data contains a control instruction based on the first recognition result; under the condition that the first voice data does not contain a control instruction, carrying out natural language processing on the first recognition result based on a natural semantic processing model, and generating second voice data corresponding to the first voice data; and outputting the second voice data.
Optionally, determining whether the first voice data includes the control instruction based on the first recognition result includes: carrying out semantic analysis on the first recognition result to obtain user intention information; it is determined whether the first voice data includes a control instruction based on the user intention information.
Optionally, determining whether the first voice data includes the control instruction based on the user intention information includes: determining that the first voice data does not contain a control instruction in response to the user intention information being a voice interaction intention; in response to the user intent information being a vehicle control intent, it is determined that the first voice data includes control instructions.
Optionally, the method further comprises: responding to the feedback voice data of the received second voice data, and performing voice recognition on the feedback voice data to obtain a second recognition result corresponding to the feedback voice data; determining whether the feedback voice data contains an interaction instruction based on the second recognition result; under the condition that the feedback voice data comprises an interaction instruction, performing natural language processing on the second recognition result based on a natural semantic processing model to generate third voice data corresponding to the feedback voice data; and outputting the third voice data.
Optionally, in the case that the feedback voice data does not contain an interaction instruction, the method further includes: and prohibiting the natural language processing of the second recognition result by the natural semantic processing model.
Optionally, outputting the second voice data includes: and broadcasting the second voice data and/or displaying text information corresponding to the second voice data on the interactive interface.
Optionally, in the case that the first voice data contains a control instruction, the method further comprises: controlling the vehicle to execute the operation corresponding to the control instruction to obtain an execution result, wherein the execution result is used for indicating whether the vehicle successfully executes the operation corresponding to the control instruction; and outputting an execution result.
According to another aspect of the embodiment of the present invention, there is also provided a voice data processing apparatus, including: the recognition module is used for responding to the received first voice data, carrying out voice recognition on the first voice data and obtaining a first recognition result corresponding to the first voice data; the determining module is used for determining whether the first voice data contains a control instruction or not based on the first recognition result; the processing module is used for carrying out natural language processing on the first recognition result based on the natural semantic processing model under the condition that the first voice data does not contain a control instruction, and generating second voice data corresponding to the first voice data; and the output module is used for outputting the second voice data.
According to another aspect of the embodiments of the present invention, there is also provided a nonvolatile storage medium including a stored program, wherein, when the program runs, a processor of the device in which the nonvolatile storage medium is located is controlled to execute the voice data processing method of the above embodiments.
According to another aspect of an embodiment of the present invention, there is also provided a vehicle including: one or more processors; a storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the voice data processing method of the above embodiment.
In the embodiment of the invention, after first voice data is received, voice recognition is performed on the first voice data to obtain a first recognition result corresponding to the first voice data; whether the first voice data contains a control instruction is determined based on the first recognition result; and when the first voice data does not contain a control instruction, natural language processing is performed on the first recognition result based on the natural semantic processing model to generate second voice data corresponding to the first voice data. It should be noted that the first voice data is recognized, whether it contains a control instruction is determined, and, when it does not, the first recognition result is further subjected to natural language processing. This achieves the purpose of improving voice data processing efficiency and the technical effect of responding to the first voice data efficiently and clearly, thereby solving the technical problem of low voice data processing accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flowchart of a voice data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an alternative voice data module relationship according to an embodiment of the present invention;
FIG. 3 is a flow chart of an alternative method of determining user intent information in accordance with an embodiment of the present invention;
FIG. 4 is a flow chart of an alternative voice data processing method according to an embodiment of the invention;
fig. 5 is a schematic diagram of a voice data processing apparatus according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
According to an embodiment of the present invention, there is provided an embodiment of a voice data processing method, it being noted that the steps shown in the flowcharts of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that herein.
Fig. 1 is a flowchart of a voice data processing method according to an embodiment of the present invention, as shown in fig. 1, the method including the steps of:
step S102, responding to the received first voice data, and performing voice recognition on the first voice data to obtain a first recognition result corresponding to the first voice data.
The first voice data may be voice data uttered by a user, or user voice data acquired by a vehicle-mounted voice system. The first recognition result may be a result obtained by recognizing the user's voice data, a result obtained by converting the user's voice data into a language the computer can recognize so as to obtain the meaning of the text corresponding to the first voice data, or text information corresponding to the first voice data.
In an alternative embodiment, after receiving the first voice data, the voice intelligent system performs voice recognition on the first voice data to obtain text information corresponding to the first voice data, and takes the text information as a first recognition result.
In another alternative embodiment, after receiving the first voice data, the voice intelligent system performs voice recognition on the first voice data to obtain text information corresponding to the first voice data, extracts nouns and/or verbs from the text information, analyzes their meanings, determines the meaning of the first voice data according to the analysis results, and combines the nouns and/or verbs according to the logical relationship of the language to obtain a first recognition result corresponding to the first voice data.
In yet another alternative embodiment, the user or a technician may preset the wake-up voice of the voice system in advance, so that chatting between passengers in the vehicle does not affect vehicle control. After the first voice data is received, whether the first voice data contains the wake-up voice is judged; if it does not, the system remains on standby; if it does, voice recognition is performed on the first voice data to obtain text information corresponding to the first voice data, and the text information is taken as the first recognition result.
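The wake-up gating described above can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation; the wake phrases and the text-level matching are assumptions (a real in-vehicle system would detect the wake word on the audio signal rather than on transcribed text).

```python
from typing import Optional

# Hypothetical preset wake phrases (assumption, not from the patent).
WAKE_PHRASES = ("hello car", "hi assistant")

def gate_first_voice_data(transcript: str) -> Optional[str]:
    """Pass the utterance on only if it begins with a wake phrase.

    Returns the command portion after the wake phrase, or None to keep
    the system on standby (e.g. passengers chatting among themselves).
    """
    text = transcript.lower().strip()
    for phrase in WAKE_PHRASES:
        if text.startswith(phrase):
            remainder = text[len(phrase):].strip(" ,.!?")
            return remainder or None
    return None
```

Under this sketch, an utterance without a wake phrase simply leaves the system in standby, matching the "continue to stand by" branch above.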
Step S104, based on the first recognition result, whether the first voice data contains a control instruction or not is determined.
The control command described above may be used to indicate a command to control the vehicle.
In an alternative embodiment, the first recognition result is subjected to semantic analysis to obtain intention information of the first voice data input by the user, and if the intention of the user is voice interaction, that is, the user wants to obtain certain information, it is determined that the first voice data does not contain a control instruction; if the user intends to control the vehicle, that is, if the user wishes the vehicle to perform a certain control operation, it is determined that the control instruction is included in the first voice data.
For example, if the first recognition result is "play music", semantic analysis of the first recognition result shows that the user wants the vehicle to perform the playing function, that is, the user intends to control the playing device of the vehicle, and it is determined that the first voice data contains a play control instruction. If the first recognition result is "what is the weather tomorrow", semantic analysis of the first recognition result shows that the user wants to know the weather conditions, that is, the user intends voice chat interaction, and it is determined that the first voice data does not contain a control instruction.
In another alternative embodiment, a user or a technician can preset text for controlling the starting of each function of the vehicle. After the first recognition result is obtained, the voice intelligent system matches the first recognition result against the preset text. If the match succeeds, it is determined that the first voice data contains a control instruction, and the vehicle operates according to the control instruction; if the match fails, it is determined that the first voice data does not contain a control instruction.
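The preset-text matching variant above can be illustrated with a small sketch. The phrase table, the normalization step, and the instruction identifiers are invented for illustration only and are not taken from the patent.

```python
# Hypothetical table mapping preset control text to instruction identifiers.
PRESET_CONTROL_PHRASES = {
    "play music": "media.play",
    "open the window": "window.open",
    "turn on the air conditioner": "hvac.on",
}

def match_control_instruction(first_recognition_result: str):
    """Return the mapped control instruction, or None when matching fails."""
    normalized = first_recognition_result.lower().strip(" .!?")
    return PRESET_CONTROL_PHRASES.get(normalized)

def contains_control_instruction(first_recognition_result: str) -> bool:
    """The branch condition of step S104: did any preset phrase match?"""
    return match_control_instruction(first_recognition_result) is not None
```

A failed lookup corresponds to the "match fails" branch, which hands the recognition result to the natural semantic processing model instead.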
Step S106, when the first voice data does not contain the control instruction, natural language processing is performed on the first recognition result based on the natural semantic processing model, and second voice data corresponding to the first voice data is generated.
The natural semantic processing model may be a model for processing user voice, or may be a model for processing user voice such as voice recognition, translation, classification, and question answering, including but not limited to: chatGPT (Chat Generative Pre-trained Transformer, chat robot model).
The above-described natural language processing covers various theories and methods for enabling efficient communication between a person and a computer in natural language. It can convert and translate human language into a language the computer can understand, answer according to the human language, and convert the answer back into language humans can understand, thereby realizing communication between humans and computers. It includes, but is not limited to: machine translation, public opinion monitoring, automatic summarization, opinion extraction, text classification, question answering, text semantic comparison, and speech recognition.
The second voice data may be voice data obtained by processing the first voice data, or may be voice data replied by the vehicle-mounted voice system according to the voice data of the user.
In an alternative embodiment, when the first voice data does not contain a control instruction, the natural semantic processing model obtains the first recognition result, extracts text information from it, searches the internet for a suitable text answer according to the text information, and converts the text answer into the second voice data.
In another alternative embodiment, when the first voice data does not include the control instruction, the natural semantic processing model obtains a meaning expressed by the user according to the first recognition result, and determines the whole communication context according to the meaning of the first voice data, the language of the user inputting the first voice data and the language speed, so as to generate the second voice data corresponding to the first voice data in combination with the whole communication context.
In yet another alternative embodiment, the second voice data is preset in advance; when the first voice data does not contain a control instruction, the natural semantic processing model obtains the first recognition result and obtains the second voice data corresponding to the first voice data according to the first recognition result.
It should be noted that the natural semantic processing model is an interaction model and can generate different models for different users: it records language data of the vehicle owner and passengers in real time and, after accumulating sufficient training data, trains on and analyzes that data to generate different training models covering the language styles of different users. The user can select a language style, and the natural semantic processing model searches the internet according to the selected style to obtain a language style that matches the user's preference. For example, passenger 1 likes a humorous style, so the model can collect different humorous language styles from the network and generate a style matching that preference; passenger 2 likes a calm and steady style, so the model can collect different calm and steady language styles from the network and generate a style matching that preference. When generating the second voice data corresponding to the first voice data, second voice data answering the first voice data can be generated in the language style the user prefers.
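The per-user language-style idea can be sketched as a simple template lookup. The style names, user table, and wording below are purely illustrative assumptions; in the patent this role is played by the trained natural semantic processing model itself.

```python
# Hypothetical style templates and per-user preferences (assumptions).
STYLE_TEMPLATES = {
    "humorous": "Ha! {answer}. And that's no joke!",
    "calm": "Certainly. {answer}.",
}

USER_STYLE_PREFERENCE = {"passenger_1": "humorous", "passenger_2": "calm"}

def render_second_voice_text(user_id: str, answer: str) -> str:
    """Wrap a model answer in the language style the user selected."""
    style = USER_STYLE_PREFERENCE.get(user_id, "calm")
    return STYLE_TEMPLATES[style].format(answer=answer)
```

A real system would condition the generation model on the style rather than post-formatting a fixed answer, but the lookup shows where the user preference enters the pipeline.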
Step S108, outputting the second voice data.
In an optional embodiment, after the second voice data is obtained, the second voice data is sent to the voice output module and played through the vehicle-mounted sound equipment or a mobile terminal. The vehicle-mounted system may first obtain the current position of each person in the vehicle and play the second voice data through the vehicle-mounted speaker near that position, or it may convert the second voice data into text displayed on the central control screen or the rear entertainment screen.
In another alternative embodiment, after the second voice data is obtained, the second voice data may be processed, and the second voice data may be converted into sound that is liked by the user, such as a cartoon character, or converted into sound that is recorded in advance by the user, and then played back through sound equipment.
In yet another alternative embodiment, after the second voice data is acquired, a 3D (3-Dimensions) video may be acquired, the 3D video projected, and the second voice data displayed by a person in the video.
Through the above steps, the first voice data can be received and voice recognition performed on it to obtain a first recognition result corresponding to the first voice data; whether the first voice data contains a control instruction is determined based on the first recognition result; and, when the first voice data does not contain a control instruction, natural language processing is performed on the first recognition result based on the natural semantic processing model to generate second voice data corresponding to the first voice data. It should be noted that the first voice data is recognized, whether it contains a control instruction is determined, and, when it does not, the first recognition result is further subjected to natural language processing. This achieves the purpose of improving voice data processing efficiency and the technical effect of responding to the first voice data efficiently and clearly, thereby solving the technical problem of low voice data processing accuracy.
It should be noted that the voice data processing system comprises five modules: a voice input module, a semantic analysis module, a function response module, an intelligent question-answering module and a voice output module. The voice input module receives the user's voice information and submits it to the semantic analysis module. The semantic analysis module processes the user's semantic information and analyzes the user's intention. If the user wants to turn a certain function of the vehicle on or off, the function response module executes the user's instruction and transmits the feedback information to the voice output module for broadcasting. If the user is simply chatting or asking questions, the intelligent question-answering module gives an intelligent answer according to the user's question, realizing man-machine conversation, and transmits the feedback information to the voice output module and the central control screen for broadcasting and display. After the user wakes up the voice system, the voice input module acquires the user's first voice data and transmits it to the semantic analysis module; the semantic analysis module acquires user intention information and determines, according to the user intention information, whether the first voice data contains a control instruction and thus whether a control instruction is to be executed.
When the first voice data contains a control instruction, the function response module executes the control instruction to obtain a corresponding execution result, generates corresponding voice according to the execution result, and outputs the voice through the voice output module. When the first voice data does not contain a control instruction, the intelligent question-answering module is woken up, corresponding second voice data is generated as an answer according to the first voice data, and the second voice data is output through the voice output module.
In the embodiment of the present invention, fig. 2 is a flowchart of an alternative voice control method according to an embodiment of the present invention. As shown in fig. 2, the method includes:
step S201, acquiring first voice data of a user;
step S202, carrying out semantic analysis on first voice data;
step S203, determining whether a control instruction is included; if yes, go to step S204; if not, executing step S205;
step S204, controlling the vehicle;
step S205, obtaining the user intention and generating second voice data;
step S206, outputting voice data.
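The steps S201 to S206 above can be sketched end to end. The keyword-based semantic analysis and the stub vehicle control below are illustrative assumptions standing in for the real modules, not the patented implementation.

```python
# Hypothetical keywords that signal a vehicle-control intent (for S203).
CONTROL_KEYWORDS = ("play", "open", "close", "turn on", "turn off")

def semantic_analysis(first_voice_text: str) -> bool:
    """Crude stand-in for S202/S203: does the text carry a control intent?"""
    text = first_voice_text.lower()
    return any(keyword in text for keyword in CONTROL_KEYWORDS)

def process_voice_data(first_voice_text: str) -> dict:
    """Run the S201-S206 flow on one utterance and report the branch taken."""
    if semantic_analysis(first_voice_text):
        # S204: control the vehicle according to the instruction.
        return {"branch": "control", "action": first_voice_text}
    # S205: obtain the user intention and generate second voice data.
    reply = f"answering: {first_voice_text}"
    # S206: output the voice data.
    return {"branch": "chat", "output": reply}
```

The two return values correspond to the yes/no branches of step S203 in the flowchart.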
Optionally, determining whether the first voice data includes the control instruction based on the first recognition result includes: carrying out semantic analysis on the first recognition result to obtain user intention information; it is determined whether the first voice data includes a control instruction based on the user intention information.
The semantic analysis may be a method of analyzing human language, or a method of studying the meaning of a thing or concept through human intuition and association, for example by means of a semantic differential scale. Semantic analysis is also a logical stage of the compiling process, in which a structurally correct source program is checked for context-related properties and for type correctness.
The user intention information may be the intrinsic intention of the user output information, or may be the purpose of the user output voice data, including but not limited to: and (5) performing voice chat interaction and controlling the vehicle.
In an alternative embodiment, the first recognition result is obtained and semantic analysis is performed on it in light of the user's surrounding context. The user's tone is obtained from the first voice data: if the user's tone is soft, the result of the semantic analysis is taken directly as the basis for determining the user intention information; if the user's tone is angry, the preceding and following sentences need to be combined and semantic analysis performed on the voice data as a whole before the user intention information is determined. It is then determined, according to the user intention information, whether the first voice data contains a control instruction.
After the first recognition result is obtained, semantic analysis needs to be performed on it so that the intention behind the user's input of the first voice data can be accurately obtained and the corresponding module can be started according to that intention. After the user wakes up the intelligent voice assistant, the recognized complete voice information is submitted to the semantic analysis module, which processes the user's voice information to acquire the user intention information. If the user intention information is an instruction to turn on a function, the function response module executes the user's control instruction and transmits the feedback information to the voice output module for broadcasting. If the user intends voice interaction, the intelligent question-answering module gives an answer according to the user's first voice data and transmits the feedback information to the voice output module for broadcasting. In an embodiment of the present invention, FIG. 3 is a flowchart of an alternative method for determining user intent information. As shown in FIG. 3, the method includes:
Step S301, a user wakes up an intelligent voice assistant and transmits complete first voice data to a semantic analysis module;
step S302, a semantic analysis module processes voice information of a user and analyzes intention of the user;
step S303, if the user intention is an instruction to turn on a function, the function response module executes the control instruction and transmits feedback information to the voice output module for broadcasting;
and step S304, if the user intends to perform voice interaction, the intelligent question-answering module gives an intelligent answer according to the first voice data, and transmits feedback information to the voice output module and the central control screen for broadcasting and displaying.
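The S301 to S304 routing can be sketched as a simple dispatcher. The functions below are illustrative stand-ins for the function response module, the intelligent question-answering module, and the voice output module, not the real implementations.

```python
def function_response_module(text: str) -> str:
    """Stand-in for S303: execute the control instruction, return feedback."""
    return f"executed: {text}"

def question_answering_module(text: str) -> str:
    """Stand-in for S304: give an intelligent answer to the question."""
    return f"answer to: {text}"

def voice_output_module(feedback: str) -> str:
    """Stand-in for broadcasting the feedback information."""
    return f"broadcast: {feedback}"

def route_intent(intent: str, text: str) -> str:
    """Dispatch to the matching module, then broadcast its feedback."""
    if intent == "control":
        feedback = function_response_module(text)
    else:
        feedback = question_answering_module(text)
    return voice_output_module(feedback)
```

Both branches converge on the voice output module, mirroring how S303 and S304 each end in broadcasting feedback.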
Optionally, determining whether the first voice data includes the control instruction based on the user intention information includes: determining that the first voice data does not contain a control instruction in response to the user intention information being a voice interaction intention; in response to the user intent information being a vehicle control intent, it is determined that the first voice data includes control instructions.
The voice interaction intention may indicate that the user wants to chat with the vehicle's intelligent system, which may include, but is not limited to, voice chat and text chat.
The vehicle control intention may indicate that the user wants to control the vehicle by voice.
In an optional embodiment, when the user intention information is determined to be a voice interaction intention, this indicates that the current user needs to obtain certain information through conversation, and it is determined that the first voice data does not contain a control instruction; when the user intention information is determined to be a vehicle control intention, this indicates that the current user needs the vehicle to execute an instruction, and it is determined that the first voice data contains a control instruction.
In another optional embodiment, when the user intention information is determined to include both a voice interaction intention and a control instruction, that is, the user both requests information and requires the vehicle to execute an instruction within the same first voice data, the information in the first voice data is split into parts, each part is handled separately, and it is determined whether each part contains a control instruction. For example, if the first voice data is "how is the weather today, and play music", it is split into "how is the weather today" and "play music". The intention corresponding to "how is the weather today" is a voice interaction intention, so that part is determined not to contain a control instruction; the intention corresponding to "play music" is a vehicle control intention, so that part is determined to contain a control instruction.
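A minimal sketch of this clause-splitting idea, assuming a simple comma/"and" split and the same keyword heuristic as above in place of a real classifier (all names and keywords are illustrative):

```python
import re

CONTROL_KEYWORDS = ("play", "open", "close", "turn on", "turn off")

def split_and_classify(utterance: str):
    """Split a compound utterance into clauses and flag each clause:
    True if it appears to carry a control instruction, False if it
    is a chat/interaction request."""
    clauses = [c.strip() for c in re.split(r",| and ", utterance) if c.strip()]
    return [(c, any(kw in c.lower() for kw in CONTROL_KEYWORDS))
            for c in clauses]

# The patent's example: one chat part, one control part.
print(split_and_classify("how is the weather today, and play music"))
# → [('how is the weather today', False), ('play music', True)]
```

Each flagged clause would then follow its own path: the chat clause goes to the question-answering module, the control clause to the vehicle controllers.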
Optionally, the method further comprises: in response to receiving feedback voice data for the second voice data, performing voice recognition on the feedback voice data to obtain a second recognition result corresponding to the feedback voice data; determining, based on the second recognition result, whether the feedback voice data contains an interaction instruction; in the case that the feedback voice data contains an interaction instruction, performing natural language processing on the second recognition result based on the natural semantic processing model to generate third voice data corresponding to the feedback voice data; and outputting the third voice data.
The feedback voice data may be voice data with which the user replies to or answers the second voice data output by the vehicle-mounted voice system, or voice data with which the user continues the dialogue with the vehicle-mounted voice system after receiving the second voice data.
The second recognition result may be the result obtained by recognizing the feedback voice data, including, but not limited to: containing the interaction instruction, or not containing the interaction instruction. The interaction instruction may be an instruction indicating that the user wants to chat with the vehicle.
The third voice data may be voice data obtained by processing the feedback voice data, or may be voice data replied by the vehicle-mounted voice system according to the voice data of the user.
In an optional embodiment, after the feedback voice data for the second voice data is received, voice recognition is performed on it to obtain the corresponding second recognition result, and it is determined whether the feedback voice data contains an interaction instruction. When it does, the natural semantic processing model processes the feedback voice data, the corresponding third voice data is generated, and the third voice data is output. If the feedback voice data does not contain an interaction instruction, it is determined whether it contains a control instruction; if it contains neither, the system waits.
In another optional embodiment, after the feedback voice data for the second voice data is received, voice recognition is performed on it to obtain the corresponding second recognition result, and it is determined whether the feedback voice data contains an interaction instruction. When it does, the natural semantic processing model processes the feedback voice data to obtain its textual meaning, and a corresponding answer is retrieved from the internet, in combination with the context of the conversation with the user, as the third voice data. If the feedback voice data does not contain an interaction instruction, it is determined whether it contains a control instruction; if it contains neither, the system waits.
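The three-way branching in these embodiments (interaction instruction → NLP reply; control instruction → execute; neither → wait) might be sketched as follows; the keyword heuristics and all names are assumptions for illustration only:

```python
CONTROL_WORDS = ("open", "close", "play")
INTERACTION_CUES = ("what", "how", "why", "when", "?")

def is_control(text: str) -> bool:
    """Toy check for a control instruction."""
    return any(w in text.lower() for w in CONTROL_WORDS)

def is_interaction(text: str) -> bool:
    """Toy check for an interaction (chat) instruction."""
    return any(w in text.lower() for w in INTERACTION_CUES)

def handle_feedback(text: str, nlp=lambda t: "Reply to: " + t):
    """Route recognized feedback voice data: NLP reply for interaction
    instructions, execution for control instructions, else wait."""
    if is_interaction(text):
        return ("third_voice", nlp(text))  # generate third voice data
    if is_control(text):
        return ("execute", text)           # hand over to vehicle control
    return ("wait", None)                  # neither: keep waiting
```

The `nlp` callable is a placeholder for the natural semantic processing model; note it is only invoked on the interaction branch, matching the prohibition described below.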
In yet another optional embodiment, answers to some simple questions are pre-recorded as text. After the feedback voice data for the second voice data is received, voice recognition is performed on it to obtain the corresponding second recognition result, and it is determined whether the feedback voice data contains an interaction instruction. When it does, the natural semantic processing model processes the feedback voice data and the processing result is matched against the pre-recorded answers; if the match succeeds, the matched answer is used as the third voice data. If the match fails, the second recognition result is searched on the internet, and the answer with the highest click rate among the search results is used as the third voice data.
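The pre-recorded-answer matching with an internet-search fallback could look like the following sketch; the answer table and the `search` stand-in are hypothetical names, not part of the patent:

```python
# Hypothetical table of pre-recorded answers to simple questions.
PRERECORDED = {
    "hello": "Hi, how can I help?",
    "thank you": "You're welcome!",
}

def answer(feedback_text: str,
           search=lambda q: "Top search result for: " + q):
    """Try an exact match against the pre-recorded answers first;
    on a miss, fall back to the search callable (a stand-in for the
    internet lookup described above)."""
    key = feedback_text.strip().lower()
    if key in PRERECORDED:
        return PRERECORDED[key]  # matched: use as third voice data
    return search(key)           # miss: highest-ranked search answer

print(answer("Hello"))      # pre-recorded path
print(answer("play jazz"))  # search fallback
```

A production system would use fuzzy or semantic matching rather than exact lowercase keys; the exact-match dictionary keeps the sketch minimal.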
It should be noted that, when the recognition result is ambiguous or its word order is incorrect, the user may be asked to repeat the request, for example by playing preset voice data such as "Sorry, I didn't catch that. Could you say it again?", after which the user can restate the request.
Optionally, in the case that the feedback voice data does not contain an interaction instruction, the method further includes: and prohibiting the natural language processing of the second recognition result by the natural semantic processing model.
In an optional embodiment, when the feedback voice data does not contain an interaction instruction, there is no need for the natural semantic processing model to perform natural language processing on the second recognition result, so the model can be bypassed and no reply is generated for the second recognition result.
Optionally, outputting the second voice data includes: and broadcasting the second voice data and/or displaying text information corresponding to the second voice data on the interactive interface.
The interactive interface may be any interface through which the user communicates with the system, including, but not limited to, the central control screen, the rear-seat entertainment screen, and the APP interface of a mobile terminal. The text information may be the content with which the vehicle replies to the user's voice data, including, but not limited to, pictures, text, video, and music.
In an optional embodiment, when the corresponding second voice data is obtained, it may be broadcast, or the corresponding text information may be displayed on the interactive interface, or both may be done simultaneously.
In addition, to accommodate hearing-impaired users, the second voice data can be converted into a corresponding sign-language video and displayed on the interactive interface in that form.
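The broadcast-and/or-display choice reduces to a small output dispatch; this sketch (with an extra flag for the sign-language video just mentioned) uses invented names and is only a shape for the idea:

```python
def output_second_voice(text: str, broadcast=True, display=True,
                        sign_language=False):
    """Return the output channels the reply is sent to: TTS broadcast,
    on-screen text, and optionally a sign-language video rendering."""
    channels = []
    if broadcast:
        channels.append(("tts", text))         # voice broadcast
    if display:
        channels.append(("screen", text))      # interactive interface
    if sign_language:
        channels.append(("sign_video", text))  # accessibility rendering
    return channels

print(output_second_voice("It is sunny today"))
```

Each `(channel, payload)` pair would be handed to the corresponding renderer; the default enables both broadcast and display, matching the "and/or" wording above.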
Optionally, in the case that the first voice data contains a control instruction, the method further comprises: controlling the vehicle to execute the operation corresponding to the control instruction to obtain an execution result, wherein the execution result is used for indicating whether the vehicle successfully executes the operation corresponding to the control instruction; and outputting an execution result.
The execution result may be a result of the vehicle executing a control instruction, including but not limited to: successful execution, failed execution, specific execution parameters.
In an optional embodiment, when the first voice data contains a control instruction, each relevant controller of the vehicle executes the corresponding operation according to the control instruction and feeds back the execution result; the result is then played by voice, or a prompt tone is used to inform the user whether the operation was executed successfully.
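A sketch of dispatching a control instruction to a controller and collecting the execution result used for the voice or chime feedback; the controller table and the result dictionary's shape are assumptions, not the patent's actual interfaces:

```python
def execute_and_report(command: str, controllers: dict) -> dict:
    """Run the controller registered for `command` and return an
    execution result indicating success or failure, plus detail."""
    handler = controllers.get(command)
    if handler is None:
        return {"command": command, "success": False,
                "detail": "no matching controller"}
    try:
        return {"command": command, "success": True, "detail": handler()}
    except Exception as exc:  # controller raised: report failure
        return {"command": command, "success": False, "detail": str(exc)}

# Toy controller registry: each entry performs the operation and
# returns a human-readable detail string for the feedback message.
controllers = {"open window": lambda: "window opened"}
print(execute_and_report("open window", controllers))
```

The returned dictionary is what the feedback step would turn into a spoken confirmation or a failure prompt tone.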
Fig. 4 is a flowchart of an alternative voice data processing method according to an embodiment of the present invention, as shown in fig. 4, the method comprising:
step S401, receiving first voice data;
step S402, recognizing the first voice data to obtain a first recognition result;
step S403, determining whether the first voice data contains a control instruction according to the first recognition result; if yes, go to step S404, if no, go to step S405;
step S404, controlling the vehicle based on the control instruction;
step S405, the natural semantic processing model performs natural language processing on the first recognition result and generates second voice data corresponding to the first voice data;
step S406, outputting the second voice data.
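The whole Fig. 4 flow can be condensed into one function; the collaborators are passed in as callables so the sketch stays self-contained (all names are illustrative placeholders for the modules described above):

```python
def process_voice(first_voice, asr, contains_control, control_vehicle, nlp):
    """End-to-end sketch of Fig. 4: recognize (S402), branch on
    whether a control instruction is present (S403), then control
    the vehicle (S404) or generate and output second voice data
    (S405/S406)."""
    result = asr(first_voice)                        # step S402
    if contains_control(result):                     # step S403
        return ("control", control_vehicle(result))  # step S404
    return ("voice", nlp(result))                    # steps S405/S406

# Toy collaborators: treat the "audio" as already-transcribed text.
asr = lambda audio: audio
contains_control = lambda text: "play" in text
control_vehicle = lambda text: "executed: " + text
nlp = lambda text: "reply: " + text

print(process_voice("play music", asr, contains_control,
                    control_vehicle, nlp))
print(process_voice("tell me a joke", asr, contains_control,
                    control_vehicle, nlp))
```

Injecting the collaborators also mirrors the module decomposition of Example 2 below each module can be swapped or tested independently.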
Example 2
According to another aspect of the embodiments of the present invention, a voice data processing device is further provided. The device may execute the voice data processing method of the above embodiment; the specific implementation and preferred application scenarios are the same as those of the above embodiment and are not repeated here.
Fig. 5 is a schematic diagram of a voice data processing device according to an embodiment of the present invention. As shown in Fig. 5, the device includes the following parts: a first recognition module 50, a first determination module 52, a first processing module 54, and a first output module 56.
The first recognition module 50 is configured to perform voice recognition on the first voice data in response to the received first voice data, so as to obtain a first recognition result corresponding to the first voice data;
a first determining module 52, configured to determine, based on the first recognition result, whether the first voice data includes a control instruction;
the first processing module 54 is configured to perform natural language processing on the first recognition result based on the natural semantic processing model, and generate second voice data corresponding to the first voice data, where the first voice data does not include a control instruction;
the first output module 56 is configured to output the second voice data.
Optionally, the first determining module includes: the first analysis unit is used for carrying out semantic analysis on the first recognition result to obtain user intention information; and a first determining unit for determining whether the first voice data contains a control instruction based on the user intention information.
Optionally, the first determining unit includes: a first determining subunit configured to determine that the first voice data does not include a control instruction in response to the user intention information being a voice interaction intention; and a second determination subunit configured to determine that the first voice data contains a control instruction in response to the user intention information being a vehicle control intention.
Optionally, the apparatus further comprises: the second recognition module is used for responding to the feedback voice data of the received second voice data, carrying out voice recognition on the feedback voice data and obtaining a second recognition result corresponding to the feedback voice data; the second determining module is used for determining whether the feedback voice data contains an interaction instruction or not based on a second recognition result; the second processing module is used for carrying out natural language processing on the second recognition result based on the natural semantic processing model under the condition that the feedback voice data contains an interactive instruction, and generating third voice data corresponding to the feedback voice data; and the second output module is used for outputting third voice data.
Optionally, in the case that the feedback voice data does not contain an interaction instruction, the apparatus further includes: and the third processing module is used for prohibiting the natural language processing of the second recognition result by the natural semantic processing model.
Optionally, the first output module includes: the first display unit is used for broadcasting the second voice data and/or displaying text information corresponding to the second voice data on the interactive interface.
Optionally, in the case that the first voice data contains a control instruction, the apparatus further includes: the control module is used for controlling the vehicle to execute the operation corresponding to the control instruction to obtain an execution result, wherein the execution result is used for indicating whether the vehicle successfully executes the operation corresponding to the control instruction; and the third output module is used for outputting an execution result.
Example 3
According to another aspect of the embodiments of the present invention, there is also provided a nonvolatile storage medium including a stored program, wherein, when the program runs, it controls a processor of the device in which the nonvolatile storage medium is located to execute the voice data processing method of the above embodiments.
Example 4
According to another aspect of an embodiment of the present invention, there is also provided a vehicle including: one or more processors; a storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the voice data processing method of the above embodiment.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, each embodiment is described with its own emphasis; for any part not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division of units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the couplings, direct couplings, or communication connections shown or discussed may be implemented through certain interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The foregoing storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (10)

1. A method of processing speech data, comprising:
responding to received first voice data, and performing voice recognition on the first voice data to obtain a first recognition result corresponding to the first voice data;
determining whether the first voice data contains a control instruction based on the first recognition result;
performing natural language processing on the first recognition result based on a natural semantic processing model under the condition that the first voice data does not contain the control instruction, and generating second voice data corresponding to the first voice data;
and outputting the second voice data.
2. The voice data processing method according to claim 1, wherein determining whether the first voice data contains a control instruction based on the first recognition result includes:
carrying out semantic analysis on the first recognition result to obtain user intention information;
determining whether the first voice data includes the control instruction based on the user intention information.
3. The voice data processing method according to claim 2, wherein determining whether the first voice data contains the control instruction based on the user intention information comprises:
determining that the first voice data does not contain the control instruction in response to the user intention information being a voice interaction intention;
and determining that the first voice data contains the control instruction in response to the user intention information being a vehicle control intention.
4. The voice data processing method of claim 1, wherein the method further comprises:
responding to the received feedback voice data of the second voice data, and performing voice recognition on the feedback voice data to obtain a second recognition result corresponding to the feedback voice data;
determining whether the feedback voice data contains an interaction instruction based on the second recognition result;
performing natural language processing on the second recognition result based on the natural semantic processing model under the condition that the feedback voice data contains the interaction instruction, and generating third voice data corresponding to the feedback voice data;
and outputting the third voice data.
5. The method according to claim 4, wherein in case the feedback voice data does not contain the interactive instruction, the method further comprises:
and prohibiting the natural semantic processing model from carrying out natural language processing on the second recognition result.
6. The voice data processing method according to claim 1, wherein outputting the second voice data comprises:
and broadcasting the second voice data and/or displaying text information corresponding to the second voice data on an interactive interface.
7. The voice data processing method according to claim 1, wherein in the case where the first voice data contains the control instruction, the method further comprises:
controlling a vehicle to execute an operation corresponding to the control instruction to obtain an execution result, wherein the execution result is used for indicating whether the vehicle successfully executes the operation corresponding to the control instruction;
and outputting the execution result.
8. A voice data processing apparatus, comprising:
the recognition module is used for responding to the received first voice data, carrying out voice recognition on the first voice data and obtaining a first recognition result corresponding to the first voice data;
the determining module is used for determining whether the first voice data contains a control instruction or not based on the first recognition result;
the processing module is used for carrying out natural language processing on the first recognition result based on a natural semantic processing model under the condition that the first voice data does not contain the control instruction, and generating second voice data corresponding to the first voice data;
and the output module is used for outputting the second voice data.
9. A non-volatile storage medium, characterized in that the non-volatile storage medium comprises a stored program, wherein, when the program runs, it controls a processor of the device in which the non-volatile storage medium is located to execute the voice data processing method according to any one of claims 1 to 7.
10. A vehicle, characterized by comprising:
one or more processors;
a storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to perform the speech data processing method of any of claims 1 to 7.
CN202310388680.8A 2023-04-12 2023-04-12 Voice data processing method and device, nonvolatile storage medium and vehicle Pending CN116403583A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310388680.8A CN116403583A (en) 2023-04-12 2023-04-12 Voice data processing method and device, nonvolatile storage medium and vehicle


Publications (1)

Publication Number Publication Date
CN116403583A true CN116403583A (en) 2023-07-07

Family

ID=87012028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310388680.8A Pending CN116403583A (en) 2023-04-12 2023-04-12 Voice data processing method and device, nonvolatile storage medium and vehicle

Country Status (1)

Country Link
CN (1) CN116403583A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664078A (en) * 2023-07-24 2023-08-29 杭州所思互连科技有限公司 RPA object identification method based on semantic feature vector
CN116664078B (en) * 2023-07-24 2023-10-10 杭州所思互连科技有限公司 RPA object identification method based on semantic feature vector


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination