CN111767021A - Voice interaction method, vehicle, server, system and storage medium - Google Patents


Info

Publication number
CN111767021A
Authority
CN
China
Prior art keywords
vehicle
voice
server
context information
voice request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010596817.5A
Other languages
Chinese (zh)
Inventor
孙仿逊
胡梓垣
翁志伟
Current Assignee
Guangzhou Xiaopeng Motors Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Internet of Vehicle Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Internet of Vehicle Technology Co Ltd
Priority to CN202010596817.5A
Publication of CN111767021A
Priority to PCT/CN2020/135150 (WO2022001013A1)
Priority to CN202110432528.6A (CN113031905A)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to the technical field of voice, and in particular to a voice interaction method, a vehicle, a server, a system and a storage medium, wherein the method comprises the following steps: the vehicle receives a voice request of a user and sends the voice request and the context information of the current vehicle-mounted system graphical user interface to the server; the server completes natural language understanding processing of the voice request according to the context information; the server generates a vehicle-executable instruction from the natural language understanding result and sends the instruction to the vehicle; the vehicle receives and executes the instruction, and feeds the execution result back to the user by voice. The server can make full use of the context information to complete natural language understanding during voice interaction, and because information of more dimensions is added, the user can operate any content on the graphical user interface in the vehicle by voice, which improves the interaction quality of the human-machine interaction system.

Description

Voice interaction method, vehicle, server, system and storage medium
Technical Field
The present invention relates to the field of voice technology, and in particular, to a voice interaction method, a vehicle, a server, a system, and a storage medium.
Background
With the development of automobile intelligence and voice technology, voice is being applied more and more widely in automobiles. While driving, the user can control the vehicle or the vehicle-mounted system in a contact-free manner, which enhances the user experience while ensuring driving safety.
Automobile intelligence has brought more powerful head-unit chips and graphics chips; the computing power of the new generation of head-unit chips and the performance of their graphics chips make it possible to realize richer interfaces and more engaging animations on a vehicle-mounted system, much as on a mobile phone. Voice is often used on a vehicle through a separate voice assistant: after the user's voice request is received, feedback is given by the server. This usage mode is completely independent of the vehicle-mounted system's interface, and because only the voice signal is used and information of more dimensions is lacking, the interaction quality of such a human-machine interaction system is unsatisfactory.
Disclosure of Invention
In view of the above, embodiments of the present invention are proposed in order to provide a voice interaction method, a vehicle, a server, a system and a storage medium that overcome or at least partially solve the above problems.
In order to solve the above problem, an embodiment of the present invention discloses a voice interaction method, which is applied to a voice interaction system including a vehicle and a server capable of communicating with the vehicle, and is characterized by comprising:
the vehicle receives a voice request of a user and sends the voice request and the context information of the current vehicle-mounted system graphical user interface to the server;
the server finishes the natural language understanding processing of the voice request according to the context information;
the server generates a vehicle-executable instruction from the natural language understanding result and sends the instruction to the vehicle;
the vehicle receives and executes the instruction, and simultaneously feeds back an execution result to the user through voice.
Further, the context information includes the name and type of the operable control in the current vehicle-mounted system graphical user interface, the action supported by the operable control, the value range of the action, and the current state of the operable control.
Further, the server completes natural language understanding processing of the voice request according to the context information, and the processing comprises the following steps:
creating a scene semantic space according to the context information;
performing semantic understanding on the voice request and outputting a semantic understanding result;
in the scene semantic space, retrieving, recalling, sorting and matching operable controls by using the semantic understanding result;
and outputting the operation of the operable control responding to the voice request as a natural language understanding processing result.
Further, creating a scene semantic space according to the context information, comprising:
receiving context information sent by a vehicle;
loading and analyzing scene elements included in the context information;
and generating a scene semantic document according to the scene elements.
Further, performing semantic understanding on the voice request and outputting a semantic understanding result, including:
performing text preprocessing and text normalization processing on a text in the voice request, and then extracting a sentence backbone;
and understanding the intention of the voice request of the user according to the sentence backbone and outputting a semantic understanding result.
Further, understanding the intention of the user voice request according to the sentence backbone and outputting a semantic understanding result, comprising:
and determining a preliminary result for understanding the intention of the user voice request according to the sentence backbone, correcting the preliminary result by using a negative word in the sentence backbone, and outputting a corrected semantic understanding result.
Further, in the scene semantic space, the operable controls are retrieved, recalled, sorted and matched by using the semantic understanding result, which includes:
extracting a text in the voice request to retrieve in a scene semantic document;
recalling the retrieval result by using a preset recall strategy, and scoring the matching degree;
sorting the scored retrieval results according to a preset sorting strategy;
outputting a matching result according to the sorting result; wherein the matching result comprises the operation intention of the operable control, the name of the operable control and the execution action of the operable control.
Further, the text in the voice request includes all or part of the text of the voice request, and extracting the text in the voice request and retrieving it in the scene semantic document includes any one of the following:
extracting entity words in the voice request to search in the scene semantic document;
extracting texts including entity words and action words in the voice request and searching in the scene semantic documents;
or,
all text in the voice request is extracted and retrieved in the scene semantic document.
Further, recalling the retrieval result by using a preset recall strategy, comprising:
and according to the retrieval result, recalling by using one or more preset recall strategies, including: omitting text based on a preset list of ignorable words; requiring core words to be hit; setting a threshold for recall; and verifying action words or negative intentions in the text.
The embodiment of the invention also discloses a vehicle, which comprises: a processor, a memory and a computer program stored on the memory and capable of running on the processor, the computer program, when executed by the processor, implementing the steps of the voice interaction method described above.
The embodiment of the invention also discloses a server, which comprises: a processor, a memory and a computer program stored on the memory and capable of running on the processor, the computer program, when executed by the processor, implementing the steps of the voice interaction method described above.
The embodiment of the invention also discloses a voice interaction system, which comprises a vehicle and a server capable of communicating with the vehicle, wherein the vehicle is provided with a request receiving module, an information sending module, an instruction receiving module and an execution feedback module, and the server is provided with a natural language understanding module and an instruction sending module;
the request receiving module is used for receiving a voice request of a user;
the information sending module is used for sending the voice request and the context information of the current vehicle-mounted system graphical user interface to the server;
the natural language understanding module is used for finishing natural language understanding processing of the voice request according to the context information;
the instruction sending module is used for generating a vehicle-executable instruction from the natural language understanding result and sending the instruction to the vehicle;
and the instruction receiving module is used for receiving and executing the instruction, and simultaneously feeding back an execution result to the user through voice by the execution feedback module.
The embodiment of the invention also discloses a computer-readable storage medium, which is characterized in that a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to realize the voice interaction method.
The embodiment of the invention has the following advantages:
context information of the current vehicle-mounted system's Graphical User Interface (GUI) is sent to the server, so that the server can make full use of the context information to complete natural language understanding during voice interaction; because information of more dimensions is added, the user can operate by voice any content seen on the GUI in the vehicle, which improves the interaction quality of the human-computer interaction system.
Drawings
FIG. 1 is a flow chart of the steps of a voice interaction method embodiment of the present invention;
fig. 2 is a schematic diagram of a navigation broadcast graphical user interface of the vehicle system of the present invention;
FIG. 3 is a flow chart of the steps of natural language understanding in a method of voice interaction of the present invention;
FIG. 4 is a code diagram of context information in one embodiment of a voice interaction method of the present invention;
FIG. 5 is a block diagram of a voice interaction system according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a voice interaction method of the present invention is shown, which may specifically include the following steps:
and S1, the vehicle receives the voice request of the user and sends the voice request and the context information of the current vehicle-mounted system graphical user interface to the server.
S2, the server completes the natural language understanding process of the voice request according to the context information.
And S3, the server uses the natural language to understand the processing result, generates the vehicle executable instruction and sends the vehicle executable instruction to the vehicle.
And S4, the vehicle receives and executes the instruction, and simultaneously feeds back the execution result to the user through voice.
The voice interaction method is applied to a voice interaction system which comprises a vehicle and a server capable of communicating with the vehicle. Specifically, a communication module is arranged on the vehicle, and can communicate with a server based on an operator network including 3G, 4G or 5G or other communication connection modes to complete data interaction.
In a vehicle, the display areas may include an instrument panel, an on-vehicle central control screen, and a HUD (Head Up Display), which can be implemented on the windshield of the vehicle. The vehicle-mounted system running on the vehicle uses a Graphical User Interface (GUI); its display areas include a plurality of UI elements, and different display areas may display different UI elements or the same UI element. UI elements may include card objects, application icons or interfaces, folder icons, multimedia file icons, and controls that can be operated interactively.
In the step S1, the context information includes names and types of the operable controls in the current vehicle-mounted system graphical user interface, actions supported by the operable controls, value ranges of the actions, and current states of the operable controls.
Taking fig. 2 as an example, while viewing fig. 2 the user may directly issue voice requests such as "set the navigation broadcast volume to 18" or "turn off the system alert tone". Fig. 2 contains three operable controls: the first is a Slide type control named "navigation broadcast volume", the second is a SelectTab type control named "vehicle alert tone", and the third is a Switch type control named "system alert tone". Each control has supported actions, value ranges for those actions, and a current state.
For example, the control named "navigation broadcast volume" allows dragging to adjust the volume value; that is, the supported action is Set, the value range of the action is 0 to 30, and the current state is that the volume is set to 16.
Continuing with the control named "vehicle alert tone": this control can be set to "small", "medium" or "large"; that is, the supported action is Set, the value range of the action is "small", "medium" and "large", and the current state is that the vehicle alert tone is set to small.
Taking the control named "system alert tone" as an example: this control can be turned on and off; that is, the supported actions are Turn On and Turn Off, and the current state is that the system alert tone is turned on.
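As an illustrative sketch only (the patent's actual Json schema appears in its Fig. 4, which is not reproduced here, so every field name below is hypothetical), the context information describing the three controls above could be serialized by the vehicle and parsed by the server like this:

```python
import json

# Hypothetical context information for the three controls of Fig. 2.
# Field names ("label", "type", "actions", "range", "state") are illustrative.
context_info = {
    "controls": [
        {"label": "navigation broadcast volume", "type": "Slide",
         "actions": ["Set"], "range": [0, 30], "state": 16},
        {"label": "vehicle alert tone", "type": "SelectTab",
         "actions": ["Set"], "range": ["small", "medium", "large"],
         "state": "small"},
        {"label": "system alert tone", "type": "Switch",
         "actions": ["TurnOn", "TurnOff"], "state": "on"},
    ]
}

payload = json.dumps(context_info)   # what the vehicle would send (step S1)
parsed = json.loads(payload)         # what the server would load (steps S201-S202)
for control in parsed["controls"]:
    print(control["label"], control["type"])
```

The round trip through `json.dumps`/`json.loads` mirrors the vehicle-to-server transfer; any real schema would of course follow the patent's Fig. 4.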
Specifically, as shown in fig. 3, the step of S2 includes:
s20, creating scene semantic space according to the context information;
s21, carrying out semantic understanding on the voice request and outputting a semantic understanding result;
s22, in the scene semantic space, the operable control is searched, recalled, sorted and matched by the semantic understanding result;
s23, outputting an operation of the operable control in response to the voice request as a result of the natural language understanding process.
The scene semantic space is a machine-understandable semantic space created from the context information of the GUI. In step S20, based on fig. 2, the server creates a scene semantic space from the context information; an example is shown in table 1 below:
Control name                  Type       Supported actions    Value range             Current state
navigation broadcast volume   Slide      Set                  0 to 30                 16
vehicle alert tone            SelectTab  Set                  small / medium / large  small
system alert tone             Switch     Turn On, Turn Off    on / off                on
TABLE 1
Specifically, the step of S20 includes:
s201, receiving context information sent by a vehicle;
s202, loading and analyzing scene elements included in the context information;
and S203, generating a scene semantic document according to the scene elements.
In step S201, the vehicle sends the context information to the server in the form of a Json file over a communication network including, but not limited to, an operator network. Fig. 4 is an example of such a Json file; in other embodiments, other file formats may be used to send the context information, which is not limited here. In fig. 4, label represents the name of an operable control, and type represents the type of the operable control.
In the step of S202, the server loads and parses the Json file to obtain scene elements recorded in the file, where the scene elements include a plurality of operable controls and other UI elements.
In step S203, the server generates a scene semantic document in which a scene semantic space is described, based on the scene element.
Further, the step of S21 includes:
s211, performing text preprocessing and text normalization processing on the text in the voice request, and then extracting a sentence backbone;
s212, understanding the intention of the user voice request according to the sentence backbone and outputting a semantic understanding result.
In step S211, text preprocessing is performed on the text of the voice request, including Chinese word segmentation and removal of modal particles (such as the Chinese "a" and "ba"). Text normalization includes normalization of numbers and entities; for example, "one point five seconds" becomes "1.5 seconds" after normalization, and "large screen brightness" becomes "central control brightness". Extracting the sentence backbone means extracting the entity words, action words and numerical values in the sentence; the extracted backbone is mainly used for subsequent retrieval.
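A minimal sketch of this preprocessing step, assuming toy English word lists (a real system would use a Chinese word segmenter and much richer normalization rules; every list and rule below is a hypothetical stand-in):

```python
import re

FILLER_WORDS = {"please", "ah", "ba"}                        # hypothetical filler list
ENTITY_ALIASES = {"large screen": "central control screen"}  # entity normalization

def normalize(text: str) -> str:
    """Toy text normalization: entity aliases plus one number-word pattern."""
    for alias, canonical in ENTITY_ALIASES.items():
        text = text.replace(alias, canonical)
    # "one point five seconds" -> "1.5 seconds" (stand-in for real number rules)
    return re.sub(r"one point five", "1.5", text)

def extract_backbone(text: str) -> list:
    """Drop filler words; a real system would keep entity, action and value words."""
    return [w for w in normalize(text).split() if w not in FILLER_WORDS]

print(normalize("wait one point five seconds"))
print(extract_backbone("please turn off the system alert tone ba"))
```

The surviving words are the raw material for the retrieval step described later.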
In the step of S212, the intention of the user can be understood by using the extracted action words in the sentence skeleton, which facilitates subsequent verification of the operable control.
Further, the step of S212 includes: determining a preliminary result of understanding the intention of the user's voice request according to the sentence backbone, correcting the preliminary result by using any negative word in the sentence backbone, and outputting the corrected semantic understanding result. For example, if the text of the user's voice request is "do not open system alert tone", a preliminary result containing the action word "open" and the entity word "system alert tone" is obtained. However, taking "open system alert tone" as the semantic understanding result would be the opposite of the user's real meaning. Therefore, after the preliminary result is obtained, the sentence backbone is checked for negative words; here the text contains "do not open", which is extracted to correct the preliminary result, that is, "do not open" is understood as "close". The corrected semantic understanding result is "close system alert tone".
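The negation correction just described can be sketched as follows (the negation lexicon and the antonym table are hypothetical stand-ins for whatever a production system would use):

```python
NEGATION_WORDS = {"not", "don't", "no"}          # hypothetical negation lexicon
ANTONYM = {"open": "close", "close": "open"}     # action-word opposites

def correct_intent(backbone, preliminary_action):
    """Flip the preliminary action word if the sentence backbone is negated."""
    if any(word in NEGATION_WORDS for word in backbone):
        return ANTONYM.get(preliminary_action, preliminary_action)
    return preliminary_action

# "do not open system alert tone": preliminary action "open" becomes "close".
backbone = ["do", "not", "open", "system", "alert", "tone"]
print(correct_intent(backbone, "open"))   # prints "close"
```

Keeping correction separate from preliminary intent extraction, as the patent does, means the same negation check can serve any action word.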
In the step of S22, the method specifically includes:
s221, extracting the text in the voice request to retrieve in the scene semantic document;
s222, recalling the retrieval result by using a preset recall strategy, and scoring the matching degree;
s223, sorting the scored retrieval results according to a preset sorting strategy;
s224, outputting a matching result according to the sorting result; wherein the matching result comprises the operation intention of the operable control, the name of the operable control and the execution action of the operable control.
In step S221, a vocabulary of segmented words is created in advance for scenes such as navigation and music, and retrieval is then performed based on this vocabulary. Various retrieval strategies can be used depending on how the text is utilized. That is, the text in the voice request includes all or part of the text of the voice request, and the step S221 includes any one of the following:
extracting entity words in the voice request to search in the scene semantic document;
extracting texts including entity words and action words in the voice request and searching in the scene semantic documents;
or,
all text in the voice request is extracted and retrieved in the scene semantic document.
Which of the three retrieval strategies listed above is used (entity words only, a combination of entity words and action words, or all of the text in the voice request) can be determined according to specific needs. Retrieval can be implemented using, for example, an inverted index and search based on words and pinyin; the specific implementation is not limited here.
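As a hedged sketch of one such implementation, the inverted index below maps words to the controls whose scene semantic document contains them (word-based only; the pinyin-based search the text mentions is omitted, and the documents are the Fig. 2 control names):

```python
from collections import defaultdict

# One scene semantic "document" per operable control, keyed by control name.
docs = {
    "navigation broadcast volume": ["navigation", "broadcast", "volume"],
    "vehicle alert tone": ["vehicle", "alert", "tone"],
    "system alert tone": ["system", "alert", "tone"],
}

# Build the inverted index: word -> names of controls whose document contains it.
index = defaultdict(set)
for name, words in docs.items():
    for word in words:
        index[word].add(name)

def retrieve(query_words):
    """Return candidate controls with the number of query words each one hits."""
    hits = defaultdict(int)
    for word in query_words:
        for name in index.get(word, ()):
            hits[name] += 1
    return dict(hits)

print(retrieve(["system", "alert", "tone"]))
```

Note that a query like "system alert tone" also partially hits "vehicle alert tone"; the recall, scoring and sorting steps below exist precisely to resolve such ambiguity.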
In step S222, a preset recall policy is used to recall the search result, where the preset recall policy includes multiple types, specifically as follows:
recall strategy 1: text omission based on preset list of negligible words
Example 1: Label = <rock>, and the text Query of the voice request = "switch to rock mode"; the word "mode" can be ignored in the current scenario.
Recall policy 2: the core word must be hit
Example 2: Label = <open map setting>, and the text Query of the voice request = "open system setting"; the core word "map" must be hit in the current scene, otherwise a falsely recalled result is produced.
Recall policy 3: setting thresholds for recalls
Example 3: set a threshold X%; a result is recalled when it reaches the threshold.
Recall policy 4: checking action words or negative intentions in text
Example 4: Label = <connect the first Bluetooth>, and the text Query of the voice request = "disconnect the first Bluetooth"; without checking action words or negative intentions, this control could be recalled by mistake.
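Recall strategies 1, 2 and 4 above might be combined into a single candidate filter like this (all word lists are hypothetical, and a real system would derive core words and negation markers per scene rather than hard-code them):

```python
IGNORABLE_WORDS = {"mode"}               # strategy 1: ignorable-word list
CORE_WORDS = {"map"}                     # strategy 2: words that must be hit
NEGATIVE_WORDS = {"disconnect", "not"}   # strategy 4: negative-intent markers

def recall_ok(label_words, query_words):
    """Decide whether a retrieved control (label_words) may be recalled."""
    query = set(query_words) - IGNORABLE_WORDS            # strategy 1
    core = CORE_WORDS & set(label_words)
    if core and not core & query:                         # strategy 2
        return False                                      # core word missed
    if NEGATIVE_WORDS & query and not NEGATIVE_WORDS & set(label_words):
        return False                                      # strategy 4 mismatch
    return True

# "switch to rock mode" recalls the <rock> label once "mode" is ignored.
print(recall_ok({"rock"}, {"switch", "to", "rock", "mode"}))                 # True
# "open system setting" must not recall <open map setting>: "map" is missed.
print(recall_ok({"open", "map", "setting"}, {"open", "system", "setting"}))  # False
# "disconnect the first bluetooth" must not recall <connect ... bluetooth>.
print(recall_ok({"connect", "first", "bluetooth"},
                {"disconnect", "first", "bluetooth"}))                       # False
```

Strategy 3 (the recall threshold) would be applied to the matching-degree scores described next, so it is left out of this filter.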
In the step S222, scoring may adopt multiple scoring modes such as Query matching degree or document matching degree.
The Query matching degree is: matching length / Query backbone length (in words), where the matching length is the length (in words) of the match between the Query and the document.
The document matching degree is: matching length / document length (in words). A specific matching policy, such as document matching, may be used for specific controls, for example the Point of Interest (POI) lists frequently found in navigation.
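The two matching degrees can be written out directly; the sketch below counts whole-word matches, consistent with the word-length definitions above:

```python
def matching_length(query_backbone, doc_words):
    """Number of backbone words that also appear in the document."""
    return sum(1 for word in query_backbone if word in doc_words)

def query_match_degree(query_backbone, doc_words):
    """matching length / Query backbone length (in words)."""
    return matching_length(query_backbone, doc_words) / len(query_backbone)

def doc_match_degree(query_backbone, doc_words):
    """matching length / document length (in words)."""
    return matching_length(query_backbone, doc_words) / len(doc_words)

q = ["system", "alert", "tone"]
d = ["system", "alert", "tone", "switch"]
print(query_match_degree(q, d))   # 3/3 = 1.0
print(doc_match_degree(q, d))     # 3/4 = 0.75
```

As the example shows, the two degrees diverge whenever the document is longer than the matched portion, which is why both are available as scoring modes.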
In the step of S223, the preset ordering policy may include:
Strategy one: sort the scene semantic documents by the highest score among all retrieval strategies;
Strategy two: sort the scene semantic documents by the sum of the scores of all retrieval strategies;
Strategy three: sort the scene semantic documents by the weighted sum of the scores of all retrieval strategies.
The score is calculated as follows: score = α × document matching degree + (1 - α) × Query matching degree, where α represents a preset score weight parameter.
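A one-line sketch of that weighted combination (the α values used below are arbitrary illustrative weights, not values taken from the patent):

```python
def combined_score(doc_match, query_match, alpha):
    """score = alpha * document matching degree + (1 - alpha) * Query matching degree"""
    return alpha * doc_match + (1 - alpha) * query_match

# Document matching degree 0.75, Query matching degree 1.0:
print(combined_score(0.75, 1.0, 0.6))   # = 0.6*0.75 + 0.4*1.0
print(combined_score(0.75, 1.0, 1.0))   # alpha = 1: pure document matching, 0.75
print(combined_score(0.75, 1.0, 0.0))   # alpha = 0: pure Query matching, 1.0
```

α therefore interpolates between the two matching modes, letting one formula cover both extremes as well as any blend in between.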
Namely, the sorting strategy is selected according to the requirement, and then the corresponding sorting result is obtained.
In step S224, matching includes both exact matching and fuzzy matching. Exact matching refers to completely matching a scene semantic document: if there is an action word in the voice request Query and it conforms to the control's operation, this is regarded as a complete match. Fuzzy matching refers to selecting the document with the highest score (if several results in the sorted list share the same score, all of them are selected) and, when action words are present, using them to judge whether the selected control is correct. The matching result includes the operation intention of the operable control, the name of the operable control, and the execution action of the operable control. For example, if the voice request Query "set the gesture direction to inward" is issued for an operable control named "gesture touch rotation direction" in the displayed GUI, step S22 produces a matching result in which the operation intention is "set the gesture direction to inward", the name of the operable control is "gesture touch rotation direction", and the execution action is "set to inward".
In step S23, the operation of the operable control responding to the voice request is: execute the action "set to inward" on the operable control named "gesture touch rotation direction"; this operation is output as the natural language understanding processing result.
In step S3, the server generates and transmits instructions executable by the vehicle to the vehicle using the natural language understanding processing result output in step S23.
In step S4, the vehicle receives and executes the command, and after the command is executed, the current state of the control named "gesture touch rotation direction" is "inward", and the execution result can be fed back To the user by voice in a TTS (Text-To-Speech) manner.
From the above, the user can operate whatever is visible on the graphical user interface of the vehicle-mounted system, with no need for physical operations such as touching the screen or pressing keys. Full voice operation while driving lets the user's sight and attention stay completely on driving, so driving safety can be fully ensured. Moreover, because the context information of the current vehicle-mounted system graphical user interface is sent to the server, the server can make full use of the context information to complete natural language understanding during voice interaction; with information of more dimensions added, the user can operate any content on the graphical user interface in the vehicle by voice, improving the interaction quality of the human-computer interaction system.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 5, a block diagram illustrating a structure of an embodiment of a voice interaction system of the present invention may specifically include: the system comprises a vehicle and a server capable of communicating with the vehicle, wherein the vehicle is provided with a request receiving module, an information sending module, an instruction receiving module and an execution feedback module, and the server is provided with a natural language understanding module and an instruction sending module.
The request receiving module is used for receiving a voice request of a user;
the information sending module is used for sending the voice request and the context information of the current vehicle-mounted system graphical user interface to the server;
the natural language understanding module is used for finishing natural language understanding processing of the voice request according to the context information;
the instruction sending module is used for generating a vehicle-executable instruction from the natural language understanding result and sending the instruction to the vehicle;
and the instruction receiving module is used for receiving and executing the instruction, and simultaneously feeding back an execution result to the user through voice by the execution feedback module.
In the voice interaction system, the context information comprises the name and the type of an operable control in the graphical user interface of the current vehicle-mounted system, an action supported by the operable control, a value range of the action and the current state of the operable control.
Specifically, the natural language understanding module includes:
the creating submodule is used for creating a scene semantic space according to the context information;
the understanding submodule is used for carrying out semantic understanding on the voice request and outputting a semantic understanding result;
the processing submodule is used for performing retrieval, recall, ranking and matching of the operable controls in the scene semantic space using the semantic understanding result;
and the output submodule is used for outputting the operation of the operable control responding to the voice request as a natural language understanding processing result.
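The four submodules above can be pictured as a toy pipeline. Everything below (the function names, the token-overlap scoring) is an illustrative sketch under assumed behavior, not the patent's actual algorithm:

```python
def build_scene_semantic_space(context):
    # Creating submodule: one semantic document per operable control.
    return [{"name": c["name"], "actions": c["supported_actions"]}
            for c in context["controls"]]

def understand(text):
    # Understanding submodule (toy): lowercase tokens stand in for the
    # real semantic understanding result.
    return text.lower().split()

def retrieve_recall_rank_match(tokens, space):
    # Processing submodule (toy): score controls by token overlap with
    # their names, rank best-first, and keep the top match.
    scored = [(sum(t in doc["name"] for t in tokens), doc) for doc in space]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best_score, best = scored[0]
    return best if best_score > 0 else None

def natural_language_understanding(text, context):
    # Output submodule: the matched control's operation is the NLU result.
    space = build_scene_semantic_space(context)
    return retrieve_recall_rank_match(understand(text), space)

context = {"controls": [
    {"name": "driver window", "supported_actions": ["open", "close"]},
    {"name": "sunroof", "supported_actions": ["open", "close"]},
]}
matched = natural_language_understanding("open the sunroof", context)
```

Returning `None` when nothing scores above zero corresponds to the case where the voice request does not target any control on the current screen.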
Wherein creating the sub-module comprises:
the receiving unit is used for receiving the context information sent by the vehicle;
a loading unit for loading and parsing the scene elements included in the context information;
and the generating unit is used for generating a scene semantic document according to the scene elements.
Wherein the understanding submodule comprises:
the processing unit is used for performing text preprocessing and text normalization processing on the text in the voice request and then extracting a sentence backbone;
and the output unit is used for understanding the intention of the voice request of the user according to the sentence backbone and outputting a semantic understanding result.
Further, the output unit is further configured to determine a preliminary result of understanding the intention of the user's voice request according to the sentence backbone, correct the preliminary result using any negative word in the sentence backbone, and output the corrected semantic understanding result.
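The backbone extraction and negation correction described above might look roughly as follows. The stopword list, the negative-word set, and the "flip the intent" rule are all illustrative assumptions, not the patent's actual logic:

```python
import re

STOPWORDS = {"please", "could", "you", "the", "a", "an", "for", "me"}

def extract_backbone(text):
    # Toy text preprocessing + normalization: lowercase, strip
    # punctuation, drop filler words, keep the sentence backbone.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def understand_intent(backbone):
    # Toy intent understanding with negation correction: a preliminary
    # "execute" intent is flipped if the backbone contains a negative
    # word, matching the output unit's correction step above.
    negatives = {"don't", "not", "no", "never"}
    intent = {"action": "execute", "words": backbone}
    if any(t in negatives for t in backbone):
        intent["action"] = "cancel"   # corrected by the negative word
        intent["words"] = [t for t in backbone if t not in negatives]
    return intent

backbone = extract_backbone("Please don't open the sunroof")
intent = understand_intent(backbone)
```

Without the correction step, "don't open the sunroof" would match the same control as "open the sunroof" and trigger exactly the action the user refused.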
Wherein, the processing submodule includes:
the retrieval unit is used for extracting the text in the voice request to retrieve in the scene semantic document;
the recall unit is used for recalling the retrieval results using a preset recall strategy and then scoring their matching degree;
the sorting unit is used for sorting the scored retrieval results according to a preset sorting strategy;
the matching unit is used for outputting a matching result according to the sorting result; wherein the matching result comprises the operation intention of the operable control, the name of the operable control and the execution action of the operable control.
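A minimal sketch of the retrieve-recall-rank-match chain, ending in a matching result that carries the operation intent, control name, and execution action. The word-overlap retrieval and the fractional matching-degree score are assumptions for illustration:

```python
def retrieve(query_words, documents):
    # Retrieval unit (toy): keep documents sharing at least one word
    # with the query.
    return [d for d in documents if set(query_words) & set(d["name"].split())]

def recall_and_score(query_words, hits):
    # Recall unit (toy): score each hit by the fraction of its name
    # words present in the query.
    scored = []
    for d in hits:
        name_words = d["name"].split()
        score = sum(w in query_words for w in name_words) / len(name_words)
        scored.append((score, d))
    return scored

def rank(scored):
    # Sorting unit: best matching degree first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

def match(query_words, action, documents):
    # Matching unit: emit the intent, control name, and execution action.
    ranked = rank(recall_and_score(query_words, retrieve(query_words, documents)))
    if not ranked:
        return None
    _, best = ranked[0]
    return {"intent": "operate_control", "control": best["name"], "action": action}

docs = [{"name": "seat heater"}, {"name": "seat ventilation"}, {"name": "media volume"}]
result = match(["seat", "heater"], "turn_on", docs)
```

Note that "seat heater" outranks "seat ventilation" here because all of its name words are hit, which is the purpose of scoring before ranking rather than returning the first retrieval hit.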
Further, the text in the voice request includes all or part of the text of the voice request, and the retrieving unit is specifically configured to perform any one of the following:
extracting entity words in the voice request to search in the scene semantic document;
extracting texts including entity words and action words in the voice request and searching in the scene semantic documents;
or
all text in the voice request is extracted and retrieved in the scene semantic document.
Further, the recall unit is specifically configured to recall the retrieval results using one or more preset recall policies, including omitting text based on a preset list of ignorable words, requiring a hit on the core word, recalling by a set score threshold, and checking for action words or a negative intention in the text.
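The four recall policies can be combined as filters over the scored retrieval results. The ignorable-word list, the threshold value, and the way the policies compose are hypothetical choices for this sketch:

```python
IGNORABLE = {"please", "the", "a"}   # preset list of ignorable words
THRESHOLD = 0.5                      # recall score threshold
NEGATIVES = {"don't", "not"}         # negative-intention markers

def recall(query_words, scored_docs, core_word):
    # Toy combination of the four recall policies named above.
    words = [w for w in query_words if w not in IGNORABLE]   # text omission
    if any(w in NEGATIVES for w in words):                   # negative intention
        return []
    kept = []
    for score, doc in scored_docs:
        if core_word not in doc["name"]:                     # core-word hit required
            continue
        if score < THRESHOLD:                                # threshold recall
            continue
        kept.append(doc)
    return kept

docs = [(0.9, {"name": "seat heater"}),
        (0.3, {"name": "seat ventilation"}),
        (0.8, {"name": "media volume"})]
recalled = recall(["please", "seat", "heater"], docs, core_word="seat")
```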
Since the device embodiment is substantially similar to the method embodiment, it is described briefly; for relevant details, reference may be made to the description of the method embodiment.
An embodiment of the present invention further provides a vehicle, including:
a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements each process of the above voice interaction method embodiment and achieves the same technical effect; details are not repeated here to avoid redundancy.
An embodiment of the present invention further provides a server, including:
a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements each process of the above voice interaction method embodiment and achieves the same technical effect; details are not repeated here to avoid redundancy.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements each process of the above voice interaction method embodiment and achieves the same technical effect; details are not repeated here to avoid redundancy.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or terminal that comprises the element.
The voice interaction method, vehicle, server and storage medium provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and its core idea. Meanwhile, those skilled in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the application scope. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (5)

1. A voice interaction method applied to a voice interaction system comprising a vehicle and a server capable of communicating with the vehicle, characterized by comprising the following steps:
the vehicle receives a voice request of a user and sends the voice request and the context information of the current vehicle-mounted system graphical user interface to the server;
the server completes natural language understanding processing of the voice request according to the context information;
the server generates a vehicle-executable instruction according to the natural language understanding result and sends the vehicle-executable instruction to the vehicle;
the vehicle receives and executes the instruction while feeding back the execution result to the user by voice.
2. The voice interaction method of claim 1, wherein the context information comprises a name and a type of an operable control in the current vehicle-mounted system graphical user interface, an action supported by the operable control, a value range of the action, and a current state of the operable control.
3. The voice interaction method of claim 2, wherein the server performs the natural language understanding processing of the voice request according to the context information, comprising:
creating a scene semantic space according to the context information;
performing semantic understanding on the voice request and outputting a semantic understanding result;
in the scene semantic space, performing retrieval, recall, ranking and matching of operable controls using the semantic understanding result;
and outputting the operation of the operable control responding to the voice request as a natural language understanding processing result.
4. The voice interaction method of claim 3, wherein creating the scene semantic space based on the context information comprises:
receiving context information sent by a vehicle;
loading and analyzing scene elements included in the context information;
and generating a scene semantic document according to the scene elements.
5. The voice interaction method of claim 4, wherein semantically understanding the voice request and outputting a semantic understanding result comprises:
performing text preprocessing and text normalization processing on a text in the voice request, and then extracting a sentence backbone;
and understanding the intention of the voice request of the user according to the sentence backbone and outputting a semantic understanding result.
CN202010596817.5A 2020-06-28 2020-06-28 Voice interaction method, vehicle, server, system and storage medium Pending CN111767021A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010596817.5A CN111767021A (en) 2020-06-28 2020-06-28 Voice interaction method, vehicle, server, system and storage medium
PCT/CN2020/135150 WO2022001013A1 (en) 2020-06-28 2020-12-10 Voice interaction method, vehicle, server, system, and storage medium
CN202110432528.6A CN113031905A (en) 2020-06-28 2021-04-21 Voice interaction method, vehicle, server, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010596817.5A CN111767021A (en) 2020-06-28 2020-06-28 Voice interaction method, vehicle, server, system and storage medium

Publications (1)

Publication Number Publication Date
CN111767021A true CN111767021A (en) 2020-10-13

Family

ID=72722481

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010596817.5A Pending CN111767021A (en) 2020-06-28 2020-06-28 Voice interaction method, vehicle, server, system and storage medium
CN202110432528.6A Pending CN113031905A (en) 2020-06-28 2021-04-21 Voice interaction method, vehicle, server, system and storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202110432528.6A Pending CN113031905A (en) 2020-06-28 2021-04-21 Voice interaction method, vehicle, server, system and storage medium

Country Status (2)

Country Link
CN (2) CN111767021A (en)
WO (1) WO2022001013A1 (en)


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450801A (en) * 2021-08-27 2021-09-28 广州小鹏汽车科技有限公司 Voice interaction method, device, system, vehicle and medium
CN113971954B (en) * 2021-12-23 2022-07-12 广州小鹏汽车科技有限公司 Voice interaction method and device, vehicle and storage medium
CN113990299B (en) * 2021-12-24 2022-05-13 广州小鹏汽车科技有限公司 Voice interaction method and device, server and readable storage medium thereof
CN114842847A (en) * 2022-04-27 2022-08-02 中国第一汽车股份有限公司 Vehicle-mounted voice control method and device
CN115457951A (en) * 2022-05-10 2022-12-09 北京罗克维尔斯科技有限公司 Voice control method and device, electronic equipment and storage medium
CN114913854A (en) * 2022-07-11 2022-08-16 广州小鹏汽车科技有限公司 Voice interaction method, server and storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1841312A (en) * 2006-01-19 2006-10-04 吉林大学 Voice control system for vehicle navigation apparatus
CN101217584A (en) * 2008-01-18 2008-07-09 同济大学 A voice commanding control method and system applicable on automobiles
CN102566961A (en) * 2010-12-31 2012-07-11 上海博泰悦臻电子设备制造有限公司 Voice executing method and voice executing device based on application program of vehicle-mounted device
CN103187055A (en) * 2011-12-28 2013-07-03 上海博泰悦臻电子设备制造有限公司 Data processing system based on vehicle-mounted application
CN104536647A (en) * 2014-12-16 2015-04-22 广东欧珀移动通信有限公司 Application icon position adjusting method and device
CN106601232A (en) * 2017-01-04 2017-04-26 江西沃可视发展有限公司 Vehicle mounted terminal oriented man-machine interaction system based on speech recognition
CN107204185A (en) * 2017-05-03 2017-09-26 深圳车盒子科技有限公司 Vehicle-mounted voice exchange method, system and computer-readable recording medium
US20180307504A1 (en) * 2017-04-25 2018-10-25 Google Inc. Initializing a conversation with an automated agent via selectable graphical element
CN110211584A (en) * 2019-06-04 2019-09-06 广州小鹏汽车科技有限公司 Control method for vehicle, device, storage medium and controlling terminal
CN110211586A (en) * 2019-06-19 2019-09-06 广州小鹏汽车科技有限公司 Voice interactive method, device, vehicle and machine readable media
WO2019223351A1 (en) * 2018-05-23 2019-11-28 百度在线网络技术(北京)有限公司 View-based voice interaction method and apparatus, and server, terminal and medium
US20200027452A1 (en) * 2018-07-17 2020-01-23 Ford Global Technologies, Llc Speech recognition for vehicle voice commands
CN110728982A (en) * 2019-10-11 2020-01-24 上海博泰悦臻电子设备制造有限公司 Information interaction method and system based on voice touch screen, storage medium and vehicle-mounted terminal
CN111312233A (en) * 2018-12-11 2020-06-19 阿里巴巴集团控股有限公司 Voice data identification method, device and system
CN111477224A (en) * 2020-03-23 2020-07-31 一汽奔腾轿车有限公司 Human-vehicle virtual interaction system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130032966A (en) * 2011-09-26 2013-04-03 엘지전자 주식회사 Method and device for user interface
CN105070288B (en) * 2015-07-02 2018-08-07 百度在线网络技术(北京)有限公司 Vehicle-mounted voice instruction identification method and device
CN108279839A (en) * 2017-01-05 2018-07-13 阿里巴巴集团控股有限公司 Voice-based exchange method, device, electronic equipment and operating system
CN107608652B (en) * 2017-08-28 2020-05-22 三星电子(中国)研发中心 Method and device for controlling graphical interface through voice
CN110795175A (en) * 2018-08-02 2020-02-14 Tcl集团股份有限公司 Method and device for analog control of intelligent terminal and intelligent terminal
CN111002996B (en) * 2019-12-10 2023-08-25 广州小鹏汽车科技有限公司 Vehicle-mounted voice interaction method, server, vehicle and storage medium
CN111767021A (en) * 2020-06-28 2020-10-13 广州小鹏车联网科技有限公司 Voice interaction method, vehicle, server, system and storage medium
CN111768777A (en) * 2020-06-28 2020-10-13 广州小鹏车联网科技有限公司 Voice control method, information processing method, vehicle and server
CN114005445A (en) * 2020-06-28 2022-02-01 广州小鹏汽车科技有限公司 Information processing method, server, and computer-readable storage medium


Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022001013A1 (en) * 2020-06-28 2022-01-06 广州橙行智动汽车科技有限公司 Voice interaction method, vehicle, server, system, and storage medium
CN112164400A (en) * 2020-09-18 2021-01-01 广州小鹏汽车科技有限公司 Voice interaction method, server and computer-readable storage medium
CN112242141A (en) * 2020-10-15 2021-01-19 广州小鹏汽车科技有限公司 Voice control method, intelligent cabin, server, vehicle and medium
CN112242141B (en) * 2020-10-15 2022-03-15 广州小鹏汽车科技有限公司 Voice control method, intelligent cabin, server, vehicle and medium
CN114442989A (en) * 2020-11-02 2022-05-06 海信视像科技股份有限公司 Natural language analysis method and device
CN112637264A (en) * 2020-11-23 2021-04-09 北京百度网讯科技有限公司 Information interaction method and device, electronic equipment and storage medium
CN112614491A (en) * 2020-12-11 2021-04-06 广州橙行智动汽车科技有限公司 Vehicle-mounted voice interaction method and device, vehicle and readable medium
CN112614491B (en) * 2020-12-11 2024-03-08 广州橙行智动汽车科技有限公司 Vehicle-mounted voice interaction method and device, vehicle and readable medium
CN112685535A (en) * 2020-12-25 2021-04-20 广州橙行智动汽车科技有限公司 Voice interaction method, server, voice interaction system and storage medium
CN113076079A (en) * 2021-04-20 2021-07-06 广州小鹏汽车科技有限公司 Voice control method, server, voice control system and storage medium
CN113053394A (en) * 2021-04-27 2021-06-29 广州小鹏汽车科技有限公司 Voice processing method, server, voice processing system and storage medium
CN113053394B (en) * 2021-04-27 2024-01-09 广州小鹏汽车科技有限公司 Speech processing method, server, speech processing system, and storage medium
CN113421561A (en) * 2021-06-03 2021-09-21 广州小鹏汽车科技有限公司 Voice control method, voice control device, server and storage medium
WO2022252946A1 (en) * 2021-06-03 2022-12-08 广州小鹏汽车科技有限公司 Voice control method, voice control device, server, and storage medium
CN113421561B (en) * 2021-06-03 2024-01-09 广州小鹏汽车科技有限公司 Voice control method, voice control device, server, and storage medium
CN113253970A (en) * 2021-07-09 2021-08-13 广州小鹏汽车科技有限公司 Voice interaction method and device, voice interaction system, vehicle and medium
CN113472806A (en) * 2021-07-14 2021-10-01 斑马网络技术有限公司 Voice interaction method, device, system, equipment and storage medium for protecting privacy
CN113472806B (en) * 2021-07-14 2022-11-22 斑马网络技术有限公司 Voice interaction method, device, system, equipment and storage medium for protecting privacy
CN113990322A (en) * 2021-11-04 2022-01-28 广州小鹏汽车科技有限公司 Voice interaction method, server, voice interaction system and medium
CN113990322B (en) * 2021-11-04 2023-10-31 广州小鹏汽车科技有限公司 Voice interaction method, server, voice interaction system and medium
WO2024113870A1 (en) * 2022-12-01 2024-06-06 浙江极氪智能科技有限公司 Voice interaction method and apparatus, computer device, and computer readable storage medium

Also Published As

Publication number Publication date
WO2022001013A1 (en) 2022-01-06
CN113031905A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
CN113031905A (en) Voice interaction method, vehicle, server, system and storage medium
US10922322B2 (en) Systems and methods for speech-based searching of content repositories
CN108846037B (en) Method and device for prompting search terms
US20190164540A1 (en) Voice recognition system and voice recognition method for analyzing command having multiple intents
CN107408107B (en) Text prediction integration
AU2013270485C1 (en) Input processing method and apparatus
WO2009055819A1 (en) Improving free-speech command classification for car navigation system
US20130018894A1 (en) System and method of sentiment data generation
CN109119079B (en) Voice input processing method and device
EP2546764A1 (en) System and method of sentiment data use
CN112579733B (en) Rule matching method, rule matching device, storage medium and electronic equipment
CA3233457A1 (en) Machine learning-implemented chat bot database query system for multi-format database queries
CN110020429B (en) Semantic recognition method and device
CN104077105B (en) A kind of information processing method and a kind of electronic equipment
CN110704591A (en) Information processing method and computer equipment
Huang et al. DuIVA: An Intelligent Voice Assistant for Hands-free and Eyes-free Voice Interaction with the Baidu Maps App
EP2835734A1 (en) Apparatus and method for selecting a control object by voice recognition
CN112639796B (en) Multi-character text input system with audio feedback and word completion
US20130179165A1 (en) Dynamic presentation aid
CN114399994A (en) Voice interaction method, vehicle and storage medium
CN112562668A (en) Semantic information deviation rectifying method and device
CN111966267A (en) Application comment method and device and electronic equipment
EP4316882A1 (en) Data processing method and apparatus
US20240127810A1 (en) Dialogue Management Method, Dialogue Management System, And Computer-Readable Recording Medium
CN108959238B (en) Input stream identification method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 46, room 406, No.1, Yichuang street, Zhongxin knowledge city, Huangpu District, Guangzhou City, Guangdong Province

Applicant after: Guangzhou Xiaopeng Automatic Driving Technology Co.,Ltd.

Address before: Room 46, room 406, No.1, Yichuang street, Zhongxin knowledge city, Huangpu District, Guangzhou City, Guangdong Province

Applicant before: Guangzhou Xiaopeng Internet of vehicles Technology Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20201224

Address after: No.8 Songgang street, Cencun, Tianhe District, Guangzhou City, Guangdong Province

Applicant after: GUANGZHOU XIAOPENG MOTORS TECHNOLOGY Co.,Ltd.

Address before: Room 46, room 406, No.1, Yichuang street, Zhongxin knowledge city, Huangpu District, Guangzhou City, Guangdong Province

Applicant before: Guangzhou Xiaopeng Automatic Driving Technology Co.,Ltd.

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201013
