Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements, or to elements having the same or similar functions, throughout. In the following description, reference is made to the accompanying drawings.
In the present invention, unless otherwise expressly stated or limited, a first feature being "above" or "below" a second feature means that the first and second features are in direct contact, or that they are not in direct contact but contact each other through an additional feature between them. Moreover, the first feature being "on," "above," or "over" the second feature includes the first feature being directly above or obliquely above the second feature, or merely indicates that the first feature is at a higher level than the second feature. The first feature being "under," "below," or "beneath" the second feature includes the first feature being directly below or obliquely below the second feature, or merely indicates that the first feature is at a lower level than the second feature.
The following disclosure provides many different embodiments or examples for implementing different features of the invention. To simplify the disclosure, components and arrangements of specific examples are described below. They are, of course, merely examples and are not intended to limit the present invention. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples; such repetition is for simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. In addition, the present invention provides examples of various specific processes and materials, but one of ordinary skill in the art will recognize that other processes and/or other materials may also be applied.
Referring to fig. 1 to 4, an embodiment of the present invention provides a voice interaction method for use in a server 100. The voice interaction method includes:
S10: acquiring a navigation request voice;
S20: calling a map provider interface to obtain a navigation result according to the navigation request voice;
S30: processing the navigation request voice to obtain error correction candidates, each error correction candidate including a core segment and an attribute suffix;
S40: adding an error correction candidate to the navigation result if the navigation result does not contain its core segment, or if the navigation result contains the core segment but not the attribute suffix; filtering out the error correction candidate if the navigation result contains both the core segment and the attribute suffix; and sending the navigation result to the in-vehicle terminal 200.
With the above voice interaction method, an error correction candidate is added to the navigation result when the navigation result does not contain its core segment, or contains the core segment but not the attribute suffix, and is filtered out when the navigation result contains both the core segment and the attribute suffix. In this way, the navigation result can be acquired accurately, effectively improving the accuracy of voice navigation.
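The add-or-filter decision of step S40 can be sketched in a few lines. The function below is a minimal illustration assuming plain substring containment (the patent also allows subsequence matching, as noted further below), and the function name is invented for this sketch, not taken from the patent.

```python
def should_add_candidate(navigation_result, core_segment, attribute_suffix):
    """Decide whether an error correction candidate should be added to
    the navigation result (True) or filtered out (False): filter only
    when the result already contains both the core segment and the
    attribute suffix."""
    has_core = core_segment in navigation_result        # plain containment for brevity
    has_suffix = attribute_suffix in navigation_result  # (subsequences also qualify)
    if has_core and has_suffix:
        return False  # candidate is already covered by the result
    return True       # core missing, or core present without the suffix
```

For example, a result already listing "Lucky Garden Hotel" would filter the candidate "Lucky" + "Garden", while a result listing only "Lucky Tower" (core present, suffix absent) would still add it.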
The voice interaction method of the embodiment of the invention can be used for the server 100 of vehicle-mounted voice navigation, and the server 100 acquires the navigation request voice from a user through the vehicle-mounted terminal 200 arranged on a vehicle. The in-vehicle terminal 200 may include, but is not limited to, an in-vehicle display panel, a mobile phone, an in-vehicle computer, a tablet computer, and other electronic terminals.
In one example, the navigation request voice is "navigate to Red Garden," and the server 100 calls the map provider interface to obtain the navigation result shown in fig. 3 according to the navigation request voice. Meanwhile, the server 100 obtains two error correction candidates, "Lucky Garden" and "Hongrun Garden," according to the navigation request voice.
The core segment of the error correction candidate "Lucky Garden" is "Lucky" and its attribute suffix is "Garden"; the core segment of the error correction candidate "Hongrun Garden" is "Hongrun" and its attribute suffix is "Garden." For the error correction candidate "Lucky Garden," the server 100 then determines the relationship between the navigation result and the candidate: when the navigation result does not contain the core segment "Lucky," or contains the core segment "Lucky" but not the attribute suffix "Garden," the candidate "Lucky Garden" is added to the navigation result as a cue word, and the navigation result displayed by the in-vehicle terminal 200 is as shown in fig. 3.
When the navigation result contains both the core segment "Lucky" and the attribute suffix "Garden," the error correction candidate "Lucky Garden" is considered to already exist in the navigation result; the candidate is therefore filtered out and not added, and the navigation result displayed by the in-vehicle terminal 200 is as shown in fig. 4.
It can be understood that the error correction candidate "Hongrun Garden" is judged in the same manner.
It is noted that "containing" above includes the case where the core segment and the attribute suffix are subsequences of the navigation result. A subsequence of a sequence is a new sequence formed from the original sequence by removing some elements without changing the relative order of the remaining elements; for example, "ace" is a subsequence of "abcde."
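The subsequence relation just described can be checked with a standard single-pass scan over the text. This is a generic sketch of the relation, not code from the patent:

```python
def is_subsequence(pattern, text):
    """Return True if every character of `pattern` occurs in `text`
    in the same relative order, though not necessarily contiguously."""
    it = iter(text)
    # Membership tests on an iterator consume it, so each character of
    # `pattern` must be found strictly after the previous match.
    return all(ch in it for ch in pattern)
```

The single shared iterator is what enforces the "relative order preserved" requirement: once a character is matched, the scan never looks backward.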
In the embodiment shown in fig. 1, step S20 is performed before step S30. In other embodiments, step S20 may be performed after step S30, or step S20 may be performed simultaneously with step S30.
Referring to fig. 5, in some embodiments, step S30 includes:
S31: processing the navigation request voice to obtain slot information; and
S32: obtaining error correction candidates according to the slot information.
In this way, error correction candidates can be obtained according to the user's navigation request voice.
In general, the slot information of the navigation request voice may be obtained through a slot filling model in NLU (natural language understanding). In navigation, the slot information may be a place name, a building name, or another point of interest. For example, if the navigation request voice is "navigate to Red Garden," then "Red Garden" is the slot information.
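A production system would obtain the slot through a trained slot filling model as described above; the rule-based toy below only illustrates the input/output contract of such a component. The prefix patterns are hypothetical stand-ins, not part of the patent.

```python
def extract_destination_slot(utterance):
    """Toy slot filler: strip a known navigation-intent prefix and treat
    the remainder as the destination slot. A real system would use a
    trained slot-filling / named-entity model instead of these rules."""
    intent_prefixes = ("navigate to ", "take me to ", "go to ")  # hypothetical patterns
    lowered = utterance.lower()
    for prefix in intent_prefixes:
        if lowered.startswith(prefix):
            return utterance[len(prefix):].strip()
    return None  # no navigation intent recognized
```

For instance, "navigate to Red Garden" yields the slot "Red Garden", while an unrelated utterance yields no slot.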
Referring to fig. 6, in some embodiments, step S31 includes:
S311: performing semantic recognition on the navigation request voice to obtain the slot information.
Specifically, the server 100 performs semantic recognition on the navigation request voice to obtain a semantic recognition result, which may include information such as the domain, the intention, and the slot. Such information can be obtained through a classification model and a slot filling model in the NLU. For example, the intention information, i.e., the user's navigation intention, can be obtained through dialog intention classification, and the slot information, i.e., "Red Garden" in the above embodiment, can be obtained through a named entity recognition model.
Referring to fig. 7, in some embodiments, step S32 includes:
S321: removing the region prefix from the slot information to obtain a first result, the first result including a core segment and an attribute suffix;
S322: obtaining an error correction candidate set according to the first result, the error correction candidate set including a plurality of error correction candidates;
S323: sorting the plurality of error correction candidates; and
S324: outputting one or more error correction candidates according to a preset condition.
In this way, the server 100 may output one or more error correction candidates that meet the preset condition.
In one example, the navigation request voice is "navigate to Guangzhou Shipping Garden," and the NLU yields the slot information "Guangzhou Shipping Garden." In step S321, the region prefix "Guangzhou" is removed from the slot information to obtain the first result "Shipping Garden," whose core segment is "Shipping" and whose attribute suffix is "Garden." In step S322, the server 100 obtains, according to the first result, an error correction candidate set composed of a plurality of error correction candidates, such as "Lucky Garden," "Macro Garden," "Transport Garden," and "Macro Transport Garden." In step S323, the plurality of error correction candidates may be sorted according to their correlation with the first result, with candidates of higher correlation having higher priority. In step S324, the server 100 may output one or more error correction candidates that meet the preset condition. In the embodiment of the present invention, the preset condition may be that the correlation between an error correction candidate and the first result is greater than a preset correlation; of course, in other embodiments, the preset condition may be another condition, set according to the specific situation.
It should be noted that the correlation may be judged by a plurality of criteria composed of elements such as pronunciation, number of characters, radical, and stroke: for each element that matches, the correlation is increased by 1, and the higher the correlation, the more similar the two are.
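The additive correlation described above can be sketched as follows, assuming each string has already been analyzed into feature values. The feature names are placeholders for the pronunciation, character-count, radical, and stroke comparisons named in the text, and the dict representation is an assumption made for this sketch.

```python
# Illustrative feature elements; real values would come from a
# Chinese-language analysis of each candidate (pinyin, radicals, strokes).
FEATURES = ("pronunciation", "char_count", "radical", "strokes")

def correlation(candidate, target):
    """Additive correlation: each feature element that matches between
    the candidate and the target adds 1 to the score."""
    return sum(
        1 for f in FEATURES
        if f in candidate and candidate[f] == target.get(f)
    )
```

Two items sharing pronunciation and character count but differing in radical and strokes would thus score 2, while an item compared with itself scores the maximum of 4.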
Referring to fig. 8, in some embodiments, step S322 includes:
S3221: obtaining homophonic core segments and near-sound core segments of the core segment according to the core segment; and
S3222: combining the homophonic core segments, the near-sound core segments, and the attribute suffix into an error correction candidate set.
In this way, an error correction candidate set composed of a plurality of highly reliable error correction candidates can be obtained.
In one example, the core segment of the first result is "Red Shipping." The server 100 may retrieve word banks locally and in the cloud and, according to the core segment "Red Shipping," obtain a plurality of homophonic core segments such as "Lucky," "Carry Transport," and "Macro Transport," and a plurality of near-sound core segments such as "Macro Run" and "Red Run." The homophonic and near-sound core segments are then combined with the attribute suffix "Garden" into an error correction candidate set including a plurality of error correction candidates.
After the error correction candidate set containing the homophonic core segments and the near-sound core segments is obtained, at least the following two principles are followed when sorting the error correction candidates:
First, error correction candidates containing homophonic core segments have higher priority than those containing near-sound core segments.
Second, among the error correction candidates containing homophonic core segments, sorting may be performed according to the correlation between the homophonic core segment and the first result: the higher the correlation with the first result, the higher the priority. Similarly, among the error correction candidates containing near-sound core segments, the higher the correlation with the first result, the higher the priority.
Of course, other sorting rules are possible; for example, error correction candidates whose corresponding locations are closer may be given higher priority according to distance.
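The two sorting principles, homophones ranked ahead of near-sound candidates and higher correlation ranked first within each group, can be combined into a single sort key. The tuple representation of a candidate below is an assumption made for this sketch, not the patent's data structure:

```python
def rank_candidates(candidates):
    """Order candidates by the two principles above: homophonic
    candidates before near-sound ones, then higher correlation with
    the first result first. Each candidate is assumed to be a
    (name, is_homophone, correlation) tuple."""
    # `not is_homophone` puts homophones (False < True) first;
    # negating the correlation sorts higher scores earlier.
    return sorted(candidates, key=lambda c: (not c[1], -c[2]))
```

Because the key is a tuple, the grouping principle always dominates: a near-sound candidate never outranks a homophonic one, no matter how high its correlation.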
In other embodiments, only homophonic core segments or only near-sound core segments of the core segment may be obtained, and the homophonic core segments or near-sound core segments may be combined with the attribute suffix into the error correction candidate set.
In some embodiments, the voice interaction method comprises: sending the navigation result to the in-vehicle terminal 200 in the form of a navigation list, so that the in-vehicle terminal 200 displays the navigation list.
In this way, the navigation result can be displayed in the form of a navigation list at the in-vehicle terminal 200, and the user can intuitively acquire navigation information from the navigation list.
Specifically, when error correction candidates are added to the navigation result, the navigation list may include an initial list obtained by the map provider directly from the navigation request voice, together with the error correction candidates added in step S40. The error correction candidates may be placed above the initial list as cue words, or in other more prominent positions, to form the final navigation list. For example, the navigation request voice is "navigate to Red Garden"; the map provider recognizes "Red Garden," searches the map information on the server 100 and locally, and forms an initial list. The error correction candidates "Lucky Garden" and "Hongrun Garden" obtained by the server 100 are then added to the initial list to form the final navigation list and are presented as cue words, as shown in fig. 3. The user can click "Lucky Garden" or "Hongrun Garden" to make the map provider search again using it as the keyword, and the place selected by the user is used as the navigation destination.
In some embodiments, the voice interaction method comprises: sending the navigation result to the in-vehicle terminal 200 in the form of an information point card and/or an information point floating window, so that the in-vehicle terminal 200 displays the navigation result.
In this way, the information point card and the information point floating window can include content such as the location, distance, and telephone number of the information point, providing the user with more selectable information.
In one example, the navigation request voice is "navigate to Red Garden," and the navigation result is displayed on the in-vehicle terminal 200 in the form of an information point card. As shown in fig. 9, the information point card currently acquired for the information point "Red Garden" includes content such as error correction candidates, scene pictures, telephone, time information, charging information, and route, and the user can click the corresponding content to trigger the corresponding function. For example, if the user clicks "telephone" in fig. 9, the in-vehicle terminal 200 may display and dial the telephone number of "Red Garden" stored on the map provider side. If the user clicks the route, the in-vehicle terminal 200 may display a driving route from the current location to "Red Garden." If the user clicks "Lucky Garden" among the error correction candidates, the map provider searches for "Lucky Garden" again to obtain the information point card of "Lucky Garden" shown in fig. 10.
Of course, the user can also select content on the information point card by voice control, keeping both hands on the wheel while the vehicle is running and thus improving driving safety.
Referring to fig. 11, the voice interaction apparatus 300 according to the embodiment of the present invention is applied to the server 100, and the voice interaction apparatus 300 according to the embodiment of the present invention includes an obtaining module 310, a processing module 320, and an output module 330.
The obtaining module 310 is used for obtaining the navigation request voice. The processing module 320 is configured to invoke the map provider interface to obtain a navigation result according to the navigation request voice, and to process the navigation request voice to obtain an error correction candidate, where the error correction candidate includes a core segment and an attribute suffix. The output module 330 is configured to add the error correction candidates to the navigation result when the navigation result does not include the core segment, or the navigation result includes the core segment and does not include the attribute suffix, filter the error correction candidates when the navigation result includes the core segment and the attribute suffix, and send the navigation result to the vehicle-mounted terminal 200.
In the voice interaction apparatus 300, an error correction candidate is added to the navigation result when the navigation result does not contain the core segment, or contains the core segment but not the attribute suffix, and is filtered out when the navigation result contains both the core segment and the attribute suffix. In this way, the navigation result can be acquired accurately, effectively improving the accuracy of voice navigation.
Referring to fig. 12, a server 100 according to an embodiment of the present invention includes a memory 120 and a processor 110, where the memory 120 stores a computer program, and the processor 110 is configured to execute the program to implement a voice interaction method according to any of the above embodiments.
In the server 100 according to the embodiment of the present invention, an error correction candidate is added to the navigation result when the navigation result does not contain the core segment, or contains the core segment but not the attribute suffix, and is filtered out when the navigation result contains both the core segment and the attribute suffix. In this way, the navigation result can be acquired accurately, effectively improving the accuracy of voice navigation.
In the present invention, the computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The memory 120 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. The processor 110 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor.
Referring to fig. 13, a voice navigation system 1000 according to an embodiment of the present invention includes a vehicle 400 and the server 100 according to the above embodiment.
In the voice navigation system 1000 according to the embodiment of the present invention, an error correction candidate is added to the navigation result when the navigation result does not contain the core segment, or contains the core segment but not the attribute suffix, and is filtered out when the navigation result contains both the core segment and the attribute suffix. In this way, the navigation result can be acquired accurately, effectively improving the accuracy of voice navigation.
In the description herein, references to the description of the terms "one embodiment," "certain embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two or three, unless specifically limited otherwise.
Any process or method descriptions in flow charts, or otherwise described herein, may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. Alternate implementations are included within the scope of the preferred embodiments of the present invention, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.