CN116153313A - Voice interaction method, server and computer readable storage medium - Google Patents
- Publication number
- CN116153313A (application CN202310374380.4A)
- Authority
- CN
- China
- Prior art keywords
- slot
- information
- voice interaction
- value
- disambiguation processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The application discloses a voice interaction method, comprising: receiving a voice request forwarded by a vehicle; performing slot recognition on the voice request to obtain first slot information; performing disambiguation processing on the first slot information according to preset additional information to obtain second slot information; performing application program interface prediction on the voice request according to the second slot information; and performing application program interface parameter filling on the predicted application program interface according to the second slot information, outputting an execution result and issuing it to the vehicle to complete the voice interaction. By introducing a disambiguation step, the method can resolve ambiguous slot information with the additional information, fill the parameters of the predicted application program interface with the disambiguated slot information, and finally output the execution result to the vehicle, which effectively improves the accuracy of slot recognition and the user's voice interaction experience.
Description
Technical Field
The present invention relates to the field of vehicle-mounted voice technologies, and in particular, to a voice interaction method, a server, and a computer readable storage medium.
Background
A current dialogue system uses a natural language understanding module to parse the user's utterance into semantic labels that a machine can understand, maintains an internal dialogue state as a compact representation of the whole dialogue history through a dialogue state tracking module, selects a suitable dialogue action according to that state with a dialogue policy module, and finally converts the dialogue action into a natural language reply through a natural language generation module. In an actual interaction scene, the slot information extracted from the user's voice request may be ambiguous; the recognition result in the related art may then be wrong and fail to extract the desired slot, making it difficult to meet the user's voice interaction needs in the vehicle-mounted scene.
Disclosure of Invention
The application provides a voice interaction method, a server and a computer readable storage medium.
The voice interaction method comprises the following steps:
receiving a voice request forwarded by a vehicle;
performing slot recognition on the voice request to obtain first slot information;
performing disambiguation processing on the first slot information according to preset additional information to obtain second slot information;
performing application program interface prediction on the voice request according to the second slot information;
and performing application program interface parameter filling on the predicted application program interface according to the second slot information, outputting an execution result and issuing it to the vehicle to complete the voice interaction.
In this way, the voice interaction method can disambiguate the first slot information obtained by slot recognition according to the preset additional information to obtain unambiguous second slot information, fill the parameters of the predicted application program interface according to the second slot information, and finally output the execution result to the vehicle to complete the voice interaction. By introducing a disambiguation step, ambiguous slot values can be resolved with the additional information during slot recognition, which effectively improves the accuracy of slot recognition and the user's voice interaction experience.
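As a rough illustration of this flow, the following Python sketch strings the five steps together. The function names, the keyword lexicon and the tag-to-API table (including the assumed "vehicleControl" entry) are placeholders chosen for the example rather than names defined by the present application; the tag "gui_entity_name" and the interface "screenPageOpen" are taken from the worked example later in the description.

```python
from typing import Dict, List

# Candidate slot type tags per slot value; the lexicon contents are illustrative.
SLOT_LEXICON: Dict[str, List[str]] = {
    "headrest mode": ["gui_entity_name", "target_function"],
}
# Hypothetical tag-to-API table; only screenPageOpen appears in the description.
TAG_TO_API: Dict[str, str] = {
    "gui_entity_name": "screenPageOpen",
    "target_function": "vehicleControl",   # assumed name, for illustration only
}

def recognize_slots(query: str) -> Dict[str, List[str]]:
    """Slot recognition: first slot information as value -> candidate type tags."""
    return {value: tags for value, tags in SLOT_LEXICON.items() if value in query}

def disambiguate(first_slots: Dict[str, List[str]], additional_info: Dict) -> Dict[str, str]:
    """Disambiguation: keep one type tag per slot value, guided by additional information."""
    preferred = additional_info.get("preferred_tags", {})
    return {value: preferred.get(value, tags[0]) for value, tags in first_slots.items()}

def predict_api(second_slots: Dict[str, str]) -> str:
    """API prediction: map the now-unique slot type tag to a target interface."""
    tag = next(iter(second_slots.values()))
    return TAG_TO_API[tag]

def fill_arguments(second_slots: Dict[str, str]) -> Dict[str, str]:
    """Parameter filling: use the slot values as the predicted interface's parameters."""
    return {tag: value for value, tag in second_slots.items()}

query = "switch headrest mode"            # voice request forwarded by the vehicle
first = recognize_slots(query)
second = disambiguate(first, {"preferred_tags": {"headrest mode": "gui_entity_name"}})
api = predict_api(second)
print(api, fill_arguments(second))        # screenPageOpen {'gui_entity_name': 'headrest mode'}
```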
The step of performing slot recognition on the voice request to obtain first slot information includes:
performing slot recognition on the voice request to obtain a slot value and at least one slot type tag corresponding to the slot value, so as to obtain the first slot information.
In this way, slot recognition of the voice request yields the slot value and at least one corresponding slot type tag, so that the multiple candidate slot types can subsequently be disambiguated and a more accurate slot recognition result can be obtained.
The additional information includes a plurality of pieces of sub-information, and performing disambiguation processing on the first slot information according to the preset additional information to obtain the second slot information includes:
determining, according to the first slot information, the sub-information used for disambiguation;
and performing disambiguation processing on the first slot information according to the sub-information to obtain the second slot information.
In this way, the sub-information used for disambiguation can be determined from the first slot information and the second slot information obtained, giving a more accurate slot recognition result and a better interaction experience for the user.
The step of performing disambiguation processing on the first slot information according to the sub-information to obtain second slot information includes:
performing disambiguation processing on the first slot information according to the correspondence between the slot value and the slot type in the preset sub-information to obtain the second slot information.
In this way, disambiguation can be performed according to the correspondence between the slot value and the slot type in the sub-information to obtain the second slot information, giving a more accurate slot recognition result and a better interaction experience for the user.
The method further comprises the steps of:
updating the correspondence in response to a modification operation on the correspondence;
wherein the performing disambiguation processing on the first slot information according to the correspondence between the slot value and the slot type in the preset sub-information to obtain the second slot information includes:
performing disambiguation processing on the first slot information according to the updated correspondence to obtain the second slot information.
In this way, the correspondence between the slot value and the slot type in the sub-information can be modified and the second slot information obtained by disambiguation according to the updated correspondence, so that the parameters used to fill the predicted application program interface can be determined and the execution result finally output and issued to the vehicle to complete the voice interaction.
The step of performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
when one slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to the sentence pattern information in the additional information, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
In this way, disambiguation can be performed according to the sentence pattern information in the additional information to obtain the second slot information, giving a more accurate slot recognition result and a better interaction experience for the user.
The step of performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
when one slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to the media resource library popularity information in the additional information, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
In this way, disambiguation can be performed according to the media resource library popularity information in the additional information to obtain the second slot information, giving a more accurate slot recognition result and a better interaction experience for the user.
The step of performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
when one slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to the vehicle-mounted system interface information in the additional information, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
In this way, disambiguation can be performed according to the vehicle-mounted system interface information in the additional information to obtain the second slot information, giving a more accurate slot recognition result and a better interaction experience for the user.
The step of performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
when one slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to the user preference information in the additional information, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
In this way, disambiguation can be performed according to the user preference information in the additional information to obtain the second slot information, giving a more accurate slot recognition result and a better interaction experience for the user.
The server of the present application comprises a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, implements the method described above.
The computer readable storage medium of the present application stores a computer program which, when executed by one or more processors, implements the voice interaction method of any of the above embodiments.
In this way, the present application adopts an end-to-end architecture, which reduces the latency of the vehicle-mounted system and improves the response speed to user commands; by combining the slot recognition result of the user voice request with the predicted application program interface as additional features, the accuracy of the application program interface parameter filling task is effectively improved and the vehicle control requirements are met.
Additional aspects and advantages of embodiments of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a dialogue system in the related art;
FIG. 2 is a schematic diagram of a related art end-to-end architecture dialog system;
FIG. 3 is a first flow chart of the voice interaction method of the present application;
FIG. 4 is a schematic diagram of the end-to-end dialogue system architecture of the present application;
FIG. 5 is a second flow chart of the voice interaction method of the present application;
FIG. 6 is a third flow chart of the voice interaction method of the present application;
FIG. 7 is a fourth flow chart of the voice interaction method of the present application;
FIG. 8 is a fifth flow chart of the voice interaction method of the present application;
FIG. 9 is a sixth flow chart of the voice interaction method of the present application;
FIG. 10 is a seventh flow chart of the voice interaction method of the present application;
FIG. 11 is an eighth flow chart of the voice interaction method of the present application;
FIG. 12 is a ninth flow chart of the voice interaction method of the present application.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the embodiments of the present invention and are not to be construed as limiting the embodiments of the present invention.
Referring to FIG. 1, the conventional vehicle-mounted voice architecture is based on a modular strategy: the whole dialogue flow (natural language understanding, state tracking, dialogue policy, natural language generation, and so on) is divided among separate components. These components are either built mainly on hand-written rules or obtained by training models on supervised datasets. Training each component requires a large amount of annotated data, which tends to be expensive and limits the scalability of the system. Meanwhile, the traditional vehicle-mounted voice system relies on a large number of rules and pieces of business logic to guarantee its accuracy and stability, which further limits the scale and functionality of the system.
Looking at the whole processing chain of a dialogue, the traditional vehicle-mounted voice architecture takes the user input, performs natural language understanding (domain classification, intent recognition and slot recognition), then, combining the dialogue state and dialogue policy in the dialogue management module, selects and executes an application program interface (Application Programming Interface, API) that meets the user's requirement, and finally returns the system output to the user through the natural language generation module.
In view of this, referring to FIG. 2, the end-to-end dialogue system of the present invention includes three core algorithm modules: a slot recognition module, which recognizes the entities in the voice request input by the user; an action prediction (Action Prediction, AP) module, which predicts the application program interface that corresponds to the user input and realizes the user's current goal; and a parameter filling (AF) module, which identifies which entities in the user input correspond to which parameters of the application program interface obtained in the previous step.
The slot recognition module obtains the entities that the application program interface needs; the action prediction module determines which application program interface should be called to fulfil the user's voice input; and the parameter filling module selects which entities are passed as the parameters of that application program interface when it is executed.
For the action prediction module to work properly, slot information is introduced into the application program interface prediction. In some examples, the correspondence between the user voice request (query), the slot information (slot) and the predicted target application program interface (api) is as follows:
query = play Paris Polyphylla, slot = {song: Paris Polyphylla}, inferred api = musicSearchPlay;
query = play Little Marbali, slot = {xmby_album: Little Marbali}, inferred api = xmbySearchPlay;
query = play Tiger's Dragon, slot = {video_name: Tiger's Dragon}, inferred api = videoSearchPlay;
The slot information in the above examples has clear boundaries. When the slot information is ambiguous, however, the slot information alone cannot accurately predict the target application program interface. For example, when slot recognition is performed on the user voice request "query = play Cross-Epoch", the result is "slot = {song: Cross-Epoch, album: Cross-Epoch}", and the higher-popularity slot "album: Cross-Epoch" may override the slot "song: Cross-Epoch"; that is, the user wants to play a song through voice interaction but ends up with the album of the same name. The slot value in the slot information obtained by slot recognition may thus be ambiguous, which leads to an inaccurate target application program interface (api).
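A minimal sketch of the ambiguity, assuming illustrative popularity scores: with two candidate tags for the same slot value, a naive popularity-based choice keeps the album tag even though the user meant the song.

```python
# One slot value, two candidate type tags (the ambiguous first slot information).
first_slot_info = {"Cross-Epoch": ["song", "album"]}

# Resolving the tie by raw popularity alone keeps "album" here, so the user asks
# for the song but the same-named album would be played. Numbers are illustrative.
popularity = {"album": 0.9, "song": 0.8}
picked = max(first_slot_info["Cross-Epoch"], key=popularity.get)
print(picked)   # album -- not what the user intended
```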
Based on the above problems, referring to fig. 3, the present invention provides a voice interaction method. The voice interaction method comprises the following steps:
01: receiving a voice request forwarded by a vehicle;
02: performing slot recognition on the voice request to obtain first slot information;
03: performing disambiguation processing on the first slot information according to preset additional information to obtain second slot information;
04: performing application program interface prediction on the voice request according to the second slot information;
05: performing application program interface parameter filling on the predicted application program interface according to the second slot information, outputting an execution result and issuing it to the vehicle to complete the voice interaction.
The invention also provides a server. The server includes a processor and a memory having a computer program stored thereon. The processor is used for receiving a voice request forwarded by the vehicle, performing slot recognition on the voice request to obtain first slot information, performing disambiguation processing on the first slot information according to preset additional information to obtain second slot information, performing application program interface prediction on the voice request according to the second slot information, performing application program interface parameter filling on the predicted application program interface according to the second slot information, and outputting an execution result to be issued to the vehicle to complete the voice interaction.
First, the user voice request forwarded by the vehicle is received, and slot recognition is performed directly on the received voice request to obtain the first slot information. The first slot information may be ambiguous, that is, a slot value in the first slot information may correspond to a plurality of slot types. The end-to-end architecture shown in FIG. 4 is therefore obtained by adding a disambiguation module to the end-to-end architecture shown in FIG. 2: the one-to-many correspondence in the first slot information is eliminated by means of the preset additional information, which includes a description of the slot value in the first slot information, and the disambiguated second slot information is obtained. After disambiguation according to the preset additional information, each slot value in the resulting second slot information corresponds to a unique slot type.
The correspondence in the second slot information can then be used to predict the target application program interface (api) required to perform the action in the voice request. According to the slot value and slot type in the second slot information and the predicted application program interface, parameters matching the second slot information are selected to fill the predicted application program interface, and finally the execution result is output and issued to the vehicle to complete the voice interaction.
In one example, for the user voice request "switch headrest mode", the voice interaction process proceeds in the following four steps:
Step 1: the query "switch headrest mode" is input to the slot recognition module to obtain the slot information, namely slot = {gui_entity_name: headrest mode, target_function: headrest mode}.
Step 2: disambiguation is performed on query = "switch headrest mode", slot = {gui_entity_name: headrest mode, target_function: headrest mode}. The preset additional information indicates that the tag corresponding to "switch headrest mode" is most likely "gui", so only the gui-related slot type is retained in the second slot information, specifically: slot = {gui_entity_name: headrest mode}.
Step 3: query = "switch headrest mode" and slot = {gui_entity_name: headrest mode} are input to the action prediction module, and the predicted application program interface is api = screenPageOpen.
Step 4: query = "switch headrest mode", slot = {gui_entity_name: headrest mode} and api = screenPageOpen are input to the parameter filling module, and the model in the parameter filling module predicts the parameter values corresponding to the application program interface as argument = {gui_entity_name: headrest mode}.
When the preset additional information changes, only the slot type retained in the second slot information in Step 2 changes during the voice interaction; the models in the subsequent prediction modules do not. In fields such as music and video, the resources are relatively open and variable, and the corresponding additional information is updated frequently. Because the first slot information obtained by slot recognition is disambiguated with this additional information, the application program interface prediction model in the action prediction module does not need to change when the additional information is updated, which guarantees the stability of the model.
The end-to-end architecture simplifies the intermediate modules of a traditional dialogue system architecture, such as the natural language understanding module, the dialogue management module, the vehicle instruction generation module and the natural language generation module, reduces the calls to multiple models in different vertical domains, reduces the latency of the vehicle-mounted system and improves the response speed to user commands.
In summary, the voice interaction method of the present application can disambiguate the first slot information obtained by slot recognition according to the preset additional information to obtain unambiguous second slot information, fill the parameters of the predicted application program interface according to the second slot information, and finally output the execution result to the vehicle to complete the voice interaction. By introducing a disambiguation step, ambiguous slot values can be resolved with the additional information during slot recognition, which effectively improves the accuracy of slot recognition and the user's voice interaction experience.
Referring to fig. 5, step 02 includes:
021: performing slot recognition on the voice request to obtain a slot value and at least one slot type tag corresponding to the slot value, so as to obtain the first slot information.
The processor is used for performing slot recognition on the voice request to obtain a slot value and at least one slot type tag corresponding to the slot value, so as to obtain the first slot information.
Specifically, after receiving the user voice request forwarded by the vehicle, the voice assistant performs slot recognition on the voice request to obtain a slot value together with at least one slot type tag corresponding to that value. For example, the user sends the voice request "switch headrest mode", which is input to the slot recognition module, and the slot value "headrest mode" is obtained. Slot recognition may also identify the corresponding slot type tags for the slot value "headrest mode", such as "gui_entity_name" and "target_function". The resulting first slot information then comprises the slot value and at least one corresponding slot type tag: slot = {gui_entity_name: headrest mode, target_function: headrest mode}.
In this way, slot recognition of the voice request yields the slot value and at least one corresponding slot type tag, so that the multiple candidate slot types can subsequently be disambiguated and a more accurate slot recognition result can be obtained.
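One possible in-memory form of the first slot information is sketched below; the dataclass and field names are assumptions made for illustration and are not prescribed by the method.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FirstSlotInfo:
    value: str                                            # e.g. "headrest mode"
    type_tags: List[str] = field(default_factory=list)    # one or more candidate tags

slots = [FirstSlotInfo("headrest mode", ["gui_entity_name", "target_function"])]
print(slots[0].type_tags)   # ['gui_entity_name', 'target_function']
```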
Referring to FIG. 6, the additional information includes a plurality of pieces of sub-information, and step 03 includes:
031: determining sub-information for performing disambiguation processing according to the first slot information;
032: and performing disambiguation processing on the first slot information according to the sub-information to obtain second slot information.
The processor is used for determining sub-information for performing disambiguation processing according to the first slot information, and performing disambiguation processing on the first slot information according to the sub-information to obtain second slot information.
Specifically, the first slot information may be ambiguous, that is, a slot value in the first slot information may correspond to a plurality of slot type tags. The sub-information used for disambiguation can be determined according to the ambiguity present in the first slot information; the sub-information contains constraints on the specific slot type tags that the slot value may correspond to, and may be obtained, for example, by matching keywords against a sentence pattern repository. Finally, the slot value and the slot type tag most likely to correspond to it in the current context together form the second slot information.
In one example, the first slot information is slot = {gui_entity_name: headrest mode, target_function: headrest mode}, in which the slot type tags corresponding to the slot value "headrest mode" include "gui_entity_name" and "target_function". From this first slot information it can be determined that the sub-information used for disambiguation must constrain the slot type tags "gui_entity_name" and "target_function". When the sub-information states that the tag corresponding to "switch headrest mode" is "gui", the slot type tag most likely to correspond to the slot value "headrest mode" in the current context is "gui_entity_name"; the slot type tag "target_function" is discarded, giving the second slot information slot = {gui_entity_name: headrest mode}.
In this way, the sub-information used for disambiguation can be determined from the first slot information and the second slot information obtained, giving a more accurate slot recognition result and a better interaction experience for the user.
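The selection of sub-information might be sketched as a small dispatcher such as the one below; the source names ("sentence_patterns", "media_popularity") and the dispatch rule are assumptions, since the method only requires that the sub-information be chosen according to the first slot information.

```python
def pick_sub_information(first_slots: dict, additional_info: dict) -> dict:
    """Choose which piece of sub-information to consult, based on the candidate tags."""
    tags = {t for candidates in first_slots.values() for t in candidates}
    if {"gui_entity_name", "target_function"} <= tags:
        # GUI vs. vehicle-control ambiguity: consult sentence-pattern sub-information
        return additional_info["sentence_patterns"]
    # otherwise assume a media-type ambiguity and consult popularity data
    return additional_info["media_popularity"]

sub = pick_sub_information(
    {"headrest mode": ["gui_entity_name", "target_function"]},
    {"sentence_patterns": {"switch headrest mode": "gui"}, "media_popularity": {}},
)
print(sub)   # {'switch headrest mode': 'gui'}
```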
Referring to fig. 7, step 032 includes:
0321: performing disambiguation processing on the first slot information according to the correspondence between the slot value and the slot type in the preset sub-information to obtain the second slot information.
The processor is used for performing disambiguation processing on the first slot information according to the correspondence between the slot value and the slot type in the preset sub-information to obtain the second slot information.
Specifically, the first slot information may be ambiguous, that is, a slot value in the first slot information may correspond to a plurality of slot type tags, and the sub-information contains constraints on the specific slot type tags that the slot value may correspond to. The sub-information generally indicates the correspondence between slot values and slot type tags, and disambiguation consists of discarding the slot type tags for which no correspondence exists in the sub-information, so that a single slot type tag finally corresponds to the slot value, which gives the second slot information.
In one example, the first slot information is slot = {gui_entity_name: headrest mode, target_function: headrest mode}, in which the slot type tags corresponding to the slot value "headrest mode" include "gui_entity_name" and "target_function". From this first slot information it can be determined that the sub-information used for disambiguation must constrain the slot type tags "gui_entity_name" and "target_function". Matching against the sentence pattern repository yields the sub-information: the tag corresponding to "switch headrest mode" is "gui". The slot type tag that best corresponds to the slot value "headrest mode" in the current context is therefore "gui_entity_name", which gives the second slot information slot = {gui_entity_name: headrest mode}.
That is, the slot type corresponding to "headrest mode" is the graphical user interface, and only the GUI-related slot type tag "gui_entity_name" needs to be retained.
In this way, disambiguation can be performed according to the correspondence between the slot value and the slot type in the sub-information to obtain the second slot information, giving a more accurate slot recognition result and a better interaction experience for the user.
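A minimal sketch of this table lookup, assuming an illustrative correspondence entry; only the keep-or-discard logic is the point, and the fallback rule for values with no entry is an added assumption.

```python
CORRESPONDENCE = {"headrest mode": "gui_entity_name"}   # preset slot value -> slot type

def disambiguate_by_correspondence(first_slots: dict) -> dict:
    second = {}
    for value, tags in first_slots.items():
        kept = CORRESPONDENCE.get(value)
        # keep the tag named by the correspondence and discard the rest;
        # fall back to the first candidate when no entry exists
        second[value] = kept if kept in tags else tags[0]
    return second

print(disambiguate_by_correspondence(
    {"headrest mode": ["gui_entity_name", "target_function"]}))
# {'headrest mode': 'gui_entity_name'}
```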
Referring to fig. 8, the method further includes:
06: updating the correspondence in response to a modification operation on the correspondence;
and performing disambiguation processing on the first slot information according to the updated correspondence to obtain the second slot information.
The processor is used for updating the correspondence in response to a modification operation on the correspondence, and performing disambiguation processing on the first slot information according to the updated correspondence to obtain the second slot information.
Specifically, the sub-information carries a correspondence between slot values and slot types, so that the first slot information can be disambiguated and the second slot information finally obtained. When the correspondence between a slot value and a slot type contained in the sub-information changes, the correspondence needs to be modified and updated in time.
The first slot information is then disambiguated as described above according to the updated correspondence in the sub-information, and the second slot information is finally obtained.
In one example, the first slot information is slot = {singer: Zhou Jielun, actor: Zhou Jielun}; the slot type tags corresponding to the slot value "Zhou Jielun" include "singer" and "actor". Since the sub-information indicates that Zhou Jielun is better known as a singer than as an actor, disambiguating the first slot information according to the sub-information gives the second slot information slot = {singer: Zhou Jielun}.
When the preset additional information changes, only the slot type retained in the second slot information changes during the voice interaction. For example, if the artist Zhou Jielun later becomes more popular as an actor than as a singer, the correspondence in the sub-information can be changed, and disambiguation according to the updated correspondence gives the second slot information slot = {actor: Zhou Jielun}. This change in the second slot information does not affect the models in the subsequent prediction modules. Therefore, in fields with relatively open and changeable resources, the application program interface prediction model in the action prediction module does not need to change when the additional information is updated, which guarantees the stability of the model.
In this way, the correspondence between the slot value and the slot type in the sub-information can be modified, and the second slot information obtained by disambiguation according to the updated correspondence, so that the parameters used to fill the predicted application program interface can be determined and the execution result finally output and issued to the vehicle to complete the voice interaction.
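A sketch of the modification operation, under the assumption that the correspondence is a plain in-memory table: only the data entry changes between the two calls, and no prediction model is retrained.

```python
correspondence = {"Zhou Jielun": "singer"}        # current slot value -> slot type entry

def disambiguate(value: str, tags: list) -> str:
    kept = correspondence.get(value)
    return kept if kept in tags else tags[0]

print(disambiguate("Zhou Jielun", ["singer", "actor"]))   # singer

# Modification operation: the artist later becomes more popular as an actor,
# so only this data entry is updated; nothing downstream is retrained.
correspondence["Zhou Jielun"] = "actor"
print(disambiguate("Zhou Jielun", ["singer", "actor"]))   # actor
```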
Referring to fig. 9, step 03 includes:
033: when one slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to the sentence pattern information in the additional information, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
The processor is used for performing disambiguation processing according to the sentence pattern information in the additional information when one slot value in the first slot information corresponds to a plurality of slot type tags, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
Specifically, a slot value in the first slot information may correspond to a plurality of slot type tags. As shown in FIG. 4, the additional information includes sentence pattern information. When the voice request hits a particular sentence pattern, such as "switch xx function", the candidate slot type tags corresponding to the slot value are screened according to the sentence pattern information in the additional information, and only the slot type tag that matches the current sentence pattern is kept.
The sentence pattern information in the additional information may come from training a model on cloud dialogue data, or from the sentence patterns left by the user during historical voice interactions. A slot value in a voice request that satisfies a particular sentence pattern must correspond to a particular slot type tag for the corresponding function to be carried out normally. For example, among voice requests satisfying the sentence pattern "switch ...", the slot type tag corresponding to "switch main driving massage", "switch front-passenger rhythm", "switch xx cruising standard" or "switch to user habit A" is the vehicle control tag "target_function", whereas the slot type tag corresponding to "switch headrest mode" or "switch steering wheel" is the GUI tag "gui_entity_name". The sentence pattern information disambiguates away the slot type tags that do not satisfy the conditions for realizing the function, and the slot value together with the qualifying slot type tag forms the second slot information.
In one example, the user issues the voice request "switch headrest mode". The slot value "headrest mode" in the first slot information may correspond to a plurality of slot type tags, e.g. slot = {gui_entity_name: headrest mode, target_function: headrest mode}. According to the sentence pattern information above, the tag corresponding to "switch headrest mode" is most likely "gui" rather than any other tag. After disambiguation, the second slot information is slot = {gui_entity_name: headrest mode}.
In this way, disambiguation can be performed according to the sentence pattern information in the additional information to obtain the second slot information, giving a more accurate slot recognition result and a better interaction experience for the user.
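A sketch of sentence-pattern disambiguation, assuming two illustrative regular-expression patterns; the actual sentence pattern repository described above could be arbitrarily larger.

```python
import re

# Each pattern votes for one slot type tag; patterns and tags are illustrative.
SENTENCE_PATTERNS = [
    (re.compile(r"^switch .*mode$"), "gui_entity_name"),
    (re.compile(r"^switch .*massage$"), "target_function"),
]

def disambiguate_by_pattern(query: str, tags: list) -> str:
    for pattern, preferred in SENTENCE_PATTERNS:
        if pattern.search(query) and preferred in tags:
            return preferred
    return tags[0]   # no pattern hit: keep the first candidate

print(disambiguate_by_pattern("switch headrest mode",
                              ["gui_entity_name", "target_function"]))
# gui_entity_name
```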
Referring to fig. 10, step 03 includes:
034: when one slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to the media resource library popularity information in the additional information, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
The processor is used for performing disambiguation processing according to the media resource library popularity information in the additional information when one slot value in the first slot information corresponds to a plurality of slot type tags, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
Specifically, a slot value in the first slot information may correspond to a plurality of slot type tags. As shown in FIG. 4, the additional information includes media resource library popularity information. When the voice request belongs to the media field, such as playing a song or playing a video, the additional information describing the candidate slot type tags includes media resource library popularity information obtained periodically from the major music and video platforms. Disambiguation is then performed according to this popularity information: when the voice request gives no particular constraint, the unique tag in the media resource library that satisfies a given condition is preferentially selected as the slot type tag corresponding to the slot value, and together they form the second slot information. The condition may be, for example, the highest popularity, and is not limited here.
In one example, the user issues the voice request "play Song A". The slot value "Song A" in the first slot information may correspond to several media resources, including audio and video works performed by different singers, so the slot type tag corresponding to "Song A" may be the audio tag "song" or the video tag "video". According to the media popularity information, the media file with the most user clicks among the search results is played preferentially; for example, if the audio file corresponding to "song" has more clicks than the video file corresponding to "video", the second slot information obtained by disambiguation is slot = {song: Song A}.
In this way, disambiguation can be performed according to the media resource library popularity information in the additional information to obtain the second slot information, giving a more accurate slot recognition result and a better interaction experience for the user.
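A sketch of popularity-based disambiguation, assuming illustrative click counts keyed by (slot value, tag) pairs; the tag whose matching resource is most popular is kept.

```python
# Click counts per (slot value, tag) pair, refreshed periodically from the
# media platforms; the numbers and tag names are illustrative.
MEDIA_POPULARITY = {("Song A", "song"): 120_000, ("Song A", "video"): 8_000}

def disambiguate_by_popularity(value: str, tags: list) -> str:
    return max(tags, key=lambda tag: MEDIA_POPULARITY.get((value, tag), 0))

print(disambiguate_by_popularity("Song A", ["song", "video"]))   # song
```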
Referring to fig. 11, step 03 includes:
035: when one slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to the vehicle-mounted system interface information in the additional information, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
The processor is used for performing disambiguation processing according to the vehicle-mounted system interface information in the additional information when one slot value in the first slot information corresponds to a plurality of slot type tags, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
Specifically, a slot value in the first slot information may correspond to a plurality of slot type tags. As shown in FIG. 4, the additional information includes vehicle-mounted system interface information. When the voice request involves a function of an application of the vehicle-mounted system, and the interface of that application happens to be open, the slot type tag corresponding to the currently open application interface can be taken directly as the slot type tag of the slot value extracted from the voice request. For example, when the user says "search for Norwegian Wood" and the vehicle-mounted system user interface has a music playing application open, the song "Norwegian Wood" is searched directly in the current music application; when a reading application is open, the book "Norwegian Wood" is searched directly in the current reading application. The vehicle-mounted system interface information in the additional information describes the application currently open on the screen: when the slot value in the voice request matches content in several applications, the slot type tag corresponding to the application currently open on the screen is preferentially selected as the slot type tag of the slot value, and together they form the second slot information.
In one example, the user issues the voice request "search for Norwegian Wood". The slot value "Norwegian Wood" in the first slot information may correspond to several media resources, including songs by different singers and books, so its candidate slot type tags include "song" and "book". If the vehicle-mounted system user interface has a music playing application open, the second slot information obtained by disambiguation is slot = {song: Norwegian Wood}, and the song "Norwegian Wood" is searched directly in the current music playing application.
Without the vehicle-mounted system interface information, the system interface might jump to the wrong page after the user issues the voice request, affecting the fluency of the voice interaction.
In this way, disambiguation can be performed according to the vehicle-mounted system interface information in the additional information to obtain the second slot information, giving a more accurate slot recognition result and a better interaction experience for the user.
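A sketch of interface-state disambiguation, assuming a hypothetical mapping from the foreground application to the tag it implies.

```python
# Application currently open on the vehicle-mounted screen -> implied slot type tag;
# the app names and the mapping are assumptions.
APP_TO_TAG = {"music_player": "song", "reader": "book"}

def disambiguate_by_interface(tags: list, foreground_app: str) -> str:
    preferred = APP_TO_TAG.get(foreground_app)
    return preferred if preferred in tags else tags[0]

print(disambiguate_by_interface(["song", "book"], "music_player"))   # song
```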
Referring to fig. 12, step 03 includes:
036: when one slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to the user preference information in the additional information, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
The processor is used for performing disambiguation processing according to the user preference information in the additional information when one slot value in the first slot information corresponds to a plurality of slot type tags, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
Specifically, a slot value in the first slot information may correspond to a plurality of slot type tags. As shown in FIG. 4, the additional information further includes user preference information, which may be obtained from the usage records of the media resource library or from the user's historical selections. When the user issues a voice request without any particular constraint, disambiguation can be performed according to the user preference information: the tag that occurs most often in the user's history is preferentially selected as the slot type tag corresponding to the slot value, and together they form the second slot information.
In one example, the user issues the voice request "play I Want to Go to the Spring Festival Gala". The slot value "I Want to Go to the Spring Festival Gala" in the first slot information may correspond to several media resources, including the well-known variety show, so its candidate slot type tags include video tags such as "tv_video" and "crosstalk_video". If the user preference information shows that the current user prefers listening to crosstalk, the second slot information obtained by disambiguation is slot = {crosstalk_video: I Want to Go to the Spring Festival Gala}, and the crosstalk program of that name is played preferentially instead of the more popular variety show.
In this way, disambiguation can be performed according to the user preference information in the additional information to obtain the second slot information, giving a more accurate slot recognition result and a better interaction experience for the user.
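A sketch of preference-based disambiguation, assuming an illustrative selection history; the tag the user has chosen most often in the past is kept.

```python
from collections import Counter

history = ["crosstalk_video", "crosstalk_video", "tv_video"]   # past user choices (illustrative)

def disambiguate_by_preference(tags: list) -> str:
    counts = Counter(tag for tag in history if tag in tags)
    return counts.most_common(1)[0][0] if counts else tags[0]

print(disambiguate_by_preference(["tv_video", "crosstalk_video"]))  # crosstalk_video
```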
The results of the disambiguation processing and the predicted application program interfaces are further illustrated by the following four examples:
query = play Cross-Epoch, slot = {song: Cross-Epoch, album: Cross-Epoch}; because the song is more popular than the album of the same name, api = musicSearchPlay is inferred.
query = play Zhou Jielun, slot = {singer: Zhou Jielun, actor: Zhou Jielun}; since Zhou Jielun is more popular as a singer, playing a song is the default, and api = musicSearchPlay is inferred.
query = play Guo Degang, slot = {anchor: Guo Degang, actor: Guo Degang}; since Guo Degang is best known as a crosstalk performer, the voice request is taken to ask for one of Guo Degang's crosstalk works, and api = xmbySearchPlay is inferred.
query = play Yang Zi, slot = {singer: Yang Zi, actor: Yang Zi}; since Yang Zi is more popular as an actor, a video of Yang Zi is played, and api = videoSearchPlay is inferred.
The computer readable storage medium of the present application stores a computer program which, when executed by one or more processors, implements the methods described above.
In the description of the present specification, reference to the terms "above," "specifically," "particularly," "further," and the like, means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiments of the present application in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the present application, and that variations, modifications, substitutions, and alterations may be made to the above embodiments by one of ordinary skill in the art within the scope of the present application.
Claims (11)
1. A method of voice interaction, comprising:
receiving a voice request forwarded by a vehicle;
performing slot recognition on the voice request to obtain first slot information;
performing disambiguation processing on the first slot information according to preset additional information to obtain second slot information;
performing application program interface prediction on the voice request according to the second slot information;
and performing application program interface parameter filling on the predicted application program interface according to the second slot information, outputting an execution result and issuing it to the vehicle to complete the voice interaction.
2. The voice interaction method according to claim 1, wherein the step of performing slot recognition on the voice request to obtain the first slot information includes:
performing slot recognition on the voice request to obtain a slot value and at least one slot type tag corresponding to the slot value, so as to obtain the first slot information.
3. The voice interaction method according to claim 1, wherein the additional information includes a plurality of pieces of sub-information, and the performing disambiguation processing on the first slot information according to the preset additional information to obtain the second slot information includes:
determining, according to the first slot information, the sub-information used for disambiguation;
and performing disambiguation processing on the first slot information according to the sub-information to obtain the second slot information.
4. The voice interaction method according to claim 3, wherein said performing disambiguation on the first slot information according to the sub-information to obtain second slot information includes:
performing disambiguation processing on the first slot information according to the correspondence between the slot value and the slot type in the preset sub-information to obtain the second slot information.
5. The voice interaction method of claim 4, further comprising:
updating the correspondence in response to a modification operation on the correspondence;
wherein the performing disambiguation processing on the first slot information according to the correspondence between the slot value and the slot type in the preset sub-information to obtain the second slot information includes:
performing disambiguation processing on the first slot information according to the updated correspondence to obtain the second slot information.
6. The voice interaction method according to any one of claims 1 to 5, wherein the performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
when one slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to sentence pattern information in the additional information, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
7. The voice interaction method according to any one of claims 1 to 5, wherein the performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
when one slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to media resource library popularity information in the additional information, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
8. The voice interaction method according to any one of claims 1 to 5, wherein the performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
when one slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to vehicle-mounted system interface information in the additional information, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
9. The voice interaction method according to any one of claims 1 to 5, wherein the performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
when one slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to user preference information in the additional information, and determining the slot type tag uniquely corresponding to the slot value to obtain the second slot information.
10. A server comprising a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements the voice interaction method of any of claims 1-9.
11. A non-transitory computer readable storage medium containing a computer program, characterized in that the voice interaction method of any of claims 1-9 is implemented when the computer program is executed by one or more processors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310374380.4A CN116153313A (en) | 2023-04-07 | 2023-04-07 | Voice interaction method, server and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310374380.4A CN116153313A (en) | 2023-04-07 | 2023-04-07 | Voice interaction method, server and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116153313A (en) | 2023-05-23 |
Family
ID=86350884
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310374380.4A Pending CN116153313A (en) | 2023-04-07 | 2023-04-07 | Voice interaction method, server and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116153313A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111985249A (en) * | 2020-09-03 | 2020-11-24 | 贝壳技术有限公司 | Semantic analysis method and device, computer-readable storage medium and electronic equipment |
CN113505591A (en) * | 2020-03-23 | 2021-10-15 | 华为技术有限公司 | Slot position identification method and electronic equipment |
CN114595696A (en) * | 2022-03-03 | 2022-06-07 | Oppo广东移动通信有限公司 | Entity disambiguation method, entity disambiguation apparatus, storage medium, and electronic device |
CN115064166A (en) * | 2022-08-17 | 2022-09-16 | 广州小鹏汽车科技有限公司 | Vehicle voice interaction method, server and storage medium |
CN115083413A (en) * | 2022-08-17 | 2022-09-20 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20230523