CN116153313A - Voice interaction method, server and computer readable storage medium - Google Patents

Info

Publication number
CN116153313A
Authority
CN
China
Prior art keywords
slot
information
voice interaction
value
disambiguation processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310374380.4A
Other languages
Chinese (zh)
Inventor
宁洪珂
丁鹏傑
樊骏锋
朱麒宇
赵群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Motors Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Motors Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Motors Technology Co Ltd
Priority to CN202310374380.4A
Publication of CN116153313A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a voice interaction method comprising: receiving a voice request forwarded by a vehicle; performing slot recognition on the voice request to obtain first slot information; performing disambiguation processing on the first slot information according to preset additional information to obtain second slot information; performing application program interface prediction on the voice request according to the second slot information; and, according to the second slot information and the predicted application program interface, performing parameter filling on the predicted application program interface, outputting an execution result, and delivering the execution result to the vehicle to complete the voice interaction. By introducing a disambiguation step, the method can disambiguate ambiguous slot information according to the additional information, fill the parameters of the predicted application program interface with the disambiguated slot information, and finally output an execution result that is delivered to the vehicle. This improves the accuracy of slot recognition and the user's voice interaction experience.

Description

Voice interaction method, server and computer readable storage medium
Technical Field
The present invention relates to the field of vehicle-mounted voice technologies, and in particular, to a voice interaction method, a server, and a computer readable storage medium.
Background
A current dialogue system uses a natural language understanding module to parse the user's sentence into semantic labels that a machine can understand, maintains an internal dialogue state as a compact representation of the whole dialogue history through a dialogue state tracking module, uses a dialogue policy module to select an appropriate dialogue action according to that state, and finally converts the dialogue action into a natural language reply through a natural language generation module. In an actual interaction scenario, the slot information extracted from a user voice request may be ambiguous. Recognition results in the related art may therefore be wrong, the desired slot result cannot be extracted, and the voice interaction requirements of users in a vehicle-mounted scenario are difficult to meet.
Disclosure of Invention
The application provides a voice interaction method, a server and a computer readable storage medium.
The voice interaction method comprises the following steps:
receiving a voice request forwarded by a vehicle;
performing slot identification on the voice request to obtain first slot information;
performing disambiguation processing on the first slot position information according to preset additional information to obtain second slot position information;
performing application program interface prediction on the voice request according to the second slot position information;
and selecting the predicted application program interface to execute application program interface parameter filling according to the second slot position information and the predicted application program interface, outputting an execution result and transmitting the execution result to a vehicle to complete voice interaction.
Therefore, the voice interaction method can perform disambiguation processing on the first slot information obtained by identifying the slot according to the preset additional information to obtain the second slot information without ambiguity. And the predicted application program interface can be filled with parameters according to the second slot position information, and finally the execution result is output and issued to the vehicle to complete voice interaction. According to the voice interaction method, a disambiguation processing process is introduced, disambiguation processing can be carried out on ambiguous slot values according to the additional information in the slot recognition process, the accuracy of slot recognition is effectively improved, and the voice interaction experience of a user is improved.
The step of performing slot recognition on the voice request to obtain first slot information includes:
and carrying out slot identification on the voice request to obtain a slot value and at least one slot type label corresponding to the slot value so as to obtain the first slot information.
Therefore, the voice request can be subjected to slot identification to obtain the slot value and at least one corresponding slot type thereof, so that the plurality of slot types can be subjected to disambiguation treatment subsequently, and a more accurate slot identification result can be obtained.
The additional information includes a plurality of sub-information, and the performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
determining sub-information for performing disambiguation according to the first slot information;
and performing disambiguation processing on the first slot position information according to the sub-information to obtain second slot position information.
Therefore, sub-information can be determined according to the first slot position information and used for disambiguation processing, and second slot position information is obtained, so that a more accurate slot position identification result is obtained, and the interactive experience of a user is improved.
The step of performing disambiguation processing on the first slot information according to the sub-information to obtain second slot information includes:
and performing disambiguation processing on the first slot information according to the corresponding relation between the slot value and the slot type in the preset sub-information to obtain second slot information.
Therefore, disambiguation processing can be performed according to the corresponding relation between the slot position value and the slot position type in the sub-information, so that second slot position information is obtained, a more accurate slot position identification result is obtained, and interaction experience of users is improved.
The method further comprises the steps of:
updating the corresponding relation in response to a modification operation of the corresponding relation;
according to the corresponding relation between the slot position value and the slot position type in the preset sub-information, performing disambiguation processing on the first slot position information to obtain second slot position information, wherein the disambiguation processing comprises the following steps:
and performing disambiguation processing on the first slot position information according to the updated corresponding relation to obtain second slot position information.
In this way, the correspondence between slot values and slot types in the sub-information can be modified, and the second slot information is obtained by disambiguation according to the updated correspondence. The parameters for filling the predicted application program interface can then be determined, and finally the execution result is output and delivered to the vehicle to complete the voice interaction process.
The step of performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
and under the condition that a plurality of slot type labels corresponding to one slot value exist in the first slot information, performing disambiguation processing according to sentence pattern information in the additional information, and determining the slot type label uniquely corresponding to the slot value to obtain the second slot information.
Therefore, disambiguation processing can be performed according to sentence pattern information in the additional information, and second slot position information is obtained, so that a more accurate slot position identification result is obtained, and interaction experience of a user is improved.
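The sentence-pattern case can be sketched as follows. This is a minimal illustration rather than the method claimed here: the repository contents, the function names, and the rule that a slot type label is "related" to a tag when it starts with that tag are all assumptions. The single repository entry borrows the "switch headrest mode" request used in the detailed description.

```python
from typing import Dict, List, Optional

# Hypothetical sentence pattern repository: request pattern -> tag.
# The single entry mirrors the "switch headrest mode" example; the real
# repository and its format are not specified in the text.
SENTENCE_PATTERNS: Dict[str, str] = {"switch headrest mode": "gui"}


def match_tag(query: str, patterns: Dict[str, str]) -> Optional[str]:
    """Return the tag of the first pattern contained in the query, if any."""
    for pattern, tag in patterns.items():
        if pattern in query:
            return tag
    return None


def disambiguate_by_sentence_pattern(
    query: str, labels: List[str], patterns: Dict[str, str]
) -> Optional[str]:
    """Keep the one slot type label related to the matched tag.

    Assumption: a label is related to a tag when it starts with "<tag>_".
    Returns None when no pattern matches or the result is not unique.
    """
    tag = match_tag(query, patterns)
    if tag is None:
        return None
    related = [label for label in labels if label.startswith(tag + "_")]
    return related[0] if len(related) == 1 else None
```

Returning None when nothing matches leaves room for another kind of additional information to resolve the ambiguity instead.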
The step of performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
and under the condition that a plurality of slot type labels corresponding to one slot value exist in the first slot information, performing disambiguation processing according to the media resource library heat information in the additional information, and determining the slot type label uniquely corresponding to the slot value to obtain the second slot information.
Therefore, disambiguation processing can be performed according to the media resource library heat information in the additional information, and second slot position information is obtained, so that a more accurate slot position identification result is obtained, and the interactive experience of a user is improved.
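One way to picture heat-based disambiguation is to score each candidate slot type label for a slot value and keep the highest-scoring one. The sketch below assumes a heat table keyed by (slot type label, slot value). The table, the scores, and the function name are hypothetical; the "cross-epoch" entries follow the song/album example discussed in the detailed description.

```python
from typing import Dict, List, Tuple

# Hypothetical heat table: (slot type label, slot value) -> popularity score.
# The scores are invented purely for illustration.
MEDIA_HEAT: Dict[Tuple[str, str], float] = {
    ("song", "cross-epoch"): 0.9,
    ("album", "cross-epoch"): 0.4,
}


def disambiguate_by_heat(
    value: str, labels: List[str], heat: Dict[Tuple[str, str], float]
) -> str:
    """Return the candidate label whose (label, value) entry has the highest heat.

    Labels missing from the table are treated as heat 0.0 (an assumption).
    """
    return max(labels, key=lambda label: heat.get((label, value), 0.0))
```

With these illustrative scores, the slot value "cross-epoch" resolves to the "song" label regardless of the order in which the candidate labels are listed.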
The step of performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
and under the condition that a plurality of slot type labels corresponding to one slot value exist in the first slot information, disambiguating according to the vehicle-mounted system interface information in the additional information, and determining the slot type uniquely corresponding to the slot value to obtain the second slot information.
Therefore, disambiguation processing can be performed according to the interface information of the vehicle-mounted system in the additional information, and second slot position information is obtained, so that a more accurate slot position identification result is obtained, and interaction experience of a user is improved.
The step of performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
and under the condition that a plurality of slot type labels corresponding to one slot value exist in the first slot information, disambiguating according to the user preference information in the additional information, and determining the slot type uniquely corresponding to the slot value to obtain the second slot information.
Therefore, disambiguation processing can be performed according to the user preference information in the additional information, and second slot position information is obtained, so that a more accurate slot position identification result is obtained, and the interactive experience of the user is improved.
The server of the present application comprises a processor and a memory, wherein the memory stores a computer program, and the computer program realizes the method when being executed by the processor.
The computer readable storage medium of the present application stores a computer program which, when executed by one or more processors, implements the voice interaction method of any of the above embodiments.
Therefore, the storage medium of the present invention adopts an end-to-end architecture, which reduces the delay of the vehicle-mounted system and improves the response speed to user commands. It also combines the slot recognition result of the user voice request with the additional features of the predicted application program interface, which effectively improves the precision of the application program interface parameter filling task and meets vehicle control requirements.
Additional aspects and advantages of embodiments of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a dialogue system in the related art;
FIG. 2 is a schematic diagram of a related art end-to-end architecture dialog system;
FIG. 3 is a first flow chart of the voice interaction method of the present application;
FIG. 4 is a schematic diagram of the end-to-end architecture dialog system of the present application;
FIG. 5 is a second flow chart of the voice interaction method of the present application;
FIG. 6 is a third flow chart of the voice interaction method of the present application;
FIG. 7 is a fourth flow chart of the voice interaction method of the present application;
FIG. 8 is a fifth flow chart of the voice interaction method of the present application;
FIG. 9 is a sixth flow chart of the voice interaction method of the present application;
FIG. 10 is a seventh flow chart of the voice interaction method of the present application;
FIG. 11 is an eighth flow chart of the voice interaction method of the present application;
FIG. 12 is a ninth flow chart of the voice interaction method of the present application.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the embodiments of the present invention and are not to be construed as limiting the embodiments of the present invention.
Referring to fig. 1, the conventional vehicle-mounted voice architecture is based on a conventional modularized policy, and the entire dialogue flow, such as natural language understanding, state tracking, dialogue policy, natural language generation, etc., is implemented between components by division of labor. These components are either mainly manually built on rules or generated by training models on a supervised dataset. Training of each component requires a large amount of annotation data, which however tends to be expensive, which also limits the scalability of the system. Meanwhile, the traditional vehicle-mounted voice system depends on a large number of rules and business logic to ensure the accuracy and stability of the system, and the scale and the functions of the system are further limited.
From the whole processing link of the dialogue, the traditional vehicle-mounted voice architecture takes user input, and needs to perform natural language understanding, namely domain classification, intention recognition and slot recognition, then select and execute an application program interface (Application Programming Interface, API) meeting the user input requirement in the dialogue management module in combination with the dialogue state and dialogue strategy, and return system output interacting with the user through the natural language generation module.
In view of this, referring to fig. 2, the end-to-end dialog system of the present invention includes three core algorithm modules: the slot recognition module identifies the entities in the voice request input by the user; the action prediction (Action Prediction, AP) module predicts the application program interface that corresponds to the user input and fulfils the user's current goal; and the parameter filling (AF) module identifies which parameter of the application program interface obtained in the previous step each entity in the user input corresponds to.
The slot recognition module obtains the entities to be used when calling the application program interface; the action prediction module determines whether the application program interface subsequently called to fulfil the user's voice input is correct; and the parameter filling module selects which entities are passed as parameters of the application program interface.
In order for the action prediction module to work normally, slot information is introduced to predict an application program interface. In some examples, the correspondence of the user voice request (query) and slot information (slot), and the predicted target's application program interface (api) are as follows:
query = play paris polyphylla, slot = { song: paris polyphylla }, infer api = musicsearachlplay;
query = play little marbali, slot = { xmby_album }, infer api = xmbysearchlplay;
query = play tiger's dragon, slot = { video_name }, infer api = video searachlplay;
The slot information in the above examples has clear boundaries. When the slot information is ambiguous, however, the slot information alone cannot accurately predict the target application program interface. For example, slot recognition on a voice request sent by the user may yield "query = play cross-epoch, slot = { song: cross-epoch, album: cross-epoch }". In this recognition result, the slot "album: cross-epoch" may override the slot "song: cross-epoch", which has higher heat. That is, the user wants to play a song through voice interaction, but the album of the same name is played instead. The slot values in the slot information obtained through slot recognition may thus be ambiguous, which can make the predicted target application program interface (api) inaccurate.
Based on the above problems, referring to fig. 3, the present invention provides a voice interaction method. The voice interaction method comprises the following steps:
01: receiving a voice request forwarded by a vehicle;
02: performing slot identification on the voice request to obtain first slot information;
03: performing disambiguation processing on the first slot position information according to preset additional information to obtain second slot position information;
04: according to the second slot position information, carrying out application program interface prediction on the voice request;
05: and selecting the predicted application program interface to execute application program interface parameter filling according to the second slot position information and the predicted application program interface, outputting an execution result and transmitting the execution result to the vehicle to complete voice interaction.
The invention also provides a server. The server includes a processor and a memory having a computer program stored thereon. The processor is used for receiving a voice request forwarded by the vehicle, carrying out slot recognition on the voice request to obtain first slot information, carrying out disambiguation processing on the first slot information according to preset additional information to obtain second slot information, carrying out application program interface prediction on the voice request according to the second slot information, selecting the predicted application program interface to execute application program interface parameter filling according to the second slot information and the predicted application program interface, and outputting an execution result to be issued to the vehicle to complete voice interaction.
First, the user voice request forwarded by the vehicle is received, and slot recognition is performed directly on the received voice request to obtain the first slot information. The first slot information may be ambiguous, i.e., a slot value in the first slot information may correspond to several slot types. Adding a disambiguation module to the end-to-end architecture shown in fig. 2 therefore yields the end-to-end architecture shown in fig. 4, in which the one-to-many correspondence in the first slot information is eliminated using the preset additional information to obtain the disambiguated second slot information. The preset additional information includes a description of the slot values in the first slot information. After the first slot information is disambiguated according to the preset additional information, each slot value in the resulting second slot information corresponds to a unique slot type.
The correspondence in the second slot information can be used to predict the target application program interface (api) required to perform the action in the voice request. According to the slot value and slot type in the second slot information and the predicted application program interface, the parameters that conform to the second slot information are selected and filled into the predicted application program interface. Finally, the execution result is output and delivered to the vehicle to complete the voice interaction.
In one example, for the user voice request "switch headrest mode", the voice interaction process comprises the following four steps:
Step 1: query = "switch headrest mode" is input to the slot recognition module to obtain the slot information, namely slot = { gui_entity_name: headrest mode, target_function: headrest mode }.
Step 2: Disambiguation processing is performed on query = "switch headrest mode", slot = { gui_entity_name: headrest mode, target_function: headrest mode }. The preset additional information indicates that the label corresponding to "switch headrest mode" is most likely "gui", so only the gui-related slot type is retained in the second slot information, specifically: slot = { gui_entity_name: headrest mode }.
Step 3: query = "switch headrest mode", slot = { gui_entity_name: headrest mode } are input to the action prediction module, and the predicted application program interface is: api = screenpageopen.
Step 4: query = "switch headrest mode", slot = { gui_entity_name: headrest mode }, api = screenPageopen are input to the parameter filling module, and the model in the parameter filling module predicts the parameter values corresponding to the application program interface as: argument = { gui_entity_name: headrest mode }.
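The four steps can be sketched as a small pipeline. Every module below is a hard-coded stand-in for a trained model, and all function names, the lookup table, and the idea of passing the disambiguation step in as a parameter are assumptions made for illustration. Only the slot type labels, the request text, and the API name come from the example.

```python
from typing import Callable, Dict, List

FirstSlots = Dict[str, List[str]]  # slot value -> candidate slot type labels
SecondSlots = Dict[str, str]       # slot value -> unique slot type label


def recognize_slots(query: str) -> FirstSlots:
    """Step 1 stand-in: a hard-coded result instead of a trained model."""
    if "headrest mode" in query:
        return {"headrest mode": ["gui_entity_name", "target_function"]}
    return {}


def keep_gui_labels(query: str, first: FirstSlots) -> SecondSlots:
    """Step 2 stand-in: additional information that says the label is 'gui'."""
    return {
        value: "gui_entity_name"
        for value, labels in first.items()
        if "gui_entity_name" in labels
    }


def predict_api(second: SecondSlots) -> str:
    """Step 3 stand-in: a lookup table instead of the action prediction model."""
    api_by_slot_type = {"gui_entity_name": "screenPageopen"}
    for slot_type in second.values():
        if slot_type in api_by_slot_type:
            return api_by_slot_type[slot_type]
    return "unknown"


def fill_arguments(second: SecondSlots) -> Dict[str, str]:
    """Step 4 stand-in: pass every disambiguated slot as an argument."""
    return {slot_type: value for value, slot_type in second.items()}


def handle_request(
    query: str, disambiguate: Callable[[str, FirstSlots], SecondSlots]
) -> Dict[str, object]:
    """Run steps 1 to 4 in order and return the predicted call."""
    first = recognize_slots(query)
    second = disambiguate(query, first)
    return {"api": predict_api(second), "arguments": fill_arguments(second)}


result = handle_request("switch headrest mode", keep_gui_labels)
```

Passing the disambiguation step in as a function keeps it separate from the prediction and filling stages, so it can be replaced without touching them.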
When the preset additional information changes, only the slot type retained in the second slot information in Step 2 changes during the voice interaction process; the models in the subsequent prediction modules do not change. In some domains, such as music and video, resources are relatively open and variable, so the corresponding additional information is updated frequently. Because disambiguation is applied to the first slot information obtained by slot recognition, the application program interface prediction model in the action prediction module does not need to change when the additional information is updated, which ensures the stability of the model.
The end-to-end architecture can simplify intermediate modules of a traditional dialogue system architecture, such as a natural language understanding module, a dialogue management module, a car machine instruction generation module, a natural language generation module and the like, reduce the call of a plurality of models in different vertical domains, reduce the delay of a vehicle-mounted system and improve the response speed to user instructions.
In summary, the voice interaction method of the present application may perform disambiguation processing on the first slot information obtained by identifying the slot according to the preset additional information, so as to obtain second slot information without ambiguity. And the predicted application program interface can be filled with parameters according to the second slot position information, and finally the execution result is output and issued to the vehicle to complete voice interaction. According to the voice interaction method, a disambiguation processing process is introduced, disambiguation processing can be carried out on ambiguous slot values according to the additional information in the slot recognition process, the accuracy of slot recognition is effectively improved, and the voice interaction experience of a user is improved.
Referring to fig. 5, step 02 includes:
021: and carrying out slot identification on the voice request to obtain a slot value and at least one slot type label corresponding to the slot value so as to obtain first slot information.
The processor is used for carrying out slot identification on the voice request to obtain a slot value and at least one slot type label corresponding to the slot value so as to obtain first slot information.
Specifically, after receiving a user voice request forwarded by the vehicle, the voice assistant performs slot recognition on the voice request to obtain a slot value together with at least one slot type label corresponding to that slot value. For example, the user sends the voice request "switch headrest mode", which is input to the slot recognition module to obtain the slot value "headrest mode". Slot recognition also identifies the slot type labels corresponding to the slot value "headrest mode", such as "gui_entity_name" and "target_function". The resulting first slot information comprises the slot value and the at least one slot type label corresponding to it: "slot = { gui_entity_name: headrest mode, target_function: headrest mode }".
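A possible in-memory shape for the first slot information is a slot value paired with a list of candidate slot type labels. The class and function names below are hypothetical; the sketch only shows how (label, value) pairs from a recognizer could be grouped by slot value so that ambiguity becomes easy to detect.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class SlotCandidate:
    """One slot value with every slot type label recognized for it."""
    value: str
    type_labels: List[str] = field(default_factory=list)

    def is_ambiguous(self) -> bool:
        # More than one label for the same value means disambiguation is needed.
        return len(self.type_labels) > 1


def build_first_slot_info(pairs: Iterable[Tuple[str, str]]) -> List[SlotCandidate]:
    """Group (slot type label, slot value) pairs by slot value."""
    grouped: Dict[str, SlotCandidate] = {}
    for label, value in pairs:
        grouped.setdefault(value, SlotCandidate(value)).type_labels.append(label)
    return list(grouped.values())


first_slot_info = build_first_slot_info(
    [("gui_entity_name", "headrest mode"), ("target_function", "headrest mode")]
)
```

Here `first_slot_info` holds a single candidate for "headrest mode" carrying both labels, which is the ambiguous case that the later steps resolve.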
Therefore, the voice request can be subjected to slot identification to obtain the slot value and at least one corresponding slot type thereof, so that the plurality of slot types can be subjected to disambiguation treatment subsequently, and a more accurate slot identification result can be obtained.
Referring to fig. 6, the additional information includes a plurality of sub-information, and step 03 includes:
031: determining sub-information for performing disambiguation processing according to the first slot information;
032: and performing disambiguation processing on the first slot information according to the sub-information to obtain second slot information.
The processor is used for determining sub-information for performing disambiguation processing according to the first slot information, and performing disambiguation processing on the first slot information according to the sub-information to obtain second slot information.
Specifically, there may be ambiguity in the first slot information, that is, the slot value in the first slot information may correspond to a plurality of slot type tags. Sub-information for performing the disambiguation process may be determined based on ambiguities present in the first slot information. The sub-information includes constraints on a plurality of specific slot type labels corresponding to the slot values. The manner in which the sub-information is obtained may include matching keywords in a sentence repository, and the like. Finally, the slot value and the most likely corresponding slot type tag in the present context may together form the second slot information.
In one example, the first slot information is "slot = { gui_entity_name: headrest mode, target_function: headrest mode }", in which the slot type labels corresponding to the slot value "headrest mode" include "gui_entity_name" and "target_function". From this first slot information it can be determined that the sub-information used for disambiguation must constrain the slot type labels "gui_entity_name" and "target_function". When the sub-information is "the label corresponding to 'switch headrest mode' is 'gui'", the slot type label that best matches the slot value "headrest mode" in the current context is "gui_entity_name", and the slot type label "target_function" is discarded, giving the second slot information slot = { gui_entity_name: headrest mode }.
Therefore, sub-information can be determined according to the first slot position information and used for disambiguation processing, and second slot position information is obtained, so that a more accurate slot position identification result is obtained, and the interactive experience of a user is improved.
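Step 031 can be pictured as a routing decision: look at which slot type labels are in conflict and pick the sub-information that constrains them. The registry below, its names, and the rule "choose the first sub-information whose scope covers all conflicting labels" are assumptions for illustration; the text does not specify how the selection is made.

```python
from typing import Dict, List, Optional, Set

# Hypothetical registry: sub-information name -> slot type labels it can constrain.
SUB_INFORMATION_SCOPE: Dict[str, Set[str]] = {
    "sentence_pattern": {"gui_entity_name", "target_function"},
    "media_heat": {"song", "album"},
}


def ambiguous_labels(first_slot_info: Dict[str, List[str]]) -> Set[str]:
    """Collect the labels of every slot value that has more than one label."""
    labels: Set[str] = set()
    for type_labels in first_slot_info.values():
        if len(type_labels) > 1:
            labels.update(type_labels)
    return labels


def determine_sub_information(first_slot_info: Dict[str, List[str]]) -> Optional[str]:
    """Pick the sub-information whose scope covers all conflicting labels.

    Returns None when nothing is ambiguous or no sub-information applies.
    """
    needed = ambiguous_labels(first_slot_info)
    if not needed:
        return None
    for name, scope in SUB_INFORMATION_SCOPE.items():
        if needed <= scope:
            return name
    return None
```

For the headrest example this selects the sentence-pattern sub-information, while a song/album conflict would be routed to the heat information.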
Referring to fig. 7, step 032 includes:
0321: and performing disambiguation processing on the first slot information according to the corresponding relation between the slot value and the slot type in the preset sub-information to obtain second slot information.
The processor is used for performing disambiguation processing on the first slot information according to the corresponding relation between the slot value and the slot type in the preset sub-information to obtain second slot information.
Specifically, the first slot information may be ambiguous, that is, a slot value in the first slot information may correspond to several slot type labels. The sub-information includes constraints on the specific slot type labels corresponding to the slot value, and generally indicates the correspondence between the slot value and a slot type label. Disambiguation discards the slot type labels for which the sub-information contains no correspondence with the slot value, leaving a single slot type label corresponding to the slot value, that is, the second slot information.
In one example, the first slot information is "slot = { gui_entity_name: headrest mode, target_function: headrest mode }", in which the slot type labels corresponding to the slot value "headrest mode" include "gui_entity_name" and "target_function". From this first slot information it can be determined that the sub-information used for disambiguation must constrain the slot type labels "gui_entity_name" and "target_function". Matching against the sentence pattern repository gives the sub-information: the label corresponding to "switch headrest mode" is "gui". The slot type label that best matches the slot value "headrest mode" in the current context is therefore "gui_entity_name", giving the second slot information slot = { gui_entity_name: headrest mode }.
That is, the slot type corresponding to "headrest mode" is a graphical user interface (GUI) type, so only the GUI-related slot type label "gui_entity_name" needs to be retained.
Therefore, disambiguation processing can be performed according to the corresponding relation between the slot position value and the slot position type in the sub-information, so that second slot position information is obtained, a more accurate slot position identification result is obtained, and interaction experience of users is improved.
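As a minimal sketch of step 0321, assume the first slot information is represented as a mapping from each slot value to its candidate slot type tags, and the sub-information as a mapping from a slot value to the tag to retain. Both representations and the function name `disambiguate_by_correspondence` are assumptions made for illustration and are not taken from the application.

```python
from typing import Dict, List

def disambiguate_by_correspondence(
    first_slots: Dict[str, List[str]],
    correspondence: Dict[str, str],
) -> Dict[str, List[str]]:
    """Keep only the tag that the sub-information maps each slot value to.

    Slot values with no usable entry in the correspondence are left unchanged.
    """
    second_slots: Dict[str, List[str]] = {}
    for value, tags in first_slots.items():
        wanted = correspondence.get(value)
        if wanted in tags:
            second_slots[value] = [wanted]   # discard all other tags
        else:
            second_slots[value] = list(tags)  # nothing to disambiguate with
    return second_slots

first = {"headrest mode": ["gui_entity_name", "target_function"]}
sub_info = {"headrest mode": "gui_entity_name"}
second = disambiguate_by_correspondence(first, sub_info)
```

Leaving a slot value untouched when the correspondence has no entry for it is a design choice of this sketch; the application does not say what happens in that case.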
Referring to fig. 8, the method further includes:
06: updating the correspondence in response to a modification operation on the correspondence;
step 0321 includes:
performing disambiguation processing on the first slot information according to the updated correspondence, to obtain the second slot information.
The processor is configured to update the correspondence in response to a modification operation on the correspondence, and to perform disambiguation processing on the first slot information according to the updated correspondence, to obtain the second slot information.
Specifically, the sub-information contains the correspondence between slot values and slot types, so that disambiguation processing can be performed on the first slot information to obtain the second slot information. When the correspondence between slot values and slot types contained in the sub-information changes, the correspondence needs to be modified and updated in time.
Disambiguation processing is then performed on the first slot information according to the updated correspondence in the sub-information, using the method described above, to obtain the second slot information.
In one example, the first slot information is "slot = { singer: Zhou Jielun, actor: Zhou Jielun }"; the slot value "Zhou Jielun" corresponds to the slot type tags "singer" and "actor". Because the sub-information indicates that Zhou Jielun is better known as a singer than as an actor, disambiguating the first slot information according to the sub-information yields the second slot information "slot = { singer: Zhou Jielun }".
When the preset additional information changes, the only thing that changes in the voice interaction process is which slot type is retained in the second slot information. For example, if the artist Zhou Jielun becomes more popular as an actor than as a singer, the correspondence in the sub-information can be changed accordingly. Disambiguation processing according to the updated correspondence then yields the second slot information "slot = { actor: Zhou Jielun }". The change in the second slot information does not affect the model in the subsequent action prediction module. Therefore, in domains where resources are relatively open and change frequently, only the additional information needs to be updated and the application program interface prediction model in the action prediction module does not need to be changed, which keeps the model stable.
In this way, the correspondence between slot values and slot types in the sub-information can be modified, and the second slot information is obtained through disambiguation processing according to the updated correspondence, so that the parameters for filling the predicted application program interface can be determined and the execution result is finally output and issued to the vehicle to complete the voice interaction.
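The effect of step 06 can be illustrated with a small sketch in which the correspondence is held in an updatable table. The class `CorrespondenceTable` and its methods are hypothetical names introduced here; the point shown is only that editing the table changes the disambiguation result without touching any downstream prediction model.

```python
from typing import Dict, List

class CorrespondenceTable:
    """Hypothetical holder for the slot value -> slot type correspondence."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self._mapping = dict(mapping)

    def update(self, slot_value: str, slot_type: str) -> None:
        # Step 06: apply a modification operation to the correspondence.
        self._mapping[slot_value] = slot_type

    def disambiguate(self, slot_value: str, candidates: List[str]) -> List[str]:
        # Step 0321, evaluated against the current (possibly updated) table.
        preferred = self._mapping.get(slot_value)
        return [preferred] if preferred in candidates else list(candidates)

table = CorrespondenceTable({"Zhou Jielun": "singer"})
candidates = ["singer", "actor"]
before = table.disambiguate("Zhou Jielun", candidates)
table.update("Zhou Jielun", "actor")
after = table.disambiguate("Zhou Jielun", candidates)
```

Here `before` retains the tag "singer" and `after` retains "actor", mirroring the Zhou Jielun example in the text.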
Referring to fig. 9, step 03 includes:
033: in the case where a slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to sentence pattern information in the additional information, and determining the slot type tag that uniquely corresponds to the slot value, to obtain the second slot information.
The processor is configured to, in the case where a slot value in the first slot information corresponds to a plurality of slot type tags, perform disambiguation processing according to sentence pattern information in the additional information and determine the slot type tag that uniquely corresponds to the slot value, to obtain the second slot information.
Specifically, a slot value in the first slot information may correspond to a plurality of slot type tags. As shown in fig. 4, the additional information includes sentence pattern information. When a voice request hits a particular sentence pattern, such as "switch xx function", the slot type tags corresponding to the slot value are filtered according to the sentence pattern information in the additional information, and only the slot type tag that matches the current sentence pattern is retained.
The sentence pattern information in the additional information may be obtained by training a model on cloud dialogue data, or may be sentence pattern information from the dialogues left by the user during historical voice interactions. A slot value in a voice request that satisfies a particular sentence pattern must correspond to a particular slot type tag for the corresponding function to work normally. For example, among voice requests that satisfy the sentence pattern "switch ……", the slot type tag corresponding to "switch main driving massage", "switch sub driving rhythm", "switch xx cruising standard" and "switch to user habit a" is the vehicle control tag "target_function", whereas the slot type tag corresponding to "switch headrest mode" and "switch steering wheel" is the scene through tag "gui_entity_name". Disambiguation according to the sentence pattern information discards the slot type tags that do not satisfy the conditions for realizing the function, and the slot value together with the slot type tag that satisfies the conditions forms the second slot information.
In one example, the user issues the voice request "switch headrest mode". The slot value "headrest mode" in the first slot information may correspond to a plurality of slot type tags, for example "slot = { gui_entity_name: headrest mode, target_function: headrest mode }". From the sentence pattern information described above, the slot type tag corresponding to "switch headrest mode" is most likely "gui" rather than any other tag. After disambiguation processing, the second slot information is "slot = { gui_entity_name: headrest mode }".
In this way, disambiguation processing can be performed according to the sentence pattern information in the additional information to obtain the second slot information, which yields a more accurate slot recognition result and improves the user's interaction experience.
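Step 033 could be sketched as a lookup in a sentence pattern repository. The repository layout (a regular expression per sentence pattern, each with a slot value to tag table), the constant `SENTENCE_PATTERNS` and the function name are assumptions for illustration; the application does not describe how the repository is stored or matched.

```python
import re
from typing import Dict, List

# Hypothetical sentence pattern repository:
# regular expression for the sentence pattern -> {slot value: slot type tag}.
SENTENCE_PATTERNS: Dict[str, Dict[str, str]] = {
    r"^switch .+$": {
        "main driving massage": "target_function",
        "headrest mode": "gui_entity_name",
        "steering wheel": "gui_entity_name",
    },
}

def disambiguate_by_sentence_pattern(
    request: str, slot_value: str, candidates: List[str]
) -> List[str]:
    """Keep the tag that the matching sentence pattern assigns to the slot value."""
    for pattern, value_to_tag in SENTENCE_PATTERNS.items():
        if re.match(pattern, request) is None:
            continue  # the request does not hit this sentence pattern
        tag = value_to_tag.get(slot_value)
        if tag in candidates:
            return [tag]
    return list(candidates)  # no pattern applied; ambiguity remains

result = disambiguate_by_sentence_pattern(
    "switch headrest mode", "headrest mode", ["gui_entity_name", "target_function"]
)
```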
Referring to fig. 10, step 03 includes:
034: in the case where a slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to media resource library heat information in the additional information, and determining the slot type tag that uniquely corresponds to the slot value, to obtain the second slot information.
The processor is configured to, in the case where a slot value in the first slot information corresponds to a plurality of slot type tags, perform disambiguation processing according to media resource library heat information in the additional information and determine the slot type tag that uniquely corresponds to the slot value, to obtain the second slot information.
Specifically, a slot value in the first slot information may correspond to a plurality of slot type tags. As shown in fig. 4, the additional information includes media resource library heat information. When the voice request belongs to the media resource domain, such as "play xx song" or "play xx video", the additional information that constrains the slot type tags corresponding to the slot value includes the media resource library heat information, which is obtained periodically from the major music platforms. Disambiguation processing is performed according to this heat information: when the voice request contains no further restriction, the unique tag that satisfies a given condition in the media resource library is preferentially selected as the slot type tag corresponding to the slot value, and the two together form the second slot information. The condition may be, for example, the highest heat, and is not limited herein.
In one example, the user issues the voice request "play song A". The slot value "song A" in the first slot information may correspond to a plurality of media resources, including audio and video works performed by different singers. The slot type tags corresponding to the slot value "song A" may include the audio slot type tag "song" and the video slot type tag "video". According to the heat information of the media resources, the media file with the most user clicks among the search results is played preferentially. For example, if the audio file corresponding to "music" has more clicks than the video file corresponding to "video", the second slot information obtained through disambiguation processing is "slot = { music1: song A }".
In this way, disambiguation processing can be performed according to the media resource library heat information in the additional information to obtain the second slot information, which yields a more accurate slot recognition result and improves the user's interaction experience.
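Assuming the selection condition is "highest heat", step 034 reduces to picking the candidate tag with the largest heat value. In the sketch below the function name, the representation of heat as click counts per tag, and the numbers themselves are all made up for illustration.

```python
from typing import Dict, List

def disambiguate_by_heat(candidates: List[str], heat: Dict[str, int]) -> str:
    """Return the candidate slot type tag with the highest heat.

    Tags missing from the heat table count as 0; on a tie the candidate
    listed first wins. An empty candidate list raises ValueError.
    """
    return max(candidates, key=lambda tag: heat.get(tag, 0))

# Illustrative click counts only; real values would come from the media
# resource library heat information.
heat_info = {"song": 120_000, "video": 45_000}
chosen = disambiguate_by_heat(["song", "video"], heat_info)
```

Other conditions mentioned as possible in the text (the condition "is not limited herein") would replace the `key` function.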
Referring to fig. 11, step 03 includes:
035: in the case where a slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to vehicle-mounted system interface information in the additional information, and determining the slot type that uniquely corresponds to the slot value, to obtain the second slot information.
The processor is configured to, in the case where a slot value in the first slot information corresponds to a plurality of slot type tags, perform disambiguation processing according to vehicle-mounted system interface information in the additional information and determine the slot type that uniquely corresponds to the slot value, to obtain the second slot information.
Specifically, a slot value in the first slot information may correspond to a plurality of slot type tags. As shown in fig. 4, the additional information includes vehicle-mounted system interface information. When the voice request relates to a function of a vehicle-mounted system application, and the application interface corresponding to the voice request is currently open in the vehicle-mounted system interface, the slot type tag corresponding to the current application interface can be taken directly as the slot type tag corresponding to the slot value extracted from the voice request. For example, when the user issues the voice request "search for Norway Forest" and the vehicle-mounted system user interface has a music playing application open, the music "Norway Forest" is searched for directly in the current music playing application. When a reading application is open, the book "Norway Forest" is searched for directly in the current reading application. The vehicle-mounted system interface information in the additional information is the information of the application currently open on the screen. When the slot value in the voice request hits content in a plurality of applications, the slot type tag corresponding to the application currently open on the screen is preferentially selected as the slot type tag corresponding to the slot value, and the two together form the second slot information.
In one example, the user issues the voice request "search for Norway Forest". The slot value "Norway Forest" in the first slot information may correspond to content in a plurality of applications. The slot type tags corresponding to the slot value "Norway Forest" include slot type tags such as "song" and "book". If the vehicle-mounted system user interface has the music playing application open, the second slot information obtained through disambiguation processing is "slot = { song: Norway Forest }", and the music "Norway Forest" is searched for directly in the current music playing application.
Without the vehicle-mounted system interface information, the vehicle-mounted system interface may jump to the wrong page after the user issues the voice request, which impairs the fluency of the voice interaction.
In this way, disambiguation processing can be performed according to the vehicle-mounted system interface information in the additional information to obtain the second slot information, which yields a more accurate slot recognition result and improves the user's interaction experience.
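Step 035 might be sketched as a lookup from the application currently open on the screen to the slot type tag it implies. The application identifiers `music_player` and `reader`, the table `APP_TO_SLOT_TYPE` and the function name are hypothetical; only the tags "song" and "book" come from the example in the text.

```python
from typing import Dict, List, Optional

# Hypothetical mapping: foreground application -> slot type tag it implies.
APP_TO_SLOT_TYPE: Dict[str, str] = {
    "music_player": "song",
    "reader": "book",
}

def disambiguate_by_foreground_app(
    candidates: List[str], foreground_app: Optional[str]
) -> List[str]:
    """Prefer the tag of the application currently open on the screen."""
    tag = APP_TO_SLOT_TYPE.get(foreground_app) if foreground_app else None
    return [tag] if tag in candidates else list(candidates)

tags = ["song", "book"]
with_music_open = disambiguate_by_foreground_app(tags, "music_player")
with_reader_open = disambiguate_by_foreground_app(tags, "reader")
```

When no relevant application is open, this sketch leaves the candidates unchanged so that another kind of additional information can be tried.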
Referring to fig. 12, step 03 includes:
036: in the case where a slot value in the first slot information corresponds to a plurality of slot type tags, performing disambiguation processing according to user preference information in the additional information, and determining the slot type that uniquely corresponds to the slot value, to obtain the second slot information.
The processor is configured to, in the case where a slot value in the first slot information corresponds to a plurality of slot type tags, perform disambiguation processing according to user preference information in the additional information and determine the slot type that uniquely corresponds to the slot value, to obtain the second slot information.
Specifically, a slot value in the first slot information may correspond to a plurality of slot type tags. As shown in fig. 4, the additional information further includes user preference information. The user preference information may be obtained from usage records of the media resource library or from records of the user's historical selections. When the user issues a voice request, disambiguation processing can be performed according to the user preference information: when the voice request contains no further restriction, the tag that occurs most often in the user's historical preferences is preferentially selected as the slot type tag corresponding to the slot value, and the two together form the second slot information.
In one example, the user issues the voice request "play I Want to Be on the Spring Festival Gala". The slot value "I Want to Be on the Spring Festival Gala" in the first slot information may correspond to a plurality of media resources, including a well-known variety show. The slot type tags corresponding to this slot value include video slot type tags such as "tv_video" and "crosstalk_video". If the user preference information shows that the current user likes listening to crosstalk, the second slot information obtained through disambiguation processing is "slot = { crosstalk_video: I Want to Be on the Spring Festival Gala }". The more popular variety show of that name is not played; instead, the crosstalk program of that name is played preferentially.
In this way, disambiguation processing can be performed according to the user preference information in the additional information to obtain the second slot information, which yields a more accurate slot recognition result and improves the user's interaction experience.
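The rule of preferring the tag that occurs most often in the user's history could be sketched as below. Representing the history as a flat list of previously selected slot type tags, and the function name `disambiguate_by_preference`, are assumptions made for this illustration.

```python
from collections import Counter
from typing import List

def disambiguate_by_preference(candidates: List[str], history: List[str]) -> List[str]:
    """Keep the candidate tag that occurs most often in the user's history."""
    counts = Counter(tag for tag in history if tag in candidates)
    if not counts:
        return list(candidates)  # no relevant history; ambiguity remains
    best_tag, _ = counts.most_common(1)[0]
    return [best_tag]

# Hypothetical record of slot type tags the user selected in the past.
user_history = ["crosstalk_video", "tv_video", "crosstalk_video", "song"]
preferred = disambiguate_by_preference(["tv_video", "crosstalk_video"], user_history)
```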
The results of the disambiguation processing and the predicted application program interface are further illustrated by the following four examples:
query = play trans-epoch, slot = { song: trans-epoch, album: trans-epoch }. Because the song is more popular than the album, "api = musicsearachlplay" is inferred.
query = play Zhou Jielun, slot = { singer: Zhou Jielun, actor: Zhou Jielun }. Because Zhou Jielun is more popular as a singer, playing his songs is the default, and "api = musicsearachlplay" is inferred.
query = play Guo Degang, slot = { anchor: Guo Degang, actor: Guo Degang }. Because Guo Degang is well known as a crosstalk performer, the voice request is taken to be a request to play Guo Degang's audio works, and "api = xmbysearachlplay" is inferred.
query = play Yang Zi, slot = { singer: Yang Zi, actor: Yang Zi }. Because Yang Zi is more popular as an actor, the request is taken to be a request to play Yang Zi's videos, and "api = videosearchplay" is inferred.
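The relationship between the disambiguated slot type and the inferred interface in the four examples can be tabulated as follows. This is only a stand-in: the application predicts the interface with a model, whereas the sketch uses a fixed lookup table, and the names `API_BY_SLOT_TYPE` and `predict_api` are hypothetical. The API strings are copied verbatim from the examples above.

```python
from typing import Dict

# Slot type retained after disambiguation -> API name from the four examples.
# The lookup table is an assumption standing in for the prediction model.
API_BY_SLOT_TYPE: Dict[str, str] = {
    "song": "musicsearachlplay",
    "singer": "musicsearachlplay",
    "anchor": "xmbysearachlplay",
    "actor": "videosearchplay",
}

def predict_api(second_slot: Dict[str, str]) -> str:
    """Return the API for second slot information holding exactly one slot type."""
    if len(second_slot) != 1:
        raise ValueError("second slot information must be disambiguated first")
    slot_type = next(iter(second_slot))
    return API_BY_SLOT_TYPE[slot_type]

# Second slot information of the four examples, after disambiguation.
examples = [
    {"song": "trans-epoch"},
    {"singer": "Zhou Jielun"},
    {"anchor": "Guo Degang"},
    {"actor": "Yang Zi"},
]
predicted = [predict_api(slot) for slot in examples]
```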
The computer readable storage medium of the present application stores a computer program which, when executed by one or more processors, implements the methods described above.
In the description of the present specification, reference to the terms "above," "specifically," "particularly," "further," and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided that they do not contradict one another.
Any process or method description in a flow chart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art to which the embodiments of the present application pertain.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the present application, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the present application.

Claims (11)

1. A method of voice interaction, comprising:
receiving a voice request forwarded by a vehicle;
performing slot identification on the voice request to obtain first slot information;
performing disambiguation processing on the first slot position information according to preset additional information to obtain second slot position information;
performing application program interface prediction on the voice request according to the second slot position information;
and selecting the predicted application program interface to execute application program interface parameter filling according to the second slot position information and the predicted application program interface, outputting an execution result and transmitting the execution result to a vehicle to complete voice interaction.
2. The voice interaction method according to claim 1, wherein the step of performing slot recognition on the voice request to obtain the first slot information includes:
and carrying out slot identification on the voice request to obtain a slot value and at least one slot type label corresponding to the slot value so as to obtain the first slot information.
3. The voice interaction method according to claim 1, wherein the additional information includes a plurality of sub-information, and the performing disambiguation processing on the first slot information according to the preset additional information to obtain the second slot information includes:
determining sub-information for performing disambiguation according to the first slot information;
and performing disambiguation processing on the first slot position information according to the sub-information to obtain second slot position information.
4. The voice interaction method according to claim 3, wherein said performing disambiguation on the first slot information according to the sub-information to obtain second slot information includes:
and performing disambiguation processing on the first slot information according to the corresponding relation between the slot value and the slot type in the preset sub-information to obtain second slot information.
5. The voice interaction method of claim 4, further comprising:
updating the corresponding relation in response to a modification operation of the corresponding relation;
according to the corresponding relation between the slot position value and the slot position type in the preset sub-information, performing disambiguation processing on the first slot position information to obtain second slot position information, wherein the disambiguation processing comprises the following steps:
and performing disambiguation processing on the first slot position information according to the updated corresponding relation to obtain second slot position information.
6. The voice interaction method according to any one of claims 1 to 5, wherein the performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
and under the condition that a plurality of slot type labels corresponding to one slot value exist in the first slot information, performing disambiguation processing according to sentence pattern information in the additional information, and determining the slot type label uniquely corresponding to the slot value to obtain the second slot information.
7. The voice interaction method according to any one of claims 1 to 5, wherein the performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
and under the condition that a plurality of slot type labels corresponding to one slot value exist in the first slot information, performing disambiguation processing according to the media resource library heat information in the additional information, and determining the slot type label uniquely corresponding to the slot value to obtain the second slot information.
8. The voice interaction method according to any one of claims 1 to 5, wherein the performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
and under the condition that a plurality of slot type labels corresponding to one slot value exist in the first slot information, disambiguating according to the vehicle-mounted system interface information in the additional information, and determining the slot type uniquely corresponding to the slot value to obtain the second slot information.
9. The voice interaction method according to any one of claims 1 to 5, wherein the performing disambiguation processing on the first slot information according to the preset additional information to obtain second slot information includes:
and under the condition that a plurality of slot type labels corresponding to one slot value exist in the first slot information, disambiguating according to the user preference information in the additional information, and determining the slot type uniquely corresponding to the slot value to obtain the second slot information.
10. A server comprising a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements the voice interaction method of any of claims 1-9.
11. A non-transitory computer readable storage medium containing a computer program, characterized in that the voice interaction method of any of claims 1-9 is implemented when the computer program is executed by one or more processors.
CN202310374380.4A 2023-04-07 2023-04-07 Voice interaction method, server and computer readable storage medium Pending CN116153313A (en)

Publications (1)

Publication Number Publication Date
CN116153313A true CN116153313A (en) 2023-05-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20230523