CN115579008B - Voice interaction method, server and computer readable storage medium - Google Patents

Voice interaction method, server and computer readable storage medium

Info

Publication number
CN115579008B
CN115579008B (application CN202211551227.6A)
Authority
CN
China
Prior art keywords
reply text
clause
voice request
voice
clauses
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211551227.6A
Other languages
Chinese (zh)
Other versions
CN115579008A (en)
Inventor
张煜
郑毅
湛志强
李晨延
易晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Motors Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Motors Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Motors Technology Co Ltd
Priority to CN202211551227.6A
Publication of CN115579008A
Application granted
Publication of CN115579008B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60R: VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
    • B60R 16/00: Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; arrangement of elements of such circuits
    • B60R 16/02: Such circuits for electric constitutive elements
    • B60R 16/037: Such circuits for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
    • B60R 16/0373: Voice control

Abstract

The application discloses a voice interaction method comprising the following steps: receiving a voice request forwarded by a vehicle; in the case that the voice request comprises a plurality of clauses, extracting a plurality of pieces of structured information comprising each clause in the voice request and an initial reply text for each clause; and generating a fusion reply text for the voice request according to the logical relationships among the pieces of structured information, so as to complete the voice interaction. In this application, structured information covering each clause of the user's voice request and the initial reply text for each clause can be extracted, and a fluent, unambiguous fusion reply text for the request is generated by combining the logical relationships among the pieces of structured information, finally completing the voice interaction. With this voice interaction method, when a user issues a voice request containing several clauses, the reply text is not generated mechanically; instead, a fusion reply text that better matches natural communication habits is generated and fed back to the user, completing the voice interaction.

Description

Voice interaction method, server and computer readable storage medium
Technical Field
The present application relates to the field of vehicle-mounted voice technologies, and in particular, to a voice interaction method, a server, and a computer-readable storage medium.
Background
Currently, in-vehicle voice technology enables a user to interact with the vehicle cabin through voice, for example to control vehicle components or to interact with components in the in-vehicle system user interface. When a voice request issued by a user comprises a plurality of instruction clauses, a reply text generated simply in the order of the clauses may be stilted or semantically ambiguous, which affects the fluency and accuracy of the voice interaction and degrades the user experience.
Disclosure of Invention
The application provides a voice interaction method, a server and a computer readable storage medium.
The voice interaction method comprises the following steps:
receiving a voice request forwarded by a vehicle;
in the case that the voice request comprises a plurality of clauses, extracting a plurality of pieces of structured information including each clause in the voice request and an initial reply text for each clause;
and generating a fusion reply text for the voice request according to the logical relationships among the plurality of pieces of structured information, so as to complete the voice interaction.
In this way, when the voice request issued by the user includes multiple clauses, the voice request may first undergo sentence-breaking processing, and a plurality of pieces of structured information, including each clause in the voice request and the initial reply text obtained for each clause, may be extracted. A fluent and unambiguous fusion reply text for the voice request is then generated by combining the logical relationships among the pieces of structured information, and the voice interaction is finally completed. With this voice interaction method, when a user issues a voice request containing multiple clauses, the reply texts are not broadcast mechanically in sequence; instead, a fusion reply text that better matches natural communication habits is generated and fed back to the user, completing the voice interaction. This improves the fluency of human-vehicle voice interaction and the user experience.
The extracting, in the case that the voice request includes multiple clauses, of a plurality of pieces of structured information including each clause in the voice request and an initial reply text for each clause includes:
performing sentence-breaking processing on the voice request;
and, in the case that the voice request can be broken into sentences according to the sentence-breaking result, extracting a plurality of pieces of structured information including each clause obtained by the sentence-breaking processing and an initial reply text for each clause.
In this way, a voice request containing multiple instructions is broken into sentences according to the instruction content, and a plurality of pieces of structured information, including each clause obtained after sentence breaking and the initial reply text for each clause, are extracted, so that the logical relationships among the pieces of structured information can be determined and the fusion reply text generated.
The extracting, in the case that the voice request includes multiple clauses, of a plurality of pieces of structured information including each clause in the voice request and an initial reply text for each clause further includes:
in the case that the voice request cannot be broken into sentences and includes multiple clauses from users in different sound zones in the vehicle cabin, extracting a plurality of pieces of structured information including each clause in the voice request and an initial reply text for each clause.
In this manner, when the user voice requests include multiple clauses that need no sentence breaking, a plurality of pieces of structured information, including each clause and the initial reply text for each clause, can be extracted to determine the logical relationships among them and generate the fusion reply text.
The extracting, in the case that the voice request includes multiple clauses, of a plurality of pieces of structured information including each clause in the voice request and an initial reply text for each clause further includes:
extracting a plurality of pieces of structured information including each clause and each initial reply text according to a pre-constructed vehicle-mounted system function map, wherein the function map is built from vehicle components, component affiliations, component functions, and the operable attributes of those functions.
In this way, the structured information of multiple clauses from the same sound-zone user, or from users in different sound zones, together with the initial reply text for each clause, can be extracted according to the pre-constructed vehicle-mounted system function map, so that the fusion reply text can be generated according to the logical relationships among the pieces of structured information.
The extracting of a plurality of pieces of structured information including each clause and each initial reply text according to the pre-constructed vehicle-mounted system function map includes:
extracting the slot information of each clause and each initial reply text, and normalizing the slot information;
and matching the normalized slot information to the vehicle-mounted system function map, thereby obtaining the structured information.
In this way, the corresponding structured information can be obtained by matching against the pre-constructed vehicle-mounted system function map, using the normalized slot information extracted from the multiple clauses of a same sound-zone user or of users in different sound zones and from the initial reply text for each clause, so that the fusion reply text can be generated according to the logical relationships among the pieces of structured information.
The method further includes:
in the case that the semantics of the slot information in a clause meet a predetermined requirement, adopting first slot information extracted from the clause as the structured information, so that the fusion reply text is generated according to the structured information;
and in the case that the semantics of the slot information in the clause do not meet the predetermined requirement, adopting second slot information extracted from the initial reply text as the structured information, so that the fusion reply text is generated according to the structured information.
In this way, slot information is extracted either from the clauses obtained by breaking the user's voice request or from the initial reply text, according to the predetermined requirement, so that a fusion reply text better matching the user's needs can be generated.
The generating of a fusion reply text for the voice request according to the logical relationships among the plurality of pieces of structured information to complete the voice interaction includes:
identifying the logical relationships among the plurality of pieces of structured information according to pre-stored preset logical relationships between vehicle-mounted system functions, the preset logical relationships including dependency, conflict, and the like.
In this way, the logical relationships among the pieces of structured information are identified according to the pre-stored preset logical relationships between vehicle-mounted system functions, so that the reply text for the voice request completes the voice interaction.
The generating of a fusion reply text for the voice request according to the logical relationships among the plurality of pieces of structured information to complete the voice interaction further includes:
generating the fusion reply text according to the logical relationships and a predefined reply template.
In this way, a fusion reply text with clear and well-formed semantics can be generated from the identified logical relationships and the reply template predefined for them, improving the user experience.
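The relation-then-template scheme above can be sketched as follows. This is a minimal illustration only: the relation table, the English template strings, and the function names are assumptions, since the patent does not specify a data format for the preset logical relationships or the reply templates.

```python
# Sketch of relation-aware reply fusion. The relation table and the
# templates are illustrative assumptions, not the patent's actual data.

# Pre-stored preset logical relationships between vehicle-mounted
# system functions (e.g. dependency, conflict).
RELATIONS = {
    frozenset({"sport mode", "seat heating"}): "independent",
    frozenset({"sport mode", "eco mode"}): "conflict",
}

# One predefined reply template per logical relationship.
TEMPLATES = {
    "independent": "{a} and {b} are both on",
    "conflict": "{a} is on; {b} was skipped because it conflicts with {a}",
}

def fuse_reply(feature_a: str, feature_b: str) -> str:
    """Select a reply template according to the logical relationship
    between two requested features and fill it in."""
    relation = RELATIONS.get(frozenset({feature_a, feature_b}), "independent")
    return TEMPLATES[relation].format(a=feature_a, b=feature_b)
```

For instance, `fuse_reply("sport mode", "seat heating")` yields a single merged sentence instead of two sequential "OK" replies, which is the behaviour the patent describes.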
The server of the application comprises a processor and a memory, wherein the memory stores a computer program, and the computer program realizes the method when being executed by the processor.
The computer-readable storage medium of the present application stores a computer program that, when executed by one or more processors, implements the method described above.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of embodiments of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a first flow chart of the voice interaction method of the present application;
FIG. 2 is a second flow chart of the voice interaction method of the present application;
FIG. 3 is a third flow chart of the voice interaction method of the present application;
FIG. 4 is a fourth flow chart of the voice interaction method of the present application;
FIG. 5 is a schematic diagram of the vehicle-mounted system function map of the present application;
FIG. 6 is a fifth flow chart of the voice interaction method of the present application;
FIG. 7 is a sixth flow chart of the voice interaction method of the present application;
FIG. 8 is a seventh flow chart of the voice interaction method of the present application;
FIG. 9 is an eighth flow chart of the voice interaction method of the present application;
FIG. 10 is a ninth flow chart of the voice interaction method of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the embodiments of the present application, and are not to be construed as limiting the embodiments of the present application.
With the development and popularization of vehicle electronics, a vehicle can carry out voice interaction with a user: it can recognize the user's voice request and ultimately fulfil the intent expressed in it. This human-vehicle voice interaction function supports a variety of experiences for the driver and passengers while driving. However, as vehicle-mounted voice assistants gradually open up multi-sound-zone, multi-person conversation, users' voice requests no longer reach the voice assistant one at a time; instead, several occupants interact with the vehicle-mounted system in a concentrated burst. In the related art, the reply of a vehicle-mounted voice assistant to a user's voice request generally follows a "one command, one answer" form, whereas a multi-person conversation may call for "multiple commands, one answer" or "multiple commands, multiple answers". In one example, the driver issues a voice request containing consecutive commands, "set to sport mode, turn on seat heating", and the vehicle-mounted system replies "OK, OK", so that the object of each reply is ambiguous. In another example, if users in multiple sound zones issue voice requests in quick succession, such as the driver requesting "turn the wind direction to blow at the head" immediately followed by the front passenger requesting "turn the wind direction to blow at the feet", the reply may become confusing, for example "blow head, blow feet, turned on". How to reply clearly and operate unambiguously when a voice request comprises multiple instruction clauses is a problem that the vehicle-mounted voice recognition function urgently needs to solve.
As a result, in the related art the voice interaction between user and vehicle is not intelligent enough: it cannot handle complex application scenarios, or it requires complicated operations from the user, which reduces the user's willingness to use the function.
Based on the above problems that may be encountered, referring to fig. 1 and fig. 2, the present application provides a voice interaction method, including:
01: receiving a voice request forwarded by a vehicle;
02: in the case that the voice request comprises a plurality of clauses, extracting a plurality of pieces of structured information comprising each clause in the voice request and an initial reply text for each clause;
03: generating a fusion reply text for the voice request according to the logical relationships among the plurality of pieces of structured information, so as to complete the voice interaction.
The application also provides a server comprising a memory and a processor, by which the voice interaction method can be implemented. Specifically, the memory stores a computer program, and the processor is configured to receive a voice request forwarded by a vehicle, to extract, in the case that the voice request includes multiple clauses, a plurality of pieces of structured information including each clause in the voice request and an initial reply text for each clause, and to generate a fusion reply text for the voice request according to the logical relationships among the pieces of structured information, so as to complete the voice interaction.
Here, a user voice request including multiple clauses may be a voice request containing several instructions issued by the same user, such as "set to sport mode, turn on seat heating". It may also be a voice request containing several instructions issued by different users, for example the driver saying "turn the wind direction to blow at the head" and the front passenger saying "turn the wind direction to blow at the feet".
The initial reply text is the default reply text, generated from the user's voice request, describing the vehicle's execution state for each instruction. The structured information is a normalized result, covering each clause of the request and the initial reply text for each clause, obtained from user utterances that differ in expression but share the same meaning. The fusion reply text is generated according to the logical relationships between the structured information of each clause and its initial reply text, which avoids the unclear semantics of sequential replies and keeps the sentences fluent.
The method can thus produce a fused reply to a voice request containing multiple instructions. Because a voice request issued by a user within a short time may include several instructions, and functional conflicts may exist among them, the fusion reply text must remain clear and understandable in light of the logical relationships among the instructions. In the example above, when the user requests "set to sport mode, turn on seat heating", the vehicle-mounted system no longer replies "OK, OK" in sequence but replies "sport mode and seat heating are both on". The voice interaction method can generate a fusion reply text for a user voice request containing multiple clauses, inform the user how the vehicle has executed each instruction, feed back the reasons why certain instructions cannot be executed, and improve the fluency of controlling the vehicle by voice.
After receiving a user voice request containing multiple clauses, the server can process the request: it extracts each clause and the initial reply text for each clause, obtains the corresponding structured information, and generates a fusion reply text for the request by combining the logical relationships among the pieces of structured information, thereby completing the voice interaction.
In summary, when the voice request issued by the user includes multiple clauses, the voice request may first undergo sentence-breaking processing, and a plurality of pieces of structured information, including each clause in the voice request and the initial reply text obtained for each clause, may be extracted. A fluent and unambiguous fusion reply text for the voice request is then generated by combining the logical relationships among the pieces of structured information, and the voice interaction is finally completed. With this voice interaction method, when a user issues a voice request containing multiple clauses, the reply texts are not broadcast mechanically in sequence; instead, a fusion reply text that better matches natural communication habits is generated and fed back to the user, completing the voice interaction. This improves the fluency of human-vehicle voice interaction and the user experience.
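The three steps of the method (receive the request, extract structured information per clause, fuse into one reply) can be sketched end to end as follows. All helper names, the comma-based sentence splitter, and the hard-coded reply strings are illustrative assumptions, not the patent's implementation; the patent uses a trained sentence-breaking model and a function map rather than string matching.

```python
# End-to-end sketch of the pipeline described above: receive a request,
# extract per-clause structured information with an initial reply text,
# then fuse the replies instead of emitting them in sequence.

def split_into_clauses(request: str) -> list[str]:
    # Placeholder sentence breaking; the patent uses a trained model.
    return [c.strip() for c in request.split(",") if c.strip()]

# Default per-instruction initial reply texts (assumed for illustration).
INITIAL_REPLIES = {
    "set to sport mode": "OK",
    "turn on seat heating": "OK",
}

def handle_request(request: str) -> str:
    clauses = split_into_clauses(request)
    structured = [
        {"clause": c, "initial_reply": INITIAL_REPLIES.get(c, "OK")}
        for c in clauses
    ]
    if len(structured) < 2:
        return structured[0]["initial_reply"] if structured else ""
    # Fuse instead of replying "OK, OK" in sequence.
    features = [s["clause"].replace("set to ", "").replace("turn on ", "")
                for s in structured]
    return " and ".join(features) + " are both on"
```

With this sketch, `handle_request("set to sport mode, turn on seat heating")` produces one merged sentence rather than two sequential acknowledgements.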
Referring to fig. 2, step 02 includes:
021: performing sentence-breaking processing on the voice request;
022: according to the result of the sentence-breaking processing, in the case that the voice request can be broken into sentences, extracting a plurality of pieces of structured information including each clause obtained by the sentence-breaking processing and the initial reply text for each clause.
The processor is configured to perform sentence-breaking processing on the voice request and, in the case that the voice request can be broken according to the result, to extract a plurality of pieces of structured information including each clause obtained by the sentence-breaking processing and the initial reply text for each clause.
Specifically, sentence-breaking processing is performed on the received voice request, and if several clauses are obtained, an initial reply text for each clause can be derived from them. The sentence-breaking model may be a whole-sentence segmentation model that breaks the sentence using candidate breakpoints and their confidence scores. The structured information of the resulting clauses and of the initial reply text for each clause can then be extracted.
In one example, for a single-user multi-instruction scene, when the driver's-zone user issues the voice request "turn on the air conditioner, slow the wiper down a bit", the request can be divided into two clauses by the sentence-breaking model, giving the result "turn on the air conditioner # slow the wiper down a bit". From this result, structured information comprising the two clauses, "turn on the air conditioner" and "slow the wiper down a bit", and the initial reply texts for each clause, "OK" and "wiper sensitivity turned down", can be extracted.
In other examples, multiple sound-zone users may each issue a voice request that needs sentence breaking. For instance, following the example above, the driver's-zone user issues the voice request "turn on the air conditioner, slow the wiper down a bit" and the front passenger immediately issues "turn on the reading lamp, play some music"; after sentence breaking, four clauses are obtained: "turn on the air conditioner", "slow the wiper down a bit", "turn on the reading lamp" and "play some music". The sentence-breaking processing is unaffected by the number of instructions in the user's voice request.
In this way, a voice request containing multiple instructions can be broken into sentences according to the instruction content, and a plurality of pieces of structured information, including each clause obtained after sentence breaking and the initial reply text for each clause, can be extracted, so that the logical relationships among the pieces of structured information can be determined and the fusion reply text generated.
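The breakpoint-and-confidence scheme described above can be sketched as follows. In a real system the candidate breakpoints and their confidences come from a trained model; here a list of (index, confidence) pairs stands in for the model output, and the threshold value is an assumption.

```python
# Sketch of sentence breaking using candidate breakpoints with
# confidence scores. The 0.8 threshold is an illustrative assumption.

THRESHOLD = 0.8

def break_sentence(text: str, breakpoints: list[tuple[int, float]]) -> list[str]:
    """Split `text` at every candidate breakpoint whose confidence
    reaches THRESHOLD; breakpoints are (character index, confidence)."""
    cut_points = sorted(i for i, conf in breakpoints if conf >= THRESHOLD)
    clauses, start = [], 0
    for i in cut_points:
        clauses.append(text[start:i].strip())
        start = i
    clauses.append(text[start:].strip())
    return [c for c in clauses if c]
```

A high-confidence breakpoint yields two clauses, while a low-confidence candidate is ignored, so the number of instructions in the request does not change the procedure.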
Referring to fig. 3, step 02 further includes:
023: in the case that the voice request cannot be broken into sentences and includes multiple clauses from users in different sound zones in the vehicle cabin, extracting a plurality of pieces of structured information including each clause in the voice request and the initial reply text for each clause.
The processor is configured to extract, in the case that the voice request cannot be broken into sentences and includes multiple clauses from users in different sound zones in the vehicle cabin, a plurality of pieces of structured information including each clause in the voice request and an initial reply text for each clause.
Specifically, when the voice requests issued by users in different sound zones need no sentence-breaking processing, the initial reply text for each clause can be obtained directly from the clause issued by the user in each sound zone. The structured information of each clause and of its initial reply text can then be extracted.
In one example, for a multi-user multi-instruction scene, when the driver's-zone user issues the voice request "turn on the air conditioner" and the front-passenger-zone user simultaneously issues "slow the wiper down a bit", neither sentence needs breaking. Structured information can be extracted that includes the driver's clause "turn on the air conditioner" and the front passenger's clause "slow the wiper down a bit", together with the initial reply text for each clause, namely "OK" and "wiper sensitivity turned down".
In this manner, when the user voice requests include multiple clauses from different sound zones that need no sentence breaking, a plurality of pieces of structured information, including each clause and the initial reply text for each clause, can be extracted to determine the logical relationships among them and generate the fusion reply text.
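The multi-sound-zone case above can be sketched as follows: because each zone's utterance arrives already separated, no sentence breaking is applied, and each utterance becomes one clause. The zone names and dictionary shape are assumptions for illustration.

```python
# Sketch: clauses from different sound zones need no sentence breaking;
# each zone's utterance is treated directly as one clause.

def collect_clauses(zone_requests: dict[str, str]) -> list[dict]:
    """Map each sound zone's utterance to one structured entry."""
    return [
        {"zone": zone, "clause": utterance}
        for zone, utterance in zone_requests.items()
    ]
```

The resulting entries then feed the same extraction step as clauses produced by sentence breaking.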
Referring to fig. 4, step 02 further includes:
024: extracting a plurality of pieces of structured information including each clause and each initial reply text according to a pre-constructed vehicle-mounted system function map.
The processor is configured to extract a plurality of pieces of structured information including each clause and each initial reply text according to the pre-constructed vehicle-mounted system function map.
Specifically, in the present application, the pre-constructed vehicle-mounted system function map may be built from vehicle components, component affiliations, component functions, and the operable attributes of those functions.
In one example, the pre-constructed vehicle-mounted system function map may be a tree structure as shown in FIG. 5; the form of the map is not limited here. After sentence breaking, a plurality of pieces of structured information including each clause in the result and the initial reply text for each clause are extracted; when the clauses obtained are "turn on the air conditioner" and "slow the wiper down a bit", the structured information {"feature": "air conditioner", "operation": "open"} and {"feature": "wiper", "operation": "adjust"} can be extracted according to the pre-constructed vehicle-mounted system function map.
In this way, the structured information of multiple clauses from the same sound-zone user, or from users in different sound zones, together with the initial reply text for each clause, can be extracted according to the pre-constructed vehicle-mounted system function map, so that the fusion reply text can be generated according to the logical relationships among the pieces of structured information.
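A minimal version of the function map and the extraction against it can be sketched as follows. The map entries are assumptions for illustration; the patent builds the map from the actual components, affiliations, functions, and operable attributes of the vehicle.

```python
# Sketch of a vehicle-mounted system function map and extraction of
# structured information against it. Entries are illustrative only.

FUNCTION_MAP = {
    "air conditioner": {"operations": ["open", "close"]},
    "wiper sensitivity": {"operations": ["adjust"]},
    "seat heating": {"operations": ["open", "close"]},
}

def extract_structured(feature: str, operation: str):
    """Return structured information if (feature, operation) exists in
    the function map, otherwise None."""
    node = FUNCTION_MAP.get(feature)
    if node and operation in node["operations"]:
        return {"feature": feature, "operation": operation}
    return None
```

A miss (e.g. the raw surface form "wiper") signals that normalization is needed before matching, which is exactly the role of the normalization step described next in the document.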
Referring to fig. 6, step 024 includes:
0241: extracting slot position information of each clause and each initial reply text, and carrying out normalization processing on the slot position information;
0242: and matching the slot position information after the normalization processing to a functional map of the vehicle-mounted system so as to obtain structural information.
The processor is used for extracting the slot position information of each clause and each initial reply text, carrying out normalization processing on the slot position information, and matching the slot position information after the normalization processing into a functional map of the vehicle-mounted system, thereby obtaining structural information.
Specifically, a plurality of clauses obtained by sentence-breaking processing of a voice request sent by a user in the same sound zone or a plurality of clauses of users in different sound zones and slot position information of an initial reply text for each clause can be extracted, and normalization processing is performed on the extracted slot position information. And matching the slot position information after the normalization processing with information in a pre-constructed vehicle-mounted system function map to finally obtain corresponding structural information. The structured information may be a business map representing the voice request or the initial reply text. In the present application, the normalization process is a process of giving a uniform meaning to words having different expressions but the same meaning.
In one example, a user sends a voice request "open an air conditioner wiper slow-down point", and a sentence breaking result is obtained through sentence breaking processing, wherein the extracted slot position information in the sentence division "open the air conditioner" comprises "open" and "air conditioner", and since the slot position information extracted in the voice request of the user can accurately express vehicle parts and operation attributes thereof in a vehicle-mounted system function map, the normalization process does not need to change an expression mode, and structural information { "feature": air conditioner and "open" } can be obtained through matching, so that a service map as shown in fig. 7 is formed. For the second clause "wiper slow-down point" in the above example, the extracted slot position information includes "wiper" and "slow-down", and since the "wiper" and the "slow-down" are not clearly corresponding to each other in the vehicle-mounted system function map, the "wiper" can be normalized to "wiper sensitivity", and then structured information { "feature": the "wiper sensitivity", "operation": the "adjust" is obtained by matching, so that the service map shown in fig. 8 is formed.
In this way, the corresponding structured information can be obtained by normalizing the slot information of the multiple clauses of a same-sound-zone user (or of users in different sound zones) and of the initial reply text for each clause, and matching the result against the information in the pre-constructed in-vehicle system function map, so that the fused reply text can be generated from the logical relationships in the structured information.
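The normalize-then-match step described above can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the function-map entries, the normalization table, and the function name `to_structured` are all assumptions invented for this example.

```python
# Pre-constructed in-vehicle system function map (assumed entries):
# feature -> set of operable attributes.
FUNCTION_MAP = {
    "air conditioner": {"turn on", "turn off"},
    "wiper sensitivity": {"adjust"},
}

# Normalization table (assumed): expressions with different wording but the
# same meaning are mapped to the map's uniform term.
NORMALIZATION = {
    "open": "turn on",
    "wiper": "wiper sensitivity",
    "slow down": "adjust",
}

def to_structured(slots):
    """Normalize extracted slot values and match them against the function map
    to obtain the structured information described in the text."""
    normalized = [NORMALIZATION.get(s, s) for s in slots]
    feature = next(s for s in normalized if s in FUNCTION_MAP)
    operation = next(s for s in normalized if s in FUNCTION_MAP[feature])
    return {"feature": feature, "operation": operation}

# The two clauses from the example above:
print(to_structured(["open", "air conditioner"]))   # matches unchanged terms
print(to_structured(["wiper", "slow down"]))        # "wiper" is normalized first
```

The key design point, as in the text, is that normalization happens before matching, so a colloquial slot value like "wiper" still lands on the map entry "wiper sensitivity".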
The voice interaction method further comprises:
when the semantics of the slot information in a clause meet a predetermined requirement, adopting the first slot information extracted from the clause as the structured information, so as to generate the fused reply text from the structured information;
and when the semantics of the slot information in a clause do not meet the predetermined requirement, adopting the second slot information extracted from the initial reply text as the structured information, so as to generate the fused reply text from the structured information.
The processor is configured to adopt, when the semantics of the slot information in a clause meet the predetermined requirement, the first slot information extracted from the clause as the structured information so as to generate the fused reply text from the structured information; and to adopt, when the semantics of the slot information in a clause do not meet the predetermined requirement, the second slot information extracted from the initial reply text as the structured information so as to generate the fused reply text from the structured information.
Specifically, the user can set the predetermined requirement on clause slot information according to personal needs. When the semantics of the slot information extracted from the multiple clauses obtained by sentence-breaking a voice request from a user in one sound zone, or from users in different sound zones, meet the predetermined requirement, the first slot information extracted from each clause is used directly as the structured information, and the fused reply text is generated from it.
If the semantics of the slot information extracted from a clause do not meet the predetermined requirement, that slot information is discarded, and the second slot information extracted from the initial reply text is used as the structured information to generate the fused reply text.
In the above example, the user issues the voice request "turn on the air conditioner, slow the wiper down a bit", and the sentence-breaking result is "turn on the air conditioner # slow the wiper down a bit", where the first slot information extracted from the clause "turn on the air conditioner" includes "turn on" and "air conditioner". Suppose the user sets the predetermined requirement to be a more accurate, official expression. The clause "turn on the air conditioner" corresponds to the initial reply text "OK"; as shown in Fig. 9, since the first slot information "turn on" and "air conditioner" extracted from the clause is already accurate, it is used directly as the structured information for generating the fused reply text. The clause "slow the wiper down a bit" corresponds to the initial reply text "wiper sensitivity slowed down"; as shown in Fig. 10, since the slot information "wiper" and "slow down" extracted from the clause is not an official expression and does not meet the user's predetermined requirement, the initial reply text is extracted instead to obtain the second slot information "wiper sensitivity" and "slow down", which is used as the structured information for generating the fused reply text.
In this way, slot information is extracted either from the clauses of the sentence-broken user voice request or from the initial reply text, according to the predetermined requirement, and used as the structured information, so that a fused reply text better matching the user's needs can be generated.
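The fallback between first and second slot information can be sketched as below. This is an illustrative sketch only: the "official term" check stands in for the patent's user-set "predetermined requirement", and the vocabulary and the name `select_structured` are assumptions.

```python
# Assumed official vocabulary standing in for the user's predetermined
# requirement on slot semantics.
OFFICIAL_TERMS = {"turn on", "air conditioner", "wiper sensitivity", "slow down"}

def select_structured(clause_slots, reply_slots):
    """Use the clause's own slots (first slot information) if they meet the
    requirement; otherwise discard them and fall back to the slots extracted
    from the initial reply text (second slot information)."""
    if all(slot in OFFICIAL_TERMS for slot in clause_slots):
        return clause_slots
    return reply_slots

# "turn on the air conditioner": clause slots are already official, keep them.
print(select_structured(["turn on", "air conditioner"],
                        ["turn on", "air conditioner"]))
# "slow the wiper down a bit": clause slots are colloquial, fall back to the
# reply-text slots "wiper sensitivity" / "slow down".
print(select_structured(["wiper", "slow down a bit"],
                        ["wiper sensitivity", "slow down"]))
```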
Generating a fusion reply text aiming at the voice request according to the logical relation among the plurality of pieces of structural information so as to complete voice interaction, wherein the method comprises the following steps:
and identifying the logical relationship among the plurality of pieces of structural information according to a pre-stored vehicle-mounted system function preset logical relationship.
The processor is used for identifying the logic relation among the plurality of pieces of structural information according to the pre-stored vehicle-mounted system function preset logic relation.
Specifically, the on-board system server may store preset logical relationships between pieces of structured information. Taking the on-board air conditioner and its operation attributes as an example, the relationships shown in Table 1 include dependency, conflict, and other. A dependency relationship means that realizing object one depends on object two: for example, "air conditioner" and "temperature" are in a dependency relationship, because adjusting the "temperature" must be done through the "air conditioner" and cannot be done while the air conditioner is not running. A conflict relationship means that object one and object two cannot be realized at the same time: for example, "unidirectional wind" and "mirror wind" are two different air-outlet modes of the air conditioner and cannot be enabled simultaneously. The other relationship indicates that two objects are neither dependent nor conflicting: for example, "unidirectional wind" and "temperature" are in the other relationship, because adjusting the air-conditioner wind direction and adjusting its temperature neither depend on nor conflict with each other.
TABLE 1 (rendered as an image in the original publication; it sets out the preset dependency, conflict, and other relationships between the air conditioner and its operation attributes)
The logical relationships among the pieces of structured information can be identified from the pre-stored preset logical relationships of the in-vehicle system functions shown in Table 1, giving the in-vehicle system function relationship matrix shown in Table 2:
TABLE 2 (rendered as an image in the original publication; the in-vehicle system function relationship matrix obtained from Table 1)
In this way, the logical relationships among the pieces of structured information are identified from the pre-stored preset logical relationships of the in-vehicle system functions, so that the reply text for the voice request completes the voice interaction.
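A relationship lookup of the kind Table 1 describes can be sketched as a symmetric mapping with "other" as the default. The specific entries and the name `relation` are assumptions modeled on the air-conditioner example in the text, not the patent's actual tables.

```python
# Assumed preset logical relationships between feature pairs (Table 1 style).
# frozenset keys make the lookup symmetric: relation(a, b) == relation(b, a).
RELATIONS = {
    frozenset({"air conditioner", "temperature"}): "dependency",
    frozenset({"unidirectional wind", "mirror wind"}): "conflict",
}

def relation(feature_a, feature_b):
    """Return the preset relation between two features; pairs not listed are
    neither dependent nor conflicting, i.e. 'other'."""
    return RELATIONS.get(frozenset({feature_a, feature_b}), "other")

print(relation("air conditioner", "temperature"))      # dependency
print(relation("unidirectional wind", "mirror wind"))  # conflict
print(relation("unidirectional wind", "temperature"))  # other
```

Enumerating `relation` over every feature pair would produce the relationship matrix of Table 2.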
Generating a fusion reply text aiming at the voice request according to the logical relation among the plurality of pieces of structural information so as to complete voice interaction, wherein the method comprises the following steps:
and generating a fusion reply text according to the logic relation and the predefined reply template.
The processor is used for generating a fusion reply text according to the logic relation and the predefined reply template.
Specifically, the predefined reply template may be filled with information from the pre-stored preset logical relationships of the vehicle-mounted functions to form the fused reply text. The predefined reply template includes the sentence components that connect the logical relationships among the pieces of structured information, so that even when the user's voice request contains conflicting clauses, a fused reply text with clear expression and smooth semantics is obtained.
In one example, sentence-breaking of voice requests issued by users in different sound zones yields the two clauses "turn on the air conditioner's unidirectional wind" and "turn on the air conditioner's mirror wind". The internal logic of the predefined reply template may be set to: "of the dependency, conflict, and other attributes, the last one is the function that is finally executed", which gives the predefined reply template "<dependency> is turned on, <conflict> cannot be operated, helping you adjust <other>". From the slot information extracted from the two clauses, the fused reply text is finally generated as "the air conditioner is turned on, unidirectional wind and mirror wind cannot be operated at the same time, helping you adjust the temperature and mirror wind". The generated fused reply text can thus include the execution state of each instruction in the user's voice request, as well as feedback on instructions that cannot be executed because their execution conditions are not met.
In this way, a fused reply text with clear and standard semantics can be generated from the logical relationships and the predefined reply template determined by them, improving the user experience.
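Filling such a template can be sketched as plain string formatting. The template wording and the helper name `fuse_reply` are assumptions loosely following the example above, not the patent's actual template.

```python
# Assumed predefined reply template with slots for the three relation groups.
TEMPLATE = ("{dependency} is turned on, {conflict} cannot be operated at the "
            "same time, helping you adjust {other}")

def fuse_reply(dependency, conflict, other):
    """Fill the predefined reply template with the grouped slot information
    to produce the fused reply text."""
    return TEMPLATE.format(dependency=dependency,
                           conflict=" and ".join(conflict),
                           other=" and ".join(other))

text = fuse_reply("the air conditioner",
                  ["unidirectional wind", "mirror wind"],
                  ["the temperature", "mirror wind"])
print(text)
```

Because the template supplies the connecting sentence components, the reply stays fluent even when one of the requested operations conflicts and is not executed.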
The computer-readable storage medium of the present application stores a computer program that, when executed by one or more processors, implements the method described above.
In the description of the present specification, reference to terms such as "above" or "specifically" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, those skilled in the art may combine the various embodiments or examples, and features of the various embodiments or examples, described in this specification, provided they are not mutually inconsistent.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. Alternate implementations are included within the scope of the preferred embodiments of the present application, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application and that variations, modifications, substitutions and alterations in the above embodiments may be made by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A method of voice interaction, comprising:
receiving a voice request forwarded by a vehicle;
extracting a plurality of pieces of structured information including each clause in the voice request and an initial reply text for each clause, in the case that the voice request comprises a plurality of clauses;
and generating a fusion reply text aiming at the voice request according to the logical relation among the plurality of pieces of structural information so as to complete the voice interaction.
2. The method of claim 1, wherein the extracting a plurality of structured information including each clause in the voice request and an initial reply text for each clause in the voice request in case that the voice request includes a plurality of clauses comprises:
performing sentence breaking processing on the voice request;
and according to the sentence-breaking result, in the case that the voice request can be broken into sentences, extracting a plurality of pieces of structured information including each clause obtained by the sentence-breaking processing and an initial reply text for each clause.
3. The method of claim 2, wherein the extracting a plurality of structured information including each clause in the voice request and the initial reply text for each clause in the voice request in case that the voice request includes a plurality of clauses comprises:
in the case that the voice request cannot be broken into sentences and includes multiple clauses from users in different sound zones in the vehicle cabin, extracting a plurality of pieces of the structured information including each of the clauses in the voice request and an initial reply text for each of the clauses.
4. The method of claim 2 or claim 3, wherein the extracting a plurality of structured information including each clause and an initial reply text for each clause in the voice request in the case that the voice request includes a plurality of clauses comprises:
and extracting a plurality of structural information including each clause and each initial reply text according to a pre-constructed vehicle-mounted system function map, wherein the vehicle-mounted system function map is established according to vehicle parts, part attachment relations, part functions and operable attributes of the part functions.
5. The method according to claim 4, wherein the extracting a plurality of structured information including each of the clauses and each of the initial reply texts according to a pre-constructed in-vehicle system function map comprises:
extracting slot position information of each clause and each initial reply text, and carrying out normalization processing on the slot position information;
and matching the slot position information after the normalization processing to the functional map of the vehicle-mounted system so as to obtain the structural information.
6. The voice interaction method of claim 5, further comprising:
under the condition that the semantics of the slot information in the clauses meet the preset requirement, the first slot information obtained by extracting the clauses is adopted as the structural information, so that the fusion reply text is generated according to the structural information;
and under the condition that the semantics of the slot position information in the clause does not meet the preset requirement, adopting second slot position information obtained by extracting the initial reply text as the structural information so as to generate the fusion reply text according to the structural information.
7. The method of claim 6, wherein the generating a fused reply text for the voice request according to the logical relationship between the plurality of pieces of structured information to complete the voice interaction comprises:
and identifying a logical relationship among the plurality of pieces of structured information according to pre-stored preset logical relationships of in-vehicle system functions, wherein the preset logical relationships comprise dependency, conflict, and others.
8. The method of claim 7, wherein the generating a fused reply text for the voice request according to the logical relationship between the plurality of pieces of structured information to complete the voice interaction comprises:
and generating the fusion reply text according to the logic relation and a predefined reply template.
9. A server, characterized in that the server comprises a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, carries out the method of any one of claims 1-8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by one or more processors, implements the method of any one of claims 1-8.
CN202211551227.6A 2022-12-05 2022-12-05 Voice interaction method, server and computer readable storage medium Active CN115579008B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211551227.6A CN115579008B (en) 2022-12-05 2022-12-05 Voice interaction method, server and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211551227.6A CN115579008B (en) 2022-12-05 2022-12-05 Voice interaction method, server and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN115579008A CN115579008A (en) 2023-01-06
CN115579008B true CN115579008B (en) 2023-03-31

Family

ID=84590680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211551227.6A Active CN115579008B (en) 2022-12-05 2022-12-05 Voice interaction method, server and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115579008B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004021028A (en) * 2002-06-18 2004-01-22 Toyota Central Res & Dev Lab Inc Speech interaction system and speech interaction program
WO2014192959A1 (en) * 2013-05-31 2014-12-04 ヤマハ株式会社 Technology for responding to remarks using speech synthesis
CN111292731A (en) * 2018-11-21 2020-06-16 深圳绿米联创科技有限公司 Voice information processing method and device, electronic equipment and storage medium
WO2021196981A1 (en) * 2020-03-31 2021-10-07 华为技术有限公司 Voice interaction method and apparatus, and terminal device
CN114048756A (en) * 2021-11-26 2022-02-15 北京房江湖科技有限公司 Information difference identification method, storage medium and electronic equipment
CN114678028A (en) * 2022-04-29 2022-06-28 深圳力思联信息技术股份有限公司 Voice interaction method and system based on artificial intelligence

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7991724B2 (en) * 2006-12-21 2011-08-02 Support Machines Ltd. Method and a computer program product for providing a response to a statement of a user
KR20160056548A (en) * 2014-11-12 2016-05-20 삼성전자주식회사 Apparatus and method for qusetion-answering
CN107871500B (en) * 2017-11-16 2021-07-20 百度在线网络技术(北京)有限公司 Method and device for playing multimedia
CN110246493A (en) * 2019-05-06 2019-09-17 百度在线网络技术(北京)有限公司 Address book contact lookup method, device and storage medium
JP2020204971A (en) * 2019-06-18 2020-12-24 Arithmer株式会社 Dialog management server, dialog management method, and program
CN113194346A (en) * 2019-11-29 2021-07-30 广东海信电子有限公司 Display device
US20220129507A1 (en) * 2020-10-28 2022-04-28 Aviso LTD. System and Method for Personalized Query and Interaction Set Generation using Natural Language Processing Techniques for Conversational Systems
CN112597288B (en) * 2020-12-23 2023-07-25 北京百度网讯科技有限公司 Man-machine interaction method, device, equipment and storage medium
CN113555133A (en) * 2021-05-31 2021-10-26 北京易康医疗科技有限公司 Medical inquiry data processing method and device
CN114898752B (en) * 2022-06-30 2022-10-14 广州小鹏汽车科技有限公司 Voice interaction method, vehicle and storage medium


Also Published As

Publication number Publication date
CN115579008A (en) 2023-01-06

Similar Documents

Publication Publication Date Title
AU2021286360B2 (en) Systems and methods for integrating third party services with a digital assistant
US11232265B2 (en) Context-based natural language processing
US10733983B2 (en) Parameter collection and automatic dialog generation in dialog systems
US10853582B2 (en) Conversational agent
WO2022001013A1 (en) Voice interaction method, vehicle, server, system, and storage medium
CN107112013B (en) Platform for creating customizable dialog system engines
US11069351B1 (en) Vehicle voice user interface
CN113239178A (en) Intention generation method, server, voice control system and readable storage medium
CN112397063A (en) System and method for modifying speech recognition results
CN115579008B (en) Voice interaction method, server and computer readable storage medium
CN110767219A (en) Semantic updating method, device, server and storage medium
CN116368459A (en) Voice commands for intelligent dictation automated assistant
CN115565532B (en) Voice interaction method, server and computer readable storage medium
CN115910035B (en) Voice interaction method, server and computer readable storage medium
CN117496972B (en) Audio identification method, audio identification device, vehicle and computer equipment
CN115579009B (en) Voice interaction method, server and computer readable storage medium
CN117496972A (en) Audio identification method, audio identification device, vehicle and computer equipment
CN117789696A (en) Large language model prompt message determining method, server and storage medium
EP4264399A1 (en) Biasing interpretations of spoken utterance(s) that are received in a vehicular environment
CN117573919A (en) Car end music recommendation method, device, equipment and storage medium
CN117136405A (en) Automated assistant response generation using large language models
CN117807986A (en) Generation method, server and storage medium
CN117724604A (en) Method, apparatus, device and computer readable storage medium for human-vehicle interaction
CN116416965A (en) Speech synthesis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant