GB2596092A - A method for generating at least one alternative utterance to an initial utterance, as well as a semantic analyzer module - Google Patents


Publication number
GB2596092A
Authority
GB
United Kingdom
Prior art keywords
utterance
analyzer module
semantic analyzer
initial
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB2009185.6A
Other versions
GB202009185D0 (en)
Inventor
Toghi Behrad
Chen Daguan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mercedes Benz Group AG
Original Assignee
Daimler AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Daimler AG filed Critical Daimler AG
Priority to GB2009185.6A priority Critical patent/GB2596092A/en
Publication of GB202009185D0 publication Critical patent/GB202009185D0/en
Publication of GB2596092A publication Critical patent/GB2596092A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/55Rule-based translation
    • G06F40/56Natural language generation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/02Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue


Abstract

A method for generating at least one alternative utterance 24 to an initial utterance 18 is performed by a semantic analyser module 10. The utterances are configured to activate a functionality of a motor vehicle. The semantic analyser module comprises an artificial neural network 12 for generating the alternative utterance depending on extracted semantics of the initial utterance. The neural network is an autoencoder comprising an encoder 30 and a decoder 32. The initial utterance may be turned into a semantic vector by a sentence embedder module. The output utterance may have a different wording but preserve the same meaning as the initial utterance. The generated synonymous phrases may be used for interpreting voice commands from a user of a vehicle, who may have many different ways of wording the same instruction.

Description

A method for generating at least one alternative utterance to an initial utterance, as well as a semantic analyzer module
FIELD OF THE INVENTION
[0001] The present disclosure relates to the field of automobiles. More specifically, the present disclosure relates to a method for generating at least one alternative utterance to an initial utterance by a semantic analyzer module.
BACKGROUND INFORMATION
[0002] According to the state of the art, a major challenge for voice assistance systems consists in the ability to understand voice commands and match them to functionalities implemented in a motor vehicle. For example, a voice assistance system is able to turn off the air conditioning. Currently, there are three main ways to perform such an utterance-to-functionality matching. The first option is to specify all possible utterances explicitly in the form of a dictionary for each skill. A more efficient approach is to define a sort of sentence graph that represents multiple sentences in a single expression in a combinatorial manner. Another approach is to use artificial intelligence to implicitly learn which utterances would match a given functionality. In all three methods, the voice assistance developer is required to come up with a large number of sentences, whether all of them explicitly, all of them implicitly, or a large representative number of them.
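By way of illustration only, and not as part of the disclosed subject-matter, the second option above, the combinatorial sentence graph, can be sketched in a few lines. The function name and the word groups are assumptions chosen for this example:

```python
from itertools import product

def expand_sentence_graph(slots):
    """Expand a simple sentence graph, i.e. a sequence of alternative
    word groups, into every concrete utterance it represents."""
    return [" ".join(words) for words in product(*slots)]

# A toy graph covering several phrasings of a single command.
graph = [
    ("turn off", "switch off", "shut down"),
    ("the",),
    ("air conditioning", "AC"),
]

utterances = expand_sentence_graph(graph)
# 3 x 1 x 2 alternatives yield 6 concrete sentences from one expression.
```

A graph of three slots with three, one, and two alternatives thus stands for six concrete utterances, which is why this representation is more compact than an explicit per-skill dictionary.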
[0003] US 20180293221 A1 discloses a method for executing computer-actionable directives conveyed in human speech, comprising receiving audio data recording speech from one or more speakers, converting the audio data into a linguistic representation of the recorded speech, detecting a target corresponding to the linguistic representation, committing to the data structure language data associated with the detected target and based on the linguistic representation, parsing the data structure to identify one or more computer-actionable directives, and submitting the one or more computer-actionable directives to the computer for processing.
[0004] US 20040117189 A1 discloses a query system for processing voice-based queries. This distributed client-server system, typically implemented on an intranet or over the internet, accepts a user's queries at his/her computer, PDA, or workstation using a speech input interface. After converting the user's query from speech to text, a natural language engine, a database processor, and a full-text SQL database are implemented to find a single answer that best matches the user's query. Both statistical and semantic decoding are used to assist and improve the performance of the query recognition.
[0005] The methods according to the state of the art are time-consuming processes that require rather strenuous thinking or brainstorming by a person. Furthermore, it is highly likely that the developer becomes tunnel-visioned in the brainstorming process. As a result, interaction with the voice assistant becomes robotic, because the assistant only understands sentences consisting of a very specific sequence of words rather than their underlying general meaning. Therefore, there is a need in the art for a more efficient process.
SUMMARY OF THE INVENTION
[0006] It is an object of the invention to provide a method and a semantic analyzer module, by which an increased ability to comprehend the semantics behind the voice commands may be realized.
[0007] This object is solved by a method as well as a semantic analyzer module according to the independent claims. Advantageous embodiments are disclosed in the dependent claims.
[0008] One aspect of the invention relates to a method for generating at least one alternative utterance to an initial utterance by a semantic analyzer module, wherein the initial utterance is configured to activate a functionality of a motor vehicle, and wherein the semantic analyzer module comprises a neural network for generating the at least one alternative utterance depending on the initial utterance for the activation of the functionality of the motor vehicle.
[0009] In an embodiment, the neural network is an autoencoder and the semantics of the initial utterance is extracted. Depending on the extracted semantics, the autoencoder then generates the at least one alternative utterance.
[0010] Therefore, this streamlined development process reduces the burden on the developers and furthermore increases the voice assistant's ability to comprehend the semantics behind voice commands.
[0011] The main objective of the semantic analyzer module is automatic utterance generation. On the basis of an original or initial utterance, the semantic analyzer module is able to automatically generate new utterances that are identical in meaning but worded differently. These newly generated utterances may be used at the developer's discretion to improve utterance coverage. Furthermore, a redundant utterance generator is proposed that leverages a pre-trained sentence embedder. The sentence embedder is able to map sentences to vectors inside a high-dimensional semantic space. A new neural network, which may be called the utterance generator network, is built up that takes sentence embeddings and generates redundant utterances, which are passed to the sentence embedder for quality control and then returned to the user. Together, the sentence embedder and the utterance generator network make up the semantic analyzer module.
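Purely as an illustrative sketch of this pipeline, the following code mimics the three stages with stand-ins: a toy trigram-hashing embedder in place of the pre-trained sentence embedder module 14, a fixed candidate list in place of the trained utterance generator network 12, and a cosine-similarity threshold as the quality-control step. All names, the vocabulary, and the threshold value are assumptions, not part of the disclosure:

```python
import math
import zlib

def embed(sentence, dim=256):
    """Toy stand-in for the pre-trained sentence embedder module 14:
    hashes character trigrams into a fixed-length unit vector."""
    vec = [0.0] * dim
    text = sentence.lower()
    for i in range(len(text) - 2):
        vec[zlib.crc32(text[i:i + 3].encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def generate_candidates(seed):
    """Stand-in for the utterance generator network 12; a trained
    decoder would produce these from the seed embedding."""
    return ["please shut down the air conditioning",
            "switch the air conditioning off",
            "play some jazz music"]          # deliberately off-topic

def semantic_analyzer(seed, threshold=0.3):
    """Embed the seed, generate candidates, and keep only candidates
    whose embedding stays close to the seed (quality control)."""
    seed_vec = embed(seed)
    return [c for c in generate_candidates(seed)
            if cosine(seed_vec, embed(c)) >= threshold]

paraphrases = semantic_analyzer("turn off the air conditioning")
```

The off-topic candidate shares almost no trigrams with the seed, so the quality-control step filters it out while the genuine paraphrases survive.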
[0012] In an embodiment, the initial utterance is turned into a semantic vector by a sentence embedder module of the semantic analyzer module and the semantic vector is transmitted to a deep neural network of the semantic analyzer module.
[0013] In another embodiment, the initial utterance and the at least one alternative utterance have the same meaning but different wordings.
[0014] Another aspect of the invention relates to a semantic analyzer module for generating at least one alternative utterance, wherein the semantic analyzer module comprises at least one autoencoder and is configured to perform a method according to the preceding aspect. In particular, the method is performed by the semantic analyzer module.
[0015] Further advantages, features, and details of the invention derive from the following description of preferred embodiments as well as from the drawings. The features and feature combinations previously mentioned in the description as well as the features and feature combinations mentioned in the following description of the figures and/or shown in the figures alone can be employed not only in the respectively indicated combination but also in any other combination or taken alone without leaving the scope of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The novel features and characteristics of the disclosure are set forth in the independent claims. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and together with the description serve to explain the disclosed principles. The same reference signs are used throughout the figures to reference like features and components. Some embodiments of the system and/or methods in accordance with embodiments of the present subject-matter are now described below, by way of example only and with reference to the accompanying figures.
[0017] The drawings show in:
[0018] Fig. 1 a schematic block view of a semantic analyzer module.
[0019] Fig. 2 another block view of the semantic analyzer module in an inference phase.
[0020] Fig. 3 another schematic block view of the semantic analyzer module.
[0021] In the figures the same elements or elements having the same function are indicated by the same reference signs.
DETAILED DESCRIPTION
[0022] In the present document, the word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment or implementation of the present subject matter described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
[0023] While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure.
[0024] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus preceded by "comprises... a" does not, without more constraints, preclude the existence of other elements or additional elements in the system or method.
[0025] In the following detailed description of the embodiments of the disclosure, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.
[0026] Fig. 1 shows a schematic block view of a semantic analyzer module 10 during a training phase. The semantic analyzer module 10 comprises a deep neural network 12, a sentence embedder module 14, and a performance evaluation module 16.
[0027] In one embodiment, to realize the semantic analyzer module 10, the sentence embedder module 14 and an utterance generator network, which is shown as the neural network 12, are required. The sentence embedder module 14 may be pre-trained, wherein there are several publicly available models to choose from. The utterance generator network may be developed and trained in-house. To train the utterance generator network, a process similar to that used in connection with the semantic analyzer module 10 is employed. A seed sentence, which may also be referred to as an initial utterance 18, is supplied to the sentence embedder module 14, and the embedded sentence vectors 20 are passed to the neural network 12. The utterance generator network creates new sentences. As an additional step, these generated sentences are compared with target sentences 22, which describe the sentences the network is meant to generate, using the feedback mechanism. These target sentences may be stored in a storage device of the semantic analyzer module 10. This mechanism corrects the generator network such that the generated sentences eventually converge to the input training set.
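The feedback mechanism described above can be sketched, under heavy simplification, as a gradient-descent loop: the generator is reduced to a single linear layer mapping seed embeddings to target embeddings, and the mean squared error against the target vectors plays the role of the feedback. The dimensions, vectors, and learning rate are illustrative assumptions:

```python
import random

random.seed(0)
DIM = 4

# Toy embedding pairs: (seed-sentence vector, target-paraphrase vector).
# In the real module these would come from the sentence embedder 14.
pairs = [([0.9, 0.1, 0.0, 0.3], [0.8, 0.2, 0.1, 0.3]),
         ([0.1, 0.8, 0.4, 0.0], [0.2, 0.7, 0.5, 0.1])]

# The "generator network" reduced to one linear layer for illustration.
W = [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(DIM)]

def forward(vec):
    return [sum(W[i][j] * vec[j] for j in range(DIM)) for i in range(DIM)]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / DIM

def train_step(lr=0.1):
    total = 0.0
    for x, y in pairs:
        pred = forward(x)
        total += mse(pred, y)
        for i in range(DIM):
            grad_i = 2.0 * (pred[i] - y[i]) / DIM   # dMSE/dpred[i]
            for j in range(DIM):
                W[i][j] -= lr * grad_i * x[j]       # feedback correction
    return total / len(pairs)

losses = [train_step() for _ in range(200)]
# The feedback drives the generated vectors toward the targets,
# so the loss shrinks as the generated output converges to the
# training set.
```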
[0028] Furthermore, the seed sentences form pairs of paraphrases. For example, one pair may be "turn off the air condition" and "please shut down the air condition". The training data set which is used contains a large number, potentially thousands, of such paraphrase pairs. Such data may be either taken from openly available datasets or crafted specifically for training the semantic analyzer module 10. The pre-trained sentence embedder module 14 is a module that turns sentences into vectors representing the underlying semantics of the sentences. Given an original sentence and a set of newly generated sentences, it is also capable of evaluating how relevant the new sentences are to the original sentence.
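As a small worked example of such paraphrase pairs and of evaluating how relevant a generated sentence is to the original, the sketch below uses word-set overlap (Jaccard similarity) as a crude stand-in for the embedder's semantic comparison; the pairs and the scoring function are illustrative assumptions:

```python
def relevance(original, candidate):
    """Toy relevance score standing in for the embedder's semantic
    check: Jaccard overlap of the lower-cased word sets."""
    a = set(original.lower().split())
    b = set(candidate.lower().split())
    return len(a & b) / len(a | b)

# Paraphrase pairs of the kind a training set would contain,
# potentially in the thousands.
training_pairs = [
    ("turn off the air condition", "please shut down the air condition"),
    ("open the window", "roll the window down"),
]

scores = [relevance(src, tgt) for src, tgt in training_pairs]
# Each score lies in [0, 1]; higher means the paraphrase shares more
# surface vocabulary with its source.
```

A real embedder would score true paraphrases highly even with zero word overlap, which is exactly what the word-set stand-in cannot do; it serves only to make the relevance-evaluation step concrete.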
[0029] In an embodiment, the utterance generator network or the deep neural network 12 is the deep neural network architecture that takes the semantic vector of a source sentence and attempts to generate several redundant sentences. Its performance is then evaluated through calculating the loss function with regard to the given ground truth sentence set.
[0030] Fig. 2 shows another schematic block view of the semantic analyzer module 10 in an inference phase. After training the neural network 12, the semantic analyzer module 10 may now be used for the purpose of generating redundant or alternative utterances 24 to the given input sentence, in other words, the initial utterance 18. In particular, this phase is realized in real-time. For example, the consumer of the semantic analyzer module 10 enters a target sentence and receives the sentence set almost immediately, as the model has already been trained.
[0031] Furthermore, Fig. 2 shows an autoencoder 30, 32 network comprising an encoder 30 and a decoder 32, where the deep neural network 12 computes between the encoder 30 and the decoder 32.
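The encoder/decoder split can be illustrated with a deliberately minimal, hand-built example: the encoder 30 projects a four-dimensional vector to a two-dimensional bottleneck code (the extracted semantics), and the decoder 32 expands the code back. The matrices below are illustrative assumptions; a trained autoencoder would learn them from data:

```python
# Encoder 30: projects a 4-dim "sentence vector" to a 2-dim code.
ENC = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0]]
# Decoder 32: expands the 2-dim code back to 4 dims.  Chosen so that
# vectors lying in the plane of the first two axes reconstruct exactly.
DEC = [[1.0, 0.0],
       [0.0, 1.0],
       [0.0, 0.0],
       [0.0, 0.0]]

def matvec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

def encode(v):        # encoder 30: bottleneck / extracted semantics
    return matvec(ENC, v)

def decode(code):     # decoder 32: reconstruction / generation side
    return matvec(DEC, code)

code = encode([0.7, 0.2, 0.0, 0.0])
reconstruction = decode(code)
```

In the described module, the deep neural network 12 would sit between these two stages and transform the bottleneck code before decoding, which is what turns pure reconstruction into paraphrase generation.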
[0032] In an embodiment, Fig. 3 shows another block view of the semantic analyzer module 10 comprising a speech back-end 26. As mentioned before, the domain developer is the intermediate consumer of the semantic analyzer module 10. A domain developer, by definition, creates use cases for the voice assistance system, for example voice domains, or domains for short. The domain developer 28 links all sentences with the same semantics to a specific domain; the domain developer 28 further inputs a source sentence to the semantic analyzer module 10 and receives the set of utterances, in particular the alternative utterances 24, generated by the semantic analyzer module 10. As the developer 28 utilizes the semantic analyzer module 10, the domain developer 28 manually reviews its output and marks the false sentences generated by the semantic analyzer module 10. In more detail, the network is re-trained and fine-tuned by penalizing those falsely generated sentences and hence keeps improving until the performance is satisfactory for the domain developer 28. This concept is known as an online learning approach.
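The online-learning feedback loop can be sketched as follows; in this simplified stand-in the developer's markings merely re-rank candidates, whereas in the described system they would enter the loss used to re-train and fine-tune the network 12. The class name and sentences are assumptions:

```python
class OnlineFeedback:
    """Sketch of the online-learning loop: the domain developer 28
    marks falsely generated sentences, and the module penalizes them
    so they are suppressed in later rounds."""

    def __init__(self):
        self.penalty = {}

    def mark_false(self, sentence):
        # Each marking increases the sentence's penalty weight.
        self.penalty[sentence] = self.penalty.get(sentence, 0) + 1

    def rank(self, candidates):
        # Penalized sentences sink to the bottom of the ranking.
        return sorted(candidates, key=lambda s: self.penalty.get(s, 0))

fb = OnlineFeedback()
fb.mark_false("turn up the music")   # developer flags a wrong paraphrase
ranked = fb.rank(["turn up the music", "switch off the AC"])
```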
[0033] The final user of the voice assistance system and of the semantic analyzer module 10, which is a sub-system thereof, would be the passenger and/or the driver of a motor vehicle. The final user inputs a voice command or inquiry to the system, for example to turn on the lights or to search for restaurants. The passenger communicates with the head unit of the motor vehicle via a voice command, and the final user's voice is then transmitted to the voice assistance framework and converted to a sentence, in particular by speech-to-text. The resulting sentence is then matched to a domain which has been created by the domain developer 28, and the user's inquiry will be realized through the domain handler. The result of the inquiry is transmitted from the voice back-end to the head unit and may be communicated to the final user via a visual illustration or a voice response.
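The matching of a recognized sentence to a developer-created domain can be illustrated with a toy nearest-domain lookup based on word overlap; a deployed system would match against embeddings or learned models instead. The domain names and registered utterances are assumptions:

```python
def match_domain(utterance, domains):
    """Toy domain matcher: picks the domain whose registered
    utterances share the most words with the recognized sentence."""
    words = set(utterance.lower().split())

    def overlap(domain_utterances):
        return max(len(words & set(u.lower().split()))
                   for u in domain_utterances)

    return max(domains, key=lambda name: overlap(domains[name]))

# Domains as the domain developer 28 might register them, each with
# utterances (including generated alternatives) linked to it.
domains = {
    "climate": ["turn off the air conditioning", "make it warmer"],
    "navigation": ["search for restaurants", "navigate home"],
}

best = match_domain("please search for nearby restaurants", domains)
```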
[0034] In more detail, if the user of the semantic analyzer module 10 intends to create a new voice functionality, the user must have a large set of utterances to be matched to the new functionality. The user of the semantic analyzer module 10 provides an original seed sentence, or a small set of seed sentences, for which redundant utterances are generated. The sentence embedder module 14 takes the original sentence and embeds it as a mathematical sentence embedding vector. The paraphraser neural network 12 takes the vector as input and generates multiple redundant versions that express the same meaning as the sentence embedding vector but use different words and sentence structures. The generated paraphrases are passed to the sentence embedder module 14, which checks how relevant each paraphrase is to the original sentence. Then all paraphrases that are deemed irrelevant are filtered out. All remaining paraphrases are forwarded to the user. After this, the user is free to use the generated redundant sentences by performing any of the three utterance-to-functionality matching methods.
Reference signs
10 semantic analyzer module
12 neural network
14 sentence embedder module
16 performance evaluation module
18 initial utterance
20 embedded sentence vectors
22 target sentences
24 alternative utterance
26 speech back-end
28 domain developer
30 encoder
32 decoder

Claims (1)

  1. A method for generating at least one alternative utterance (24) to an initial utterance (18) by a semantic analyzer module (10), wherein the initial utterance (18) is configured to activate a functionality of a motor vehicle and wherein the semantic analyzer module (10) comprises a deep neural network (12) for generating the at least one alternative utterance (24) depending on the initial utterance (18) for the activation of the functionality of the motor vehicle, characterized in that the neural network (12) is an autoencoder (30, 32) and a semantics of the initial utterance (18) is extracted and depending on the extracted semantics the autoencoder (30, 32) generates the at least one alternative utterance (24).
  2. The method according to claim 1, characterized in that the initial utterance (18) is turned into a semantic vector by a sentence embedder module (14) of the semantic analyzer module (10) and the semantic vector is transmitted to a deep neural network (12) of the semantic analyzer module (10).
  3. The method according to claim 1 or 2, characterized in that the initial utterance (18) and the at least one alternative utterance (24) have the same meaning but different wordings.
  4. A semantic analyzer module (10) for generating at least one alternative utterance (24) to an initial utterance (18), wherein the semantic analyzer module (10) comprises at least one autoencoder (30, 32) and is configured to perform a method according to claims 1 to 3.
GB2009185.6A 2020-06-17 2020-06-17 A method for generating at least one alternative utterance to an initial utterance, as well as a semantic analyzer module Withdrawn GB2596092A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2009185.6A GB2596092A (en) 2020-06-17 2020-06-17 A method for generating at least one alternative utterance to an initial utterance, as well as a semantic analyzer module


Publications (2)

Publication Number Publication Date
GB202009185D0 GB202009185D0 (en) 2020-07-29
GB2596092A true GB2596092A (en) 2021-12-22

Family ID: 71835674

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2009185.6A Withdrawn GB2596092A (en) 2020-06-17 2020-06-17 A method for generating at least one alternative utterance to an initial utterance, as well as a semantic analyzer module

Country Status (1)

Country Link
GB (1) GB2596092A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180011843A1 (en) * 2016-07-07 2018-01-11 Samsung Electronics Co., Ltd. Automatic interpretation method and apparatus
WO2018023356A1 (en) * 2016-08-01 2018-02-08 Microsoft Technology Licensing, Llc Machine translation method and apparatus
US20180121419A1 (en) * 2016-10-31 2018-05-03 Samsung Electronics Co., Ltd. Apparatus and method for generating sentence
US20190370336A1 (en) * 2018-06-05 2019-12-05 Koninklijke Philips N.V. Simplifying and/or paraphrasing complex textual content by jointly learning semantic alignment and simplicity
US20200097554A1 (en) * 2018-09-26 2020-03-26 Huawei Technologies Co., Ltd. Systems and methods for multilingual text generation field


Also Published As

Publication number Publication date
GB202009185D0 (en) 2020-07-29


Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)