GB2596092A - A method for generating at least one alternative utterance to an initial utterance, as well as a semantic analyzer module - Google Patents


Publication number
GB2596092A
Authority
GB
United Kingdom
Prior art keywords
utterance
analyzer module
semantic analyzer
initial
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB2009185.6A
Other versions
GB202009185D0 (en)
Inventor
Toghi Behrad
Chen Daguan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mercedes Benz Group AG
Original Assignee
Daimler AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Daimler AG filed Critical Daimler AG
Priority to GB2009185.6A priority Critical patent/GB2596092A/en
Publication of GB202009185D0 publication Critical patent/GB202009185D0/en
Publication of GB2596092A publication Critical patent/GB2596092A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/55Rule-based translation
    • G06F40/56Natural language generation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/02Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue


Abstract

A method for generating at least one alternative utterance 24 to an initial utterance 18 is performed by a semantic analyser module 10. The utterances are configured to activate a functionality of a motor vehicle. The semantic analyser module comprises an artificial neural network 12 for generating the alternative utterance depending on extracted semantics of the initial utterance. The neural network is an autoencoder comprising an encoder 30 and a decoder 32. The initial utterance may be turned into a semantic vector by a sentence embedder module. The output utterance may have a different wording but preserve the same meaning as the initial utterance. The generated synonymous phrases may be used for interpreting voice commands from a user of a vehicle, who may have many different ways of wording the same instruction.

Description

A method for generating at least one alternative utterance to an initial utterance, as well as a semantic analyzer module
FIELD OF THE INVENTION
[0001] The present disclosure relates to the field of automobiles. More specifically, the present disclosure relates to a method for generating at least one alternative utterance to an initial utterance by a semantic analyzer module.
BACKGROUND INFORMATION
[0002] According to the state of the art, a major challenge for voice assistance systems consists in the ability to understand voice commands and match them to functionalities implemented in a motor vehicle. For example, a voice assistance system is able to turn off the air conditioning. Currently, there are three main ways to perform such an utterance-to-functionality matching. The first option is to specify all possible utterances explicitly in the form of a dictionary for each skill. A more efficient approach is to define a sort of sentence graph that represents multiple sentences in a single expression in a combinatorial manner. Another approach is to use artificial intelligence to implicitly learn which utterances would match a given functionality. In all three methods, the voice assistance developer is required to come up with a large number of sentences, whether all of them explicitly, all of them implicitly, or a large representative number of them.
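By way of illustration only, and not as part of the disclosed subject-matter, the second option above, the combinatorial sentence graph, can be sketched in a few lines. The function name and the word groups are assumptions chosen for this example:

```python
from itertools import product

def expand_sentence_graph(slots):
    """Expand a simple sentence graph, i.e. a sequence of alternative
    word groups, into every concrete utterance it represents."""
    return [" ".join(words) for words in product(*slots)]

# A toy graph covering several phrasings of a single command.
graph = [
    ("turn off", "switch off", "shut down"),
    ("the",),
    ("air conditioning", "AC"),
]

utterances = expand_sentence_graph(graph)
# 3 x 1 x 2 alternatives yield 6 concrete sentences from one expression.
```

A graph of three slots with three, one, and two alternatives thus stands for six concrete utterances, which is why this representation is more compact than an explicit per-skill dictionary.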
[0003] US 20180293221 A1 discloses a method for executing computer-actionable directives conveyed in human speech, comprising receiving audio data recording speech from one or more speakers, converting the audio data into a linguistic representation of the recorded speech, detecting a target corresponding to the linguistic representation, committing to the data structure language data associated with the detected target and based on the linguistic representation, parsing the data structure to identify one or more computer-actionable directives, and submitting the one or more computer-actionable directives to the computer for processing.
[0004] US 20040117189 A1 discloses a query system for processing voice-based queries. This distributed client-server system, typically implemented on an intranet or over the internet, accepts a user's queries at his/her computer, PDA, or workstation using a speech input interface. After converting the user's query from speech to text, a natural language engine, a database processor, and a full-text SQL database are implemented to find a single answer that best matches the user's query. Both statistical and semantic decoding are used to assist and improve the performance of the query recognition.
[0005] The methods according to the state of the art are time-consuming processes that require rather strenuous thinking or brainstorming by a person. Furthermore, it is highly likely that the developer becomes tunnel-visioned in the brainstorming process. As a result, interaction with the voice assistant becomes robotic, because the assistant only understands sentences consisting of a very specific sequence of words rather than their underlying general meaning. Therefore, there is a need in the art for a more efficient process.
SUMMARY OF THE INVENTION
[0006] It is an object of the invention to provide a method and a semantic analyzer module, by which an increased ability to comprehend the semantics behind the voice commands may be realized.
[0007] This object is solved by a method as well as a semantic analyzer module according to the independent claims. Advantageous embodiments are disclosed in the dependent claims.
[0008] One aspect of the invention relates to a method for generating at least one alternative utterance to an initial utterance by a semantic analyzer module, wherein the initial utterance is configured to activate a functionality of a motor vehicle, and wherein the semantic analyzer module comprises a neural network for generating the at least one alternative utterance depending on the initial utterance for the activation of the functionality of the motor vehicle.
[0009] In an embodiment, the neural network is an autoencoder and the semantics of the initial utterance is extracted. Depending on the extracted semantics, the autoencoder then generates the at least one alternative utterance.
[0010] Therefore, this streamlined development process reduces the burden on the developers and furthermore increases the voice assistant's ability to comprehend the semantics behind voice commands.
[0011] The main objective of the semantic analyzer module is automatic utterance generation. On the basis of an original or initial utterance, the semantic analyzer module is able to automatically generate new utterances that are identical in meaning but worded differently. These newly generated utterances may be used at the developer's discretion to improve utterance coverage. Furthermore, a redundant utterance generator is proposed that leverages a pre-trained sentence embedder. The sentence embedder is able to map sentences to vectors inside a high-dimensional semantic space. A new neural network, which may be called the utterance generator network, is built up that takes sentence embeddings and generates redundant utterances, which are passed to the sentence embedder for quality control and then returned to the user. Together, the sentence embedder and the utterance generator network make up the semantic analyzer module.
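Purely as an illustrative sketch of this pipeline, the following code mimics the three stages with stand-ins: a toy trigram-hashing embedder in place of the pre-trained sentence embedder module 14, a fixed candidate list in place of the trained utterance generator network 12, and a cosine-similarity threshold as the quality-control step. All names, the vocabulary, and the threshold value are assumptions, not part of the disclosure:

```python
import math
import zlib

def embed(sentence, dim=256):
    """Toy stand-in for the pre-trained sentence embedder module 14:
    hashes character trigrams into a fixed-length unit vector."""
    vec = [0.0] * dim
    text = sentence.lower()
    for i in range(len(text) - 2):
        vec[zlib.crc32(text[i:i + 3].encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def generate_candidates(seed):
    """Stand-in for the utterance generator network 12; a trained
    decoder would produce these from the seed embedding."""
    return ["please shut down the air conditioning",
            "switch the air conditioning off",
            "play some jazz music"]          # deliberately off-topic

def semantic_analyzer(seed, threshold=0.3):
    """Embed the seed, generate candidates, and keep only candidates
    whose embedding stays close to the seed (quality control)."""
    seed_vec = embed(seed)
    return [c for c in generate_candidates(seed)
            if cosine(seed_vec, embed(c)) >= threshold]

paraphrases = semantic_analyzer("turn off the air conditioning")
```

The off-topic candidate shares almost no trigrams with the seed, so the quality-control step filters it out while the genuine paraphrases survive.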
[0012] In an embodiment, the initial utterance is turned into a semantic vector by a sentence embedder module of the semantic analyzer module and the semantic vector is transmitted to a deep neural network of the semantic analyzer module.
[0013] In another embodiment, the initial utterance and the at least one alternative utterance have the same meaning but different wordings.
[0014] Another aspect of the invention relates to a semantic analyzer module for generating at least one alternative utterance, wherein the semantic analyzer module comprises at least one autoencoder and is configured to perform a method according to the preceding aspect. In particular, the method is performed by the semantic analyzer module.
[0015] Further advantages, features, and details of the invention derive from the following description of preferred embodiments as well as from the drawings. The features and feature combinations previously mentioned in the description as well as the features and feature combinations mentioned in the following description of the figures and/or shown in the figures alone can be employed not only in the respectively indicated combination but also in any other combination or taken alone without leaving the scope of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The novel features and characteristics of the disclosure are set forth in the independent claims. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and together with the description serve to explain the disclosed principles. The same reference signs are used throughout the figures to reference like features and components. Some embodiments of the system and/or methods in accordance with embodiments of the present subject-matter are now described below, by way of example only and with reference to the accompanying figures.
[0017] The drawings show in:
[0018] Fig. 1 a schematic block view of a semantic analyzer module.
[0019] Fig. 2 another block view of the semantic analyzer module in an inference phase.
[0020] Fig. 3 another schematic block view of the semantic analyzer module.
[0021] In the figures the same elements or elements having the same function are indicated by the same reference signs.
DETAILED DESCRIPTION
[0022] In the present document, the word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment or implementation of the present subject matter described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
[0023] While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure.
[0024] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus preceded by "comprises... a" does not, without more constraints, preclude the existence of other elements or additional elements in the system or method.
[0025] In the following detailed description of the embodiments of the disclosure, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.
[0026] Fig. 1 shows a schematic block view of a semantic analyzer module 10 during a training phase. The semantic analyzer module 10 comprises a deep neural network 12, a sentence embedder module 14, and a performance evaluation module 16.
[0027] In one embodiment, to realize the semantic analyzer module 10, the sentence embedder module 14 and an utterance generator network, which is shown as the neural network 12, are required. The sentence embedder module 14 may be pre-trained, wherein there are several publicly available models to choose from. The utterance generator network may be developed and trained in-house. To train the utterance generator network, a process similar to that used in connection with the semantic analyzer module 10 is employed. A seed sentence, which may also be referred to as an initial utterance 18, is supplied to the sentence embedder module 14, and the embedded sentence vectors 20 are passed to the neural network 12. The utterance generator network creates new sentences. As an additional step, these generated sentences are compared with target sentences 22, which describe the sentences the network is meant to generate, using the feedback mechanism. These target sentences may be stored in a storage device of the semantic analyzer module 10. This mechanism corrects the generator network such that the generated sentences eventually converge to the input training set.
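The feedback mechanism described above can be sketched, under heavy simplification, as a gradient-descent loop: the generator is reduced to a single linear layer mapping seed embeddings to target embeddings, and the mean squared error against the target vectors plays the role of the feedback. The dimensions, vectors, and learning rate are illustrative assumptions:

```python
import random

random.seed(0)
DIM = 4

# Toy embedding pairs: (seed-sentence vector, target-paraphrase vector).
# In the real module these would come from the sentence embedder 14.
pairs = [([0.9, 0.1, 0.0, 0.3], [0.8, 0.2, 0.1, 0.3]),
         ([0.1, 0.8, 0.4, 0.0], [0.2, 0.7, 0.5, 0.1])]

# The "generator network" reduced to one linear layer for illustration.
W = [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(DIM)]

def forward(vec):
    return [sum(W[i][j] * vec[j] for j in range(DIM)) for i in range(DIM)]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / DIM

def train_step(lr=0.1):
    total = 0.0
    for x, y in pairs:
        pred = forward(x)
        total += mse(pred, y)
        for i in range(DIM):
            grad_i = 2.0 * (pred[i] - y[i]) / DIM   # dMSE/dpred[i]
            for j in range(DIM):
                W[i][j] -= lr * grad_i * x[j]       # feedback correction
    return total / len(pairs)

losses = [train_step() for _ in range(200)]
# The feedback drives the generated vectors toward the targets,
# so the loss shrinks as the generated output converges to the
# training set.
```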
[0028] Furthermore, the seed sentences form pairs of paraphrases. For example, one pair may be "turn off the air condition" and "please shut down the air condition". The training data set which is used contains a large number, potentially thousands, of such paraphrase pairs. Such data may be either taken from openly available datasets or crafted specifically for training the semantic analyzer module 10. The pre-trained sentence embedder module 14 is a module that turns sentences into vectors representing the underlying semantics of the sentences. Given an original sentence and a set of newly generated sentences, it is also capable of evaluating how relevant the new sentences are to the original sentence.
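As a small worked example of such paraphrase pairs and of evaluating how relevant a generated sentence is to the original, the sketch below uses word-set overlap (Jaccard similarity) as a crude stand-in for the embedder's semantic comparison; the pairs and the scoring function are illustrative assumptions:

```python
def relevance(original, candidate):
    """Toy relevance score standing in for the embedder's semantic
    check: Jaccard overlap of the lower-cased word sets."""
    a = set(original.lower().split())
    b = set(candidate.lower().split())
    return len(a & b) / len(a | b)

# Paraphrase pairs of the kind a training set would contain,
# potentially in the thousands.
training_pairs = [
    ("turn off the air condition", "please shut down the air condition"),
    ("open the window", "roll the window down"),
]

scores = [relevance(src, tgt) for src, tgt in training_pairs]
# Each score lies in [0, 1]; higher means the paraphrase shares more
# surface vocabulary with its source.
```

A real embedder would score true paraphrases highly even with zero word overlap, which is exactly what the word-set stand-in cannot do; it serves only to make the relevance-evaluation step concrete.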
[0029] In an embodiment, the utterance generator network or the deep neural network 12 is the deep neural network architecture that takes the semantic vector of a source sentence and attempts to generate several redundant sentences. Its performance is then evaluated through calculating the loss function with regard to the given ground truth sentence set.
[0030] Fig. 2 shows another schematic block view of the semantic analyzer module 10 in an inference phase. After training the neural network 12, the semantic analyzer module 10 may now be used for the purpose of generating redundant or alternative utterances 24 to the given input sentence, in other words, the initial utterance 18. In particular, this phase is realized in real-time. For example, the consumer of the semantic analyzer module 10 enters a target sentence and receives the sentence set almost immediately, as the model has already been trained.
[0031] Furthermore, Fig. 2 shows an autoencoder 30, 32 network comprising an encoder 30 and a decoder 32, where the deep neural network 12 computes between the encoder 30 and the decoder 32.
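The encoder/decoder split can be illustrated with a deliberately minimal, hand-built example: the encoder 30 projects a four-dimensional vector to a two-dimensional bottleneck code (the extracted semantics), and the decoder 32 expands the code back. The matrices below are illustrative assumptions; a trained autoencoder would learn them from data:

```python
# Encoder 30: projects a 4-dim "sentence vector" to a 2-dim code.
ENC = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0]]
# Decoder 32: expands the 2-dim code back to 4 dims.  Chosen so that
# vectors lying in the plane of the first two axes reconstruct exactly.
DEC = [[1.0, 0.0],
       [0.0, 1.0],
       [0.0, 0.0],
       [0.0, 0.0]]

def matvec(m, v):
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

def encode(v):        # encoder 30: bottleneck / extracted semantics
    return matvec(ENC, v)

def decode(code):     # decoder 32: reconstruction / generation side
    return matvec(DEC, code)

code = encode([0.7, 0.2, 0.0, 0.0])
reconstruction = decode(code)
```

In the described module, the deep neural network 12 would sit between these two stages and transform the bottleneck code before decoding, which is what turns pure reconstruction into paraphrase generation.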
[0032] In an embodiment, Fig. 3 shows another block view of the semantic analyzer module 10 comprising a speech back-end 26. As mentioned before, the domain developer is the intermediate consumer of the semantic analyzer module 10. A domain developer, by definition, creates use cases for the voice assistance system, for example voice domains, or domains for short. The domain developer 28 links all sentences with the same semantics to a specific domain; the domain developer 28 further inputs a source sentence to the semantic analyzer module 10 and receives the set of utterances, in particular the alternative utterances 24, generated by the semantic analyzer module 10. As the developer 28 utilizes the semantic analyzer module 10, the domain developer 28 manually reviews its output and marks the false sentences generated by the semantic analyzer module 10. In more detail, the network is re-trained and fine-tuned by penalizing those falsely generated sentences and hence keeps improving until the performance is satisfactory for the domain developer 28. This concept is known as an online learning approach.
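The online-learning feedback loop can be sketched as follows; in this simplified stand-in the developer's markings merely re-rank candidates, whereas in the described system they would enter the loss used to re-train and fine-tune the network 12. The class name and sentences are assumptions:

```python
class OnlineFeedback:
    """Sketch of the online-learning loop: the domain developer 28
    marks falsely generated sentences, and the module penalizes them
    so they are suppressed in later rounds."""

    def __init__(self):
        self.penalty = {}

    def mark_false(self, sentence):
        # Each marking increases the sentence's penalty weight.
        self.penalty[sentence] = self.penalty.get(sentence, 0) + 1

    def rank(self, candidates):
        # Penalized sentences sink to the bottom of the ranking.
        return sorted(candidates, key=lambda s: self.penalty.get(s, 0))

fb = OnlineFeedback()
fb.mark_false("turn up the music")   # developer flags a wrong paraphrase
ranked = fb.rank(["turn up the music", "switch off the AC"])
```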
[0033] The final user of the voice assistance system and of the semantic analyzer module 10, which is a sub-system thereof, would be the passenger and/or the driver of a motor vehicle. The final user inputs a voice command or inquiry to the system, for example to turn on the lights or to search for restaurants. The passenger communicates with the head unit of the motor vehicle via a voice command, and the final user's voice is then transmitted to the voice assistance framework and converted to a sentence, in particular by speech-to-text. The resulting sentence is then matched to a domain which has been created by the domain developer 28, and the user's inquiry will be realized through the domain handler. The result of the inquiry is transmitted from the voice back-end to the head unit and may be communicated to the final user via a visual illustration or a voice response.
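The matching of a recognized sentence to a developer-created domain can be illustrated with a toy nearest-domain lookup based on word overlap; a deployed system would match against embeddings or learned models instead. The domain names and registered utterances are assumptions:

```python
def match_domain(utterance, domains):
    """Toy domain matcher: picks the domain whose registered
    utterances share the most words with the recognized sentence."""
    words = set(utterance.lower().split())

    def overlap(domain_utterances):
        return max(len(words & set(u.lower().split()))
                   for u in domain_utterances)

    return max(domains, key=lambda name: overlap(domains[name]))

# Domains as the domain developer 28 might register them, each with
# utterances (including generated alternatives) linked to it.
domains = {
    "climate": ["turn off the air conditioning", "make it warmer"],
    "navigation": ["search for restaurants", "navigate home"],
}

best = match_domain("please search for nearby restaurants", domains)
```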
[0034] In more detail, if the user of the semantic analyzer module 10 intends to create a new voice functionality, the user must have a large set of utterances to be matched to the new functionality. The user of the semantic analyzer module 10 provides an original seed sentence, or a small set of seed sentences, for which redundant utterances are generated. The sentence embedder module 14 takes the original sentence and embeds it as a mathematical sentence embedding vector. The paraphraser neural network 12 takes the vector as input and generates multiple redundant versions that express the same meaning as the sentence embedding vector but use different words and sentence structures. The generated paraphrases are passed to the sentence embedder module 14, which checks how relevant each paraphrase is to the original sentence. Then all paraphrases that are deemed irrelevant are filtered out. All remaining paraphrases are forwarded to the user. After this, the user is free to use the generated redundant sentences by performing any of the three utterance-to-functionality matching methods.
Reference signs
10 semantic analyzer module
12 neural network
14 sentence embedder module
16 performance evaluation module
18 initial utterance
20 embedded sentence vectors
22 target sentences
24 alternative utterance
26 speech back-end
28 domain developer
30 encoder
32 decoder

Claims (1)

  1. A method for generating at least one alternative utterance (24) to an initial utterance (18) by a semantic analyzer module (10), wherein the initial utterance (18) is configured to activate a functionality of a motor vehicle and wherein the semantic analyzer module (10) comprises a deep neural network (12) for generating the at least one alternative utterance (24) depending on the initial utterance (18) for the activation of the functionality of the motor vehicle, characterized in that the neural network (12) is an autoencoder (30, 32) and a semantics of the initial utterance (18) is extracted and depending on the extracted semantics the autoencoder (30, 32) generates the at least one alternative utterance (24).
  2. The method according to claim 1, characterized in that the initial utterance (18) is turned into a semantic vector by a sentence embedder module (14) of the semantic analyzer module (10) and the semantic vector is transmitted to a deep neural network (12) of the semantic analyzer module (10).
  3. The method according to claim 1 or 2, characterized in that the initial utterance (18) and the at least one alternative utterance (24) have the same meaning but different wordings.
  4. A semantic analyzer module (10) for generating at least one alternative utterance (24) to an initial utterance (18), wherein the semantic analyzer module (10) comprises at least one autoencoder (30, 32) and is configured to perform a method according to claims 1 to 3.
GB2009185.6A 2020-06-17 2020-06-17 A method for generating at least one alternative utterance to an initial utterance, as well as a semantic analyzer module Withdrawn GB2596092A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2009185.6A GB2596092A (en) 2020-06-17 2020-06-17 A method for generating at least one alternative utterance to an initial utterance, as well as a semantic analyzer module


Publications (2)

Publication Number Publication Date
GB202009185D0 GB202009185D0 (en) 2020-07-29
GB2596092A true GB2596092A (en) 2021-12-22

Family ID: 71835674

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2009185.6A Withdrawn GB2596092A (en) 2020-06-17 2020-06-17 A method for generating at least one alternative utterance to an initial utterance, as well as a semantic analyzer module

Country Status (1)

Country Link
GB (1) GB2596092A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180011843A1 (en) * 2016-07-07 2018-01-11 Samsung Electronics Co., Ltd. Automatic interpretation method and apparatus
WO2018023356A1 (en) * 2016-08-01 2018-02-08 Microsoft Technology Licensing, Llc Machine translation method and apparatus
US20180121419A1 (en) * 2016-10-31 2018-05-03 Samsung Electronics Co., Ltd. Apparatus and method for generating sentence
US20190370336A1 (en) * 2018-06-05 2019-12-05 Koninklijke Philips N.V. Simplifying and/or paraphrasing complex textual content by jointly learning semantic alignment and simplicity
US20200097554A1 (en) * 2018-09-26 2020-03-26 Huawei Technologies Co., Ltd. Systems and methods for multilingual text generation field


Also Published As

Publication number Publication date
GB202009185D0 (en) 2020-07-29


Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)