GB2612079A - Voice command processing method and apparatus - Google Patents

Voice command processing method and apparatus

Info

Publication number
GB2612079A
Authority
GB
United Kingdom
Prior art keywords
voice command
application key
data file
vehicle
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2115125.3A
Other versions
GB202115125D0 (en)
Inventor
Parkes Jo
Bhatia Lovene
Arrowsmith Ernie
Luncan Daniel
Gavigan Kevin
Jackson Glen
Mistry Kalpesh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jaguar Land Rover Ltd
Original Assignee
Jaguar Land Rover Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jaguar Land Rover Ltd
Priority to GB2115125.3A
Publication of GB202115125D0
Priority to PCT/EP2022/078418 (WO2023066760A1)
Publication of GB2612079A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/08Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0807Network architectures or network communication protocols for network security for authentication of entities using tickets, e.g. Kerberos
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60RVEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
    • B60R16/00Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
    • B60R16/02Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements
    • B60R16/037Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
    • B60R16/0373Voice control
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • G10L15/30Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • G10L17/22Interactive procedures; Man-machine interfaces
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/12Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22Parsing or analysis of headers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40Network security protocols
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30Services specially adapted for particular environments, situations or purposes
    • H04W4/40Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]

Abstract

Aspects of the present invention relate to a speech control module (17) for a vehicle (3). The speech control module (17) is operative to generate a voice command data file (VCD) for processing by a speech recognition server (7) disposed externally of the vehicle (3). The speech control module (17) includes one or more controllers (17). The speech control module (17) receives an audio signal (S1) and at least one application key (15-n). The speech control module (17) generates an audio data set (ADS) in dependence on the received audio signal (S1). The audio data set (ADS) includes a voice command (CMV) spoken by an occupant of the vehicle. The speech control module (17) generates the voice command data file (VCD) comprising the audio data set (ADS) and an application key (15-n). The voice command data file (VCD) is output for transmission to the speech recognition server (7). Aspects of the present invention relate to an authentication module (20); a computer-implemented method of generating a voice command data file (VCD); a vehicle (3); and a speech recognition system (5).

Description

VOICE COMMAND PROCESSING METHOD AND APPARATUS
TECHNICAL FIELD
The present disclosure relates to a voice command processing method and apparatus.
Aspects of the invention relate to a speech control module, an authentication module, a speech service system for a vehicle, a computer-implemented method of generating a voice command data file, a computer-implemented method of obtaining an application key; a non-transitory computer-readable medium, a speech recognition system and a vehicle.
BACKGROUND
It is known to provide a speech recognition system in a vehicle to enable voice control of vehicle systems. The user, typically a driver of the vehicle, issues a voice command. The speech recognition system converts the voice command to text for processing to determine a driver request. Hybrid speech recognition systems include a combination of speech recognisers embedded in the vehicle and a remote server (providing cloud-based functions).
The increased processing capability of the remote server typically allows improved speech recognition, for example allowing better natural language processing for one or more languages. Using a remote server to perform the speech recognition may also allow the speech system access to richer data from back-end data providers, such as navigation Points of Interest (POIs), online media and weather. The embedded portion of the hybrid system then supports the user when context needs to be considered and if there is poor connectivity to the remote server.
The vehicle communicates with the remote server over a wireless network to transmit the voice command for processing. A secure connection is required for the voice command to maintain privacy. The secure connection between the vehicle and the remote server may require increased bandwidth to enable the transfer of audio files, for example to stream audio from the vehicle. It is known to provide a dedicated gateway to establish a secure connection for interactions between the vehicle and the remote server. The voice command is sent to the remote server only once authentication has been established between the vehicle and the gateway. However, this approach requires the implementation of a dedicated gateway.
It is an aim of the present invention to address one or more of the disadvantages associated with the prior art.
SUMMARY OF THE INVENTION
Aspects and embodiments of the invention provide a speech control module, an authentication module, a speech service system for a vehicle, a computer-implemented method of generating a voice command data file, a computer-implemented method of obtaining an application key; a non-transitory computer-readable medium, a speech recognition system and a vehicle as claimed in the appended claims.
According to an aspect of the present invention there is provided a speech control module for a vehicle, the speech control module being operative to generate a voice command data file for processing by a speech recognition server disposed externally of the vehicle, the speech control module comprising one or more controllers, the speech control module configured to: receive an audio signal; generate an audio data set in dependence on the received audio signal, the audio data set comprising a voice command spoken by an occupant of the vehicle; receive an application key; generate the voice command data file comprising the audio data set and the application key; and output the voice command data file for transmission to the speech recognition server.
The one or more controllers may comprise at least one electronic processor having an input and an output. The input may be configured to receive the audio signal, for example from a microphone disposed in the vehicle. The output may be configured to output the voice command data file. The one or more controllers may comprise a memory device for storing a set of computational instructions which, when executed on the at least one processor, cause the one or more controllers to perform the method(s) described herein.
The application key may be received from an authentication module. The authentication module may be configured to communicate with an authentication server.
The application key is provided in the voice command data file to enable secure communication between the speech service system and the speech recognition server. At least in certain embodiments, the application key is suitable for authenticating the voice command data file. The speech recognition server may access the application key provided in the voice command data file to authenticate the voice command data file. The speech recognition server may compare the application key provided in the voice command data file to an active application key. The application key may be used to provide credentials to the speech recognition server. Alternatively, or in addition, the application key may be provided to a backend data aggregator to authenticate the user and/or the vehicle.
The speech control module may receive a plurality of application keys. The speech control module may provide one of the plurality of application keys in the voice command data file.
The speech control module may identify one of the plurality of application keys as being activated. The activated application key may be provided in the voice command data file.
The speech control module may be configured to write the application key to a header of the voice command data file.
The speech control module may be configured to write a vehicle identifier to the voice command data file to identify the vehicle. The vehicle identifier may be received from an authentication module provided in the vehicle. The authentication module may comprise an authentication daemon.
The speech control module may be configured to write a user identifier to the voice command data file to identify a user associated with the vehicle. The user identifier may be received from an authentication module provided in the vehicle. The authentication module may comprise an authentication daemon.
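The selection of an activated application key and the writing of the key and identifiers to a header can be illustrated with a minimal sketch. The file layout used here (a 4-byte length prefix, a JSON header, then the audio bytes), the `ApplicationKey` type with its `active` flag, and all function and field names are assumptions made for illustration; the specification does not define a concrete file format.

```python
import json
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationKey:
    value: str
    active: bool  # hypothetical flag marking the activated key


def select_active_key(keys):
    """Return the first key marked as activated, or raise if none is."""
    for key in keys:
        if key.active:
            return key
    raise LookupError("no activated application key available")


def build_vcd(audio_data_set: bytes, keys, vehicle_id: str, user_id: str) -> bytes:
    """Pack a header and the audio data set into one byte string.

    Assumed layout: 4-byte big-endian header length, JSON header, audio bytes.
    """
    header = {
        "application_key": select_active_key(keys).value,
        "vehicle_id": vehicle_id,
        "user_id": user_id,
    }
    header_bytes = json.dumps(header, sort_keys=True).encode("utf-8")
    return struct.pack(">I", len(header_bytes)) + header_bytes + audio_data_set


def parse_vcd(vcd: bytes):
    """Split a file produced by build_vcd into (header dict, audio bytes)."""
    (header_len,) = struct.unpack(">I", vcd[:4])
    header = json.loads(vcd[4:4 + header_len].decode("utf-8"))
    return header, vcd[4 + header_len:]
```

Keeping the key in a header separate from the audio payload means a receiver could read the credentials without decoding any audio.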
The speech control module may be configured to receive a command text from the speech recognition server, the command text at least substantially corresponding to the voice command.
According to a further aspect of the present invention there is provided an authentication module for a speech control module provided in a vehicle, the authentication module being configured to: generate an application key request to request at least one application key from an authentication server; transmit the application key request to the authentication server disposed externally of the vehicle; receive the at least one application key; and output the at least one application key to the speech control module.
The authentication module may be provided in the vehicle, for example in an embedded system. At least in certain embodiments, the authentication module may comprise an authentication daemon. The speech control module may be of the type described herein.
The authentication module may be configured to transmit the application key request to the authentication server in dependence on an initialisation condition being satisfied. The initialisation condition may comprise a start-up of the vehicle.
The authentication module may be configured to delete the at least one application key in dependence on a cessation condition being met. The cessation condition may comprise a shut-down of the vehicle.
The authentication module may be configured to output a vehicle identifier to the speech control module.
The authentication module may be configured to output a user identifier to the speech control module.
According to a further aspect of the present invention there is provided a speech service system for a vehicle, the speech service system comprising a speech control module as described herein configured to generate the voice command data file comprising the audio data set and the application key; and an authentication module as described herein to retrieve the application key for the voice command data file.
According to a further aspect of the present invention there is provided a speech recognition system for processing a voice command, the speech recognition system comprising: a speech control module as claimed in any one of claims 1 to 5 configured to generate the voice command data file comprising the audio data set and the application key; and a speech recognition server for receiving the voice command data file generated by the speech service system, the external server being configured to authenticate the voice command data file in dependence on the application key; wherein the external server is configured to process the audio data set provided in the voice command data file in dependence on a positive authentication of the voice command data file.
The speech recognition system may comprise an authentication module as described herein. The authentication module may be configured to retrieve the application key for the voice command data file.
According to a further aspect of the present invention there is provided a computer-implemented method of generating a voice command data file for transmission from a vehicle to a speech recognition server disposed remotely for processing, the method comprising: receiving an audio signal; generating an audio data set in dependence on the received audio signal, the audio data set comprising a voice command spoken by an occupant of the vehicle; receiving at least one application key; generating the voice command data file comprising the audio data set and the application key; and outputting the voice command data file for transmission to the speech recognition server.
The method may comprise writing the application key to a header of the voice command data file.
The method may comprise writing a vehicle identifier and/or a user identifier to the voice command data file.
The method may comprise receiving a command text from the speech recognition server. The command text may at least substantially correspond to the voice command.
According to a further aspect of the present invention there is provided a computer-implemented method of obtaining an application key for incorporation into a voice command data file for transmission from a vehicle to a speech recognition server, the method comprising: generating an application key request to request at least one application key from an authentication server; transmitting the application key request to the authentication server disposed externally of the vehicle; receiving the at least one application key; and outputting the at least one application key to a speech control module for incorporation into the voice command data file.
The method may comprise transmitting the application key request to the authentication server in dependence on an initialisation condition being satisfied. The initialisation condition may comprise a start-up of the vehicle.
The method may comprise deleting the at least one application key in dependence on a cessation condition being met. The cessation condition may comprise a shut-down of the vehicle.
According to a further aspect of the present invention there is provided a computer-implemented method of processing a voice command, the method comprising: generating a voice command data file as described herein; wherein the application key is obtained using the method as described herein.
According to a further aspect of the present invention there is provided a non-transitory computer-readable medium having a set of instructions stored therein which, when executed, cause a processor to perform the method(s) described herein.
According to a further aspect of the present invention there is provided a speech recognition system for processing a voice command data file received from a vehicle, the voice command data file comprising an audio data set representing a voice command issued by an occupant of the vehicle and an application key; wherein the speech recognition system is configured to: read the application key in the voice command data file; authenticate the voice command data file in dependence on the application key; and process the audio data set in the voice command data file in dependence on a successful authentication of the voice command data file.
According to a further aspect of the present invention there is provided a vehicle comprising a speech control module as described herein; and/or an authentication module as described herein.
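The read, authenticate and process sequence on the receiving side might look like the following sketch. The dictionary representation of the voice command data file, the `recognise` callback and the function names are hypothetical; the comparison uses the standard-library `hmac.compare_digest` so that the check takes constant time, which is a design choice of this example rather than something the specification requires.

```python
import hmac


def authenticate(vcd: dict, active_keys) -> bool:
    """Return True if the key carried in the file matches any active key."""
    presented = str(vcd.get("application_key", "")).encode("utf-8")
    matched = False
    for key in active_keys:
        # compare_digest avoids leaking how many leading characters matched
        if hmac.compare_digest(presented, key.encode("utf-8")):
            matched = True
    return matched


def handle_vcd(vcd: dict, active_keys, recognise):
    """Process the audio data set only after a successful authentication.

    `recognise` is a stand-in for the speech recogniser; it receives the
    audio bytes and returns the command text.
    """
    if not authenticate(vcd, active_keys):
        return None  # audio is never passed to the recogniser
    return recognise(vcd["audio_data_set"])
```

Returning before the recogniser is called ensures that unauthenticated audio is not processed at all.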
According to a further aspect of the present invention there is provided a vehicle comprising a speech service system as described herein.
Any control unit or controller described herein may suitably comprise a computational device having one or more electronic processors. The system may comprise a single control unit or electronic controller or alternatively different functions of the controller may be embodied in, or hosted in, different control units or controllers. As used herein the term "controller" or "control unit" will be understood to include both a single control unit or controller and a plurality of control units or controllers collectively operating to provide any stated control functionality. To configure a controller or control unit, a suitable set of instructions may be provided which, when executed, cause said control unit or computational device to implement the control techniques specified herein. The set of instructions may suitably be embedded in said one or more electronic processors. Alternatively, the set of instructions may be provided as software saved on one or more memory associated with said controller to be executed on said computational device. The control unit or controller may be implemented in software run on one or more processors. One or more other control unit or controller may be implemented in software run on one or more processors, optionally the same one or more processors as the first controller. Other suitable arrangements may also be used.
Within the scope of this application it is expressly intended that the various aspects, embodiments, examples and alternatives set out in the preceding paragraphs, in the claims and/or in the following description and drawings, and in particular the individual features thereof, may be taken independently or in any combination. That is, all embodiments and/or features of any embodiment can be combined in any way and/or combination, unless such features are incompatible. The applicant reserves the right to change any originally filed claim or file any new claim accordingly, including the right to amend any originally filed claim to depend from and/or incorporate any feature of any other claim although not originally claimed in that manner.
BRIEF DESCRIPTION OF THE DRAWINGS
One or more embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 shows a schematic representation of a vehicle incorporating a vehicle speech service system in accordance with an embodiment of the present invention;
Figure 2 shows an overview of a speech recognition system incorporating the speech service system shown in Figure 1;
Figure 3 shows a schematic representation of the vehicle speech service system according to an embodiment of the present invention;
Figure 4 shows a first block diagram representing the initialisation of the vehicle speech service system;
Figure 5 shows a second block diagram representing the operation of the vehicle speech service system to generate a voice command data file;
Figure 6 shows a third block diagram representing the operation of a speech recognition server and a content service data aggregator to deliver requested content to the speech service system;
Figure 7 shows a fourth block diagram representing the routine updating of the application keys; and
Figure 8 shows a fifth block diagram representing the replacement of the application keys due to a threat condition.
DETAILED DESCRIPTION
A vehicle speech service system 1 in accordance with an embodiment of the present invention is described herein with reference to the accompanying Figures. The vehicle speech service system 1 in the present embodiment is disposed in a vehicle 3. The speech service system 1 forms part of a hybrid speech recognition system 5 operative to provide local (embedded) and remote (cloud) speech recognition functions. A schematic representation of the speech recognition system 5 is shown in Figure 1.
As shown in Figure 2, the speech recognition system 5 comprises the speech service system 1, a speech recognition server 7 and an authentication server 9. The speech service system 1 is configured selectively to communicate with the speech recognition server 7 and the authentication server 9 over a wireless network 11, such as a cellular telephone or data network. The speech service system 1 is integrated into a head unit (also known as an infotainment system) in the vehicle 3. The speech recognition server 7 and the authentication server 9 are disposed externally of the vehicle 3. The speech recognition server 7 and the authentication server 9 could be combined. In the present embodiment the speech recognition server 7 and the authentication server 9 are maintained separate from each other. As illustrated in Figure 1, the speech recognition server 7 and the authentication server 9 are configured to communicate with each other over a network connection. As described herein, an SFTP server (SSH File Transfer Protocol server) 10 is provided for secure transfer of files from the speech recognition server 7 to the authentication server 9.
The speech recognition server 7 comprises a computer-implemented speech recogniser for recognising speech spoken by an occupant of the vehicle 3. The term "voice command" CMV is used herein to refer to natural language spoken by an occupant of the vehicle 3. The voice command CMV may comprise or consist of one or more of the following: a command, a request, an instruction, or a query. The computer-implemented speech recogniser may translate (transcribe) the voice command CMV into text for further analysis. The speech recognition server 7 is configured to generate a command text CMT corresponding to a partial or full transcription of the voice command CMV. The speech recognition server 7 may comprise a natural language processing function for parsing the command text CMT. The speech recognition server 7 can thereby identify and extract instructions and/or commands from the voice command CMV.
The command text CMT in the present embodiment is transmitted from the speech recognition server 7 to a content service data aggregator 13. The content service data aggregator 13 is configured to control access to one or more content service providers CSP-n to control the supply of digital content DCn, such as digital media, to the vehicle 3. The content service providers CSP-n may, for example, comprise one or more of the following: a music streaming service, a digital map provider, a route planning provider, and a traffic data provider. The content service data aggregator 13 is configured to control the delivery of digital content DCn from the one or more content service providers CSP-n. A user associated with the vehicle 3 may have one or more subscriptions in place with the content service providers CSP-n. The content service data aggregator 13 stores user data in relation to each content service provider CSP-n. For example, the content service data aggregator 13 may store one or more of the following: user information, login (access) details, subscription details and account information. The command text CMT comprises a data content request DCR requesting the delivery of digital content DCn from one or more of the content service providers CSP-n associated with the user.
The processing of the voice command CMV may be subject to privacy rules and regulations. The speech recognition server 7 is configured to generate at least one application key 15-n to enable secure processing of the voice command CMV captured by the speech service system 1. The at least one application key 15-n is used as a credential to authenticate data transmissions between the speech service system 1 and the speech recognition server 7. One or more application identifier APPID is defined on the speech recognition server 7 to identify a service or application provided by the operator. The application identifier APPID remains static for the term of the relevant contract. The at least one application key 15-n enables secure communication between the speech service system 1 and the speech recognition server 7.
The or each application key 15-n is an electronic key and may, for example, comprise a numerical key or an alphanumerical key. The speech recognition server 7 delivers the at least one application key 15-n to the authentication server 9. In the present embodiment, the at least one application key 15-n is delivered to an SFTP server 10 which functions as a drop location. The SFTP server 10 may be hosted by the supplier of the speech recognition server 7, for example. The authentication server 9 is configured to collect the or each application key 15-n from the SFTP server 10. The or each application key 15-n may be unique, for example a unique application key 15-n may be allocated to each user. Alternatively, the same application key(s) 15-n may be issued to a plurality of users. In the present embodiment, the same application key 15-n is used across all users. The application key 15-n may be combined with a unique identifier, such as a user identifier and/or a vehicle identifier, to provide a unique identifier. In the present embodiment, the application key 15-n is combined with a user identifier, a vehicle identifier and a vehicle identification number (VIN).
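The specification does not say how the shared application key is combined with the user identifier, vehicle identifier and VIN. One possible reading, shown purely as an assumption, is to derive a single identifier by hashing the four fields together. The function name and the length-prefixed encoding are hypothetical.

```python
import hashlib


def combined_identifier(application_key: str, user_id: str,
                        vehicle_id: str, vin: str) -> str:
    """Derive one identifier from a shared key plus per-user/vehicle fields."""
    material = b""
    for field in (application_key, user_id, vehicle_id, vin):
        encoded = field.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        material += len(encoded).to_bytes(4, "big") + encoded
    return hashlib.sha256(material).hexdigest()
```

With this approach the same application key yields a different identifier for each user and vehicle, while the same inputs always give the same result.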
The speech recognition server 7 may issue a single application key 15-1; or may issue a plurality of the application keys 15-n. The plurality of the application keys 15-n may, for example, be issued as a batch. The speech recognition server 7 issues the at least one application key 15-n at predetermined time intervals. The speech recognition server 7 issues the at least one new (replacement) application key 15-n upon expiry of the predetermined time interval. The time interval may, for example, be a 30-day interval. The or each application key 15-n is valid for the duration of the predetermined time interval. The or each application key 15-n may be invalidated on expiry of the predetermined time interval or within an extended time period after expiry of the predetermined time interval. The extended time period may, for example, be a 1-day interval. Alternatively, or in addition, the speech recognition server 7 may issue the at least one new application key 15-n in dependence on a request received from the speech service system 1 or the authentication server 9. Alternatively, or in addition, the speech recognition server 7 may issue the at least one new application key 15-n in dependence on a determination that an existing application key 15-n has been compromised or used improperly.
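The timing rules above can be sketched as a small status check. The 30-day interval and 1-day extended period are the example values from the text; treating the extended period as a grace window in which the old key is still accepted is one interpretation, and the function name and status strings are hypothetical.

```python
from datetime import datetime, timedelta, timezone

VALIDITY = timedelta(days=30)  # example predetermined time interval
GRACE = timedelta(days=1)      # example extended time period


def key_status(issued_at: datetime, now: datetime,
               validity: timedelta = VALIDITY,
               grace: timedelta = GRACE) -> str:
    """Classify a key as 'valid', 'grace' or 'invalid' at time `now`."""
    expires_at = issued_at + validity
    if now < expires_at:
        return "valid"
    if now < expires_at + grace:
        return "grace"  # replacement already issued, old key not yet invalidated
    return "invalid"
```

For example, a key issued on 1 January would be classed as valid on 30 January, in the grace window on 31 January, and invalid from 1 February.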
The authentication server 9 communicates with the SFTP server 10 to obtain the at least one application key 15-n. The SFTP server 10 may transmit a notification to the authentication server 9 to indicate that a new application key 15-n has been issued. Alternatively, or in addition, the authentication server 9 may periodically contact the SFTP server 10 to determine if a new application key 15-n has issued. In a variant, the at least one application key 15-n may be generated by the authentication server 9 and output to the speech recognition server 7. The SFTP server 10 may be used to relay the at least one application key 15-n from the authentication server 9 to the speech recognition server 7.
The speech service system 1 is configured to communicate with the authentication server 9 to obtain an active one of the at least one application key 15-n. The speech service system 1 communicates with the authentication server 9 in dependence on identification of an initiation condition. The initiation condition in the present embodiment comprises a start-up of the vehicle 3. The speech service system 1 in the present embodiment communicates with the authentication server 9 on start-up of the vehicle 3. The start-up of the vehicle 3 may correspond to an initiation of an ignition cycle of an internal engine or an activation of an electric traction motor. The speech service system 1 is configured to delete the at least one application key 15-n in dependence on identification of a cessation condition. The cessation condition in the present embodiment comprises a shut-down of the vehicle 3. The shut-down of the vehicle 3 may correspond to a termination of the ignition cycle of the internal engine or a de-activation of an electric traction motor. The speech service system 1 transmits an application key request RQK to the authentication server 9. The application key request RQK may comprise one or more of the following: a user identifier, for example comprising user credentials; a vehicle identifier; and a vehicle identification number (VIN). The authentication server 9 transmits the at least one application key 15-n to the vehicle 3 in dependence on receipt of the application key request RQK. The speech service system 1 implements an authentication module 20 to generate the application key request RQK and/or receive the application key 15-n from the authentication server 9. The authentication module 20 is in the form of an authentication daemon in the present embodiment.
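The key lifecycle described above (request on start-up, delete on shut-down) can be sketched as follows. The class name, the request field names and the injected `send_request` callable, which stands in for the network exchange with the authentication server, are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AuthenticationDaemon:
    user_id: str
    vehicle_id: str
    vin: str
    # Hypothetical transport: takes the request, returns the issued keys.
    send_request: Callable[[dict], List[str]]
    keys: List[str] = field(default_factory=list)

    def build_request(self) -> dict:
        """Assemble an application key request (RQK) with the identifiers."""
        return {
            "user_id": self.user_id,
            "vehicle_id": self.vehicle_id,
            "vin": self.vin,
        }

    def on_startup(self) -> None:
        """Initiation condition: fetch keys when the vehicle starts."""
        self.keys = list(self.send_request(self.build_request()))

    def on_shutdown(self) -> None:
        """Cessation condition: discard all keys when the vehicle shuts down."""
        self.keys.clear()
```

Holding the keys only in memory for the duration of a drive cycle limits how long a key remains on the vehicle.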
As shown in Figure 3, the speech service system 1 comprises at least one speech control module 17 having at least one electronic processor 19 and at least one memory device 21. The speech control module 17 in the present embodiment is configured to provide a voice user interface. The speech control module 17 has at least one electrical input 23 and at least one electrical output 24. The electrical input 23 is configured to receive an audio signal Si from a microphone 25 disposed in the vehicle 3. The electrical output 24 is configured to output a voice command data file VCD. The microphone 25 is disposed in a cabin 27 of the vehicle 3. The microphone 25 is operative to sense audible sounds in the cabin 27 and outputs the audio signal Si to the speech control module 17. The speech control module 17 implements an embedded speech recogniser 18 for providing on-board (i.e., local) speech recognition.
The at least one electronic processor 19 is configured to implement a speech recognition algorithm to recognise and translate one or more spoken words or phrases. The speech control module 17 may, for example, identify an activation word or phrase to activate the speech service system 1. The activation word or phrase may be predefined, for example as a default or standardised activation word; or the activation word may be user-defined or user-selected.
Alternatively, or in addition, a user may activate the speech session by use of a "Press To Talk" button, hard key or soft key (for example from a command list on a touchscreen). The speech control module 17 may provide local speech recognition functions when there is limited or poor connectivity to the speech recognition server 7; or when context is material to the control strategy, for example taking account of the current vehicle operating conditions.
The speech control module 17 is configured to communicate with the speech recognition server 7 in dependence on the identification of the activation word or phrase. As outlined above, other inputs may be used to activate the speech control module 17, such as activation of a "Press To Talk" button. The speech control module 17 generates an audio data set ADS in dependence on the received source audio signal Si. In the present embodiment, the audio signal Si is processed using an audio codec for transmission, preferably a low-latency format to enable real-time interactive communication. A suitable coding format is the Opus audio codec which codes speech and general audio in a single format. The audio data set ADS comprises audio data representing the voice command CMV. The speech control module 17 generates the voice command data file VCD for transmission to the speech recognition server 7. The voice command data file VCD comprises the audio data set ADS representing the voice command CMV. The speech control module 17 may be configured to package the voice command data file VCD as a discrete file for transmission to the speech recognition server 7. The speech control module 17 is configured to stream the audio data set ADS to the speech recognition server 7 at least substantially in real time. A transmission rate of 28 kbps enables real-time interactive communication. In a variant, the voice command data file VCD could be transmitted when the voice command CMV is complete, for example in dependence on a determination that the user is no longer speaking.
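As a worked illustration of the 28 kbps figure, the following sketch computes the payload carried by each streamed frame and splits an encoded payload into frame-sized chunks. The 20 ms frame duration and the constant bit rate are assumptions made for the example (the embodiment does not state a frame size), and the payload is placeholder data rather than real codec output.

```python
from typing import Iterator

BITRATE_BPS = 28_000  # transmission rate given in the description (28 kbps)
FRAME_MS = 20         # assumption: 20 ms frames at a constant bit rate


def bytes_per_frame(bitrate_bps: int, frame_ms: int) -> int:
    # bits per frame = bitrate * frame duration; divide by 8 for bytes.
    return bitrate_bps * frame_ms // (8 * 1000)


def stream_chunks(encoded: bytes, chunk_size: int) -> Iterator[bytes]:
    # Yield consecutive slices so each can be sent as soon as it is ready.
    for offset in range(0, len(encoded), chunk_size):
        yield encoded[offset:offset + chunk_size]


frame_bytes = bytes_per_frame(BITRATE_BPS, FRAME_MS)
one_second = bytes(BITRATE_BPS // 8)  # placeholder for 1 s of encoded audio
chunks = list(stream_chunks(one_second, frame_bytes))
```

Under these assumptions each frame carries 70 bytes and one second of speech is sent as 50 chunks.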
The speech control module 17 is configured to package the application key 15-n with the audio data set ADS. The voice command data file VCD comprises a header HD. The speech control module 17 populates the header HD with the application key 15-n. The speech recognition server 7 receives the voice command data file VCD and reads the application key 15-n from the header HD. The header HD typically forms an initial portion of the voice command data file VCD, thereby facilitating access to the application key 15-n at the beginning of the transmission. The speech recognition server 7 uses the application key 15-n to authenticate the voice command data file VCD. The speech recognition server 7 performs a check to determine that the application key 15-n associated with the voice command data file VCD is valid and is currently active. The speech recognition server 7 may, for example, search a look-up table or a database to check the validity of the application key 15-n. The speech recognition server 7 may access the SFTP server 10 to check the validity of the application key 15-n. In the present embodiment, the speech recognition server 7 performs a check to ensure that the application identifier APPID and the application key 15-n are a valid pair. If the check determines that the application key 15-n is valid, the authentication of the voice command data file VCD is successful (i.e., positive). If the check determines that the application key 15-n is not valid, for example the application key 15-n has expired, the authentication of the voice command data file VCD is unsuccessful (i.e., negative). The speech recognition server 7 is configured to process the voice command data file VCD in dependence on the authentication of the voice command data file VCD. The speech recognition server 7 is configured to authenticate each voice command data file VCD. Thus, the process of reading and validating the application key 15-n is performed for each voice command data file VCD.
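The packaging of the application key 15-n in the header HD can be sketched as below. The byte layout (a 4-byte big-endian length prefix followed by a JSON header and then the audio data) is purely an assumption for illustration; the embodiment does not specify a header format, and the function names are hypothetical.

```python
import json
import struct
from typing import Dict, Tuple


def build_vcd(app_id: str, app_key: str, audio: bytes) -> bytes:
    """Prefix the audio data set with a header carrying the application key."""
    header = json.dumps({"app_id": app_id, "app_key": app_key}).encode("utf-8")
    # The header length comes first so the receiver can read the key
    # before any audio has been consumed.
    return struct.pack(">I", len(header)) + header + audio


def read_header(vcd: bytes) -> Tuple[Dict[str, str], bytes]:
    """Return the parsed header and the remaining audio bytes."""
    (header_len,) = struct.unpack_from(">I", vcd, 0)
    header = json.loads(vcd[4:4 + header_len].decode("utf-8"))
    return header, vcd[4 + header_len:]


vcd = build_vcd("APPID-1", "key-15-1", b"\x00\x01\x02")
header, audio = read_header(vcd)
```

Placing the header at the start of the file mirrors the point made above: the key is available at the beginning of the transmission, so authentication need not wait for the full audio data set.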
If the authentication is successful, the speech recognition server 7 accesses the audio data set ADS contained in the voice command data file VCD. Speech recognition is performed on the audio data set ADS and a command text CMT is generated corresponding to the voice command CMV. The command text CMT is transmitted to the speech service system 1 or the content service provider CSP-n. If the authentication is unsuccessful, the speech recognition server 7 does not process the audio data set ADS. The speech recognition server 7 may delete the voice command data file VCD without performing further processing. The speech recognition server 7 may optionally transmit a signal to the speech service system 1 indicating that the application key 15-n is invalid. In dependence on the notification that the application key 15-n is invalid, the speech service system 1 may halt or interrupt transmission of the voice command data file VCD and/or may issue a request for a new application key 15-n from the authentication server 9.
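A minimal sketch of this gating behaviour follows, assuming the valid (application identifier, application key) pairs are held in an in-memory set and that speech recognition is supplied as a callable. All names and the response format are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


class RecognitionServer:
    """Processes audio only after the application key has been validated."""

    def __init__(
        self,
        valid_pairs: Set[Tuple[str, str]],
        recognise: Callable[[bytes], str],
    ) -> None:
        self.valid_pairs = valid_pairs
        self.recognise = recognise

    def handle(
        self, app_id: str, app_key: str, audio: bytes
    ) -> Dict[str, Optional[str]]:
        if (app_id, app_key) not in self.valid_pairs:
            # Negative authentication: the audio is never processed, and the
            # vehicle is told that the key is invalid.
            return {"status": "invalid_key", "command_text": None}
        return {"status": "ok", "command_text": self.recognise(audio)}


processed: List[bytes] = []


def fake_recogniser(audio: bytes) -> str:
    # Stand-in for speech recognition; records what it was asked to process.
    processed.append(audio)
    return "navigate home"


server = RecognitionServer({("APPID-1", "key-15-1")}, fake_recogniser)
accepted = server.handle("APPID-1", "key-15-1", b"\x01\x02")
rejected = server.handle("APPID-1", "expired-key", b"\x03\x04")
```

On receiving the `invalid_key` status, the vehicle-side system could halt transmission and request a new application key, as described above.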
It is conceivable that an application key 15-n may be compromised, for example due to unauthorized or fraudulent behaviour. In the event that the application key 15-n is suspected of having been compromised, the authentication server 9 is configured to activate a new application key 15-n. The new application key 15-n may, for example, be selected from the batch of application keys 15-n provided by the speech recognition server 7. The new application key 15-n is supplied to the speech service system 1 in the vehicle 3 during the next initiation process. Similarly, the new application key 15-n is supplied to the speech recognition server 7. The new application key 15-n is used in place of the previous application key 15-n which is deactivated.
An operational interface 30 is provided for communicating with the speech recognition server 7. The operational interface 30 may be provided to allow updating of a black list and/or a white list.
The operation of the speech service system 1 will now be described with reference to the block diagrams shown in Figures 4 to 8.
A first block diagram 100 is shown in Figure 4 representing the initialisation of the speech service system 1. The speech service system 1 identifies an initiation process (BLOCK 105).
The speech service system 1 requests a new application key 15-n (BLOCK 110). The request in the present embodiment is sent to an authentication daemon incorporated into the speech service system 1 in the vehicle 3. The request may include credentials, including one or more of: a user identifier, a vehicle identifier and a vehicle identification number (VIN). An application key request RQK is generated and transmitted to the authentication server 9 (BLOCK 115).
The authentication server 9 receives the application key request RQK and returns the application key 15-n previously shared from the SFTP server 10 (BLOCK 120). The speech service system 1 receives the application key 15-n from the authentication server 9 (BLOCK 125). If the application key 15-n is not received, for example within a predetermined time-out interval, the application key request RQK may optionally be re-sent to the authentication server 9. The authentication daemon determines user and/or vehicle credentials, such as the user identifier, the vehicle identifier and the vehicle identification number (VIN) (BLOCK 130). The application key 15-n, the user identifier, the vehicle identifier and the vehicle identification number (VIN) collected by the authentication daemon are returned to the speech service system 1 provided in the vehicle 3 (BLOCK 135). The speech service system 1 receives the application key 15-n, the user identifier, the vehicle identifier and the vehicle identification number (VIN) (BLOCK 140). The initialisation process is complete (BLOCK 145).
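The optional re-sending of the application key request RQK after a time-out might be sketched as follows. The attempt limit, the time-out value and the convention that the transport returns `None` on time-out are assumptions for the example and are not taken from the embodiment.

```python
from typing import Callable, List, Optional


def request_key(
    send_request: Callable[[float], Optional[str]],
    timeout_s: float,
    max_attempts: int,
) -> Optional[str]:
    """Send the key request, re-sending it if no key arrives in time."""
    for _ in range(max_attempts):
        # Hypothetical transport: returns the key, or None on time-out.
        key = send_request(timeout_s)
        if key is not None:
            return key
    return None  # no key obtained within the permitted attempts


# Stand-in transport: the first two requests time out, the third succeeds.
replies: List[Optional[str]] = [None, None, "key-15-1"]
attempts: List[float] = []


def fake_send(timeout_s: float) -> Optional[str]:
    attempts.append(timeout_s)
    return replies[len(attempts) - 1]


key = request_key(fake_send, timeout_s=2.0, max_attempts=3)
```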
A second block diagram 200 is shown in Figure 5 representing the operation of the speech service system 1 disposed on the vehicle 3. The speech service system 1 is activated (BLOCK 205). The activation may occur in dependence on identification of a voice command CMV by the speech control module 17 provided in the vehicle 3. The speech control module 17 may, for example, identify an activation word or phrase. The speech service system 1 accesses the user and/or vehicle credentials cached by the authentication daemon. The speech service system 1 builds a header HD file for the voice command data file VCD to be transmitted to the speech recognition server 7 (BLOCK 210). The speech service system 1 prepares the voice command data file VCD and attaches the header HD file. The voice command data file VCD is transmitted by the speech service system 1 to the speech recognition server 7 (BLOCK 215). The voice command data file VCD is received by the speech recognition server 7 (BLOCK 220). The speech recognition server 7 processes the audio data set ADS comprising the voice command CMV (BLOCK 225). The operation of the speech service system 1 is complete (BLOCK 230).
A third block diagram 300 is shown in Figure 6 representing the operation of the speech recognition server 7 disposed externally of the vehicle 3. The speech recognition server 7 receives the voice command data file VCD (BLOCK 305). The speech recognition server 7 reads the header HD of the voice command data file VCD and accesses the application key 15-n (BLOCK 310). A check is performed to determine if the application key 15-n accessed in the voice command data file VCD and the application identifier APPID are a valid pair (BLOCK 315). If the check does not identify a valid pair, the process ends without performing any speech recognition operations on the voice command data file VCD (BLOCK 320). If the check does identify a valid pair, the speech recognition server 7 accesses the audio data set ADS in the voice command data file VCD and performs speech recognition to recognise the speech in the voice command CMV (BLOCK 325). A command text CMT corresponding to the voice command CMV is generated. Natural language processing may be performed on the command text CMT, for example to determine a request or instruction in the voice command CMV. A data content request DCR is generated in dependence on the command text CMT (BLOCK 330). The data content request DCR may comprise the user identifier and the vehicle identifier. The data content request DCR is transmitted to the content service data aggregator 13 (BLOCK 335). The content service data aggregator 13 receives the data content request DCR (BLOCK 340). The content service data aggregator 13 performs a check to validate the user identifier and/or the vehicle identifier (BLOCK 345). A determination is made as to the validity of the user and/or the vehicle 3 (BLOCK 350). If the validation is not successful, the process ends (BLOCK 320). If the validation is successful, the content service data aggregator 13 obtains the requested content (BLOCK 355).
The content service data aggregator 13 returns the requested content to the speech recognition server 7 (BLOCK 360). The speech recognition server 7 receives the requested content (BLOCK 365). The speech recognition server 7 transmits the requested content to the speech service system 1 (BLOCK 370). The process ends (BLOCK 320).
A fourth block diagram 400 is shown in Figure 7 representing the scheduled update of the application keys 15-n. The speech recognition server 7 identifies a scheduled refresh of the application keys 15-n (BLOCK 405). The speech recognition server 7 generates a plurality of the application keys 15-n (BLOCK 410). In the present embodiment, the speech recognition server 7 generates three (3) application keys 15-n. The speech recognition server 7 outputs the application keys 15-n to the SFTP server 10 (BLOCK 415). The speech recognition server 7 outputs a notification to the authentication server 9 indicating that a batch of new (replacement) application keys 15-n is available (BLOCK 420). The speech recognition server 7 accesses the new application keys 15-n on the SFTP server 10. The speech recognition server 7 selects and activates a first one of the application keys 15-n. The selected application key 15-n is used by the speech recognition server 7 (BLOCK 425). The previous application key 15-n remains active for a handover period, for example twenty-four (24) hours.
This ensures that vehicles 3 which are mid-cycle can still access the speech recognition server 7. The previous application key 15-n is revoked upon expiry of the handover period. The authentication server 9 receives the notification that a batch of new (replacement) application keys 15-n is available (BLOCK 430). The authentication server 9 communicates with the SFTP server 10 via an SFTP connection (BLOCK 435). The authentication server 9 retrieves the new application keys 15-n from the SFTP server 10 (BLOCK 440). The authentication server 9 is updated to use a first one of the application keys 15-1 received from the SFTP server 10 (BLOCK 445). The authentication server 9 supplies the first application key 15-1 to the authentication daemon on request. The process ends when the application keys 15-n have been updated on the speech recognition server 7 and the authentication server 9 (BLOCK 450).
A fifth block diagram 500 is shown in Figure 8 representing the update of the application keys 15-n due to a potential threat, for example due to the active application key 15-n being compromised. The potential threat is identified (BLOCK 505). A notification is sent to the speech recognition server 7 and the authentication server 9 (BLOCK 510). An update is sent to the speech recognition server 7 and the authentication server 9 to use a second one of the application keys 15-2. The speech recognition server 7 receives the notification (BLOCK 515). The previous application key 15-1 (the first application key 15-1 in this scenario) continues to be accepted by the speech recognition server 7 for a handover period, for example twenty-four (24) hours. This ensures that vehicles 3 which are mid-cycle can still access the speech recognition server 7. The previous application key 15-n is revoked upon expiry of the handover period. The authentication server 9 receives the notification to use the second application key 15-2 (BLOCK 520). The authentication server 9 outputs the second application key 15-2 when the next request is received from the authentication daemon. The process ends when the speech recognition server 7 and the authentication server 9 have received the second application key 15-2 (BLOCK 455).
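The key updates of Figures 7 and 8 share the same handover logic, which might be sketched on the server side as follows. This is an illustrative model only: the class and method names are hypothetical, the key values are placeholders, and the twenty-four hour handover period is the example value given above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

HANDOVER_PERIOD = timedelta(hours=24)  # example value from the description


@dataclass
class ServerKeyState:
    """Tracks the active key and a previous key still in its handover period."""
    batch: List[str]
    active_index: int = 0
    previous_key: Optional[str] = None
    previous_expiry: Optional[datetime] = None

    @property
    def active_key(self) -> str:
        return self.batch[self.active_index]

    def _retire_active(self, now: datetime) -> None:
        # The outgoing key stays acceptable until the handover period ends.
        self.previous_key = self.active_key
        self.previous_expiry = now + HANDOVER_PERIOD

    def scheduled_refresh(self, new_batch: List[str], now: datetime) -> None:
        # Figure 7: install a new batch and activate its first key.
        self._retire_active(now)
        self.batch = list(new_batch)
        self.active_index = 0

    def switch_on_threat(self, now: datetime) -> None:
        # Figure 8: move to the next key of the current batch.
        if self.active_index + 1 >= len(self.batch):
            raise RuntimeError("no spare application key left in the batch")
        self._retire_active(now)
        self.active_index += 1

    def accepts(self, key: str, now: datetime) -> bool:
        if key == self.active_key:
            return True
        in_handover = (
            self.previous_expiry is not None and now < self.previous_expiry
        )
        return in_handover and key == self.previous_key


t0 = datetime(2021, 10, 21, 8, 0)
state = ServerKeyState(batch=["15-1", "15-2", "15-3"])
state.switch_on_threat(t0)
accepted_new = state.accepts("15-2", t0)
accepted_old_during = state.accepts("15-1", t0 + timedelta(hours=1))
accepted_old_after = state.accepts("15-1", t0 + timedelta(hours=25))
```

Only one previous key is retained in this sketch, so a second switch within the handover period would immediately stop acceptance of the older key; whether that is desirable is a design choice the embodiment leaves open.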
It will be appreciated that various changes and modifications can be made to the present invention without departing from the scope of the present application.
In the above embodiment the command text CMT is transmitted from the speech recognition server 7 to a content service data aggregator 13. Alternatively, or in addition, the command text CMT may be transmitted from the speech recognition server 7 to the vehicle 3. The command text CMT may comprise a request to control one or more systems VS-n disposed on the vehicle 3. The speech service system 1 may control one or more vehicle systems VS-n in dependence on the command text CMT. The speech service system 1 may process the command text CMT locally.

Claims (25)

  1. A speech control module for a vehicle, the speech control module being operative to generate a voice command data file for processing by a speech recognition server disposed externally of the vehicle, the speech control module comprising one or more controllers, the speech control module configured to: receive an audio signal; generate an audio data set in dependence on the received audio signal, the audio data set comprising a voice command spoken by an occupant of the vehicle; receive an application key; generate the voice command data file comprising the audio data set and the application key; and output the voice command data file for transmission to the speech recognition server.
  2. A speech control module as claimed in claim 1, wherein the speech control module is configured to write the application key to a header of the voice command data file.
  3. A speech control module as claimed in claim 1 or claim 2, wherein the speech control module is configured to write a vehicle identifier to the voice command data file to identify the vehicle.
  4. A speech control module as claimed in any one of the preceding claims, wherein the speech control module is configured to write a user identifier to the voice command data file to identify a user associated with the vehicle.
  5. A speech control module as claimed in any one of the preceding claims, wherein the speech control module is configured to receive a command text from the speech recognition server, the command text at least substantially corresponding to the voice command.
  6. An authentication module for a speech control module provided in a vehicle, the authentication module being configured to: generate an application key request to request at least one application key from an authentication server; transmit the application key request to the authentication server disposed externally of the vehicle; receive the at least one application key; and output the at least one application key to the speech control module.
  7. An authentication module as claimed in claim 6, wherein the authentication module is configured to transmit the application key request to the authentication server in dependence on an initialisation condition being satisfied.
  8. An authentication module as claimed in claim 7, wherein the initialisation condition comprises a start-up of the vehicle.
  9. An authentication module as claimed in any one of claims 6, 7 or 8, wherein the authentication module is configured to delete the at least one application key in dependence on a cessation condition being met.
  10. An authentication module as claimed in claim 9, wherein the cessation condition comprises a shut-down of the vehicle.
  11. A speech service system for a vehicle, the speech service system comprising a speech control module as claimed in any one of claims 1 to 5 configured to generate the voice command data file comprising the audio data set and the application key; and an authentication module as claimed in any one of claims 6 to 10 configured to retrieve the application key for the voice command data file.
  12. A speech recognition system for processing a voice command, the speech recognition system comprising: a speech control module as claimed in any one of claims 1 to 5 configured to generate the voice command data file comprising the audio data set and the application key; and a speech recognition server for receiving the voice command data file generated by the speech service system, the external server being configured to authenticate the voice command data file in dependence on the application key; wherein the external server is configured to process the audio data set provided in the voice command data file in dependence on a positive authentication of the voice command data file.
  13. A speech recognition system as claimed in claim 12 comprising an authentication module as claimed in any one of claims 6 to 10 configured to retrieve the application key for the voice command data file.
  14. A computer-implemented method of generating a voice command data file for transmission from a vehicle to a speech recognition server disposed remotely for processing, the method comprising: receiving an audio signal; generating an audio data set in dependence on the received audio signal, the audio data set comprising a voice command spoken by an occupant of the vehicle; receiving at least one application key; generating the voice command data file comprising the audio data set and the application key; and outputting the voice command data file for transmission to the speech recognition server.
  15. A computer-implemented method as claimed in claim 14, wherein the method comprises writing the application key to a header of the voice command data file.
  16. A computer-implemented method as claimed in claim 14 or claim 15, wherein the method comprises writing a vehicle identifier and/or a user identifier to the voice command data file.
  17. A computer-implemented method as claimed in any one of claims 14, 15 or 16, wherein the method comprises receiving a command text from the speech recognition server, the command text at least substantially corresponding to the voice command.
  18. A computer-implemented method of obtaining an application key for incorporation into a voice command data file for transmission from a vehicle to a speech recognition server, the method comprising: generating an application key request to request at least one application key from an authentication server; transmitting the application key request to the authentication server disposed externally of the vehicle; receiving the at least one application key; and outputting the at least one application key to a speech control module for incorporation into the voice command data file.
  19. A computer-implemented method as claimed in claim 18, wherein the method comprises transmitting the application key request to the authentication server in dependence on an initialisation condition being satisfied; and the initialisation condition optionally comprising a start-up of the vehicle.
  20. A computer-implemented method as claimed in claim 18 or claim 19, wherein the method comprises deleting the at least one application key in dependence on a cessation condition being met; the cessation condition optionally comprises a shut-down of the vehicle.
  21. A computer-implemented method of processing a voice command, the method comprising: generating a voice command data file as claimed in any one of claims 14 to 17; wherein the application key is obtained using the method as claimed in any one of claims 18, 19 and 20.
  22. A non-transitory computer-readable medium having a set of instructions stored therein which, when executed, cause a processor to perform the method claimed in any one of claims 14 to 21.
  23. A speech recognition system for processing a voice command data file received from a vehicle, the voice command data file comprising an audio data set representing a voice command issued by an occupant of the vehicle and an application key; wherein the speech recognition system is configured to: read the application key in the voice command data file; authenticate the voice command data file in dependence on the application key; and process the audio data set in the voice command data file in dependence on a successful authentication of the voice command data file.
  24. A vehicle comprising a speech control module as claimed in any one of claims 1 to 5; and/or an authentication module as claimed in any one of claims 6 to 10.
  25. A vehicle comprising a speech service system as claimed in claim 11.
Priority Applications (2)

Application Number Priority Date Filing Date Title
GB2115125.3A GB2612079A (en) 2021-10-21 2021-10-21 Voice command processing method and apparatus
PCT/EP2022/078418 WO2023066760A1 (en) 2021-10-21 2022-10-12 Voice command processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2115125.3A GB2612079A (en) 2021-10-21 2021-10-21 Voice command processing method and apparatus

Publications (2)

Publication Number Publication Date
GB202115125D0 GB202115125D0 (en) 2021-12-08
GB2612079A true GB2612079A (en) 2023-04-26

Family

ID=78806087

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2115125.3A Pending GB2612079A (en) 2021-10-21 2021-10-21 Voice command processing method and apparatus

Country Status (2)

Country Link
GB (1) GB2612079A (en)
WO (1) WO2023066760A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002144982A (en) * 2000-11-06 2002-05-22 Mazda Motor Corp Operating method and system for on-vehicle apparatus and control device for such apparatus
US20130185072A1 (en) * 2010-06-24 2013-07-18 Honda Motor Co., Ltd. Communication System and Method Between an On-Vehicle Voice Recognition System and an Off-Vehicle Voice Recognition System
CN103559290A (en) * 2013-11-08 2014-02-05 安徽科大讯飞信息科技股份有限公司 Method and system for searching POI (point of interest)

Also Published As

Publication number Publication date
WO2023066760A1 (en) 2023-04-27
GB202115125D0 (en) 2021-12-08
