CN115662410A - Vehicle-mounted machine voice interaction method and vehicle-mounted machine - Google Patents

Vehicle-mounted machine voice interaction method and vehicle-mounted machine

Info

Publication number
CN115662410A
Authority
CN
China
Prior art keywords
free
wake
decoding
vehicle
fsa
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210970418.XA
Other languages
Chinese (zh)
Inventor
李威
李永超
方昕
周传福
潘志兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Xunfei Huanyu Technology Co ltd
Original Assignee
Anhui Xunfei Huanyu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Xunfei Huanyu Technology Co ltd filed Critical Anhui Xunfei Huanyu Technology Co ltd
Priority to CN202210970418.XA priority Critical patent/CN115662410A/en
Publication of CN115662410A publication Critical patent/CN115662410A/en
Pending legal-status Critical Current

Abstract

The invention discloses a vehicle-mounted machine voice interaction method and a vehicle-mounted machine, wherein the vehicle-mounted machine voice interaction method comprises the following steps: acquiring a voice signal to be recognized; when the vehicle-mounted machine is in a wake-free scene, decoding the voice signal using the wake-free finite state acceptor (FSA) decoding mode in the speech model; and when the vehicle-mounted machine is in a non-wake-free scene, decoding the voice signal using the weighted finite state transducer (WFST) decoding mode and the non-wake-free FSA decoding mode in the speech model. Because the vehicle-mounted machine decodes independently in the wake-free scene and in the non-wake-free scene, the method reduces crosstalk between the two scenes.

Description

Vehicle-mounted machine voice interaction method and vehicle-mounted machine
Technical Field
The invention relates to the technical field of speech, and in particular to a vehicle-mounted machine voice interaction method and a vehicle-mounted machine.
Background
With the rapid development of AI (Artificial Intelligence) technology, automobiles are gradually becoming more intelligent. Since speech technology is an entry point for this intelligence, it is necessary to build more convenient and intelligent speech recognition systems. Considering factors such as small memory footprint, high recognition rate, high safety, customizability and good user experience, speech recognition systems adopting wake-free technology have been proposed in the related art.
However, the speech recognition systems in the above technology have the following problems:
1) Most run in a CPU-only (Central Processing Unit) platform environment, which has limited computing power, high power consumption and high cost, and cannot use parallel threads to complete specific tasks;
2) Most adopt a mixed implementation of WFST (Weighted Finite State Transducer) and FSA (Finite State Acceptor) decoding, so in actual use a crosstalk problem occurs: the system cannot tell whether it is in a wake-free scene or a non-wake-free scene, which degrades the user experience;
3) Speech recognition resources are mostly loaded by reading them directly into memory, which results in low memory utilization.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art. Therefore, a first objective of the present invention is to provide a vehicle-mounted machine voice interaction method that reduces the problem of crosstalk between the wake-free scene and the non-wake-free scene.
A second objective of the invention is to provide a vehicle-mounted machine.
In order to achieve the above objective, an embodiment of the first aspect of the present invention provides a vehicle-mounted machine voice interaction method, where the method includes: acquiring a voice signal to be recognized; when the vehicle-mounted machine is in a wake-free scene, decoding the voice signal using the wake-free finite state acceptor (FSA) decoding mode in a speech model; and when the vehicle-mounted machine is in a non-wake-free scene, decoding the voice signal using the weighted finite state transducer (WFST) decoding mode and the non-wake-free FSA decoding mode in the speech model.
According to the vehicle-mounted machine voice interaction method, a voice signal to be recognized is first obtained; when the vehicle-mounted machine is in a wake-free scene, the voice signal is decoded using the wake-free FSA decoding mode in the speech model; when the vehicle-mounted machine is in a non-wake-free scene, the voice signal is decoded using the WFST decoding mode and the non-wake-free FSA decoding mode in the speech model. The vehicle-mounted machine thus decodes independently in the wake-free scene and in the non-wake-free scene, which reduces the problem of crosstalk between the two scenes.
In addition, the vehicle-mounted machine voice interaction method provided by the above embodiment of the present invention may further have the following additional technical features:
In an embodiment of the present invention, after decoding the voice signal in the wake-free FSA decoding mode, the method further includes: performing semantic recognition on the wake-free FSA decoding result to obtain a first semantic recognition result; judging whether the first semantic recognition result is in a preset wake-free statement set; and if so, generating a control instruction corresponding to the first semantic recognition result so as to control the vehicle accordingly, and controlling the vehicle-mounted machine to switch to a non-wake-free scene.
In an embodiment of the present invention, the preset wake-free statement set includes a personalized sentence pattern and a non-personalized sentence pattern, and the method further includes: acquiring a wake-free hotword list, where the wake-free hotword list includes at least one wake-free hotword; constructing a wake-free hotword network based on the wake-free hotword list; and loading the wake-free hotword network into the placeholder slot corresponding to the personalized sentence pattern before decoding the voice signal in the wake-free FSA decoding mode.
In an embodiment of the present invention, when controlling the vehicle-mounted machine to switch to a non-wake-free scene, the method further includes: unloading the wake-free hotword network.
In an embodiment of the present invention, after decoding the voice signal in the WFST decoding mode and the non-wake-free FSA decoding mode, the method further includes: performing semantic recognition on the WFST decoding result and the non-wake-free FSA decoding result to obtain a second semantic recognition result; and generating a control instruction corresponding to the second semantic recognition result so as to control the vehicle accordingly.
In an embodiment of the present invention, when the vehicle-mounted machine is in a non-wake-free scene and has entered the wake-up state, the method further includes: if no voice signal is acquired within a preset waiting time, controlling the vehicle-mounted machine to exit the wake-up state.
In one embodiment of the invention, the method is executed in a neural network processor (NPU) environment.
In one embodiment of the invention, the method further comprises: loading the speech model into the storage of the NPU device, and establishing a mapping relation between a virtual memory area and the file object in the storage; and when resources need to be loaded and read, performing data interaction through the mapping relation.
In one embodiment of the invention, the storage comprises a magnetic disk.
In order to achieve the above objective, an embodiment of the second aspect of the present invention provides a vehicle-mounted machine, which includes a neural network processor NPU and a storage medium, where the storage medium stores a computer program, and when the computer program is executed by the NPU, the above vehicle-mounted machine voice interaction method is implemented.
The vehicle-mounted machine includes a neural network processor NPU and a storage medium storing a computer program. When the computer program is executed by the NPU, a voice signal to be recognized is first obtained; when the vehicle-mounted machine is in a wake-free scene, the voice signal is decoded using the wake-free finite state acceptor (FSA) decoding mode in the speech model; when the vehicle-mounted machine is in a non-wake-free scene, the voice signal is decoded using the weighted finite state transducer (WFST) decoding mode and the non-wake-free FSA decoding mode in the speech model. Performing speech recognition in the NPU environment thus reduces CPU occupancy and improves inference efficiency, and because the vehicle-mounted machine decodes independently in the wake-free scene and in the non-wake-free scene, the problem of crosstalk between the two scenes is reduced.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flowchart of a vehicle-mounted machine voice interaction method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an FSA sentence pattern for a wake-free scene according to an example of the present invention;
FIG. 3 is a schematic diagram of a personalized sentence pattern in the preset wake-free statement set according to an example of the present invention;
FIG. 4 is a flowchart of a vehicle-mounted machine voice interaction method according to an example of the present invention;
FIG. 5 is a flowchart of a vehicle-mounted machine voice interaction method according to another example of the present invention;
FIG. 6 is a schematic diagram of a neural network processor NPU according to an example of the present invention;
FIG. 7 is a flowchart of a vehicle-mounted machine voice interaction method according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The vehicle-mounted machine voice interaction method and the vehicle-mounted machine according to embodiments of the invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a vehicle-mounted machine voice interaction method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps.
S101, acquiring a voice signal to be recognized.
Specifically, the vehicle-mounted machine starts, performs resource initialization, and acquires a voice signal to be recognized. The voice signal to be recognized may be a voice signal in any scene and may be acquired through a voice acquisition device; for example, the voice acquisition device may be a microphone. In addition, because noise may be present throughout voice acquisition, noise filtering may also be performed on the acquired original voice signal to obtain the voice signal to be recognized; the invention imposes no limitation on this.
It should be noted that, when the vehicle-mounted machine starts, the resources it contains will be called, and since these resources are large, they need to be initialized after startup. For example, the resources may include VAD (Voice Activity Detection), an acoustic CLDNN (Convolutional, Long Short-Term Memory, fully connected Deep Neural Network) model, and a WFST (Weighted Finite State Transducer) network; in particular, if the wake-free scene needs to be turned on, the resources also need to include an FSA (Finite State Acceptor) network.
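As a concrete illustration of this startup step, the following is a minimal sketch in which the FSA network is loaded only when the wake-free scene is enabled; the SpeechResources structure and the string placeholders are assumptions made for illustration, since the patent does not prescribe an API.

```python
# Minimal sketch of startup resource initialization; strings stand in
# for loaded resource handles (models, decoding graphs).
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeechResources:
    vad: str                    # voice activity detection module
    acoustic_cldnn: str         # acoustic CLDNN model
    wfst_network: str           # WFST decoding network
    fsa_network: Optional[str]  # FSA network, only for the wake-free scene

def initialize_resources(wake_free_enabled: bool) -> SpeechResources:
    return SpeechResources(
        vad="vad_model",
        acoustic_cldnn="cldnn_model",
        wfst_network="wfst_graph",
        # The FSA network is required only when the wake-free scene is on.
        fsa_network="fsa_graph" if wake_free_enabled else None,
    )

print(initialize_resources(wake_free_enabled=True))
```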
S102, when the vehicle-mounted machine is in a wake-free scene, decoding the voice signal using the wake-free finite state acceptor (FSA) decoding mode in the speech model.
The speech model includes the wake-free FSA decoding mode, the non-wake-free FSA decoding mode and the weighted finite state transducer (WFST) decoding mode. Specifically, when the vehicle-mounted machine is in a wake-free scene, only the wake-free FSA decoding mode in the speech model is selected for independent decoding; that is, the wake-free FSA decoding result is obtained through the acoustic CLDNN model and the wake-free FSA decoding mode in the speech model.
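The scene-dependent selection of decoding paths can be sketched as follows; the decoder classes and their decode() interface are hypothetical, since the patent specifies the behavior rather than an API.

```python
# Sketch of per-scene decoder selection: the wake-free scene uses only the
# wake-free FSA path, while the non-wake-free scene uses WFST plus the
# non-wake-free FSA path, so the two scenes never share a decoding path.
class WakeFreeFSADecoder:
    def decode(self, features):
        return "wake-free FSA decoding result"

class NonWakeFreeFSADecoder:
    def decode(self, features):
        return "non-wake-free FSA decoding result"

class WFSTDecoder:
    def decode(self, features):
        return "WFST decoding result"

def decode_signal(features, wake_free_scene: bool):
    if wake_free_scene:
        # Wake-free scene: independent decoding on the wake-free FSA only.
        return [WakeFreeFSADecoder().decode(features)]
    # Non-wake-free scene: WFST and non-wake-free FSA decode together.
    return [WFSTDecoder().decode(features),
            NonWakeFreeFSADecoder().decode(features)]

print(decode_signal(features=None, wake_free_scene=True))
```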
In some embodiments of the present invention, after decoding the voice signal by using the wake-free FSA decoding method, the method may further include the following steps:
s201, performing semantic recognition on the awakening-free FSA decoding result to obtain a first semantic recognition result.
Specifically, a wake-free FSA decoding manner is adopted to decode the voice signal to obtain a wake-free FSA decoding result, and further, the wake-free FSA decoding result needs to be subjected to semantic recognition to obtain a first semantic recognition result, referring to the specific example of fig. 2, an arc between each node corresponds to a word, the pointing sequence of the directed graph is the sentence pattern to be expressed, and the preset wake-free statement set in the wake-free scene can be expressed by the FSA sentence pattern similar to fig. 2, so that the FSA sentence pattern in the special wake-free scene can be set to form the preset wake-free statement set.
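The FSA structure described for FIG. 2, nodes connected by word-labeled arcs with accepted paths spelling out sentence patterns, can be illustrated with a toy acceptor; the arcs below are an invented example, not the patent's actual sentence patterns.

```python
# A toy finite state acceptor: arcs carry words, and a path from the start
# state to a final state spells out an accepted sentence pattern.
class SimpleFSA:
    def __init__(self, arcs, start, finals):
        self.arcs = arcs      # {(state, word): next_state}
        self.start = start
        self.finals = finals  # set of accepting states

    def accepts(self, words):
        state = self.start
        for w in words:
            if (state, w) not in self.arcs:
                return False  # no arc labeled with this word
            state = self.arcs[(state, w)]
        return state in self.finals

# Hypothetical pattern covering "turn temperature up/down" style commands.
fsa = SimpleFSA(
    arcs={(0, "turn"): 1, (1, "temperature"): 2,
          (2, "up"): 3, (2, "down"): 3},
    start=0,
    finals={3},
)
print(fsa.accepts(["turn", "temperature", "up"]))  # True: in the set
print(fsa.accepts(["turn", "window", "up"]))       # False: no such arc
```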
S202, judging whether the first semantic recognition result is in the preset wake-free statement set.
Specifically, FSA sentence patterns such as the one in FIG. 2 generally fall into two kinds, personalized and non-personalized, so the preset wake-free statement set includes personalized sentence patterns and non-personalized sentence patterns. For example, a personalized sentence pattern may be of the form "call XX", "play the song XX" or "navigate to XX", while a non-personalized sentence pattern may be a vehicle control instruction such as "raise the temperature by one degree" or "roll the window down halfway".
For the personalized sentence patterns in the preset wake-free statement set, in some embodiments of the present invention, the method may specifically include:
A1, obtaining a wake-free hotword list, where the wake-free hotword list includes at least one wake-free hotword.
The wake-free hotword list may include one or more wake-free hotwords, which may be imported according to the user's habits; for example, the wake-free hotwords may be songs the user often listens to, places the user often visits, and frequently used contacts. The invention imposes no limitation on this.
A2, constructing the wake-free hotword network based on the wake-free hotword list.
It should be noted that the wake-free hotword network can be loaded or unloaded as the scene switches.
A3, loading the wake-free hotword network into the placeholder slot corresponding to the personalized sentence pattern before decoding the voice signal in the wake-free FSA decoding mode.
As an example, referring to FIG. 3, the personalized sentence pattern is "call XX", which consists of the main sentence pattern "call" and the slot "XX", where "XX" is a locally imported wake-free hotword, i.e., the content of the slot in the personalized FSA sentence pattern; the user can update this slot by locally importing a hotword network. The wake-free hotword network is constructed from the acquired wake-free hotword list and pre-loaded into the placeholder slot corresponding to the personalized sentence pattern so that the voice signal can be recognized.
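A simplified sketch of this placeholder-slot mechanism follows; representing the hotword network as a set of slot fillers is an assumption made for illustration, not the patent's actual data structure.

```python
# Sketch of loading a wake-free hotword network into the placeholder slot
# of a personalized sentence pattern such as "call XX".
def build_hotword_network(hotwords):
    # A trivial "network": the set of fillers the slot will accept.
    return set(hotwords)

class PersonalizedPattern:
    def __init__(self, main_pattern):
        self.main_pattern = main_pattern  # e.g. "call {slot}"
        self.slot_network = None          # empty placeholder slot

    def load_slot(self, network):
        self.slot_network = network       # load before wake-free decoding

    def unload_slot(self):
        self.slot_network = None          # unload on scene switch

    def matches(self, filler):
        return self.slot_network is not None and filler in self.slot_network

pattern = PersonalizedPattern("call {slot}")
pattern.load_slot(build_hotword_network(["Mom", "Xiao Ming"]))
print(pattern.matches("Mom"))   # True while the hotword network is loaded
pattern.unload_slot()
print(pattern.matches("Mom"))   # False after the scene switch
```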
It should be noted that, before the wake-free FSA hotword network is loaded, if a non-wake-free FSA hotword network exists, it also needs to be unloaded first; the non-wake-free FSA hotword network belongs to the non-wake-free FSA decoding mode. The non-wake-free FSA decoding mode differs from the wake-free FSA decoding mode in the sentence patterns used and in the hotword contents paired with the placeholder slots.
S203, if so, generating a control instruction corresponding to the first semantic recognition result to control the vehicle accordingly, and controlling the vehicle-mounted machine to switch to the non-wake-free scene.
Specifically, if the first semantic recognition result is in the preset wake-free statement set, a control instruction corresponding to the first semantic recognition result is generated. For example, if the first semantic recognition result is "call XX", the vehicle is controlled to complete the "call XX" command.
Furthermore, since the wake-free FSA sentence patterns and other speech model resources are already loaded into the vehicle-mounted machine, when the vehicle-mounted machine is controlled to switch to the non-wake-free scene, only the hotword network in the personalized sentence pattern needs to be unloaded; the main sentence pattern and the original placeholder-slot contents are retained.
In addition, if the first semantic recognition result is not in the preset wake-free statement set, the vehicle-mounted machine remains in the wake-free scene and stays in the listening state until the wake-free scene is manually turned off or a first semantic recognition result is recognized to be in the preset wake-free statement set.
S103, when the vehicle-mounted machine is in a non-wake-free scene, decoding the voice signal using the weighted finite state transducer (WFST) decoding mode and the non-wake-free finite state acceptor (FSA) decoding mode in the speech model.
It should be noted that, when the vehicle-mounted machine switches to the non-wake-free scene, if the wake-free FSA hotword network currently exists, it needs to be unloaded, and the non-wake-free FSA hotword network is loaded.
In some embodiments of the present invention, after decoding the voice signal in the WFST decoding mode and the non-wake-free FSA decoding mode, the method may further include:
S301, performing semantic recognition on the WFST decoding result and the non-wake-free FSA decoding result to obtain a second semantic recognition result.
Specifically, the voice signal is decoded through the acoustic CLDNN model together with the WFST decoding mode and the non-wake-free FSA decoding mode in the speech model to obtain a WFST decoding result and a non-wake-free FSA decoding result, and semantic recognition is performed on both results to obtain the second semantic recognition result.
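One plausible way to combine the two decoding results before semantic recognition is sketched below; the fusion rule, preferring a constrained FSA hit and otherwise falling back to the open-vocabulary WFST text, is an assumption, since the patent does not specify how the two results are merged.

```python
# Sketch of fusing the WFST result with the non-wake-free FSA result into
# a single input for semantic recognition.
from typing import Optional

def second_semantic_input(wfst_result: str, fsa_result: Optional[str]):
    # A constrained FSA hit is typically the more precise command text;
    # otherwise fall back to the open-vocabulary WFST transcription.
    if fsa_result:
        return {"text": fsa_result, "source": "non-wake-free FSA"}
    return {"text": wfst_result, "source": "WFST"}

print(second_semantic_input("please turn on the radio", None))
print(second_semantic_input("raise temperature", "temperature up one degree"))
```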
S302, generating a control instruction corresponding to the second semantic recognition result so as to control the vehicle accordingly.
In the non-wake-free scene, the vehicle-mounted machine needs to be brought into the wake-up state by a wake-up word, where the wake-up word can be preset according to the specific situation of the user or the vehicle-mounted machine; the invention imposes no limitation on this. After the vehicle-mounted machine enters the wake-up state, it waits to acquire a voice signal. If no voice signal is acquired within a preset waiting time, the vehicle-mounted machine is controlled to exit the wake-up state; if a voice signal is acquired within the waiting time, it is decoded in the WFST decoding mode and the non-wake-free FSA decoding mode, the vehicle-mounted machine is controlled to execute the corresponding instruction according to the semantic result of the voice signal, and the dialogue then ends. The waiting time may be set according to the actual situation; for example, it may be 20 seconds.
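The wake-up-state timeout can be sketched as a simple polling loop; get_speech_signal is a hypothetical capture function, the 20-second value follows the example above, and the polling interval is an arbitrary choice.

```python
# Sketch of the wake-up-state wait: return a captured signal for decoding,
# or None when the preset waiting time elapses and the machine should exit
# the wake-up state.
import time

WAIT_SECONDS = 20  # example value from the description

def run_wake_state(get_speech_signal):
    deadline = time.monotonic() + WAIT_SECONDS
    while time.monotonic() < deadline:
        signal = get_speech_signal()
        if signal is not None:
            return signal      # decode with WFST + non-wake-free FSA
        time.sleep(0.1)        # poll until the deadline passes
    return None                # timeout: exit the wake-up state

# Usage with a stub that never hears anything (times out after 20 s):
# print(run_wake_state(lambda: None))
```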
Based on the above switching between the wake-free scene and the non-wake-free scene, referring to FIG. 4, after the vehicle-mounted machine starts, the resources it contains are globally initialized and loaded. As the vehicle-mounted machine switches between the wake-free scene and the non-wake-free scene, the corresponding decoding modes are switched accordingly: the wake-free scene is decoded independently in the wake-free FSA decoding mode, while the non-wake-free scene is decoded in the WFST decoding mode and the non-wake-free FSA decoding mode, i.e., a serial decoding process through the acoustic CLDNN model and the language model. When the first semantic recognition result in the wake-free scene is in the preset wake-free statement set, i.e., the semantics are hit, reception of the voice signal stops. Scene switching is thus carried out by updating the decoding mode in the language model, which reduces false triggering of the decoding modes between the wake-free scene and the non-wake-free scene.
In some embodiments of the invention, the vehicle-mounted machine voice interaction method is executed in a neural network processor (NPU) environment.
During recognition of voice signals, in order to improve generalization and accuracy, the trained models have relatively complex structures and large parameter counts, which places high demands on computing power. Therefore, in some embodiments of the invention, the vehicle-mounted machine voice interaction method is executed in an NPU environment so that the NPU's computing power can be better utilized. As an example, a Qualcomm 8155 chip may be selected to provide the NPU environment.
Specifically, referring to FIG. 5, the architecture of a CPU is generally a von Neumann structure whose main components are storage units and control units, with only a small number of computing units; tasks that need relatively few computing units run well on a CPU. For the acoustic CLDNN model in the embodiment of the present invention, however, execution on the CPU cannot satisfy the computing-power requirement and would lead to high CPU occupancy. By contrast, the NPU has the advantage of being able to run parallel threads; the chip here, whose full designation is SA8155P, has 8 core units and a computing power of up to 8 TOPS, i.e., 8 trillion operations per second when running the acoustic CLDNN model. The acoustic CLDNN model uses the PyTorch framework to export its specific model resources; this format allows corresponding operator-library support and optimization for the NPU computing environment, and the speech models involved in the vehicle-mounted machine voice interaction method can be migrated from the CPU to the NPU environment, which reduces the computing load on the vehicle-mounted machine's CPU and lets CPU resources lean toward the tasks the CPU is good at.
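As a rough illustration of the export step, the sketch below traces a tiny CLDNN-style model with PyTorch and saves it to ONNX, a common interchange format that NPU toolchains can consume; the layer sizes and the ONNX route are assumptions, since the patent only states that PyTorch-exported model resources are used in the NPU environment.

```python
# Sketch of exporting a small CLDNN-style acoustic model (convolution +
# LSTM + fully connected layer) for consumption by an NPU toolchain.
import torch
import torch.nn as nn

class TinyCLDNN(nn.Module):
    def __init__(self, feat_dim=40, hidden=64, n_out=128):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_out)

    def forward(self, x):                  # x: (batch, time, feat_dim)
        h = self.conv(x.transpose(1, 2))   # convolve over the time axis
        h, _ = self.lstm(h.transpose(1, 2))
        return self.fc(h)                  # frame-level output scores

model = TinyCLDNN().eval()
dummy = torch.randn(1, 100, 40)  # 1 utterance, 100 frames, 40-dim features
torch.onnx.export(model, dummy, "cldnn.onnx", opset_version=13)
```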
In some embodiments of the present invention, the vehicle-mounted machine voice interaction method further includes:
S401, loading the speech model into the storage of the NPU device, and establishing a mapping relation between a virtual memory area and the file object in the storage.
In particular, the storage may comprise a magnetic disk.
S402, when resources need to be loaded and read, performing data interaction through the mapping relation.
Since the WFST and FSA resources in the speech model are themselves large, and memory is a very valuable resource on offline devices, the speech model resources need to be handled in a memory-mapped manner. Specifically, referring to FIG. 6, the speech model resources are first loaded into a storage, such as a disk, and a virtual memory area is established that is in a mapping relation with the file object on the disk. When an operation object needs to load and read the speech model resources, data interaction between the disk and the virtual memory area can be performed directly through the mapping relation, which reduces the number of copies, improves the efficiency of data reading, and also improves memory utilization.
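This scheme can be illustrated with Python's standard mmap module, which establishes exactly this kind of virtual-memory-to-file mapping; the resource file name is hypothetical.

```python
# Sketch of memory-mapping a speech model resource file: reads go through
# the virtual-memory mapping, and the OS pages data in on demand instead
# of copying the whole file into RAM up front.
import mmap

def open_mapped_resource(path):
    f = open(path, "rb")
    # Length 0 maps the entire file, read-only.
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Usage (assuming a resource file exists at this path):
# resource = open_mapped_resource("wfst_resources.bin")
# header = resource[:16]  # touches only the mapped pages actually needed
```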
To better understand the vehicle-mounted machine voice interaction method of the embodiment of the present invention, FIG. 7 shows, as an example, a schematic flowchart of the method.
As shown in FIG. 7, after the vehicle-mounted machine starts and resource initialization is completed, if the user turns on the wake-free scene, the machine stays in the listening state. At this time, only the wake-free FSA decoding mode in the speech model is used to decode the voice signal to be recognized; the decoding path does not enter WFST decoding, the wake-free FSA hotword network is loaded, and the non-wake-free FSA hotword network is unloaded. The machine first judges whether a voice signal is present; if not, it remains in the listening state and neither the acoustic CLDNN model nor the speech model runs. If the vehicle-mounted machine detects a voice signal, the signal is processed by the acoustic CLDNN model and the wake-free FSA decoding mode in the speech model to obtain a wake-free FSA recognition result, and a first semantic recognition result is obtained through semantic recognition. If the first semantic recognition result is not in the preset wake-free statement set, the wake-free scene keeps running until the user manually turns it off or a first semantic recognition result falls within the preset wake-free statement set.
Further, if the first semantic recognition result hits the wake-free semantics in the preset wake-free statement set, this round of recognition ends and the scene switches to the non-wake-free scene. The decoding mode is changed to WFST decoding plus non-wake-free FSA decoding; meanwhile, the wake-free FSA hotword network needs to be unloaded and the non-wake-free FSA hotword network loaded, with the main sentence network and the original placeholder-slot contents retained (see FIG. 3). The voice signal is then decoded in the WFST decoding mode and the non-wake-free FSA decoding mode to obtain a WFST decoding result and a non-wake-free FSA decoding result, a second semantic recognition result is obtained through semantic recognition, and a control instruction corresponding to the second semantic recognition result is generated to control the vehicle accordingly.
To sum up, the vehicle-mounted machine voice interaction method of the embodiment of the present invention first obtains a voice signal to be recognized; when the vehicle-mounted machine is in a wake-free scene, the voice signal is decoded in the wake-free finite state acceptor (FSA) decoding mode in the speech model, and when it is in a non-wake-free scene, the voice signal is decoded in the weighted finite state transducer (WFST) decoding mode and the non-wake-free FSA decoding mode. Because the vehicle-mounted machine decodes independently in the wake-free scene and in the non-wake-free scene, the method reduces the problem of false triggering between the two scenes. In addition, based on the Qualcomm 8155 chip, the speech model resources of the method are migrated to the neural network processor (NPU) environment, which reduces CPU occupancy and improves model execution efficiency; in offline voice interaction scenarios in particular, memory utilization and data-interaction efficiency are improved through the mapping relation between the disk and the virtual memory area.
Further, the invention provides a vehicle-mounted machine.
In the embodiment of the invention, the vehicle-mounted machine includes a neural network processor NPU and a storage medium, where the storage medium stores a computer program, and when the computer program is executed by the NPU, the above vehicle-mounted machine voice interaction method is implemented.
The vehicle-mounted machine includes a neural network processor NPU and a storage medium storing a computer program. When the computer program is executed by the NPU, a voice signal to be recognized is first obtained; when the vehicle-mounted machine is in a wake-free scene, the voice signal is decoded in the wake-free FSA decoding mode in the speech model, and when it is in a non-wake-free scene, the voice signal is decoded in the WFST decoding mode and the non-wake-free FSA decoding mode in the speech model. The vehicle-mounted machine thus decodes independently in the wake-free scene and in the non-wake-free scene, reducing false triggering between the two scenes. In addition, based on the Qualcomm 8155 chip, the speech model resources are migrated to the NPU environment, which reduces CPU occupancy and improves model execution efficiency; in offline voice interaction scenarios in particular, memory utilization and data-interaction efficiency are improved through the mapping relation between the disk and the virtual memory area.
It should be noted that the logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Further, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following technologies, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," "circumferential," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and are therefore not to be considered limiting of the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise explicitly stated or limited, the terms "mounted," "connected," "fixed," and the like are to be construed broadly, e.g., as being permanently connected, detachably connected, or integral; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be interconnected within two elements or in a relationship where two elements interact with each other unless otherwise specifically limited. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "over," and "above" a second feature may be directly or diagonally above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature may be directly under or obliquely under the first feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A vehicle-mounted machine voice interaction method, characterized by comprising the following steps:
acquiring a voice signal to be recognized;
when the vehicle-mounted machine is in a wake-free scene, decoding the voice signal using a wake-free finite state acceptor (FSA) decoding mode in a speech model;
and when the vehicle-mounted machine is in a non-wake-free scene, decoding the voice signal using a weighted finite state transducer (WFST) decoding mode and a non-wake-free finite state acceptor (FSA) decoding mode in the speech model.
2. The vehicle-mounted machine voice interaction method according to claim 1, wherein after decoding the voice signal in the wake-free FSA decoding mode, the method further comprises:
performing semantic recognition on the wake-free FSA decoding result to obtain a first semantic recognition result;
judging whether the first semantic recognition result is in a preset wake-free statement set;
and if so, generating a control instruction corresponding to the first semantic recognition result so as to control the vehicle accordingly, and controlling the vehicle-mounted machine to switch to a non-wake-free scene.
3. The vehicle-mounted machine voice interaction method according to claim 2, wherein the preset wake-free statement set includes a personalized sentence pattern and a non-personalized sentence pattern, the method further comprising:
acquiring a wake-free hotword list, wherein the wake-free hotword list comprises at least one wake-free hotword;
constructing a wake-free hotword network based on the wake-free hotword list;
and loading the wake-free hotword network into the placeholder slot corresponding to the personalized sentence pattern before decoding the voice signal in the wake-free FSA decoding mode.
4. The vehicle-mounted machine voice interaction method according to claim 3, wherein when controlling the vehicle-mounted machine to switch to the non-wake-free scene, the method further comprises:
unloading the wake-free hotword network.
5. The vehicle-mounted machine voice interaction method according to claim 1, wherein after decoding the voice signal in the WFST decoding mode and the non-wake-free FSA decoding mode, the method further comprises:
performing semantic recognition on the WFST decoding result and the non-wake-free FSA decoding result to obtain a second semantic recognition result;
and generating a control instruction corresponding to the second semantic recognition result so as to control the vehicle accordingly.
6. The vehicle-mounted machine voice interaction method according to claim 5, wherein when the vehicle-mounted machine is in a non-wake-free scene and has entered the wake-up state, the method further comprises:
if no voice signal is acquired within a preset waiting time, controlling the vehicle-mounted machine to exit the wake-up state.
7. The vehicle-mounted machine voice interaction method according to any one of claims 1 to 6, wherein the method is executed in a neural network processor (NPU) environment.
8. The vehicle-mounted machine voice interaction method according to claim 7, characterized by further comprising:
loading the speech model into the storage of the NPU device, and establishing a mapping relation between a virtual memory area and the file object in the storage;
and when resources need to be loaded and read, performing data interaction through the mapping relation.
9. The vehicle-mounted machine voice interaction method according to claim 8, wherein the storage comprises a magnetic disk.
10. A vehicle-mounted machine, comprising a neural network processor NPU and a storage medium, the storage medium having a computer program stored thereon, the computer program, when executed by the NPU, implementing the vehicle-mounted machine voice interaction method according to any one of claims 1 to 9.
CN202210970418.XA 2022-08-12 2022-08-12 Vehicle-mounted machine voice interaction method and vehicle-mounted machine Pending CN115662410A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210970418.XA CN115662410A (en) 2022-08-12 2022-08-12 Vehicle-mounted machine voice interaction method and vehicle-mounted machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210970418.XA CN115662410A (en) 2022-08-12 2022-08-12 Vehicle-mounted machine voice interaction method and vehicle-mounted machine

Publications (1)

Publication Number Publication Date
CN115662410A true CN115662410A (en) 2023-01-31

Family

ID=85023530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210970418.XA Pending CN115662410A (en) 2022-08-12 2022-08-12 Vehicle-mounted machine voice interaction method and vehicle-mounted machine

Country Status (1)

Country Link
CN (1) CN115662410A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024051611A1 (en) * 2022-09-05 2024-03-14 华为技术有限公司 Human-machine interaction method and related apparatus


Similar Documents

Publication Publication Date Title
KR102421255B1 (en) Electronic device and method for controlling voice signal
US11074924B2 (en) Speech recognition method, device, apparatus and computer-readable storage medium
US9773498B2 (en) System and method for managing models for embedded speech and language processing
RU2672000C2 (en) Disambiguation of dynamic commands
EP2283431B1 (en) System and method for an integrated, multi-device natural language voice services system
US9305548B2 (en) System and method for an integrated, multi-modal, multi-device natural language voice services environment
JP2019128938A (en) Lip reading based voice wakeup method, apparatus, arrangement and computer readable medium
US20190074013A1 (en) Method, device and system to facilitate communication between voice assistants
CN115662410A (en) Vehicle-mounted machine voice interaction method and vehicle-mounted machine
US11176934B1 (en) Language switching on a speech interface device
US10891945B2 (en) Method and apparatus for judging termination of sound reception and terminal device
JP2019133127A (en) Voice recognition method, apparatus and server
CN110164421A (en) Tone decoding method, device and storage medium
JP5300276B2 (en) Data processing apparatus and logical drive mounting method
CN114724564A (en) Voice processing method, device and system
CN112669822A (en) Audio processing method and device, electronic equipment and storage medium
US20200409745A1 (en) Do-Not-Disturb Processing Method and Apparatus, and Storage Medium
CN112863496A (en) Voice endpoint detection method and device
CN115376513A (en) Voice interaction method, server and computer readable storage medium
CN115240677A (en) Voice interaction method, device and equipment for vehicle cabin
CN115220922A (en) Vehicle application program running method and device and vehicle
CN115019781A (en) Conversation service execution method, device, storage medium and electronic equipment
KR101650769B1 (en) The vehicle-mounted voice recognition system by using gesture recognition
CN110111779B (en) Grammar model generation method and device and voice recognition method and device
US11893996B1 (en) Supplemental content output

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination