CN114093354A - Method and system for improving recognition accuracy of vehicle-mounted voice assistant

Info

Publication number: CN114093354A
Application number: CN202111245199.0A
Authority: CN
Inventors: 邱安崇; 钟启兴; 唐侨
Original assignee: Huizhou Desay SV Intelligent Transport Technology Research Institute Co Ltd
Current assignee: Huizhou Desay SV Intelligent Transport Technology Research Institute Co Ltd
Priority date: 2021-10-26
Filing date: 2021-10-26
Publication date: 2022-02-25

Abstract

The invention provides a method and a system for improving the recognition accuracy of a vehicle-mounted voice assistant.

Description

Method and system for improving recognition accuracy of vehicle-mounted voice assistant

Technical Field

The invention relates to the technical field of vehicle-mounted terminal control, in particular to a method and a system for improving the recognition accuracy of a vehicle-mounted voice assistant.

Background

With the rapid development of the automobile industry, car ownership has risen markedly, in-vehicle technology has matured, and automotive technology is moving toward intelligent and driverless operation. The vehicle-mounted voice assistant is an important part of current automotive technology: a driver or passenger can issue instructions to the in-vehicle entertainment system by voice alone, such as playing a particular song, querying the route to a destination, or adjusting the volume, and the system carries out the requested operation on receiving the instruction, making for a more comfortable driving experience. At present, most voice assistants in the industry use microphones as the only sensor for acquiring voice information, in single-microphone, dual-microphone, four-microphone and similar configurations. Voice assistant technology is widely used and mature in consumer products. However, while the vehicle is being driven or in other noisy scenes, recognition accuracy can drop considerably, and the system may fail to recognize useful voice instructions accurately. In addition, if other occupants want to adjust system settings through the voice assistant, the identity and position of the person issuing the voice instruction are difficult to determine with microphones alone, and the system is easily disturbed by other noise, so that it may be falsely triggered or even controlled by a hacker through voice control.

Patent No. 201910072525.9 discloses a method, an electronic device and a storage medium for improving the accuracy of speech recognition, aimed mainly at consumer application scenarios. Voice information is acquired through a sound acquisition device and mouth-shape identification information is acquired through a camera image sensor; the system compares the two and selects one of them as the instruction data. However, that system must collect voice instruction data and mouth-shape data in advance and only compares against the information in its database, so a person who is not in the database cannot issue a voice instruction. In addition, the system selects among the judgment results according to a self-defined recognition rate, and this single judgment still has a relatively high error rate and false-detection rate, so it does not effectively improve speech recognition accuracy.

Disclosure of Invention

In view of the above technical problems, the invention provides a method and a system for improving the recognition accuracy of a vehicle-mounted voice assistant. The microphone and the camera are fused, that is, video information and sound information are combined, and the authenticity of a language instruction collected at the language-instruction collecting end is judged from the features of the persons in the video, so as to ensure the accuracy of the language instruction.

Specifically, the method of the invention comprises the following steps:

S1: start the system and run it normally;

S2: when a language instruction is received, record the time period T1 of that language instruction;

S3: retrieve the image data information corresponding to the time period T1;

S4: analyze the image data information to judge whether the consistency between the language instruction and the image data information is greater than a preset value, the preset value preferably being set to 80%; if so, go to S5; otherwise, go to S6;

S5: the language instruction is valid, and the instruction program corresponding to it is executed; information about the instruction issuer is fed back;

S6: query whether the specific instruction is to be executed; if so, go to S5; otherwise, go to S2.
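The branching in steps S4 to S6 can be sketched as follows. This is only an illustration under stated assumptions: the function name `handle_instruction`, the `ask_user` callback and the returned strings are hypothetical, and the consistency score is assumed to be a number between 0 and 1 produced elsewhere. Only the 80% preset value and the S5/S6/S2 transitions come from the text.

```python
from typing import Callable

# Preferred preset value from step S4 (80%).
PRESET_THRESHOLD = 0.80


def handle_instruction(consistency: float,
                       ask_user: Callable[[], bool],
                       threshold: float = PRESET_THRESHOLD) -> str:
    """Return "execute" (step S5) or "listen" (back to step S2).

    consistency: assumed score in [0, 1] between the language
        instruction and the image data information.
    ask_user: hypothetical callback for the S6 query; returns True
        when the occupant confirms the instruction.
    """
    if consistency > threshold:   # S4: strictly greater than the preset
        return "execute"          # S5: instruction is valid
    if ask_user():                # S6: ask whether to execute anyway
        return "execute"          # confirmed, go to S5
    return "listen"               # not confirmed, go back to S2
```

Note that a score exactly equal to the preset value does not pass S4 in this sketch, because the text requires the consistency to be greater than the preset value.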

The language instruction is collected through the vehicle-mounted microphone terminal, which records the time period T1 of the language instruction and feeds T1 back to the system controller end.

The image data information comprises at least a video and a time stamp.
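Because the image data carries time stamps, retrieving the video that corresponds to T1 (step S3) amounts to selecting the frames whose time stamps fall inside the period. A minimal sketch, assuming frames are buffered in time order and time stamps are seconds; the `Frame` type and `frames_in_period` function are hypothetical names, not part of the described system:

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    timestamp: float  # capture time in seconds (assumed unit)
    image_id: str     # placeholder for the actual pixel data


def frames_in_period(frames: List[Frame],
                     t_start: float,
                     t_end: float) -> List[Frame]:
    """Return frames with t_start <= timestamp <= t_end.

    Assumes `frames` is already sorted by timestamp, as a camera
    buffer filled in capture order would be.
    """
    stamps = [f.timestamp for f in frames]
    lo = bisect_left(stamps, t_start)   # first frame at or after t_start
    hi = bisect_right(stamps, t_end)    # one past the last frame at or before t_end
    return frames[lo:hi]
```

Binary search is used here only because the buffer is assumed to be sorted; a linear scan would give the same result.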

S3 further includes: the system controller end obtains the image data information collected in real time, analyzes the face posture and mouth changes of the driver or passengers, identifies the corresponding control instruction information, and judges whether that control instruction information is consistent with the language instruction.

The information about the instruction issuer comprises at least the identity and the position of the instruction issuer.

As another preferred embodiment, the present invention further provides a system for improving recognition accuracy of a vehicle-mounted voice assistant, including:

at least one camera module, used for acquiring image data information inside the vehicle in real time;

at least one vehicle-mounted microphone terminal, used for collecting the language instruction, recording the time period T1 of the language instruction, and feeding T1 back to the system controller end;

a system controller end, responsible for judging the consistency between the language instruction and the image data information; when they are consistent, the language instruction is valid, the instruction program corresponding to it is executed, and information about the instruction issuer is fed back; otherwise, a query is issued for re-confirmation or re-judgment.

Data is transmitted among the camera module, the vehicle-mounted microphone terminal and the system controller end over a wireless link or a USB data cable.

The system controller further includes a memory, a processor, and a computer program stored on the memory and executable on the processor.

The computer program when executed by a processor implements a method for improving recognition accuracy of an in-vehicle voice assistant as described above.

In summary, the invention provides a method and a system for improving the recognition accuracy of a vehicle-mounted voice assistant. By fusing the microphone and the camera, the system, on receiving a voice assistant instruction, compares it with the mouth opening and closing and the facial expressions of the driver and passengers in the same time period. When the relevant image features match to a certain degree, the voice instruction is judged valid and the identity and position of its issuer are identified, thereby improving the recognition accuracy of the voice assistant.

Drawings

Fig. 1 is a flowchart of the method for improving the recognition accuracy of a vehicle-mounted voice assistant according to the present invention.

Fig. 2 shows the installation positions of the camera module and the vehicle-mounted microphone terminals in the vehicle.

Fig. 3 is a diagram of the communication process between the parts of the system of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1, the method for improving the recognition accuracy of the vehicle-mounted voice assistant according to the present invention includes the following steps:

S1: start the system and run it normally;

S3: retrieve the image data information corresponding to the time period T1;

The image data information comprises at least a video and a time stamp.

S3 further includes: the system controller end obtains the image data information collected in real time, analyzes the face posture and mouth changes of the driver or passengers, identifies the corresponding control instruction information, and judges whether that control instruction information is consistent with the language instruction.

For example, suppose the time period T1 of the language instruction covers am8.00, am8.01, am8.03, am8.04 and am8.05, during which a passenger issues the language instruction, for instance: "Moxa, please turn on the air conditioner". After the vehicle-mounted microphone terminal collects the language instruction, it records the time period of the instruction and sends it to the system controller end. On receiving this information, the system controller end retrieves the video information collected by the camera, which includes real-time video of the faces and mouth shapes of everyone in the vehicle, and searches for the video at the same time nodes according to T1. If the face posture and mouth-shape changes of the front passenger match, starting at am8.00 and continuing to am8.05, it is preliminarily judged that the language instruction was issued by the person in the front passenger seat and that the information is accurate; the next step can then query whether the specific instruction needs to be executed. If no one's face posture or mouth changes in the video, it can be judged that the language instruction was not issued by anyone in the vehicle and may be noise from outside the vehicle or from a mobile phone; the language instruction is invalid and no further operation is carried out.

As another embodiment, fig. 2 shows the installation positions of the camera module and the vehicle-mounted microphone terminals in the vehicle. Preferably, the camera module, that is, the in-vehicle monitoring camera, is installed in front of the front passenger seat, for example at any position below the windshield, so that the camera has the widest possible angle and can clearly capture everything inside the vehicle. The vehicle-mounted microphone terminals can be arranged two in the front row and two in the rear row: for example, one microphone sensor at the front left of the driver's seat and one at the front right of the front passenger seat, with the rear-row microphones likewise installed at the front left and front right of the rear passengers, on the back of the front seats. The arrangement is not limited to this and can be adjusted as appropriate for different requirements, vehicle models and the like.
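The example above can be made concrete with a small sketch. The text does not say how the consistency value is computed, so the measure used here is an assumption chosen purely for illustration: the fraction of T1 during which an occupant's mouth is observed moving. The names `overlap_ratio` and `attribute_speaker`, the seat labels, and the use of seconds are likewise hypothetical.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds, assumed unit


def overlap_ratio(speech: Interval, mouth: Interval) -> float:
    """Fraction of the speech period T1 covered by mouth movement."""
    s0, s1 = speech
    m0, m1 = mouth
    if s1 <= s0:
        raise ValueError("speech period must have positive length")
    overlap = max(0.0, min(s1, m1) - max(s0, m0))
    return overlap / (s1 - s0)


def attribute_speaker(speech: Interval,
                      mouth_activity: Dict[str, Optional[Interval]],
                      threshold: float = 0.8
                      ) -> Tuple[Optional[str], float]:
    """Pick the seat whose mouth movement best matches T1.

    mouth_activity maps a seat label to the interval in which mouth
    movement was detected, or None if no movement was seen.
    Returns (seat, score); seat is None when no occupant exceeds
    the threshold, e.g. when the sound came from outside the vehicle.
    """
    best_seat, best_score = None, 0.0
    for seat, interval in mouth_activity.items():
        if interval is None:
            continue
        score = overlap_ratio(speech, interval)
        if score > best_score:
            best_seat, best_score = seat, score
    if best_score > threshold:
        return best_seat, best_score
    return None, best_score
```

With T1 taken as `(0.0, 5.0)` and mouth movement observed only for the front passenger over the same interval, the sketch attributes the instruction to the front passenger; with no movement observed for anyone, it returns no speaker, matching the invalid-instruction case described above.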

The communication process between the systems specifically includes:

The camera module and the vehicle-mounted microphone terminal communicate with the system controller end wirelessly or over a USB connection to complete data transmission. The system controller end controls the playing and stopping of the vehicle-mounted multimedia system, as well as the selection, addition or deletion of any multimedia information. The vehicle-mounted multimedia system is thus controlled through the system controller end, and audio is played through the vehicle-mounted speakers. When any information needs to be announced, it is delivered directly through the vehicle audio system; for example, when the video data information and the voice control instruction are inconsistent, a spoken announcement or query is made.

The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.
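The controller's role in this exchange can be illustrated with a short sketch. Everything here is a stand-in: `VoiceEvent`, `Controller`, the wording of the query, and the lists that record what was executed or announced are hypothetical, and the transport (wireless or USB) is not modelled. It shows only that a consistent instruction is executed, while an inconsistent one leads to a spoken query through the vehicle audio.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VoiceEvent:
    """Message assumed to be sent by a microphone terminal."""
    text: str        # recognized language instruction
    t_start: float   # start of time period T1 (seconds, assumed unit)
    t_end: float     # end of time period T1


@dataclass
class Controller:
    """Stand-in for the system controller end."""
    executed: List[str] = field(default_factory=list)
    announcements: List[str] = field(default_factory=list)

    def announce(self, message: str) -> None:
        # Stand-in for playback through the vehicle speakers.
        self.announcements.append(message)

    def on_voice_event(self, event: VoiceEvent, consistent: bool) -> None:
        # `consistent` is the outcome of the image/voice comparison,
        # computed elsewhere.
        if consistent:
            self.executed.append(event.text)
        else:
            self.announce(f"Did you ask to: {event.text}?")
```

Keeping the announcement behind a single `announce` method mirrors the text's point that all notifications go out through the vehicle audio system.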

Claims

1. A method for improving the recognition accuracy of a vehicle-mounted voice assistant is characterized by comprising the following steps:

S1: starting the system to run normally;

S3: retrieving image data information corresponding to the time period T1;

S4: judging, by analyzing the image data information, whether the consistency between the language instruction and the image data information is greater than a preset value; if so, going to S5; otherwise, going to S6;

2. The method for improving the recognition accuracy of the vehicle-mounted voice assistant according to claim 1, further comprising: acquiring the language instruction through the vehicle-mounted microphone terminal, recording the time period T1 of the language instruction, and feeding the time period T1 back to the system controller end.

3. The method for improving the recognition accuracy of the vehicle-mounted voice assistant according to claim 2, further comprising: the image data information comprises at least a video and a time stamp.

4. The method for improving recognition accuracy of the vehicle-mounted voice assistant according to claim 3, wherein the step S3 further comprises: the system controller end acquires real-time data of image data information acquisition, analyzes face posture and mouth part change of a driver or a passenger, identifies corresponding control instruction information, and simultaneously judges whether the control instruction information is consistent with the language instruction.

5. The method for improving the recognition accuracy of the vehicle-mounted voice assistant according to claim 4, further comprising: the preset value is set to 80%.

6. The method for improving the recognition accuracy of the vehicle-mounted voice assistant according to claim 5, further comprising: the information of the instruction issuer comprises at least the identity and location of the instruction issuer.

7. A system for improving recognition accuracy of a vehicle-mounted voice assistant is characterized by comprising:

8. The system of claim 7, wherein data is transmitted among the camera module, the vehicle-mounted microphone terminal and the system controller end over a wireless link or a USB data cable.

9. The system of claim 8, wherein the system controller further comprises a memory, a processor, and a computer program stored on the memory and executable on the processor.

10. The system according to claim 9, wherein the computer program, when executed by a processor, implements a method for improving recognition accuracy of a vehicle-mounted voice assistant according to any of claims 1-6.