CN117095682A - Visible and speaking vehicle-mounted terminal voice recognition method and system - Google Patents

Visible and speaking vehicle-mounted terminal voice recognition method and system

Info

Publication number
CN117095682A
CN117095682A
Authority
CN
China
Prior art keywords
text
page
mounted terminal
vehicle
text data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311183852.4A
Other languages
Chinese (zh)
Inventor
唐燕祥
韦彩霞
程志恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chery Automobile Co Ltd
Original Assignee
Chery Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chery Automobile Co Ltd filed Critical Chery Automobile Co Ltd
Priority to CN202311183852.4A priority Critical patent/CN117095682A/en
Publication of CN117095682A publication Critical patent/CN117095682A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/26 - Speech to text systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/14 - Image acquisition
    • G06V 30/148 - Segmentation of character regions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/19 - Recognition using electronic means
    • G06V 30/19007 - Matching; Proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/19 - Recognition using electronic means
    • G06V 30/19007 - Matching; Proximity measures
    • G06V 30/19093 - Proximity measures, i.e. similarity or distance measures
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 - Noise filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a visible-and-speaking voice recognition method and system for a vehicle-mounted terminal. The method comprises the following steps: judging whether each page control in the vehicle-mounted terminal has a corresponding text identifier and, if not, setting a text identifier for the page control; if so, judging whether any existing text identifier conflicts with another or carries incomplete information, and correcting those text identifiers that do; acquiring a voice signal to be recognized, removing noise from the voice signal, and converting the denoised signal into text data; numbering all page controls with positive integers in left-to-right, top-to-bottom order; and recognizing the text data and operating the page control whose number matches the recognition result.

Description

Visible and speaking vehicle-mounted terminal voice recognition method and system
Technical Field
The invention relates to the technical field of voice recognition, and in particular to a visible-and-speaking vehicle-mounted terminal voice recognition method and system.
Background
The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
Voice is the most direct and effective way for people to communicate with one another, and speech recognition technology aims to make the exchange of information between people and machines just as simple and efficient. Speech recognition has already reached deep into everyday life, for example in mobile-phone voice input methods, voice assistants and voice search.
Speech technology is equally valuable in intelligent mobility and is a genuine necessity in the vehicle-mounted field. Starting from the earliest voice navigation, in-vehicle voice control systems now provide a range of new interaction modes covering vehicle control, social features, entertainment and more, so that the driver's attention is no longer tied up by complicated settings and buttons; this improves the driving experience and, to a certain extent, driving safety. Unlike traditional in-vehicle systems operated by physical keys or touch screens, technologies such as multi-modal fusion detection, intelligent voice interaction, multi-screen interaction and gesture operation are becoming standard for the next generation of intelligent cockpits. Because the in-vehicle environment is relatively stable, the corpus does not diverge too widely and the recognition rate is high, the cockpit is an excellent scenario in which to deploy voice interaction.
"Visible-and-speaking" means that any control in the system that can be clicked by hand can also be clicked by voice: built on the accessibility (barrier-free) service capability of the Android system combined with semantic hot-word recognition, a simulated click can be performed on a control simply by speaking, without affecting the implementation logic of the original application (APP).
The visible-and-speaking link differs from the traditional voice-interaction path in that it does not support capabilities such as complex dialogue management or intention recognition; its semantic understanding is limited to the vocabulary defined on the page and can be understood as an end-to-end connection built through hot-word association. Its advantages are that a voice channel can be set up quickly and that more elements on the page can be controlled by voice; the corresponding drawbacks are that only elements already present on the page can be controlled and that complex semantic understanding is not supported.
The inventor found the following technical defect in the prior art: if controls such as texts, pictures, switches, buttons and sliders in the vehicle head unit are not marked and customized in advance, the driver cannot use voice to control the page or play its content.
Disclosure of Invention
To overcome the defects in the prior art, the invention provides a visible-and-speaking vehicle-mounted terminal voice recognition method and a visible-and-speaking vehicle-mounted terminal voice recognition system. On the basis of the original voice recognition scheme, controls such as texts, pictures, switches, buttons and sliders that appear in the vehicle head unit are customized and extended, so that whatever is visible on the screen can be controlled by voice.
In one aspect, a visible-and-speaking vehicle-mounted terminal voice recognition method is provided, comprising the following steps:
judging whether each page control in the vehicle-mounted terminal has a corresponding text identifier and, if not, setting a text identifier for the page control; if so, judging whether any existing text identifier conflicts with another or carries incomplete information, and correcting those text identifiers that do;
acquiring a voice signal to be recognized, removing noise from the voice signal to be recognized, and converting the denoised voice signal into text data;
numbering all page controls with positive integers in left-to-right, top-to-bottom order; recognizing the text data and operating the page control whose number matches the recognition result;
wherein, during recognition of the text data, whether the text data exceeds a set length is judged and, if so, the text is matched with a string fuzzy-matching algorithm: the similarity between the text data and each text identifier is computed and the text identifier with the maximum similarity value is taken as the selected text identifier; and the corresponding page control is driven to act according to the selected text identifier.
In another aspect, a visible-and-speaking vehicle-mounted terminal voice recognition system is provided, comprising:
a judging module configured to: judge whether each page control in the vehicle-mounted terminal has a corresponding text identifier and, if not, set a text identifier for the page control; if so, judge whether any existing text identifier conflicts with another or carries incomplete information, and correct those text identifiers that do;
an acquisition module configured to: acquire a voice signal to be recognized, remove noise from the voice signal to be recognized, and convert the denoised voice signal into text data;
a recognition control module configured to: number all page controls with positive integers in left-to-right, top-to-bottom order; recognize the text data and operate the page control whose number matches the recognition result;
wherein, during recognition of the text data, whether the text data exceeds a set length is judged and, if so, the text is matched with a string fuzzy-matching algorithm: the similarity between the text data and each text identifier is computed and the text identifier with the maximum similarity value is taken as the selected text identifier; and the corresponding page control is driven to act according to the selected text identifier.
In still another aspect, there is provided an electronic device including:
a memory for non-transitory storage of computer readable instructions; and
a processor for executing the computer-readable instructions,
wherein the computer readable instructions, when executed by the processor, perform the method of the first aspect described above.
In yet another aspect, a storage medium is provided that non-transitorily stores computer-readable instructions, wherein the method of the first aspect is performed when the instructions are executed by a computer.
In a further aspect, there is also provided a computer program product comprising a computer program for implementing the method of the first aspect described above when run on one or more processors.
One of the above technical solutions has the following advantages or beneficial effects:
the APP display interfaces of the automobile machine are marked and positioned, so that the previous song and the next song can be selected through voice control and playing of the interfaces under each page, and the next song enters a lower interface or returns to a main page. And manual operation is not needed, so that convenience is improved, and user experience is improved.
The car machine is controlled through voice recognition, so that functions of human eye visual function setting, multimedia opening and closing, button clicking, up-down and left-right sliding and the like are realized, cabin control convenience is improved, user experience is improved, and customer satisfaction is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a flow chart of a method according to a first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
All data acquired in the embodiments is used lawfully, in compliance with laws, regulations and user agreements.
Speech recognition is a comprehensive discipline that draws on many fields of knowledge. In recent years it has progressed significantly, from small-vocabulary, isolated-word recognition in the laboratory to large-vocabulary, speaker-independent continuous speech recognition.
Speech recognition has a very wide range of applications. The most common is the voice input system, which fits people's everyday habits better than keyboard input and is more natural and efficient; controlling a device by voice is also quicker and more convenient than manual control.
Example 1
This embodiment provides a visible-and-speaking vehicle-mounted terminal voice recognition method.
As shown in Fig. 1, the visible-and-speaking vehicle-mounted terminal voice recognition method includes:
S101: judging whether each page control in the vehicle-mounted terminal has a corresponding text identifier and, if not, setting a text identifier for the page control; if so, judging whether any existing text identifier conflicts with another or carries incomplete information, and correcting those text identifiers that do;
S102: acquiring a voice signal to be recognized, removing noise from the voice signal to be recognized, and converting the denoised voice signal into text data;
S103: numbering all page controls with positive integers in left-to-right, top-to-bottom order; recognizing the text data and operating the page control whose number matches the recognition result;
wherein, during recognition of the text data, whether the text data exceeds a set length is judged and, if so, the text is matched with a string fuzzy-matching algorithm: the similarity between the text data and each text identifier is computed and the text identifier with the maximum similarity value is taken as the selected text identifier; and the corresponding page control is driven to act according to the selected text identifier.
Further, the page control includes: text controls, switch controls, button controls, and slider controls.
Further, text identifiers cover components such as switches, titles, drag bars and sliders.
Further, setting a text identifier for a page control specifically comprises: attaching text labels to switch buttons, such as "Bluetooth switch" and "WIFI switch"; these labels are invisible to the user.
Further, in judging whether an existing text identifier conflicts with another or carries incomplete information, a conflict means that, on the same page, two controls with different actions are given the same label; incomplete information means that a control has no text mark or that its text mark is incorrect.
Further, correcting text identifiers that conflict or carry incomplete information specifically comprises: modifying the conflicting identifiers so that every switch button on the page has a unique identifier.
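As a minimal sketch of this uniqueness check, assuming a simple in-memory list of page controls (the PageControl class and its fields are illustrative and not taken from the patent), the following snippet flags labels that are duplicated or missing on one page:

```java
import java.util.*;

// Sketch of the conflict check: two controls on the same page must not share
// a text identifier, and every control must carry a non-empty identifier.
public class LabelConflictChecker {

    static final class PageControl {
        final String id;        // internal control id
        final String textLabel; // text identifier used by the voice layer
        PageControl(String id, String textLabel) { this.id = id; this.textLabel = textLabel; }
    }

    /** Returns the text identifiers that are duplicated or missing on one page. */
    static List<String> findConflicts(List<PageControl> pageControls) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> problems = new ArrayList<>();
        for (PageControl c : pageControls) {
            if (c.textLabel == null || c.textLabel.isBlank()) {
                problems.add("missing label on control " + c.id);
                continue;
            }
            counts.merge(c.textLabel, 1, Integer::sum);
        }
        counts.forEach((label, n) -> { if (n > 1) problems.add("duplicate label: " + label); });
        return problems;
    }

    public static void main(String[] args) {
        List<PageControl> page = List.of(
                new PageControl("btn1", "Bluetooth switch"),
                new PageControl("btn2", "Bluetooth switch"), // conflict
                new PageControl("btn3", ""));                // incomplete information
        System.out.println(findConflicts(page));
    }
}
```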
Further, step S102, acquiring the voice signal to be recognized and removing its noise, specifically comprises: removing noise from the voice signal to be recognized with an echo cancellation algorithm.
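The patent only states that an echo cancellation algorithm is used; it does not say which one. Below is a minimal sketch of one standard choice, a normalized-LMS (NLMS) adaptive echo canceller, assuming access to both the microphone signal and the loudspeaker reference signal; the class and parameter names are illustrative.

```java
// NLMS adaptive echo canceller sketch: subtract an estimate of the loudspeaker
// echo (derived from the reference signal) from the microphone signal.
public class NlmsEchoCanceller {
    private final double[] w;      // adaptive filter taps
    private final double[] xBuf;   // recent reference samples
    private final double mu;       // step size (0 < mu < 2)
    private static final double EPS = 1e-8;

    public NlmsEchoCanceller(int taps, double mu) {
        this.w = new double[taps];
        this.xBuf = new double[taps];
        this.mu = mu;
    }

    /** Processes one sample pair and returns the microphone sample with the estimated echo removed. */
    public double process(double mic, double ref) {
        System.arraycopy(xBuf, 0, xBuf, 1, xBuf.length - 1); // shift reference buffer
        xBuf[0] = ref;
        double y = 0, power = 0;
        for (int i = 0; i < w.length; i++) {                  // estimated echo = dot(w, xBuf)
            y += w[i] * xBuf[i];
            power += xBuf[i] * xBuf[i];
        }
        double e = mic - y;
        double g = mu * e / (power + EPS);                    // normalized LMS update
        for (int i = 0; i < w.length; i++) w[i] += g * xBuf[i];
        return e;
    }

    public static void main(String[] args) {
        // Toy demo: the "microphone" hears a scaled, delayed copy of the reference.
        NlmsEchoCanceller aec = new NlmsEchoCanceller(128, 0.5);
        java.util.Random rnd = new java.util.Random(0);
        double residual = 0;
        double[] ref = new double[1000];
        for (int n = 0; n < ref.length; n++) {
            ref[n] = rnd.nextGaussian();
            double mic = 0.6 * (n >= 3 ? ref[n - 3] : 0);     // pure echo, no near-end speech
            residual = aec.process(mic, ref[n]);
        }
        System.out.println("last residual sample: " + residual);
    }
}
```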
Further, converting the denoised voice signal into text data specifically comprises: converting the recognized voice command into a text data stream and passing the stream to the CPU for processing.
Further, step S103, judging whether the text data exceeds the set length and, if so, matching the text with a string fuzzy-matching algorithm, specifically comprises: the CPU compares the text data with the pre-marked text identifiers, and if the matching rate reaches a set threshold the user is considered to have issued the corresponding voice instruction.
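The patent does not name a specific fuzzy-matching algorithm; the sketch below uses a normalized Levenshtein ratio as one reasonable choice. It computes a similarity between the recognized text and every pre-registered text identifier, keeps the best one, and accepts it only if the similarity clears a configurable threshold (the names and example threshold are illustrative).

```java
import java.util.*;

// Sketch of the long-utterance matching step: pick the most similar label and
// act only if the similarity reaches the configured threshold.
public class FuzzyLabelMatcher {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1 means identical strings. */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    /** Returns the best-matching label, or empty if nothing clears the threshold. */
    static Optional<String> bestLabel(String recognizedText, List<String> labels, double threshold) {
        return labels.stream()
                .max(Comparator.comparingDouble((String l) -> similarity(recognizedText, l)))
                .filter(l -> similarity(recognizedText, l) >= threshold);
    }

    public static void main(String[] args) {
        List<String> labels = List.of("Bluetooth switch", "WIFI switch", "Next song");
        System.out.println(bestLabel("open the bluetooth switch please", labels, 0.4));
    }
}
```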
Further, the method further comprises:
when a page-jump instruction is encountered, clicking the corresponding control to enter the lower-level interface and play the corresponding program;
when a previous, next, pause, play or stop instruction is encountered, recognizing it and clicking the correspondingly marked button to perform the function, as sketched below.
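A minimal dispatch sketch for these fixed playback commands, assuming a simple command-to-label map; performClick is a placeholder for whatever click-injection mechanism the accessibility layer provides, and all names are illustrative.

```java
import java.util.*;

// Map each recognized playback command to a simulated click on the button
// carrying the matching label.
public class PlaybackCommandDispatcher {

    static void performClick(String buttonLabel) {
        System.out.println("simulated click on: " + buttonLabel); // placeholder for real click injection
    }

    static final Map<String, String> COMMAND_TO_BUTTON = Map.of(
            "previous", "Previous song",
            "next", "Next song",
            "pause", "Pause",
            "play", "Play",
            "stop", "Stop");

    static boolean dispatch(String recognizedCommand) {
        String button = COMMAND_TO_BUTTON.get(recognizedCommand.toLowerCase(Locale.ROOT));
        if (button == null) return false;
        performClick(button);
        return true;
    }

    public static void main(String[] args) {
        dispatch("next"); // -> simulated click on: Next song
    }
}
```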
Further, the method further comprises:
when a combined control is encountered, determining its primary function and secondary function and executing the primary function before the secondary function, the primary function having higher priority than the secondary function.
For example, for volume adjustment, the volume-adjustment function is entered first and the volume is then adjusted to the target value.
A combined component comprises components at different positions that perform different functions when clicked, or components at different positions that perform the same function when clicked.
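A minimal sketch of the primary-before-secondary rule, assuming each sub-function is represented as a prioritized action; the CombinedAction class, the priority encoding and the volume example are illustrative assumptions.

```java
import java.util.*;

// Execute the sub-functions of a combined control in priority order:
// the primary function (lower priority number) runs before the secondary one.
public class CombinedControlExecutor {

    static final class CombinedAction implements Comparable<CombinedAction> {
        final int priority;    // lower number = higher priority
        final String name;
        final Runnable action;
        CombinedAction(int priority, String name, Runnable action) {
            this.priority = priority; this.name = name; this.action = action;
        }
        @Override public int compareTo(CombinedAction o) { return Integer.compare(priority, o.priority); }
    }

    static void execute(List<CombinedAction> actions) {
        actions.stream().sorted().forEach(a -> {
            System.out.println("executing: " + a.name);
            a.action.run();
        });
    }

    public static void main(String[] args) {
        execute(List.of(
                new CombinedAction(2, "set volume to target value", () -> {/* secondary function */}),
                new CombinedAction(1, "enter volume adjustment", () -> {/* primary function */})));
    }
}
```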
Further, the method further comprises:
masking decorative controls and not assigning them serial numbers.
After the voice assistant scans the controls of the whole page, it has to decide which control to act on according to the text marked on each control. Only controls that support simulated clicking should therefore be included, the application layer must cooperate by embedding text marks, and, because the accessibility service scans all controls, decorative controls that cannot be clicked (such as a brand LOGO) must be masked from the visible-and-speaking scope.
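A minimal sketch of this masking step, assuming the accessibility scan is represented as a plain list of nodes; ControlNode is a stand-in for whatever node type the scanner actually produces, and the flags are illustrative.

```java
import java.util.*;
import java.util.stream.*;

// Drop decorative, non-clickable elements (brand logos, decorative images, ...)
// before labels are assigned, so a spoken command can never trigger a fake click.
public class DecorativeControlFilter {

    record ControlNode(String label, boolean clickable, boolean decorative) {}

    static List<ControlNode> speakableControls(List<ControlNode> scanned) {
        return scanned.stream()
                .filter(n -> n.clickable() && !n.decorative())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ControlNode> scanned = List.of(
                new ControlNode("brand LOGO", false, true),
                new ControlNode("Next song", true, false),
                new ControlNode("Bluetooth switch", true, false));
        System.out.println(speakableControls(scanned)); // the logo is excluded
    }
}
```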
Defining what is visible-and-speakable: some controls are represented only by icons and carry no text mark; these controls need to be described, as defined by the product team. Where existing text marks conflict or carry incomplete information, supplementary definitions are needed for those controls.
Masking from the visible-and-speaking scope: since the accessibility service can scan all controls, decorative controls that cannot be clicked must be excluded from the visible-and-speaking range to prevent false clicks; for example, a LOGO carrying text or a decorative picture cannot be clicked and has no function, and a bad user experience caused by a voice-triggered click on it must be avoided.
Distinguishing full semantic understanding from visible-and-speaking: overly complex intention recognition is not suited to the visible-and-speaking link. A voice instruction for a button that is not on the current page, such as an instruction to navigate to a destination while on the music-playing interface, cannot open the navigation map through a visible-and-speaking instruction and must go through the traditional voice link. When the intentions a control must support are broad, the traditional voice link is the more flexible implementation.
Supporting generalized phrasing: to give users more choice of wording, and because in driving scenarios users can rarely speak a long text or title in full (such as a program name in a media service or a point-of-interest name in navigation), the semantic layer must have fuzzy-matching capability and be compatible with the prefixes and suffixes commonly used in each service, for example "open Bluetooth" / "turn on Bluetooth", or suffix forms such as "Bluetooth on" / "Bluetooth off", so as to improve the accuracy of semantic recognition. The product team needs to collect the prefix and suffix requirements of the other applications and submit them for semantic customization.
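A minimal sketch of this prefix/suffix generalization, assuming English example phrases; the actual prefix and suffix lists would come from each service's semantic customization, so the entries here are illustrative only.

```java
import java.util.*;

// Strip common command prefixes and suffixes from the recognized text before
// it is compared with the page labels, so "open Bluetooth" and "Bluetooth on"
// both resolve to the "Bluetooth" label.
public class PhraseNormalizer {

    static final List<String> PREFIXES = List.of("open ", "turn on ", "please ", "close ");
    static final List<String> SUFFIXES = List.of(" please", " switch on", " switch off");

    static String normalize(String utterance) {
        String s = utterance.toLowerCase(Locale.ROOT).trim();
        boolean changed = true;
        while (changed) {                       // strip repeatedly, e.g. "please open ..."
            changed = false;
            for (String p : PREFIXES) {
                if (s.startsWith(p)) { s = s.substring(p.length()); changed = true; }
            }
            for (String suf : SUFFIXES) {
                if (s.endsWith(suf)) { s = s.substring(0, s.length() - suf.length()); changed = true; }
            }
        }
        return s.trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Please open Bluetooth"));  // -> "bluetooth"
        System.out.println(normalize("Bluetooth switch on"));    // -> "bluetooth"
    }
}
```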
Because the system as a whole has relatively many pages and the click rules are only simulated, many conflict situations are difficult to anticipate and accommodate. Conflicting cases therefore need to be identified and given an appropriate feedback design, and the whole scheme must be refined continuously.
Numbering rules for controls (a sketch of this numbering follows these rules): numbers are positive integers that increase from left to right and from top to bottom;
within an entire application page there are no duplicate numbers; for example, on the media home page the labels should run from the top of the page to the end of the full scrollable page, not just within the part currently visible;
the labels appear after wake-up and disappear when the wake-up session exits.
Preset labels: "first", "second" and so on; a user can trigger the control with the corresponding serial number by saying "the first", "the second", and so on.
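A minimal sketch of the numbering rule, assuming each speakable control exposes the page-space coordinates of its top-left corner; the Control class and the example coordinates are illustrative.

```java
import java.util.*;

// Assign positive-integer labels top-to-bottom and, within a row, left to right,
// with no duplicates across the whole scrollable page.
public class ControlNumbering {

    static final class Control {
        final String name;
        final int x, y;     // top-left corner, page coordinates
        int label;          // assigned positive integer
        Control(String name, int x, int y) { this.name = name; this.x = x; this.y = y; }
    }

    static void assignLabels(List<Control> controls) {
        controls.sort(Comparator.comparingInt((Control c) -> c.y)
                                .thenComparingInt(c -> c.x));
        int next = 1;
        for (Control c : controls) c.label = next++;
    }

    public static void main(String[] args) {
        List<Control> page = new ArrayList<>(List.of(
                new Control("Next song", 300, 100),
                new Control("Previous song", 100, 100),
                new Control("Playlist", 100, 400)));
        assignLabels(page);
        page.forEach(c -> System.out.println(c.label + " -> " + c.name));
        // prints: 1 -> Previous song, 2 -> Next song, 3 -> Playlist
    }
}
```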
The original text on a control is visible to the user, while a supplementary description pre-embedded on the control, such as the pre-embedded name of an icon, is invisible to the user. Both kinds of marks can be scanned by the accessibility service; to improve the accuracy of visible-and-speaking clicks, the part invisible to the user should respond with priority, because its description is more complete.
Example 2
This embodiment provides a visible-and-speaking vehicle-mounted terminal voice recognition system, comprising:
a judging module configured to: judge whether each page control in the vehicle-mounted terminal has a corresponding text identifier and, if not, set a text identifier for the page control; if so, judge whether any existing text identifier conflicts with another or carries incomplete information, and correct those text identifiers that do;
an acquisition module configured to: acquire a voice signal to be recognized, remove noise from the voice signal to be recognized, and convert the denoised voice signal into text data;
a recognition control module configured to: number all page controls with positive integers in left-to-right, top-to-bottom order; recognize the text data and operate the page control whose number matches the recognition result;
wherein, during recognition of the text data, whether the text data exceeds a set length is judged and, if so, the text is matched with a string fuzzy-matching algorithm: the similarity between the text data and each text identifier is computed and the text identifier with the maximum similarity value is taken as the selected text identifier; and the corresponding page control is driven to act according to the selected text identifier.
It should be noted here that the judging module, the acquisition module and the recognition control module correspond to steps S101 to S103 of the first embodiment; the modules share the same examples and application scenarios as the corresponding steps, but are not limited to the disclosure of the first embodiment. It should also be noted that these modules may be implemented as part of a system in a computer system, for example as a set of computer-executable instructions.
The embodiments are described in a progressive manner; for details of one embodiment, reference may be made to the related description of another embodiment.
The proposed system may also be implemented in other ways. The system embodiment described above is merely illustrative; for example, the division into modules is only a logical functional division, and other divisions are possible in practice: several modules may be combined or integrated into another system, or some features may be omitted or not performed.
Example 3
The embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein the processor is coupled to the memory, the one or more computer programs being stored in the memory, the processor executing the one or more computer programs stored in the memory when the electronic device is running, to cause the electronic device to perform the method of the first embodiment.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), an off-the-shelf field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software.
The method in the first embodiment may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, a detailed description is not provided here.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Example 4
This embodiment also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, perform the method of the first embodiment.
The above description concerns only the preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within its protection scope.

Claims (10)

1. A visible-and-speaking vehicle-mounted terminal voice recognition method, characterized by comprising the following steps:
judging whether each page control in the vehicle-mounted terminal has a corresponding text identifier and, if not, setting a text identifier for the page control; if so, judging whether any existing text identifier conflicts with another or carries incomplete information, and correcting those text identifiers that do;
acquiring a voice signal to be recognized, removing noise from the voice signal to be recognized, and converting the denoised voice signal into text data;
numbering all page controls with positive integers in left-to-right, top-to-bottom order; recognizing the text data and operating the page control whose number matches the recognition result;
wherein, during recognition of the text data, whether the text data exceeds a set length is judged and, if so, the text is matched with a string fuzzy-matching algorithm: the similarity between the text data and each text identifier is computed and the text identifier with the maximum similarity value is taken as the selected text identifier; and the corresponding page control is driven to act according to the selected text identifier.
2. The visible-and-speaking vehicle-mounted terminal voice recognition method according to claim 1, characterized in that, in judging whether an existing text identifier conflicts with another or carries incomplete information, a conflict means that, on the same page, two controls with different actions are given the same label, and incomplete information means that a control has no text mark or that its text mark is incorrect.
3. The visible-and-speaking vehicle-mounted terminal voice recognition method according to claim 1, characterized in that correcting text identifiers that conflict or carry incomplete information comprises: modifying the conflicting identifiers so that every switch button on the page has a unique identifier.
4. The visible-and-speaking vehicle-mounted terminal voice recognition method according to claim 1, characterized in that removing noise from the voice signal to be recognized comprises: removing noise from the voice signal to be recognized with an echo cancellation algorithm.
5. The visible-and-speaking vehicle-mounted terminal voice recognition method according to claim 1, characterized in that judging whether the text data exceeds a set length and, if so, matching the text with a string fuzzy-matching algorithm specifically comprises: the CPU compares the text data with the pre-marked text identifiers, and if the matching rate reaches a set threshold the user is considered to have issued the corresponding voice instruction.
6. The visible-and-speaking vehicle-mounted terminal voice recognition method according to claim 1, further comprising: when a page-jump instruction is encountered, clicking the corresponding control to enter the lower-level interface and play the corresponding program.
7. The visible-and-speaking vehicle-mounted terminal voice recognition method according to claim 1, further comprising: when a combined control is encountered, determining its primary function and secondary function and executing the primary function before the secondary function, the primary function having higher priority than the secondary function.
8. A visible-and-speaking vehicle-mounted terminal voice recognition system, characterized by comprising:
a judging module configured to: judge whether each page control in the vehicle-mounted terminal has a corresponding text identifier and, if not, set a text identifier for the page control; if so, judge whether any existing text identifier conflicts with another or carries incomplete information, and correct those text identifiers that do;
an acquisition module configured to: acquire a voice signal to be recognized, remove noise from the voice signal to be recognized, and convert the denoised voice signal into text data;
a recognition control module configured to: number all page controls with positive integers in left-to-right, top-to-bottom order; recognize the text data and operate the page control whose number matches the recognition result;
wherein, during recognition of the text data, whether the text data exceeds a set length is judged and, if so, the text is matched with a string fuzzy-matching algorithm: the similarity between the text data and each text identifier is computed and the text identifier with the maximum similarity value is taken as the selected text identifier; and the corresponding page control is driven to act according to the selected text identifier.
9. An electronic device, comprising:
a memory for non-transitory storage of computer readable instructions; and
a processor for executing the computer-readable instructions,
wherein the computer readable instructions, when executed by the processor, perform the method of any of the preceding claims 1-7.
10. A storage medium that non-transitorily stores computer-readable instructions, characterized in that the method of any one of claims 1-7 is performed when the non-transitory computer-readable instructions are executed by a computer.
CN202311183852.4A 2023-09-13 2023-09-13 Visible and speaking vehicle-mounted terminal voice recognition method and system Pending CN117095682A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311183852.4A CN117095682A (en) 2023-09-13 2023-09-13 Visible and speaking vehicle-mounted terminal voice recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311183852.4A CN117095682A (en) 2023-09-13 2023-09-13 Visible and speaking vehicle-mounted terminal voice recognition method and system

Publications (1)

Publication Number Publication Date
CN117095682A true CN117095682A (en) 2023-11-21

Family

ID=88775009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311183852.4A Pending CN117095682A (en) 2023-09-13 2023-09-13 Visible and speaking vehicle-mounted terminal voice recognition method and system

Country Status (1)

Country Link
CN (1) CN117095682A (en)

Similar Documents

Publication Publication Date Title
CN109410927B (en) Voice recognition method, device and system combining offline command word and cloud analysis
EP2223046B1 (en) Multimode user interface of a driver assistance system for inputting and presentation of information
JP6263177B2 (en) User terminal user interface device and method
JP5754368B2 (en) Mobile terminal remote operation method using vehicle integrated operation device, and vehicle integrated operation device
US20140207453A1 (en) Method and apparatus for editing voice recognition results in portable device
CN103369122A (en) Voice input method and system
KR20120080069A (en) Display apparatus and voice control method thereof
KR102059800B1 (en) User interface appratus in a user terminal and method therefor
CN104464720A (en) Apparatus and method for selecting a control object by voice recognition
CN103218146B (en) The method and its mancarried device of the different keypads configuration for data input is presented
CN104240700A (en) Global voice interaction method and system for vehicle-mounted terminal device
CN111898388A (en) Video subtitle translation editing method and device, electronic equipment and storage medium
Ponsard et al. An ocr-enabled digital comic books viewer
CN105069013A (en) Control method and device for providing input interface in search interface
CN105103087B (en) It is a kind of to be used to showing and/or exporting the method for help and be allocated to the device of vehicle
CN109243450A (en) Interactive voice recognition method and system
CN107665046A (en) A kind of input method and device, a kind of device for being used to input
CN117095682A (en) Visible and speaking vehicle-mounted terminal voice recognition method and system
CN104347070A (en) Apparatus and method for selecting a control object by voice recognition
CN114690992B (en) Prompting method, prompting device and computer storage medium
CN116382614A (en) Voice interaction method, device and storage medium
WO2022217621A1 (en) Speech interaction method and apparatus
KR102414993B1 (en) Method and ststem for providing relevant infromation
CN113779300B (en) Voice input guiding method, device and car machine
JP2024529271A (en) Interface control method, device, and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination