CN115938343A - Voice control method and device based on visual recognition, electronic equipment and medium - Google Patents
- Publication number
- CN115938343A (application CN202211529628.1A)
- Authority
- CN
- China
- Prior art keywords
- controlled
- control
- visual recognition
- interface
- voice control
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Abstract
The present application relates to the field of visual recognition technology, and in particular to a voice control method and apparatus, an electronic device, and a medium based on visual recognition. The method includes: receiving a voice control instruction of a user; determining whether the application to be controlled completes interface control based on the voice control instruction, and issuing a visual recognition prompt when it does not; and receiving a visual recognition auxiliary action performed by the user, in response to the prompt, on the interface to be controlled of the application to be controlled, and performing interface control on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction. This solves the problem that related functions of the intelligent cabin cannot be controlled by voice when the cabin interface itself does not support voice: visual recognition technology identifies the information in the cabin interface, so that the voice system obtains the functional properties and specific text content of the interface and can thereby realize voice control.
Description
Technical Field
The present application relates to the field of visual recognition technologies, and in particular, to a voice control method and apparatus, an electronic device, and a medium based on visual recognition.
Background
With the rapid development of vehicles, visual recognition technology has been widely applied in intelligent cabins: information such as interface text and interface controls can be recognized in the vehicle by intelligent vision algorithms. Intelligent cabins also support voice recognition technology, and the interface text and controls recognized by the vision algorithms make it possible for users to control them by voice.
In the related art, voice control is mostly implemented through voice customization, or the related operations are performed directly by touch.
However, these methods cannot directly implement voice control of the vehicle; the resulting operation is cumbersome and detracts from the user's driving experience, so a solution is urgently needed.
Disclosure of Invention
The present application provides a voice control method and apparatus, an electronic device, and a medium based on visual recognition, and aims to solve the problem that the related functions of an intelligent cabin cannot be controlled by voice when the intelligent cabin interface does not support voice.
The embodiment of the first aspect of the application provides a voice control method based on visual recognition, which includes the following steps:
receiving a voice control instruction of a user;
determining whether the application to be controlled completes interface control based on the voice control instruction, and issuing a visual recognition prompt when it does not; and
receiving a visual recognition auxiliary action performed by the user, based on the visual recognition prompt, on the interface to be controlled of the application to be controlled, and performing interface control on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction.
According to an embodiment of the application, the interface control of the application to be controlled according to the visual recognition auxiliary action and the voice control instruction includes:
identifying a control to be controlled corresponding to the interface to be controlled according to the visual recognition auxiliary action based on a preset visual recognition algorithm;
and determining the control action of the control to be controlled according to the voice control instruction, and controlling the control to be controlled to execute corresponding action according to the control action.
According to an embodiment of the application, after the interface manipulation is performed on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction, the method further includes:
determining the mapping relation between the voice control instruction and the control to be controlled;
and storing the mapping relation between the voice control instruction and the control to be controlled so as to directly control the control to be controlled to execute corresponding actions according to the voice control instruction when the voice control instruction is received next time.
According to an embodiment of the application, the visual recognition auxiliary action includes selecting a control to be controlled in the interface to be controlled, and/or inputting text information of the control to be controlled in the interface to be controlled.
According to the voice control method based on visual recognition of the embodiment of the present application, it is determined whether the application to be controlled completes interface control based on the received voice control instruction of the user; a visual recognition prompt is issued when it does not; the visual recognition auxiliary action performed by the user on the interface to be controlled of the application to be controlled in response to the prompt is received; and interface control is performed on the application to be controlled in combination with the voice control instruction. This solves the problem that related functions of the intelligent cabin cannot be controlled by voice when the cabin interface does not support voice: visual recognition technology identifies the information in the cabin interface, so that the voice system obtains the functional properties and specific text content of the interface and can thereby realize voice control.
The embodiment of the second aspect of the present application provides a voice control device based on visual recognition, including:
the receiving module is used for receiving a voice control instruction of a user;
the judging module is used for determining whether the application to be controlled completes interface control based on the voice control instruction, and issuing a visual recognition prompt when it does not; and
the control module is used for receiving the visual recognition auxiliary action performed by the user, based on the visual recognition prompt, on the interface to be controlled of the application to be controlled, and performing interface control on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction.
According to an embodiment of the present application, the control module is specifically configured to:
identifying a control to be controlled corresponding to the interface to be controlled according to the visual recognition auxiliary action based on a preset visual recognition algorithm;
and determining the control action of the control to be controlled according to the voice control instruction, and controlling the control to be controlled to execute corresponding action according to the control action.
According to an embodiment of the application, after the interface manipulation is performed on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction, the control module is further configured to:
determining the mapping relation between the voice control instruction and the control to be controlled;
and storing the mapping relation between the voice control instruction and the control to be controlled so as to directly control the control to be controlled to execute corresponding action according to the voice control instruction when the voice control instruction is received next time.
According to an embodiment of the application, the visual recognition auxiliary action includes selecting a control to be controlled in the interface to be controlled, and/or inputting text information of the control to be controlled in the interface to be controlled.
According to the voice control device based on visual recognition of the embodiment of the present application, it is determined whether the application to be controlled completes interface control based on the received voice control instruction of the user; a visual recognition prompt is issued when it does not; the visual recognition auxiliary action performed by the user on the interface to be controlled of the application to be controlled in response to the prompt is received; and interface control is performed on the application to be controlled in combination with the voice control instruction. This solves the problem that related functions of the intelligent cabin cannot be controlled by voice when the cabin interface does not support voice: visual recognition technology identifies the information in the cabin interface, so that the voice system obtains the functional properties and specific text content of the interface and can thereby realize voice control.
An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the visual recognition based speech control method as described in the above embodiments.
A fourth aspect of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the voice control method based on visual recognition as described in the above embodiments.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a voice control method based on visual recognition according to an embodiment of the present application;
FIG. 2 is a block diagram of a speech control apparatus based on visual recognition according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
A method, an apparatus, an electronic device, and a medium for voice control based on visual recognition according to embodiments of the present application are described below with reference to the accompanying drawings. In the method, it is determined whether the application to be controlled completes interface control based on the received voice control instruction of the user; a visual recognition prompt is issued when it does not; the visual recognition auxiliary action performed by the user on the interface to be controlled in response to the prompt is received; and interface control is performed on the application to be controlled in combination with the voice control instruction. This solves the problem that related functions of the intelligent cabin cannot be controlled by voice when the cabin interface does not support voice: visual recognition technology identifies the information in the cabin interface, so that the voice system obtains the functional properties and specific text content of the interface and can thereby realize voice control.
Specifically, fig. 1 is a schematic flowchart of a voice control method based on visual recognition according to an embodiment of the present application.
As shown in fig. 1, the voice control method based on visual recognition includes the following steps:
in step S101, a voice control instruction of a user is received.
Specifically, in this embodiment of the application, if a user needs to enable a certain function in the vehicle, the user can control the relevant application to perform the corresponding action by issuing a voice message. For example, a user who wants to turn on the air conditioner can issue the command "please set the air conditioner to 25 degrees" to control the vehicle to turn on the air conditioner; after the vehicle receives the voice control instruction issued by the user, it performs the subsequent control operations according to the instruction.
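The mapping from such an utterance to a target function and parameter can be sketched as follows. This is a hypothetical illustration only: the keyword table, the `VoiceInstruction` type, and the regular-expression extraction are stand-ins for a real in-vehicle speech-understanding component, which the patent does not specify.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceInstruction:
    """A parsed voice control instruction: target function plus optional value."""
    target: str
    value: Optional[float] = None


def parse_instruction(utterance: str) -> Optional[VoiceInstruction]:
    """Map a recognized utterance to a target function and a numeric parameter.

    Returns None when the utterance matches no known function; in the method
    described here, that is the case that later triggers the visual
    recognition prompt.
    """
    keywords = {
        "air conditioner": "hvac",
        "humidity": "humidity_control",
    }
    for phrase, target in keywords.items():
        if phrase in utterance.lower():
            # Pull out a numeric setting such as "25" or "60.5", if present.
            match = re.search(r"\d+(?:\.\d+)?", utterance)
            value = float(match.group()) if match else None
            return VoiceInstruction(target=target, value=value)
    return None
```

For instance, `parse_instruction("Please set the air conditioner to 25 degrees")` would yield a `VoiceInstruction` with target `"hvac"` and value `25.0`.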
In step S102, it is determined whether the to-be-controlled application completes interface control based on the voice control instruction, and when the to-be-controlled application does not complete interface control based on the voice control instruction, a visual recognition prompt is issued.
Specifically, after receiving the voice control instruction of the user, the embodiment of the application further determines whether the application to be controlled can complete the corresponding interface control based on that instruction, so as to control the application to be controlled to execute the related function according to the result of the determination.
Specifically, if the application to be controlled completes interface control based on the voice control instruction, the application to be controlled is directly controlled to execute the related function. If it does not, that is, if the application to be controlled cannot directly act on the voice control instruction issued by the user, a visual recognition prompt is issued for the instruction, so that voice control of the application to be controlled is achieved with the aid of visual recognition.
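The decision in step S102 can be sketched as a simple dispatch. The `try_interface_control` and `show_visual_recognition_prompt` method names are assumptions introduced for illustration, not an API defined by the patent:

```python
def handle_voice_instruction(app, instruction: str) -> str:
    """Attempt direct voice-driven interface control; fall back to issuing
    a visual recognition prompt when the application cannot complete it."""
    if app.try_interface_control(instruction):
        return "controlled"   # interface control completed by voice alone
    app.show_visual_recognition_prompt(instruction)
    return "prompted"         # user is asked for a visual auxiliary action


class StubApp:
    """Minimal stand-in application used to exercise the dispatch above."""

    def __init__(self, supports_voice: bool):
        self.supports_voice = supports_voice
        self.prompted = False

    def try_interface_control(self, instruction: str) -> bool:
        return self.supports_voice

    def show_visual_recognition_prompt(self, instruction: str) -> None:
        self.prompted = True
```

A voice-capable application resolves immediately; a legacy interface instead records that the user was prompted for a visual recognition auxiliary action.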
In step S103, a visual recognition auxiliary action of the user based on the visual recognition prompt on the interface to be controlled of the application to be controlled is received, and interface control is performed on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction.
Specifically, the visual recognition auxiliary action of the embodiment of the application includes framing (selecting) a control to be controlled in the interface to be controlled, and/or inputting the text information of a control to be controlled in the interface to be controlled. When the application to be controlled cannot complete interface control based on the voice control instruction, the visual recognition auxiliary action that the user performs on the interface to be controlled in response to the visual recognition prompt is received, and interface control is performed on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction.
Specifically, when the voice control instruction of the user is reflected on the interface to be controlled, the instruction can be recognized directly from the control information or text information of that interface. However, because controls, buttons, and text fonts differ greatly between interfaces, some interfaces are difficult for visual recognition to judge, and the specific control names and the corresponding actions to execute cannot be accurately identified. In that case, visual recognition prompt information is sent to the user, and the user can inform the vision system of the specific name of the control and the action to be executed by framing, taking a screenshot, or directly inputting the text information of the control to be controlled on the interface to be controlled, so that the control interface is recognized.
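The framing-style auxiliary action can be sketched as matching the user-framed screen region against known control bounding boxes. This geometric matching is a stand-in for the patent's "preset visual recognition algorithm"; a real system would instead run OCR or object detection on the framed screenshot, and the `UIControl` type and coordinates below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Bounding box in screen coordinates: (x1, y1, x2, y2).
Box = Tuple[int, int, int, int]


@dataclass
class UIControl:
    name: str
    box: Box


def control_in_framed_region(controls: List[UIControl],
                             region: Box) -> Optional[UIControl]:
    """Return the first control whose bounding box lies entirely inside the
    region the user framed on the interface to be controlled."""
    rx1, ry1, rx2, ry2 = region
    for ctrl in controls:
        cx1, cy1, cx2, cy2 = ctrl.box
        if rx1 <= cx1 and ry1 <= cy1 and cx2 <= rx2 and cy2 <= ry2:
            return ctrl
    return None
```

Given controls for, say, an air-conditioner switch and a humidity setting, a frame drawn around the humidity area resolves to that control, and an empty region resolves to nothing.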
Further, in some embodiments, the interface manipulation of the application to be controlled according to the visual recognition auxiliary action and the voice control instruction includes: identifying a control to be controlled corresponding to the interface to be controlled according to the visual recognition auxiliary action based on a preset visual recognition algorithm; and determining the control action of the control to be controlled according to the voice control instruction, and controlling the control to be controlled to execute the corresponding action according to the control action.
The preset visual recognition algorithm may be a visual recognition algorithm set by a person skilled in the art according to a functional requirement, or a visual recognition algorithm obtained through multiple times of simulation by a computer, and is not specifically limited herein.
Specifically, the control to be controlled corresponding to the interface to be controlled is identified through a preset visual identification algorithm and the visual identification auxiliary action of the user, so that the control action of the control to be controlled is determined according to the voice control instruction of the user, and the control to be controlled is controlled to execute the corresponding action according to the control action.
For example, when the control to be controlled corresponding to the interface to be controlled is recognized as the humidity sensor in the vehicle, and the voice control instruction of the user is to set the in-vehicle humidity to 60%, the vehicle humidity sensor is controlled to adjust the in-vehicle humidity to 60% according to the voice control instruction of the user.
Further, in some embodiments, after performing interface manipulation on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction, the method further includes: determining a mapping relation between a voice control instruction and a control to be controlled; and storing the mapping relation between the voice control instruction and the control to be controlled so as to directly control the control to be controlled to execute corresponding action according to the voice control instruction when the voice control instruction is received next time.
Specifically, in the embodiment of the application, after the vision system identifies the control to be controlled corresponding to the interface to be controlled and the action that control should execute, the control and its execution action are associated and stored, forming a mapping relationship between the voice control instruction and the control to be controlled. In other words, when a voice control instruction of the user is received, the relevant simulated click operation can be performed according to the text of the interface to be controlled and the relevant control information, thereby realizing voice control; and the next time the same voice control instruction is received, the control to be controlled can be driven directly to execute the corresponding action according to the instruction, meeting the user's needs.
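The stored mapping described above can be sketched as a simple cache from instruction text to the resolved control. The class name and in-memory dictionary are illustrative assumptions; persistence and the simulated-click mechanism are omitted.

```python
class InstructionControlMap:
    """Cache mapping a voice control instruction to the control it resolved
    to, so a repeated instruction can skip the visual recognition step."""

    def __init__(self):
        self._mapping = {}

    def store(self, instruction: str, control_name: str) -> None:
        """Record the association after visual recognition succeeds."""
        self._mapping[instruction] = control_name

    def lookup(self, instruction: str):
        """Return the cached control name, or None on first encounter."""
        return self._mapping.get(instruction)
```

On the first occurrence of an instruction the lookup misses (triggering the visual recognition flow); once stored, the same instruction resolves directly.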
According to the voice control method based on visual recognition of the embodiment of the present application, it is determined whether the application to be controlled completes interface control based on the received voice control instruction of the user; a visual recognition prompt is issued when it does not; the visual recognition auxiliary action performed by the user on the interface to be controlled of the application to be controlled in response to the prompt is received; and interface control is performed on the application to be controlled in combination with the voice control instruction. This solves the problem that related functions of the intelligent cabin cannot be controlled by voice when the cabin interface does not support voice: visual recognition technology identifies the information in the cabin interface, so that the voice system obtains the functional properties and specific text content of the interface and can thereby realize voice control.
Next, a voice control apparatus based on visual recognition proposed according to an embodiment of the present application is described with reference to the drawings.
Fig. 2 is a block diagram of a speech control apparatus based on visual recognition according to an embodiment of the present application.
As shown in fig. 2, the voice control apparatus 10 based on visual recognition includes: a receiving module 100, a judging module 200 and a control module 300.
The receiving module 100 is configured to receive a voice control instruction of a user;
the judging module 200 is used for judging whether the to-be-controlled application completes interface control based on the voice control instruction or not and sending out visual identification prompt when the to-be-controlled application does not complete interface control based on the voice control instruction; and
The control module 300 is configured to receive the visual recognition auxiliary action performed by the user, based on the visual recognition prompt, on the interface to be controlled of the application to be controlled, and to perform interface control on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction.
Further, in some embodiments, the control module 300 is specifically configured to:
identifying a to-be-controlled control corresponding to the to-be-controlled interface according to the visual identification auxiliary action based on a preset visual identification algorithm;
and determining the control action of the control to be controlled according to the voice control instruction, and controlling the control to be controlled to execute the corresponding action according to the control action.
Further, in some embodiments, after performing the interface manipulation on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction, the control module 300 is further configured to:
determining a mapping relation between a voice control instruction and a control to be controlled;
and storing the mapping relation between the voice control instruction and the control to be controlled so as to directly control the control to be controlled to execute corresponding action according to the voice control instruction when the voice control instruction is received next time.
Further, in some embodiments, the visual recognition auxiliary action includes framing (selecting) a control to be controlled in the interface to be controlled, and/or inputting the text information of a control to be controlled in the interface to be controlled.
According to the voice control device based on visual recognition of the embodiment of the present application, it is determined whether the application to be controlled completes interface control based on the received voice control instruction of the user; a visual recognition prompt is issued when it does not; the visual recognition auxiliary action performed by the user on the interface to be controlled of the application to be controlled in response to the prompt is received; and interface control is performed on the application to be controlled in combination with the voice control instruction. This solves the problem that related functions of the intelligent cabin cannot be controlled by voice when the cabin interface does not support voice: visual recognition technology identifies the information in the cabin interface, so that the voice system obtains the functional properties and specific text content of the interface and can thereby realize voice control.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
a memory 301, a processor 302, and a computer program stored on the memory 301 and executable on the processor 302.
The processor 302, when executing the program, implements the voice control method based on visual recognition provided in the above-described embodiments.
Further, the electronic device further includes:
a communication interface 303 for communication between the memory 301 and the processor 302.
A memory 301 for storing computer programs executable on the processor 302.
The memory 301 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
If the memory 301, the processor 302 and the communication interface 303 are implemented independently, the communication interface 303, the memory 301 and the processor 302 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 301, the processor 302, and the communication interface 303 are integrated on a chip, the memory 301, the processor 302, and the communication interface 303 may complete communication with each other through an internal interface.
The processor 302 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the above voice control method based on visual recognition.
In the description of this specification, reference to "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. Such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art may combine different embodiments or examples described in this specification, provided they do not contradict each other.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flowcharts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present application also includes alternative implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
The logic and/or steps represented in the flowcharts, or otherwise described herein, for example an ordered listing of executable instructions considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured (via, for instance, optical scanning of the paper or other medium), then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having appropriate combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
It will be understood by those skilled in the art that all or part of the steps of the above method embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as a separate product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present application; those of ordinary skill in the art may make variations, modifications, substitutions, and alterations to the above embodiments within the scope of the present application.
Claims (10)
1. A voice control method based on visual recognition is characterized by comprising the following steps:
receiving a voice control instruction of a user;
determining whether an application to be controlled has completed interface control based on the voice control instruction, and issuing a visual recognition prompt when the application to be controlled has not completed interface control based on the voice control instruction; and
receiving a visual recognition auxiliary action performed by the user, based on the visual recognition prompt, on an interface to be controlled of the application to be controlled, and performing interface control on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction.
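The method of claim 1 can be outlined as follows. This is an illustrative Python sketch, not part of the disclosure; all class, method, and callback names (VoiceController, try_control, locate_control, prompt_user) are assumptions introduced for clarity:

```python
class VoiceController:
    """Sketch of the claimed flow: try voice-only interface control
    first, and fall back to a visual-recognition auxiliary action
    supplied by the user when the voice command alone is not enough."""

    def __init__(self, app, prompt_user):
        self.app = app                   # application to be controlled
        self.prompt_user = prompt_user   # issues the visual recognition
                                         # prompt and returns the user's
                                         # auxiliary action

    def handle(self, voice_command):
        # Step 1: judge whether the application completes interface
        # control from the voice control instruction alone.
        if self.app.try_control(voice_command):
            return "controlled"
        # Step 2: the app could not resolve the command, so prompt for
        # a visual-recognition auxiliary action (e.g. box-selecting a
        # control, or typing its text label).
        aux_action = self.prompt_user()
        # Step 3: locate the target control via visual recognition and
        # execute the action derived from the voice command on it.
        control = self.app.locate_control(aux_action)
        control.execute(voice_command)
        return "controlled_with_aux"
```

The two return values mark which path was taken; a real implementation would wire `try_control` and `locate_control` to the speech engine and the visual recognition algorithm of claim 2.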
2. The method according to claim 1, wherein the performing interface control on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction comprises:
identifying a control to be controlled corresponding to the interface to be controlled according to the visual recognition auxiliary action based on a preset visual recognition algorithm;
determining a control action for the control to be controlled according to the voice control instruction, and controlling the control to be controlled to execute the corresponding action according to the control action.
3. The method according to claim 2, further comprising, after the performing interface control on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction:
determining the mapping relation between the voice control instruction and the control to be controlled;
storing the mapping relation between the voice control instruction and the control to be controlled, so that when the voice control instruction is received next time, the control to be controlled is directly controlled to execute the corresponding action according to the voice control instruction.
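Claims 3 and 7 describe caching the command-to-control mapping so that a repeated voice command can bypass the visual-recognition step. A minimal sketch, with hypothetical names not drawn from the disclosure, might be:

```python
class CommandControlCache:
    """Remembers which UI control a voice command resolved to, so the
    next identical command skips the visual-recognition fallback."""

    def __init__(self):
        self._mapping = {}  # voice command text -> control identifier

    def store(self, voice_command, control_id):
        # Called once the auxiliary action has identified the control.
        self._mapping[voice_command] = control_id

    def lookup(self, voice_command):
        # Returns the remembered control identifier, or None if the
        # command is new and still needs the visual-recognition step.
        return self._mapping.get(voice_command)
```

A production version would also need invalidation when the interface changes, which the claims do not address.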
4. The method according to any one of claims 1 to 3, wherein the visual recognition auxiliary action comprises box-selecting a control to be controlled in the interface to be controlled, and/or inputting text information of a control to be controlled in the interface to be controlled.
5. A voice control apparatus based on visual recognition, comprising:
the receiving module is used for receiving a voice control instruction of a user;
the judging module is used for determining whether an application to be controlled has completed interface control based on the voice control instruction, and issuing a visual recognition prompt when the application to be controlled has not completed interface control based on the voice control instruction; and
the control module is used for receiving a visual recognition auxiliary action performed by the user, based on the visual recognition prompt, on an interface to be controlled of the application to be controlled, and for performing interface control on the application to be controlled according to the visual recognition auxiliary action and the voice control instruction.
6. The apparatus of claim 5, wherein the control module is specifically configured to:
identifying a control to be controlled corresponding to the interface to be controlled according to the visual recognition auxiliary action based on a preset visual recognition algorithm;
determining a control action for the control to be controlled according to the voice control instruction, and controlling the control to be controlled to execute the corresponding action according to the control action.
7. The apparatus of claim 6, wherein, after the interface control of the application to be controlled according to the visual recognition auxiliary action and the voice control instruction, the control module is further configured to:
determining the mapping relation between the voice control instruction and the control to be controlled;
storing the mapping relation between the voice control instruction and the control to be controlled, so that when the voice control instruction is received next time, the control to be controlled is directly controlled to execute the corresponding action according to the voice control instruction.
8. The apparatus according to any one of claims 5 to 7, wherein the visual recognition auxiliary action comprises box-selecting a control to be controlled in the interface to be controlled, and/or inputting text information of a control to be controlled in the interface to be controlled.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the voice control method based on visual recognition according to any one of claims 1-4.
10. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the voice control method based on visual recognition according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211529628.1A CN115938343A (en) | 2022-11-30 | 2022-11-30 | Voice control method and device based on visual recognition, electronic equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115938343A true CN115938343A (en) | 2023-04-07 |
Family
ID=86651971
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211529628.1A Pending CN115938343A (en) | 2022-11-30 | 2022-11-30 | Voice control method and device based on visual recognition, electronic equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115938343A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111619576B (en) | Control method, device, equipment and storage medium | |
CN107680591A (en) | Voice interactive method, device and its equipment based on car-mounted terminal | |
CN110058588A (en) | A kind of method, automated driving system and the mobile unit of automated driving system upgrading | |
EP3726360B1 (en) | Device and method for controlling vehicle component | |
CN112203130B (en) | Vehicle-mounted information entertainment terminal, multi-screen interactive display method thereof and automobile | |
CN112017650A (en) | Voice control method and device of electronic equipment, computer equipment and storage medium | |
CN109933199A (en) | Control method, device, electronic equipment and storage medium based on gesture | |
CN114564102A (en) | Automobile cabin interaction method and device and vehicle | |
CN111474923A (en) | Vehicle diagnosis method, device and equipment | |
CN111123728B (en) | Unmanned vehicle simulation method, device, equipment and computer readable medium | |
CN115938343A (en) | Voice control method and device based on visual recognition, electronic equipment and medium | |
CN115484561A (en) | Vehicle-mounted wireless network signal state reminding method and device, vehicle and medium | |
CN115904075B (en) | Vehicle configuration improvement method, system, device and storage medium | |
CN113811851A (en) | User interface coupling | |
JP2003241784A (en) | Speech input and output device | |
CN114013431B (en) | Automatic parking control method and system based on user intention | |
US11328725B2 (en) | Apparatus and method for recognizing a voice in a vehicle | |
CN115467602A (en) | Vehicle sunroof control method and device, vehicle and storage medium | |
CN112951216B (en) | Vehicle-mounted voice processing method and vehicle-mounted information entertainment system | |
CN115858071A (en) | Control method and device of vehicle machine system, electronic equipment and storage medium | |
CN110580430A (en) | identity input method, device and system | |
CN115910060A (en) | Voice recognition method and device based on H5 vehicle-mounted central control, vehicle and storage medium | |
EP4325395A2 (en) | Hybrid rule engine for vehicle automation | |
CN116860106A (en) | Interaction method and system based on gesture recognition and intelligent cabin | |
CN116343792A (en) | Speech recognition modification method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||