CN114333017A

CN114333017A - Dynamic pickup method and device, electronic equipment and storage medium

Info

Publication number: CN114333017A
Application number: CN202111644344.2A
Authority: CN
Inventors: 王磊
Original assignee: Apollo Intelligent Connectivity Beijing Technology Co Ltd
Current assignee: Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority date: 2021-12-29
Filing date: 2021-12-29
Publication date: 2022-04-12

Abstract

The application discloses a dynamic sound pickup method and apparatus, an electronic device, and a storage medium, and relates to the field of artificial intelligence, in particular to intelligent transportation. The specific implementation is as follows: receiving a wake-up instruction sent by a current user; in response to the wake-up instruction, performing face recognition on the current user to obtain a face recognition result of the current user; if the face recognition result of the current user meets a preset detection condition, receiving a voice control instruction sent by the current user; and in response to the voice control instruction, executing the control operation corresponding to the voice control instruction. Embodiments of the application can effectively improve dynamic sound pickup accuracy in an in-vehicle scene while saving hardware overhead and reducing maintenance cost.

Description

Dynamic pickup method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, and further relates to intelligent transportation technologies, and in particular, to a dynamic sound pickup method and apparatus, an electronic device, and a storage medium.

Background

With the rapid development of intelligent connected vehicles, voice control functions in automobiles are becoming more and more common. The dynamic sound pickup function locks onto the sound source of the party that issued the wake-up and restricts input from sound sources in other directions, thereby improving voice recognition accuracy for the waking user. The in-vehicle scene is complex: vehicle noise, tire noise, in-vehicle music and the like all affect sound source determination. If a dynamic sound pickup error occurs, the user experience is extremely negative: because subsequent instruction input from the waking party is restricted, voice control cannot be used, and the voice entry point is lost entirely.

Most early projects were single-sound-zone projects, in which the microphones are concentrated between the driver and the front passenger, and the time difference with which the microphones receive the sound of the driver and of the front passenger is used for the determination. However, the accuracy of this determination method still needs improvement in complex in-vehicle scenes. Dual-sound-zone and four-sound-zone schemes therefore appeared later. However, these only further improve accuracy; errors cannot be completely avoided. The improvement also comes at the cost of high hardware consumption, and the later development and maintenance cost of a four-sound-zone scheme is high.

Disclosure of Invention

The disclosure provides a dynamic sound pickup method, a dynamic sound pickup device, an electronic apparatus, and a storage medium.

In a first aspect, the present application provides a dynamic sound pickup method, including:

receiving a wake-up instruction sent by a current user;

in response to the wake-up instruction, performing face recognition on the current user to obtain a face recognition result of the current user;

if the face recognition result of the current user meets a preset detection condition, receiving a voice control instruction sent by the current user;

and in response to the voice control instruction, executing the control operation corresponding to the voice control instruction.

In a second aspect, the present application provides a dynamic sound pickup apparatus, comprising an instruction receiving module, a face recognition module and an instruction execution module; wherein,

the instruction receiving module is configured to receive a wake-up instruction sent by a current user;

the face recognition module is configured to perform face recognition on the current user in response to the wake-up instruction, so as to obtain a face recognition result of the current user;

the instruction receiving module is further configured to receive a voice control instruction sent by the current user if the face recognition result of the current user meets a preset detection condition;

and the instruction execution module is configured to execute, in response to the voice control instruction, the control operation corresponding to the voice control instruction.

In a third aspect, an embodiment of the present application provides an electronic device, including:

one or more processors;

a memory for storing one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the dynamic sound pickup method according to any embodiment of the present application.

In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the dynamic sound pickup method according to any embodiment of the present application.

In a fifth aspect, a computer program product is provided, which when executed by a computer device implements the dynamic sound pickup method according to any embodiment of the present application.

The technical solution of the present application can effectively improve dynamic sound pickup accuracy in an in-vehicle scene while saving hardware overhead and reducing maintenance cost.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

fig. 1 is a first flowchart of a dynamic sound pickup method according to an embodiment of the present disclosure;

fig. 2 is a second flowchart of a dynamic sound pickup method provided by an embodiment of the present application;

fig. 3 is a third flowchart of a dynamic sound pickup method according to an embodiment of the present disclosure;

fig. 4 is a schematic structural diagram of a dynamic sound pickup apparatus according to a third embodiment of the present application;

fig. 5 is a block diagram of an electronic device for implementing the dynamic sound pickup method according to the embodiment of the present application.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Example one

Fig. 1 is a first flowchart of a dynamic sound pickup method provided in an embodiment of the present application, where the method may be performed by a dynamic sound pickup apparatus or an electronic device, where the apparatus or the electronic device may be implemented by software and/or hardware, and the apparatus or the electronic device may be integrated in any intelligent device with a network communication function. As shown in fig. 1, the dynamic sound pickup method may include the steps of:

S101, receiving a wake-up instruction sent by a current user.

In this step, the electronic device may receive a wake-up instruction sent by the current user. Specifically, the electronic device may receive a wake-up operation triggered by a current user through an earphone; generating a wake-up instruction according to the wake-up operation; or, the electronic device may further receive a wake-up voice sent by the current user through the microphone; generating a wake-up instruction according to the wake-up voice; the wake-up instruction is used for instructing an image acquisition device of the vehicle to wake up.

S102, in response to the wake-up instruction, performing face recognition on the current user to obtain a face recognition result of the current user.

In this step, the electronic device may perform face recognition on the current user in response to the wake-up instruction, so as to obtain a face recognition result of the current user. Specifically, the electronic device may send an image acquisition instruction to an image acquisition device of the vehicle in response to the wake-up instruction, so that the image acquisition device acquires at least one facial image of the current user in response to the image acquisition instruction; and then carrying out face recognition on the current user based on the at least one face image to obtain a face recognition result of the current user.

S103, if the face recognition result of the current user meets the preset detection condition, receiving a voice control instruction sent by the current user.

In this step, if the face recognition result of the current user meets the preset detection condition, the electronic device may receive the voice control instruction sent by the current user. Specifically, the electronic device may extract the face features of the current user from the face recognition result of the current user. If the face features of the current user match the face features of one of a plurality of target users, the electronic device may determine that the face recognition result of the current user meets the preset detection condition. If the face features of the current user do not match the face features of any of the plurality of target users, the electronic device may determine that the face recognition result of the current user does not meet the preset detection condition.

S104, in response to the voice control instruction, executing the control operation corresponding to the voice control instruction.

In this step, the electronic device may respond to the voice control instruction and execute a control operation corresponding to the voice control instruction. Specifically, the electronic device may convert the voice control instruction into a text control instruction, and then execute a control operation corresponding to the text control instruction.
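The conversion from a voice control instruction to a text control instruction, followed by execution, might be sketched as below. This is only an illustration under stated assumptions: `speech_to_text` is a placeholder that treats the audio as UTF-8 text rather than calling a real speech recognizer, and the entries in `CONTROL_OPERATIONS` are hypothetical examples that do not come from the disclosure.

```python
from typing import Callable, Dict


def speech_to_text(audio: bytes) -> str:
    """Placeholder recognizer (assumption): the 'audio' is already UTF-8 text."""
    return audio.decode("utf-8").strip().lower()


# Hypothetical mapping from text control instructions to control operations.
CONTROL_OPERATIONS: Dict[str, Callable[[], str]] = {
    "open the window": lambda: "window opened",
    "turn on the air conditioner": lambda: "air conditioner on",
}


def execute_voice_instruction(audio: bytes) -> str:
    """Convert the voice control instruction to text, then run the matching operation."""
    text = speech_to_text(audio)
    operation = CONTROL_OPERATIONS.get(text)
    if operation is None:
        return "unsupported instruction: " + text
    return operation()
```

A lookup table keeps the text-to-operation mapping explicit; a production system would more likely use intent parsing than exact string matching.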

In the dynamic sound pickup method provided by this embodiment of the application, a wake-up instruction sent by a current user is received; in response to the wake-up instruction, face recognition is performed on the current user to obtain a face recognition result of the current user; if the face recognition result meets the preset detection condition, a voice control instruction sent by the current user is received; and in response to the voice control instruction, the corresponding control operation is executed. That is, the application combines voice and video to implement the dynamic sound pickup function, so that the current user who issues the voice control instruction can be located accurately and interference from other users' voices is avoided. Existing dynamic sound pickup methods adopt dual-sound-zone and four-sound-zone schemes, which increase hardware consumption and maintenance cost. The technical solution provided by the application can effectively improve dynamic sound pickup accuracy in an in-vehicle scene while saving hardware overhead and reducing maintenance cost; moreover, it is simple to implement, easy to popularize, and widely applicable.
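The control flow of steps S101 to S104 could be sketched roughly as follows. The function name `dynamic_pickup` and its callback parameters are hypothetical, and the detection condition is simplified here to membership of a recognized user id in a set of target users; the disclosure describes the condition in terms of face feature matching.

```python
from typing import Callable, Optional, Set


def dynamic_pickup(wake_received: bool,
                   recognize_face: Callable[[], Optional[str]],
                   target_users: Set[str],
                   listen: Callable[[], str],
                   execute: Callable[[str], str]) -> Optional[str]:
    """Run S101 to S104 once; return the control operation result, or None."""
    # S101: nothing happens until a wake-up instruction has been received.
    if not wake_received:
        return None
    # S102: perform face recognition on the current user.
    user_id = recognize_face()
    # S103: accept a voice control instruction only if the condition is met.
    if user_id is None or user_id not in target_users:
        return None
    command = listen()
    # S104: execute the control operation for that instruction.
    return execute(command)
```

The point of the sketch is the gating: `listen` is never called unless the face recognition result passes the check, which is how voice from other occupants is kept out.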

Example two

Fig. 2 is a second flowchart of a dynamic sound pickup method according to an embodiment of the present application. This embodiment further optimizes and extends the technical solution described above, and can be combined with the optional embodiments described above. As shown in fig. 2, the dynamic sound pickup method may include the following steps:

S201, receiving a wake-up instruction sent by a current user.

S202, in response to the wake-up instruction, sending an image acquisition instruction to an image acquisition device of the vehicle, so that the image acquisition device acquires at least one face image of the current user in response to the image acquisition instruction.

In this step, the electronic device may send an image acquisition instruction to an image acquisition device of the vehicle in response to the wake-up instruction, so that the image acquisition device acquires at least one face image of the current user in response to the image acquisition instruction. Specifically, the image acquisition device in this embodiment of the application may be installed directly in front of each seat in the vehicle, or at other positions around each seat. The image acquisition device may be a camera or the like.

S203, carrying out face recognition on the current user based on the at least one face image to obtain a face recognition result of the current user.

In this step, the electronic device may perform face recognition on the current user based on the at least one face image to obtain a face recognition result of the current user. Specifically, the electronic device may input each face image of the at least one face image into a pre-trained face recognition model, and obtain a face recognition result corresponding to each face image through the face recognition model; and then determining the face recognition result of the current user based on the face recognition result corresponding to each image.
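The disclosure does not state how the per-image results are combined into the result for the current user. One possible approach, shown purely as an assumption, is a majority vote over the per-image labels. The names `recognize_user`, `model`, and `min_share` are hypothetical, and `model` stands in for the pre-trained face recognition model.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def recognize_user(images: Sequence[bytes],
                   model: Callable[[bytes], Optional[str]],
                   min_share: float = 0.5) -> Optional[str]:
    """Run the model on each image and combine per-image results by majority vote."""
    per_image = [model(img) for img in images]
    votes = Counter(r for r in per_image if r is not None)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    # Accept the winner only if it covers more than min_share of all images.
    return label if count / len(per_image) > min_share else None
```

Requiring a strict majority means a single misrecognized frame cannot decide the outcome when several images are captured.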

S204, if the face recognition result of the current user meets the preset detection condition, receiving a voice control instruction sent by the current user.

S205, in response to the voice control instruction, executing the control operation corresponding to the voice control instruction.

Example three

Fig. 3 is a third flowchart of a dynamic sound pickup method according to an embodiment of the present application. This embodiment further optimizes and extends the technical solution described above, and can be combined with the optional embodiments described above. As shown in fig. 3, the dynamic sound pickup method may include the following steps:

S301, receiving a wake-up operation triggered by a current user through an earphone, and generating a wake-up instruction according to the wake-up operation; or receiving a wake-up voice sent by the current user through a microphone, and generating a wake-up instruction according to the wake-up voice; wherein the wake-up instruction is used for instructing an image acquisition device of the vehicle to wake up.

In this step, the electronic device may receive a wake-up operation triggered by the current user through an earphone and generate a wake-up instruction according to the wake-up operation; or it may receive a wake-up voice sent by the current user through a microphone and generate a wake-up instruction according to the wake-up voice. The wake-up instruction is used for instructing an image acquisition device of the vehicle to wake up. Specifically, the wake-up operation in this embodiment of the application may be a tap or another operation such as a button click, which is not limited in the present application.
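As a minimal sketch of this step, both trigger paths can be modelled as producing the same kind of wake-up instruction. The names `WakeSource`, `WakeUpInstruction`, and `make_wake_instruction` are hypothetical, the wake word `"hello car"` is an invented example, and the substring check stands in for a real wake-word detector.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WakeSource(Enum):
    EARPHONE_OPERATION = "earphone_operation"  # e.g. a tap or a button click
    WAKE_VOICE = "wake_voice"                  # wake-up voice heard by a microphone


@dataclass(frozen=True)
class WakeUpInstruction:
    source: WakeSource
    wake_camera: bool = True  # instructs the image acquisition device to wake up


def make_wake_instruction(earphone_event: Optional[str] = None,
                          transcript: Optional[str] = None,
                          wake_word: str = "hello car") -> Optional[WakeUpInstruction]:
    """Return a wake-up instruction for either trigger path, or None if neither fired."""
    if earphone_event in {"tap", "button_click"}:
        return WakeUpInstruction(WakeSource.EARPHONE_OPERATION)
    if transcript is not None and wake_word in transcript.lower():
        return WakeUpInstruction(WakeSource.WAKE_VOICE)
    return None
```

Downstream steps then only need to handle one instruction type, regardless of how the wake-up was triggered.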

S302, in response to the wake-up instruction, sending an image acquisition instruction to an image acquisition device of the vehicle, so that the image acquisition device acquires at least one face image of the current user in response to the image acquisition instruction.

In this step, the electronic device may send an image acquisition instruction to an image acquisition device of the vehicle in response to the wake-up instruction, so that the image acquisition device acquires at least one facial image of the current user in response to the image acquisition instruction. Specifically, the electronic device may send an image acquisition instruction to a camera corresponding to the current user, and after receiving the instruction, the camera may take a picture of the current user to obtain at least one face image of the current user.

In an embodiment of the application, if at least two sound pickups receive the wake-up instruction, the electronic device may determine, based on the signal amplitudes of the wake-up instruction as received by the at least two sound pickups, a target sound pickup corresponding to the wake-up instruction from among them, and then take the wake-up instruction received by the target sound pickup as the wake-up instruction sent by the current user.
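The disclosure says only that the target sound pickup is determined from the signal amplitudes. One plausible reading, taken here as an assumption, is to pick the pickup whose wake-up signal has the largest root-mean-square amplitude. The function names and pickup ids below are hypothetical.

```python
import math
from typing import Mapping, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of one pickup's wake-up signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def select_target_pickup(signals: Mapping[str, Sequence[float]]) -> str:
    """Return the id of the pickup whose wake-up signal has the largest RMS amplitude."""
    if len(signals) < 2:
        raise ValueError("expected wake-up signals from at least two pickups")
    return max(signals, key=lambda pickup_id: rms(signals[pickup_id]))
```

For example, with samples keyed by `"driver"` and `"passenger"`, the function returns whichever key has the louder wake-up signal, on the assumption that the speaker sits closest to that pickup.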

S303, carrying out face recognition on the current user based on the at least one face image to obtain a face recognition result of the current user.

S304, extracting the face features of the current user from the face recognition result of the current user.

In this step, the electronic device may extract the face features of the current user from the face recognition result of the current user. Specifically, the electronic device may extract the face features of the current user through a face recognition model. The face features may be a matrix including face feature information.

S305, if the face features of the current user match the face features of one of a plurality of target users, determining that the face recognition result of the current user meets the preset detection condition, and receiving a voice control instruction sent by the current user.

In this step, if the face features of the current user match the face features of one of a plurality of target users, the electronic device may determine that the face recognition result of the current user meets the preset detection condition, and receive a voice control instruction sent by the current user. If the face features of the current user do not match the face features of any of the plurality of target users, the electronic device may determine that the face recognition result of the current user does not meet the preset detection condition.
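How two sets of face features are judged to "match" is not specified in the disclosure. A common choice, used here only as an assumption, is cosine similarity against a threshold, with the feature matrix flattened to a vector. The names `match_target_user` and `threshold` and the value 0.8 are illustrative.

```python
import math
from typing import Mapping, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_target_user(feature: Sequence[float],
                      targets: Mapping[str, Sequence[float]],
                      threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching target user id, or None if no score reaches the threshold."""
    best_id, best_score = None, threshold
    for user_id, target_feature in targets.items():
        score = cosine_similarity(feature, target_feature)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id
```

A non-`None` return value corresponds to the detection condition being met; `None` corresponds to the case where the current user matches none of the target users.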

S306, in response to the voice control instruction, executing the control operation corresponding to the voice control instruction.

Example four

Fig. 4 is a schematic structural diagram of a dynamic sound pickup apparatus according to an embodiment of the present application. As shown in fig. 4, the apparatus 400 includes: an instruction receiving module 401, a face recognition module 402 and an instruction execution module 403; wherein,

the instruction receiving module 401 is configured to receive a wake-up instruction sent by a current user;

the face recognition module 402 is configured to perform face recognition on the current user in response to the wake-up instruction, so as to obtain a face recognition result of the current user;

the instruction receiving module 401 is further configured to receive a voice control instruction sent by the current user if the face recognition result of the current user meets a preset detection condition;

the instruction executing module 403 is configured to, in response to the voice control instruction, execute a control operation corresponding to the voice control instruction.

Further, the face recognition module 402 is specifically configured to send an image acquisition instruction to an image acquisition device of a vehicle in response to the wake-up instruction, so that the image acquisition device acquires at least one face image of the current user in response to the image acquisition instruction; and carrying out face recognition on the current user based on the at least one face image to obtain a face recognition result of the current user.

Further, the face recognition module 402 is specifically configured to input each face image of the at least one face image into a pre-trained face recognition model, and obtain a face recognition result corresponding to each face image through the face recognition model; and determining the face recognition result of the current user based on the face recognition result corresponding to each image.

Further, the instruction receiving module 401 is specifically configured to receive a wake-up operation triggered by the current user through an earphone and generate the wake-up instruction according to the wake-up operation; or to receive a wake-up voice sent by the current user through a microphone and generate the wake-up instruction according to the wake-up voice; wherein the wake-up instruction is used for instructing an image acquisition device of the vehicle to wake up.

Further, the instruction receiving module 401 is further configured to, if at least two sound pickups receive the wake-up instruction, determine a target sound pickup corresponding to the wake-up instruction from the at least two sound pickups based on the signal amplitudes of the wake-up instruction as received by the at least two sound pickups, and take the wake-up instruction received by the target sound pickup as the wake-up instruction sent by the current user.

Further, the face recognition module 402 is specifically configured to extract the face features of the current user from the face recognition result of the current user, and, if the face features of the current user match the face features of one of a plurality of target users, determine that the face recognition result of the current user meets the preset detection condition.

The dynamic sound pickup device can execute the method provided by any embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For details of the technique not described in detail in this embodiment, reference may be made to the dynamic sound pickup method provided in any embodiment of the present application.
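The module structure of apparatus 400 might be mirrored in software along the following lines. The class names follow modules 401 to 403, but the method names, the `"wake"` token, and the injected `recognize` callback are hypothetical simplifications rather than the disclosed implementation.

```python
from typing import Callable, Iterable, Optional, Set


class InstructionReceivingModule:
    """Module 401: hands out wake-up and voice control instructions in arrival order."""

    def __init__(self, incoming: Iterable[str]) -> None:
        self._incoming = iter(incoming)

    def receive(self) -> Optional[str]:
        return next(self._incoming, None)


class FaceRecognitionModule:
    """Module 402: decides whether the current user meets the detection condition."""

    def __init__(self, recognize: Callable[[], Optional[str]], target_users: Set[str]) -> None:
        self._recognize = recognize
        self._target_users = target_users

    def user_is_accepted(self) -> bool:
        return self._recognize() in self._target_users


class InstructionExecutionModule:
    """Module 403: executes the control operation for a voice control instruction."""

    def execute(self, instruction: str) -> str:
        return "executed: " + instruction


class DynamicPickupApparatus:
    """Apparatus 400, wiring modules 401 to 403 together."""

    def __init__(self, receiver: InstructionReceivingModule,
                 recognizer: FaceRecognitionModule,
                 executor: InstructionExecutionModule) -> None:
        self.receiver, self.recognizer, self.executor = receiver, recognizer, executor

    def run_once(self) -> Optional[str]:
        if self.receiver.receive() != "wake":        # wake-up instruction
            return None
        if not self.recognizer.user_is_accepted():   # preset detection condition
            return None
        command = self.receiver.receive()            # voice control instruction
        return self.executor.execute(command) if command else None
```

Note that module 401 is used twice in `run_once`, once for the wake-up instruction and once for the voice control instruction, matching its two roles in the description above.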

Example five

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 5, the device 500 comprises a computing unit 501 which may perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.

A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the respective methods and processes described above, such as the dynamic sound pickup method. For example, in some embodiments, the dynamic sound pickup method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the dynamic sound pickup method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the dynamic sound pickup method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical host and VPS services.

It should be understood that the flows shown above may be used in various forms, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, which is not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved. In the technical solution of the disclosure, the acquisition, storage, and use of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of dynamic sound pickup, the method comprising:

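receiving a wake-up instruction sent by a current user;

in response to the wake-up instruction, performing face recognition on the current user to obtain a face recognition result of the current user;

if the face recognition result of the current user meets a preset detection condition, receiving a voice control instruction sent by the current user;

and in response to the voice control instruction, executing a control operation corresponding to the voice control instruction.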

2. The method of claim 1, wherein the performing face recognition on the current user in response to the wake-up instruction to obtain a face recognition result of the current user comprises:

in response to the wake-up instruction, sending an image acquisition instruction to an image acquisition device of a vehicle, so that the image acquisition device acquires at least one face image of the current user in response to the image acquisition instruction;

and performing face recognition on the current user based on the at least one face image to obtain a face recognition result of the current user.

3. The method of claim 2, wherein the performing face recognition on the current user based on the at least one face image to obtain a face recognition result of the current user comprises:

inputting each face image in the at least one face image into a pre-trained face recognition model, and obtaining a face recognition result corresponding to each face image through the face recognition model;

and determining the face recognition result of the current user based on the face recognition result corresponding to each face image.

4. The method of claim 1, wherein the receiving of the wake-up instruction sent by the current user comprises:

receiving a wake-up operation triggered by the current user through an earphone, and generating the wake-up instruction according to the wake-up operation; or receiving a wake-up voice uttered by the current user through a microphone, and generating the wake-up instruction according to the wake-up voice; wherein the wake-up instruction is used to instruct an image acquisition device of the vehicle to wake up.

5. The method of claim 4, further comprising:

if at least two sound pickups receive the wake-up instruction, determining a target sound pickup corresponding to the wake-up instruction from the at least two sound pickups based on signal amplitudes of the wake-up instruction received by the at least two sound pickups;

and taking the wake-up instruction received by the target sound pickup as the wake-up instruction sent by the current user.

6. The method of claim 1, wherein the face recognition result of the current user meeting the preset detection condition comprises:

extracting a face feature of the current user from the face recognition result of the current user;

and if the face feature of the current user matches the face feature of one target user among a plurality of target users, determining that the face recognition result of the current user meets the preset detection condition.

7. A dynamic sound pickup apparatus, the apparatus comprising: an instruction receiving module, a face recognition module, and an instruction execution module; wherein,

8. The apparatus according to claim 7, wherein the face recognition module is specifically configured to send an image acquisition instruction to an image acquisition device of a vehicle in response to the wake-up instruction, so that the image acquisition device acquires at least one face image of the current user in response to the image acquisition instruction; and to perform face recognition on the current user based on the at least one face image to obtain a face recognition result of the current user.

9. The apparatus according to claim 8, wherein the face recognition module is specifically configured to input each of the at least one face image into a pre-trained face recognition model, and obtain a face recognition result corresponding to each face image through the face recognition model; and to determine the face recognition result of the current user based on the face recognition result corresponding to each face image.

10. The apparatus according to claim 7, wherein the instruction receiving module is specifically configured to receive a wake-up operation triggered by the current user through an earphone, and generate the wake-up instruction according to the wake-up operation; or to receive a wake-up voice uttered by the current user through a microphone, and generate the wake-up instruction according to the wake-up voice; wherein the wake-up instruction is used to instruct an image acquisition device of the vehicle to wake up.

11. The apparatus according to claim 10, wherein the instruction receiving module is further configured to, if at least two sound pickups receive the wake-up instruction, determine, based on signal amplitudes of the wake-up instruction received by the at least two sound pickups, a target sound pickup corresponding to the wake-up instruction from the at least two sound pickups; and to take the wake-up instruction received by the target sound pickup as the wake-up instruction sent by the current user.

12. The apparatus according to claim 7, wherein the face recognition module is specifically configured to extract a face feature of the current user from the face recognition result of the current user; and, if the face feature of the current user matches the face feature of one target user among a plurality of target users, to determine that the face recognition result of the current user meets the preset detection condition.

13. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.

14. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.

15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
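By way of non-limiting illustration only, the amplitude-based selection of a target sound pickup and the face-feature matching recited above may be sketched in Python as follows. Several details here are assumptions not stated in the claims: the use of mean absolute amplitude as the signal amplitude, the use of cosine similarity for feature matching, the threshold value of 0.8, and all function and parameter names.

```python
import math
from typing import Dict, Optional, Sequence


def select_target_pickup(signals: Dict[str, Sequence[float]]) -> str:
    """Return the id of the pickup with the largest mean absolute amplitude.

    `signals` maps a hypothetical pickup id to the samples of the wake-up
    instruction as received by that pickup.
    """
    if not signals:
        raise ValueError("no pickup received the wake-up instruction")

    def mean_abs(samples: Sequence[float]) -> float:
        return sum(abs(s) for s in samples) / len(samples) if samples else 0.0

    return max(signals, key=lambda pid: mean_abs(signals[pid]))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_target_user(
    feature: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed value, not taken from the disclosure
) -> Optional[str]:
    """Return the best-matching enrolled user id, or None if none reaches the threshold."""
    best_id, best_score = None, threshold
    for user_id, reference in enrolled.items():
        score = cosine_similarity(feature, reference)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id
```

Under these assumptions, a non-`None` return from `match_target_user` would correspond to the face recognition result meeting the preset detection condition.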