CN111988493B - Interaction processing method, device, equipment and storage medium


Info

Publication number: CN111988493B
Application number: CN201910424719.0A
Authority: CN (China)
Prior art keywords: camera module, screen, data stream, event data, resolution mode
Legal status: Active (application granted)
Other languages: Chinese (zh)
Other versions: CN111988493A
Inventor: 武隽
Assignee (current and original): Beijing Xiaomi Mobile Software Co Ltd
Application filed by Beijing Xiaomi Mobile Software Co Ltd; priority to CN201910424719.0A

Classifications

    • H04N23/54 Mounting of pick-up tubes, electronic image sensors, deviation or focusing coils
    • H04N13/239 Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • H04N13/271 Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H04N23/57 Mechanical or electrical details of cameras or camera modules specially adapted for being embedded in other devices
    • H04N23/667 Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes
    • H04N23/71 Circuitry for evaluating the brightness variation
    • H04M2250/12 Details of telephonic subscriber devices including a sensor for measuring a physical value, e.g. temperature or motion

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)
  • Telephone Function (AREA)

Abstract

The present disclosure provides an interaction processing method, apparatus, device, and storage medium. In this embodiment, a binocular camera module is arranged on a terminal device; the binocular camera module includes a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor (DVS). Depth image data can be generated by comparing an image collected by the first camera module with an event data stream collected by the second camera module, a target object for instructing the terminal device to execute an operation is identified from the depth image data, and, in response to an operation instruction corresponding to the target object, the terminal device is controlled to execute an operation matched with the operation instruction. The user can thus interact with the terminal device without touch or voice input, and because the target object is identified from depth image data, the identification accuracy is high.

Description

Interaction processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an interaction processing method, apparatus, device, and storage medium.
Background
With the rapid development of technology, various electronic devices, such as personal computers, tablet computers, and smart phones, keep emerging, and electronic devices offering natural interaction are gaining favor with more and more people. Interaction between smart devices and users has therefore become a research and development focus of every major smart terminal manufacturer, and various technical schemes for operation interaction with the user on a smart terminal have appeared. However, in the prior art, human-computer interaction is mostly performed through touch or voice; the user experience has changed little, and the user cannot operate the electronic device in special situations where touch control or voice control is inconvenient.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides an interaction processing method, apparatus, device, and storage medium.
According to a first aspect of the embodiments of the present disclosure, there is provided an interaction processing method applied to a terminal device, where the terminal device is provided with a binocular camera module, the binocular camera module includes a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS, and the method includes:
generating depth image data by comparing an image acquired by a first camera module with an event data stream acquired by a second camera module;
identifying a target object used for instructing the terminal equipment to execute operation from the depth image data;
and responding to the operation instruction corresponding to the target object, and controlling the terminal equipment to execute the operation matched with the operation instruction.
In one embodiment, the operation executed by the terminal device includes an operation triggered while the screen is in the screen-off state, the image is an image acquired by the first camera module while the screen is in the screen-off state, and the event data stream is an event data stream acquired by the second camera module while the screen is in the screen-off state.
In one embodiment, the second camera module is configured with a low resolution mode and at least one other resolution mode; the number of pixel units of the second camera module in the working state in the low resolution mode is less than the number in the working state in the other resolution modes, and the second camera module switches between the different modes when a preset mode switching condition is met.
In one embodiment, the method further comprises: starting the first camera module when a preset starting condition is met;
the preset starting condition may include any one of:
when the second camera module is controlled to be switched from the low resolution mode to other resolution modes;
when the second camera module is controlled to be switched to the high resolution mode.
In one embodiment, the preset mode switching condition includes any one of the following conditions:
judging that the change of the current ambient light meets a preset change condition according to an event data stream acquired by the second camera module in the current mode;
and judging that the object to be identified exists in the acquisition area of the second camera module according to the event data stream acquired by the second camera module in the current mode.
In one embodiment, the other resolution modes include a high resolution mode, and the event data stream used to generate the depth image data is an event data stream acquired by the second camera module in the high resolution mode;
the method further comprises the following steps:
acquiring an event data stream acquired by the second camera module in a low resolution mode;
and controlling the second camera module to be switched from the low resolution mode to the high resolution mode when the change of the current ambient light is judged to meet the preset change condition according to the event data stream acquired by the second camera module in the low resolution mode.
In one embodiment, the other resolution modes include a medium resolution mode and a high resolution mode, and the event data stream for generating the depth image data is an event data stream acquired by the second camera module in the high resolution mode;
the method further comprises the following steps:
acquiring an event data stream acquired by the second camera module in a low resolution mode;
when the change of the current ambient light is judged to meet a preset change condition according to an event data stream acquired by a second camera module in a low resolution mode, controlling the second camera module to be switched from the low resolution mode to a medium resolution mode;
acquiring an event data stream acquired by the second camera module in a medium resolution mode;
and when the object to be identified exists in the acquisition area of the second camera module according to the event data stream acquired by the second camera module in the medium resolution mode, controlling the second camera module to be switched from the medium resolution mode to the high resolution mode.
In one embodiment, the target object includes a specified gesture, a specified face, and/or a specified body posture.
In one embodiment, a mapping relation between the target object and the operation instruction is pre-configured, and the operation matched with the operation instruction comprises one or more of the following:
unlocking the screen, triggered in the screen-off state;
starting the flashlight, triggered in the screen-off state;
starting a designated application program, triggered in the screen-off state;
displaying a designated page of a designated application program, triggered in the screen-off state;
displaying a new message of a designated application program, triggered in the screen-off state;
and answering an incoming call, triggered in the screen-off state.
According to a second aspect of the embodiments of the present disclosure, an interaction processing apparatus is provided, where the apparatus is provided in a terminal device, the terminal device is provided with a binocular camera module, the binocular camera module includes a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS, and the apparatus includes:
the data generation module is used for generating depth image data by comparing the image acquired by the first camera module with the event data stream acquired by the second camera module;
the object identification module is used for identifying a target object used for instructing the terminal equipment to execute operation from the depth image data;
and the operation control module is used for responding to the operation instruction corresponding to the target object and controlling the terminal equipment to execute the operation matched with the operation instruction.
In one embodiment, the operation executed by the terminal device includes an operation triggered while the screen is in the screen-off state, the image is an image acquired by the first camera module while the screen is in the screen-off state, and the event data stream is an event data stream acquired by the second camera module while the screen is in the screen-off state.
In one embodiment, the second camera module is configured with a low resolution mode and at least one other resolution mode; the number of pixel units of the second camera module in the working state in the low resolution mode is less than the number in the working state in the other resolution modes, and the second camera module switches between the different modes when a preset mode switching condition is met.
In one embodiment, the apparatus is further configured to start the first camera module when a preset starting condition is met;
the preset starting condition may include any one of:
when the second camera module is controlled to be switched from the low resolution mode to other resolution modes;
when the second camera module is controlled to be switched to the high resolution mode.
In one embodiment, the preset mode switching condition includes any one of the following conditions:
judging that the change of the current ambient light meets a preset change condition according to an event data stream acquired by the second camera module in the current mode;
and judging that the object to be identified exists in the acquisition area of the second camera module according to the event data stream acquired by the second camera module in the current mode.
According to a third aspect of the embodiments of the present disclosure, there is provided a computer device comprising a binocular camera module, a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the binocular camera module comprises a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS, and the processor implements the method as described in any one of the above when executing the program.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of any of the methods described above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
in this embodiment, a binocular camera module is arranged on the terminal device. Because the binocular camera module includes a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS, the image collected by the first camera module and the event data stream collected by the second camera module can be compared to generate depth image data, a target object used for instructing the terminal device to execute an operation is identified from the depth image data, and, in response to the operation instruction corresponding to the target object, the terminal device is controlled to execute the operation matched with the operation instruction. Interaction with the terminal device is thus achieved without touch or voice input; the generated data volume is low and the response speed is fast, and because the target object is identified from the depth image data, the identification accuracy is high.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart illustrating an interaction processing method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a binocular camera module according to an exemplary embodiment of the present disclosure.
FIG. 3 is a diagram illustrating several gestures according to an exemplary embodiment of the present disclosure.
FIG. 4 is a flow chart illustrating another interaction processing method according to an exemplary embodiment of the present disclosure.
FIG. 5 is a flow chart illustrating another interaction processing method according to an exemplary embodiment of the present disclosure.
FIG. 6 is a block diagram illustrating an interaction processing device according to an example embodiment of the present disclosure.
FIG. 7 is a block diagram illustrating another interaction processing device according to an example embodiment of the present disclosure.
Fig. 8A and 8B are block diagrams of two further interaction processing devices according to an exemplary embodiment of the present disclosure.
Fig. 9 is a hardware block diagram of a computer device in which an interaction processing apparatus according to an exemplary embodiment of the present disclosure is located.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
With the wide use of intelligent terminals, more and more people rely on terminal devices such as mobile phones. Interaction between the terminal device and the user has become a research and development focus of every major terminal manufacturer, and various technical schemes for operation interaction with the user on the terminal device have appeared, for example human-computer interaction schemes on a touch screen, voice-based human-computer interaction schemes, and so on. In some special situations where touch control or voice control is inconvenient, the user cannot operate the electronic device, which affects user experience. Although some terminals have a face-unlocking function, its principle is that face recognition is performed on images collected by an image sensor for identity authentication, and the unlocking operation is performed after authentication succeeds. However, since the image is acquired using a conventional image sensor, the frame rate is low and the generated data amount is large.
In view of this, embodiments of the present application provide an interaction scheme in which a binocular camera module is disposed on a terminal device. Since the binocular camera module includes a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor, depth image data can be generated by comparing an image collected by the first camera module with an event data stream collected by the second camera module; a target object for instructing the terminal device to perform an operation is identified from the depth image data, and, in response to an operation instruction corresponding to the target object, the terminal device is controlled to perform an operation matching the operation instruction. Interaction with the terminal device is thus possible without touch or voice input, and since the event data stream collected by the second camera module only includes data of pixel units in which a change in light intensity was detected, the generated data amount is low and the response speed is high.
The interaction processing method provided by this embodiment may be implemented by software, or by a combination of software and hardware, or by hardware, and the related hardware may be composed of two or more physical entities, or may be composed of one physical entity. The method can be applied to the electronic equipment with the binocular camera module. The electronic device may be a portable device such as a smart phone, a smart learning machine, a tablet computer, a notebook computer, a PDA (Personal Digital Assistant), or a fixed device such as a desktop computer, or a wearable device such as a smart band or a smart necklace.
The first camera module is based on an image sensor, which may be a conventional image sensor, understood here as an image sensor that is not event-based, in contrast to a dynamic vision sensor. For example, the image sensor may convert the captured optical information into electrical signals representing the gray scale and color of each pixel, thereby reproducing the entire scene. The first camera module based on the image sensor can be the front camera or the rear camera of the terminal device. As such, the first camera module may also be referred to as a general-purpose camera module.
A Dynamic Vision Sensor (DVS), also known as an event camera, is a biomimetic vision sensor that mimics the human retina and is based on pulse-triggered neurons. The sensor contains an array formed by a plurality of pixel units, where each pixel unit responds and records only when it senses a change in light intensity, so only regions of rapid light-intensity change are captured. The specific composition of the dynamic vision sensor is not set forth herein in greater detail. The DVS employs an event-triggered processing mechanism to output an asynchronous stream of event data, where each event may carry, for example, light intensity change information (e.g., a time stamp of the light intensity change and a light intensity value) and the coordinate location of the triggered pixel unit. The response speed of the DVS is no longer limited by a traditional exposure time and frame rate, and it can detect a high-speed object moving at a rate of ten thousand frames per second; the DVS has a larger dynamic range and can accurately sense and output scene changes in low-illumination or high-exposure environments; DVS power consumption is lower; and since the DVS responds independently to intensity changes at each pixel unit, it is not affected by motion blur.
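To make the event output concrete, a minimal sketch of one event record follows; the field names are illustrative assumptions, since the disclosure does not define a concrete data layout:

```python
# A minimal sketch of the kind of record an asynchronous DVS event stream
# might carry, per the description above (time stamp, intensity change,
# pixel coordinates). Field names are assumptions, not a real sensor API.
from dataclasses import dataclass

@dataclass
class DvsEvent:
    x: int               # column of the triggered pixel unit
    y: int               # row of the triggered pixel unit
    timestamp_us: int    # time stamp of the light-intensity change
    polarity: int        # +1: brightness increase, -1: brightness decrease
```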
The second camera module may further include a lens, a lens holder, a filter, a capacitor, a resistor, and other components besides the dynamic vision sensor, so as to form a module capable of collecting image data, which is not limited herein. In contrast to a generic camera module, the second camera module may also be referred to as a dedicated camera module. In one embodiment, a smart phone is taken as an example for illustration, and an execution subject of the embodiment of the present disclosure may be the smart phone, or may be a system service installed in the smart phone. It should be noted that the smart phone is only one application example provided in the embodiment of the present disclosure, and it should not be understood that the technical solution provided in the embodiment of the present disclosure can only be applied in the scenario of the smart phone.
The embodiments of the present disclosure will be described below with reference to the accompanying drawings.
As shown in fig. 1, fig. 1 is a flowchart illustrating an interaction processing method according to an exemplary embodiment of the present disclosure, including the following steps:
in step 102, generating depth image data by comparing an image acquired by a first camera module with an event data stream acquired by a second camera module;
in step 104, identifying a target object used for instructing the terminal equipment to execute operation from the depth image data;
in step 106, in response to the operation instruction corresponding to the target object, the terminal device is controlled to execute an operation matched with the operation instruction.
The method can be used in terminal equipment which is provided with a binocular camera module, wherein the binocular camera module comprises a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS.
By way of example, the first camera module may be a camera module already present in the terminal device, and the binocular camera module can be formed by adding a second camera module within a position range associated with that existing camera module. In one embodiment, the second camera module is disposed in a designated area around a camera of the terminal device. For example, the second camera module may be disposed in a designated area around the front camera, or in a designated area around the rear camera. As shown in fig. 2, fig. 2 is a schematic diagram of a binocular camera module according to an exemplary embodiment of the present disclosure. In the figure, the terminal device is a smart phone, and a second camera module including a dynamic vision sensor DVS is arranged to the right of the smart phone's front camera.
The dynamic vision sensor collects event data from the scene; events are output when the scene changes. For example, when no object in the scene moves relative to the terminal device, the light intensity detected by the pixel units in the dynamic vision sensor does not change; when some object in the scene moves relative to the terminal device, the light changes, a pixel event is triggered, and an event data stream of the pixel units in which a light intensity change was detected is output. Each event datum in the event data stream may include the coordinate position of the pixel unit in which the brightness change was detected and the time stamp information of the triggering moment. Event data corresponding to the same time stamp information may constitute DVS image data. In the dynamic vision sensor, a single pixel point outputs an event (pulse) signal only when the light intensity it receives changes; for example, when the brightness increases beyond a threshold, a brightness-increase event is output for that pixel. Thus, the event data stream acquired by the second camera module may be partial image data: there is no event data for pixel units in which no change in light intensity was detected.
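As a sketch of how such sparse event data can be gathered into DVS image data (for example, events sharing the same time stamp forming one partial frame), assuming the DvsEvent record from the earlier sketch:

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate a batch of events (e.g., those sharing the same time stamp)
    into a sparse DVS frame.

    Pixel units in which no light-intensity change was detected contribute
    no event data and simply stay zero, matching the 'partial image data'
    noted above."""
    frame = np.zeros((height, width), dtype=np.int8)
    for e in events:                 # e is a DvsEvent from the earlier sketch
        frame[e.y, e.x] = e.polarity
    return frame
```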
After the depth image data is generated, a target object used for instructing the terminal device to execute an operation can be identified from it.
The image collected by the first camera module can be understood as a complete image, i.e., all pixel points carry data, while the event data stream collected by the second camera module can be understood as partial image data: not all pixel points carry data, only the pixel points corresponding to the pixel units that detected a light intensity change. For example, the first camera module and the second camera module may be synchronized by a clock so that the two modules output data collected at the same time. The image acquired by the first camera module is compared with the event data stream acquired by the second camera module to calculate the disparity and thereby obtain depth information. For example, the pixel point corresponding to a pixel point in the event data stream is found in the image acquired by the first camera module, and the distance (depth) between the spatial point and the camera can be determined from the relationship between the two corresponding pixel points together with information such as the camera focal length f and the baseline between the first camera module and the second camera module, so that depth image data is generated.
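The relation hinted at here can be made concrete. Below is a minimal sketch of depth recovery from a matched pixel pair, assuming the usual rectified stereo geometry; the patent does not spell out its exact matching or calibration procedure:

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Classic rectified-stereo relation: depth = f * B / d."""
    if disparity_px <= 0:
        return float("inf")          # no measurable parallax
    return focal_length_px * baseline_m / disparity_px

# Example: f = 1000 px, baseline = 2 cm, disparity = 25 px  ->  depth = 0.8 m
print(depth_from_disparity(1000.0, 0.02, 25.0))
```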
As for the target object, it is an object for instructing the terminal device to perform an operation, and different kinds of target objects can be configured according to the operations the terminal device is required to execute. In one example, the operation performed by the terminal device is an operation performed after identity authentication succeeds, for example unlocking, payment, or login. Accordingly, the target object may be an object used for identity authentication. For example, in a scenario of identity verification through face recognition, the target object may be a designated face; in a scenario of identity verification through gesture recognition, the target object may be a designated gesture, and so on.
In another example, the operation performed by the terminal device is a designated operation, for example turning on the screen, opening the system favorites, entering a designated page of a designated application, presenting a new message of a designated application, or answering an incoming call in the locked-screen state. Accordingly, the target object may be an object mapped to the designated operation, for example a designated gesture or a designated body posture. The designated gesture may be a gesture determined by a hand shape or stroke, for example a "six" gesture, a finger-heart ("bixin") gesture, a "2" gesture (also known as the "yeah" or V gesture), an OK gesture, a thumbs-up gesture, a palm-open gesture, and so on, as well as gestures that stroke other numbers. As shown in fig. 3, fig. 3 is a diagram illustrating several gestures according to an exemplary embodiment of the present disclosure. It will be appreciated that the schematic diagram illustrates only a few gestures; other gestures are indeed possible, such as stroking 1, 3, 4, etc. Each gesture may have various variations as long as the meaning of the corresponding gesture can be expressed, which is not limited herein. The designated body posture may be a hands-raised posture, a hands-on-waist posture, or the like.
In one example, the mapping relationship between the target object and the operation instruction may be configured in advance. The operation instruction may be an instruction instructing the terminal device to perform an operation. In one example, there may be a one-to-one mapping between target objects and operation instructions, so that each target object triggers the device to perform one operation. In another example, target objects and an operation instruction may be in a many-to-one mapping relationship, so that several target objects together trigger the device to perform an operation. Taking gestures as an example, a sequence of consecutive gestures may correspond to one operation instruction: for example, consecutively stroking the three gestures "3", "2", "1" triggers the terminal device to turn on the screen and unlock it.
The mapping relationship between the target object and the operation instruction may be configured in advance by the system or set by the user. For example, a mapping-relation setting service may be provided for the user to create mappings between target objects and operation instructions, as in the sketch below.
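As an illustration of such a pre-configured mapping, the following sketch uses hypothetical gesture names and an assumed device interface; the disclosure leaves the concrete encoding open:

```python
# A hypothetical pre-configured mapping between recognized target objects
# and operation instructions.
GESTURE_TO_OPERATION = {
    "palm_open":    "unlock_screen",
    "ok":           "show_new_messages",
    "yeah":         "open_payment_page",
    "six":          "answer_incoming_call",
    "finger_heart": "open_system_favorites",
}

def dispatch(target_object, device):
    """Look up the operation instruction for a target object and execute it."""
    operation = GESTURE_TO_OPERATION.get(target_object)
    if operation is not None:
        device.execute(operation)    # assumed terminal-device interface
```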
In one example, a model capable of identifying the target object may be obtained by machine learning, and in the model application stage the learned model is used to identify the target object from the depth image data. For example, supervised learning may be adopted, performing model training with preset training samples to obtain a deep learning network model. The training samples may be labeled sample images, where the labels indicate the location and category of the target object, and the sample images may include depth image data. For each target object, sample images under different shooting angles and/or different deformation conditions may be included in order to improve the recognition rate of the model. As an example, image processing may be performed by an image signal processing (ISP) unit.
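For concreteness, the application stage of such a model might look like the following sketch; the detector output format and the confidence threshold are assumptions, not something the disclosure specifies:

```python
# Sketch of the recognition step, assuming `model` is a trained detector
# returning (category, bounding_box, score) tuples for depth image data.
def identify_target_object(model, depth_image, score_threshold=0.9):
    for category, bbox, score in model(depth_image):
        if score >= score_threshold:
            return category, bbox    # first confident detection
    return None, None
```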
It should be understood that the above-mentioned identification method of the target object is only an example, and should not be construed as any limitation to the present disclosure, and other existing or future methods of identifying the target object may be applied to the present disclosure, and all of them should be included in the scope of the present disclosure.
The operation executed by the terminal device may be an operation after the authentication is successful, or may be a previously designated operation. For example, the operation matched with the operation instruction includes one or more of the following:
lighting up a screen;
unlocking the screen;
starting the flashlight;
starting a designated application;
displaying a designated page of a designated application;
displaying a new message of a specified application;
and answering an incoming call.
Turning on the screen means controlling the screen to switch from the screen-off state to the screen-on state; the screen-off state is a state in which the screen is black, and the screen-on state is a state in which the screen is lit.
To ensure the security of personal information, users often lock the screen of the terminal device; after the screen is locked, the content on the terminal device can be viewed only after unlocking, for example by entering a password. This embodiment achieves automatic unlocking by identifying the target object.
The designated application may be an application installed in the terminal device, for example a system application or a third-party application, such as the system favorites/photos application. After the designated application is launched, its home page or default page may be shown. Illustratively, the system favorites are opened by a finger-heart gesture so that the user can view their content.
The designated page of the designated application may be a page the user wishes to view quickly, for example the payment page of a payment program, i.e., a page containing a payment code. The designated application may be an already started application or a not-yet-started one; a started application may be running in the foreground or in the background. Illustratively, the payment page of the payment program is opened by a "yeah" gesture.
The new message of the designated application may be all unread messages, or the most recently received unread message, and so on, as set according to requirements. The designated application may be an already started application or a not-yet-started one; a started application may be running in the foreground or in the background. For example, a new WeChat message or an unread message may be opened by an OK gesture.
Regarding answering an incoming call: at present, when an incoming call request is received, the call can be answered only by touching the answer button; this embodiment connects the call automatically by identifying the target object.
For example, regardless of the current state of the terminal device, once the target object is identified, the terminal device may be triggered to perform the operation matched with the operation instruction. For example, if the terminal device is currently in the screen-off state, the operation matched with the operation instruction can be completed starting from the screen-off state; jumping directly from the screen-off state to executing the corresponding operation improves interaction efficiency and brings a new experience to the user. As another example, if the terminal device is currently in the screen-on state, the operation matched with the operation instruction can be completed starting from the screen-on state.
The operations in the above examples may be combined, for example according to the current state of the terminal device. For instance, if the terminal device is currently in the screen-off state, then in order to display the designated page of the designated application, the screen may first be turned on and unlocked, and then the designated page displayed; if the designated application has not been started, it can be started before the designated page is displayed. The various operations in the above embodiments may be combined arbitrarily as long as the combination of features involves no conflict or contradiction. Some further operations may be needed to achieve the final purpose and are omitted here, but it should be understood that such indispensable intermediate operations are also included in the operations matching the operation instruction. In addition, the operations performed by the terminal device include, but are not limited to, the above operations; other operations are possible and are not listed here.
For example, the operation matched with the operation instruction may be an operation triggered from the screen-off state. For example, the operation matched with the operation instruction includes one or more of the following:
unlocking the screen, triggered in the screen-off state;
starting the flashlight, triggered in the screen-off state;
starting a designated application program, triggered in the screen-off state;
displaying a designated page of a designated application program, triggered in the screen-off state;
displaying a new message of a designated application program, triggered in the screen-off state;
and answering an incoming call, triggered in the screen-off state.
Therefore, this embodiment can detect the target object while the screen is black and thereby trigger operations such as unlocking the screen, starting the flashlight, starting a designated application, displaying a designated page of a designated application, displaying a new message of a designated application, or answering an incoming call, improving operation efficiency.
As can be seen from the above embodiments, in this embodiment a binocular camera module is arranged on the terminal device. Because the binocular camera module includes a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor, depth image data can be generated by comparing an image acquired by the first camera module with an event data stream acquired by the second camera module; a target object used for instructing the terminal device to execute an operation is identified from the depth image data, and, in response to the operation instruction corresponding to the target object, the terminal device is controlled to execute the operation matched with the operation instruction. Interaction with the terminal device is thus performed without touch or voice input; the generated data amount is low, the response speed is high, and identifying the target object from the depth image data improves identification accuracy.
Taking as an example the case where the operation performed by the terminal device includes an operation triggered while the screen is in the screen-off state (e.g., at least turning on the screen, that is, entering the screen-on state from the screen-off state), the image may include an image acquired by the first camera module while the screen is in the screen-off state, and the event data stream may include an event data stream acquired by the second camera module while the screen is in the screen-off state. As shown in fig. 4, fig. 4 is a flowchart of another interaction processing method according to an exemplary embodiment of the present disclosure. The method may be used in a terminal device provided with a binocular camera module, where the binocular camera module includes a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS, and includes the following steps:
in step 402, when the screen of the terminal device is in the screen-off state, obtaining an image acquired by the first camera module and an event data stream acquired by the second camera module;
in step 404, generating depth image data by comparing the image acquired by the first camera module with the event data stream acquired by the second camera module;
in step 406, identifying a target object used for instructing the terminal device to execute an operation from the depth image data; illustratively, the operation may include at least turning on the screen.
In step 408, in response to the operation instruction corresponding to the target object, the terminal device is controlled to execute an operation matched with the operation instruction.
Aspects of fig. 4 that are the same as in fig. 1 are not repeated here.
This embodiment can trigger the terminal device to execute a corresponding operation through the target object while the screen of the terminal device is in the screen-off state. For example: opening the system favorites with a finger-heart gesture in the screen-off state; opening the payment page of a payment program with a "yeah" gesture in the screen-off state; opening a new WeChat message or a new short message with an OK gesture in the screen-off state; answering a call in the locked-screen state with a "six" gesture; and unlocking the screen with a palm-open gesture in the screen-off state.
Jumping directly from the screen-off state to executing the corresponding operation improves interaction efficiency and brings a new experience to the user.
In one embodiment, in order to reduce the power consumption of the second camera module, different power consumption modes may be configured for it. The resolution of the image data acquired by the second camera module differs between modes, so they may be called resolution modes; correspondingly, the second camera module consumes different power in different modes, so they may also be called power consumption modes. The different resolution modes of the second camera module may be divided by the number of pixel units in the on (active) state in the dynamic vision sensor. Illustratively, a Low Resolution (LR) mode is configured for the second camera module, in which only part of its pixel units are in the working state. Taking a vision sensor with 1,000,000 pixels as an example, 1/N of the pixel units can be turned on and the rest turned off to reduce power consumption, where N can be set as required; alternatively, a specified number of pixel units can be controlled to be in the on state with the others off. The second camera module is also provided with at least one other resolution mode whose power consumption is higher than that of the low resolution mode; the number of pixel units of the second camera module in the working state in the low resolution mode is less than the number in the working state in the other resolution modes. Accordingly, the resolution of the image data acquired by the second camera module in the low resolution mode is lower than in the other resolution modes, as illustrated in the sketch below.
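As a rough illustration of the mode division, the following sketch builds on/off masks for the pixel-unit array; the every-n-th-column pattern and the mode table are assumptions, since the disclosure only fixes the 1/N ratio:

```python
import numpy as np

def pixel_enable_mask(height, width, n):
    """Turn on 1/n of the pixel units (here: every n-th column) and switch
    the rest off, following the 1/N scheme described above."""
    mask = np.zeros((height, width), dtype=bool)
    mask[:, ::n] = True
    return mask

# Illustrative mode table for a hypothetical 1-megapixel (1000 x 1000) sensor.
MODES = {
    "LR": pixel_enable_mask(1000, 1000, 16),   # only some pixel units working
    "HR": np.ones((1000, 1000), dtype=bool),   # all pixel units working
}
```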
In one embodiment, to enable detection of a target object, the second camera module may be in a normally open state in the low resolution mode, and mode switching is performed when the preset mode switching condition is met. For example, in the screen-off state of the terminal device, the second camera module is in the low resolution mode.
Therefore, in the embodiment, the second camera module is controlled to be in the normally open state in the low resolution mode, so that the real-time detection can be ensured, and the power consumption can be reduced.
The preset mode switching condition may include a preset condition for switching from the low resolution mode to another resolution mode, a switching condition between the other resolution modes, a condition for switching from another resolution mode back to the low resolution mode, and so on.
In one embodiment, the preset mode switching condition may include: and judging that the change of the current ambient light meets a preset change condition according to the event data stream acquired by the second camera module in the current mode.
The preset change condition is a preset condition for switching modes according to the change of the ambient light. For example, the preset change condition may be that the current light intensity change value of the ambient light is greater than a set threshold. The event data stream may include illumination intensity, so whether the light intensity change value of the current ambient light is greater than the set threshold can be determined from the illumination intensity of at least two frames of images. In another example, in addition to whether the light intensity change value of the current ambient light is greater than the set threshold, the number of pixel units that detected an illumination change can be taken into account when judging whether the change of the current ambient light satisfies the preset change condition, as in the sketch below.
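A minimal sketch of such a judgment, combining the intensity threshold with the count of triggered pixel units; both threshold parameters and the delta_intensity field are assumptions:

```python
def ambient_light_changed(events, intensity_threshold, min_triggered_pixels):
    """Judge whether the change of the current ambient light meets the preset
    change condition: the light-intensity change exceeds a set threshold at
    a sufficient number of pixel units."""
    triggered = [e for e in events
                 if abs(e.delta_intensity) > intensity_threshold]
    return len(triggered) >= min_triggered_pixels
```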
This condition can serve as the condition for switching from the low resolution mode to another resolution mode: whether the current ambient light change meets the preset change condition is judged from the data collected by the second camera module in the low resolution mode; when the preset change condition is met, the second camera module is triggered to switch to the next mode, and otherwise the low resolution mode is maintained. For example, when the switching condition is satisfied, a switching notification may be sent to the camera module so that it performs the mode switch.
By way of example, the other resolution modes may include a High Resolution (HR) mode, in which all pixel units of the second camera module are in the working state so that it acquires image data of higher resolution. The event data stream collected by the second camera module in step 102 may be the event data stream collected in the high resolution mode; correspondingly, the event data stream used for generating the depth image data is the event data stream acquired by the second camera module in the high resolution mode. Correspondingly, the method further comprises the following steps:
acquiring an event data stream acquired by the second camera module in a low resolution mode;
and controlling the second camera module to be switched from the low resolution mode to the high resolution mode when the change of the current ambient light is judged to meet the preset change condition according to the event data stream acquired by the second camera module in the low resolution mode.
In this embodiment, the DVS stays normally open in the low resolution mode, detecting only changes in ambient light. When the detected change of the ambient light is greater than the set threshold, the DVS high resolution mode is triggered and it is identified whether the target object is present, so real-time detection is guaranteed while power consumption is reduced.
In another embodiment, whether to provide several other resolution modes may be determined by the resolution the images need for identifying the target object. For example, in one embodiment different types of target objects are mapped to different operation instructions; some target objects can be identified from medium resolution images alone while others require high resolution images, so several levels of resolution modes may be configured.
Whether to provide several other resolution modes may also depend on whether it is necessary to first detect whether an object to be recognized is present before recognizing the target object. In some scenarios, it may first be determined whether an object to be recognized exists in the image data, and only then whether that object is the target object, so as to improve recognition accuracy; the presence of an object to be recognized is the basis/prerequisite for performing target object recognition. Taking the target object as a designated face, the object to be recognized is a face: it can first be judged whether a face exists in the image data, and only if one exists whether it is the designated face. Taking the target object as a body posture, the object to be recognized is a person: it can first be judged whether a person is present in the image data, and only once a person's presence is established can the person's body posture be determined. Accordingly, the preset mode switching condition may include: judging, from the event data stream acquired by the second camera module in the current mode, that an object to be identified exists in the acquisition area of the second camera module.
The presence of the object to be identified in the image data is the basis/precondition for performing target object identification. Judging whether the object to be identified exists in the acquisition area of the second camera module can serve as the condition for switching from the low resolution mode to another resolution mode, or as a switching condition between the other resolution modes.
As for how to determine whether the object to be recognized exists in the acquisition area of the second camera module: in one embodiment this may be judged from whether the outline of the object to be recognized appears in the image data. Other means may of course be adopted, for example related-art algorithms for detecting whether a face or a person is present, which are not described in detail here.
For example, the other resolution modes may include a Middle Resolution (MR) mode and a high resolution mode, where the numbers of pixel units of the second camera module in the working state in the low, middle, and high resolution modes increase in that order. The event data stream collected by the second camera module in step 102 may be the event data stream acquired in the high resolution mode, that is, the event data stream used for generating the depth image data is the one acquired by the second camera module in the high resolution mode. Correspondingly, the method further comprises the following steps:
acquiring an event data stream acquired by the second camera module in a low resolution mode;
when the change of the current ambient light is judged to meet a preset change condition according to an event data stream acquired by a second camera module in a low resolution mode, controlling the second camera module to be switched from the low resolution mode to a medium resolution mode;
acquiring an event data stream acquired by the second camera module in a medium resolution mode;
and when the object to be identified exists in the acquisition area of the second camera module according to the event data stream acquired by the second camera module in the medium resolution mode, controlling the second camera module to be switched from the medium resolution mode to the high resolution mode.
This embodiment configures three levels of resolution modes that are switched in succession, so power consumption can be reduced.
The condition for switching from another resolution mode back to the low resolution mode may be: judging, from the event data stream acquired by the second camera module in the low resolution mode, that the change value of the current ambient light is less than or equal to the set threshold; or judging, from the event data stream acquired by the second camera module in the middle resolution mode, that no object to be identified exists in the acquisition area of the second camera module; or a preset delay time having elapsed after the terminal device executed the operation matched with the operation instruction; and so on. Setting conditions for switching the other resolution modes back to the low resolution mode ensures that the second camera module is in the low resolution mode most of the time, further reducing power consumption. The sketch below summarizes the switching logic.
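Putting the conditions of this embodiment together, the three-level switching might be sketched as follows; the helper predicates (ambient_light_changed from the earlier sketch, and an assumed object_present check) and the threshold values are placeholders, not the disclosure's own:

```python
def next_mode(current_mode, events, object_present):
    if current_mode == "LR":
        # preset change condition met -> step up to medium resolution
        return "MR" if ambient_light_changed(events, 0.2, 50) else "LR"
    if current_mode == "MR":
        # object to be identified present -> step up to high resolution
        return "HR" if object_present(events) else "MR"
    if current_mode == "HR":
        # no object left (or operation completed) -> fall back to low resolution
        return "LR" if not object_present(events) else "HR"
    return "LR"
```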
It should be understood that the preset mode switching condition is only an example and should not be construed as any limitation to the present disclosure, and other existing or future conditions for triggering mode switching may be applicable to the present disclosure and shall be included in the protection scope of the present disclosure.
In one embodiment, to save power, the first camera module is started only when a preset starting condition is met. The preset starting condition is a preset condition for starting the first camera module; in one embodiment, it may include any one of:
when the second camera module is controlled to be switched from the low resolution mode to other resolution modes;
when the second camera module is controlled to be switched to the high resolution mode.
In this way, the first camera module is started only on demand, avoiding the power waste caused by keeping the first camera module always on.
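A minimal sketch of this on-demand activation, in which the camera interface and the mode names are placeholders:

    def on_mode_switch(first_camera, old_mode, new_mode):
        # Preset activation condition: leaving the low resolution mode
        # (equivalently, being switched to a higher resolution mode).
        if old_mode == "LR" and new_mode in ("MR", "HR"):
            first_camera.start()  # start the image-sensor module only now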
The technical features in the above embodiments can be combined arbitrarily as long as the combinations contain no conflict or contradiction. For reasons of space the combinations are not described one by one, but any combination of the technical features in the above embodiments also belongs to the scope disclosed in this specification.
One of the combinations is exemplified below.
As shown in fig. 5, fig. 5 is a flowchart of another interaction processing method shown in the present disclosure according to an exemplary embodiment. The method may be used in a terminal device provided with a binocular camera module, the binocular camera module includes a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS, the first camera module is a front camera module, and the method includes the following steps (a sketch of the full loop is given after the steps):
In step 502, an event data stream acquired by the second camera module in the low resolution mode is acquired.
In step 504, it is determined whether the current ambient light variation value is greater than a set threshold according to the event data stream acquired by the second camera module in the low resolution mode; if not, the process returns to step 502, and if so, the process proceeds to step 506.
In step 506, the second camera module is controlled to switch from the low resolution mode to the high resolution mode, and the first camera module is started.
In step 508, an event data stream acquired by the second camera module in the high resolution mode and an image acquired by the first camera module are acquired.
In step 510, depth image data is generated by comparing the image acquired by the first camera module with the event data stream acquired by the second camera module.
In step 512, a target object used for instructing the terminal device to execute an operation is identified from the depth image data.
In step 514, in response to an operation instruction corresponding to the target object, the terminal device is controlled to execute an operation matched with the operation instruction.
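Steps 502 to 514 may be sketched as one loop; every argument below (the camera interfaces, the threshold test, the depth-generation and recognition functions, and the operation table) is an assumed placeholder for the corresponding module described above:

    def interaction_loop(dvs, rgb, exceeds_threshold, depth_from,
                         detect_target, operations):
        while True:
            events = dvs.read()                   # step 502 (low resolution)
            if not exceeds_threshold(events):     # step 504
                continue                          # change too small: retry
            dvs.set_mode("HR")                    # step 506
            rgb.start()
            events_hr = dvs.read()                # step 508 (high resolution)
            image = rgb.capture()
            depth = depth_from(image, events_hr)  # step 510
            target = detect_target(depth)         # step 512
            if target in operations:
                operations[target]()              # step 514: matched operation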
Steps in fig. 5 that are the same as those in fig. 1 or fig. 4 are not repeated herein.
In this embodiment, the second camera module is configured with a low resolution mode (LR) and a high resolution mode (HR), and mode switching is performed according to different scenes to reduce power consumption.
In one example, the operation of the terminal device may be limited to an operation triggered in the screen-off state; step 502 may be performed when the screen of the terminal device is in the screen-off state, and when the screen of the terminal device is in a bright-screen state, the second camera module may be switched to the low resolution mode or turned off. This embodiment exploits the low power consumption of the DVS so that the target object can be identified in the screen-off state, the user can unlock the screen without first lighting it, and, in combination with the target object, the terminal device can be controlled to directly execute a specified operation once the screen is lit.
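The screen-state gating described here may be sketched as follows; the screen query and the dvs interface are assumptions:

    def gate_detection(screen_is_off, dvs):
        if screen_is_off:
            return True     # proceed with step 502 while the screen is off
        dvs.set_mode("LR")  # bright screen: drop to the low resolution mode
        return False        # ...or power the DVS module off, per this embodiment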
Corresponding to the embodiment of the interaction processing method, the disclosure also provides embodiments of an interaction processing device, equipment applied by the device and a storage medium.
As shown in fig. 6, fig. 6 is a block diagram of an interaction processing apparatus according to an exemplary embodiment, the apparatus is provided in a terminal device, the terminal device is provided with a binocular camera module, the binocular camera module includes a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS, and the apparatus includes:
a data generation module 62, configured to generate depth image data by comparing the image acquired by the first camera module with the event data stream acquired by the second camera module;
an object identification module 64, configured to identify, from the depth image data, a target object used for instructing the terminal device to perform an operation;
and an operation control module 66, configured to respond to an operation instruction corresponding to the target object and control the terminal device to execute an operation matched with the operation instruction.
In one embodiment, the operation executed by the terminal device includes an operation triggered in the screen-off state, the image is an image acquired by the first camera module when the screen is in the screen-off state, and the event data stream is an event data stream acquired by the second camera module when the screen is in the screen-off state.
In one embodiment, the second camera module is configured with a low resolution mode and at least one other resolution mode, the number of pixel units of the second camera module in a working state in the low resolution mode is less than the number of pixel units of the second camera module in a working state in the other resolution mode, and different modes are switched when a preset mode switching condition is met.
In an embodiment, as shown in fig. 7, fig. 7 is a block diagram of another interaction processing apparatus shown in the present disclosure according to an exemplary embodiment, and on the basis of the foregoing embodiment shown in fig. 6, the apparatus further includes a starting module 70, configured to start the first camera module when a preset starting condition is met;
the preset starting condition may include any one of:
when the second camera module is controlled to be switched from the low resolution mode to other resolution modes;
when the second camera module is controlled to be switched to the high resolution mode.
In one embodiment, the preset mode switching condition includes any one of the following conditions:
judging that the change of the current ambient light meets a preset change condition according to an event data stream acquired by the second camera module in the current mode;
and judging that the object to be identified exists in the acquisition area of the second camera module according to the event data stream acquired by the second camera module in the current mode.
In an embodiment, as shown in fig. 8A, fig. 8A is a block diagram of another interaction processing apparatus shown in the present disclosure according to an exemplary embodiment, which is based on the foregoing embodiment shown in fig. 6, where the other resolution modes include a high resolution mode, and the event data stream for generating the depth image data is an event data stream acquired by the second camera module in the high resolution mode; the device further comprises:
a data acquisition module 82, configured to acquire an event data stream acquired by the second camera module in a low resolution mode;
and a mode switching module 84, configured to control the second camera module to switch from the low resolution mode to the high resolution mode when it is determined that the current change of the ambient light meets a preset change condition according to the event data stream acquired by the second camera module in the low resolution mode.
In an embodiment, as shown in fig. 8B, fig. 8B is a block diagram of another interaction processing apparatus shown in the present disclosure according to an exemplary embodiment, which is based on the foregoing embodiment shown in fig. 6, where the other resolution modes include a medium resolution mode and a high resolution mode, and the event data stream for generating the depth image data is an event data stream acquired by the second camera module in the high resolution mode; the device further comprises:
a data acquisition module 86, configured to acquire an event data stream acquired by the second camera module in a low resolution mode;
the mode switching module 88 is configured to control the second camera module to switch from the low resolution mode to the medium resolution mode when it is determined that the change of the current ambient light meets a preset change condition according to the event data stream acquired by the second camera module in the low resolution mode;
the data acquisition module 86 is further configured to acquire an event data stream acquired by the second camera module in the medium-resolution mode;
the mode switching module 88 is further configured to control the second camera module to switch from the medium resolution mode to the high resolution mode when it is determined, according to the event data stream acquired by the second camera module in the medium resolution mode, that the object to be identified exists in the acquisition area of the second camera module.
In one embodiment, the target object includes a specified gesture, a specified face, and/or a specified body posture.
In one embodiment, a mapping relation between the target object and the operation instruction is pre-configured, and the operation matched with the operation instruction includes one or more of the following (an illustrative mapping is sketched after the list):
unlocking the screen, triggered in the screen-off state;
turning on a flashlight, triggered in the screen-off state;
starting a designated application program, triggered in the screen-off state;
displaying a designated page of a designated application program, triggered in the screen-off state;
displaying a new message of a designated application program, triggered in the screen-off state;
and answering a call from a dialing party in the screen-off state.
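An illustrative pre-configured mapping, in which all keys and handler names are assumptions:

    # Hypothetical mapping from recognized target objects to operations
    # triggered in the screen-off state.
    OPERATION_MAP = {
        "palm_gesture": "unlock_screen",
        "v_gesture": "turn_on_flashlight",
        "owner_face": "open_designated_app",
        "nod_posture": "answer_incoming_call",
    }

    def dispatch(target_object, handlers):
        action = OPERATION_MAP.get(target_object)
        if action is not None:
            handlers[action]()  # execute the operation matched with the instruction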
Correspondingly, the present disclosure also provides a computer device, including a binocular camera module, a memory, a processor, and a computer program stored in the memory and executable on the processor, where the binocular camera module includes a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS, and the processor implements the method as described in any one of the above when executing the program.
Accordingly, the present disclosure also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
The present disclosure may take the form of a computer program product embodied on one or more storage media including, but not limited to, disk storage, CD-ROM, optical storage, and the like, having program code embodied therein. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of the storage medium of the computer include, but are not limited to: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium, may be used to store information that may be accessed by a computing device.
The specific details of the implementation process of the functions and actions of each module in the device are referred to the implementation process of the corresponding step in the method, and are not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the disclosed solution. One of ordinary skill in the art can understand and implement it without inventive effort.
As shown in fig. 9, fig. 9 is a hardware structure diagram of a computer device in which an interaction processing apparatus according to an exemplary embodiment of the present disclosure is located. The apparatus 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to fig. 9, apparatus 900 may include one or more of the following components: processing component 902, memory 904, power component 906, multimedia component 908, audio component 910, input/output (I/O) interface 912, sensor component 914, and communication component 916. The device 900 further comprises a binocular camera module comprising a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS, not shown in fig. 9.
The processing component 902 generally controls overall operation of the device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing component 902 may include one or more processors 920 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 902 can include one or more modules that facilitate interaction between processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation at the apparatus 900. Examples of such data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 906 provides power to the various components of the device 900. The power components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 900.
The multimedia component 908 comprises a screen providing an output interface between the device 900 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 900 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, audio component 910 includes a Microphone (MIC) configured to receive external audio signals when apparatus 900 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 also includes a speaker for outputting audio signals.
I/O interface 912 provides an interface between processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessment of various aspects of the apparatus 900. For example, sensor assembly 914 may detect an open/closed state of device 900, the relative positioning of components, such as a display and keypad of device 900, the change in position of device 900 or one of the components of device 900, the presence or absence of user contact with device 900, the orientation or acceleration/deceleration of device 900, and the change in temperature of device 900. The sensor assembly 914 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate communications between the apparatus 900 and other devices in a wired or wireless manner. The apparatus 900 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 904 comprising instructions, executable by the processor 920 of the apparatus 900 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Wherein the instructions in the storage medium, when executed by the processor, enable the apparatus 900 to perform any of the above-described interaction processing methods.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
The above description is only exemplary of the present disclosure and should not be taken as limiting the disclosure, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (16)

1. An interactive processing method is applied to a terminal device, the terminal device is provided with a binocular camera module, the binocular camera module comprises a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS, and the method comprises the following steps:
when the screen of the terminal device is in a screen-off state, controlling the second camera module to be in a normally-open state in a low resolution mode, and acquiring an event data stream acquired by the second camera module in a current mode;
when it is determined according to the event data stream that a preset mode switching condition is met, controlling the second camera module to switch from the low resolution mode to another resolution mode, and starting the first camera module; wherein the number of pixel units of the second camera module working in the low resolution mode is less than the number of pixel units working in the other resolution modes;
generating depth image data by comparing an image acquired by a first camera module with an event data stream acquired by a second camera module in a current mode;
identifying a target object used for instructing the terminal equipment to execute operation from the depth image data;
and responding to the operation instruction corresponding to the target object, and controlling the terminal equipment to execute the operation matched with the operation instruction.
2. The method according to claim 1, wherein the operation performed by the terminal device comprises an operation triggered in the screen-off state, the image is an image acquired by the first camera module when the screen is in the screen-off state, and the event data stream is an event data stream acquired by the second camera module when the screen is in the screen-off state.
3. The method of claim 1, wherein the second camera module is configured with at least one other resolution mode.
4. The method of claim 3, wherein the other resolution modes comprise a high resolution mode.
5. The method according to claim 3, wherein the preset mode switching condition comprises any one of the following conditions:
judging that the change of the current ambient light meets a preset change condition according to an event data stream acquired by the second camera module in the current mode;
and judging that the object to be identified exists in the acquisition area of the second camera module according to the event data stream acquired by the second camera module in the current mode.
6. The method of claim 5, wherein the other resolution modes include a high resolution mode, and wherein the event data stream used to generate the depth image data is an event data stream acquired by the second camera module in the high resolution mode;
the method further comprises the following steps:
acquiring an event data stream acquired by the second camera module in a low resolution mode;
and controlling the second camera module to be switched from the low resolution mode to the high resolution mode when the change of the current ambient light is judged to meet the preset change condition according to the event data stream acquired by the second camera module in the low resolution mode.
7. The method of claim 5, wherein the other resolution modes include a medium resolution mode and a high resolution mode, and wherein the event data stream used to generate the depth image data is an event data stream acquired by the second camera module in the high resolution mode;
the method further comprises the following steps:
acquiring an event data stream acquired by the second camera module in a low resolution mode;
when the change of the current ambient light is judged to meet a preset change condition according to an event data stream acquired by a second camera module in a low resolution mode, controlling the second camera module to be switched from the low resolution mode to a medium resolution mode;
acquiring an event data stream acquired by the second camera module in a medium resolution mode;
and when it is determined, according to the event data stream acquired by the second camera module in the medium resolution mode, that the object to be identified exists in the acquisition area of the second camera module, controlling the second camera module to switch from the medium resolution mode to the high resolution mode.
8. The method of any one of claims 1 to 7, wherein the target object comprises a specified gesture, a specified face, and/or a specified body posture.
9. The method according to any one of claims 1 to 7, wherein a mapping relation between the target object and the operation instruction is pre-configured, and the operation matched with the operation instruction comprises one or more of the following:
unlocking the screen, triggered in the screen-off state;
turning on a flashlight, triggered in the screen-off state;
starting a designated application program, triggered in the screen-off state;
displaying a designated page of a designated application program, triggered in the screen-off state;
displaying a new message of a designated application program, triggered in the screen-off state;
and answering a call from a dialing party in the screen-off state.
10. An interaction processing apparatus, wherein the apparatus is provided in a terminal device, the terminal device is provided with a binocular camera module, the binocular camera module comprises a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS, and the apparatus comprises:
a data generation module, configured to control the second camera module to be in a normally-open state in a low resolution mode and acquire an event data stream acquired by the second camera module in a current mode; when it is determined according to the event data stream that a preset mode switching condition is met, control the second camera module to switch from the low resolution mode to another resolution mode and start the first camera module, wherein the number of pixel units of the second camera module working in the low resolution mode is less than the number of pixel units working in the other resolution modes; and generate depth image data by comparing an image acquired by the first camera module with an event data stream acquired by the second camera module in the current mode;
an object identification module, configured to identify, from the depth image data, a target object used for instructing the terminal device to perform an operation;
and an operation control module, configured to respond to an operation instruction corresponding to the target object and control the terminal device to execute an operation matched with the operation instruction.
11. The apparatus according to claim 10, wherein the operation performed by the terminal device comprises an operation triggered in the screen-off state, the image is an image captured by the first camera module when the screen is in the screen-off state, and the event data stream is an event data stream captured by the second camera module when the screen is in the screen-off state.
12. The apparatus of claim 10, wherein the second camera module is configured with at least one other resolution mode.
13. The apparatus of claim 12, wherein the other resolution mode comprises a high resolution mode.
14. The apparatus of claim 12, wherein the preset mode switching condition comprises any one of the following conditions:
judging that the change of the current ambient light meets a preset change condition according to an event data stream acquired by the second camera module in the current mode;
and judging that the object to be identified exists in the acquisition area of the second camera module according to the event data stream acquired by the second camera module in the current mode.
15. A computer device comprising a binocular camera module, a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the binocular camera module comprises a first camera module based on an image sensor and a second camera module based on a dynamic vision sensor DVS, and the processor, when executing the program, implements the method of any one of claims 1 to 9.
16. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 9.
CN201910424719.0A 2019-05-21 2019-05-21 Interaction processing method, device, equipment and storage medium Active CN111988493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910424719.0A CN111988493B (en) 2019-05-21 2019-05-21 Interaction processing method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111988493A CN111988493A (en) 2020-11-24
CN111988493B true CN111988493B (en) 2021-11-30

Family

ID=73436925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910424719.0A Active CN111988493B (en) 2019-05-21 2019-05-21 Interaction processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111988493B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989925B (en) * 2021-02-02 2022-06-10 豪威芯仑传感器(上海)有限公司 Method and system for identifying hand sliding direction
CN115476366B (en) * 2021-06-15 2024-01-09 北京小米移动软件有限公司 Control method, device, control equipment and storage medium for foot robot
WO2023168836A1 (en) * 2022-03-11 2023-09-14 亮风台(上海)信息科技有限公司 Projection interaction method, and device, medium and program product
CN117218716B (en) * 2023-08-10 2024-04-09 中国矿业大学 DVS-based automobile cabin gesture recognition system and method
CN118035689A (en) * 2024-04-11 2024-05-14 中国信息通信研究院 Intelligent equipment online operation system based on real-time three-dimensional model reconstruction


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10659764B2 (en) * 2016-06-20 2020-05-19 Intel Corporation Depth image provision apparatus and method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101393609A (en) * 2008-09-18 2009-03-25 北京中星微电子有限公司 Target detection tracking method and device
CN105844659A (en) * 2015-01-14 2016-08-10 北京三星通信技术研究有限公司 Moving part tracking method and device
CN106774850A (en) * 2016-11-24 2017-05-31 深圳奥比中光科技有限公司 A kind of mobile terminal and its interaction control method
CN106774947A (en) * 2017-02-08 2017-05-31 亿航智能设备(广州)有限公司 A kind of aircraft and its control method
CN109274896A (en) * 2018-09-26 2019-01-25 信利光电股份有限公司 A kind of image-pickup method and imaging sensor

Also Published As

Publication number Publication date
CN111988493A (en) 2020-11-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant