WO2022179412A1 - Identification method and electronic device - Google Patents

Identification method and electronic device

Info

Publication number
WO2022179412A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
electronic device
information
parts
key point
Prior art date
Application number
PCT/CN2022/076403
Other languages
English (en)
French (fr)
Inventor
王兆雪
张凯
罗毅
王贺
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022179412A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F 3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F 3/041 - Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
    • G06F 3/042 - Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means by opto-electronic means
    • G06F 3/0425 - Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means by opto-electronic means using a single imaging device like a video camera for tracking the absolute position of a single or a plurality of objects with respect to an imaged reference surface, e.g. video camera imaging a display or a projection screen, a table or a wall surface, on which a computer generated image is displayed or projected
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 3/04815 - Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object

Definitions

  • the present application relates to the technical field of human-computer interaction, and in particular, to an identification method and an electronic device.
  • augmented reality (AR) glasses and virtual reality (VR) glasses can capture an image of the user through a configured optical camera, and identify the user's moving part (such as a finger performing a tapping action) according to the image, so as to obtain the information entered by the user (such as text information).
  • the captured images may have motion blur, resulting in inaccurate recognition results.
  • when users wear VR glasses, they can input text through a virtual keyboard in the virtual world.
  • the VR glasses can take an image of the user's hand through the built-in optical camera, obtain the coordinates of the key points of the user's hand according to the image, then identify motion information such as the tapped finger according to the coordinates, and finally combine the already-entered text to obtain the text currently entered by the user.
  • when the color of the user's hand is similar to that of the background, or the finger tapping speed is too fast, the captured image suffers from motion blur, resulting in low accuracy of the obtained hand key points and inaccurate text.
  • the embodiments of the present application disclose an identification method and an electronic device, which allow a user to interact with the electronic device without external devices such as keyboards and wearable devices, improve the identification accuracy of moving parts, and obtain more accurate input information.
  • an embodiment of the present application provides an identification method, which is applied to an electronic device.
  • the method includes: the electronic device acquires a first image and a second image, where the first image is an image captured by an optical camera, the second image is an image captured by an event camera, and the second image is determined according to the brightness changes of the pixels of the moving target object;
  • the electronic device obtains the key point information of N parts of the target object according to the first image, where N is a positive integer;
  • the electronic device determines the first moving part among the N parts according to the key point information and the grayscale values of the second image, where a part with a higher motion frequency has larger pixel grayscale values in the second image.
  • the electronic device can combine the optical camera and the event camera to determine the moving part of the target object. Even if the first image has problems such as motion blur, overexposure, or dim light, more accurate motion information can be obtained by combining it with the second image.
  • users can interact with electronic devices without external devices such as keyboards, wearable devices, and third-person-view cameras, which enhances the interaction capabilities of electronic devices and makes them more convenient to use.
  • the electronic device acquiring the key point information of the N parts of the target object according to the first image includes: the electronic device fuses the first image and the second image to obtain a third image; the electronic device acquires the key point information of the N parts according to the third image.
  • the electronic device can realize key point recognition by combining the optical camera and the event camera, which reduces the influence of the image quality of the first image on the key point detection accuracy in cases such as motion blur or when the object and the background are similar in color and texture, so the motion information obtained from the key points is also more accurate and robust.
  • the method further includes: the electronic device determines, according to parameters of the first image, a first weight of the first image and a second weight of the second image; the parameters of the first image include at least one of the following: the distribution, mean, and standard deviation of the grayscale histogram; and the electronic device fusing the first image and the second image to obtain the third image includes: the electronic device fuses the first image and the second image based on the first weight and the second weight to obtain the third image.
  • the electronic device determining the first weight of the first image and the second weight of the second image according to the parameters of the first image includes: when a preset condition is satisfied, the electronic device sets the first weight and the second weight to a first preset value and a second preset value, respectively, where the first preset value is smaller than the second preset value; the preset condition includes at least one of the following: the distribution of the grayscale histogram of the first image is concentrated in a fixed interval, the mean of the first image is greater than a first threshold, the mean of the first image is less than a second threshold, and the standard deviation of the first image is less than a third threshold, where the first threshold is greater than the second threshold.
  • in this way, when the first image is overexposed or too dim, the electronic device may reduce the weight occupied by the first image when acquiring the third image, and increase the weight occupied by the second image.
  • the third image is used to obtain the key points, so this reduces the influence of the quality of the first image on the detection accuracy of the key points in cases of overexposure or dim light, and the motion information obtained according to the key points is also more accurate.
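To make the preset condition concrete, here is a minimal sketch of such a weight-selection rule; the threshold values are hypothetical placeholders, and the 0.3/0.7 and 0.5/0.5 weight pairs reuse the example values given later in this description:

```python
import numpy as np

def choose_fusion_weights(first_image: np.ndarray,
                          mean_high: float = 200.0,  # hypothetical "first threshold" (too bright)
                          mean_low: float = 40.0,    # hypothetical "second threshold" (too dark)
                          std_min: float = 15.0):    # hypothetical "third threshold" (low contrast)
    """Return (first_weight, second_weight) for fusing the two images.

    If the optical image's grayscale histogram is concentrated in a narrow
    interval, or the image is too bright, too dark, or low-contrast, the
    first weight gets the smaller preset value, as the text describes.
    """
    gray = first_image.astype(np.float32)
    hist, _ = np.histogram(gray, bins=256, range=(0.0, 255.0))
    concentrated = hist.max() / max(hist.sum(), 1) > 0.5  # mass piled in one narrow bin
    too_bright = gray.mean() > mean_high
    too_dark = gray.mean() < mean_low
    low_contrast = gray.std() < std_min

    if concentrated or too_bright or too_dark or low_contrast:
        return 0.3, 0.7  # first preset value < second preset value
    return 0.5, 0.5      # default equal weights
```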
  • the electronic device acquiring the key point information of the N parts according to the third image includes: the electronic device identifies the target area where the N parts are located in the third image; the electronic device identifies the key point information of the N parts in the target area.
  • the electronic device can first obtain the target area where the N parts of the target object are located, and then perform key point detection based on the target area, without performing key point detection in areas outside the target area, which avoids unnecessary processing, reduces processing load, and improves usability.
  • the electronic device determining the first moving part among the N parts according to the key point information and the grayscale values of the second image includes: the electronic device determines M moving parts from the N parts according to the key point information, where M is less than or equal to N and M is a positive integer, and the key point information includes the coordinates of at least one key point on each of the N parts; the electronic device then determines the first part from the M parts based on the grayscale values of the second image, where the grayscale values of the pixels of the first part in the second image are greater than a preset grayscale threshold, or are greater than the grayscale values in the second image of the pixels of the other parts among the M parts.
  • the electronic device can thus first obtain the motion information (that is, the M moving parts) according to the key point information, and then screen the motion information according to the grayscale values of the second image, so as to obtain a more accurate moving part (the first part) with higher recognition accuracy.
  • optionally, the difference between the coordinates of the key points on the M parts at a first moment and their coordinates at a second moment is greater than a first preset difference, where the first moment and the second moment are adjacent moments; or, the difference between the coordinates of the key points on the M parts and preset coordinates is greater than a second preset difference; or, where the difference between the coordinates of the key points on the M parts at the first moment and the preset coordinates is a first difference, and the difference between the coordinates of the key points on the M parts at the second moment and the preset coordinates is a second difference, the difference between the first difference and the second difference is greater than a third preset difference.
  • the preset coordinates are the coordinates of the keys corresponding to the key points on the virtual keyboard displayed by the electronic device.
  • the possible moving parts can be determined by the difference between the coordinates of the key points and the coordinates of the keys on the virtual keyboard.
  • compared with directly determining the first part from all N parts, first screening out the M possibly-moving parts reduces the processing delay, and the application scenarios are more extensive.
  • the first part is used for the electronic device to determine the first information input by the target object through the first part.
  • the electronic device can combine the optical camera and the event camera to improve the detection accuracy of the moving part, and the moving part is used to obtain the information input by the target object, so the recognition accuracy of the input information can be improved.
  • users can input information to electronic devices without external devices such as keyboards, wearable devices, or third-person-view cameras, and users can even use electronic devices normally in mobile scenarios, which is more convenient.
  • the method further includes: the electronic device determines Q pieces of information according to the first part, where Q is a positive integer and the Q pieces of information include the first information; the electronic device determines the first information from the Q pieces of information according to second information, where the second information is information input by the target object before the first information is input.
  • the electronic device can infer the first information input by the user through the first part by combining the second information already input by the target object with the moving first part, so as to improve the recognition accuracy of the input information.
  • optionally, the first part may be a plurality of parts.
  • optionally, the target object is a user, and the first part is a finger of the user.
  • the electronic device can combine the optical camera and the event camera to recognize the user's moving finger, and thereby recognize the information input by the user through the finger based on the virtual keyboard; the obtained input information has higher precision and no physical keyboard is required, which greatly facilitates use.
  • an embodiment of the present application provides an electronic device, the electronic device includes one or more memories and one or more processors; the one or more memories are used to store a computer program, and the one or more processors are used to invoke the computer program; the computer program includes instructions which, when executed by the one or more processors, cause the electronic device to execute the identification method provided by the first aspect and any implementation manner of the first aspect.
  • the aforementioned electronic device includes the aforementioned event camera and the aforementioned optical camera.
  • the above electronic device is a virtual reality device, an augmented reality device or a mixed reality device.
  • embodiments of the present application provide a computer storage medium, including a computer program, where the computer program includes instructions that, when run on a processor, implement the identification method provided by the first aspect and any implementation manner of the first aspect.
  • an embodiment of the present application provides a computer program product that, when run on an electronic device, causes the electronic device to execute the identification method provided by the first aspect and any implementation manner of the first aspect.
  • an embodiment of the present application provides a chip, where the chip includes at least one processor and an interface circuit, and optionally, the chip further includes a memory; the memory, the interface circuit, and the at least one processor are interconnected through lines, and a computer program is stored in the memory; when the computer program is executed by the at least one processor, the identification method provided by the first aspect and any implementation manner of the first aspect is implemented.
  • the electronic device provided in the second aspect, the computer storage medium provided in the third aspect, the computer program product provided in the fourth aspect, and the chip provided in the fifth aspect are all used to execute the identification method provided by the first aspect and any implementation manner of the first aspect. Therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects of the identification method provided in the first aspect, which will not be repeated here.
  • FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a processing process provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of an identification method provided by an embodiment of the present application.
  • the embodiment of the present application provides an identification method, which can be applied to an electronic device.
  • the electronic device may acquire a first image and a second image, wherein the first image is captured by an optical camera, and the second image is captured by an event camera. Then, the electronic device can recognize the key point information of the target object (e.g., the key point coordinates of the user's hand) based on the first image and the second image, and combine the key point information, the second image, and optionally the second information that the user has already input, to obtain the first information currently input by the user.
  • the output of the event camera is the brightness change at the pixel level. That is to say, in the pixel array, when the brightness change of a pixel is greater than a preset brightness threshold, the pixel generates an output, which can be called an "event". Therefore, when the object captured by the event camera is not moving, the output of the event camera is a black image; when the object moves, causing the brightness of multiple pixels to change, the output of the event camera reflects the moving object. Event cameras have the advantages of not depending on lighting conditions, low latency, and sensitivity to small, fast movements. Therefore, the second image is determined according to the brightness changes of the pixels of the moving target object, and the second image includes the moving parts of the target object.
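As an illustration of how an event stream becomes such a second image, the sketch below accumulates events into a grayscale frame; the (x, y, timestamp, polarity) tuple layout is an assumed representation for illustration, not any specific camera's API:

```python
import numpy as np

def events_to_frame(events, height: int, width: int) -> np.ndarray:
    """Accumulate an event stream into one grayscale frame.

    `events` is an iterable of (x, y, timestamp, polarity) tuples (an
    assumed layout). Pixels with no brightness change stay 0 (black);
    pixels that fire repeatedly accumulate larger values, matching the
    text's claim that faster-moving parts have larger grayscale values
    in the second image.
    """
    frame = np.zeros((height, width), dtype=np.float32)
    for x, y, _t, _polarity in events:
        frame[y, x] += 1.0  # one increment per event at that pixel
    if frame.max() > 0:
        frame *= 255.0 / frame.max()  # normalize to the 0..255 range
    return frame.astype(np.uint8)
```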
  • the application combines the images output by the optical camera and the event camera to identify the moving parts of the target object, which reduces the influence of image quality on the detection accuracy of the key point information in the case of motion blur, so the recognition results (that is, the moving part of the target object and/or the first information) are also more accurate.
  • users can input information to electronic devices and perform human-computer interaction without external devices such as keyboards, wearable devices, or third-person-view cameras. Even in mobile scenarios, users can use electronic devices normally, which is more convenient.
  • the output of an optical camera can be a complete image composed of multiple pixels, such as an RGB image.
  • when the captured object moves quickly, the image has the problem of motion blur.
  • the imaging effect of optical cameras is greatly affected by light, and the image quality is poor when the light is too bright or too dark.
  • the electronic device may obtain multiple frames of images including the user's hand by using an optical camera, and obtain position information of key points of the user's hand according to the multiple frames of images (for example, including the coordinates of the key points of the hand relative to the center of the optical camera, and the coordinates of the hand key points relative to the center of the hand).
  • the electronic device identifies whether there is a gesture matching the position information among the preset gestures; if so, the electronic device determines that the gesture input by the current user is the gesture matching the position information, and performs the operation corresponding to the gesture, such as selecting a word or deleting an entered word. If there is no gesture matching the position information, the electronic device can identify the motion information of the finger (such as finger coordinates, amplitude, etc.) according to the position information, and use a Bayesian model combining the motion information of the finger and a language model of word usage frequency to obtain the candidate word with the highest probability, and confirm the candidate word as the text input by the current user.
  • the electronic device can also recognize the motion information of the finger according to the single frame image, so as to obtain the text currently input by the user.
  • a single frame of image can only reflect the scene at a certain moment, and the finger movement is temporal and belongs to a scene within a period of time, so the accuracy of identifying motion information through a single frame of image is low.
  • the gesture actions of different users may vary greatly, and misrecognition or recognition failures such as matching failure are likely to occur during gesture matching; the result also depends on the quality of the preset gestures, so the usability is low.
  • This embodiment of the present application does not limit the form of the information input by the user, such as but not limited to text information, picture information, audio information, instruction information, and the like.
  • the electronic devices involved in the embodiments of the present application may be wearable electronic devices, such as head-mounted electronic devices, glasses, and goggles, and users may wear the wearable electronic devices to realize different effects such as augmented reality (AR), virtual reality (VR), and mixed reality (MR).
  • the electronic device may also be other electronic devices including optical cameras and event cameras, such as mobile phones, tablet computers, notebook computers, smart screens, smart TVs, headphones and other devices.
  • the embodiments of the present application are described by taking the electronic device as a head-mounted electronic device as an example for introduction, but the embodiments of the present application are not limited to the head-mounted electronic device, and the electronic device may also be other devices.
  • FIG. 1 exemplarily shows a schematic structural diagram of an electronic device 100 .
  • the electronic device 100 may include a processor 110, a memory 120, a communication module 130, a display screen 140, a sensor module 150, a camera 160, and the like.
  • the camera 160 may include an optical camera 161 and an event camera 162 .
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or use a different arrangement of components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 . The controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • the processor 110 may also be connected to other processing units to cooperate to perform the identification method provided in this application.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the above-mentioned memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
  • Memory 120 may be used to store computer-executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the memory 120 .
  • the memory 120 may include a stored program area and a stored data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as an image capturing function, an image playing function, etc.), and the like.
  • the storage data area may store data (such as image data, text data, etc.) created during the use of the electronic device 100 and the like.
  • the memory 120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the electronic device 100 may include wireless communication functionality.
  • the communication module 130 may include a mobile communication module and a wireless communication module.
  • the wireless communication function can be realized by an antenna, a mobile communication module, a wireless communication module, a modem processor, and a baseband processor.
  • Antennas are used to transmit and receive electromagnetic wave signals.
  • the electronic device 100 may contain multiple antennas, each of which may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • for example, an antenna can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module can provide a wireless communication solution including 2G/3G/4G/5G applied on the electronic device 100 .
  • the mobile communication module may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module can receive electromagnetic waves from the antenna, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module can also amplify the signal modulated by the modulation and demodulation processor, and then convert it into electromagnetic waves for radiation through the antenna.
  • at least part of the functional modules of the mobile communication module may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module may be provided in the same device as at least part of the modules of the processor 110 .
  • Wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the above-mentioned GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs a voice signal through an audio device (not limited to a speaker, etc.), or displays an image or video through the display screen 140.
  • the modem processor may be a stand-alone device.
  • the modulation and demodulation processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module or other functional modules.
  • the wireless communication module can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi) network), bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the wireless communication module may be one or more devices integrating at least one communication processing module.
  • the wireless communication module receives electromagnetic waves via the antenna, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna.
  • the display screen 140 is used to display images, videos, and the like.
  • the display screen 140 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and so on.
  • the user's eyes can see the image presented by the display screen 140 of the electronic device 100 .
  • when the display screen 140 is transparent, the user's eyes can see real objects through the display screen 140, or the user's eyes can see the image displayed by another display device through the display screen 140.
  • the number of display screens 140 in the electronic device 100 may be two, corresponding to two eyeballs of the user respectively.
  • the content displayed on these two displays can be displayed independently. Different images can be displayed on the two displays to enhance the three-dimensionality of the images.
  • the number of display screens 140 in the electronic device 100 may also be one, corresponding to two eyeballs of the user.
  • camera 160 may include optical camera 161 and event camera 162 .
  • Optical camera 161 is used to capture still images or video.
  • the object is projected through the lens to generate an optical image onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
  • the optical camera 161 includes, for example, but is not limited to, a monocular camera, a binocular camera, and a depth camera. Among them, the depth camera can measure the depth information of an object by methods such as structured light or time of flight (TOF).
  • the output of the event camera 162 is the change in luminance at the pixel level. That is to say, in the pixel array, when the brightness change of a pixel is greater than a preset brightness threshold, the pixel will generate an output, which can be called an "event".
  • the event camera 162 can output a series of events, which can also be referred to as event streams.
  • the data volume of the event stream is much smaller than the data output by the optical camera 161 .
  • electronic device 100 may include multiple cameras. Specifically, the electronic device 100 may include at least one optical camera 161 and at least one event camera 162. Illustratively, as shown in FIG. 1, the electronic device 100 includes four optical cameras 161 mounted on the sides of the electronic device 100, two at the upper portion and two at the lower portion (the lower two are not shown). The electronic device also includes two event cameras 162 mounted on the electronic device 100 at positions between the two display screens 140, one on the upper side and one on the lower side (the lower one is not shown). The cameras are used to capture images and videos in real time from the user's perspective. The electronic device 100 may generate a virtual image according to the captured real-time images and videos, and display the virtual image through the display screen 140.
  • the electronic device 100 may capture the first image through the optical camera 161 and capture the second image through the event camera 162 .
  • the processor 110 may fuse the first image and the second image through the first algorithm to obtain a third image, and identify key point information (eg, coordinates of 21 or 22 key points of the user's hand) in the third image.
  • the processor 110 can then combine the key point information, the second image, and optionally the second information that has been input by the user, to obtain the first information currently input by the user.
  • the processor 110 may determine to perform a corresponding operation according to the first information. For example, if the first information is text information input by the user, the processor 110 may display the first information after the second information through the display screen 140 .
  • for another example, if the first information is instruction information, the processor 110 may perform a corresponding operation (for example, a shutdown operation or a suspend operation) in response to the instruction information.
  • the electronic device 100 may also be connected to other devices (such as mobile phones, tablet computers, smart screens, etc.), and the electronic device 100 may acquire the first image and the second image from the other device, where the first image is captured by the other device's optical camera, and the second image is captured by the other device's event camera.
  • the processor 110 may include one or more interfaces.
  • Interfaces may include an inter-integrated circuit (I2C) interface, a serial peripheral interface (SPI), a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, and/or a universal serial bus (USB) interface, etc.
  • the electronic device 100 may implement a display function through a GPU, a display screen 140, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 140 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the electronic device 100 may implement a shooting function through an ISP, a camera, a video codec, a GPU, a display screen 140, an application processor, and the like.
  • the ISP can be used to process the data fed back by the optical camera. For example, when taking a picture with an optical camera, the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the above electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • the ISP can also perform algorithm optimization on image noise, brightness, and skin tone, and can also optimize parameters such as the exposure and color temperature of the shooting scene.
  • the ISP may be located in the camera.
  • the sensor module 150 may include a plurality of sensors, such as: a touch sensor, a pressure sensor, an ambient light sensor, an acceleration sensor, a gyroscope sensor, an infrared sensor, etc., but is not limited thereto, and may also include a microphone, an earphone, and the like.
  • the processor 110 may determine the virtual image displayed on the display screen 140 according to the still image or video captured by the camera, in combination with the data obtained by the sensor module 150 (such as brightness, sound, etc.), so as to superimpose virtual images on real-world objects.
  • FIG. 2 exemplarily shows a schematic diagram of a scenario where a user inputs text information.
  • the user 200 wears the electronic device 100
  • the structure of the electronic device 100 may refer to the structure shown in FIG. 1 above.
  • the user 200 can see the virtual interface 300 and the virtual keyboard 400 presented by the display screen 140 of the electronic device 100, and the virtual interface 300 can display the user interface of an application program on the electronic device 100, or on other devices (such as mobile phones, tablet computers, etc.) connected to the electronic device 100.
  • the virtual keyboard 400 may have the same structure as the physical keyboard.
  • the user 200 can input information through the virtual keyboard 400 , and the information can be presented on the virtual interface 300 .
  • the virtual interface 300 has displayed the second information that has been input by the user 200 through the virtual keyboard 400 .
  • the user 200 can continue to input information through the virtual keyboard 400.
  • the electronic device 100 can use the camera 160 to capture an image of the user's 200 hand.
  • the optical camera 161 can capture the first image
  • the event camera 162 can capture the second image.
  • the electronic device 100 may recognize the key point information of the hand of the user 200 by combining the first image and the second image, and obtain the first information currently input by the user 200 by combining the second image, the key point information, and the second information.
  • the electronic device 100 may display the first information on the virtual interface 300 .
  • the key point information is, for example, coordinate information of 21 or 22 hand key points, and the coordinate information may be based on a right-hand coordinate system established with the center of the head of the user 200 as the origin.
  • the process can include but is not limited to the following steps:
  • Step 1 The electronic device uses the first algorithm to process the first image 410 and the second image 420 to obtain the third image 430 .
  • the first image 410 is captured by an optical camera. It can be seen that the first image 410 shown in FIG. 3 is relatively blurred, and the image quality is poor.
  • the second image 420 is captured by the event camera, and the second image 420 shown in FIG. 3 includes the outline of the moving part (ie, the user's hand).
  • the timing at which the optical camera outputs the first image 410 is the same as the timing at which the event camera outputs the second image 420 .
  • the electronic device may first select, from the plurality of images output by the optical camera, the first image 410 whose output moment is the same as the moment when the event camera outputs the second image 420.
  • the electronic device may first set the weights of the first image 410 and the second image 420, and then perform weighted summation to obtain the third image 430; that is, it uses the first algorithm to fuse the first image 410 and the second image 420 to obtain the third image 430, where the sum of the weights of the first image 410 and the second image 420 is 1.
  • the third image 430 is used for subsequent hand key point detection, so as to realize hand motion recognition.
  • when the imaging effect of the first image 410 is poor (such as low definition, low brightness, or high brightness), the electronic device can reduce the weight of the first image 410 and increase the weight of the second image 420, thereby reducing the impact of the quality of the first image 410 on the accuracy of the detection algorithms (for example, the second algorithm described in step 2 and the third algorithm described in step 3), so the hand motion recognition results are also more accurate.
  • the electronic device may determine the weight of the first image 410 and the second image 420 according to the grayscale histogram of the first image 410 .
  • the weights of the first image 410 and the second image 420 are both 0.5 by default.
  • for example, when the distribution of the grayscale histogram of the first image 410 is concentrated in a fixed interval, the electronic device may set the weight of the first image 410 to 0.3 and the weight of the second image 420 to 0.7.
  • the electronic device may also determine the weights of the first image 410 and the second image 420 according to the mean value of the first image 410. For example, when the mean value of the first image 410 is greater than the first threshold (the first image 410 is too bright), or when the mean value of the first image 410 is less than the second threshold (the first image 410 is too dark), the electronic device may set the weight of the first image 410 to 0.4 and the weight of the second image 420 to 0.6. The electronic device may also determine the weights of the first image 410 and the second image 420 according to the standard deviation of the first image 410.
  • for example, when the standard deviation of the first image 410 is less than the third threshold, the electronic device may set the weight of the first image 410 to 0.35 and the weight of the second image 420 to 0.65.
  • the present application does not limit the specific manner of determining the weights of the first image 410 and the second image 420 .
  • it is not limited to the weighted summation method; when the electronic device uses the first algorithm to fuse the first image and the second image to obtain the third image, a method other than weighted summation may also be used.
  • the specific implementation manner is not limited.
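Under the weighted-summation option, the fusion in step 1 could be sketched as follows. This is an illustrative sketch, not the patent's exact first algorithm; it assumes single-channel, time-aligned images of equal size, and reuses the weight-selection sketch from earlier:

```python
import numpy as np

def fuse_images(first_image: np.ndarray, second_image: np.ndarray,
                w1: float, w2: float) -> np.ndarray:
    """Fuse the optical frame and the event frame by weighted summation.

    Both inputs are assumed to be single-channel uint8 images of the same
    size, already aligned in time (same output moment). The weights must
    sum to 1, as the description requires.
    """
    assert abs(w1 + w2 - 1.0) < 1e-6, "weights must sum to 1"
    assert first_image.shape == second_image.shape
    fused = (w1 * first_image.astype(np.float32)
             + w2 * second_image.astype(np.float32))
    return np.clip(fused, 0, 255).astype(np.uint8)

# usage sketch:
# w1, w2 = choose_fusion_weights(first_image)
# third_image = fuse_images(first_image, event_frame, w1, w2)
```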
  • Step 2 The electronic device identifies the target area 440 in the third image 430 using the second algorithm.
  • the target area 440 is a rectangular area where the user's hand is located in the third image 430 .
  • the electronic device may use the widest line segment of the user's hand in the third image 430 as a set of opposite sides of the target area 440 , and the highest line segment as another set of opposite sides of the target area 440 .
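Read as an axis-aligned bounding box, the widest/highest line-segment rule for the target area 440 could be sketched like this, assuming a binary hand mask is available (for example, from the target detection network of the second algorithm):

```python
import numpy as np

def target_area_from_mask(hand_mask: np.ndarray):
    """Return (x_min, y_min, x_max, y_max) of the rectangular target area.

    The widest horizontal extent of the mask gives one pair of opposite
    sides and the tallest vertical extent gives the other pair, i.e. the
    tight axis-aligned bounding box of the hand region.
    """
    ys, xs = np.nonzero(hand_mask)
    if xs.size == 0:
        return None  # no hand pixels found
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```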
  • Step 3 The electronic device uses the third algorithm to identify the key points in the target area 440 to obtain the target area 450 including the key points.
  • the number and position of key points are not limited. Exemplarily, in the target area 450 including key points shown in FIG. 3 , the number of key points is 21, and most of the key points are located at the joint points of the user's hand.
  • Step 4 The electronic device uses the fourth algorithm to process the target area 450 including the key points, the second image 420 and the second information input by the user to obtain the first information currently input by the user.
  • the user may input information based on the virtual keyboard 400 presented by the electronic device, and FIG.
  • the electronic device may first obtain the first motion information of the finger according to the coordinate information of the key points of the hand in the target area 450 including the key points.
  • the electronic device may first unify the virtual keyboard 400 and the key points of the user's hand into one coordinate system (for example, a right-handed coordinate system with the center of the user's left eye as the origin, see Figure 5 below for a specific example).
  • the electronic device can obtain the coordinate differences (e.g., Euclidean distances) between the hand key points corresponding to a preset number of frames of the third image 430 and the keys on the virtual keyboard 400, and obtain the first motion information according to the coordinate differences, such as the fingers with which the user may perform a tapping action, the magnitude of the tapping action, the frequency of the tapping action, and the like.
  • the finger with which the user may perform the tapping action generally corresponds to a key on the virtual keyboard 400, that is, the user may perform the tapping action based on that key. Therefore, once the electronic device obtains the finger with which the user may perform the tapping action, it can usually obtain the word information that the user may enter.
  • the electronic device obtains the first motion information according to the coordinate differences. For example, when a preset number of coordinate differences are greater than the first preset difference, it is determined that the finger to which the corresponding key point belongs is a finger that may perform a tapping action. Or, among the preset number of coordinate differences, when the number of coordinate differences greater than the first preset difference is greater than a fourth threshold, it is determined that the finger to which the corresponding key point belongs is a finger that may perform a tapping action. Alternatively, assuming that a finger that may perform a tapping action has been determined, when the corresponding coordinate difference is greater than the second preset difference, it is determined that the tapping action of the finger is relatively intense.
  • the second preset difference value is greater than the first preset difference value.
  • the intensity of the tapping action may be not severe, relatively severe, or very severe, respectively.
  • the present application does not limit the specific manner of obtaining the first motion information according to the coordinate difference.
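One possible reading of the coordinate-difference test is sketched below; the grouping of key points by finger, the threshold value, and the frame count are illustrative assumptions, not values from the patent:

```python
import numpy as np

def fingers_possibly_tapping(finger_tracks, key_centers,
                             first_preset_diff: float = 0.01,  # hypothetical threshold
                             min_count: int = 3):              # hypothetical frame count
    """First motion information: which fingers may be performing a tap.

    finger_tracks: dict finger_name -> list of (x, y, z) keypoint
        coordinates over a preset number of frames, in the same coordinate
        system as the virtual keyboard (e.g. the system of FIG. 5).
    key_centers: dict finger_name -> (x, y, z) center of the nearest key.
    A finger is kept when, in at least `min_count` frames, the Euclidean
    distance between its keypoint and the key center exceeds the first
    preset difference.
    """
    candidates = []
    for finger, points in finger_tracks.items():
        key = np.asarray(key_centers[finger], dtype=np.float32)
        diffs = [np.linalg.norm(np.asarray(p, dtype=np.float32) - key)
                 for p in points]
        if sum(d > first_preset_diff for d in diffs) >= min_count:
            candidates.append(finger)
    return candidates
```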
  • FIG. 5 exemplarily shows a schematic diagram of a three-dimensional coordinate system.
  • the coordinate system may be a right-handed coordinate system established with the center of the left eye of the user 200 as the origin in the scenario shown in FIG. 2 .
  • the coordinate system shows the coordinates of a key point A in two adjacent frames of the first image 410. The output moments of these two frames of the first image 410 may be a first moment and a second moment respectively, where the first moment is earlier than the second moment.
  • the coordinate of each key on the virtual keyboard on the z-axis is z0, while the coordinates on the x-axis and the y-axis differ from key to key; for example, the coordinate of the key W (center point) on the x-axis is x0 and its coordinate on the y-axis is y0.
  • the coordinates of the key point A on the x-axis are x1 at both the first moment and the second moment. At the first moment, the coordinate of the key point A on the y-axis is y1 and its coordinate on the z-axis is z1; at the second moment, its coordinate on the y-axis is y2 and its coordinate on the z-axis is z2.
  • when y2 - y1 > yt and z2 - z1 > zt, where yt and zt are preset differences, the electronic device can determine that the finger to which the key point A belongs is a finger that may perform a tapping action.
  • the coordinates of the key points may also be two-dimensional coordinates.
  • the electronic device may also regard the finger to which the key point with the smallest coordinate difference from the keys on the virtual keyboard 400 in the third image 430 belongs as the finger that may perform the tapping action.
  • the electronic device can also obtain the fingers that may perform the tapping action according to the difference between the coordinates of the same key point at different moments in multiple frames of the third image 430; for example, when the coordinate difference is greater than the preset difference, the finger to which the key point belongs is a finger that may perform a tapping action.
  • the present application does not limit the specific manner of determining the motion information.
  • the images shown in FIG. 3 for acquiring the first information currently input by the user may be single-frame images or multi-frame images. That is to say, the first image 410 and the second image 420 input to the first algorithm in step 1, and the third image 430 input to the second algorithm in step 2, may each be a single frame of image at a certain moment, or multiple frames of images within a certain period.
  • the electronic device can screen the above first motion information according to the grayscale value of the position of each finger in the second image 420 to obtain second motion information of the fingers, wherein the greater the grayscale value of a finger's position in the second image 420, the more intense the movement of that finger (the greater the movement amplitude and/or movement frequency).
  • for example, if three fingers may perform the tapping action according to the first motion information, the electronic device may take the two of those three fingers with the higher grayscale values in the second image 420 as the fingers that may perform the tapping action in the second motion information.
  • the electronic device may obtain information that may be input by the user according to the second motion information.
  • for example, the electronic device obtains that the user's index finger, middle finger, and ring finger have the minimum Euclidean distances to the key D, key W, and key A on the virtual keyboard respectively; that is, in the determined first motion information, these are the fingers that may perform the tapping action.
  • the electronic device recognizes that the positions of the middle finger and the ring finger in the second image 420 shown in FIG. 3 have higher grayscale values. Therefore, in the second motion information, the fingers of the user performing the tapping action are the middle finger and the ring finger. At this time, the electronic device may determine, according to the second motion information, that the information the user may input is "w" and "a".
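The grayscale screening in this example could be sketched as follows; the per-finger region masks are an assumed representation (in practice they would be derived from the key point coordinates projected into the second image 420):

```python
import numpy as np

def screen_by_event_gray(second_image: np.ndarray, finger_regions,
                         candidates, keep: int = 2):
    """Second motion information: keep the fingers that move most intensely.

    finger_regions: dict finger_name -> boolean mask of that finger's
    position in the event frame (an assumed representation). Fingers are
    ranked by mean grayscale value there, since a more intensely moving
    finger has larger grayscale values in the second image.
    """
    def mean_gray(finger):
        region = finger_regions[finger]
        return float(second_image[region].mean()) if region.any() else 0.0

    ranked = sorted(candidates, key=mean_gray, reverse=True)
    return ranked[:keep]

# e.g. screen_by_event_gray(event_frame, regions,
#                           ["index", "middle", "ring"]) -> ["middle", "ring"]
```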
  • the electronic device can also first determine the second motion information of the fingers according to the grayscale value of the position of each finger in the second image 420; for example, the fingers whose grayscale values are greater than a first grayscale threshold are the fingers that may perform the tapping action.
  • the electronic device filters the second motion information according to the key point information to obtain the first motion information.
  • the electronic device can calculate the coordinate differences between the key points on the fingers that may perform the tapping action in the second motion information and the keys on the virtual keyboard 400; when a coordinate difference is greater than a sixth preset difference, it is determined that the corresponding finger is a finger that may perform a tapping action in the first motion information.
  • the electronic device may obtain information that may be input by the user according to the first motion information.
  • The electronic device can obtain the first information currently input by the user according to the obtained information that the user may input and the second information that the user has already input. For example, the electronic device may use both as the input of a text judgment neural network to obtain the output first information.
  • The first information may be the information with the higher input probability among the information that the user may input.
  • For example, suppose the information that the user may input is "w" and "a", and the second information that the user has input is "abnorm". The text judgment neural network can determine that the probability of the user currently inputting "a" is 0.8 and the probability of "w" is 0.2; therefore, the output first information is "a" (see the sketch following this list).
  • If the user input no second information before the first information (that is, the second information is empty), the electronic device can directly take the information that the user may input as the determined first information, or perform judgment processing on it to obtain the first information, which is not limited in this application.
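As a stand-in for the text judgment neural network (whose architecture this application does not specify), the same selection can be illustrated with a toy frequency model over a word list; the vocabulary and the names (`VOCAB`, `score_candidates`) are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical vocabulary standing in for the language knowledge the
# text judgment neural network would have learned.
VOCAB = ["abnormal", "abnormally", "abnormality", "away", "aware"]

def score_candidates(prefix, candidates):
    """Score each candidate next character by how many vocabulary words
    continue the already-typed prefix with that character."""
    counts = Counter()
    for word in VOCAB:
        if word.startswith(prefix) and len(word) > len(prefix):
            counts[word[len(prefix)]] += 1
    total = sum(counts[c] for c in candidates) or 1
    return {c: counts[c] / total for c in candidates}

# Second information "abnorm", candidate inputs "w" and "a".
probs = score_candidates("abnorm", ["w", "a"])
print(probs)                      # {'w': 0.0, 'a': 1.0} with this toy vocabulary
print(max(probs, key=probs.get))  # 'a' -> chosen as the first information
```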
  • For example, the first algorithm is a proportion adjustment algorithm together with an image processing algorithm, the second algorithm is an object detection neural network, the third algorithm is a key point detection neural network, and the fourth algorithm is a text judgment neural network. Without being limited thereto, they may also be deep learning algorithms, etc.
  • The specific forms of the first algorithm, the second algorithm, the third algorithm and the fourth algorithm are not limited in this application.
  • The electronic device can also recognize both of the user's hands at the same time to obtain the information input by the user.
  • The image acquired by the electronic device may also not be an image of the user's hand, but an image of other parts such as the user's legs and waist.
  • The electronic device can perform human body posture recognition based on the acquired image according to the above process, so as to improve the detection accuracy of key points and make the recognition result more accurate.
  • FIG. 6 is a schematic flowchart of a recognition method provided by an embodiment of the present application.
  • The method of FIG. 6 may be applied to an electronic device that may include an optical camera and an event camera; the electronic device may be the electronic device shown in FIG. 1.
  • The method includes but is not limited to the following steps:
  • S101: The electronic device acquires the first image and the second image.
  • The first image is an image captured by the optical camera, and the second image is an image captured by the event camera.
  • The moment at which the optical camera outputs the first image can be the same as the moment at which the event camera outputs the second image, so the first image and the second image can both capture the N moving parts of the target object (such as a user) at the same moment; the N parts are, for example, multiple fingers of the user. N is a positive integer.
  • The second image is determined according to the brightness changes of the pixels of the moving target object, and may include the moving parts among the N parts.
  • The target object may be a user, and the N parts may be body parts of the user, such as a hand, a waist, a leg and the like. Without being limited thereto, the target object may also be another creature or object.
  • S102: The electronic device acquires key point information of the N parts of the target object according to the first image.
  • The electronic device can identify the target area in the first image where the N parts of the target object are located, and then identify the key point information from the target area, where one set of opposite sides of the target area is the widest line segment of the N parts and the other set of opposite sides is the tallest line segment of the N parts.
  • The electronic device can also first identify the target area where P of the N parts are located, and then identify the key point information from that target area, where one set of opposite sides of the target area is the widest line segment of the P parts and the other set is the tallest line segment of the P parts. P is a positive integer, and P is less than N.
  • The electronic device may, but is not limited to, determine the P parts according to the area the N parts occupy in the image, their sharpness, and so on; for example, when the area a part occupies in the image is greater than a preset area and its sharpness (the grayscale difference or gradient of adjacent pixels) is smaller than a preset sharpness value, that part is determined to belong to the P parts.
  • The electronic device may determine the position and number of key points according to the shape of the part in the target area.
  • For example, the number of key points on a hand (one) is 21, and most of the key points are located at the joints of the hand; when the part in the target area is a leg (one), the number of key points may be 10, and most of them may be located at the joints of the leg. A sketch of the target-area computation follows this list.
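The target area described above can be sketched as a simple bounding-rectangle computation over a binary mask of the detected parts. This is a minimal illustration: the names (`target_area`, `part_mask`) are hypothetical, and the axis-aligned min/max computation is one plausible way to realize the "widest/tallest line segment" rectangle, not necessarily the second algorithm itself.

```python
import numpy as np

def target_area(part_mask):
    """Axis-aligned rectangle covering a binary mask of the part(s):
    the top/bottom edges pass through the tallest extent of the mask,
    the left/right edges through its widest extent."""
    rows = np.any(part_mask, axis=1)
    cols = np.any(part_mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return r0, r1, c0, c1  # key point detection then runs only inside this box

mask = np.zeros((480, 640), dtype=bool)
mask[120:300, 200:360] = True          # hand region in the fused third image
print(target_area(mask))               # (120, 299, 200, 359)
```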
  • In some embodiments, S102 is specifically: the electronic device acquires the key point information of the N parts of the target object according to the first image and the second image.
  • The electronic device may first fuse the first image and the second image to obtain the third image, and then obtain the key point information of the N parts of the target object according to the third image.
  • The fusion may specifically be: the electronic device performs a weighted sum of the first image and the second image to obtain the third image.
  • The weight of the first image may be the first weight and the weight of the second image may be the second weight; the sum of the first weight and the second weight is 1.
  • The electronic device may determine the first weight according to parameters of the first image, such as, but not limited to, the distribution of the grayscale histogram of the first image, the mean of the first image, and the standard deviation of the first image.
  • When the imaging quality of the first image is poor, the electronic device may lower the first weight and raise the second weight; at this time, the first weight may be smaller than the second weight.
  • Poor imaging quality of the first image may satisfy, but is not limited to, at least one of the following: the distribution of the grayscale histogram of the first image is concentrated in a fixed interval, the mean of the first image is greater than the first threshold, the mean of the first image is smaller than the second threshold, or the standard deviation of the first image is smaller than the third threshold, where the first threshold is greater than the second threshold. A sketch of such an adaptive fusion follows this list.
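A minimal sketch of such an adaptive weighted fusion is shown below, assuming single-channel frames of equal size. The threshold values and the names (`looks_poor`, `fuse_frames`) are hypothetical, and the histogram test is one plausible reading of "concentrated in a fixed interval"; this is not the application's actual first algorithm.

```python
import numpy as np

def looks_poor(optical, t_bright=200.0, t_dark=40.0, t_std=20.0, t_hist=0.8):
    """Heuristic image-quality check on the optical frame: too bright,
    too dark, low contrast, or grayscale histogram mass concentrated
    in one narrow band of bins."""
    hist, _ = np.histogram(optical, bins=16, range=(0, 256))
    concentrated = hist.max() / hist.sum() > t_hist
    return (optical.mean() > t_bright or optical.mean() < t_dark
            or optical.std() < t_std or concentrated)

def fuse_frames(optical, event):
    """Weighted sum of the optical frame and the event frame; shrink the
    optical weight when its imaging quality looks poor (weights sum to 1)."""
    w1 = 0.3 if looks_poor(optical) else 0.5
    fused = w1 * optical.astype(np.float32) + (1.0 - w1) * event.astype(np.float32)
    return fused.astype(np.uint8)

optical = np.full((480, 640), 230, dtype=np.uint8)   # overexposed optical frame
event = np.zeros((480, 640), dtype=np.uint8)
third = fuse_frames(optical, event)                  # event frame dominates the fusion
```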
  • S103: The electronic device determines the first moving part among the N parts according to the key point information and the grayscale values of the second image.
  • The key point information can be used to obtain information such as the coordinate differences of the N parts at different moments, so as to obtain the motion information of the N parts (for example, the parts that may perform actions, the amplitude of the actions, and the frequency of the actions).
  • The electronic device can obtain, according to the first part, the first information input by the target object through the first part.
  • The first part may be at least one part, such as at least one finger.
  • The electronic device may first obtain the first motion information of the target object according to the key point information, assumed here to be M moving parts among the N parts, where M is a positive integer and N is greater than M. Then, the electronic device may determine the first part from the M parts represented by the first motion information according to the grayscale values of the second image, where the grayscale values of the pixels of the first part in the second image may be greater than the first grayscale threshold, or greater than the grayscale values of the pixels of the other parts among the M parts.
  • Alternatively, the electronic device may first obtain the second motion information of the target object according to the grayscale values of the second image, assumed here to be T moving parts among the N parts, where T is a positive integer and N is greater than T.
  • The grayscale values of the pixels of the T parts in the second image may be greater than the second grayscale threshold, or greater than the grayscale values of the pixels of the other parts among the N parts.
  • The electronic device may then determine the first part from the T parts represented by the second motion information according to the key point information, where the coordinate difference of the key points on the first part is greater than the first difference, or greater than the coordinate differences of the key points on the other parts among the T parts. A sketch of the keypoint-based screening follows this list.
  • The key point information includes, for example, the coordinates of the key points relative to a fixed point on the user or the electronic device (which may be called absolute coordinates), and the coordinates of the key points relative to a fixed point on the N parts (which may be called relative coordinates).
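The keypoint side of the screening can be sketched as a displacement test between two moments; a minimal illustration with hypothetical names (`moving_parts`, `keypoints_t1`, `keypoints_t2`), assuming one fingertip key point per part and a Euclidean displacement threshold playing the role of the first preset difference.

```python
import math

def moving_parts(keypoints_t1, keypoints_t2, min_disp=8.0):
    """Return the parts whose key point moved more than `min_disp`
    between the first moment and the second moment (candidate M parts)."""
    moved = []
    for part, (x1, y1, z1) in keypoints_t1.items():
        x2, y2, z2 = keypoints_t2[part]
        disp = math.dist((x1, y1, z1), (x2, y2, z2))
        if disp > min_disp:
            moved.append(part)
    return moved

keypoints_t1 = {"index": (10, 40, 30), "middle": (20, 42, 30), "ring": (30, 41, 30)}
keypoints_t2 = {"index": (10, 40, 30), "middle": (20, 30, 18), "ring": (30, 29, 19)}
print(moving_parts(keypoints_t1, keypoints_t2))  # ['middle', 'ring']
```

The grayscale refinement from the second image (sketched earlier) would then pick the first part out of these candidates.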
  • Before S103, the method may further include: receiving second information input by the user. The electronic device can then determine the first information according to the first part together with the second information input by the user.
  • For an example of the flow of FIG. 6, see the process of FIG. 3 above, which takes the target object being the user and the N parts being the user's left hand (the 5 fingers of the left hand) as an example.
  • The electronic device can combine the optical camera and the event camera to realize key point recognition, and introduces the event camera to further determine the motion information, so as to obtain the information input by the user according to the motion information. This reduces the impact of the image quality of the first image on key point detection accuracy under motion blur or when the object and the background are similar in color and texture, and the result of motion recognition (i.e., the first information input by the user) is more accurate and robust.
  • Users can input information to the electronic device and interact with it without external devices such as keyboards, wearable devices, or cameras with a third-party perspective, which enhances the interaction capability of the electronic device and makes it more convenient for users.
  • The above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented with software, they can be implemented in whole or in part in the form of a computer program product.
  • The computer program product includes one or more computer instructions.
  • When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the present application are produced in whole or in part.
  • The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g. coaxial cable, optical fiber, digital subscriber line) or wireless means (e.g. infrared, radio, microwave).
  • The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media.
  • The usable media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., digital versatile disc (DVD)), or semiconductor media (e.g., solid state disk (SSD)), etc.


Abstract

Embodiments of this application provide a recognition method applied to an electronic device. The method includes: acquiring a first image and a second image, where the first image is an image captured by an optical camera, the second image is an image captured by an event camera, and the second image is determined according to the brightness changes of pixels of a moving target object; acquiring key point information of N parts of the target object according to the first image; and determining, according to the key point information and the grayscale values of the second image, a first moving part among the N parts, where a part with a greater motion frequency has greater grayscale values of its pixels in the second image. The embodiments of this application improve the recognition accuracy of moving parts, and allow a user to interact with the electronic device without external devices such as a keyboard or a wearable device, which is more convenient.

Description

Recognition Method and Electronic Device
This application claims priority to Chinese Patent Application No. 202110222892.X, entitled "Recognition Method and Electronic Device", filed with the China National Intellectual Property Administration on February 26, 2021, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of human-computer interaction, and in particular to a recognition method and an electronic device.
Background
At present, electronic devices such as augmented reality (AR) glasses and virtual reality (VR) glasses can capture images of a user with a built-in optical camera and use those images to recognize the user's moving parts (for example, a finger performing a tapping action), thereby obtaining information input by the user (for example, text information). No external devices such as a keyboard, a wearable device, or a third-person-view camera are needed, which greatly facilitates use. However, the captured images may suffer from motion blur, making the recognition result inaccurate. For example, when wearing VR glasses, a user can enter text in the virtual world through a virtual keyboard. The VR glasses can capture images of the user's hand with the built-in optical camera, obtain the coordinates of the hand key points from those images, recognize motion information such as the tapping fingers from the coordinates, and finally combine the already-entered text to obtain the text currently being entered. However, when the user's hand is similar in color to the background and the fingers tap too quickly, the captured images suffer from motion blur, so the obtained hand key points have low accuracy and the obtained text is inaccurate.
Summary
Embodiments of this application disclose a recognition method and an electronic device, which allow a user to interact with an electronic device without external devices such as a keyboard or a wearable device, while improving the recognition accuracy of moving parts and obtaining more accurate input information.
According to a first aspect, an embodiment of this application provides a recognition method applied to an electronic device. The method includes: the electronic device acquires a first image and a second image, where the first image is an image captured by an optical camera, the second image is an image captured by an event camera, and the second image is determined according to the brightness changes of pixels of a moving target object; the electronic device acquires key point information of N parts of the target object according to the first image, N being a positive integer; and the electronic device determines, according to the key point information and the grayscale values of the second image, a first moving part among the N parts, where a part with a greater motion frequency has greater grayscale values of its pixels in the second image.
In the embodiments of this application, the electronic device can combine the optical camera and the event camera to determine the moving parts of the target object. Even if the first image suffers from motion blur, overexposure, or dim lighting, more accurate motion information can be obtained with the help of the second image. Meanwhile, the user can interact with the electronic device without external devices such as a keyboard, a wearable device, or a third-person-view camera, which enhances the interaction capability of the electronic device and makes it more convenient to use.
In a possible implementation, the electronic device acquiring the key point information of the N parts of the target object according to the first image includes: the electronic device fuses the first image and the second image to obtain a third image; and the electronic device acquires the key point information of the N parts according to the third image.
In the embodiments of this application, the electronic device can combine the optical camera and the event camera for key point recognition, which reduces the impact of the quality of the first image on key point detection accuracy in cases of motion blur or similar colors and textures between the object and the background, so the motion information obtained from the key points is more accurate and robust.
In a possible implementation, before the electronic device fuses the first image and the second image to obtain the third image, the method further includes: the electronic device determines a first weight of the first image and a second weight of the second image according to parameters of the first image, the parameters of the first image including at least one of the following: the distribution of the grayscale histogram, the mean, and the standard deviation. The electronic device fusing the first image and the second image to obtain the third image includes: the electronic device fuses the first image and the second image based on the first weight and the second weight to obtain the third image.
In a possible implementation, the electronic device determining the first weight of the first image and the second weight of the second image according to the parameters of the first image includes: when a preset condition is satisfied, the electronic device sets the first weight and the second weight to a first preset value and a second preset value respectively, where the first preset value is smaller than the second preset value. The preset condition includes at least one of the following: the distribution of the grayscale histogram of the first image is concentrated in a fixed interval, the mean of the first image is greater than a first threshold, the mean of the first image is smaller than a second threshold, and the standard deviation of the first image is smaller than a third threshold, where the first threshold is greater than the second threshold.
In the embodiments of this application, when the imaging quality of the first image is poor (for example, when the preset condition is satisfied), the electronic device can decrease the weight of the first image and increase the weight of the second image when obtaining the third image. Since the third image is used for key point detection, this reduces the impact of the quality of the first image on key point detection accuracy in cases of overexposure or dim lighting, and the motion information obtained from the key points is more accurate.
In a possible implementation, the electronic device acquiring the key point information of the N parts according to the third image includes: the electronic device identifies, in the third image, a target area where the N parts are located; and the electronic device identifies the key point information of the N parts in the target area.
In the embodiments of this application, the electronic device can first obtain the target area where the N parts of the target object are located, and then perform key point detection within the target area, with no need to detect key points outside it. This avoids unnecessary processing, reduces the processing load, and improves usability.
In a possible implementation, the electronic device determining, according to the key point information and the grayscale values of the second image, the first moving part among the N parts includes: the electronic device determines M parts from the N parts according to the key point information, where M is a positive integer less than or equal to N, and the key point information includes the coordinates of at least one key point on the N parts; and the electronic device determines the first part from the M parts according to the grayscale values of the second image, where the grayscale values of the pixels of the first part in the second image are greater than a preset grayscale threshold, or greater than the grayscale values, in the second image, of the pixels of the other parts among the M parts.
In the embodiments of this application, the electronic device can first obtain motion information (that is, the M moving parts) from the key point information, and then filter that motion information according to the grayscale values of the second image, thereby obtaining the moving part, the first part, more accurately, with higher recognition accuracy.
In a possible implementation, the difference between the coordinates of the key points on the M parts at a first moment and their coordinates at a second moment is greater than a first preset difference, the first moment being different from the second moment; or, the difference between the coordinates of the key points on the M parts and preset coordinates is greater than a second preset difference.
For example, alternatively, the difference between the coordinates of the key points on the M parts at the first moment and the preset coordinates may be a first difference, the difference between the coordinates of the key points on the M parts at the second moment and the preset coordinates may be a second difference, and the difference between the first difference and the second difference is greater than a third preset difference.
For example, the preset coordinates are the coordinates of the keys on the virtual keyboard displayed by the electronic device that correspond to the key points.
In the embodiments of this application, the M possibly-moving parts can be determined in many ways. Even if only one frame of the first image is acquired, the M parts can be determined from the difference between the coordinates of the key points and the coordinates of the keys on the virtual keyboard, which reduces the processing latency and broadens the application scenarios.
In a possible implementation, the first part is used by the electronic device to determine first information input by the target object through the first part.
In the embodiments of this application, the electronic device can combine the optical camera and the event camera to improve the detection accuracy of the moving parts, and since the moving parts are used to obtain the information input by the target object, the recognition accuracy of the input information is also improved. Meanwhile, the user can input information to the electronic device without external devices such as a keyboard, a wearable device, or a third-person-view camera, and can use the electronic device normally even in mobile scenarios, which is more convenient.
In a possible implementation, the method further includes: the electronic device determines Q pieces of information according to the first part, where Q is a positive integer and the Q pieces of information include the first information; and the electronic device determines the first information from the Q pieces of information according to second information, where the second information is information input by the target object before inputting the first information.
In the embodiments of this application, the electronic device can combine the second information already input by the target object with the first moving part to infer the first information input by the user through the first part, improving the recognition accuracy of the input information.
In a possible implementation, the first part is a plurality of parts.
In a possible implementation, the target object is a user, and the first part is a finger of the user.
In the embodiments of this application, the electronic device can combine the optical camera and the event camera to recognize the user's moving fingers and thereby recognize the information input by the fingers through the virtual keyboard. The obtained input information is more accurate, and no physical keyboard is needed, which greatly facilitates use.
According to a second aspect, an embodiment of this application provides an electronic device including one or more memories and one or more processors. The one or more memories are configured to store a computer program, and the one or more processors are configured to invoke the computer program. The computer program includes instructions that, when executed by the one or more processors, cause the electronic device to perform the recognition method provided by the first aspect or any implementation of the first aspect.
In a possible implementation, the electronic device includes the event camera and the optical camera.
In a possible implementation, the electronic device is a virtual reality device, an augmented reality device, or a mixed reality device.
According to a third aspect, an embodiment of this application provides a computer storage medium including a computer program. The computer program includes instructions that, when run on a processor, implement the recognition method provided by the first aspect or any implementation of the first aspect.
According to a fourth aspect, an embodiment of this application provides a computer program product that, when run on an electronic device, causes the electronic device to perform the recognition method provided by the first aspect or any implementation of the first aspect.
According to a fifth aspect, an embodiment of this application provides a chip including at least one processor and an interface circuit, and optionally a memory. The memory, the interface circuit, and the at least one processor are interconnected by wires, and the at least one memory stores a computer program that, when executed by the at least one processor, implements the recognition method provided by the first aspect or any implementation of the first aspect.
It can be understood that the electronic device provided by the second aspect, the computer storage medium provided by the third aspect, the computer program product provided by the fourth aspect, and the chip provided by the fifth aspect are all used to perform the recognition method provided by the first aspect or any implementation thereof. For the beneficial effects they achieve, refer to the beneficial effects of the recognition method provided by the first aspect; details are not repeated here.
Brief Description of the Drawings
The drawings used in the embodiments of this application are introduced below.
FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of this application;
FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of this application;
FIG. 3 is a schematic diagram of a processing procedure provided by an embodiment of this application;
FIG. 4 and FIG. 5 are schematic diagrams of some key points on a virtual keyboard provided by embodiments of this application;
FIG. 6 is a schematic flowchart of a recognition method provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and thoroughly below with reference to the drawings. The terms used in the implementation part of the embodiments are only used to explain the specific embodiments of this application and are not intended to limit this application.
An embodiment of this application provides a recognition method that can be applied to an electronic device. The electronic device can acquire a first image and a second image, where the first image is captured by an optical camera and the second image is captured by an event camera. The electronic device can then recognize key point information of a target object (for example, the key point coordinates of a user's hand) based on the first image and the second image, and obtain first information currently input by the user by combining the key point information, the second image, and optionally second information that the user has already input.
The output of an event camera is pixel-level brightness changes. That is, in the pixel array, when the brightness change of a pixel is greater than a preset brightness threshold, that pixel produces an output, which may be called an "event". Therefore, when the object captured by the event camera is not moving, the output of the event camera is a black image. When the captured object moves, causing the brightness of many pixels to change, the output of the event camera can represent the moving object. Event cameras have the advantages of no requirement on lighting conditions, low latency, and sensitivity to small, fast motions. The second image is therefore determined according to the brightness changes of the pixels of the moving target object, and the second image includes the moving parts of the target object.
This application combines the images output by the optical camera and the event camera to recognize the moving parts of the target object, which reduces the impact of image quality on the detection accuracy of the key point information under motion blur, so the recognition result (that is, the moving parts of the target object and/or the first information) is more accurate. Moreover, the user can input information to the electronic device and interact with it without external devices such as a keyboard, a wearable device, or a third-person-view camera; even in mobile scenarios the user can use the electronic device normally, which is more convenient.
Unlike the output of an event camera, an optical camera outputs a complete frame composed of many pixels, for example an RGB image. When an object in the image output by the optical camera is similar in color to the background, or moves too fast, the image suffers from motion blur. Moreover, the imaging quality of an optical camera is strongly affected by lighting; when the light is too bright or too dark, the image quality is poor. For example, an electronic device can capture multiple frames including a user's hand with an optical camera, and obtain the position information of the hand key points from those frames (for example, the coordinates of the hand key points relative to the center of the optical camera, and the coordinates of the hand key points relative to the center of the hand). The electronic device then checks whether any preset gesture matches this position information; if so, the electronic device determines that the gesture currently input by the user is the matching gesture and performs the corresponding operation, such as selecting a word or deleting an entered word. If no preset gesture matches, the electronic device can recognize finger motion information (such as finger coordinates and amplitude) from the position information, and use a Bayesian model combining the finger motion information with a language model of word frequencies to obtain the most probable candidate word, which is confirmed as the text currently input by the user. Alternatively, the electronic device can recognize the finger motion information from a single frame to obtain the currently input text. However, a single frame only reflects the scene at one moment, while finger motion is temporal and belongs to a period of time, so recognizing motion information from a single frame has low accuracy. Meanwhile, when the image captured by the optical camera suffers from motion blur or is strongly affected by lighting, the detection accuracy of the hand key points is low and the recognition result (the text currently input by the user) is inaccurate. In addition, gestures can differ greatly between users, so gesture matching is prone to misrecognition and recognition failure; the approach depends on the quality of the preset gestures and has low usability.
The embodiments of this application do not limit the form of the information input by the user, which may be, for example but not limited to, text information, picture information, audio information, instruction information, and so on.
The electronic device involved in the embodiments of this application may be a wearable electronic device, such as a head-mounted electronic device, glasses, or goggles; a user can wear the wearable electronic device to achieve effects such as augmented reality (AR), virtual reality (VR), and mixed reality (MR). Without being limited thereto, the electronic device may also be another electronic device including an optical camera and an event camera, such as a mobile phone, a tablet computer, a laptop computer, a smart screen, a smart TV, or earphones.
The embodiments of this application are described with a head-mounted electronic device as an example, but the embodiments are not limited to head-mounted electronic devices, and the electronic device may also be another device.
Refer to FIG. 1, which exemplarily shows a schematic structural diagram of an electronic device 100.
As shown in FIG. 1, the electronic device 100 may include a processor 110, a memory 120, a communication module 130, a display screen 140, a sensor module 150, and a camera 160. The camera 160 may include an optical camera 161 and an event camera 162.
It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices or integrated in one or more processors. The controller may be the nerve center and command center of the electronic device 100; it can generate operation control signals according to instruction opcodes and timing signals to control instruction fetching and execution. In some embodiments, the processor 110 may also be connected to other processing units to cooperatively perform the recognition method provided by this application.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache, which can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instructions or data again, it can call them directly from this memory, avoiding repeated accesses, reducing the waiting time of the processor 110, and improving system efficiency.
The memory 120 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the memory 120. The memory 120 may include a program storage area and a data storage area. The program storage area may store an operating system and applications required by at least one function (such as an image capture function or an image playback function). The data storage area may store data created during use of the electronic device 100 (such as image data and text data). In addition, the memory 120 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The electronic device 100 may include a wireless communication function. In some embodiments, the communication module 130 may include a mobile communication module and a wireless communication module. The wireless communication function may be implemented through antennas, the mobile communication module, the wireless communication module, the modem processor, the baseband processor, and so on.
Antennas are used to transmit and receive electromagnetic wave signals. The electronic device 100 may include multiple antennas, each of which may cover one or more communication frequency bands. Different antennas may also be multiplexed to improve utilization; for example, antenna 1 may be multiplexed as a diversity antenna for a wireless local area network. In other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module can provide solutions for wireless communication including 2G/3G/4G/5G applied on the electronic device 100. The mobile communication module may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc. The mobile communication module can receive electromagnetic waves through the antenna, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation; it can also amplify the signal modulated by the modem processor and convert it to electromagnetic waves for radiation through the antenna. In some embodiments, at least some functional modules of the mobile communication module may be provided in the processor 110, or in the same device as at least some modules of the processor 110.
In some embodiments, the antenna of the electronic device 100 is coupled with the mobile communication module so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The modem processor may include a modulator and a demodulator. The modulator modulates the low-frequency baseband signal to be sent into a medium/high-frequency signal. The demodulator demodulates the received electromagnetic wave signal into a low-frequency baseband signal and transmits it to the baseband processor for processing, after which it is passed to the application processor. The application processor outputs a voice signal through an audio device (not limited to a speaker) or displays an image or video through the display screen 140. In some embodiments, the modem processor may be an independent device; in other embodiments, it may be independent of the processor 110 and provided in the same device as the mobile communication module or other functional modules.
The wireless communication module can provide solutions for wireless communication applied on the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module may be one or more devices integrating at least one communication processing module. It receives electromagnetic waves via the antenna, performs frequency modulation and filtering on the electromagnetic wave signal, and sends the processed signal to the processor 110; it can also receive the signal to be sent from the processor 110, perform frequency modulation and amplification on it, and convert it to electromagnetic waves for radiation through the antenna.
The display screen 140 is used to display images, videos, and the like. The display screen 140 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Miniled, MicroLed, Micro-oLed, quantum dot light emitting diodes (QLED), etc.
When the electronic device 100 is worn on the user's head, the user's eyes can see the image presented by the display screen 140. When the display screen 140 is transparent, the user's eyes can see physical objects through the display screen 140, or see images displayed by another display device through it.
The number of display screens 140 in the electronic device 100 may be two, corresponding to the user's two eyes. The content on the two screens can be displayed independently; different images can be displayed on them to improve the stereoscopic effect. In some embodiments, the number of display screens 140 may also be one, corresponding to both of the user's eyes.
In some embodiments, the camera 160 may include an optical camera 161 and an event camera 162.
The optical camera 161 is used to capture still images or videos. An object is projected through the lens to generate an optical image onto the photosensitive element, which may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and passes it to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts it into an image signal in a standard format such as RGB or YUV. The optical camera 161 includes, for example but not limited to, a monocular camera, a binocular camera, and a depth camera, where the depth camera can measure the depth information of an object through methods such as structured light or time of flight (TOF).
The output of the event camera 162 is pixel-level brightness changes. That is, in the pixel array, when the brightness change of a pixel is greater than a preset brightness threshold, that pixel produces an output, which may be called an "event". When object motion or lighting changes in the captured scene cause a large number of pixels to change, the event camera 162 can output a series of events, also called an event stream. The data volume of the event stream is far smaller than the data output by the optical camera 161. A sketch of accumulating such an event stream into a frame follows.
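To make the event stream concrete, the following sketch accumulates events into a grayscale frame. It is an illustrative assumption about one common representation (per-pixel event counts scaled to 0-255) with a hypothetical name (`events_to_frame`), not the actual pipeline of the event camera 162.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate an event stream into a grayscale frame: each event is
    (timestamp, x, y, polarity); pixels with more brightness-change
    events end up brighter, while static pixels stay black."""
    frame = np.zeros((height, width), dtype=np.float32)
    for _, x, y, _ in events:
        frame[y, x] += 1.0
    if frame.max() > 0:
        frame = frame * (255.0 / frame.max())
    return frame.astype(np.uint8)

# A few synthetic events around a tapping fingertip.
events = [(0.001, 320, 240, 1), (0.002, 321, 240, -1), (0.003, 320, 241, 1)]
frame = events_to_frame(events, height=480, width=640)
```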
In some embodiments, the electronic device 100 may include multiple cameras. Specifically, the electronic device 100 may include at least one optical camera 161 and at least one event camera 162. For example, as shown in FIG. 1, the electronic device 100 includes four optical cameras 161 mounted on the sides of the electronic device 100, two on the top and two on the bottom (one of the bottom ones not shown). The electronic device also includes two event cameras 162 mounted between the two display screens 140 of the electronic device 100, one on the top and one on the bottom (not shown). The cameras are used to capture images and videos within the user's field of view in real time. The electronic device 100 can generate virtual images from the captured real-time images and videos and display them through the display screen 140.
It can be understood that the positions and numbers of the optical cameras 161 and event cameras 162 on the electronic device 100 shown in FIG. 1 are only used to explain the embodiments of this application and should not constitute a limitation.
In this application, the electronic device 100 can capture the first image with the optical camera 161 and the second image with the event camera 162. The processor 110 can fuse the first image and the second image with a first algorithm to obtain a third image, and recognize the key point information in the third image (for example, the coordinates of 21 or 22 key points of the user's hand). The processor 110 can then combine the key point information, the second image, and optionally the second information already input by the user, to obtain the first information currently input by the user. The processor 110 can determine and perform a corresponding operation according to the first information. For example, if the first information is text information input by the user, the processor 110 can display the first information after the second information on the display screen 140; or, if the first information is instruction information, the processor 110 can respond to it by performing a corresponding operation (such as a shutdown or pause operation).
In some embodiments, the electronic device 100 may also be connected to other devices (such as a mobile phone, a tablet computer, or a smart screen), and the electronic device 100 may obtain the first image and the second image from the other devices, where the first image is captured by the other devices with an optical camera and the second image is captured by the other devices with an event camera.
In some embodiments, the processor 110 may include one or more interfaces, which may include an inter-integrated circuit (I2C) interface, a serial peripheral interface (SPI), a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, and/or a universal serial bus (USB) interface, etc.
In some embodiments, the electronic device 100 can implement the display function through the GPU, the display screen 140, the application processor, and so on. The GPU is a microprocessor for image processing, connecting the display screen 140 and the application processor, and is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
In some embodiments, the electronic device 100 can implement the shooting function through the ISP, the camera, the video codec, the GPU, the display screen 140, the application processor, and so on. The ISP can be used to process the data fed back by the optical camera. For example, when taking a photo, the shutter opens, light is transmitted through the lens to the camera's photosensitive element, the optical signal is converted into an electrical signal, and the photosensitive element passes the electrical signal to the ISP to be converted into an image visible to the naked eye. The ISP can also optimize the noise, brightness, and skin tone of the image through algorithms, as well as the exposure, color temperature, and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera.
The sensor module 150 may include multiple sensors, such as a touch sensor, a pressure sensor, an ambient light sensor, an acceleration sensor, a gyroscope sensor, an infrared sensor, and so on; without being limited thereto, it may also include a microphone, earphones, etc.
In some embodiments, the processor 110 can determine the virtual image displayed on the display screen 140 from the still images or video images captured by the camera, combined with the data acquired by the sensor module 150 (such as brightness and sound data), thereby superimposing virtual images on real-world objects.
Refer to FIG. 2, which exemplarily shows a scenario in which a user inputs text information.
As shown in FIG. 2, a user 200 wears the electronic device 100, whose structure may be the one shown in FIG. 1 above. The user 200 can see a virtual interface 300 and a virtual keyboard 400 presented by the display screen 140 of the electronic device 100. The virtual interface 300 can display the user interface of an application on the electronic device 100, or of an application on another device connected to the electronic device 100 (such as a mobile phone, a tablet computer, or a smart screen). The virtual keyboard 400 may have the same layout as a physical keyboard. The user 200 can input information through the virtual keyboard 400, and that information can be presented on the virtual interface 300.
For example, suppose the virtual interface 300 already displays second information that the user 200 has input through the virtual keyboard 400. The user 200 can continue to input information through the virtual keyboard 400. At this time, the electronic device 100 can capture images of the hand of the user 200 with the camera 160, specifically capturing the first image with the optical camera 161 and the second image with the event camera 162. The electronic device 100 can then recognize the key point information of the hand of the user 200 by combining the first image and the second image, and obtain the first information currently input by the user 200 by combining the second image, the key point information, and the second information. The electronic device 100 can display the first information on the virtual interface 300. The key point information is, for example, the coordinate information of 21 or 22 hand key points, and the coordinates may be based on a right-handed coordinate system with the center of the head of the user 200 as the origin.
Based on the application scenario shown in FIG. 2 above, the following describes the specific process by which the electronic device 100 recognizes the user's hand motion to obtain the text information input by the user; see FIG. 3 below. As shown in FIG. 3, the process may include but is not limited to the following steps:
Step 1: The electronic device processes the first image 410 and the second image 420 with the first algorithm to obtain the third image 430.
The first image 410 is captured by the optical camera. It can be seen that the first image 410 shown in FIG. 3 is blurry and of poor quality. The second image 420 is captured by the event camera; the second image 420 shown in FIG. 3 includes the outline of the moving part (that is, the user's hand).
The moment the optical camera outputs the first image 410 is the same as the moment the event camera outputs the second image 420. In some embodiments, before performing step 1, the electronic device may first select, from the multiple images output by the optical camera, the first image 410 whose output moment matches that of the second image 420 output by the event camera.
Specifically, the electronic device can first set the weights of the first image 410 and the second image 420 and then perform a weighted sum to obtain the third image 430; that is, it fuses the first image 410 and the second image 420 with the first algorithm to obtain the third image 430, where the weights of the first image 410 and the second image 420 sum to 1. The third image 430 is used for subsequent hand key point detection and thus hand motion recognition. If the imaging quality of the first image 410 is poor (for example, low sharpness, low brightness, or excessive brightness), the electronic device can decrease the weight of the first image 410 and increase the weight of the second image 420, thereby reducing the impact of the quality of the first image 410 on the accuracy of the detection algorithms (such as the second algorithm in step 2 and the third algorithm in step 3) in cases of overexposure, dim lighting, or motion blur, and making the hand motion recognition result more accurate.
In some embodiments, the electronic device can determine the weights of the first image 410 and the second image 420 from the grayscale histogram of the first image 410. For example, suppose the default weights of the first image 410 and the second image 420 are both 0.5. When the distribution of the grayscale histogram of the first image 410 is uneven, for example concentrated in a fixed interval, the first image 410 has low detail clarity, and the electronic device can set the weight of the first image 410 to 0.3 and the weight of the second image 420 to 0.7.
Without being limited thereto, in specific implementations the electronic device can also determine the weights of the first image 410 and the second image 420 from the mean of the first image 410. For example, when the mean of the first image 410 is greater than the first threshold the first image 410 is too bright, and when the mean is smaller than the second threshold it is too dark; the electronic device can set the weight of the first image 410 to 0.4 and the weight of the second image 420 to 0.6. The electronic device can also determine the weights from the standard deviation of the first image 410. For example, when the standard deviation of the first image 410 is smaller than the third threshold the contrast of the first image 410 is low, and the electronic device can set the weight of the first image 410 to 0.35 and the weight of the second image 420 to 0.65. This application does not limit the specific way of determining the weights of the first image 410 and the second image 420.
Without being limited to the cases listed above, in specific implementations the electronic device may also fuse the first image and the second image with the first algorithm without using a weighted sum; this application does not limit the specific implementation of the first algorithm.
Step 2: The electronic device uses the second algorithm to identify the target area 440 in the third image 430.
Specifically, the target area 440 is the rectangular area where the user's hand is located in the third image 430. The electronic device can take the widest line segment of the user's hand in the third image 430 as one pair of opposite sides of the target area 440, and the tallest line segment as the other pair of opposite sides.
Step 3: The electronic device uses the third algorithm to identify the key points in the target area 440, obtaining the target area 450 including the key points.
The number and positions of the key points are not limited. For example, in the target area 450 including key points shown in FIG. 3, there are 21 key points, most of which are located at the joints of the user's hand.
Step 4: The electronic device uses the fourth algorithm to process the target area 450 including the key points, the second image 420, and the second information already input by the user, to obtain the first information currently input by the user.
It can be understood that the user may input information based on the virtual keyboard 400 presented by the electronic device; see FIG. 4 for an example of the user's hand key points falling on the virtual keyboard 400.
Specifically, the electronic device can first obtain the first motion information of the fingers from the coordinate information of the hand key points in the target area 450 including the key points. For example, the electronic device can first unify the virtual keyboard 400 and the user's hand key points into one coordinate system (for example, a right-handed coordinate system with the center of the user's left eye as the origin; see FIG. 5 below for a specific example). The electronic device can then obtain the coordinate differences (such as Euclidean distances) between the hand key points corresponding to a preset number of frames of the third image 430 and the keys on the virtual keyboard 400, and obtain the first motion information from the coordinate differences, for example the fingers that may perform a tapping action, the amplitude of the tapping action, and the frequency of the tapping action. A finger that may perform a tapping action generally corresponds to one key on the virtual keyboard 400, that is, the user may perform the tapping action on that key; therefore, once the electronic device obtains the fingers that may perform the tapping action, it can generally obtain the word information the user may input.
The electronic device obtains the first motion information from the coordinate differences, for example, as follows: when a preset number of coordinate differences are all greater than a first preset difference, the finger to which the corresponding key point belongs is determined to be a finger that may perform the tapping action. Or, among the preset number of coordinate differences, when the number of coordinate differences greater than the first preset difference is greater than a fourth threshold, the finger to which the corresponding key point belongs is determined to be a finger that may perform the tapping action. Or, assuming the fingers that may perform the tapping action have been determined, when the corresponding coordinate difference is greater than a second preset difference, the tapping action of that finger is determined to be fairly intense, where the second preset difference is greater than the first preset difference. Without being limited thereto, when the coordinate difference is greater than a third, fourth, or fifth preset difference, the intensity of the tapping action may be, respectively, not intense, fairly intense, and very intense. This application does not limit the specific way of obtaining the first motion information from the coordinate differences.
See FIG. 5 below for an example of the electronic device obtaining the first motion information of the fingers. FIG. 5 exemplarily shows a three-dimensional coordinate system, which may be a right-handed coordinate system established with the center of the left eye of the user 200 as the origin in the scenario shown in FIG. 2. The coordinate system shows the coordinates of a key point A in two adjacent frames of the first image 410, whose output moments may be a first moment and a second moment, the first moment being earlier than the second moment.
As shown in FIG. 5, every key on the virtual keyboard has the z-axis coordinate z_0, while the coordinates on the x-axis and y-axis differ from key to key; the center of key W has the x-axis coordinate x_0 and the y-axis coordinate y_0. Key point A has the x-axis coordinate x_1 at both the first moment and the second moment. At the first moment, key point A has the y-axis coordinate y_1 and the z-axis coordinate z_1; at the second moment, the y-axis coordinate y_2 and the z-axis coordinate z_2. Suppose y_2 - y_1 > y_t and z_2 - z_1 > z_t, where y_t and z_t are preset differences; the electronic device can then determine that the finger to which key point A belongs is a finger that may perform the tapping action. This test is written out in the sketch below.
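The FIG. 5 test can be expressed as a short check; a minimal sketch with a hypothetical name (`is_tap_candidate`), assuming the per-axis preset differences y_t and z_t and coordinates in the right-handed coordinate system of FIG. 5 (the numeric values are made up for illustration).

```python
def is_tap_candidate(p1, p2, y_t=0.01, z_t=0.01):
    """Compare key point A's coordinates at the first moment (p1) and the
    second moment (p2); flag the finger when the y and z displacements
    both exceed their preset differences."""
    (_, y1, z1), (_, y2, z2) = p1, p2
    return (y2 - y1) > y_t and (z2 - z1) > z_t

# Key point A between the two moments (x stays at x_1, y and z change).
p1 = (0.12, -0.20, 0.35)   # (x_1, y_1, z_1) at the first moment, in meters
p2 = (0.12, -0.17, 0.38)   # (x_1, y_2, z_2) at the second moment
print(is_tap_candidate(p1, p2))  # True -> finger may perform the tapping action
```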
Without being limited to the cases listed above, in specific implementations the coordinates of the key points may also be two-dimensional coordinates.
Without being limited to the cases listed above, in specific implementations the electronic device may also take, in one frame of the third image 430, the finger whose key point has the smallest coordinate difference from a key on the virtual keyboard 400 as the finger that may perform the tapping action; in this way hand motion recognition can be achieved even when the number of captured images is insufficient (for example, when the storage capacity of the electronic device is small), and the processing latency is also reduced, for a better user experience. Alternatively, the electronic device may obtain the fingers that may perform the tapping action from the differences between the coordinates of the same key point at different moments across multiple frames of the third image 430; for example, when the coordinate difference is greater than a preset difference, the finger to which that key point belongs is a finger that may perform the tapping action. This application does not limit the specific way of determining the motion information.
It can be understood that the image used in FIG. 3 to obtain the first information currently input by the user may be a single frame or multiple frames. That is to say, the first image 410 and the second image 420 input to the first algorithm in step 1, and the third image 430 input to the second algorithm in step 2, may be single frames at a certain moment, or multiple frames within a certain period.
The electronic device can then filter the above first motion information according to the grayscale value at the position of each finger in the second image 420, to obtain the second motion information of the fingers, where a larger grayscale value at a finger's position in the second image 420 indicates that the finger moves more intensely (with greater motion amplitude and/or motion frequency). For example, if the first motion information indicates three fingers that may perform the tapping action, the electronic device can take the two of those three fingers with the higher grayscale values in the second image 420 as the fingers in the second motion information that may perform the tapping action. The electronic device can obtain the information the user may input from the second motion information.
For example, suppose the electronic device finds that the user's index finger, middle finger, and ring finger have the smallest Euclidean distances to key D, key W, and key A on the virtual keyboard; that is, in the determined first motion information, the fingers that may perform the tapping action are the index finger, middle finger, and ring finger. Furthermore, the electronic device recognizes that in the second image 420 shown in FIG. 3 the positions of the middle finger and the ring finger have higher grayscale values, so in the second motion information the fingers performing the tapping action are the middle finger and the ring finger. At this point, the electronic device can obtain from the second motion information that the information the user may input is "w" and "a".
Without being limited to the cases listed above, in specific implementations the electronic device can also first determine the second motion information of the fingers from the grayscale value at each finger's position in the second image 420, for example taking the fingers whose grayscale values are greater than the first grayscale threshold as fingers that may perform the tapping action. The electronic device then filters the second motion information according to the key point information to obtain the first motion information; for example, the electronic device can calculate the coordinate differences between the key points on the fingers in the second motion information that may perform the tapping action and the keys on the virtual keyboard 400, and when a coordinate difference is greater than the sixth preset difference, determine that the finger is a finger in the first motion information that may perform the tapping action. The electronic device can obtain the information the user may input from the first motion information.
Finally, the electronic device can obtain the first information currently input by the user from the obtained information the user may input and the second information the user has already input. For example, the electronic device can use the information the user may input and the second information the user has already input as the input of a text judgment neural network to obtain the output first information, where the first information may be the information with the higher input probability among the information the user may input.
For example, suppose the information the user may input is "w" and "a", and the second information already input is "abnorm". The text judgment neural network can determine that the probability that the user is currently inputting "a" is 0.8 and the probability that the user is currently inputting "w" is 0.2, so the output first information is "a".
Without being limited to the cases listed above, in specific implementations, if the user has not input second information before inputting the first information (that is, the second information is empty), the electronic device can directly take the information the user may input as the determined first information, or perform judgment processing on it to obtain the first information; this application does not limit this.
For example, the first algorithm is a proportion adjustment algorithm together with an image processing algorithm, the second algorithm is an object detection neural network, the third algorithm is a key point detection neural network, and the fourth algorithm is a text judgment neural network. Without being limited thereto, they may also be deep learning algorithms, etc.; this application does not limit the specific forms of the first algorithm, the second algorithm, the third algorithm, and the fourth algorithm.
The above embodiments are described with only one of the user's hands as an example; in specific implementations, the electronic device can recognize both of the user's hands at the same time to obtain the information input by the user.
Without being limited to the examples above, in specific implementations the image acquired by the electronic device may also not be an image of the user's hand, but an image of other parts such as the user's legs or waist. The electronic device can perform human body posture recognition based on the acquired images according to the above process, improving the detection accuracy of key points and making the recognition result more accurate.
Refer to FIG. 6, which is a schematic flowchart of a recognition method provided by an embodiment of this application. The method of FIG. 6 can be applied to an electronic device that may include an optical camera and an event camera, and the electronic device may be the electronic device shown in FIG. 1. The method includes but is not limited to the following steps:
S101: The electronic device acquires a first image and a second image.
Specifically, the first image is an image captured by the optical camera, and the second image is an image captured by the event camera. The moment the optical camera outputs the first image may be the same as the moment the event camera outputs the second image, so both the first image and the second image can capture the N moving parts of the target object at the same moment, for example the user's hand, with the N parts being several of the user's fingers. N is a positive integer. From the description of the event camera above, the second image is determined according to the brightness changes of the pixels of the moving target object, and the second image may include the moving parts among the N parts. The target object may be a user, and the N parts may be body parts of the user, such as the hands, waist, or legs. Without being limited thereto, the target object may also be another living being or object.
S102: The electronic device acquires the key point information of the N parts of the target object according to the first image.
Specifically, the electronic device can identify the target area in the first image where the N parts of the target object are located, and then identify the key point information from that target area, where one pair of opposite sides of the target area is the widest line segment of the N parts, and the other pair of opposite sides is the tallest line segment of the N parts.
Without being limited thereto, in specific implementations the electronic device can also first identify the target area where P of the N parts are located, and then identify the key point information from that target area, where one pair of opposite sides of the target area is the widest line segment of the P parts and the other pair is the tallest line segment of the P parts. P is a positive integer, and P is less than N. The electronic device may, but is not limited to, determine the P parts from the area the N parts occupy in the image, their sharpness, and so on; for example, when the area a part occupies in the image is greater than a preset area and its sharpness (the grayscale difference or gradient of adjacent pixels) is smaller than a preset sharpness value, that part is determined to belong to the P parts.
Specifically, the electronic device can determine the positions and number of key points from the shape of the parts in the target area. For example, in the target area 450 including key points shown in FIG. 3 above, the number of key points on one hand is 21, and most of them are located at the joints of the hand. Or, when the part in the target area is one leg, the number of key points may be 10, and most of them may be located at the joints of the leg.
In some embodiments, S102 is specifically: the electronic device acquires the key point information of the N parts of the target object according to the first image and the second image.
Specifically, the electronic device can first fuse the first image and the second image to obtain a third image, and then acquire the key point information of the N parts of the target object according to the third image. In some embodiments, fusing the first image and the second image to obtain the third image may specifically be: the electronic device performs a weighted sum of the first image and the second image to obtain the third image, where the weight of the first image may be a first weight and the weight of the second image a second weight, with the first weight and the second weight summing to 1. The electronic device can determine the first weight from parameters of the first image, which include, for example but not limited to, the distribution of the grayscale histogram of the first image, the mean of the first image, and the standard deviation of the first image.
When the imaging quality of the first image is poor, the electronic device can lower the first weight and raise the second weight; at this time, the first weight may be smaller than the second weight. Poor imaging quality of the first image may, but is not limited to, satisfy at least one of the following: the distribution of the grayscale histogram of the first image is concentrated in a fixed interval, the mean of the first image is greater than the first threshold, the mean of the first image is smaller than the second threshold, and the standard deviation of the first image is smaller than the third threshold, where the first threshold is greater than the second threshold.
S103: The electronic device determines the first moving part among the N parts according to the key point information and the grayscale values of the second image.
Specifically, the key point information can be used to obtain information such as the coordinate differences of the N parts at different moments, so as to obtain the motion information of the N parts (for example, the parts that may perform actions, the amplitude of the actions, and the frequency of the actions). From the description of the event camera above, a part that moves more intensely (with greater motion amplitude and/or motion frequency) has greater grayscale values of its pixels in the second image. The electronic device can obtain, according to the first part, the first information input by the target object through the first part. The first part may be at least one part, for example at least one finger.
In some embodiments, the electronic device can first obtain the first motion information of the target object according to the key point information, assumed to be M moving parts among the N parts, where M is a positive integer and N is greater than M. The electronic device can then determine the first part from the M parts represented by the first motion information according to the grayscale values of the second image, where the grayscale values of the pixels of the first part in the second image may be greater than a first grayscale threshold, or greater than the grayscale values of the pixels of the other parts among the M parts.
In some embodiments, the electronic device can also first obtain the second motion information of the target object according to the grayscale values of the second image, assumed to be T moving parts among the N parts, where T is a positive integer and N is greater than T; the grayscale values of the pixels of the T parts in the second image may be greater than a second grayscale threshold, or greater than the grayscale values of the pixels of the other parts among the N parts. The electronic device can then determine the first part from the T parts represented by the second motion information according to the key point information. Optionally, the coordinate difference of the key points on the first part is greater than a first difference, or greater than the coordinate differences of the key points on the other parts among the T parts.
The key point information includes, for example, the coordinates of the key points relative to a fixed point on the user or the electronic device (which may be called absolute coordinates), and the coordinates of the key points relative to a fixed point on the N parts (which may be called relative coordinates).
In some embodiments, before S103 the method may further include: receiving second information input by the user. The electronic device can then determine the first information according to the first part together with the second information input by the user.
For an example of the flow shown in FIG. 6, see the process shown in FIG. 3 above, where FIG. 3 takes the target object being a user and the N parts being the user's left hand (the 5 fingers of the left hand) as an example.
In the method shown in FIG. 6, the electronic device can combine the optical camera and the event camera for key point recognition, and introduces the event camera to further determine the motion information, so as to obtain the information input by the user from the motion information. This reduces the impact of the quality of the first image on key point detection accuracy in cases of motion blur or similar colors and textures between the object and the background, and the motion recognition result (that is, the first information input by the user) is more accurate and robust. The user can input information to the electronic device and interact with it without external devices such as a keyboard, a wearable device, or a third-person-view camera, which enhances the interaction capability of the electronic device and makes it more convenient to use.
It can be understood that the embodiments of this application do not need processes such as gesture matching; the motion information of the target object can be obtained directly from the key point information and the image output by the event camera, and the information currently input by the user can be obtained from the motion information and the information the user has already input, which avoids misrecognition such as matching failures and recognition failures, with higher input efficiency and usability.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented with software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired means (such as coaxial cable, optical fiber, or digital subscriber line) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable media may be magnetic media (such as floppy disks, hard disks, or magnetic tapes), optical media (such as digital versatile discs (DVD)), or semiconductor media (such as solid state disks (SSD)), etc.
In short, the above is only an embodiment of the technical solution of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement, and the like made according to the disclosure of the present invention shall be included in the protection scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these changes and variations.

Claims (14)

  1. A recognition method, applied to an electronic device, wherein the method comprises:
    the electronic device acquiring a first image and a second image, wherein the first image is an image captured by an optical camera, the second image is an image captured by an event camera, and the second image is determined according to brightness changes of pixels of a moving target object;
    the electronic device acquiring key point information of N parts of the target object according to the first image, N being a positive integer;
    the electronic device determining, according to the key point information and grayscale values of the second image, a first moving part among the N parts, wherein a part with a greater motion frequency has greater grayscale values of its pixels in the second image.
  2. The method according to claim 1, wherein the electronic device acquiring the key point information of the N parts of the target object according to the first image comprises:
    the electronic device fusing the first image and the second image to obtain a third image;
    the electronic device acquiring the key point information of the N parts according to the third image.
  3. The method according to claim 2, wherein before the electronic device fuses the first image and the second image to obtain the third image, the method further comprises:
    the electronic device determining a first weight of the first image and a second weight of the second image according to parameters of the first image, the parameters of the first image comprising at least one of the following: a distribution of a grayscale histogram, a mean, and a standard deviation;
    the electronic device fusing the first image and the second image to obtain the third image comprises:
    the electronic device fusing the first image and the second image based on the first weight and the second weight to obtain the third image.
  4. The method according to claim 3, wherein the electronic device determining the first weight of the first image and the second weight of the second image according to the parameters of the first image comprises:
    when a preset condition is satisfied, the electronic device setting the first weight and the second weight to a first preset value and a second preset value respectively, wherein the first preset value is smaller than the second preset value; the preset condition comprises at least one of the following: the distribution of the grayscale histogram of the first image is concentrated in a fixed interval, the mean of the first image is greater than a first threshold, the mean of the first image is smaller than a second threshold, and the standard deviation of the first image is smaller than a third threshold, wherein the first threshold is greater than the second threshold.
  5. The method according to any one of claims 2-4, wherein the electronic device acquiring the key point information of the N parts according to the third image comprises:
    the electronic device identifying, in the third image, a target area where the N parts are located;
    the electronic device identifying the key point information of the N parts in the target area.
  6. The method according to any one of claims 1-5, wherein the electronic device determining, according to the key point information and the grayscale values of the second image, the first moving part among the N parts comprises:
    the electronic device determining M parts from the N parts according to the key point information, M being a positive integer less than or equal to N; the key point information comprising coordinates of at least one key point on the N parts;
    the electronic device determining the first part from the M parts according to the grayscale values of the second image, wherein grayscale values of pixels of the first part in the second image are greater than a preset grayscale threshold, or grayscale values of pixels of the first part in the second image are greater than grayscale values, in the second image, of pixels of other parts among the M parts.
  7. The method according to claim 6, wherein a difference between coordinates of the key points on the M parts at a first moment and their coordinates at a second moment is greater than a first preset difference, the first moment being different from the second moment; or, a difference between coordinates of the key points on the M parts and preset coordinates is smaller than a second preset difference.
  8. The method according to any one of claims 1-7, wherein the first part is used by the electronic device to determine first information input by the target object through the first part.
  9. The method according to claim 8, wherein the method further comprises:
    the electronic device determining Q pieces of information according to the first part, Q being a positive integer, the Q pieces of information comprising the first information;
    the electronic device determining the first information from the Q pieces of information according to second information, the second information being information input by the target object before inputting the first information.
  10. The method according to any one of claims 1-9, wherein the first part is a plurality of parts.
  11. The method according to any one of claims 1-10, wherein the target object is a user, and the N parts are fingers of the user.
  12. An electronic device, wherein the electronic device comprises one or more memories and one or more processors; the one or more memories are configured to store a computer program, the one or more processors are configured to invoke the computer program, and the computer program comprises instructions that, when executed by the one or more processors, cause the electronic device to perform the method according to any one of claims 1 to 11.
  13. The electronic device according to claim 12, wherein the electronic device comprises the event camera and the optical camera.
  14. A computer storage medium, wherein the computer storage medium comprises a computer program, the computer program comprises instructions, and when the instructions are run on a processor, the method according to any one of claims 1 to 11 is implemented.
PCT/CN2022/076403 2021-02-26 2022-02-16 Recognition method and electronic device WO2022179412A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110222892.XA CN114967907A (zh) Recognition method and electronic device
CN202110222892.X 2021-02-26

Publications (1)

Publication Number Publication Date
WO2022179412A1 true WO2022179412A1 (zh) 2022-09-01

Family

ID=82973247

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/076403 WO2022179412A1 (zh) 2021-02-26 2022-02-16 识别方法及电子设备

Country Status (2)

Country Link
CN (1) CN114967907A (zh)
WO (1) WO2022179412A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108027656A (zh) Input device, input method, and program
CN109241835A (zh) Image processing method and apparatus, electronic device, and storage medium
US20200005469A1 Event-based feature tracking
WO2020034083A1 Image processing apparatus and method for feature extraction
CN111951313A (zh) Image registration method, apparatus, device, and medium
CN112396562A (zh) Disparity map enhancement method based on fusion of RGB and DVS images in high-dynamic-range scenes
CN112884805A (zh) Light-field imaging method with cross-scale adaptive mapping
CN113033526A (zh) Computer-implemented method, electronic device, and computer program product


Also Published As

Publication number Publication date
CN114967907A (zh) 2022-08-30

Similar Documents

Publication Publication Date Title
WO2020238741A1 (zh) Image processing method, related device, and computer storage medium
WO2019183819A1 (zh) Photographing method, photographing apparatus, and mobile terminal
US11782554B2 Anti-mistouch method of curved screen and electronic device
US20220197033A1 Image Processing Method and Head Mounted Display Device
US20210203836A1 Camera switching method for terminal, and terminal
WO2022179376A1 (zh) Gesture control method and apparatus, electronic device, and storage medium
US20170345165A1 Correcting Short Term Three-Dimensional Tracking Results
US20220086360A1 Big aperture blurring method based on dual cameras and tof
CN112614057A (zh) Image blurring method and electronic device
EP3641294B1 Electronic device and method for obtaining images
US20200322530A1 Electronic device and method for controlling camera using external electronic device
WO2022100685A1 (zh) Draw command processing method and related device
CN113741681B (zh) Image correction method and electronic device
CN111145192A (zh) Image processing method and electronic device
WO2021179829A1 (zh) Human-computer interaction method and device
CN116703995B (zh) Video blurring method and apparatus
WO2023011302A1 (zh) Photographing method and related apparatus
CN115150542B (zh) Video stabilization method and related device
CN113496477A (zh) Screen detection method and electronic device
CN117132515A (zh) Image processing method and electronic device
WO2022179412A1 (zh) Recognition method and electronic device
WO2022033344A1 (zh) Video stabilization method, terminal device, and computer-readable storage medium
CN109729264B (zh) Image acquisition method and mobile terminal
CN115150543B (zh) Shooting method and apparatus, electronic device, and readable storage medium
EP4376433A1 (en) Camera switching method and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22758777

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22758777

Country of ref document: EP

Kind code of ref document: A1