WO2022206494A1 - Target tracking method and device thereof - Google Patents

Publication number
WO2022206494A1
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
tracking
displacement
depth information
value
Prior art date
Application number
PCT/CN2022/082300
Inventor
徐健
张超
张雅琪
刘宏马
贾志平
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022206494A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image

Definitions

  • The present application relates to the field of information processing, and in particular to a target tracking method and a device thereof.
  • Target tracking takes an image sequence as input and, for a selected tracking target, outputs the size and position of that target in each frame of the image sequence.
  • The accuracy of target tracking depends on the selected tracking target, so selecting the tracking target is a key step in triggering target tracking.
  • A target detection model (such as the Yolo model) can be used to identify multiple objects in the image and output a detection frame marking the position of each object; the object in the detection frame that the user taps is then selected as the tracking target.
  • However, the Yolo model cannot detect objects that are too small in the image, which may lead to failure to locate and track the target.
  • The tracking target can also be selected by manually drawing a frame on the image to mark the object. For a moving object, however, the object may be at a first position in one frame (e.g., the first frame) of the image sequence when drawing starts, and by the time drawing ends the object may have moved away from that position in another frame (e.g., the tenth frame) of the image sequence, which also leads to failure to locate and track the target.
  • Embodiments of the present application provide a target tracking method and a device thereof, which make it convenient to select a tracking target.
  • An embodiment of the present application provides a target tracking method. The method includes: acquiring depth information of image frames in a video stream; determining a change area between first adjacent image frames in the video stream as an object to be detected, where the first adjacent image frames include a first image frame and a second image frame, the first image frame precedes the second image frame, and the change area is an area in which the depth information differs; determining the displacement value and displacement direction of the position of the object to be detected in the second image frame relative to its position in the first image frame, where the position is a position in the depth information, and the displacement value and displacement direction are those of the depth information; and determining a tracking target, where the tracking target is an object to be detected whose displacement value is greater than a first preset value and whose displacement direction is toward the focal plane.
  • The present application takes the area in which depth information differs between adjacent image frames as the object to be detected. If the depth position of the object to be detected moves forward significantly, the object is selected as the tracking target, so the tracking target can be selected conveniently.
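The selection rule above can be illustrated with a minimal Python sketch. It is not the application's implementation: the function name `select_tracking_target` and the parameters `focal_depth` and `change_threshold` are hypothetical, the whole change area is treated as a single object to be detected, and "toward the focal plane" is interpreted (as an assumption) as the region's mean depth getting closer to an assumed focal-plane depth.

```python
import numpy as np

def select_tracking_target(depth_prev, depth_curr, focal_depth,
                           change_threshold, first_preset):
    """Return (mask, displacement) if the change area qualifies, else None.

    depth_prev / depth_curr: 2-D depth maps of the first and second frame.
    focal_depth, change_threshold: hypothetical parameters (assumptions).
    """
    diff = depth_curr.astype(np.float64) - depth_prev.astype(np.float64)
    # Change area: pixels whose depth differs between the two frames.
    mask = np.abs(diff) > change_threshold
    if not mask.any():
        return None
    # Displacement value: absolute value of the average per-pixel depth change.
    displacement = abs(diff[mask].mean())
    # Displacement direction (assumed reading): toward the focal plane if the
    # region's mean depth is nearer to focal_depth in the second frame.
    before = abs(depth_prev[mask].mean() - focal_depth)
    after = abs(depth_curr[mask].mean() - focal_depth)
    if displacement > first_preset and after < before:
        return mask, displacement
    return None

# Usage: a 2x2 patch moves from depth 3.0 to depth 1.5.
prev = np.full((4, 4), 3.0)
curr = prev.copy()
curr[1:3, 1:3] = 1.5
result = select_tracking_target(prev, curr, focal_depth=1.0,
                                change_threshold=0.2, first_preset=1.0)
print(result is not None)
```

In practice the change area would likely be split into separate objects before the rule is applied to each one; the sketch skips that step for brevity.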
  • The method further includes: during tracking, determining the position of the tracking target in second adjacent image frames, where the second adjacent image frames include a third image frame and a fourth image frame, the third image frame precedes the fourth image frame, and the position is a position in the depth information; determining the displacement value and displacement direction of the position of the tracking target in the fourth image frame relative to its position in the third image frame, where the displacement value and displacement direction are those of the depth information; and if the displacement value is greater than a second preset value and the displacement direction is away from the focal plane, stopping tracking.
  • In this way, tracking can be exited conveniently.
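The exit condition can be sketched in a few lines of Python. This is an illustrative sketch only: `should_stop_tracking` and `focal_depth` are hypothetical names, the target's depth position in each frame is summarized by a single mean depth, and "away from the focal plane" is read (as an assumption) as the mean depth getting farther from an assumed focal-plane depth.

```python
def should_stop_tracking(mean_depth_third, mean_depth_fourth,
                         focal_depth, second_preset):
    """Return True if tracking should be exited.

    mean_depth_third / mean_depth_fourth: mean depth of the tracking target
    in the third and fourth image frame (hypothetical summary values).
    """
    # Displacement value of the depth information between the two frames.
    displacement = abs(mean_depth_fourth - mean_depth_third)
    # Assumed reading of "away from the focal plane".
    moving_away = (abs(mean_depth_fourth - focal_depth)
                   > abs(mean_depth_third - focal_depth))
    return displacement > second_preset and moving_away

# Usage: the target moves from depth 1.5 back to depth 3.0.
print(should_stop_tracking(1.5, 3.0, focal_depth=1.0, second_preset=1.0))
```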
  • The method further includes: detecting human body key points in image frames in the video stream, where the object to be detected is a change area, between the first adjacent image frames in the video stream, that is connected with a first parameter of the human body key points; and the tracking target is an object to be detected whose displacement value is greater than the first preset value, whose displacement direction is toward the focal plane, and whose depth information differs from that of a second parameter of the human body key points in the image frame by more than a third preset value.
  • In this way, the object to be detected is selected as the tracking target, and the tracking target can be selected conveniently.
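One way to picture "a change area connected with a human body key point" is as the connected part of the change mask that contains the key point's pixel. The sketch below rests on that assumption: the text does not define the "first parameter", so it is taken here to be a key-point pixel location (for example, a hand), and the function name `region_connected_to_keypoint` is hypothetical. It uses a plain breadth-first search with 4-connectivity.

```python
from collections import deque

import numpy as np

def region_connected_to_keypoint(change_mask, keypoint):
    """Return the 4-connected part of change_mask containing keypoint.

    change_mask: 2-D boolean array marking pixels whose depth changed.
    keypoint: (row, col) of a human body key point (assumed meaning of the
    "first parameter"). An all-False mask is returned if the key point lies
    outside the image or outside the change area.
    """
    rows, cols = change_mask.shape
    region = np.zeros((rows, cols), dtype=bool)
    r0, c0 = keypoint
    if not (0 <= r0 < rows and 0 <= c0 < cols) or not change_mask[r0, c0]:
        return region
    queue = deque([(r0, c0)])
    region[r0, c0] = True
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbours of the current pixel.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and change_mask[nr, nc] and not region[nr, nc]):
                region[nr, nc] = True
                queue.append((nr, nc))
    return region

# Usage: two separate change areas; only the one touching (0, 0) is kept.
mask = np.array([[1, 1, 0, 0],
                 [0, 1, 0, 1],
                 [0, 0, 0, 1]], dtype=bool)
print(region_connected_to_keypoint(mask, (0, 0)).sum())
```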
  • The method further includes: during tracking, determining the position of the tracking target in second adjacent image frames, where the second adjacent image frames include a third image frame and a fourth image frame, the third image frame precedes the fourth image frame, and the position is a position in the depth information; determining the displacement value and displacement direction of the position of the tracking target in the fourth image frame relative to its position in the third image frame, where the displacement value and displacement direction are those of the depth information; and if the displacement value is greater than a second preset value and the displacement direction is away from the focal plane, or the tracking target is no longer connected with the first parameter of the human body key points, exiting tracking.
  • In this way, tracking can be exited conveniently.
  • The displacement value of the depth information is the absolute value of the average depth change of the pixels.
  • An embodiment of the present application provides a target tracking device. The device includes: an acquisition unit, configured to acquire depth information of image frames in a video stream; and a determining unit, configured to determine a change area between first adjacent image frames in the video stream as an object to be detected, where the first adjacent image frames include a first image frame and a second image frame, the first image frame precedes the second image frame, and the change area is an area in which the depth information differs. The determining unit is further configured to determine the displacement value and displacement direction of the position of the object to be detected in the second image frame relative to its position in the first image frame, where the position is a position in the depth information, and the displacement value and displacement direction are those of the depth information. The determining unit is further configured to select a tracking target, where the tracking target is an object to be detected whose displacement value is greater than a first preset value and whose displacement direction is toward the focal plane.
  • The determining unit is further configured to determine the position of the tracking target in second adjacent image frames during tracking, where the second adjacent image frames include a third image frame and a fourth image frame, the third image frame precedes the fourth image frame, and the position is a position in the depth information. The determining unit is further configured to determine the displacement value and displacement direction of the position of the tracking target in the fourth image frame relative to its position in the third image frame, where the displacement value and displacement direction are those of the depth information. The determining unit is further configured to exit tracking if the displacement value is greater than a second preset value and the displacement direction is away from the focal plane.
  • The determining unit is further configured to detect human body key points in the image frames in the video stream; to determine, as the object to be detected, a change area between the first adjacent image frames in the video stream that is connected with a first parameter of the human body key points; and to select a tracking target, where the tracking target is an object to be detected whose displacement value is greater than the first preset value, whose displacement direction is toward the focal plane, and whose depth information differs from that of a second parameter of the human body key points in the image frame by more than a third preset value.
  • The determining unit is further configured to determine the position of the tracking target in second adjacent image frames during tracking, where the second adjacent image frames include a third image frame and a fourth image frame, the third image frame precedes the fourth image frame, and the position is a position in the depth information. The determining unit is further configured to determine the displacement value and displacement direction of the position of the tracking target in the fourth image frame relative to its position in the third image frame, where the displacement value and displacement direction are those of the depth information. The determining unit is further configured to exit tracking if the displacement value is greater than a second preset value and the displacement direction is away from the focal plane, or if the tracking target is no longer connected with the first parameter of the human body key points.
  • The displacement value of the depth information is the absolute value of the average depth change of the pixels.
  • An embodiment of the present application provides an electronic device. The electronic device includes a processor and a memory, the memory is used to store program instructions, and when the processor calls the program instructions, the target tracking method described in any one of the above is implemented.
  • An embodiment of the present application provides a server. The server includes a processor and a memory, the memory is used to store program instructions, and when the processor invokes the program instructions, the target tracking method described in any one of the above is implemented.
  • An embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a program, and the program enables a computer device to implement the target tracking method described in any one of the above.
  • An embodiment of the present application provides a computer program product. The computer program product includes computer-executable instructions stored in a computer-readable storage medium; at least one processor of a device can read the computer-executable instructions from the computer-readable storage medium, and the at least one processor executes the computer-executable instructions to cause the device to perform the target tracking method described in any one of the above.
  • FIG. 1 is a schematic diagram of a tracking system according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
  • FIG. 3 is a block diagram of a software structure of an electronic device according to an embodiment of the present application.
  • FIG. 4 is a flowchart of a target tracking method according to an embodiment of the present application.
  • FIGS. 5A-5D are diagrams of human-computer interaction interfaces provided by embodiments of the present application.
  • FIGS. 6A-6B are other human-computer interaction interface diagrams provided by the embodiments of the present application.
  • FIGS. 8A-8B are other human-computer interaction interface diagrams provided by the embodiments of the present application.
  • FIGS. 9A-9B are some schematic diagrams provided in the embodiments of the present application.
  • FIGS. 10A-10E are user interfaces provided by embodiments of the present application.
  • FIGS. 11A-11B are other user interfaces provided by the embodiments of the present application.
  • FIG. 12 is a schematic diagram of a human body key point according to an embodiment of the present application.
  • FIGS. 13A-13B are other user interfaces provided by the embodiments of the present application.
  • FIG. 14 is a schematic diagram of a hardware structure of a server according to an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a target tracking device of the present application.
  • The terms "first" and "second" are only used for descriptive purposes, and should not be construed as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, features defined as "first" or "second" may expressly or implicitly include one or more of said features.
  • Words such as "for example" are used to present examples, illustrations, or explanations. Any embodiment or design described with "for example" in the embodiments of the present application should not be construed as preferred or advantageous over other embodiments or designs. Rather, words such as "for example" are intended to present the related concepts in a specific manner.
  • The tracking system 10 may include an electronic device 11 and a server 12.
  • The electronic device 11 may be an electronic device with an image capturing function, such as a smart phone, a tablet computer, a PDA (Personal Digital Assistant), a smart camera device, or a wearable device.
  • A network connection can be established between the electronic device 11 and the server 12.
  • The network connection may be a wired or wireless connection.
  • The electronic device 11 may include a camera module 111.
  • The camera module 111 may be a binocular camera, a structured light camera, a TOF (Time of Flight) camera, or a common monocular camera.
  • The camera module 111 is used for capturing images of the scene.
  • The image can be used to obtain depth information of the subject. If the camera module 111 is a binocular camera, a structured light camera, or a TOF camera, the image includes the depth information of the subject, which can subsequently be obtained directly from the image. If the camera module 111 is a common monocular camera, a monocular depth estimation algorithm can subsequently be used to obtain the depth information of the subject in the image.
  • The camera module 111 captures images at a fixed frequency, for example, 30 frames per second.
  • The camera module 111 can be fixed to capture images of the same scene, or can be driven to move to track objects.
  • The electronic device 11 includes a client 112.
  • The client 112 may be an application with a camera function running on the electronic device 11, such as a camera app, a live streaming app, a video call app, or a monitoring app.
  • The client 112 can call the camera app through an application programming interface (API) to request permission to use the camera module 111, and after obtaining the permission, can control the camera module 111.
  • The electronic device 11 can acquire the video stream collected by the camera module 111, and send the video stream to the server 12 through the client 112.
  • The server 12 may store the video stream in a storage location associated with the live channel identifier, so that the player can play the video stream, or send the video stream to other electronic devices for a video call.
  • The electronic device 11 can obtain the depth information of the image frames in the video stream; determine a change area between adjacent image frames in the video stream as an object to be detected, where the adjacent image frames include a first image frame and a second image frame, the first image frame precedes the second image frame, and the change area is an area in which the depth information differs; determine the displacement value and displacement direction of the position of the object to be detected in the second image frame relative to its position in the first image frame, where the displacement value and displacement direction are those of the depth information; and select a tracking target, where the tracking target is an object to be detected whose displacement value is greater than a first preset value and whose displacement direction is toward the focal plane.
  • The electronic device 11 also tracks the tracking target, and then sends the processed video stream to the server 12.
  • The processing of the video stream can also be handled by the server 12.
  • After the server receives the video stream sent by the electronic device through the client, it can also obtain the depth information of the image frames in the video stream; determine a change area between adjacent image frames in the video stream as an object to be detected, where the adjacent image frames include a first image frame and a second image frame, the first image frame precedes the second image frame, and the change area is an area in which the depth information changes; determine the displacement value and displacement direction of the position of the object to be detected in the second image frame relative to its position in the first image frame, where the displacement value and displacement direction are those of the depth information; and select a tracking target, where the tracking target is an object to be detected whose displacement value is greater than the first preset value and whose displacement direction is toward the focal plane.
  • The server 12 can also track the tracking target through the electronic device, and then send the processed video stream to other electronic devices through the client. That is, in the embodiments of the present application, selecting a tracking target and tracking it may be implemented in the electronic device 11 or in the server 12, which is not limited here.
  • The electronic device 100 may include at least one of a mobile phone with an image capture function, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), a wearable device, an in-vehicle device, or a smart home device.
  • The specific type of the electronic device 100 is not particularly limited in this embodiment of the present application.
  • The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera module 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
  • The structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100.
  • The electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently.
  • The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices, or may be integrated in one or more processors.
  • The processor can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • A memory may also be provided in the processor 110 for storing instructions and data.
  • The memory in the processor 110 may be a cache memory.
  • The memory may store instructions or data that have been used by the processor 110 or are used frequently. If the processor 110 needs the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby increasing the efficiency of the system.
  • The processor 110 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
  • The processor 110 may be connected to modules such as a touch sensor, an audio module, a wireless communication module, a display, and a camera through at least one of the above interfaces.
  • The interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100.
  • The electronic device 100 may also adopt interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
  • The USB connector 130 is an interface conforming to the USB standard specification, which can be used to connect the electronic device 100 and peripheral devices, and specifically can be a Mini USB connector, a Micro USB connector, a USB Type C connector, or the like.
  • The USB connector 130 can be used to connect to a charger, so that the charger can charge the electronic device 100. It can also be used to connect to other electronic devices, so as to transmit data between the electronic device 100 and those devices, or to connect headphones, so as to output audio stored in the electronic device through the headphones.
  • The connector can also be used to connect other electronic devices, such as VR devices.
  • The standard specifications of the Universal Serial Bus may be USB1.x, USB2.0, USB3.x, and USB4.
  • The charging management module 140 is used for receiving charging input from the charger.
  • The charger may be a wireless charger or a wired charger.
  • The charging management module 140 may receive charging input from the wired charger through the USB connector 130.
  • The charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, it can also supply power to the electronic device through the power management module 141.
  • The power management module 141 is used for connecting the battery 142, the charging management module 140, and the processor 110.
  • The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera module 193, and the wireless communication module 160.
  • The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage, impedance).
  • The power management module 141 may also be provided in the processor 110.
  • The power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide applications on the electronic device 100 including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), bluetooth (BT), bluetooth low power power consumption (bluetooth low energy, BLE), ultra wide band (UWB), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication technology (near field communication, NFC), infrared technology (infrared, IR) and other wireless communication solutions.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other electronic devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS) and/or satellite based augmentation systems (SBAS).
  • the electronic device 100 may implement a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro OLED, quantum dot light-emitting diodes (QLED), and so on.
  • electronic device 100 may include one or more display screens 194.
  • the electronic device 100 may implement a camera function through a camera module 193, an ISP, a video codec, a GPU, a display screen 194, an application processor (AP), a neural-network processing unit (NPU), and the like.
  • the camera module 193 can be used to collect color image data and depth data of the photographed object.
  • the ISP can be used to process the color image data collected by the camera module 193 .
  • the shutter is opened, light is transmitted to the photosensitive element of the camera through the lens, and the light signal is converted into an electrical signal; the photosensitive element then transmits the electrical signal to the ISP, which processes it and converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera module 193 .
  • the camera module 193 may be composed of a color camera module and a 3D sensing module.
  • the photosensitive element of the camera of the color camera module may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the 3D sensing module may be a time of flight (TOF) 3D sensing module or a structured light 3D sensing module.
  • the structured light 3D sensing is an active depth sensing technology, and the basic components of the structured light 3D sensing module may include an infrared (IR) emitter, an IR camera module, and the like.
  • the working principle of the structured light 3D sensing module is to first project a light spot of a specific pattern onto the object to be photographed, then receive the light coding of the light spot pattern on the surface of the object, compare it with the originally projected light spot to find the similarities and differences, and use the principle of triangulation to calculate the three-dimensional coordinates of the object.
  • the three-dimensional coordinates include the distance between the electronic device 100 and the object to be photographed.
  • the TOF 3D sensing can also be an active depth sensing technology, and the basic components of the TOF 3D sensing module can include an infrared (IR) transmitter, an IR camera module, and the like.
  • the working principle of the TOF 3D sensing module is to calculate the distance (i.e., depth) between the TOF 3D sensing module and the object to be photographed from the round-trip time of the infrared light, so as to obtain a 3D depth map.
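  • As an illustration of the TOF principle described above, the following is a minimal sketch that converts an infrared round-trip time into a depth value. The function names and the list-of-rows depth map layout are assumptions made for this example and do not come from the application; only the relation "distance is half of the path travelled by the light" is used.

```python
# Speed of light in vacuum, in metres per second.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_depth_m(round_trip_time_s: float) -> float:
    """Return the distance to the object from the infrared round-trip time.

    The light travels to the object and back, so the one-way distance
    (the depth) is half of the total path length.
    """
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


def tof_depth_map(round_trip_times_s):
    """Convert a 2D grid (list of rows) of round-trip times into a depth map.

    The grid layout is a hypothetical stand-in for per-pixel sensor output.
    """
    return [[tof_depth_m(t) for t in row] for row in round_trip_times_s]
```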
  • Structured light 3D sensing modules can also be used in face recognition, somatosensory game consoles, industrial machine vision detection and other fields.
  • TOF 3D sensing modules can also be applied to game consoles, augmented reality (AR)/virtual reality (VR) and other fields.
  • the camera module 193 may also be composed of two or more cameras.
  • the two or more cameras may include color cameras, and the color cameras may be used to collect color image data of the photographed object.
  • the two or more cameras may use stereo vision technology to collect depth data of the photographed object.
  • Stereo vision technology is based on the principle of human eye parallax. Under natural light, two or more cameras capture images of the same object from different angles, and operations such as triangulation are then performed to obtain the distance between the electronic device 100 and the object, that is, the depth information.
  • the camera module 193 may also be composed of a single camera. This camera captures an RGB image from a single viewing angle.
  • the GPU in the processor 110 can estimate the distance of each pixel in the image relative to the camera module 193, that is, the depth information, according to a monocular depth estimation algorithm.
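  • The triangulation step mentioned above can be sketched for the simplest case. The example assumes two rectified pinhole cameras with parallel optical axes, a model the application does not spell out; the function name and parameters are hypothetical.

```python
def stereo_depth_m(focal_length_px: float, baseline_m: float,
                   disparity_px: float) -> float:
    """Depth of a point seen by two rectified cameras.

    focal_length_px: focal length expressed in pixels (assumed equal for both cameras).
    baseline_m:      distance between the two camera centres, in metres.
    disparity_px:    horizontal shift of the point between the two images, in pixels.

    By similar triangles, depth = focal_length * baseline / disparity.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px
```

  • Under this model a larger disparity means a closer object, which is why depth resolution degrades for distant objects.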
  • the camera module 193 can be fixed to capture images of the same scene and the same viewing angle, and can also be driven to capture images of different scenes.
  • the camera module 193 can be fixed before the tracking target is selected; after the tracking target is selected, it can be driven to track the target.
  • the electronic device 100 may include one or more camera modules 193 .
  • the electronic device 100 may include a front camera module 193 and a rear camera module 193.
  • the front camera module 193 can usually be used to collect the color image data and depth data of the photographer facing the display screen 194, and the rear camera module can be used to collect the color image data and depth data of the objects (such as people, landscapes, etc.) that the photographer faces.
  • the CPU, GPU or NPU in the processor 110 may process the color image data and depth data collected by the camera module 193 .
  • the NPU can recognize the color image data collected by the camera module 193 (specifically, the color camera module) through a neural network algorithm based on skeleton point recognition technology, such as a convolutional neural network (CNN) algorithm, to determine the skeleton points of the person being photographed.
  • the CPU or GPU can also run the neural network algorithm to realize the determination of the skeletal points of the photographed person according to the color image data.
  • the CPU, GPU or NPU can also be used to determine the figure of the person being photographed (such as the body proportions and the fatness or thinness of the body parts between the skeletal points) according to the depth data collected by the camera module 193 (which may be a 3D sensing module) and the identified skeletal points, can further determine body beautification parameters for the person being photographed, and can finally process the captured image of that person according to the body beautification parameters, so that the body shape of the person in the captured image is beautified.
  • Subsequent embodiments will introduce in detail how to perform body beautification processing on the image of the person being photographed based on the color image data and depth data collected by the camera module 193, which will not be described here.
  • The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy, and so on.
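  • The application does not describe how the Fourier transform on the frequency point energy is computed. As a hedged illustration only, the sketch below evaluates the energy of a single discrete Fourier transform (DFT) bin of a sampled signal; the function name and the choice of a direct DFT sum are assumptions for this example.

```python
import cmath


def frequency_bin_energy(samples, k: int) -> float:
    """Energy (squared magnitude) of DFT bin k of a real or complex sample sequence.

    Uses the direct DFT definition X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N),
    which is adequate for inspecting one frequency point at a time.
    """
    n_samples = len(samples)
    if n_samples == 0:
        raise ValueError("samples must not be empty")
    coefficient = sum(
        x * cmath.exp(-2j * cmath.pi * k * n / n_samples)
        for n, x in enumerate(samples)
    )
    return abs(coefficient) ** 2
```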
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, to save files such as music and video in the external memory card, or to transfer files such as music and video from the electronic device to the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the processor 110 executes various functional methods or data processing of the electronic device 100 by executing the instructions stored in the internal memory 121 and/or the instructions stored in the memory provided in the processor.
  • the electronic device 100 may implement audio functions, such as music playback and recording, through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
  • Speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 may play music through the speaker 170A, or output an audio signal for a hands-free call.
  • the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be heard by placing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mic" or "mike", is used to convert sound signals into electrical signals.
  • the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the earphone interface 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
  • the pressure sensor 180A may be provided on the display screen 194 .
  • the capacitive pressure sensor may be comprised of at least two parallel plates of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
  • the electronic device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than the first pressure threshold acts on the short message application icon, the instruction for viewing the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, the instruction to create a new short message is executed.
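  • The short message example above amounts to a threshold comparison. A minimal sketch follows; the instruction names returned by the function are hypothetical labels chosen for this example.

```python
def instruction_for_message_icon(touch_intensity: float,
                                 first_pressure_threshold: float) -> str:
    """Map the intensity of a touch on the short message icon to an instruction.

    Below the first pressure threshold the message is viewed; at or above
    the threshold a new message is created, as in the example in the text.
    """
    if touch_intensity < first_pressure_threshold:
        return "view_short_message"
    return "create_short_message"
```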
  • the gyro sensor 180B may be used to determine the motion attitude of the electronic device 100 .
  • the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined through the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyro sensor 180B detects the shaking angle of the electronic device 100, calculates the distance to be compensated by the lens module according to the angle, and controls the lens to move in the opposite direction to offset the shaking of the electronic device 100, thereby achieving anti-shake.
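  • The application does not give the formula used to turn the shaking angle into a lens compensation distance. The sketch below assumes a simple pinhole model in which a rotation by an angle shifts the image by roughly focal length times the tangent of that angle; the function name, units, and model are assumptions for illustration.

```python
import math


def lens_compensation_mm(shake_angle_deg: float, focal_length_mm: float) -> float:
    """Lens displacement that offsets a small rotational shake.

    Assumed model: a rotation by shake_angle_deg shifts the image by
    focal_length_mm * tan(angle); the lens is moved by the same amount in
    the opposite direction, hence the negative sign.
    """
    image_shift_mm = focal_length_mm * math.tan(math.radians(shake_angle_deg))
    return -image_shift_mm
```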
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenarios.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
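  • The application does not state how the altitude is derived from the air pressure value. One commonly used approximation is the international barometric formula, sketched below; the reference sea-level pressure of 1013.25 hPa is the standard-atmosphere value and is an assumption here, since the actual reference pressure varies with the weather.

```python
STANDARD_SEA_LEVEL_PRESSURE_HPA = 1013.25


def altitude_m(pressure_hpa: float,
               sea_level_pressure_hpa: float = STANDARD_SEA_LEVEL_PRESSURE_HPA) -> float:
    """Approximate altitude from air pressure using the international barometric formula.

    altitude = 44330 * (1 - (p / p0) ** (1 / 5.255)), valid in the lower atmosphere.
    """
    if pressure_hpa <= 0 or sea_level_pressure_hpa <= 0:
        raise ValueError("pressures must be positive")
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_pressure_hpa) ** (1.0 / 5.255))
```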
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can detect the opening and closing of a flip leather case using the magnetic sensor 180D.
  • the magnetic sensor 180D can be used to detect the folding or unfolding of the electronic device, or the folding angle.
  • when the electronic device 100 is a flip phone, the electronic device 100 can detect the opening and closing of the flip cover according to the magnetic sensor 180D. Further, features such as automatic unlocking upon opening the flip cover are set according to the detected opening and closing state of the leather case or of the flip cover.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
  • the magnitude and direction of gravity can be detected when the electronic device 100 is stationary. The acceleration sensor 180E can also be used to identify the posture of the electronic device, and can be applied to landscape/portrait screen switching, pedometers, and other applications.
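  • As a hedged illustration of landscape/portrait switching, the sketch below compares the gravity components measured along two device axes while the device is stationary. The axis convention (x along the short edge, y along the long edge) and the function name are assumptions for this example and are not taken from the application.

```python
def screen_orientation(accel_x: float, accel_y: float) -> str:
    """Guess the screen orientation from gravity components on the device axes.

    accel_x: acceleration along the short edge of the screen (assumed convention).
    accel_y: acceleration along the long edge of the screen (assumed convention).

    When the device is held upright, gravity acts mostly along the long edge;
    when it is turned on its side, gravity acts mostly along the short edge.
    """
    if abs(accel_y) >= abs(accel_x):
        return "portrait"
    return "landscape"
```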
  • the electronic device 100 can measure the distance through infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 can use the distance sensor 180F to measure the distance to achieve fast focusing.
  • Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • the electronic device 100 emits infrared light to the outside through the light emitting diode.
  • Electronic device 100 uses photodiodes to detect infrared reflected light from nearby objects. When the intensity of the detected reflected light is greater than the threshold, it may be determined that there is an object near the electronic device 100 . When the intensity of the detected reflected light is less than the threshold, the electronic device 100 may determine that there is no object near the electronic device 100 .
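  • The proximity decision above is a single threshold test on the reflected light intensity. A minimal sketch follows; the text does not say what happens when the intensity equals the threshold, so treating that case as "no object nearby" is an assumption of this example.

```python
def is_object_near(reflected_intensity: float, threshold: float) -> bool:
    """Return True when the detected infrared reflection indicates a nearby object.

    Intensity strictly greater than the threshold means an object is near.
    The equal case is not specified in the text and is treated as "not near" here.
    """
    return reflected_intensity > threshold
```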
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • Proximity light sensor 180G can also be used in leather case mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L may be used to sense ambient light brightness.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is blocked, e.g., whether the electronic device is in a pocket. When it is detected that the electronic device is blocked or in a pocket, some functions (such as touch functions) can be disabled to prevent accidental operation.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to realize fingerprint unlocking, accessing application locks, taking pictures with fingerprints, answering incoming calls with fingerprints, and the like.
  • the temperature sensor 180J is used to detect the temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature detected by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of the processor in order to reduce the power consumption of the electronic device and implement thermal protection.
  • the electronic device 100 heats the battery 142 when the temperature detected by the temperature sensor 180J is below another threshold. In other embodiments, the electronic device 100 may boost the output voltage of the battery 142 when the temperature is below yet another threshold.
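  • The temperature processing strategies above can be sketched as independent threshold checks. The application presents them as separate embodiments, so combining them in one function, the action labels, and the parameter names are all assumptions made for this example.

```python
def temperature_actions(temperature_c: float,
                        high_threshold_c: float,
                        low_threshold_c: float,
                        very_low_threshold_c: float) -> list:
    """Return the list of actions triggered by the detected temperature.

    - above the high threshold: reduce processor performance (thermal protection)
    - below the low threshold: heat the battery
    - below the very low threshold: boost the battery output voltage
    """
    actions = []
    if temperature_c > high_threshold_c:
        actions.append("reduce_processor_performance")
    if temperature_c < low_threshold_c:
        actions.append("heat_battery")
    if temperature_c < very_low_threshold_c:
        actions.append("boost_battery_output_voltage")
    return actions
```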
  • Touch sensor 180K is also called a "touch device".
  • the touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touchscreen, also called a "touch-controlled screen".
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a location different from that of the display screen 194.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the bone that vibrates when a person speaks.
  • the bone conduction sensor 180M can also contact the pulse of the human body and receive the blood pressure pulsation signal.
  • the bone conduction sensor 180M can also be disposed in an earphone to form a bone conduction earphone.
  • the audio module 170 can parse out the voice signal based on the vibration signal of the vibrating bone obtained by the bone conduction sensor 180M, so as to realize the voice function.
  • the application processor can parse out heart rate information based on the blood pressure pulsation signal obtained by the bone conduction sensor 180M, so as to realize the heart rate detection function.
  • the keys 190 may include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • Different application scenarios (for example, time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be contacted and separated from the electronic device 100 by inserting into the SIM card interface 195 or pulling out from the SIM card interface 195 .
  • the electronic device 100 may support one or more SIM card interfaces.
  • the SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time. Multiple cards can be of the same type or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the electronic device 100 employs an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiments of the present application take an Android system with a layered architecture as an example to exemplarily describe the software structure of the electronic device 100 .
  • FIG. 3 is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into five layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime (ART) and native C/C++ libraries, the hardware abstraction layer (HAL), and the kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message and so on.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include window managers, content providers, view systems, resource managers, notification managers, activity managers, input managers, and so on.
  • the window manager provides a window manager service (WMS), which can be used for window management, window animation management, and surface management, and serves as a transfer station for the input system.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • This data can include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications from applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
  • the activity manager can provide an activity manager service (AMS), which can be used for the startup, switching, and scheduling of system components (such as activities, services, content providers, and broadcast receivers), as well as for the management and scheduling of application processes.
  • the input manager can provide an input manager service (IMS), which can be used to manage the input of the system, such as touch screen input, key input, sensor input and so on.
  • IMS fetches events from input device nodes, and distributes events to appropriate windows through interaction with WMS.
  • the Android runtime layer includes the core library and the Android runtime (ART).
  • the Android runtime is responsible for converting source code to machine code.
  • the Android runtime mainly uses ahead-of-time (AOT) compilation technology and just-in-time (JIT) compilation technology.
  • the core library is mainly used to provide the functions of basic Java class libraries, such as basic data structures, mathematics, IO, tools, databases, networks and other libraries.
  • the core library provides an API for users to develop Android applications.
  • a native C/C++ library can include multiple functional modules. For example: surface manager, Media Framework, libc, OpenGL ES, SQLite, Webkit, etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media framework supports playback and recording of many common audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • OpenGL ES provides the drawing and manipulation of 2D graphics and 3D graphics in applications. SQLite provides a lightweight relational database for applications of the electronic device 100 .
  • the hardware abstraction layer runs in user space, encapsulates the kernel layer driver, and provides a calling interface to the upper layer.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
  • when the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into raw input events (including touch coordinates, timestamps of touch operations, etc.). Raw input events are stored at the kernel layer.
  • the application framework layer obtains the raw input event from the kernel layer, and identifies the control corresponding to the input event. Taking as an example the case where the touch operation is a click operation and the control corresponding to the click operation is the camera application icon: the camera application calls the interface of the application framework layer to start the camera application, which then starts the camera driver by calling the kernel layer, and the camera captures still images or video.
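  • The step of identifying the control that corresponds to a raw input event can be sketched as a hit test of the touch coordinates against control bounds. The class names, fields, and rectangular bounds below are hypothetical and only illustrate the idea; they are not the Android framework's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RawInputEvent:
    """Raw input event as described in the text: touch coordinates and a timestamp."""
    x: int
    y: int
    timestamp_ms: int


@dataclass(frozen=True)
class Control:
    """A rectangular on-screen control (hypothetical representation)."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, event: RawInputEvent) -> bool:
        """Return True when the touch point lies inside this control's bounds."""
        return self.left <= event.x < self.right and self.top <= event.y < self.bottom


def find_control(event: RawInputEvent,
                 controls: Sequence[Control]) -> Optional[Control]:
    """Return the first control whose bounds contain the touch point, if any."""
    for control in controls:
        if control.contains(event):
            return control
    return None
```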
  • FIG. 4 is a flowchart of a target tracking method according to an embodiment of the present application.
  • the target tracking method is applied to an electronic device, and a tracking target is selected and tracked according to the change of the depth information.
  • the target tracking method includes:
  • the electronic device receives an operation of starting a client on the electronic device.
  • the present application is described below by taking a mobile phone as an example.
  • the user can click the app icon of the "Video Call" client or the "Live Streaming Delivery" client on the mobile phone to open that client, and the camera module starts to collect a video stream to obtain an image of the scene in front of the camera module.
  • the tracking mode is enabled in the "Video Call" client or the "Live Streaming Delivery" client.
  • the tracking mode in the "Video Call” client can be enabled through the controls in the "Video Call” client.
  • the enabling of the tracking mode in the "Live Streaming Delivery” client can be realized through the controls in the "Live Streaming Delivery” client.
  • the user can click the "video call” client on the main interface of the mobile phone.
  • the client is not limited to a "video call” client, but can also be other clients including a video call function or other clients with a similar video call function, which is not limited in this application.
  • the user interface of the video call can then be displayed, as shown in FIG. 5B.
  • in FIG. 5B, the user interface of the video call may include: a video display area 51, a hang-up control 52, a camera switching control 53, a more options control 54, a status bar 55, a setting control 56, and a tracking mode switch control 57.
  • FIG. 5B is an example of a user interface of a video call, and the user interface of the video call may further include a window reduction control and the like. The present application does not limit the content and form of the user interface of the video call.
  • the video display area 51 is used to display the video stream collected by the camera module of the mobile phone of the video contact.
  • the hang-up control 52 is used to interrupt the video call.
  • the mobile phone can detect a touch operation acting on the hang-up control 52 (e.g., a tap on the hang-up control 52), and terminate the video call in response to the operation.
  • the camera switching control 53 is used for switching cameras.
  • the mobile phone can detect a touch operation acting on the camera switching control 53 (e.g., a tap on the camera switching control 53), and in response to the operation, switch the camera module of the mobile phone from the front camera to the rear camera, or from the rear camera to the front camera.
  • the more options control 54 may include a window switching control and the like.
  • the mobile phone can detect a touch operation on the more options control 54 (e.g., a tap on the more options control 54), and display the window switching control.
  • the window switching control is used to display the video stream collected by the camera module of the mobile phone, and switch the video window.
  • the mobile phone can detect a touch operation (such as a click operation on the window switching control) acting on the window switching control, and switch the window switching control and the content displayed in the video display area 51 in response to the operation.
  • Status bar 55 may include network, signal strength, battery status, and time, among others.
  • the setting control 56 is used for receiving setting instructions input by the user. As shown in FIG. 5B , the user may click on the settings control 56 . After detecting the setting control 56 selected by the user, the mobile phone displays a setting interface, as shown in FIG. 5C .
  • the setting interface may include an "enable smart tracking by default" control and a status bar.
  • FIG. 5C is an example of a setting interface, and the setting interface of the video call may further include more or less controls than those shown in FIG. 5C .
  • the present application does not limit the content and form of the user interface of the video call.
  • the "enable smart tracking by default" control includes the text "enable smart tracking by default" and a corresponding selection control.
  • the selection control has two states, on and off.
  • the state of the tracking mode can be toggled. For example, when the tracking mode is off and the user selects the "enable smart tracking by default" text or the selection control (as shown in FIG. 5C), the mobile phone switches the tracking mode on (as shown in FIG. 5D). In FIG. 5C, the selection control is off; in FIG. 5D, the selection control is on. When the tracking mode is on and the user selects the "enable smart tracking by default" text or the selection control, the mobile phone switches the tracking mode off.
  • the tracking mode switch control 57 is used for receiving a user-input command for turning on or off the tracking mode. As shown in FIG. 6A , the tracking mode is off, and the user can click the tracking mode switch control 57 . After detecting that the user selects the tracking mode switch control 57, the mobile phone turns on the tracking mode, as shown in FIG. 6B.
  • in FIG. 6A, the tracking mode switch control includes a flag 61, indicating that the tracking mode is off; in FIG. 6B, the tracking mode switch control does not include the flag, indicating that the tracking mode is on. If the tracking mode is on, the user can also tap the tracking mode switch control, and the mobile phone turns the tracking mode off after detecting that the user has selected the control.
  • the user can tap the "Live Streaming Delivery" client on the main interface of the mobile phone.
  • the client is not limited to the "Live Streaming Delivery" client; it can also be Kuaishou, Taobao, or another client that includes a live streaming delivery function or a similar function, which is not limited in this application.
  • after the mobile phone detects that the user has tapped the "Live Streaming Delivery" client, it can display the live streaming delivery user interface, as shown in FIG. 7B.
  • in FIG. 7B, the live streaming delivery user interface may include: a video display area 71, a play/pause control 72, a status bar 73, a setting control 74, and a tracking mode switch control 75.
  • FIG. 7B is one example of the live streaming delivery user interface; the interface may include more or fewer controls, and the present application does not limit its content and form.
  • the video display area 71 is used to display the video stream collected by the camera module of the mobile phone.
  • the play/pause control 72 is used to pause or play the live broadcast.
  • the mobile phone can detect a touch operation acting on the play/pause control 72 (e.g., a tap on the play/pause control 72), and pause or play the live broadcast in response to the operation. For example, if the play/pause control 72 is in the play state, the user can select it and the mobile phone pauses the live broadcast in response; if the play/pause control 72 is in the pause state, the user can select it and the mobile phone plays the live broadcast in response.
  • Status bar 73 may include network, signal strength, battery status, and time, among others.
  • the setting control 74 is used to receive setting instructions input by the user. As shown in FIG. 7B , the user may click on the settings control 74 . After detecting the setting control 74 selected by the user, the mobile phone displays a setting interface, as shown in FIG. 7C .
  • the setting interface may include an "enable smart tracking by default" control and a status bar.
  • FIG. 7C is an example of a setting interface, and the setting interface of the video call may further include more or less controls than those shown in FIG. 7C .
  • the present application does not limit the content and form of the user interface of the video call.
  • the "enable smart tracking by default" control includes the text "enable smart tracking by default" and a corresponding selection control.
  • the selection control has two states, on and off.
  • the state of the tracking mode can be toggled. For example, when the tracking mode is off and the user selects the "enable smart tracking by default" text or the selection control (as shown in FIG. 7C), the mobile phone switches the tracking mode on (as shown in FIG. 7D). In FIG. 7C, the selection control is off; in FIG. 7D, the selection control is on. When the tracking mode is on and the user selects the "enable smart tracking by default" text or the selection control, the mobile phone switches the tracking mode off.
  • the tracking mode switch control 75 is used to receive a user-input command for turning on or off the tracking mode. As shown in FIG. 8A , the tracking mode is turned off, and the user can click the tracking mode switch control 75 . After detecting that the user selects the tracking mode switch control 75, the mobile phone turns on the tracking mode, as shown in FIG. 8B.
  • in FIG. 8A, the tracking mode switch control includes a flag 81, indicating that the tracking mode is off; in FIG. 8B, the tracking mode switch control does not include the flag, indicating that the tracking mode is on. If the tracking mode is on, the user can also tap the tracking mode switch control, and the mobile phone turns the tracking mode off after detecting that the user has selected the control.
  • the electronic device acquires a video stream collected by the camera module, where the video stream includes image frames.
  • after the client is opened, in the tracking mode, the electronic device first selects the tracking target and then tracks it. Before the tracking target is selected, the camera module is fixed and collects image frames of the same scene from the same viewing angle. The camera captures the video stream at a fixed frequency, e.g., 30 frames per second. The video stream includes multiple image frames in chronological order.
  • the electronic device obtains the video stream collected by the camera module in real time. For example, at time t1 the electronic device obtains image frame 1 in the video stream collected by the camera module (as shown in FIG. 9A); at time t2 it obtains image frame 2 in the video stream (as shown in FIG. 9B). It can be understood that, although FIG. 9A and FIG. 9B show the electronic device displaying image frame 1 and image frame 2, this does not prevent the image frames in FIG. 9A and FIG. 9B from being regarded as the image frames collected by the camera module.
  • S403 Acquire depth information of image frames in the video stream.
  • if the camera module captures depth directly, the image frame includes the depth information of the subject, and the depth information of the subject in the image frame can be obtained directly. If the camera module is an ordinary monocular camera and the image frame does not include the depth information of the subject, a monocular depth estimation algorithm can be used to obtain the depth information of the subject in the image frame.
  • the electronic device acquires the depth information of the image frames in the video stream in real time.
  • for example, FIG. 9A includes the following objects: a person, a table, a cup, a razor, and so on.
  • when the electronic device acquires image frame 1 shown in FIG. 9A, it acquires the depth information of the person, table, cup, razor, etc. in image frame 1;
  • FIG. 9B also includes the objects person, table, cup, razor, etc.; when the electronic device acquires image frame 2 shown in FIG. 9B, it acquires the depth information of the person, table, cup, razor, etc. in image frame 2.
  • specifically, the electronic device can obtain the depth information of all pixels of the person, of the table, of the cup, of the razor, etc. in image frame 1 shown in FIG. 9A, and likewise the depth information of all pixels of the person, of the table, of the cup, of the razor, etc. in image frame 2 shown in FIG. 9B.
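  • S404: Determine that the change area between the first adjacent image frames in the video stream is the object to be detected, where the first adjacent image frames include a first image frame and a second image frame, the first image frame is the image frame preceding the second image frame, and the change area is the area in which the depth information differs.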
  • determining that the change area between the first adjacent image frames in the video stream is the object to be detected includes: determining the change area between the first adjacent image frames, where the change area is the area in which the depth information differs; and determining the change area as the object to be detected.
  • preset rules can be set. For example, people are excluded from the change area, or an object is determined to be a change area only when the overall similarity of that object between the first adjacent image frames in the video stream is less than a threshold; for instance, a person is determined to be a change area only when the similarity of the whole person between the first adjacent image frames is less than the threshold.
  • the first adjacent image frame may be one first adjacent image frame or a plurality of first adjacent image frames.
  • one first adjacent image frame may include the image frame 1 shown in FIG. 9A and the image frame 2 shown in FIG. 9B .
  • the image frame 1 shown in FIG. 9A is the first image frame
  • the image frame 2 shown in FIG. 9B is the second image frame
  • the image frame 1 is the image frame located in front of the image frame 2 .
  • Determining the change area between the first adjacent image frames in the video stream as the object to be detected may include: determining the person, table, cup and shaver in the image frame 1 shown in FIG. 9A in the video stream and the image shown in FIG.
  • the plurality of first adjacent image frames may be formed, for example, from image frame 1 shown in FIG. 9A, image frame 2 shown in FIG. 9B, image frame 3, and image frame 4.
  • the four image frames are consecutive frames of the video stream captured by the camera module. Image frame 1 and image frame 2, image frame 2 and image frame 3, and image frame 3 and image frame 4 then form three first adjacent image frames.
  • the process of determining the object to be detected by a plurality of first adjacent image frames is similar to the process of determining the object to be detected by one first adjacent image frame, and details are not described herein again.
  • the first adjacent image frame can be ignored.
  • the similarity between the first adjacent image frames is the similarity between the pixels of the first adjacent image frames.
  • the device can compare the similarity between the pixels of the first adjacent image frames; determine the pixels whose depth information change value between the adjacent image frames exceeds a threshold as first pixels; determine the depth change value of each first pixel in the first adjacent image frames; determine the depth change image from the depth change values of the first pixels; calculate the minimum depth change value between each pixel in the depth change image and its spatially adjacent pixels to form a distance difference image; perform threshold binarization on the distance difference image to obtain a binary image; label the connected domains in the binary image; and determine the object to be detected from the connected domains.
  • the connected domain is the above-mentioned change region.
  • the following takes FIG. 9A and FIG. 9B as an example to illustrate how the object to be detected is determined from the pixels in the case of one first adjacent image frame, namely image frame 1 shown in FIG. 9A and image frame 2 shown in FIG. 9B.
  • the change values of the depth information of the pixels a, b, c, d, f, g, and h between image frame 1 and image frame 2 are 0.21 m, 0.25 m, 0.29 m, 0.3 m, 0.21 m, 0.25 m, and 0.27 m, all of which exceed the threshold of 0.2 m. The pixels a, b, c, d, f, g, and h are therefore determined to be first pixels, and their depth change values in the first adjacent image frames are 0.21 m, 0.25 m, 0.29 m, 0.3 m, 0.21 m, 0.25 m, and 0.27 m, respectively.
  • Connected domain refers to the area composed of adjacent pixels with the same pixel value in the image frame.
  • each pixel in a connected domain is similar to its spatially adjacent pixels, so the depth difference between a pixel in the connected domain and an adjacent pixel does not change abruptly; that is, the absolute value of this depth difference is less than a certain value.
  • the image data of the depth change image is shown in FIG. 10A .
  • the depth change image is a single-channel image.
  • the value represented by each pixel in the depth change image is the depth change value of that pixel. For example, in FIG. 10A the resolution of the depth change image is 100×100.
  • a pixel with a value of 0 in the depth change image indicates that the change value of the depth information of that pixel in the first adjacent image frames is less than the threshold.
  • a pixel whose value is not 0 indicates that the change value of the depth information of that pixel in the first adjacent image frames exceeds the threshold.
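  • The construction of the depth change image described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function name `depth_change_image`, the use of the absolute depth difference, and the 3×3 sample depth maps are all assumptions; only the 0.2 m threshold comes from the example in the text.

```python
import numpy as np

def depth_change_image(depth_prev, depth_next, threshold=0.2):
    """Return a single-channel image of per-pixel depth change (metres).

    Pixels whose absolute change does not exceed `threshold` are set to 0;
    the remaining pixels keep their change value (assumption: absolute value).
    """
    prev = np.asarray(depth_prev, dtype=float)
    nxt = np.asarray(depth_next, dtype=float)
    change = np.abs(nxt - prev)
    return np.where(change > threshold, change, 0.0)

# Hypothetical 3x3 depth maps (metres) for two adjacent image frames.
frame1 = np.full((3, 3), 1.00)
frame2 = frame1.copy()
frame2[1, 1] = 1.25  # changed by 0.25 m: exceeds the 0.2 m threshold
frame2[0, 0] = 1.10  # changed by 0.10 m: below the threshold, becomes 0
change_img = depth_change_image(frame1, frame2)
```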
  • the pixel point and its spatially adjacent pixel point may be as shown in FIG. 10B or FIG. 10C .
  • FIG. 10B and FIG. 10C take the pixel f among the above first pixels as an example.
  • in FIG. 10B, the pixel f has four spatially adjacent pixels, g, h, i, and j, which are located directly above, directly below, directly to the left of, and directly to the right of the pixel f, respectively.
  • in FIG. 10C, the pixel f has eight spatially adjacent pixels, g, h, i, j, k, l, m, and n, which are located directly above, directly below, directly to the left of, directly to the right of, to the upper left of, to the upper right of, to the lower left of, and to the lower right of the pixel f, respectively.
  • for example, the depth change value of the pixel f is 0.21 m, that of the pixel g is 0.25 m, that of the pixel h is 0.27 m, that of the pixel i is 0.2 m, and that of the pixel j is 0.3 m. The depth change values between the pixel f and its spatially adjacent pixels g, h, i, and j are then 0.04 m, 0.06 m, 0.01 m, and 0.09 m, respectively, so the minimum depth change value between the pixel f and its spatially adjacent pixels is 0.01 m.
  • the image data of the distance difference image is shown in FIG. 10D .
  • the distance difference image is a single-channel image.
  • the value represented by each pixel in the distance difference image is the minimum depth change value between the pixel and its spatially adjacent pixel.
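  • The distance difference image can be sketched as below, reusing the values given for the pixels f, g, h, i, and j. The function name, the offset tables, and the treatment of image borders (neighbours outside the image are simply skipped) are assumptions made for this illustration.

```python
import numpy as np

# Neighbour offsets (dy, dx): 4-neighbourhood and 8-neighbourhood.
NEIGHBOURS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
NEIGHBOURS_8 = NEIGHBOURS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def distance_difference_image(change_img, offsets=NEIGHBOURS_4):
    """For each pixel, the minimum absolute difference between its depth
    change value and the depth change values of its spatial neighbours."""
    h, w = change_img.shape
    out = np.zeros((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            diffs = [abs(change_img[y, x] - change_img[y + dy, x + dx])
                     for dy, dx in offsets
                     if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y, x] = min(diffs) if diffs else 0.0
    return out

# Pixel f (0.21) in the centre; g above, h below, i left, j right.
change_img = np.array([[0.00, 0.25, 0.00],
                       [0.20, 0.21, 0.30],
                       [0.00, 0.27, 0.00]])
dist_img = distance_difference_image(change_img)
```

  • With these inputs the centre pixel f receives the minimum of 0.04, 0.06, 0.01, and 0.09, that is, approximately 0.01 m, which agrees with the worked example.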
  • the threshold binarization of the distance difference image may be as follows: if the value of a pixel is less than a preset value (for example, 0.03), the value of the pixel after threshold binarization is 1; if the value of the pixel is greater than the preset value, the value of the pixel after threshold binarization is 0.
  • for example, the value 0.01 of the pixel f is less than the preset value 0.03, so the value of the pixel f after threshold binarization is 1; that is, the pixel value of the pixel f in the binary image is 1.
  • the pixel values in the binary image of the other first pixels a, b, c, d, g, and h can be determined in the same way. The image data of the binary image obtained from FIG. 9A and FIG. 9B is shown in FIG. 10E; the binary image is a single-channel image.
  • the pixel value represented by each pixel point in the binary image is the value obtained by performing threshold binarization processing on the minimum depth change value.
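  • A minimal sketch of the threshold binarization follows. The preset value 0.03 is taken from the text; the function name and the sample arrays are hypothetical. Two choices are assumptions that the text does not state: only pixels with a non-zero depth change are allowed to become 1 (otherwise the unchanged background, whose difference to its neighbours is 0, would also be marked), and a value exactly equal to the preset is mapped to 0.

```python
import numpy as np

def binarize(dist_img, change_img, preset=0.03):
    """Threshold-binarize the distance difference image.

    A pixel becomes 1 when its minimum neighbour difference is below
    `preset` AND its depth actually changed (assumption, see lead-in).
    """
    smooth = dist_img < preset
    changed = change_img > 0
    return (smooth & changed).astype(np.uint8)

# Hypothetical illustrative values, not taken from FIG. 10D.
dist_img = np.array([[0.00, 0.04, 0.00],
                     [0.01, 0.01, 0.09],
                     [0.00, 0.06, 0.00]])
change_img = np.array([[0.00, 0.25, 0.00],
                       [0.00, 0.21, 0.30],
                       [0.00, 0.27, 0.00]])
binary = binarize(dist_img, change_img)
```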
  • in the binary image, the connected domains are labelled in order to determine them.
  • the number of the connected domains may be one or more.
  • for example, the connected domain H, the connected domain I, and the connected domain J are determined. The connected domain H is the cup, the connected domain I is the human arm, and the connected domain J is the human eye. The connected domain H includes the pixels a, b, c, and d; the connected domain I includes the pixels f and g; and the connected domain J includes the pixel h.
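  • Labelling the connected domains of a binary image can be sketched with a breadth-first flood fill. The application does not specify the labelling algorithm, so the 4-connectivity, the function name, and the sample image (three blobs of 4, 2, and 1 pixels, loosely mirroring the domains H, I, and J) are assumptions.

```python
from collections import deque
import numpy as np

def label_connected_domains(binary):
    """Label 4-connected regions of 1-valued pixels.

    Returns (labels, count): `labels` holds 0 for background and
    1..count for the connected domains, in scan order.
    """
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] != 1 or labels[sy, sx] != 0:
                continue
            count += 1
            labels[sy, sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny, nx] == 1 and labels[ny, nx] == 0):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count

binary = np.array([[1, 1, 0, 0, 0],
                   [1, 1, 0, 1, 1],
                   [0, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0]], dtype=np.uint8)
labels, count = label_connected_domains(binary)
```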
  • using the above preset rule (excluding people from the change area, or determining an object to be a change area only when its overall similarity between the first adjacent image frames in the video stream is less than a threshold), the connected domain I and the connected domain J can be excluded, which completes the determination of the connected domains. Then, the connected domain with the largest area and/or the largest depth change value in the first adjacent image frames can be determined as the object to be detected. As another example, suppose two connected domains, K and L, remain after the exclusion. If the area of the connected domain K is larger than that of the connected domain L, or if the depth change value of the connected domain K in the first adjacent image frames is greater than that of the connected domain L, the object to be detected is determined to be the connected domain K. In this way, when several connected domains change, some of them can be excluded; that is, some noise in the image frame can be excluded.
  • since the connected domain H is what is obtained from FIG. 9A and FIG. 9B, the connected domain H can be determined to be the object to be detected; that is, the cup is the object to be detected.
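  • Choosing the object to be detected among several connected domains can be sketched as follows. The text allows the largest area "and/or" the largest depth change value; ranking by area first and by mean depth change as a tie-breaker is an assumption of this sketch, as are the function name and the sample data (labels 2 and 3 stand in for the excluded arm and eye).

```python
import numpy as np

def pick_object(labels, change_img, excluded=()):
    """Return the label of the connected domain to treat as the object
    to be detected, or None if no domain remains after exclusion."""
    best_label, best_key = None, None
    for label in np.unique(labels):
        if label == 0 or label in excluded:
            continue
        mask = labels == label
        # Rank by area, then by mean depth change (assumed ordering).
        key = (int(mask.sum()), float(change_img[mask].mean()))
        if best_key is None or key > best_key:
            best_label, best_key = int(label), key
    return best_label

labels = np.array([[1, 1, 0, 0, 0],
                   [1, 1, 0, 2, 2],
                   [0, 0, 0, 0, 0],
                   [0, 3, 0, 0, 0]])
change_img = np.where(labels > 0, 0.25, 0.0)
chosen = pick_object(labels, change_img, excluded=(2, 3))
```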
  • the present application can also exclude the changes of the human arm and the human eye at an earlier stage: when determining the first pixels, when determining the depth change values of the first pixels, when determining the depth change image, when forming the distance difference image, or when obtaining the binary image.
  • the process of determining the object to be detected according to the pixel points in the case of multiple first adjacent image frames is similar to the process of determining the object to be detected according to the pixel points in the case of one first adjacent image frame, and will not be repeated here.
  • the difference is that, in the case of multiple first adjacent image frames, the first pixel is determined from the change values of the pixel's depth information in each of the first adjacent image frames, and the depth change value of the first pixel is the average of those change values. For example, when image frame 1 shown in FIG. 9A is compared with image frame 2 shown in FIG. 9B, the change values of the depth information of the pixels a, b, c, d, f, g, and h are 0.21 m, 0.25 m, 0.29 m, 0.3 m, 0.21 m, 0.25 m, and 0.27 m, all of which exceed the threshold of 0.2 m. For convenience of description, only the pixel a is used below to illustrate whether it is a first pixel and, if so, how its depth change value is determined.
  • when image frame 2 is compared with image frame 3, the change value of the depth information of the pixel a is 0.27 m, which exceeds the threshold of 0.2 m; when image frame 3 is compared with image frame 4, the change value of the depth information of the pixel a is 0.25 m, which also exceeds the threshold of 0.2 m.
  • the pixel a is therefore determined to be a first pixel, and its depth change value is the average of 0.21 m, 0.27 m, and 0.25 m, that is, 0.73 m / 3 ≈ 0.243 m.
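  • The multi-frame case can be sketched as below for the pixel a. The function name and the hypothetical absolute depths (chosen so that the three pairwise changes are 0.21 m, 0.27 m, and 0.25 m) are assumptions, as is the rule that a pixel counts as a first pixel only when the change exceeds the threshold in every pair of adjacent frames; the text only shows a case in which all pairs exceed it.

```python
import numpy as np

def averaged_depth_change(depth_frames, threshold=0.2):
    """Average per-pixel depth change over consecutive frame pairs.

    `depth_frames` has shape (n_frames, height, width). A pixel keeps its
    averaged change only if every pairwise change exceeds `threshold`
    (assumption); otherwise it is set to 0.
    """
    frames = np.asarray(depth_frames, dtype=float)
    changes = np.abs(np.diff(frames, axis=0))      # shape (n_frames - 1, h, w)
    is_first_pixel = np.all(changes > threshold, axis=0)
    return np.where(is_first_pixel, changes.mean(axis=0), 0.0)

# Pixel a over image frames 1 to 4: changes of 0.21 m, 0.27 m and 0.25 m.
frames = np.array([[[1.00]], [[1.21]], [[1.48]], [[1.73]]])
avg = averaged_depth_change(frames)
```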
  • S405 Determine the displacement value and displacement direction between the position of the object to be detected in the second image frame and the position of the object to be detected in the first image frame, where the position is the position of the depth information, and the displacement value and the displacement direction are the displacement of the depth information value and displacement direction.
  • the object to be detected includes a plurality of interconnected pixel points in the image frame.
  • the displacement value of the depth information is the absolute value of the average depth change value of the pixel point.
  • the displacement directions include directions close to the focal plane and directions away from the focal plane. If the average depth change value of the pixel point is greater than zero, the displacement direction is away from the focal plane; if the average depth change value of the pixel point is less than zero, the displacement direction is close to the focal plane.
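  • The displacement value and displacement direction of the depth information can be sketched as follows. The function name, the string labels "away", "toward", and "none", and the sample depth maps are hypothetical; the sign convention (a positive average depth change means moving away from the focal plane) follows the text.

```python
import numpy as np

def depth_displacement(depth_first, depth_second, mask):
    """Return (displacement_value, direction) for the pixels in `mask`.

    The displacement value is the absolute value of the average signed
    depth change of the object's pixels between the two image frames.
    """
    first = np.asarray(depth_first, dtype=float)
    second = np.asarray(depth_second, dtype=float)
    mean_change = float((second - first)[mask].mean())
    if mean_change > 0:
        direction = "away"    # away from the focal plane
    elif mean_change < 0:
        direction = "toward"  # close to the focal plane
    else:
        direction = "none"
    return abs(mean_change), direction

depth1 = np.array([[1.00, 1.00], [1.00, 1.00]])
depth2 = np.array([[0.70, 0.72], [1.00, 1.00]])
mask = np.array([[True, True], [False, False]])  # pixels of the object
value, direction = depth_displacement(depth1, depth2, mask)
```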
  • S406 Select a tracking target, where the tracking target is an object to be detected whose displacement value is greater than the first preset value and whose displacement direction is a direction close to the focal plane.
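  • The selection rule of S406 can be sketched as a simple filter. The `Candidate` class, its field names, the sample objects, and the value 0.2 used for the first preset value are all hypothetical; the text does not give a concrete first preset value.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    displacement: float  # displacement value of the depth information, metres
    direction: str       # "toward" or "away" relative to the focal plane

def select_tracking_targets(candidates, first_preset):
    """Keep objects to be detected whose displacement value is greater than
    the first preset value and whose direction is toward the focal plane."""
    return [c for c in candidates
            if c.displacement > first_preset and c.direction == "toward"]

candidates = [
    Candidate("cup", 0.26, "toward"),    # large move toward the focal plane
    Candidate("razor", 0.05, "toward"),  # displacement too small
    Candidate("table", 0.30, "away"),    # moving the wrong way
]
targets = select_tracking_targets(candidates, first_preset=0.2)
```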
  • the following continues to use FIG. 9A and FIG. 9B as an example to illustrate the present application.
  • the position of the cup to be detected in the image frame 2 shown in FIG. 9B is compared with the position of the cup to be detected in the image frame 1 shown in FIG. 9A. If the displacement value
  • the camera module can be driven to collect image frames of different scenes.
  • the second adjacent image frames include a third image frame and a fourth image frame, the third image frame is the image frame preceding the fourth image frame, and the position is the position of the depth information.
  • the electronic device acquires the video stream collected by the camera module in real time, and also acquires the depth information of the image frames in the video stream in real time.
  • the second adjacent image frame may be one second adjacent image frame or a plurality of second adjacent image frames.
  • the present application will be described below by taking a second adjacent image frame as an example.
  • for example, at time t3, the electronic device obtains image frame 7 in the video stream collected by the camera module (as shown in FIG. 11A); at time t4, the electronic device obtains image frame 8 in the video stream collected by the camera module (as shown in FIG. 11B).
  • the image frame 7 is the third image frame
  • the image frame 8 is the fourth image frame
  • the image frame 7 is the image frame located in front of the image frame 8 .
  • the present application determines the position of the depth information of the tracking target in image frame 7 and image frame 8. It can be understood that, although FIG. 11A and FIG. 11B show the electronic device displaying image frame 7 and image frame 8, this does not prevent the image frames in FIG. 11A and FIG. 11B from being regarded as the image frames collected by the camera module.
  • S409 Determine the displacement value and displacement direction between the position of the tracking target in the fourth image frame and the position of the tracking target in the third image frame, where the displacement value and the displacement direction are the displacement value and displacement direction of the depth information.
  • the tracking target includes multiple interconnected pixels in the image frame.
  • the displacement value of the depth information is the absolute value of the average depth change value of the pixel point.
  • the displacement directions include directions close to the focal plane and directions away from the focal plane. If the average depth change value of the pixel point is greater than zero, the displacement direction is away from the focal plane; if the average depth change value of the pixel point is less than zero, the displacement direction is close to the focal plane.
  • for example, the tracking target cup includes the pixels a, b, c, and d, whose depth information change values are 0.33 m, 0.28 m, 0.36 m, and 0.27 m, respectively. The average depth change value of the pixels between the position of the tracking target cup in image frame 8 and its position in image frame 7 is then 0.31 m, and the displacement value between the position of the tracking target cup in image frame 8 and its position in image frame 7 is determined to be 0.31 m.
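  • The arithmetic of this example can be written out as follows, where the symbols $\overline{\Delta d}$ (average depth change value) and $s$ (displacement value) are notation introduced here for illustration:

```latex
\[
\overline{\Delta d}
  = \frac{0.33 + 0.28 + 0.36 + 0.27}{4}\,\text{m}
  = \frac{1.24}{4}\,\text{m}
  = 0.31\,\text{m},
\qquad
s = \left|\overline{\Delta d}\right| = 0.31\,\text{m}.
\]
```

  • Since $\overline{\Delta d} > 0$, the rule stated above gives a displacement direction away from the focal plane.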
  • tracking the tracking target may include displaying it in focus, displaying it in a frame, or cropping and centering it.
  • in the framed display, the tracking target is marked in the image frame with a box, a circle, or the outline of the object.
  • cropping and centering the tracking target may mean cutting away the parts of the image other than the tracking target and then enlarging and centering the remaining part, as shown in FIG. 11A.
  • in FIG. 11A, the cup is shown cropped and centered.
  • the tracking target is also reselected when the tracking is exited.
  • continuing with FIG. 11A and FIG. 11B as an example: the displacement value between the position of the tracking target cup in image frame 8 shown in FIG. 11B and its position in image frame 7 shown in FIG. 11A is 0.31 m. If this value is greater than the second preset value and the displacement direction is away from the focal plane, tracking of the cup is exited.
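  • The exit condition can be sketched as a single predicate. The function name, the direction labels, and the value 0.3 used for the second preset value are hypothetical; the text gives only the 0.31 m displacement of the cup.

```python
def should_exit_tracking(displacement, direction, second_preset):
    """Exit tracking when the displacement value is greater than the second
    preset value and the displacement direction is away from the focal plane."""
    return displacement > second_preset and direction == "away"

# Cup example: displacement 0.31 m, moving away from the focal plane.
exit_cup = should_exit_tracking(0.31, "away", second_preset=0.3)
# Same displacement toward the focal plane: tracking is not exited.
exit_toward = should_exit_tracking(0.31, "toward", second_preset=0.3)
```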
  • the target tracking method shown in FIG. 4 can be used not only in the scenario where the tracking target is selected and tracked according to the change of the depth information, but also in the scenario where the tracking target is selected and tracked according to the human body key points in the image frame together with the change of the depth information.
  • the difference from the above-mentioned scene where the tracking target is selected and tracked according to the depth information is:
  • when determining the object to be detected, the human body key points in the image frames in the video stream are also detected, and the change area between the first adjacent image frames in the video stream that is connected to the first parameter of the human body key points is determined to be the object to be detected.
  • when selecting the tracking target, the tracking target is an object to be detected whose displacement value is greater than the first preset value, whose displacement direction is the direction close to the focal plane, and whose depth information difference from the second parameter of the human body key points in the image frame is greater than the third preset value.
  • when tracking, the tracking target continues to be tracked if the displacement value is less than or equal to the second preset value and the displacement direction is away from the focal plane, or if the displacement direction is close to the focal plane; when exiting the tracking, the tracking is exited if the displacement value is greater than the second preset value and the displacement direction is away from the focal plane, or if the tracking target is not connected to the first parameter of the human body key points.
  • The positions of the human body key points in the image frames are detected according to the depth information of the image frames in the video stream.
  • The human body key points are shown in FIG. 12.
  • The human body key points include a first parameter, a third parameter, a fourth parameter, a fifth parameter and a sixth parameter.
  • The first, third, fourth, fifth and sixth parameters are the left and right wrists, the left and right shoulders, the neck, the head, and the left and right hips, respectively.
  • Although FIG. 12 only shows the human body key points as including the first, third, fourth, fifth and sixth parameters, the human body key points may obviously also include other parts, such as a seventh parameter, an eighth parameter and a ninth parameter.
  • The seventh, eighth and ninth parameters are the left and right elbows, the left and right knees, and the left and right ankles, respectively.
  • When determining the object to be detected, as in the process of determining the connected domain in step S404 in FIG. 4, the electronic device first determines the connected domains and then, according to the positions of the human body key points and of the connected domains in the image frame, determines the connected domain that is connected to the first parameter of the human body key points as the object to be detected.
  • For example, the electronic device determines the connected domain "shaver" according to FIGS. 13A and 13B.
  • The connected domain shaver includes pixels o, p, q and r.
  • Image frame 9 shown in FIG. 13A is the first image frame, and image frame 10 shown in FIG. 13B is the second image frame.
  • Image frame 9 is the image frame preceding image frame 10.
  • Although the electronic device displays image frame 9 and image frame 10 in FIGS. 13A and 13B, the image frames collected by the camera module can equally be regarded as the image frames in FIGS. 13A and 13B.
  • The electronic device also determines, according to the positions of the human body key points and of the connected domain in image frame 10 shown in FIG. 13B, that the connected domain shaver, which is connected to the first parameter of the human body key points, is the object to be detected.
  • Determining the connected domain that is connected to the first parameter of the human body key points as the object to be detected specifically includes:
  • determining the Euclidean distance between the first parameter of the human body key points and the center of each connected domain, and determining a connected domain whose Euclidean distance is less than a preset value as the object to be detected. A Euclidean distance smaller than the preset value indicates that the connected domain is connected to the first parameter of the human body key points.
  • The position of a human body key point in the image frame includes the coordinates of the key point in the image frame coordinate system, and the position of the center of each connected domain in the image frame includes the coordinates of that center in the image frame coordinate system.
  • The Euclidean distance between the first parameter of the human body key points in the image frame and the center of a connected domain is determined from the position of the human body key point in the image frame and the position of the center of each connected domain in the image frame:
  • $p_{ji} = \sqrt{(x_{1i} - x_j)^2 + (y_{1i} - y_j)^2}$
  • where $p_{ji}$ is the Euclidean distance between the j-th wrist in the first parameter of the human body key points in the image frame and the center of the i-th connected domain, $x_{1i}$ is the abscissa of the center of the i-th connected domain in the image frame, $x_j$ is the abscissa of the j-th wrist in the first parameter of the human body key points in the image frame, $y_{1i}$ is the ordinate of the center of the i-th connected domain in the image frame, and $y_j$ is the ordinate of the j-th wrist in the first parameter of the human body key points in the image frame.
  • For example, the electronic device determines that the position of the center of the connected domain shaver in image frame 10 is (x_11, y_11), and that the positions of the human body key points include the left wrist in the first parameter at (x_1, y_1) and the right wrist in the first parameter at (x_2, y_2).
  • If the Euclidean distance between the left wrist in image frame 10 and the center of the connected domain shaver is greater than the preset value, while the Euclidean distance between the right wrist in image frame 10 and the center of the connected domain shaver is less than the preset value, the connected domain shaver is determined to be the object to be detected.
  • The electronic device may determine, as the object to be detected, the connected domain whose Euclidean distance is smaller than the preset value and which has the largest area and/or the largest depth change value between the first adjacent image frames.
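The wrist-to-center distance test described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the dictionary layout of a connected domain, the pixel coordinates and the preset value of 20 pixels are all assumptions, and the tie-break shown uses the largest area only.

```python
import math

def wrist_distances(wrists, center):
    """Euclidean distance p_ji from each wrist (x_j, y_j) to a domain center (x_1i, y_1i)."""
    cx, cy = center
    return [math.hypot(cx - xj, cy - yj) for (xj, yj) in wrists]

def is_connected_to_wrist(wrists, center, preset_value):
    """A domain counts as connected if at least one wrist is closer than preset_value."""
    return any(d < preset_value for d in wrist_distances(wrists, center))

def pick_object_to_detect(wrists, domains, preset_value):
    """Among connected domains, keep the one with the largest area (None if none qualify)."""
    connected = [d for d in domains
                 if is_connected_to_wrist(wrists, d["center"], preset_value)]
    return max(connected, key=lambda d: d["area"], default=None)

# Hypothetical pixel coordinates: the left wrist is far from the shaver, the right wrist is close.
wrists = [(100.0, 400.0), (520.0, 300.0)]
domains = [
    {"name": "shaver", "center": (523.0, 304.0), "area": 4},
    {"name": "background", "center": (50.0, 50.0), "area": 9},
]
chosen = pick_object_to_detect(wrists, domains, preset_value=20.0)
print(chosen["name"])  # shaver
```

Only one wrist needs to be within the preset distance, which mirrors the example in which the left wrist is far from the shaver and the right wrist is close to it.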
  • When selecting the tracking target, in the process shown in step S406 in FIG. 4 of determining that the displacement value is greater than the first preset value and the displacement direction is toward the focal plane, the electronic device first determines the objects to be detected whose displacement value is greater than the first preset value and whose displacement direction is toward the focal plane, and then determines, among them, the object to be detected whose depth information difference from the second parameter of the human body key points in the image frame is greater than the third preset value as the tracking target.
  • Determining the object to be detected whose depth information difference from the second parameter of the human body key points in the image frame is greater than the third preset value specifically includes:
  • determining the depth information of the second parameter of the human body key points according to the depth information of the image frame and the positions of the human body key points in the image frame, and determining, according to the depth information of the object to be detected and the depth information of the second parameter of the human body key points, the object to be detected whose depth information difference from the second parameter of the human body key points in the image frame is greater than the third preset value.
  • The second parameter of the human body key points is the body of the human body.
  • The depth information of the third, fourth, fifth and sixth parameters of the human body key points in the image frame is determined according to the depth information of the image frame and the positions of those parameters in the image frame, and the depth information of the second parameter of the human body key points is then determined from the depth information of the third, fourth, fifth and sixth parameters.
  • For example, if the depth information of the left and right shoulders (third parameter) is 1.5 meters and 1.54 meters respectively, the depth information of the neck (fourth parameter) is 1.52 meters, the depth information of the head (fifth parameter) is 1.52 meters, and the depth information of the left and right hips (sixth parameter) is 1.51 meters and 1.53 meters respectively, then the depth information of the body (second parameter) is (1.5 m + 1.54 m + 1.52 m + 1.52 m + 1.51 m + 1.53 m)/6, which is 1.52 m.
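The averaging in the worked example above can be checked with a short Python sketch. The dictionary keys and the function name `body_depth` are illustrative assumptions; the depth values are the ones given in the example.

```python
from statistics import mean

# Depth (in meters) of the key points used to estimate the body depth in the example.
keypoint_depths_m = {
    "left_shoulder": 1.50, "right_shoulder": 1.54,  # third parameter
    "neck": 1.52,                                   # fourth parameter
    "head": 1.52,                                   # fifth parameter
    "left_hip": 1.51, "right_hip": 1.53,            # sixth parameter
}

def body_depth(depths):
    """Depth of the body (second parameter) as the plain average of the key point depths."""
    return mean(depths.values())

body = body_depth(keypoint_depths_m)
print(round(body, 2))  # 1.52
```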
  • The object to be detected includes a plurality of interconnected pixels in the image frame.
  • The depth information of the object to be detected is the average of the depth information of its pixels, and the depth information difference between the object to be detected and the second parameter of the human body key points in the image frame is the difference between the depth information of the object to be detected and the depth information of the second parameter in the image frame.
  • For example, if the displacement value between the position of the object to be detected (the shaver) in image frame 10 shown in FIG. 13B and its position in image frame 9 shown in FIG. 13A is greater than the first preset value, the displacement direction is toward the focal plane, and the difference between the average depth information, in image frame 10, of the pixels o, p, q and r included in the shaver and the depth information of the body (second parameter of the human body key points) is greater than the third preset value, the shaver is selected as the tracking target.
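The three selection conditions in this example can be combined into a single predicate. The sketch below is a hedged illustration: the function name, the parameter names and all numeric values (pixel depths, thresholds of 0.3 m) are assumptions, and taking the absolute value of the depth gap is also an assumption, since the text only speaks of a depth information difference.

```python
from statistics import mean

def is_tracking_target(displacement_m, toward_focal_plane, object_pixel_depths_m,
                       body_depth_m, first_preset_m, third_preset_m):
    """Return True when all three selection conditions hold for an object to be detected."""
    # Depth of the object: average depth of its pixels (e.g. pixels o, p, q, r).
    object_depth_m = mean(object_pixel_depths_m)
    # Assumption: the depth information difference is taken as an absolute value.
    depth_gap_m = abs(object_depth_m - body_depth_m)
    return (displacement_m > first_preset_m
            and toward_focal_plane
            and depth_gap_m > third_preset_m)

# Hypothetical depths of pixels o, p, q, r of the shaver in image frame 10.
shaver_depths = [1.10, 1.12, 1.11, 1.09]
selected = is_tracking_target(0.35, True, shaver_depths, body_depth_m=1.52,
                              first_preset_m=0.3, third_preset_m=0.3)
print(selected)  # True
```

Requiring a depth gap to the body filters out regions that move together with the torso, so only an object held out in front of the person is selected.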
  • During tracking, if the electronic device determines that the displacement value is less than the second preset value and the displacement direction is away from the focal plane, or that the displacement direction is toward the focal plane, it then determines whether the tracking target is connected to the first parameter of the human body key points in the image frame. This determination is similar to judging whether the object to be detected is connected to the first parameter of the human body key points, and is not repeated here.
  • When exiting tracking, if the electronic device determines, as in step S411 in FIG. 4, that the displacement value is greater than the second preset value and the displacement direction is away from the focal plane, or the electronic device determines that the tracking target is not connected to the first parameter of the human body key points, tracking is exited.
  • The electronic device determining that the tracking target is not connected to the first parameter of the human body key points is specified as follows:
  • Determining the position of the center of the tracking target in the image frame is similar to the above determination of the position of the center of each connected domain in the image frame, and is not repeated here.
  • Determining the Euclidean distance between the first parameter of the human body key points and the center of the tracking target is similar to the above determination of the Euclidean distance between the first parameter of the human body key points and the center of a connected domain according to the position of the human body key point in the image frame and the position of the center of each connected domain in the image frame, and is not repeated here.
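The exit rule for the key-point scene can be sketched as one predicate that combines the depth displacement test with the wrist connection test. The function name, the parameter names and the numeric values are hypothetical; only the logical structure (moved back significantly, or no longer connected to a wrist) follows the text.

```python
import math

def should_exit_tracking(displacement_m, away_from_focal_plane, target_center, wrists,
                         second_preset_m, distance_preset_px):
    """Exit when the target moved back significantly or is no longer held by a wrist."""
    moved_back = displacement_m > second_preset_m and away_from_focal_plane
    # Connected means: some wrist lies closer to the target center than the preset distance.
    connected = any(math.dist(target_center, w) < distance_preset_px for w in wrists)
    return moved_back or not connected

wrists = [(100.0, 400.0), (520.0, 300.0)]  # hypothetical left and right wrist positions

# Target moved 0.31 m away from the focal plane: exit even though it is still in hand.
print(should_exit_tracking(0.31, True, (523.0, 304.0), wrists, 0.3, 20.0))   # True
# Small movement and still close to the right wrist: keep tracking.
print(should_exit_tracking(0.05, False, (523.0, 304.0), wrists, 0.3, 20.0))  # False
# Small movement but the object was put down far from both wrists: exit.
print(should_exit_tracking(0.05, False, (300.0, 50.0), wrists, 0.3, 20.0))   # True
```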
  • FIG. 14 is a schematic diagram of the hardware structure of a server according to an embodiment of the present application.
  • The server 14 includes a memory 143, a processor 144 and a communication interface 145.
  • The structure shown in FIG. 14 does not constitute a limitation on the server 14; the server 14 may include more or fewer components than shown, combine some components, split some components, or use a different arrangement of components.
  • The memory 143 may be used to store software programs and/or modules/units.
  • The processor 144 implements the various functions of the server 14 by running or executing the software programs and/or modules/units stored in the memory 143 and calling data stored in the memory 143.
  • The memory 143 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created according to the use of the server 14 (such as audio data).
  • The memory 143 may include a non-volatile computer-readable memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • The processor 144 may be a central processing unit (CPU), a graphics processing unit (GPU), an image signal processor (ISP), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or the like.
  • The processor 144 may be a microprocessor or any conventional processor.
  • The processor 144 is the control center of the server 14 and uses various interfaces and lines to connect the various parts of the entire server 14.
  • The communication interface 145 may include a standard wired interface, a wireless interface, and the like.
  • The communication interface 145 is used for the server 14 to communicate with the electronic device.
  • The target tracking method shown in FIG. 4 can be used not only on an electronic device, but also on a system composed of an electronic device and a server.
  • The differences between applying the target tracking method to the system composed of the electronic device and the server and applying it to the electronic device alone are as follows:
  • After the electronic device acquires the video stream collected by the camera module in step S402, the electronic device also transmits the collected video stream to the server through the client.
  • The server executes steps S403 to S406 and, after selecting the tracking target, transmits the selected tracking target and a driving signal to the electronic device, so as to control the camera module of the electronic device to move and track the selected tracking target.
  • The server also executes steps S408 to S410 and, if the displacement value is greater than the second preset value and the displacement direction is away from the focal plane, transmits a tracking exit signal to the electronic device to control the electronic device to exit tracking.
  • As described above, the target tracking method can be applied not only to the scene where the tracking target is selected and tracked according to changes in depth information, but also to the scene where the tracking target is selected and tracked according to the human body key points in the image frame and changes in depth information.
  • The system composed of the electronic device and the server can likewise be applied to the scene where the tracking target is selected and tracked according to the human body key points in the image frame and changes in depth information.
  • In that case, in addition to the differences described above between the key-point scene and the depth-only scene, the server transmits a tracking exit signal to the electronic device, controlling the electronic device to exit tracking, if the displacement value is greater than the second preset value and the displacement direction is away from the focal plane, or if the tracking target is not connected to the first parameter of the human body key points.
  • FIG. 15 is a schematic structural diagram of a target tracking apparatus according to an embodiment of the present application.
  • The target tracking apparatus 15 may include an acquiring unit 151 and a determining unit 152.
  • The acquiring unit 151 is configured to acquire depth information of image frames in the video stream.
  • The determining unit 152 is configured to determine the change region between first adjacent image frames in the video stream as the object to be detected, where the first adjacent image frames include a first image frame and a second image frame, the first image frame is the image frame preceding the second image frame, and the change region is a region in which the depth information differs.
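One way to obtain change regions from two depth maps is to threshold the per-pixel depth difference and group the changed pixels into connected domains. The sketch below uses a breadth-first flood fill with 4-connectivity; the function name, the `min_diff` threshold, the choice of 4-connectivity and the tiny depth maps are all assumptions for illustration, not details confirmed by the text.

```python
from collections import deque

def change_regions(depth_first, depth_second, min_diff):
    """Group pixels whose depth changed by more than min_diff into 4-connected regions."""
    rows, cols = len(depth_first), len(depth_first[0])
    changed = [[abs(depth_second[r][c] - depth_first[r][c]) > min_diff
                for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not changed[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited changed pixel.
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and changed[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions

# Hypothetical 3x4 depth maps in meters for the first and second image frames.
frame_first = [[2.0, 2.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0]]
frame_second = [[2.0, 2.0, 2.0, 2.0], [2.0, 1.6, 1.6, 2.0], [2.0, 2.0, 2.0, 1.5]]
regions = change_regions(frame_first, frame_second, min_diff=0.1)
print(regions)  # [[(1, 1), (1, 2)], [(2, 3)]]
```

Each returned region is a list of (row, column) pixels and plays the role of one candidate object to be detected.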
  • The determining unit 152 is further configured to determine the displacement value and displacement direction between the position of the object to be detected in the second image frame and its position in the first image frame, where the position is the position in terms of depth information, and the displacement value and displacement direction are those of the depth information.
  • The determining unit 152 is further configured to select a tracking target, where the tracking target is an object to be detected whose displacement value is greater than the first preset value and whose displacement direction is toward the focal plane.
  • The determining unit 152 is further configured to determine, during tracking, the position of the tracking target in second adjacent image frames, where the second adjacent image frames include a third image frame and a fourth image frame, the third image frame is the image frame preceding the fourth image frame, and the position is the position in terms of depth information.
  • The determining unit 152 is further configured to determine the displacement value and displacement direction between the position of the tracking target in the fourth image frame and its position in the third image frame, where the displacement value and displacement direction are those of the depth information.
  • The determining unit 152 is further configured to exit tracking if the displacement value is greater than the second preset value and the displacement direction is away from the focal plane.
  • The determining unit 152 is further configured to detect human body key points in the image frames in the video stream.
  • The determining unit 152 is further configured to determine the change region between the first adjacent image frames in the video stream that is connected to the first parameter of the human body key points as the object to be detected.
  • The determining unit 152 is further configured to select a tracking target, where the tracking target is an object to be detected whose displacement value is greater than the first preset value, whose displacement direction is toward the focal plane, and whose depth information difference from the second parameter of the human body key points in the image frame is greater than the third preset value.
  • The determining unit 152 is further configured to determine, during tracking, the position of the tracking target in second adjacent image frames, where the second adjacent image frames include a third image frame and a fourth image frame, the third image frame is the image frame preceding the fourth image frame, and the position is the position in terms of depth information.
  • The determining unit 152 is further configured to determine the displacement value and displacement direction between the position of the tracking target in the fourth image frame and its position in the third image frame, where the displacement value and displacement direction are those of the depth information.
  • The determining unit 152 is further configured to exit tracking if the displacement value is greater than the second preset value and the displacement direction is away from the focal plane, or if the tracking target is not connected to the first parameter of the human body key points.
  • The displacement value of the depth information is the absolute value of the average depth change value of the pixels.
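As a rough illustration of this definition, the displacement value and direction can be computed from the per-pixel depth change of an object between two frames. The sketch assumes that the same pixel coordinates are compared in both frames and that a decreasing depth means movement toward the focal plane; both are simplifying assumptions, as are the function name and the sample values.

```python
from statistics import mean

def depth_displacement(pixels, depth_first, depth_second):
    """Return (displacement value, direction) of an object between two depth maps."""
    changes = [depth_second[r][c] - depth_first[r][c] for r, c in pixels]
    average_change = mean(changes)
    # Assumption: smaller depth means closer to the focal plane.
    if average_change < 0:
        direction = "toward"
    elif average_change > 0:
        direction = "away"
    else:
        direction = "none"
    return abs(average_change), direction

# Hypothetical object made of two pixels that moved from 1.5 m to 1.2 m and 1.1 m.
pixels = [(0, 0), (0, 1)]
first = [[1.5, 1.5]]
second = [[1.2, 1.1]]
value, direction = depth_displacement(pixels, first, second)
print(round(value, 2), direction)  # 0.35 toward
```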
  • The target tracking apparatus described in the embodiments of the present application can be used to implement the operations performed by the electronic device or the server in the above target tracking method.
  • Embodiments of the present application further provide a computer-readable storage medium in which instructions are stored; when the instructions are run on a processor, the target tracking method is implemented.
  • Embodiments of the present application further provide a computer program product comprising computer-executable instructions stored in a computer-readable storage medium; at least one processor of a device can read the computer-executable instructions from the computer-readable storage medium, and execution of the instructions by the at least one processor causes the device to implement the target tracking method.
  • Before tracking, the present application can determine an object that moves significantly forward as the selected tracking target, so that the position and size of the selected tracking target can be determined through a simple interaction, without manually drawing a bounding box, and objects that are too small can still be detected; during tracking, an object that moves significantly backward is determined to be the object for which tracking is to be exited, so that tracking can be exited through a simple interaction.
  • Before tracking, the present application can also determine an object that is picked up by hand and moves significantly forward as the selected tracking target, so that the position and size of the selected tracking target can be determined through a simple interaction, without manually drawing a bounding box, and objects that are too small can still be detected; during tracking, an object that is put down by hand or that moves significantly backward is determined to be the object for which tracking is to be exited, so that tracking can be exited through a simple interaction.
  • The disclosed apparatus and method may be implemented in other manners.
  • The apparatus embodiments described above are only illustrative.
  • The division into modules or units is only a division by logical function; in actual implementation, other divisions are possible.
  • For example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • The functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
  • The above integrated units may be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium.
  • The storage medium includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or another medium that can store program code.


Abstract

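The present application discloses a target tracking method and an apparatus therefor, which relate to the field of big data and are used to select a tracking target conveniently. The target tracking method includes: acquiring depth information of image frames in a video stream; determining a change region between first adjacent image frames in the video stream as an object to be detected, where the first adjacent image frames include a first image frame and a second image frame, the first image frame is the image frame preceding the second image frame, and the change region is a region in which the depth information differs; determining a displacement value and a displacement direction between the position of the object to be detected in the second image frame and its position in the first image frame, where the position is the position in terms of depth information, and the displacement value and displacement direction are those of the depth information; and selecting a tracking target, where the tracking target is an object to be detected whose displacement value is greater than a first preset value and whose displacement direction is toward the focal plane. The embodiments of the present application are applied to data processing.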

Description

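Target Tracking Method and Apparatus Therefor
This application claims priority to Chinese Patent Application No. 202110336639.7, filed with the China National Intellectual Property Administration on March 29, 2021 and entitled "Target Tracking Method and Apparatus Therefor", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of information processing, and in particular to a target tracking method and an apparatus therefor.
Background
Target tracking is an important research topic in computer vision and is currently widely used in fields such as live video streaming, security surveillance, robotics and human-computer interaction. Given a selected tracking target, target tracking takes an image sequence as input and outputs the size, position and so on of the selected tracking target in each frame of the sequence. The accuracy of target tracking depends on the selected tracking target, so selecting the tracking target is the key step that triggers target tracking. To select a tracking target, a target detection model (for example, a Yolo model) can currently be used to recognize multiple objects in an image and output detection boxes that mark the position of each object; the object inside the detection box tapped by the user is then selected as the tracking target. However, the Yolo model cannot detect objects in the image that are too small, which may cause locating the tracking target to fail. The tracking target can also be selected by manually drawing a box on the image to mark the object. For a moving object, however, the object may be at a first position in one frame of the image sequence (for example, the first frame) when drawing begins, and, as drawing proceeds and the object moves, it may have left that position in another frame of the image sequence (for example, the tenth frame), which also causes locating the tracking target to fail.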
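Summary
In view of the above, embodiments of the present application provide a target tracking method and an apparatus therefor, which make it convenient to select a tracking target.
In a first aspect, an embodiment of the present application provides a target tracking method, the method including: acquiring depth information of image frames in a video stream; determining a change region between first adjacent image frames in the video stream as an object to be detected, where the first adjacent image frames include a first image frame and a second image frame, the first image frame is the image frame preceding the second image frame, and the change region is a region in which the depth information differs; determining a displacement value and a displacement direction between the position of the object to be detected in the second image frame and its position in the first image frame, where the position is the position in terms of depth information, and the displacement value and displacement direction are those of the depth information; and selecting a tracking target, where the tracking target is an object to be detected whose displacement value is greater than a first preset value and whose displacement direction is toward the focal plane.
The present application determines the object to be detected as a region in which the depth information differs between adjacent image frames, and selects the object to be detected as the tracking target if its position in terms of depth information moves significantly forward, so that the tracking target can be selected conveniently.
According to some embodiments of the present application, the method further includes: during tracking, determining the position of the tracking target in second adjacent image frames, where the second adjacent image frames include a third image frame and a fourth image frame, the third image frame is the image frame preceding the fourth image frame, and the position is the position in terms of depth information; determining a displacement value and a displacement direction between the position of the tracking target in the fourth image frame and its position in the third image frame, where the displacement value and displacement direction are those of the depth information; and exiting tracking if the displacement value is greater than a second preset value and the displacement direction is away from the focal plane.
The present application exits tracking if the position of the tracking target in terms of depth information moves significantly backward, so that tracking can be exited conveniently.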
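The selection and exit rules of the first aspect can be read as a two-state machine driven by the depth displacement of each pair of adjacent frames. The following Python sketch is a simplified illustration for a single object; the function name, the "toward"/"away" labels and the threshold values of 0.3 m are assumptions, while the 0.31 m backward movement echoes the cup example in the description.

```python
def run_tracker(observations, first_preset, second_preset):
    """Replay (displacement value, direction) pairs and report the tracking state per step."""
    tracking = False
    states = []
    for displacement, direction in observations:
        if not tracking and displacement > first_preset and direction == "toward":
            tracking = True   # significant forward movement selects the target
        elif tracking and displacement > second_preset and direction == "away":
            tracking = False  # significant backward movement exits tracking
        states.append(tracking)
    return states

# Hypothetical displacement values in meters for five consecutive frame pairs.
observations = [(0.05, "toward"), (0.35, "toward"), (0.10, "away"),
                (0.31, "away"), (0.02, "toward")]
states = run_tracker(observations, first_preset=0.3, second_preset=0.3)
print(states)  # [False, True, True, False, False]
```

Small movements in either direction leave the state unchanged, so ordinary jitter neither starts nor stops tracking.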
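According to some embodiments of the present application, the method further includes: detecting human body key points in the image frames in the video stream, where the object to be detected is the change region between the first adjacent image frames in the video stream that is connected to a first parameter of the human body key points, and the tracking target is an object to be detected whose displacement value is greater than the first preset value, whose displacement direction is toward the focal plane, and whose depth information difference from a second parameter of the human body key points in the image frame is greater than a third preset value.
The present application selects the object to be detected as the tracking target if its position in terms of depth information moves significantly forward and it is connected to the first parameter of the human body key points, so that the tracking target can be selected conveniently.
According to some embodiments of the present application, the method further includes: during tracking, determining the position of the tracking target in second adjacent image frames, where the second adjacent image frames include a third image frame and a fourth image frame, the third image frame is the image frame preceding the fourth image frame, and the position is the position in terms of depth information; determining a displacement value and a displacement direction between the position of the tracking target in the fourth image frame and its position in the third image frame, where the displacement value and displacement direction are those of the depth information; and exiting tracking if the displacement value is greater than the second preset value and the displacement direction is away from the focal plane, or if the tracking target is not connected to the first parameter of the human body key points.
The present application exits tracking if the position of the tracking target in terms of depth information moves significantly backward or the tracking target is no longer connected to the first parameter of the human body key points, so that tracking can be exited conveniently.
According to some embodiments of the present application, the displacement value of the depth information is the absolute value of the average depth change value of the pixels.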
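In a second aspect, an embodiment of the present application provides a target tracking apparatus, the apparatus including: an acquiring unit, configured to acquire depth information of image frames in a video stream; and a determining unit, configured to determine a change region between first adjacent image frames in the video stream as an object to be detected, where the first adjacent image frames include a first image frame and a second image frame, the first image frame is the image frame preceding the second image frame, and the change region is a region in which the depth information differs. The determining unit is further configured to determine a displacement value and a displacement direction between the position of the object to be detected in the second image frame and its position in the first image frame, where the position is the position in terms of depth information, and the displacement value and displacement direction are those of the depth information. The determining unit is further configured to select a tracking target, where the tracking target is an object to be detected whose displacement value is greater than a first preset value and whose displacement direction is toward the focal plane.
According to some embodiments of the present application, the determining unit is further configured to determine, during tracking, the position of the tracking target in second adjacent image frames, where the second adjacent image frames include a third image frame and a fourth image frame, the third image frame is the image frame preceding the fourth image frame, and the position is the position in terms of depth information; the determining unit is further configured to determine a displacement value and a displacement direction between the position of the tracking target in the fourth image frame and its position in the third image frame, where the displacement value and displacement direction are those of the depth information; and the determining unit is further configured to exit tracking if the displacement value is greater than a second preset value and the displacement direction is away from the focal plane.
According to some embodiments of the present application, the determining unit is further configured to detect human body key points in the image frames in the video stream; the determining unit is further configured to determine the change region between the first adjacent image frames in the video stream that is connected to a first parameter of the human body key points as the object to be detected; and the determining unit is further configured to select a tracking target, where the tracking target is an object to be detected whose displacement value is greater than the first preset value, whose displacement direction is toward the focal plane, and whose depth information difference from a second parameter of the human body key points in the image frame is greater than a third preset value.
According to some embodiments of the present application, the determining unit is further configured to determine, during tracking, the position of the tracking target in second adjacent image frames, where the second adjacent image frames include a third image frame and a fourth image frame, the third image frame is the image frame preceding the fourth image frame, and the position is the position in terms of depth information; the determining unit is further configured to determine a displacement value and a displacement direction between the position of the tracking target in the fourth image frame and its position in the third image frame, where the displacement value and displacement direction are those of the depth information; and the determining unit is further configured to exit tracking if the displacement value is greater than the second preset value and the displacement direction is away from the focal plane, or if the tracking target is not connected to the first parameter of the human body key points.
According to some embodiments of the present application, the displacement value of the depth information is the absolute value of the average depth change value of the pixels.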
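In a third aspect, an embodiment of the present application provides an electronic device including a processor and a memory, where the memory is configured to store program instructions, and the processor, when calling the program instructions, implements the target tracking method according to any one of the above.
In a fourth aspect, an embodiment of the present application provides a server including a processor and a memory, where the memory is configured to store program instructions, and the processor, when calling the program instructions, implements the target tracking method according to any one of the above.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing a program, where the program causes a computer device to implement the target tracking method according to any one of the above.
In a sixth aspect, an embodiment of the present application provides a computer program product including computer-executable instructions stored in a computer-readable storage medium; at least one processor of a device can read the computer-executable instructions from the computer-readable storage medium, and execution of the computer-executable instructions by the at least one processor causes the device to perform the target tracking method according to any one of the above.
For the beneficial effects of the second to sixth aspects and their various implementations, reference may be made to the analysis of the beneficial effects of the first aspect and its various implementations, which is not repeated here.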
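Brief Description of the Drawings
FIG. 1 is a schematic diagram of a tracking system according to an embodiment of the present application.
FIG. 2 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
FIG. 3 is a block diagram of the software structure of an electronic device according to an embodiment of the present application.
FIG. 4 is a flowchart of a target tracking method according to an embodiment of the present application.
FIGS. 5A to 5D are human-computer interaction interface diagrams provided by embodiments of the present application.
FIGS. 6A and 6B are further human-computer interaction interface diagrams provided by embodiments of the present application.
FIGS. 7A to 7D are further human-computer interaction interface diagrams provided by embodiments of the present application.
FIGS. 8A and 8B are further human-computer interaction interface diagrams provided by embodiments of the present application.
FIGS. 9A and 9B are schematic diagrams provided by embodiments of the present application.
FIGS. 10A to 10E are user interfaces provided by embodiments of the present application.
FIGS. 11A and 11B are further user interfaces provided by embodiments of the present application.
FIG. 12 is a schematic diagram of human body key points according to an embodiment of the present application.
FIGS. 13A and 13B are further user interfaces provided by embodiments of the present application.
FIG. 14 is a schematic diagram of the hardware structure of a server according to an embodiment of the present application.
FIG. 15 is a schematic structural diagram of a target tracking apparatus of the present application.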
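Detailed Description
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more such features. In the description of the embodiments of the present application, words such as "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described as "for example" in the embodiments of the present application should not be construed as being preferred over or more advantageous than other embodiments or designs. Rather, words such as "for example" are intended to present the related concepts in a concrete manner.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used in the specification of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application. It should be understood that, unless otherwise stated in the present application, "a plurality of" means two or more.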
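FIG. 1 is a schematic diagram of a tracking system according to an embodiment of the present application. The tracking system 10 may include an electronic device 11 and a server 12. In this embodiment, the electronic device 11 may be an electronic device with an image capturing function, such as a smartphone, a tablet computer, a PDA (Personal Digital Assistant), a smart camera device, or a wearable device. A network connection may be established between the electronic device 11 and the server 12. The network connection may be wired or wireless. The electronic device 11 may include a camera module 111. The camera module 111 may be a binocular camera, a structured light camera, a TOF (time of flight) camera, an ordinary monocular camera, or the like. The camera module 111 is used to capture images of a scene. The images can be used to obtain depth information of the photographed object. If the camera module 111 is a binocular camera, a structured light camera, or a TOF camera, the image includes the depth information of the photographed object, which can subsequently be obtained directly from the image. If the camera module 111 is an ordinary monocular camera, a monocular depth estimation algorithm can subsequently be used to obtain the depth information of the photographed object in the image. The camera module 111 captures the images at a fixed frequency, for example 30 frames per second. The camera module 111 may remain fixed to capture images of the same scene, or may be driven to move in order to track an object. The electronic device 11 includes a client 112. The client 112 may be an application with a camera function running on the electronic device 11, such as a camera app, a live-commerce app, a video call app, or a monitoring app. The client 112 may call the camera app through an application programming interface (API) to request permission to call the camera module 111 and, after obtaining the permission, may control the camera module 111. The electronic device 11 may acquire the video stream collected by the camera module 111 and send the video stream to the server 12 through the client 112. The server 12 may store the video stream in a storage location associated with a live channel identifier so that a playback end can play the video stream, or may send the video stream to another electronic device for a video call.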
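In this embodiment, the electronic device 11 may acquire depth information of image frames in the video stream; determine a change region between adjacent image frames in the video stream as an object to be detected, where the adjacent image frames include a first image frame and a second image frame, the first image frame is the image frame preceding the second image frame, and the change region is a region in which the depth information differs; determine a displacement value and a displacement direction between the position of the object to be detected in the second image frame and its position in the first image frame, where the displacement value and displacement direction are those of the depth information; and select a tracking target, where the tracking target is an object to be detected whose displacement value is greater than a first preset value and whose displacement direction is toward the focal plane. The electronic device 11 also tracks the tracking target and then sends the processed video stream to the server 12.
To reduce the computational load on the electronic device, the video stream may instead be processed by the server 12. Specifically, after receiving the video stream sent by the electronic device through the client, the server may likewise acquire depth information of image frames in the video stream; determine a change region between adjacent image frames in the video stream as an object to be detected, where the adjacent image frames include a first image frame and a second image frame, the first image frame is the image frame preceding the second image frame, and the change region is a region in which the depth information changes; determine a displacement value and a displacement direction between the position of the object to be detected in the second image frame and its position in the first image frame, where the displacement value and displacement direction are those of the depth information; and select a tracking target, where the tracking target is an object to be detected whose displacement value is greater than the first preset value and whose displacement direction is toward the focal plane. The server 12 may also track the tracking target by means of the electronic device and then send the processed video stream to other electronic devices through the client. That is, in the embodiments of the present application, selecting the tracking target and tracking it may be implemented in the electronic device 11 or in the server 12, which is not limited here.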
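FIG. 2 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application. The electronic device 100 may include at least one of a mobile phone with an image capturing function, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), a wearable device, an in-vehicle device, or a smart home device. The embodiments of the present application place no particular limitation on the specific type of the electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera module 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.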
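The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated in one or more processors.
The processor may generate operation control signals according to instruction operation codes and timing signals, to control instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 may be a cache. This memory may hold instructions or data that the processor 110 has just used or uses frequently. If the processor 110 needs the instructions or data again, it can call them directly from this memory, which avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface. The processor 110 may be connected to modules such as the touch sensor, the audio module, the wireless communication module, the display and the camera through at least one of the above interfaces.
It can be understood that the interface connection relationships between the modules illustrated in the embodiments of the present application are only schematic and do not constitute a structural limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may also use interface connection manners different from those in the above embodiments, or a combination of multiple interface connection manners.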
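The USB connector 130 is an interface that conforms to the USB standard specification and can be used to connect the electronic device 100 to peripheral devices. It may specifically be a Mini USB connector, a Micro USB connector, a USB Type-C connector, or the like. The USB connector 130 can be used to connect a charger so that the charger charges the electronic device 100, or to connect other electronic devices so that data is transferred between the electronic device 100 and the other electronic devices. It can also be used to connect a headset, through which audio stored in the electronic device is output. The connector can further be used to connect other electronic devices, such as VR devices. In some embodiments, the standard specification of the universal serial bus may be USB 1.x, USB 2.0, USB 3.x or USB4.
The charging management module 140 is used to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of the wired charger through the USB connector 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 may also supply power to the electronic device through the power management module 141.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera module 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110. In still other embodiments, the power management module 141 and the charging management module 140 may be provided in the same device.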
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对 经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),蓝牙低功耗(bluetooth low energy,BLE),超宽带(ultra wide band,UWB),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
In some embodiments, antenna 1 of the electronic device 100 is coupled to the mobile communication module 150 and antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other electronic devices through wireless communication technologies. These technologies may include the global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
电子设备100可以通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
The display 194 is used to display images, video, and the like. The display 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include one or more displays 194.
电子设备100可以通过摄像模组193,ISP,视频编解码器,GPU,显示屏194以及应用处理器AP、神经网络处理器NPU等实现摄像功能。
摄像模组193可用于采集拍摄对象的彩色图像数据以及深度数据。ISP可用于处理摄像模组193采集的彩色图像数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将该电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像模组193中。
在一些实施例中,摄像模组193可以由彩色摄像模组和3D感测模组组成。
在一些实施例中,彩色摄像模组的摄像头的感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。
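In some embodiments, the 3D sensing module may be a time of flight (TOF) 3D sensing module or a structured light 3D sensing module. Structured light 3D sensing is an active depth sensing technology; the basic components of a structured light 3D sensing module may include an infrared (IR) emitter, an IR camera module, and the like. A structured light 3D sensing module works by first projecting a light-spot pattern onto the photographed object, then receiving the light coding of the spot pattern on the object's surface, comparing it with the originally projected pattern, and computing the three-dimensional coordinates of the object by triangulation. These coordinates include the distance between the electronic device 100 and the photographed object. TOF 3D sensing may also be an active depth sensing technology; the basic components of a TOF 3D sensing module may include an infrared (IR) emitter, an IR camera module, and the like. A TOF 3D sensing module works by computing the distance (that is, the depth) between the module and the photographed object from the round-trip time of the infrared light, thereby obtaining a 3D depth map.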
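The TOF principle described above can be illustrated with a short sketch. The patent does not give a formula; the relation below (depth equals the speed of light times the round-trip time, divided by two) is the standard time-of-flight relation, and the function name and the 10 ns sample value are assumptions chosen only for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def tof_depth_m(round_trip_seconds: float) -> float:
    """Depth in metres from the round-trip time of an emitted IR pulse.

    The pulse travels to the object and back, so the one-way distance
    is half of the total path length.
    """
    if round_trip_seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


# Hypothetical measurement: a 10 ns round trip is roughly 1.5 m of depth.
depth = tof_depth_m(10e-9)
```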
结构光3D感测模组还可应用于人脸识别、体感游戏机、工业用机器视觉检测等领域。TOF 3D感测模组还可应用于游戏机、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)等领域。
在另一些实施例中,摄像模组193还可以由两个或更多个摄像头构成。这两个或更多个摄像头可包括彩色摄像头,彩色摄像头可用于采集被拍摄物体的彩色图像数据。这两个或更多个摄像头可采用立体视觉(stereo vision)技术来采集被拍摄物体的深度数据。立体视觉技术是基于人眼视差的原理,在自然光源下,透过两个或两个以上的摄像头从不同的角度对同一物体拍摄影像,再进行三角测量法等运算来得到电子设备100与被拍摄物之间的距离信息,即深度信息。
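As a rough illustration of the triangulation mentioned above, the following sketch uses the usual rectified-stereo relation Z = f * B / d. The patent does not specify this formula or any camera parameters; the function name, the focal length, the baseline, and the disparity are all assumed values.

```python
def stereo_depth_m(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen by two rectified cameras.

    focal_px     -- focal length expressed in pixels (assumed known from calibration)
    baseline_m   -- distance between the two camera centres, in metres
    disparity_px -- horizontal shift of the point between the two images, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers: f = 1000 px, baseline = 2 cm, disparity = 10 px.
depth = stereo_depth_m(1000.0, 0.02, 10.0)
```

A larger disparity means the object is closer, which is why nearby objects are measured more precisely than distant ones.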
在另一些实施例中,摄像模组193还可以由一个摄像头构成。这个摄像头拍摄一张或者唯一视角下的RGB图像。处理器110中GPU可根据单目深度估计算法估计图像中每个像素相对摄像模组193的距离,即深度信息。
在一些实施例中,摄像模组193可固定不动去采集同一场景、同一视角的图像,还可被驱动而采集不同场景的图像。摄像模组193可在选定跟踪目标之前,固定不动;在选定跟踪目标之后,可被驱动进行目标跟踪。
在一些实施例中,电子设备100可以包括1个或多个摄像模组193。具体的,电子设备 100可以包括1个前置摄像模组193以及1个后置摄像模组193。其中,前置摄像模组193通常可用于采集面对显示屏194的拍摄者自己的彩色图像数据以及深度数据,后置摄像模组可用于采集拍摄者所面对的拍摄对象(如人物、风景等)的彩色图像数据以及深度数据。
在一些实施例中,处理器110中的CPU或GPU或NPU可以对摄像模组193所采集的彩色图像数据和深度数据进行处理。在一些实施例中,NPU可以通过骨骼点识别技术所基于的神经网络算法,例如卷积神经网络算法(CNN),来识别摄像模组193(具体是彩色摄像模组)所采集的彩色图像数据,以确定被拍摄人物的骨骼点。CPU或GPU也可来运行神经网络算法以实现根据彩色图像数据确定被拍摄人物的骨骼点。在一些实施例中,CPU或GPU或NPU还可用于根据摄像模组193(可以是3D感测模组)所采集的深度数据和已识别出的骨骼点来确认被拍摄人物的身材(如身体比例、骨骼点之间的身体部位的胖瘦情况),并可以进一步确定针对该被拍摄人物的身体美化参数,最终根据该身体美化参数对被拍摄人物的拍摄图像进行处理,以使得该拍摄图像中该被拍摄人物的体型被美化。后续实施例中会详细介绍如何基于摄像模组193所采集的彩色图像数据和深度数据对被拍摄人物的图像进行美体处理,这里先不赘述。
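The digital signal processor processes digital signals, including digital signals other than digital image signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor performs a Fourier transform or similar operation on the frequency-point energy.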
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。或将音乐,视频等文件从电子设备传输至外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,该可执行程序代码包括指令。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内部存储器121的指令,和/或存储在设置于处理器中的存储器的指令,执行电子设备100的各种功能方法或数据处理。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音 频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或输出免提通话的音频信号。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备100根据压力传感器180A检测该触摸操作强度。电子设备100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测电子设备100抖动的角度,根据角度计算出镜头模组需要补偿的距离,控制镜头反向运动抵消电子设备100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。
气压传感器180C用于测量气压。在一些实施例中,电子设备100根据气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。电子设备100可以利用磁传感器180D检测翻盖皮套的开合。当电子设备为可折叠电子设备,磁传感器180D可以用于检测电子设备的折叠或展开,或折叠角度。在一些实施例中,当电子设备100是翻盖机时,电子设备100可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横 竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备100通过发光二极管向外发射红外光。电子设备100使用光电二极管检测来自附近物体的红外反射光。当检测到的反射光的强度大于阈值时,可以确定电子设备100附近有物体。当检测到的反射光的强度小于阈值时,电子设备100可以确定电子设备100附近没有物体。电子设备100可以利用接近光传感器180G检测用户手持电子设备100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器180L可以用于感知环境光亮度。电子设备100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测电子设备100是否被遮挡,例如电子设备在口袋里。当检测到电子设备被遮挡或在口袋里,可以使部分功能(例如触控功能)处于禁用状态,以防误操作。
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,电子设备100利用温度传感器180J检测的温度,执行温度处理策略。例如,当通过温度传感器180J检测的温度超过阈值,电子设备100执行降低处理器的性能,以便降低电子设备的功耗以实施热保护。在另一些实施例中,当通过温度传感器180J检测的温度低于另一阈值时,电子设备100对电池142加热。在其他一些实施例中,当温度低于又一阈值时,电子设备100可以对电池142的输出电压升压。
触摸传感器180K,也称“触控器件”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于该骨传导传感器180M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于该骨传导传感器180M获取的血压跳动信号解析心率信息,实现心率检测功能。
按键190可以包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈 效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现和电子设备100的接触和分离。电子设备100可以支持1个或多个SIM卡接口。SIM卡接口195可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口195可以同时插入多张卡。多张卡的类型可以相同,也可以不同。SIM卡接口195也可以兼容不同类型的SIM卡。SIM卡接口195也可以兼容外部存储卡。电子设备100通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,电子设备100采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在电子设备100中,不能和电子设备100分离。
电子设备100的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本申请实施例以分层架构的Android系统为例,示例性说明电子设备100的软件结构。
图3是本申请实施例的电子设备100的软件结构框图。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为五层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime,ART)和原生C/C++库,硬件抽象层(Hardware Abstract Layer,HAL)以及内核层。
应用程序层可以包括一系列应用程序包。
如图3所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图3所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,资源管理器,通知管理器,活动管理器,输入管理器等。
窗口管理器提供窗口管理服务(Window Manager Service,WMS),WMS可以用于窗口管理、窗口动画管理、surface管理以及作为输入系统的中转站。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。该数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后 台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
活动管理器可以提供活动管理服务(Activity Manager Service,AMS),AMS可以用于系统组件(例如活动、服务、内容提供者、广播接收器)的启动、切换、调度以及应用进程的管理和调度工作。
输入管理器可以提供输入管理服务(Input Manager Service,IMS),IMS可以用于管理系统的输入,例如触摸屏输入、按键输入、传感器输入等。IMS从输入设备节点取出事件,通过和WMS的交互,将事件分配至合适的窗口。
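The Android runtime layer includes the core libraries and the Android runtime (ART). The Android runtime is responsible for converting source code into machine code, mainly by means of ahead-of-time (AOT) compilation and just-in-time (JIT) compilation.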
核心库主要用于提供基本的Java类库的功能,例如基础数据结构、数学、IO、工具、数据库、网络等库。核心库为用户进行安卓应用开发提供了API。
原生C/C++库可以包括多个功能模块。例如:表面管理器(surface manager),媒体框架(Media Framework),libc,OpenGL ES、SQLite、Webkit等。
其中,表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。媒体框架支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。OpenGL ES提供应用程序中2D图形和3D图形的绘制和操作。SQLite为电子设备100的应用程序提供轻量级关系型数据库。
硬件抽象层运行于用户空间(user space),对内核层驱动进行封装,向上层提供调用接口。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
下面结合捕获拍照场景,示例性说明电子设备软件以及硬件的工作流程。
当触摸传感器180K接收到触摸操作,相应的硬件中断被发给内核层。内核层将触摸操作加工成原始输入事件(包括触摸坐标,触摸操作的时间戳等信息)。原始输入事件被存储在内核层。应用程序框架层从内核层获取原始输入事件,识别该输入事件所对应的控件。以该触摸操作是触摸单击操作,该单击操作所对应的控件为相机应用图标的控件为例,相机应用调用应用框架层的接口,启动相机应用,进而通过调用内核层启动摄像头驱动,通过摄像头捕获静态图像或视频。
请参考图4,为本申请一实施例的目标跟踪方法的流程图。所述目标跟踪方法应用于电子设备上,为根据深度信息的变化选定跟踪目标并跟踪。所述目标跟踪方法包括:
S401:电子设备接收开启电子设备上的客户端的操作。
为了便于描述,以下以手机为例,对本申请进行说明。例如,用户可点击手机上的“视频通话”客户端或者“直播带货”客户端的应用图标来开启“视频通话”客户端或者“直播带货”客户端,则摄像模组开启采集视频流,获取摄像模组前的场景的图像。在开启“视频通话”客户端或者“直播带货”客户端时,在“视频通话”客户端或者“直播带货”客户端中的追踪模式已开启。“视频通话”客户端中的追踪模式的开启可以通过“视频通话”客户 端中的控件实现。“直播带货”客户端中的追踪模式的开启可以通过“直播带货”客户端中的控件实现。
为了开启“视频通话”客户端中的追踪模式,在一种具体的实现方式中,如图5A所示,用户可以在手机的主界面上点击“视频通话”客户端。所述客户端不仅局限于“视频通话”客户端,还可为其他包括视频通话功能的客户端或者其他类似视频通话功能的客户端,本申请并不对此做出限制。手机检测到用户点击“视频通话”客户端后,可以显示视频通话的用户界面,如图5B所示。在图5B中,视频通话的用户界面可以包括:视频显示区51、挂断控件52、摄像头切换控件53、更多选项控件54、状态栏55、设置控件56及跟踪模式开关控件57。可以理解的是,图5B是视频通话的用户界面的一种示例,所述视频通话的用户界面还可包括窗口缩小控件等。本申请并不对视频通话的用户界面的内容和形式进行限定。
视频显示区51用于显示视频联系人的手机的摄像模组采集的视频流。挂断控件52用于中断视频通话。手机可以检测作用在挂断控件52的触控操作(如在挂断控件52上的点击操作),并响应于所述操作中断视频通话。摄像头切换控件53用于切换摄像头。手机可以检测作用在摄像头切换控件53的触控操作(如在摄像头切换控件53上的点击操作),并响应于所述操作将手机的摄像模组从前置摄像头切换为后置摄像头,或者将手机的摄像模组从后置摄像头切换为前置摄像头。更多选项控件54可包括窗口切换控件等。手机可以检测作用在更多选项控件54的触控操作(如在更多选项控件54上的点击操作),显示窗口切换控件。所述窗口切换控件用于显示手机的摄像模组采集的视频流,并切换视频窗口。手机可以检测作用在窗口切换控件的触控操作(如在窗口切换控件上的点击操作),并响应于所述操作切换窗口切换控件及视频显示区51显示的内容。状态栏55可以包括网络、信号强度、电池状态、及时间等。
设置控件56用于接收用户输入的设置指令。如图5B中所示,用户可点击设置控件56。手机在检测到用户选择的设置控件56后,显示设置界面,如图5C所示。所述设置界面可以包括默认开启智能跟踪控件及状态栏。图5C是设置界面的一种实例,所述视频通话的设置界面还可包括比图5C所示更多或更少的控件。本申请并不对视频通话的用户界面的内容和形式进行限定。所述默认开启智能跟踪控件包括默认开启智能跟踪字体及对应的选择控件。选择控件可以有两种状态,开启及关闭。当用户对默认开启智能跟踪字体或选择控件操作时,可以切换跟踪模式的状态。例如,当跟踪模式处于关闭状态时,用户选择默认开启智能跟踪字体或选择控件(如图5C所示),则手机切换为跟踪模式开启状态(如图5D所示),在图5C中,选择控件处于关闭状态。在图5D中,选择控件处于开启状态;当跟踪模式处于开启状态时,用户选择默认开启智能跟踪字体或选择控件,则手机切换为跟踪模式关闭状态。
跟踪模式开关控件57用于接收用户输入的跟踪模式开启或关闭指令。如图6A所示,跟踪模式处于关闭状态,用户可点击跟踪模式开关控件57。手机在检测到用户选择跟踪模式开关控件57后,开启跟踪模式,如图6B所示。在图6A中,所述跟踪模式开关控件包括标志61,表示所述跟踪模式处于关闭状态;在图6B中,所述跟踪模式开关控件不包括标志,表示所述跟踪模式处于开启状态。若跟踪模式处于开启状态,用户也可点击跟踪模式开关控件,手机在检测到用户选择跟踪模式开关控件后,关闭跟踪模式。
为了开启“直播带货”客户端中的追踪模式,在一种具体的实现方式中,如图7A所示,用户可以在手机的主界面上点击“直播带货”客户端。所述客户端不仅局限于“直播带货”客户端,还可为快手、淘宝等包括直播带货功能的客户端或者其他类似直播带货功能的客户端,本申请并不对此做出限制。手机检测到用户点击“直播带货”客户端后,可以显示直播带货的用户界面,如图7B所示。在图7B中,直播带货的用户界面可以包括:视频显示区71、播放/暂停控件72、状态栏73、设置控件74及跟踪模式开关控件75。可以理解的是,图7B是直播带货的用户界面的一种示例,直播带货的用户界面可包括更多或更少的控件,本申请并不对直播带货的用户界面的内容和形式进行限定。
视频显示区71用于显示手机的摄像模组采集的视频流。播放/暂停控件72用于暂停/播放直播。手机可以检测作用在播放/暂停控件72的触控操作(如播放/暂停控件72上的点击操作),并响应于所述操作暂停/播放直播。例如,若播放/暂停控件72处于播放状态时,用户可选择播放/暂停控件72,手机响应于所述选择操作暂停直播;若播放/暂停控件72处于暂停状态时,用户可选择播放/暂停控件72,手机响应于所述选择操作播放直播。状态栏73可以包括网络、信号强度、电池状态、及时间等。
设置控件74用于接收用户输入的设置指令。如图7B中所示,用户可点击设置控件74。手机在检测到用户选择的设置控件74后,显示设置界面,如图7C所示。所述设置界面可以包括默认开启智能跟踪控件及状态栏。图7C是设置界面的一种实例,所述视频通话的设置界面还可包括比图7C所示更多或更少的控件。本申请并不对视频通话的用户界面的内容和形式进行限定。所述默认开启智能跟踪控件包括默认开启智能跟踪字体及对应的选择控件。选择控件可以有两种状态,开启及关闭。当用户对默认开启智能跟踪字体或选择控件操作时,可以切换跟踪模式的状态。例如,当跟踪模式处于关闭状态时,用户选择默认开启智能跟踪字体或选择控件(如图7C所示),则手机切换为跟踪模式开启状态(如图7D所示),在图7C中,选择控件处于关闭状态。在图7D中,选择控件处于开启状态;当跟踪模式处于开启状态时,用户选择默认开启智能跟踪字体或选择控件,则手机切换为跟踪模式关闭状态。
跟踪模式开关控件75用于接收用户输入的跟踪模式开启或关闭指令。如图8A所示,跟踪模式处于关闭状态,用户可点击跟踪模式开关控件75。手机在检测到用户选择跟踪模式开关控件75后,开启跟踪模式,如图8B所示。在图8A中,所述跟踪模式开关控件包括标志81,表示所述跟踪模式处于关闭状态;在图8B中,所述跟踪模式开关控件不包括标志,表示所述跟踪模式处于开启状态。若跟踪模式处于开启状态,用户也可点击跟踪模式开关控件,手机在检测到用户选择跟踪模式开关控件后,关闭跟踪模式。
为了方便后续的描述,以下以客户端为“视频通话”客户端为例对本申请进行说明。
S402,电子设备获取摄像模组采集的视频流,视频流包括图像帧。
在客户端开启之后,在跟踪模式下,电子设备首先选定跟踪目标,然后才对跟踪目标进行跟踪。在选定跟踪目标之前,摄像模组固定不动去采集同一场景、同一视角的图像帧。所述摄像头以固定的频率采集视频流,例如30帧每秒。视频流包括按照时间顺序排序的多帧图像帧。电子设备实时获取摄像模组采集的视频流,例如在时间t1,电子设备获取摄像模组采集的视频流中的图像帧1(如图9A所示);在时间t2,电子设备获取摄像模组采集的视频流中的图像帧2(如图9B所示)。可理解的是,虽然在图9A及图9B中示出了电子设备显示 图像帧1及图像帧2,但这并不妨碍认为摄像模组采集的图像帧即为图9A及图9B中的图像帧。
S403,获取视频流中的图像帧的深度信息。
若所述摄像模组为双目摄像头、结构光摄像头、或TOF(Time of flight,飞行时间)摄像头,图像帧中包括被摄物体的深度信息,可直接获取图像帧中的被摄物体的深度信息。若所述摄像模组为普通单目摄像头,图像帧中不包括被摄物体的深度信息,可采用单目深度估计算法来获得图像帧中的被摄物体的深度信息。电子设备实时获取视频流中的图像帧的深度信息。继续以上述的图9A及图9B为例对本申请进行说明,图9A中包括对象:人、桌子、杯子及剃须刀等,电子设备在获取到图9A所示的图像帧1时,获取图像帧1中的人、桌子、杯子及剃须刀等的深度信息;图9B中也包括对象:人、桌子、杯子及剃须刀等,电子设备在获取到图9B所示的图像帧2时,获取图像帧2中的人、桌子、杯子及剃须刀等的深度信息。具体地,电子设备可获取图9A所示的图像帧1中的人的所有像素点的深度信息、桌子的所有像素点的深度信息、杯子的所有像素点的深度信息、及剃须刀的所有像素点的深度信息等;电子设备可获取图9B所示的图像帧2中的人的所有像素点的深度信息、桌子的所有像素点的深度信息、杯子的所有像素点的深度信息、及剃须刀的所有像素点的深度信息等。
S404,确定视频流中第一相邻图像帧之间的变化区域为待检测对象,第一相邻图像帧包括第一图像帧和第二图像帧,第一图像帧为位于第二图像帧前面的图像帧,所述变化区域为深度信息的差异区域。
摄像模组所采集的视频流在时间上是连续的,被摄物体的位置也不会发生突变。如果场景中没有运动的物体,则第一相邻图像帧之间的变化很小。如果场景中存在运动的物体,则相邻图像帧之间的变化会超过阈值。在本实施例中,确定视频流中第一相邻图像帧之间的变化区域为待检测对象包括:
确定视频流中第一相邻图像帧之间的相似度,相似度为深度信息的相似度;若视频流中第一相邻图像帧之间的相似度小于阈值,确定视频流中第一相邻图像帧之间的变化区域,所述变化区域为深度信息的差异区域;确定变化区域为待检测对象。其中,在确定变化区域时,可设置预设规则,例如将人从变化区域中排除,或者视频流中第一相邻图像帧中的对象的整体的相似度小于阈值时才确定对象为变化区域,例如视频流中第一相邻图像帧中的整个人的相似度小于阈值时才确定人为变化区域。
所述第一相邻图像帧可为一个第一相邻图像帧或多个第一相邻图像帧。继续以上述的图9A及图9B为例对本申请进行说明,一个第一相邻图像帧可包括图9A所示的图像帧1及图9B所示的图像帧2。其中,图9A所示的图像帧1为第一图像帧,图9B所示的图像帧2为第二图像帧,图像帧1为位于图像帧2前面的图像帧。确定视频流中第一相邻图像帧之间的变化区域为待检测对象可包括:确定视频流中图9A所示的图像帧1中的人、桌子、杯子及剃须刀及图9B所示的图像帧2中的人、桌子、杯子及剃须刀之间的深度信息的相似度;图9A所示的图像帧1中的杯子、人的手臂及眼睛与图9B所示的图像帧2中的杯子、人的手臂及眼睛之间的相似度小于阈值。根据上述的预设规则,确定视频流中第一相邻图像帧之间的变化区域为杯子,则确定杯子为待检测对象。
多个第一相邻图像帧可为,例如图9A所示的图像帧1、图9B所示的图像帧2、图像帧3及图像帧4。这四个图像帧为摄像模组摄取的连续的视频流。则图9A所示的图像帧1与 图9B所示的图像帧2、图9B所示的图像帧2与图像帧3、及图像帧3与图像帧4为三个相邻的图像帧。多个第一相邻图像帧确定待检测对象的过程与一个第一相邻图像帧确定待检测对象的过程相似,在此不再赘述。其中,在多个第一相邻图像帧确定待检测对象的过程中,若多个第一相邻图像帧中存在一个第一相邻图像帧中的对象(例如杯子)之间的相似度大于阈值,则可忽略此个第一相邻图像帧。
第一相邻图像帧之间的相似度为第一相邻图像帧像素点之间的相似度。具体地,所述设备可通过比较第一相邻图像帧像素点之间的相似度;确定相邻图像帧之间的深度信息的变化值超过阈值的像素点为第一像素点;确定相邻图像帧中第一像素点的深度变化值;根据所述第一像素点的深度变化值确定深度变化图像;计算深度变化图像中每个像素点与其空间上相邻的像素点之间的最小深度变化值来形成距离差图像;对所述距离差图像进行阈值二值化处理来获得二值图像;在所述二值图像中,进行连通域标记,确定连通域;根据连通域确定待检测对象。所述连通域即为上述的变化区域。
继续以上述的图9A及图9B为例来说明如何在一个第一相邻图像帧的情况下根据像素点确定待检测对象,图9A所示的图像帧1与图9B所示的图像帧2进行比较时,图像帧1与图像帧2之间的像素点a、b、c、d、f、g、h的深度信息的变化值0.21米、0.25米、0.29米、0.3米、0.21米、0.25米、0.27米超过阈值0.2米,则确定第一像素点为像素点a、b、c、d、f、g、h,并确定第一相邻图像帧中第一像素点a、b、c、d、f、g、h的深度变化值分别为0.21米、0.25米、0.29米、0.3米、0.21米、0.25米、0.27米。
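A minimal sketch of this step, assuming the depth maps are plain nested lists of metres and that the "change value" compared against the threshold is the absolute difference (the text does not say whether the comparison is signed). The function name and the 2x2 sample frames are invented for illustration; only the 0.2 m threshold comes from the example above.

```python
def depth_change_image(depth_prev, depth_curr, threshold=0.2):
    """Build the single-channel depth-change image for one pair of adjacent frames.

    Pixels whose absolute depth change exceeds `threshold` (the "first pixels")
    keep their change value; every other pixel is set to 0.
    """
    image = []
    for row_prev, row_curr in zip(depth_prev, depth_curr):
        out_row = []
        for d_prev, d_curr in zip(row_prev, row_curr):
            change = abs(d_curr - d_prev)
            out_row.append(change if change > threshold else 0.0)
        image.append(out_row)
    return image


# Hypothetical 2x2 depth maps in metres (values invented for illustration).
frame1 = [[1.50, 1.50], [1.50, 1.50]]
frame2 = [[1.29, 1.50], [1.45, 1.20]]
change_img = depth_change_image(frame1, frame2)
# Only the top-left (0.21 m) and bottom-right (0.30 m) pixels are first pixels.
```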
连通域是指在图像帧中具有相同像素值并且位置相邻的像素组成的区域。所述连通域中的每个像素点和在空间上与其相邻的相邻像素点存在一定的相似性,则所述连通域中的每个像素点和所述相邻的像素点之间的深度差不会发生突变,即所述连通域中的每个像素点和所述相邻的像素点之间的深度差的绝对值小于一定的深度差。所述深度变化图像的图像数据如图10A所示。所述深度变化图像为单通道图像。所述深度变化图像中的每个像素点所表示的值为所述像素点的深度变化值。例如,在图10A中,所述深度变化图像的分辨率为100x100,所述深度变化图像中数值为0的像素点表示第一相邻图像帧中的像素点的深度信息的变化值小于阈值,数值不为0的像素点表示第一相邻图像帧中的像素点的深度信息的变化值超过阈值。所述像素点与其空间上相邻的像素点可如图10B或如图10C所示。图10B及图10C中以上述的第一像素点中的像素点f为例进行说明。在图10B中,所述像素点f具有4个与其在空间上相邻的像素点,分别为像素点g、像素点h、像素点i及像素点j。所述像素点g、所述像素点h、所述像素点i及所述像素点j分别位于所述像素点f的正上方、正下方、正左方及正右方。在图10C中,所述像素点f具有8个与其在空间上相邻的像素点,分别为像素点g、像素点h、像素点i、像素点j、像素点k、像素点l、像素点m、及像素点n。所述像素点g、所述像素点h、所述像素点i、所述像素点j、所述像素点k、所述像素点l、所述像素点m、及所述像素点n分别位于所述像素点f的正上方、正下方、正左方、正右方、左上角、右上角、左下角及右下角。
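For ease of description, pixel f of the first pixels, with its 4 spatially adjacent pixels, is again used to show how the minimum depth change value between each pixel of the depth-change image and its spatial neighbors is computed. As shown in FIG. 10B, the depth change value of pixel f is 0.21 m, that of pixel g is 0.25 m, that of pixel h is 0.27 m, that of pixel i is 0.2 m, and that of pixel j is 0.3 m. The differences between the depth change value of pixel f and those of its neighbors g, h, i, and j are 0.04 m, 0.06 m, 0.01 m, and 0.09 m respectively, so the minimum depth change value between pixel f and its spatial neighbors is 0.01 m. The image data of the distance-difference image is shown in FIG. 10D. The distance-difference image is a single-channel image in which the value of each pixel is the minimum depth change value between that pixel and its spatial neighbors.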
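The neighbor comparison can be sketched as follows. This is an illustrative reading of the step, not the applicant's implementation: the function and constant names are invented, and neighbors that fall outside the image are simply skipped, which the text does not specify.

```python
NEIGHBORS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
NEIGHBORS_8 = NEIGHBORS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def distance_difference_image(change_img, offsets=NEIGHBORS_4):
    """For each pixel, store the smallest absolute difference between its
    depth change value and those of its spatial neighbours."""
    rows, cols = len(change_img), len(change_img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            diffs = [
                abs(change_img[r][c] - change_img[r + dr][c + dc])
                for dr, dc in offsets
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            out[r][c] = min(diffs) if diffs else 0.0
    return out


# Pixel f (centre) with neighbours g (above), h (below), i (left), j (right),
# using the depth change values from the worked example; corners are set to 0.
patch = [
    [0.0, 0.25, 0.0],
    [0.2, 0.21, 0.3],
    [0.0, 0.27, 0.0],
]
dist_img = distance_difference_image(patch)
# dist_img[1][1] is approximately 0.01, the difference between f and i.
```

Passing `NEIGHBORS_8` instead of the default gives the 8-neighbor variant of FIG. 10C.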
对所述距离差图像进行阈值二值化处理可为,若像素点所表示的值小于预设值(例如0.03等),则所述像素点所表示的值进行阈值二值化处理后为1;若像素点所表示的值大于预设值(例如0.03等),则所述像素点所表示的值进行阈值二值化处理后为0。继续以上述的第一像素点中的像素点f为例对本申请进行说明,像素点f所表示的值0.01小于预设值0.03,则所述像素点f所表示的值进行阈值二值化处理后为1,即像素点f在二值图像中的像素值为1。可理解,按照上述的确定像素点f在二值图像中的像素值的过程,其他第一像素点a、b、c、d、g、h也可确定在二值图像中的像素值,则根据上述的图9A及图9B所得的二值图像的图像数据如图10E所示,所述二值图像为单通道图像。所述二值图像中的每个像素点所表示的像素值为最小深度变化值进行阈值二值化处理后的值。
在所述二值图像中,进行连通域标记,确定连通域。所述连通域的数量可为一个或多个。上述的图10E所示的二值图像进行连通域标记后,确定连通域H、连通域I和连通域J。连通域H为杯子、连通域I为人的手臂、及连通域J为人的眼睛。连通域H包括像素点a、b、c、d,连通域I包括像素点f、g,连通域J包括h。在确定连通域后,可根据上述的预设规则:将人从变化区域中排除,或者视频流中第一相邻图像帧中的对象的整体之间的相似度小于阈值时才确定对象为变化区域,可排除连通域I及连通域J,进而完成连通域的确定。然后,可确定具有最大的区域面积及/和第一相邻图像帧中的深度变化值最大的连通域为待检测对象。例如,以另外的例子来进行说明,排除后的连通域的数量为两个,分别为连通域K及连通域L,若连通域K的区域面积大于连通域L的区域面积,则确定待检测对象为连通域K,或者若第一相邻图像帧中连通域K的深度变化值大于连通域L的深度变化值,则确定待检测对象为连通域K,从而在有多个连通域变化时,可排除一些连通域,即可排除图像帧中的一些杂讯。其中,在根据图9A及图9B得到连通域H后,则可确定连通域H为待检测对象,即杯子为待检测对象。可理解,本申请还可为在确定第一像素点时,或者确定第一像素点的深度变化值时,或者确定深度变化图像时,或者形成距离差图像时,或者获得二值图像时,将人的手臂及人的眼睛从变化中排除。
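Binarization, connected-domain labelling, and the "largest area" selection rule can be sketched with the standard library alone. Several points are assumptions rather than statements from the text: a pixel is set to 1 only if it is also a first pixel (non-zero in the depth-change image), so that the static background is not labelled; 4-connectivity is used; and the rule that excludes the person is left out. All function names are hypothetical.

```python
from collections import deque


def binarize(distance_img, change_img, preset=0.03):
    """1 where the pixel is a first pixel and its minimum neighbour
    difference is below `preset`, otherwise 0."""
    return [
        [1 if change_img[r][c] > 0 and value < preset else 0
         for c, value in enumerate(row)]
        for r, row in enumerate(distance_img)
    ]


def connected_domains(binary_img):
    """Label 4-connected regions of 1-pixels with a breadth-first search.
    Returns a list of regions, each a list of (row, col) coordinates."""
    rows, cols = len(binary_img), len(binary_img[0])
    seen = [[False] * cols for _ in range(rows)]
    domains = []
    for r in range(rows):
        for c in range(cols):
            if binary_img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, domain = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                domain.append((y, x))
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary_img[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            domains.append(domain)
    return domains


def largest_domain(domains):
    """Pick the connected domain with the largest area (pixel count)."""
    return max(domains, key=len) if domains else None


# Hypothetical binary image with one 2x2 region and one isolated pixel.
binary = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
domains = connected_domains(binary)
candidate = largest_domain(domains)
```

In the example the 2x2 region wins over the isolated pixel, which mirrors how small noisy regions are discarded in favor of the dominant changed region.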
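With multiple first adjacent image frames, the process of determining the object to be detected from pixels is similar to the process with a single pair of first adjacent image frames and is not repeated here. The difference is that the first pixels and their depth change values are determined from the average of each pixel's depth change across all first adjacent image frames. For example, when image frame 1 in FIG. 9A is compared with image frame 2 in FIG. 9B, the depth changes of pixels a, b, c, d, f, g, and h (0.21 m, 0.25 m, 0.29 m, 0.3 m, 0.21 m, 0.25 m, and 0.27 m) exceed the threshold of 0.2 m. For brevity, only pixel a is used below to show how to decide whether it is a first pixel and, if so, what its depth change value is. When image frame 2 is compared with image frame 3, the depth change of pixel a is 0.27 m, which exceeds the 0.2 m threshold; when image frame 3 is compared with image frame 4, the depth change of pixel a is 0.25 m, which also exceeds the threshold. Pixel a is therefore a first pixel, and its depth change value is the average of 0.21 m, 0.27 m, and 0.25 m, that is, 0.73 m / 3 (about 0.243 m). The same procedure is used to decide whether pixels b, c, d, f, g, and h are first pixels and to determine the depth change value of each of them that is.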
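The averaging over several adjacent frame pairs can be written as a small sketch. The function names are hypothetical; the three values for pixel a and the 0.2 m threshold come from the example above, and the second call uses invented values to show a pixel that is rejected.

```python
def mean_depth_change(changes_per_pair):
    """Average depth change of one pixel over all adjacent frame pairs."""
    return sum(changes_per_pair) / len(changes_per_pair)


def is_first_pixel(changes_per_pair, threshold=0.2):
    """A pixel counts as a first pixel when its average change exceeds the threshold."""
    return mean_depth_change(changes_per_pair) > threshold


# Pixel a across frame pairs (1,2), (2,3), (3,4), in metres.
pixel_a = [0.21, 0.27, 0.25]
avg_a = mean_depth_change(pixel_a)       # 0.73 / 3, about 0.243 m
accepted = is_first_pixel(pixel_a)       # True

# Invented counter-example: one large change does not outweigh two small ones.
rejected = is_first_pixel([0.21, 0.05, 0.10])  # False, mean is 0.12 m
```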
S405,确定第二图像帧中待检测对象的位置相比第一图像帧中待检测对象的位置之间位移值及位移方向,位置为深度信息的位置,位移值及位移方向为深度信息的位移值及位移方向。
待检测对象在图像帧中包括多个相互连通的像素点。深度信息的位移值为像素点的平均深度变化值的绝对值。位移方向包括靠近焦平面的方向和远离焦平面的方向。若像素点的平均深度变化值大于零,则位移方向为远离焦平面;若像素点的平均深度变化值小于零,则位移方向为靠近焦平面。继续以上述的图9A及图9B为例来说明本申请,待检测对象杯子包括像素点a、b、c、d;像素点a、b、c、d在图9B所示的图像帧2和图9A所示的图像帧1之间的深度信息的变化值为-0.21米、-0.25米、-0.29米、-0.3米,则确定图像帧2中待检测对象杯子的位置相比图像帧1中待检测对象杯子的位置之间像素点的平均深度变化值-1.05米/4,确定图像帧2中待检测对象杯子的位置相比图像帧1中待检测对象杯子的位置之间位移值为|-1.05米/4|。其中,像素点的平均深度变化值-1.05米/4小于零,则确定位移方向为靠近焦平面的方向。
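A short sketch of the displacement computation, using the cup's four pixels from the example. The function name and the string labels for the direction are assumptions; the sign convention (negative mean change means moving toward the focal plane) follows the paragraph above.

```python
def displacement(pixel_depth_changes):
    """Return (displacement value, direction) for one object.

    The displacement value is the absolute value of the mean depth change
    of the object's pixels; the sign of the mean gives the direction.
    """
    mean_change = sum(pixel_depth_changes) / len(pixel_depth_changes)
    if mean_change > 0:
        direction = "away"      # moving away from the focal plane
    elif mean_change < 0:
        direction = "toward"    # moving toward the focal plane
    else:
        direction = "none"      # no net movement; not covered by the text
    return abs(mean_change), direction


# Depth changes of pixels a, b, c, d between image frame 1 and image frame 2.
value, direction = displacement([-0.21, -0.25, -0.29, -0.3])
# value is about 0.2625 m (1.05 m / 4) and direction is "toward".
```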
S406,选定跟踪目标,所述跟踪目标为位移值大于第一预设值且位移方向为靠近焦平面的方向的待检测对象。
继续以上述的图9A及图9B为例来说明本申请,图9B所示的图像帧2中待检测对象杯子的位置相比图9A所示的图像帧1中待检测对象杯子的位置之间位移值|-1.05米/4|大于第一预设值,且位移方向为靠近焦平面的方向,则选定杯子为跟踪目标。
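The selection rule of S406 reduces to a simple filter. The sketch below is illustrative only: the text gives no numeric value for the first preset value, so 0.2 m is an assumed placeholder, and the second candidate ("book") is invented to show a rejected object.

```python
def select_tracking_target(candidates, first_preset):
    """Return the name of the first candidate that moved toward the focal
    plane by more than `first_preset`, or None if no candidate qualifies.

    candidates -- iterable of (name, displacement_value, direction) tuples
    """
    for name, value, direction in candidates:
        if value > first_preset and direction == "toward":
            return name
    return None


candidates = [
    ("book", 0.40, "away"),     # hypothetical object moving backward: rejected
    ("cup", 0.2625, "toward"),  # the cup from the worked example: selected
]
target = select_tracking_target(candidates, first_preset=0.2)
```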
S407,跟踪选定的跟踪目标。
在对跟踪目标进行跟踪时,摄像模组可被驱动去采集不同场景的图像帧。
S408,跟踪时,确定跟踪目标在视频流中第二相邻图像帧中的位置,第二相邻图像帧包括第三图像帧和第四图像帧,第三图像帧为位于第四图像帧前面的图像帧,位置为深度信息的位置。
在跟踪时,电子设备实时获取摄像模组采集的视频流,也实时获取视频流中的图像帧的深度信息。第二相邻图像帧可为一个第二相邻图像帧或多个第二相邻图像帧。下面以一个第二相邻图像帧为例对本申请进行说明,例如在时间t3,电子设备获取摄像模组采集的视频流中的图像帧7(如图11A所示);在时间t4,电子设备获取摄像模组采集的视频流中的图像帧8(如图11B所示)。图像帧7为第三图像帧、图像帧8为第四图像帧,图像帧7为位于图像帧8前面的图像帧。本申请确定跟踪目标在图像帧7及图像帧8中的深度信息的位置。可理解的是,虽然在图11A及图11B中示出了电子设备显示图像帧7及图像帧8,但这并不妨碍认为摄像模组采集的图像帧即为图11A及图11B中的图像帧。
S409,确定第四图像帧中跟踪目标的位置相比第三图像帧中跟踪目标的位置之间位移值及位移方向,所述位移值及所述位移方向为深度信息的位移值及位移方向。
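In an image frame, the tracking target comprises multiple mutually connected pixels. The displacement value of the depth information is the absolute value of the average depth change of those pixels. The displacement direction is either toward the focal plane or away from the focal plane: if the average depth change of the pixels is greater than zero, the direction is away from the focal plane; if it is less than zero, the direction is toward the focal plane. Continuing with FIG. 11A and FIG. 11B, the tracking target (the cup) comprises pixels a, b, c, and d, whose depth changes between image frame 8 and image frame 7 are 0.33 m, 0.28 m, 0.36 m, and 0.27 m respectively. The average depth change of the pixels between the position of the cup in image frame 8 and its position in image frame 7 is therefore 0.31 m, and the displacement value is |0.31 m|, that is, 0.31 m. Because the average depth change of 0.31 m is greater than zero, the displacement direction is away from the focal plane.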
S410,若位移值小于第二预设值且位移方向为远离焦平面的方向或者位移方向为靠近焦平面的方向,聚焦显示跟踪目标。
所述聚焦显示可包括追焦跟踪目标,框取显示跟踪目标,或者剪裁居中跟踪目标。框取显示跟踪目标可为通过方框、圆形、或者对象的轮廓形状等标示图像帧中的跟踪目标。剪裁居中显示跟踪目标可为剪去图像中除跟踪目标外的其他的部分,并将保留的部分放大居中显示,如图11A所示。在图11A中,所述杯子被剪裁居中显示。
S411,若位移值大于第二预设值且位移方向为远离焦平面的方向,退出跟踪。
在本实施例中,在退出跟踪时还重新选定跟踪目标。继续以上述的图11A及图11B为例来说明本申请,图11B所示的图像帧8中跟踪目标杯子的位置相比图11A所示的图像帧7中跟踪目标的位置之间位移值0.31米大于第二预设值,且位移方向为远离焦平面的方向,则退出对杯子的跟踪。
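Steps S410 and S411 together form a two-way decision, which can be sketched as below. The text does not state a value for the second preset value, so 0.3 m is an assumption, and treating a displacement exactly equal to the preset as "focus" is also an assumption, since the text leaves that case open.

```python
def tracking_action(displacement_value, direction, second_preset):
    """Decide what to do with the current tracking target.

    Returns "exit" when the target moved away from the focal plane by more
    than `second_preset`; otherwise returns "focus" (keep tracking and
    focus-display the target).
    """
    if direction == "away" and displacement_value > second_preset:
        return "exit"
    return "focus"


# Cup between image frame 7 and image frame 8: 0.31 m away from the focal plane.
action = tracking_action(0.31, "away", second_preset=0.3)  # "exit"
```

A target that moves toward the camera is never dropped by this rule, however large the movement, which matches the condition in S410.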
图4所示的目标跟踪方法不仅可用于根据深度信息的变化选定跟踪目标并进行跟踪的场景,还可应用于根据图像帧中人体的关键点及深度信息的变化选定跟踪目标并跟踪的场景下。在根据图像帧中人体的关键点及深度信息的变化选定跟踪目标并跟踪的场景下,与上述图4的根据深度信息的选定跟踪目标并进行跟踪的场景的不同之处在于:
在获取视频流中的图像帧的深度信息之后,还检测视频流中的图像帧中的人体关键点,并确定视频流中第一相邻图像帧之间的与人体关键点的第一参数相连的变化区域为待检测对象。在选定跟踪目标时,所述跟踪目标为位移值大于第一预设值,位移方向为靠近焦平面的方向,且在图像帧中与人体关键点的第二参数之间的深度信息差大于第三预设值的待检测对象。在聚焦显示跟踪目标时,若满足第一条件及第二条件,聚焦显示跟踪目标,第一条件包括跟踪目标在图像帧中与人体关键点的第一参数相连,第二条件包括位移值小于第二预设值且位移方向为远离焦平面的方向或者位移方向为靠近焦平面的方向;在退出跟踪时,若位移值大于第二预设值且位移方向为远离焦平面的方向,或者跟踪目标与人体关键点的第一参数不相连,退出跟踪。在检测视频流中的图像帧中的人体关键点时,根据视频流中的图像帧的深度信息检测图像帧中的人体关键点的位置。所述人体关键点如图12所示。所述人体关键点包括第一参数、第三参数、第四参数、第五参数及第六参数。第一参数、第三参数、第四参数、第五参数及第六参数分别为左右腕、左右肩、颈、头、左右臀。在图12中虽然仅示出了所述人体关键点包括第一参数、第三参数、第四参数、第五参数及第六参数,但是显然所述人体关键点还可包括第七参数、第八参数及第九参数等部位。第七参数、第八参数及第九参数分别为左右肘、左右膝、左右脚踝。
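When determining the object to be detected, as in the connected-domain procedure of step S404 in FIG. 4, the electronic device first determines the connected domains and then, based on the positions of the human-body key points in the image frame and on the connected domains, determines the connected domain that is connected to the first parameter of the human-body key points as the object to be detected. For example, from FIG. 13A and FIG. 13B the electronic device determines a connected domain corresponding to the razor, comprising pixels o, p, q, and r. Image frame 9 in FIG. 13A is the first image frame, image frame 10 in FIG. 13B is the second image frame, and image frame 9 precedes image frame 10. Although FIG. 13A and FIG. 13B show the electronic device displaying image frames 9 and 10, they may equally be regarded as the frames captured by the camera module. Based on the positions of the human-body key points in image frame 10 of FIG. 13B and on the connected domain, the electronic device determines the razor connected domain, which is connected to the first parameter of the human-body key points, as the object to be detected.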
根据图像帧中的所述人体关键点的位置及所述连通域确定与人体关键点的第一参数相连的连通域为待检测对象,具体地:
确定图像帧中的每个连通域的中心的位置;根据所述图像帧中的所述人体关键点的第一参数的位置及图像帧中的每个连通域的中心的位置确定图像帧中的人体关键点的第一参数与连通域的中心之间的欧氏距离,并确定欧氏距离小于预设值的连通域为待检测对象。其中,欧氏距离小于预设值表示连通域与人体关键点的第一参数相连。
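In the image-frame coordinate system, the position of a human-body key point in the image frame comprises the coordinates of that key point, and the position of the center of each connected domain comprises the coordinates of that center. Determining the Euclidean distance between the first parameter of the human-body key points and the center of a connected domain, from the positions of the key points and of the center of each connected domain in the image frame, includes applying the formula

$$p_{ji} = \sqrt{(x_{1i} - x_j)^2 + (y_{1i} - y_j)^2}$$

where $p_{ji}$ is the Euclidean distance between the j-th wrist in the first parameter of the human-body key points and the center of the i-th connected domain in the image frame, $x_{1i}$ and $y_{1i}$ are the horizontal and vertical coordinates of the center of the i-th connected domain, and $x_j$ and $y_j$ are the horizontal and vertical coordinates of the j-th wrist, with i = 1, 2, …, n and j = 1, 2. Continuing with image frame 10 in FIG. 13B, the electronic device determines that the center of the razor connected domain is at $(x_{11}, y_{11})$, that the left wrist in the first parameter of the human-body key points is at $(x_1, y_1)$, and that the right wrist is at $(x_2, y_2)$. The Euclidean distance between the left wrist and the center of the razor connected domain, $p_{11} = \sqrt{(x_{11} - x_1)^2 + (y_{11} - y_1)^2}$, is greater than the preset value, while the Euclidean distance between the right wrist and the center of the connected domain, $p_{21} = \sqrt{(x_{11} - x_2)^2 + (y_{11} - y_2)^2}$, is less than the preset value, so the razor connected domain is determined to be the object to be detected.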
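The wrist-to-domain test can be sketched as follows. The text does not say how the center of a connected domain is computed; the centroid of its pixel coordinates is assumed here. The function names, the pixel coordinates, the wrist positions, and the preset of 50 pixels are all invented for illustration.

```python
import math


def domain_center(pixels):
    """Centroid of a connected domain given as (x, y) pixel coordinates
    (an assumed definition of the domain's center)."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def wrist_distance(wrist, center):
    """Euclidean distance between one wrist key point and a domain center."""
    return math.hypot(center[0] - wrist[0], center[1] - wrist[1])


def is_connected(wrists, center, preset):
    """The domain counts as connected to the wrists (the first parameter)
    when at least one wrist lies closer than `preset` to its center."""
    return any(wrist_distance(w, center) < preset for w in wrists)


# Hypothetical razor domain and wrist positions in image coordinates.
razor_pixels = [(302, 213), (304, 213), (302, 215), (304, 215)]
center = domain_center(razor_pixels)           # (303.0, 214.0)
left_wrist, right_wrist = (100, 200), (300, 210)
connected = is_connected([left_wrist, right_wrist], center, preset=50)
```

Here the left wrist is far from the razor while the right wrist is 5 pixels away, so the razor domain is treated as held in the right hand.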
在本实施例中,电子设备确定欧氏距离小于预设值,具有最大的区域面积及/和第一相邻图像帧中的深度变化值最大的连通域为待检测对象。在选定跟踪目标时,如图4中步骤S406中确定位移值大于第一预设值且位移方向为靠近焦平面的方向的过程,电子设备首先确定待检测对象中位移值大于第一预设值且位移方向为靠近焦平面的方向的待检测对象,再在其中确定在图像帧中与人体关键点的第二参数之间的深度信息差大于第三预设值的待检测对象作为跟踪目标。在其中确定在图像帧中与人体关键点的第二参数之间的深度信息差大于第三预设值的待检测对象,具体地:
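The depth information of the second parameter of the human-body key points is determined from the depth information of the image frame and the positions of the human-body key points in the image frame; then, from the depth information of each object to be detected and the depth information of the second parameter, the object to be detected whose depth difference from the second parameter in the image frame is greater than the third preset value is determined.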
人体关键点的第二参数为人体的躯体。在本实施例中,根据图像帧的深度信息及图像帧中的人体关键点的第三参数的位置、第四参数的位置、第五参数的位置及第六参数的位置确定图像帧中的人体关键点的第三参数的深度信息、第四参数的深度信息、第五参数的深度信息及第六参数的深度信息,并根据图像帧中的人体关键点的第三参数的深度信息、第四参数的深度信息、第五参数的深度信息及第六参数的深度信息确定人体关键点的第二参数的深度信息。继续以上述的图13B所示的图像帧10为例来对本申请进行说明,在图像帧10中,人体关键点的第三参数左右肩的深度信息分别为1.5米及1.54米、第四参数颈的深度信息为1.52米、第五参数头的深度信息为1.52米,及第六参数左右臀的深度信息分别为1.51米及1.53米,则图像帧10中的人体关键点的第二参数躯体的深度信息为(1.5米+1.54米+1.52米+1.52米+1.51米+1.53米)/6,即1.52米。待检测对象在图像帧中包括多个相互连通的像素点。待检测对象的深度信息为待检测对象的像素点的深度信息的平均值,及待检测对象在图像帧中与人体关键点的第二参数之间的深度信息差为待检测对象在图像帧中的像素点与人体关键点的第二参数之间的平均深度信息差。
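The torso depth and the depth gap to a candidate object can be computed as in the sketch below. The six key-point depths come from the example above. The razor pixel depths and the third preset value of 0.3 m are invented, and taking the gap as torso depth minus the object's mean depth (positive when the object is in front of the body) is an assumption about the sign convention.

```python
def torso_depth(keypoint_depths):
    """Depth of the torso (second parameter) as the mean depth of the
    shoulder, neck, head, and hip key points."""
    return sum(keypoint_depths) / len(keypoint_depths)


def depth_gap(object_pixel_depths, torso):
    """Mean depth difference between the torso and the object's pixels;
    positive when the object is closer to the camera than the torso."""
    object_depth = sum(object_pixel_depths) / len(object_pixel_depths)
    return torso - object_depth


# Left/right shoulder, neck, head, left/right hip depths from image frame 10 (metres).
torso = torso_depth([1.5, 1.54, 1.52, 1.52, 1.51, 1.53])   # about 1.52 m

# Hypothetical depths of razor pixels o, p, q, r held out toward the camera.
gap = depth_gap([1.0, 1.02, 0.98, 1.0], torso)             # about 0.52 m
held_forward = gap > 0.3                                   # assumed third preset value
```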
继续以上述的图13A及图13B为例来说明本申请,图13B所示的图像帧10中待检测对象剃须刀的位置相比图13A所示的图像帧9中待检测对象剃须刀的位置之间位移值大于第一预设值,位移方向为靠近焦平面的方向,且待检测对象剃须刀所包括的像素点o、p、q、r,在图像帧10中的深度信息与人体关键点的第二参数躯体之间的平均相对深度差大于第三预设值,则选定所述待检测对象剃须刀为跟踪目标。
在聚焦显示跟踪目标时,如图4中步骤S409至步骤S410中确定位移值小于第二预设值且位移方向为远离焦平面的方向或者位移方向为靠近焦平面的方向的过程,电子设备确定位移值小于第二预设值且位移方向为远离焦平面的方向或者位移方向为靠近焦平面的方向,再确定跟踪目标在图像帧中与人体关键点的第一参数相连。确定跟踪目标在图像帧中与人体关键点的第一参数相连与判断待检测对象与人体关键点的第一参数相连相似,在此不再进行赘述。
在退出跟踪时,如图4中步骤S411中确定位移值大于第二预设值且位移方向为远离焦平面的方向,电子设备确定位移值大于第二预设值且位移方向为远离焦平面的方向,或者电子设备确定跟踪目标与人体关键点的第一参数不相连,退出跟踪。电子设备确定跟踪目标与人体关键点的第一参数不相连,具体地:
确定图像帧中的跟踪目标的中心的位置;根据所述图像帧中的所述人体关键点的位置及图像帧中的跟踪目标的中心的位置确定图像帧中的人体关键点的第一参数与跟踪目标的中心之间的欧氏距离,并判断欧氏距离是否小于预设值,若欧式距离大于预设值,电子设备确定跟踪目标与人体关键点的第一参数不相连。
确定图像帧中的跟踪目标的中心的位置与上述的确定图像帧中的每个连通域的中心的位置相似,在此不再进行赘述。根据所述图像帧中的所述人体关键点的位置及图像帧中的跟踪目标的中心的位置确定图像帧中的人体关键点的第一参数与跟踪目标的中心之间的欧氏距离,与上述的根据所述图像帧中的所述人体关键点的位置及图像帧中的每个连通域的中心的位置确定图像帧中的人体关键点的第一参数与连通域的中心之间的欧氏距离相似,在此不再 进行赘述。
参考图14,为本申请实施例的服务器的硬件结构示意图。所述服务器14包括存储器143、处理器144及通信接口145。本领域技术人员可以理解,图14中示出的结构并不构成对所述服务器14的限定,所述服务器14可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。
所述存储器143可用于存储软件程序和/或模块/单元。所述处理器144通过运行或执行存储在所述存储器143内的软件程序和/或模块/单元,以及调用存储在存储器143内的数据,实现所述服务器14的各种功能。所述存储器143可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序等;存储数据区可存储根据服务器14的使用所创建的数据(比如音频数据等)等。此外,存储器143可以包括非易失性计算机可读存储器,例如硬盘、内存、插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)、至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。
所述处理器144可以是中央处理单元(Central Processing Unit,CPU),图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。该处理器144可以是微处理器或者该处理器144也可以是任何常规的处理器等,所述处理器144是所述服务器14的控制中心,利用各种接口和线路连接整个服务器14的各个部分。
所述通信接口145可包括标准的有线接口、无线接口等。所述通信接口145用于供所述服务器14与所述电子设备进行通信。
图4所示的目标跟踪方法不仅可用于电子设备上,还可用于电子设备和服务器所组成的系统上。目标跟踪方法应用于电子设备和服务器所组成的系统上与应用于电子设备上的不同之处在于:
在步骤S402电子设备获取摄像模组采集的视频流之后,电子设备还通过客户端将采集的视频流传送至服务器。所述服务器执行步骤S403至步骤S406,并在选定跟踪目标后传送选定的跟踪目标及驱动信号至电子设备,控制电子设备的摄像模组运动及执行跟踪选定的跟踪目标。服务器还执行步骤S408至步骤S410,并在若位移值大于第二预设值且位移方向为远离焦平面的方向时,传送退出跟踪信号至电子设备,控制电子设备退出跟踪。
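Clearly, just as the target tracking method above can be used not only in the scenario of selecting and tracking a target according to changes in depth information but also in the scenario of selecting and tracking a target according to human-body key points in the image frames together with changes in depth information, the system composed of the electronic device and the server can also be applied to the latter scenario. In that case, the steps that distinguish the key-point-based scenario from the depth-only scenario are mainly executed by the server; only when the displacement value is greater than the second preset value and the displacement direction is away from the focal plane, or when the tracking target is not connected to the first parameter of the human-body key points, does the server send an exit-tracking signal to the electronic device to control it to exit tracking.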
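Please refer to FIG. 15, which is a schematic structural diagram of a target tracking apparatus provided by an embodiment of this application. The target tracking apparatus 15 may include an acquiring unit 151 and a determining unit 152.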
获取单元151用于获取视频流中的图像帧的深度信息。
确定单元152用于确定视频流中第一相邻图像帧之间的变化区域为待检测对象,第一相邻图像帧包括第一图像帧和第二图像帧,第一图像帧为位于第二图像帧前面的图像帧,所述变化区域为深度信息的差异区域。
确定单元152还用于确定第二图像帧中待检测对象的位置相比第一图像帧中待检测对象的位置之间位移值及位移方向,位置为深度信息的位置,位移值及位移方向为深度信息的位移值及位移方向。
确定单元152还用于选定跟踪目标,所述跟踪目标为位移值大于第一预设值且位移方向为靠近焦平面的方向的待检测对象。
可选地,确定单元152用于跟踪时,确定跟踪目标在第二相邻图像帧中的位置,第二相邻图像帧包括第三图像帧和第四图像帧,第三图像帧为位于第四图像帧前面的图像帧,位置为深度信息的位置。确定单元152还用于确定第四图像帧中跟踪目标的位置相比第三图像帧中跟踪目标的位置之间位移值及位移方向,所述位移值及所述位移方向为深度信息的位移值及位移方向。确定单元152还用于若位移值大于第二预设值且位移方向为远离焦平面的方向,退出跟踪。
可选地,确定单元152用于检测视频流中的图像帧中的人体关键点。确定单元152还用于确定视频流中第一相邻图像帧之间的与人体关键点的第一参数相连的变化区域为待检测对象。确定单元152还用于选定跟踪目标,所述跟踪目标为位移值大于第一预设值,位移方向为靠近焦平面的方向,且在图像帧中与人体关键点的第二参数之间的深度信息差大于第三预设值的待检测对象。
可选地,确定单元152还用于跟踪时,确定跟踪目标在第二相邻图像帧中的位置,第二相邻图像帧包括第三图像帧和第四图像帧,第三图像帧为位于第四图像帧前面的图像帧,位置为深度信息的位置。确定单元152还用于确定第四图像帧中跟踪目标的位置相比第三图像帧中跟踪目标的位置之间位移值及位移方向,所述位移值及所述位移方向为深度信息的位移值及位移方向。确定单元152还用于若位移值大于第二预设值且位移方向为远离焦平面的方向,或者跟踪目标与人体关键点的第一参数不相连,退出跟踪。
可选地,深度信息的位移值为像素点的平均深度变化值的绝对值。
本申请实施例中描述的目标跟踪装置可以用来实施上述目标跟踪方法中所述电子设备或服务器执行的操作。
除以上方法和装置外,本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在处理器上运行时,实现目标跟踪方法。
一种计算机程序产品,所述计算机程序产品包括计算机执行指令,所述计算机执行指令存储在计算机可读存储介质中;设备的至少一个处理器可以从所述计算机可读存储介质中读取所述计算机执行指令,所述至少一个处理器执行所述计算机执行指令使得所述设备实现目标跟踪方法。
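Before tracking, this application can determine an object that moves markedly forward as the selected tracking target, so that the position and size of the selected target are determined through a simple interaction, without manually drawing a bounding box, and even very small objects can be detected. During tracking, it determines an object that moves markedly backward as the object to stop tracking, so that tracking of that object can be exited through an equally simple interaction.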
本申请可在跟踪之前,确定用手拿起并显著前移的物体为选定的跟踪目标,可通过简单的交互方法,确定选定的跟踪目标的位置和大小,无需手动绘制边界框且可检测过小的物体;在跟踪时,确定用手放下或者显著后移的物体为退出跟踪的物体,可通过简单的交互方法使得所述物体退出跟踪,从而可退出跟踪。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者所述技术方案的全部或部分可以以软件产品的形式体现出来,所述软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
最后应说明的是,以上实施例仅用以说明本申请的技术方案而非限制,尽管参照较佳实施例对本申请进行了详细说明,本领域的普通技术人员应当理解,可以对本申请的技术方案进行修改或等同替换,而不脱离本申请技术方案的精神和范围。

Claims (14)

  1. 一种目标跟踪方法,其特征在于,所述方法包括:
    获取视频流中的图像帧的深度信息;
    确定视频流中第一相邻图像帧之间的变化区域为待检测对象,第一相邻图像帧包括第一图像帧和第二图像帧,第一图像帧为位于第二图像帧前面的图像帧,所述变化区域为深度信息的差异区域;
    确定第二图像帧中待检测对象的位置相比第一图像帧中待检测对象的位置之间位移值及位移方向,位置为深度信息的位置,位移值及位移方向为深度信息的位移值及位移方向;
    选定跟踪目标,所述跟踪目标为位移值大于第一预设值且位移方向为靠近焦平面的方向的待检测对象。
  2. 如权利要求1所述的目标跟踪方法,其特征在于,所述方法还包括:
    跟踪时,确定跟踪目标在第二相邻图像帧中的位置,第二相邻图像帧包括第三图像帧和第四图像帧,第三图像帧为位于第四图像帧前面的图像帧,位置为深度信息的位置;
    确定第四图像帧中跟踪目标的位置相比第三图像帧中跟踪目标的位置之间位移值及位移方向,所述位移值及所述位移方向为深度信息的位移值及位移方向;
    若位移值大于第二预设值且位移方向为远离焦平面的方向,退出跟踪。
  3. 如权利要求1所述的目标跟踪方法,其特征在于,所述方法包括:
    检测视频流中的图像帧中的人体关键点;
    其中,待检测对象为视频流中第一相邻图像帧之间的与人体关键点的第一参数相连的变化区域;所述跟踪目标为位移值大于第一预设值,位移方向为靠近焦平面的方向,且在图像帧中与人体关键点的第二参数之间的深度信息差大于第三预设值的待检测对象。
  4. 如权利要求3所述的目标跟踪方法,其特征在于,所述方法还包括:
    跟踪时,确定跟踪目标在第二相邻图像帧中的位置,第二相邻图像帧包括第三图像帧和第四图像帧,第三图像帧为位于第四图像帧前面的图像帧,位置为深度信息的位置;
    确定第四图像帧中跟踪目标的位置相比第三图像帧中跟踪目标的位置之间位移值及位移方向,所述位移值及所述位移方向为深度信息的位移值及位移方向;
    若位移值大于第二预设值且位移方向为远离焦平面的方向,或者跟踪目标与人体关键点的第一参数不相连,退出跟踪。
  5. 如权利要求1至4中任一项所述的目标跟踪方法,其特征在于:所述深度信息的位移值为像素点的平均深度变化值的绝对值。
  6. 一种目标跟踪装置,其特征在于,所述装置包括:
    获取单元,用于获取视频流中的图像帧的深度信息;
    确定单元,用于确定视频流中第一相邻图像帧之间的变化区域为待检测对象,第一相邻图像帧包括第一图像帧和第二图像帧,第一图像帧为位于第二图像帧前面的图像帧,所述变化区域为深度信息的差异区域;
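    the determining unit is further configured to determine a displacement value and a displacement direction of the position of the object to be detected in the second image frame relative to its position in the first image frame, where the position is a position in terms of depth information, and the displacement value and the displacement direction are the displacement value and the displacement direction of the depth information;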
    所述确定单元,还用于选定跟踪目标,所述跟踪目标为位移值大于第一预设值且位移方向为靠近焦平面的方向的待检测对象。
  7. 如权利要求6所述的目标跟踪装置,其特征在于:
    所述确定单元,还用于跟踪时,确定跟踪目标在第二相邻图像帧中的位置,第二相邻图像帧包括第三图像帧和第四图像帧,第三图像帧为位于第四图像帧前面的图像帧,位置为深度信息的位置;
    所述确定单元,还用于确定第四图像帧中跟踪目标的位置相比第三图像帧中跟踪目标的位置之间位移值及位移方向,所述位移值及所述位移方向为深度信息的位移值及位移方向;
    所述确定单元,还用于若位移值大于第二预设值且位移方向为远离焦平面的方向,退出跟踪。
  8. 如权利要求6所述的目标跟踪装置,其特征在于:
    所述确定单元,还用于检测视频流中的图像帧中的人体关键点;
    所述确定单元,还用于确定视频流中第一相邻图像帧之间的与人体关键点的第一参数相连的变化区域为待检测对象;
    所述确定单元,还用于选定跟踪目标,所述跟踪目标为位移值大于第一预设值,位移方向为靠近焦平面的方向,且在图像帧中与人体关键点的第二参数之间的深度信息差大于第三预设值的待检测对象。
  9. 如权利要求8所述的目标跟踪装置,其特征在于:
    所述确定单元,还用于跟踪时,确定跟踪目标在第二相邻图像帧中的位置,第二相邻图像帧包括第三图像帧和第四图像帧,第三图像帧为位于第四图像帧前面的图像帧,位置为深度信息的位置;
    所述确定单元,还用于确定第四图像帧中跟踪目标的位置相比第三图像帧中跟踪目标的位置之间位移值及位移方向,所述位移值及所述位移方向为深度信息的位移值及位移方向;
    所述确定单元,还用于若位移值大于第二预设值且位移方向为远离焦平面的方向,或者跟踪目标与人体关键点的第一参数不相连,退出跟踪。
  10. 如权利要求6至9中任一项所述的目标跟踪装置,其特征在于:
    所述深度信息的位移值为像素点的平均深度变化值的绝对值。
  11. 一种电子设备,其特征在于,所述设备包括处理器和存储器,所述存储器用于存储程序指令,所述处理器调用所述程序指令时,实现如权利要求1至5中任一项所述的目标跟踪方法。
  12. 一种服务器,其特征在于,所述服务器包括处理器和存储器,所述存储器用于存储程序指令,所述处理器调用所述程序指令时,实现如权利要求1至5中任一项所述的目标跟踪方法。
  13. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有程序,所述程序使得计算机设备实现如权利要求1至5中任一项所述的目标跟踪方法。
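  14. A computer program product, characterized in that the computer program product comprises computer-executable instructions stored in a computer-readable storage medium; at least one processor of a device can read the computer-executable instructions from the computer-readable storage medium, and execution of the computer-executable instructions by the at least one processor causes the device to perform the target tracking method according to any one of claims 1 to 5.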
PCT/CN2022/082300 2021-03-29 2022-03-22 目标跟踪方法及其装置 WO2022206494A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110336639.7 2021-03-29
CN202110336639.7A CN115147451A (zh) 2021-03-29 2021-03-29 目标跟踪方法及其装置

Publications (1)

Publication Number Publication Date
WO2022206494A1 true WO2022206494A1 (zh) 2022-10-06

Family

ID=83403774

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/082300 WO2022206494A1 (zh) 2021-03-29 2022-03-22 目标跟踪方法及其装置

Country Status (2)

Country Link
CN (1) CN115147451A (zh)
WO (1) WO2022206494A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115980116A (zh) * 2022-11-22 2023-04-18 宁波博信电器有限公司 一种仪表盘耐高温检测方法、系统、存储介质及智能终端
CN116320727A (zh) * 2023-02-25 2023-06-23 荣耀终端有限公司 一种算法调度方法及电子设备
CN117874289A (zh) * 2024-01-15 2024-04-12 深圳市智云看家科技有限公司 一种摄像头回放查找的方法、缺陷检测装置和存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102509074A (zh) * 2011-10-18 2012-06-20 Tcl集团股份有限公司 Target recognition method and device
CN103729860A (zh) * 2013-12-31 2014-04-16 华为软件技术有限公司 Image target tracking method and apparatus
US20160110610A1 (en) * 2014-10-15 2016-04-21 Sony Computer Entertainment Inc. Image processor, image processing method, and computer program
CN105628951A (zh) * 2015-12-31 2016-06-01 北京小孔科技有限公司 Method and apparatus for measuring the speed of an object
CN106845385A (zh) * 2017-01-17 2017-06-13 腾讯科技(上海)有限公司 Video target tracking method and apparatus

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
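CN115980116A (zh) * 2022-11-22 2023-04-18 宁波博信电器有限公司 High-temperature resistance testing method, system, storage medium, and intelligent terminal for an instrument panel
CN115980116B (zh) * 2022-11-22 2023-07-14 宁波博信电器有限公司 High-temperature resistance testing method, system, storage medium, and intelligent terminal for an instrument panel
CN116320727A (zh) * 2023-02-25 2023-06-23 荣耀终端有限公司 Algorithm scheduling method and electronic device
CN116320727B (zh) * 2023-02-25 2024-03-08 荣耀终端有限公司 Algorithm scheduling method and electronic device
CN117874289A (zh) * 2024-01-15 2024-04-12 深圳市智云看家科技有限公司 Camera playback search method, defect detection apparatus, and storage medium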

Also Published As

Publication number Publication date
CN115147451A (zh) 2022-10-04

Similar Documents

Publication Publication Date Title
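JP7391102B2 (ja) Gesture processing method and device
EP3800876B1 (en) Method for terminal to switch cameras, and terminal
CN113645351B (zh) Application interface interaction method, electronic device, and computer-readable storage medium
CN110495819B (zh) Robot control method, robot, terminal, server, and control system
EP3961358B1 (en) False touch prevention method for curved screen, and electronic device
WO2022206494A1 (zh) Target tracking method and apparatus
WO2019072178A1 (zh) Notification processing method and electronic device
WO2022127787A1 (zh) Image display method and electronic device
WO2020029306A1 (zh) Image capture method and electronic device
WO2021169394A1 (zh) Depth-based human body image beautification method and electronic device
US20220262035A1 (en) Method, apparatus, and system for determining pose
CN112637758B (zh) Device positioning method and related device
CN112087649B (zh) Device search method and electronic device
WO2022042275A1 (zh) Distance measurement method and apparatus, electronic device, and readable storage medium
WO2022161386A1 (zh) Pose determination method and related device
WO2022105702A1 (zh) Image saving method and electronic device
WO2022166435A1 (zh) Picture sharing method and electronic device
WO2022152174A9 (zh) Screen projection method and electronic device
CN116152814A (zh) Image recognition method and related device
WO2022078116A1 (zh) Brush effect image generation method, image editing method, device, and storage medium
WO2022017270A1 (zh) Appearance analysis method and electronic device
WO2022062902A1 (zh) File transfer method and electronic device
CN114812381B (zh) Positioning method for electronic device, and electronic device
WO2021036562A1 (zh) Prompting method for fitness training, and electronic device
CN113970965A (zh) Message display method and electronic device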

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22778677

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 22778677

Country of ref document: EP

Kind code of ref document: A1