WO2022143230A1 - Method for determining a tracking target and electronic device - Google Patents

Method for determining a tracking target and electronic device

Info

Publication number
WO2022143230A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
target
electronic device
target object
frame
Prior art date
Application number
PCT/CN2021/139276
Other languages
English (en)
French (fr)
Inventor
张超
徐健
张雅琪
刘宏马
贾志平
吕帅林
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to US18/259,718 (published as US20240062392A1)
Priority to EP21913971.4A (published as EP4258649A4)
Publication of WO2022143230A1

Classifications

    • G06T 7/246 Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/248 Analysis of motion using feature-based methods involving reference images or patches
    • H04N 23/61 Control of cameras or camera modules based on recognised objects
    • H04N 23/62 Control of parameters via user interfaces
    • H04N 23/631 Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N 23/635 Region indicators; field of view indicators
    • H04N 23/64 Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
    • H04N 23/675 Focus control based on electronic image sensor signals, comprising setting of focusing regions
    • H04N 23/695 Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
    • G06T 2200/24 Indexing scheme for image data processing or generation involving graphical user interfaces [GUIs]
    • G06T 2207/10016 Image acquisition modality: video; image sequence
    • G06T 2207/20092 Interactive image processing based on input by user
    • G06T 2207/20101 Interactive definition of point of interest, landmark or seed

Definitions

  • the present application relates to the field of video processing, and in particular, to a method for determining a tracking target and an electronic device.
  • In the prior art, the specific target to be tracked is selected by combining target detection with a single user click.
  • Before the user clicks, the electronic device starts target detection, detects the position and size of objects of specified categories in the video image, and displays them in target frames.
  • When the coordinates of the user's click fall within a certain target frame, the object in that target frame is determined to be the specific target to be tracked.
  • The disadvantage of the prior art is that the object categories detectable by target detection are limited, so arbitrary objects cannot be tracked. Moreover, constrained by the target detection algorithm, if the size of an object does not meet the algorithm's requirements, the electronic device cannot detect the target.
  • In addition, running the target detection algorithm occupies processor resources and increases power consumption.
  • The method and electronic device for determining a tracking target provided by the present application allow the user to quickly and conveniently determine the tracking target in target tracking mode in a video recording scene, widen the range of targets the user can select, and thereby enable tracking of more types of targets.
  • In a first aspect, the present application provides a method for shooting video: the electronic device acquires N frames of images and then acquires the user's first operation on a target object on the screen; the electronic device displays a target frame whose area changes cyclically;
  • according to the user's second operation on the target object, the information of the target frame in the Nth frame image is acquired; according to the information of the target frame, the feature vector of the target object is determined.
  • In some implementations, after the target frame with cyclically changing area is displayed, the method further includes: determining whether the user performs a third operation on the target object; if so, determining whether the user performs a fourth operation on the target object.
  • In other implementations, if the user does not perform the third operation on the target object, it is determined whether the user performs the second operation; if the user does not perform the second operation, the target frame with cyclically changing area continues to be displayed; if the user performs the second operation, the information of the target frame in the Nth frame image is acquired.
  • If it is determined that the user performs the fourth operation on the target object, the fourth operation is acquired and it is determined whether the user cancels it; if the user does not cancel the fourth operation, the user's fourth operation on the target object continues to be acquired; if the user cancels the fourth operation, it is determined whether the user performs the second operation on the target object.
  • If it is determined that the user does not perform the fourth operation on the target object, the user's second operation on the target object is acquired.
  • If the user performs the second operation on the target object, the second operation is acquired and it is determined whether the user cancels it; if the user cancels the second operation, the information of the target frame in the Nth frame image is acquired; if the user does not cancel the second operation, the second operation continues to be acquired.
  • In some implementations, the first operation is a click operation, the second operation is a second click operation, the third operation is a long-press operation, and the fourth operation is a drag operation.
  • the information of the target frame includes the area and position of the target frame.
  • In a second aspect, the present application provides an electronic device including a screen, a memory, and a processor. The screen receives the user's operation on a target object on the screen; the memory is used to store a computer program;
  • the processor is configured to invoke the computer program, so that the electronic device executes the method described in any one of the above first aspects.
  • In a third aspect, the present application provides a computer storage medium comprising computer instructions; when the computer instructions run on an electronic device, the electronic device is caused to execute the method described in any one of the above first aspects.
  • FIG. 1A is a schematic diagram of an image frame obtained by an electronic device provided by an embodiment of the present application.
  • FIG. 1B is a schematic diagram of a hardware structure of an electronic device 100 provided by an embodiment of the present application.
  • FIG. 1C is a schematic diagram of a software structure of an electronic device 100 provided by an embodiment of the present application.
  • FIGS. 2A to 2J are schematic diagrams of user interfaces in which some electronic devices 100 according to embodiments of the present application receive user operations to determine a tracking target;
  • FIG. 3 is a schematic diagram of a method for implementing target tracking provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a feature vector of a tracking target provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of a method for determining a target frame of a tracking target provided by an embodiment of the present application.
  • The terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly indicating the number of the indicated technical features.
  • Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
  • Unless otherwise specified, "plural" means two or more.
  • An electronic device with a video shooting function can track and shoot a specific object in the video picture during video shooting; this is known as target tracking.
  • There are two ways of target tracking. In the first, the image signal processor (ISP) of the electronic device first obtains a frame of wide-angle image; as shown in FIG. 1A, the wide-angle image obtained by the ISP is image 101. If the tracking target is determined to be the person 103, a part of the image (such as image 102) is cropped with the person 103 as the center, image 102 is resized, and the resolution of the cropped image is fixed (for example, at 1080P); the processed image is then displayed on the screen and shown to the user.
  • In the second, auxiliary equipment (such as a gimbal) is used; such equipment can rotate and switch lenses as the user moves, adjusting the field of view so that a specific target can be photographed continuously.
  • the camera of the electronic device can acquire continuous image frames, and the user can select the tracking target in the Nth frame of the continuous image frames displayed by the electronic device.
  • the target tracking in this embodiment of the present application may also mean that the electronic device may determine the tracking target selected by the user in the image frames after the Nth frame of images (eg, the N+1th frame).
  • the electronic device can mark out the tracking target in the image frame.
  • an electronic device is a device that can display continuous image frames, such as mobile phones, tablet computers, desktop computers, televisions, and the like.
  • the embodiments of the present application do not limit the electronic device.
  • a frame of image displayed by the electronic device may be referred to as an image frame or an Nth frame of image.
  • In the embodiments of the present application, a specific object (for example, a person, a plant or animal, a car, etc.) determined by the user in the Nth frame image is called a tracking target.
  • A neural network can be composed of neural units. A neural unit can be an operation unit that takes x_s and an intercept 1 as inputs, and the output of the operation unit can be expressed by the following formula (1):

    h_{W,b}(x) = f(W^T x) = f( \sum_{s=1}^{n} W_s x_s + b )  (1)

  • where s = 1, 2, ..., n, n is a natural number, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is an activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
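  • As a minimal illustrative sketch of formula (1) with a sigmoid activation (the input values, weights, and bias below are arbitrary examples, not values from the application):

    import numpy as np

    def sigmoid(z):
        # Activation function f: maps the weighted sum into (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    def neural_unit(x, w, b):
        # Formula (1): output = f(sum_s W_s * x_s + b).
        return sigmoid(np.dot(w, x) + b)

    # Arbitrary example values: three inputs x_s, weights W_s, and bias b.
    x = np.array([0.5, -1.2, 0.3])
    w = np.array([0.8, 0.1, -0.4])
    b = 0.2
    print(neural_unit(x, w, b))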
  • a neural network is a network formed by connecting many of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
  • a convolutional neural network is a neural network with a convolutional structure.
  • a convolutional neural network consists of a feature extractor consisting of convolutional and subsampling layers.
  • the feature extractor can be viewed as a filter, and the convolution process can be viewed as convolution with an input image or a convolutional feature map using a trainable filter.
  • the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
  • In a convolutional layer, a neuron may be connected to only some of the neurons in adjacent layers.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some neural units arranged in a rectangle.
  • Neural units in the same feature plane share weights, and the shared weights here are convolution kernels.
  • Weight sharing can be understood as meaning that the way image information is extracted is independent of location. The underlying principle is that the statistics of one part of an image are the same as those of other parts, so image information learned in one part can also be used in another part. Therefore, the same learned image information can be used for all positions on the image.
  • multiple convolution kernels can be used to extract different image information. Generally, the more convolution kernels, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights by learning during the training process of the convolutional neural network.
  • the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • The feature matching algorithm used in this application takes the feature of the small target image of the tracking target (for example, a 6*6*128 feature) as a convolution kernel and convolves it with the feature map of the image frame (for example, a 22*22*128 feature) to obtain a score map.
  • The above score map is a score matrix (for example, a 17*17 matrix). A point with a larger score on the score map (corresponding to an area on the image frame) indicates that the corresponding area of the image frame is more similar to the features of the target, that is, the higher the probability that the area is the location of the tracking target.
  • In addition, a cosine window is added, that is, each point in the score matrix is multiplied by a coefficient; the most central point corresponds to the position of the target in the previous frame, and its coefficient is 1, so positions farther from the previous target position are weighted down.
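  • The following sketch illustrates this matching step under the example sizes above (a 6*6*128 kernel slid over a 22*22*128 feature map yielding a 17*17 score matrix); the plain cross-correlation, the Hanning-based cosine window, and the random stand-in features are assumptions for illustration, not the application's exact algorithm:

    import numpy as np

    def score_map(frame_feat, target_feat):
        # Slide the 6x6x128 target feature over the 22x22x128 frame feature
        # map as a convolution kernel; each position yields one score.
        H, W, _ = frame_feat.shape
        h, w, _ = target_feat.shape
        scores = np.zeros((H - h + 1, W - w + 1))  # 17x17 score matrix
        for i in range(scores.shape[0]):
            for j in range(scores.shape[1]):
                scores[i, j] = np.sum(frame_feat[i:i+h, j:j+w, :] * target_feat)
        return scores

    def apply_cosine_window(scores):
        # Center weighting: the central point (the target's position in the
        # previous frame) keeps coefficient 1; farther points are down-weighted.
        n = scores.shape[0]
        win = np.hanning(n + 2)[1:-1]       # strictly positive coefficients
        win2d = np.outer(win, win)
        return scores * (win2d / win2d.max())

    frame_feat = np.random.rand(22, 22, 128)
    target_feat = np.random.rand(6, 6, 128)
    weighted = apply_cosine_window(score_map(frame_feat, target_feat))
    i, j = np.unravel_index(np.argmax(weighted), weighted.shape)
    print("most likely target position (score-matrix coordinates):", i, j)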
  • FIG. 1B shows a schematic structural diagram of the electronic device 100 .
  • It should be understood that the electronic device 100 shown in FIG. 1B is only an example; the electronic device 100 may have more or fewer components than shown in FIG. 1B, may combine two or more components, or may have a different component configuration.
  • the various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • The electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the electronic device 100 .
  • The electronic device 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • The memory in processor 110 is a cache, which may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instruction or data again, it can call it directly from this memory, avoiding repeated accesses and reducing the waiting time of the processor 110, thereby improving system efficiency.
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and so on.
  • the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
  • the display screen 194 can accept the user's click or slide operation to determine the tracking target in the video shooting.
  • the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP is used to process the data fed back by the camera 193 .
  • When shooting, the shutter opens, light is transmitted through the lens to the camera photosensitive element, the optical signal is converted into an electrical signal, and the photosensitive element transmits the electrical signal to the ISP for processing, converting it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • An object is projected through the lens to generate an optical image on the photosensitive element.
  • The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and then transmits the electrical signal to the ISP, which converts it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • The electronic device 100 can play or record videos in multiple encoding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • The NPU is a neural-network (NN) computing processor. Applications involving intelligent cognition of the electronic device 100, such as image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal.
  • The keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • Different application scenarios (for example, time reminders, receiving messages, alarm clocks, games, etc.) can correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
  • FIG. 1C is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, an Android runtime (Android runtime) and a system library, and a kernel layer.
  • the application layer can include a series of application packages.
  • the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
  • the application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer, and the application framework layer includes some predefined functions.
  • API application programming interface
  • the application framework layer may include window managers, content providers, view systems, telephony managers, resource managers, notification managers, and the like.
  • the window manager is used to manage the window program.
  • the window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, and take screenshots.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, and the like.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide the communication function of the electronic device 100 .
  • It manages call status (including connected, hung up, etc.).
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction.
  • the notification manager is used to notify download completion, message reminders, etc.
  • The notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll-bar text (such as notifications of applications running in the background), or display notifications on the screen in the form of dialog windows, for example, prompting text information in the status bar, sounding a prompt tone, vibrating the electronic device, or flashing the indicator light.
  • A system library can include multiple functional modules, for example: a surface manager, a media library, a 3D graphics processing library (e.g., OpenGL ES), and a 2D graphics engine (e.g., SGL).
  • the Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of many common audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG and PNG, etc.
  • the 3D graphics processing library is used to realize 3D graphics drawing, image rendering, compositing and layer processing.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least display drivers, camera drivers, audio drivers, and sensor drivers.
  • the electronic device 100 may include a camera 21 .
  • The camera 21 may be a front camera or a rear camera.
  • the electronic device 100 may display the user interface 200 as shown in FIG. 2A .
  • The user interface 200 may include an application icon display area 23 and a tray 24 of frequently used application icons, wherein:
  • the application icon display area 23 may include a gallery icon 231 .
  • the electronic device 100 may open the gallery application, thereby displaying information such as pictures and videos stored in the electronic device 100 .
  • the pictures and videos stored in the electronic device 100 include photos and videos shot by the electronic device 100 through a camera application.
  • the application icon display area 23 may further include more application program icons, such as icons of smart life, settings, calendar, smooth call, clock, application store, memo, etc., which are not limited in this embodiment of the present application.
  • a tray 24 with frequently used application icons may display the camera icon 22 .
  • the electronic device 100 can open the camera application, so as to perform functions such as photographing and video recording.
  • In some embodiments, the tray 24 of frequently used application icons may also display more application icons, such as Phone, Messages, and Contacts icons, which are not limited in this embodiment of the present application.
  • The user interface 200 may also contain more or less content, such as controls to display the current time and date, controls to display the weather, and the like. It can be understood that FIG. 2A only exemplarily shows the user interface on the electronic device 100 and should not constitute a limitation to the embodiments of the present application.
  • the electronic device 100 may display a user interface 210 as shown in FIG. 2B.
  • The user interface 210 may include a preview area 221, an edit control 220, a scale adjustment control 219, a flash control 218, a smart recognition control 217, a filter control 216, a settings control 215, a camera mode selection wheel 211, a gallery shortcut control 214, a shutter control 212, a camera flip control 213, and so on, wherein:
  • the preview area 221 may be used to display images captured by the camera 21 in real time.
  • the electronic device can refresh the displayed content in real time, so that the user can preview the image currently captured by the camera 21 .
  • The editing control 220 can be used to add doodles or emojis to the image currently captured in real time.
  • the scale adjustment control 219 can be used to adjust the display scale of the currently displayed preview area 221, such as 16:9, 4:3, and so on.
  • Flash control 218 may be used to turn the flash on or off.
  • the intelligent object recognition control 217 can be used to identify the category of the currently collected image by using an artificial intelligence algorithm, and perform corresponding processing.
  • the filter control 216 can be used to simulate the filter of a camera and adjust the color of the light source.
  • the setting control 215 can be used to adjust parameters for taking pictures and enable or disable some methods for taking pictures (such as timed pictures, smile snapshots, voice-activated pictures, etc.).
  • the setting control 215 may be used to set more other shooting functions, such as setting the screen ratio, setting the video resolution, setting the follow-up mode, setting the watermark, etc., which are not limited in the embodiment of the present application.
  • the one or more shooting mode options may be displayed in the camera mode selection wheel 211 .
  • the one or more shooting mode options may include: night scene mode, portrait mode, photo mode, video mode, professional mode, and the like.
  • the one or more shooting mode options may be represented as text information on the interface, such as "night scene”, “portrait”, “photography”, “video recording”, “professional”, “more”.
  • the one or more camera options can be represented as icons or other forms of interactive elements (IE) on the interface.
  • the electronic device 100 can start the shooting mode selected by the user.
  • the camera mode selection wheel 211 may include more or less shooting mode options. The user can browse other shooting mode options by swiping left/right in the camera mode selection wheel 211 .
  • Gallery shortcut control 214 may be used to launch the Gallery application.
  • the electronic device 100 may launch the gallery application.
  • the gallery application is a picture management application on an electronic device such as a smart phone and a tablet computer, and may also be called an "album", and the name of the application is not limited in this embodiment.
  • the gallery application can support the user to perform various operations on the pictures stored on the electronic device 100, such as browsing, editing, deleting, selecting and other operations.
  • the shutter control 212 can be used to monitor the user operation that triggers the photographing.
  • the electronic device 100 may detect a user operation acting on the shutter control 212, and in response to the operation, the electronic device 100 may save the image in the preview area 221 as a picture in the gallery application.
  • The electronic device 100 may also display a thumbnail of the saved image in the gallery shortcut control 214. That is, the user can click the shutter control 212 to trigger taking a photo.
  • the shutter control 212 may be a button or other forms of control.
  • the camera flip control 213 can be used to monitor a user operation that triggers flipping the camera.
  • The electronic device 100 may detect a user operation, such as a click operation, acting on the camera flip control 213, and in response, the electronic device 100 may flip the camera for shooting, for example, switching from the rear camera to the front camera, or from the front camera to the rear camera.
  • the user interface 210 may further include more or less content, which is not limited in this embodiment of the present application.
  • FIG. 2C to FIG. 2J exemplarily show a user interface using target tracking when the electronic device 100 performs video shooting.
  • In response to a user operation on the settings control 215, the setting interface is entered.
  • the setting interface may be the user interface 230 shown in FIG. 2C .
  • the user interface 230 may include, but is not limited to, setting options for photography, setting options for video, setting options for general, and the like.
  • The photography setting options may include a picture ratio option (for example, 4:3) and a voice-activated photographing option (which can be turned on or off).
  • The video setting options may include a video resolution option (for example, 1080P).
  • They may also include a tracking mode option; the tracking mode can be turned on or off, and the electronic device can use the target tracking function in tracking mode.
  • The general setting options may include an automatic watermarking option (which can be turned on or off) and a time-lapse shooting option (which can be turned on or off).
  • the tracking mode can be turned on or off in response to the user clicking on the tracking mode option.
  • After the tracking mode is turned on, as shown in FIG. 2B, in response to the user's click operation on the shutter control 212, the electronic device 100 starts video recording, that is, shooting video.
  • A user interface 240 is then displayed on the electronic device, which includes a tracking indication switch 241 at the upper position of the preview area.
  • The tracking indication switch 241 indicates whether the target tracking mode is currently on, and the tracking mode can be turned on or off in response to the user's operation on the switch 241. When the target tracking mode is on, the user can perform the operation of determining the tracking target.
  • In the user interface 240 there is a person 244 (which could also be any other object), and a time control 242 for displaying the duration of the video recording.
  • The user can click on the person 244 in the user interface 240, and a target frame 243 centered on the touched position appears at the position clicked by the user.
  • The target frame 243 may be circular or another shape; the shape of the target frame is not limited in this application.
  • In some embodiments, when the tracking mode is on, after the user clicks on the person 244, the target frame 243 changes its size over time.
  • In other embodiments, the user needs to long-press on the person 244, and the target frame 243 then changes its size over time.
  • As shown in FIG. 2E, the time control 242 changes from 00:02 in FIG. 2D to 00:03 in FIG. 2E, and the area of the target frame 243 gradually increases.
  • As shown in FIG. 2F, the time control 242 increases from 00:03 in FIG. 2E to 00:05 in FIG. 2F, and the area of the target frame 243 further increases compared with its area in FIG. 2E, covering a larger target range.
  • As shown in FIG. 2G, the time control 242 increases from 00:05 in FIG. 2F to 00:06 in FIG. 2G, and the area of the target frame 243 decreases compared with its area in FIG. 2F.
  • That is, the area of the target frame 243 cyclically increases and decreases, as shown in FIG. 2D to FIG. 2G.
  • the maximum area of the target frame 243 is the entire display interface of the video recording interface, and the minimum area of the target frame 243 is the touch point on the screen when the user clicks the target.
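  • Purely as an illustration (the oscillation curve, the period, and the pixel values below are assumptions; the application specifies only that the area cycles between the touch point and the full interface), the cyclic change of the target frame could be driven as follows:

    import math

    def target_frame_radius(t_pressed, r_min, r_max, period=2.0):
        # Oscillate the frame radius between r_min (the touch point) and
        # r_max (the full preview) while the finger stays on the screen.
        phase = (t_pressed % period) / period              # 0..1 per cycle
        k = 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))  # 0 -> 1 -> 0
        return r_min + k * (r_max - r_min)

    # Example: sample the radius every 0.25 s during a 2 s press.
    for i in range(9):
        t = 0.25 * i
        print(f"t={t:.2f}s radius={target_frame_radius(t, 10, 540):.1f}px")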
  • In tracking mode, when the user selects the tracking target, if the user fails to click on the center of the target, the user can drag the finger so that the center of the target frame 243 moves and changes as the finger drags. As the center position of the target frame 243 moves, the area of the target frame 243 still increases and decreases cyclically. As shown in FIG. 2H, the user initially clicks on the upper position of the person 244, generating an initial target frame 243. As the user drags the center of the target frame 243 downward, the area of the target frame 243 gradually increases and is displayed as target frame 243a. It can be understood that in the above embodiment, as long as the user's finger does not leave the screen, the target frame 243 centered on the user's touch position increases and decreases cyclically, and the center of the target frame stays synchronized with the position of the finger.
  • In some embodiments, objects (e.g., persons) in the video recording interface may be moving.
  • The position where the user presses on the screen can move with the movement of the object; that is, the user's touch position is always on the object and moves with it.
  • the area of the target frame 243 also increases and decreases cyclically as the user's pressing time increases.
  • For example, the person 244 moves from the far left of the video recording interface to the far right. The user clicks the person 244 when it appears on the far left of the video recording interface, generating an initial target frame 243. As the person 244 moves to the middle of the video recording interface, the user's finger follows it to the middle of the screen, and the area of the initial target frame 243 gradually increases, displayed as target frame 243b. As the person 244 moves to the far right of the video recording interface, the user's finger follows it to the right of the screen, and the area of target frame 243b gradually decreases, displayed as target frame 243c.
  • the user can more conveniently select the moving object as the tracking target.
  • In some embodiments, when an object (e.g., a person) moves to the edge of the video recording interface, a text or icon prompt prompting the user to rotate the electronic device appears in the video recording interface.
  • As shown in FIG. 2J, the person 244 has moved to the edge of the video recording interface 250. If the person 244 continues to move, the target frame 243 cannot obtain the features of the person 244, so the tracking target cannot be determined.
  • In this case, a prompt 245 appears on the video recording interface 250.
  • The prompt 245 may be text or another form such as an icon, which is not limited in this application. The prompt 245 may be "Please turn the phone to keep the target in the frame".
  • the electronic device can intelligently prompt the user, so that the user can avoid the situation that the tracking target cannot be determined due to not paying attention or not knowing how to operate.
  • As shown in FIG. 3, the method for determining the tracking target of the electronic device includes step S310.
  • In step S310, the electronic device confirms the initial tracking target.
  • The specific method for confirming the tracking target is as described in the foregoing embodiments of the present application, and details are not described here again.
  • Then step S320 is entered.
  • In step S320, the electronic device extracts the features of the tracking target and the features of the current video frame respectively.
  • the feature of the tracking target can be represented by the feature vector F1(N)
  • the feature of the video frame can be represented by F(N).
  • The electronic device extracts F1(N) in step S322 and extracts F(N) in step S321.
  • In step S330, after obtaining the feature vector F1(N) and the feature vector F(N), the electronic device executes the aforementioned feature matching algorithm and center weighting algorithm to match F1(N) against F(N).
  • the electronic device can perform feature extraction on the tracking target image through a feature extraction algorithm, and obtain and save the features of the tracking target image (such as texture features, contour features, color features, etc.).
  • The electronic device can use the feature vector obtained by feature extraction of the tracking target image to represent the features of the tracking target corresponding to the tracking target image.
  • the feature of the specified target (person 244) in the Nth frame image can be represented by a feature vector F1(N).
  • the feature vector can represent the color feature, texture feature, contour feature and other features of the tracked target.
  • feature vector F1(N) may represent one or more of texture features, contour features, color features, etc. of the specified object (person 244).
  • The specific form and size of the feature vector F1(N) of the specified target (person 244) are illustrated here by example.
  • F1(N) may be a feature vector [0.5, 0.6, 0.8, ..., 0.9, 0.7, 0.3] containing n values.
  • n is an integer, which can be 128, 256, 512, etc.
  • the size of n is not limited.
  • A tracking template can be used to represent one or more features of a tracking target. After the electronic device extracts the features of the tracking target, it saves them into the tracking template. In subsequent consecutive video frames, the electronic device matches the features of a candidate target in each image frame against the features of the tracking target in the tracking template.
  • In step S340, if the matching succeeds, the electronic device determines that the candidate target in the image frame is the tracking target. After the user specifies the tracking target, the electronic device tracks the tracking target in the image frames after the Nth frame image.
  • the electronic device can track the tracking target in any way.
  • In some embodiments, the electronic device tracks the tracking target by using the center of the target frame 243 in the previous frame image as the search center and M times the size of the target frame as the search area.
  • The electronic device obtains the response value of each pixel in the search area according to the features of the tracking target and of each pixel in the search area. If the maximum response value in the search area is greater than a preset response value, the tracking target exists in the search area. The electronic device then marks the position of the tracking target in that frame image. After that, the electronic device automatically focuses on the tracking target so that the tracking target can be photographed more clearly.
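  • A minimal sketch of this search step follows; the value of M, the preset response threshold, and the normalized-correlation response measure are all assumed for illustration:

    import numpy as np

    def search_target(frame_feat, template, prev_center, box_size, M=2.0, thresh=0.6):
        # Search an area M times the target frame size, centered on the
        # target's position in the previous frame.
        h, w = template.shape
        cy, cx = prev_center
        half_h, half_w = int(M * box_size[0] / 2), int(M * box_size[1] / 2)
        y0, x0 = max(cy - half_h, 0), max(cx - half_w, 0)
        y1 = min(cy + half_h, frame_feat.shape[0] - h)
        x1 = min(cx + half_w, frame_feat.shape[1] - w)

        best, best_pos = -np.inf, None
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                window = frame_feat[y:y+h, x:x+w]
                # Response value: normalized correlation with the template.
                resp = np.sum(window * template) / (
                    np.linalg.norm(window) * np.linalg.norm(template) + 1e-9)
                if resp > best:
                    best, best_pos = resp, (y + h // 2, x + w // 2)
        # Report the target only if the maximum response exceeds the preset value.
        return best_pos if best > thresh else None

    frame = np.random.rand(200, 200)   # stand-in for a single-channel feature map
    tmpl = np.random.rand(20, 30)      # stand-in for the tracking template
    print(search_target(frame, tmpl, prev_center=(100, 100), box_size=(20, 30)))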
  • In step S350, after the electronic device determines the center point of the tracking target in the (N+1)th frame image, it crops the acquired (N+1)th frame image, using the center point of the tracking target as the center and a cropping frame of preset size, to obtain a new image.
  • The preset size of the cropping frame is larger than the size of the target frame 243.
  • The preset size of the cropping frame may be half or three quarters of the size of the original image obtained by the electronic device; this application does not limit the preset size of the cropping frame.
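  • For illustration (clamping the cropping window at the image borders is an assumption; the application specifies only the center point and the preset size), cropping the (N+1)th frame around the tracked center might look like this:

    import numpy as np

    def crop_around_target(frame, center, crop_h, crop_w):
        # Crop a preset-size window centered on the tracking target,
        # clamped so that the window stays inside the frame.
        H, W = frame.shape[:2]
        cy, cx = center
        y0 = min(max(cy - crop_h // 2, 0), H - crop_h)
        x0 = min(max(cx - crop_w // 2, 0), W - crop_w)
        return frame[y0:y0 + crop_h, x0:x0 + crop_w]

    # Example: crop a 1080x1920 window from a 2160x3840 wide-angle frame;
    # the caller would then display it at the fixed resolution (e.g., 1080P).
    frame = np.zeros((2160, 3840, 3), dtype=np.uint8)
    cropped = crop_around_target(frame, (800, 2000), 1080, 1920)
    print(cropped.shape)  # (1080, 1920, 3)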
  • In some embodiments, the category corresponding to the specified target may be displayed at or around the position of the tracking target.
  • Finally, the newly cropped image is displayed.
  • As shown in FIG. 5, in step S501, the electronic device first acquires a video stream, and the video stream includes N image frames.
  • In step S502, the electronic device acquires the user's first operation (e.g., a click operation) on the target object on the screen, and proceeds to step S503.
  • the target object may be a stationary or moving object, a person, an animal, etc., which is not limited in this application.
  • In step S503, after the user's click operation, the video recording interface displays a target frame whose area changes cyclically (e.g., in size) and which is centered on the click point. It can be understood that as long as the user's finger does not leave the screen of the electronic device, the video recording interface continuously displays target frames of different sizes in a loop. The maximum and minimum areas of the target frame are as described in the foregoing embodiments, and details are not described here again. Proceed to step S504.
  • In step S504, the electronic device determines whether the user long-presses the screen, that is, whether the user long-presses the target frame. It can be understood that the long-press operation here is the continuous pressing of the user's finger on the screen of the electronic device. If the user has not long-pressed the target frame, proceed to step S506; if it is determined that the user has long-pressed the target frame, proceed to step S505.
  • In step S506, because the user has not long-pressed the target frame, the electronic device determines whether the user clicks the target frame again. If the user does not click the target frame again, go back to step S503; if the user clicks the target frame again, proceed to step S512.
  • In step S505, since it is determined that the user has long-pressed the target frame, the next step is to determine whether the user drags the target frame. If the user drags the target frame, proceed to step S507; if not, proceed to step S510.
  • In step S507, since it is determined that the user has dragged the target frame, the electronic device acquires the user's drag operation on the target frame, and then proceeds to step S508.
  • In step S508, the electronic device determines whether the user cancels the drag operation. If the user does not cancel the drag operation, go back to step S507; if the user cancels the drag operation, proceed to step S509.
  • In step S509, the electronic device determines again whether the user long-presses the target frame after dragging it. If the user does not continue to long-press the target frame, proceed to step S512; if the user still long-presses the target frame after dragging it, proceed to step S510.
  • In step S510, the electronic device acquires the user's long-press operation on the target frame, and then proceeds to step S511.
  • In step S511, the electronic device determines whether the user cancels the long-press operation on the target frame. If the long-press operation has not been cancelled, go back to step S510 and continue acquiring the user's long-press operation on the target frame; if it is determined that the user has cancelled the long-press operation, proceed to step S512.
  • In step S512, the electronic device acquires the information of the target frame in the Nth frame image (including, but not limited to, area size information and position information). Proceed to step S513.
  • In step S513, the electronic device determines the feature vector of the tracking target according to the area size and position information of the target frame obtained in step S512.
  • The method for determining the feature vector of the tracking target is as in the embodiment shown in FIG. 3, and details are not described here again. A simplified sketch of the above decision flow is shown below.
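  • Purely as an illustration of the S503 to S512 flow, with hypothetical event names standing in for platform touch events (not an API from this application):

    def determine_target_frame(events):
        # Walk the FIG. 5 decision flow over a recorded trace of gesture
        # events; "long_press", "drag", "release", and "second_click" are
        # hypothetical event names.
        long_pressing = False
        for ev in events:
            if ev == "second_click":            # S506: second click confirms
                return "S512: acquire target frame area and position"
            if ev == "long_press":              # S504/S505: finger held down
                long_pressing = True
            elif ev == "drag" and long_pressing:
                continue                        # S507/S508: acquire the drag
            elif ev == "release" and long_pressing:
                # S511: the long press is cancelled, confirming the frame.
                return "S512: acquire target frame area and position"
            # otherwise S503: keep displaying the cyclically changing frame
        return "S503: keep displaying the target frame"

    print(determine_target_frame(["long_press", "drag", "drag", "release"]))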
  • the above-mentioned electronic devices and the like include corresponding hardware structures and/or software modules for executing each function.
  • Those skilled in the art should be easily aware that, in conjunction with the units and algorithm steps of each example described in the embodiments disclosed herein, the embodiments of the present application can be implemented in hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of the embodiments of the present invention.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • The above-mentioned integrated modules can be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of modules in the embodiments of the present invention is schematic and is only a logical function division; there may be other division manners in actual implementation. The following illustrates, as an example, the division of each functional module corresponding to each function.
  • The methods provided in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented in software, they can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, network device, electronic device, or other programmable apparatus.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.).
  • the computer-readable storage medium can be any available media that can be accessed by a computer, or a data storage device such as a server, data center, etc. that includes one or more available media integrated.
  • The usable media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital video discs (DVDs)), or semiconductor media (e.g., solid state drives (SSDs)), and the like.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • The division of units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • Units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention relates to the field of video processing, and discloses a method for determining a tracking target and an electronic device. This application provides a video shooting method: an electronic device acquires N frames of images and then acquires a user's first operation on a target object on the screen; the electronic device displays a target box whose area changes cyclically; according to the user's second operation on the target object, information about the target box in the N-th frame of image is acquired; and a feature vector of the target object is determined according to the information about the target box.

Description

Method for determining tracking target and electronic device
This application claims priority to Chinese Patent Application No. 202011607731.4, filed with the China National Intellectual Property Administration on December 29, 2020 and entitled "Method for Determining Tracking Target and Electronic Device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of video processing, and in particular, to a method for determining a tracking target and an electronic device.
Background
Most electronic devices today have a video shooting function. When a user shoots a video, the video frame can follow a specific target in the video; that is, without manual adjustment by the user, the video frame changes as the specific target moves, so that the target always remains at the center of the frame. This function may be referred to as target tracking.
In the prior art, the target to be tracked is selected through object detection combined with a single tap by the user. Before the user taps a specific target, the electronic device starts object detection, detects the position and size of objects of specified categories in the video frame, and displays them in target boxes. When the coordinates of the user's tap fall within a target box, the object in that box is determined to be the target to be tracked. The drawback of the prior art is that object detection can only detect a limited set of object categories and cannot track arbitrary objects. Moreover, constrained by the object detection algorithm, if an object's size does not meet the algorithm's requirements, the electronic device cannot detect it. In addition, running the object detection algorithm consumes processor resources and increases power consumption.
Summary
The method for determining a tracking target and the electronic device provided in this application allow a user to quickly and conveniently determine the tracking target in target tracking mode in a video recording scenario, and broaden the range of targets the user can select, so as to achieve the goal of tracking more kinds of targets.
According to a first aspect, this application provides a video shooting method: an electronic device acquires N frames of images, and then acquires a user's first operation on a target object on the screen; the electronic device displays a target box whose area changes cyclically;
according to the user's second operation on the target object, information about the target box in the N-th frame of image is acquired;
a feature vector of the target object is determined according to the information about the target box.
With reference to the first aspect, it can be understood that in some implementations, after the target box whose area changes cyclically is displayed, the method further includes: determining whether the user performs a third operation on the target object;
if there is a third operation, determining whether the user performs a fourth operation on the target object.
With reference to the first aspect, it can be understood that in some other implementations, if the user does not perform the third operation on the target object, it is determined whether the user performs the second operation; if the user does not perform the second operation, the target box whose area changes cyclically continues to be displayed; if the user performs the second operation, the information about the target box in the N-th frame of image is acquired.
With reference to the first aspect, it can be understood that in some other implementations, if it is determined that the user performs the fourth operation on the target object, the fourth operation is acquired; it is determined whether the user cancels the fourth operation; if the user does not cancel the fourth operation, the user's fourth operation on the target object continues to be acquired; if the user cancels the fourth operation, it is determined whether the user performs the second operation on the target object.
With reference to the first aspect, it can be understood that in some other implementations, if it is determined that the user does not perform the fourth operation on the target object, the user's second operation on the target object is acquired.
With reference to the first aspect, it can be understood that in some other implementations, if the user performs the second operation on the target object, the second operation is acquired; it is determined whether the user cancels the second operation; if the user does not perform the second operation on the target object, the information about the target box in the N-th frame of image is acquired.
With reference to the first aspect, it can be understood that in some other implementations, if the user cancels the second operation, the information about the target box in the N-th frame of image is acquired; if the user does not cancel the second operation, the second operation continues to be acquired.
With reference to the first aspect, it can be understood that in some other implementations, the first operation is a tap operation, the second operation is a second tap operation, the third operation is a long-press operation, and the fourth operation is a drag operation.
With reference to the first aspect, it can be understood that in some other implementations, the information about the target box includes the area and position of the target box.
According to a second aspect, this application provides an electronic device, including a screen, a memory, and a processor, where the screen receives a user's operation on a target object on the screen;
the memory is configured to store a computer program;
the processor is configured to invoke the computer program, so that the electronic device performs the method according to any one of the implementations of the first aspect.
According to a third aspect, this application provides a computer storage medium, including computer instructions; when the computer instructions run on an electronic device, the electronic device is caused to perform the method according to any one of the implementations of the first aspect.
Brief Description of Drawings
FIG. 1A is a schematic diagram of image frames acquired by an electronic device according to an embodiment of this application;
FIG. 1B is a schematic diagram of a hardware structure of an electronic device 100 according to an embodiment of this application;
FIG. 1C is a schematic diagram of a software structure of an electronic device 100 according to an embodiment of this application;
FIG. 2A to FIG. 2J are schematic diagrams of user interfaces in which an electronic device 100 receives user operations to determine a tracking target according to embodiments of this application;
FIG. 3 is a schematic diagram of a method for implementing target tracking according to an embodiment of this application;
FIG. 4 is a schematic diagram of a feature vector of a tracking target according to an embodiment of this application;
FIG. 5 is a flowchart of a method for determining a target box of a tracking target according to an embodiment of this application.
Detailed Description of Embodiments
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. In the descriptions of the embodiments of this application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may indicate three cases: only A exists, both A and B exist, and only B exists.
In the following, the terms "first" and "second" are used for descriptive purposes only and shall not be construed as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include one or more of such features. In the descriptions of the embodiments of this application, unless otherwise specified, "a plurality of" means two or more.
The application scenarios involved in the embodiments of this application and the related terms used in the embodiments are introduced below.
(1) Target tracking
An electronic device with a video shooting function can track and shoot a specific object in the video frame during shooting, that is, perform target tracking. There are two ways to implement target tracking. In the first way, the image signal processor (ISP) of the electronic device first acquires a wide-angle frame; as shown in FIG. 1A, the wide-angle frame acquired by the ISP is image 101. If the tracking target is determined to be person 103, a portion of the image (such as image 102) is cropped with person 103 at the center, image 102 is resized, the resolution of the cropped image is fixed (for example, to 1080P), and the processed image is then displayed on the screen to the user. In the second way, auxiliary equipment (such as a gimbal) is used; such equipment can rotate and switch lenses as the user moves to adjust the field of view, so that a specific target can be shot continuously. Target tracking in this application uses the first of these two ways; a minimal sketch of the crop-and-resize step follows below.
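The following Python sketch illustrates the first way described above (cropping around the tracking target and resizing to a fixed resolution). It is illustrative only, not from the patent: the function name, the OpenCV dependency, and the 4K/1080P sizes are assumptions.

```python
import cv2
import numpy as np

def crop_and_resize(frame, center, crop_size, out_size=(1920, 1080)):
    """Crop a window centered on the tracked target out of the
    wide-angle frame and resize it to a fixed output resolution."""
    h, w = frame.shape[:2]
    cw, ch = crop_size
    # Clamp the crop window so it stays inside the wide-angle frame.
    x0 = int(np.clip(center[0] - cw / 2, 0, w - cw))
    y0 = int(np.clip(center[1] - ch / 2, 0, h - ch))
    cropped = frame[y0:y0 + ch, x0:x0 + cw]
    return cv2.resize(cropped, out_size, interpolation=cv2.INTER_LINEAR)

# Example: follow a target centered at (2200, 900) in a 4K wide-angle frame.
wide = np.zeros((2160, 3840, 3), dtype=np.uint8)
view = crop_and_resize(wide, center=(2200, 900), crop_size=(1920, 1080))
```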
The camera of the electronic device can acquire consecutive image frames, and the user can select the tracking target in the N-th of the consecutive frames displayed by the electronic device. Target tracking in the embodiments of this application may also mean that the electronic device can locate the user-selected tracking target in the image frames after the N-th frame (for example, the (N+1)-th frame). The electronic device can mark the tracking target in the image frames.
In this application, the electronic device is a device that can display consecutive image frames, such as a mobile phone, a tablet, a desktop computer, or a television; the embodiments of this application do not limit the electronic device. In the embodiments of this application, one frame displayed by the electronic device may be referred to as an image frame or the N-th frame of image.
(2) Tracking target
In the embodiments of this application, a specific object (for example, a person, a plant, an animal, or a car) determined by the user in the N-th frame of image is referred to as the tracking target.
(3) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit may be given by formula (1):

$$h_{W,b}(x) = f(W^{T}x) = f\Big(\sum_{s=1}^{n} W_{s} x_{s} + b\Big) \qquad (1)$$

where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces nonlinearity into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by connecting many such single neural units, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of the local receptive field; the local receptive field may be a region consisting of several neural units.
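A small numerical sketch of formula (1) in Python, assuming a sigmoid activation (the variable names mirror the symbols above; the input values are arbitrary):

```python
import numpy as np

def sigmoid(z):
    """Activation function f; sigmoid is the option named above."""
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(x, W, b):
    """Formula (1): weighted sum of the inputs x_s plus bias b,
    passed through the activation function f."""
    return sigmoid(np.dot(W, x) + b)

x = np.array([0.5, -1.2, 3.0])   # inputs x_1 .. x_n
W = np.array([0.4, 0.1, -0.6])   # weights W_1 .. W_n
print(neural_unit(x, W, b=0.2))
```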
(4) Convolutional neural network
A convolutional neural network (CNN) is a neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers. The feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving a trainable filter with an input image or a convolutional feature map. A convolutional layer is a layer of neurons in a convolutional neural network that performs convolution on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in adjacent layers. A convolutional layer usually contains several feature maps, and each feature map may be composed of rectangularly arranged neural units. Neural units in the same feature map share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as meaning that the way image information is extracted is independent of position. The underlying principle is that the statistics of one part of an image are the same as those of other parts, which means that image information learned in one part can also be used in another part, so the same learned image information can be used for all positions in the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
A convolution kernel may be initialized as a matrix of random size, and during training of the convolutional neural network the kernel can obtain reasonable weights through learning. In addition, a direct benefit of weight sharing is that it reduces the connections between layers of the convolutional neural network while also reducing the risk of overfitting.
(5) Feature matching algorithm
The feature matching algorithm used in this application takes the features of the tracking target's small image patch (for example, a 6×6×128 feature) as a convolution kernel and convolves it over the feature map of the image frame (for example, a 22×22×128 feature) to obtain a score map, which is a matrix of scores (for example, a 17×17 matrix). A point with a higher score on the score map (corresponding to a region of the image frame) indicates that this region of the image frame is more similar to the target's features, that is, the region is more likely to be the location of the tracking target. A sketch of this sliding-window correlation follows below.
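The following is a sketch of that correlation in plain NumPy, written as a naive loop for clarity. The 6×6×128 and 22×22×128 shapes match the example sizes above; the random inputs are placeholders for real extracted features.

```python
import numpy as np

def score_map(target_feat, frame_feat):
    """Slide the target's feature patch over the frame's feature map
    as a correlation kernel; each output cell is the similarity of
    one region of the frame to the target."""
    th, tw, c = target_feat.shape          # e.g. 6 x 6 x 128
    fh, fw, _ = frame_feat.shape           # e.g. 22 x 22 x 128
    out = np.zeros((fh - th + 1, fw - tw + 1))   # e.g. 17 x 17
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = frame_feat[i:i + th, j:j + tw, :]
            out[i, j] = np.sum(window * target_feat)
    return out

scores = score_map(np.random.rand(6, 6, 128), np.random.rand(22, 22, 128))
print(scores.shape)  # (17, 17)
```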
(6) Center weighting algorithm
A cosine window is added to the score map obtained by the feature matching algorithm, that is, each point in the score matrix is multiplied by a coefficient. The center point corresponds to the target's position in the previous frame, and its coefficient is 1. Spreading outward from the center, the coefficient each point is multiplied by becomes smaller and smaller, meaning that points farther from the center are penalized more, as the sketch below illustrates.
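A sketch of the center weighting step, assuming a Hanning window as the cosine window (the text above only specifies a coefficient of 1 at the center that decays outward, so the exact window shape is an assumption):

```python
import numpy as np

def center_weight(scores):
    """Multiply the score map by a cosine window so that positions far
    from the previous target location (the center) are penalized."""
    n = scores.shape[0]                   # assume a square score map
    win = np.outer(np.hanning(n), np.hanning(n))
    win = win / win.max()                 # coefficient 1 at the center
    return scores * win

weighted = center_weight(np.random.rand(17, 17))
peak = np.unravel_index(np.argmax(weighted), weighted.shape)
print(peak)  # most likely target location after center weighting
```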
The following describes an example electronic device 100 provided in the following embodiments of this application.
FIG. 1B shows a schematic diagram of the structure of the electronic device 100.
The embodiments are described in detail below by taking the electronic device 100 as an example. It should be understood that the electronic device 100 shown in FIG. 1B is merely an example; the electronic device 100 may have more or fewer components than shown in the figure, may combine two or more components, or may have a different component configuration. The various components shown in the figure may be implemented in hardware including one or more signal processing and/or application-specific integrated circuits, in software, or in a combination of hardware and software.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, antenna 1, antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In some other embodiments of this application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or have a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like. Different processing units may be independent devices or may be integrated into one or more processors.
The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to instruction operation codes and timing signals to control instruction fetching and instruction execution.
A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory can hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves the efficiency of the system.
The wireless communication function of the electronic device 100 may be implemented by antenna 1, antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The electronic device 100 implements the display function through the GPU, the display 194, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display 194 and the application processor. The GPU performs mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N displays 194, where N is a positive integer greater than 1. The display 194 can receive the user's tap or swipe operations to determine the tracking target during video shooting.
The electronic device 100 can implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when taking a photo, the shutter opens, light is transmitted through the lens to the camera's photosensitive element, the light signal is converted into an electrical signal, and the photosensitive element passes the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also perform algorithmic optimization on the image's noise, brightness, and skin tone, and can also optimize parameters such as exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. An object passes through the lens to generate an optical image projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal, and then passes the electrical signal to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
The digital signal processor is used to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform and the like on the frequency point energy.
The video codec is used to compress or decompress digital video. The electronic device 100 may support one or more video codecs, so that the electronic device 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example, the transfer mode between neurons in the human brain, it processes input information quickly and can also continuously self-learn. Applications such as intelligent cognition of the electronic device 100, for example image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
The internal memory 121 may be used to store computer-executable program code, where the executable program code includes instructions. The processor 110 executes the instructions stored in the internal memory 121 to perform the various functional applications and data processing of the electronic device 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system and applications required for at least one function (such as a sound playback function and an image playback function). The data storage area can store data created during use of the electronic device 100 (such as audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The electronic device 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal output, and is also used to convert an analog audio input into a digital audio signal.
The buttons 190 include a power button, volume buttons, and the like. The buttons 190 may be mechanical buttons or touch buttons. The electronic device 100 can receive button input and generate key signal input related to user settings and function control of the electronic device 100.
The motor 191 can generate vibration prompts. The motor 191 can be used for incoming-call vibration prompts as well as touch vibration feedback. For example, touch operations acting on different applications (such as taking photos and audio playback) may correspond to different vibration feedback effects. For touch operations acting on different areas of the display 194, the motor 191 may also produce different vibration feedback effects. Different application scenarios (for example, time reminders, receiving messages, alarms, and games) may also correspond to different vibration feedback effects. Touch vibration feedback effects may also be customized.
The indicator 192 may be an indicator light, which can be used to indicate charging status and battery changes, and can also be used to indicate messages, missed calls, notifications, and the like.
FIG. 1C is a block diagram of the software structure of the electronic device 100 according to an embodiment of this application. The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers: from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer. The application layer may include a series of application packages.
As shown in FIG. 1C, the application packages may include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, and Messages.
The application framework layer provides an application programming interface (API) and a programming framework for applications at the application layer. The application framework layer includes some predefined functions.
As shown in FIG. 1C, the application framework layer may include a window manager, content providers, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used to manage window programs. The window manager can obtain the display size, determine whether there is a status bar, lock the screen, take screenshots, and so on.
Content providers are used to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and the like.
The view system includes visual controls, such as controls for displaying text and controls for displaying images. The view system can be used to build applications. A display interface may consist of one or more views. For example, a display interface that includes an SMS notification icon may include a view for displaying text and a view for displaying images.
The telephony manager is used to provide the communication functions of the electronic device 100, for example, management of call states (including connected, hung up, and so on).
The resource manager provides applications with various resources, such as localized strings, icons, images, layout files, and video files.
The notification manager allows applications to display notification information in the status bar. It can be used to convey notification-type messages that can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify of download completion, message reminders, and so on. The notification manager may also present notifications in the top status bar of the system in the form of charts or scroll bar text, such as notifications of applications running in the background, or notifications that appear on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is made, the electronic device vibrates, or the indicator light flashes.
The system libraries may include multiple functional modules, for example, a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL ES), and a 2D graphics engine (for example, SGL).
The surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of multiple common audio and video formats, as well as still image files. The media libraries can support multiple audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphics processing library is used to implement three-dimensional graphics drawing, image rendering, compositing, layer processing, and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
The following describes some typical shooting scenarios involved in this application.
As shown in FIG. 2A, the electronic device 100 may include a camera 21. The camera 21 may be a front camera, and may also include a rear camera. The electronic device 100 may display the user interface 200 shown in FIG. 2A. The user interface 200 may include an application icon display area 23 and a tray 24 with icons of frequently used applications, where:
The application icon display area 23 may contain a gallery icon 231. In response to a user operation, such as a touch operation, acting on the gallery icon 231, the electronic device 100 can launch the Gallery application to display information such as pictures and videos stored in the electronic device 100. The pictures and videos stored in the electronic device 100 include photos and videos taken by the electronic device 100 through the Camera application. The application icon display area 23 may also include more application icons, such as icons for Smart Life, Settings, Calendar, MeeTime, Clock, AppGallery, and Notepad, which is not limited in this embodiment of this application.
The tray 24 with icons of frequently used applications can display the camera icon 22. In response to a user operation acting on the camera icon 22, such as a touch operation, the electronic device 100 can launch the Camera application to perform functions such as taking photos and recording video. When launching the Camera application, the electronic device 100 can turn on the camera 21 (the front camera and/or the rear camera) to implement functions such as photographing and video recording. The tray 24 with icons of frequently used applications may also display more application icons, such as icons for Phone, Messages, and Contacts, which is not limited in this embodiment of this application.
The user interface 200 may also contain more or less content, such as a control displaying the current time and date, a control displaying the weather, and so on. It can be understood that FIG. 2A merely shows an example of a user interface on the electronic device 100 and shall not constitute a limitation on the embodiments of this application.
In response to a user operation acting on the camera icon 22, the electronic device 100 can display the user interface 210 shown in FIG. 2B. The user interface 210 may include a preview area 221, an edit control 220, an aspect-ratio adjustment control 219, a flash control 218, a smart object-recognition control 217, a filter control 216, a settings control 215, a camera mode selection wheel 211, a gallery shortcut control 214, a shutter control 212, a camera flip control 213, and the like, where:
The preview area 221 can be used to display images captured by the camera 21 in real time. The electronic device can refresh its display content in real time so that the user can preview the image currently captured by the camera 21.
The edit control 220 can be used to add doodles or emojis to the currently captured image.
The aspect-ratio adjustment control 219 can be used to adjust the display ratio of the currently displayed preview area 221, such as 16:9 or 4:3.
The flash control 218 can be used to turn the flash on or off.
The smart object-recognition control 217 can be used to recognize the category of the currently captured image using artificial intelligence algorithms and process the image accordingly.
The filter control 216 can be used to simulate a camera's optical filters and adjust the color of the light source.
The settings control 215 can be used to adjust the parameters for taking photos and to enable or disable some photographing modes (such as timed photographing, smile capture, and voice-controlled photographing). The settings control 215 can also be used to set more shooting functions, such as setting the frame ratio, setting the video resolution, setting the follow-shot mode, and setting a watermark, which is not limited in this embodiment of this application.
The camera mode selection wheel 211 can display one or more shooting mode options. The one or more shooting mode options may include: night mode, portrait mode, photo mode, video mode, professional mode, and so on. These options may appear on the interface as text, such as "Night", "Portrait", "Photo", "Video", "Pro", and "More". Without being limited to this, these options may also appear on the interface as icons or other forms of interactive elements (IE). When a user operation acting on the video mode option is detected, the electronic device 100 can enable the shooting mode selected by the user. Not limited to what is shown in FIG. 2B, the camera mode selection wheel 211 may contain more or fewer shooting mode options. The user can browse other shooting mode options by swiping left/right in the camera mode selection wheel 211.
The gallery shortcut control 214 can be used to launch the Gallery application. In response to a user operation acting on the gallery shortcut control 214, such as a tap operation, the electronic device 100 can launch the Gallery application. This way, the user can conveniently view the photos and videos taken, without first exiting the Camera application and then launching the Gallery application. The Gallery application is a picture management application on electronic devices such as smartphones and tablets, and may also be called "Albums"; this embodiment does not limit the name of the application. The Gallery application can allow the user to perform various operations, such as browsing, editing, deleting, and selecting, on pictures stored on the electronic device 100.
The shutter control 212 can be used to listen for a user operation that triggers photographing. The electronic device 100 can detect a user operation acting on the shutter control 212, and in response to the operation, the electronic device 100 can save the image in the preview area 221 as a picture in the Gallery application. In addition, the electronic device 100 can also display a thumbnail of the saved image in the gallery shortcut control 214. That is, the user can tap the shutter control 212 to trigger photographing. The shutter control 212 may be a button or another form of control.
The camera flip control 213 can be used to listen for a user operation that triggers flipping the camera. The electronic device 100 can detect a user operation acting on the camera flip control 213, such as a tap operation, and in response to the operation, the electronic device 100 can flip the camera used for shooting, for example, switching the rear camera to the front camera, or switching the front camera to the rear camera.
The user interface 210 may also contain more or less content, which is not limited in this embodiment of this application.
FIG. 2C to FIG. 2J exemplarily show user interfaces in which the electronic device 100 uses target tracking during video shooting.
As shown in FIG. 2B, in response to the user's tap operation on the settings control 215, the settings interface is entered. For example, the settings interface may be the user interface 230 shown in FIG. 2C. The user interface 230 may include, but is not limited to, photo settings options, video settings options, and general settings options. The photo settings options may include a frame ratio option (for example, the frame ratio may be 4:3) and a voice-controlled photographing option, which, for example, can be turned on or off. The video settings options may include a video resolution option (for example, 1080P) and a tracking mode option, which, for example, can be turned on or off; in tracking mode, the electronic device can use the target tracking function. The general settings options may include an automatic watermark option, which can be turned on or off, and a timed shooting option, which can be turned on or off.
In response to the user's tap on the tracking mode option, the tracking mode can be turned on or off. With the tracking mode on, as shown in FIG. 2B, in response to the user's tap operation on the shutter control 212, the electronic device 100 starts recording, that is, shooting video. After video shooting starts, as shown in FIG. 2D, the user interface 240 is displayed on the electronic device. In the user interface 240, there is a tracking indicator switch 241 near the top of the preview area.
The tracking indicator switch 241 can indicate whether the target tracking mode is currently on; in response to the user's operation on the tracking indicator switch 241, the tracking mode can be turned on or off. With the target tracking mode on, the user can perform the operation of determining the tracking target.
In the user interface 240 there is a person 244 (which may also be any object). After video recording starts, at the bottom of the user interface 240 there is a time control 242 used to display the elapsed recording time. After the electronic device 100 starts recording video, the user can tap the person 244 in the user interface 240, and a target box 243 centered on the touch point appears at the position tapped by the user.
The target box 243 may be circular or any other shape; this application does not limit the shape of the target box. In some embodiments, with the tracking mode on, after the user taps the person 244, the target box 243 changes in size over time.
In some other embodiments, the user needs to long-press on the person 244 for the target box 243 to change in size over time. As shown in FIG. 2E, with the person 244 stationary, the user keeps pressing the touch point; as time passes, for example as the time control 242 goes from 00:02 in FIG. 2D to 00:03 in FIG. 2E, the area of the target box 243 gradually increases.
As the duration of the user's press on the person 244 increases, as shown in FIG. 2F, the time control 242 goes from 00:03 in FIG. 2E to 00:05 in FIG. 2F, and the area of the target box 243 increases further compared with the target box 243 in FIG. 2E, so as to cover a larger target range.
If the user continues to press the person 244, as shown in FIG. 2G, the time control 242 goes from 00:05 in FIG. 2F to 00:06, and the area of the target box 243 decreases compared with the target box 243 in FIG. 2F.
In the above embodiments, as the user keeps pressing the target, the area of the target box 243 cycles from increasing to decreasing, as shown in FIG. 2D to FIG. 2G. It can be understood that the maximum range to which the target box 243 grows is the entire display area of the video recording interface, and the minimum range of the target box 243 is the touch point on the screen where the user taps the target. The advantage of the above embodiments is that the user can select the tracking target more freely and can more conveniently select tracking targets of different sizes; a sketch of this pulsating behavior follows below.
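As referenced above, one simple way to realize the pulsating box size is a triangle wave over press duration. This is an illustrative sketch only: the growth rate and the radius bounds are assumptions, not values from the patent.

```python
def box_radius(press_seconds, r_min, r_max, grow_rate=120.0):
    """Radius of the target box as a function of how long the user has
    been pressing: it sweeps from r_min up to r_max and back down,
    repeating until the finger lifts. grow_rate (px/s) is assumed."""
    span = r_max - r_min
    phase = (press_seconds * grow_rate) % (2 * span)
    # Triangle wave: rising half-cycle, then falling half-cycle.
    return r_min + (phase if phase <= span else 2 * span - phase)

# From the touch point up to the full preview, cycling while pressed.
for t in (0.0, 1.0, 2.0, 3.0):
    print(t, box_radius(t, r_min=10, r_max=540))
```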
In some embodiments, with the tracking mode on, when selecting the tracking target, if the user fails to tap the center of the target, the user can drag the finger so that the center of the target box 243 moves along with the drag. As the center position of the target box 243 moves, the area of the target box 243 also increases and decreases cyclically. As shown in FIG. 2H, the user initially taps above the person 244, producing an initial target box 243. As the user drags the center of the target box 243 downward, the area of the target box 243 gradually increases and is displayed as target box 243a. It can be understood that in the above embodiment, as long as the user's finger does not leave the screen, the target box 243 centered on the user's tap position cycles through growing and shrinking; moreover, the user's drag gesture and the change of the area of the target box 243 can proceed simultaneously.
In some other embodiments, with the tracking mode on, an object (for example, a person) in the video recording interface may move. In this scenario, the position at which the user taps the screen can move along with the object, that is, the position tapped by the user always stays on the object and follows its movement. It can be understood that in this scenario, the area of the target box 243 also cyclically increases and decreases as the press duration grows. As shown in FIG. 2I, the person 244 moves from the far left to the far right of the video recording interface. The user taps the person 244 when the person appears at the far left of the interface, producing an initial target box 243; as the person 244 moves to the middle of the interface, the user's finger follows the person's movement to the middle of the screen, and the area of the initial target box 243 gradually increases and is displayed as target box 243b; as the person 244 moves to the far right of the interface, the user's finger follows the person's movement to the right side of the screen, and the area of target box 243b gradually decreases and is displayed as target box 243c. In the above embodiment, the user can more conveniently select a moving object as the tracking target.
In some embodiments, with the tracking mode on, an object (for example, a person) in the video recording interface may move outside the interface, which may make the tracking target undeterminable because the object's features are missing. In the above embodiment, when the object moves out of the video recording interface, a text or icon prompt appears in the interface reminding the user to rotate the electronic device. As shown in FIG. 2J, in the video recording interface 250, the person 244 has moved to the edge of the interface; at this point, if the person 244 continues to move, the target box 243 cannot acquire the features of the person 244, so the tracking target cannot be determined. When the electronic device detects that the person 244 has moved to the edge of the video recording interface 250, a prompt 245 appears on the interface. It can be understood that the prompt 245 may be text, an icon, or another form, which is not limited in this application. The prompt 245 may be "Please rotate the phone to keep the target in the frame". In the above embodiment, the electronic device can intelligently prompt the user, so that the user can avoid being unable to determine the tracking target due to inattention or not knowing how to operate.
The following describes some methods for determining the tracking target in this application.
As shown in FIG. 3, in the tracking mode of video recording, the method by which the electronic device determines the tracking target includes step S310. In step S310, the electronic device confirms the initial tracking target. The specific method for confirming the tracking target is as described in the foregoing embodiments of this application and is not repeated here. After the user confirms the tracking target, the method proceeds to step S320.
In step S320, the electronic device extracts the features of the tracking target and the features of the current video frame respectively. For example, the features of the tracking target may be represented by a feature vector F1(N), and the features of the video frame may be represented by F(N). The electronic device extracts F1(N) in step S322 and extracts F(N) in step S321.
In step S330, after obtaining the feature vector F1(N) and the feature vector F(N), the electronic device executes the aforementioned feature matching algorithm and center weighting algorithm to match F1(N) against F(N). As shown in FIG. 4, the electronic device can perform feature extraction on the tracking-target image through a feature extraction algorithm, obtaining and saving the features of the tracking-target image (for example, texture features, contour features, color features, and so on). Specifically, the feature vector obtained by the electronic device from feature extraction on the tracking-target image represents the features of the tracking target corresponding to that image. The features of the specified target (person 244) in the N-th frame of image can be represented by the feature vector F1(N). The feature vector can represent the tracking target's color features, texture features, contour features, and other features. For example, the feature vector F1(N) may represent one or more of the texture features, contour features, color features, and so on of the specified target (person 244). The specific form and size of the feature vector F1(N) of the specified target (person 244) are not limited here. For example, F1(N) may be a feature vector containing n values, [0.5, 0.6, 0.8, …, 0.9, 0.7, 0.3], where n is an integer and may be 128, 256, 512, and so on; the size of n is not limited. After extracting the features of the specified target (person 244), the electronic device saves them to the tracking template, in which the feature vector F1(N) is stored.
The tracking template can be used to represent one or more features of the tracking target. After the electronic device extracts the features of the tracking target, it saves them to the tracking template. In subsequent consecutive video frames, the electronic device matches the features of a specified target in an image frame against the features of the tracking target in the tracking template, as the sketch after this paragraph illustrates.
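The sketch referenced above: storing a target's feature vector in a tracking template and matching later candidates against it. Cosine similarity and the 0.8 threshold are illustrative assumptions; the patent does not fix a particular matching score.

```python
import numpy as np

tracking_template = {}

def save_template(target_id, feature_vec):
    """Store the tracking target's feature vector, e.g. F1(N)."""
    tracking_template[target_id] = feature_vec / np.linalg.norm(feature_vec)

def match(target_id, candidate_vec, threshold=0.8):
    """Cosine similarity between a candidate's features in the current
    frame and the stored template; the threshold is an assumed value."""
    cand = candidate_vec / np.linalg.norm(candidate_vec)
    return float(np.dot(tracking_template[target_id], cand)) >= threshold

save_template("person244", np.random.rand(128))   # n = 128 here
print(match("person244", np.random.rand(128)))
```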
In step S340, if the match succeeds, the electronic device determines that the specified target in that image frame is the tracking target. After the user specifies the tracking target, the electronic device tracks the tracking target in the image frames after the N-th frame.
The following describes how the electronic device tracks the tracking target.
The electronic device may track the tracking target in any manner.
In the consecutive image frames after the N-th frame, the electronic device takes the center of the target box 243 in the previous frame as the search center and uses M times the size of that target box as the search region to track the tracking target. The electronic device obtains a response value for each pixel in the search region from the features of the tracking target and each pixel in the search region; if the maximum response value of the pixels in the search region is greater than a preset response value, the tracking target is in the search region. The electronic device then marks the position of the tracking target in that frame. After that, the electronic device automatically focuses on the tracking target so that it is captured more clearly. A sketch of one such search step follows below.
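A sketch of one search step, under stated assumptions: the "features" are reduced to a 2D template correlated over a 2D frame feature map, M = 2, and the preset response value is a placeholder.

```python
import numpy as np

def track_step(frame_feat, template, prev_center, m=2, resp_thresh=0.5):
    """One tracking step: search a window M times the template size,
    centered on the previous box center; score every position against
    the template and accept the peak only if it exceeds the preset
    response value (otherwise the target is treated as lost)."""
    th, tw = template.shape
    sh, sw = th * m, tw * m                      # search region size
    y0 = max(prev_center[0] - sh // 2, 0)
    x0 = max(prev_center[1] - sw // 2, 0)
    region = frame_feat[y0:y0 + sh, x0:x0 + sw]
    best, best_pos = -np.inf, None
    for i in range(max(region.shape[0] - th + 1, 0)):
        for j in range(max(region.shape[1] - tw + 1, 0)):
            r = float(np.sum(region[i:i + th, j:j + tw] * template))
            if r > best:
                best, best_pos = r, (y0 + i + th // 2, x0 + j + tw // 2)
    return best_pos if best > resp_thresh else None
```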
In step S350, after the electronic device determines the center point of the tracking target in the (N+1)-th frame of image, it crops the acquired (N+1)-th frame with a crop box of a determined size centered on the tracking target's center point, obtaining a new image. In general, the preset size of the crop box is larger than the size of the target box 243. For example, the preset size of the crop box may be half or three quarters of the size of the original image acquired by the electronic device; this application does not limit the preset size of the crop box.
In some embodiments, after the electronic device marks the position of the tracking target in the (N+1)-th image frame in the video recording interface, the category corresponding to the specified target may be displayed on or around the tracking target's position.
The new, cropped image is displayed in the preview stream of the electronic device.
The following describes some methods for determining the target box in this application.
In step S501, the electronic device first acquires a video stream containing N image frames.
In step S502, the electronic device acquires the user's first operation (for example, a tap operation) on a target object on the screen, and proceeds to step S503. It can be understood that the target object may be a stationary or moving object, person, animal, and so on, which is not limited in this application.
In step S503, after the user's tap operation, the video recording interface displays a target box whose area changes continuously and cyclically (that is, of varying size), centered on the tap point. It can be understood that as long as the user's finger does not leave the screen of the electronic device, the video recording interface keeps cyclically displaying target boxes of different areas; the maximum and minimum areas of the target box are as described in the foregoing embodiments and are not repeated here. The method proceeds to step S504.
In step S504, the electronic device determines whether the user long-presses the screen, or determines whether the user long-presses the target box. It can be understood that the long-press operation here is continuous pressing of the screen of the electronic device by the user's finger. If the user does not long-press the target box, the method proceeds to S506; if it is determined that the user long-presses the target box, the method proceeds to step S505.
In step S506, since the user has not long-pressed the target box, the electronic device determines whether the user taps the target box again. If the user does not tap the target box again, the method returns to step S503; if the user taps the target box again, the method proceeds to step S512.
In step S505, since it is determined that the user long-presses the target box, the next step is to determine whether the user drags the target box. If the user drags the target box, the method proceeds to step S507; if the user does not drag the target box, the method proceeds to step S510.
In step S507, since it is determined that the user drags the target box, the electronic device acquires the user's drag operation on the target box, and then proceeds to step S508.
In step S508, the electronic device determines whether the user cancels the drag operation. If the user does not cancel the drag operation, the method returns to step S507; if the user cancels the drag operation, the method proceeds to step S509.
In step S509, the electronic device again determines whether the user long-presses the target box after dragging it. If the user does not continue to long-press the target box, the method proceeds to step S512; if the user still long-presses the target box after dragging it, the method proceeds to step S510.
In step S510, the electronic device acquires the user's long-press operation on the target box, and proceeds to step S511.
In step S511, the electronic device determines whether the user cancels the long-press operation on the target box. If it is determined that the long-press operation is not canceled, the method returns to step S510 to continue acquiring the user's long-press operation on the target box; if it is determined that the user cancels the long-press operation, the method proceeds to step S512.
In step S512, the electronic device acquires the information about the target box in the N-th frame of image (including but not limited to area information and position information), and proceeds to step S513.
In step S513, the electronic device determines the feature vector of the tracking target according to the area and position information of the target box acquired in step S512. The method for determining the feature vector of the tracking target is as in the embodiment described with reference to FIG. 3 and is not repeated here. A sketch of the S503-S512 gesture flow follows below.
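The sketch referenced above reduces the S503-S512 decision flow to a small event loop. The Gesture names and the event-stream interface are illustrative assumptions; a real implementation would react to touch events delivered by the display driver.

```python
from enum import Enum, auto

class Gesture(Enum):
    TAP_AGAIN = auto()    # second tap on the target box (S506)
    LONG_PRESS = auto()   # sustained press (S504/S510)
    DRAG = auto()         # drag while pressing (S505/S507)
    RELEASE = auto()      # press or drag canceled (S508/S511)

def select_target_box(events):
    """Walk the S503-S512 flow over a stream of gesture events: keep
    showing the pulsating box until a second tap, or until a long press
    (optionally with drags) is released; then commit the box info."""
    for ev in events:
        if ev is Gesture.TAP_AGAIN or ev is Gesture.RELEASE:
            return "capture target box info (S512)"
        # LONG_PRESS and DRAG keep the loop going: the box keeps
        # pulsating and its center follows the finger (S507/S510).
    return "keep showing the pulsating box (S503)"

print(select_target_box([Gesture.LONG_PRESS, Gesture.DRAG, Gesture.RELEASE]))
```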
It can be understood that the order of the above steps does not constitute a limitation on this embodiment; that is, some of the above steps may be absent in this embodiment, and the above steps may also be combined to different degrees as needed.
For other content, refer to the descriptions of the relevant content above, which are not repeated here.
It can be understood that, to implement the above functions, the above electronic device and the like include corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should easily be aware that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the embodiments of this application can be implemented in hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the embodiments of the present invention.
In the embodiments of this application, the above electronic device and the like may be divided into functional modules according to the above method examples; for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present invention is schematic and is merely a logical function division; there may be other division manners in actual implementation. The following takes dividing each functional module corresponding to each function as an example for description:
The methods provided in the embodiments of this application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, an electronic device, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, an SSD), or the like.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of this application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for example, the division of units is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
The foregoing is merely specific implementations of this application, but the protection scope of the embodiments of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the embodiments of this application shall fall within the protection scope of the embodiments of this application. Therefore, the protection scope of the embodiments of this application shall be subject to the protection scope of the claims.

Claims (11)

  1. A video shooting method, characterized by comprising: acquiring N frames of images;
    acquiring a user's first operation on a target object on a screen;
    displaying a target box whose area changes cyclically;
    acquiring, according to the user's second operation on the target object, information about the target box in the N-th frame of image;
    determining a feature vector of the target object according to the information about the target box.
  2. The method according to claim 1, characterized in that after the displaying of the target box whose area changes cyclically, the method further comprises: determining whether the user performs a third operation on the target object;
    if there is the third operation, determining whether the user performs a fourth operation on the target object.
  3. The method according to claim 2, characterized in that if the user does not perform the third operation on the target object, it is determined whether the user performs the second operation; if the user does not perform the second operation, the target box whose area changes cyclically continues to be displayed; if the user performs the second operation, the information about the target box in the N-th frame of image is acquired.
  4. The method according to claim 2, characterized in that if it is determined that the user performs the fourth operation on the target object, the fourth operation is acquired; it is determined whether the user cancels the fourth operation; if the user does not cancel the fourth operation, the user's fourth operation on the target object continues to be acquired; if the user cancels the fourth operation, it is determined whether the user performs the second operation on the target object.
  5. The method according to claim 2, characterized in that if it is determined that the user does not perform the fourth operation on the target object, the user's second operation on the target object is acquired.
  6. The method according to claim 4, characterized in that if the user performs the second operation on the target object, the second operation is acquired; it is determined whether the user cancels the second operation; if the user does not perform the second operation on the target object, the information about the target box in the N-th frame of image is acquired.
  7. The method according to claim 6, characterized in that if the user cancels the second operation, the information about the target box in the N-th frame of image is acquired; if the user does not cancel the second operation, the second operation continues to be acquired.
  8. The method according to any one of claims 1 to 7, characterized in that the first operation is a tap operation, the second operation is a second tap operation, the third operation is a long-press operation, and the fourth operation is a drag operation.
  9. The method according to claim 1, characterized in that the information about the target box includes the area and position of the target box.
  10. An electronic device, comprising a screen, a memory, and a processor, characterized in that the screen receives a user's operation on a target object on the screen;
    the memory is configured to store a computer program;
    the processor is configured to invoke the computer program, so that the electronic device performs the method according to any one of claims 1 to 9.
  11. A computer storage medium, characterized by comprising computer instructions; when the computer instructions run on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 9.
PCT/CN2021/139276 2020-12-29 2021-12-17 Method for determining tracking target and electronic device WO2022143230A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US18/259,718 US20240062392A1 (en) 2020-12-29 2021-12-17 Method for determining tracking target and electronic device
EP21913971.4A EP4258649A4 (en) 2020-12-29 2021-12-17 METHOD FOR DETERMINING A TRACKING TARGET, AND ELECTRONIC DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011607731.4 2020-12-29
CN202011607731.4A CN114697525B (zh) 2020-12-29 2020-12-29 Method for determining tracking target and electronic device

Publications (1)

Publication Number Publication Date
WO2022143230A1 true WO2022143230A1 (zh) 2022-07-07

Family

ID=82132282

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/139276 WO2022143230A1 (zh) 2020-12-29 2021-12-17 一种确定跟踪目标的方法及电子设备

Country Status (4)

Country Link
US (1) US20240062392A1 (zh)
EP (1) EP4258649A4 (zh)
CN (1) CN114697525B (zh)
WO (1) WO2022143230A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678809A (zh) * 2016-01-12 2016-06-15 Hunan Youxiang Technology Co., Ltd. Handheld automatic follow-shot device and target tracking method therefor
CN106375682A (zh) * 2016-08-31 2017-02-01 SZ DJI Technology Co., Ltd. Image processing method and apparatus, movable device, unmanned aerial vehicle remote controller, and system
CN108259703A (zh) * 2017-12-31 2018-07-06 Shenzhen Qinmo Technology Co., Ltd. Follow-shot control method and apparatus for a gimbal, and gimbal
US20190279681A1 (en) * 2018-03-09 2019-09-12 Apple Inc. Real-time face and object manipulation
CN111316630A (zh) * 2018-11-28 2020-06-19 SZ DJI Technology Co., Ltd. Handheld gimbal and shooting control method therefor
CN111382705A (zh) * 2020-03-10 2020-07-07 AInnovation (Guangzhou) Technology Co., Ltd. Wrong-way behavior detection method and apparatus, electronic device, and readable storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4912117B2 (ja) * 2006-10-27 2012-04-11 Sanyo Electric Co., Ltd. Imaging device with tracking function
CN105827952B (zh) * 2016-02-01 2019-05-17 Vivo Mobile Communication Co., Ltd. Photographing method for removing a specified object and mobile terminal
JP2018005555A (ja) * 2016-07-01 2018-01-11 Sony Corporation Image processing apparatus, information processing apparatus, method, and program
CN107491742B (zh) * 2017-07-28 2020-10-23 Xi'an Innno Aviation Technology Co., Ltd. Long-term stable unmanned aerial vehicle target tracking method
WO2019023915A1 (zh) * 2017-07-31 2019-02-07 SZ DJI Technology Co., Ltd. Video processing method, device, aircraft, and system
CN109831622B (zh) * 2019-01-03 2021-06-22 Huawei Technologies Co., Ltd. Photographing method and electronic device
CN112119427A (zh) * 2019-06-28 2020-12-22 SZ DJI Technology Co., Ltd. Target following method and system, readable storage medium, and movable platform
CN110503042B (zh) * 2019-08-23 2022-04-19 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image processing method and apparatus, and electronic device
CN111340736B (zh) * 2020-03-06 2024-03-15 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image processing method and apparatus, storage medium, and electronic device
CN111815669B (zh) * 2020-06-23 2023-02-28 Zhejiang Dahua Technology Co., Ltd. Target tracking method, target tracking apparatus, and storage apparatus

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678809A (zh) * 2016-01-12 2016-06-15 Hunan Youxiang Technology Co., Ltd. Handheld automatic follow-shot device and target tracking method therefor
CN106375682A (zh) * 2016-08-31 2017-02-01 SZ DJI Technology Co., Ltd. Image processing method and apparatus, movable device, unmanned aerial vehicle remote controller, and system
CN108259703A (zh) * 2017-12-31 2018-07-06 Shenzhen Qinmo Technology Co., Ltd. Follow-shot control method and apparatus for a gimbal, and gimbal
US20190279681A1 (en) * 2018-03-09 2019-09-12 Apple Inc. Real-time face and object manipulation
CN111316630A (zh) * 2018-11-28 2020-06-19 SZ DJI Technology Co., Ltd. Handheld gimbal and shooting control method therefor
CN111382705A (zh) * 2020-03-10 2020-07-07 AInnovation (Guangzhou) Technology Co., Ltd. Wrong-way behavior detection method and apparatus, electronic device, and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4258649A4

Also Published As

Publication number Publication date
EP4258649A1 (en) 2023-10-11
US20240062392A1 (en) 2024-02-22
CN114697525A (zh) 2022-07-01
CN114697525B (zh) 2023-06-06
EP4258649A4 (en) 2024-05-22

Similar Documents

Publication Publication Date Title
WO2021238325A1 (zh) Image processing method and apparatus
WO2021027725A1 (zh) Method for displaying page elements and electronic device
WO2021190078A1 (zh) Short video generation method and apparatus, related device, and medium
WO2021244295A1 (zh) Method and apparatus for shooting video
US20220343648A1 (en) Image selection method and electronic device
CN113099146B (zh) Video generation method and apparatus, and related device
CN110830645B (zh) Operation method, electronic device, and computer storage medium
CN114008575A (zh) Method for generating user avatar and electronic device
WO2021180046A1 (zh) Image color retention method and device
CN113536866A (zh) Person tracking display method and electronic device
WO2023093169A1 (zh) Shooting method and electronic device
CN110290426B (zh) Method, apparatus, and device for displaying resource, and storage medium
WO2021103919A1 (zh) Composition recommendation method and electronic device
WO2022156473A1 (zh) Method for playing video and electronic device
WO2022228042A1 (zh) Display method, electronic device, storage medium, and program product
CN113538227A (zh) Image processing method based on semantic segmentation and related device
WO2022057384A1 (zh) Shooting method and apparatus
WO2024179101A1 (zh) Shooting method
WO2022143230A1 (zh) Method for determining tracking target and electronic device
WO2022247614A1 (zh) Multi-interface display method and electronic device
WO2022127609A1 (zh) Image processing method and electronic device
WO2024152676A1 (zh) Window management method and electronic device
WO2023231696A1 (zh) Shooting method and related device
WO2022253053A1 (zh) Method and apparatus for playing video
WO2024067129A1 (zh) System, playlist generation method, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21913971

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18259718

Country of ref document: US

ENP Entry into the national phase

Ref document number: 2021913971

Country of ref document: EP

Effective date: 20230704

NENP Non-entry into the national phase

Ref country code: DE