WO2021082883A1 - Subject detection method and apparatus, electronic device, and computer-readable storage medium - Google Patents

Subject detection method and apparatus, electronic device, and computer-readable storage medium

Info

Publication number
WO2021082883A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
subject
current frame
frame
frame image
Application number
PCT/CN2020/120116
Other languages
English (en)
French (fr)
Inventor
贾玉虎 (JIA Yuhu)
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to EP20881883.1A (published as EP4044579A4)
Publication of WO2021082883A1
Priority to US17/711,455 (published as US20220222830A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/61 Control of cameras or camera modules based on recognised objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/64 Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • H04N 23/67 Focus control based on electronic image sensor signals
    • H04N 23/675 Focus control based on electronic image sensor signals comprising setting of focusing regions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/80 Camera processing pipelines; Components thereof

Definitions

  • This application relates to the field of computer technology, and in particular to a subject detection method and apparatus, an electronic device, and a computer-readable storage medium.
  • Embodiments of the present application provide a subject detection method and apparatus, an electronic device, and a computer-readable storage medium, which can ensure the accuracy of subject detection.
  • A subject detection method includes: acquiring a current frame image, and detecting whether there is a focus frame generated by a user trigger in the current frame image; when the focus frame exists in the current frame image, determining the target subject of the current frame image according to the image in the focus frame; and, when there is no focus frame in the current frame image, acquiring the tracking frame of the previous frame image of the current frame image, determining a first center weight map according to the tracking frame of the previous frame image, and traversing the current frame image according to the first center weight map to obtain the target subject of the current frame image; wherein each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
  • A subject detection apparatus includes:
  • an acquisition module configured to acquire a current frame image and detect whether there is a focus frame generated by a user trigger in the current frame image;
  • a first determining module configured to determine the target subject of the current frame image according to the image in the focus frame when the focus frame exists in the current frame image; and
  • a second determining module configured to, when there is no focus frame in the current frame image, acquire the tracking frame of the previous frame image of the current frame image, determine a first center weight map according to the tracking frame of the previous frame image, and traverse the current frame image according to the first center weight map to obtain the target subject of the current frame image; wherein each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
  • An electronic device includes a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following operations: acquiring a current frame image, and detecting whether there is a focus frame generated by a user trigger in the current frame image; when the focus frame exists in the current frame image, determining the target subject of the current frame image according to the image in the focus frame; and, when there is no focus frame in the current frame image, acquiring the tracking frame of the previous frame image, determining a first center weight map according to the tracking frame of the previous frame image, and traversing the current frame image according to the first center weight map to obtain the target subject of the current frame image; wherein each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
  • A computer-readable storage medium stores a computer program that, when executed by a processor, implements the same operations: acquiring a current frame image and detecting whether there is a focus frame generated by a user trigger in it; when a focus frame exists, determining the target subject according to the image in the focus frame; and, when there is no focus frame, acquiring the tracking frame of the previous frame image, determining the first center weight map according to the tracking frame, and traversing the current frame image according to the first center weight map to obtain the target subject of the current frame image; wherein each pixel in the first center weight map has a corresponding weight value, the weight values gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
  • The subject detection method and apparatus, electronic device, and computer-readable storage medium described above acquire the current frame image and detect whether there is a focus frame generated by a user trigger in the current frame image; when the focus frame exists, the target subject of the current frame image is determined according to the image in the focus frame; when there is no focus frame, the tracking frame of the previous frame image is acquired, the first center weight map is determined according to the tracking frame of the previous frame image, and the current frame image is traversed according to the first center weight map to obtain the target subject of the current frame image; wherein each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image. In this way, the target subject in the current frame image can be identified more accurately.
  • FIG. 1 is a schematic diagram of an image processing circuit in an embodiment;
  • FIG. 2 is a flowchart of a subject detection method in an embodiment;
  • FIG. 3 is a flowchart of the operation of obtaining the target subject of the current frame image according to the first center weight map in an embodiment;
  • FIG. 5 is a flowchart of obtaining the target subject in the current frame image in an embodiment;
  • FIG. 6 is a flowchart of the operation of determining the target subject of the current frame image according to the image in the focus frame in an embodiment;
  • FIG. 7 is a schematic diagram of detecting a target subject through a subject segmentation network in an embodiment;
  • FIG. 8 is a flowchart of the operation of generating a target filter in an embodiment;
  • FIG. 9 is a schematic diagram of generating a target filter in another embodiment;
  • FIG. 10 is a schematic diagram of performing dot multiplication processing on the image in the focus frame and the second center weight map in an embodiment;
  • FIG. 11 is a schematic diagram of subject detection in another embodiment;
  • FIG. 12 is a structural block diagram of a subject detection apparatus in an embodiment;
  • FIG. 13 is a schematic diagram of the internal structure of an electronic device in an embodiment.
  • The terms "first", "second", and the like used in this application may be used herein to describe various elements, but these elements are not limited by these terms; these terms are only used to distinguish one element from another.
  • For example, the first subject region may be referred to as the second subject region, and similarly, the second subject region may be referred to as the first subject region. Both the first subject region and the second subject region are subject regions, but they are not the same subject region.
  • An embodiment of the present application provides an electronic device. The electronic device includes an image processing circuit, which can be implemented by hardware and/or software components and can include various processing units that define an ISP (Image Signal Processing) pipeline.
  • Fig. 1 is a schematic diagram of an image processing circuit in an embodiment. As shown in FIG. 1, for ease of description, only various aspects of the image processing technology related to the embodiments of the present application are shown.
  • the image processing circuit includes an ISP processor 140 and a control logic 150.
  • The image data captured by the imaging device 110 is first processed by the ISP processor 140, which analyzes the image data to capture image statistics that can be used to determine one or more control parameters of the imaging device 110.
  • the imaging device 110 may include a camera having one or more lenses 112 and an image sensor 114.
  • The image sensor 114 may include a color filter array (such as a Bayer filter). The image sensor 114 may obtain the light intensity and wavelength information captured by each imaging pixel and provide a set of raw image data that can be processed by the ISP processor 140.
  • The attitude sensor 120 (such as a three-axis gyroscope, a Hall sensor, or an accelerometer) can provide collected image processing parameters (such as anti-shake parameters) to the ISP processor 140 based on the interface type of the attitude sensor 120.
  • The interface of the attitude sensor 120 may be an SMIA (Standard Mobile Imaging Architecture) interface, another serial or parallel camera interface, or a combination of the foregoing interfaces.
  • In addition, the image sensor 114 may also send the original image data to the attitude sensor 120; the attitude sensor 120 can provide the original image data to the ISP processor 140 based on its interface type, or store the original image data in the image memory 130.
  • the ISP processor 140 processes the original image data pixel by pixel in a variety of formats.
  • each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 140 may perform one or more image processing operations on the original image data, and collect statistical information about the image data. Among them, the image processing operations can be performed with the same or different bit depth accuracy.
  • the ISP processor 140 may also receive image data from the image memory 130.
  • For example, the attitude sensor 120 interface sends the original image data to the image memory 130, and the original image data in the image memory 130 is then provided to the ISP processor 140 for processing.
  • the image memory 130 may be a part of a memory device, a storage device, or an independent dedicated memory in an electronic device, and may include DMA (Direct Memory Access) features.
  • the ISP processor 140 may perform one or more image processing operations, such as temporal filtering.
  • the processed image data can be sent to the image memory 130 for additional processing before being displayed.
  • The ISP processor 140 receives the processed data from the image memory 130 and performs image data processing on it in the raw domain and in the RGB and YCbCr color spaces.
  • the image data processed by the ISP processor 140 may be output to the display 160 for viewing by the user and/or further processed by a graphics engine or a GPU (Graphics Processing Unit, graphics processor).
  • the output of the ISP processor 140 can also be sent to the image memory 130, and the display 160 can read image data from the image memory 130.
  • the image memory 130 may be configured to implement one or more frame buffers.
  • The statistical data determined by the ISP processor 140 may be sent to the control logic 150. The statistical data may include image sensor 114 statistics such as gyroscope vibration frequency, automatic exposure, automatic white balance, automatic focus, flicker detection, black level compensation, and lens 112 shading correction.
  • The control logic 150 may include a processor and/or a microcontroller that executes one or more routines (such as firmware). The one or more routines can determine the control parameters of the imaging device 110 and the control parameters of the ISP processor 140 based on the received statistical data.
  • The control parameters of the imaging device 110 may include attitude sensor 120 control parameters (such as gain and integration time for exposure control, and anti-shake parameters), camera flash control parameters, camera anti-shake displacement parameters, lens 112 control parameters (such as focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (for example, during RGB processing), and lens 112 shading correction parameters.
  • the current frame image is acquired through the lens 112 and the image sensor 114 in the imaging device (camera) 110, and the current frame image is sent to the ISP processor 140.
  • the ISP processor 140 detects whether there is a focus frame triggered by the user in the current frame image.
  • When the ISP processor 140 detects that there is a focus frame in the current frame image, it determines the target subject of the current frame image according to the image in the focus frame.
  • When the ISP processor 140 detects that there is no focus frame in the current frame image, it acquires the tracking frame of the previous frame image, determines the first center weight map according to the tracking frame of the previous frame image, and traverses the current frame image according to the first center weight map to obtain the target subject of the current frame image; each pixel in the first center weight map has a corresponding weight value, and the weight values in the first center weight map gradually decrease from the center to the edge. In this way, a more accurate target subject is obtained and the accuracy of subject detection is improved.
  • After obtaining the target subject of the current frame image, the ISP processor sends the target subject to the control logic 150. After the control logic 150 obtains the target subject, it can control the lens 112 in the imaging device (camera) 110 to move and focus on the position corresponding to the target subject, so that the next frame image of the target subject is clearer, and the next frame image is sent to the ISP processor 140. After receiving the next frame image, the ISP processor 140 can take the current frame image as the previous frame image and the next frame image as the current frame image, and again detect whether there is a focus frame generated by a user trigger in the current frame image, determine the first center weight map according to the tracking frame, and traverse the current frame image according to the first center weight map to obtain the target subject of the current frame image, so that a clearer video of the target subject can be generated.
  • Fig. 2 is a flowchart of a subject detection method in an embodiment. As shown in Figure 2, the subject detection method includes:
  • Operation 202: Obtain a current frame image, and detect whether there is a focus frame generated by a user trigger in the current frame image.
  • the current frame image refers to the image acquired at the current moment.
  • the current frame image can be any of RGB (Red, Green, Blue) images, grayscale images, depth images, and images corresponding to the Y component in the YUV image.
  • the "Y” in the YUV image represents the brightness (Luminance or Luma), which is the grayscale value
  • the "U” and “V” represent the chrominance (Chrominance or Chroma), which is used to describe the color and saturation of the image Degree, used to specify the color of the pixel.
  • the ISP processor or central processing unit of the electronic device can obtain the current frame image, and can perform filtering processing on the current frame image to remove noise. Then, the ISP processor or the central processing unit can detect whether there is a focus frame on the current frame of the image, and the focus frame is generated in response to a user's trigger instruction.
  • the focus frame is the area where the target subject selected by the user is located.
  • The ISP processor or central processing unit can detect whether a user's trigger instruction is received on the current frame image. When a trigger instruction is received, a corresponding focus frame is generated at the trigger position, and the ISP processor or central processing unit detects that there is a focus frame in the current frame image. When no trigger instruction is received, there is no focus frame generated by a user trigger in the current frame image.
  • After the current frame image is acquired, it can be scaled to a smaller size (such as 224×224, or another size).
  • When the ISP processor or central processing unit detects that there is a focus frame generated by a user trigger in the current frame image, the focus frame can be expanded, and the expanded focus frame can be cropped to obtain the image in the expanded focus frame. The image in the focus frame is input into the subject segmentation network to obtain the target subject in the current frame image.
  • Operation 206: When there is no focus frame in the current frame image, acquire the tracking frame of the previous frame image of the current frame image, determine the first center weight map according to the tracking frame of the previous frame image, and traverse the current frame image according to the first center weight map to obtain the target subject of the current frame image.
  • The previous frame image refers to the image adjacent to the current frame image and acquired at the previous moment. The previous frame image can be any of an RGB (Red, Green, Blue) image, a grayscale image, a depth image, or the image corresponding to the Y component of a YUV image.
  • the first central weight map refers to a map used to record the weight value of each pixel in the image.
  • the weight value recorded in the first center weight map gradually decreases from the center to the four sides, that is, the center weight is the largest, and the weight gradually decreases toward the four sides.
  • the first central weight map is used to characterize the weight value of the image central pixel to the edge pixel of the image gradually decreasing.
  • the tracking frame refers to the area where the target subject is located in the image.
  • The ISP processor or central processing unit can obtain the previous frame image of the current frame image, obtain the tracking frame in the previous frame image, and generate the corresponding first center weight map according to the size of the tracking frame in the previous frame image. Each pixel in the first center weight map has a corresponding weight value, and the weight values in the first center weight map gradually decrease from the center to the four sides.
  • the first center weight map can be generated using a Gaussian function, a first-order equation, or a second-order equation.
  • the Gaussian function may be a two-dimensional Gaussian function.
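  • For illustration only (this sketch is not part of the patent text), such a map can be built with a two-dimensional Gaussian; the function name and the sigma_scale parameter are our own assumptions:

```python
import numpy as np

def center_weight_map(h: int, w: int, sigma_scale: float = 0.5) -> np.ndarray:
    """Sketch of a first center weight map sized to the tracking frame:
    the weight peaks at the center and decreases toward the edges."""
    ys = np.arange(h) - (h - 1) / 2.0
    xs = np.arange(w) - (w - 1) / 2.0
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    sigma_y, sigma_x = sigma_scale * h, sigma_scale * w
    weights = np.exp(-(yy**2 / (2 * sigma_y**2) + xx**2 / (2 * sigma_x**2)))
    return weights / weights.max()  # normalize so the center weight is 1
```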
  • the ISP processor or the central processing unit traverses the current frame of the image according to the first central weight map to obtain multiple subject areas, from which the area where the target subject is located can be determined. Obtain the area where the target subject is located, and input the area into the subject segmentation network to obtain the target subject in the current frame image.
  • both the current frame image and the previous frame image can be captured by an electronic device.
  • The electronic device can be equipped with one or more cameras, for example 1, 2, 3, or 5 cameras; the number is not limited here.
  • the form of the camera installed in the electronic device is not limited. For example, it can be a camera built into the electronic device or a camera external to the electronic device; it can be a front camera or a rear camera.
  • the current frame image and the previous frame image may be captured by the same camera in the electronic device, or may be captured by different cameras, and it is not limited to this.
  • the camera on the electronic device can be any type of camera.
  • For example, the camera may be a color camera, a black-and-white camera, a depth camera, a telephoto camera, a wide-angle camera, etc., but is not limited thereto.
  • A color image, that is, an RGB image, is acquired through a color camera; a grayscale image through a black-and-white camera; a depth image through a depth camera; a telephoto image through a telephoto camera; and a wide-angle image through a wide-angle camera.
  • The cameras in the electronic device may be of the same type or of different types. For example, all may be color cameras, or all may be black-and-white cameras; one of the cameras may be a telephoto camera and the others wide-angle cameras, but it is not limited to this.
  • the electronic device may store each captured image in a first-in first-out queue according to the sequence of the camera shooting time, and obtain the current frame image and the previous frame image from the first-in first-out queue.
  • the first-in-first-out queue means that the images stored first are taken out first.
  • the electronic device first obtains the previous frame image from the first-in first-out queue, and then obtains the current frame image from the first-in first-out queue.
  • the current shooting moment and the last shooting moment are obtained; the current frame image is obtained according to the current shooting moment; the last frame image is obtained according to the last shooting moment.
  • the electronic device obtains the current shooting time and can obtain the shooting frequency; and obtains the previous shooting time according to the current shooting time and the shooting frequency.
  • For example, if the current shooting time is 15:45:56.200 and the shooting frequency is 10 frames/s, that is, one frame is captured every 100 ms, then the previous shooting time is 15:45:56.100.
  • The current frame image is obtained according to the current shooting moment, and the previous frame image is obtained according to the previous shooting moment.
  • In an embodiment, the current frame image and the previous frame image can be down-sampled to obtain smaller versions of both, thereby reducing the amount of computation.
  • Both the current frame image and the previous frame image can be filtered to remove the high-frequency noise carried by complex backgrounds with rich texture details, as well as the high-frequency noise introduced by down-sampling, so that more accurate current and previous frame images are obtained and false detections in subject detection are prevented.
  • the filtering processing may be at least one of Gaussian filtering processing, smoothing filtering processing, bilateral filtering processing, and the like.
  • Down-sampling refers to sampling once at intervals of several pixels in the image to obtain a new image.
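  • A minimal sketch of this pre-processing, assuming Gaussian filtering and a sampling interval of 2 pixels (both values are illustrative, not specified by the patent):

```python
import cv2
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Filter out high-frequency noise, then down-sample by keeping one
    pixel out of every 2 in each direction."""
    smoothed = cv2.GaussianBlur(frame, (5, 5), sigmaX=1.0)
    return smoothed[::2, ::2]
```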
  • The subject detection method in this embodiment acquires the current frame image and detects whether there is a focus frame generated by a user trigger in the current frame image. When there is a focus frame in the current frame image, the target subject of the current frame image is determined according to the image in the focus frame, so the target subject can be determined quickly and conveniently. When there is no focus frame in the current frame image, the tracking frame of the previous frame image is acquired, the first center weight map is determined according to the tracking frame of the previous frame image, and the current frame image is traversed according to the first center weight map to obtain the target subject of the current frame image; each pixel in the first center weight map has a corresponding weight value, and the weight values in the first center weight map gradually decrease from the center to the edge, so the target subject in the current frame image can be identified more accurately.
  • traversing the current frame image according to the first central weight map to obtain the target subject of the current frame image includes:
  • the candidate frame is an image area where the target subject may exist in the current frame image.
  • the ISP processor or the central processing unit can use the first central weight map to slide on the current frame image, and each slide can obtain a candidate frame.
  • the size of the candidate frame is the same as the size of the central weight map.
  • Operation 304: Obtain the first image feature in each candidate frame, and perform convolution processing on the first image feature in each candidate frame and the target filter to obtain a response value corresponding to each candidate frame.
  • the target filter refers to a trained filter, and further, the target filter refers to a filter obtained after updating according to the main body area of the previous frame of image.
  • the response value can be a regression response value.
  • the first image feature refers to the image feature obtained after dot product processing.
  • the ISP processor or the central processor obtains the target filter, and obtains the first image feature corresponding to each candidate frame.
  • the first image feature in each candidate frame is convolved with the target filter to obtain the response value corresponding to each candidate frame.
  • For example, when the first center weight map slides on the current frame image and the first candidate frame is obtained, the first image feature in the first candidate frame is extracted and convolved with the target filter to obtain the response value corresponding to the first candidate frame. By analogy, each time a candidate frame is obtained, the first image feature in that candidate frame is convolved with the target filter to obtain the corresponding response value. By the time the first center weight map has traversed the current frame image, the response value corresponding to the last candidate frame has been obtained, which speeds up the convolution processing.
  • The target subject of the current frame image is determined according to the candidate frame corresponding to the maximum response value. Specifically, the ISP processor or central processing unit obtains the response value corresponding to each candidate frame and determines the maximum response value; it then obtains the candidate frame corresponding to the maximum response value, expands that candidate frame, and crops the expanded candidate frame to obtain the subject region of the candidate frame.
  • the subject area in the candidate frame is input into the subject segmentation network to obtain the target subject in the current frame image.
  • In this embodiment, the current frame image is traversed through the first center weight map to obtain the candidate frames; the first image feature in each candidate frame is convolved with the target filter to obtain the response value corresponding to each candidate frame; and the target subject of the current frame image is determined according to the candidate frame corresponding to the maximum response value. The candidate frame with the largest response value is the one most likely to contain the target subject, so the target subject in the image can be identified accurately (see the sketch below).
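  • The following sketch, assuming grayscale input and a hypothetical stride, illustrates this traversal: slide the first center weight map over the current frame image, form the first image feature in each candidate frame, correlate it with the target filter, and keep the candidate frame with the maximum response:

```python
import numpy as np

def find_best_candidate(frame, weight_map, target_filter, stride=4):
    """Sketch of the traversal described above; stride and the single-offset
    correlation form are assumptions, not specified by the patent."""
    fh, fw = weight_map.shape
    best_response, best_box = -np.inf, None
    for y in range(0, frame.shape[0] - fh + 1, stride):
        for x in range(0, frame.shape[1] - fw + 1, stride):
            window = frame[y:y + fh, x:x + fw]      # candidate frame
            feature = window * weight_map           # first image feature
            response = float(np.sum(feature * target_filter))
            if response > best_response:
                best_response, best_box = response, (x, y, fw, fh)
    return best_box, best_response
```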
  • In an embodiment, obtaining the first image feature in each candidate frame includes: obtaining the second image feature in each candidate frame, where the second image feature is the pixel value of each pixel in the candidate frame obtained when the first center weight map slides on the current frame image; and, for each candidate frame, dot-multiplying the pixel value of each pixel in the candidate frame with the weight value of the corresponding pixel in the first center weight map to obtain the first image feature in that candidate frame.
  • the ISP processor or the central processing unit slides on the current frame of the image through the first central weight map, and each slide can obtain a candidate frame, and obtain the image feature of the candidate frame, that is, the second image feature.
  • the size of each second image feature is the same as the size of the first central weight map.
  • the second image feature is the pixel value of each pixel in the candidate frame obtained when the first central weight map slides on the current frame of image.
  • The pixel value of each pixel constituting the second image feature is dot-multiplied with the weight value of the corresponding pixel in the first center weight map to obtain the first image feature in a candidate frame; that is, the first image feature is the image feature obtained after the second image feature undergoes dot multiplication. The size of the candidate frame is the same as the size of the first center weight map.
  • In this embodiment, the second image feature is the pixel value of each pixel in the candidate frame obtained when the first center weight map slides on the current frame image; for each candidate frame, the pixel value of each pixel in the candidate frame is dot-multiplied with the weight value of the corresponding pixel in the first center weight map to obtain the first image feature in that candidate frame. This highlights the central area of the image and makes it easier to identify the target subject.
  • In an embodiment, dot-multiplying the pixel value of each pixel in the candidate frame with the weight value of the corresponding pixel in the first center weight map includes: performing a logarithmic operation on the pixel value of each pixel in the candidate frame; and dot-multiplying the pixel value of each pixel after the logarithmic operation with the weight value of the corresponding pixel in the first center weight map.
  • the logarithmic operation refers to taking the logarithm of the pixel value of the pixel.
  • the ISP processor or the central processor slides on the current frame image through the first central weight map, and each slide can obtain the second image feature in a candidate frame.
  • It can be understood that each pixel in the candidate frame corresponds one-to-one to a pixel in the first center weight map.
  • the pixel value of each pixel can be normalized to reduce the amount of calculation. Then, the pixel value of each pixel point after the normalization processing is respectively subjected to logarithmic operation processing.
  • the pixel value after logarithmic operation processing can be normalized.
  • Performing the logarithmic operation on the pixel value of each pixel in the candidate frame can reduce the interference of high-contrast areas with target tracking. Dot-multiplying the pixel value of each pixel after the logarithmic operation with the weight value of the corresponding pixel in the first center weight map highlights the central area of the image and makes it easier to identify the target subject, as in the sketch below.
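  • A sketch of this per-candidate preprocessing; the normalization steps and the eps constant are assumptions added for numerical stability, not prescribed by the patent:

```python
import numpy as np

def first_image_feature(window, weight_map, eps=1e-6):
    norm = window.astype(np.float64) / 255.0   # normalize pixel values
    logged = np.log(norm + eps)                # logarithmic operation
    # rescale to [0, 1] before weighting (an assumption)
    logged = (logged - logged.min()) / (logged.max() - logged.min() + eps)
    return logged * weight_map                 # element-wise dot multiplication
```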
  • In an embodiment, determining the target subject of the current frame image according to the candidate frame corresponding to the maximum response value includes: taking the candidate frame corresponding to the maximum response value as the first subject region of the current frame image; and inputting the first subject region of the current frame image into the subject segmentation network to obtain the target subject in the current frame image.
  • the ISP processor or the central processing unit may determine the maximum value of the response values, and determine the candidate frame corresponding to the maximum value.
  • the image in the candidate frame corresponding to the maximum value is used as the first main body region of the current frame image.
  • the first subject area is cropped from the current frame of image, and the cropped first subject area is enlarged to a preset size.
  • the first subject area enlarged to a preset size is input into the subject segmentation network to obtain the target subject in the current frame image.
  • the ISP processor or the central processing unit can obtain the subject segmented image of the previous frame of image.
  • the subject segmented image may be a binarized image of subject segmentation.
  • By adding the subject position information of the previous frame image, the accuracy of subject detection can be improved.
  • the candidate frame corresponding to the maximum value in the response value is used as the first subject area of the current frame image; the first subject area of the current frame image is input into the subject segmentation network to obtain the target subject in the current frame image.
  • the area where the target subject is located is determined by the response value, and the area containing the target subject is input into the subject segmentation network to quickly detect the target subject in the current frame of image.
  • In an embodiment, taking the candidate frame corresponding to the maximum response value as the first subject region of the current frame image includes:
  • the height and width of the candidate frame corresponding to the maximum value in the response value are determined.
  • the ISP processor or the central processing unit may determine the maximum value of the response values, determine the candidate frame corresponding to the maximum value, and then determine the height value and the width value of the candidate frame.
  • The height of the candidate frame is increased by a preset height, and the width of the candidate frame is increased by a preset width.
  • the preset height refers to any one of a preset height value and a height ratio.
  • the preset width refers to any one of a preset width value and a width ratio.
  • the ISP processor or the central processing unit obtains the preset height value, adds the preset height value to the height value of the candidate frame; obtains the preset width value, adds the preset width value to the width value of the candidate frame, Get the candidate frame after increasing the preset height value and preset width value.
  • Alternatively, the ISP processor of the electronic device may obtain a preset height ratio and increase the height value of the candidate frame of the current frame image by that ratio, and obtain a preset width ratio and increase the width value of the candidate frame by that ratio, to obtain the candidate frame after the preset height ratio and preset width ratio are added.
  • For example, if the height and width of the candidate frame corresponding to the maximum value are h and w respectively, the height of the candidate frame is increased by h/4 and the width by w/4.
  • the position of the candidate frame after the expansion is recorded as the position of the first subject area of the current frame image.
  • the candidate frame obtained by increasing the preset height and the preset width is used as the first main body region of the current frame image.
  • the ISP processor or the central processing unit of the electronic device uses the candidate frame obtained by adding the preset height value and the preset width value as the first main body area of the current frame image.
  • the ISP processor or the central processing unit of the electronic device may use the candidate frame obtained by increasing the preset height ratio and the preset width ratio as the first main body area of the current frame image.
  • The subject detection method in this embodiment determines the height and width of the candidate frame corresponding to the maximum response value, increases the height of the candidate frame by a preset height and the width by a preset width, and uses the resulting candidate frame as the first subject region of the current frame image. This can accurately determine the area where the complete target subject is located and avoids losing part of the detected target subject because the candidate frame is too small.
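  • A sketch of the expansion, with the h/4 and w/4 example expressed as ratios of 0.25 and the box kept centered (the centering is our assumption):

```python
def expand_box(x, y, w, h, dw_ratio=0.25, dh_ratio=0.25):
    """Grow the candidate frame by a preset width and height so the
    complete target subject is covered."""
    dw, dh = w * dw_ratio, h * dh_ratio
    return (x - dw / 2, y - dh / 2, w + dw, h + dh)
```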
  • In other embodiments, the subject region in the current frame image can also be determined through the KCF (Kernelized Correlation Filter) algorithm, the MedianFlow bidirectional optical flow tracking algorithm, and the like.
  • inputting the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image includes:
  • the moving subject refers to the subject in motion.
  • the ISP processor or the central processor performs background subtraction processing on the current frame image to obtain a binary image corresponding to the current frame image.
  • the connected domain processing is performed on the binary image to obtain the area of each candidate subject in the binary image.
  • When there is a candidate subject whose area is greater than or equal to the area threshold, it is determined that there is a moving subject in the current frame image; when the area of every candidate subject is less than the area threshold, it is determined that there is no moving subject in the current frame image.
  • the connected domain generally refers to an image area composed of foreground pixels that have the same pixel value and are located adjacent to each other, and the foreground pixels refer to the subject pixels.
  • Connected region processing refers to finding and marking each connected region in the image.
  • the ISP processor or central processing unit of the electronic device can detect and mark each connected domain in the binary image. Each connected domain can be used as a candidate subject. Next, determine the area of each candidate subject in the binary image.
  • When the area of a candidate subject is larger, it means that the candidate subject is closer to the camera, and an object closer to the camera is more likely the subject the user wants to photograph. Therefore, when the area of every candidate subject is smaller than the area threshold, the areas of the candidate subjects in the current frame image are all small, and it can be considered that none of them is a moving subject or a subject that the user wants to photograph. When there is a candidate subject whose area is greater than or equal to the area threshold, that candidate subject can be considered a moving subject and the subject that the user wants to photograph.
  • When the area of every candidate subject is less than the area threshold, it is determined that there is no moving subject in the current frame image.
  • In other embodiments, whether a candidate subject is moving can also be judged from the sharpness of its contour edge: when the sharpness of the contour edge is high, the candidate subject can be regarded as a stationary object, that is, there is no moving subject; when the sharpness of the contour edge is lower than or equal to a sharpness threshold, the candidate subject can be considered a moving subject.
  • Whether there is a moving subject among the candidate subjects can be determined in various ways; this application is not limited to the above methods. A sketch of the area-based check follows.
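  • In this sketch, the MOG2 background subtractor and the threshold values are assumptions, not prescribed by the patent:

```python
import cv2
import numpy as np

def has_moving_subject(frame_gray, back_sub, area_threshold=500):
    fg = back_sub.apply(frame_gray)                 # background subtraction
    _, binary = cv2.threshold(fg, 127, 255, cv2.THRESH_BINARY)
    # connected-domain processing: each connected domain is a candidate subject
    num, _, stats, _ = cv2.connectedComponentsWithStats(binary)
    areas = stats[1:, cv2.CC_STAT_AREA]             # label 0 is the background
    return bool(np.any(areas >= area_threshold))

# back_sub = cv2.createBackgroundSubtractorMOG2()   # created once, reused per frame
```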
  • the second main body area may be a rectangular area containing the moving main body, may also be a circular area containing the moving main body, or may be an irregularly shaped area containing the moving main body, and is not limited to this.
  • When the ISP processor or central processing unit of the electronic device detects, according to the binary image, that there is a moving subject in the current frame image, the second subject region containing the moving subject in the current frame image can be acquired.
  • the first subject region and the second subject region are fused, and the subject region obtained after the fusion processing is input into the subject segmentation network to obtain the target subject in the current frame image.
  • The target subject output by the subject segmentation network may be the same as the candidate subject before input, or it may be different from it.
  • Fusion processing can be AND processing, which refers to a logical operation. For example, 0 and 1 are ANDed to get 0, 1 and 0 are ANDed to get 0, 1 and 1 are ANDed to get 1.
  • The ISP processor or central processing unit performs AND processing on the first subject region and the second subject region; that is, the value of each pixel in the first subject region is ANDed with the value of the corresponding pixel in the second subject region to obtain the fused subject region. The subject region obtained after the fusion processing is then input into the subject segmentation network to obtain the target subject in the current frame image.
  • In this embodiment, when there is a moving subject in the current frame image, the second subject region containing the moving subject is acquired, the first subject region and the second subject region are fused, and the fused subject region is input into the subject segmentation network to obtain the target subject in the current frame image, so that a more accurate target subject can be obtained.
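  • A sketch of the fusion, representing both subject regions as binary masks of the same size and ANDing them pixel by pixel (0 AND 1 = 0, 1 AND 1 = 1):

```python
import numpy as np

def fuse_regions(mask_first: np.ndarray, mask_second: np.ndarray) -> np.ndarray:
    """Fused subject region to be fed to the subject segmentation network."""
    return np.logical_and(mask_first > 0, mask_second > 0).astype(np.uint8)
```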
  • the method further includes: updating the target filter according to the first subject area of the current frame image.
  • The ISP processor or central processing unit obtains the first subject region of the current frame image, inputs the first subject region into the target filter, and trains the target filter with the first subject region of the current frame image, thereby updating the target filter.
  • Updating the target filter with the first subject region of the current frame image can reduce the interference with subject detection caused by changes in illumination, posture, and scale.
  • To reduce the amount of computation for the convolution, the first subject region and the target filter may be transformed from the time domain to the frequency domain, where the time-domain convolution becomes a frequency-domain product, as sketched below.
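  • A sketch of this frequency-domain shortcut; using the conjugate makes it a correlation over all shifts, as is common in correlation-filter trackers (an assumption about the exact form):

```python
import numpy as np

def frequency_domain_response(feature: np.ndarray, target_filter: np.ndarray) -> np.ndarray:
    F = np.fft.fft2(feature)
    H = np.fft.fft2(target_filter, s=feature.shape)
    # element-wise product in the frequency domain replaces spatial convolution
    return np.real(np.fft.ifft2(F * np.conj(H)))
```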
  • determining the target subject of the current frame image according to the image in the focus frame includes:
  • When the ISP processor or central processing unit detects that there is a focus frame generated by a user trigger in the current frame image, the height value and width value of the focus frame can be acquired.
  • The height of the focus frame is increased by a preset height, and the width of the focus frame is increased by a preset width.
  • The ISP processor or central processing unit obtains a preset height value and adds it to the height value of the focus frame, and obtains a preset width value and adds it to the width value of the focus frame, to obtain the focus frame after the preset height value and preset width value are added.
  • Alternatively, the ISP processor of the electronic device can obtain a preset height ratio and increase the height value of the focus frame of the current frame image by that ratio, and obtain a preset width ratio and increase the width value of the focus frame by that ratio, to obtain the focus frame after the preset height ratio and preset width ratio are added.
  • For example, if the height and width of the focus frame are h and w respectively, the height of the focus frame is increased by h/4 and the width by w/4, and the position of the expanded focus frame is recorded as the position of the first subject region of the current frame image.
  • the focus frame obtained by increasing the preset height and the preset width is used as the first subject area of the current frame of image.
  • the ISP processor or the central processing unit of the electronic device uses the focus frame obtained by adding the preset height value and the preset width value as the first subject area of the current frame image.
  • the ISP processor or the central processing unit of the electronic device may use the focus frame obtained by increasing the preset height ratio and the preset width ratio as the first main body area of the current frame image.
  • the increased preset height and preset width of the focus frame may be the same as or different from the increased preset height and preset width of the candidate frame.
  • the first subject region of the current frame image is input into the subject segmentation network to obtain the target subject in the current frame image.
  • the first subject area is cropped from the current frame of image, and the cropped first subject area is enlarged to a preset size. Then, the first subject area enlarged to a preset size is input into the subject segmentation network to obtain the target subject in the current frame image.
  • the ISP processor or the central processing unit can obtain the subject segmented image of the previous frame of image.
  • the subject segmentation image may be a binary image containing the target subject, which is output after the subject region of the previous frame of image passes through the subject segmentation network. Input the subject segmentation image of the previous frame image and the first subject area of the preset size into the subject segmentation network to obtain the target subject in the current frame image.
  • In the subject detection method of this embodiment, when there is a focus frame in the current frame image, the height and width of the focus frame are determined, the height of the focus frame is increased by a preset height and the width by a preset width, and the resulting focus frame is used as the first subject region of the current frame image. This can accurately determine the area where the complete target subject is located and avoids losing part of the detected target subject when the target subject moves while the size of the focus frame remains unchanged.
  • In an embodiment, inputting the first subject region of the current frame image into the subject segmentation network to obtain the target subject in the current frame image includes: acquiring the subject segmentation image of the previous frame image, and inputting the subject segmentation image of the previous frame image and the first subject region of the current frame image into the subject segmentation network to obtain the target subject in the current frame image.
  • The subject segmentation network may use DeepLabv3+, U-Net, or other network architectures.
  • The ISP processor or central processing unit can obtain the subject segmentation image of the previous frame image and input it, together with the first subject region of the current frame image, into the subject segmentation network; after multi-layer convolution, the binary image of the first subject region is output, from which the target subject in the current frame image is obtained.
  • A traditional network input is an RGB three-channel image for segmentation prediction, whereas the subject segmentation network in this embodiment adds a channel: the subject segmentation binary image of the previous frame image. This binary image brings the subject position information of the previous frame into the network, which can improve the segmentation effect of the network in video scenes; a sketch of the stacked input follows.
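  • The shapes here are illustrative (H x W x 3 RGB plus the previous frame's binary mask as a fourth channel):

```python
import numpy as np

def build_network_input(subject_region_rgb: np.ndarray,
                        prev_mask: np.ndarray) -> np.ndarray:
    """Stack the previous frame's subject-segmentation binary image as an
    extra channel so the network sees the previous subject position."""
    extra = prev_mask.astype(subject_region_rgb.dtype)[..., None]
    return np.concatenate([subject_region_rgb, extra], axis=-1)  # H x W x 4
```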
  • In another embodiment, inputting the first subject region of the current frame image into the subject segmentation network to obtain the target subject in the current frame image includes: acquiring the subject segmentation image of the previous frame image, and inputting the subject segmentation image of the previous frame image and the first subject region of the current frame image into the subject segmentation network to obtain the subject segmentation image of the current frame image; determining the proportion of the subject segmentation image of the current frame image in the current frame image; and, when that proportion is less than a proportion threshold, inputting the whole current frame image into the subject segmentation network to obtain the target subject in the current frame image.
  • The ISP processor or central processing unit inputs the subject segmentation image of the previous frame image and the first subject region of the current frame image into the subject segmentation network; after multi-layer convolution, the binary image of the first subject region is output, and this binary image is the subject segmentation image of the current frame image. Next, the proportion of the subject segmentation image in the current frame image is calculated, the proportion threshold is acquired, and it is determined whether the calculated proportion is greater than the proportion threshold.
  • When the proportion of the subject segmentation image of the current frame image in the current frame image is less than the proportion threshold, it means that the target subject has left the current scene, that is, there is no target subject in the subject segmentation image, or only part of the target subject is present and the target subject is incomplete, so the target subject in the current frame image needs to be re-detected; the current frame image is then input into the subject segmentation network to obtain the target subject in the current frame image.
  • the subject segmentation image of the previous frame image can be obtained, and the subject segmentation image of the previous frame image and the current frame image are input to the subject segmentation network to obtain the target subject in the current frame image.
  • In this embodiment, the subject segmentation image of the previous frame image is obtained; the subject segmentation image of the previous frame image and the first subject region of the current frame image are input into the subject segmentation network to obtain the subject segmentation image of the current frame image; and the proportion of the subject segmentation image of the current frame image in the current frame image is determined. By checking whether this proportion is greater than the proportion threshold, it can be determined whether there is a target subject in the current detection result and whether the detected target subject is complete. When the proportion is less than the proportion threshold, the detection is inaccurate or the detected target subject is incomplete, and the current frame image is input into the subject segmentation network so that the target subject of the current frame image can be obtained. A sketch of the proportion check follows.
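  • In this sketch the 0.02 threshold is an illustrative assumption, not a value given by the patent:

```python
import numpy as np

def needs_redetection(seg_mask: np.ndarray, ratio_threshold: float = 0.02) -> bool:
    """True when the subject segmentation image occupies too small a
    proportion of the frame, i.e. the subject left or is incomplete."""
    ratio = float(np.count_nonzero(seg_mask)) / seg_mask.size
    return ratio < ratio_threshold
```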
  • the method for generating the target filter includes:
  • Operation 802: Obtain a second center weight map corresponding to the focus frame, where each pixel in the second center weight map has a corresponding weight value, and the weight values in the second center weight map gradually decrease from the center to the edge.
  • the second central weight map refers to a map used to record the weight value of each pixel in the image.
  • the weight value recorded in the second center weight map gradually decreases from the center to the four sides, that is, the center weight is the largest, and the weight gradually decreases toward the four sides.
  • the second central weight map is used to characterize the weight value of the image central pixel to the edge pixel of the image gradually decreasing.
  • the second central weight map may be the same as or different from the first central weight map, that is, the size of the second central weight map and the size of the first central weight map can be the same or different, and are set according to specific conditions.
  • the ISP processor or the central processing unit may generate the corresponding second center weight map according to the size of the image in the focus frame.
  • the second center weight map can be generated using a Gaussian function, or a first-order equation, or a second-order equation.
  • the Gaussian function may be a two-dimensional Gaussian function.
  • the first center weight map can be directly obtained as the center weight map corresponding to the focus frame.
  • Operation 804: The pixel value of each pixel of the image in the focus frame is dot-multiplied with the weight value of the corresponding pixel in the second center weight map. The ISP processor or central processing unit can obtain each pixel of the image in the focus frame and its pixel value, match each pixel in the focus frame with the corresponding pixel in the second center weight map, and then dot-multiply the pixel value of each matched pixel by the weight value of the corresponding pixel in the second center weight map.
  • Operation 806: Perform affine transformation on the image in the focus frame after the dot multiplication processing to obtain images in a preset number of transformation frames.
  • An affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates that maintains the "straightness" of two-dimensional graphics (straight lines remain straight after the transformation and arcs remain arcs) and their "parallelism" (the relative positional relationships between parts of the graphic remain unchanged: parallel lines remain parallel, and the intersection angles of intersecting lines are preserved). An affine transformation allows a graphic to be tilted arbitrarily and to be scaled arbitrarily in two directions.
  • the ISP processor or the central processing unit performs dot multiplication processing on the image in the focus frame and the successfully matched pixels in the second center weight map, and then affines the image in the focus frame after the dot multiplication process. Transform to obtain a preset number of images in the transform frame.
  • the affine transformation can be realized by the combination of a series of atomic transformations, including but not limited to Translation, Scale, Flip, Rotation and Shear. And so on.
  • the initial filter is trained according to the preset number of images in the transform frame, and when the preset condition is met, the target filter is obtained.
  • the initial filter refers to the filter to be trained.
  • The ISP processor or the central processing unit convolves the images in the preset number of transform frames with the initial filter to obtain the response value corresponding to each transform frame, and obtains a target subject according to the image in the transform frame corresponding to the maximum response value.
  • The marked target subject in the focus frame can then be obtained.
  • the marked target subject in the focus frame is compared with the target subject obtained from the image in the transform frame corresponding to the maximum response value, and the parameters of the initial filter are adjusted according to the difference between the two.
  • After the parameters are adjusted, training is performed again, and training stops when the difference between the two is less than the difference threshold. The filter parameters at that point are taken as the target filter. Therefore, the preset condition may be that the difference between the target subject obtained according to the image in the transform frame corresponding to the maximum response value and the marked target subject in the focus frame is less than the difference threshold.
  • In this way, a second center weight map corresponding to the focus frame is acquired, where each pixel in the second center weight map has a corresponding weight value and the weight values gradually decrease from center to edge; the image in the focus frame is dot-multiplied with the second center weight map; the dot-multiplied image in the focus frame is affine-transformed to obtain images in a preset number of transform frames; and the initial filter is trained according to those images until the preset condition is met, so that a trained target filter can be obtained.
  • FIG. 9 is a schematic diagram of generating a target filter in an embodiment: the image in the focus frame is acquired; the focus frame and the image in it are affine-transformed to obtain multiple transform frames; the initial filter is trained according to the images in the multiple transform frames; and the target filter is obtained when the preset condition is met.
  • Dot-multiplying the image in the focus frame with the second center weight map includes: taking the logarithm of the pixel value of each pixel of the image in the focus frame; and dot-multiplying the logarithm-processed pixel value of each pixel with the weight value of the corresponding pixel in the second center weight map.
  • The ISP processor or the central processing unit can obtain the pixel value of each pixel of the image in the focus frame and take the logarithm of each pixel value. Then, the logarithm of the pixel value of each pixel in the focus frame is dot-multiplied with the weight value of the corresponding pixel in the second center weight map to obtain the dot-multiplied image feature in the focus frame. It can be understood that the pixels in the focus frame correspond one-to-one to the pixels in the second center weight map.
  • the pixel value of each pixel may be normalized to reduce the amount of calculation. Then, the pixel value of each pixel point after the normalization processing is respectively subjected to logarithmic operation processing.
  • Alternatively, the logarithm of the pixel value of each pixel of the image in the focus frame is taken first, and the logarithms are then normalized.
  • Each pixel in the normalized focus frame is dot-multiplied with the corresponding pixel in the second center weight map. After the dot multiplication, the pixel values at the edge of the image in the focus frame approach 0, which enhances attention on the central target area.
  • Taking the logarithm of the pixel value of each pixel of the image in the focus frame and dot-multiplying the logarithm-processed pixel values with the weight values of the corresponding pixels in the second center weight map can reduce interference, highlight the central area of the image in the focus frame, and make it easier to identify the target subject.
  • FIG. 10 is a schematic diagram of dot-multiplying the image in the focus frame with the second center weight map in an embodiment.
  • the pixel value of each pixel of the image in the focus frame is acquired, the logarithm calculation of each pixel value is performed, and then the logarithm of each pixel is normalized.
  • Each pixel in the focus frame after the normalization processing is subjected to dot multiplication processing with the corresponding pixel in the second center weight map.
  • As shown in FIG. 11, the current frame image 1102 is acquired, and operation 1106 is performed by the focusing module 1104; that is, the focusing module detects whether a focus frame generated by a user trigger exists in the current frame image. If so, operation 1108 is performed to enlarge the focus frame and crop out the target subject area.
  • The target subject area is input into the subject segmentation network 1110 to obtain the target subject and the subject segmentation image in the current frame image.
  • Operation 1112 is performed to map the subject segmentation image back to the current frame image and determine the proportion of the subject segmentation image of the current frame image to the current frame image; when this proportion is less than the proportion threshold, the current frame image is input into the subject segmentation network to obtain the target subject in the current frame image.
  • Operation 1114 is performed to focus on the target subject, so that a clearer next frame image of the target subject can be acquired; the current frame image is taken as the previous frame image, the next frame image as the current frame image, and the detection process is executed again.
  • When no focus frame exists in the current frame image, operation 1116 is performed to obtain the tracking frame of the previous frame image, determine the first center weight map according to the tracking frame of the previous frame image, and traverse the current frame image according to the first center weight map to obtain the first subject area 1118 of the current frame image.
  • Operation 1120 is performed to preprocess the current frame image, for example with Gaussian filtering, which can eliminate the high-frequency noise caused by a complex background with rich texture detail in the image as well as the high-frequency noise caused by image downsampling.
  • Operation 1122 is performed to detect whether a moving subject exists in the filtered current frame image; if so, the second subject area 1124 containing the moving subject is obtained, and operation 1126 is performed to fuse the first subject area 1118 and the second subject area 1124. The fused area is then cropped and enlarged into the target subject area, which is input into the subject segmentation network 1110 to obtain the target subject and the subject segmentation image in the current frame image.
  • In this way, when a focus frame exists in the current frame image, the target subject of the current frame image is determined according to the image in the focus frame; when no focus frame exists, the tracking frame of the previous frame image is obtained, the first center weight map is determined according to it, and the current frame image is traversed according to the first center weight map to obtain the target subject of the current frame image, thereby obtaining a clearer target video of the target subject.
  • FIG. 12 is a structural block diagram of a subject detection device of an embodiment. As shown in FIG. 12, the subject detection device includes an acquiring module 1202, a first determining module 1204, and a second determining module 1206.
  • the acquiring module 1202 is used to acquire a current frame image, and detect whether there is a focus frame generated by a user trigger in the current frame image.
  • the first determining module 1204 is configured to determine the target subject of the current frame image according to the image in the focus frame when there is a focus frame in the current frame image.
  • The second determining module 1206 is used to: when no focus frame exists in the current frame image, obtain the tracking frame of the previous frame image of the current frame image, determine the first center weight map according to the tracking frame of the previous frame image, and traverse the current frame image according to the first center weight map to obtain the target subject of the current frame image; each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
  • The subject detection device in this embodiment acquires the current frame image and detects whether a focus frame generated by a user trigger exists in the current frame image.
  • When a focus frame exists in the current frame image, the target subject of the current frame image is determined according to the image in the focus frame, which quickly and simply determines the target subject in the current frame image.
  • When no focus frame exists in the current frame image, the tracking frame of the previous frame image of the current frame image is acquired, the first center weight map is determined according to the tracking frame of the previous frame image, and the current frame image is traversed according to the first center weight map to obtain the target subject of the current frame image; since the weight values in the first center weight map gradually decrease from the center to the edge, the target subject in the current frame image can be identified more accurately.
  • the second determining module 1206 is further used to: slide on the current frame image through the first center weight map to obtain each candidate frame; obtain the first image feature in each candidate frame, and compare the The first image feature is subjected to convolution processing with the target filter to obtain the response value corresponding to each candidate frame; the target subject of the current frame image is determined according to the candidate frame corresponding to the maximum value in the response value.
  • In this way, the current frame image is traversed with the first center weight map to obtain the candidate frames, the first image feature in each candidate frame is obtained and convolved with the target filter to obtain the response value corresponding to each candidate frame, and the target subject of the current frame image is determined according to the candidate frame corresponding to the maximum response value.
  • The candidate frame with the largest response value is the most likely to contain the target subject, so the target subject in the image can be identified accurately.
  • The second determining module 1206 is further configured to: obtain a second image feature in each candidate frame, where the second image feature is the pixel values of the pixels in the candidate frame obtained when the first center weight map slides over the current frame image; and, for each candidate frame, dot-multiply the pixel value of each pixel in the candidate frame with the weight value of the corresponding pixel in the first center weight map to obtain the first image feature in each candidate frame.
  • Dot-multiplying the pixel values in each candidate frame with the weight values of the corresponding pixels in the first center weight map to obtain the first image feature highlights the central area of the image, making it easier to identify the target subject.
  • The second determining module 1206 is further used to: take the logarithm of the pixel value of each pixel in the candidate frame; and dot-multiply the logarithm-processed pixel values of the pixels in the candidate frame with the weight values of the corresponding pixels in the first center weight map.
  • Taking the logarithm of the pixel values in the candidate frame can reduce the interference of high-contrast areas with target tracking, and dot-multiplying the logarithm-processed pixel values with the weight values of the corresponding pixels in the first center weight map highlights the central area of the image and makes it easier to identify the target subject.
  • the second determining module 1206 is further configured to: use the candidate frame corresponding to the maximum value in the response value as the first subject area of the current frame image; input the first subject area of the current frame image into the subject segmentation network, Get the target subject in the current frame image.
  • the candidate frame corresponding to the maximum value in the response value is used as the first subject area of the current frame image; the first subject area of the current frame image is input into the subject segmentation network to obtain the target subject in the current frame image.
  • the area where the target subject is located is determined by the response value, and the area containing the target subject is input into the subject segmentation network to quickly detect the target subject in the current frame of image.
  • The second determining module 1206 is further configured to: determine the height and width of the candidate frame corresponding to the maximum response value; increase the height of the candidate frame by a preset height and the width by a preset width; and use the candidate frame obtained after increasing the preset height and preset width as the first subject area of the current frame image.
  • Determining the height and width of the candidate frame corresponding to the maximum response value, increasing them by the preset height and width, and using the resulting candidate frame as the first subject area of the current frame image accurately determines the area where the complete target subject is located and avoids situations where part of the detected target subject is missing because the candidate frame is too small.
  • the second determining module 1206 is further used to: detect whether there is a moving subject in the current frame of image; when there is a moving subject in the current frame of image, obtain the second subject area containing the moving subject; change the first subject area Perform fusion processing with the second subject area, and input the subject area obtained after the fusion processing into the subject segmentation network to obtain the target subject in the current frame image.
  • When a moving subject exists in the current frame image, the second subject area containing the moving subject is acquired, the first subject area and the second subject area are fused, and the subject area obtained after the fusion is input into the subject segmentation network to obtain the target subject in the current frame image, so that a more accurate target subject can be obtained.
  • In one embodiment, the device further includes an update module.
  • the update module is used to update the target filter according to the first subject area of the current frame image.
  • the target filter is updated through the first subject area of the current frame image, which can reduce the interference to subject detection in terms of illumination, posture, and scale.
  • The first determining module 1204 is further configured to: when a focus frame exists in the current frame image, determine the height and width of the focus frame; increase the height of the focus frame by a preset height and the width by a preset width; use the focus frame obtained after increasing the preset height and preset width as the first subject area of the current frame image; and input the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image.
  • The subject detection device in this implementation determines the height and width of the focus frame when a focus frame exists in the current frame image, increases them by the preset height and width, and uses the resulting focus frame as the first subject area of the current frame image, which accurately determines the area where the complete target subject is located and avoids situations where part of the detected target subject is missing because the target subject moved while the size of the focus frame remained unchanged.
  • the first determining module 1204 is further configured to: obtain the subject segmentation image of the previous frame image, and input the subject segmentation image of the previous frame image and the first subject region of the current frame image into the subject segmentation network to obtain the current The target subject in the frame image.
  • The subject segmentation image of the previous frame image brings the subject position information of the previous frame into the network, which can improve the segmentation performance of the network in video scenes.
  • The first determining module 1204 is further configured to: obtain the subject segmentation image of the previous frame image, and input the subject segmentation image of the previous frame image and the first subject region of the current frame image into the subject segmentation network to obtain the subject segmentation image of the current frame image; determine the proportion of the subject segmentation image of the current frame image to the current frame image; and, when the proportion is less than the proportion threshold, input the current frame image into the subject segmentation network to obtain the target subject in the current frame image.
  • In this way, the subject segmentation image of the previous frame image is obtained, the subject segmentation image of the previous frame image and the first subject region of the current frame image are input into the subject segmentation network to obtain the subject segmentation image of the current frame image, and the proportion of the subject segmentation image of the current frame image to the current frame image is determined.
  • Whether this proportion is greater than the proportion threshold indicates whether a target subject exists in the current detection result and whether the detected target subject is complete.
  • When the proportion of the subject segmentation image of the current frame image to the current frame image is less than the proportion threshold, it means that the detection is inaccurate or that the detected target subject is incomplete; the current frame image is then input into the subject segmentation network, so that the target subject in the current frame image can be obtained.
  • In one embodiment, the device further includes a generating device.
  • The generating device is used to: obtain a second center weight map corresponding to the focus frame, where each pixel in the second center weight map has a corresponding weight value and the weight values gradually decrease from the center to the edge; dot-multiply the pixel value of each pixel of the image in the focus frame with the weight value of the corresponding pixel in the second center weight map; perform affine transformation on the dot-multiplied image in the focus frame to obtain images in a preset number of transform frames; and train the initial filter according to the images in the preset number of transform frames, obtaining the target filter when the preset condition is met.
  • In this way, a second center weight map corresponding to the focus frame is acquired, the image in the focus frame is dot-multiplied with the second center weight map, the dot-multiplied image in the focus frame is affine-transformed to obtain images in a preset number of transform frames, and the initial filter is trained according to those images until the preset condition is met, so that a trained target filter can be obtained.
  • The first determining module 1204 is further configured to: take the logarithm of the pixel value of each pixel of the image in the focus frame; and dot-multiply the logarithm-processed pixel value of each pixel with the weight value of the corresponding pixel in the second center weight map.
  • Taking the logarithm of the pixel values of the image in the focus frame and dot-multiplying the logarithm-processed pixel values with the weight values of the corresponding pixels in the second center weight map can reduce interference, highlight the central area of the image in the focus frame, and make it easier to identify the target subject.
  • The division of the modules in the subject detection device described above is only for illustration. In other embodiments, the subject detection device can be divided into different modules as needed to complete all or part of the functions of the subject detection device.
  • Fig. 13 is a schematic diagram of the internal structure of an electronic device in an embodiment.
  • the electronic device includes a processor and a memory connected via a system bus.
  • the processor is used to provide computing and control capabilities to support the operation of the entire electronic device.
  • the memory may include a non-volatile storage medium and internal memory.
  • the non-volatile storage medium stores an operating system and a computer program.
  • the computer program can be executed by a processor to implement a subject detection method provided in the following embodiments.
  • The internal memory provides a cached running environment for the operating system and computer program in the non-volatile storage medium.
  • The electronic device can be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like.
  • each module in the subject detection device provided in the embodiment of the present application may be in the form of a computer program.
  • the computer program can be run on a terminal or a server.
  • the program module composed of the computer program can be stored in the memory of the terminal or the server.
  • The embodiments of the present application also provide a computer-readable storage medium: one or more non-volatile computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the operations of the subject detection method.
  • A computer program product containing instructions, when run on a computer, causes the computer to execute the subject detection method.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM), which acts as external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).


Abstract

A subject detection method, comprising: acquiring a current frame image, and detecting whether a focus frame generated by a user trigger exists in the current frame image; when a focus frame exists in the current frame image, determining the target subject of the current frame image according to the image in the focus frame; when no focus frame exists in the current frame image, acquiring the tracking frame of the previous frame image of the current frame image, determining a first center weight map according to the tracking frame of the previous frame image, and traversing the current frame image according to the first center weight map to obtain the target subject of the current frame image; wherein each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.

Description

Subject detection method and device, electronic device, and computer-readable storage medium
Cross-reference to related applications
This application claims priority to the Chinese patent application with application number 2019110386684, entitled "Subject detection method and device, electronic device, and computer-readable storage medium", filed with the China Patent Office on October 29, 2019, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of computer technology, and in particular to a subject detection method and device, an electronic device, and a computer-readable storage medium.
Background
With the development of imaging technology, people are increasingly accustomed to capturing images or videos and recording all kinds of information through image acquisition devices such as the cameras on electronic devices. After an electronic device acquires an image, it often needs to perform subject detection on the image to detect the subject, so that a clearer image of the subject can be obtained. However, traditional subject detection techniques suffer from inaccuracy when detecting images.
Summary
The embodiments of this application provide a subject detection method and device, an electronic device, and a computer-readable storage medium, which can improve the accuracy of subject detection.
A subject detection method includes:
acquiring a current frame image, and detecting whether a focus frame generated by a user trigger exists in the current frame image;
when the focus frame exists in the current frame image, determining a target subject of the current frame image according to the image in the focus frame;
when no focus frame exists in the current frame image, acquiring a tracking frame of a previous frame image of the current frame image, determining a first center weight map according to the tracking frame of the previous frame image, and traversing the current frame image according to the first center weight map to obtain the target subject of the current frame image; wherein each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
A subject detection device includes:
an acquiring module, used to acquire a current frame image and detect whether a focus frame generated by a user trigger exists in the current frame image;
a first determining module, used to determine a target subject of the current frame image according to the image in the focus frame when the focus frame exists in the current frame image;
a second determining module, used to, when no focus frame exists in the current frame image, acquire a tracking frame of a previous frame image of the current frame image, determine a first center weight map according to the tracking frame of the previous frame image, and traverse the current frame image according to the first center weight map to obtain the target subject of the current frame image; wherein each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
An electronic device includes a memory and a processor. The memory stores a computer program that, when executed by the processor, causes the processor to perform the following operations:
acquiring a current frame image, and detecting whether a focus frame generated by a user trigger exists in the current frame image;
when the focus frame exists in the current frame image, determining a target subject of the current frame image according to the image in the focus frame;
when no focus frame exists in the current frame image, acquiring a tracking frame of a previous frame image of the current frame image, determining a first center weight map according to the tracking frame of the previous frame image, and traversing the current frame image according to the first center weight map to obtain the target subject of the current frame image; wherein each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
A computer-readable storage medium stores a computer program that, when executed by a processor, implements the following operations:
acquiring a current frame image, and detecting whether a focus frame generated by a user trigger exists in the current frame image;
when the focus frame exists in the current frame image, determining a target subject of the current frame image according to the image in the focus frame;
when no focus frame exists in the current frame image, acquiring a tracking frame of a previous frame image of the current frame image, determining a first center weight map according to the tracking frame of the previous frame image, and traversing the current frame image according to the first center weight map to obtain the target subject of the current frame image; wherein each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
With the above subject detection method and device, electronic device, and computer-readable storage medium, a current frame image is acquired and it is detected whether a focus frame generated by a user trigger exists in it; when a focus frame exists, the target subject of the current frame image is determined according to the image in the focus frame; when no focus frame exists, the tracking frame of the previous frame image is acquired, a first center weight map is determined according to the tracking frame of the previous frame image, and the current frame image is traversed according to the first center weight map to obtain the target subject of the current frame image. Since each pixel in the first center weight map has a corresponding weight value that gradually decreases from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image, the target subject in the current frame image can be identified more accurately.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those of ordinary skill in the art, drawings of other embodiments can be obtained from these drawings without creative work.
FIG. 1 is a schematic diagram of an image processing circuit in an embodiment;
FIG. 2 is a flowchart of a subject detection method in an embodiment;
FIG. 3 is a flowchart of the operation of obtaining the target subject of the current frame image according to the first center weight map in an embodiment;
FIG. 4 is a flowchart of the operation of using the candidate frame corresponding to the maximum response value as the first subject area of the current frame image in an embodiment;
FIG. 5 is a flowchart of obtaining the target subject in the current frame image in an embodiment;
FIG. 6 is a flowchart of the operation of determining the target subject of the current frame image according to the image in the focus frame in an embodiment;
FIG. 7 is a schematic diagram of detecting the target subject through the subject segmentation network in an embodiment;
FIG. 8 is a flowchart of the operation of generating the target filter in an embodiment;
FIG. 9 is a schematic diagram of generating the target filter in another embodiment;
FIG. 10 is a schematic diagram of dot-multiplying the image in the focus frame with the second center weight map in an embodiment;
FIG. 11 is a schematic diagram of subject detection in another embodiment;
FIG. 12 is a structural block diagram of a subject detection device in an embodiment;
FIG. 13 is a schematic diagram of the internal structure of an electronic device in an embodiment.
Detailed description
To facilitate understanding of the present invention, the present invention is described more fully below with reference to the related drawings, in which preferred embodiments of the invention are shown. However, the invention can be implemented in many different forms and is not limited to the embodiments described herein; rather, these embodiments are provided so that the disclosure of the invention will be thorough and complete.
It can be understood that the terms "first", "second", and the like used in this application may be used herein to describe various elements, but these elements are not limited by these terms; the terms are only used to distinguish one element from another. For example, without departing from the scope of this application, a first subject area may be called a second subject area, and similarly, a second subject area may be called a first subject area. The first subject area and the second subject area are both subject areas, but they are not the same subject area.
The embodiments of this application provide an electronic device. The electronic device includes an image processing circuit, which may be implemented using hardware and/or software components and may include various processing units that define an ISP (Image Signal Processing) pipeline. FIG. 1 is a schematic diagram of the image processing circuit in an embodiment. As shown in FIG. 1, for ease of explanation, only the aspects of the image processing technology related to the embodiments of this application are shown.
As shown in FIG. 1, the image processing circuit includes an ISP processor 140 and control logic 150. Image data captured by an imaging device 110 is first processed by the ISP processor 140, which analyzes the image data to capture image statistics that can be used to determine one or more control parameters of the imaging device 110. The imaging device 110 may include a camera with one or more lenses 112 and an image sensor 114. The image sensor 114 may include a color filter array (such as a Bayer filter); it can obtain the light intensity and wavelength information captured by each imaging pixel and provide a set of raw image data that can be processed by the ISP processor 140. An attitude sensor 120 (such as a three-axis gyroscope, a Hall sensor, or an accelerometer) can provide collected image processing parameters (such as anti-shake parameters) to the ISP processor 140 based on the attitude sensor 120 interface type. The attitude sensor 120 interface may be an SMIA (Standard Mobile Imaging Architecture) interface, another serial or parallel camera interface, or a combination of the above.
In addition, the image sensor 114 may also send raw image data to the attitude sensor 120; the attitude sensor 120 may provide the raw image data to the ISP processor 140 based on its interface type, or store the raw image data in an image memory 130.
The ISP processor 140 processes the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 140 may perform one or more image processing operations on the raw image data and collect statistics about the image data. The image processing operations may be performed at the same or different bit-depth precision.
The ISP processor 140 may also receive image data from the image memory 130. For example, the attitude sensor 120 interface sends raw image data to the image memory 130, and the raw image data in the image memory 130 is then provided to the ISP processor 140 for processing. The image memory 130 may be part of a memory device, a storage device, or an independent dedicated memory in the electronic device, and may include DMA (Direct Memory Access) features.
When receiving raw image data from the image sensor 114 interface, from the attitude sensor 120 interface, or from the image memory 130, the ISP processor 140 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to the image memory 130 for further processing before being displayed. The ISP processor 140 receives processed data from the image memory 130 and processes this data in the raw domain and in the RGB and YCbCr color spaces. The image data processed by the ISP processor 140 may be output to a display 160 for viewing by the user and/or for further processing by a graphics engine or GPU (Graphics Processing Unit). In addition, the output of the ISP processor 140 may also be sent to the image memory 130, and the display 160 may read image data from the image memory 130. In one embodiment, the image memory 130 may be configured to implement one or more frame buffers.
The statistics determined by the ISP processor 140 may be sent to the control logic 150. For example, the statistics may include image sensor 114 statistics such as gyroscope vibration frequency, automatic exposure, automatic white balance, automatic focus, flicker detection, black level compensation, and lens 112 shading correction. The control logic 150 may include a processor and/or microcontroller that executes one or more routines (such as firmware), which may determine the control parameters of the imaging device 110 and the control parameters of the ISP processor 140 according to the received statistics. For example, the control parameters of the imaging device 110 may include attitude sensor 120 control parameters (such as gain, integration time for exposure control, and anti-shake parameters), camera flash control parameters, camera anti-shake displacement parameters, lens 112 control parameters (such as focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (for example, during RGB processing), as well as lens 112 shading correction parameters.
In one embodiment, the current frame image is acquired through the lens 112 and the image sensor 114 in the imaging device (camera) 110 and sent to the ISP processor 140. After receiving the current frame image, the ISP processor 140 detects whether a focus frame generated by a user trigger exists in the current frame image. When the ISP processor 140 detects that a focus frame exists in the current frame image, it determines the target subject of the current frame image according to the image in the focus frame. When the ISP processor 140 detects that no focus frame exists in the current frame image, it acquires the tracking frame of the previous frame image of the current frame image, determines a first center weight map according to the tracking frame of the previous frame image, and traverses the current frame image according to the first center weight map to obtain the target subject of the current frame image; each pixel in the first center weight map has a corresponding weight value, and the weight values in the first center weight map gradually decrease from the center to the edge. A more accurate target subject is thereby obtained, improving the accuracy of subject detection.
After obtaining the target subject of the current frame image, the ISP processor sends the target subject to the control logic 150. After obtaining the target subject, the control logic 150 can control the lens 112 in the imaging device (camera) 110 to move and focus on the position corresponding to the target subject, so that a clearer next frame image of the target subject can be acquired and sent to the ISP processor 140. After receiving the next frame image, the ISP processor 140 may take the current frame image as the previous frame image and the next frame image as the current frame image, and perform the following: detect whether a focus frame generated by a user trigger exists in the current frame image; when a focus frame exists, determine the target subject of the current frame image according to the image in the focus frame; when no focus frame exists, obtain the tracking frame of the previous frame image, determine the first center weight map according to the tracking frame of the previous frame image, and traverse the current frame image according to the first center weight map to obtain the target subject of the current frame image. A target video in which the target subject is clearer can thus be generated.
FIG. 2 is a flowchart of the subject detection method in an embodiment. As shown in FIG. 2, the subject detection method includes:
Operation 202: Acquire a current frame image, and detect whether a focus frame generated by a user trigger exists in the current frame image.
The current frame image refers to the image acquired at the current moment. The current frame image may be any of an RGB (Red, Green, Blue) image, a grayscale image, a depth image, an image corresponding to the Y component of a YUV image, and so on. In a YUV image, "Y" represents luminance (Luma), that is, the grayscale value, while "U" and "V" represent chrominance (Chroma), which describes the color and saturation of the image and is used to specify the color of a pixel.
Specifically, the ISP processor or central processing unit of the electronic device may acquire the current frame image and filter it to remove noise. Then, the ISP processor or central processing unit may detect whether a focus frame exists on the current frame image; the focus frame is generated in response to a user's trigger instruction and is the area where the target subject selected by the user is located.
In this embodiment, the ISP processor or central processing unit may detect whether a user trigger instruction is received on the current frame image. When a user trigger instruction is received, a corresponding focus frame is generated at the position the user triggered, and the ISP processor or central processing unit detects that a focus frame exists in the current frame image. When no user trigger instruction is received, no user-triggered focus frame exists in the current frame image.
In the shooting preview mode of the electronic device, the current frame image is acquired. To reduce the amount of computation in subsequent processing, the current frame image may be scaled down to a smaller size (such as 224*224, or another size).
Operation 204: When a focus frame exists in the current frame image, determine the target subject of the current frame image according to the image in the focus frame.
Specifically, when the ISP processor or central processing unit detects that a user-triggered focus frame exists in the current frame image, it may enlarge the focus frame and crop the enlarged focus frame to obtain the image in the enlarged focus frame. The image in the focus frame is input into the subject segmentation network to obtain the target subject in the current frame image.
Operation 206: When no focus frame exists in the current frame image, acquire the tracking frame of the previous frame image of the current frame image, determine a first center weight map according to the tracking frame of the previous frame image, and traverse the current frame image according to the first center weight map to obtain the target subject of the current frame image; each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
The previous frame image refers to the image adjacent to the current frame image and acquired at the previous moment. The previous frame image may be any of an RGB image, a grayscale image, a depth image, an image corresponding to the Y component of a YUV image, and so on. The first center weight map refers to a map used to record the weight value of each pixel in the image. The weight values recorded in the first center weight map gradually decrease from the center to the four sides; that is, the center weight is the largest, and the weights gradually decrease toward the four sides. The first center weight map characterizes that the weight values gradually decrease from the center pixel of the image to the edge pixels. The tracking frame refers to the area where the target subject is located in the image.
Specifically, when the ISP processor or central processing unit detects that no user-triggered focus frame exists in the current frame image, it may acquire the previous frame image of the current frame image, obtain the tracking frame in the previous frame image, and generate the corresponding first center weight map according to the size of the tracking frame in the previous frame image. Each pixel in the first center weight map has a corresponding weight value, and the weight values gradually decrease from the center to the four sides. The first center weight map can be generated using a Gaussian function, a first-order equation, or a second-order equation; the Gaussian function may be a two-dimensional Gaussian function.
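As an illustrative aside, not part of the claimed method, such a center weight map can be sketched with a two-dimensional Gaussian; the function name and the sigma scale below are assumptions chosen for illustration:

```python
import numpy as np

def center_weight_map(height, width, sigma_scale=0.5):
    """Build a weight map whose values peak at the center and
    gradually decrease toward the edges, using a 2-D Gaussian."""
    ys = np.arange(height) - (height - 1) / 2.0
    xs = np.arange(width) - (width - 1) / 2.0
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    sigma_y = sigma_scale * height
    sigma_x = sigma_scale * width
    w = np.exp(-(xx**2 / (2 * sigma_x**2) + yy**2 / (2 * sigma_y**2)))
    return w / w.max()  # normalize so the center weight is 1
```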
Next, the ISP processor or central processing unit traverses the current frame image according to the first center weight map to obtain multiple subject areas, from which the area where the target subject is located can be determined. That area is input into the subject segmentation network to obtain the target subject in the current frame image.
In this embodiment, both the current frame image and the previous frame image can be captured by the electronic device. The electronic device may be equipped with one or more cameras, for example 1, 2, 3, or 5, without limitation. The form in which a camera is installed on the electronic device is not limited; for example, it may be built into the electronic device or external to it, and it may be a front camera or a rear camera.
The current frame image and the previous frame image may be captured by the same camera of the electronic device or by different cameras, without limitation. The camera on the electronic device may be of any type; for example, it may be a color camera, a black-and-white camera, a depth camera, a telephoto camera, a wide-angle camera, and so on, without limitation.
Correspondingly, a color image, that is, an RGB image, is obtained through a color camera; a grayscale image through a black-and-white camera; a depth image through a depth camera; a telephoto image through a telephoto camera; and a wide-angle image through a wide-angle camera, without limitation. The cameras in the electronic device may be of the same type or of different types; for example, all may be color cameras or all black-and-white cameras, or one camera may be a telephoto camera while the others are wide-angle cameras, without limitation.
Specifically, the electronic device may store the captured images in a first-in-first-out queue in the order of their capture times, and obtain the current frame image and the previous frame image from the first-in-first-out queue.
First-in-first-out means that the image stored first is taken out first. The electronic device first obtains the previous frame image from the first-in-first-out queue, and then obtains the current frame image from it.
In another embodiment, the current capture moment and the previous capture moment are obtained; the current frame image is obtained according to the current capture moment, and the previous frame image according to the previous capture moment.
The electronic device may obtain the current capture moment and the capture frequency, and derive the previous capture moment from the current capture moment and the capture frequency. For example, if the current capture moment is 15:45:56.200 and the capture frequency is 10 frames/s, that is, one frame is captured every 100 ms, then the previous capture moment is 15:45:56.100. The current frame image is obtained according to the current capture moment, and the previous frame image according to the previous capture moment.
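The worked example above is simple arithmetic on the frame period; a minimal sketch (names are illustrative, not from the original text):

```python
from datetime import datetime, timedelta

def previous_capture_moment(current, fps):
    """Derive the previous capture moment from the frame rate."""
    return current - timedelta(seconds=1.0 / fps)

t = datetime.strptime("15:45:56.200", "%H:%M:%S.%f")
print(previous_capture_moment(t, 10).time())  # 15:45:56.100000
```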
In one embodiment, the current frame image and the previous frame image may be downsampled to obtain smaller versions of both, saving computation.
In one embodiment, both the current frame image and the previous frame image may be filtered to remove the high-frequency noise carried by complex backgrounds with rich texture detail, or the high-frequency noise introduced by image downsampling, yielding more accurate images and preventing false subject detection. The filtering may be at least one of Gaussian filtering, smoothing filtering, bilateral filtering, and so on. Downsampling means sampling once every several pixels in the image to obtain a new image.
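A minimal preprocessing sketch along these lines, using OpenCV; the kernel size and scale factor are assumptions, not values given in the text:

```python
import cv2

def preprocess(frame, scale=0.5, ksize=5):
    """Downsample the frame and suppress high-frequency noise
    with a Gaussian filter, as described above."""
    small = cv2.resize(frame, None, fx=scale, fy=scale,
                       interpolation=cv2.INTER_AREA)
    return cv2.GaussianBlur(small, (ksize, ksize), 0)
```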
With the subject detection method in this embodiment, the current frame image is acquired and it is detected whether a user-triggered focus frame exists in it. When a focus frame exists, the target subject of the current frame image is determined according to the image in the focus frame, which quickly and simply determines the target subject. When no focus frame exists, the tracking frame of the previous frame image is acquired, the first center weight map is determined according to the tracking frame of the previous frame image, and the current frame image is traversed according to the first center weight map to obtain the target subject of the current frame image; since each pixel in the first center weight map has a corresponding weight value that gradually decreases from the center to the edge, the target subject in the current frame image can be identified more accurately.
In one embodiment, as shown in FIG. 3, traversing the current frame image according to the first center weight map to obtain the target subject of the current frame image includes:
Operation 302: Slide the first center weight map over the current frame image to obtain candidate frames.
A candidate frame is an image area of the current frame image in which the target subject may exist.
Specifically, the ISP processor or central processing unit may slide the first center weight map over the current frame image; each slide yields one candidate frame. The size of the candidate frame is the same as that of the center weight map, and multiple candidate frames are obtained once the first center weight map has traversed the whole current frame image.
Operation 304: Obtain the first image feature in each candidate frame, and convolve the first image feature in each candidate frame with the target filter to obtain the response value corresponding to each candidate frame.
The target filter refers to a trained filter; further, it refers to the filter obtained after updating according to the subject area of the previous frame image. The response value may be a regression response value. The first image feature refers to the image feature obtained after dot-multiplication.
Specifically, the ISP processor or central processing unit obtains the target filter and the first image feature corresponding to each candidate frame, and convolves the first image feature in each candidate frame with the target filter to obtain the response value of each candidate frame.
In this embodiment, when the first center weight map slides on the current frame image and the first candidate frame is obtained, the first image feature in that candidate frame is obtained and convolved with the target filter to obtain the response value of the first candidate frame. By analogy, each time a candidate frame is obtained, the first image feature in it is convolved with the target filter to obtain the corresponding response value, so that by the time the first center weight map has traversed the current frame image, the response value of the last candidate frame is available, speeding up the convolution processing.
Operation 306: Determine the target subject of the current frame image according to the candidate frame corresponding to the maximum response value.
Specifically, the ISP processor or central processing unit obtains the response value of each candidate frame and determines the largest one. The candidate frame with the maximum response value is obtained, enlarged, and cropped to obtain the subject area in it, which is input into the subject segmentation network to obtain the target subject in the current frame image.
In this embodiment, the current frame image is traversed with the first center weight map to obtain the candidate frames, the first image feature in each candidate frame is convolved with the target filter to obtain the response value of each candidate frame, and the target subject is determined according to the candidate frame corresponding to the maximum response value. The candidate frame with the largest response value is the most likely to contain the target subject, so the target subject in the image can be identified accurately.
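The sliding search of operations 302 to 306 can be sketched as follows; `weight_map` would come from a generator like `center_weight_map` above, `target_filter` is assumed to be a trained array of the same size, and the stride is an illustrative assumption:

```python
import numpy as np

def best_candidate(frame_gray, weight_map, target_filter, stride=4):
    """Slide the center weight map over the frame, score each
    candidate window by correlation with the target filter, and
    return the window with the maximum response."""
    fh, fw = weight_map.shape
    best, best_resp = None, -np.inf
    for y in range(0, frame_gray.shape[0] - fh + 1, stride):
        for x in range(0, frame_gray.shape[1] - fw + 1, stride):
            window = frame_gray[y:y + fh, x:x + fw].astype(np.float64)
            feat = np.log1p(window) * weight_map  # first image feature
            resp = np.sum(feat * target_filter)   # correlation response
            if resp > best_resp:
                best_resp, best = resp, (x, y, fw, fh)
    return best, best_resp
```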
In one embodiment, obtaining the first image feature in each candidate frame includes: obtaining a second image feature in each candidate frame, where the second image feature is the pixel values of the pixels in the candidate frame obtained when the first center weight map slides over the current frame image; and, for each candidate frame, dot-multiplying the pixel value of each pixel in the candidate frame with the weight value of the corresponding pixel in the first center weight map to obtain the first image feature in each candidate frame.
Specifically, the ISP processor or central processing unit slides the first center weight map over the current frame image; each slide yields one candidate frame and the image feature in it, that is, the second image feature. Each second image feature has the same size as the first center weight map and consists of the pixel values of the pixels in the candidate frame. Each time a second image feature is obtained, the pixel values of its pixels are dot-multiplied with the weight values of the corresponding pixels in the first center weight map to obtain the first image feature of one candidate frame. The first image feature in the candidate frame is thus the second image feature after dot-multiplication, and the candidate frame has the same size as the first center weight map.
In this embodiment, obtaining the second image feature in each candidate frame and dot-multiplying each pixel value with the weight value of the corresponding pixel in the first center weight map to obtain the first image feature highlights the central area of the image and makes it easier to identify the target subject.
In one embodiment, dot-multiplying the pixel values of the pixels in the candidate frame with the weight values of the corresponding pixels in the first center weight map includes: taking the logarithm of the pixel value of each pixel in the candidate frame; and dot-multiplying the logarithm-processed pixel values with the weight values of the corresponding pixels in the first center weight map.
The logarithmic operation means taking the logarithm of a pixel's value.
Specifically, the ISP processor or central processing unit slides the first center weight map over the current frame image; each slide yields the second image feature of one candidate frame. The pixel values of the pixels composing the second image feature, that is, the pixel values of the pixels in the candidate frame, are determined, and the logarithm of each pixel value is taken. Then, the logarithm of each pixel value in the candidate frame is dot-multiplied with the weight value of the corresponding pixel in the first center weight map to obtain the dot-multiplied image feature in the candidate frame, that is, the first image feature. It can be understood that the pixels in the candidate frame correspond one-to-one to the pixels in the first center weight map.
Similarly, each slide yields the pixels in one candidate frame and their pixel values, and the same processing is applied, so that the first image feature of each candidate frame is obtained.
In this implementation, after the pixel values of the pixels in the candidate frame are obtained, they may be normalized to reduce the amount of computation, and the logarithm is then taken of the normalized pixel values.
In this implementation, the logarithm-processed pixel values may alternatively be normalized.
In this embodiment, taking the logarithm of the pixel values in the candidate frame can reduce the interference of high-contrast areas with target tracking, and dot-multiplying the logarithm-processed pixel values with the weight values of the corresponding pixels in the first center weight map highlights the central area of the image and makes it easier to identify the target subject.
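A sketch of this feature preparation, read against the sliding search above; the particular normalization used here is an assumption, since the text allows normalizing either before or after the logarithm:

```python
import numpy as np

def first_image_feature(window, weight_map):
    """Log-transform, normalize, then weight the window so that
    high-contrast regions interfere less and the center stands out."""
    logged = np.log1p(window.astype(np.float64))
    normed = (logged - logged.mean()) / (logged.std() + 1e-8)
    return normed * weight_map  # element-wise dot-multiplication
```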
In one embodiment, determining the target subject of the current frame image according to the candidate frame corresponding to the maximum response value includes: using the candidate frame corresponding to the maximum response value as the first subject area of the current frame image; and inputting the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image.
Specifically, the ISP processor or central processing unit may determine the maximum response value and its candidate frame, and use the image in that candidate frame as the first subject area of the current frame image. The first subject area is then cropped from the current frame image and enlarged to a preset size, and the enlarged first subject area is input into the subject segmentation network to obtain the target subject in the current frame image.
In this embodiment, the ISP processor or central processing unit may obtain the subject segmentation image of the previous frame image, which may be a binarized subject segmentation image. The subject segmentation image of the previous frame image and the first subject area at the preset size are input into the subject segmentation network to obtain the target subject in the current frame image. Adding the subject segmentation image of the previous frame image to the subject segmentation network adds the subject position information of the previous frame, which can improve the accuracy of subject detection.
In this embodiment, using the candidate frame with the maximum response value as the first subject area and inputting it into the subject segmentation network determines the area where the target subject is located through the response values, so the target subject in the current frame image can be detected quickly.
In one embodiment, as shown in FIG. 4, using the candidate frame corresponding to the maximum response value as the first subject area of the current frame image includes:
Operation 402: Determine the height and width of the candidate frame corresponding to the maximum response value.
Specifically, the ISP processor or central processing unit may determine the maximum response value and its candidate frame, and then determine the height and width of that candidate frame.
Operation 404: Increase the height of the candidate frame by a preset height, and increase the width by a preset width.
The preset height is either a preset height value or a preset height ratio; the preset width is either a preset width value or a preset width ratio.
Specifically, the ISP processor or central processing unit obtains the preset height value and adds it to the height of the candidate frame, and obtains the preset width value and adds it to the width of the candidate frame, yielding the candidate frame after the preset height value and preset width value have been added.
In this embodiment, the ISP processor of the electronic device may instead obtain a preset height ratio and a preset width ratio and increase the height and width of the candidate frame of the current frame image by those ratios, yielding the candidate frame after the increase.
For example, if the height and width of the candidate frame corresponding to the maximum value are h and w, the height is increased by h/4 and the width by w/4, and the position of the expanded candidate frame is recorded as the position of the first subject area of the current frame image.
Operation 406: Use the candidate frame obtained after increasing the preset height and preset width as the first subject area of the current frame image.
Specifically, the ISP processor or central processing unit of the electronic device uses the candidate frame obtained after adding the preset height and width values as the first subject area of the current frame image. In this embodiment, the candidate frame obtained after adding the preset height and width ratios may be used instead.
It can be understood that when the height and width of the candidate frame are increased on the current frame image, the image features in the candidate frame increase accordingly.
With the subject detection method in this implementation, the height and width of the candidate frame corresponding to the maximum response value are determined and increased by the preset height and width, and the resulting candidate frame is used as the first subject area of the current frame image. This accurately determines the area where the complete target subject is located and avoids situations where part of the detected target subject is missing because the candidate frame is too small.
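A small sketch of this expansion; the centered growth and the clamping to the image bounds are added safeguards assumed for illustration, not stated in the text:

```python
def expand_box(x, y, w, h, img_w, img_h):
    """Grow a candidate box by a quarter of its height and width,
    clamped to the image, as in the h/4, w/4 example above."""
    dh, dw = h // 4, w // 4
    x0 = max(0, x - dw // 2)
    y0 = max(0, y - dh // 2)
    x1 = min(img_w, x + w + dw // 2)
    y1 = min(img_h, y + h + dh // 2)
    return x0, y0, x1 - x0, y1 - y0
```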
In one embodiment, the subject area in the current frame image may also be determined through the KCF (Kernelized Correlation Filter) algorithm, the MedianFlow bidirectional optical-flow tracking algorithm, and the like.
In one embodiment, as shown in FIG. 5, inputting the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image includes:
Operation 502: Detect whether a moving subject exists in the current frame image.
A moving subject is a subject in motion.
Specifically, the ISP processor or central processing unit performs background subtraction on the current frame image to obtain the corresponding binary image, then performs connected-domain processing on the binary image to obtain the area of each candidate subject in it. When a candidate subject has an area greater than or equal to the area threshold, it is determined that a moving subject exists in the current frame image; when the areas of all candidate subjects are less than the area threshold, it is determined that no moving subject exists in the current frame image.
A connected domain generally refers to an image region composed of foreground pixels that have the same pixel value and are adjacent in position, where foreground pixels are subject pixels. Connected-domain processing means finding and labeling each connected domain in the image.
Specifically, the ISP processor or central processing unit of the electronic device may detect and label each connected domain in the binary image. Each connected domain may serve as one candidate subject, and the area of each candidate subject in the binary image is then determined.
It can be understood that the larger a candidate subject's area, the closer it is to the camera, and the object closest to the camera is the subject the user wants to shoot. Therefore, when the areas of all candidate subjects are less than the area threshold, the candidate subjects in the current frame image are all small, and it can be considered that none of them is a moving subject or the subject the user wants to shoot.
When a candidate subject with an area greater than or equal to the area threshold exists, that candidate subject can be considered a moving subject and the subject the user wants to shoot. When the areas of all candidate subjects are less than the area threshold, it is determined that no moving subject exists in the current frame image.
In another embodiment, whether a moving subject exists among the candidate subjects may also be judged from the sharpness of their contour edges. It can be understood that when a moving subject exists in the image, the contour edge of the moving subject in the captured image is blurred to some extent. Therefore, the sharpness of each candidate subject's contour edge can be obtained; when the sharpness is higher than the sharpness threshold, the candidate subject can be considered a stationary object, that is, not a moving subject; when the sharpness is lower than or equal to the sharpness threshold, the candidate subject can be considered a moving subject.
In other embodiments, feature points in each candidate subject may be extracted, feature descriptors generated for them, and whether a moving subject exists among the candidate subjects determined based on the feature descriptors.
The ways of determining whether a moving subject exists among the candidate subjects in this application may be, but are not limited to, the above.
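The background-subtraction variant can be sketched with OpenCV as follows; the particular subtractor (MOG2) and the thresholds are illustrative assumptions, since the text does not name a specific background model:

```python
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2()

def moving_subject_region(frame, area_threshold=500):
    """Background-subtract the frame, label connected domains, and
    return the bounding box of the largest region whose area passes
    the threshold, or None if no moving subject is found."""
    mask = subtractor.apply(frame)
    # threshold above 127 so MOG2's shadow label (127) is excluded
    _, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    best = None
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= area_threshold and (best is None or area > best[-1]):
            best = (x, y, w, h, area)
    return None if best is None else best[:4]
```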
Operation 504: When a moving subject exists in the current frame image, obtain the second subject area containing the moving subject.
The second subject area may be a rectangular area containing the moving subject, a circular area containing the moving subject, or an irregularly shaped area containing the moving subject, without limitation.
Specifically, when the ISP processor or central processing unit of the electronic device detects from the binary image that a moving subject exists in the current frame image, it may obtain the second subject area containing the moving subject in the current frame image.
Operation 506: Fuse the first subject area and the second subject area, and input the subject area obtained after the fusion into the subject segmentation network to obtain the target subject in the current frame image.
The subject segmentation image may be the same as or different from the candidate subject before input. The fusion may be an AND operation, which is a logical operation: for example, 0 AND 1 gives 0, 1 AND 0 gives 0, and only 1 AND 1 gives 1.
Specifically, the ISP processor or central processing unit performs an AND operation between the first subject area and the second subject area, which may mean AND-ing the value of each pixel in the first subject area with the value of the corresponding pixel in the second subject area to obtain the fused subject area. The fused subject area is then input into the subject segmentation network to obtain the target subject in the current frame image.
In this embodiment, whether a moving subject exists in the current frame image is detected; when one exists, the second subject area containing it is obtained, the first and second subject areas are fused, and the fused subject area is input into the subject segmentation network to obtain the target subject in the current frame image, so a more accurate target subject can be obtained.
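The AND-style fusion of the two regions can be sketched as below, assuming both regions are represented as binary masks over the frame:

```python
import numpy as np

def fuse_regions(mask_first, mask_second):
    """Fuse two binary subject masks with a logical AND, keeping
    only pixels claimed by both the tracked and the moving region."""
    return np.logical_and(mask_first > 0, mask_second > 0).astype(np.uint8)
```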
In one embodiment, after the first subject area of the current frame image is obtained, the method further includes: updating the target filter according to the first subject area of the current frame image.
Specifically, the ISP processor or central processing unit obtains the first subject area of the current frame image, inputs it to the target filter, and trains the target filter with it, thereby updating the target filter. Updating the target filter with the first subject area of the current frame image reduces the interference of illumination, posture, scale, and other factors with subject detection.
In one embodiment, the first subject area and the target filter may be transformed from the time domain to the frequency domain, converting the time-domain convolution into a frequency-domain product to reduce the computational cost of the convolution.
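A sketch of that frequency-domain shortcut; the update rule for the filter itself is not specified here, so only the correlation step is shown, and the use of the conjugate is the standard correlation-filter convention assumed for illustration:

```python
import numpy as np

def correlate_fft(feature, target_filter):
    """Replace spatial convolution with a frequency-domain product:
    FFT both arrays, multiply, and transform back."""
    F = np.fft.fft2(feature)
    H = np.fft.fft2(target_filter, s=feature.shape)
    return np.real(np.fft.ifft2(F * np.conj(H)))
```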
In one embodiment, as shown in FIG. 6, determining the target subject of the current frame image according to the image in the focus frame when a focus frame exists in the current frame image includes:
Operation 602: When a focus frame exists in the current frame image, determine the height and width of the focus frame.
Specifically, when the ISP processor or central processing unit detects that a user-triggered focus frame exists in the current frame image, it may obtain the height and width of the focus frame.
Operation 604: Increase the height of the focus frame by a preset height, and increase the width by a preset width.
Specifically, the ISP processor or central processing unit obtains the preset height value and adds it to the height of the focus frame, and obtains the preset width value and adds it to the width of the focus frame, yielding the focus frame after the preset height and width values have been added.
In this embodiment, the ISP processor of the electronic device may instead obtain a preset height ratio and a preset width ratio and increase the height and width of the focus frame of the current frame image by those ratios, yielding the focus frame after the increase.
For example, if the height and width of the focus frame are h and w, the height is increased by h/4 and the width by w/4, and the position of the expanded focus frame is recorded as the position of the first subject area of the current frame image.
Operation 606: Use the focus frame obtained after increasing the preset height and preset width as the first subject area of the current frame image.
Specifically, the ISP processor or central processing unit of the electronic device uses the focus frame obtained after adding the preset height and width values as the first subject area of the current frame image; in this embodiment, the focus frame obtained after adding the preset height and width ratios may be used instead.
It can be understood that when the height and width of the focus frame are increased on the current frame image, the image features in the focus frame increase accordingly. The preset height and width added to the focus frame may be the same as or different from the preset height and width added to the candidate frame.
Operation 608: Input the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image.
The first subject area is cropped from the current frame image and enlarged to a preset size, and the enlarged first subject area is input into the subject segmentation network to obtain the target subject in the current frame image.
In this embodiment, the ISP processor or central processing unit may obtain the subject segmentation image of the previous frame image, which may be the binary image containing the target subject output by the subject segmentation network for the subject area of the previous frame image. The subject segmentation image of the previous frame image and the first subject area at the preset size are input into the subject segmentation network to obtain the target subject in the current frame image. Adding the subject segmentation image of the previous frame image to the subject segmentation network adds the subject position information of the previous frame, improving the accuracy of subject detection.
With the subject detection method in this implementation, when a focus frame exists in the current frame image, its height and width are determined and increased by the preset height and width, and the resulting focus frame is used as the first subject area of the current frame image. This accurately determines the area where the complete target subject is located and avoids situations where part of the detected target subject is missing because the target subject moved while the size of the focus frame remained unchanged.
In one embodiment, inputting the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image includes: obtaining the subject segmentation image of the previous frame image, and inputting the subject segmentation image of the previous frame image and the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image.
Specifically, the subject segmentation network may use a network architecture such as deeplabv3+ or U-Net. As shown in FIG. 7, the ISP processor or central processing unit may obtain the subject segmentation image of the previous frame image and input it, together with the first subject area of the current frame image, into the subject segmentation network; after multiple convolution layers, the binary image of the first subject area is output, yielding the target subject in the current frame image. A traditional network takes a three-channel RGB image as input for segmentation prediction; the subject segmentation network in this embodiment adds one channel, namely the binary subject segmentation image of the previous frame. This binary subject segmentation image brings the subject position information of the previous frame into the network, which can improve the network's segmentation performance in video scenes.
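The extra input channel can be sketched as a simple array stack; the input size and the resize steps below are assumptions for illustration:

```python
import cv2
import numpy as np

def build_network_input(subject_region_rgb, prev_mask, size=(224, 224)):
    """Stack the RGB subject region with the previous frame's binary
    subject segmentation mask to form the 4-channel input described
    above."""
    rgb = cv2.resize(subject_region_rgb, size).astype(np.float32) / 255.0
    mask = cv2.resize(prev_mask, size, interpolation=cv2.INTER_NEAREST)
    mask = (mask > 0).astype(np.float32)[..., None]
    return np.concatenate([rgb, mask], axis=-1)  # H x W x 4
```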
In one embodiment, inputting the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image includes: obtaining the subject segmentation image of the previous frame image, and inputting it together with the first subject area of the current frame image into the subject segmentation network to obtain the subject segmentation image of the current frame image; determining the proportion of the subject segmentation image of the current frame image to the current frame image; and, when this proportion is less than the proportion threshold, inputting the current frame image into the subject segmentation network to obtain the target subject in the current frame image.
Specifically, the ISP processor or central processing unit inputs the subject segmentation image of the previous frame image and the first subject area of the current frame image into the subject segmentation network; after multiple convolution layers, the binary image of the first subject area is output, and this binary image is the subject segmentation image of the current frame image. The proportion of this subject segmentation image to the current frame image is then calculated, the proportion threshold is obtained, and it is determined whether the calculated proportion is greater than the proportion threshold. When the proportion of the subject segmentation image of the current frame image to the current frame image is less than the proportion threshold, the target subject has left the current picture; that is, either no target subject exists in the subject segmentation image, or only part of the target subject exists and it is incomplete, so the target subject in the current frame image needs to be detected again. The current frame image is then input into the subject segmentation network to obtain the target subject in the current frame image.
In this implementation, the subject segmentation image of the previous frame image may be obtained, and the subject segmentation image of the previous frame image and the current frame image may be input into the subject segmentation network to obtain the target subject in the current frame image.
In this embodiment, the subject segmentation image of the previous frame image is obtained, the subject segmentation image of the previous frame image and the first subject area of the current frame image are input into the subject segmentation network to obtain the subject segmentation image of the current frame image, and the proportion of the subject segmentation image of the current frame image to the current frame image is determined. Whether the proportion is greater than the proportion threshold indicates whether a target subject exists in the current detection result and whether the detected target subject is complete. When the proportion is less than the proportion threshold, the detection is inaccurate or the detected target subject is incomplete, so the current frame image is input into the subject segmentation network, and the target subject in the current frame image can thereby be obtained.
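This fallback logic is easy to state directly. In the sketch below, `segment` stands in for a call to the subject segmentation network and is a hypothetical helper; the mask is assumed to be mapped back to full-frame coordinates, and the threshold value is illustrative:

```python
import numpy as np

def detect_with_fallback(frame, region_input, segment, ratio_threshold=0.05):
    """Segment the cropped region first; if the resulting mask covers
    too little of the frame, re-segment the whole frame instead."""
    mask = segment(region_input)
    ratio = np.count_nonzero(mask) / mask.size
    if ratio < ratio_threshold:
        mask = segment(frame)  # subject left the crop: use the full frame
    return mask
```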
In one embodiment, as shown in FIG. 8, the target filter is generated as follows:
Operation 802: Obtain a second center weight map corresponding to the focus frame, where each pixel in the second center weight map has a corresponding weight value and the weight values in the second center weight map gradually decrease from the center to the edge.
The second center weight map refers to a map used to record the weight value of each pixel in the image. The weight values recorded in the second center weight map gradually decrease from the center to the four sides; that is, the center weight is the largest, and the weights gradually decrease toward the four sides. The second center weight map characterizes that the weight values gradually decrease from the center pixel of the image to the edge pixels. The second center weight map may be the same as or different from the first center weight map; that is, their sizes may be the same or different, set according to the specific situation.
Specifically, the ISP processor or central processing unit may generate the corresponding second center weight map according to the size of the image in the focus frame, using a Gaussian function, a first-order equation, or a second-order equation; the Gaussian function may be a two-dimensional Gaussian function.
In this embodiment, when the focus frame has the same size as the tracking frame of the previous frame image, the first center weight map can be directly used as the center weight map corresponding to the focus frame.
Operation 804: Dot-multiply the pixel value of each pixel of the image in the focus frame with the weight value of the corresponding pixel in the second center weight map.
Specifically, the ISP processor or central processing unit may obtain each pixel of the image in the focus frame and its pixel value, match each pixel in the focus frame with each pixel in the second center weight map, and then dot-multiply the pixel value of each successfully matched pixel with the weight value of the corresponding pixel in the second center weight map.
Operation 806: Perform affine transformation on the dot-multiplied image in the focus frame to obtain images in a preset number of transform frames.
Affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates that maintains the "straightness" of two-dimensional graphics (that is, a straight line remains a straight line and does not bend after the transformation, and an arc remains an arc) and their "parallelism" (that is, the relative positional relationship between two-dimensional graphics remains unchanged: parallel lines remain parallel, and the angle between intersecting straight lines remains unchanged). Simply put, affine transformation allows a graphic to be arbitrarily tilted and to be arbitrarily scaled in two directions. It keeps the concurrence of lines and collinearity of points unchanged: lines that were parallel remain parallel, a midpoint remains a midpoint, and the ratios between segments of a straight line remain unchanged, although the lengths of line segments and the angles between them may change.
Specifically, after the ISP processor or central processing unit has dot-multiplied the image in the focus frame with the successfully matched pixels of the second center weight map, it performs affine transformation on the dot-multiplied image in the focus frame to obtain images in a preset number of transform frames.
In this embodiment, the affine transformation can be realized by composing a series of atomic transformations, including but not limited to translation, scale, flip, rotation, and shear.
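Generating such transform-frame samples can be sketched with small random affine warps; the parameter ranges below are illustrative assumptions, not values from the text:

```python
import cv2
import numpy as np

def affine_samples(image, count=8, rng=np.random.default_rng(0)):
    """Produce `count` affine-warped copies of the focus-frame image
    by composing small random rotation, scale, shear, and translation."""
    h, w = image.shape[:2]
    samples = []
    for _ in range(count):
        angle = rng.uniform(-10, 10)       # rotation, degrees
        scale = rng.uniform(0.9, 1.1)      # isotropic scale
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
        m[0, 1] += rng.uniform(-0.1, 0.1)  # small shear term
        m[:, 2] += rng.uniform(-0.05, 0.05, size=2) * (w, h)  # translation
        samples.append(cv2.warpAffine(image, m, (w, h)))
    return samples
```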
Operation 808: Train the initial filter according to the images in the preset number of transform frames, and obtain the target filter when the preset condition is met.
The initial filter refers to the filter to be trained.
Specifically, the ISP processor or central processing unit convolves the images in the preset number of transform frames with the initial filter to obtain the response value corresponding to each transform frame, and obtains the target subject in the current frame image according to the image in the transform frame corresponding to the maximum response value. The marked target subject in the focus frame can then be obtained. The marked target subject in the focus frame is compared with the target subject obtained from the image in the transform frame corresponding to the maximum response value, and the parameters of the initial filter are adjusted according to the difference between the two. After the parameters are adjusted, training is performed again, and training stops when the difference between the two is less than the difference threshold; the filter parameters at that point are determined, yielding the target filter. Therefore, the preset condition may be that the difference between the target subject obtained according to the image in the transform frame corresponding to the maximum response value and the marked target subject in the focus frame is less than the difference threshold.
In this embodiment, the second center weight map corresponding to the focus frame is obtained, where each pixel has a corresponding weight value and the weight values gradually decrease from the center to the edge; the image in the focus frame is dot-multiplied with the second center weight map; the dot-multiplied image is affine-transformed to obtain images in a preset number of transform frames; and the initial filter is trained according to those images until the preset condition is met, so that a trained target filter can be obtained.
FIG. 9 is a schematic diagram of generating the target filter in an embodiment: the image in the focus frame is obtained, the focus frame and the image in it are affine-transformed to obtain multiple transform frames, the initial filter is trained according to the images in the multiple transform frames, and the target filter is obtained when the preset condition is met.
In one embodiment, dot-multiplying the image in the focus frame with the second center weight map includes: taking the logarithm of the pixel value of each pixel of the image in the focus frame; and dot-multiplying the logarithm-processed pixel values with the weight values of the corresponding pixels in the second center weight map.
Specifically, the ISP processor or central processing unit may obtain the pixel value of each pixel of the image in the focus frame and take the logarithm of each pixel value. Then, the logarithm of each pixel value in the focus frame is dot-multiplied with the weight value of the corresponding pixel in the second center weight map to obtain the dot-multiplied image feature in the focus frame. It can be understood that the pixels in the focus frame correspond one-to-one to the pixels in the second center weight map.
In this implementation, after the pixel values of the pixels of the image in the focus frame are obtained, they may be normalized to reduce the amount of computation, and the logarithm is then taken of the normalized pixel values.
In this embodiment, alternatively, the logarithm of each pixel value of the image in the focus frame is taken first, the logarithms are then normalized, and each normalized pixel in the focus frame is dot-multiplied with the corresponding pixel in the second center weight map. After the dot multiplication, the pixel values at the edge of the image in the focus frame approach 0, which enhances attention on the central target area.
In this embodiment, taking the logarithm of the pixel values of the image in the focus frame and dot-multiplying the logarithm-processed pixel values with the weight values of the corresponding pixels in the second center weight map reduces interference, highlights the central area of the image in the focus frame, and makes it easier to identify the target subject.
FIG. 10 is a schematic diagram of dot-multiplying the image in the focus frame with the second center weight map in an embodiment: the pixel value of each pixel of the image in the focus frame is obtained, the logarithm of each pixel value is computed, the logarithms are normalized, and each normalized pixel in the focus frame is dot-multiplied with the corresponding pixel in the second center weight map.
In one embodiment, as shown in FIG. 11, the current frame image 1102 is acquired, and operation 1106 is performed by the focusing module 1104; that is, the focusing module detects whether a user-triggered focus frame exists in the current frame image. If so, operation 1108 is performed to enlarge the focus frame and crop out the target subject area, which is input into the subject segmentation network 1110 to obtain the target subject and the subject segmentation image in the current frame image.
Operation 1112 is performed to map the subject segmentation image back to the current frame image, and the proportion of the subject segmentation image of the current frame image to the current frame image is determined; when this proportion is less than the proportion threshold, the current frame image is input into the subject segmentation network to obtain the target subject in the current frame image.
Operation 1114 is performed to focus on the target subject, so that a clearer next frame image of the target subject can be acquired. The current frame image is taken as the previous frame image, the next frame image as the current frame image, and the process repeats: detect whether a user-triggered focus frame exists in the current frame image; when one exists, determine the target subject of the current frame image according to the image in the focus frame; when none exists, obtain the tracking frame of the previous frame image, determine the first center weight map according to it, and traverse the current frame image according to the first center weight map to obtain the target subject of the current frame image.
When no focus frame exists in the current frame image, operation 1116 is performed to obtain the tracking frame of the previous frame image, determine the first center weight map according to the tracking frame of the previous frame image, and traverse the current frame image according to the first center weight map to obtain the first subject area 1118 of the current frame image. Then, operation 1120 is performed to preprocess the current frame image, for example with Gaussian filtering, which can eliminate the high-frequency noise brought by complex backgrounds with rich texture detail in the image as well as the high-frequency noise brought by image downsampling.
Operation 1122 is performed to detect whether a moving subject exists in the filtered current frame image; if so, the second subject area 1124 containing the moving subject is obtained.
Operation 1126 is performed to fuse the first subject area 1118 and the second subject area 1124; operation 1108 is then performed to crop and enlarge the result into the target subject area, which is input into the subject segmentation network 1110 to obtain the target subject and the subject segmentation image in the current frame image.
Operation 1112 is performed to map the subject segmentation image back to the current frame image, and, when the proportion of the subject segmentation image of the current frame image to the current frame image is less than the proportion threshold, the current frame image is input into the subject segmentation network to obtain the target subject in the current frame image.
Operation 1114 is performed to focus on the target subject, a clearer next frame image is acquired, and the detection process repeats as above, thereby obtaining a target video in which the target subject is clearer.
It should be understood that although the operations in the flowcharts of FIG. 2 to FIG. 11 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict order restriction on the execution of these operations, and they may be executed in other orders. Moreover, at least some of the operations in FIG. 2 to FIG. 11 may include multiple sub-operations or stages, which are not necessarily executed and completed at the same moment but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other operations or with at least part of the sub-operations or stages of other operations.
FIG. 12 is a structural block diagram of the subject detection device of an embodiment. As shown in FIG. 12, the subject detection device includes an acquiring module 1202, a first determining module 1204, and a second determining module 1206.
The acquiring module 1202 is used to acquire a current frame image and detect whether a user-triggered focus frame exists in the current frame image.
The first determining module 1204 is used to determine the target subject of the current frame image according to the image in the focus frame when a focus frame exists in the current frame image.
The second determining module 1206 is used to: when no focus frame exists in the current frame image, obtain the tracking frame of the previous frame image of the current frame image, determine the first center weight map according to the tracking frame of the previous frame image, and traverse the current frame image according to the first center weight map to obtain the target subject of the current frame image; each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
The subject detection device in this embodiment acquires the current frame image and detects whether a user-triggered focus frame exists in it. When a focus frame exists, the target subject of the current frame image is determined according to the image in the focus frame, which quickly and simply determines the target subject. When no focus frame exists, the tracking frame of the previous frame image is obtained, the first center weight map is determined according to it, and the current frame image is traversed according to the first center weight map to obtain the target subject; since the weight values in the first center weight map gradually decrease from the center to the edge, the target subject in the current frame image can be identified more accurately.
In one embodiment, the second determining module 1206 is further used to: slide the first center weight map over the current frame image to obtain candidate frames; obtain the first image feature in each candidate frame and convolve it with the target filter to obtain the response value corresponding to each candidate frame; and determine the target subject of the current frame image according to the candidate frame corresponding to the maximum response value. The candidate frame with the largest response value is the most likely to contain the target subject, so the target subject in the image can be identified accurately.
In one embodiment, the second determining module 1206 is further used to: obtain the second image feature in each candidate frame, where the second image feature is the pixel values of the pixels in the candidate frame obtained when the first center weight map slides over the current frame image; and, for each candidate frame, dot-multiply the pixel value of each pixel in the candidate frame with the weight value of the corresponding pixel in the first center weight map to obtain the first image feature in each candidate frame. This highlights the central area of the image and makes it easier to identify the target subject.
In one embodiment, the second determining module 1206 is further used to: take the logarithm of the pixel value of each pixel in the candidate frame, which reduces the interference of high-contrast areas with target tracking; and dot-multiply the logarithm-processed pixel values with the weight values of the corresponding pixels in the first center weight map, which highlights the central area of the image and makes it easier to identify the target subject.
In one embodiment, the second determining module 1206 is further used to: use the candidate frame corresponding to the maximum response value as the first subject area of the current frame image, and input the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image. The area where the target subject is located is determined through the response values, and the area containing the target subject is input into the subject segmentation network, so the target subject in the current frame image can be detected quickly.
In one embodiment, the second determining module 1206 is further used to: determine the height and width of the candidate frame corresponding to the maximum response value; increase the height of the candidate frame by a preset height and the width by a preset width; and use the resulting candidate frame as the first subject area of the current frame image. This accurately determines the area where the complete target subject is located and avoids situations where part of the detected target subject is missing because the candidate frame is too small.
In one embodiment, the second determining module 1206 is further used to: detect whether a moving subject exists in the current frame image; when one exists, obtain the second subject area containing the moving subject; fuse the first subject area and the second subject area; and input the fused subject area into the subject segmentation network to obtain the target subject in the current frame image, so that a more accurate target subject can be obtained.
In one embodiment, the device further includes an update module, used to update the target filter according to the first subject area of the current frame image. Updating the target filter with the first subject area of the current frame image reduces the interference of illumination, posture, scale, and other factors with subject detection.
In one embodiment, the first determining module 1204 is further used to: when a focus frame exists in the current frame image, determine the height and width of the focus frame; increase the height by a preset height and the width by a preset width; use the resulting focus frame as the first subject area of the current frame image; and input the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image. This accurately determines the area where the complete target subject is located and avoids situations where part of the detected target subject is missing because the target subject moved while the size of the focus frame remained unchanged.
In one embodiment, the first determining module 1204 is further used to: obtain the subject segmentation image of the previous frame image, and input it together with the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image. The subject segmentation image of the previous frame image brings the subject position information of the previous frame into the network, which can improve the network's segmentation performance in video scenes.
In one embodiment, the first determining module 1204 is further used to: obtain the subject segmentation image of the previous frame image; input it together with the first subject area of the current frame image into the subject segmentation network to obtain the subject segmentation image of the current frame image; determine the proportion of the subject segmentation image of the current frame image to the current frame image; and, when the proportion is less than the proportion threshold, input the current frame image into the subject segmentation network to obtain the target subject in the current frame image. Whether the proportion is greater than the proportion threshold indicates whether a target subject exists in the current detection result and whether the detected target subject is complete; when the proportion is less than the threshold, the detection is inaccurate or the detected target subject is incomplete, so the current frame image is input into the subject segmentation network to obtain the target subject in the current frame image.
In one embodiment, the device further includes a generating device, used to: obtain the second center weight map corresponding to the focus frame, where each pixel in the second center weight map has a corresponding weight value and the weight values gradually decrease from the center to the edge; dot-multiply the pixel value of each pixel of the image in the focus frame with the weight value of the corresponding pixel in the second center weight map; perform affine transformation on the dot-multiplied image in the focus frame to obtain images in a preset number of transform frames; and train the initial filter according to those images, obtaining the target filter when the preset condition is met, so that a trained target filter can be obtained.
In one embodiment, the first determining module 1204 is further used to: take the logarithm of the pixel value of each pixel of the image in the focus frame, and dot-multiply the logarithm-processed pixel values with the weight values of the corresponding pixels in the second center weight map, which reduces interference, highlights the central area of the image in the focus frame, and makes it easier to identify the target subject.
The division of the modules in the subject detection device above is only for illustration; in other embodiments, the subject detection device may be divided into different modules as needed to complete all or part of the functions of the subject detection device.
FIG. 13 is a schematic diagram of the internal structure of an electronic device in an embodiment. As shown in FIG. 13, the electronic device includes a processor and a memory connected via a system bus. The processor provides computing and control capabilities and supports the operation of the entire electronic device. The memory may include a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the subject detection method provided in the following embodiments. The internal memory provides a cached running environment for the operating system and computer program in the non-volatile storage medium. The electronic device may be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like.
Each module in the subject detection device provided in the embodiments of this application may be implemented in the form of a computer program. The computer program may run on a terminal or a server, and the program modules composed of the computer program may be stored in the memory of the terminal or server. When the computer program is executed by a processor, the operations of the methods described in the embodiments of this application are implemented.
The embodiments of this application also provide a computer-readable storage medium: one or more non-volatile computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the operations of the subject detection method.
A computer program product containing instructions, when run on a computer, causes the computer to execute the subject detection method.
Any reference to memory, storage, databases, or other media used in the embodiments of this application may include non-volatile and/or volatile memory. Suitable non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which serves as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their descriptions are relatively specific and detailed, but they should not therefore be understood as limiting the scope of the invention patent. It should be pointed out that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention. Therefore, the protection scope of the present invention patent shall be subject to the appended claims.

Claims (20)

  1. A subject detection method, comprising:
    acquiring a current frame image, and detecting whether a focus frame generated by a user trigger exists in the current frame image;
    when the focus frame exists in the current frame image, determining a target subject of the current frame image according to the image in the focus frame;
    when no focus frame exists in the current frame image, acquiring a tracking frame of a previous frame image of the current frame image, determining a first center weight map according to the tracking frame of the previous frame image, and traversing the current frame image according to the first center weight map to obtain the target subject of the current frame image; wherein each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
  2. The method according to claim 1, wherein traversing the current frame image according to the first center weight map to obtain the target subject of the current frame image comprises:
    sliding the first center weight map over the current frame image to obtain candidate frames;
    obtaining a first image feature in each candidate frame, and convolving the first image feature in each candidate frame with a target filter to obtain a response value corresponding to each candidate frame;
    determining the target subject of the current frame image according to the candidate frame corresponding to the maximum of the response values.
  3. The method according to claim 2, wherein obtaining the first image feature in each candidate frame comprises:
    obtaining a second image feature in each candidate frame, the second image feature being the pixel values of the pixels in the candidate frame obtained when the first center weight map slides over the current frame image;
    for each candidate frame, dot-multiplying the pixel value of each pixel in the candidate frame with the weight value of the corresponding pixel in the first center weight map to obtain the first image feature in each candidate frame.
  4. The method according to claim 3, wherein dot-multiplying the pixel value of each pixel in the candidate frame with the weight value of the corresponding pixel in the first center weight map comprises:
    taking the logarithm of the pixel value of each pixel in the candidate frame;
    dot-multiplying the logarithm-processed pixel values of the pixels in the candidate frame with the weight values of the corresponding pixels in the first center weight map.
  5. The method according to claim 2, wherein determining the target subject of the current frame image according to the candidate frame corresponding to the maximum of the response values comprises:
    using the candidate frame corresponding to the maximum of the response values as a first subject area of the current frame image;
    inputting the first subject area of the current frame image into a subject segmentation network to obtain the target subject in the current frame image.
  6. The method according to claim 5, wherein using the candidate frame corresponding to the maximum of the response values as the first subject area of the current frame image comprises:
    determining the height and width of the candidate frame corresponding to the maximum of the response values;
    increasing the height of the candidate frame by a preset height, and increasing the width of the candidate frame by a preset width;
    using the candidate frame obtained after increasing the preset height and the preset width as the first subject area of the current frame image.
  7. The method according to claim 5 or 6, wherein inputting the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image comprises:
    detecting whether a moving subject exists in the current frame image;
    when a moving subject exists in the current frame image, obtaining a second subject area containing the moving subject;
    fusing the first subject area and the second subject area, and inputting the subject area obtained after the fusion into the subject segmentation network to obtain the target subject in the current frame image.
  8. The method according to claim 7, wherein detecting whether a moving subject exists in the current frame image comprises:
    performing background subtraction on the current frame image to obtain a binary image corresponding to the current frame image;
    performing connected-domain processing on the binary image to obtain the area of each candidate subject in the binary image;
    when a candidate subject has an area greater than or equal to an area threshold, determining that a moving subject exists in the current frame image;
    when the areas of all candidate subjects are less than the area threshold, determining that no moving subject exists in the current frame image.
  9. The method according to claim 5, wherein, after the first subject area of the current frame image is obtained, the method further comprises:
    updating the target filter according to the first subject area of the current frame image.
  10. The method according to claim 1, wherein, when the focus frame exists in the current frame image, determining the target subject of the current frame image according to the image in the focus frame comprises:
    when the focus frame exists in the current frame image, determining the height and width of the focus frame;
    increasing the height of the focus frame by a preset height, and increasing the width of the focus frame by a preset width;
    using the focus frame obtained after increasing the preset height and the preset width as a first subject area of the current frame image;
    inputting the first subject area of the current frame image into a subject segmentation network to obtain the target subject in the current frame image.
  11. The method according to claim 5, 6, or 10, wherein inputting the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image comprises:
    obtaining a subject segmentation image of the previous frame image, and inputting the subject segmentation image of the previous frame image and the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image.
  12. The method according to claim 5, 6, or 10, wherein inputting the first subject area of the current frame image into the subject segmentation network to obtain the target subject in the current frame image comprises:
    obtaining a subject segmentation image of the previous frame image, and inputting the subject segmentation image of the previous frame image and the first subject area of the current frame image into the subject segmentation network to obtain a subject segmentation image of the current frame image;
    determining the proportion of the subject segmentation image of the current frame image to the current frame image;
    when the proportion of the subject segmentation image of the current frame image to the current frame image is less than a proportion threshold, inputting the current frame image into the subject segmentation network to obtain the target subject in the current frame image.
  13. The method according to claim 2, wherein the target filter is generated by:
    obtaining a second center weight map corresponding to the focus frame, wherein each pixel in the second center weight map has a corresponding weight value, and the weight values in the second center weight map gradually decrease from the center to the edge;
    dot-multiplying the pixel value of each pixel of the image in the focus frame with the weight value of the corresponding pixel in the second center weight map;
    performing affine transformation on the dot-multiplied image in the focus frame to obtain images in a preset number of transform frames;
    training an initial filter according to the images in the preset number of transform frames, and obtaining the target filter when a preset condition is met.
  14. The method according to claim 13, wherein dot-multiplying the pixel value of each pixel of the image in the focus frame with the weight value of the corresponding pixel in the second center weight map comprises:
    taking the logarithm of the pixel value of each pixel of the image in the focus frame;
    dot-multiplying the logarithm-processed pixel values with the weight values of the corresponding pixels in the second center weight map.
  15. The method according to claim 14, wherein dot-multiplying the logarithm-processed pixel values with the weight values of the corresponding pixels in the second center weight map comprises:
    normalizing the logarithm-processed pixel values;
    dot-multiplying the normalized pixel values with the weight values of the corresponding pixels in the second center weight map.
  16. A subject detection device, comprising:
    an acquiring module, used to acquire a current frame image and detect whether a focus frame generated by a user trigger exists in the current frame image;
    a first determining module, used to determine a target subject of the current frame image according to the image in the focus frame when the focus frame exists in the current frame image;
    a second determining module, used to, when no focus frame exists in the current frame image, acquire a tracking frame of a previous frame image of the current frame image, determine a first center weight map according to the tracking frame of the previous frame image, and traverse the current frame image according to the first center weight map to obtain the target subject of the current frame image; wherein each pixel in the first center weight map has a corresponding weight value, the weight values in the first center weight map gradually decrease from the center to the edge, and the tracking frame is the area where the target subject is located in the previous frame image.
  17. The device according to claim 16, wherein the second determining module is further used to:
    slide the first center weight map over the current frame image to obtain candidate frames;
    obtain a first image feature in each candidate frame, and convolve the first image feature in each candidate frame with a target filter to obtain a response value corresponding to each candidate frame;
    determine the target subject of the current frame image according to the candidate frame corresponding to the maximum of the response values.
  18. The device according to claim 17, wherein the second determining module is further used to:
    obtain a second image feature in each candidate frame, the second image feature being the pixel values of the pixels in the candidate frame obtained when the first center weight map slides over the current frame image;
    for each candidate frame, dot-multiply the pixel value of each pixel in the candidate frame with the weight value of the corresponding pixel in the first center weight map to obtain the first image feature in each candidate frame.
  19. An electronic device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the operations of the subject detection method according to any one of claims 1 to 15.
  20. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the operations of the subject detection method according to any one of claims 1 to 15.
PCT/CN2020/120116 2019-10-29 2020-10-10 Subject detection method and device, electronic device, and computer-readable storage medium WO2021082883A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20881883.1A EP4044579A4 (en) 2019-10-29 2020-10-10 METHOD AND DEVICE FOR MAIN BODY DETECTION, ELECTRONIC DEVICE AND COMPUTER READABLE STORAGE MEDIUM
US17/711,455 US20220222830A1 (en) 2019-10-29 2022-04-01 Subject detecting method and device, electronic device, and non-transitory computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911038668.4A 2019-10-29 2019-10-29 Subject detection method and device, electronic device, and computer-readable storage medium
CN201911038668.4 2019-10-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/711,455 Continuation US20220222830A1 (en) 2019-10-29 2022-04-01 Subject detecting method and device, electronic device, and non-transitory computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021082883A1 true WO2021082883A1 (zh) 2021-05-06

Family

ID=69042122

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/120116 WO2021082883A1 (zh) 2019-10-29 2020-10-10 Subject detection method and device, electronic device, and computer-readable storage medium

Country Status (4)

Country Link
US (1) US20220222830A1 (zh)
EP (1) EP4044579A4 (zh)
CN (1) CN110661977B (zh)
WO (1) WO2021082883A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796041B (zh) * 2019-10-16 2023-08-18 Oppo广东移动通信有限公司 Subject recognition method and device, electronic device, and computer-readable storage medium
CN110661977B (zh) * 2019-10-29 2021-08-03 Oppo广东移动通信有限公司 Subject detection method and device, electronic device, and computer-readable storage medium
CN111767752B (zh) * 2020-06-11 2022-09-23 网易宝有限公司 Two-dimensional code recognition method and device
CN113489897B (zh) * 2021-06-28 2023-05-26 杭州逗酷软件科技有限公司 Image processing method and related device
CN113743249B (zh) * 2021-08-16 2024-03-26 北京佳服信息科技有限公司 Violation identification method, device, equipment, and readable storage medium
CN113392820B (zh) * 2021-08-17 2021-11-30 南昌虚拟现实研究院股份有限公司 Dynamic gesture recognition method and device, electronic device, and readable storage medium
CN115623318B (zh) * 2022-12-20 2024-04-19 荣耀终端有限公司 Focusing method and related device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1533996A1 (en) * 2003-11-24 2005-05-25 Mitutoyo Corporation Systems and methods for rapidly automatically focusing a machine vision inspection system
CN104243825A (zh) * 2014-09-22 2014-12-24 广东欧珀移动通信有限公司 Automatic focusing method and system for a mobile terminal
CN104966304A (zh) * 2015-06-08 2015-10-07 深圳市赛为智能股份有限公司 Multi-target detection and tracking method based on Kalman filtering and a non-parametric background model
CN107066990A (zh) * 2017-05-04 2017-08-18 厦门美图之家科技有限公司 Target tracking method and mobile device
CN108986140A (zh) * 2018-06-26 2018-12-11 南京信息工程大学 Target scale adaptive tracking method based on correlation filtering and color detection
CN109671103A (zh) * 2018-12-12 2019-04-23 易视腾科技股份有限公司 Target tracking method and device
CN110248096A (zh) * 2019-06-28 2019-09-17 Oppo广东移动通信有限公司 Focusing method and device, electronic device, and computer-readable storage medium
CN110661977A (zh) * 2019-10-29 2020-01-07 Oppo广东移动通信有限公司 Subject detection method and device, electronic device, and computer-readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110120065B (zh) * 2019-05-17 2022-08-26 南京邮电大学 Target tracking method and system based on hierarchical convolution features and scale-adaptive kernel correlation filtering

Also Published As

Publication number Publication date
EP4044579A4 (en) 2022-11-30
EP4044579A1 (en) 2022-08-17
CN110661977B (zh) 2021-08-03
US20220222830A1 (en) 2022-07-14
CN110661977A (zh) 2020-01-07

Similar Documents

Publication Publication Date Title
WO2021082883A1 (zh) Subject detection method and device, electronic device, and computer-readable storage medium
US11430103B2 (en) Method for image processing, non-transitory computer readable storage medium, and electronic device
WO2020259179A1 (zh) Focusing method, electronic device, and computer-readable storage medium
CN110428366B (zh) Image processing method and device, electronic device, and computer-readable storage medium
CN108898567B (zh) Image noise reduction method, device, and system
JP4539729B2 (ja) Image processing device, camera device, image processing method, and program
CN107409166B (zh) Automatic generation of panning shots
JP6961797B2 (ja) Method and device for blurring a preview photo, and storage medium
US11836903B2 (en) Subject recognition method, electronic device, and computer readable storage medium
US11538175B2 (en) Method and apparatus for detecting subject, electronic device, and computer readable storage medium
US10764496B2 (en) Fast scan-type panoramic image synthesis method and device
CN110651297B (zh) Optional enhancement of synthesized long-exposure images using a guide image
WO2021093534A1 (zh) Subject detection method and device, electronic device, and computer-readable storage medium
WO2019056527A1 (zh) Photographing method and device
CN110349163B (zh) Image processing method and device, electronic device, and computer-readable storage medium
CN110650288B (zh) Focus control method and device, electronic device, and computer-readable storage medium
CN110365897B (zh) Image correction method and device, electronic device, and computer-readable storage medium
CN110399823B (zh) Subject tracking method and device, electronic device, and computer-readable storage medium
CN112581481B (zh) Image processing method and device, electronic device, and computer-readable storage medium
CN110689007B (zh) Subject recognition method and device, electronic device, and computer-readable storage medium
CN110545384B (zh) Focusing method and device, electronic device, and computer-readable storage medium
TW201820261A (zh) Image synthesis method for synthesizing people
CN116894935A (zh) Object recognition method and device, storage medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20881883

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020881883

Country of ref document: EP

Effective date: 20220513