WO2021057359A1 - Image processing method, electronic device, and readable storage medium - Google Patents

Image processing method, electronic device, and readable storage medium

Info

Publication number
WO2021057359A1
Authority
WO
WIPO (PCT)
Prior art keywords
video image
frame
video
target area
image
Application number
PCT/CN2020/110771
Other languages
English (en)
French (fr)
Inventor
黄晓政
郑云飞
闻兴
Original Assignee
北京达佳互联信息技术有限公司
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2021057359A1
Priority to US 17/706,457 (published as US20220222831A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/174 Segmentation; Edge detection involving the use of two or more images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/215 Motion-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Definitions

  • the present disclosure belongs to the technical field of video processing, and particularly relates to an image processing method, an electronic device, and a readable storage medium.
  • the salient area here refers to the area of a video image that is most likely to attract people's attention.
  • when determining the salient area in a video image, a salient area detection algorithm is often used to perform visual saliency detection on each frame of video image one by one to determine the salient area in each frame of video image.
  • the present disclosure provides an image processing method, electronic equipment, and readable storage medium.
  • an image processing method including:
  • determining, according to the first target area of the at least one frame of first video image, the second target area of at least one frame of second video image other than the first video image in the to-be-processed video, where one frame of second video image is associated with one frame of first video image.
  • the acquiring at least one frame of the first video image in the to-be-processed video includes:
  • At least one frame of video image is selected from the video images included in the video to be processed as at least one frame of the first video image.
  • the determining, according to the first target area of the at least one frame of first video image, the second target area of at least one frame of second video image other than the first video image in the to-be-processed video includes:
  • Image tracking is performed on the first target area of the first video image to obtain at least one frame of the second target area of the second video image.
  • the determining, according to the first target area of the at least one frame of first video image, the second target area of at least one frame of second video image other than the first video image in the to-be-processed video includes:
  • the motion information of the second video image includes the displacement amount and displacement direction of each pixel in the multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
  • the determining at least one frame of the first target area of the first video image and the motion information of at least one frame of the second video image includes:
  • the determining, based on the motion information of the second video image, the mapping areas, in the previous frame of the second video image, of the multiple video image blocks included in the motion information includes:
  • each pixel point is mapped from the second video image to the previous frame of video image, and an area formed by each pixel point obtained by the mapping is determined as a mapping area.
  • the method further includes:
  • if the motion information of the second video image does not include the motion information of a video image block, determining whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image;
  • if so, the video image block is determined as a target video image block.
  • the acquiring motion information of at least one frame of the second video image includes:
  • Re-encoding the to-be-processed video to obtain re-encoded data of the to-be-processed video, and obtain at least one frame of motion information of the second video image from the re-encoded data.
  • an electronic device including:
  • a memory for storing processor executable instructions
  • the processor is configured to execute:
  • determining, according to the first target area of the at least one frame of first video image, the second target area of at least one frame of second video image other than the first video image in the to-be-processed video, where one frame of second video image is associated with one frame of first video image.
  • the processor is configured to execute:
  • At least one frame of video image is selected from the video images included in the video to be processed as at least one frame of the first video image.
  • the processor is configured to execute:
  • Image tracking is performed on the first target area of the first video image to obtain at least one frame of the second target area of the second video image.
  • the processor is configured to execute:
  • the motion information of the second video image includes the displacement amount and displacement direction of each pixel in the multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
  • the processor is configured to execute:
  • the processor is configured to execute:
  • each pixel point is mapped from the second video image to the previous frame of video image, and an area formed by each pixel point obtained by the mapping is determined as a mapping area.
  • the processor is further configured to execute:
  • if the motion information of the second video image does not include the motion information of a video image block, determine whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image;
  • if so, the video image block is determined as a target video image block.
  • the processor is configured to execute:
  • Re-encoding the to-be-processed video to obtain re-encoded data of the to-be-processed video, and obtain at least one frame of motion information of the second video image from the re-encoded data.
  • a non-transitory computer-readable storage medium is provided; when instructions in the storage medium are executed by a processor of a mobile terminal, the processor of the mobile terminal can perform the following operations:
  • determining, according to the first target area of the at least one frame of first video image, the second target area of at least one frame of second video image other than the first video image in the to-be-processed video, where one frame of second video image is associated with one frame of first video image.
  • Fig. 1 is a flowchart of steps of an image processing method provided by an embodiment of the present disclosure;
  • Fig. 2 is a flowchart of steps of another image processing method provided by an embodiment of the present disclosure;
  • Fig. 3 is a flowchart of steps of yet another image processing method provided by an embodiment of the present disclosure;
  • Fig. 4 is a schematic diagram of detection provided by an embodiment of the present disclosure;
  • Fig. 5 is a block diagram of an image processing device provided by an embodiment of the present disclosure;
  • Fig. 6 is a block diagram showing a device for image processing according to an exemplary embodiment;
  • Fig. 7 is a block diagram showing a device for image processing according to an exemplary embodiment.
  • Fig. 1 is a flowchart of steps of an image processing method provided by an embodiment of the present disclosure. As shown in Fig. 1, the method is applied to a server and may include the following steps:
  • Step 101: The server extracts at least one frame of reference video image in the video to be processed.
  • the above step 101 is a possible implementation manner in which the server obtains at least one frame of first video image in the to-be-processed video, where the number of first video images is less than the number of video images contained in the to-be-processed video.
  • the first video image refers to a video image selected from the to-be-processed video at equal or unequal intervals. Because in the subsequent step 102 the server needs to perform area recognition on the first video image to determine the first target area of the first video image, and then use the first target area of the first video image as a reference to determine the second target area of the second video image, the first video image can also be called a "reference video image".
  • the video to be processed is a video whose target area needs to be determined.
  • for example, if the target area is a salient area and the salient area of the video images in video A needs to be image-enhanced, video A can be used as the to-be-processed video.
  • the reference video image may be a partial video image selected from the video to be processed, and the number of the reference video image is smaller than the number of video images contained in the video to be processed.
  • Step 102: The server performs area recognition on at least one frame of reference video image based on the comparison between any pixel in the at least one frame of reference video image and its surrounding background, to determine the first target area in each frame of reference video image.
  • step 102 is a possible implementation manner in which the server performs area recognition on at least one frame of the first video image and determines the first target area of at least one frame of the first video image.
  • the server may use an area detection algorithm to realize area recognition based on the comparison between any pixel in the reference video image and its surrounding background.
  • the area detection algorithm may be a salient area detection algorithm
  • the first target area may be a salient area of the first video image.
  • the server can use each frame of the reference video image as the input of the salient area detection algorithm.
  • the salient area detection algorithm can determine the saliency value of each pixel in the reference video image, and then output a saliency map.
  • the saliency value can be determined based on the comparison of the color, brightness, and orientation of the pixel with its surrounding background, or based on the distance between the pixel and the pixels in the surrounding background; the embodiment of the present disclosure does not limit the way the saliency value is determined.
  • specifically, the server may perform Gaussian blurring and down-sampling on the reference video image multiple times to generate multiple sets of images at different scales. For the image at each scale, the color feature, brightness feature, and orientation feature are extracted to obtain a feature map at each scale. Then each feature map can be normalized and convolved with a two-dimensional Gaussian difference function, the convolution result is superimposed back onto the original feature map, and finally all the feature maps are superimposed to obtain a saliency map, where the saliency map can be a grayscale image.
  • an area composed of pixels whose saliency value is greater than a preset threshold may then be extracted from the reference video image and marked as a salient area (see the sketch below).
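  • For illustration, the sketch below derives a salient-area mask from a single reference frame as described above. The patent describes an Itti-style multi-scale pipeline over color, brightness, and orientation features; as a compact stand-in, this sketch uses OpenCV's spectral-residual saliency detector (an assumption that requires opencv-contrib-python), followed by the preset-threshold step. The threshold value is illustrative.

```python
# Minimal sketch of step 102: compute a saliency map for one reference frame
# and keep the pixels whose saliency exceeds a preset threshold. The spectral
# residual detector stands in for the multi-scale method described above.
import cv2
import numpy as np

def detect_first_target_area(frame_bgr: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return a binary mask (0/255) marking the salient (first target) area."""
    detector = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, saliency_map = detector.computeSaliency(frame_bgr)  # float32 in [0, 1]
    if not ok:
        raise RuntimeError("saliency computation failed")
    return (saliency_map > threshold).astype(np.uint8) * 255
```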
  • Step 103: For each frame of reference video image, the server determines, according to the first target area in the reference video image, the second target area in the other video images associated with the reference video image in the to-be-processed video.
  • the server determines the second target area of at least one frame of the second video image other than the first video image in the to-be-processed video according to the first target area of at least one frame of the first video image.
  • a frame of the second video image is associated with a frame of the first video image.
  • the second video image is a video image other than the first video image in the video to be processed
  • the second video image can also be vividly called “other video images” or “non-reference video images”.
  • each frame of the first video image can be associated with one or more frames of the second video image.
  • the first target area refers to the salient area in the first video image
  • the second target area refers to the salient area in the second video image
  • the salient area refers to the area in a frame of video image that is most likely to attract people's attention.
  • each reference video image may be associated with other video images.
  • other video images associated with the reference video image may be the non-reference video images between the reference video image and another reference video image; correspondingly, all reference video images and all other video images together constitute the to-be-processed video.
  • the differences between the frames of video images contained in a video are often caused by relative changes in pixels; for example, some pixels may move between two adjacent frames of video images, thereby forming two different video images. Therefore, in the embodiments of the present disclosure, after the first target area in the first video image is determined, the second target area in the second video image can be determined based on the first target area in the first video image and the relative change information between the pixels in the first video image and those in the associated second video image, thereby omitting the area recognition operation on the second video image based on the salient area detection algorithm and saving computing resources and time to a certain extent.
  • the image processing method provided by the embodiments of the present disclosure may first extract at least one frame of reference video image from the video to be processed, where the number of reference video images is less than the number of video images contained in the video to be processed; then, according to the comparison between any pixel in the reference video image and its surrounding background, perform area recognition on the at least one frame of reference video image to determine the first target area in each frame of reference video image; finally, for each frame of reference video image, determine, according to the first target area in the reference video image, the second target area in the other video images associated with the reference video image in the video to be processed.
  • in this way, the second target area in other video images can be determined based on the first target area in these reference video images, and it is not necessary to perform area recognition on all video images based on the comparison between any pixel in the video image and its surrounding background. Therefore, the computing resources and time consumed in determining the salient area of each video image can be reduced to a certain extent, and the determination efficiency is improved.
  • Fig. 2 is a flowchart of steps of another image processing method provided by an embodiment of the present disclosure. As shown in Fig. 2, the method is applied to a server and may include the following steps:
  • Step 201 The server extracts at least one frame of reference video image in the video to be processed; the number of the reference video images is less than the number of video images included in the video to be processed.
  • the above step 201 is a possible implementation manner in which the server obtains at least one frame of first video image in the to-be-processed video, where the number of first video images is less than the number of video images contained in the to-be-processed video.
  • in a possible implementation manner, one frame of first video image may be selected every N frames of video image to obtain at least one frame of first video image, where N is an integer greater than or equal to 1.
  • the smaller N is, the more video images need to be recognized based on the comparison between any pixel in the video image and its surrounding background, that is, by the area detection algorithm, and the more computing time and resources are needed; the larger N is, the fewer video images need to be recognized this way and the less computing time and resources are required, but the more second video images tend to be associated with each first video image, so the accuracy of the determined second target area may be lower. Therefore, the specific value of N can be set according to actual needs; for example, N can be 5, which is not limited in the embodiment of the present disclosure. For example, assuming the video to be processed includes 100 frames of video images, the 1st, 6th, 11th, ..., and 96th frames of video images can be used as the first video images, giving 20 frames of first video images in total, as in the sketch below.
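  • A minimal sketch of the equal-interval selection just described, assuming 0-based frame indices; with 100 frames and N = 5 it selects 20 reference frames, matching the example above.

```python
# Select one reference (first) video image every N frames.
def select_reference_indices(num_frames: int, n: int = 5) -> list[int]:
    return list(range(0, num_frames, n))

# 0, 5, 10, ..., 95 -- the 1st, 6th, 11th, ..., 96th frames in 1-based terms.
assert len(select_reference_indices(100, 5)) == 20
```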
  • in this implementation, the number of other video images associated with each frame of reference video image can be made the same. In this way, it is possible to avoid some reference video images being associated with too many other video images, which would make the second target area in other video images determined based on the first target area in the reference video image inaccurate; a better area determination effect can thus be achieved.
  • alternatively, at least one frame of video image may be selected at unequal intervals from the video images included in the video to be processed as at least one frame of first video image. For example, one frame of video image can be selected after an interval of 2 frames, then one frame after an interval of 5 frames, then one frame after an interval of 4 frames, and so on; finally, the selected video images are used as at least one frame of first video image. In this implementation, the selection is not limited by the preset value N: each selection can skip any number of frames, that is, selection at unequal intervals, which can improve the flexibility of the selection operation.
  • Step 202: The server performs area recognition on at least one frame of reference video image based on the comparison between any pixel in the at least one frame of reference video image and its surrounding background, to determine the first target area in each frame of reference video image.
  • step 202 is a possible implementation manner for the server to perform area recognition on at least one frame of the first video image and determine the first target area of at least one frame of the first video image.
  • Step 203: For each frame of reference video image, according to the image timing of each frame of other video image associated with the reference video image, the server determines, based on a preset image tracking algorithm, the area in the other video image corresponding to the first target area or the second target area in the previous frame of video image of the other video image, to obtain the second target area in the other video image.
  • in the embodiment of the present disclosure, when determining the second target area, the server determines at least one frame of second video image associated with the first video image based on the timing of the video images in the to-be-processed video, and the timing of the second video image is located between the first video image and the next frame of first video image.
  • the time sequence of the video images represents the time sequence in which the video images appear in the video to be processed. For example, suppose that video image a appears at the 10th second of the video to be processed, and video image b appears at the 30th second of the video to be processed. If video image c appears in the 20th second of the video to be processed, the image timing of video image a is earlier than that of video image c, and the image timing of video image c is earlier than that of video image b.
  • the server uses all the video images between the first video image and the next frame of the first video image as at least one frame of the second video image.
  • the server randomly selects a part of the video images as at least one frame of the second video image from all the video images between the first video image and the next frame of the first video image.
  • the server after determining each second video image, performs image tracking on the first target area of the first video image to obtain at least one frame of the second target area of the second video image.
  • for example, all video images between the first video image and the next first video image are determined as at least one frame of second video image. Then, image tracking is performed on the first target area of the first video image to obtain the second target area of the first frame of second video image; image tracking is continued on the second target area of the first frame of second video image to obtain the second target area of the second frame of second video image; and so on, so that the second target area of each second video image can be tracked.
  • other video images associated with the reference video image may be a non-reference video image between the reference video image and the next frame of reference video image.
  • the previous frame of video image of the other video image with the earliest image timing is the reference video image. Therefore, in this step, the first target area in the reference video image can be tracked based on the preset image tracking algorithm to determine the area in that other video image corresponding to the first target area in the reference video image, obtaining the second target area of that other video image; then the second target area of that other video image can be tracked to determine the second target area in the video image whose image timing is only later than that frame.
  • the preset tracking algorithm may be an optical flow tracking algorithm. The optical flow tracking algorithm may be based on the principle of constant brightness, that is, the brightness of the same point does not change over time, and the principle of spatial consistency, that is, the pixels adjacent to a pixel are also adjacent points when projected onto the next frame of image and have the same speed; based on the brightness characteristics of the first target area or the second target area in the previous frame of video image and the speed characteristics of neighboring pixels, the corresponding pixels in the other video image are predicted, and the second target area in the other video image is then obtained (see the sketch below).
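  • The sketch below shows one way such optical flow tracking could propagate a target area one frame forward, using OpenCV's pyramidal Lucas-Kanade tracker on 8-bit grayscale frames. The corner-point selection and the bounding-box re-formation of the region are illustrative assumptions, not details fixed by the patent; applied frame by frame, it yields the chained tracking described above.

```python
# Sketch: propagate a target-area mask from the previous frame to the next
# frame with Lucas-Kanade optical flow (grayscale uint8 inputs assumed).
import cv2
import numpy as np

def track_target_area(prev_gray, next_gray, prev_mask):
    # Pick trackable corner points inside the previous frame's target area.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=5,
                                  mask=prev_mask)
    if pts is None:
        return None
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray,
                                                      pts, None)
    good = next_pts[status.flatten() == 1].reshape(-1, 2)
    if len(good) == 0:
        return None
    # Illustrative choice: take the bounding box of the tracked points as the
    # new target area; the patent does not fix how the region is re-formed.
    x, y, w, h = cv2.boundingRect(good.astype(np.int32))
    mask = np.zeros_like(prev_mask)
    mask[y:y + h, x:x + w] = 255
    return mask
```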
  • in the embodiment of the present disclosure, the previous frame of video image needs to be used as the input of the preset tracking algorithm to determine the target area in the other video image, which can improve the efficiency of determining the target area in other video images to a certain extent.
  • if the previous frame of video image is the first video image, the first target area of the first video image needs to be tracked; if the previous frame of video image is a second video image with an earlier timing, the second target area of that second video image needs to be tracked.
  • determining sequentially according to the image timing makes the difference between the images to be tracked each time smaller, so that to a certain extent the tracking algorithm can accurately track the corresponding area and the determination effect is improved.
  • in summary, the image processing method may first extract at least one frame of reference video image from the video to be processed, where the number of reference video images is less than the number of video images contained in the video to be processed; then perform area recognition on the at least one frame of reference video image according to the comparison between any pixel in the reference video image and its surrounding background, to determine the first target area in each frame of reference video image; finally, for the other video images associated with each frame of reference video image, according to the image timing of each frame of other video image and based on the preset image tracking algorithm, determine the area in the other video image corresponding to the first target area or the second target area in the previous frame of video image, to obtain the second target area in the other video image.
  • in this way, the second target area in other video images can be determined based on the first target area in these reference video images, so the computing resources and time consumed in determining the salient area of each video image can be reduced to a certain extent, and the determination efficiency is improved.
  • Fig. 3 is a flowchart of steps of yet another image processing method provided by an embodiment of the present disclosure. As shown in Fig. 3, the method is applied to a server and may include the following steps:
  • Step 301 The server extracts at least one frame of reference video image in the video to be processed; the number of reference video images is less than the number of video images included in the video to be processed.
  • step 301 is a possible implementation manner in which the server obtains at least one frame of first video image in the to-be-processed video, where the number of first video images is less than the number of video images contained in the to-be-processed video.
  • for the implementation of this step, reference may be made to the foregoing step 201, which is not repeated in the embodiment of the present disclosure.
  • Step 302: The server performs area recognition on at least one frame of reference video image based on the comparison between any pixel in the at least one frame of reference video image and its surrounding background, to determine the first target area in each frame of reference video image.
  • step 302 is a possible implementation manner for the server to perform area recognition on at least one frame of the first video image and determine the first target area of at least one frame of the first video image.
  • for the implementation of this step, reference may be made to the above step 202, which is not described in detail again in the embodiment of the present disclosure.
  • Step 303: For each frame of reference video image, the server obtains motion information of other video images associated with the reference video image from the encoded data of the video to be processed.
  • in the first encoding process, the encoded data refers to the first encoded data; in the re-encoding process, the encoded data refers to the re-encoded data.
  • step 303 is a possible implementation manner for the server to obtain the motion information of at least one frame of the second video image, where one frame of the second video image is associated with one frame of the first video image.
  • the motion information of the second video image includes the displacement amount and displacement direction of each pixel in the multiple video image blocks relative to the corresponding pixel in the previous frame of video image.
  • in the encoding process, each key frame image contained in the video to be processed is usually extracted, and for the multiple non-key frame images adjacent to a key frame image, the displacement amount and displacement direction of each of their pixels relative to the corresponding pixels in the key frame image are obtained, so that the motion information is obtained (a toy illustration is sketched below). The key frame images and the motion information of the non-key frame images are used as the encoded data. Therefore, in the embodiments of the present disclosure, the motion information of other video images can be obtained from the encoded data of the video to be processed, so as to facilitate identification based on this information in the subsequent process.
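  • To illustrate what per-block motion information represents, the toy sketch below recovers one block's displacement by exhaustive block matching against the previous frame. Real encoders compute motion vectors far more efficiently; this is a conceptual stand-in, not any codec's actual algorithm, and the block and search sizes are assumptions.

```python
# Toy block matching: find the displacement (direction + amount) of one
# block of the current frame relative to the previous frame.
import numpy as np

def block_motion_vector(prev_gray, cur_gray, top, left, size=16, search=8):
    block = cur_gray[top:top + size, left:left + size].astype(np.int32)
    best_sad, best = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > prev_gray.shape[0] \
                    or x + size > prev_gray.shape[1]:
                continue
            cand = prev_gray[y:y + size, x:x + size].astype(np.int32)
            sad = int(np.abs(block - cand).sum())  # sum of absolute differences
            if best_sad is None or sad < best_sad:
                # Best match sits at (top+dy, left+dx) in the previous frame,
                # so the block moved by (-dx, -dy) from previous to current.
                best_sad, best = sad, (-dx, -dy)
    return best  # (dx, dy): displacement from the previous frame to this one
```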
  • the encoded data corresponding to the to-be-processed video may also be acquired first.
  • the to-be-processed video is often encoded once, that is, the to-be-processed video is a video that has been encoded for the first time. Therefore, in this step, the motion information of at least one frame of the second video image can be obtained from the first encoded data of the video to be processed.
  • for example, the video platform may have a custom video coding standard and may accordingly re-encode the received video to be processed according to the custom video coding standard. Therefore, in this step, a re-encoding operation may be performed on the video to be processed to obtain re-encoded data of the to-be-processed video, and then the motion information of at least one frame of second video image may be obtained from the re-encoded data.
  • the re-encoding operation may start from the last encoded data of the to-be-processed video and re-encode based on the content of that data; because the data volume of the last encoded data is smaller than the data volume of the video itself, re-encoding based on the last encoded data can reduce the occupation of processing resources to a certain extent, thereby avoiding stuttering problems.
  • Step 304: The server determines the second target area in each frame of the other video images according to the first target area in the reference video image and the motion information corresponding to each frame of other video images associated with the reference video image.
  • the above step 304 is a possible implementation manner in which the server determines the second target area of at least one frame of second video image according to the first target area of at least one frame of first video image and the motion information of at least one frame of second video image.
  • in the embodiment of the present disclosure, the first target area in the reference video image and the motion information corresponding to the other video images can be combined to determine the second target area in the other video images. In this way, it is only necessary to determine, based on the comparison between any pixel and its surrounding background, the first target area in the part of the video images in the video to be processed that serve as reference video images, and then combine the motion information corresponding to the other video images to determine the second target area in the other video images (here the first target area and the second target area are collectively referred to as the "salient area"); therefore, the efficiency of determining the salient area in all video images in the video to be processed can be improved to a certain extent.
  • this step can be implemented through the following sub-steps (1) to (4):
  • Sub-step (1): According to the image timing of each frame of other video image associated with the reference video image, the server divides each frame of other video image into multiple video image blocks.
  • specifically, the other video image can be divided into multiple video image blocks of a preset size, where the specific value of the preset size can be set based on actual needs (see the sketch below): the smaller the preset size, the more video image blocks there are and the more accurate the second target area determined based on the video image blocks, but the more processing resources are consumed; the larger the preset size, the fewer the video image blocks, and accordingly the lower the accuracy of the second target area determined based on the video image blocks, but the less processing resources are consumed.
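  • A minimal sketch of sub-step (1), assuming a 16x16 preset size (any value could be chosen, subject to the trade-off above):

```python
# Split a frame into tiles of a preset size (edge tiles may be smaller).
import numpy as np

def split_into_blocks(frame: np.ndarray, size: int = 16):
    h, w = frame.shape[:2]
    for top in range(0, h, size):
        for left in range(0, w, size):
            yield top, left, frame[top:top + size, left:left + size]
```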
  • Sub-step (2): For each video image block, if the motion information contains the motion information corresponding to the video image block, the server determines, based on the motion information corresponding to the video image block, the area corresponding to the video image block in the previous frame of video image of the video image block.
  • the adjacent video image includes at least a reference video image
  • the motion information corresponding to the video image block includes the displacement amount of each pixel in the video image block relative to the corresponding pixel in the previous frame of video image, and the displacement direction.
  • the corresponding motion information determines the corresponding area of the video image block in the previous frame of video image.
  • other video images associated with the reference video image may be video images between the reference video image and the next frame of reference video image, that is, the image timings of other video images associated with the reference video image are all later than the reference video image The timing of the image.
  • since the motion information corresponding to the video image block includes the displacement amount and displacement direction of each pixel in the video image block relative to the corresponding pixel in the previous frame of video image, to determine the area corresponding to the video image block in the previous frame of video image, the position coordinates of each pixel in the video image block can be moved by the displacement amount in the direction opposite to the displacement direction of that pixel, obtaining the position coordinates of each pixel after the movement. Each moved pixel then corresponds to a pixel in the previous frame of video image, and the area composed of the position coordinates of the moved pixels is determined as the corresponding area.
  • the displacement amount may be a coordinate value, and the positive or negative of the coordinate value may indicate different displacement directions.
  • moving the position coordinates of each pixel in the video image block (equivalent to a position-coordinate mapping) maps the video image block to the previous frame of video image, thereby obtaining the area corresponding to the video image block (see the sketch below).
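  • The sketch below illustrates the reverse mapping just described: a block is shifted opposite to its displacement and the resulting area is tested against the previous frame's target-area mask. The majority-overlap criterion is an illustrative assumption; the patent only states that the corresponding area should be located within the target area.

```python
# Map a block back into the previous frame by undoing its displacement, then
# test whether the mapping area falls inside the previous target area.
import numpy as np

def block_maps_into_target(prev_target_mask, top, left, size, dx, dy):
    y, x = top - dy, left - dx          # move opposite to the displacement
    h, w = prev_target_mask.shape
    y0, y1 = max(y, 0), min(y + size, h)
    x0, x1 = max(x, 0), min(x + size, w)
    if y0 >= y1 or x0 >= x1:            # mapped entirely outside the frame
        return False
    region = prev_target_mask[y0:y1, x0:x1]
    # Illustrative criterion: most mapped pixels lie inside the target area.
    return region.mean() > 127
```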
  • Sub-step (3): If the corresponding area is located in the first target area or the second target area of the previous frame of video image, the server determines the video image block as a component of the target area of the other video image.
  • the video image block can be determined as the target area component of the other video image.
  • FIG. 4 is a schematic diagram of detection provided by an embodiment of the present disclosure.
  • in Fig. 4, A represents the previous frame of video image in which a salient area has been determined, and B represents the other video image. Area a represents the salient area in the previous frame of video image (here the salient area refers to the second target area). Area b represents a video image block in the other video image, and area c represents another video image block in the other video image. Area d is the area corresponding to area b in the previous frame of video image, and area e is the area corresponding to area c in the previous frame of video image. It can be seen that area d is located in the salient area of the previous frame of video image, while area e is not. Therefore, the video image block represented by area b can be determined as a component of the target area.
  • the second target area in the other video image can be determined.
  • if the motion information does not include the motion information corresponding to the video image block, it can be determined whether an adjacent image block of the video image block is a component of the target area of the other video image; if so, the video image block can be determined as a component of the target area of the other video image.
  • the adjacent image block of the video image block may be an image block adjacent to the video image block, and the adjacent image block may be any adjacent image block. If the adjacent image block of the video image block is a component part of the target area of the other video image, it can be considered that the video image block is also a component part of the target area with a high probability. Therefore, the determination can be directly based on the adjacent image block. In this way, for a video image block with missing motion information, it can also be quickly determined whether the video image block is a component of the target area, thereby ensuring the efficiency of target area detection.
  • Sub-step (4): The server determines the area composed of all the components as the second target area of the other video image.
  • for example, if three video image blocks are determined as components, the area composed of the three video image blocks is the second target area of the other video image.
  • for example, suppose the reference video image is image X and the other associated video images are image Y and image Z, where the image timing of image X is the earliest, that of image Y is second, and that of image Z is the latest. Then, based on the motion information of image Y, the area corresponding to each video image block of image Y in image X is determined, and the area composed of the image blocks whose corresponding areas are located in the salient area (the first target area or the second target area) of image X is determined as the salient area in image Y, that is, the second target area in image Y is obtained.
  • similarly, the area corresponding to each video image block of image Z in image Y can be determined, and the area composed of the video image blocks whose corresponding areas are located in the second target area of image Y can be determined as the salient area in image Z, so that the second target area in image Z is obtained.
  • the above step 304 can also be implemented through the following sub-steps 3041-3043:
  • Step 3041: The server obtains the displacement direction and displacement amount of each pixel in each video image block from the motion information of the second video image.
  • what is actually stored in the motion information of the second video image is the motion information of the multiple video image blocks in the second video image; by reading the stored motion information of each video image block, the displacement direction and displacement amount of each pixel in each video image block can be obtained.
  • Step 3042: Based on the displacement direction and the displacement amount, the server maps each pixel from the second video image to the previous frame of video image of the second video image, and determines the area formed by the mapped pixels as a mapping area.
  • for any video image block, the displacement direction and displacement amount of its pixels recorded in the motion information describe how the block maps from the previous frame of video image to the current second video image. Therefore, it is only necessary to apply the inverse mapping to find the pixel position, in the previous frame of video image, corresponding to each pixel in the video image block, that is, to map each pixel in the video image block to the previous frame.
  • the area formed by each pixel point obtained by the mapping is determined as a mapping area.
  • the server performs the above steps 3041-3042 for each video image block recorded in the motion information, which is equivalent to the server determining, based on the motion information of the second video image, the mapping areas, in the previous frame of the second video image, of the multiple video image blocks contained in the motion information.
  • Step 3043: The server obtains the target video image blocks whose mapping areas are located in the first target area or the second target area of the previous frame of video image, and determines the area composed of the target video image blocks as the second target area of the second video image.
  • in other words, the server first maps each pixel in each video image block recorded in the motion information to obtain the mapping area of each video image block in the previous frame of video image, and then obtains the target video image blocks whose mapping areas are located in the salient area of the previous frame of video image, which is equivalent to filtering the target video image blocks out of all video image blocks according to whether their mapping areas are located in the salient area. If the previous frame of video image is the first video image, its salient area refers to the first target area; if the previous frame of video image is the second video image, its salient area refers to the second target area; that is, the type of the salient area differs according to the type of the previous frame of video image.
  • since the motion information only records the motion information of the video image blocks whose pixel positions move between adjacent video images, if some video image blocks do not move, their motion information will not be recorded in the motion information of the second video image, yet these unmoved video image blocks may still belong to the second target area of the current second video image. Therefore, whether the adjacent image blocks of such a video image block are target video image blocks is judged, so as to determine whether the unmoved video image block is a target video image block.
  • specifically, the server may also perform the following operations: divide the second video image into multiple video image blocks; for any video image block, if the motion information of the second video image does not contain the motion information of the video image block, determine whether the mapping area of an adjacent image block of the video image block is located in the first target area or the second target area of the previous frame of video image; if the mapping area of the adjacent image block is located in the first target area or the second target area of the previous frame of video image, determine the video image block as a target video image block.
  • that is, for a video image block whose motion information is missing, whether it is a target video image block can be determined by judging whether the mapping area of its adjacent image block is located in the salient area of the previous frame of video image; the judgment on the mapping area of the adjacent image block is similar to the above steps 3041-3043 and will not be repeated here (see the combined sketch below).
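  • Putting steps 3041-3043 and the fallback for unmoved blocks together, the sketch below (reusing block_maps_into_target and the block layout from the earlier sketches) shows one possible arrangement; the motion_info layout, the single-pass neighbor rule, and the block size are illustrative assumptions.

```python
# Sketch of steps 3041-3043: blocks with recorded motion are reverse-mapped
# and tested; blocks absent from the motion information borrow the decision
# of an adjacent block already made in this pass.
import numpy as np

def second_target_mask(prev_mask, motion_info, frame_shape, size=16):
    h, w = frame_shape
    is_target = {}
    # Pass 1: blocks whose pixels moved carry motion info in the coded data.
    for (top, left), (dx, dy) in motion_info.items():
        is_target[(top, left)] = block_maps_into_target(
            prev_mask, top, left, size, dx, dy)
    # Pass 2: unmoved blocks inherit an adjacent block's result if available.
    for top in range(0, h, size):
        for left in range(0, w, size):
            if (top, left) in is_target:
                continue
            neighbors = [(top - size, left), (top + size, left),
                         (top, left - size), (top, left + size)]
            is_target[(top, left)] = any(is_target.get(nb, False)
                                         for nb in neighbors)
    mask = np.zeros((h, w), dtype=np.uint8)
    for (top, left), hit in is_target.items():
        if hit:
            mask[top:top + size, left:left + size] = 255
    return mask
```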
  • in summary, the image processing method may first extract at least one frame of reference video image from the video to be processed, where the number of reference video images is less than the number of video images contained in the video to be processed; then perform area recognition on the at least one frame of reference video image according to the comparison between any pixel in the reference video image and its surrounding background, to determine the first target area in each frame of reference video image; next, for each frame of reference video image, obtain from the encoded data corresponding to the video to be processed the motion information corresponding to the other video images associated with the reference video image; and finally determine the second target area in each frame of other video image according to the first target area in the reference video image and the motion information corresponding to each frame of other video image associated with the reference video image.
  • in this way, the salient areas in all video images in the video to be processed can be determined. Therefore, to a certain extent, the computing resources and time consumed in determining the target area in each video image can be reduced, and the determination efficiency can be improved.
  • FIG. 5 is a block diagram of an image processing device provided by an embodiment of the present disclosure. As shown in FIG. 5, the device 40 may include:
  • the extraction module 401 is configured to extract at least one frame of reference video image in the to-be-processed video; the number of the reference video images is smaller than the number of video images included in the to-be-processed video.
  • the reference video image is also referred to as the first video image.
  • the extraction module 401 is configured to obtain at least one frame of first video images in the to-be-processed video, and the number of the first video images is less than the number of video images included in the to-be-processed video.
  • the recognition module 402 is configured to perform area recognition on at least one frame of reference video image based on the comparison between any pixel in the at least one frame of reference video image and its surrounding background, to determine the first target area in each frame of reference video image.
  • the recognition module 402 is configured to perform area recognition on at least one frame of the first video image, and determine the first target area of at least one frame of the first video image.
  • the determining module 403 is configured to, for each frame of reference video image, determine, according to the first target area in the reference video image, the second target area of the other video images associated with the reference video image in the to-be-processed video.
  • the determining module 403 is configured to determine at least one frame of second video image other than the first video image in the to-be-processed video according to the first target area of at least one frame of the first video image In the second target area, a frame of second video image is associated with a frame of first video image.
  • the image processing device may first extract at least one frame of reference video image from the video to be processed, where the number of reference video images is less than the number of video images contained in the video to be processed; then, according to the comparison between any pixel in the reference video image and its surrounding background, perform area recognition on the at least one frame of reference video image to determine the first target area in each frame of reference video image; finally, for each frame of reference video image, determine, according to the first target area in the reference video image, the second target area in the other video images associated with the reference video image in the video to be processed.
  • in this way, the second target area in other video images can be determined based on the first target area in these reference video images, and it is not necessary to perform area recognition on all video images based on the comparison between any pixel in the video image and its surrounding background. Therefore, the computing resources and time consumed in determining the salient area of each video image can be reduced to a certain extent, and the determination efficiency is improved.
  • the extraction module 401 is configured to:
  • one frame of the first video image is selected every N frames of video image to obtain at least one frame of the first video image, where N is an integer greater than or equal to 1.
  • At least one frame of video image is selected as at least one frame of the first video image from the video images included in the video to be processed.
  • the determining module 403 is configured to:
  • for each frame of other video image associated with the reference video image, according to the image timing of each frame of other video image and based on a preset image tracking algorithm, determine the area in the other video image corresponding to the first target area or the second target area in the previous frame of video image of the other video image, to obtain the second target area in the other video image;
  • the previous frame of video image of the other video image with the earliest image timing is the reference video image.
  • the determination module 403 is configured as:
  • Image tracking is performed on the first target area of the first video image to obtain at least one frame of the second target area of the second video image.
  • the determining module 403 is configured to:
  • the second target area in each frame of the other video images is determined.
  • the determination module 403 is configured as:
  • the motion information of the second video image includes the displacement amount and displacement direction of each pixel in the multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
  • the determining module 403 is further configured to:
  • if the motion information contains the motion information corresponding to the video image block, the area corresponding to the video image block in the previous frame of video image is determined based on the motion information corresponding to the video image block.
  • the area composed of all the components is determined as the second target area of the other video image.
  • the motion information includes the displacement amount and displacement direction of each pixel in the video image block relative to the corresponding pixel in the previous frame of video image.
  • the determining module 403 is also configured as:
  • the determining module 403 is further configured to:
  • if the motion information does not include the motion information corresponding to the video image block, determining whether adjacent image blocks of the video image block are components of the target area of the other video image;
  • if so, the video image block is determined as a component of the target area of the other video image.
  • the determining module 403 is also configured as:
  • if the motion information of the second video image does not include the motion information of a video image block, determine whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image;
  • if so, the video image block is determined as a target video image block.
  • the determining module 403 is further configured to: use the encoded data of the already-encoded to-be-processed video as the encoded data corresponding to the to-be-processed video; or re-encode the to-be-processed video to obtain re-encoded data of the to-be-processed video as the encoded data corresponding to the to-be-processed video.
  • the other video images associated with the reference video image are video images between the reference video image and the next frame of reference video image.
  • the extraction module 401 is also configured to:
  • Re-encoding the to-be-processed video to obtain re-encoded data of the to-be-processed video, and obtain at least one frame of motion information of the second video image from the re-encoded data.
  • the determining module 403 is further configured to:
  • for each pixel in the video image block, move the pixel by the displacement amount in the direction opposite to the displacement direction of the pixel;
  • the area formed by the pixels in the previous frame of video image corresponding to the moved pixels is determined as the corresponding area.
  • the determining module 403 is also configured to:
  • each pixel point is mapped from the second video image to the previous frame of video image, and an area formed by each pixel point obtained by the mapping is determined as a mapping area.
  • according to an embodiment of the present disclosure, an electronic device is provided, including: a processor, and a memory for storing executable instructions of the processor, where the processor is configured to execute the steps in the image processing method of any of the above embodiments, and the image processing method includes:
  • determining, according to the first target area of the at least one frame of first video image, the second target area of at least one frame of second video image other than the first video image in the to-be-processed video, where one frame of second video image is associated with one frame of first video image.
  • the processor is configured to execute:
  • At least one frame of video image is selected as at least one frame of the first video image from the video images included in the video to be processed.
  • the processor is configured to execute:
  • At least one frame of second video image associated with the first video image is determined, and the time sequence of the second video image lies between the first video image and the next frame of first video image;
  • Image tracking is performed on the first target area of the first video image to obtain at least one frame of the second target area of the second video image.
  • the processor is configured to execute:
  • Acquiring motion information of at least one frame of the second video image, where the motion information of the second video image includes the displacement amount and displacement direction of each pixel in multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
  • the processor is configured to execute:
  • the target video image block whose mapping area is located in the first target area or the second target area of the previous frame of video image is acquired, and the area composed of the target video image blocks is determined as the second target area of the second video image.
  • the processor is configured to execute:
  • each pixel point is mapped from the second video image to the previous frame of video image, and the area formed by each pixel point obtained by the mapping is determined as a mapping area.
  • the processor is further configured to execute:
  • If the motion information of the second video image does not include the motion information of the video image block, determine whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image;
  • If the mapping area of the adjacent image block is located within the first target area or the second target area of the previous frame of video image,
  • the video image block is determined as a target video image block.
  • the processor is configured to execute:
  • the to-be-processed video is re-encoded to obtain re-encoded data of the to-be-processed video, and at least one frame of motion information of the second video image is obtained from the re-encoded data.
  • the mobile terminal can execute the steps of an image processing method, the image processing method including:
  • According to the first target area of the at least one frame of first video image, determine the second target area of at least one frame of second video image other than the first video image in the to-be-processed video, where one frame of second video image is associated with one frame of first video image.
  • the processor of the mobile terminal performs the following operations:
  • At least one frame of video image is selected as at least one frame of the first video image from the video images included in the video to be processed.
  • the processor of the mobile terminal performs the following operations:
  • At least one frame of second video image associated with the first video image is determined, and the time sequence of the second video image lies between the first video image and the next frame of first video image;
  • Image tracking is performed on the first target area of the first video image to obtain at least one frame of the second target area of the second video image.
  • the processor of the mobile terminal performs the following operations:
  • Acquiring motion information of at least one frame of the second video image, where the motion information of the second video image includes the displacement amount and displacement direction of each pixel in multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
  • the processor of the mobile terminal performs the following operations:
  • the target video image block whose mapping area is located in the first target area or the second target area of the previous frame of video image is acquired, and the area composed of the target video image blocks is determined as the second target area of the second video image.
  • the processor of the mobile terminal performs the following operations:
  • each pixel point is mapped from the second video image to the previous frame of video image, and the area formed by each pixel point obtained by the mapping is determined as a mapping area.
  • the processor of the mobile terminal performs the following operations:
  • If the motion information of the second video image does not include the motion information of the video image block, determine whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image;
  • If the mapping area of the adjacent image block is located within the first target area or the second target area of the previous frame of video image,
  • the video image block is determined as a target video image block.
  • the processor of the mobile terminal performs the following operations:
  • Re-encoding the to-be-processed video to obtain re-encoded data of the to-be-processed video, and obtain at least one frame of motion information of the second video image from the re-encoded data.
  • an application program is also provided; when the application program is executed by a processor of a mobile terminal,
  • the mobile terminal can execute the steps of the image processing method in any of the above-mentioned embodiments.
  • Fig. 6 is a block diagram showing a device for image processing according to an exemplary embodiment.
  • the device 500 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, etc.
  • the device 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, And communication component 516.
  • the processing component 502 generally controls the overall operations of the device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 502 may include one or more processors 520 to execute instructions to complete all or part of the steps of the foregoing method.
  • the processing component 502 may include one or more modules to facilitate the interaction between the processing component 502 and other components.
  • the processing component 502 may include a multimedia module to facilitate the interaction between the multimedia component 508 and the processing component 502.
  • the memory 504 is configured to store various types of data to support operations in the device 500. Examples of these data include instructions for any application or method operating on the device 500, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 504 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • the power supply component 506 provides power to various components of the device 500.
  • the power supply component 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 500.
  • the multimedia component 508 includes a screen that provides an output interface between the device 500 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor can not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
  • the multimedia component 508 includes a front camera and/or a rear camera. When the device 500 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 510 is configured to output and/or input audio signals.
  • the audio component 510 includes a microphone (MIC), and when the device 500 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signal may be further stored in the memory 504 or transmitted via the communication component 516.
  • the audio component 510 further includes a speaker for outputting audio signals.
  • the I/O interface 512 provides an interface between the processing component 502 and a peripheral interface module.
  • the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
  • the sensor component 514 includes one or more sensors for providing the device 500 with various aspects of status assessment.
  • the sensor component 514 can detect the on/off status of the device 500 and the relative positioning of components, for example, the display and keypad of the device 500.
  • the sensor component 514 can also detect a position change of the device 500 or one of its components, the presence or absence of contact between the user and the device 500, the orientation or acceleration/deceleration of the device 500, and temperature changes of the device 500.
  • the sensor component 514 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact.
  • the sensor component 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • the communication component 516 is configured to facilitate wired or wireless communication between the apparatus 500 and other devices.
  • the device 500 can access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof.
  • the communication component 516 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 516 further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the apparatus 500 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above methods.
  • non-transitory computer-readable storage medium including instructions, such as the memory 504 including instructions, and the foregoing instructions may be executed by the processor 520 of the device 500 to complete the foregoing method.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • Fig. 7 is a block diagram showing a device for image processing according to an exemplary embodiment.
  • the device 600 may be provided as a server.
  • the apparatus 600 includes a processing component 622, which further includes one or more processors, and a memory resource represented by a memory 632, for storing instructions that can be executed by the processing component 622, such as application programs.
  • the application program stored in the memory 632 may include one or more modules each corresponding to a set of instructions.
  • the processing component 622 is configured to execute instructions to execute the above-mentioned image processing method.
  • the device 600 may also include a power component 626 configured to perform power management of the device 600, a wired or wireless network interface 650 configured to connect the device 600 to a network, and an input/output (I/O) interface 658.
  • the device 600 can operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

Abstract

The present disclosure provides an image processing method, an electronic device, and a readable storage medium, and belongs to the technical field of video processing. In the embodiments of the present disclosure, at least one frame of reference video image in a to-be-processed video can be extracted, where the number of reference video images is smaller than the number of video images contained in the to-be-processed video; region recognition is performed on the at least one frame of reference video image according to the contrast between any pixel in the reference video image and its surrounding background, so as to determine a first target area in each frame of reference video image; and for each frame of reference video image, a second target area in the other video images associated with the reference video image in the to-be-processed video is determined according to the first target area in the reference video image. In this way, the computing resources and time consumed in determining the target areas in the individual video images can be reduced to some extent, and the determination efficiency improved.

Description

Image processing method, electronic device, and readable storage medium
This application claims priority to Chinese patent application No. 201910936022.1, entitled "Image processing method and apparatus, electronic device, and readable storage medium", filed on September 29, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure belongs to the technical field of video processing, and in particular relates to an image processing method, an electronic device, and a readable storage medium.
Background
To improve the viewing experience of a video, it is often necessary to apply specific processing, such as super-resolution processing or image enhancement, to the salient regions of its video images; a salient region here refers to the region of a video image that is more likely to attract viewers' attention.
In the related art, when determining the salient regions in video images, visual saliency detection is usually performed on each frame of video image one by one through a salient region detection algorithm, so as to determine the salient region in each frame of video image.
Summary
The present disclosure provides an image processing method, an electronic device, and a readable storage medium.
According to a first aspect of the present disclosure, an image processing method is provided, the method including:
acquiring at least one frame of first video image in a to-be-processed video, where the number of first video images is smaller than the number of video images contained in the to-be-processed video;
performing region recognition on the at least one frame of first video image to determine a first target area of the at least one frame of first video image;
determining, according to the first target area of the at least one frame of first video image, a second target area of at least one frame of second video image other than the first video image in the to-be-processed video, where one frame of second video image is associated with one frame of first video image.
In a possible implementation of the above first aspect, the acquiring at least one frame of first video image in the to-be-processed video includes:
starting from the first frame of video image of the to-be-processed video, selecting one frame of first video image every N frames of video images to obtain the at least one frame of first video image, N being an integer greater than or equal to 1; or,
selecting any at least one frame of video image from the video images contained in the to-be-processed video as the at least one frame of first video image.
In a possible implementation of the above first aspect, the determining, according to the first target area of the at least one frame of first video image, the second target area of at least one frame of second video image other than the first video image in the to-be-processed video includes:
determining, based on the time sequence of the video images in the to-be-processed video, at least one frame of second video image associated with the first video image, where the time sequence of the second video image lies between the first video image and the next frame of first video image;
performing image tracking on the first target area of the first video image to obtain the second target area of the at least one frame of second video image.
In a possible implementation of the above first aspect, the determining, according to the first target area of the at least one frame of first video image, the second target area of at least one frame of second video image other than the first video image in the to-be-processed video includes:
acquiring motion information of at least one frame of second video image, where the motion information of the second video image includes the displacement amount and displacement direction of each pixel in multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
determining the second target area of the at least one frame of second video image according to the first target area of the at least one frame of first video image and the motion information of the at least one frame of second video image.
In a possible implementation of the above first aspect, the determining the second target area of the at least one frame of second video image according to the first target area of the at least one frame of first video image and the motion information of the at least one frame of second video image includes:
determining, based on the motion information of the second video image, the mapping areas corresponding, in the previous frame of video image of the second video image, to the multiple video image blocks contained in the motion information;
acquiring the target video image blocks whose mapping areas are located within the first target area or the second target area of the previous frame of video image, and determining the area composed of the target video image blocks as the second target area of the second video image.
In a possible implementation of the above first aspect, the determining, based on the motion information of the second video image, the mapping areas corresponding, in the previous frame of video image of the second video image, to the multiple video image blocks contained in the motion information includes:
acquiring, from the motion information of the second video image, the displacement direction and displacement amount of each pixel in each of the video image blocks;
mapping, based on the displacement direction and displacement amount, each pixel from the second video image to the previous frame of video image, and determining the area composed of the mapped pixels as one mapping area.
In a possible implementation of the above first aspect, the method further includes:
dividing the second video image into multiple video image blocks;
for any video image block, if the motion information of the second video image does not contain the motion information of the video image block, determining whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image;
if the mapping area of the adjacent image block is located within the first target area or the second target area of the previous frame of video image, determining the video image block as one target video image block.
In a possible implementation of the above first aspect, the acquiring motion information of at least one frame of second video image includes:
acquiring the motion information of the at least one frame of second video image from the first-pass encoded data of the to-be-processed video; or,
re-encoding the to-be-processed video to obtain re-encoded data of the to-be-processed video, and acquiring the motion information of the at least one frame of second video image from the re-encoded data.
According to a second aspect of the present disclosure, an electronic device is provided, including:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute:
acquiring at least one frame of first video image in a to-be-processed video, where the number of first video images is smaller than the number of video images contained in the to-be-processed video;
performing region recognition on the at least one frame of first video image to determine a first target area of the at least one frame of first video image;
determining, according to the first target area of the at least one frame of first video image, a second target area of at least one frame of second video image other than the first video image in the to-be-processed video, where one frame of second video image is associated with one frame of first video image.
In a possible implementation of the above second aspect, the processor is configured to execute:
starting from the first frame of video image of the to-be-processed video, selecting one frame of first video image every N frames of video images to obtain the at least one frame of first video image, N being an integer greater than or equal to 1; or,
selecting any at least one frame of video image from the video images contained in the to-be-processed video as the at least one frame of first video image.
In a possible implementation of the above second aspect, the processor is configured to execute:
determining, based on the time sequence of the video images in the to-be-processed video, at least one frame of second video image associated with the first video image, where the time sequence of the second video image lies between the first video image and the next frame of first video image;
performing image tracking on the first target area of the first video image to obtain the second target area of the at least one frame of second video image.
In a possible implementation of the above second aspect, the processor is configured to execute:
acquiring motion information of at least one frame of second video image, where the motion information of the second video image includes the displacement amount and displacement direction of each pixel in multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
determining the second target area of the at least one frame of second video image according to the first target area of the at least one frame of first video image and the motion information of the at least one frame of second video image.
In a possible implementation of the above second aspect, the processor is configured to execute:
determining, based on the motion information of the second video image, the mapping areas corresponding, in the previous frame of video image of the second video image, to the multiple video image blocks contained in the motion information;
acquiring the target video image blocks whose mapping areas are located within the first target area or the second target area of the previous frame of video image, and determining the area composed of the target video image blocks as the second target area of the second video image.
In a possible implementation of the above second aspect, the processor is configured to execute:
acquiring, from the motion information of the second video image, the displacement direction and displacement amount of each pixel in each of the video image blocks;
mapping, based on the displacement direction and displacement amount, each pixel from the second video image to the previous frame of video image, and determining the area composed of the mapped pixels as one mapping area.
In a possible implementation of the above second aspect, the processor is further configured to execute:
dividing the second video image into multiple video image blocks;
for any video image block, if the motion information of the second video image does not contain the motion information of the video image block, determining whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image;
if the mapping area of the adjacent image block is located within the first target area or the second target area of the previous frame of video image, determining the video image block as one target video image block.
In a possible implementation of the above second aspect, the processor is configured to execute:
acquiring the motion information of the at least one frame of second video image from the first-pass encoded data of the to-be-processed video; or,
re-encoding the to-be-processed video to obtain re-encoded data of the to-be-processed video, and acquiring the motion information of the at least one frame of second video image from the re-encoded data.
According to a third aspect of the present disclosure, a non-transitory computer-readable storage medium is provided; when the instructions in the storage medium are executed by a processor of a mobile terminal, the processor of the mobile terminal is enabled to perform the following operations:
acquiring at least one frame of first video image in a to-be-processed video, where the number of the at least one frame of first video image is smaller than the number of video images contained in the to-be-processed video;
performing region recognition on the at least one frame of first video image to determine a first target area of the at least one frame of first video image;
determining, according to the first target area of the at least one frame of first video image, a second target area of at least one frame of second video image other than the first video image in the to-be-processed video, where one frame of second video image is associated with one frame of first video image.
The above description is only an overview of the technical solutions of the present disclosure. In order to understand the technical means of the present disclosure more clearly so that they can be implemented according to the contents of this specification, and to make the above and other objects, features, and advantages of the present disclosure more apparent and understandable, specific embodiments of the present disclosure are set forth below.
Brief Description of the Drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present disclosure. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1 is a flowchart of the steps of an image processing method provided by an embodiment of the present disclosure;
Fig. 2 is a flowchart of the steps of another image processing method provided by an embodiment of the present disclosure;
Fig. 3 is a flowchart of the steps of yet another image processing method provided by an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of detection provided by an embodiment of the present disclosure;
Fig. 5 is a block diagram of an image processing apparatus provided by an embodiment of the present disclosure;
Fig. 6 is a block diagram of a device for image processing according to an exemplary embodiment;
Fig. 7 is a block diagram of a device for image processing according to an exemplary embodiment.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and so that the scope of the present disclosure can be fully conveyed to those skilled in the art.
Fig. 1 is a flowchart of the steps of an image processing method provided by an embodiment of the present disclosure. As shown in Fig. 1, the method is applied to a server and may include the following steps:
Step 101: the server extracts at least one frame of reference video image in a to-be-processed video.
Optionally, the above step 101 is one possible implementation in which the server acquires at least one frame of first video image in the to-be-processed video, where the number of first video images is smaller than the number of video images contained in the to-be-processed video.
In the embodiments of the present disclosure, a first video image refers to a video image determined from the to-be-processed video by equidistant or non-equidistant selection. Since in the subsequent step 102 the server needs to perform region recognition on the first video image to determine the first target area of the first video image, and then determines the second target area of a second video image with the first target area of the first video image as a reference, the first video image may also be called a "reference video image".
In the embodiments of the present disclosure, the to-be-processed video is a video whose target areas need to be determined. For example, assuming the target area is a salient region and image enhancement needs to be applied to the salient regions of the video images in video A, video A may be taken as the to-be-processed video. Further, the reference video images may be a subset of the video images selected from the to-be-processed video, and the number of reference video images is smaller than the number of video images contained in the to-be-processed video.
Step 102: the server performs region recognition on the at least one frame of reference video image according to the contrast between any pixel in the reference video image and its surrounding background, so as to determine the first target area in each frame of reference video image.
That is, the above step 102 is one possible implementation in which the server performs region recognition on the at least one frame of first video image and determines the first target area of the at least one frame of first video image.
In the embodiments of the present disclosure, the server may implement region recognition based on a region detection algorithm, using the contrast between any pixel in the reference video image and its surrounding background. For example, the region detection algorithm may be a salient region detection algorithm, and the first target area may be a salient region of the first video image. For example, the server may take each frame of reference video image as the input of the salient region detection algorithm; through the algorithm, the saliency value of each pixel in the reference video image can be determined, and a saliency map is then output. The saliency value may be determined based on the contrast between a pixel and its surrounding background in color, brightness, and orientation, or based on the contrast in distance between a pixel and the pixels in its surrounding background; the embodiments of the present disclosure do not limit the way the saliency value is determined.
In some embodiments, when generating the saliency map, the server may apply Gaussian blur to the reference video image multiple times and downsample it to produce multiple groups of images at different scales. For the image at each scale, the color, brightness, and orientation features of the image are extracted to obtain a feature map at each scale. Then, each feature map may be normalized and convolved with a two-dimensional difference-of-Gaussians function, the convolution result is added back to the original feature map, and finally all feature maps are superimposed to obtain the saliency map, which may be a grayscale image. After the saliency map is obtained, based on the saliency value of each pixel in the saliency map, the region composed of the pixels whose saliency values are greater than a preset threshold can be delineated from the reference video image and marked as the salient region.
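The following is an editorial sketch, not part of the original disclosure, of the multi-scale center-surround idea described above, reduced to the brightness channel only; the pyramid depth, the difference-of-Gaussians kernel sizes, and the 0.5 threshold are illustrative assumptions rather than values taken from the patent.

```python
# Minimal multi-scale saliency sketch (brightness channel only), assuming
# OpenCV and NumPy are available.
import cv2
import numpy as np

def saliency_map(bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    acc = np.zeros_like(gray)
    img = gray
    for _ in range(5):                        # assumed pyramid depth
        fine = cv2.GaussianBlur(img, (5, 5), 1.0)
        coarse = cv2.GaussianBlur(img, (21, 21), 8.0)
        dog = np.abs(fine - coarse)           # center-surround contrast
        acc += cv2.resize(dog, gray.shape[::-1])  # superimpose at full size
        img = cv2.pyrDown(img)                # downsample for the next scale
    return acc / (acc.max() + 1e-8)           # normalize to [0, 1]

def salient_region_mask(bgr: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    # Pixels above the (assumed) threshold form the salient region.
    return (saliency_map(bgr) > thresh).astype(np.uint8)
```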
Step 103: for each frame of reference video image, the server determines, according to the first target area in the reference video image, the second target area in the other video images associated with the reference video image in the to-be-processed video.
That is, the above step 103 is one possible implementation in which the server determines, according to the first target area of the at least one frame of first video image, the second target area of at least one frame of second video image other than the first video image in the to-be-processed video, where one frame of second video image is associated with one frame of first video image.
Since the second video images are the video images in the to-be-processed video other than the first video images, a second video image may also be vividly called an "other video image" or a "non-reference video image". Optionally, each frame of first video image can be associated with one or more frames of second video images.
It should be noted that the first target area refers to the salient region in the first video image, and the second target area refers to the salient region in the second video image; a salient region refers to the region of a frame of video image that is more likely to attract people's attention.
In the embodiments of the present disclosure, each reference video image may be associated with other video images. For example, the other video images associated with a reference video image may be the non-reference video images between the reference video image and another reference video image; correspondingly, all the reference video images and all the other video images make up the to-be-processed video. Further, in practical application scenarios, the differences between the frames of a video are usually caused by relative changes of pixels; for example, some pixels may move between two adjacent frames, thus forming two different frames. Therefore, in the embodiments of the present disclosure, after the first target areas in the first video images are determined, the second target areas in the second video images can be determined based on the first target areas in these first video images and the relative change information between the pixels in the first video images and the pixels in the associated second video images, thereby omitting the operation of performing region recognition on the second video images based on a salient region detection algorithm and saving computing resources and time to some extent.
In summary, the image processing method provided by the embodiments of the present disclosure may first extract at least one frame of reference video image in a to-be-processed video, where the number of reference video images is smaller than the number of video images contained in the to-be-processed video; then perform region recognition on the at least one frame of reference video image according to the contrast between any pixel in the reference video image and its surrounding background, so as to determine the first target area in each frame of reference video image; and finally, for each frame of reference video image, determine, according to the first target area in the reference video image, the second target area in the other video images associated with the reference video image in the to-be-processed video. In the embodiments of the present disclosure, region recognition based on the contrast between any pixel and its surrounding background only needs to be performed on some reference video images of the to-be-processed video, and the second target areas in the other video images can be determined based on the first target areas in these reference video images. Since it is not necessary to perform such region recognition on all video images, the computing resources and time consumed in determining the salient regions in the individual video images can be reduced to some extent, and the determination efficiency improved.
Fig. 2 is a flowchart of the steps of another image processing method provided by an embodiment of the present disclosure. As shown in Fig. 2, the method is applied to a server and may include the following steps:
Step 201: the server extracts at least one frame of reference video image in a to-be-processed video; the number of reference video images is smaller than the number of video images contained in the to-be-processed video.
That is, the above step 201 is one possible implementation in which the server acquires at least one frame of first video image in the to-be-processed video, where the number of first video images is smaller than the number of video images contained in the to-be-processed video.
In one implementation of determining the first video images, starting from the first frame of video image of the to-be-processed video, one frame of first video image may be selected every N frames of video images to obtain the at least one frame of first video image, N being an integer greater than or equal to 1. The smaller N is, the more video images need to be recognized according to the contrast between any pixel and its surrounding background, i.e., the more video images need to be recognized based on the region detection algorithm, and the more computing time and resources are consumed; but the smaller N is, the fewer second video images are usually associated with each first video image, so the accuracy of the determined second target areas is usually higher. Conversely, the larger N is, the fewer video images need to be recognized based on the contrast between any pixel and its surrounding background, and the less computing time and resources are consumed; but the larger N is, the more second video images are usually associated with each first video image, so the accuracy of the determined second target areas may be lower. Therefore, the specific value of N can be set according to actual needs; for example, N may be 5, which is not limited by the embodiments of the present disclosure. As an example, assuming the to-be-processed video contains 100 frames of video images, the 1st frame, the 6th frame, the 11th frame, ..., and the 96th frame may be taken as the first video images, obtaining 20 frames of first video images in total.
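As an editorial illustration of the equidistant selection rule (not part of the original disclosure), the sketch below picks every N-th frame index; with N = 5 and 100 frames it yields exactly the 20 reference frames of the example above.

```python
# Equidistant reference-frame selection over an already-decoded frame list.
def select_reference_frames(frames: list, n: int = 5) -> list:
    if n < 1:
        raise ValueError("N must be an integer >= 1")
    return list(range(0, len(frames), n))   # indices of the reference frames

reference_indices = select_reference_frames([None] * 100, n=5)
assert reference_indices[:3] == [0, 5, 10] and len(reference_indices) == 20
```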
In the embodiments of the present disclosure, selecting at fixed frame intervals makes the number of other video images associated with each reference video image the same. This avoids a situation where a reference video image is associated with too many other video images, which would make the second target areas determined in those other video images based on the first target area of that reference video image inaccurate, thereby improving the effect of area determination.
Further, in another implementation of determining the reference video images, any at least one frame of video image may be selected from the video images contained in the to-be-processed video as the at least one frame of first video image. For example, one frame of video image may be selected after an interval of 2 frames, then another after an interval of 5 frames, then another after an interval of 4 frames, and so on; finally, the selected video images are taken as the at least one frame of first video image. In this implementation, the selection is not restricted by a preset value N; each selection may skip an arbitrary number of frames at random, i.e., the selection is performed in a non-equidistant manner, which improves the flexibility of the selection operation.
Step 202: the server performs region recognition on the at least one frame of reference video image according to the contrast between any pixel in the reference video image and its surrounding background, so as to determine the first target area in each frame of reference video image.
That is, the above step 202 is one possible implementation in which the server performs region recognition on the at least one frame of first video image and determines the first target area of the at least one frame of first video image.
Optionally, for the implementation of this step, reference may be made to the above step 102, which is not repeated here in the embodiments of the present disclosure.
Step 203: for each frame of reference video image, according to the image time sequence of each frame of the other video images associated with the reference video image, the server determines, based on a preset image tracking algorithm, the area in the other video image corresponding to the first target area or second target area in the previous frame of video image of the other video image, to obtain the second target area in the other video image.
In some embodiments, when determining the second target area, the server determines, based on the time sequence of the video images in the to-be-processed video, at least one frame of second video image associated with the first video image, where the time sequence of the second video image lies between the first video image and the next frame of first video image.
In this step, the time sequence of a video image indicates the temporal order in which the video image appears in the to-be-processed video. For example, assuming video image a appears at the 10th second of the to-be-processed video, video image b at the 30th second, and video image c at the 20th second, then the image time sequence of video image a is earlier than that of video image c, and the image time sequence of video image c is earlier than that of video image b.
Optionally, the server takes all the video images between the first video image and the next frame of first video image as the at least one frame of second video image. Optionally, the server randomly selects some video images from all the video images between the first video image and the next frame of first video image as the at least one frame of second video image.
Optionally, after the second video images are determined, the server performs image tracking on the first target area of the first video image to obtain the second target area of the at least one frame of second video image.
In one example, all the video images between the first video image and its next frame of first video image are determined as the at least one frame of second video image. Then, image tracking is performed on the first target area of the first video image to obtain the second target area of the 1st frame of second video image; tracking then continues on the second target area of the 1st frame of second video image to obtain the second target area of the 2nd frame of second video image, and so on, so that the second target area of each second video image can be obtained by tracking.
Further, the other video images associated with a reference video image may be the non-reference video images between the reference video image and the next frame of reference video image. Among these other video images, the previous frame of video image of the other video image with the earliest image time sequence is the reference video image itself. Therefore, in this step, the first target area in the reference video image can be tracked based on a preset image tracking algorithm to determine the area in the other video image corresponding to the first target area in the reference video image, obtaining the second target area of that other video image; then the second target area of that other video image can be tracked to determine the second target area in the other video image whose image time sequence is only later than that frame.
Optionally, the preset tracking algorithm may be an optical flow tracking algorithm. The optical flow tracking algorithm may be based on the principle of constant brightness, i.e., the brightness of a given point does not change over time, and the principle of spatial consistency, i.e., the neighbors of a pixel projected onto the next frame of image are still neighbors and have consistent velocity. Based on the brightness features of the pixels in the first target area or second target area of the previous frame of video image and the velocity features of neighboring pixels, the pixel corresponding to each such pixel in the other video image is predicted, and the second target area in the other video image is thus obtained. In the embodiments of the present disclosure, the target area in the other video image can be determined simply by taking the previous frame of video image as the input of the preset tracking algorithm, which can improve, to some extent, the efficiency of determining the target area in the other video images. Optionally, if the previous frame of video image is a first video image, the first target area of the first video image needs to be tracked; if the previous frame of video image is a second video image with an earlier time sequence, the second target area of that second video image needs to be tracked.
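The following editorial sketch (not part of the original disclosure) shows one plausible realization of this tracking step with pyramidal Lucas-Kanade optical flow from OpenCV; the choice of Lucas-Kanade, the window size, and the pyramid depth are assumptions, since the patent only requires some optical flow tracker satisfying the brightness-constancy and spatial-consistency principles.

```python
# Propagate a salient region from one frame to the next with pyramidal
# Lucas-Kanade optical flow. prev_gray / next_gray are 8-bit grayscale
# frames; region_pts is an (N, 1, 2) float32 array of pixel coordinates
# sampled inside the previous frame's target area.
import cv2
import numpy as np

def track_region(prev_gray: np.ndarray,
                 next_gray: np.ndarray,
                 region_pts: np.ndarray) -> np.ndarray:
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, region_pts, None,
        winSize=(21, 21), maxLevel=3)        # assumed window/pyramid settings
    return next_pts[status.ravel() == 1]     # keep successfully tracked points

# Chained use: the region tracked into frame k becomes the input region for
# frame k+1, mirroring the frame-by-frame propagation described above.
```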
Meanwhile, in this step, since the differences between adjacent video images are usually small, determining the areas sequentially in image time sequence keeps the difference between each pair of images to be tracked small; this, to some extent, allows the tracking algorithm to accurately track the corresponding area and improves the determination effect.
In summary, the image processing method provided by the embodiments of the present disclosure may first extract at least one frame of reference video image in a to-be-processed video, where the number of reference video images is smaller than the number of video images contained in the to-be-processed video; then perform region recognition on the at least one frame of reference video image according to the contrast between any pixel in the reference video image and its surrounding background, so as to determine the first target area in each frame of reference video image; and finally, for the other video images associated with each frame of reference video image, according to the image time sequence of each frame of the other video images, determine, based on a preset image tracking algorithm, the area in the other video image corresponding to the first target area or second target area in its previous frame of video image, obtaining the second target area in the other video image. In the embodiments of the present disclosure, region recognition based on the contrast between any pixel and its surrounding background only needs to be performed on some reference video images of the to-be-processed video, and the second target areas in the other video images can be determined based on the first target areas in these reference video images. Since it is not necessary to perform such region recognition on all video images, the computing resources and time consumed in determining the salient regions in the individual video images can be reduced to some extent, and the determination efficiency improved.
Fig. 3 is a flowchart of the steps of yet another image processing method provided by an embodiment of the present disclosure. As shown in Fig. 3, the method is applied to a server and may include the following steps:
Step 301: the server extracts at least one frame of reference video image in a to-be-processed video; the number of reference video images is smaller than the number of video images contained in the to-be-processed video.
That is, the above step 301 is one possible implementation in which the server acquires at least one frame of first video image in the to-be-processed video, where the number of first video images is smaller than the number of video images contained in the to-be-processed video.
Optionally, for the implementation of this step, reference may be made to the aforementioned step 201, which is not limited by the embodiments of the present disclosure.
Step 302: the server performs region recognition on the at least one frame of reference video image according to the contrast between any pixel in the reference video image and its surrounding background, so as to determine the first target area in each frame of reference video image.
That is, the above step 302 is one possible implementation in which the server performs region recognition on the at least one frame of first video image and determines the first target area of the at least one frame of first video image.
Optionally, for the implementation of this step, reference may be made to the above step 202, which is not repeated here in the embodiments of the present disclosure.
Step 303: for each frame of reference video image, the server acquires, from the encoded data of the to-be-processed video, the motion information of the other video images associated with the reference video image.
Optionally, in a first-pass encoding process, the encoded data refers to the first-pass encoded data; in a re-encoding process, the encoded data refers to the re-encoded data.
That is, the above step 303 is one possible implementation in which the server acquires the motion information of at least one frame of second video image, where one frame of second video image is associated with one frame of first video image.
The motion information of the second video image includes the displacement amount and displacement direction of each pixel in multiple video image blocks relative to the corresponding pixel in the previous frame of video image.
In this step, when an encoding operation is performed on the to-be-processed video, each key frame image contained in the to-be-processed video is usually extracted; for each key frame image, the displacement amount and displacement direction of each pixel in the multiple adjoining non-key frame images following the key frame image relative to the corresponding pixel in the key frame image are obtained, yielding the motion information; finally, the key frame images and the motion information of the non-key frame images are taken as the encoded data. Therefore, in the embodiments of the present disclosure, the motion information of the other video images can be acquired from the encoded data of the to-be-processed video so that recognition can be performed based on this information in the subsequent process.
Correspondingly, before acquiring the motion information corresponding to the other video images, the encoded data corresponding to the to-be-processed video may first be acquired. Optionally, in an on-demand video streaming scenario, when a video producer uploads a to-be-processed video to the server, the to-be-processed video is usually encoded once, i.e., the to-be-processed video is a video that has already undergone first-pass encoding. Therefore, in this step, the motion information of the at least one frame of second video image can be acquired from the first-pass encoded data of the to-be-processed video. Further, since in practical application scenarios a video platform may have its own custom video encoding standard and may accordingly re-encode the received to-be-processed video according to that standard, in this step the to-be-processed video may be re-encoded to obtain its re-encoded data, and the motion information of the at least one frame of second video image is then acquired from the re-encoded data. Optionally, the re-encoding operation may take the previous encoded data of the to-be-processed video as its basis and re-encode based on the content of that previous encoded data; since the data volume of the content of the previous encoded data is smaller than that of the to-be-processed video itself, performing the re-encoding operation based on the previous encoded data can reduce the occupation of processing resources to some extent and thus avoid stuttering problems.
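As an editorial aside (not part of the original disclosure), the sketch below shows only the shape of the data flow for harvesting per-block motion information from encoded data; `decode_with_motion_vectors`, and the `x`, `y`, `dx`, `dy` fields it yields, are hypothetical stand-ins, because real codec stacks expose motion vectors through very different interfaces.

```python
# Collect per-block motion information for the second video images only.
from typing import Dict, List, NamedTuple, Tuple

class BlockMotion(NamedTuple):
    block_xy: Tuple[int, int]   # top-left corner of the block in this frame
    dx: float                   # signed displacement vs. the previous frame
    dy: float

def decode_with_motion_vectors(path: str):
    """Hypothetical decoder hook: yields (frame_index, motion_vectors), where
    each motion vector carries a block position and its displacement. Real
    codecs (e.g. H.264/HEVC) surface this data in codec-specific ways."""
    raise NotImplementedError("depends on the codec stack in use")

def motion_info_for_frames(video_path: str,
                           second_frame_indices: List[int]
                           ) -> Dict[int, List[BlockMotion]]:
    wanted = set(second_frame_indices)
    info: Dict[int, List[BlockMotion]] = {}
    for idx, motion_vectors in decode_with_motion_vectors(video_path):
        if idx in wanted:
            info[idx] = [BlockMotion((mv.x, mv.y), mv.dx, mv.dy)
                         for mv in motion_vectors]
    return info
```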
Step 304: the server determines, according to the first target area in the reference video image and the motion information corresponding to each frame of the other video images associated with the reference video image, the second target area in each frame of the other video images.
That is, the above step 304 is one possible implementation in which the server determines the second target area of at least one frame of second video image according to the first target area of at least one frame of first video image and the motion information of at least one frame of second video image.
Since the motion information can reflect the relative changes of pixels between video images, in the embodiments of the present disclosure the second target areas in the other video images can be determined by combining the first target areas in the reference video images with the motion information corresponding to the other video images. In this way, only the first target areas in some reference video images of the to-be-processed video need to be determined according to the contrast between any pixel and its surrounding background; the second target areas in the other video images can then be determined by combining the motion information corresponding to the other video images. Here the first target area and the second target area are collectively called the "salient region"; the efficiency of determining the salient regions in all the video images of the to-be-processed video can thus be improved to some extent.
Optionally, this step may be implemented through the following sub-steps (1) to (4):
Sub-step (1): according to the image time sequence of each frame of the other video images associated with the reference video image, for each frame of the other video images, the server divides the other video image into multiple video image blocks.
In this step, the other video image may be divided into multiple video image blocks of a preset size, where the specific value of the preset size may be set based on actual needs. The smaller the preset size, the more video image blocks there are and, correspondingly, the more precise the second target area determined based on the video image blocks, but the more processing resources are consumed. The larger the preset size, the fewer video image blocks there are and, correspondingly, the less precise the second target area determined based on the video image blocks, but the fewer processing resources are consumed.
Sub-step (2): for each video image block, if the motion information contains the motion information corresponding to the video image block, the server determines, based on the motion information corresponding to the video image block, the area corresponding to the video image block in the previous frame of video image of the video image block.
In this step, the adjacent video images include at least the reference video image, and the motion information corresponding to the video image block includes the displacement amount and displacement direction of each pixel in the video image block relative to the corresponding pixel in the previous frame of video image. Further, since the problem of missing motion information may occur in practical application scenarios, in this step it may first be determined whether the motion information contains the motion information corresponding to the video image block; if it does, the area corresponding to the video image block in the previous frame of video image can be determined based on the motion information corresponding to the video image block.
Optionally, the other video images associated with a reference video image may be the video images between the reference video image and the next frame of reference video image; that is, the image time sequences of the other video images associated with a reference video image are all later than the image time sequence of the reference video image.
Since the motion information corresponding to the video image block includes the displacement amount and displacement direction of each pixel in the video image block relative to the corresponding pixel in the previous frame of video image, when determining the area corresponding to the video image block in the previous frame of video image, the position coordinates of each pixel in the video image block may be moved by the displacement amount in the direction opposite to the pixel's displacement direction, obtaining the moved position coordinates of each pixel; then the area composed of the moved pixels' corresponding positions in the previous frame of video image is determined as the corresponding area. For example, the displacement amount may be a coordinate value, and the sign of the coordinate value may indicate different displacement directions. In this way, moving the position coordinates of each pixel in the video image block based on the displacement amount and displacement direction corresponding to each pixel (which amounts to one mapping of position coordinates) maps the video image block into the previous frame of video image, obtaining the area corresponding to the video image block.
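A minimal editorial sketch of this inverse mapping follows (not part of the original disclosure); it assumes the displacement of each pixel is stored as a signed (dx, dy) pair, so that subtracting it realizes the move in the opposite direction.

```python
# Map a block's pixels back to their positions in the previous frame by
# undoing their stored displacements.
import numpy as np

def map_block_to_previous_frame(pixel_xy: np.ndarray,
                                displacement_xy: np.ndarray) -> np.ndarray:
    """pixel_xy, displacement_xy: (N, 2) arrays of pixel coordinates and
    per-pixel signed displacements relative to the previous frame."""
    return pixel_xy - displacement_xy   # inverse of the forward motion

block = np.array([[40, 30], [41, 30], [40, 31], [41, 31]])  # a 2x2 block
disp = np.full_like(block, 3)                               # moved +3 in x and y
print(map_block_to_previous_frame(block, disp))             # positions in the previous frame
```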
Sub-step (3): if the corresponding area is located within the first target area or the second target area of the previous frame of video image, the server determines the video image block as a component of the target area of the other video image.
In this step, it may be judged whether the corresponding area falls within the first target area or second target area (which may be collectively called the "salient region") of the previous frame of video image; if so, the content of the video image block can be considered to be content in the salient region of the previous frame of video image, and correspondingly, the video image block can be determined as a component of the target area of the other video image.
As an example, Fig. 4 is a schematic diagram of detection provided by an embodiment of the present disclosure. As shown in Fig. 4, A denotes the previous frame of video image whose salient region has already been determined, and B denotes the other video image, where area a denotes the salient region in the previous frame of video image. If the previous frame of video image is a first video image, the salient region refers to the first target area; if the previous frame of video image is a second video image, the salient region refers to the second target area. Area b denotes one video image block in the other video image, area c denotes another video image block in the other video image, area d is the area corresponding to area b in the previous frame of video image, and area e is the area corresponding to area c in the previous frame of video image. It can be seen that area d is located within the salient region of the previous frame of video image, while area e is not. Therefore, the video image block denoted by area b can be determined as a component of the target area. In the embodiments of the present disclosure, the second target area in the other video image can be determined simply by judging, based on the motion information, whether the area corresponding to a video image block of the other video image is located within the salient region of the previous frame of video image. In this way, region recognition based on the contrast between any pixel and its surrounding background only needs to be performed on some reference video images of the to-be-processed video, and region recognition is effectively achieved for all video images. Therefore, the computing resources and time consumed in determining the salient regions in the individual video images can be reduced to some extent, and the determination efficiency improved.
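The block test of Fig. 4 can be sketched as follows (an editorial illustration, not part of the original disclosure); requiring all mapped pixels to land inside the previous frame's salient-region mask is an assumed strictness choice, and a coverage-ratio test would serve equally well.

```python
# Decide whether a block of the current frame belongs to the new target
# area: its inversely mapped pixels must lie inside the previous frame's
# salient-region mask.
import numpy as np

def block_in_salient_region(mapped_xy: np.ndarray,
                            prev_mask: np.ndarray) -> bool:
    """mapped_xy: (N, 2) integer (x, y) positions in the previous frame.
    prev_mask: H x W boolean mask of the previous frame's salient region."""
    h, w = prev_mask.shape
    x, y = mapped_xy[:, 0], mapped_xy[:, 1]
    inside = (x >= 0) & (x < w) & (y >= 0) & (y < h)
    # Short-circuit keeps the mask lookup safe when points fall off-frame.
    return bool(inside.all()) and bool(prev_mask[y, x].all())
```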
Further, if the motion information does not contain the motion information corresponding to a video image block, it may be determined whether an adjacent image block of the video image block is a component of the target area of the other video image. If so, the video image block can be determined as a component of the target area of the other video image. The adjacent image block of the video image block may be an image block adjoining the video image block, and may be any adjoining image block. If an adjacent image block of the video image block is a component of the target area of the other video image, the video image block can be considered very likely to also belong to the target area, and the determination can therefore be made directly based on the adjacent image block. In this way, even for a video image block whose motion information is missing, whether the video image block is a component of the target area can be determined quickly, ensuring the efficiency of target area detection.
Sub-step (4): the server determines the area composed of all the components as the second target area of the other video image.
Assuming that the areas corresponding to 3 video image blocks in the other video image are located within the salient region of the previous frame of video image, the area composed of these 3 video image blocks is the second target area of the other video image.
Further, assume the reference video image is image X and the associated other video images are image Y and image Z, where image X has the earliest image time sequence, image Y the next, and image Z the latest. Then, based on the motion information of image Y, the area corresponding to each video image block of image Y in image X can first be determined, and the area composed of the video image blocks whose corresponding areas are located in the salient region of image X (the first target area or second target area) is determined as the salient region in image Y, thereby obtaining the second target area in image Y. Next, the area corresponding to each video image block of image Z in image Y can be determined, and the area composed of the video image blocks whose corresponding areas are located in the second target area of image Y is determined as the salient region in image Z, thereby obtaining the second target area in image Z.
In some embodiments, the above step 304 may also be implemented through the following sub-steps 3041 to 3043:
Step 3041: the server acquires, from the motion information of the second video image, the displacement direction and displacement amount of each pixel in each video image block.
Since what is actually stored in the motion information of the second video image is the motion information of the multiple video image blocks in the second video image, reading the stored motion information of each video image block from the motion information of the second video image yields the displacement direction and displacement amount of each pixel in each video image block.
Step 3042: based on the displacement direction and displacement amount, the server maps each pixel from the second video image to the previous frame of video image of the second video image, and determines the area composed of the mapped pixels as one mapping area.
In the above process, for each pixel in the video image block, since the displacement direction and displacement amount of the pixel recorded in the motion information describe how the pixel is mapped from the previous frame of video image to the current second video image, the corresponding pixel position of each pixel of the video image block in the previous frame of video image can be found simply by performing the inverse mapping; that is, each pixel of the video image block is mapped to the previous frame of video image, and the area composed of the mapped pixels is determined as one mapping area.
The server performs the above steps 3041 to 3042 for each video image block recorded in the motion information, which amounts to the server determining, based on the motion information of the second video image, the mapping areas corresponding, in the previous frame of video image of the second video image, to the multiple video image blocks contained in the motion information.
Step 3043: the server acquires the target video image blocks whose mapping areas are located within the first target area or the second target area of the previous frame of video image, and determines the area composed of the target video image blocks as the second target area of the second video image.
In the above process, the server first maps the pixels of each video image block recorded in the motion information to obtain the mapping area of each video image block in the previous frame of video image, and then acquires the target video image blocks whose mapping areas are located within the salient region of the previous frame of video image, which amounts to filtering the target video image blocks out of the video image blocks according to whether their mapping areas are located within the salient region. Optionally, if the previous frame of video image is a first video image, its salient region refers to the first target area; if the previous frame of video image is a second video image, its salient region refers to the second target area. That is to say, the type of salient region differs according to the type of the previous frame of video image.
In some embodiments, since the motion information only records the motion information of the video image blocks whose pixel positions have moved between adjacent video images, if some video image blocks have not moved, their motion information will not be recorded in the motion information of the second video image; however, these unmoved video image blocks may still lie within the second target area of the current second video image. Therefore, whether these unmoved video image blocks are target video image blocks can be determined by judging whether their adjacent video image blocks are target video image blocks.
In some embodiments, if the motion information of the second video image does not record the motion information of some video image blocks, the server may also perform the following operations: dividing the second video image into multiple video image blocks; for any video image block, if the motion information of the second video image does not contain the motion information of the video image block, determining whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image; and if the mapping area of the adjacent image block is located within the first target area or the second target area of the previous frame of video image, determining the video image block as one target video image block.
In the above process, for a video image block not recorded in the motion information of the second video image, whether the video image block is a target video image block can be determined by judging whether the mapping area of its adjacent image block is located within the salient region of the previous frame of video image; the way of judging whether the mapping area of the adjacent image block is located within the salient region of the previous frame of video image is similar to the above steps 3041 to 3043 and is not repeated here.
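A minimal editorial sketch of this neighbor fallback follows (not part of the original disclosure); checking the four axis-aligned neighbor blocks is an illustrative assumption.

```python
# Fallback for blocks absent from the motion information: such a block is
# counted as a target block when any adjacent block has already been judged
# to be one via its mapping area.
from typing import Dict, Tuple

def classify_missing_block(block_rc: Tuple[int, int],
                           decided: Dict[Tuple[int, int], bool]) -> bool:
    """block_rc: (row, col) grid index of a block with no motion information.
    decided: decisions already made for blocks that do carry motion
    information. Returns True if the block counts as a target block."""
    r, c = block_rc
    neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return any(decided.get(n, False) for n in neighbors)
```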
In summary, the image processing method provided by the embodiments of the present disclosure may first extract at least one frame of reference video image in a to-be-processed video, where the number of reference video images is smaller than the number of video images contained in the to-be-processed video; then perform region recognition on the at least one frame of reference video image according to the contrast between any pixel in the reference video image and its surrounding background, so as to determine the first target area in each frame of reference video image; then, for each frame of reference video image, acquire, from the encoded data corresponding to the to-be-processed video, the motion information corresponding to the other video images associated with the reference video image; and finally, determine the second target area in each frame of the other video images according to the first target area in the reference video image and the motion information corresponding to each frame of the other video images associated with the reference video image. In this way, the salient regions in all the video images of the to-be-processed video can be determined without performing region recognition on all video images according to the contrast between any pixel and its surrounding background. Therefore, the computing resources and time consumed in determining the target areas in the individual video images can be reduced to some extent, and the determination efficiency improved.
Fig. 5 is a block diagram of an image processing apparatus provided by an embodiment of the present disclosure. As shown in Fig. 5, the apparatus 40 may include:
an extraction module 401, configured to extract at least one frame of reference video image in a to-be-processed video, where the number of reference video images is smaller than the number of video images contained in the to-be-processed video.
Optionally, a reference video image is also called a first video image.
That is, the extraction module 401 is configured to acquire at least one frame of first video image in the to-be-processed video, where the number of first video images is smaller than the number of video images contained in the to-be-processed video.
a recognition module 402, configured to perform region recognition on the at least one frame of reference video image according to the contrast between any pixel in the reference video image and its surrounding background, so as to determine the first target area in each frame of reference video image.
That is, the recognition module 402 is configured to perform region recognition on the at least one frame of first video image and determine the first target area of the at least one frame of first video image.
a determining module 403, configured to determine, for each frame of reference video image, according to the first target area in the reference video image, the second target area in the other video images associated with the reference video image in the to-be-processed video.
That is, the determining module 403 is configured to determine, according to the first target area of the at least one frame of first video image, the second target area of at least one frame of second video image other than the first video image in the to-be-processed video, where one frame of second video image is associated with one frame of first video image.
In summary, the image processing apparatus provided by the embodiments of the present disclosure may first extract at least one frame of reference video image in a to-be-processed video, where the number of reference video images is smaller than the number of video images contained in the to-be-processed video; then perform region recognition on the at least one frame of reference video image according to the contrast between any pixel in the reference video image and its surrounding background, so as to determine the first target area in each frame of reference video image; and finally, for each frame of reference video image, determine, according to the first target area in the reference video image, the second target area in the other video images associated with the reference video image in the to-be-processed video. In the embodiments of the present disclosure, region recognition based on the contrast between any pixel and its surrounding background only needs to be performed on some reference video images of the to-be-processed video, and the second target areas in the other video images can be determined based on the first target areas in these reference video images. Since it is not necessary to perform such region recognition on all video images, the computing resources and time consumed in determining the salient regions in the individual video images can be reduced to some extent, and the determination efficiency improved.
Optionally, the extraction module 401 is configured to:
start from the first frame of video image of the to-be-processed video and select one frame of first video image every N frames of video images, obtaining the at least one frame of first video image, N being an integer greater than or equal to 1;
or select any at least one frame of video image from the video images contained in the to-be-processed video as the at least one frame of first video image.
Optionally, the determining module 403 is configured to:
according to the image time sequence of each frame of the other video images associated with the reference video image, for each frame of the other video images, determine, based on a preset image tracking algorithm, the area in the other video image corresponding to the first target area or second target area in the previous frame of video image of the other video image, obtaining the second target area in the other video image.
Here, the previous frame of video image of the other video image with the earliest image time sequence is the reference video image.
That is, the determining module 403 is configured to:
determine, based on the time sequence of the video images in the to-be-processed video, at least one frame of second video image associated with the first video image, where the time sequence of the second video image lies between the first video image and the next frame of first video image;
perform image tracking on the first target area of the first video image to obtain the second target area of the at least one frame of second video image.
Optionally, the determining module 403 is configured to:
acquire, from the encoded data corresponding to the to-be-processed video, the motion information corresponding to the other video images associated with the reference video image;
determine, according to the first target area in the reference video image and the motion information corresponding to each frame of the other video images associated with the reference video image, the second target area in each frame of the other video images.
That is, the determining module 403 is configured to:
acquire motion information of at least one frame of second video image, where the motion information of the second video image includes the displacement amount and displacement direction of each pixel in multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
determine the second target area of the at least one frame of second video image according to the first target area of the at least one frame of first video image and the motion information of the at least one frame of second video image.
Optionally, the determining module 403 is further configured to:
according to the image time sequence of each frame of the other video images associated with the reference video image, for each frame of the other video images, divide the other video image into multiple video image blocks;
for each video image block, if the motion information contains the motion information corresponding to the video image block, determine, based on the motion information corresponding to the video image block, the area corresponding to the video image block in the previous frame of video image;
if the corresponding area is located within the first target area or the second target area of the previous frame of video image, determine the video image block as a component of the target area of the other video image;
determine the area composed of all the components as the second target area of the other video image.
Here, the motion information includes the displacement amount and displacement direction of each pixel in the video image block relative to the corresponding pixel in the previous frame of video image.
That is, the determining module 403 is further configured to:
determine, based on the motion information of the second video image, the mapping areas corresponding, in the previous frame of video image of the second video image, to the multiple video image blocks contained in the motion information;
acquire the target video image blocks whose mapping areas are located within the first target area or the second target area of the previous frame of video image, and determine the area composed of the target video image blocks as the second target area of the second video image.
Optionally, the determining module 403 is further configured to:
if the motion information does not contain the motion information corresponding to the video image block, determine whether an adjacent image block of the video image block is a component of the target area of the other video image;
if so, determine the video image block as a component of the target area of the other video image.
That is, the determining module 403 is further configured to:
divide the second video image into multiple video image blocks;
for any video image block, if the motion information of the second video image does not contain the motion information of the video image block, determine whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image;
if the mapping area of the adjacent image block is located within the first target area or the second target area of the previous frame of video image, determine the video image block as one target video image block.
Optionally, in the case that the to-be-processed video is an already-encoded video, the determining module 403 is further configured to: take the encoded data of the already-encoded to-be-processed video as the encoded data corresponding to the to-be-processed video; or re-encode the to-be-processed video to obtain re-encoded data of the to-be-processed video as the encoded data corresponding to the to-be-processed video.
Optionally, the other video images associated with a reference video image are the video images between the reference video image and the next frame of reference video image.
That is, the extraction module 401 is further configured to:
acquire the motion information of the at least one frame of second video image from the first-pass encoded data of the to-be-processed video; or,
re-encode the to-be-processed video to obtain re-encoded data of the to-be-processed video, and acquire the motion information of the at least one frame of second video image from the re-encoded data.
Optionally, the determining module 403 is further configured to:
for each pixel in the video image block, move the pixel by the displacement amount in the direction opposite to the pixel's displacement direction;
determine the area composed of the moved pixels' corresponding positions in the previous frame of video image as the corresponding area.
That is, the determining module 403 is further configured to:
acquire, from the motion information of the second video image, the displacement direction and displacement amount of each pixel in each of the video image blocks;
map, based on the displacement direction and displacement amount, each pixel from the second video image to the previous frame of video image, and determine the area composed of the mapped pixels as one mapping area.
Regarding the apparatus in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments of the method, and will not be elaborated here.
According to an embodiment of the present disclosure, an electronic device is provided, including: a processor and a memory for storing instructions executable by the processor, where the processor is configured to implement, when executing the instructions, the steps of the image processing method in any of the above embodiments, the image processing method including:
acquiring at least one frame of first video image in a to-be-processed video, where the number of first video images is smaller than the number of video images contained in the to-be-processed video;
performing region recognition on the at least one frame of first video image to determine a first target area of the at least one frame of first video image;
determining, according to the first target area of the at least one frame of first video image, a second target area of at least one frame of second video image other than the first video image in the to-be-processed video, where one frame of second video image is associated with one frame of first video image.
In some embodiments, the processor is configured to execute:
starting from the first frame of video image of the to-be-processed video, selecting one frame of first video image every N frames of video images to obtain the at least one frame of first video image, N being an integer greater than or equal to 1; or,
selecting any at least one frame of video image from the video images contained in the to-be-processed video as the at least one frame of first video image.
In some embodiments, the processor is configured to execute:
determining, based on the time sequence of the video images in the to-be-processed video, at least one frame of second video image associated with the first video image, where the time sequence of the second video image lies between the first video image and the next frame of first video image;
performing image tracking on the first target area of the first video image to obtain the second target area of the at least one frame of second video image.
In some embodiments, the processor is configured to execute:
acquiring motion information of at least one frame of second video image, where the motion information of the second video image includes the displacement amount and displacement direction of each pixel in multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
determining the second target area of the at least one frame of second video image according to the first target area of the at least one frame of first video image and the motion information of the at least one frame of second video image.
In some embodiments, the processor is configured to execute:
determining, based on the motion information of the second video image, the mapping areas corresponding, in the previous frame of video image of the second video image, to the multiple video image blocks contained in the motion information;
acquiring the target video image blocks whose mapping areas are located within the first target area or the second target area of the previous frame of video image, and determining the area composed of the target video image blocks as the second target area of the second video image.
In some embodiments, the processor is configured to execute:
acquiring, from the motion information of the second video image, the displacement direction and displacement amount of each pixel in each of the video image blocks;
mapping, based on the displacement direction and displacement amount, each pixel from the second video image to the previous frame of video image, and determining the area composed of the mapped pixels as one mapping area.
In some embodiments, the processor is further configured to execute:
dividing the second video image into multiple video image blocks;
for any video image block, if the motion information of the second video image does not contain the motion information of the video image block, determining whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image;
if the mapping area of the adjacent image block is located within the first target area or the second target area of the previous frame of video image, determining the video image block as one target video image block.
In some embodiments, the processor is configured to execute:
acquiring the motion information of the at least one frame of second video image from the first-pass encoded data of the to-be-processed video; or,
re-encoding the to-be-processed video to obtain re-encoded data of the to-be-processed video, and acquiring the motion information of the at least one frame of second video image from the re-encoded data.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium is also provided; when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to execute the steps of the image processing method in any of the above embodiments, the image processing method including:
acquiring at least one frame of first video image in a to-be-processed video, where the number of the at least one frame of first video image is smaller than the number of video images contained in the to-be-processed video;
performing region recognition on the at least one frame of first video image to determine a first target area of the at least one frame of first video image;
determining, according to the first target area of the at least one frame of first video image, a second target area of at least one frame of second video image other than the first video image in the to-be-processed video, where one frame of second video image is associated with one frame of first video image.
In some embodiments, the processor of the mobile terminal performs the following operations:
starting from the first frame of video image of the to-be-processed video, selecting one frame of first video image every N frames of video images to obtain the at least one frame of first video image, N being an integer greater than or equal to 1; or,
selecting any at least one frame of video image from the video images contained in the to-be-processed video as the at least one frame of first video image.
In some embodiments, the processor of the mobile terminal performs the following operations:
determining, based on the time sequence of the video images in the to-be-processed video, at least one frame of second video image associated with the first video image, where the time sequence of the second video image lies between the first video image and the next frame of first video image;
performing image tracking on the first target area of the first video image to obtain the second target area of the at least one frame of second video image.
In some embodiments, the processor of the mobile terminal performs the following operations:
acquiring motion information of at least one frame of second video image, where the motion information of the second video image includes the displacement amount and displacement direction of each pixel in multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
determining the second target area of the at least one frame of second video image according to the first target area of the at least one frame of first video image and the motion information of the at least one frame of second video image.
In some embodiments, the processor of the mobile terminal performs the following operations:
determining, based on the motion information of the second video image, the mapping areas corresponding, in the previous frame of video image of the second video image, to the multiple video image blocks contained in the motion information;
acquiring the target video image blocks whose mapping areas are located within the first target area or the second target area of the previous frame of video image, and determining the area composed of the target video image blocks as the second target area of the second video image.
In some embodiments, the processor of the mobile terminal performs the following operations:
acquiring, from the motion information of the second video image, the displacement direction and displacement amount of each pixel in each of the video image blocks;
mapping, based on the displacement direction and displacement amount, each pixel from the second video image to the previous frame of video image, and determining the area composed of the mapped pixels as one mapping area.
In some embodiments, the processor of the mobile terminal performs the following operations:
dividing the second video image into multiple video image blocks;
for any video image block, if the motion information of the second video image does not contain the motion information of the video image block, determining whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image;
if the mapping area of the adjacent image block is located within the first target area or the second target area of the previous frame of video image, determining the video image block as one target video image block.
In some embodiments, the processor of the mobile terminal performs the following operations:
acquiring the motion information of the at least one frame of second video image from the first-pass encoded data of the to-be-processed video; or,
re-encoding the to-be-processed video to obtain re-encoded data of the to-be-processed video, and acquiring the motion information of the at least one frame of second video image from the re-encoded data.
According to an embodiment of the present disclosure, an application program is also provided; when the application program is executed by a processor of a mobile terminal, the mobile terminal is enabled to execute the steps of the image processing method in any of the above embodiments.
Fig. 6 is a block diagram of a device for image processing according to an exemplary embodiment. For example, the device 500 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to Fig. 6, the device 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 generally controls the overall operations of the device 500, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 502 may include one or more processors 520 to execute instructions to complete all or part of the steps of the above method. In addition, the processing component 502 may include one or more modules to facilitate interaction between the processing component 502 and other components. For example, the processing component 502 may include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operation on the device 500. Examples of such data include instructions for any application or method operating on the device 500, contact data, phone book data, messages, pictures, videos, and so on. The memory 504 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 506 provides power for the various components of the device 500. The power component 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 500.
The multimedia component 508 includes a screen providing an output interface between the device 500 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation. In some embodiments, the multimedia component 508 includes a front camera and/or a rear camera. When the device 500 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a microphone (MIC); when the device 500 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 504 or sent via the communication component 516. In some embodiments, the audio component 510 also includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button, and a lock button.
The sensor component 514 includes one or more sensors for providing the device 500 with status assessments of various aspects. For example, the sensor component 514 can detect the on/off status of the device 500 and the relative positioning of components, such as the display and keypad of the device 500; the sensor component 514 can also detect a position change of the device 500 or one of its components, the presence or absence of contact between the user and the device 500, the orientation or acceleration/deceleration of the device 500, and temperature changes of the device 500. The sensor component 514 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate wired or wireless communication between the device 500 and other devices. The device 500 can access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 516 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 500 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 504 including instructions; the above instructions can be executed by the processor 520 of the device 500 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 7 is a block diagram of a device for image processing according to an exemplary embodiment. For example, the device 600 may be provided as a server. Referring to Fig. 7, the device 600 includes a processing component 622, which further includes one or more processors, and memory resources represented by a memory 632 for storing instructions executable by the processing component 622, such as application programs. The application programs stored in the memory 632 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 622 is configured to execute instructions to perform the above image processing method.
The device 600 may also include a power component 626 configured to perform power management of the device 600, a wired or wireless network interface 650 configured to connect the device 600 to a network, and an input/output (I/O) interface 658. The device 600 can operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Those skilled in the art will readily conceive of other embodiments of the present invention after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or customary technical means in this technical field not disclosed in the present disclosure. The specification and the embodiments are to be regarded as exemplary only; the true scope and spirit of the present invention are indicated by the following claims.
It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims (20)

  1. An image processing method, wherein the method comprises:
    acquiring at least one frame of first video image in a to-be-processed video, wherein the number of first video images is smaller than the number of video images contained in the to-be-processed video;
    performing region recognition on the at least one frame of first video image to determine a first target area of the at least one frame of first video image;
    determining, according to the first target area of the at least one frame of first video image, a second target area of at least one frame of second video image other than the first video image in the to-be-processed video, wherein one frame of second video image is associated with one frame of first video image.
  2. The method according to claim 1, wherein the acquiring at least one frame of first video image in the to-be-processed video comprises:
    starting from the first frame of video image of the to-be-processed video, selecting one frame of first video image every N frames of video images to obtain the at least one frame of first video image, N being an integer greater than or equal to 1; or,
    selecting any at least one frame of video image from the video images contained in the to-be-processed video as the at least one frame of first video image.
  3. The method according to claim 1, wherein the determining, according to the first target area of the at least one frame of first video image, the second target area of at least one frame of second video image other than the first video image in the to-be-processed video comprises:
    determining, based on the time sequence of the video images in the to-be-processed video, at least one frame of second video image associated with the first video image, wherein the time sequence of the second video image lies between the first video image and the next frame of first video image;
    performing image tracking on the first target area of the first video image to obtain the second target area of the at least one frame of second video image.
  4. The method according to claim 1, wherein the determining, according to the first target area of the at least one frame of first video image, the second target area of at least one frame of second video image other than the first video image in the to-be-processed video comprises:
    acquiring motion information of at least one frame of second video image, wherein the motion information of the second video image comprises the displacement amount and displacement direction of each pixel in multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
    determining the second target area of the at least one frame of second video image according to the first target area of the at least one frame of first video image and the motion information of the at least one frame of second video image.
  5. The method according to claim 4, wherein the determining the second target area of the at least one frame of second video image according to the first target area of the at least one frame of first video image and the motion information of the at least one frame of second video image comprises:
    determining, based on the motion information of the second video image, the mapping areas corresponding, in the previous frame of video image of the second video image, to the multiple video image blocks contained in the motion information;
    acquiring the target video image blocks whose mapping areas are located within the first target area or the second target area of the previous frame of video image, and determining the area composed of the target video image blocks as the second target area of the second video image.
  6. The method according to claim 5, wherein the determining, based on the motion information of the second video image, the mapping areas corresponding, in the previous frame of video image of the second video image, to the multiple video image blocks contained in the motion information comprises:
    acquiring, from the motion information of the second video image, the displacement direction and displacement amount of each pixel in each of the video image blocks;
    mapping, based on the displacement direction and displacement amount, each pixel from the second video image to the previous frame of video image, and determining the area composed of the mapped pixels as one mapping area.
  7. The method according to claim 4, wherein the method further comprises:
    dividing the second video image into multiple video image blocks;
    for any video image block, if the motion information of the second video image does not contain the motion information of the video image block, determining whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image;
    if the mapping area of the adjacent image block is located within the first target area or the second target area of the previous frame of video image, determining the video image block as one target video image block.
  8. The method according to claim 4, wherein the acquiring motion information of at least one frame of second video image comprises:
    acquiring the motion information of the at least one frame of second video image from the first-pass encoded data of the to-be-processed video; or,
    re-encoding the to-be-processed video to obtain re-encoded data of the to-be-processed video, and acquiring the motion information of the at least one frame of second video image from the re-encoded data.
  9. An electronic device, comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to execute:
    acquiring at least one frame of first video image in a to-be-processed video, wherein the number of first video images is smaller than the number of video images contained in the to-be-processed video;
    performing region recognition on the at least one frame of first video image to determine a first target area of the at least one frame of first video image;
    determining, according to the first target area of the at least one frame of first video image, a second target area of at least one frame of second video image other than the first video image in the to-be-processed video, wherein one frame of second video image is associated with one frame of first video image.
  10. The electronic device according to claim 9, wherein the processor is configured to execute:
    starting from the first frame of video image of the to-be-processed video, selecting one frame of first video image every N frames of video images to obtain the at least one frame of first video image, N being an integer greater than or equal to 1; or,
    selecting any at least one frame of video image from the video images contained in the to-be-processed video as the at least one frame of first video image.
  11. The electronic device according to claim 9, wherein the processor is configured to execute:
    determining, based on the time sequence of the video images in the to-be-processed video, at least one frame of second video image associated with the first video image, wherein the time sequence of the second video image lies between the first video image and the next frame of first video image;
    performing image tracking on the first target area of the first video image to obtain the second target area of the at least one frame of second video image.
  12. The electronic device according to claim 9, wherein the processor is configured to execute:
    acquiring motion information of at least one frame of second video image, wherein the motion information of the second video image comprises the displacement amount and displacement direction of each pixel in multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
    determining the second target area of the at least one frame of second video image according to the first target area of the at least one frame of first video image and the motion information of the at least one frame of second video image.
  13. The electronic device according to claim 12, wherein the processor is configured to execute:
    determining, based on the motion information of the second video image, the mapping areas corresponding, in the previous frame of video image of the second video image, to the multiple video image blocks contained in the motion information;
    acquiring the target video image blocks whose mapping areas are located within the first target area or the second target area of the previous frame of video image, and determining the area composed of the target video image blocks as the second target area of the second video image.
  14. The electronic device according to claim 13, wherein the processor is configured to execute:
    acquiring, from the motion information of the second video image, the displacement direction and displacement amount of each pixel in each of the video image blocks;
    mapping, based on the displacement direction and displacement amount, each pixel from the second video image to the previous frame of video image, and determining the area composed of the mapped pixels as one mapping area.
  15. The electronic device according to claim 12, wherein the processor is further configured to execute:
    dividing the second video image into multiple video image blocks;
    for any video image block, if the motion information of the second video image does not contain the motion information of the video image block, determining whether the mapping area of an adjacent image block of the video image block is located within the first target area or the second target area of the previous frame of video image;
    if the mapping area of the adjacent image block is located within the first target area or the second target area of the previous frame of video image, determining the video image block as one target video image block.
  16. The electronic device according to claim 12, wherein the processor is configured to execute:
    acquiring the motion information of the at least one frame of second video image from the first-pass encoded data of the to-be-processed video; or,
    re-encoding the to-be-processed video to obtain re-encoded data of the to-be-processed video, and acquiring the motion information of the at least one frame of second video image from the re-encoded data.
  17. A non-transitory computer-readable storage medium, wherein, when instructions in the storage medium are executed by a processor of a mobile terminal, the processor of the mobile terminal is caused to perform the following operations:
    acquiring at least one frame of first video image in a to-be-processed video, wherein the number of first video images is smaller than the number of video images contained in the to-be-processed video;
    performing region recognition on the at least one frame of first video image to determine a first target area of the at least one frame of first video image;
    determining, according to the first target area of the at least one frame of first video image, a second target area of at least one frame of second video image other than the first video image in the to-be-processed video, wherein one frame of second video image is associated with one frame of first video image.
  18. The readable storage medium according to claim 17, wherein the processor of the mobile terminal performs the following operations:
    starting from the first frame of video image of the to-be-processed video, selecting one frame of first video image every N frames of video images to obtain the at least one frame of first video image, N being an integer greater than or equal to 1; or,
    selecting any at least one frame of video image from the video images contained in the to-be-processed video as the at least one frame of first video image.
  19. The readable storage medium according to claim 17, wherein the processor of the mobile terminal performs the following operations:
    determining, based on the time sequence of the video images in the to-be-processed video, at least one frame of second video image associated with the first video image, wherein the time sequence of the second video image lies between the first video image and the next frame of first video image;
    performing image tracking on the first target area of the first video image to obtain the second target area of the at least one frame of second video image.
  20. The readable storage medium according to claim 17, wherein the processor of the mobile terminal performs the following operations:
    acquiring motion information of at least one frame of second video image, wherein the motion information of the second video image comprises the displacement amount and displacement direction of each pixel in multiple video image blocks relative to the corresponding pixel in the previous frame of video image;
    determining the second target area of the at least one frame of second video image according to the first target area of the at least one frame of first video image and the motion information of the at least one frame of second video image.
PCT/CN2020/110771 2019-09-29 2020-08-24 Image processing method, electronic device, and readable storage medium WO2021057359A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/706,457 US20220222831A1 (en) 2019-09-29 2022-03-28 Method for processing images and electronic device therefor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910936022.1 2019-09-29
CN201910936022.1A CN110796012B (zh) 2019-09-29 2019-09-29 Image processing method and apparatus, electronic device, and readable storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/706,457 Continuation US20220222831A1 (en) 2019-09-29 2022-03-28 Method for processing images and electronic device therefor

Publications (1)

Publication Number Publication Date
WO2021057359A1 true WO2021057359A1 (zh) 2021-04-01

Family

ID=69439960

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110771 WO2021057359A1 (zh) 2019-09-29 2020-08-24 图像处理方法、电子设备及可读存储介质

Country Status (3)

Country Link
US (1) US20220222831A1 (zh)
CN (1) CN110796012B (zh)
WO (1) WO2021057359A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796012B (zh) * 2019-09-29 2022-12-27 北京达佳互联信息技术有限公司 Image processing method and apparatus, electronic device, and readable storage medium
CN111294512A (zh) * 2020-02-10 2020-06-16 深圳市铂岩科技有限公司 Image processing method and apparatus, storage medium, and camera device
CN113553963A (zh) * 2021-07-27 2021-10-26 广联达科技股份有限公司 Safety helmet detection method and apparatus, electronic device, and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116896A (zh) * 2013-03-07 2013-05-22 中国科学院光电技术研究所 Automatic detection and tracking method based on a visual saliency model
CN106611412A (zh) * 2015-10-20 2017-05-03 成都理想境界科技有限公司 Method and apparatus for generating a mapped video
CN110189378A (zh) * 2019-05-23 2019-08-30 北京奇艺世纪科技有限公司 Video processing method and apparatus, and electronic device
CN110267010A (zh) * 2019-06-28 2019-09-20 Oppo广东移动通信有限公司 Image processing method and apparatus, server, and storage medium
CN110796012A (zh) * 2019-09-29 2020-02-14 北京达佳互联信息技术有限公司 Image processing method and apparatus, electronic device, and readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104301596B (zh) * 2013-07-11 2018-09-25 炬芯(珠海)科技有限公司 Video processing method and apparatus
CN105631803B (zh) * 2015-12-17 2019-05-28 小米科技有限责任公司 Filter processing method and apparatus
CN107277301B (zh) * 2016-04-06 2019-11-29 杭州海康威视数字技术股份有限公司 Image analysis method for surveillance video and system thereof
CN108961304B (zh) * 2017-05-23 2022-04-26 阿里巴巴集团控股有限公司 Method for recognizing a moving foreground in a video and method for determining a target position in a video
CN107295309A (zh) * 2017-07-29 2017-10-24 安徽博威康信息技术有限公司 Target person locking and display system based on multiple surveillance videos
CN109635657B (zh) * 2018-11-12 2023-01-06 平安科技(深圳)有限公司 Target tracking method, apparatus, device, and storage medium

Also Published As

Publication number Publication date
CN110796012B (zh) 2022-12-27
CN110796012A (zh) 2020-02-14
US20220222831A1 (en) 2022-07-14


Legal Events

121 — EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20867643; Country of ref document: EP; Kind code of ref document: A1)
NENP — Non-entry into the national phase (Ref country code: DE)
122 — EP: PCT application non-entry in European phase (Ref document number: 20867643; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1