CN114827706A - Image processing method, computer program product, electronic device, and storage medium - Google Patents

Image processing method, computer program product, electronic device, and storage medium

Info

Publication number
CN114827706A
Authority
CN
China
Prior art keywords
video frame
historical
target
point set
feature point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210223753.3A
Other languages
Chinese (zh)
Inventor
吴戈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kuangshi Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Priority to CN202210223753.3A priority Critical patent/CN114827706A/en
Publication of CN114827706A publication Critical patent/CN114827706A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of image processing and discloses an image processing method, a computer program product, an electronic device, and a storage medium. The method includes: determining a characteristic offset between a target video frame and a historical video frame in a video to be processed; obtaining a highlight detection map corresponding to the target video frame based on the characteristic offset and the highlight detection map corresponding to the historical video frame; and rendering the target video frame based on the highlight detection map corresponding to the target video frame to obtain a light spot rendering map. In this way, the highlight detection map of the target video frame is optimized and adjusted based on the characteristic offset between different video frames and on the historical video frame, and the target video frame is then spot-rendered from the adjusted map, so that the light spot positions in different video frames stay nearly consistent, the stability of the video light spots improves, and severe flicker of the video light spots is reduced.

Description

Image processing method, computer program product, electronic device, and storage medium
Technical Field
The present application relates to the field of image processing technology, and in particular to an image processing method, a computer program product, an electronic device, and a storage medium.
Background
With people's growing demand for image processing, light spot rendering technology for images has emerged. Light spot rendering adds light spot effects to an image based on image segmentation and image rendering techniques.
In the prior art, light spot rendering is usually performed on each video frame in a video through a light spot rendering technique to obtain a spot-rendered video.
However, when the prior art performs light spot rendering on a video, the rendered light spots generally flicker severely across frames.
Disclosure of Invention
An object of the embodiments of the present application is to provide an image processing method, a computer program product, an electronic device, and a storage medium, which are used to solve the problem of severe video light spot flicker when performing light spot rendering on a video.
In one aspect, a method of image processing is provided, including:
determining a characteristic offset between a target video frame and a historical video frame in a video to be processed, wherein the historical video frame is the frame n frames before the target video frame, and n is a positive integer;
obtaining a highlight detection image corresponding to the target video frame based on the characteristic offset and the highlight detection image corresponding to the historical video frame;
and rendering the target video frame based on the highlight detection image corresponding to the target video frame to obtain a light spot rendering image.
In the implementation process, whether an obvious image feature deviation exists between different video frames can be determined from the characteristic offset between them. According to this judgment, the highlight detection map of the target video frame is generated from the historical video frame instead of being extracted directly from the target video frame. The target video frame can therefore be rendered with a highlight detection map that has been optimized and adjusted based on the characteristic offset and the historical video frame, yielding a light spot rendering map with a spot blurring (bokeh) effect. The light spot positions in different video frames stay nearly consistent, the video light spots become more stable, and severe flicker of the video light spots is alleviated.
In one embodiment, the feature offset is derived based on the region of interest in the target video frame and the region of interest in the historical video frame;
the region of interest is determined based on a region of a target object contained in the video frame.
In this implementation, the characteristic offset is determined from the regions of interest in different video frames, which reduces the time cost and system resources consumed in judging the image characteristic deviation.
In one embodiment, the determining the feature offset between the target video frame and the historical video frame in the video to be processed includes:
segmenting the historical video frame to obtain a historical video frame annular region map, wherein the historical video frame annular region map is obtained based on an annular region surrounding a target object in the historical video frame;
segmenting the target video frame to obtain a target video frame annular region map, wherein the target video frame annular region map is obtained based on an annular region surrounding a target object in the target video frame;
and determining the characteristic offset according to the historical video frame annular region graph and the target video frame annular region graph.
In this implementation, the characteristic offset is determined from the regions of interest in different video frames, which reduces the time cost and system resources consumed in judging the image characteristic deviation.
In one embodiment, determining the feature offset according to the ring region map of the historical video frame and the ring region map of the target video frame includes:
extracting characteristic points of the historical video frame annular region graph to obtain a historical characteristic point set, wherein the historical characteristic point set comprises a plurality of characteristic points;
acquiring a target feature point set of the target video frame based on the target video frame annular region graph, the historical video frame annular region graph and the historical feature point set;
and determining the characteristic offset according to the position information of the plurality of characteristic points in the historical characteristic point set and the position information of the plurality of characteristic points in the target characteristic point set.
In the implementation process, a plurality of feature points of the historical video frame and the target video frame are extracted, and the feature offset of the historical video frame and the target video frame is determined based on the extracted feature points, so that the time cost and the system resource consumed by image feature offset judgment are reduced.
In one embodiment, extracting feature points from a ring region map of a historical video frame to obtain a historical feature point set includes:
respectively determining the credibility of each pixel point in the historical video frame annular region graph;
screening out a target number of pixel points from all pixel points of the historical video frame annular region graph according to the credibility of all the pixel points, wherein the credibility of the screened pixel points is higher than the credibility of the pixel points which are not screened out;
and generating a historical feature point set based on the screened pixel points, wherein the feature points in the historical feature point set are the screened pixel points.
In the implementation process, a plurality of effective feature points in the historical video frame are screened out according to the credibility of the feature points, so that the feature offset can be determined according to the effective feature points, and the accuracy of image feature offset judgment is improved.
In one embodiment, before determining the feature offset according to the position information of the plurality of feature points in the historical feature point set and the position information of the plurality of feature points in the target feature point set, the method further includes:
respectively determining a state vector and an error vector of each feature point in the target feature point set according to whether a matching point of each feature point in the target feature point set exists in the historical feature point set, wherein the matching point is determined according to the similarity between the feature points;
removing, from the target feature point set, the feature points whose state vectors indicate an abnormality or whose error vectors indicate an error, to obtain a screened target feature point set;
and screening the characteristic points of the historical characteristic point set based on the screened target characteristic point set.
In the implementation process, invalid feature points are removed, and the accuracy of image feature deviation judgment is improved.
In one embodiment, determining a feature offset according to the position information of the plurality of feature points in the historical feature point set and the position information of the plurality of feature points in the target feature point set includes:
respectively determining a matching point corresponding to each feature point in a target feature point set from all feature points in a historical feature point set, wherein the matching points are determined according to the similarity between the feature points;
respectively determining a horizontal coordinate difference value between the horizontal coordinate of each feature point in the target feature point set and the horizontal coordinate of the corresponding matching point;
respectively determining a vertical coordinate difference value between the vertical coordinate of each feature point in the target feature point set and the vertical coordinate of the corresponding matching point;
and determining the characteristic offset according to each horizontal coordinate difference value and each vertical coordinate difference value.
In the implementation process, the deviation between the characteristic points in the target video frame and the characteristic points in the historical video frame is determined according to the position relationship between the characteristic points in the target video frame and the characteristic points in the historical video frame, so that the accuracy of judging the image characteristic deviation is improved.
In one embodiment, obtaining a highlight detection map corresponding to a target video frame based on a feature offset and a highlight detection map corresponding to a historical video frame includes:
if the characteristic offset is not lower than the offset threshold, performing weighted fusion processing on the initial highlight detection graph corresponding to the target video frame and the highlight detection graph corresponding to the historical video frame to obtain a highlight detection graph corresponding to the target video frame;
and if the characteristic offset is lower than the offset threshold, determining the highlight detection map corresponding to the historical video frame as the highlight detection map corresponding to the target video frame.
In the implementation process, the highlight detection graph is optimized and adjusted according to the characteristic offset and the historical video frame, so that the target video frame can be rendered according to the highlight detection graph after optimization and adjustment in the subsequent steps, the light spot rendering graph with the light spot blurring effect is obtained, the light spot positions in different video frames are close to be consistent, the video light spots are more stable, and the problem of serious flicker of the video light spots is solved.
In one aspect, an apparatus for image processing is provided, including:
the device comprises a determining unit, an obtaining unit, and a rendering unit, wherein the determining unit is used for determining a characteristic offset between a target video frame and a historical video frame in a video to be processed, the historical video frame is the frame n frames before the target video frame, and n is a positive integer;
the obtaining unit is used for obtaining a highlight detection graph corresponding to the target video frame based on the characteristic offset and the highlight detection graph corresponding to the historical video frame;
and the rendering unit is used for rendering the target video frame based on the highlight detection graph corresponding to the target video frame to obtain a light spot rendering graph.
In one embodiment, the feature offset is derived based on the region of interest in the target video frame and the region of interest in the historical video frame;
the region of interest is determined based on a region of a target object contained in the video frame.
In one embodiment, the determining unit is configured to:
the method comprises the steps that a historical video frame is subjected to segmentation processing, and an annular region graph of the historical video frame is obtained, wherein the annular region graph of the historical video frame is obtained on the basis of an annular region surrounding a target object in the historical video frame;
dividing the target video frame to obtain a target video frame annular region image, wherein the target video frame annular region image is obtained on the basis of an annular region surrounding a target object in the target video frame;
and determining the characteristic offset according to the historical video frame annular region graph and the target video frame annular region graph.
In one embodiment, the determining unit is configured to:
extracting characteristic points of the historical video frame annular region graph to obtain a historical characteristic point set, wherein the historical characteristic point set comprises a plurality of characteristic points;
acquiring a target feature point set of the target video frame based on the target video frame annular region graph, the historical video frame annular region graph and the historical feature point set;
and determining the characteristic offset according to the position information of the plurality of characteristic points in the historical characteristic point set and the position information of the plurality of characteristic points in the target characteristic point set.
In one embodiment, the determining unit is configured to:
respectively determining the credibility of each pixel point in the historical video frame annular region graph;
screening out a target number of pixel points from all pixel points of the historical video frame annular region graph according to the credibility of all the pixel points, wherein the credibility of the screened pixel points is higher than the credibility of the pixel points which are not screened out;
and generating a historical feature point set based on the screened pixel points, wherein the feature points in the historical feature point set are the screened pixel points.
In one embodiment, the determining unit is further configured to:
respectively determining a state vector and an error vector of each feature point in the target feature point set according to whether a matching point of each feature point in the target feature point set exists in the historical feature point set or not, wherein the matching point is determined according to the similarity between the feature points;
removing, from the target feature point set, the feature points whose state vectors indicate an abnormality or whose error vectors indicate an error, to obtain a screened target feature point set;
and screening the characteristic points of the historical characteristic point set based on the screened target characteristic point set.
In one embodiment, the determining unit is configured to:
respectively determining a matching point corresponding to each feature point in a target feature point set from all feature points in a historical feature point set, wherein the matching points are determined according to the similarity between the feature points;
respectively determining a horizontal coordinate difference value between the horizontal coordinate of each feature point in the target feature point set and the horizontal coordinate of the corresponding matching point;
respectively determining a vertical coordinate difference value between the vertical coordinate of each feature point in the target feature point set and the vertical coordinate of the corresponding matching point;
and determining the characteristic offset according to each horizontal coordinate difference value and each vertical coordinate difference value.
In one embodiment, the obtaining unit is configured to:
if the characteristic offset is not lower than the offset threshold, performing weighted fusion processing on the initial highlight detection graph corresponding to the target video frame and the highlight detection graph corresponding to the historical video frame to obtain a highlight detection graph corresponding to the target video frame;
and if the characteristic offset is lower than the offset threshold, determining the highlight detection map corresponding to the historical video frame as the highlight detection map corresponding to the target video frame.
In one aspect, an electronic device is provided, including: a processor and a memory, the memory storing computer program instructions that, when read and executed by the processor, perform the steps of the method as provided in any of the various alternative implementations of image processing described above.
In one aspect, a computer-readable storage medium is provided, having stored thereon computer program instructions which, when read and executed by a processor, perform the steps of the method as provided in any of the various alternative implementations of image processing described above.
In one aspect, a computer program product is provided, comprising computer program instructions which, when read and executed by a processor, perform the steps of the method as provided in any of the various alternative implementations of image processing described above.
In the image processing method, the computer program product, the electronic device, and the storage medium provided by the embodiments of the application, a characteristic offset between a target video frame and a historical video frame in a video to be processed is determined; a highlight detection map corresponding to the target video frame is obtained based on the characteristic offset and the historical video frame; and the target video frame is rendered based on that highlight detection map to obtain a light spot rendering map. Whether an obvious image feature deviation exists between different video frames can thus be determined from the characteristic offset between them, and, according to this judgment, the highlight detection map of the target video frame is generated from the historical video frame instead of being extracted directly. The target video frame can therefore be rendered with a highlight detection map optimized and adjusted based on the characteristic offset and the historical video frame, yielding a light spot rendering map with a spot blurring effect: the light spot positions in different video frames stay nearly consistent, the video light spots become more stable, and severe flicker of the video light spots is alleviated.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart illustrating an implementation of a method for image processing according to an embodiment of the present disclosure;
fig. 2 is a flowchart of an implementation of a method for video rendering according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating an architecture of an image processing system according to an embodiment of the present disclosure;
fig. 4 is a block diagram of an image processing apparatus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has developed rapidly. Artificial Intelligence (AI) is an emerging scientific technology that studies and develops theories, methods, techniques, and application systems for simulating and extending human intelligence. Artificial intelligence is a comprehensive discipline, touching technical categories as varied as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision, an important branch of artificial intelligence, enables machines to recognize the world; computer vision technology usually includes face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping, computational photography, and robot navigation and positioning. With the research and progress of artificial intelligence technology, it has been applied in many fields, such as security, city management, traffic management, building management, park management, face-based access, face-based attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone imaging, cloud services, smart homes, wearable devices, unmanned driving, automatic driving, smart medical treatment, face payment, face unlocking, fingerprint unlocking, identity verification, smart screens, smart televisions, cameras, the mobile Internet, live webcasts, beautification, medical beauty, and intelligent temperature measurement. The image processing method in the embodiments of the application also draws on artificial intelligence technology.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
First, some terms referred to in the embodiments of the present application will be described to facilitate understanding by those skilled in the art.
Terminal device: may be a mobile terminal, a fixed terminal, or a portable terminal, such as a mobile handset, station, unit, device, multimedia computer, multimedia tablet, Internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, personal communication system device, personal navigation device, personal digital assistant, audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, electronic book device, gaming device, or any combination thereof, including the accessories and peripherals of these devices or any combination thereof. The terminal device can also support any type of user interface (e.g., a wearable device).
Server: may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, big data, and artificial intelligence platforms.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Computer Vision (CV) technology: computer vision is a science that studies how to make machines "see"; it uses cameras and computers instead of human eyes to recognize, track, and measure targets, and performs further image processing so that the result is more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data.
With people's growing demand for image processing, for example the demand for image blurring (bokeh) effects, light spot rendering technology for images has come into use. Light spot rendering adds light spot effects to an image based on image segmentation and image rendering techniques. The light spot effect blurs the image, realizing a blurring (bokeh) effect. For example, the neon light spot video blurring technique on a terminal device (e.g., a mobile phone) uses computer vision algorithms to produce an image spot effect resembling that of a single-lens reflex camera.
Conventionally, light spot rendering is performed on each video frame in a video through a light spot rendering technique to obtain a spot-rendered video. However, the video obtained this way generally suffers from severe light spot flicker. Therefore, to solve the problem of severe video light spot flicker when performing light spot rendering on a video, the embodiments of the present application provide an image processing method, a computer program product, an electronic device, and a storage medium.
In this embodiment of the application, the execution subject is an electronic device for image processing, and optionally, the electronic device may be a server or a terminal device, which is not limited herein.
Referring to fig. 1, a flowchart of an implementation of a method for image processing according to an embodiment of the present application is shown, and the specific implementation flow of the method is as follows:
step 100: and determining the characteristic offset between the target video frame and the historical video frame in the video to be processed.
Specifically, the target video frame and the historical video frame are different video frames in the same video (i.e., the video to be processed). The target video frame is a certain video frame in the video to be processed; the historical video frame is the frame n frames before the target video frame, where n is a positive integer. Optionally, n may be 1, in which case the historical video frame is the video frame immediately preceding the target video frame. It should be noted that the target video frame is not the first video frame in the video to be processed. The characteristic offset is derived based on the region of interest in the target video frame and the region of interest in the historical video frame. Optionally, the video to be processed may be shot by the user through the electronic device or obtained from another device. The video frames may be in Red Green Blue (RGB) format.
In one embodiment, a user records the family's daily life with a mobile phone camera, obtaining a life video in RGB color format.
In the embodiments of the present application, the region of interest is described as an annular region only by way of example; in practical applications, the region of interest may be set according to the actual application scenario, which is not limited herein.
When step 100 is executed, the following steps may be adopted:
s1001: and (4) carrying out segmentation processing on the historical video frame to obtain an annular region map of the historical video frame.
Specifically, the historical video frame annular region map (i.e., the region of interest in the historical video frame) is obtained based on an annular region surrounding the target object in the historical video frame.
In one embodiment, the ring region map of the historical video frame is obtained based on segmentation of a background region in the historical video frame. The background area is an area of the historical video frame other than the target object. Optionally, the target object may be set according to an actual application scenario, for example, the target object may be a portrait.
In practical applications, the ring-shaped region map of the historical video frame may be set according to a practical application scene, for example, obtained by segmenting from a foreground in an image, and is not limited herein.
When S1001 is executed, the following steps may be adopted:
s10011: and carrying out target object detection on the historical video frame to obtain a first target object detection area in the historical video frame.
In one embodiment, the target object is a portrait: portrait segmentation is performed on the historical video frame based on a deep learning model to obtain a single-channel binary image of the historical video frame, from which the portrait area is obtained. That is, the historical video frame is binarized.
Alternatively, the deep learning model may be constructed based on the UNet and residual network (ResNet) frameworks. The shape of the target object detection area may be set according to the actual application scenario, such as a rectangle, which is not limited herein. The single-channel binary image may also be referred to as a portrait mask image and may be represented as a matrix Imask.
In one embodiment, the pixel value of the human image region (i.e., the first target object detection region) in the single-channel binary image is 1, and the pixel value of the background region is 0.
In one embodiment, the pixel value of the human image region (i.e., the first target object detection region) in the single-channel binary image is 0, and the pixel value of the background region is 1.
Furthermore, a circumscribed rectangular frame (bounding box) of the portrait in the historical video frame can be obtained from the portrait area.
S10012: and amplifying the first target object detection area in the historical video frame to obtain a second target object detection area in the historical video frame.
In one embodiment, the first target object detection area is expanded outward in a plurality of directions (e.g., the four directions up, down, left, and right), so that its width and height are both enlarged by a specified expansion ratio, yielding the second target object detection area.
In practical applications, the specified expansion ratio may be set according to practical application scenarios, for example, 25%, and is not limited herein.
S10013: an annular detection region between a first target object detection region in the historical video frame and a second target object detection region in the historical video frame is determined.
In one embodiment, a difference between the first target object detection area and the second target object detection area is determined to obtain an annular detection area.
Further, in order to reduce the time and system resources consumed by image processing, the annular detection area may be downsampled by m times to obtain a downsampled annular detection area.
In this embodiment, m is a positive integer, for example, m may be 1, and in practical application, m may be set according to a practical application scenario, which is not limited herein.
S10014: and according to the annular detection area in the historical video frame, carrying out segmentation processing on the historical video frame to obtain an annular area map of the historical video frame.
Alternatively, a video frame (whether historical or target) may be represented as a matrix Iimage, the annular detection area as a matrix Imask_roi, and the historical video frame annular region map as a matrix Iimage_roi_pre. The historical video frame annular region map Iimage_roi_pre can then be determined as the element-wise product:
Iimage_roi_pre = Iimage * Imask_roi
In this way, a partial background region map in the historical video frame, i.e. a ring region map of the historical video frame, can be obtained.
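As an illustration of S10011-S10014, the following Python/NumPy sketch builds the annular region map from a portrait mask; the 25% expansion ratio follows the example above, and the function and parameter names are illustrative assumptions rather than the patent's own:

```python
import numpy as np

def ring_region_map(image, person_mask, expand_ratio=0.25):
    """Sketch of S10011-S10014: cut out the annular region between the
    portrait bounding box and its enlarged version, then apply it to the
    frame. image: HxWx3 uint8; person_mask: HxW array, portrait pixels > 0."""
    ys, xs = np.where(person_mask > 0)            # portrait area (S10011)
    x0, x1 = xs.min(), xs.max() + 1
    y0, y1 = ys.min(), ys.max() + 1

    # Expand the box outward so width and height grow by the ratio (S10012).
    dx = int((x1 - x0) * expand_ratio / 2)
    dy = int((y1 - y0) * expand_ratio / 2)
    H, W = person_mask.shape
    X0, X1 = max(0, x0 - dx), min(W, x1 + dx)
    Y0, Y1 = max(0, y0 - dy), min(H, y1 + dy)

    # Annular detection area = enlarged box minus original box (S10013).
    ring = np.zeros((H, W), dtype=image.dtype)
    ring[Y0:Y1, X0:X1] = 1
    ring[y0:y1, x0:x1] = 0

    # Iimage_roi_pre = Iimage * Imask_roi, element-wise (S10014).
    return image * ring[..., None]                # broadcast over channels
```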
S1002: and carrying out segmentation processing on the target video frame to obtain an annular region image of the target video frame.
Specifically, the target video frame annular region map (i.e., the region of interest in the target video frame) is obtained based on an annular region surrounding the target object in the target video frame.
In one embodiment, the target video frame annular region map is an annular region map surrounding a target object in the target video frame obtained based on background region segmentation in the target video frame.
Alternatively, the target video frame annular region map can be represented as a matrix Iimage_roi_cur.
The target video frame annular region image may be obtained based on a principle similar to that of obtaining the historical video frame annular region image, which is not described herein again.
S1003: and determining the characteristic offset according to the historical video frame annular region graph and the target video frame annular region graph.
Specifically, when S1003 is executed, the following steps may be adopted:
s10031: and extracting characteristic points of the annular region graph of the historical video frame to obtain a historical characteristic point set.
Specifically, the historical feature point set includes a plurality of feature points. The feature points are pixel points in the image.
S10032: and obtaining a target feature point set of the target video frame based on the target video frame annular region graph, the historical video frame annular region graph and the historical feature point set.
Specifically, the Lucas-Kanade (LK) optical flow algorithm is adopted, and the target feature point set of the target video frame is obtained based on the target video frame annular region map, the historical video frame annular region map, and the historical feature point set.
The LK optical flow method is a two-frame differential optical flow estimation algorithm.
In one embodiment, calcOpticalFlowPyrLK() in the open-source vision library OpenCV is adopted to obtain the target feature point set of the target video frame based on the target video frame annular region map, the historical video frame annular region map, and the historical feature point set.
In practical applications, the target feature point set of the target video frame may also be determined in other manners, which is not limited herein.
Therefore, a target feature point set of the target video frame, namely a plurality of effective feature points in a partial background region in the target video frame, can be obtained based on the historical feature point set of the historical video frame.
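A minimal sketch of S10032 using the OpenCV call named above; converting the region maps to grayscale first is an assumption of this sketch (the pyramidal LK routine works on single-channel images), not a step stated in the text:

```python
import cv2

def track_feature_points(prev_ring_gray, cur_ring_gray, prev_pts):
    """Sketch of S10032: propagate the historical feature point set into
    the target frame with pyramidal LK optical flow.
    prev_pts: Nx1x2 float32 array of historical feature points."""
    cur_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_ring_gray, cur_ring_gray, prev_pts, None)
    # status/err are the state and error vectors used for screening later.
    return cur_pts, status, err
```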
Further, the feature points in the target feature point set and the historical feature point set can each be screened, so that feature points that cannot be matched between the two sets and erroneous feature points are removed.
S10033: and determining the characteristic offset according to the position information of the plurality of characteristic points in the historical characteristic point set and the position information of the plurality of characteristic points in the target characteristic point set.
When S10031 is executed, that is, feature point extraction is performed on the historical video frame annular region map, and a historical feature point set is obtained, the following steps may be adopted:
s100311: and respectively determining the credibility of each pixel point in the historical video frame annular region graph.
The reliability represents the probability that the pixel value of the pixel is a non-abnormal value, that is, the probability that the pixel is normal.
In one implementation, the goodFeaturesToTrack() function in the open-source vision library OpenCV is adopted to determine the reliability of each pixel point in the historical video frame annular region map.
In practical applications, the confidence level may also be determined in other manners, and is not limited herein.
S100312: and screening out the target number of pixel points from the pixel points of the historical video frame annular region graph according to the credibility of the pixel points.
And the credibility of the screened pixel points is higher than the credibility of the pixel points which are not screened.
In practical applications, the number of targets may be set according to practical application scenarios, for example, 30, and is not limited herein.
In one embodiment, pixel points with the reliability higher than the reliability threshold value are screened out from all the pixel points of the annular region graph of the historical video frame, and the target number of pixel points are screened out again from the screened pixel points according to the sequence from high to low of the reliability.
Further, if the number of screened pixel points is lower than the target number, the subsequent highlight detection map extraction for the target video frame is stopped, the highlight detection map of the historical video frame is determined as the highlight detection map of the target video frame, and step 102 is executed, so that in the subsequent steps the target video frame can be rendered directly based on the highlight detection map of the historical video frame to obtain a light spot rendering map.
This is because too few screened pixel points indicate that the target video frame is abnormal; the target video frame is therefore spot-rendered directly with the highlight detection map of the historical video frame.
S100313: and generating a historical characteristic point set based on the screened pixel points.
Specifically, the feature points in the historical feature point set are the screened pixel points. That is to say, the screened pixel points are used as the feature points of the historical video frame.
In this way, a plurality of valid feature points in a partial background area in the historical video frame can be extracted.
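A sketch of S100311-S100313; goodFeaturesToTrack() already ranks candidate corners by a quality measure and returns at most the requested number, so the reliability scoring and top-N screening collapse into one call here. The target number of 30 follows the example above; qualityLevel and minDistance are assumed values:

```python
import cv2

def extract_history_feature_points(ring_gray, target_num=30):
    """Sketch of S100311-S100313: pick the target number of most reliable
    feature points from the historical ring region map (single-channel).
    Returns an Nx1x2 float32 array, or None if nothing qualifies."""
    return cv2.goodFeaturesToTrack(
        ring_gray,
        maxCorners=target_num,  # keep at most the target number of points
        qualityLevel=0.01,      # assumed reliability threshold
        minDistance=8)          # assumed minimum spacing between points
```

If fewer than the target number of points come back, the fallback described above applies: the highlight detection map of the historical video frame is reused directly.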
When feature points in the target feature point set and the historical feature point set are screened, the following steps can be adopted:
s100321: and respectively determining the state vector and the error vector of each feature point in the target feature point set according to whether the historical feature point set has the matching point of each feature point in the target feature point set.
It should be noted that, if the historical video frame and the target video frame both include the same image feature a, the pixel point of the image feature a in the historical video frame is the matching point of the corresponding feature point in the image feature a in the target video frame.
That is to say, the pixels of the same image feature a in different video frames are matching points.
In one embodiment, whether matching points exist in each feature point in the target feature point set is determined according to the position information and the pixel value of each feature point in the historical feature point set and the target feature point set respectively. And respectively aiming at each feature point in the target feature point set, if one feature point has a matching point, the state vector of the feature point is a first vector value, and if not, the state vector is a second vector value.
Optionally, both the first vector value and the second vector value may be set according to an actual application scenario, for example, the first vector value may be 1 to characterize that the feature point is normal, and the second vector value may be 0 to characterize that the feature point is abnormal, which is not limited herein.
In practical application, the manner of determining the matching point may be set according to a practical application scenario, which is not limited herein.
In one embodiment, the error vector of each feature point in the target feature point set is determined according to the position information and the pixel value of each feature point in the historical feature point set and the target feature point set.
Wherein, the error vector of a feature point indicates the error type of the feature point, i.e. indicates that there is an abnormality in the feature point in the historical video frame and the target video frame. In practical applications, the error type may be set according to practical application scenarios, and is not limited herein.
For example, the error type of the feature point may be set in an attribute (flags) parameter.
S100322: and removing, from the target feature point set, the feature points whose state vectors indicate an abnormality or whose error vectors indicate an error, to obtain the screened target feature point set.
In this way, feature points that may cause interference can be removed from the target feature point set.
Further, if the number of feature points in the target feature point set whose error vectors indicate errors is higher than the error number threshold, the subsequent highlight detection map extraction for the target video frame is stopped, the highlight detection map of the historical video frame is determined as the highlight detection map of the target video frame, and step 102 is executed, so that in the subsequent steps the target video frame can be rendered directly based on the highlight detection map of the historical video frame to obtain a light spot rendering map.
In practical applications, the threshold value of the number of errors may be set according to practical application scenarios, for example, 6, and is not limited herein.
This is because, if there are many erroneous feature points included in the target feature point set, it is indicated that there is an abnormality in the target video frame, and therefore, the target video frame is directly subjected to flare rendering based on the highlight detection map of the history video frame.
S100323: and screening the characteristic points of the historical characteristic point set based on the screened target characteristic point set.
In one embodiment, pixel points which do not match any feature point of the filtered target feature point set are removed from the historical feature point set.
That is to say, the feature points in the filtered historical feature point set are all matching points of a certain feature point in the filtered target feature point set. The matching points are determined based on the similarity between the feature points.
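A sketch of S100321-S100323, screening both point sets with the status and error vectors returned by the LK call; keeping index alignment between the two arrays preserves the matching-point pairing, and the error threshold is an assumed value:

```python
import numpy as np

def screen_matched_points(prev_pts, cur_pts, status, err, max_err=30.0):
    """Sketch of S100321-S100323: keep only point pairs whose state vector
    marks a found match (status == 1) and whose error vector stays below
    the threshold; the same row index pairs historical and target points."""
    keep = (status.ravel() == 1) & (err.ravel() < max_err)
    return prev_pts[keep], cur_pts[keep]
```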
When S10033 is executed, that is, the feature offset is determined according to the position information of the plurality of feature points in the historical feature point set and the position information of the plurality of feature points in the target feature point set, the following steps may be adopted:
s100331: and respectively determining the matching point of each feature point in the target feature point set from the feature points in the historical feature point set.
S100332: respectively determining a horizontal coordinate difference value between the horizontal coordinate of each feature point in the target feature point set and the horizontal coordinate of the corresponding matching point;
s100333: respectively determining a vertical coordinate difference value between the vertical coordinate of each feature point in the target feature point set and the vertical coordinate of the corresponding matching point;
s100334: and determining the characteristic offset according to each horizontal coordinate difference value and each vertical coordinate difference value.
Specifically, the characteristic offset amount is determined based on the sum of the respective abscissa difference values and the sum of the respective ordinate difference values.
Wherein the characteristic offset is positively correlated with the sum of the differences of the horizontal coordinates and positively correlated with the sum of the differences of the vertical coordinates.
In one embodiment, a sum of the respective abscissa difference values is determined, a sum of the abscissa difference values is obtained, a sum of the respective ordinate difference values is determined, a sum of the ordinate difference values is obtained, and the characteristic offset is obtained based on the sum of the abscissa difference values and the sum of the ordinate difference values.
Optionally, the horizontal coordinate difference sum and the vertical coordinate difference sum may be determined using the following formulas:

Sum_x = Σ_{i=1}^{len(V_cur)} ( V_cur[i].x - V_pre[i].x )

Sum_y = Σ_{i=1}^{len(V_cur)} ( V_cur[i].y - V_pre[i].y )

wherein Sum_x is the sum of the horizontal coordinate differences, Sum_y is the sum of the vertical coordinate differences, V_cur is the target feature point set, len(V_cur) is the total number of feature points in the target feature point set, i is the serial number of a feature point (feature points with the same serial number are matching points of each other), V_pre is the historical feature point set, x is the abscissa of a feature point, and y is the ordinate of a feature point.
Optionally, the characteristic offset may be determined using the following formula, i.e., as the magnitude of the mean displacement of the matched feature points:

offset_dist = sqrt( Sum_x^2 + Sum_y^2 ) / len(V_cur)

wherein offset_dist is the characteristic offset, Sum_x is the sum of the horizontal coordinate differences, Sum_y is the sum of the vertical coordinate differences, and len(V_cur) is the total number of feature points in the target feature point set.
In this way, the feature offset between the target video frame and the historical video frame, i.e., the image deviation between the target video frame and the historical video frame, can be determined.
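S100331-S100334 in code, following the offset_dist formula above; matched points are assumed to sit at the same index in both arrays, as produced by the screening sketch:

```python
import numpy as np

def feature_offset(prev_pts, cur_pts):
    """Sketch of S100331-S100334: characteristic offset as
    sqrt(Sum_x^2 + Sum_y^2) / len(V_cur) over matched point pairs."""
    diff = cur_pts.reshape(-1, 2) - prev_pts.reshape(-1, 2)
    sum_x, sum_y = diff[:, 0].sum(), diff[:, 1].sum()
    return float(np.hypot(sum_x, sum_y)) / len(diff)
```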
Step 101: and obtaining a highlight detection map corresponding to the target video frame based on the characteristic offset and the highlight detection map corresponding to the historical video frame.
Specifically, when step 101 is executed, the following two ways may be adopted:
the first mode is as follows: and if the characteristic offset is lower than the offset threshold, determining the highlight detection map corresponding to the historical video frame as the highlight detection map corresponding to the target video frame.
Specifically, if the characteristic offset is lower than the offset threshold, a highlight detection map of the historical video frame is obtained, and the highlight detection map of the historical video frame is determined as a highlight detection map corresponding to the target video frame.
In practical applications, the offset threshold may be set according to practical application scenarios, for example, the offset threshold is 15 pixels, and is not limited herein.
It should be noted that the highlight detection map of the first frame video frame in the video to be processed is obtained by directly performing highlight detection on the first frame video frame, and the target video frame is not the first frame video frame.
If the characteristic offset is lower than the offset threshold, there is no obvious offset between the target video frame and the historical video frame, so the highlight detection map of the historical video frame can simply be reused to render the target video frame, instead of performing highlight detection on the target video frame.
The second way is: and if the characteristic offset is not lower than the offset threshold, performing weighted fusion processing on the initial highlight detection graph corresponding to the target video frame and the highlight detection graph corresponding to the historical video frame to obtain the highlight detection graph corresponding to the target video frame.
Specifically, if the characteristic offset is not lower than the offset threshold, highlight detection processing is performed on the target video frame to obtain an initial highlight detection image, and weighting fusion processing is performed on the initial highlight detection image and the highlight detection image of the historical video frame to obtain a highlight detection image corresponding to the target video frame.
In one embodiment, the following steps are performed for each pixel point in the initial highlight detection map:
obtain a first pixel value of the pixel point and a second pixel value of the corresponding pixel point in the highlight detection map of the historical video frame, perform a weighted summation of the first pixel value and the second pixel value, and update the first pixel value of the pixel point to the weighted-sum value.
In one embodiment, the weights may be adjusted using an optical flow algorithm and the timing information of the video frames, so as to further reduce light spot flicker in the video.
In practical applications, the weights used in the weighted summation may be set according to practical application scenarios, for example, the weights are all 0.5, and are not limited herein.
If the characteristic offset is not lower than the offset threshold, there is an obvious offset between the target video frame and the historical video frame, and the initial highlight detection map of the target video frame needs to be adjusted based on the highlight detection map of the historical video frame, so that the offset between the two highlight detection maps is reduced and, in turn, the drift of the subsequent light spot effect is reduced.
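Putting the two ways together, step 101 can be sketched as below. This is a minimal illustration, assuming single-channel float highlight maps; detect_highlights is a hypothetical stand-in for the highlight detection step, and the 15-pixel threshold and 0.5/0.5 weights are only the example values mentioned above.

```python
import numpy as np

OFFSET_THRESHOLD = 15.0  # pixels; example value from the description above

def highlight_map_for_frame(target_frame, prev_highlight_map, offset_dist,
                            detect_highlights, w_cur=0.5, w_prev=0.5):
    """Select or fuse the highlight detection map for the target frame."""
    if offset_dist < OFFSET_THRESHOLD:
        # First way: no obvious offset, reuse the historical map directly.
        return prev_highlight_map
    # Second way: obvious offset, detect and then fuse per pixel.
    initial_map = detect_highlights(target_frame).astype(np.float32)
    prev_map = prev_highlight_map.astype(np.float32)
    # Per-pixel weighted summation of the first and second pixel values.
    return w_cur * initial_map + w_prev * prev_map
```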
Step 102: render the target video frame based on the highlight detection map corresponding to the target video frame to obtain a light spot rendering map.
Specifically, depth-of-field processing is performed on the target video frame based on a depth estimation model to obtain a circle-of-confusion (COC) map of the target video frame, and rendering processing is then performed based on the target video frame, the highlight detection map, the COC map and a portrait mask map to obtain the light spot rendering map of the target video frame.
In one embodiment, a background region map is determined based on the portrait mask map and the target video frame; blurring rendering is performed by gathering convolution based on the background region map, the highlight detection map and the COC map to obtain a background rendering map; and the background rendering map and the target video frame are merged to obtain the light spot rendering map, as sketched below.
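The final merge can be illustrated as simple mask-based compositing: keep the sharp portrait from the original frame and take the rendered bokeh background elsewhere. A minimal sketch, assuming a float portrait mask in [0, 1] and that the gathering-convolution background render has already been computed; all names are illustrative.

```python
import numpy as np

def merge_spot_render(frame, background_render, portrait_mask):
    """Merge the background rendering map with the target video frame."""
    m = portrait_mask.astype(np.float32)[..., None]  # broadcast over channels
    out = m * frame.astype(np.float32) \
        + (1.0 - m) * background_render.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```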
In the embodiment of the present application, whether there is an obvious offset between the target video frame and the historical video frame is judged through the image features of the regions of interest in the two frames; if there is, the highlight detection map of the target video frame is adjusted using the highlight detection map of the historical video frame, and light spot rendering is performed on the target video frame based on the adjusted highlight detection map to obtain a light spot rendering map, so that the light spot positions stay close to those in the light spot rendering map of the historical video frame. After each video frame in the video to be processed is rendered in this way, the light spot flicker of the resulting rendered video is low.
Referring to fig. 2, an implementation flow chart of a method for video rendering according to an embodiment of the present application is shown, and a specific implementation flow of the method is as follows:
Step 200: receive a light spot rendering instruction for the video.
Optionally, the video (i.e., the video to be processed) may be captured by the user through the terminal device, received from another device, or read from local storage.
In one embodiment, the light spot rendering instruction is received in response to the user's light spot rendering operation on the video.
In one embodiment, a voice instruction of a user is analyzed to obtain a light spot rendering instruction of the user for a video.
In one embodiment, when a light spot rendering request message for the video sent by another device is received, the user's light spot rendering instruction for the video is obtained.
Step 201: perform segmentation processing on a target video frame in the video to obtain a target video frame annular region map.
The target video frame may be any video frame in the video that has not yet undergone light spot rendering, and the rendering order of the video frames may follow their playing order in the video.
Step 202: acquire, from the video, the annular region map and the historical feature point set of the historical video frame corresponding to the target video frame.
Step 203: obtain a target feature point set of the target video frame based on the target video frame annular region map, the historical video frame annular region map and the historical feature point set.
Step 204: determine the characteristic offset according to the position information of the plurality of feature points in the historical feature point set and the position information of the plurality of feature points in the target feature point set.
Step 205: judge whether the characteristic offset is lower than the offset threshold; if so, execute step 206; otherwise, execute step 207.
Step 206: determine the highlight detection map of the historical video frame as the highlight detection map corresponding to the target video frame, and execute step 209.
Step 207: perform highlight detection processing on the target video frame to obtain an initial highlight detection map.
Step 208: perform weighted fusion processing on the initial highlight detection map and the highlight detection map of the historical video frame to obtain the highlight detection map corresponding to the target video frame.
Step 209: render the target video frame based on the highlight detection map corresponding to the target video frame to obtain a light spot rendering map.
Step 210: judge whether a video frame to be rendered remains in the video; if so, execute step 201; otherwise, execute step 211.
Step 211: end the light spot rendering process.
Specifically, for the implementation of steps 200 to 211, refer to steps 100 to 102 above; the details are not repeated here. A schematic sketch of the loop follows.
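The following sketch restates the loop of steps 200 to 211. It is schematic only: each helper is an assumed placeholder for the corresponding operation described above, not an API defined by this application.

```python
def render_video(frames, segment_ring, extract_points, track_points,
                 detect_highlights, feature_offset, render_spots,
                 offset_threshold=15.0):
    """Run light spot rendering over a sequence of decoded video frames."""
    rendered = []
    prev_ring = prev_pts = prev_hmap = None
    for i, frame in enumerate(frames):
        ring = segment_ring(frame)                        # step 201
        if i == 0:
            # First frame: detect highlights directly, no history exists.
            pts, hmap = extract_points(ring), detect_highlights(frame)
        else:
            # Steps 202-204: track points into the target frame, measure drift.
            pts, matched_prev = track_points(prev_ring, ring, prev_pts)
            if feature_offset(pts, matched_prev) < offset_threshold:
                hmap = prev_hmap                          # steps 205-206
            else:                                         # steps 207-208
                hmap = 0.5 * detect_highlights(frame) + 0.5 * prev_hmap
        rendered.append(render_spots(frame, hmap))        # step 209
        prev_ring, prev_pts, prev_hmap = ring, pts, hmap  # steps 210-211
    return rendered
```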
Fig. 3 is a schematic diagram of an architecture of an image processing system according to an embodiment of the present disclosure. The image processing system in fig. 3 includes a terminal device 301 and a server 302.
The terminal device 301 is configured to send a light spot rendering request message for the video to the server 302, and to receive the light-spot-rendered video returned by the server 302.
Optionally, there may be one or more terminal devices 301, which is not limited herein.
The server 302 is configured to receive the light spot rendering request message sent by the terminal device 301; based on the message, sequentially determine the characteristic offset of each group of video frames (i.e., each video frame combination comprising a target video frame and a historical video frame); obtain the highlight detection map of the target video frame in each group based on the characteristic offset of that group and the highlight detection map of its historical video frame; and generate the light spot rendering map of the target video frame in each group based on that highlight detection map, thereby obtaining the light-spot-rendered video.
In one embodiment, if the target video frame is a current video frame to be processed, and the historical video frame is a previous video frame of the current video frame, a highlight detection map corresponding to the current video frame may be obtained based on a feature offset between the current video frame and the previous video frame and a highlight detection map of the previous video frame, and a light spot rendering map of the current video frame may be obtained according to the highlight detection map corresponding to the current video frame.
Specifically, when the server 302 performs the light spot rendering on each video frame in the video, the specific steps refer to the above step 100 to step 102, which are not described herein again.
Based on the same inventive concept, an embodiment of the present application further provides an image processing apparatus. Since the principle by which the apparatus solves the problem is similar to that of the image processing method, the implementation of the apparatus can refer to the implementation of the method, and repeated details are omitted.
As shown in fig. 4, which is a schematic structural diagram of an image processing apparatus provided in an embodiment of the present application, the image processing apparatus includes:
a determining unit 401, configured to determine a characteristic offset between a target video frame and a historical video frame in a video to be processed, where the historical video frame precedes the target video frame by n frames, and n is a positive integer;
an obtaining unit 402, configured to obtain a highlight detection map corresponding to the target video frame based on the feature offset and the highlight detection map corresponding to the historical video frame;
and a rendering unit 403, configured to perform rendering processing on the target video frame based on the highlight detection map corresponding to the target video frame, so as to obtain a light spot rendering map.
In one embodiment, the feature offset is derived based on the region of interest in the target video frame and the region of interest in the historical video frame;
the region of interest is determined based on a region of a target object contained in the video frame.
In one embodiment, determining unit 401 is configured to:
perform segmentation processing on the historical video frame to obtain a historical video frame annular region map, where the historical video frame annular region map is obtained based on an annular region surrounding the target object in the historical video frame;
perform segmentation processing on the target video frame to obtain a target video frame annular region map, where the target video frame annular region map is obtained based on an annular region surrounding the target object in the target video frame;
and determine the characteristic offset according to the historical video frame annular region map and the target video frame annular region map.
In one embodiment, determining unit 401 is configured to:
extract feature points from the historical video frame annular region map to obtain a historical feature point set, where the historical feature point set comprises a plurality of feature points;
obtain a target feature point set of the target video frame based on the target video frame annular region map, the historical video frame annular region map and the historical feature point set;
and determine the characteristic offset according to the position information of the plurality of feature points in the historical feature point set and the position information of the plurality of feature points in the target feature point set.
In one embodiment, determining unit 401 is configured to:
respectively determine the credibility of each pixel point in the historical video frame annular region map;
screen out a target number of pixel points from the pixel points of the historical video frame annular region map according to the credibility of each pixel point, where the credibility of a screened-out pixel point is higher than that of any pixel point not screened out;
and generate the historical feature point set based on the screened-out pixel points, the feature points in the historical feature point set being the screened-out pixel points (see the sketch below).
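As one plausible realization of this screening (an assumption; the application does not name a specific detector), the per-pixel credibility can be taken as a Shi-Tomasi corner response, keeping only the target number of strongest points:

```python
import cv2
import numpy as np

def extract_history_points(ring_gray, target_count=100, ring_mask=None):
    """Screen a target number of high-credibility points from the ring map.

    ring_gray: single-channel image of the annular region;
    ring_mask: optional 8-bit mask limiting detection to the ring itself.
    """
    pts = cv2.goodFeaturesToTrack(ring_gray, maxCorners=target_count,
                                  qualityLevel=0.01, minDistance=5,
                                  mask=ring_mask)
    # goodFeaturesToTrack returns (N, 1, 2) or None when nothing qualifies.
    return np.empty((0, 2), np.float32) if pts is None else pts.reshape(-1, 2)
```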
In one embodiment, the determining unit 401 is further configured to:
respectively determine a state vector and an error vector of each feature point in the target feature point set according to whether a matching point of that feature point exists in the historical feature point set, where matching points are determined according to the similarity between feature points;
remove, from the target feature point set, the feature points whose state vectors indicate an abnormality or whose error vectors indicate an error, to obtain a screened target feature point set;
and screen the feature points of the historical feature point set based on the screened target feature point set, as sketched below.
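The per-point state and error vectors are consistent with the status and error outputs of pyramidal Lucas-Kanade optical flow; reading them that way is an assumption, but it yields a compact sketch of the matching-and-screening step:

```python
import cv2
import numpy as np

def track_and_screen(prev_ring_gray, cur_ring_gray, prev_points, max_err=20.0):
    """Track historical points into the target ring map and screen both sets."""
    p0 = prev_points.reshape(-1, 1, 2).astype(np.float32)
    p1, status, err = cv2.calcOpticalFlowPyrLK(prev_ring_gray,
                                               cur_ring_gray, p0, None)
    # Keep only points whose state is normal and whose error is small.
    ok = (status.ravel() == 1) & (err.ravel() < max_err)
    return p1.reshape(-1, 2)[ok], prev_points.reshape(-1, 2)[ok]
```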
In one embodiment, determining unit 401 is configured to:
respectively determining a matching point corresponding to each feature point in a target feature point set from all feature points in a historical feature point set, wherein the matching points are determined according to the similarity between the feature points;
respectively determining a horizontal coordinate difference value between the horizontal coordinate of each feature point in the target feature point set and the horizontal coordinate of the corresponding matching point;
respectively determining a vertical coordinate difference value between the vertical coordinate of each feature point in the target feature point set and the vertical coordinate of the corresponding matching point;
and determining the characteristic offset according to each horizontal coordinate difference value and each vertical coordinate difference value.
In one embodiment, the obtaining unit 402 is configured to:
if the characteristic offset is not lower than the offset threshold, performing weighted fusion processing on the initial highlight detection graph corresponding to the target video frame and the highlight detection graph corresponding to the historical video frame to obtain a highlight detection graph corresponding to the target video frame;
and if the characteristic offset is lower than the offset threshold, determining the highlight detection map corresponding to the historical video frame as the highlight detection map corresponding to the target video frame.
Fig. 5 shows a schematic structural diagram of an electronic device 5000. Referring to fig. 5, the electronic device 5000 includes a processor 5010 and a memory 5020, and may optionally include a power supply 5030, a display unit 5040, and an input unit 5050.
The processor 5010 is a control center of the electronic apparatus 5000, connects various components using various interfaces and lines, and performs various functions of the electronic apparatus 5000 by running or executing software programs and/or data stored in the memory 5020, thereby monitoring the electronic apparatus 5000 as a whole.
In the embodiment of the present application, the processor 5010 executes each step in the above embodiments when calling a computer program stored in the memory 5020.
Optionally, the processor 5010 may include one or more processing units; preferably, the processor 5010 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, and a modem processor, which mainly handles wireless communication. It is to be appreciated that the modem processor may also not be integrated into the processor 5010. In some embodiments, the processor and the memory may be implemented on a single chip; in other embodiments, they may be implemented on separate chips.
The memory 5020 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, various applications, and the like; the storage data area may store data created according to the use of the electronic device 5000, and the like. Further, the memory 5020 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The electronic device 5000 also includes a power supply 5030 (e.g., a battery) that provides power to the various components and that may be logically connected to the processor 5010 via a power management system to provide management of charging, discharging, and power consumption via the power management system.
The display unit 5040 may be configured to display information input by the user or information provided to the user, as well as the various menus of the electronic device 5000. In the embodiment of the present application, the display unit is mainly configured to display the display interface of each application in the electronic device 5000 and objects such as text and pictures shown in the display interface. The display unit 5040 may include a display panel 5041. The display panel 5041 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The input unit 5050 may be used to receive information such as numbers or characters input by a user. Input units 5050 may include touch panel 5051 as well as other input devices 5052. Among other things, the touch panel 5051, also referred to as a touch screen, may collect touch operations by a user on or near the touch panel 5051 (e.g., operations by a user on or near the touch panel 5051 using a finger, a stylus, or any other suitable object or attachment).
Specifically, the touch panel 5051 may detect a touch operation by a user, detect signals resulting from the touch operation, convert the signals into touch point coordinates, transmit the touch point coordinates to the processor 5010, and receive and execute a command sent from the processor 5010. In addition, the touch panel 5051 may be implemented in various types, such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. Other input devices 5052 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, power on/off keys, etc.), a trackball, a mouse, a joystick, and the like.
Of course, the touch panel 5051 may cover the display panel 5041, and when the touch panel 5051 detects a touch operation thereon or thereabout, it is transmitted to the processor 5010 to determine the type of touch event, and then the processor 5010 provides a corresponding visual output on the display panel 5041 according to the type of touch event. Although in fig. 5, the touch panel 5051 and the display panel 5041 are implemented as two separate components to implement input and output functions of the electronic device 5000, in some embodiments, the touch panel 5051 and the display panel 5041 may be integrated to implement input and output functions of the electronic device 5000.
The electronic device 5000 may also include one or more sensors, such as pressure sensors, gravitational acceleration sensors, proximity light sensors, and the like. Of course, the electronic device 5000 may further include other components such as a camera according to the requirements of a specific application, and these components are not shown in fig. 5 and are not described in detail since they are not components used in this embodiment of the present application.
Those skilled in the art will appreciate that fig. 5 is merely an example of an electronic device and is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or different components.
In an embodiment of the present application, a computer-readable storage medium stores computer program instructions thereon, and when the computer program instructions are read and executed by a processor, the computer program instructions perform the steps in the above embodiments.
In an embodiment of the present application, a computer program product includes computer program instructions, and when the computer program instructions are read and executed by a processor, the computer program instructions perform the steps in the above embodiments.
For convenience of description, the above parts are separately described as modules (or units) according to functional division. Of course, the functionality of the various modules (or units) may be implemented in the same one or more pieces of software or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (11)

1. A method of image processing, comprising:
determining a characteristic offset between a target video frame and a historical video frame in a video to be processed, wherein the historical video frame precedes the target video frame by n frames, and n is a positive integer;
obtaining a highlight detection graph corresponding to the target video frame based on the characteristic offset and the highlight detection graph corresponding to the historical video frame;
and rendering the target video frame based on the highlight detection image corresponding to the target video frame to obtain a light spot rendering image.
2. The method of claim 1, wherein the feature offset is derived based on a region of interest in the target video frame and a region of interest in the historical video frame;
the region of interest is determined based on a region of a target object contained in a video frame.
3. The method of claim 2, wherein the region of interest is an annular region map, and the determining the characteristic offset between the target video frame and the historical video frame in the video to be processed comprises:
performing segmentation processing on the historical video frame to obtain a historical video frame annular region map, wherein the historical video frame annular region map is obtained on the basis of an annular region surrounding a target object in the historical video frame;
segmenting the target video frame to obtain a target video frame annular region image, wherein the target video frame annular region image is obtained on the basis of an annular region surrounding a target object in the target video frame;
and determining the characteristic offset according to the historical video frame annular region graph and the target video frame annular region graph.
4. The method of claim 3, wherein determining the feature offset based on the historical video frame annular region map and the target video frame annular region map comprises:
extracting characteristic points of the historical video frame annular region graph to obtain a historical characteristic point set, wherein the historical characteristic point set comprises a plurality of characteristic points;
obtaining a target feature point set of the target video frame based on the target video frame annular region graph, the historical video frame annular region graph and the historical feature point set;
and determining the characteristic offset according to the position information of the plurality of characteristic points in the historical characteristic point set and the position information of the plurality of characteristic points in the target characteristic point set.
5. The method of claim 4, wherein said extracting feature points from the ring region map of the historical video frame to obtain a historical feature point set comprises:
respectively determining the credibility of each pixel point in the historical video frame annular region graph;
screening out a target number of pixel points from all pixel points of the historical video frame annular region graph according to the credibility of all the pixel points, wherein the credibility of the screened pixel points is higher than the credibility of the pixel points which are not screened out;
and generating the historical characteristic point set based on the screened pixel points, wherein the characteristic points in the historical characteristic point set are the screened pixel points.
6. The method according to claim 4, further comprising, before determining the characteristic offset according to the position information of the plurality of feature points in the historical feature point set and the position information of the plurality of feature points in the target feature point set:
respectively determining a state vector and an error vector of each feature point in the target feature point set according to whether a matching point of each feature point in the target feature point set exists in the historical feature point set, wherein the matching point is determined according to the similarity between the feature points;
removing, from the target feature point set, the feature points whose state vectors indicate an abnormality or whose error vectors indicate an error, to obtain a screened target feature point set;
and screening the characteristic points of the historical characteristic point set based on the screened target characteristic point set.
7. The method according to any one of claims 4 to 6, wherein the determining the feature offset according to the position information of the plurality of feature points in the historical feature point set and the position information of the plurality of feature points in the target feature point set comprises:
respectively determining a matching point corresponding to each feature point in the target feature point set from each feature point in the historical feature point set, wherein the matching points are determined according to the similarity between the feature points;
respectively determining a horizontal coordinate difference value between the horizontal coordinate of each feature point in the target feature point set and the horizontal coordinate of the corresponding matching point;
respectively determining a vertical coordinate difference value between the vertical coordinate of each feature point in the target feature point set and the vertical coordinate of the corresponding matching point;
and determining the characteristic offset according to each horizontal coordinate difference value and each vertical coordinate difference value.
8. The method of any one of claims 1-7, wherein obtaining the highlight detection map corresponding to the target video frame based on the feature offset and the highlight detection map corresponding to the historical video frame comprises:
if the characteristic offset is not lower than an offset threshold, performing weighted fusion processing on the initial highlight detection image corresponding to the target video frame and the highlight detection image corresponding to the historical video frame to obtain a highlight detection image corresponding to the target video frame;
and if the characteristic offset is lower than an offset threshold, determining the highlight detection map corresponding to the historical video frame as the highlight detection map corresponding to the target video frame.
9. An electronic device, comprising: a processor; and a memory having stored therein computer program instructions which, when read and executed by the processor, perform the method of any one of claims 1-8.
10. A computer-readable storage medium having computer program instructions stored thereon, which when read and executed by a processor, perform the method of any one of claims 1-8.
11. A computer program product comprising computer program instructions which, when read and executed by a processor, perform the method of any one of claims 1 to 8.
CN202210223753.3A 2022-03-09 2022-03-09 Image processing method, computer program product, electronic device, and storage medium Pending CN114827706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210223753.3A CN114827706A (en) 2022-03-09 2022-03-09 Image processing method, computer program product, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210223753.3A CN114827706A (en) 2022-03-09 2022-03-09 Image processing method, computer program product, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN114827706A true CN114827706A (en) 2022-07-29

Family

ID=82527995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210223753.3A Pending CN114827706A (en) 2022-03-09 2022-03-09 Image processing method, computer program product, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN114827706A (en)

Citations (6)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107682639A (en) * 2017-11-16 2018-02-09 维沃移动通信有限公司 A kind of image processing method, device and mobile terminal
CN111586444A (en) * 2020-06-05 2020-08-25 广州繁星互娱信息科技有限公司 Video processing method and device, electronic equipment and storage medium
WO2021254134A1 (en) * 2020-06-19 2021-12-23 杭州海康威视数字技术股份有限公司 Privacy shielding processing method and apparatus, electronic device, and monitoring system
CN113298735A (en) * 2021-06-22 2021-08-24 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN113538270A (en) * 2021-07-09 2021-10-22 厦门亿联网络技术股份有限公司 Portrait background blurring method and device
CN113973190A (en) * 2021-10-28 2022-01-25 联想(北京)有限公司 Video virtual background image processing method and device and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TAILE NI, YINGSHUANG CHEN: "Detection of real-time augmented reality scene light sources and construction of photorealistic rendering framework", Journal of Real-Time Image Processing, 30 September 2020 (2020-09-30) *

Similar Documents

Publication Publication Date Title
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
WO2019218824A1 (en) Method for acquiring motion track and device thereof, storage medium, and terminal
CN108229277B (en) Gesture recognition method, gesture control method, multilayer neural network training method, device and electronic equipment
CN111178183B (en) Face detection method and related device
CN113395542B (en) Video generation method and device based on artificial intelligence, computer equipment and medium
CN111461089A (en) Face detection method, and training method and device of face detection model
CN111444826B (en) Video detection method, device, storage medium and computer equipment
CN112381104A (en) Image identification method and device, computer equipment and storage medium
CN110222572A (en) Tracking, device, electronic equipment and storage medium
WO2023165616A1 (en) Method and system for detecting concealed backdoor of image model, storage medium, and terminal
CN111898561A (en) Face authentication method, device, equipment and medium
CN115082291A (en) Method for adjusting image brightness, computer program product, electronic device and medium
CN111259757A (en) Image-based living body identification method, device and equipment
CN115147936A (en) Living body detection method, electronic device, storage medium, and program product
CN115049675A (en) Generation area determination and light spot generation method, apparatus, medium, and program product
CN112883827B (en) Method and device for identifying specified target in image, electronic equipment and storage medium
CN113822263A (en) Image annotation method and device, computer equipment and storage medium
CN111310595B (en) Method and device for generating information
CN113570615A (en) Image processing method based on deep learning, electronic equipment and storage medium
CN114827706A (en) Image processing method, computer program product, electronic device, and storage medium
CN115620054A (en) Defect classification method and device, electronic equipment and storage medium
Aravindan et al. A Smart Assistive System for Visually Impaired to Inform Acquaintance Using Image Processing (ML) Supported by IoT
CN115002516A (en) System, method, electronic device, storage medium, and program product for video processing
CN116862920A (en) Portrait segmentation method, device, equipment and medium
CN111382628B (en) Method and device for judging peer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination