CN116596774A - Image detail enhancement method, device, equipment and storage medium of target area


Info

Publication number
CN116596774A
Authority
CN
China
Prior art keywords
target area
result
image
enhancement
image detail
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310391126.5A
Other languages
Chinese (zh)
Inventor
杨莹
靳凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Singapore Pte Ltd
Original Assignee
Bigo Technology Singapore Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Singapore Pte Ltd filed Critical Bigo Technology Singapore Pte Ltd
Priority to CN202310391126.5A priority Critical patent/CN116596774A/en
Publication of CN116596774A publication Critical patent/CN116596774A/en
Pending legal-status Critical Current


Classifications

    • G06T 5/94 Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/0475 Generative networks
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06T 3/02 Affine transformations
    • G06T 3/4046 Scaling of whole images or parts thereof using neural networks
    • G06T 3/60 Rotation of whole images or parts thereof
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 5/70 Denoising; Smoothing
    • G06T 5/73 Deblurring; Sharpening
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G06T 2207/10016 Video; Image sequence (indexing scheme for image acquisition modality)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a method, apparatus, device and storage medium for enhancing image details of a target area, the method comprising the following steps: acquiring a video to be processed and judging whether it meets an image detail enhancement condition; if the condition is met, performing template transformation processing of a target area on a frame image of the video to obtain an input matrix of the target area; inputting the input matrix into an enhancement network model and generating an enhancement result of the target area based on extracted multilayer features, hidden vectors and random signals; and applying the inverse of the template transformation to the enhancement result of the target area, then fusing the inverse-transformed result with the original frame image to obtain an image detail enhancement result. The scheme improves image processing quality and accuracy, and hence the user's viewing experience, while enhancing image detail, precisely controlling the amount of computation, and ensuring processing efficiency.

Description

Image detail enhancement method, device, equipment and storage medium of target area
Technical Field
The embodiment of the application relates to the technical field of video data processing, in particular to a method, a device, equipment and a storage medium for enhancing image details of a target area.
Background
With the growing maturity of live-video technology and the popularization of related applications, the quality of presented image content has become a core competitive factor that every live-streaming application must attend to. When watching a video program, viewers show a clear preference for, and dwell on, specific areas of interest, such as a face area, the area where an object is located, or a scenic area, so enhancing the detail of such specific areas is an important part of improving the subjective quality of the video.
Deep convolutional neural networks are currently used to obtain high-resolution, high-definition images without detail loss by downsampling an image several times and then upsampling it the same number of times, making full use of the image features learned by the different network layers.
However, such convolutional neural networks do not treat specific regions of the image specially: although a general super-resolution effect can be obtained for a specific region, the detail features of that region cannot be well restored. How to improve the presentation of specific areas in an image and meet users' demand for high-quality content in each video application scenario, thereby improving the user experience and user retention time, is therefore a problem to be solved in the field.
Disclosure of Invention
The embodiment of the application provides a method, apparatus, device and storage medium for enhancing image details of a target area, which solve the problem in the prior art that detail features of specific regions in a video cannot be well restored, degrading video quality. With this scheme, image processing quality and precision can be improved, and hence the user's viewing experience, while image detail is enhanced, the amount of computation is precisely controlled, and processing efficiency is guaranteed.
In a first aspect, an embodiment of the present application provides a method for enhancing image details of a target area, where the method includes:
acquiring a video to be processed, and judging whether the video to be processed accords with an image detail enhancement condition;
if the image detail enhancement condition is met, carrying out template transformation processing of a target area on the frame image of the video to be processed to obtain an input matrix of the target area;
inputting the input matrix into an enhanced network model, and generating an enhanced result of the target area based on the extracted multilayer features, hidden vectors and random signals;
and performing, on the enhancement result of the target area, the inverse of the template transformation processing, and fusing the inverse-transformed result with the original frame image to obtain an image detail enhancement result.
In a second aspect, an embodiment of the present application further provides an image detail enhancement apparatus for a target area, including:
the acquisition module is used for acquiring the video to be processed and judging whether the video to be processed accords with the image detail enhancement condition or not;
the frame image processing module is used for performing, if the image detail enhancement condition is met, template transformation processing of a target area on the frame image of the video to be processed to obtain an input matrix of the target area;
the target area enhancement result generation module is used for inputting the input matrix into an enhancement network model and generating an enhancement result of the target area based on the extracted multilayer features, hidden vectors and random signals;
and the image detail enhancement result generation module is used for performing, on the enhancement result of the target area, the inverse of the template transformation processing, and fusing the inverse-transformed result with the original frame image to obtain an image detail enhancement result.
In a third aspect, an embodiment of the present application further provides an image detail enhancement apparatus for a target area, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for enhancing image details of a target area according to the embodiment of the present application.
In a fourth aspect, embodiments of the present application further provide a storage medium storing computer-executable instructions that, when executed by a computer processor, are configured to perform the image detail enhancement method of a target area according to embodiments of the present application.
In a fifth aspect, the embodiment of the present application further provides a computer program product, where the computer program product includes a computer program, where the computer program is stored in a computer readable storage medium, and where at least one processor of the device reads and executes the computer program from the computer readable storage medium, so that the device performs the image detail enhancing method of the target area according to the embodiment of the present application.
In the embodiment of the application, a video to be processed is acquired and it is judged whether the video meets an image detail enhancement condition; if the condition is met, template transformation processing of a target area is performed on the frame image of the video to obtain an input matrix of the target area; the input matrix is input into an enhancement network model, and an enhancement result of the target area is generated based on the extracted multilayer features, hidden vectors and random signals; finally, the inverse of the template transformation is applied to the enhancement result of the target area, and the inverse-transformed result is fused with the original frame image to obtain the image detail enhancement result. This image detail enhancement method for a target area improves image processing quality and precision, and hence the user's viewing experience, while enhancing image detail, precisely controlling the amount of computation, and ensuring processing efficiency.
Drawings
Fig. 1 is a flowchart of a method for enhancing image details of a target area according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of a network structure with enhanced image details of a target area according to a first embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a step of a restoration process for enhancing image details of a target area according to a first embodiment of the present application;
fig. 4 is a flowchart of a method for enhancing image details of a target area according to a second embodiment of the present application;
fig. 5 is a schematic structural diagram of an image detail enhancement device for a target area according to a third embodiment of the present application;
fig. 6 is a schematic structural diagram of an image detail enhancement device for a target area according to a fourth embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in further detail below with reference to the drawings and examples. It should be understood that the particular embodiments described herein are illustrative only and are not limiting of embodiments of the application. It should be further noted that, for convenience of description, only some, but not all of the structures related to the embodiments of the present application are shown in the drawings.
The terms "first", "second" and the like in the description and claims are used to distinguish between similar objects and do not describe a particular sequence or chronological order. It is to be understood that data so termed may be interchanged where appropriate, so that embodiments of the present application can be implemented in orders other than those illustrated or described herein. Objects identified by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more than one. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
Example 1
Fig. 1 is a flowchart of a method for enhancing image details of a target area according to an embodiment of the present application. As shown in fig. 1, the method specifically comprises the following steps:
s101, acquiring a video to be processed, and judging whether the video to be processed meets the image detail enhancement condition.
First, a usage scenario of this scheme may be one in which an intelligent terminal processes a video to be processed, performs detail enhancement on the target area of each frame image of the video, and outputs the detail-enhanced images.
Based on the above usage scenario, it can be understood that the execution subject of the present application may be that intelligent terminal, which is not unduly limited herein.
In this scheme, the video to be processed may refer to a video file that needs image detail enhancement processing; exactly which video files need it is determined by the specific application scenario and requirements. For example, in the field of video editing, image detail enhancement is applied to recorded original video to improve picture quality and visual effect.
The video to be processed may be acquired in several ways (a minimal sketch follows this list):
1. If the video is already stored locally, it can be obtained directly by reading the file.
2. If video captured by a camera needs to be processed in real time, it can be captured using the camera of a computer or mobile device.
3. If an online video stream needs to be processed, the stream can be retrieved via a network request and converted into a processable format.
4. Some video processing APIs (Application Programming Interfaces) provide methods for capturing video; the video to be processed can be obtained by calling such an API.
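A minimal OpenCV sketch of these acquisition paths is given below; the file path and stream URL are placeholders, and OpenCV itself is an illustrative choice rather than something the patent prescribes.

```python
import cv2

# 1. Read a locally stored video file (path is a placeholder).
cap = cv2.VideoCapture("pending_video.mp4")

# 2. Or capture live frames from the default camera (device index 0):
# cap = cv2.VideoCapture(0)

# 3. Or pull an online video stream (URL is a placeholder):
# cap = cv2.VideoCapture("rtsp://example.com/live/stream")

while cap.isOpened():
    ok, frame = cap.read()      # frame is an H x W x 3 BGR array
    if not ok:
        break
    # ... per-frame detail enhancement would run here ...
cap.release()
```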
When the video to be processed contains a part that needs detail enhancement, it can be regarded as meeting the image detail enhancement condition. In this scheme, when the main subject of the video to be processed is a person, the details of the person's face can be enhanced; correspondingly, the image detail enhancement condition can be set as: the video frame contains a face part, and the area occupied by the face accounts for less than 90% of the standard face area.
Take a video whose main subject is a person and whose face details need to be enhanced as an example of judging whether the video meets the image detail enhancement condition. First, an algorithm module reads in a single-frame YUV/RGB image of size H x W with pixel values in the range [0, 255] and feeds it into an existing face detector to judge whether the video contains a face; videos without a face are skipped and left unprocessed. For an image frame containing a face, the face detector yields a rectangular detection box containing the face; from the coordinates of this box, the width and height of the contained face can be computed, denoted face_W and face_H respectively. With the standard face size being 512 x 512, the image detail enhancement condition is met when the ratio of the face area to the standard face area is less than 90%. The ratio is computed as:
ratio = (face_W × face_H) / (512 × 512)
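The check described above might be sketched as follows; the Haar-cascade detector stands in for the "existing face detector" and is an illustrative assumption, as is taking only the first detected face.

```python
import cv2

def meets_enhancement_condition(frame_bgr):
    """True when a face is present and its area is < 90% of the 512 x 512
    standard face size, per the ratio formula above."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray)
    if len(faces) == 0:
        return False                       # face-free frames are skipped
    _, _, face_W, face_H = faces[0]
    ratio = (face_W * face_H) / (512 * 512)
    return ratio < 0.9
```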
S102, if the image detail enhancement condition is met, performing template transformation processing of a target area on the frame image of the video to be processed to obtain an input matrix of the target area.
The video to be processed consists of a series of successive frame images, each frame being a still image. In a computer, a video is typically composed of a series of image frames arranged in a time sequence, with the faster the playback speed, the shorter the time interval between adjacent frames.
The target area may be a specific area in the video frame image to be processed, which is selected depending on the specific application scenario and requirements. In this scheme, the target area may be a rectangular area where the face is located.
In the template transformation processing of the target area, the input matrix of the target area may be a matrix obtained by converting the target area into a matrix form, and may include pixel values, pixel coordinates, color information, and the like in the target area.
The input matrix of the target area can be obtained as follows (a sketch follows this list):
1. Determine the target area to be processed in the video frame image according to the specific application requirements.
2. Perform feature extraction on the target area; for example, its features may be represented in vector or matrix form using methods such as color histograms, texture features and shape descriptors.
3. Match the extracted target-area features against the whole frame image to find the image region most similar to the target area. The matching may use a deep learning method such as a convolutional neural network.
4. Based on the matching result, compute a transformation matrix that maps the features of the target area into the whole image. The transformation matrix may be an affine transformation, a projective transformation, or the like.
5. Apply the transformation matrix: apply the computed transformation matrix to the whole frame image, realizing enhancement, deformation or other processing of the image.
6. Extract the input matrix of the target area: from the transformed target area, extract pixel values, pixel coordinates, color information and the like, and convert them into matrix form to obtain the input matrix of the target area.
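For the common case in the embodiments below, where the target area is a face and the template transformation is an affine alignment onto a 512 x 512 template with pixel values normalized to [-1, 1], a minimal sketch might look like this. The canonical five-point coordinates are illustrative assumptions, not values from the patent.

```python
import cv2
import numpy as np

# Assumed canonical positions of 5 facial keypoints (eyes, nose, mouth
# corners) in the 512 x 512 template.
TEMPLATE_5PTS = np.float32(
    [[192, 240], [320, 240], [256, 314], [201, 371], [313, 371]])

def face_to_input_matrix(frame_bgr, src_5pts):
    """Warp the detected face onto the 512 x 512 template and normalize
    pixel values from [0, 255] to [-1, 1], yielding the input matrix."""
    M, _ = cv2.estimateAffinePartial2D(np.float32(src_5pts), TEMPLATE_5PTS)
    aligned = cv2.warpAffine(frame_bgr, M, (512, 512))
    input_matrix = aligned.astype(np.float32) / 127.5 - 1.0
    return input_matrix, M          # M is reused later for the inverse warp
```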
And S103, inputting the input matrix into an enhanced network model, and generating an enhanced result of the target area based on the extracted multilayer features, hidden vectors and random signals.
The enhanced network model may refer to a deep learning-based image enhancement method that can enhance each frame of an input video to improve the quality and visual effect of the video, typically using a convolutional neural network or similar neural network architecture to learn the mapping from the input image to the output image. In the training process, the enhancement network model learns by using the existing correspondence between the high-quality image and the low-quality image, so as to learn how to extract more information from the low-quality image to generate a higher-quality image. During testing, the model takes the input low-quality image as input, and generates a higher-quality output image through the learned mapping relation.
The multi-layer features may be features extracted from multiple convolutional or other layers in the neural network. These features may represent different levels of information in the input data, from low-level image texture to high-level semantic information, etc.
Hidden vectors may refer to potential spatial vectors generated in certain layers in a neural network, generally having lower dimensions, and may represent abstract features in the input data.
The random signal may be some noise or random variable added in the enhanced network model, typically used to increase the generalization ability and robustness of the model.
The enhancement result of the target area may be a result generated after the image enhancement operation performed on the target area specified in the input matrix by the enhanced network model. Specifically, in the enhanced network model, the input matrix is first divided into a plurality of different regions, and then each region is separately subjected to an enhancement operation. In the enhancement operation, the model generates a corresponding target region enhancement result, which may be an enhanced image, feature map or other form of output, based on the input multi-layer features, hidden vectors and random signals.
In this scheme, taking face recognition as an example, if the target area is a face area, template transformation processing is performed on the face area to obtain an original face scaled to 512 x 512 with pixel values normalized to the interval [-1, 1]. The input first passes through several convolution layers that perform feature extraction and downsampling, hierarchically extracting the core features of the lower-quality original face. The downsampled features are then flattened and fed into a small fully connected network to obtain a hidden vector representing the input original face. The downsampled feature maps obtained layer by layer are combined with this hidden vector and, together with some random signals, input into the Generator of the enhancement network model. Passing step by step through several generator blocks (GAN blocks) with an upsampling function, enhanced face feature maps of successively higher resolutions are recovered layer by layer, iterating until an enhancement result of the face region consistent with the 512 x 512 input face is obtained.
On the basis of the above technical solutions, optionally, inputting the input matrix into an enhanced network model, generating an enhanced result of the target area based on the extracted multi-layer features, hidden vectors and random signals, including:
inputting the input matrix into a convolution layer of an enhanced network model to obtain a multi-layer feature;
inputting the characteristics output by the last convolution layer into a linear connection layer to obtain hidden vectors of the frame image;
inputting the multilayer features, the hidden vectors and the pre-obtained random signals into a generator of the enhanced network model to obtain an enhanced result of the target area; the random signal is random noise which is obtained based on a signal generation network and accords with Gaussian distribution.
In this scheme, the convolution layers may be used to extract multilayer features of the input image for subsequent tasks such as classification, detection and segmentation. A convolution layer slides convolution kernels over the input image to perform the convolution operation, generating a series of convolution feature maps, each corresponding to one convolution kernel. As depth increases, the convolutional neural network can extract increasingly abstract features, improving the performance of the model.
The input matrix of the target region is input to a convolution layer of the enhanced network model, and the feature extraction can be performed on the input matrix by using convolution operation. The function of the convolution layer is to extract features from the input matrix by performing a convolution operation using a series of learnable convolution kernels to the input matrix. The convolution operation can carry out convolution operation on each pixel point of the input matrix and surrounding pixel points to obtain a series of convolution characteristic diagrams. By using multiple convolution layers, features of different layers can be extracted to form a multi-layer feature representation for use in subsequent hidden vector generation and enhancement processing.
The linear connection layer may be a layer type commonly used in deep learning. The function of the method is to map the input feature matrix into a new feature space by means of matrix multiplication and bias term addition. In convolutional neural networks, the feature matrix output by the last convolutional layer is usually taken as an input, flattened, and then connected to a full-connection layer.
In deep learning, the linear connection layer is also called a full connection layer, and the function of the linear connection layer is to perform the reduction and reconstruction of the feature map extracted by the convolution layer, so as to obtain a more abstract and high-dimensional feature representation. For the feature output by the last convolution layer, the feature map is usually subjected to dimension reduction by using global pooling operation, and then the dimension-reduced feature is reconstructed through a full connection layer to obtain the final hidden vector representation.
The generator of the enhanced network model may refer to a neural network model capable of generating enhanced results from hidden vectors, random signals, and multi-layer features. The method is mainly used for synthesizing the input hidden vectors, the random signals and the multilayer characteristics to generate an enhanced result.
The signal generation network may be a deep learning model whose purpose is to generate some random signals that conform to a particular distribution. In the generator model, random noise may be obtained by sampling from a known distribution. Wherein a gaussian distribution, also known as a normal distribution, is a continuous probability distribution. Random noise conforming to a gaussian distribution can be obtained by sampling from a gaussian distribution with a mean of 0 and a variance of 1.
The multilayer features, hidden vector and random signal are concatenated and fed as input into the generator network. The generator network maps these inputs into an enhancement result of the target area: it can upsample step by step using deconvolution (transposed convolution) layers while applying convolution operations and activation functions, gradually restoring the resolution and detail information of the original image, to finally obtain the enhanced target-area image.
In this scheme, by using a convolution layer, information of different layers of the input matrix can be captured. These features can reflect the spatial structure and texture information of the input matrix for subsequent processing. The generator of the enhanced network model generates an enhanced result of the target area by utilizing the multi-layer characteristics, the hidden vectors and the random signals obtained in advance, so that the enhanced result is more real and diversified, and the generalization capability of the model is improved.
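As a concrete illustration of the convolution-layer / linear-connection-layer / generator pipeline described above, here is a minimal PyTorch sketch. The depth, channel counts, and the exact way the hidden vector and Gaussian random signal are injected into the generator are assumptions made for brevity, not the patent's actual architecture.

```python
import torch
import torch.nn as nn

class EnhanceNet(nn.Module):
    """Sketch: conv downsampling -> hidden vector -> upsampling generator."""
    def __init__(self):
        super().__init__()
        # Feature extraction and downsampling (the multilayer features).
        self.enc = nn.ModuleList([
            nn.Conv2d(3, 32, 3, stride=2, padding=1),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),
            nn.Conv2d(64, 128, 3, stride=2, padding=1)])
        # Small fully connected ("linear connection") layer -> hidden vector.
        self.fc = nn.Linear(128 * 64 * 64, 512)
        # Generator blocks with an upsampling function.
        self.gen = nn.ModuleList([
            nn.ConvTranspose2d(128 + 512, 64, 4, stride=2, padding=1),
            nn.ConvTranspose2d(64 + 64, 32, 4, stride=2, padding=1),
            nn.ConvTranspose2d(32 + 32, 3, 4, stride=2, padding=1)])

    def forward(self, x):                    # x: (N, 3, 512, 512) in [-1, 1]
        feats = []
        for conv in self.enc:
            x = torch.relu(conv(x))
            feats.append(x)                  # keep every layer's feature map
        latent = self.fc(x.flatten(1))       # hidden vector of the frame
        z = latent + torch.randn_like(latent)  # add Gaussian random signal
        # Broadcast hidden vector + noise over the coarsest feature map.
        h = torch.cat(
            [feats[-1], z[:, :, None, None].expand(-1, -1, 64, 64)], dim=1)
        h = torch.relu(self.gen[0](h))                                 # 128
        h = torch.relu(self.gen[1](torch.cat([h, feats[-2]], dim=1)))  # 256
        out = torch.tanh(self.gen[2](torch.cat([h, feats[-3]], dim=1)))  # 512
        return out                           # enhanced 512 x 512 face region
```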
Based on the above technical solutions, optionally, the enhanced network model further includes:
and a discriminator, which forms an antagonistic (adversarial) neural network with the generator during training, so as to discriminate the authenticity of the enhancement result of the target area produced by the generator.
In this scheme, taking face recognition as an example, fig. 2 is a schematic diagram of the framework of the network structure for image detail enhancement of a target area according to the first embodiment of the present application. As shown in fig. 2, the overall network consists of three parts: a feature extraction part (comprising the convolution layers), a generator, and a discriminator. The feature extraction part receives a low-quality face as input and generates image-related features layer by layer; a hidden vector is then obtained through the linear connection layer, and the high-definition enhanced face is restored layer by layer by feeding the intermediate-layer features and some random const signals into the generator. Because the generator may diverge and does not by itself guarantee the authenticity of the face, a discriminator is needed as a constraint: while the generator keeps producing faces, the discriminator judges whether the produced face is authentic and how closely it approximates the real face.
The discriminator may be a neural network model in the generation of the challenge network for performing the authenticity discrimination of the image generated by the generator. The discriminator usually adopts a convolutional neural network structure, and the input of the discriminator is an image (either a real image or an image generated by a generator), and a probability value is output to indicate whether the input image is real or fake. The training goal of the discriminant is to enable it to accurately distinguish between true images and generated images. In the countermeasure network, the generator and the discriminator are optimized in a manner of countermeasure training, that is, the image that the generator wishes to generate can fool the discriminator so that it cannot distinguish between a true image and a generated image, and the discriminator wishes to be able to accurately judge whether the input image is true or generated. Through countermeasure training, the generator and the discriminator can gradually improve the capability of the generator and the discriminator, and finally more realistic images are generated.
The antagonistic (adversarial) neural network may be a deep learning model consisting of two neural networks: a generator and a discriminator. The generator is responsible for generating samples similar to the training data, and the discriminator is responsible for distinguishing the generator's samples from the training data. During training, the generator and the discriminator oppose each other and continuously update their parameters, so that the samples generated by the generator become more and more similar to the real data and increasingly hard for the discriminator to tell apart.
In the training process, firstly, a real sample and a sample generated by a generator are respectively input into a discriminator to discriminate the true and the false, and the loss value of the discriminator to the samples is calculated. Then, a loss value of the generator is calculated from the samples generated by the generator and the discrimination results of the discriminators on the samples. The smaller the loss value of the generator, the more the sample generated by the generator can deceive the arbiter, i.e. the closer to the real sample, so the generator can continuously adjust its own generation strategy to improve the quality of the generated sample.
In the scheme, the training mode of the antagonistic neural network is formed by introducing the discriminator and the generator, so that the enhancement effect of the target area generated by the generator can be effectively improved.
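The adversarial training just described can be sketched as one optimization step; `G` and `D` stand for any generator and discriminator modules with the interfaces shown, and the binary cross-entropy loss is one common choice assumed here for illustration.

```python
import torch
import torch.nn.functional as F

def adversarial_step(G, D, opt_G, opt_D, low_q_face, real_face):
    """One generator/discriminator update of the antagonistic network."""
    # Discriminator: real faces should score 1, generated faces 0.
    fake_face = G(low_q_face).detach()
    real_logits, fake_logits = D(real_face), D(fake_face)
    d_loss = (F.binary_cross_entropy_with_logits(
                  real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(
                  fake_logits, torch.zeros_like(fake_logits)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator: try to make D judge generated faces as real.
    gen_logits = D(G(low_q_face))
    g_loss = F.binary_cross_entropy_with_logits(
        gen_logits, torch.ones_like(gen_logits))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```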
S104, performing, on the enhancement result of the target area, the inverse of the template transformation processing, and fusing the inverse-transformed result with the original frame image to obtain the image detail enhancement result.
Fig. 3 is a schematic diagram showing steps of a restoration process for enhancing image details of a target area according to an embodiment of the present application, and as shown in fig. 3, the restoration process of the template transformation process may be an inverse template transformation process, and the image after the template transformation process is restored to an original image by solving an inverse matrix of a given template transformation matrix. In performing the template transformation process, geometric transformations, such as translation, rotation, scaling, and warping, are typically performed on the original image, and these operations may be represented in a matrix-transformed form.
The image detail enhancement result may refer to a processed image that emphasizes details in the image to make it clearer and easier to identify, including enhancement processing of the target area and final results obtained by fusing the enhancement result with the original frame image.
The image detail enhancement result can be obtained by adopting the following steps:
1. the enhancement results of the original image and the target region are converted to the same color space and pixel depth, respectively, to ensure that they have the same image properties.
2. The enhancement results of the original image and the target region are fused by adopting a proper fusion method, and the fusion method can comprise weighted average, maximum value, minimum value, pixel level fusion and the like.
3. And carrying out some post-processing, such as denoising, color correction and the like on the fused image so as to make the fused image clearer and more natural.
4. And fusing the enhancement result with the original image to obtain image detail enhancement results of a plurality of target areas.
Based on the above technical solutions, optionally, fusing the enhancement result based on the target area with the original frame image to obtain an image detail enhancement result, including:
Weighting the enhancement result of the target area and the target area in the original frame image to obtain a weighted processing result;
determining a segmentation object in the weighted processing result by adopting a segmentation algorithm; or determining a segmentation object in the enhancement result by adopting a segmentation algorithm;
and splicing the segmentation object in the weighted processing result or the segmentation object in the enhancement result with the non-segmentation object region in the original frame image to obtain an image detail enhancement result.
In this scheme, the weighting result may be a result obtained by performing weighting processing on the target area in the original frame image and the target area after the enhancement processing. Specifically, the weighted processing result means that the two regions are weighted and averaged to generate a new image combining the information of the two regions. The weight of the weighted average can be adjusted according to the application requirement, so that the weight of the enhancement effect and the weight of the original information are reasonably balanced, and better visual effect or other application requirements are achieved.
The weighted result can be obtained by:
1. and (3) aligning the original frame image with the target area subjected to enhancement processing in size, so as to ensure that the original frame image and the target area have the same size and resolution.
2. The weight of the weighted average is determined according to the needs and purpose of the application. For example, the weights of the original image and the enhanced image may be set to 0.5, i.e., an average value. If the enhanced image information is emphasized more, the weight may be set to 0.7; the original image information is more emphasized and the weight may be set to 0.3. And carrying out weighted average on the original image and the enhanced target area to obtain a weighted processing result.
3. For other untreated areas, the untreated areas can be reserved or subjected to corresponding treatment according to the requirement, so that a final treatment result is obtained.
The segmentation algorithm may be a computer vision algorithm that segments the digital image into a plurality of sub-regions. In the scheme, a computer algorithm is used for dividing the weighted processing result into different objects or areas.
The segmented objects in the weighted results may refer to different objects or regions identified by the segmentation algorithm in the weighted results. These objects or regions may be objects, backgrounds, boundaries, etc. in the image, in this case the segmented objects may be face regions.
The segmented object may be determined by:
1. preprocessing, such as denoising and smoothing, is performed on the weighted processing result to improve accuracy and stability of the segmentation algorithm.
2. The weighted processing result is segmented by selecting a proper segmentation algorithm, for example, a deep learning algorithm can be used for segmentation, and the segmentation includes semantic segmentation, instance segmentation and the like.
3. According to different algorithms, some parameters are adjusted or some thresholds are set to optimize the segmentation result. For example, threshold segmentation requires selection of an appropriate threshold, whereas region growth requires setting of growth conditions, and so on.
4. The result of the segmentation algorithm may include multiple segmented objects that require further processing, such as merging and filtering.
The segmentation object in the enhancement result may be an object, a background, an edge, etc. in the image, depending on the design and application requirements of the segmentation algorithm.
The segmented object in the enhancement result may be determined as follows (a sketch follows this list):
1. the enhancement results are preprocessed so that the segmentation algorithm can handle better. For example, filtering, binarizing, noise removing, and the like may be performed on the image.
2. And selecting a proper segmentation algorithm for processing according to the type of the processed image, the target characteristics and other factors. Common segmentation algorithms include threshold-based segmentation algorithms, region growing algorithms, edge detection algorithms, graph theory algorithms, and the like.
3. And applying the selected segmentation algorithm to the enhanced result image to segment the target object. After segmentation, a binarized segmented image is obtained, wherein the segmented object is a white region and the background is a black region.
4. Post-process the segmentation result to obtain a more accurate segmentation effect. For example, the segmented image may be morphologically processed to remove small unconnected regions and to fill holes.
5. According to specific requirements, a required segmentation object is extracted from the segmentation result image. For example, the contour of the segmented object may be extracted, or the segmented object may be displayed in combination with the original image.
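A minimal sketch of turning a segmenter's output into the binary mask used below; the 0.5 threshold and the morphological cleanup are illustrative choices, not values prescribed by the patent.

```python
import cv2
import numpy as np

def binary_face_mask(prob_map, threshold=0.5):
    """Threshold a per-pixel face probability map into a binary mask
    (face -> 1, background -> 0), then apply a morphological opening to
    drop small spurious regions."""
    mask = (prob_map > threshold).astype(np.uint8)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask.astype(np.float32)
```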
In this scheme, taking face recognition as an example, the face region enhancement result is sent into a simple face segmenter operator to obtain a binary segmentation map of the same size as the enhancement result: the model segments out the face region relevant to the enhancement and assigns it the value 1, and the remaining irrelevant regions are assigned the value 0. The binary segmentation map at this point outlines the part of the face that is to be enhanced; however, the face is still at the standard, rotated position obtained after alignment. Since a rotation transformation of the face was performed in the earlier processing, the enhanced face image and the face segmentation contour are inversely restored to their corresponding positions in the original H x W image frame before alignment, yielding an image frame of the same size as the original together with a mask of the segmented face part. To easily control the degree of face enhancement, so that too strong an enhancement does not cause local distortion and too weak an enhancement does not leave no visible effect, a hyperparameter alpha with value range [0, 1] is first set; the finally enhanced face part is a weighted sum of the enhanced face and the original face. The code for determining the segmented face region may be as follows:
final_enhance_face = enhance_face * alpha + origin_face * (1 - alpha)
# warp() is the inverse-affine helper implied by the surrounding text, and
# points5_inv renames the "5points_inv" coordinates to a valid identifier.
tmp_img = warp(final_enhance_face, points5_inv)    # tmp_img has size H x W
tmp_mask = warp(face_mask, points5_inv)            # tmp_mask has size H x W
the image detail enhancement result can be obtained according to the following steps:
1. the original frame image is preprocessed to facilitate subsequent processing. For example, the image may be scaled, cropped, and noise removed.
2. And extracting the segmented objects in the enhancement result, and further processing the segmented objects according to the requirement. For example, morphological operations, contour extraction, and the like may be performed on the segmented object to obtain a more accurate segmentation result.
3. And dividing the non-division object area in the original frame image by using the same division algorithm as the enhancement result. After segmentation, a binarized segmented image is obtained, wherein the segmented object is a white region and the background is a black region.
4. And merging the segmented object in the enhancement result and the non-segmented object region in the original frame image to obtain an image detail enhancement result. The segmented object may be fused with the non-segmented object region using image masking techniques.
5. And carrying out post-processing on the merging result, such as denoising, sharpening, color adjustment and the like on the image, so as to obtain a better visual effect.
In this scheme, taking face recognition as an example, before the coordinates of the enhanced face are restored from the rotational alignment, a segmentation mask of the enhanced face region is also obtained through the segmenter. The enhanced face part then needs to be fused back to the corresponding face position in the original video frame: using the binary mask obtained during segmentation, the face region assigned 1 in the mask is filled with the enhanced-face values, and the remaining part is filled with the pixel values of the corresponding original video frame, yielding the image detail enhancement result. A partial code implementation follows:
final_img = origin_img * (1 - tmp_mask) + tmp_img * tmp_mask
in the scheme, the image is processed by adopting the method, so that the target area can be enhanced, and the detail definition of the image is improved. The segmentation algorithm is adopted to determine the segmentation objects in the enhancement results or the segmentation objects in the weighted processing results, so that the target region can be extracted more accurately, and the segmentation accuracy and effect are improved.
Based on the above technical solutions, optionally, the weighted processing result or the enhancement result includes three channel values of a color space;
correspondingly, determining a segmentation object in the weighted processing result by adopting a segmentation algorithm; or determining the segmented object in the enhancement result by adopting a segmentation algorithm, which comprises the following steps:
Determining a segmentation object in the weighted processing result based on at least one channel value in the weighted processing result by adopting a segmentation algorithm; or determining the segmented object in the enhancement result based on at least one channel value in the enhancement result by adopting a segmentation algorithm.
In this solution, the color space may refer to a way to represent colors as spatial coordinates of different dimensions for color processing and representation, and RGB (Red, green, blue) is one of the most commonly used color spaces. The RGB color space is composed of three color channels, red (R), green (G), and blue (B), which represent colors into three values, each representing the intensity of brightness of a corresponding channel. The range of values for the three channel values is typically 0-255, where 0 indicates the darkest color in the channel and 255 indicates the brightest color in the channel. The three channel values of the color space in the weighted processing result or the enhanced result may refer to three channel values of each pixel of the target area in the RGB color space.
The segmented object may be determined using the following steps:
1. first, an appropriate threshold value needs to be determined in order to divide the pixels in the weighted processing result or enhancement result into two parts, namely a target object and a background object. The threshold may be determined based on factors such as application requirements and image characteristics.
2. And dividing pixels in the weighting processing result or the enhancement result by using the determined threshold value to obtain two parts of a target object and a background object. The processing may be performed using a segmentation algorithm such as a binarization algorithm or a watershed algorithm.
3. Post-processing is typically required after segmentation to remove small noise or to connect adjacent regions. Post-processing can be performed using techniques such as morphological operations and connectivity analysis to obtain more accurate segmentation results.
In the scheme, the image can be effectively segmented into different parts by using a segmentation algorithm so as to carry out subsequent processing and analysis. The three channel values of the color space are used for segmentation, so that the edge and color characteristics of a target area or a segmented object can be more accurately identified, and further, enhancement and splicing processing are performed more finely, and a higher-quality image detail enhancement result is obtained.
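A minimal sketch of segmenting on a single color channel, using Otsu's method to pick the threshold automatically (one way of realizing step 1 above; the channel choice is arbitrary).

```python
import cv2

def segment_on_channel(image_bgr, channel=2):
    """Split out one color channel (default: red, index 2 in BGR order) and
    binarize it with Otsu thresholding: target -> 255, background -> 0."""
    ch = image_bgr[:, :, channel]
    _, mask = cv2.threshold(ch, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask
```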
Based on the above technical solutions, optionally, before the split object in the weighted processing result or the split object in the enhanced result is spliced with the non-split object area in the original frame image, the method further includes:
obtaining a transformation rule of template transformation processing;
And determining a non-segmented object region in the original frame image based on the inverse operation of the transformation rule and the segmented object in the weighted processing result or the segmented object in the enhancement result.
In this scheme, the transformation rule of the template transformation process may be an affine transformation rule, which is a two-dimensional graphics transformation manner, and may be described by a set of basic transformations. The method maps the original image coordinates to the target image coordinates through linear transformation and translational transformation, so that the transformed image maintains the relative position relationship in the original image.
If the transformation rule of the template transformation process is an affine transformation rule, the transformation rule can be obtained by:
1. for each corresponding matching point in the source image and the target image, at least three pairs of matching points need to be selected to determine six unknown parameters required for affine transformation.
2. Based on the selected matching point pairs, an affine transformation matrix is calculated that maps points in the source image to points in the target image. The affine transformation matrix may be a 2 x 3 matrix containing six unknown parameters that can be solved by known pairs of matching points.
3. And applying the calculated affine transformation matrix to all points in the source image, thereby obtaining a transformed image.
If the transformation rule of the template transformation process is an affine transformation rule, in the inverse operation based on the affine transformation rule, it is first necessary to inversely transform the segmentation object in the weighting process result or the segmentation object in the enhancement result back to the coordinate system of the original image. Then, from the coordinates of these inverse transformed segmented objects, the positions of the areas covered by these segmented objects in the original image can be calculated. By inverting these coverage areas in the original frame image, the non-segmented object area can be determined.
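A minimal sketch of this inverse operation, assuming the template transformation is the 2 x 3 affine matrix computed earlier.

```python
import cv2

def non_segmented_region(orig_h, orig_w, aligned_mask, affine_M):
    """Warp an aligned-space segmentation mask back through the inverse of
    the affine template transform, then invert it: the returned mask is 1.0
    over the non-segmented-object region of the original frame."""
    M_inv = cv2.invertAffineTransform(affine_M)
    mask_in_orig = cv2.warpAffine(aligned_mask, M_inv, (orig_w, orig_h))
    return 1.0 - mask_in_orig
```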
In the scheme, by using the method, local enhancement processing can be performed on the target area, and meanwhile, non-segmentation object information in the original image is reserved, so that a more accurate enhancement result is obtained. Meanwhile, the segmentation algorithm is adopted to determine the segmentation object, so that the enhancement processing of the whole image can be avoided, the processing amount is reduced, and the processing efficiency is improved.
In this embodiment, a video to be processed is acquired and it is judged whether the video meets an image detail enhancement condition; if the condition is met, template transformation processing of a target area is performed on the frame image of the video to obtain an input matrix of the target area; the input matrix is input into an enhancement network model, and an enhancement result of the target area is generated based on the extracted multilayer features, hidden vectors and random signals; finally, the inverse of the template transformation is applied to the enhancement result of the target area, and the inverse-transformed result is fused with the original frame image to obtain the image detail enhancement result. This image detail enhancement method for a target area improves image processing quality and precision, and hence the user's viewing experience, while enhancing image detail, precisely controlling the amount of computation, and ensuring processing efficiency.
Example two
Fig. 4 is a flowchart of a method for enhancing image details of a target area according to a second embodiment of the present application. As shown in fig. 4, the method specifically comprises the following steps:
s201, acquiring a video to be processed, and judging whether the video to be processed meets the image detail enhancement condition.
S202, acquiring a target area with a preset size.
The target region of a preset size in the frame image may refer to a region of a fixed size and shape preset in the image, typically a rectangle or square. These target areas may be important targets in the image, faces, objects, etc. The size and shape of the target area may be set and adjusted according to specific application requirements, for example, in face recognition, the target area may include the position and size of the face.
The target region of a preset size in the frame image can be obtained by the following steps:
1. first, a dataset containing target region labels needs to be prepared, and for each image, the target region needs to be labeled with a rectangular box or other geometric shape.
2. The target detection model is trained using the annotation dataset.
3. After model training is completed, the model may be used to detect objects in the image.
4. For each image, the model returns a rectangular box of one or more target regions and corresponding scores.
5. Since the model may detect multiple target regions, the most desirable target region needs to be selected based on the score and other conditions. The target may be selected by comparing information such as the size, location, and score of the target frame.
6. The target area is obtained by cropping the original image.
S203, acquiring key point information in the target area.
The key point information may include edge information, texture information, illumination information, and color information of the object. In this scheme, taking face recognition as an example, eyebrows, eyes and mouth can be taken as key points.
The key point information in the target area can be acquired by:
1. First, prepare a dataset containing the target regions, and pre-process the data, e.g. resizing the images and enhancing the contrast.
2. Train a keypoint detection model on this dataset.
3. After training is complete, use the model to detect keypoints in the target region; for each target region, the model returns a set of keypoint coordinates.
4. Since the model may return multiple sets of keypoints, select among them by comparing the keypoint positions, scores and similar information.
5. Optionally visualize the keypoints for easier inspection and analysis.
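As one possible realization of steps 2 to 4, the sketch below uses dlib and its publicly available 68-landmark model as a stand-in for the trained keypoint detection model; the model file name and the choice of keeping the first detected face are assumptions for illustration.

import dlib

detector = dlib.get_frontal_face_detector()
# Assumed public landmark model, not the model of this application.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_keypoints(gray_image):
    faces = detector(gray_image, 1)
    if not faces:
        return None
    shape = predictor(gray_image, faces[0])  # step 4: keep one set of keypoints
    # Return all (x, y) landmark coordinates; a 5-point subset around the
    # eyebrows, eyes and mouth can then be selected for alignment.
    return [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]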
On the basis of the technical schemes, optionally, the key point information comprises key point coordinates;
based on the key point information and a preset template transformation rule, performing transformation processing on the target area to obtain an input matrix of the target area, wherein the transformation processing comprises the following steps:
performing at least one of translation, rotation and scaling on the target area based on the key point coordinates and a preset template transformation rule, and identifying whether a missing area exists in the processing result;
and filling the missing region according to a preset rule to obtain an input matrix of the target region.
In this solution, the key point coordinates are the positions marked as key points in the image, typically two-dimensional coordinates in pixels that describe the feature-point positions of targets in the image. In this scheme, taking face recognition as an example, the key point coordinates may be the coordinates of 5 key points around the eyebrows, eyes and mouth.
Taking face recognition as an example, the preset template transformation rule may map the coordinates of the face in the current frame onto a face with standard coordinates, applying appropriate rotation and scaling, to obtain an aligned face origin_face of standard size 512×512 together with the post-affine coordinates 5points_inv.
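A minimal sketch of this alignment, assuming illustrative template coordinates for a 512×512 face (the application's actual standard coordinates are not given here):

import cv2
import numpy as np

# Assumed 5-point template (around the eyes, nose and mouth) on a 512x512 canvas.
TEMPLATE_5PTS = np.float32([[180, 200], [332, 200], [256, 280],
                            [200, 360], [312, 360]])

def align_face(frame_bgr, keypoints_5):
    src = np.float32(keypoints_5)
    # Similarity transform (rotation + scale + translation) onto the template.
    M, _ = cv2.estimateAffinePartial2D(src, TEMPLATE_5PTS)
    origin_face = cv2.warpAffine(frame_bgr, M, (512, 512))
    M_inv = cv2.invertAffineTransform(M)  # kept for the later inverse mapping
    # Mapping the template points back through M_inv yields post-affine
    # coordinates in the spirit of the 5points_inv mentioned above.
    pts_inv = cv2.transform(TEMPLATE_5PTS[None, :, :], M_inv)[0]
    return origin_face, M, M_inv, pts_inv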
A missing region means that the information of some pixels in the target region has been lost or covered, leaving blank or discontinuous areas in the image so that the complete information of the target region cannot be restored. In this scheme, taking face recognition as an example, the missing region may be the blank area left after translating, rotating and scaling the face.
Taking face recognition as an example, whether a missing region exists can be identified through the following steps (a code sketch follows the list):
1. Align and map the key point coordinates of the target area onto the preset standard face key point coordinates to obtain the affine transformation matrix of the target area.
2. Translate, rotate and scale the target area according to the affine transformation matrix to obtain the processed input matrix of the target area.
3. Check the processed input matrix of the target area for missing areas.
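A simple way to perform check 3, sketched under the assumption that the affine matrix M from the alignment step is available: warp an all-ones mask with the same matrix, and any pixel that remains zero received no source data and thus belongs to the missing region.

import cv2
import numpy as np

def find_missing_region(frame_shape, M, size=(512, 512)):
    ones = np.ones(frame_shape[:2], dtype=np.uint8)
    coverage = cv2.warpAffine(ones, M, size)
    return (coverage == 0).astype(np.uint8)  # 1 marks missing pixels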
The preset rule is the rule followed when filling the missing region, such as zero-value filling, mean filling, nearest-neighbor interpolation, bilinear interpolation or content-based filling. In this scheme, taking face recognition as an example, filling all blank parts introduced by the rotation with the value 0 can serve as the preset rule.
The input matrix of the target area is the matrix corresponding to the pre-processed target-area image. In image detail enhancement it is typically a two-dimensional matrix in which each element represents the luminance or color value of the corresponding pixel, and its size matches the preset target-area size; taking face recognition as an example, the matrix size is usually fixed and equal to the preset face size.
For the missing region in the target area, interpolation may be used for filling. Interpolation infers unknown data points from known ones: the value at an unknown position is estimated from the positions and values of the known data points.
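As one hedged example of such filling, OpenCV's Telea inpainting infers missing pixels from the surrounding known ones; choosing it over the zero-fill mentioned earlier is an illustrative decision, not a requirement of the scheme.

import cv2

def fill_missing(origin_face, missing_mask):
    # inpaintRadius=3 is a typical small neighbourhood for interpolation.
    return cv2.inpaint(origin_face, missing_mask, inpaintRadius=3,
                       flags=cv2.INPAINT_TELEA)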
In this scheme, the above processing makes the input matrix more normalized and standardized, so that subsequent algorithms and models can process and analyze it more easily. Applied to face recognition, aligning and normalizing the face images improves the accuracy and robustness of recognition.
S204, carrying out transformation processing on the target area based on the key point information and a preset template transformation rule to obtain an input matrix of the target area.
In order to implement the transformation process for the target region, some template transformation rules or transformation matrices need to be defined to describe the transformation of the target region. These rules or transformation matrices may be defined according to a specific application scenario and may include translational transformation, scaling transformation, rotational transformation, flip transformation, affine transformation, perspective transformation, and the like.
In this scheme, taking face recognition as an example, once a video frame is determined to contain a face, the corresponding face key point coordinates are detected, the face in the frame is rotated and scaled by an affine transformation into a standard position at a fixed size (512×512), and after numerical normalization the input matrix required for network inference is obtained.
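The numerical normalization can be sketched as follows; the [-1, 1] value range and NCHW layout are common inference conventions, assumed here rather than specified by the application.

import numpy as np

def to_input_matrix(origin_face_bgr):
    x = origin_face_bgr.astype(np.float32) / 127.5 - 1.0  # pixels -> [-1, 1]
    return np.transpose(x, (2, 0, 1))[None]               # HWC -> NCHW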
S205, inputting the input matrix into an enhanced network model, and generating an enhanced result of the target area based on the extracted multi-layer features, hidden vectors and random signals.
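The following PyTorch sketch shows one way such a network can be wired: a convolutional encoder yields multi-layer features, a linear layer over the pooled deepest features yields the hidden vector, and a generator fuses skip features, the hidden vector and a Gaussian random signal. The channel widths, the channel-wise modulation by the hidden vector, and drawing the random signal directly with torch.randn (rather than via a separate signal generation network) are all illustrative assumptions.

import torch
import torch.nn as nn

class EnhancementNetwork(nn.Module):
    def __init__(self, z_dim=256):
        super().__init__()
        # Encoder: multi-layer feature extraction (input assumed 3x512x512).
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.LeakyReLU(0.2))
        self.enc3 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.LeakyReLU(0.2))
        self.to_hidden = nn.Linear(128, z_dim)   # hidden vector from pooled features
        self.modulate = nn.Linear(z_dim, 128)    # re-inject hidden vector as channel weights
        # Generator: upsampling decoder fusing skip features and the random signal.
        self.dec3 = nn.Sequential(nn.ConvTranspose2d(128 + 1, 64, 4, 2, 1), nn.LeakyReLU(0.2))
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(64 + 64, 32, 4, 2, 1), nn.LeakyReLU(0.2))
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(32 + 32, 16, 4, 2, 1), nn.LeakyReLU(0.2))
        self.out = nn.Conv2d(16, 3, 3, 1, 1)

    def forward(self, x):
        f1 = self.enc1(x)                        # 32 x 256 x 256
        f2 = self.enc2(f1)                       # 64 x 128 x 128
        f3 = self.enc3(f2)                       # 128 x 64 x 64
        hidden = self.to_hidden(f3.mean(dim=(2, 3)))
        w = torch.sigmoid(self.modulate(hidden))[:, :, None, None]
        noise = torch.randn(x.size(0), 1, f3.size(2), f3.size(3), device=x.device)
        g = self.dec3(torch.cat([f3 * w, noise], dim=1))
        g = self.dec2(torch.cat([g, f2], dim=1))
        g = self.dec1(torch.cat([g, f1], dim=1))
        return torch.tanh(self.out(g))           # enhanced 3 x 512 x 512 region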
S206, applying, to the enhancement result of the target area, the inverse transformation of the template transformation processing, and fusing the enhancement result with the original frame image based on the inverse transformation result to obtain an image detail enhancement result.
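A minimal sketch of this inverse mapping and fusion, assuming the inverse affine matrix M_inv from the alignment step; the feathered mask used for blending is an illustrative choice.

import cv2
import numpy as np

def fuse_back(frame_bgr, enhanced_face, M_inv):
    h, w = frame_bgr.shape[:2]
    restored = cv2.warpAffine(enhanced_face, M_inv, (w, h))
    mask = cv2.warpAffine(np.ones(enhanced_face.shape[:2], np.float32), M_inv, (w, h))
    mask = cv2.GaussianBlur(mask, (21, 21), 0)[..., None]  # feather the seam
    fused = mask * restored.astype(np.float32) + (1 - mask) * frame_bgr.astype(np.float32)
    return np.clip(fused, 0, 255).astype(np.uint8)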
In this embodiment, pre-processing the target area improves the accuracy and robustness of image detail enhancement, and the enhancement can be adjusted and modified according to the application scenario and requirements, enabling personalized image enhancement. Meanwhile, since key point information of the target area is extracted and template transformation rules are set, the processed image is guaranteed a good visual effect and practical usability.
Example Three
Fig. 5 is a schematic structural diagram of an image detail enhancement device for a target area according to a third embodiment of the present application. As shown in fig. 5, the device specifically includes the following modules:
the acquiring module 301 is configured to acquire a video to be processed, and determine whether the video to be processed meets an image detail enhancement condition;
the frame image processing module 302 is configured to, if the image detail enhancement condition is met, perform template transformation processing of a target area on the frame image of the video to be processed to obtain an input matrix of the target area;
the target area enhancement result generation module 303 is configured to input the input matrix into an enhancement network model and to generate an enhancement result of the target area based on the extracted multi-layer features, hidden vectors and random signals;
and the image detail enhancement result generation module 304 is configured to apply, to the enhancement result of the target area, the inverse transformation of the template transformation processing, and to fuse the enhancement result with the original frame image based on the inverse transformation result to obtain an image detail enhancement result.
In this embodiment, the obtaining module acquires the video to be processed and determines whether it meets the image detail enhancement condition; the frame image processing module, if the condition is met, performs template transformation processing of a target area on the frame images of the video to obtain an input matrix of the target area; the target area enhancement result generation module feeds the input matrix into the enhancement network model and generates an enhancement result of the target area based on the extracted multi-layer features, hidden vectors and random signals; and the image detail enhancement result generation module applies the inverse of the template transformation to the enhancement result and fuses it with the original frame image based on the inverse transformation result to obtain the image detail enhancement result. With this image detail enhancement device for a target area, image processing quality and precision can be improved, improving the user's viewing experience, while image details are enhanced with the amount of computation precisely controlled, ensuring processing efficiency.
The image detail enhancement device for a target area provided in this embodiment of the present application can implement each process of the method embodiments of fig. 1 and fig. 4; to avoid repetition, the details are not repeated here.
Example Four
Fig. 6 is a schematic structural diagram of an image detail enhancement apparatus for a target area according to an embodiment of the present application. As shown in fig. 6, the apparatus includes a processor 401, a memory 402, an input device 403 and an output device 404; the number of processors 401 in the apparatus may be one or more, with one processor 401 taken as an example in fig. 6. The processor 401, memory 402, input device 403 and output device 404 may be connected by a bus or by other means; connection by bus is taken as an example in fig. 6. The memory 402, as a computer-readable storage medium, stores software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the image detail enhancement method of a target area in the embodiments of the present application. The processor 401 runs the software programs, instructions and modules stored in the memory 402, thereby executing the apparatus's functional applications and data processing, i.e. implementing the image detail enhancement method of the target area described above. The input device 403 may be used to receive entered numeric or character information and to generate key signal inputs related to user settings and function control of the apparatus. The output device 404 may include a display device such as a display screen.
The embodiment of the present application also provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the image detail enhancement method of the target area described in the above embodiment, where the method includes:
acquiring a video to be processed, and judging whether the video to be processed accords with an image detail enhancement condition;
if the image detail enhancement condition is met, carrying out template transformation processing of a target area on the frame image of the video to be processed to obtain an input matrix of the target area;
inputting the input matrix into an enhanced network model, and generating an enhanced result of the target area based on the extracted multilayer features, hidden vectors and random signals;
and applying, to the enhancement result of the target area, the inverse transformation of the template transformation processing, and fusing the enhancement result with the original frame image based on the inverse transformation result to obtain an image detail enhancement result.
It should be noted that, in the embodiment of the image detail enhancement device of the target area, each unit and module included are only divided according to the functional logic, but are not limited to the above-mentioned division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the embodiments of the present application.
In some possible embodiments, aspects of the method provided by the present application may also be implemented as a program product including program code; when the program product runs on a computer device, the program code causes the computer device to perform the steps of the methods according to the various exemplary embodiments of the present application described in this specification, for example the image detail enhancement method of the target area described in the embodiments of the present application. The program product may be implemented using any combination of one or more readable media.

Claims (12)

1. A method for enhancing image details of a target area, comprising:
acquiring a video to be processed, and judging whether the video to be processed accords with an image detail enhancement condition;
if the image detail enhancement condition is met, carrying out template transformation processing of a target area on the frame image of the video to be processed to obtain an input matrix of the target area;
inputting the input matrix into an enhanced network model, and generating an enhanced result of the target area based on the extracted multilayer features, hidden vectors and random signals;
and applying, to the enhancement result of the target area, the inverse transformation of the template transformation processing, and fusing the enhancement result with the original frame image based on the inverse transformation result to obtain an image detail enhancement result.
2. The method for enhancing image details of a target area according to claim 1, wherein performing template transformation processing of the target area on the frame image of the video to be processed to obtain an input matrix of the target area comprises:
acquiring a target area with a preset size;
acquiring key point information in the target area;
and carrying out transformation processing on the target area based on the key point information and a preset template transformation rule to obtain an input matrix of the target area.
3. The image detail enhancement method of a target area according to claim 2, wherein the keypoint information comprises keypoint coordinates;
based on the key point information and a preset template transformation rule, performing transformation processing on the target area to obtain an input matrix of the target area, wherein the transformation processing comprises the following steps:
performing at least one of translation, rotation and scaling on the target area based on the key point coordinates and a preset template transformation rule, and identifying whether a missing area exists in the processing result;
and filling the missing region according to a preset rule to obtain an input matrix of the target region.
4. The image detail enhancement method of a target area according to claim 1, wherein inputting the input matrix into an enhancement network model and generating an enhancement result of the target area based on the extracted multi-layer features, hidden vectors and random signals comprises:
inputting the input matrix into a convolution layer of an enhanced network model to obtain a multi-layer feature;
inputting the characteristics output by the last convolution layer into a linear connection layer to obtain hidden vectors of the frame image;
inputting the multilayer features, the hidden vectors and the pre-obtained random signals into a generator of the enhanced network model to obtain an enhanced result of the target area; the random signal is random noise which is obtained based on a signal generation network and accords with Gaussian distribution.
5. The image detail enhancement method of a target area according to claim 4, wherein said enhancement network model further comprises:
and the discriminator is used for forming an antagonistic neural network with the generator in the training process so as to discriminate the authenticity of the enhanced result of the target area generated by the generator.
6. The image detail enhancement method of a target area according to claim 1, wherein fusing the enhancement result of the target area with the original frame image to obtain the image detail enhancement result comprises:
weighting the enhancement result of the target area and the target area in the original frame image to obtain a weighted processing result;
determining a segmentation object in the weighted processing result by adopting a segmentation algorithm; or determining a segmentation object in the enhancement result by adopting a segmentation algorithm;
and splicing the segmentation object in the weighted processing result or the segmentation object in the enhancement result with the non-segmentation object region in the original frame image to obtain an image detail enhancement result.
7. The image detail enhancement method of a target area according to claim 6, wherein the weighted processing result or the enhancement result each includes three channel values of a color space;
correspondingly, determining a segmentation object in the weighted processing result by adopting a segmentation algorithm; or determining the segmented object in the enhancement result by adopting a segmentation algorithm, which comprises the following steps:
determining a segmentation object in the weighted processing result based on at least one channel value in the weighted processing result by adopting a segmentation algorithm; or determining the segmented object in the enhancement result based on at least one channel value in the enhancement result by adopting a segmentation algorithm.
8. The image detail enhancement method of a target area according to claim 6, wherein before stitching a segmented object in a weighted processing result or a segmented object in the enhancement result with a non-segmented object area in an original frame image, the method further comprises:
obtaining a transformation rule of template transformation processing;
and determining a non-segmented object region in the original frame image based on the inverse operation of the transformation rule and the segmented object in the weighted processing result or the segmented object in the enhancement result.
9. An image detail enhancement device for a target area, comprising:
the acquisition module is used for acquiring the video to be processed and judging whether the video to be processed accords with the image detail enhancement condition or not;
the frame image processing module is used for carrying out template transformation processing on a target area on the frame image of the video to be processed if the frame image processing module accords with the image detail enhancement condition to obtain an input matrix of the target area;
the target area enhancement result generation module is used for inputting the input matrix into an enhancement network model and generating an enhancement result of the target area based on the extracted multilayer features, hidden vectors and random signals;
and the image detail enhancement result generation module is configured to apply, to the enhancement result of the target area, the inverse transformation of the template transformation processing, and to fuse the enhancement result with the original frame image based on the inverse transformation result to obtain an image detail enhancement result.
10. An image detail enhancement device of a target area, the device comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of image detail enhancement of a target area of any of claims 1-8.
11. A storage medium storing computer executable instructions which, when executed by a computer processor, are for performing the image detail enhancement method of a target area of any one of claims 1-8.
12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the image detail enhancement method of a target area according to any of claims 1-8.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination