CN108921782B - Image processing method, device and storage medium

Image processing method, device and storage medium

Info

Publication number: CN108921782B
Application number: CN201810475606.9A
Authority: CN (China)
Other versions: CN108921782A (Chinese)
Prior art keywords: image, information, position information, error, training sample
Inventors: 邰颖, 丁守鸿, 李绍欣, 汪铖杰, 李季檩
Assignee (current and original): Tencent Technology (Shenzhen) Co., Ltd.
Legal status: Active (granted)

Classifications

    • G06T 3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution (G: Physics; G06: Computing; G06T: Image data processing or generation, in general; G06T 3/40: Scaling of whole images or parts thereof)
    • G06T 7/11: Region-based segmentation (G06T 7/00: Image analysis; G06T 7/10: Segmentation; edge detection)
    • G06T 2207/10004: Still image; photographic image (G06T 2207/10: Image acquisition modality)
    • G06T 2207/20081: Training; learning (G06T 2207/20: Special algorithmic details)
    • G06T 2207/30201: Face (G06T 2207/30: Subject of image; G06T 2207/30196: Human being; person)


Abstract

Embodiments of the invention disclose an image processing method, an image processing apparatus and a storage medium. An image to be processed is obtained; feature extraction is performed on a target object in the image to obtain target feature position information; feature region segmentation is performed on the target object to obtain target segmentation region information; and a preset image processing model is used to increase the original resolution of the image based on the target feature position information and the target segmentation region information, the model having been trained on the feature position information and segmentation region information of a preset object in a plurality of training sample images. Because the scheme raises the original resolution based on feature position and segmentation region information specific to the target object in the image, the processed image is sharper and its quality is improved.

Description

Image processing method, device and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, and a storage medium.
Background
With advances in science and technology, digital images have come into wide use and have gradually become one of the most important information carriers. The higher the resolution of an image, the higher its pixel density and the more detail that can be recovered from it. In practice, however, many factors can cause the resolution of an acquired image to fall short of requirements, so low-resolution images often need to have their resolution increased.
In the prior art, a low-resolution image is typically converted into a high-resolution one by restoring pixel values through interpolation, for example bicubic interpolation, which simply interpolates the pixel values of the low-resolution image. The resolution gain obtained this way is limited: details of objects in the resulting image, such as components or contours, remain blurred, image quality is low, and the display effect is poor.
Disclosure of Invention
Embodiments of the invention provide an image processing method, an image processing apparatus and a storage medium that aim to improve the quality of the processed image.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
an image processing method, comprising:
acquiring an image to be processed;
performing feature extraction on a target object in the image to be processed to obtain target feature position information;
performing feature region segmentation on the target object to obtain target segmentation region information;
and increasing, with a preset image processing model, the original resolution of the image to be processed based on the target feature position information and the target segmentation region information, wherein the image processing model is trained on feature position information and segmentation region information of a preset object in a plurality of training sample images.
An image processing apparatus comprising:
a first acquisition unit configured to acquire an image to be processed;
an extraction unit configured to perform feature extraction on a target object in the image to be processed to obtain target feature position information;
a segmentation unit configured to perform feature region segmentation on the target object to obtain target segmentation region information;
and an increasing unit configured to increase, with a preset image processing model, the original resolution of the image to be processed based on the target feature position information and the target segmentation region information, wherein the image processing model is trained on feature position information and segmentation region information of a preset object in a plurality of training sample images.
A storage medium storing a plurality of instructions suitable for being loaded by a processor to perform the steps of any of the image processing methods provided by the embodiments of the present invention.
When the resolution of an image needs to be increased, an embodiment of the invention can acquire the image to be processed, perform feature extraction on a target object in it to obtain target feature position information, and perform feature region segmentation on the target object to obtain target segmentation region information. A preset image processing model then increases the original resolution of the image based on the target feature position information and the target segmentation region information; that is, the low-resolution image to be processed is converted into a high-resolution image. Because the scheme raises the original resolution based on position and region information specific to the target object, the sharpness of the processed image improves, and with it the quality of the processed image.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic scene diagram of an image processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an image processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of feature location information in training sample images provided by an embodiment of the invention;
FIG. 4 is a schematic diagram of information of a segmentation region in a training sample image according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a low resolution image converted to a high resolution image according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of training a model to be trained according to an embodiment of the present invention;
FIG. 7 is a schematic flowchart of another image processing method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an image processing apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of another structure of an image processing apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of another structure of an image processing apparatus according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of a network device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art without creative effort on the basis of the described embodiments fall within the scope of protection of the present invention.
The embodiment of the invention provides an image processing method, an image processing device and a storage medium.
Referring to fig. 1, fig. 1 is a schematic diagram of a scene for the image processing method according to an embodiment of the present invention. The image processing apparatus may be integrated in a network device such as a terminal or a server. For example, the network device may acquire a plurality of training sample images (images with relatively high definition, which may be read from a memory storing images), determine first feature information of a preset object (for example, a human face or a vehicle) in each training sample image, the first feature information including first feature position information and first segmentation region information, and reduce the original resolution of each training sample image to a preset value (which can be set flexibly according to actual needs), obtaining a plurality of reduced-resolution training sample images. A preset model to be trained then computes second feature information (including second feature position information and second segmentation region information) of the preset object in each reduced-resolution training sample image, and the model to be trained is trained according to the first feature information and the second feature information to obtain an image processing model.
Later, when the resolution of an image needs to be increased, an image processing request input by a user may be received and an image to be processed obtained based on the request (for example, an image captured by a mobile phone or a camera). Feature extraction is performed on a target object in the image to obtain target feature position information; feature region segmentation is performed on the target object to obtain target segmentation region information. The image processing model then increases the original resolution of the image based on the target feature position information and the target segmentation region information, converting the low-resolution image into a high-resolution one, and the converted image may be stored in a memory, and so on.
It should be noted that the scene shown in fig. 1 is only an example; it is described in order to illustrate the technical solution of the embodiment of the present invention more clearly and does not limit the technical solution provided by the embodiments of the present invention.
The following are detailed descriptions.
In the present embodiment, the description is given from the viewpoint of an image processing apparatus, which may be integrated in a network device such as a terminal or a server.
An image processing method, comprising: acquiring an image to be processed; performing feature extraction on a target object in the image to be processed to obtain target feature position information; performing feature region segmentation on the target object to obtain target segmentation region information; and increasing, with a preset image processing model, the original resolution of the image to be processed based on the target feature position information and the target segmentation region information, wherein the image processing model is trained on feature position information and segmentation region information of a preset object in a plurality of training sample images.
Referring to fig. 2, fig. 2 is a flowchart illustrating an image processing method according to an embodiment of the invention. The image processing method may include:
and S101, acquiring an image to be processed.
The image to be processed may be a low-resolution image, for example one with a resolution of 16 × 16, or an image of another resolution. The image to be processed contains a target object, which may be, for example, a human face or a vehicle.
The image processing apparatus may obtain the image to be processed in several ways. In a first mode, a large number of images containing the target object are captured by a mobile phone or a camera. In a second mode, the image to be processed is retrieved by searching the Internet or querying a database. Other acquisition modes are also possible; the specific mode is not limited here.
In some embodiments, before the step of acquiring the image to be processed, or before the step of using a preset image processing model to increase the original resolution of the image to be processed based on the target feature position information and the target segmentation area information, the image processing method may further include:
(1) Acquiring a plurality of training sample images, and determining first feature position information and first segmentation region information of a preset object in each training sample image;
(2) Reducing the original resolution of each training sample image to a preset value to obtain reduced-resolution training sample images;
(3) Acquiring second feature position information and second segmentation region information of the preset object in each reduced-resolution training sample image;
(4) Training a preset model to be trained according to the first feature position information, the first segmentation region information, the second feature position information and the second segmentation region information to obtain the image processing model (a minimal sketch of these four steps is given below).
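For concreteness, steps (1) to (4) can be sketched in code. The following is a minimal illustration assuming Python with PyTorch; the helper names and tensor shapes (load_training_samples, 68 feature points, 9 regions) are hypothetical stand-ins for the data the patent describes, not its actual implementation.

```python
# Minimal sketch of training steps (1)-(4); names and shapes are illustrative.
import torch
import torch.nn.functional as F

def load_training_samples(n=8, size=128):
    # Step (1): high-resolution training sample images together with their
    # first feature position information and first segmentation region info.
    images = torch.rand(n, 3, size, size)            # training sample images
    positions = torch.rand(n, 68, 2)                 # feature-point coordinates
    regions = torch.randint(0, 9, (n, size, size))   # per-pixel region labels
    return images, positions, regions

def reduce_resolution(images, preset=16):
    # Step (2): reduce the original resolution to a preset value.
    return F.interpolate(images, size=(preset, preset),
                         mode="bicubic", align_corners=False)

hr_images, first_positions, first_regions = load_training_samples()
lr_images = reduce_resolution(hr_images)
# Steps (3) and (4), computing second feature information with the prior
# estimation network and training against the first feature information,
# are sketched in the later examples of this section.
```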
The training sample images may be high-definition images, for example high-resolution images with a resolution of 128 × 128 or 1024 × 1024. The plurality of training sample images may contain different preset objects or the same preset object; the preset object may be, for example, a human face or a vehicle. For instance, some of the training sample images may contain faces while others contain vehicles, and the preset objects in different training sample images may be the same or different.
For example, taking the preset object to be a face: multiple images of the same face may be captured at different places, times or angles, or images of different faces may be captured for different people. A single training sample image may contain one or more faces, and may contain a whole face or only a partial face region; the face in a training sample image may be photographed frontally, from the side, or at another angle.
Likewise, taking the preset object to be a vehicle: multiple images of the same vehicle may be captured at different locations, times or angles, or images of different vehicles may be captured. A single training sample image may contain one or more vehicles, a whole vehicle or only a partial vehicle region, photographed from the front, the side, or another angle.
It should be noted that the number of the training sample images, the type and number of the preset objects, the shooting angle, the resolution, and the like can be flexibly set according to actual needs, and specific contents are not limited herein.
The image processing apparatus may acquire the training sample images in several ways. In a first mode, a large number of images containing the preset object, or multiple images of the same preset object, are captured with a mobile phone or a camera. In a second mode, training sample images are retrieved by searching the Internet or querying an image database. Other acquisition modes are also possible; the specific mode is not limited here.
After the plurality of training sample images are obtained, the image processing apparatus may determine first feature information of the preset object in each training sample image, the first feature information including first feature position information and first segmentation region information. For example, as shown in fig. 3, when the preset object is a face, the first feature position information may include the positions of facial organs such as the eyes, eyebrows, nose, mouth and face contour; the position of each feature may consist of the positions of several feature points, expressed as two-dimensional coordinates or pixel coordinates.
The first feature position information may be generated by locating the facial organs such as the eyes, nose, eyebrows and mouth in the image with a face recognition technique and producing the positions of their feature points. A feature point is the position coordinate of a key point of a facial organ; feature points may lie on the outer contour of the face and on the edges or centers of the facial organs, and their number can be set flexibly according to actual needs. Alternatively, the first feature position information may be produced by manually annotating the feature points of each facial organ on the face.
The first feature information may further include face attributes and texture information; the face attributes may include eye size, hair color, nose size, mouth size, and so on, and the texture information may include face pixels. The specific content can be set flexibly according to actual needs and is not limited here.
For example, as shown in fig. 4, when the preset object is a face, the first segmentation region information may include regions such as hair (region 1), right eyebrow (region 2), right eye (region 3), left eyebrow (region 4), left eye (region 5), nose (region 6), lips (region 7), teeth (region 8) and the face itself. Segmentation region information can be encoded by assigning a different label to each region: for example, pixels inside a region are set to a region-specific constant while pixels outside all regions are set to 0, so that pixels in different regions are represented by different constants.
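As a small illustration of this encoding, the following sketch (numpy assumed; the region constants follow the fig. 4 example above, and encode_regions is a hypothetical helper) builds such a label map:

```python
import numpy as np

def encode_regions(height, width, region_masks):
    # Pixels outside every region stay 0; pixels inside region k get the
    # constant k (e.g. 1 = hair, 2 = right eyebrow, ..., 6 = nose, 7 = lips).
    label_map = np.zeros((height, width), dtype=np.uint8)
    for constant, mask in region_masks.items():  # mask: boolean (H, W) array
        label_map[mask] = constant
    return label_map

h = w = 128
masks = {1: np.zeros((h, w), bool), 6: np.zeros((h, w), bool)}
masks[1][:20, :] = True          # toy "hair" region
masks[6][60:80, 55:75] = True    # toy "nose" region
labels = encode_regions(h, w, masks)
```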
It should be noted that when the preset object is a vehicle, the first feature position information may include the positions of vehicle features such as the wheels, license plate, windows, logo, lamps and mirrors, and the first segmentation region information may include the corresponding segmentation regions of those vehicle features.
In some embodiments, determining the first feature information of the preset object in each training sample image may include: receiving an annotation instruction and determining the first feature position information of the preset object in each training sample image based on it; receiving a setting instruction and determining the first segmentation region information of the preset object in each training sample image based on it; and taking the first feature position information and the first segmentation region information together as the first feature information.
Specifically, the image processing apparatus may receive an annotation instruction input by a user. The annotation instruction sets annotation marks, which may be points, circles, polygons and the like, at the positions of the features of the preset object in a training sample image; one or more marks may be set on one image, for example at the eyes or the nose of a face. The position of each feature of the preset object in the training sample image is then determined from the marks, and the first feature position information of each feature on that image is computed from those positions. The same procedure is applied to the next training sample image, and so on, until all of the training sample images have been processed and the first feature position information of the preset object in every training sample image has been obtained.
The image processing apparatus may also receive a setting instruction input by a user. The setting instruction assigns an identifier, such as a number or a name, to the pixel values of the area occupied by a feature of the object in a training sample image; one or more identifiers may be set on one image, for example in the areas occupied by the eyes or the nose of a face. The segmentation region of each feature of the preset object is then determined from the identifiers, and the first segmentation region information of each feature on that image is derived from those regions. The same procedure is applied to the next training sample image until all of the training sample images have been processed, yielding the segmentation region information of the preset object in every training sample image. The resulting first feature position information and first segmentation region information together constitute the first feature information.
After the training sample images are obtained, the image processing apparatus may reduce the original resolution of each training sample image to a preset value, for example by downsampling, obtaining a plurality of reduced-resolution training sample images. The preset value can be set flexibly according to actual needs; the reduced-resolution images are low-definition images, for example with a resolution of 16 × 16. For instance, the original resolution of training sample image A is reduced to the preset value to obtain reduced-resolution image A, image B yields reduced-resolution image B, image C yields reduced-resolution image C, and so on.
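The resolution reduction itself can be done with ordinary image resampling. A sketch using Pillow, with hypothetical file names:

```python
from PIL import Image

PRESET = 16  # preset value for the reduced resolution

# Reduce each training sample image (A, B, C, ...) to the preset value.
for name in ["sample_a.png", "sample_b.png", "sample_c.png"]:
    img = Image.open(name)                        # original, e.g. 128 x 128
    small = img.resize((PRESET, PRESET), Image.BICUBIC)
    small.save(name.replace(".png", "_lr.png"))   # reduced-resolution copy
```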
After the reduced-resolution training sample images are obtained, second feature position information and second segmentation region information of the preset object in each of them may be acquired, for example computed by the preset model to be trained.
The preset model to be trained may consist of a residual network combined with a generative adversarial network (GAN), or of a convolutional network combined with a GAN. The network framework of the GAN may include several variants; for example, the generation network may include a prior estimation network, and the framework may further include a discrimination network, a feature network, and so on. Other models may also be used and can be chosen flexibly according to actual needs; the specific content is not limited here.
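One possible composition of such a model is sketched below in PyTorch. The class names, layer sizes and the 8x upscaling factor are illustrative assumptions, not the architecture claimed by the patent; the point is the division into a prior estimation network P, a residual (generation) network G conditioned on the prior, a feature network phi, and a discrimination network D.

```python
import torch
import torch.nn as nn

class PriorEstimationNet(nn.Module):
    # Prior estimation network P: predicts feature-point heatmaps and
    # segmentation-region maps from the reduced-resolution input.
    def __init__(self, n_points=68, n_regions=9):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_points + n_regions, 3, padding=1))

    def forward(self, x):
        return self.body(x)

class ResidualGenerator(nn.Module):
    # Residual network G: restores a high-resolution image from the
    # low-resolution input conditioned on the prior.
    def __init__(self, prior_channels=77, scale=8):
        super().__init__()
        self.up = nn.Sequential(
            nn.Conv2d(3 + prior_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=scale, mode="bilinear",
                        align_corners=False),
            nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, x, prior):
        return self.up(torch.cat([x, prior], dim=1))

class FeatureNet(nn.Module):
    # Feature network phi: used only to compare features of restored and
    # original images (see formula (2) below).
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    # Discrimination network D: probability that an image is an original
    # training sample rather than a restored one.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)
```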
In some embodiments, acquiring the second feature position information and second segmentation region information of the preset object in each reduced-resolution training sample image may include: computing the second feature position information and second segmentation region information of the preset object in each reduced-resolution training sample image with the prior estimation network in the model to be trained.
The image processing apparatus may invoke the prior estimation network in the model to be trained and use it to compute the second feature information of the preset object in each reduced-resolution training sample image; the preset object here is the same as above, for example a face or a vehicle. The second feature information includes second feature position information, analogous to the first feature position information, and second segmentation region information, analogous to the first segmentation region information. For example, when the preset object is a face, the second feature position information may include the positions of the eyes, eyebrows, nose, mouth and face contour, each consisting of several feature points, and the second segmentation region information may include the regions of the hair, eyes, eyebrows, nose, mouth (lips and teeth) and face. When the preset object is a vehicle, the second feature position information may include the positions of the wheels, license plate, windows, logo, lamps and mirrors, and the second segmentation region information the corresponding regions.
In some embodiments, computing, by the preset model to be trained, the second feature position information and second segmentation region information of the preset object in each reduced-resolution training sample image may include:
selecting one image from the reduced-resolution training sample images as the current training sample image;
searching the current training sample image for the preset object;
if the preset object is found in the current training sample image, computing its second feature position information and second segmentation region information with the prior estimation network in the model to be trained;
and returning to the selection of the next current training sample image until all of the reduced-resolution training sample images have been processed.
Specifically, the current training sample image is one of the reduced-resolution training sample images. The image processing apparatus searches it for the preset object; for example, a face and its features (eyes, eyebrows, nose, mouth and face contour) can be located with a face recognition technique. If neither the preset object nor its features are found in the current training sample image, the second feature position information and second segmentation region information need not be computed for it. If they are found, the model to be trained computes the second feature position information and second segmentation region information of the preset object. The procedure then returns to selecting the next current training sample image until all of the reduced-resolution training sample images have been processed.
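The selection-and-search loop might look like the sketch below. OpenCV's stock Haar cascade is used here as one possible face detector; the patent does not specify which face recognition technique is employed, so this choice and the toy inputs are assumptions.

```python
import cv2
import numpy as np

# Stock Haar cascade shipped with OpenCV; any detector could be substituted.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def find_faces(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Toy stand-ins for the reduced-resolution training sample images.
samples = [np.random.randint(0, 256, (64, 64, 3), np.uint8) for _ in range(4)]

for current in samples:          # select the current training sample image
    faces = find_faces(current)  # search it for the preset object
    if len(faces) == 0:
        continue                 # not found: skip the computation
    # Found: here the prior estimation network would compute the second
    # feature position information and second segmentation region information.
```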
After the first feature position information, the first segmentation area information, the second feature position information, and the second segmentation area information are obtained, the model to be trained may be trained according to the first feature position information, the first segmentation area information, the second feature position information, and the second segmentation area information.
In some embodiments, training the model to be trained according to the first feature position information, the first segmentation region information, the second feature position information and the second segmentation region information to obtain the image processing model may include:
(a) using the residual network in the model to be trained to converge, based on the second feature position information and second segmentation region information, the resolution of each reduced-resolution training sample image toward the original resolution of the training sample image, obtaining resolution-converged training sample images;
(b) computing, with the feature network in the model to be trained, third feature position information of the preset object in the resolution-converged training sample images;
(c) updating the parameters of the model to be trained according to the first feature position information, the first segmentation region information, the second feature position information, the second segmentation region information and the third feature position information to obtain the image processing model.
Specifically, in order to accurately restore each reduced-resolution training sample image to its original training sample image, the image processing apparatus may invoke the residual network in the model to be trained and use it, based on the second feature information (the second feature position information and second segmentation region information), to converge the resolution of each reduced-resolution training sample image toward the original resolution, obtaining resolution-converged training sample images. The resolution of a resolution-converged training sample image is greater than that of the corresponding reduced-resolution image, and the difference between the converged resolution and the original resolution of the training sample image may be kept smaller than a preset threshold, which can be set flexibly according to actual needs.
After the resolution-converged training sample images are obtained, the feature network in the model to be trained may be invoked to compute third feature position information of the preset object (for example, a face or a vehicle) in them. The third feature position information is analogous to the first: when the preset object is a face, it may include the positions of the eyes, eyebrows, nose, mouth and face contour, each consisting of several feature points, and it may be produced by locating the facial organs in the resolution-converged image with a face recognition technique. The image processing apparatus may then update the parameters of the model to be trained according to the first feature position information, the first segmentation region information, the second feature position information, the second segmentation region information and the third feature position information to obtain the image processing model.
In some embodiments, training the model to be trained according to the first feature position information, the first segmentation region information, the second feature position information, the second segmentation region information and the third feature position information to obtain the image processing model may include:
computing, with the prior estimation network in the model to be trained, the error between the first feature position information and the second feature position information to obtain a feature position error, computing the error between the first segmentation region information and the second segmentation region information to obtain a segmentation region error, and taking the feature position error and the segmentation region error together as a first feature error;
computing, with the feature network in the model to be trained, the error between the first feature position information and the third feature position information to obtain a second feature error;
determining, with the residual network in the model to be trained, the image error between the resolution-converged training sample images and the original training sample images;
obtaining gradient information from the first feature error, the second feature error and the image error;
and updating the parameters of the model to be trained according to the gradient information to obtain the image processing model.
For example, the image processing apparatus may invoke the prior estimation network in the model to be trained and use it to compare the first feature position information with the second feature position information, obtaining the feature position error, and the first segmentation region information with the second segmentation region information, obtaining the segmentation region error; the feature position error and the segmentation region error together form the first feature error.
Specifically, the image processing apparatus may compute the first feature error between the first feature information and the second feature information according to formula (1):

$$\Delta_1 = \frac{1}{N}\sum_{n=1}^{N}\left\| z_n - P(x_n) \right\|^2 \tag{1}$$

where $\Delta_1$ is the first feature error; $N$ is the number of training sample images (its value can be set flexibly according to actual needs) and $n$ indexes the $n$-th training sample image; $z$ is the first feature information (first feature position information and first segmentation region information), with $z_n$ the first feature information of the $n$-th training sample image; $x$ is a reduced-resolution training sample image, with $x_n$ the $n$-th such image; $P$ is the prior estimation network in the model to be trained; and $P(x_n)$ is the second feature information (second feature position information and second segmentation region information) of the $n$-th reduced-resolution training sample image.
The image processing apparatus may also invoke the feature network in the model to be trained and compute the feature position error between the first feature position information and the third feature position information, obtaining the second feature error, for example according to formula (2):

$$\Delta_2 = \frac{1}{N}\sum_{n=1}^{N}\left\| \phi(y_n) - \phi(G(x_n)) \right\|^2 \tag{2}$$

where $\Delta_2$ is the second feature error; $N$ is the number of training sample images and $n$ indexes the $n$-th image; $\phi$ is the feature network in the model to be trained; $y$ is a training sample image, with $\phi(y_n)$ the first feature position information of the $n$-th training sample image; $G$ is the residual network; $x$ is a reduced-resolution training sample image, with $x_n$ the $n$-th such image; and $\phi(G(x_n))$ is the third feature position information of the $n$-th reduced-resolution training sample image.
The image processing apparatus may further invoke the residual network in the model to be trained and use it to determine the image error between the resolution-converged training sample images and the original training sample images; the image error may include a pixel error, a discrimination error and an adversarial error. The pixel error is the error between corresponding pixel values of a resolution-converged training sample image and the original training sample image. The adversarial error is the error produced by the discrimination network to oppose the residual network and the prior estimation network. The discrimination error measures whether a resolution-converged training sample image can be told apart from an original one: for an image to be discriminated, its label is set to 0 if it is judged to be a resolution-converged training sample image and to 1 if it is judged to be an original training sample image, and the obtained label is compared with the true value to obtain the discrimination error.
In some embodiments, determining the image error between a resolution-converged training sample image and the training sample image may include:
obtaining, with the residual network in the model to be trained, the pixel error between the resolution-converged training sample image and the original training sample image; discriminating, with the discrimination network in the model to be trained, the resolution-converged training sample image from the original training sample image to obtain the discrimination error and the adversarial error; and taking the pixel error, the discrimination error and the adversarial error together as the image error.
For example, the pixel error, discrimination error and adversarial error may be computed separately. The pixel error may be computed according to formula (3):

$$\Delta_3 = \frac{1}{N}\sum_{n=1}^{N}\left\| y_n - G(x_n) \right\|^2 \tag{3}$$

where $\Delta_3$ is the pixel error; $N$ is the number of training sample images and $n$ indexes the $n$-th image; $y_n$ is the $n$-th training sample image; $x_n$ is the $n$-th reduced-resolution training sample image; $G$ is the residual network in the model to be trained; and $G(x_n)$ is the image restored by the residual network from the $n$-th reduced-resolution training sample image.
The discrimination error may be computed according to formula (4) and the adversarial error according to formula (5):

$$\Delta_4 = -\frac{1}{N}\sum_{n=1}^{N}\left[ \log D(y_n) + \log\bigl(1 - D(G(x_n))\bigr) \right] \tag{4}$$

$$\Delta_5 = -\frac{1}{N}\sum_{n=1}^{N}\log D(G(x_n)) \tag{5}$$

where $\Delta_4$ is the discrimination error and $\Delta_5$ the adversarial error; $N$ is the number of training sample images and $n$ indexes the $n$-th image; $\log$ is the logarithm; $D$ is the discrimination network in the model to be trained; $y_n$ is the $n$-th training sample image; $x_n$ is the $n$-th reduced-resolution training sample image; $G$ is the residual network in the model to be trained; and $G(x_n)$ is the image restored by the residual network from the $n$-th reduced-resolution training sample image.
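Formulas (1) to (5) translate directly into code. The sketch below assumes PyTorch and the illustrative networks defined earlier; the mean-squared and log terms mirror the formulas above, and the eps constant is added only for numerical safety.

```python
import torch
import torch.nn.functional as F

def generator_errors(P, G, phi, D, x, y, z):
    # x: reduced-resolution images, y: original training sample images,
    # z: first feature information (positions + regions, same shape as P(x)).
    prior = P(x)                 # second feature information
    sr = G(x, prior)             # resolution-converged (restored) image
    eps = 1e-8                   # numerical safety only
    d_real, d_fake = D(y), D(sr)
    delta1 = F.mse_loss(prior, z)             # (1) first feature error
    delta2 = F.mse_loss(phi(sr), phi(y))      # (2) second feature error
    delta3 = F.mse_loss(sr, y)                # (3) pixel error
    delta4 = -(torch.log(d_real + eps)
               + torch.log(1 - d_fake + eps)).mean()  # (4) discrimination
    delta5 = -torch.log(d_fake + eps).mean()  # (5) adversarial error
    return delta1, delta2, delta3, delta4, delta5
```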
After the errors are obtained, the image processing apparatus may derive gradient information from the first feature error, the second feature error and the image error. In some embodiments, this may include:
constructing a first loss function from the first feature error, the second feature error, the pixel error and the adversarial error;
performing gradient descent on the first loss function to obtain first gradient information;
constructing a second loss function from the discrimination error and performing gradient descent on it to obtain second gradient information;
and taking the first gradient information and the second gradient information together as the gradient information.
Specifically, the image processing apparatus may construct the first loss function from the first feature error, the second feature error, and the pixel and adversarial components of the image error according to formula (6):

$$L_1 = \Delta_1 + \Delta_2 + \Delta_3 + \Delta_5 \tag{6}$$

where $L_1$ is the first loss function and the other symbols have the same meanings as in formulas (1) to (5); the first loss function is the total error of the generation network (the residual network together with the prior estimation network).
The image processing apparatus may then perform gradient descent on the first loss function to minimize it, obtaining the first gradient information; the manner of gradient descent can be chosen flexibly according to actual needs and is not limited here.
The image processing apparatus may also construct the second loss function from the discrimination component of the image error according to formula (7), and perform gradient descent on it to minimize it, obtaining the second gradient information:

$$L_2 = \Delta_4 = -\frac{1}{N}\sum_{n=1}^{N}\left[ \log D(y_n) + \log\bigl(1 - D(G(x_n))\bigr) \right] \tag{7}$$

where $L_2$ is the second loss function and the other symbols have the same meanings as in formula (4); the second loss function is the error of the discrimination network.
After the first and second gradient information are obtained, the image processing apparatus may update the parameters of the model to be trained according to them, adjusting the parameters or weights of the model to suitable values and obtaining the image processing model. The generation network (the residual network and the prior estimation network) and the discrimination network in the model to be trained may be updated alternately.
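The alternating update can be sketched as follows, continuing the illustrative classes and the generator_errors helper from the sketches above (toy tensors stand in for a real data loader): the generation network minimizes L1 from formula (6) and the discrimination network minimizes L2 from formula (7).

```python
import itertools
import torch

# Illustrative networks and helper from the earlier sketches.
P, G = PriorEstimationNet(), ResidualGenerator()
phi, D = FeatureNet(), Discriminator()
opt_g = torch.optim.Adam(itertools.chain(P.parameters(), G.parameters()),
                         lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

x = torch.rand(2, 3, 16, 16)     # reduced-resolution batch
y = torch.rand(2, 3, 128, 128)   # original training sample batch
z = torch.rand(2, 77, 16, 16)    # first feature information (toy values)

for step in range(2):
    # Generation-network step: minimize L1 = d1 + d2 + d3 + d5 (formula (6)).
    d1, d2, d3, d4, d5 = generator_errors(P, G, phi, D, x, y, z)
    opt_g.zero_grad()
    (d1 + d2 + d3 + d5).backward()
    opt_g.step()

    # Discrimination-network step: minimize L2 = d4 (formula (7)).
    d1, d2, d3, d4, d5 = generator_errors(P, G, phi, D, x, y, z)
    opt_d.zero_grad()
    d4.backward()
    opt_d.step()
```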
S102. Perform feature extraction on the target object in the image to be processed to obtain target feature position information.
In some embodiments, performing feature extraction on the target object in the image to be processed to obtain the target feature position information may include:
performing feature extraction on the target object in the image to be processed with the image processing model to obtain the target feature position information; or determining an object identifier and feature identifiers of the target object, searching the image to be processed for the target object according to the object identifier, and, when the target object is found, extracting the features of the target object and their positions according to the feature identifiers to obtain the target feature position information.
Specifically, once the image processing model has been obtained, the image processing apparatus may use it to increase the resolution of images. The apparatus first invokes the prior estimation network in the image processing model and uses it to perform feature extraction on the target object in the image to be processed. For example, when the target object is a face, the prior estimation network extracts the facial organs from the face in the image, yielding the feature positions of the eyes, eyebrows, nose, mouth, and so on; when the target object is a vehicle, it yields the feature positions of the wheels, license plate, windows, logo, lamps, mirrors, and so on.
Alternatively, the image processing apparatus may obtain the target feature position information in other ways. It may first determine the object identifier and feature identifiers of the target object: an object identifier uniquely identifies a target object, and a feature identifier uniquely identifies one of the features the target object contains; both may be names or numbers made up of digits and/or letters, contour identifiers, and so on. The target object is then searched for in the image to be processed according to the object identifier. If it is not found, the features and positions of the target object need not be extracted; if it is found, the features of the target object and their positions are extracted according to the feature identifiers to obtain the target feature position information.
S103. Perform feature region segmentation on the target object to obtain target segmentation region information.
In some embodiments, performing feature region segmentation on the target object to obtain the target segmentation region information may include:
performing feature region segmentation on the target object in the image to be processed with the image processing model to obtain the target segmentation region information; or determining the object identifier and feature identifiers of the target object, searching the image to be processed for the target object according to the object identifier, and, when the target object is found, segmenting the feature regions of the target object according to the feature identifiers to obtain the target segmentation region information.
Specifically, the image processing apparatus may invoke the prior estimation network in the image processing model and use it to perform feature region segmentation on the target object in the image to be processed. For example, when the target object is a face, the prior estimation network segments the facial organs in the image, yielding the segmentation regions of the eyes, eyebrows, nose, mouth, and so on; when the target object is a vehicle, it yields the segmentation regions of the wheels, license plate, windows, logo, lamps, mirrors, and so on. Alternatively, the image processing apparatus may obtain the target segmentation region information in other ways, for example by searching the image to be processed for the target object according to the object identifier and, when it is found, segmenting its feature regions according to the feature identifiers to obtain the target segmentation region information.
S104. Increase the original resolution of the image to be processed based on the target feature position information and the target segmentation region information, using the preset image processing model.
The image processing model has been trained on the feature position information and segmentation region information of a preset object in a plurality of training sample images at different resolutions. The image processing apparatus can therefore use the target feature position information and target segmentation region information of the target object, obtained via the image processing model, to increase the original resolution of the image to be processed, that is, to convert the low-resolution image into a high-resolution one. The target object corresponds to the preset object mentioned above, for example a face or a vehicle; the target feature position information is analogous to the first feature position information, and the target segmentation region information to the first segmentation region information.
In some embodiments, when the target object is a face, the target feature position information is facial organ position information and the target segmentation region information is facial organ segmentation region information, and increasing the original resolution of the image to be processed with the preset image processing model based on this information may include:
increasing the original resolution of the image to be processed with the preset image processing model based on the facial organ position information and the facial organ segmentation region information.
The facial organ position information may include the positions of features such as the eyes, eyebrows, nose, mouth and face contour, each consisting of several feature points, and the facial organ segmentation region information may include the regions of the hair, eyes, eyebrows, nose, mouth, face, and so on.
Through the image processing model, the image processing device can increase the original resolution of the image to be processed to a preset resolution value based on the face organ position information and the face organ segmentation region information, obtaining the processed image; the preset resolution value can be set flexibly according to actual needs. For example, as shown in fig. 5, multiple pixel points can be restored from each pixel point in the low-resolution image to obtain a high-resolution image, achieving a high-multiple super-resolution effect; the restoration of details such as face organs and contours is effectively improved, the image quality is greatly improved, and the image displays better.
In some embodiments, when the target object is a vehicle, the target feature position information is vehicle feature position information, the target segmentation area information is vehicle segmentation area information, and the step of increasing the original resolution of the image to be processed based on the target feature position information and the target segmentation area information by using a preset image processing model may include:
increasing the original resolution of the image to be processed by using the preset image processing model based on the vehicle feature position information and the vehicle segmentation region information.
The vehicle characteristic position information may include position information of vehicle characteristics such as wheels, license plates, windows, logos, lamps and mirrors, and the vehicle segmentation area information may include vehicle characteristic segmentation area information such as wheels, license plates, windows, logos, lamps and mirrors. The image processing device can increase the original resolution of the image to be processed through the image processing model based on the vehicle characteristic position information and the vehicle segmentation area information to obtain the processed image. The original resolution of the image to be processed can be increased to a preset resolution value, and the preset resolution value can be flexibly set according to actual needs.
As can be seen from the above, when the resolution of the image needs to be increased, the embodiment of the present invention may acquire the image to be processed, perform feature extraction on the target object in the image to be processed to obtain target feature position information, and perform feature region segmentation on the target object in the image to be processed to obtain target segmentation region information; and then, a preset image processing model is adopted, and the original resolution of the image to be processed is increased based on the target characteristic position information and the target segmentation area information, namely, the image to be processed with low resolution is converted into the image with high resolution. According to the scheme, the original resolution of the image to be processed can be accurately adjusted based on the target feature position information and the target segmentation area information of the target object in the image to be processed, so that the definition of the processed image can be improved, and the quality of the processed image is improved.
The method described in the above embodiments is further illustrated in detail by way of example.
In this embodiment, the image processing apparatus is described taking a network device as an example, and the preset object is described taking a human face as an example. When the network device acquires a face image of a user through a monitoring device, the acquired image may have a low resolution due to factors such as the current environment and the equipment; the scheme of the embodiment of the present invention can then convert the low-resolution image into a high-resolution image.
Taking the model to be trained as including a residual network and a generative adversarial network as an example, the variant networks of the generative adversarial network may include a prior estimation network, a feature network, a discrimination network, and the like, for example, as shown in fig. 6. The core structure of the residual network is the residual module, which learns the residual from input to output rather than a direct mapping between the two, so the performance degradation problem caused by a deep network structure can be effectively overcome. The generative adversarial network comprises a generator and a discriminator: the generator aims to produce samples realistic enough to confuse the discriminator, and the discriminator may be a two-class classifier that judges whether input data is real data or a generated sample.
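As a rough illustration of the residual module just described, a minimal PyTorch-style sketch is given below; the layer sizes and two-convolution structure are illustrative assumptions rather than the exact architecture of this embodiment:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Learns the residual F(x) from input to output, so the block
    returns x + F(x) rather than a direct mapping."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # the identity shortcut is what mitigates the degradation problem
        return x + self.body(x)
```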
Taking the overall network structure of the model to be trained as comprising a prior estimation network, a residual network, a discrimination network, and a feature network, for example, as shown in fig. 6: the prior estimation network may be configured to estimate, from a low-resolution image (i.e., a training sample image with reduced resolution), face prior information (i.e., feature information), which may include the feature position information of the face (i.e., the second feature position information) and the segmentation region information of each face organ (i.e., the second segmentation region information), and to transmit the obtained feature information to the residual network. The prior estimation network may also be configured to compare the obtained feature information with the first feature information corresponding to the real high-resolution image (i.e., the original training sample image) to obtain a prior error (i.e., the first feature error); and so on.
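The prior estimation network may, for example, take the form sketched below (a minimal illustration: the two output heads, one for landmark heatmaps and one for per-organ parsing maps, follow the description above, while the backbone depth and channel counts are assumptions):

```python
import torch.nn as nn

class PriorEstimationNetwork(nn.Module):
    """Estimates face prior information from a low-resolution image:
    landmark heatmaps (feature position information) and per-organ
    parsing maps (segmentation region information)."""
    def __init__(self, n_landmarks=68, n_regions=11, channels=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.landmark_head = nn.Conv2d(channels, n_landmarks, 1)  # one heatmap per feature point
        self.parsing_head = nn.Conv2d(channels, n_regions, 1)     # one logit map per region

    def forward(self, lr_image):
        h = self.backbone(lr_image)
        return self.landmark_head(h), self.parsing_head(h)
```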
The residual network may be configured to restore the low-resolution image to a high-resolution image according to the feature information transmitted by the prior estimation network, obtaining a restored high-resolution image (i.e., a training sample image with converged resolution), to transmit the restored high-resolution image to the discrimination network and the feature network, and to compare the restored high-resolution image with the real high-resolution image to obtain a pixel error; and so on.
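A minimal sketch of how such a residual network might consume the prior information is given below; concatenating the prior maps with the low-resolution input and upsampling by sub-pixel convolution are assumptions of this sketch, not a prescribed design:

```python
import torch
import torch.nn as nn

class ResidualSRNetwork(nn.Module):
    """Restores a high-resolution image from a low-resolution input,
    conditioned on the prior maps (landmark heatmaps + parsing maps)."""
    def __init__(self, prior_channels, channels=64, scale=8):
        super().__init__()
        self.head = nn.Conv2d(3 + prior_channels, channels, 3, padding=1)
        self.body = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.ReLU(inplace=True)) for _ in range(4)])
        # sub-pixel upsampling: each LR pixel expands into scale*scale HR pixels
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, lr_image, landmarks, parsing):
        x = torch.cat([lr_image, landmarks, parsing], dim=1)
        h = self.head(x)
        h = h + self.body(h)  # residual learning over the trunk
        return self.upsample(h)
```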
The discrimination network may be configured to judge whether input data (the restored high-resolution image or the real high-resolution image) is real data (i.e., a real high-resolution image) or a generated sample (i.e., a restored high-resolution image), pushing the residual network to restore a more realistic high-resolution image. For example, by discriminating whether its input is a restored high-resolution image or a real high-resolution image, the discrimination network can output the adversarial error and the discrimination error.
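For illustration, such a two-class discriminator might be sketched as follows (the convolutional layout is an assumption):

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Two-class discriminator: judges whether its input is a real
    high-resolution image or a restored (generated) one."""
    def __init__(self, channels=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(channels * 2, 1)  # single real/fake logit

    def forward(self, image):
        h = self.features(image).flatten(1)
        return self.classifier(h)
```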
The feature network may be configured to extract feature information (including feature position information) of the restored high-resolution image and compare it with the feature information of the real high-resolution image, so that the restored image preserves identity information and thereby facilitates face verification tasks. The inputs of the feature network are the restored high-resolution image and the real high-resolution image, and the output is the feature error between the two.
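For illustration, the feature network could be a small embedding network whose outputs on the two images are compared to form the feature error (a sketch under assumptions; formula (2) defines the actual error):

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureNetwork(nn.Module):
    """Maps an image to an identity-bearing feature vector."""
    def __init__(self, channels=64, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, feat_dim))

    def forward(self, image):
        return self.net(image)

def feature_error(feat_net, restored_hr, real_hr):
    # feature error: distance between features of the restored and real images
    return F.mse_loss(feat_net(restored_hr), feat_net(real_hr))
```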
After forward computation is performed on a plurality of training sample images to obtain the corresponding pixel errors, prior errors, feature errors, discrimination errors, adversarial errors, and so on, the face super-resolution model to be trained can be trained further: for example, loss functions can be constructed based on these errors and gradient descent performed on the loss functions to update the parameters of the model to be trained, iterating repeatedly until the model converges. The training can be carried out end to end, with the generation network (comprising the residual network and the prior estimation network) and the discrimination network updated alternately, thereby obtaining the image processing model.
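One such alternating update step might be sketched as follows (PyTorch-style; the loss weights and the binary-cross-entropy form of the adversarial terms are assumptions for illustration, while formulas (1) to (7) define the exact error terms):

```python
import torch
import torch.nn.functional as F

# hypothetical loss weights; formulas (6) and (7) define the real combinations
W_PIX, W_PRIOR, W_FEAT, W_ADV = 1.0, 0.1, 0.01, 0.001

def train_step(prior_net, residual_net, disc_net, feat_net,
               gen_opt, disc_opt, lr_img, hr_img, gt_landmarks, gt_parsing):
    # forward pass: estimate priors, then restore the high-resolution image
    landmarks, parsing = prior_net(lr_img)
    sr_img = residual_net(lr_img, landmarks, parsing)

    # errors entering the first loss function (generator side)
    prior_err = F.mse_loss(landmarks, gt_landmarks) \
              + F.cross_entropy(parsing, gt_parsing)           # first feature error
    pixel_err = F.mse_loss(sr_img, hr_img)                     # pixel error
    feat_err = F.mse_loss(feat_net(sr_img), feat_net(hr_img))  # second feature error (feat_net fixed)
    fake_logit = disc_net(sr_img)
    adv_err = F.binary_cross_entropy_with_logits(
        fake_logit, torch.ones_like(fake_logit))               # adversarial error
    gen_loss = W_PIX * pixel_err + W_PRIOR * prior_err \
             + W_FEAT * feat_err + W_ADV * adv_err
    gen_opt.zero_grad(); gen_loss.backward(); gen_opt.step()

    # second loss function (discrimination error), updated alternately
    real_logit = disc_net(hr_img)
    fake_logit = disc_net(sr_img.detach())
    disc_loss = F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit)) \
              + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit))
    disc_opt.zero_grad(); disc_loss.backward(); disc_opt.step()
```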
After the image processing model is obtained, the feature position information, segmentation region information, and the like of the target object in the low-resolution image to be converted can be calculated through the prior estimation network in the image processing model and transmitted to the residual network; the residual network can then convert the low-resolution image into a high-resolution image (i.e., the processed image) according to the feature position information, segmentation region information, and the like.
Referring to fig. 7, fig. 7 is a flowchart illustrating an image processing method according to an embodiment of the invention. The method flow can comprise the following steps:
201. The network device acquires a plurality of training sample images and determines first feature information of the face in each training sample image.
First, the network device performs model training, that is, it trains the model to be trained. A plurality of training sample images may be obtained, for example, by shooting a large number of images containing human faces with a mobile phone, camera, or video camera (including multiple images of the same face), or by searching on the internet or retrieving from a picture database.
The training sample image may be an image with high definition, for example, a high-resolution image with a resolution of 128 × 128, or a high-resolution image with a resolution of 1024 × 1024, or the like. The multiple training sample images may include images of different faces or images of the same face, and the faces included in each training sample image may be different. For example, a plurality of images of the same face may be captured at different locations, at different times, or at different angles, or a plurality of images of different faces may be captured for different people, one or more faces may be included in the same training sample image, and the capturing angle of the face included in the training sample image may be an angle such as a front face or a side face.
After obtaining a plurality of training sample images, the network device may determine first feature information of a face in each training sample image, where the first feature information may include first feature position information, first segmentation area information, and the like. The first feature position information may include position information of features such as eyes, eyebrows, nose, mouth, and face contour, for example, as shown in fig. 3, the position information of each feature may include position information of a plurality of feature points.
The first feature position information may be the position information of the feature points of each face organ, generated by locating each face organ, such as the eyes, nose, eyebrows, and mouth, on the face in the image through a face recognition technology. The first feature position information may also be obtained by manually labeling the feature points of each face organ, such as the eyes, nose, eyebrows, and mouth, on the face.
For example, as shown in fig. 4, the first segmentation region information may include segmentation regions such as hair, eyes, eyebrows, nose, mouth, and face, and the segmentation region information may be obtained by setting a different flag for each segmentation region: a pixel value inside a segmentation region may be set to a constant and a pixel value outside any segmentation region to 0, with pixel values in different segmentation regions represented by different constants, for example, 1 for the left-eye region, 2 for the right-eye region, 3 for the nose region, and so on.
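Encoded this way, the segmentation region information is simply a label map; a tiny illustration follows, in which the region-to-constant assignments are the example values above rather than prescribed ones:

```python
import numpy as np

# hypothetical label constants, following the example above
REGION_LABELS = {"background": 0, "left_eye": 1, "right_eye": 2, "nose": 3}

def encode_regions(height, width, region_masks):
    """Builds a single label map from per-region boolean masks:
    pixels inside a region get that region's constant, others stay 0."""
    label_map = np.zeros((height, width), dtype=np.uint8)
    for name, mask in region_masks.items():
        label_map[mask] = REGION_LABELS[name]
    return label_map
```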
202. The network device reduces the original resolution of each training sample image to a preset value to obtain a plurality of training sample images with reduced resolution.
After obtaining the training sample images, the network device may reduce, by down-sampling or other methods, the original resolution of each training sample image to a preset value, obtaining a plurality of training sample images with reduced resolution. The preset value can be set flexibly according to actual needs, and the reduced-resolution training sample images may be low-definition images, for example, images with a resolution of 16 × 16 or 8 × 8. For example, the original resolution of training sample image A may be reduced to the preset value to obtain a reduced-resolution version of training sample image A; likewise for training sample images B and C; and so on.
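For illustration, this step might be implemented as follows (bicubic down-sampling is one assumption; any resolution-reduction method fits the description above):

```python
import torch
import torch.nn.functional as F

def reduce_resolution(hr_batch, preset=16):
    """Down-samples a batch of high-resolution training sample images
    (e.g. 128 x 128) to the preset low resolution (e.g. 16 x 16)."""
    return F.interpolate(hr_batch, size=(preset, preset),
                         mode="bicubic", align_corners=False)

hr = torch.rand(8, 3, 128, 128)   # a batch of training sample images
lr = reduce_resolution(hr)        # -> shape (8, 3, 16, 16)
```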
203. The network device calculates second feature information of the face in each reduced-resolution training sample image through the prior estimation network in the model to be trained, and determines a first feature error between the first feature information and the second feature information.
The network device calculates second feature information of the face in each training sample image with the resolution being reduced through an a priori estimation network in the model to be trained, where the second feature information may include second feature position information, second segmentation area information, and the like, where the second feature position information may include position information of features such as eyes, eyebrows, a nose, a mouth, and a face contour, the position information of each feature may include position information of a plurality of feature points, and the second segmentation area information may include information of segmentation areas such as hair, eyes, eyebrows, a nose, a mouth (including lips and teeth), and a face.
At this time, the network device may compare the second feature information with the first feature information through the prior estimation network in the model to be trained: the first feature position information may be compared with the second feature position information to obtain a feature position error, and the first segmentation region information with the second segmentation region information to obtain a segmentation region error. The feature position error and the segmentation region error together constitute the first feature error, whose calculation formula may be formula (1).
204. The network device converges each reduced-resolution training sample image toward the original resolution of its training sample image based on the second feature information through the residual network in the model to be trained, obtains a training sample image with converged resolution, and compares it with the original training sample image to obtain a pixel error.
In order to accurately restore each reduced-resolution training sample image to the original training sample image, the network device may, through the residual network in the model to be trained and based on the second feature information, converge each reduced-resolution training sample image toward the original resolution of the corresponding training sample image, obtaining a training sample image with converged resolution.
After the training sample image with converged resolution is obtained, the network device may perform a pixel comparison between the resolution-converged training sample image and the original training sample image through the residual network in the model to be trained to obtain the pixel error, whose calculation formula may be formula (3).
205. The network device calculates third feature information of the face in the resolution-converged training sample image through the feature network in the model to be trained, and calculates a second feature error between the first feature information and the third feature information.
After obtaining the training sample image with the converged resolution, the network device may calculate, through the feature network in the model to be trained, third feature information of the face in the training sample image with the converged resolution, where the third feature information may include third feature position information and the like, where the third feature position information may include position information of features such as eyes, eyebrows, a nose, a mouth, and a face contour, and the position information of each feature may include position information of a plurality of feature points.
At this time, the network device may calculate a feature position error between the first feature position information and the third feature position information through a feature network in the model to be trained, to obtain a second feature error, where a calculation formula of the second feature error may be formula (2) above.
206. The network device discriminates between the resolution-converged training sample image and the original training sample image through the discrimination network in the model to be trained to obtain a discrimination error and an adversarial error.
After the resolution-converged training sample image is obtained, the network device can discriminate between the resolution-converged training sample image and the original training sample image through the discrimination network in the model to be trained to obtain the discrimination error and the adversarial error. The calculation formula of the discrimination error may be the above formula (4), and the calculation formula of the adversarial error may be the above formula (5).
207. The network device constructs a first loss function based on the first feature error, the second feature error, the pixel error, and the adversarial error, trains the residual network and the prior estimation network through the first loss function, constructs a second loss function based on the discrimination error, and trains the discrimination network through the second loss function to obtain the image processing model.
After obtaining the respective errors, the network device may construct a first loss function based on the first feature error, the second feature error, the pixel error in the image error, and the adversarial error, where an expression of the first loss function may be formula (6) above, and train the residual network and the prior estimation network through the first loss function: for example, gradient descent may be performed on the first loss function to obtain first gradient information, and the parameters of the residual network and the prior estimation network may be updated according to the first gradient information, so as to adjust the parameters or weights of the residual network and the prior estimation network to appropriate values.
A second loss function is constructed based on the discrimination error in the image error, where an expression of the second loss function may be formula (7) above, and the discrimination network is trained through the second loss function: for example, gradient descent may be performed on the second loss function to obtain second gradient information, and the parameters of the discrimination network may be updated according to the second gradient information so as to adjust the parameters or weights of the discrimination network to appropriate values. After the discrimination network, residual network, prior estimation network, and other networks in the model to be trained are trained in this way, the image processing model is obtained.
208. And the network equipment calculates the target characteristic information of the target face in the image to be processed through a priori estimation network in the image processing model.
After the image processing model is obtained, when the network equipment acquires the face image of the user through the monitoring equipment, the acquired image with lower resolution can be converted into an image with higher resolution through the image processing model. Because the image processing model is obtained by training the feature information of the face in the training sample images with different resolutions, after the image processing model is obtained, when the resolution of the image needs to be increased, the network device may calculate, through a priori estimation network in the image processing model, target feature information of a target face in the image to be processed, where the target feature information may include target feature position information and target segmentation area information, and the like, where the target feature position information may include position information of features such as eyes, eyebrows, a nose, a mouth, a face contour, and the like, the position information of each feature may include position information of a plurality of feature points, and the target segmentation area information may include information of segmentation areas such as a hair, eyes, eyebrows, a nose, a mouth, a face, and the like.
209. The network device increases the original resolution of the image to be processed based on the target feature information through the residual network in the image processing model to obtain the processed image.
After obtaining the target feature information, the network device may increase the original resolution of the image to be processed based on the target feature information through the residual network in the image processing model, obtaining the processed image, that is, converting the low-resolution image into a high-resolution image, for example, as shown in fig. 5. For instance, 64 or 128 pixels can be restored from each single pixel of the low-resolution image; once the corresponding 64 or 128 pixels have been restored for every pixel in the low-resolution image, the high-resolution image is obtained, achieving a super-resolution effect of 8× or even higher. The restoration of details such as face organs and contours is effectively improved, the image quality is greatly improved, and the image displays better.
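Putting the trained networks together at inference time might look as follows (a sketch reusing the hypothetical network interfaces from the training sketches above; at an 8× scale, each low-resolution pixel expands into 8 × 8 = 64 high-resolution pixels):

```python
import torch

@torch.no_grad()
def super_resolve(prior_net, residual_net, lr_image):
    """Inference: estimate priors with the prior estimation network,
    then restore the high-resolution image with the residual network."""
    prior_net.eval(); residual_net.eval()
    landmarks, parsing = prior_net(lr_image)           # target feature information
    return residual_net(lr_image, landmarks, parsing)  # processed image

# e.g. a 1 x 3 x 16 x 16 input comes back as 1 x 3 x 128 x 128 at 8x scale
```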
The image processing flow of the network device can be applied in many contexts. For example, in identity verification services, because many identity document photos have low resolution, the low-quality document images need to be super-resolved, that is, the images to be converted are turned into high-resolution images; the user's face can then be recognized from the high-resolution images, effectively improving verification performance. In addition, in a monitoring environment, due to the limitations of the scene and the monitoring camera, the images captured by the monitoring camera are generally of poor quality and low resolution, so the quality of the captured images needs to be improved, which in turn improves the performance of subsequent related tasks.
In the embodiment of the invention, the model to be trained can be trained based on the images with different resolutions and the characteristic information of the preset object in the images to obtain the image processing model, and when the resolution of the images needs to be increased, the image processing model can be adopted, and the resolution of the images can be accurately increased based on the characteristic information, so that the definition of the processed images can be improved, and the quality of the processed images is improved.
In order to better implement the image processing method provided by the embodiment of the present invention, an embodiment of the present invention further provides an apparatus based on the image processing method. The terms are the same as those in the image processing method, and details of implementation can be referred to the description in the method embodiment.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention, where the image processing apparatus may include a first obtaining unit 301, an extracting unit 302, a dividing unit 303, an increasing unit 304, and the like.
The first acquiring unit 301 is configured to acquire an image to be processed.
The image to be processed may be a low-definition image, for example, a low-resolution image with a resolution of 16 × 16, or other resolution images. The image to be processed includes a target object, which may include a human face or a vehicle.
The first acquiring unit 301 may acquire the image to be processed in the following manners: in the first manner, a large number of images containing the target object may be shot with a mobile phone, camera, or video camera; in the second manner, the image to be processed may be obtained by searching on the internet or from a database. Of course, the image to be processed may also be obtained in other manners; the specific manner is not limited here.
In some embodiments, as shown in fig. 9, the image processing apparatus may further include a determining unit 305, a reducing unit 306, a second obtaining unit 307, a training unit 308, and the like, which may specifically be as follows:
a determining unit 305, configured to obtain multiple training sample images, and determine first feature position information and first segmentation area information of a preset object in each training sample image;
a reducing unit 306, configured to reduce the original resolution of each training sample image to a preset value, so as to obtain a plurality of training sample images with reduced resolutions;
a second obtaining unit 307, configured to obtain second feature position information and second segmentation area information of a preset object in each reduced-resolution training sample image;
the training unit 308 is configured to train the model to be trained according to the first feature position information, the first segmentation area information, the second feature position information, and the second segmentation area information, so as to obtain an image processing model.
The training sample image may be a high-definition image, for example, a high-resolution image with a resolution of 128 × 128, a high-resolution image with a resolution of 1024 × 1024, or the like. The plurality of training sample images may include images of different preset objects, or may include images of the same preset object, where the preset object may include a human face or a vehicle, for example, a part of the training sample images may include a human face, another part of the training sample images may include a vehicle, and the preset objects included in each training sample image may be the same or different.
For example, taking a preset object as a face as an example, multiple images of the same face may be taken at different places, at different times or at different angles, or multiple images of different faces may be taken for different people, the same training sample image may include one or more faces, the training sample image may include an entire image of a face, or may include only an image of a local area of a face, and the like; the shooting angle of the training sample image including the face can be the front or the side and the like.
For another example, taking a preset object as a vehicle as an example, multiple images of the same vehicle may be captured at different locations, at different times, or at different angles, or multiple images of different vehicles may be captured for different vehicles, one or more vehicles may be included in the same training sample image, and the training sample image may include an entire image of the vehicle, or may include only an image of a local area of the vehicle; the shooting angle of the vehicle in the training sample image can be the front or the side angle.
It should be noted that the number of the training sample images, the type and number of the preset objects, the shooting angle, the resolution, and the like can be flexibly set according to actual needs, and specific contents are not limited herein.
The determination unit 305 may acquire the training sample image in a manner including: in the first mode, a plurality of training sample images can be acquired by taking a large number of images containing a preset object through a mobile phone, a camera or a video camera and the like, and taking a plurality of images of the same preset object. In the second mode, a plurality of training sample images may be obtained by searching on the internet or obtaining from the picture database, etc., and of course, the obtaining mode of the plurality of training sample images may also be other obtaining modes, and the specific mode is not limited herein.
After obtaining the plurality of training sample images, the determining unit 305 may determine first feature information of the preset object in each training sample image, where the first feature information may include first feature position information, first segmentation area information, and the like, that is, the determining unit 305 may determine the first feature position information and the first segmentation area information of the preset object in each training sample image. For example, as shown in fig. 3, when the preset object is a human face, the first feature position information may include feature position information of human face organs such as eyes, eyebrows, a nose, a mouth, and a face contour, and the position information of each feature may include position information of a plurality of feature points, and the position information may be a two-dimensional coordinate position or a pixel coordinate position.
The first characteristic position information can be position information of each face organ such as eyes, a nose, eyebrows, a mouth and the like on a face in an image through a face recognition technology to generate position information of characteristic points of each face organ, the characteristic points can be position coordinate information of key points corresponding to each face organ, the characteristic points can be located on the external contour of the face and the edge or the center of each face organ and the like, and the number of the characteristic points can be flexibly set according to actual needs. The first characteristic position information can also be position information of various human face organ characteristic points such as eyes, a nose, eyebrows, a mouth and the like marked on the human face by manpower.
The first feature information may further include face attributes, texture information, and the like, where the face attributes may include eye size, hair color, nose size, mouth size, and the like, and the texture information may include face pixels, and the specific content may be flexibly set according to actual needs, which is not limited herein.
For example, as shown in fig. 4, when the preset object is a human face, the first segmentation region information may include segmentation regions such as hair (segmentation region 1), left eye (segmentation region 5), right eye (segmentation region 3), left eyebrow (segmentation region 4), right eyebrow (segmentation region 2), nose (segmentation region 6), lips (segmentation region 7), teeth (segmentation region 8), and face. The segmentation region information may be obtained by setting a different flag for each segmentation region: for example, a pixel value inside a segmentation region may be set to a constant and a pixel value outside any segmentation region to 0, with pixel values in different segmentation regions represented by different constants.
It should be noted that, when the preset object is a vehicle, the first characteristic position information may include position information of a wheel, a license plate, a window, a logo, a lamp, a mirror, and the like, and the first divided area information may include area information of a wheel, a license plate, a window, a logo, a lamp, a mirror, and the like.
After obtaining each training sample image, the reducing unit 306 may reduce the original resolution of each training sample image to a preset value through down-sampling or other methods, obtaining a plurality of training sample images with reduced resolution; the preset value can be set flexibly according to actual needs, and the reduced-resolution training sample images may be low-definition images, for example, images with a resolution of 16 × 16. For example, the original resolution of training sample image A may be reduced to the preset value to obtain a reduced-resolution version of training sample image A; likewise for training sample images B and C; and so on.
After obtaining a plurality of training sample images with the resolution being reduced, the second obtaining unit 307 may calculate second feature position information and second segmentation area information of a preset object in each training sample image with the resolution being reduced through a preset model to be trained.
The preset model to be trained may be a model composed of a residual network and a generative adversarial network, or of a convolutional network and a generative adversarial network; the network framework of the generative adversarial network may include a plurality of variant networks, for example, a prior estimation network, a discrimination network, a feature network, and other generation networks. The model to be trained may also be another model and can be set flexibly according to actual needs; the specific content is not limited here.
In some embodiments, the second obtaining unit 307 is specifically configured to: and calculating second characteristic position information and second segmentation area information of the preset object in each training sample image with the lowered resolution by adopting a prior estimation network in the model to be trained.
The second obtaining unit 307 may call the prior estimation network in the model to be trained and use it to calculate second feature information of the preset object in each reduced-resolution training sample image, where the preset object is consistent with the preset object mentioned above; for example, the preset object may include a human face or a vehicle. The second feature information may include second feature position information similar to the first feature position information, second segmentation region information similar to the first segmentation region information, and the like. For example, when the preset object is a human face, the second feature position information may include position information of features such as the eyes, eyebrows, nose, mouth, and face contour, the position information of each feature may include position information of a plurality of feature points, and the second segmentation region information may include information of segmentation regions such as the hair, eyes, eyebrows, nose, mouth (including lips and teeth), and face. For another example, when the preset object is a vehicle, the second feature position information may include position information of vehicle features such as the wheels, license plate, windows, logo, lamps, and mirrors, and the second segmentation region information may include segmentation region information of the same vehicle features.
In some embodiments, the second obtaining unit 307 may specifically be configured to:
selecting a training sample image from the training sample images with the resolution reduced as a current training sample image;
searching a preset object from a current training sample image;
if the preset object is found in the current training sample image, calculating second characteristic position information and second segmentation area information of the preset object through the model to be trained;
and returning to execute the operation of selecting one training sample image from the plurality of training sample images with the resolution being reduced as the current training sample image until the plurality of training sample images with the resolution being reduced are all calculated.
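For illustration, the selection loop just described might be sketched as follows (contains_preset_object is a hypothetical detector, not part of this embodiment):

```python
def compute_second_features(prior_net, reduced_images):
    """Iterates over all reduced-resolution training sample images,
    computing second feature information for each one in which the
    preset object is found."""
    results = []
    for img in reduced_images:                 # select one as the current sample
        if not contains_preset_object(img):    # hypothetical object search
            continue                           # skip and pick the next sample
        landmarks, parsing = prior_net(img)    # second feature information
        results.append((landmarks, parsing))
    return results                             # done once all are calculated
```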
After obtaining the first feature position information, the first segmentation area information, the second feature position information, and the second segmentation area information, the training unit 308 may train the model to be trained according to the first feature position information, the first segmentation area information, the second feature position information, and the second segmentation area information.
In some embodiments, as shown in fig. 10, the training unit 308 may include a convergence sub-unit 3081, a calculation sub-unit 3082, an update sub-unit 3083, and the like, which may specifically be as follows:
a convergence subunit 3081, configured to converge, by using the residual network in the model to be trained, each reduced-resolution training sample image to the original resolution of the training sample image based on the second feature position information and the second segmentation region information, to obtain a training sample image with converged resolution;
the calculation subunit 3082 is configured to calculate, by using a feature network in the model to be trained, third feature position information of a preset object in the training sample image after resolution convergence;
and the updating subunit 3083 is configured to update the parameters of the model to be trained according to the first feature position information, the first divided region information, the second feature position information, the second divided region information, and the third feature position information, so as to obtain the image processing model.
Specifically, in order to accurately restore each reduced-resolution training sample image to the original training sample image, the convergence subunit 3081 may, based on the second feature information (including the second feature position information and the second segmentation region information), converge each reduced-resolution training sample image toward the original resolution of the training sample image through the model to be trained, obtaining a training sample image with converged resolution; for example, the reduced-resolution training sample image may be converted through the residual network in the model to be trained into a training sample image whose resolution converges toward the original.
After the training sample image with converged resolution is obtained, the calculating subunit 3082 may call the feature network in the model to be trained and use it to calculate third feature position information of the preset object in the resolution-converged training sample image, where the preset object is consistent with the preset object mentioned above; for example, the preset object may include a human face, a vehicle, or the like. The third feature position information is similar to the first feature position information; for example, when the preset object is a human face, the third feature position information may include position information of features such as the eyes, eyebrows, nose, mouth, and face contour, and the position information of each feature may include position information of a plurality of feature points. Alternatively, the third feature position information may be the position information of the feature points of each face organ generated by locating, through a face recognition technology, each face organ such as the eyes, nose, eyebrows, and mouth on the face in the resolution-converged training sample image. At this time, the updating subunit 3083 may update the parameters of the model to be trained according to the first feature position information, the first segmentation region information, the second feature position information, the second segmentation region information, and the third feature position information to obtain the image processing model.
In some embodiments, the updating subunit 3083 may include a first calculating module, a second calculating module, a determining module, an obtaining module, an updating module, and the like, and specifically may be as follows:
the first calculation module is used for calculating an error between the first characteristic position information and the second characteristic position information by adopting a priori estimation network in the model to be trained to obtain a characteristic position error, calculating an error between the first segmentation area information and the second segmentation area information to obtain a segmentation area error, and setting the characteristic position error and the segmentation area error as the first characteristic error;
the second calculation module is used for calculating the error between the first characteristic position information and the third characteristic position information by adopting a characteristic network in the model to be trained to obtain a second characteristic error;
the determining module is used for determining an image error between the resolution-converged training sample image and the original training sample image by using the residual network in the model to be trained;
the acquisition module is used for acquiring gradient information according to the first characteristic error, the second characteristic error and the image error;
and the updating module is used for updating the parameters of the model to be trained according to the gradient information to obtain the image processing model.
For example, the first calculation module may call the prior estimation network in the model to be trained and, through it, compare the first feature position information with the second feature position information to obtain a feature position error, and the first segmentation region information with the second segmentation region information to obtain a segmentation region error; the feature position error and the segmentation region error are set as the first feature error. That is, the first calculation module may calculate the first feature error between the first feature information and the second feature information according to the above formula (1).
The second calculation module can call the feature network in the model to be trained and use it to calculate the feature position error between the first feature position information and the third feature position information according to formula (2), obtaining the second feature error.
The determining module can call the residual network in the model to be trained and use it to determine the image error between the resolution-converged training sample image and the original training sample image, where the image error may include a pixel error, a discrimination error, an adversarial error, and the like. The pixel error may be the error between corresponding pixel values of the resolution-converged training sample image and the original training sample image, and the adversarial error may be the error generated by the discrimination network to oppose the residual network and the prior estimation network. The discrimination error may be the error in judging whether an image is the resolution-converged training sample image or the original training sample image: for example, for a training sample image to be identified, when it is judged to be the resolution-converged training sample image its identifier is set to 0, and when it is judged to be the original training sample image its identifier is set to 1; the obtained identifier is then compared with the true value to obtain the discrimination error.
In some embodiments, the determining module may be specifically configured to: acquire the pixel error between the resolution-converged training sample image and the original training sample image by using the residual network in the model to be trained; discriminate between the resolution-converged training sample image and the original training sample image by using the discrimination network in the model to be trained to obtain the discrimination error and the adversarial error; and set the pixel error, the discrimination error, and the adversarial error as the image error.
For example, the determining module may calculate the pixel error according to the above formula (3), the discrimination error according to the above formula (4), and the adversarial error according to the above formula (5).
After the respective errors are obtained, gradient information may be acquired according to the first feature error, the second feature error, and the image error. In some embodiments, the obtaining module may be specifically configured to: construct a first loss function based on the first feature error, the second feature error, the pixel error, and the adversarial error; perform gradient descent on the first loss function to obtain first gradient information; construct a second loss function based on the discrimination error and perform gradient descent on it to obtain second gradient information; and set the first gradient information and the second gradient information as the gradient information.
Specifically, the obtaining module may construct the first loss function based on the first feature error, the second feature error, the pixel error in the image error, and the adversarial error according to formula (6). Gradient descent may then be performed on the first loss function to minimize it, obtaining the first gradient information; the manner of gradient descent can be set flexibly according to actual needs and is not limited here.
The obtaining module may construct the second loss function based on the discrimination error in the image error according to formula (7), and perform gradient descent on the second loss function to minimize it, obtaining the second gradient information.
After the first gradient information and the second gradient information are obtained, the updating module can update the parameters of the model to be trained according to them, so as to adjust the parameters or weights of the model to be trained to appropriate values, thereby obtaining the image processing model; the generation network (comprising the residual network and the prior estimation network) and the discrimination network in the model to be trained may be updated alternately.
The extracting unit 302 is configured to perform feature extraction on a target object in an image to be processed to obtain target feature position information.
In some embodiments, the extraction unit 302 is specifically configured to: extracting the characteristics of a target object in an image to be processed by adopting an image processing model to obtain target characteristic position information; alternatively, the extracting unit 302 is specifically configured to: determining an object identifier and a feature identifier of a target object, searching the target object from the image to be processed according to the object identifier, and extracting the feature and the position of the target object according to the feature identifier when the target object is searched, so as to obtain target feature position information.
Specifically, after obtaining the image processing model, the resolution of the image may be converted through the image processing model, first, the extracting unit 302 calls a priori estimation network in the image processing model, and performs feature extraction on the target object in the image to be processed by using the priori estimation network, for example, when the target object is a human face, the priori estimation network may perform feature extraction on human face organs of the human face in the image to be processed, so as to obtain feature position information of the human face organs, such as eyes, eyebrows, nose, and mouth. When the target object is a vehicle, the prior estimation network can be adopted to extract the characteristics of the vehicle in the image to be processed, and characteristic position information of wheels, license plates, windows, logos, lamps, mirrors and the like is obtained.
Alternatively, the extracting unit 302 may obtain the target feature position information in other manners. For example, the extracting unit 302 may first determine the object identifier and the feature identifier of the target object, where the object identifier may include one or more identifiers for uniquely identifying each target object, the feature identifier may include one or more identifiers for uniquely identifying the one or more features contained in the target object, and the object identifier and the feature identifier may be names or numbers composed of digits and/or letters, contour identifiers, or the like. The target object can then be searched for in the image to be processed according to the object identifier; when the target object is not found, operations such as extracting the features and positions of the target object need not be executed, and when the target object is found, the features and positions of the target object can be extracted according to the feature identifier to obtain the target feature position information.
A dividing unit 303, configured to perform feature region division on the target object to obtain target divided region information.
In some embodiments, the segmentation unit 303 is specifically configured to: carrying out characteristic region segmentation on the target object by adopting an image processing model to obtain target segmentation region information; alternatively, the dividing unit 303 is specifically configured to: determining an object identifier and a feature identifier of a target object, searching the target object from the image to be processed according to the object identifier, and when the target object is searched, segmenting the features of the target object and the region where the target object is located according to the feature identifier to obtain target segmentation region information.
Specifically, the segmentation unit 303 may call the prior estimation network in the image processing model and use it to perform feature region segmentation on the target object in the image to be processed: for example, when the target object is a human face, the prior estimation network segments the face organs in the image to be processed to obtain segmentation region information for organs such as the eyes, eyebrows, nose, and mouth, and when the target object is a vehicle, the prior estimation network segments the characteristic regions of the vehicle in the image to be processed to obtain segmentation region information for the wheels, license plate, windows, logo, lamps, mirrors, and the like. Alternatively, the segmentation unit 303 may obtain the target segmentation region information in other manners; for example, it may search for the target object in the image to be processed according to the object identifier and, when the target object is found, segment the features of the target object and the regions where they are located according to the feature identifier to obtain the target segmentation region information.
And the increasing unit 304 is configured to increase, by using a preset image processing model, an original resolution of the image to be processed based on the target feature position information and the target segmentation area information.
The image processing model is trained on the feature position information and segmentation region information of preset objects in a plurality of training sample images. Since the image processing model is obtained by training on the feature position information and segmentation region information of the preset object in training sample images of different resolutions, the increasing unit 304 may use the target feature position information and target segmentation region information of the target object in the image to be processed, obtained through the image processing model, to increase the original resolution of the image to be processed, that is, to convert the low-resolution image into a high-resolution image. The target object corresponds to the preset object described above; for example, the preset object may include a human face or a vehicle. The target feature position information is analogous to the first feature position information, and the target segmentation region information is analogous to the first segmentation region information.
In some embodiments, when the target object is a human face, the target feature position information is the face organ position information and the target segmentation region information is the face organ segmentation region information, and the increasing unit 304 may be specifically configured to: increase the original resolution of the image to be processed by using the preset image processing model based on the face organ position information and the face organ segmentation region information.
The face organ position information may include position information of features such as eyes, eyebrows, a nose, a mouth, a face contour and the like, the position information of each feature may include position information of a plurality of feature points, and the face organ segmentation region information may include segmentation region information such as hair, eyes, eyebrows, a nose, a mouth, a face and the like.
Through the image processing model, the increasing unit 304 can increase the original resolution of the image to be processed to a preset resolution value based on the face organ position information and the face organ segmentation region information, obtaining the processed image; the preset resolution value can be set flexibly according to actual needs. For example, as shown in fig. 5, multiple pixel points can be restored from each pixel point in the low-resolution image to obtain a high-resolution image, achieving a high-multiple super-resolution effect; the restoration of details such as face organs and contours is effectively improved, the image quality is greatly improved, and the image displays better.
In some embodiments, when the target object is a vehicle, the target feature position information is vehicle feature position information and the target segmentation region information is vehicle segmentation region information, and the increasing unit 304 may be specifically configured to: increase the original resolution of the image to be processed by adopting a preset image processing model based on the vehicle feature position information and the vehicle segmentation region information.
The vehicle characteristic position information may include position information of vehicle characteristics such as wheels, license plates, windows, logos, lamps and mirrors, and the vehicle segmentation area information may include vehicle characteristic segmentation area information such as wheels, license plates, windows, logos, lamps and mirrors. The increasing unit 304 may increase the original resolution of the image to be processed based on the vehicle characteristic position information and the vehicle segmentation area information through the image processing model, so as to obtain a processed image. The original resolution of the image to be processed can be increased to a preset resolution value, and the preset resolution value can be flexibly set according to actual needs.
As can be seen from the above, in the embodiment of the present invention, when the resolution of an image needs to be increased, the first obtaining unit 301 may obtain the image to be processed, the extracting unit 302 performs feature extraction on the target object in the image to be processed to obtain target feature position information, and the segmenting unit 303 performs feature region segmentation on the target object to obtain target segmentation region information; then, using a preset image processing model, the increasing unit 304 increases the original resolution of the image to be processed based on the target feature position information and the target segmentation region information, that is, converts the low-resolution image to be processed into a high-resolution image. Because the scheme increases the original resolution based on the target feature position information and the target segmentation region information of the target object itself, the resolution can be increased accurately, which improves the definition and overall quality of the processed image.
An embodiment of the present invention further provides a network device, which may be a server, a terminal, or similar equipment. Fig. 11 is a schematic structural diagram of a network device according to an embodiment of the present invention. Specifically:
the network device may include components such as a processor 401 with one or more processing cores, a memory 402 with one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the structure shown in fig. 11 does not constitute a limitation of the network device, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
the processor 401 is the control center of the network device. It connects the various parts of the entire network device using various interfaces and lines, and performs the various functions of the network device and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the network device as a whole. Optionally, the processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area: the program storage area may store the operating system, application programs required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created according to the use of the network device, and the like. Further, the memory 402 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The network device further includes a power supply 403 for supplying power to each component. Preferably, the power supply 403 is logically connected to the processor 401 through a power management system, so that functions such as managing charging, discharging, and power consumption are implemented through the power management system. The power supply 403 may also include one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The network device may also include an input unit 404, where the input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the network device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the network device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
acquiring an image to be processed; performing feature extraction on a target object in the image to be processed to obtain target feature position information; performing feature region segmentation on the target object to obtain target segmentation region information; and adopting a preset image processing model to increase the original resolution of the image to be processed based on the target feature position information and the target segmentation region information, wherein the image processing model is obtained by training on the feature position information and segmentation region information of a preset object in a plurality of training sample images.
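Purely for illustration, the four steps just listed might be strung together as follows, assuming the PriorEstimationNet and ConditionedSRNet sketched earlier; the function name and signatures are hypothetical:

```python
# Hedged end-to-end inference sketch under the assumptions above.
import torch

def increase_resolution(image, prior_net, sr_net):
    """image: (1, 3, H, W) low-resolution tensor -> (1, 3, s*H, s*W) tensor."""
    with torch.no_grad():
        heatmaps, parsing = prior_net(image)     # feature extraction + region segmentation
        return sr_net(image, heatmaps, parsing)  # resolution increase guided by the priors
```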
Optionally, before the step of using a preset image processing model to increase the original resolution of the image to be processed based on the target feature position information and the target segmentation area information, the method further includes:
acquiring a plurality of training sample images, and determining first characteristic position information and first segmentation area information of a preset object in each training sample image; reducing the original resolution of each training sample image to a preset value to obtain a training sample image with reduced resolution; acquiring second characteristic position information and second segmentation area information of a preset object in each training sample image with the resolution being reduced; and training a preset model to be trained according to the first characteristic position information, the first segmentation area information, the second characteristic position information and the second segmentation area information to obtain an image processing model.
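For illustration, the training-pair preparation just described might look as follows, assuming bicubic downscaling and a 4x preset value (both assumptions of the sketch, not fixed by this embodiment):

```python
# Hedged sketch of training-pair preparation: keep the first info for the HR
# sample, downscale to the preset value, estimate the second info on the copy.
import torch.nn.functional as F

def make_training_pair(hr_image, hr_landmarks, hr_parsing, prior_net, preset_scale=4):
    lr_image = F.interpolate(hr_image, scale_factor=1 / preset_scale,
                             mode='bicubic', align_corners=False)
    lr_landmarks, lr_parsing = prior_net(lr_image)  # second feature / segmentation info
    return lr_image, (hr_landmarks, hr_parsing), (lr_landmarks, lr_parsing)
```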
Optionally, the feature position information further includes third feature position information, and the step of training a preset model to be trained according to the first feature position information, the first segmentation area information, the second feature position information, and the second segmentation area information to obtain an image processing model includes:
adopting a residual error network in the model to be trained, converging the resolution of each training sample image with the lowered resolution to the original resolution of the training sample image based on the second characteristic position information and the second division area information, and obtaining a training sample image with the converged resolution; calculating third characteristic position information of a preset object in the training sample image after resolution convergence by adopting a characteristic network in the model to be trained; and updating the parameters of the model to be trained according to the first characteristic position information, the first segmentation area information, the second characteristic position information, the second segmentation area information and the third characteristic position information to obtain an image processing model.
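A minimal sketch of one such training step follows. It assumes the position and segmentation information are held in tensors of matching shape, uses simple mean-squared errors with unit weights, and passes in hypothetical res_net and feat_net modules; none of these choices are prescribed by this embodiment:

```python
# Hedged sketch of one parameter update: reconstruct, re-detect, then step.
import torch.nn.functional as F

def training_step(lr, hr, first_pos, first_seg, second_pos, second_seg,
                  res_net, feat_net, optimizer):
    sr = res_net(lr, second_pos, second_seg)    # resolution converged to the original
    third_pos = feat_net(sr)                    # third feature position information
    loss = (F.mse_loss(sr, hr)                  # image error
            + F.mse_loss(second_pos, first_pos) # feature position error
            + F.mse_loss(second_seg, first_seg) # segmentation region error
            + F.mse_loss(third_pos, first_pos)) # second feature error
    optimizer.zero_grad()
    loss.backward()                             # gradient information
    optimizer.step()                            # update the model parameters
    return loss.item()
```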
As can be seen from the above, when the resolution of an image needs to be increased, the embodiment of the present invention may acquire the image to be processed, perform feature extraction on the target object in the image to be processed to obtain target feature position information, and perform feature region segmentation on the target object to obtain target segmentation region information; then, a preset image processing model is adopted to increase the original resolution of the image to be processed based on the target feature position information and the target segmentation region information, that is, to convert the low-resolution image to be processed into a high-resolution image. Because the scheme increases the original resolution based on the target feature position information and the target segmentation region information of the target object itself, the resolution can be increased accurately, which improves the definition and overall quality of the processed image.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and parts that are not described in detail in a certain embodiment may refer to the above detailed description of the image processing method, and are not described again here.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present invention provide a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute steps of any one of the image processing methods provided by the embodiments of the present invention. For example, the instructions may perform the steps of:
acquiring an image to be processed; performing feature extraction on a target object in the image to be processed to obtain target feature position information; performing feature region segmentation on the target object to obtain target segmentation region information; and adopting a preset image processing model to increase the original resolution of the image to be processed based on the target feature position information and the target segmentation region information, the image processing model being obtained by training on the feature position information and segmentation region information of a preset object in a plurality of training sample images.
Optionally, before the step of using a preset image processing model to increase the original resolution of the image to be processed based on the target feature position information and the target segmentation area information, the method further includes:
acquiring a plurality of training sample images, and determining first feature position information and first segmentation area information of a preset object in each training sample image; reducing the original resolution of each training sample image to a preset value to obtain training sample images with reduced resolution; acquiring second feature position information and second segmentation area information of the preset object in each training sample image with the reduced resolution; and training a preset model to be trained according to the first feature position information, the first segmentation area information, the second feature position information, and the second segmentation area information to obtain an image processing model.
Optionally, the feature position information further includes third feature position information, and the step of training a preset model to be trained according to the first feature position information, the first segmentation area information, the second feature position information, and the second segmentation area information to obtain an image processing model includes:
adopting a residual error network in the model to be trained, based on the second characteristic position information and the second segmentation area information, converging the resolution of each training sample image with the lowered resolution to the original resolution of the training sample image, and obtaining the training sample image with the converged resolution; calculating third characteristic position information of a preset object in the training sample image after resolution convergence by adopting a characteristic network in the model to be trained; and updating the parameters of the model to be trained according to the first characteristic position information, the first segmentation region information, the second characteristic position information, the second segmentation region information and the third characteristic position information to obtain an image processing model.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disc, or the like.
Since the instructions stored in the storage medium can execute the steps in any image processing method provided in the embodiment of the present invention, the beneficial effects that can be achieved by any image processing method provided in the embodiment of the present invention can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The foregoing has described in detail an image processing method, apparatus, and storage medium provided by the embodiments of the present invention. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, those skilled in the art may, following the idea of the present invention, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (15)

1. An image processing method, characterized by comprising:
acquiring an image to be processed;
performing feature extraction on a target object in the image to be processed to obtain target feature position information;
carrying out characteristic region segmentation on the target object to obtain target segmentation region information;
increasing the original resolution of the image to be processed by adopting a preset image processing model based on the target characteristic position information and the target segmentation region information, wherein the image processing model is obtained by updating the parameters of the model to be trained by the gradient information determined by the characteristic position information and the segmentation region information of a preset object in a plurality of training sample images;
the feature position information comprises first feature position information, second feature position information and third feature position information, and the segmentation region information comprises first segmentation region information and second segmentation region information;
the gradient information is obtained based on the first characteristic error, the second characteristic error and the image error;
the first characteristic error comprises a characteristic position error and a segmentation area error, the characteristic position error is obtained by calculating an error between the first characteristic position information and the second characteristic position information by adopting a priori estimation network in the model to be trained, and the segmentation area error is obtained by calculating an error between the first segmentation area information and the second segmentation area information;
the second feature error is obtained by calculating the error between the first feature position information and the third feature position information by adopting a feature network in the model to be trained;
and the image error is the image error between the training sample image with the converged resolution and the original training sample image determined by adopting a residual error network in the model to be trained.
2. The image processing method according to claim 1, wherein the feature position information includes first feature position information and second feature position information, the segmentation region information includes first segmentation region information and second segmentation region information, and before the step of using a preset image processing model to increase the original resolution of the image to be processed based on the target feature position information and the target segmentation region information, the method further comprises:
acquiring a plurality of training sample images, and determining first characteristic position information and first segmentation area information of a preset object in each training sample image;
reducing the original resolution of each training sample image to a preset value to obtain a training sample image with reduced resolution;
acquiring second characteristic position information and second segmentation area information of a preset object in each training sample image with the resolution being reduced;
and training a preset model to be trained according to the first characteristic position information, the first segmentation area information, the second characteristic position information and the second segmentation area information to obtain the image processing model.
3. The image processing method according to claim 2, wherein the step of acquiring the second feature position information and the second divided area information of the preset object in each of the reduced-resolution training sample images includes:
and calculating the second feature position information and the second segmentation area information of the preset object in each training sample image with the reduced resolution by adopting the prior estimation network in the model to be trained.
4. The image processing method according to claim 2, wherein the feature position information further includes third feature position information, and the step of training a preset model to be trained based on the first feature position information, the first segmentation area information, the second feature position information, and the second segmentation area information to obtain the image processing model includes:
adopting a residual error network in the model to be trained based on the second characteristic position information and the second segmentation area information, converging the resolution of each training sample image with the lowered resolution to the original resolution of the training sample image, and obtaining the training sample image with the converged resolution;
calculating third characteristic position information of a preset object in the training sample image after resolution convergence by adopting a characteristic network in the model to be trained;
and updating the parameters of the model to be trained according to the first characteristic position information, the first segmentation region information, the second characteristic position information, the second segmentation region information and the third characteristic position information to obtain an image processing model.
5. The image processing method according to claim 4, wherein the step of updating the parameters of the model to be trained according to the first feature position information, the first segmentation area information, the second feature position information, the second segmentation area information, and the third feature position information to obtain the image processing model comprises:
calculating an error between the first characteristic position information and the second characteristic position information by adopting a priori estimation network in the model to be trained to obtain a characteristic position error, calculating an error between the first segmentation region information and the second segmentation region information to obtain a segmentation region error, and setting the characteristic position error and the segmentation region error as a first characteristic error;
calculating an error between the first characteristic position information and the third characteristic position information by adopting a characteristic network in the model to be trained to obtain a second characteristic error;
determining an image error between the training sample image with the converged resolution and the original training sample image by adopting a residual error network in the model to be trained;
obtaining gradient information according to the first characteristic error, the second characteristic error and the image error;
and updating the parameters of the model to be trained according to the gradient information to obtain an image processing model.
6. The method according to claim 5, wherein the step of determining the image error between the resolution-converged training sample image and the original training sample image by using a residual network in the model to be trained comprises:
acquiring a pixel error between the training sample image with the converged resolution and the original training sample image by adopting a residual error network in the model to be trained;
discriminating between the training sample image with the converged resolution and the original training sample image by adopting a discrimination network in the model to be trained to obtain a discrimination error and an adversarial error;
setting the pixel error, the discrimination error, and the adversarial error as the image error.
7. The method according to claim 6, wherein the step of obtaining gradient information from the first and second feature errors and the image error comprises:
constructing a first loss function based on the first feature error, the second feature error, the pixel error, and the adversarial error;
carrying out gradient descent on the first loss function to obtain first gradient information;
constructing a second loss function based on the discrimination error, and performing gradient descent on the second loss function to obtain second gradient information;
setting the first gradient information and the second gradient information as gradient information.
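For illustration only, the error combination of claims 5 to 7 can be sketched as follows. Binary cross-entropy for the adversarial and discrimination terms and unit loss weights are assumptions of the sketch, since the claims name the errors but not their exact form:

```python
# Hedged sketch of the two-loss construction: one loss yields the first
# gradient information, the discrimination loss yields the second.
import torch
import torch.nn.functional as F

def build_losses(first_feature_error, second_feature_error, pixel_error,
                 d_real_logits, d_fake_logits):
    real = torch.ones_like(d_real_logits)
    fake = torch.zeros_like(d_fake_logits)
    # adversarial error: the generator side wants its outputs judged real
    adversarial_error = F.binary_cross_entropy_with_logits(d_fake_logits, real)
    # discrimination error: the discrimination network separates real from fake
    discrimination_error = (F.binary_cross_entropy_with_logits(d_real_logits, real)
                            + F.binary_cross_entropy_with_logits(d_fake_logits, fake))
    first_loss = first_feature_error + second_feature_error + pixel_error + adversarial_error
    second_loss = discrimination_error
    return first_loss, second_loss  # gradient descent on each yields the two gradients
```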
8. The image processing method according to any one of claims 1 to 7, wherein the step of performing feature extraction on the target object in the image to be processed to obtain target feature position information includes:
performing feature extraction on a target object in the image to be processed by adopting the image processing model to obtain target feature position information; or,
determining an object identifier and a feature identifier of a target object, searching the target object from the image to be processed according to the object identifier, and extracting the feature and the position of the target object according to the feature identifier when the target object is found to obtain target feature position information.
9. The image processing method according to any one of claims 1 to 7, wherein the step of performing feature region segmentation on the target object to obtain target segmentation region information includes:
carrying out characteristic region segmentation on the target object by adopting the image processing model to obtain target segmentation region information; or,
determining an object identifier and a feature identifier of a target object, searching the target object from the image to be processed according to the object identifier, and when the target object is found, segmenting the features of the target object and the region where the target object is located according to the feature identifier to obtain target segmentation region information.
10. The image processing method according to any one of claims 1 to 7, wherein when the target object is a human face, the target feature position information is human face organ position information, the target segmentation region information is human face organ segmentation region information, and the step of using a preset image processing model to increase the original resolution of the image to be processed based on the target feature position information and the target segmentation region information includes:
and increasing the original resolution of the image to be processed by adopting a preset image processing model based on the position information of the human face organ and the information of the human face organ segmentation area.
11. The image processing method according to any one of claims 1 to 7, wherein when the target object is a vehicle, the target feature position information is vehicle feature position information, the target segmentation area information is vehicle segmentation area information, and the step of using a preset image processing model to increase the original resolution of the image to be processed based on the target feature position information and the target segmentation area information comprises:
and increasing the original resolution of the image to be processed by adopting a preset image processing model based on the vehicle characteristic position information and the vehicle segmentation region information.
12. An image processing apparatus characterized by comprising:
the first acquisition unit is used for acquiring an image to be processed;
the extraction unit is used for extracting the characteristics of the target object in the image to be processed to obtain target characteristic position information;
the segmentation unit is used for carrying out characteristic region segmentation on the target object to obtain target segmentation region information;
the increasing unit is used for increasing the original resolution of the image to be processed by adopting a preset image processing model based on the target feature position information and the target segmentation area information, wherein the image processing model is obtained by updating the parameters of a model to be trained according to gradient information determined by the feature position information and the segmentation area information of a preset object in a plurality of training sample images;
the feature position information comprises first feature position information, second feature position information and third feature position information, and the segmentation area information comprises first segmentation area information and second segmentation area information;
the gradient information is obtained based on the first characteristic error, the second characteristic error and the image error;
the first characteristic error comprises a characteristic position error and a segmentation area error, the characteristic position error is obtained by calculating an error between the first characteristic position information and the second characteristic position information by adopting a priori estimation network in the model to be trained, and the segmentation area error is obtained by calculating an error between the first segmentation area information and the second segmentation area information;
the second feature error is obtained by calculating the error between the first feature position information and the third feature position information by adopting a feature network in the model to be trained;
and the image error is the image error between the training sample image with the converged resolution and the original training sample image determined by adopting a residual error network in the model to be trained.
13. The image processing apparatus according to claim 12, wherein the feature position information includes first feature position information and second feature position information, and the divided region information includes first divided region information and second divided region information, the image processing apparatus further comprising:
the device comprises a determining unit, a judging unit and a judging unit, wherein the determining unit is used for acquiring a plurality of training sample images and determining first characteristic position information and first segmentation area information of a preset object in each training sample image;
the reducing unit is used for reducing the original resolution of each training sample image to a preset value to obtain a training sample image with reduced resolution;
the second acquisition unit is used for acquiring second characteristic position information and second segmentation area information of a preset object in each training sample image with the lowered resolution;
and the training unit is used for training a preset model to be trained according to the first characteristic position information, the first segmentation area information, the second characteristic position information and the second segmentation area information to obtain the image processing model.
14. The image processing apparatus according to claim 13, wherein the feature position information further includes third feature position information, the training unit includes:
a convergence subunit, configured to converge, by using a residual network in the model to be trained, the resolution of each reduced-resolution training sample image to the original resolution of the training sample image based on the second feature position information and the second segmentation area information, so as to obtain a training sample image with a converged resolution;
the calculating subunit is configured to calculate, by using the feature network in the model to be trained, third feature position information of a preset object in the training sample image after the resolution is converged;
and the updating subunit is used for updating the parameters of the model to be trained according to the first characteristic position information, the first segmentation area information, the second characteristic position information, the second segmentation area information and the third characteristic position information to obtain an image processing model.
15. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the image processing method according to any one of claims 1 to 11.