CN113516592A - Image processing method, model training method, device and equipment - Google Patents

Image processing method, model training method, device and equipment

Info

Publication number
CN113516592A
CN113516592A
Authority
CN
China
Prior art keywords
image
processed
semantic
semantic graph
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010280796.6A
Other languages
Chinese (zh)
Inventor
陈超锋
張磊
李晓明
林宪晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010280796.6A
Publication of CN113516592A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4076 Scaling based on super-resolution, using the original low-resolution images to iteratively correct the high-resolution images
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an image processing method, a model training method, a device and equipment. The method comprises the following steps: acquiring a face image to be processed; determining a semantic graph corresponding to the face image, wherein the semantic graph comprises semantic information identifying the face structure in the face image; and obtaining a target image corresponding to the face image based on the semantic graph and the face image, wherein the definition of the target image is different from that of the face image. In the technical scheme provided by the embodiment, the face image is processed by determining its semantic graph, and the target image is obtained by fully exploiting and analyzing the semantic information of the face structure in the face image. The image can therefore be processed without first obtaining a high-quality reference image, which ensures the quality and efficiency of face image processing, reduces its difficulty, and allows the method to be applied in a wide range of application scenarios.

Description

Image processing method, model training method, device and equipment
Technical Field
The invention relates to the technical field of image processing, and in particular to an image processing method, a model training method, and corresponding devices and equipment.
Background
In the field of image processing, enhancing and sharpening blurred face images in pictures or videos has wide application. For example, in security monitoring, enhancing a low-definition face image can assist in identifying the people being monitored; likewise, restoring the face images in old photographs and old films and television dramas not only improves image quality but also improves the viewing experience of the audience.
In many current face restoration schemes, using the details of a high-definition reference face image to compensate a low-definition face image is a feasible technical solution, but it is subject to the following constraint: finding a high-definition reference face image whose structure and appearance are similar to those of the low-definition face image is very difficult, and the degree of structural and appearance similarity significantly affects the enhancement result.
Disclosure of Invention
The embodiment of the invention provides an image processing method, a model training method, a device and equipment, which can process an image without acquiring a high-definition face image, ensure the quality and effect of image processing, reduce the difficulty of processing the image, and enable the image processing method to be widely applied in various application scenarios.
In a first aspect, an embodiment of the present invention provides an image processing method, including:
acquiring a face image to be processed;
determining a semantic graph corresponding to the face image, wherein the semantic graph comprises semantic information used for identifying a face structure in the face image;
and obtaining a target image corresponding to the face image based on the semantic graph and the face image, wherein the definition of the target image is different from that of the face image.
In a second aspect, an embodiment of the present invention provides an image processing apparatus, including:
the third acquisition module is used for acquiring a face image to be processed;
a third determining module, configured to determine a semantic graph corresponding to the face image, where the semantic graph includes semantic information for identifying a face structure in the face image;
and the third processing module is used for obtaining a target image corresponding to the face image based on the semantic graph and the face image, and the definition of the target image is different from that of the face image.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the image processing method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium for storing a computer program, where the computer program is used to make a computer implement the image processing method in the first aspect when executed.
In a fifth aspect, an embodiment of the present invention provides an image processing method, including:
acquiring an image to be processed;
determining a semantic graph corresponding to the image to be processed, wherein the semantic graph is used for identifying semantic information included in the image to be processed;
and obtaining a target image corresponding to the image to be processed based on the semantic graph and the image to be processed, wherein the definition of the target image is different from that of the image to be processed.
In a sixth aspect, an embodiment of the present invention provides an image processing apparatus, including:
the first acquisition module is used for acquiring an image to be processed;
the first determination module is used for determining a semantic graph corresponding to the image to be processed, and the semantic graph is used for identifying semantic information included in the image to be processed;
the first processing module is used for obtaining a target image corresponding to the image to be processed based on the semantic graph and the image to be processed, and the definition of the target image is different from that of the image to be processed.
In a seventh aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the image processing method of the fifth aspect.
In an eighth aspect, an embodiment of the present invention provides a computer storage medium for storing a computer program, where the computer program is used to enable a computer to execute the image processing method in the fifth aspect.
In a ninth aspect, an embodiment of the present invention provides a model training method, including:
acquiring a first image and a reference image corresponding to the first image, wherein the definition of the reference image is different from that of the first image;
determining a semantic map corresponding to the first image, the semantic map identifying semantic information included in the first image;
and performing learning and training on a spatially adaptive convolution residual network based on the semantic graph, the reference image and the first image to obtain a machine learning model, wherein the machine learning model is used for determining a second image corresponding to the first image, and the definition of the second image is different from that of the first image.
In a tenth aspect, an embodiment of the present invention provides a model training apparatus, including:
the second acquisition module is used for acquiring a first image and a reference image corresponding to the first image, wherein the definition of the reference image is different from that of the first image;
a second determining module for determining a semantic map corresponding to the first image, the semantic map being used to identify semantic information included in the first image;
and the second training module is used for performing learning and training on a spatially adaptive convolution residual network based on the semantic graph, the reference image and the first image to obtain a machine learning model, wherein the machine learning model is used for determining a second image corresponding to the first image, and the definition of the second image is different from that of the first image.
In an eleventh aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the model training method in the ninth aspect.
In a twelfth aspect, an embodiment of the present invention provides a computer storage medium for storing a computer program, where the computer program is used to make a computer implement the model training method in the ninth aspect when executed.
In the image processing method, the model training method, the device and the equipment provided by this embodiment, the semantic graph corresponding to the face image is determined and the face image is processed according to it to obtain the corresponding target image, so the image can be processed without obtaining a high-quality reference image, ensuring the quality and efficiency of face image processing. Moreover, because the target image is obtained by fully combining and analyzing the semantic information of the face structure in the face image, the effect of the processing is ensured and its difficulty is reduced, making the method suitable for analyzing and processing face images in various application scenarios and effectively improving its practicability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention;
fig. 2 is a schematic view of an application scenario of an image processing method according to an embodiment of the present invention;
fig. 3 is a schematic view of an application scenario of an image processing method according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of determining a semantic graph corresponding to the image to be processed according to an embodiment of the present invention;
fig. 5 is a schematic flowchart of determining a semantic graph corresponding to the image to be processed based on the image coding information according to the embodiment of the present invention;
fig. 6 is a schematic flowchart of a process of obtaining a target image corresponding to the image to be processed based on the semantic graph and the image to be processed according to the embodiment of the present invention;
fig. 7 is a schematic diagram of obtaining a target image corresponding to the image to be processed based on the semantic graph and the image to be processed according to the embodiment of the present invention;
FIG. 8 is a schematic diagram of a model training method according to an embodiment of the present invention;
fig. 9 is a schematic diagram of performing learning and training on a spatially adaptive convolution residual network based on the semantic graph, the reference image, and the first image to obtain a machine learning model according to the embodiment of the present invention;
FIG. 10 is a diagram illustrating an image processing method according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating another exemplary image processing method according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram of an electronic device corresponding to the image processing apparatus provided in the embodiment shown in fig. 12;
FIG. 14 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of an electronic device corresponding to the model training apparatus provided in the embodiment shown in FIG. 14;
FIG. 16 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present invention;
fig. 17 is a schematic structural diagram of an electronic device corresponding to the image processing apparatus provided in the embodiment shown in fig. 16.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and "a" and "an" generally include at least two, but do not exclude at least one, unless the context clearly dictates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association between objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
The word "if", as used herein, may be interpreted as "when" or "upon" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined" or "in response to a determination" or "when (a stated condition or event) is detected" or "in response to detecting (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such an article or system. Without further limitation, an element introduced by the phrase "comprising a(n)" does not exclude the presence of other like elements in an article or system that includes that element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
In order to facilitate understanding of the technical solutions of the present application, the following briefly describes the prior art:
With the popularization of mobile networks and broadband applications, users have increasingly high requirements for the quality of videos and photographs. However, old photographs and classic film and television works shot in earlier times are often blurred and offer a poor viewing experience. Face images occupy a very important place in such material: film and television works and old photographs contain many scenes with people, and users are especially sensitive to the definition of images of people.
Common deep-learning image restoration algorithms based on Convolutional Neural Networks (CNNs) are networks trained on image pairs (a low-quality image and its high-quality counterpart). Such algorithms include at least one of the following: the Super-Resolution Generative Adversarial Network (SRGAN), the deep Residual Channel Attention Network (RCAN), the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN), and so on.
However, when image processing is performed using the above-described trained convolutional neural network, there are the following disadvantages:
(1) The low-quality pictures used in network training are generally obtained by manual down-sampling, which easily makes the trained network unsuitable for real low-quality face pictures.
(2) Networks produced by such training often cannot simultaneously handle low-quality face images with different degrees of degradation. For example, the network learns from preset low-quality images whose features lie within a certain range; it can therefore only process images within that range, and when that range is exceeded, the accuracy and reliability of the processing cannot be guaranteed.
(3) The prior knowledge of the face structure is not fully utilized, so the accuracy and reliability of image processing still leave room for improvement.
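To illustrate point (1) above, the sketch below fabricates a training pair by manual down-sampling. The box-filter degradation here is a hypothetical stand-in for whatever down-sampling a given pipeline uses; it is precisely this kind of synthetic degradation that fails to match real-world blur.

```python
import numpy as np

def synthesize_low_quality(hq: np.ndarray, factor: int = 4) -> np.ndarray:
    """Fabricate a 'low-quality' image from a high-quality one by box
    down-sampling followed by nearest-neighbour up-sampling.  This is the
    kind of manual degradation criticized in point (1): it rarely matches
    the blur found in real low-quality face pictures."""
    h, w = hq.shape
    h2, w2 = h // factor * factor, w // factor * factor
    cropped = hq[:h2, :w2].astype(np.float64)
    # box down-sample: average each factor-by-factor block
    low = cropped.reshape(h2 // factor, factor, w2 // factor, factor).mean(axis=(1, 3))
    # nearest-neighbour up-sample back to the cropped size
    return np.repeat(np.repeat(low, factor, axis=0), factor, axis=1)

# toy 8x8 "high-quality" image; (img, lq) would form one training pair
img = np.arange(64, dtype=np.float64).reshape(8, 8)
lq = synthesize_low_quality(img, factor=4)
```

A network trained only on pairs like `(img, lq)` learns to invert box blur, not the compound degradations (noise, compression, motion blur) of real footage.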
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The features of the embodiments and examples described below may be combined with each other without conflict between the embodiments.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention; referring to fig. 1, the present embodiment provides an image processing method, and the execution subject of the method may be an image processing apparatus, and it is understood that the image processing apparatus may be implemented as software, or a combination of software and hardware. Specifically, the processing method may include:
step S101: and acquiring an image to be processed.
Step S102: determining a semantic map corresponding to the image to be processed, wherein the semantic map is used for identifying semantic information included in the image to be processed.
Step S103: and obtaining a target image corresponding to the image to be processed based on the semantic graph and the image to be processed, wherein the definition of the target image is different from that of the image to be processed.
The following is a detailed description of the above steps:
step S101: and acquiring an image to be processed.
The image to be processed is a biological face image on which image processing needs to be performed. It is understood that the image processing may include at least one of the following: image enhancement processing, image blurring processing, image rendering processing, image editing processing, and the like. Specifically, image enhancement processing may increase the definition of the image to be processed; image blurring processing may reduce its definition; image rendering processing may apply rendering such as whitening and beautifying to an object in the image to be processed; and image editing processing may perform various editing operations on the image to be processed, for example filtering, texture processing, cropping, and the like.
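As a rough illustration of the enhancement and blurring operations just mentioned (not the patent's own method), the NumPy sketch below lowers definition with a 3x3 box blur and raises it with unsharp masking:

```python
import numpy as np

def box_blur3(img: np.ndarray) -> np.ndarray:
    """3x3 box blur with edge replication: reduces definition (blurring)."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    h, w = img.shape
    # sum the nine shifted copies of the padded image, then average
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

def unsharp(img: np.ndarray, amount: float = 1.0) -> np.ndarray:
    """Unsharp masking: adds back the high-frequency detail removed by a
    blur, increasing definition (enhancement)."""
    return img + amount * (img - box_blur3(img))
```

On a constant image both operations are identities, since there is no detail to remove or amplify; on a real image they move definition in opposite directions.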
In addition, the biological face image may refer to a human face image, a cat face image, a dog face image, or the facial image of another living being, etc. The image to be processed may include at least one of the following: image information obtained by photographing with a photographing device, image information in video information, a composite image, and the like. It is understood that there may be one or more images to be processed; when there are multiple, they may form an image sequence, so that an image processing operation on the sequence can be implemented. Moreover, the image to be processed may be a static image or a dynamic image, so that image processing operations on either can be implemented.
In addition, this embodiment does not limit the specific way in which the image processing apparatus acquires the image to be processed, and a person skilled in the art may set it according to specific application requirements and design requirements. For example, a photographing device may be communicatively connected to the image processing apparatus; after the photographing device captures the image to be processed, the image processing apparatus may obtain it from the photographing device. Specifically, the image processing apparatus may actively fetch the image captured by the photographing device, or the photographing device may actively send the image to the image processing apparatus. Alternatively, the image to be processed may be stored in a preset area, and the image processing apparatus may obtain it by accessing that area.
Step S102: determining a semantic map corresponding to the image to be processed, wherein the semantic map is used for identifying semantic information included in the image to be processed.
The semantic information may include at least one of the following: visual-layer semantics and object-layer semantics. Specifically, the visual-layer semantics may include color, texture, shape, and the like; the object-layer semantics may include property features of preset objects (objects located in the image to be processed), such as the state of a preset object at a certain moment or the structural information it includes. Of course, the semantic information is not limited to the information defined above, and those skilled in the art may set it according to specific application requirements and design requirements; for example, the semantic information may also include concept-layer semantics and the like, which are not described in detail here.
In addition, the embodiment is not limited to a specific implementation manner of determining the semantic graph corresponding to the image to be processed, and a person skilled in the art may set the semantic graph according to specific application requirements and design requirements, for example, an implementation manner may be that the semantic graph is determined by a preset machine learning model, and specifically, determining the semantic graph corresponding to the image to be processed may include:
step S1021: and analyzing and processing the image to be processed by utilizing a first machine learning model to obtain a semantic graph corresponding to the image to be processed, wherein the first machine learning model is trained to determine the semantic graph corresponding to the image to be processed.
The first machine learning model may be a model trained in advance to determine the semantic graph corresponding to the image to be processed. It is understood that the size of the obtained semantic graph may be the same as or different from that of the image to be processed; preferably, the two are the same size. In addition, the first machine learning model can be generated by training a convolutional neural network: the convolutional neural network is trained using a preset reference image and the standard semantic graph corresponding to that reference image, thereby obtaining the first machine learning model. After the first machine learning model is established, the image to be processed can be analyzed and processed with it to obtain the corresponding semantic graph.
In the embodiment, the trained first machine learning model is used for analyzing and processing the image to be processed to obtain the semantic graph corresponding to the image to be processed, so that the accuracy and the reliability of obtaining the semantic graph are effectively ensured, the quality and the efficiency of obtaining the target image based on the semantic graph are also ensured, and the stability and the reliability of the method are further improved.
Of course, the specific implementation manner of obtaining the semantic map corresponding to the image to be processed is not limited in this embodiment, and a person skilled in the art may also use other manners to determine the semantic map corresponding to the image to be processed, as long as it is ensured that the semantic map corresponding to the image to be processed is accurately obtained, which is not described herein again.
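A minimal sketch of what the first machine learning model's output might look like: assuming the model emits per-pixel class scores, the semantic graph can be read off as the per-pixel argmax, giving a map the same spatial size as the image. The class scores below are toy values, not the output of a trained model.

```python
import numpy as np

def scores_to_semantic_map(scores: np.ndarray) -> np.ndarray:
    """Collapse per-pixel class scores of shape (C, H, W) -- e.g. the raw
    output of a segmentation network -- into a semantic map of shape
    (H, W) whose entries are class indices (skin, hair, eyes, ...)."""
    return scores.argmax(axis=0)

# toy scores for 2 classes over a 2x2 image (hypothetical values)
scores = np.array([[[0.9, 0.2],
                    [0.1, 0.8]],
                   [[0.1, 0.8],
                    [0.9, 0.2]]])
sem = scores_to_semantic_map(scores)
```

The resulting `sem` assigns each pixel the class with the highest score, which is the sense in which the semantic graph "identifies semantic information" per location.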
Step S103: and obtaining a target image corresponding to the image to be processed based on the semantic graph and the image to be processed, wherein the definition of the target image is different from that of the image to be processed.
After the semantic graph and the image to be processed are obtained, the semantic graph and the image to be processed can be analyzed and processed, so that a target image corresponding to the image to be processed can be obtained, and the definition of the target image is different from that of the image to be processed; it is understood that the relationship between the definition of the target image and the definition of the image to be processed may include: the definition of the target image is higher than that of the image to be processed; or the definition of the target image is lower than that of the image to be processed.
Specifically, the embodiment does not limit the specific implementation manner of obtaining the target image corresponding to the image to be processed, and a person skilled in the art may set the target image according to specific application requirements and design requirements, for example: based on the semantic graph and the image to be processed, obtaining the target image corresponding to the image to be processed may include:
step S1031: and analyzing and processing the semantic graph and the image to be processed by utilizing a second machine learning model to obtain a target image corresponding to the image to be processed, wherein the definition of the target image is different from that of the image to be processed, and the second machine learning model is trained to determine the target image corresponding to the image to be processed.
The second machine learning model may be formed by a spatially adaptive convolution residual network and may be pre-trained to determine a target image corresponding to the image to be processed, the definition of the target image being different from that of the image to be processed. It can be understood that, when the definition of the target image is higher than that of the image to be processed, the second machine learning model is trained to determine a target image that enhances the image to be processed based on the semantic graph; when the definition of the target image is lower than that of the image to be processed, the second machine learning model is trained to determine a target image that blurs the image to be processed based on the semantic graph.
Specifically, in this embodiment, analyzing and processing the semantic graph and the image to be processed by using the second machine learning model to obtain the target image corresponding to the image to be processed may include:
Step S10311: encoding the image to be processed to obtain image coding features corresponding to the image to be processed.
Step S10312: analyzing and processing the semantic graph and the image coding features by using a spatially adaptive convolutional residual network to obtain a target image corresponding to the image to be processed, where the definition of the target image is different from that of the image to be processed.
After the image to be processed is acquired, it may be encoded to obtain the image coding features corresponding to it. The image coding features may include image feature information identifying the features contained in the image to be processed; for example, the image feature information may include at least one of: position feature information, color feature information, shape feature information, size feature information, semantic feature information, and texture feature information. After the semantic graph and the image coding features are obtained, they can be analyzed and processed by the spatially adaptive convolutional residual network to obtain the target image corresponding to the image to be processed.
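The spatially adaptive processing described above can be illustrated with a minimal numpy sketch. This is not the patent's actual network: the mean-filter "encoder" and the per-class scale/shift tables are illustrative assumptions, showing only how a semantic graph can select different modulation parameters at different pixel positions inside a residual block.

```python
import numpy as np

def encode(image):
    # Stand-in encoder: a 3x3 mean filter produces the "image coding features".
    h, w = image.shape
    padded = np.pad(image, 1, mode="edge")
    feat = np.zeros_like(image, dtype=float)
    for i in range(h):
        for j in range(w):
            feat[i, j] = padded[i:i + 3, j:j + 3].mean()
    return feat

def spatially_adaptive_block(features, semantic_map, scales, shifts):
    """Modulate features with a per-class scale and shift looked up from the
    semantic map, then add the input back (the residual connection)."""
    gamma = scales[semantic_map]   # per-pixel scale chosen by semantic class
    beta = shifts[semantic_map]    # per-pixel shift chosen by semantic class
    return features + gamma * features + beta
```

Because the semantic map indexes the parameter tables pixel by pixel, two regions with different semantic labels (e.g. skin vs. background) are transformed differently even though they pass through the same block.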
In addition, when the image to be processed is a face image to be processed, there may be at least one target image. When there are multiple target images, a final target image may be determined based on the similarities between the target images and the image to be processed. Specifically, at least one similarity may correspond to the at least one target image and the image to be processed, and the similarity between a target image and the image to be processed may include the similarity between the structure and appearance of the face in the target image and the structure and appearance of the face in the image to be processed. The structure of the face includes at least one of: face orientation (forward, left, right, etc.), pose (head up, head down, etc.), and position of the face relative to the image (center, left, right, etc.); the appearance of the face includes at least one of: hair features, skin tone features, brightness features, and color features.
It will be appreciated that the similarities of different target images to the image to be processed may be the same or different. After the similarities between the image to be processed and the different target images are acquired, the target images can be sorted by the magnitude of similarity to obtain a sorting queue, the target image with the highest similarity can be selected from the queue, and that image is determined as the final target image, so that the quality and effect of image processing can be effectively guaranteed.
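The selection of a final target image from several candidates can be sketched as follows; the negative mean-absolute-difference metric is an illustrative stand-in for the structure-and-appearance similarity described above, not the patent's actual measure.

```python
import numpy as np

def select_final_target(to_process, candidates):
    """Rank candidate target images by similarity to the image to be
    processed and return the most similar one."""
    def similarity(a, b):
        # Stand-in metric: higher (less negative) means more similar.
        return -np.abs(a.astype(float) - b.astype(float)).mean()
    # Build the sorting queue, highest similarity first.
    queue = sorted(candidates, key=lambda c: similarity(c, to_process),
                   reverse=True)
    return queue[0]   # the final target image
```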
In an application embodiment, referring to fig. 2, an image processing method that implements an image enhancement operation is described as an example. Here the execution subject of the image processing method is an image processing apparatus communicatively connected to a client. When a user has an image enhancement requirement, an image processing request corresponding to that requirement may be generated at the client, the request corresponding to an image to be processed. The client then transmits the generated image processing request and the image to be processed to the image processing apparatus; after receiving them, the image processing apparatus may process the image to be processed based on the image processing request, which specifically includes:
Step 1: receiving an image processing request and an image to be processed.
Step 2: processing the image to be processed to obtain a semantic graph corresponding to the image to be processed.
Step 3: inputting the image to be processed and the semantic graph into a preset second machine learning model to obtain a target image corresponding to the image to be processed, where the definition of the target image is higher than that of the image to be processed.
Step 4: transmitting the target image to the client, so that the client displays the target image in a preset display area and the user can view the target image after image enhancement processing.
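Steps 1–4 above can be condensed into one hypothetical handler. The request keys, the semantic-map extractor, and the preset models are all stand-ins passed in as callables; they are not part of the patent's implementation.

```python
def handle_image_processing_request(request, image_to_process,
                                    get_semantic_map, models):
    """Condense steps 1-4: receive the request and image, build the semantic
    graph, run the preset second machine learning model, return the result."""
    # Step 1: the request ("enhance" or "blur") and the image are received.
    second_model = models[request]            # preset model for this request
    # Step 2: semantic graph corresponding to the image to be processed.
    semantic_map = get_semantic_map(image_to_process)
    # Step 3: the model produces the target image from image + semantic graph.
    target_image = second_model(image_to_process, semantic_map)
    # Step 4: hand the target image back for the client to display.
    return target_image
```

The same handler serves both the enhancement flow of fig. 2 and the blurring flow of fig. 3; only the preset model selected by the request differs.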
In a second application embodiment, referring to fig. 3, an image processing method that implements an image blurring operation is described as an example. Here the execution subject of the image processing method is an image processing apparatus communicatively connected to a client. When a user has an image blurring requirement, an image processing request corresponding to that requirement may be generated at the client, the request corresponding to an image to be processed. The client then transmits the generated image processing request and the image to be processed to the image processing apparatus; after receiving them, the image processing apparatus may process the image to be processed based on the image processing request, which specifically includes:
Step 1: receiving an image processing request and an image to be processed.
Step 2: processing the image to be processed to obtain a semantic graph corresponding to the image to be processed.
Step 3: inputting the image to be processed and the semantic graph into a preset second machine learning model to obtain a target image corresponding to the image to be processed, where the definition of the target image is lower than that of the image to be processed.
Step 4: transmitting the target image to the client, so that the client displays the target image in a preset display area and the user can view the target image after image blurring processing.
In the image processing method provided by this embodiment, the image to be processed is acquired, the semantic graph corresponding to it is determined, and the image to be processed is processed according to the semantic graph to obtain the corresponding target image. The image can thus be processed effectively without first acquiring a high-quality image, which guarantees the quality and efficiency of face image processing.
FIG. 4 is a schematic flow chart illustrating a process of determining a semantic graph corresponding to an image to be processed according to an embodiment of the present invention; on the basis of the foregoing embodiment, with continued reference to fig. 4, the present embodiment provides an implementation for determining a semantic graph corresponding to an image to be processed. Specifically, determining the semantic graph corresponding to the image to be processed may include:
Step S401: acquiring image coding information corresponding to the image to be processed, where the image coding information includes information for identifying the image to be processed.
Step S402: determining a semantic graph corresponding to the image to be processed based on the image coding information.
After the image to be processed is acquired, encoding processing may be performed on the image to be processed, so that image encoding information corresponding to the image to be processed may be acquired, where the image encoding information includes feature information used to identify the image to be processed, and the feature information included in the image encoding information may include semantic feature information, attribute feature information, and the like. After the image coding information is acquired, the image coding information may be processed, so that a semantic graph corresponding to the image to be processed may be acquired. Specifically, referring to fig. 5, in this embodiment, determining the semantic graph corresponding to the image to be processed based on the image coding information may include:
step S4021: semantic feature information corresponding to the image coding information is acquired.
Step S4022: decoding the semantic feature information to obtain a semantic graph corresponding to the image to be processed.
Because the image coding information includes semantic feature information of the image to be processed, the semantic feature information corresponding to the image coding information can be extracted in order to accurately acquire the semantic graph. After the semantic feature information is acquired, the semantic feature information may be decoded, so that a semantic graph corresponding to the image to be processed may be acquired, and size information of the acquired semantic graph may be the same as size information of the image to be processed.
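A minimal sketch of this encode-then-decode path: a toy average-pooling "encoder" produces semantic feature information at reduced resolution, and a toy "decoder" upsamples and thresholds it, so the resulting semantic graph has the same size as the image to be processed. Both stages are illustrative assumptions, not the patent's actual networks.

```python
import numpy as np

def encode_to_semantic_features(image, downsample=2):
    # Stand-in encoder: average-pool the image into coarse
    # "semantic feature information".
    h, w = image.shape
    return image.reshape(h // downsample, downsample,
                         w // downsample, downsample).mean(axis=(1, 3))

def decode_semantic_map(features, upsample=2, threshold=0.5):
    # Stand-in decoder: upsample back to the input resolution and assign a
    # class label per pixel, so the semantic map matches the image's size.
    up = np.kron(features, np.ones((upsample, upsample)))
    return (up > threshold).astype(int)
```

The key property mirrored here is the one stated above: the decoded semantic graph's size information equals that of the input image.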
In the embodiment, the image coding information corresponding to the image to be processed is acquired, and then the semantic graph corresponding to the image to be processed is acquired based on the image coding information, so that the accuracy and reliability of acquiring the semantic graph are ensured, the quality and efficiency of acquiring the semantic graph are improved, and the stability and reliability of the method are effectively improved.
Fig. 6 is a schematic flowchart of a process of obtaining a target image corresponding to a to-be-processed image based on a semantic graph and the to-be-processed image according to an embodiment of the present invention; fig. 7 is a schematic diagram of obtaining a target image corresponding to an image to be processed based on a semantic graph and the image to be processed according to an embodiment of the present invention; on the basis of the foregoing embodiments, referring to fig. 6 to 7, in this embodiment, another implementation manner is provided for obtaining a target image corresponding to a to-be-processed image based on a semantic graph and the to-be-processed image, and specifically, the obtaining the target image corresponding to the to-be-processed image based on the semantic graph and the to-be-processed image may include:
Step S601: segmenting the image to be processed based on the semantic graph to obtain semantic subgraphs corresponding to the image to be processed.
Step S602: processing the semantic subgraphs to obtain target subgraphs corresponding to the semantic subgraphs, where the definition of a target subgraph is different from that of its semantic subgraph.
Step S603: splicing all the target subgraphs based on the semantic graph to obtain a target image corresponding to the image to be processed.
After the semantic graph is acquired, the image to be processed can be segmented based on the semantic graph, so that semantic subgraphs corresponding to the image to be processed can be acquired, and the number of the semantic subgraphs can be one or more. After the semantic subgraph is acquired, image processing can be carried out on the semantic subgraph, so that a target subgraph corresponding to the semantic subgraph can be acquired, and the definition of the target subgraph is different from that of the semantic subgraph. After the target subgraphs are obtained, all the target subgraphs are spliced based on the semantic graph, so that a target image corresponding to the image to be processed can be obtained.
In some examples, since the semantic subgraphs correspond to different parts or image areas, an image processing strategy corresponding to the semantic subgraphs may be obtained in order to meet users' personalized requirements. The image processing strategy may include image processing parameters corresponding to each semantic subgraph, and the image processing parameters are used to process that semantic subgraph, so that different image processing operations, or the same operation to different degrees, can be applied to different semantic subgraphs. This effectively meets users' customization requirements and further improves the flexibility and reliability of the method.
For example, referring to fig. 7, when the image to be processed is a face image to be processed, after the face image to be processed is acquired, a semantic graph corresponding to the face image to be processed may be determined, and then the face image to be processed may be segmented based on the semantic graph, so that semantic subgraphs corresponding to the image to be processed may be acquired, where the number of the semantic subgraphs may be one or more. After the semantic subgraph is acquired, image recovery processing can be carried out on the semantic subgraph, so that a target subgraph corresponding to the semantic subgraph can be acquired, and the definition of the target subgraph is higher than that of the semantic subgraph. After the target subgraphs are obtained, all the target subgraphs are spliced based on the semantic graph, so that a target image corresponding to the image to be processed can be obtained.
Similarly, when the image to be processed is the face image to be processed, after the face image to be processed is acquired, the semantic graph corresponding to the face image to be processed can be determined, and then the face image to be processed can be segmented based on the semantic graph, so that semantic subgraphs corresponding to the image to be processed can be acquired, and the number of the semantic subgraphs can be one or more. After the semantic subgraph is acquired, image processing can be carried out on the semantic subgraph, so that a target subgraph corresponding to the semantic subgraph can be acquired, and the definition of the target subgraph is lower than that of the semantic subgraph. After the target subgraphs are obtained, all the target subgraphs are spliced based on the semantic graph, so that a target image corresponding to the image to be processed can be obtained.
In other examples, when the image to be processed is a face image to be processed, after the face image to be processed is acquired, a semantic graph corresponding to it may be determined, and the face image to be processed may then be segmented based on the semantic graph to obtain semantic subgraphs corresponding to the image to be processed. The number of semantic subgraphs may be multiple, for example: a semantic subgraph of the eyes, a semantic subgraph of the mouth, a semantic subgraph of the face, a semantic subgraph of the forehead, a semantic subgraph of the chin, and so on. After the different semantic subgraphs are acquired, different image recovery processing can be performed on them, for example: an image recovery operation on the semantic subgraph of the eyes according to a first image processing parameter, on the semantic subgraph of the mouth according to a second image processing parameter, on the semantic subgraph of the face according to a third image processing parameter, on the semantic subgraph of the forehead according to a fourth image processing parameter, and on the semantic subgraph of the chin according to a fifth image processing parameter. Target subgraphs corresponding to the semantic subgraphs can thus be obtained, where the definition of each target subgraph is higher than that of its semantic subgraph, and the definitions of the target subgraphs may be the same or different. After the target subgraphs are obtained, all the target subgraphs are spliced based on the semantic graph, so that a target image corresponding to the image to be processed can be obtained.
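The per-region processing above can be sketched as follows. The label-to-parameter mapping (e.g. 0 for eyes, 1 for mouth) and the multiplicative "restoration" are hypothetical; the point is only that each semantic subgraph receives its own image processing parameter before the results are spliced back into one target image.

```python
import numpy as np

def process_by_region(image, semantic_map, region_params, default=1.0):
    """Apply a different (hypothetical) processing parameter to each
    semantic region, then splice the results into one target image."""
    target = np.empty_like(image, dtype=float)
    for label in np.unique(semantic_map):
        mask = semantic_map == label              # one semantic subgraph
        strength = region_params.get(label, default)
        target[mask] = image[mask] * strength     # stand-in "restoration"
    return target
```

Because the splicing uses the same semantic graph that produced the segmentation, every pixel of the target image is written exactly once.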
In this embodiment, the image to be processed is segmented based on the semantic graph to obtain semantic subgraphs, the semantic subgraphs are processed to obtain corresponding target subgraphs, and all the target subgraphs are spliced based on the semantic graph, so that the target image is obtained effectively and accurately, improving the flexibility and reliability of obtaining the target image.
FIG. 8 is a schematic diagram of a model training method according to an embodiment of the present invention; referring to fig. 8, the embodiment provides a model training method, and the execution subject of the method may be a model training apparatus, and it is understood that the model training apparatus may be implemented as software, or a combination of software and hardware. Specifically, the method may include:
Step S801: obtaining a first image and a reference image corresponding to the first image, where the definition of the reference image is different from that of the first image.
Step S802: determining a semantic graph corresponding to the first image, where the semantic graph identifies semantic information included in the first image.
Step S803: performing learning and training on the spatially adaptive convolutional residual network based on the semantic graph, the reference image, and the first image to obtain a machine learning model, where the machine learning model is used to determine a second image corresponding to the first image, and the definition of the second image is different from that of the first image.
The first image and the reference image are the same image at different definitions; in specific implementations, the definition of the reference image may be higher or lower than that of the first image. The first image and the reference image may be stored in a preset area and acquired by accessing that area. In a specific application, the first images may be a plurality of preset blurred images, and a first image may include at least one of: image information captured by a photographing device, image information from video information, a composite image, and so on. This embodiment does not limit the specific manner in which the training apparatus acquires the first image; a person skilled in the art may choose one according to specific application requirements and design requirements. For example, the photographing device may be communicatively connected to the training apparatus, and after the photographing device captures the first image, the training apparatus may obtain it: the training apparatus may actively fetch the first image from the photographing device, or the photographing device may actively send the first image to the training apparatus. Alternatively, the first image may be stored in a preset area, and the training apparatus may obtain it by accessing that area.
After the first image is acquired, it may be analyzed to obtain the semantic graph corresponding to it, the semantic graph identifying the semantic information included in the first image. After the semantic graph is obtained, learning and training can be performed on the spatially adaptive convolutional residual network based on the semantic graph, the reference image, and the first image, so that a machine learning model can be obtained; the machine learning model is used to determine a second image corresponding to the first image, and the definition of the second image is different from that of the first image.
In the model training method provided by this embodiment, a first image and a reference image corresponding to it are obtained, a semantic graph corresponding to the first image is determined, and the spatially adaptive convolutional residual network is trained based on the semantic graph, the reference image, and the first image. A machine learning model suitable for processing images across the full range of definitions can thus be obtained: the model determines a second image corresponding to the first image, so that images can be analyzed and processed by the generated model. This effectively guarantees the application range of the machine learning model and improves the practicability of the model training method.
In some examples, determining the semantic graph corresponding to the first image may include:
Step S8021: analyzing and processing the first image by using a first machine learning model to obtain a semantic graph corresponding to the first image, where the first machine learning model is trained to determine the semantic graph corresponding to the first image.
In still other examples, determining the semantic graph corresponding to the first image may include:
Step S8022: acquiring image coding information corresponding to the first image, where the image coding information includes information for identifying the first image.
Step S8023: based on the image coding information, a semantic graph corresponding to the first image is determined.
Specifically, based on the image coding information, determining the semantic graph corresponding to the first image may include:
step S80231: semantic feature information corresponding to the image coding information is acquired.
Step S80232: decoding the semantic feature information to obtain a semantic graph corresponding to the first image.
The specific implementation process and technical effect of the above steps in this embodiment are similar to the specific implementation process and technical effect of determining the semantic graph corresponding to the first image in the above embodiment, and specific reference may be made to the above statements, which are not described herein again.
Fig. 9 is a schematic diagram of obtaining a machine learning model by performing learning and training on a spatially adaptive convolutional residual network based on a semantic graph, a reference image, and a first image according to an embodiment of the present invention; based on the foregoing embodiment, with reference to fig. 9, in this embodiment, performing learning and training on the spatially adaptive convolutional residual network based on the semantic graph, the reference image, and the first image to obtain the machine learning model may include:
Step S901: encoding the first image to obtain image coding features corresponding to the first image.
Step S902: analyzing and processing the semantic graph and the image coding features by using the spatially adaptive convolutional residual network to obtain a second image corresponding to the first image, where the definition of the second image is different from that of the first image.
Step S903: generating a machine learning model based on the reference image and the second image.
After the first image is acquired, it may be encoded to obtain the image coding information corresponding to it; the image coding information includes information for identifying the first image, such as semantic feature information and attribute feature information. After the image coding information is acquired, the semantic graph and the image coding features can be analyzed and processed by the spatially adaptive convolutional residual network to obtain a second image corresponding to the first image. The second image is a predicted image produced from the semantic graph and the image coding features, while the reference image is the standard image for the image processing operation on the first image, so the generation of the machine learning model is realized by comparing the second image with the reference image.
After acquiring the reference image and the second image, generating the machine learning model based on the reference image and the second image may include:
step S9031: the similarity between the reference image and the second image is obtained.
Step S9032: when the similarity is greater than or equal to a preset threshold, generating the machine learning model.
Or,
Step S9033: when the similarity is smaller than the preset threshold, continuing to perform learning and training on the spatially adaptive convolutional residual network to generate the machine learning model.
After the reference image and the second image are acquired, the similarity between them may be obtained and compared with the preset threshold. When the similarity is greater than or equal to the preset threshold, the effect of training the spatially adaptive convolutional residual network meets the preset requirement; at this point, the training operation can be stopped, thereby generating the machine learning model. When the similarity is smaller than the preset threshold, the training effect does not yet meet the preset requirement, and training of the spatially adaptive convolutional residual network can continue until a machine learning model meeting the preset requirement is generated.
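The stop/continue logic around the preset threshold can be illustrated with a toy training loop. The per-pixel multiplicative "network" (a weight array) and the mean-absolute-error similarity are stand-ins for the spatially adaptive convolutional residual network and the loss-based similarities discussed in the following steps; only the control flow matches the description above.

```python
import numpy as np

def train_until_similar(first_image, reference_image, weights,
                        threshold=0.95, lr=0.5, max_steps=200):
    """Toy training loop: nudge the 'network' until the similarity between
    its output (the second image) and the reference image reaches the
    preset threshold, then stop."""
    similarity = -np.inf
    for _ in range(max_steps):
        second_image = weights * first_image            # forward pass
        err = reference_image - second_image
        similarity = 1.0 - np.abs(err).mean()           # illustrative metric
        if similarity >= threshold:
            break                                       # requirement met: stop
        weights = weights + lr * err * first_image      # continue training
    return weights, similarity
```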
In addition, this embodiment does not limit the specific manner of obtaining the similarity between the reference image and the second image; a person skilled in the art may choose one according to specific application requirements and design requirements. Preferably, obtaining the similarity between the reference image and the second image in this embodiment may include:
Step S90311: analyzing and processing the reference image and the second image by using a cross entropy loss function to obtain a first similarity between the second image and the reference image; and/or
Step S90312: analyzing and processing the reference image and the second image by using a semantic loss function to obtain a second similarity between the second image and the reference image; and/or
Step S90313: analyzing and processing the reference image and the second image by using a loss function of a generative adversarial network to obtain a third similarity between the second image and the reference image; and
Step S90314: determining the similarity between the reference image and the second image based on the first similarity, the second similarity, and the third similarity.
After the reference image and the second image are acquired, they may be analyzed and compared by using a preset image comparison algorithm to obtain the similarity between the second image and the reference image.
Specifically, when obtaining the similarity between the reference image and the second image, the first implementation manner includes: and analyzing the reference image and the second image by using a cross entropy loss function to obtain a first similarity between the second image and the reference image, and determining the first similarity as the similarity between the reference image and the second image.
It is understood that, in the present embodiment, the implementation process of determining the similarity between the reference image and the second image may not include: and obtaining a second similarity and a third similarity between the second image and the reference image.
When obtaining the similarity between the reference image and the second image, the second implementation manner includes: and analyzing the reference image and the second image by utilizing a semantic loss function to obtain a second similarity between the second image and the reference image, and determining the second similarity as the similarity between the reference image and the second image.
It is understood that, in the present embodiment, the implementation process of determining the similarity between the reference image and the second image may not include: and obtaining a first similarity and a third similarity between the second image and the reference image.
When obtaining the similarity between the reference image and the second image, the third implementation manner includes: and analyzing the reference image and the second image by using a cross entropy loss function to obtain a first similarity between the second image and the reference image, and analyzing the reference image and the second image by using a semantic loss function to obtain a second similarity between the second image and the reference image. The weighted sum of the first similarity and the second similarity is then determined as the similarity between the reference image and the second image.
It is understood that, in the present embodiment, the implementation process of determining the similarity between the reference image and the second image may not include: and obtaining a third similarity between the second image and the reference image.
When obtaining the similarity between the reference image and the second image, a fourth implementation manner includes: analyzing the reference image and the second image by using a cross entropy loss function to obtain a first similarity between the second image and the reference image, and analyzing the reference image and the second image by using a loss function of a generative adversarial network to obtain a third similarity between the second image and the reference image. The weighted sum of the first similarity and the third similarity is then determined as the similarity between the reference image and the second image.
It is understood that, in the present embodiment, the implementation process of determining the similarity between the reference image and the second image may not include: and obtaining a second similarity between the second image and the reference image.
When obtaining the similarity between the reference image and the second image, a fifth implementation manner includes: analyzing the reference image and the second image by using a semantic loss function to obtain a second similarity between the second image and the reference image, and analyzing the reference image and the second image by using a loss function of a generative adversarial network to obtain a third similarity between the second image and the reference image. The weighted sum of the second similarity and the third similarity is then determined as the similarity between the reference image and the second image.
It is understood that, in the present embodiment, the implementation process of determining the similarity between the reference image and the second image may not include: and obtaining a first similarity between the second image and the reference image.
When obtaining the similarity between the reference image and the second image, a fifth implementation manner includes: analyzing the reference image and the second image by using a cross entropy loss function to obtain a first similarity between the second image and the reference image; and analyzing the reference image and the second image by utilizing a semantic loss function to obtain a second similarity between the second image and the reference image, and analyzing the reference image and the second image by utilizing a loss function for generating a countermeasure network to obtain a third similarity between the second image and the reference image. And determining the similarity between the reference image and the second image based on the first similarity, the second similarity and the third similarity.
To summarize, determining the similarity between the reference image and the second image based on the first similarity, the second similarity, and the third similarity may comprise:
Step S903141: respectively acquiring a first weight, a second weight and a third weight corresponding to the first similarity, the second similarity and the third similarity.
Step S903142: performing weighted summation on the first similarity, the second similarity and the third similarity based on the first weight, the second weight and the third weight to obtain the similarity between the reference image and the second image.
It is to be understood that the first weight, the second weight and the third weight may each be greater than or equal to zero; choosing different value ranges for the three weights (in particular, setting some of them to zero) reproduces the different manners of determining the similarity between the reference image and the second image described above.
In this embodiment, the similarity between the second image and the reference image is obtained in several different manners, and the similarities obtained in these manners are then weighted and summed to yield the similarity between the second image and the reference image. This effectively ensures the accuracy and reliability of the obtained similarity, which in turn improves the quality and efficiency of the learning and training of the machine learning model.
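As a sketch of steps S903141 and S903142, the weighted combination may look like the following; the function name and the example weight values are assumptions for illustration, not values fixed by this embodiment:

```python
# Sketch of steps S903141-S903142: weighted summation of the three
# similarities. Default weights are illustrative assumptions.
def combine_similarities(first, second, third, w1=1.0, w2=0.5, w3=0.1):
    """Return the weighted sum of the first, second and third similarities.

    Setting a weight to zero drops its term, which reproduces the
    two-term implementation manners described above.
    """
    return w1 * first + w2 * second + w3 * third

# Two-term variant: the second similarity is omitted (fourth manner above).
sim = combine_similarities(0.8, 0.6, 0.4, w1=1.0, w2=0.0, w3=0.1)
```

Because the weights are free parameters, the same function covers every combination of the three loss-based similarities described in this embodiment.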
In a specific application, referring to fig. 10, an embodiment of the present application provides an image processing method, where the image processing method can implement enhancement processing on a face image. Specifically, the image processing method includes the following implementation steps:
Step one: acquiring an image to be processed, wherein the quality of the image to be processed is low, namely the definition of the image to be processed is less than or equal to a preset threshold value.
Step two: determining a semantic graph corresponding to the image to be processed.
Specifically, the first machine learning model may be utilized to process the image to be processed so as to obtain a semantic graph corresponding to the image to be processed. Processing the image to be processed with the first machine learning model may include the following steps:
Step 2.1: encoding the image to be processed by using an encoder to obtain image encoding information.
The encoder may be composed of, for example, 4, 6 or 8 convolutional neural network layers, each performing 2x downsampling. In a specific implementation, the number of convolutional layers included in the encoder can be determined according to the image size of the image to be processed: the larger the image to be processed, the more layers the encoder includes, so that the image can be downsampled more times; conversely, the smaller the image to be processed, the fewer layers the encoder includes.
Step 2.2: analyzing and processing the image coding information by using a prediction network to obtain semantic feature information.
The prediction network may include any number of convolutional residual network units within 50 layers, for example 10. It can be understood that the number of convolutional residual network units included in the prediction network affects the image processing time: the more units the prediction network includes, the higher the accuracy of the image processing, at the cost of a longer processing time.
Step 2.3: decoding the semantic feature information by using a decoder to obtain a semantic graph, wherein the size of the semantic graph is the same as that of the image to be processed.
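Steps 2.1 through 2.3 can be sketched as follows, assuming a PyTorch implementation; the layer counts, channel widths and number of semantic classes are illustrative assumptions, not the exact configuration of the first machine learning model:

```python
import torch
import torch.nn as nn

# Sketch of the first machine learning model: a 2x-downsampling encoder
# (step 2.1), a stack of convolutional residual units as the prediction
# network (step 2.2), and a decoder that restores the input size (step 2.3).
class ResidualUnit(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))

class SemanticNet(nn.Module):
    def __init__(self, num_classes=19, width=32, depth=2, n_res=4):
        super().__init__()
        enc, ch = [], 3
        for _ in range(depth):  # each encoder stage downsamples by 2x
            enc += [nn.Conv2d(ch, width, 3, stride=2, padding=1),
                    nn.ReLU(inplace=True)]
            ch = width
        self.encoder = nn.Sequential(*enc)
        self.predictor = nn.Sequential(
            *[ResidualUnit(width) for _ in range(n_res)])
        dec = []
        for _ in range(depth):  # decoder upsamples back to the input size
            dec += [nn.ConvTranspose2d(width, width, 4, stride=2, padding=1),
                    nn.ReLU(inplace=True)]
        dec += [nn.Conv2d(width, num_classes, 3, padding=1)]
        self.decoder = nn.Sequential(*dec)

    def forward(self, img):
        return self.decoder(self.predictor(self.encoder(img)))

net = SemanticNet()
sem = net(torch.randn(1, 3, 64, 64))  # semantic graph, same spatial size
```

Note how the `depth` parameter realizes the size-dependent layer count discussed above: a larger input image would simply use a larger `depth`.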
Step three: inputting the image to be processed and the semantic graph into a second machine learning model to obtain a target image corresponding to the image to be processed.
The second machine learning model is formed by spatially-adaptive convolution residual blocks (SPADE) and is trained in advance to determine a target image corresponding to the image to be processed, where the target image has higher definition than the image to be processed. Specifically, the image to be processed is coded to obtain image coding features corresponding to the image to be processed, and the semantic graph and the image coding features are then analyzed and processed by using the spatially-adaptive convolution residual network to obtain the target image corresponding to the image to be processed, the definition of which is higher than that of the image to be processed.
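A minimal sketch of one spatially-adaptive convolution residual block, assuming a PyTorch implementation: the semantic graph predicts a per-pixel scale and bias that modulate the normalized image coding features. Channel sizes and layer details are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# SPADE-style normalization: the semantic graph drives per-pixel
# modulation (scale and bias) of the normalized features.
class SPADENorm(nn.Module):
    def __init__(self, feat_ch, seg_ch, hidden=64):
        super().__init__()
        self.norm = nn.BatchNorm2d(feat_ch, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(seg_ch, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, feat_ch, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_ch, 3, padding=1)

    def forward(self, x, seg):
        # resize the semantic graph to the feature resolution
        seg = F.interpolate(seg, size=x.shape[2:], mode='nearest')
        h = self.shared(seg)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

# One spatially-adaptive convolution residual block.
class SPADEResBlock(nn.Module):
    def __init__(self, ch, seg_ch):
        super().__init__()
        self.n1, self.n2 = SPADENorm(ch, seg_ch), SPADENorm(ch, seg_ch)
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x, seg):
        h = self.c1(F.relu(self.n1(x, seg)))
        h = self.c2(F.relu(self.n2(h, seg)))
        return x + h

block = SPADEResBlock(ch=64, seg_ch=19)
out = block(torch.randn(2, 64, 32, 32), torch.randn(2, 19, 32, 32))
```

Because the modulation is spatial, different face structures in the semantic graph can steer the restoration of the corresponding image regions, which matches the per-structure feature extraction described below.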
It should be noted that, when the second machine learning model is subjected to learning training, optimization training may be performed by stochastic gradient descent. Specifically, the optimizer may use the Adaptive Moment Estimation (Adam) optimization algorithm with the learning rate set to 0.0002, implemented with an open-source Python machine learning library.
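Under the assumption that the open-source library is PyTorch, the optimizer configuration described above may look like the following sketch (`model` is a placeholder standing in for the second machine learning model):

```python
import torch

# Optimizer configuration as described: Adam at learning rate 0.0002.
# The placeholder module below stands in for the second machine
# learning model being trained.
model = torch.nn.Conv2d(3, 3, 3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002)
```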
In addition, a low-quality face image and the standard high-quality face image corresponding to it are obtained. The low-quality face image can be derived from the public large-scale face image data set CelebAMask-HQ [3]; specifically, the low-quality face image can be generated by applying a plurality of artificial degradation modes. After the low-quality face image is obtained, the semantic graph corresponding to it can be determined. The low-quality face image is then coded to obtain image coding information, and the image coding information and the semantic graph are analyzed and processed by using the spatially-adaptive convolution residual network to obtain a predicted high-quality face image corresponding to the low-quality face image. Specifically, when the image coding information and the semantic graph are analyzed and processed by using the spatially-adaptive convolution residual network, corresponding feature information can be extracted for different face structures (such as the forehead, nose, eyes, mouth, ears and the like) based on the semantic graph.
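One possible sketch of generating a low-quality face image by artificial degradation; the specific degradations shown (naive downsampling plus Gaussian noise) are assumptions, since the embodiment only states that a plurality of artificial degradation modes may be used:

```python
import numpy as np

# Sketch of one artificial degradation pipeline: downsample, upsample
# back with nearest-neighbour repetition, then add Gaussian noise.
# Real pipelines may also use blur kernels, JPEG compression, etc.
def degrade(hq, scale=4, noise_std=5.0, seed=0):
    """hq: HxWx3 uint8 array; returns an equally sized degraded copy."""
    rng = np.random.default_rng(seed)
    lr = hq[::scale, ::scale]                          # naive downsampling
    up = np.repeat(np.repeat(lr, scale, 0), scale, 1)  # nearest upsampling
    noisy = up.astype(np.float32) + rng.normal(0, noise_std, up.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

lq = degrade(np.full((64, 64, 3), 128, np.uint8))
```

The (low-quality, standard high-quality) pairs produced this way form the supervised training data for the spatially-adaptive convolution residual network.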
Then, learning training of the spatially-adaptive convolution residual network can be carried out using the standard high-quality face image and the predicted high-quality face image. Specifically, the similarity between the standard high-quality face image and the predicted high-quality face image is obtained, and it can be obtained by using any of the following three loss functions: the cross entropy (squared error) loss function, the semantic loss function, and the adversarial loss function of a generative adversarial network (GAN).
If the similarity between the standard high-quality face image and the predicted high-quality face image is greater than or equal to a preset threshold value, the spatially-adaptive convolution residual network has finished training. If the similarity is smaller than the preset threshold value, the network has not yet been trained to the preset effect, and learning training of the spatially-adaptive convolution residual network continues until the second machine learning model is generated.
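The training decision above can be sketched as follows, assuming PyTorch; the way the loss values are mapped to a similarity score (a negative weighted loss passed through an exponential) is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

# Sketch of the stopping criterion: derive a similarity between the
# predicted and standard high-quality images from the loss terms and
# compare it against a preset threshold. The loss-to-similarity mapping
# and the stand-in semantic term are illustrative assumptions.
def similarity(pred, ref, w_pix=1.0, w_sem=0.5):
    pix_loss = F.mse_loss(pred, ref)   # squared-error term
    sem_loss = F.l1_loss(pred, ref)    # stand-in for the semantic term
    return torch.exp(-(w_pix * pix_loss + w_sem * sem_loss))

pred = torch.zeros(1, 3, 8, 8)
ref = torch.zeros(1, 3, 8, 8)
trained = similarity(pred, ref) >= 0.99  # identical images -> training done
```

When `trained` is false, the training loop would take another optimizer step on the spatially-adaptive convolution residual network and re-evaluate.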
According to the image processing method provided by this embodiment of the application, because a face image has strong structure and symmetry, processing the low-quality image in combination with the semantic graph allows the low-quality image to be recovered based on the semantics of each part of the image, so that a high-quality image can be obtained and the quality and efficiency of image processing are effectively ensured. In addition, the method can process low-quality images of different definition or damage degrees, which effectively broadens the application range of the method.
FIG. 11 is a diagram illustrating another exemplary image processing method according to an embodiment of the present invention. As shown in fig. 11, the present embodiment provides another image processing method whose execution subject may be an image processing apparatus; it is understood that the image processing apparatus may be implemented as software, or as a combination of software and hardware. Specifically, the processing method may include:
Step S1101: acquiring a face image to be processed.
Step S1102: determining a semantic graph corresponding to the face image, wherein the semantic graph comprises semantic information for identifying a face structure in the face image.
Step S1103: obtaining a target image corresponding to the face image based on the semantic graph and the face image, wherein the definition of the target image is different from that of the face image.
The specific implementation manner and implementation effect of the steps in this embodiment are similar to those of the steps in the embodiment in fig. 1, and the above statements may be specifically referred to, and are not repeated here.
It should be noted that, unlike the embodiment of fig. 1, the image processing method in the present embodiment takes a face image as the image to be processed, and the semantic graph in the present embodiment may include semantic information for identifying face structures (including the eyes, nose, mouth, ears and the like) in the face image.
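For illustration, a semantic graph of this kind can be represented as an integer label map of the same size as the face image; the label ids and structure names below are assumptions (actual face-parsing label sets, such as the one used by CelebAMask-HQ, vary between implementations):

```python
import numpy as np

# Illustrative semantic graph: a label map in which each integer value
# marks one face structure. Label ids here are assumptions.
LABELS = {0: "background", 1: "skin", 2: "nose", 3: "eyes",
          4: "mouth", 5: "ears", 6: "hair"}

sem = np.zeros((64, 64), np.uint8)   # all background
sem[20:28, 24:40] = 3                # eye region
sem[30:40, 28:36] = 2                # nose region
structures = {LABELS[v] for v in np.unique(sem)}
```

A map like `sem` carries exactly the per-pixel structural information that the second machine learning model exploits when restoring each face region.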
In some examples, determining the semantic graph corresponding to the face image may include: and analyzing and processing the face image by using a first machine learning model to obtain a semantic graph corresponding to the face image, wherein the first machine learning model is trained to determine the semantic graph corresponding to the face image.
In some examples, determining the semantic graph corresponding to the face image may include: acquiring image coding information corresponding to the face image, wherein the image coding information comprises information used for identifying the face image; and determining a semantic graph corresponding to the face image based on the image coding information.
In some examples, determining the semantic graph corresponding to the face image based on the image coding information may include: obtaining semantic feature information corresponding to the image coding information; and decoding the semantic feature information to obtain a semantic graph corresponding to the face image.
In some examples, obtaining a target image corresponding to the face image based on the semantic graph and the face image includes: and analyzing and processing the semantic graph and the face image by using a second machine learning model to obtain a target image corresponding to the face image, wherein the definition of the target image is different from that of the face image, and the second machine learning model is trained to determine the target image corresponding to the face image.
In some examples, the second machine learning model is comprised of a spatially adaptive convolution residual network.
In some examples, the analyzing the semantic graph and the face image by using the second machine learning model to obtain a target image corresponding to the face image includes: carrying out coding processing on the face image to obtain image coding characteristics corresponding to the face image; and analyzing and processing the semantic graph and the image coding features by using a space self-adaptive convolution residual error network to obtain a target image corresponding to the face image, wherein the definition of the target image is different from that of the face image.
In some examples, obtaining a target image corresponding to the face image based on the semantic graph and the face image includes: segmenting the face image based on the semantic graph to obtain a semantic subgraph corresponding to the face image; processing the semantic subgraph to obtain a target subgraph corresponding to the semantic subgraph, wherein the definition of the target subgraph is different from that of the semantic subgraph; and splicing all the target sub-images based on the semantic graph to obtain a target image corresponding to the face image.
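The segment-process-stitch variant described above can be sketched as follows; `enhance` is a hypothetical stand-in for the per-subgraph processing model:

```python
import numpy as np

# Sketch of the segment-process-stitch variant: the semantic graph
# splits the face image into per-structure subimages, each subimage is
# processed separately, and the results are stitched back together
# using the same semantic graph.
def enhance(patch):
    # hypothetical per-subgraph enhancement (simple brightening here)
    return np.clip(patch.astype(np.int16) + 10, 0, 255).astype(np.uint8)

def process_by_semantics(img, sem):
    out = np.zeros_like(img)
    for label in np.unique(sem):
        mask = sem == label              # one semantic subgraph
        out[mask] = enhance(img[mask])   # process, then stitch back
    return out

img = np.full((8, 8), 100, np.uint8)
sem = np.zeros((8, 8), np.uint8)
sem[:4] = 1                              # two semantic regions
res = process_by_semantics(img, sem)
```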
In some examples, the definition of the target image is higher than that of the face image; or the definition of the target image is lower than that of the face image.
In some examples, the semantic graph has a size that is the same as or different from the size of the face image.
The implementation process and technical effect of the method in this embodiment are similar to those in the embodiments shown in fig. 1 to 7 and 10, and the above statements may be specifically referred to, and are not repeated herein.
Fig. 12 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention. Referring to fig. 12, the present embodiment provides an image processing apparatus that can execute the image processing method corresponding to fig. 1; the image processing apparatus may include a first obtaining module 11, a first determining module 12 and a first processing module 13. Specifically:
The first obtaining module 11 is configured to obtain an image to be processed.
A first determining module 12, configured to determine a semantic map corresponding to the image to be processed, where the semantic map is used to identify semantic information included in the image to be processed.
And the first processing module 13 is configured to obtain a target image corresponding to the image to be processed based on the semantic graph and the image to be processed, where the definition of the target image is different from that of the image to be processed.
In some examples, when the first determination module 12 determines the semantic map corresponding to the image to be processed, the first determination module 12 may be configured to perform: and analyzing and processing the image to be processed by utilizing a first machine learning model to obtain a semantic graph corresponding to the image to be processed, wherein the first machine learning model is trained to determine the semantic graph corresponding to the image to be processed.
In some examples, when the first determination module 12 determines the semantic map corresponding to the image to be processed, the first determination module 12 may be configured to perform: acquiring image coding information corresponding to an image to be processed, wherein the image coding information comprises information used for identifying the image to be processed; and determining a semantic graph corresponding to the image to be processed based on the image coding information.
In some examples, when the first determining module 12 determines the semantic map corresponding to the image to be processed based on the image encoding information, the first determining module 12 may be configured to perform: obtaining semantic feature information corresponding to the image coding information; and decoding the semantic feature information to obtain a semantic graph corresponding to the image to be processed.
In some examples, when the first processing module 13 obtains the target image corresponding to the image to be processed based on the semantic map and the image to be processed, the first processing module 13 may be configured to perform: and analyzing and processing the semantic graph and the image to be processed by utilizing a second machine learning model to obtain a target image corresponding to the image to be processed, wherein the definition of the target image is different from that of the image to be processed, and the second machine learning model is trained to determine the target image corresponding to the image to be processed.
In some examples, the second machine learning model is comprised of a spatially adaptive convolution residual network.
In some examples, when the first processing module 13 performs analysis processing on the semantic graph and the image to be processed by using the second machine learning model to obtain a target image corresponding to the image to be processed, the first processing module 13 may be configured to perform: coding the image to be processed to obtain image coding characteristics corresponding to the image to be processed; and analyzing and processing the semantic graph and the image coding features by using a space self-adaptive convolution residual error network to obtain a target image corresponding to the image to be processed, wherein the definition of the target image is different from that of the image to be processed.
In some examples, when the first processing module 13 obtains the target image corresponding to the image to be processed based on the semantic map and the image to be processed, the first processing module 13 may be configured to perform: segmenting an image to be processed based on the semantic graph to obtain a semantic subgraph corresponding to the image to be processed; processing the semantic subgraph to obtain a target subgraph corresponding to the semantic subgraph, wherein the definition of the target subgraph is different from that of the semantic subgraph; and splicing all the target subgraphs based on the semantic graph to obtain a target image corresponding to the image to be processed.
In some examples, the definition of the target image is higher than that of the image to be processed; or the definition of the target image is lower than that of the image to be processed.
In some examples, the semantic graph has a size that is the same as or different from the size of the image to be processed.
The apparatus shown in fig. 12 can perform the method of the embodiments shown in fig. 1-7 and 10, and the related description of the embodiments shown in fig. 1-7 and 10 can be referred to for the parts not described in detail in this embodiment. The implementation process and technical effect of the technical solution are described in the embodiments shown in fig. 1 to 7 and 10, and are not described again here.
In one possible design, the structure of the image processing apparatus shown in fig. 12 may be implemented as an electronic device, which may be a mobile phone, a tablet computer, a server, or other devices. As shown in fig. 13, the electronic device may include: a first processor 21 and a first memory 22. Wherein the first memory 22 is used for storing a program for executing the image processing method provided in the embodiments shown in fig. 1-7 and 10, and the first processor 21 is configured to execute the program stored in the first memory 22.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the first processor 21, are capable of performing the steps of:
acquiring an image to be processed;
determining a semantic graph corresponding to the image to be processed, wherein the semantic graph is used for identifying semantic information included in the image to be processed;
and obtaining a target image corresponding to the image to be processed based on the semantic graph and the image to be processed, wherein the definition of the target image is different from that of the image to be processed.
Further, the first processor 21 is also configured to perform all or part of the steps in the embodiments shown in fig. 1-7 and 10.
The electronic device may further include a first communication interface 23 for communicating with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for an electronic device, which includes a program for executing the image processing method in the method embodiments shown in fig. 1 to 7 and 10.
FIG. 14 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention. Referring to fig. 14, the present embodiment provides a model training apparatus that can perform the model training method corresponding to fig. 8; the model training apparatus may include a second obtaining module 31, a second determining module 32 and a second training module 33. Specifically:
The second obtaining module 31 is configured to obtain the first image and a reference image corresponding to the first image, where the reference image and the first image have different definitions.
A second determining module 32, configured to determine a semantic map corresponding to the first image.
And a second training module 33, configured to perform learning training on the spatial adaptive convolution residual error network based on the semantic graph, the reference image, and the first image, to obtain a machine learning model, where the machine learning model is used to determine a second image corresponding to the first image, and the definition of the second image is different from that of the first image.
In some examples, when the second determining module 32 determines the semantic graph corresponding to the first image, the second determining module 32 may be configured to perform: analyzing and processing the first image by using a first machine learning model to obtain a semantic graph corresponding to the first image, wherein the first machine learning model is trained to determine the semantic graph corresponding to the first image.
In some examples, when the second determining module 32 determines the semantic graph corresponding to the first image, the second determining module 32 may be configured to perform: acquiring image coding information corresponding to the first image, wherein the image coding information comprises information used for identifying the first image; and determining a semantic graph corresponding to the first image based on the image coding information.
In some examples, when the second determining module 32 determines the semantic graph corresponding to the first image based on the image coding information, the second determining module 32 may be configured to perform: obtaining semantic feature information corresponding to the image coding information; and decoding the semantic feature information to obtain a semantic graph corresponding to the first image.
In some examples, when the second training module 33 performs learning training on the spatial adaptive convolution residual network based on the semantic graph, the reference image and the first image to obtain the machine learning model, the second training module 33 may be configured to perform: coding the first image to obtain image coding characteristics corresponding to the first image; analyzing and processing the semantic graph and the image coding features by using a space self-adaptive convolution residual error network to obtain a second image corresponding to the first image, wherein the definition of the second image is different from that of the first image; based on the reference image and the second image, a machine learning model is generated.
In some examples, when the second training module 33 generates the machine learning model based on the reference image and the second image, the second training module 33 may be operable to perform: acquiring the similarity between the reference image and the second image; when the similarity is greater than or equal to a preset threshold value, generating a machine learning model; or when the similarity is smaller than a preset threshold value, continuously performing learning training on the spatial adaptive convolution residual error network to generate a machine learning model.
In some examples, when the second training module 33 obtains the similarity between the reference image and the second image, the second training module 33 may be configured to perform: analyzing and processing the reference image and the second image by using a cross entropy loss function to obtain a first similarity between the second image and the reference image; and/or analyzing and processing the reference image and the second image by using a semantic loss function to obtain a second similarity between the second image and the reference image; and/or analyzing and processing the reference image and the second image by using the loss function of a generative adversarial network to obtain a third similarity between the second image and the reference image; and determining the similarity between the reference image and the second image based on the first similarity, the second similarity and the third similarity.
In some examples, when second training module 33 determines a similarity between the reference image and the second image based on the first similarity, the second similarity, and the third similarity, second training module 33 may be operable to perform: respectively acquiring a first weight, a second weight and a third weight corresponding to the first similarity, the second similarity and the third similarity; and carrying out weighted summation on the first similarity, the second similarity and the third similarity based on the first weight, the second weight and the third weight to obtain the similarity between the reference image and the second image.
The apparatus shown in fig. 14 can perform the method of the embodiment shown in fig. 8-10, and the detailed description of this embodiment can refer to the related description of the embodiment shown in fig. 8-10. The implementation process and technical effect of the technical solution are described in the embodiments shown in fig. 8 to 10, and are not described herein again.
In one possible design, the structure of the model training apparatus shown in fig. 14 may be implemented as an electronic device, which may be a mobile phone, a tablet computer, a server, or other devices. As shown in fig. 15, the electronic device may include: a second processor 41 and a second memory 42. Wherein the second memory 42 is used for storing programs for the corresponding electronic device to execute the model training method provided in the embodiments shown in fig. 8-10, and the second processor 41 is configured to execute the programs stored in the second memory 42.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the second processor 41, are capable of performing the steps of:
acquiring a first image and a reference image corresponding to the first image, wherein the definition of the reference image is different from that of the first image;
determining a semantic map corresponding to the first image, the semantic map being used to identify semantic information included in the first image;
and based on the semantic graph, the reference image and the first image, learning and training the spatial adaptive convolution residual error network to obtain a machine learning model, wherein the machine learning model is used for determining a second image corresponding to the first image, and the definition of the second image is different from that of the first image.
Further, the second processor 41 is also used to execute all or part of the steps in the embodiments shown in fig. 8-10.
The electronic device may further include a second communication interface 43 for communicating with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for an electronic device, which includes a program for executing the model training method in the method embodiments shown in fig. 8 to 10.
FIG. 16 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present invention. Referring to fig. 16, the present embodiment provides another image processing apparatus that can execute the image processing method corresponding to fig. 11; the image processing apparatus may include a third obtaining module 51, a third determining module 52 and a third processing module 53. Specifically:
A third obtaining module 51, configured to obtain a face image to be processed;
a third determining module 52, configured to determine a semantic map corresponding to the face image, where the semantic map includes semantic information for identifying a face structure in the face image;
and a third processing module 53, configured to obtain a target image corresponding to the face image based on the semantic graph and the face image, where a definition of the target image is different from a definition of the face image.
In some examples, when the third determination module 52 determines the semantic map corresponding to the face image, the third determination module 52 may perform: and analyzing and processing the face image by using a first machine learning model to obtain a semantic graph corresponding to the face image, wherein the first machine learning model is trained to determine the semantic graph corresponding to the face image.
In some examples, when the third determination module 52 determines the semantic map corresponding to the face image, the third determination module 52 may perform: acquiring image coding information corresponding to the face image, wherein the image coding information comprises information used for identifying the face image; and determining a semantic graph corresponding to the face image based on the image coding information.
In some examples, when the third determination module 52 determines the semantic map corresponding to the face image based on the image coding information, the third determination module 52 may perform: obtaining semantic feature information corresponding to the image coding information; and decoding the semantic feature information to obtain a semantic graph corresponding to the face image.
In some examples, when the third processing module 53 obtains the target image corresponding to the face image based on the semantic graph and the face image, the third processing module 53 may be configured to perform: and analyzing and processing the semantic graph and the face image by using a second machine learning model to obtain a target image corresponding to the face image, wherein the definition of the target image is different from that of the face image, and the second machine learning model is trained to determine the target image corresponding to the face image.
In some examples, the second machine learning model comprises a spatially adaptive convolutional residual network.
In some examples, when analyzing the semantic graph and the face image by using the second machine learning model to obtain the target image corresponding to the face image, the third processing module 53 may be configured to: encode the face image to obtain image coding features corresponding to the face image; and analyze the semantic graph and the image coding features by using the spatially adaptive convolutional residual network to obtain the target image corresponding to the face image, wherein the definition of the target image is different from that of the face image.
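The document does not detail the internals of the spatially adaptive convolutional residual network; one common realization of the idea (as in SPADE-style normalization) modulates normalized features with a per-pixel scale and bias computed from the semantic map, so the face layout is re-injected at every block. The sketch below is a hypothetical numpy illustration of one such residual unit, with random weights standing in for the learned modulation convolutions:

```python
import numpy as np

def spatially_adaptive_norm(features, semantic_map, num_classes, eps=1e-5):
    """Normalize features, then re-modulate them with a per-pixel scale and
    bias derived from the semantic map (hypothetical SPADE-like layer;
    real gamma/beta would come from learned convolutions)."""
    mean = features.mean(axis=(0, 1), keepdims=True)
    std = features.std(axis=(0, 1), keepdims=True)
    normed = (features - mean) / (std + eps)
    onehot = np.eye(num_classes)[semantic_map]           # (H, W, num_classes)
    rng = np.random.default_rng(0)
    w_gamma = rng.standard_normal((num_classes, features.shape[2])) * 0.1
    w_beta = rng.standard_normal((num_classes, features.shape[2])) * 0.1
    gamma, beta = onehot @ w_gamma, onehot @ w_beta      # per-pixel params
    return normed * (1.0 + gamma) + beta

def residual_block(features, semantic_map, num_classes):
    """Residual unit: semantic structure is injected before the skip
    connection, preserving face layout while features are transformed."""
    out = np.maximum(spatially_adaptive_norm(features, semantic_map,
                                             num_classes), 0.0)  # ReLU
    return features + out                                # skip connection

feats = np.random.default_rng(1).random((32, 32, 16))
semmap = np.random.default_rng(2).integers(0, 5, size=(32, 32))
out = residual_block(feats, semmap, num_classes=5)
print(out.shape)  # (32, 32, 16)
```

The design point is that the semantic map, not the image alone, drives the modulation, which is what makes the convolution "spatially adaptive".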
In some examples, when obtaining the target image corresponding to the face image based on the semantic graph and the face image, the third processing module 53 may be configured to: segment the face image based on the semantic graph to obtain semantic sub-images corresponding to the face image; process each semantic sub-image to obtain a target sub-image corresponding to that semantic sub-image, wherein the definition of the target sub-image is different from that of the semantic sub-image; and splice the target sub-images based on the semantic graph to obtain the target image corresponding to the face image.
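The segment/process/splice variant can be illustrated with a toy numpy sketch. Here `toy_enhance` is a hypothetical stand-in for the per-region enhancement; masks derived from the semantic map both cut out each semantic sub-image and splice the processed results back together:

```python
import numpy as np

def split_and_merge(image, semantic_map, enhance):
    """Per-region processing: mask out each semantic region, run an
    enhancement function on it, and splice the results back together
    using the same semantic map (a sketch of the split/splice idea)."""
    target = np.zeros_like(image)
    for label in np.unique(semantic_map):
        mask = (semantic_map == label)[..., None]   # (H, W, 1) region mask
        sub = image * mask                          # semantic sub-image
        target += enhance(sub, label) * mask        # keep region's pixels only
    return target

# Hypothetical per-region "enhancement": simple per-label contrast scaling.
def toy_enhance(sub_image, label):
    return np.clip(sub_image * (1.0 + 0.1 * label), 0.0, 1.0)

img = np.random.default_rng(0).random((16, 16, 3))
smap = np.random.default_rng(1).integers(0, 3, size=(16, 16))
out = split_and_merge(img, smap, toy_enhance)
print(out.shape)  # (16, 16, 3)
```

Because the region masks are disjoint, re-multiplying by the mask before summing guarantees each output pixel comes from exactly one processed sub-image.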
In some examples, the definition of the target image is higher than that of the face image; or, the definition of the target image is lower than that of the face image.
In some examples, the semantic graph has a size that is the same as or different from the size of the face image.
The apparatus shown in fig. 16 can execute the method of the embodiment shown in fig. 11; for the parts of this embodiment that are not described in detail, reference may be made to the related description of the embodiment shown in fig. 11. The implementation process and technical effects of this technical solution are described in the embodiment shown in fig. 11 and are not repeated here.
In one possible design, the structure of the image processing apparatus shown in fig. 16 may be implemented as an electronic device, which may be a mobile phone, a tablet computer, a server, or another device. As shown in fig. 17, the electronic device may include a third processor 61 and a third memory 62, where the third memory 62 is used for storing a program for executing the image processing method provided in the embodiment shown in fig. 11, and the third processor 61 is configured to execute the program stored in the third memory 62.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the third processor 61, are capable of performing the steps of:
acquiring a face image to be processed;
determining a semantic graph corresponding to the face image, wherein the semantic graph comprises semantic information for identifying a face structure in the face image;
and obtaining a target image corresponding to the face image based on the semantic image and the face image, wherein the definition of the target image is different from that of the face image.
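Taken together, the three steps above amount to a simple pipeline: acquire the face image, derive its semantic map, then produce the target image from both. The sketch below wires the steps up with hypothetical stand-in models (`toy_segmenter`, `toy_enhancer`); the trained first and second machine learning models would replace them in a real system:

```python
import numpy as np

def process_face_image(face_image, segmenter, enhancer):
    """End-to-end flow of the three steps above. `segmenter` and
    `enhancer` are hypothetical interfaces standing in for the trained
    first and second machine learning models."""
    semantic_map = segmenter(face_image)               # step 2: semantic graph
    target = enhancer(face_image, semantic_map)        # step 3: target image
    assert target.shape == face_image.shape
    return semantic_map, target

# Stand-in models for demonstration only.
toy_segmenter = lambda img: (img.mean(axis=2) > 0.5).astype(int)
toy_enhancer = lambda img, smap: np.clip(img + 0.05 * smap[..., None], 0, 1)

face = np.random.default_rng(0).random((24, 24, 3))    # step 1: acquire image
smap, target = process_face_image(face, toy_segmenter, toy_enhancer)
print(smap.shape, target.shape)  # (24, 24) (24, 24, 3)
```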
Further, the third processor 61 is also used for executing all or part of the steps in the embodiment shown in fig. 11.
The electronic device may further include a third communication interface 63 for communicating with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for an electronic device, which includes a program for executing the image processing method in the method embodiment shown in fig. 11.
The above-described apparatus embodiments are merely illustrative: the units described as separate parts may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, and one of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, or, of course, by a combination of hardware and software. With this understanding in mind, the technical solutions above, or the portions thereof that contribute to the prior art, may be embodied in the form of a computer program product, which may be carried on one or more computer-usable storage media (including, without limitation, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (34)

1. An image processing method, comprising:
acquiring a face image to be processed;
determining a semantic graph corresponding to the face image, wherein the semantic graph comprises semantic information used for identifying a face structure in the face image;
and obtaining a target image corresponding to the face image based on the semantic graph and the face image, wherein the definition of the target image is different from that of the face image.
2. The method of claim 1, wherein determining the semantic graph corresponding to the face image comprises:
analyzing and processing the face image by utilizing a first machine learning model, and obtaining a semantic graph corresponding to the face image, wherein the first machine learning model is trained to determine the semantic graph corresponding to the face image.
3. The method of claim 1, wherein determining the semantic graph corresponding to the face image comprises:
acquiring image coding information corresponding to the face image, wherein the image coding information comprises information used for identifying the face image;
and determining a semantic graph corresponding to the face image based on the image coding information.
4. The method of claim 3, wherein determining the semantic graph corresponding to the face image based on the image coding information comprises:
obtaining semantic feature information corresponding to the image coding information;
and decoding the semantic feature information to obtain a semantic graph corresponding to the face image.
5. The method of claim 1, wherein obtaining a target image corresponding to the face image based on the semantic graph and the face image comprises:
and analyzing and processing the semantic graph and the face image by using a second machine learning model to obtain a target image corresponding to the face image, wherein the definition of the target image is different from that of the face image, and the second machine learning model is trained to determine the target image corresponding to the face image.
6. The method of claim 5, wherein the second machine learning model is comprised of a spatially adaptive convolutional residual network.
7. The method according to claim 6, wherein the analyzing the semantic graph and the face image by using a second machine learning model to obtain a target image corresponding to the face image comprises:
carrying out coding processing on the face image to obtain image coding characteristics corresponding to the face image;
and analyzing and processing the semantic graph and the image coding features by utilizing the spatially adaptive convolutional residual network to obtain a target image corresponding to the face image, wherein the definition of the target image is different from that of the face image.
8. The method of claim 1, wherein obtaining a target image corresponding to the face image based on the semantic graph and the face image comprises:
segmenting the face image based on the semantic graph to obtain a semantic subgraph corresponding to the face image;
processing the semantic subgraph to obtain a target subgraph corresponding to the semantic subgraph, wherein the definition of the target subgraph is different from that of the semantic subgraph;
and splicing all target sub-images based on the semantic graph to obtain a target image corresponding to the face image.
9. The method according to any one of claims 1 to 8,
the definition of the target image is higher than that of the face image; or,
the definition of the target image is lower than that of the face image.
10. The method according to any one of claims 1-8, wherein the semantic graph has a size that is the same as or different from the size of the face image.
11. An image processing method, comprising:
acquiring an image to be processed;
determining a semantic graph corresponding to the image to be processed, wherein the semantic graph is used for identifying semantic information included in the image to be processed;
and obtaining a target image corresponding to the image to be processed based on the semantic graph and the image to be processed, wherein the definition of the target image is different from that of the image to be processed.
12. The method of claim 11, wherein determining the semantic graph corresponding to the image to be processed comprises:
analyzing and processing an image to be processed by utilizing a first machine learning model, and obtaining a semantic graph corresponding to the image to be processed, wherein the first machine learning model is trained to determine the semantic graph corresponding to the image to be processed.
13. The method of claim 11, wherein determining the semantic graph corresponding to the image to be processed comprises:
acquiring image coding information corresponding to the image to be processed, wherein the image coding information comprises information used for identifying the image to be processed;
and determining a semantic graph corresponding to the image to be processed based on the image coding information.
14. The method of claim 13, wherein determining a semantic map corresponding to the image to be processed based on the image coding information comprises:
obtaining semantic feature information corresponding to the image coding information;
and decoding the semantic feature information to obtain a semantic graph corresponding to the image to be processed.
15. The method according to claim 11, wherein obtaining a target image corresponding to the image to be processed based on the semantic graph and the image to be processed comprises:
analyzing and processing the semantic graph and the image to be processed by utilizing a second machine learning model to obtain a target image corresponding to the image to be processed, wherein the definition of the target image is different from that of the image to be processed, and the second machine learning model is trained to determine the target image corresponding to the image to be processed.
16. The method of claim 15, wherein the second machine learning model is comprised of a spatially adaptive convolutional residual network.
17. The method according to claim 16, wherein performing analysis processing on the semantic graph and the image to be processed by using a second machine learning model to obtain a target image corresponding to the image to be processed comprises:
coding the image to be processed to obtain image coding characteristics corresponding to the image to be processed;
and analyzing and processing the semantic graph and the image coding features by utilizing the spatially adaptive convolutional residual network to obtain a target image corresponding to the image to be processed, wherein the definition of the target image is different from that of the image to be processed.
18. The method according to claim 11, wherein obtaining a target image corresponding to the image to be processed based on the semantic graph and the image to be processed comprises:
segmenting the image to be processed based on the semantic graph to obtain a semantic subgraph corresponding to the image to be processed;
processing the semantic subgraph to obtain a target subgraph corresponding to the semantic subgraph, wherein the definition of the target subgraph is different from that of the semantic subgraph;
and splicing all the target sub-images based on the semantic graph to obtain a target image corresponding to the image to be processed.
19. The method according to any one of claims 11 to 18,
the definition of the target image is higher than that of the image to be processed; or,
the definition of the target image is lower than that of the image to be processed.
20. The method according to any one of claims 11-18, wherein the semantic graph has a size that is the same as or different from the size of the image to be processed.
21. A method of model training, comprising:
acquiring a first image and a reference image corresponding to the first image, wherein the definition of the reference image is different from that of the first image;
determining a semantic map corresponding to the first image, the semantic map identifying semantic information included in the first image;
and performing learning and training on a spatially adaptive convolutional residual network based on the semantic graph, the reference image and the first image to obtain a machine learning model, wherein the machine learning model is used for determining a second image corresponding to the first image, and the definition of the second image is different from that of the first image.
22. The method of claim 21, wherein determining the semantic graph corresponding to the first image comprises:
analyzing the first image with a first machine learning model, obtaining a semantic graph corresponding to the first image, the first machine learning model being trained to determine the semantic graph corresponding to the first image.
23. The method of claim 21, wherein determining the semantic graph corresponding to the first image comprises:
acquiring image coding information corresponding to the first image, wherein the image coding information comprises information used for identifying the first image;
determining a semantic graph corresponding to the first image based on the image coding information.
24. The method of claim 23, wherein determining the semantic graph corresponding to the first image based on the image coding information comprises:
obtaining semantic feature information corresponding to the image coding information;
and decoding the semantic feature information to obtain a semantic graph corresponding to the first image.
25. The method of claim 21, wherein performing learning and training on a spatially adaptive convolutional residual network based on the semantic graph, the reference image and the first image to obtain the machine learning model comprises:
coding the first image to acquire image coding characteristics corresponding to the first image;
analyzing and processing the semantic graph and the image coding features by using the spatially adaptive convolutional residual network to obtain a second image corresponding to the first image, wherein the definition of the second image is different from that of the first image;
generating the machine learning model based on the reference image and the second image.
26. The method of claim 25, wherein generating the machine learning model based on the reference image and the second image comprises:
acquiring the similarity between the reference image and the second image;
when the similarity is larger than or equal to a preset threshold value, generating the machine learning model; or,
and when the similarity is smaller than the preset threshold value, continuing to perform learning and training on the spatially adaptive convolutional residual network to generate the machine learning model.
27. The method of claim 26, wherein obtaining the similarity between the reference image and the second image comprises:
analyzing and processing the reference image and the second image by using a cross entropy loss function to obtain a first similarity between the second image and the reference image; and/or,
analyzing and processing the reference image and the second image by utilizing a semantic loss function to obtain a second similarity between the second image and the reference image; and/or,
analyzing and processing the reference image and the second image by using a loss function of a generative adversarial network to obtain a third similarity between the second image and the reference image; and/or,
determining a similarity between the reference image and the second image based on the first, second, and third similarities.
28. The method of claim 27, wherein determining the similarity between the reference image and the second image based on the first similarity, the second similarity, and the third similarity comprises:
respectively acquiring a first weight, a second weight and a third weight corresponding to the first similarity, the second similarity and the third similarity;
and carrying out weighted summation on the first similarity, the second similarity and the third similarity based on the first weight, the second weight and the third weight to obtain the similarity between the reference image and the second image.
29. An image processing apparatus characterized by comprising:
the first acquisition module is used for acquiring an image to be processed;
the first determination module is used for determining a semantic graph corresponding to the image to be processed, and the semantic graph is used for identifying semantic information included in the image to be processed;
the first processing module is used for obtaining a target image corresponding to the image to be processed based on the semantic graph and the image to be processed, and the definition of the target image is different from that of the image to be processed.
30. An electronic device, comprising: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the image processing method of any of claims 11-20.
31. A model training apparatus, comprising:
the second acquisition module is used for acquiring a first image and a reference image corresponding to the first image, wherein the definition of the reference image is different from that of the first image;
a second determining module for determining a semantic map corresponding to the first image, the semantic map being used to identify semantic information included in the first image;
and the second training module is used for performing learning and training on a spatially adaptive convolutional residual network based on the semantic graph, the reference image and the first image to obtain a machine learning model, wherein the machine learning model is used for determining a second image corresponding to the first image, and the definition of the second image is different from that of the first image.
32. An electronic device, comprising: a memory, a processor; wherein the memory is to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the model training method of any of claims 21-28.
33. An image processing apparatus characterized by comprising:
the third acquisition module is used for acquiring a face image to be processed;
a third determining module, configured to determine a semantic graph corresponding to the face image, where the semantic graph includes semantic information for identifying a face structure in the face image;
and the third processing module is used for obtaining a target image corresponding to the face image based on the semantic graph and the face image, and the definition of the target image is different from that of the face image.
34. An electronic device, comprising: a memory, a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the image processing method of any of claims 1-10.
CN202010280796.6A 2020-04-10 2020-04-10 Image processing method, model training method, device and equipment Pending CN113516592A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010280796.6A CN113516592A (en) 2020-04-10 2020-04-10 Image processing method, model training method, device and equipment


Publications (1)

Publication Number Publication Date
CN113516592A true CN113516592A (en) 2021-10-19

Family

ID=78060210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010280796.6A Pending CN113516592A (en) 2020-04-10 2020-04-10 Image processing method, model training method, device and equipment

Country Status (1)

Country Link
CN (1) CN113516592A (en)



Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Zhu Daoming et al. (eds.): "Building Security and Protection Technology" (《建筑安防技术》), 31 January 2013 *
Tian Xuan et al.: "Image Semantic Segmentation Techniques Based on Deep Learning" (《基于深度学习的图像语义分割技术》), 30 December 2019 *
Luo Chen: "Precision Retail" (《精准零售》), 31 March 2020 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082317A (en) * 2022-07-11 2022-09-20 四川轻化工大学 Image super-resolution reconstruction method for attention mechanism enhancement
CN115186119A (en) * 2022-09-07 2022-10-14 深圳市华曦达科技股份有限公司 Picture processing method and system based on picture and text combination and readable storage medium
CN115186119B (en) * 2022-09-07 2022-12-06 深圳市华曦达科技股份有限公司 Picture processing method and system based on picture and text combination and readable storage medium

Similar Documents

Publication Publication Date Title
US10440398B2 (en) Probabilistic model to compress images for three-dimensional video
Walker et al. Predicting video with vqvae
US11949848B2 (en) Techniques to capture and edit dynamic depth images
US10904476B1 (en) Techniques for up-sampling digital media content
CN113348486A (en) Image display with selective motion description
Zhao et al. Scale-aware crowd counting via depth-embedded convolutional neural networks
CN116803079A (en) Scalable coding of video and related features
WO2023005740A1 (en) Image encoding, decoding, reconstruction, and analysis methods, system, and electronic device
CN113822794A (en) Image style conversion method and device, computer equipment and storage medium
CN113516592A (en) Image processing method, model training method, device and equipment
CN109698957A (en) Image encoding method, calculates equipment and storage medium at device
JP2023500028A (en) Personalized automatic video cropping
CN113313635A (en) Image processing method, model training method, device and equipment
CN114827666B (en) Video processing method, device and equipment
WO2023014292A2 (en) Improved video matting
CN114299105A (en) Image processing method, image processing device, computer equipment and storage medium
CN113628122A (en) Image processing method, model training method, device and equipment
CN113992859A (en) Image quality improving method and device
CN112749614B (en) Multimedia content identification method and device, electronic equipment and storage medium
CN117974493B (en) Image processing method and related device
CN118474323B (en) Three-dimensional image, three-dimensional video, monocular view, training data set generation method, training data set generation device, storage medium, and program product
CN113838159B (en) Method, computing device and storage medium for generating cartoon images
CN116912345B (en) Portrait cartoon processing method, device, equipment and storage medium
US20240333873A1 (en) Privacy preserving online video recording using meta data
Hassan Cross-Domain Visual Learning and Applications in Privacy, Retrieval and Model Adaptation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211019