CN111739027A - Image processing method, apparatus, device and readable storage medium
- Publication number: CN111739027A (application number CN202010722268.1A)
- Authority: CN (China)
- Prior art keywords: image, region, target, label, loss value
- Legal status: Granted
Classifications
- G06T 7/10—Segmentation; Edge detection
- G06F 18/24—Classification techniques
- G06N 3/045—Combinations of networks
- G06N 3/08—Learning methods
- G06T 3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
- G06T 5/30—Erosion or dilatation, e.g. thinning
- G06T 5/70—Denoising; Smoothing
- G06T 7/90—Determination of colour characteristics
- G06V 10/40—Extraction of image or video features
- G06T 2207/10004—Still image; Photographic image
- G06T 2207/20081—Training; Learning
- G06T 2207/20084—Artificial neural networks [ANN]
- G06T 2207/20221—Image fusion; Image merging
- G06T 2207/30196—Human being; Person
Abstract
The embodiment of the application discloses an image processing method, apparatus, device and readable storage medium in the technical field of computers. The method includes: acquiring a sample image, inputting the sample image into an image recognition model, and outputting, through the image recognition model, a prediction region to which a sample object in the sample image belongs; acquiring a label image; determining a classification loss value according to the prediction region and the region label in the label image; obtaining a region mask of the region label, and determining a region constraint loss value according to the prediction region and the region mask; determining a pixel constraint loss value according to the prediction region and the region label; determining a target loss value of the sample image according to the classification loss value, the pixel constraint loss value and the region constraint loss value; adjusting the image recognition model according to the target loss value to obtain a target image recognition model; and performing image recognition processing based on the target image recognition model. With the method and apparatus of the application, the accuracy of image recognition can be improved.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and a readable storage medium.
Background
Portrait segmentation refers to extracting the human figure from an image; in a video scene, it refers to extracting the human figure from a video in real time. It is a technology for separating the portrait in an image from the background and is widely applied in fields such as portrait background blurring and green-screen or blue-screen photography.
In the prior art, portrait segmentation is mainly implemented with a deep-learning semantic segmentation model: an image is input into the model, and the model computes and outputs the portrait segmentation result. In practical applications, however, images cover a wide variety of scenes, and the portrait may share boundaries with many kinds of objects. The deep-learning semantic segmentation model does not take these boundary cases into account, because it is trained only on data in which the portrait borders simple objects. As a result, the model struggles to segment portraits in complex scenes, and the obtained segmentation accuracy is low.
Disclosure of Invention
The embodiment of the application provides an image processing method, apparatus, device and readable storage medium, which can improve the accuracy of image recognition.
An embodiment of the present application provides an image processing method, including:
acquiring a sample image, inputting the sample image into an image recognition model, and outputting a prediction region to which a sample object in the sample image belongs through the image recognition model;
obtaining a label image corresponding to the sample image; the label image comprises a region label to which the sample object belongs;
determining a classification loss value of the sample image according to the prediction region and the region label;
obtaining a region mask of the region label, and determining a region constraint loss value of the sample image according to the prediction region and the region mask;
determining a pixel constraint loss value of the sample image according to the prediction region and the region label;
determining a target loss value of the sample image according to the classification loss value, the pixel constraint loss value and the region constraint loss value, adjusting the image recognition model according to the target loss value to obtain a target image recognition model, and performing image recognition processing based on the target image recognition model.
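For illustration only, the following is a minimal sketch of one training iteration built around these steps, assuming a PyTorch-style model; the names train_step, cls_loss_fn, region_loss_fn, pixel_loss_fn, alpha and beta are hypothetical and are not taken from the embodiment, and the loss functions are passed in as callables.

```python
import torch

def train_step(model, optimizer, sample_image: torch.Tensor,
               label_image: torch.Tensor, region_mask: torch.Tensor,
               cls_loss_fn, region_loss_fn, pixel_loss_fn,
               alpha: float = 1.0, beta: float = 1.0) -> float:
    # Output the prediction region to which the sample object belongs.
    predicted_region = model(sample_image)  # e.g. [B, 1, H, W] probability map

    # The three loss terms described above (classification, region constraint,
    # pixel constraint), supplied as hypothetical callables.
    cls_loss = cls_loss_fn(predicted_region, label_image)
    region_loss = region_loss_fn(predicted_region, label_image, region_mask)
    pixel_loss = pixel_loss_fn(predicted_region, label_image, sample_image)

    # Weight and sum the three terms into the target loss value, then adjust
    # the image recognition model by back-propagation.
    target_loss = cls_loss + alpha * region_loss + beta * pixel_loss
    optimizer.zero_grad()
    target_loss.backward()
    optimizer.step()
    return float(target_loss.detach())
```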
An embodiment of the present application provides another image processing method, including:
acquiring a sample image, inputting the sample image into an image recognition model, and outputting a prediction region to which a sample object in the sample image belongs through the image recognition model;
obtaining a label image corresponding to the sample image; the label image comprises a region label to which the sample object belongs;
acquiring a region mask corresponding to the region label, and generating a target loss value of the sample image according to the region mask, the prediction region and the region label;
and adjusting the image recognition model according to the target loss value to obtain a target image recognition model, and performing image recognition processing based on the target image recognition model.
An aspect of an embodiment of the present application provides an image processing apparatus, including:
the sample image acquisition module is used for acquiring a sample image;
the prediction result output module is used for inputting the sample image into the image recognition model and outputting a prediction region to which the sample object in the sample image belongs through the image recognition model;
the label image acquisition module is used for acquiring a label image corresponding to the sample image; the label image comprises a region label to which the sample object belongs;
the first loss value determining module is used for determining the classification loss value of the sample image according to the prediction region and the region label;
the mask acquisition module is used for acquiring a region mask of the region label;
a second loss value determining module, configured to determine a region constraint loss value of the sample image according to the prediction region and the region mask;
the third loss value determining module is used for determining a pixel constraint loss value of the sample image according to the prediction region and the region label;
the target loss value determining module is used for determining a target loss value of the sample image according to the classification loss value, the pixel constraint loss value and the region constraint loss value;
and the model adjusting module is used for adjusting the image recognition model according to the target loss value to obtain a target image recognition model and performing image recognition processing based on the target image recognition model.
Wherein the image recognition model comprises a depth-by-depth feature convolution component;
the prediction result output module comprises:
the image characteristic extraction unit is used for inputting the sample image into the image recognition model and extracting the image characteristic of the sample image through the image recognition model; the image features comprise at least two image channel features;
the feature convolution unit is used for inputting the features of the at least two image channels into the depth-by-depth feature convolution components, and performing convolution processing on the features of the at least two image channels respectively through at least two depth-by-depth convolution kernels in the depth-by-depth feature convolution components to obtain convolution image features; one depth-wise convolution kernel corresponds to one image channel feature;
and the prediction result determining unit is used for determining a prediction region to which the sample object in the sample image belongs according to the characteristics of the convolution image.
Wherein the at least two image channel features comprise a first image channel feature and a second image channel feature, and the at least two depth-wise convolution kernels comprise a first depth-wise convolution kernel and a second depth-wise convolution kernel;
the feature convolution unit includes:
the characteristic input subunit is used for inputting the characteristics of at least two image channels into the depth-by-depth characteristic convolution component;
the channel feature convolution subunit is used for performing convolution processing on the first image channel feature through a first depth-by-depth convolution kernel in the depth-by-depth feature convolution component to obtain a first convolution channel feature;
the channel feature convolution subunit is also used for performing convolution processing on the second image channel feature through a second depth-by-depth convolution kernel in the depth-by-depth feature convolution component to obtain a second convolution channel feature;
and the characteristic splicing subunit is used for splicing the first convolution channel characteristic and the second convolution channel characteristic to generate a convolution image characteristic.
The image recognition model further comprises a feature fusion component;
the prediction result determination unit includes:
the normalization subunit is used for inputting the characteristics of the convolution image into the characteristic fusion component and normalizing the characteristics of the convolution image through a normalization layer in the characteristic fusion component to obtain the characteristics of a standard image;
the feature fusion subunit is used for inputting the standard image features into a feature fusion layer in the feature fusion component, and performing convolution processing on the standard image features in the feature fusion layer to generate fusion image features;
and the prediction result determining subunit is used for determining the prediction region to which the sample object in the sample image belongs according to the characteristics of the fused image.
Wherein the label image acquisition module includes:
the annotation image acquisition unit is used for acquiring a region annotation image corresponding to the sample image; the region labeling image comprises a labeling region to which the sample object belongs;
the binary processing unit is used for carrying out binarization processing on the region label image to obtain a binary label image;
the filtering smoothing unit is used for carrying out filtering smoothing processing on the labeled area in the binary labeled image to obtain an area label;
and the label image determining unit is used for determining the binary annotation image containing the area label as a label image.
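As a concrete illustration of this module, the sketch below binarizes a region annotation image and then smooths the annotated region with a Gaussian filter, assuming OpenCV; the threshold value and kernel size are arbitrary choices rather than values taken from the embodiment.

```python
import cv2
import numpy as np

def build_label_image(region_annotation: np.ndarray) -> np.ndarray:
    """Binarize the region annotation, then filter/smooth the annotated region."""
    # Binarization: pixels belonging to the sample object become 1, others 0.
    _, binary = cv2.threshold(region_annotation, 127, 1, cv2.THRESH_BINARY)
    # Filtering/smoothing of the annotated region to soften ragged boundaries.
    smoothed = cv2.GaussianBlur(binary.astype(np.float32), (5, 5), 0)
    return smoothed  # binary annotation image containing the region label
```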
Wherein the first loss value determination module comprises:
a prediction pixel point obtaining unit, configured to obtain a prediction pixel point in the prediction region;
a marking point acquisition unit, configured to acquire a region marking point from the region label;
and the first loss value generating unit is used for acquiring a classification loss function and generating a classification loss value of the sample image according to the prediction pixel point, the region marking point and the classification loss function.
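The classification loss function is not named here; the sketch below assumes a per-pixel binary cross-entropy between the predicted pixel points and the region annotation points, which is one common choice, assuming PyTorch.

```python
import torch
import torch.nn.functional as F

def classification_loss(predicted_region: torch.Tensor,
                        region_label: torch.Tensor) -> torch.Tensor:
    """Per-pixel classification error between the prediction and the region label.

    predicted_region: probabilities in [0, 1], shape [B, 1, H, W].
    region_label:     region label values (0/1), same shape.
    """
    return F.binary_cross_entropy(predicted_region, region_label)
```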
Wherein the mask acquiring module comprises:
the expansion processing unit is used for carrying out expansion morphological processing on the label image to obtain an expansion label image;
the corrosion processing unit is used for carrying out corrosion morphological processing on the label image to obtain a corrosion label image;
the expansion labeling point acquiring unit is used for acquiring expansion region labeling points in an expansion region label to which the sample object belongs in the expansion label image;
the corrosion marking point acquisition unit is used for acquiring corrosion area marking points in a corrosion area label to which the sample object belongs in the corrosion label image;
and the difference value determining unit is used for determining the difference value between the expanded region marking point and the corrosion region marking point to be used as a region mask of the region label.
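The sketch below reproduces this mask construction with OpenCV morphology: the label image is dilated and eroded, and the difference of the two results forms a band around the labelled boundary; the 15x15 structuring element is an arbitrary assumption.

```python
import cv2
import numpy as np

def region_mask(label_image: np.ndarray, kernel_size: int = 15) -> np.ndarray:
    """Region mask = dilated region label minus eroded region label."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    dilated = cv2.dilate(label_image, kernel)  # expansion morphological processing
    eroded = cv2.erode(label_image, kernel)    # erosion morphological processing
    # Difference between the expanded-region and eroded-region annotation points.
    return dilated - eroded
```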
Wherein the second loss value determination module includes:
the pixel point obtaining unit is used for obtaining a prediction pixel point in the prediction region;
the pixel point acquisition unit is also used for acquiring a region marking point in the region label;
the gradient feature generation unit is used for acquiring a region detection operator and determining a first gradient feature corresponding to a prediction region according to the region detection operator, the prediction pixel point and the region mask;
the gradient feature generation unit is further used for determining a second gradient feature corresponding to the area label according to the area detection operator, the area marking point and the area mask;
and the second loss value generating unit is used for acquiring a region constraint loss function and generating a region constraint loss value of the sample image according to the first gradient feature, the second gradient feature and the region constraint loss function.
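The region detection operator and the region constraint loss function are not specified in this passage; the sketch below assumes a Sobel operator for the gradient features, restricts them to the region mask, and compares the two gradient features with an L1 loss, assuming PyTorch.

```python
import torch
import torch.nn.functional as F

def region_constraint_loss(predicted_region: torch.Tensor,
                           region_label: torch.Tensor,
                           mask: torch.Tensor) -> torch.Tensor:
    """Compare gradient features of the prediction and the label inside the region mask."""
    # Hypothetical region detection operator: 3x3 Sobel kernels.
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)

    def gradient(x: torch.Tensor) -> torch.Tensor:
        gx = F.conv2d(x, sobel_x, padding=1)
        gy = F.conv2d(x, sobel_y, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

    # First and second gradient features, restricted to the region mask.
    pred_grad = gradient(predicted_region) * mask
    label_grad = gradient(region_label) * mask
    return F.l1_loss(pred_grad, label_grad)
```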
Wherein the third loss value determination module includes:
the color pixel value generating unit is used for acquiring a prediction pixel point in the prediction region and generating a first color pixel value corresponding to the prediction pixel point according to the color channel pixel value;
the color pixel value generating unit is further used for acquiring a region marking point in the region label and generating a second color pixel value corresponding to the region marking point according to the color channel pixel value;
and the third loss value generating unit is used for acquiring the pixel constraint loss function and generating the pixel constraint loss value of the sample image according to the first color pixel value, the second color pixel value and the pixel constraint loss function.
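The pixel constraint loss function is likewise not specified; the sketch below assumes that the colour channel pixel values of the sample image are weighted by the predicted region and by the region label respectively, and that the resulting first and second colour pixel values are compared with an L1 loss, assuming PyTorch.

```python
import torch
import torch.nn.functional as F

def pixel_constraint_loss(predicted_region: torch.Tensor,
                          region_label: torch.Tensor,
                          sample_image: torch.Tensor) -> torch.Tensor:
    """Compare colour pixel values selected by the prediction and by the label.

    sample_image: colour image, shape [B, 3, H, W] (colour channel pixel values).
    """
    # First colour pixel values: image colours under the predicted region.
    first_color = sample_image * predicted_region
    # Second colour pixel values: image colours under the region label.
    second_color = sample_image * region_label
    return F.l1_loss(first_color, second_color)
```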
Wherein the target loss value determination module comprises:
the parameter acquisition unit is used for acquiring a first model balance parameter and a second model balance parameter;
the operation processing unit is used for multiplying the first model balance parameter and the area constraint loss value to obtain a first balance loss value;
the operation processing unit is also used for multiplying the second model balance parameter and the pixel constraint loss value to obtain a second balance loss value;
and the operation processing unit is also used for adding the classification loss value, the first balance loss value and the second balance loss value to obtain a target loss value of the sample image.
Wherein the apparatus further includes:
the target image acquisition module is used for acquiring a target image and inputting the target image into the target image recognition model;
the target area identification module is used for identifying a target area to which a target object in a target image belongs in the target image identification model;
the boundary marking module is used for marking the boundary of the target area to obtain a marked boundary;
and the image output module is used for outputting the target image carrying the mark boundary.
Wherein the apparatus further includes:
the background area determining module is used for acquiring an area outside the mark boundary in a target image carrying the mark boundary as a background area;
the list display module is used for responding to material adding operation aiming at the background area and displaying a material list;
the area updating module is used for responding to material selection operation aiming at the material list and updating the background area into a target background area with target materials; the target material is a material selected by the material selection operation;
and the output module is used for outputting a target image containing a target area and a target background area.
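As an illustration of how the target background area could be composed with the selected material, the sketch below alpha-composites a target material behind the recognized target area using the region as a mask, assuming NumPy arrays; it is only a compositing sketch, not the claimed interaction flow itself.

```python
import numpy as np

def replace_background(target_image: np.ndarray,
                       region_mask: np.ndarray,
                       target_material: np.ndarray) -> np.ndarray:
    """Keep the target area, replace everything outside the mark boundary with material."""
    mask = region_mask[..., None].astype(np.float32)  # [H, W, 1], 1 inside the target area
    # Target-area pixels come from the original image, the background from the material.
    composed = target_image * mask + target_material * (1.0 - mask)
    return composed.astype(target_image.dtype)
```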
Wherein the apparatus further includes:
the region image extraction module is used for extracting a region image containing a target object from the target image according to the mark boundary, and identifying the target part type information of the target object in the region image in the target image identification model; the target part type information is the type of the target part in the target object;
the material information acquisition module is used for acquiring a material information base; the material information base comprises at least two virtual material data, and one virtual material data corresponds to one part category information;
the target material determining module is used for acquiring virtual material data matched with the target part category information from the material information base and taking the virtual material data as target virtual material data;
the part switching module is used for switching a target part in the target object into target virtual material data to obtain virtual part data;
and the object output module is used for outputting the target object containing the virtual position data.
An aspect of the present application provides another image processing apparatus, including:
the prediction region determining module is used for acquiring a sample image, inputting the sample image into an image recognition model, and outputting a prediction region to which a sample object in the sample image belongs through the image recognition model;
the area label obtaining module is used for obtaining a label image corresponding to the sample image; the label image comprises a region label to which the sample object belongs;
a target loss value generation module, configured to obtain a region mask corresponding to the region label, and generate a target loss value of the sample image according to the region mask, the prediction region, and the region label;
and the target model determining module is used for adjusting the image recognition model according to the target loss value to obtain a target image recognition model and performing image recognition processing based on the target image recognition model.
An aspect of an embodiment of the present application provides a computer device, including: a processor and a memory;
the memory stores a computer program that, when executed by the processor, causes the processor to perform the method in the embodiments of the present application.
An aspect of the embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, where the computer program includes program instructions, and the program instructions, when executed by a processor, perform the method in the embodiments of the present application.
In one aspect of the application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by one aspect of the embodiments of the present application.
In the embodiment of the application, a sample image is input into an image recognition model, the prediction region of the sample object in the sample image is output through the image recognition model, and a classification loss value, a region constraint loss value and a pixel constraint loss value are determined from the prediction region and the region label of the sample image. The classification loss value characterizes the classification error between the prediction region and the region label. The region constraint loss value is determined by the region mask of the region label together with the prediction region and the region label; the region mask is the coverage area corresponding to the region label in the sample image and can effectively locate the position of the target object, so the region constraint loss value introduces the actual position information of the target object in the sample image and enhances the discrimination between the prediction region and the region label. The pixel constraint loss value introduces the colour pixel values of the sample image and likewise enhances the discrimination between the prediction region and the region label. In summary, the classification loss value, the region constraint loss value and the pixel constraint loss value each strengthen the discrimination between the prediction region and the region label from a different dimension, so the target loss value obtained from these three loss values can accurately express the difference between the prediction region and the region label. Training the image recognition model with the target loss value brings the prediction region output by the model ever closer to the region label; that is, the target image recognition model obtained through training with the target loss value can accurately recognize the region to which the target object belongs in an input image.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a diagram of a network architecture provided by an embodiment of the present application;
FIG. 2 is a schematic view of a scenario provided by an embodiment of the present application;
fig. 3 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 4a is a schematic structural diagram of an image recognition model provided in an embodiment of the present application;
FIG. 4b is a block diagram of a deep separable convolution according to an embodiment of the present application;
FIGS. 4c and 4d are graphs comparing experimental data provided in examples of the present application;
FIG. 5 is a schematic flow chart of a model application provided in an embodiment of the present application;
FIG. 6a is a scene schematic diagram of an application of a target image recognition model provided in an embodiment of the present application;
FIG. 6b is a scene diagram of an application of a target image recognition model according to an embodiment of the present application;
fig. 7 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing and machine learning/deep learning.
The scheme provided by the embodiment of the application relates to computer vision technology (CV) and machine learning (ML), which belong to the field of artificial intelligence.
Computer vision technology (CV) is a science that studies how to make machines "see": it uses cameras and computers in place of human eyes to recognize, track and measure targets, and performs further image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how a computer simulates or realizes human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and formal education learning.
Fig. 1 is a network architecture diagram provided in an embodiment of the present application. As shown in fig. 1, the network architecture may include a service server 1000 and a background server cluster, where the background server cluster may include a plurality of background servers, specifically the background server 100a, the background server 100b, the background server 100c, ..., and the background server 100n. As shown in fig. 1, the background server 100a, the background server 100b, the background server 100c, ..., and the background server 100n may each be connected to the service server 1000 through a network, so that each background server can perform data interaction with the service server 1000 through the network connection and the service server 1000 can receive service data from each background server.
Each background server shown in fig. 1 corresponds to a user terminal, and may be configured to store service data of the corresponding user terminal. Each user terminal may be integrally installed with a target application, and when the target application runs in each user terminal, the background server corresponding to each user terminal may store service data in the application and perform data interaction with the service server 1000 shown in fig. 1. The target application may include an application having a function of displaying data information such as text, images, audio, and video. For example, the application may be an entertainment application, and may be used for a user to upload a picture or a video and obtain a special effect picture or a special effect video with a special effect (e.g., a funny effect, a quadratic effect, an amplification effect, etc.); the application can also be a picture beautifying application, and can be used for uploading pictures or videos by users and acquiring beautified pictures or beautified videos with beautifying effects (such as eye magnification, skin color whitening and the like). The service server 1000 in the present application may collect service data from the backstage (such as the above background server cluster) of these applications, for example, the service data may be pictures uploaded by users, and the service server 1000 may identify the region to which the target object belongs from these service data, and perform segmentation to obtain the target region; subsequently, the business server 1000 may perform subsequent processing on the target object in the target area (e.g., performing special effect processing, beautification processing, etc. on the target object).
In the embodiment of the present application, one user terminal may be selected from a plurality of user terminals as a target user terminal. The user terminal may include, but is not limited to, smart terminals carrying an image recognition function (e.g., recognizing a portrait area in an image), such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart television, a smart speaker and a smart watch. For example, in the embodiment of the present application, the user terminal corresponding to the background server 100a shown in fig. 1 may be used as the target user terminal, and the target application may be integrated in the target user terminal; at this time, the background server 100a corresponding to the target user terminal may perform data interaction with the service server 1000.
For example, when a user uses a target application (e.g., an entertainment application) in a user terminal, the service server 1000 may detect and collect, through the background server corresponding to the user terminal, a target image containing a portrait uploaded by the user. The service server 1000 may identify the area to which the portrait belongs in the target image and extract that area from the target image, obtaining an image that contains only the target object and no background. The service server 1000 may then perform special effect processing on the image containing only the target object (e.g., replace the target object with a favorite expression) to obtain a target object with special effects, and place the target object with special effects back into the area to which the portrait belongs in the target image, thereby obtaining a special effect image corresponding to the target image. Subsequently, the service server 1000 returns the special effect image to the background server, so that the user can view the special effect image (i.e., the target object with special effects) on the display page of the user terminal corresponding to the background server.
The specific method for the business server 1000 to identify the region to which the portrait belongs in the target image may be determined according to an image identification model. In order to improve the accuracy of the region of the target object (e.g., the portrait) in the target image recognized by the image recognition model, the image recognition model may be trained and adjusted, so that the target image recognition model obtained after training and adjustment is optimal, and based on the target image recognition model, image recognition processing may be performed. For a specific process of training and adjusting the image recognition model to obtain the target image recognition model, reference may be made to the following description of steps S101 to S106 in the embodiment corresponding to fig. 3.
Optionally, it may be understood that the background server may detect and collect a picture or a video uploaded by the user, and the background server may identify a region to which a target object (e.g., a portrait) in the target image belongs, and extract the region from the target image to obtain an image that only contains the target object and does not contain a background. Subsequently, the background server may perform subsequent processing (e.g., special effect processing, beautification processing, etc.) on the image only containing the target object or on the background image not containing the target object, so that the target object or the background image with special effects may be obtained.
It is understood that the method provided by the embodiment of the present application can be executed by a computer device, including but not limited to a user terminal or a service server. The service server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, big data and an artificial intelligence platform.
The user terminal and the service server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
For easy understanding, please refer to fig. 2, and fig. 2 is a schematic view of a scenario provided by an embodiment of the present application. The service server shown in fig. 2 may be the service server 1000, and the user terminal shown in fig. 2 may be any one user terminal selected from the user terminal cluster in the embodiment corresponding to fig. 1, for example, the user terminal may be the user terminal 100 b.
As shown in fig. 2, a user a may be a target user, and the user a uploads an image 20a through a user terminal a, where the image 20a includes a target object B, the image 20a may be a target image, and a service server may receive the target image 20a through a background server of the user terminal a. Subsequently, the service server may input the target image 20a into a target image recognition model, and the target image recognition model may recognize the area of the target object B in the target image 20 a. As shown in fig. 2, the target image recognition model recognizes that the region of the target object B in the target image 20a is a region P (i.e., a region included in the boundary of the target object B), the target image recognition model may extract the region P including the target object B, and then, the service server may not consider the other regions except the region P in the target image 20a, and only perform special effect processing on the target object B in the region P.
As shown in fig. 2, the service server adds a "cat effect" to the target object B in the region P, and further, the service server can replace the target object B with the "cat effect" in the region P in the target image 20a, thereby obtaining the target image 20a with the "cat effect". The target image with the "cat effect" 20a is as shown in fig. 2, and then the service server may return the target image with the "cat effect" 20a to the user terminal a, and the user a may view the target image with the "cat effect" 20a on the display page of the user terminal a.
Further, for convenience of understanding, please refer to fig. 3, and fig. 3 is a schematic flowchart of an image processing method according to an embodiment of the present application. The method may be performed by a user terminal (e.g., the user terminal shown in fig. 1 or fig. 2) or may be performed by both the user terminal and a service server (e.g., the service server 1000 in the embodiment corresponding to fig. 1 or fig. 2). For ease of understanding, the present embodiment is described as an example in which the method is executed by the user terminal described above. Wherein, the image processing method at least comprises the following steps S101-S106:
step S101, obtaining a sample image, inputting the sample image into an image recognition model, and outputting a prediction region to which a sample object in the sample image belongs through the image recognition model.
In the application, the sample image can be used for training the image recognition model, and the image recognition model can be optimized through training of the sample image, so that the prediction result output by the image recognition model can be more and more accurate. Wherein, the sample image may include a sample object (e.g., a portrait, an animal, etc.), the prediction region may refer to a region where the sample object is located in the sample image, which is identified by the image identification model, and the prediction region may be a region surrounded by a boundary of the sample object. For example, as shown in the embodiment corresponding to fig. 2, the region P identified by the image recognition model may be a predicted region, and the region P refers to a region of the target object B identified by the image recognition model in the target image 20a, and it can be seen that the region P is determined by a boundary of the target object B.
The image recognition model may be a semantic segmentation model, which may include an encoder and a decoder. The encoder can be composed of modules which adopt depth separable convolution as a basic structure, the decoder can adopt a deconvolution structure, and the decoder with the deconvolution structure can up-sample the characteristics output by the encoder layer by layer.
For easy understanding, please refer to fig. 4a together, and fig. 4a is a schematic structural diagram of an image recognition model provided in an embodiment of the present application. As shown in fig. 4a, the image recognition model may include an encoder and a decoder, the encoder may include a plurality of convolutional layers (e.g., convolutional layer a1, convolutional layer b1, convolutional layer c1, and convolutional layer d1 as shown in fig. 4 a), and the decoder may include a plurality of deconvolution layers (e.g., deconvolution layer d2, deconvolution layer c2, deconvolution layer b2, and deconvolution layer a2 as shown in fig. 4 a). The encoder and the decoder can perform feature transfer of the shallow features and the deep features through skip connection, so that the image recognition model can fuse features of different stages (for example, semantic information in the shallow features and structural information in the deep features are fused), and thus, a fused feature finally fused with the shallow features and the deep features can be obtained, and the output of a final image segmentation result can be obtained according to the fused feature.
As shown in fig. 4a, the skip connection between the encoder and the decoder can be understood as that convolutional layer a1 in the encoder is connected to deconvolution layer a2 in the decoder, convolutional layer b1 in the encoder is connected to deconvolution layer b2 in the decoder, convolutional layer c1 in the encoder is connected to deconvolution layer c2 in the decoder, and convolutional layer d1 in the encoder is connected to deconvolution layer d2 in the decoder. It can be seen that the output characteristics of convolutional layer a1 can be used as the input characteristics of convolutional layer b1, the output characteristics of convolutional layer b1 can be used as the input characteristics of convolutional layer c1, the output characteristics of convolutional layer c1 can be used as the input characteristics of convolutional layer d1, and the output characteristics of convolutional layer d1 can be used as the input characteristics of deconvolution layer d2, that is, deconvolution layer d2 can deconvolute the output characteristics of convolutional layer d1, and the output characteristics obtained after deconvolution are input to deconvolution layer c 2. Since the deconvolution layer c2 is connected to the convolution layer c1, the output characteristic of the convolution layer c1 also serves as the input characteristic of the deconvolution layer c2, the deconvolution layer c2 receives 2 characteristics in total of the output characteristic of the convolution layer c1 and the output characteristic of the deconvolution layer d2, and the deconvolution layer c2 fuses the two characteristics, i.e., the output characteristic of the deconvolution layer d2 and the output characteristic of the convolution layer c1, to obtain the first fused characteristic. It is understood that the process of feature fusion performed in the deconvolution layer c2 is a stage of feature fusion process in the image recognition model.
Similarly, the fused feature obtained by the deconvolution layer c2 can be used as an input feature of the deconvolution layer b2, and the deconvolution layer b2 can deconvolve the fused feature from the deconvolution layer c2. Since the deconvolution layer b2 is connected to the convolution layer b1, the output feature of the convolution layer b1 is also used as an input feature of the deconvolution layer b2, so the deconvolution layer b2 receives two features in total: the output feature of the convolution layer b1 and the output feature of the deconvolution layer c2 (i.e., the first fusion feature output by the deconvolution layer c2). In the deconvolution layer b2, these two features are fused to obtain the second fusion feature. It is understood that the feature fusion performed in the deconvolution layer b2 is another stage of feature fusion in the image recognition model. Similarly, in the deconvolution layer a2, the output feature of the deconvolution layer b2 (i.e., the second fusion feature) is fused with the output feature of the convolution layer a1 to obtain a third fusion feature.
It is understood that the feature fusion performed in the deconvolution layer a2 is a further stage of feature fusion in the image recognition model. That is, the third fusion feature is the final feature obtained after feature transfer and feature fusion through the convolution layers and deconvolution layers, and the final image segmentation result can be output according to this final feature. For example, as shown in fig. 4a, when a target image is input to the image recognition model, the features of the target image are obtained in the above manner (feature transfer and feature fusion between the convolution layers and the deconvolution layers), and an image recognition result (e.g., a portrait segmentation result) for the target image is obtained based on these features.
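The following is a minimal sketch of such an encoder-decoder with skip connections, assuming PyTorch; the layer widths, the use of strided 3x3 convolutions, and fusion by channel concatenation are illustrative assumptions rather than the exact structure of fig. 4a.

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Toy encoder-decoder where each decoder stage fuses the matching encoder feature."""
    def __init__(self, channels=(16, 32, 64, 128)):
        super().__init__()
        c = channels
        self.enc = nn.ModuleList([
            nn.Conv2d(3, c[0], 3, stride=2, padding=1),     # convolutional layer a1
            nn.Conv2d(c[0], c[1], 3, stride=2, padding=1),  # convolutional layer b1
            nn.Conv2d(c[1], c[2], 3, stride=2, padding=1),  # convolutional layer c1
            nn.Conv2d(c[2], c[3], 3, stride=2, padding=1),  # convolutional layer d1
        ])
        self.dec = nn.ModuleList([
            nn.ConvTranspose2d(c[3], c[2], 2, stride=2),      # deconvolution layer d2
            nn.ConvTranspose2d(c[2] * 2, c[1], 2, stride=2),  # deconvolution layer c2
            nn.ConvTranspose2d(c[1] * 2, c[0], 2, stride=2),  # deconvolution layer b2
            nn.ConvTranspose2d(c[0] * 2, 1, 2, stride=2),     # deconvolution layer a2
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skips = []
        for enc_layer in self.enc:
            x = torch.relu(enc_layer(x))
            skips.append(x)
        # d2 consumes the deepest encoder feature; c2, b2 and a2 each fuse the
        # up-sampled feature with the skip-connected encoder feature.
        x = self.dec[0](skips[3])
        x = self.dec[1](torch.cat([x, skips[2]], dim=1))
        x = self.dec[2](torch.cat([x, skips[1]], dim=1))
        x = self.dec[3](torch.cat([x, skips[0]], dim=1))
        return torch.sigmoid(x)  # per-pixel probability of the target region
```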
For further description of the model structure of the image recognition model based on the depth separable convolution structure, please refer to fig. 4b, which is a schematic diagram of a depth separable convolution module according to an embodiment of the present application. As shown in fig. 4b, the depth separable convolution structure may include an input module, which may be used to receive an image (e.g., a sample image), and a feature extraction module, which may be used to extract the image features of the input image. The structure may further include a plurality of depth-wise convolution kernels, denoted "split_3x3_1", "split_3x3_2", ..., "split_3x3_n", where each depth-wise convolution kernel is a 3x3 convolution kernel that performs a 3x3 convolution for one channel of the image features; that is, the image features include a plurality of channel features, each channel feature corresponds to one depth-wise convolution kernel, and each depth-wise convolution kernel convolves one channel feature to obtain a convolution channel feature. As shown in fig. 4b, the structure may further include a concatenation module (i.e., a concat module): after each depth-wise convolution kernel produces its convolution channel feature, the convolution channel features are input to the concatenation module, which concatenates the convolution channel features output by the depth-wise convolution kernels to obtain the convolution image feature corresponding to the image features. Here, the plurality of depth-wise convolution kernels ("split_3x3_1", "split_3x3_2", ..., "split_3x3_n") and the concatenation module in the depth separable convolution structure may constitute the depth-wise feature convolution component of the image recognition model.
As shown in fig. 4b, the structure may further include a batch normalization (BN) layer and a feature fusion layer (i.e., a conv_1x1 layer). The normalization layer may be used to normalize the convolution image feature output by the concatenation module to obtain a standard image feature; then, in the feature fusion layer, the standard image feature is convolved with a 1x1 convolution kernel to obtain a fusion image feature. It can be understood that the concatenation module concatenates the plurality of convolution channel features into a convolution image feature containing all of them, and in the feature fusion layer a fusion feature is generated by convolving and fusing these convolution channel features.
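A minimal sketch of this kind of block in PyTorch follows: per-channel 3x3 depth-wise convolution, batch normalization, and a 1x1 fusion convolution. Implementing the depth-wise step with groups=in_channels is an equivalent shortcut for running the kernels "split_3x3_1" ... "split_3x3_n" separately and concatenating their outputs; the class name and channel arguments are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """3x3 per-channel convolution -> BN -> 1x1 feature fusion convolution."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # One 3x3 kernel per input channel (groups=in_channels keeps channels separate,
        # equivalent to split_3x3_1 ... split_3x3_n followed by concatenation).
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels, bias=False)
        self.bn = nn.BatchNorm2d(in_channels)       # normalization (BN) layer
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)      # conv_1x1 feature fusion layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.depthwise(x)     # per-channel convolution image features
        x = self.bn(x)            # standard image features
        return self.pointwise(x)  # fused image features
```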
It should be understood that, as for a specific method for identifying the prediction region to which the sample object belongs in the sample image by the image recognition model, the sample image may be input to the image recognition model, and the image features of the sample image may be extracted by the image recognition model (for example, extracted by the feature extraction module); wherein the image features comprise at least two image channel features; then, inputting the at least two image channel features to a depth-by-depth feature convolution component (comprising a plurality of depth-by-depth convolution kernels and a splicing module), and performing convolution processing on the at least two image channel features respectively through the at least two depth-by-depth convolution kernels in the depth-by-depth feature convolution component to obtain a convolution image feature; one depth-by-depth convolution kernel corresponds to one image channel feature, that is, each image channel feature has a corresponding depth-by-depth convolution kernel for convolution processing, and the correspondence may be a random correspondence or a correspondence specified manually.
For example, the at least two image channel features include a first image channel feature and a second image channel feature, and the at least two depth-wise convolution kernels include a first depth-wise convolution kernel and a second depth-wise convolution kernel, then the first image channel feature may be input to the first depth-wise convolution kernel (that is, the first image channel feature and the first depth-wise convolution kernel have a corresponding relationship), and the first image channel feature may be subjected to convolution processing by the first depth-wise convolution kernel to obtain a first convolution channel feature; similarly, the second image channel feature may be input to a second depth-wise convolution kernel (that is, the second image channel feature and the second depth-wise convolution kernel have a corresponding relationship), and the second image channel feature may be subjected to convolution processing by the second depth-wise convolution kernel to obtain a second convolution channel feature.
Further, the first convolution channel feature and the second convolution channel feature may be input to a stitching module, and the first convolution channel feature and the second convolution channel feature may be stitched by the stitching module, so that the convolution image feature may be generated. Subsequently, the convolution image features may be input to a feature fusion component, and through a batch normalization layer (i.e., BN layer) in the feature fusion component, normalization (normalization) processing may be performed on the convolution image features to obtain standard image features; subsequently, inputting the standard image feature into a feature fusion layer (i.e., conv _1x1 layer) in the feature fusion component, and performing convolution processing on the standard image feature (i.e., fusing the normalized first convolution channel feature and the normalized second convolution channel feature) to generate a fused image feature; then, according to the fused image feature, a prediction region to which the sample object in the sample image belongs can be determined.
Optionally, it may be understood that the at least two image channel features may include a first image channel feature and a second image channel feature, and the at least two depth-wise convolution kernels include a first depth-wise convolution kernel and a second depth-wise convolution kernel, and then the first image channel feature may be input to the second depth-wise convolution kernel (that is, the first image channel feature and the second depth-wise convolution kernel have a corresponding relationship), and the first image channel feature may be subjected to convolution processing by the second depth-wise convolution kernel to obtain a first convolution channel feature; similarly, the second image channel feature may be input to the first depth-wise convolution kernel (that is, the second image channel feature and the first depth-wise convolution kernel have a corresponding relationship), and the first depth-wise convolution kernel may perform convolution processing on the second image channel feature to obtain the second convolution channel feature.
It is understood that, when feature extraction and feature generation are performed using the image recognition model structure shown in fig. 4b, the amount of calculation can be reduced. For example, taking an input feature size of [N, H, W] and an output feature size of [M, H, W] as an example, the computation of the depth separable structure is on the order of 3 × 3 × N + 1 × 1 × N × M per spatial location, while the computation of the structure stacked by a conventional conv_3x3 convolution layer and a conv_1x1 convolution layer is on the order of N × M × 3 × 3 + 1 × 1 × N × M. It can be seen that 3 × 3 × N + 1 × 1 × N × M is substantially smaller than N × M × 3 × 3 + 1 × 1 × N × M, so the computation is reduced. In addition, the depth separable convolution module uses the 3x3 convolution layers to learn spatial correlation and the 1x1 convolution layer to learn correlation between channels; since each convolution layer is given a different function, the convergence of the image recognition model is faster and its accuracy is higher.
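For readers who wish to experiment with this structure, the following is a minimal, non-authoritative sketch of such a depth separable block in PyTorch. The class name `DepthSeparableBlock`, the parameter names, and the use of one single-channel `nn.Conv2d` per channel are illustrative assumptions and are not taken from the patent text.

```python
import torch
import torch.nn as nn

class DepthSeparableBlock(nn.Module):
    """Sketch only: per-channel 3x3 depth-wise convolutions ("split_3x3_1" ...
    "split_3x3_n"), concatenation, batch normalization, then a 1x1 fusion."""
    def __init__(self, channels: int, out_channels: int):
        super().__init__()
        # One 3x3 kernel per input channel (assumed single-channel convolutions).
        self.depthwise = nn.ModuleList(
            [nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
             for _ in range(channels)]
        )
        self.bn = nn.BatchNorm2d(channels)                    # BN layer
        self.fuse = nn.Conv2d(channels, out_channels, 1)      # conv_1x1 layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Convolve each channel feature with its own depth-wise kernel,
        # then concatenate the results (the concat module).
        per_channel = [conv(x[:, i:i + 1]) for i, conv in enumerate(self.depthwise)]
        convolved = torch.cat(per_channel, dim=1)   # convolution image feature
        standard = self.bn(convolved)               # standard image feature
        return self.fuse(standard)                  # fused image feature

# Example: N = 16 input channels, M = 32 output channels.
block = DepthSeparableBlock(channels=16, out_channels=32)
y = block(torch.randn(1, 16, 64, 64))               # -> [1, 32, 64, 64]
```

Per output position this depth-wise-plus-pointwise arrangement costs roughly 3 × 3 × N + 1 × 1 × N × M multiplications, consistent with the comparison above.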
Step S102, obtaining a label image corresponding to a sample image; the label image includes a region label to which the sample object belongs.
In the present application, the boundary of the area where the sample object is located in the sample image is labeled (for example, by manual stroke annotation), so that the labeled area to which the sample object belongs is obtained, and thus a region annotation image corresponding to the sample image and containing the labeled area can be obtained; then, the region annotation image can be subjected to binarization processing to obtain a binary labeled image; and the labeled area in the binary labeled image can be subjected to filtering and smoothing processing to obtain a region label corresponding to the labeled area. The filtering and smoothing processing here may adopt modes such as median filtering, mean filtering, and the like.
Taking the median filtering smoothing processing mode as an example, median filtering with a fixed-size kernel may be used to process the binary labeled image, and the median filtering may produce a smoothing effect on the edge (e.g., the labeled region) of the region to which the sample object belongs in the binary labeled image. For the effect of the median filtering, please refer to fig. 4c and fig. 4d together; fig. 4c and fig. 4d are comparison diagrams of experimental data provided in an embodiment of the present application. As shown in fig. 4c, fig. 4c is the binary labeled image before the median filtering processing, where the region Q is the labeled region to which the sample object in the binary labeled image belongs; fig. 4d is the binary labeled image after the median filtering processing. Comparing fig. 4c with fig. 4d, it can be seen that the edge of the labeled region Q after the median filtering processing is smoother. It should be understood that, for a binary labeled image with a burred edge, the median filtering processing smooths the inconsistent parts in the edge of the sample object (e.g., a portrait), such as the concave and convex parts of hair and other regions, so that the problem of blurred boundary-region definition caused by inconsistent edge labeling is reduced, which is beneficial to model convergence.
It should be appreciated that the label image may be used to train the image recognition model, and the label image may serve as the standard for training the image recognition model, so that the predicted region output by the image recognition model can become closer to the region label in the label image. When the image recognition model is trained, in order to reduce the problem of nonuniform definition of the edges of the label image, median filtering with a fixed-size kernel can be adopted for training.
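As a hedged illustration of the labeling pipeline described above (binarization followed by fixed-kernel median filtering), the following OpenCV sketch is one possible implementation; the threshold of 127 and the kernel size of 5 are assumed values, not values specified by the patent.

```python
import cv2
import numpy as np

def smooth_region_label(region_annotation_image: np.ndarray, ksize: int = 5) -> np.ndarray:
    """Sketch only: binarize a single-channel region annotation image and smooth
    the labeled region's edge with a fixed-size median filter."""
    _, binary = cv2.threshold(region_annotation_image, 127, 255, cv2.THRESH_BINARY)
    return cv2.medianBlur(binary, ksize)   # smooths burrs on the labeled edge
```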
Step S103, according to the prediction area and the area label, the classification loss value of the sample image is determined.
According to the prediction region and the region label output by the image recognition model, a classification loss value can be determined, and a specific method can be that prediction pixel points can be obtained in the prediction region; in the area label, area marking points can be obtained; then, a classification loss function can be obtained, and a classification loss value of the sample image can be generated according to the prediction pixel point, the region labeling point and the classification loss function.
It is understood that, for a specific implementation of determining the classification loss value of the sample image, the following formula (1) may be used:
wherein L_ce may be used to characterize the classification loss value of the sample image, α_gt,i may be used to characterize a pixel value in the region label (e.g., the i-th pixel value), and α_i may be used to characterize the pixel in the prediction region corresponding to that pixel value in the region label (e.g., the predicted pixel i corresponding to the i-th pixel value in the region label). By formula (1), a classification error between the prediction region and the region label can be obtained.
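Formula (1) itself is not reproduced in this text. Assuming it takes the standard per-pixel cross-entropy form between the predicted region α and the region label α_gt (an assumption, since the original formula is only referenced), a minimal sketch could look as follows.

```python
import torch.nn.functional as F

def classification_loss(alpha, alpha_gt):
    # alpha, alpha_gt: tensors of predicted / labeled pixel values in [0, 1].
    # Per-pixel cross-entropy averaged over all pixels (assumed reduction).
    return F.binary_cross_entropy(alpha, alpha_gt)
```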
And step S104, acquiring a region mask of the region label, and determining a region constraint loss value of the sample image according to the predicted region and the region mask.
In the present application, a region constraint loss value may be determined according to a prediction region and a region label output by an image recognition model, and a specific method may be that a region mask of an edge region (region label) of a sample object is obtained by performing dilation morphological processing on a label image and performing erosion morphological processing on the label image, and the region constraint loss value of the sample image may be obtained according to the edge region mask. The specific method for determining the edge area mask may be to perform expansion morphological processing on the label image to obtain an expanded label image; carrying out corrosion morphological treatment on the label image to obtain a corrosion label image; then, in the expansion label image, an expansion area labeling point in an expansion area label to which the sample object belongs can be obtained; obtaining an erosion area marking point in an erosion area label to which the sample object belongs in the erosion label image; a difference value between the expanded region mark point and the erosion region mark point is determined, and the difference value can be used as a region mask (edge region mask) of the region label.
It will be appreciated that a specific implementation of determining the region mask of the region label may be as shown in formula (2):
wherein RM may be used to characterize the region mask corresponding to the region label, and α_gt may be used to characterize the region label, one pixel value of which (e.g., the i-th pixel value α_gt,i) is a region labeling point; formula (2) applies the morphological dilation operation (expansion morphological processing) and the morphological erosion operation (corrosion morphological processing) to the region label, and subtracts the erosion region label obtained after the erosion morphological processing from the expansion region label obtained after the expansion morphological processing; the resulting difference value, i.e., RM = dilate(α_gt) − erode(α_gt), is the region mask corresponding to the region label. The region mask can better determine the location of the region label (i.e., the area where the sample object is located) in the sample image.
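The dilation-minus-erosion construction of formula (2) can be sketched with OpenCV as below; the 15×15 elliptical structuring element is an assumed choice, since the patent does not fix the kernel used for the morphological processing.

```python
import cv2
import numpy as np

def region_mask(alpha_gt: np.ndarray, ksize: int = 15) -> np.ndarray:
    """Sketch of formula (2): RM = dilate(alpha_gt) - erode(alpha_gt)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (ksize, ksize))
    dilated = cv2.dilate(alpha_gt, kernel)       # expansion label image
    eroded = cv2.erode(alpha_gt, kernel)         # corrosion (erosion) label image
    return cv2.subtract(dilated, eroded)         # edge-region mask RM
```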
Further, after the region mask is obtained, a specific method for determining the region constraint loss value according to the region mask may be that, in the prediction region, a predicted pixel point may be obtained; in the region label, region labeling points can be obtained; then, a region detection operator can be obtained, and according to the region detection operator, the predicted pixel point and the region mask, a first gradient feature corresponding to the prediction region can be determined; similarly, according to the region detection operator, the region labeling point and the region mask, a second gradient feature corresponding to the region label can be determined; subsequently, a region constraint loss function may be obtained, and a region constraint loss value of the sample image may be generated according to the first gradient feature, the second gradient feature, and the region constraint loss function.
It can be understood that, for a specific implementation of determining the region constraint loss value, it can be as shown in formula (3):
wherein L_grad may be used to characterize the region constraint loss value; G(RM*α)_i may be used to characterize the gradient, under the region mask RM, of a pixel value in the prediction region (e.g., the i-th pixel value); and G(RM*α_gt)_i may be used to characterize the gradient, under the region mask RM, of a pixel value in the region label (i.e., a region labeling point, e.g., the i-th pixel value).
For determining G(RM*α)_i, a specific implementation manner may be as shown in formula (4):
G(RM*α)_i = (S*RM*α)_i − (S^T*RM*α)_i    formula (4)
wherein S may be used to characterize the region detection operator in the x-direction (e.g., a Sobel operator), and S^T may be used to characterize the region detection operator in the y-direction (e.g., a Sobel operator). It will be appreciated that by multiplying the region detection operator S in the x-direction with the region mask RM and the pixel value α_i on the prediction region, multiplying the region detection operator S^T in the y-direction with the region mask RM and the pixel value α_i on the prediction region, and then subtracting the two products, the gradient feature corresponding to the pixel value α_i can be obtained.
For determining G(RM*α_gt)_i, a specific implementation manner may be as shown in formula (5):
G(RM*α_gt)_i = (S*RM*α_gt)_i − (S^T*RM*α_gt)_i    formula (5)
wherein S may be used to characterize the region detection operator in the x-direction (e.g., a Sobel operator), and S^T may be used to characterize the region detection operator in the y-direction (e.g., a Sobel operator). It will be appreciated that by multiplying the region detection operator S in the x-direction with the region mask RM and the pixel value α_gt,i on the region label, multiplying the region detection operator S^T in the y-direction with the region mask RM and the pixel value α_gt,i on the region label, and then subtracting the two products, the gradient feature corresponding to the pixel value α_gt,i can be obtained.
It is understood that, according to formula (3), formula (4) and formula (5), formula (6) can be obtained:
wherein formula (6) may be used to characterize that the gradient feature (S*RM*α)_i − (S^T*RM*α)_i corresponding to the pixel value α_i and the gradient feature (S*RM*α_gt)_i − (S^T*RM*α_gt)_i corresponding to the pixel value α_gt,i are subtracted from each other. Formula (6) can thus be used to characterize the specific implementation of determining the region constraint loss value L_grad from the region detection operator, the region mask RM, the prediction region, and the region label.
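Combining formulas (3) to (6), one possible PyTorch sketch of the region constraint loss is given below. The Sobel kernel values and the L1 mean reduction are assumptions for illustration; the patent only states that the two gradient features are subtracted.

```python
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)   # region detection operator S

def gradient_feature(masked: torch.Tensor) -> torch.Tensor:
    # masked: [B, 1, H, W] tensor, already multiplied by the region mask RM.
    gx = F.conv2d(masked, SOBEL_X, padding=1)                    # (S * RM * a)_i
    gy = F.conv2d(masked, SOBEL_X.transpose(2, 3), padding=1)    # (S^T * RM * a)_i
    return gx - gy                                               # formula (4) / (5)

def region_constraint_loss(alpha, alpha_gt, rm):
    g_pred = gradient_feature(rm * alpha)      # first gradient feature
    g_label = gradient_feature(rm * alpha_gt)  # second gradient feature
    return (g_pred - g_label).abs().mean()     # assumed L1 reduction for formula (6)
```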
Step S105, determining a pixel constraint loss value of the sample image according to the prediction region and the region label.
According to the prediction region output by the image recognition model and the region label, a pixel constraint loss value can be determined. The specific method may be that, in the prediction region, a prediction pixel point can be obtained, and a first color pixel value corresponding to the prediction pixel point can be generated according to the color channel pixel value; in the region label, a region labeling point (a pixel value) can be obtained, and a second color pixel value corresponding to the region labeling point can be generated according to the color channel pixel value; subsequently, a pixel constraint loss function may be obtained, and the pixel constraint loss value of the sample image may be generated according to the first color pixel value, the second color pixel value, and the pixel constraint loss function.
It will be appreciated that, for a specific implementation of determining the pixel constraint loss value, it can be as shown in formula (7):
wherein L_comp may be used to characterize the pixel constraint loss value; I_ij may be used to characterize the color channel pixel value of the sample image at the (i, j)-th position; α_ij may be used to characterize the pixel point of the prediction region at the (i, j)-th position; and α_gt,ij may be used to characterize the pixel point of the region label at the (i, j)-th position. It is understood that the color channel pixel value may be the standard color channel (RGB) information of the input sample image, and I_ij may be understood as the pixel value of the input RGB original sample image at the (i, j) position.
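Since formula (7) is likewise only referenced here, the sketch below assumes the pixel constraint loss compares the RGB image composited with the predicted region against the same image composited with the region label, using an L1 mean; the exact form and reduction are assumptions.

```python
def pixel_constraint_loss(image_rgb, alpha, alpha_gt):
    # image_rgb: [B, 3, H, W]; alpha, alpha_gt: [B, 1, H, W] (broadcast over RGB).
    return (image_rgb * alpha - image_rgb * alpha_gt).abs().mean()
```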
And S106, determining a target loss value of the sample image according to the classification loss value, the pixel constraint loss value and the region constraint loss value, adjusting the image recognition model according to the target loss value to obtain a target image recognition model, and performing image recognition processing based on the target image recognition model.
In the present application, the classification loss value, the area constraint loss value, and the pixel constraint loss value may be fused to obtain a total target loss value. The specific mode can be that a first model balance parameter and a second model balance parameter are obtained; multiplying the first model balance parameter by the region constraint loss value to obtain a first balance loss value; multiplying the second model balance parameter by the pixel constraint loss value to obtain a second balance loss value; subsequently, the classification loss value, the first balance loss value and the second balance loss value may be added to obtain a target loss value corresponding to the sample image.
It is understood that, for a specific implementation of determining the target loss value of the sample image, it can be shown as formula (8):
L = L_ce + α·L_grad + β·L_comp    formula (8)
wherein L may be used to characterize the target loss value, α may be used to characterize a balance coefficient (i.e., the first model balance parameter), and β may be used to characterize another balance coefficient (i.e., the second model balance parameter); the values of α and β may be specified values, for example, α = 0.5 and β = 1, or other values, which is not limited in this application. The larger the value of α, the stronger the constraint imposed by the image recognition model on the edge of the region label.
It should be understood that the target loss value in this application combines the classification loss value L_ce, the region constraint loss value L_grad, and the pixel constraint loss value L_comp. The classification loss value L_ce can be understood as converting the recognition problem of the image (e.g., recognition of the prediction region) into a classification problem, so that the classification error between the prediction region and the region label can be obtained; the region constraint loss value L_grad introduces an edge constraint on the image segmentation result of the image recognition model (the prediction region to which the sample object belongs); the pixel constraint loss value L_comp introduces the RGB information (color channel pixel values) of the RGB original sample image, enhancing the pixel constraint between the prediction region output by the image recognition model and the sample object. By combining the classification loss value L_ce, the region constraint loss value L_grad, and the pixel constraint loss value L_comp into a target loss value, the target loss value not only includes the classification error between the prediction region and the region label, but also, through the introduced edge constraint and pixel constraint, strengthens the constraints between the prediction region and the sample object; training the image recognition model with a target loss value composed of these three loss values therefore makes the recognition accuracy of the trained image recognition model higher.
The specific method for training the image recognition model according to the target loss value may be to determine whether the target loss value satisfies a model convergence condition, adjust the image recognition model according to the target loss value if the target loss value does not satisfy the model convergence condition (for example, adjust model parameters in the image recognition model), perform a new round of training after the adjustment to obtain a new target loss value, and if the new target loss value satisfies the model convergence condition, consider that a prediction result of the image recognition model at this time is accurate enough, determine the image recognition model after the adjustment as the target image recognition model without adjusting the image recognition model according to the new target loss value. It will be appreciated that based on the target image recognition model, an image recognition process may be performed.
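Putting the pieces together, a minimal training-step sketch (reusing the loss sketches above) could look as follows. The `model` and `optimizer` objects are assumed to be supplied by the caller; α = 0.5 and β = 1 follow the example balance coefficients mentioned above, and the convergence check is left to the surrounding training loop.

```python
def train_step(model, optimizer, image_rgb, alpha_gt, rm,
               alpha_coef: float = 0.5, beta_coef: float = 1.0) -> float:
    alpha = model(image_rgb)                                    # predicted region
    l_ce = classification_loss(alpha, alpha_gt)                 # classification loss value
    l_grad = region_constraint_loss(alpha, alpha_gt, rm)        # region constraint loss value
    l_comp = pixel_constraint_loss(image_rgb, alpha, alpha_gt)  # pixel constraint loss value
    loss = l_ce + alpha_coef * l_grad + beta_coef * l_comp      # formula (8)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()  # compared against the model convergence condition by the caller
```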
In the embodiment of the application, a sample image is input into an image recognition model, a prediction region of a sample object in the sample image can be output through the image recognition model, and a classification loss value, a region constraint loss value and a pixel constraint loss value can be determined through the prediction region and a region label of the sample image. Wherein the classification loss value can be used to characterize a classification error between the prediction region and the region label; the region constraint loss value is determined by a region mask of the region label, the prediction region and the region label, wherein the region mask is a coverage region corresponding to the region label in the sample image and can effectively position the position of the target object, that is, the region constraint loss value introduces the actual position information of the target object in the sample image, can enhance the edge constraint between the prediction region and the sample object, and further can enhance the discrimination between the prediction region and the region label; the pixel constraint loss value introduces a color channel pixel value (RGB information) of the sample image, so that the pixel constraint between the prediction region and the sample object can be enhanced, and the discrimination between the prediction region and the region label can be further enhanced. In summary, the classification loss value, the region constraint loss value and the pixel constraint loss value all enhance the discrimination between the prediction region and the region label from different dimensions, so that the target loss value obtained through the three loss values can accurately express the difference between the prediction region and the region label, and the image recognition model is trained through the target loss value, so that the prediction region output by the image recognition model is closer to the region label more and more, that is, the region to which the target object belongs in the input image can be accurately recognized through the target image recognition model obtained through the training of the target loss value.
For ease of understanding, please refer to fig. 5, and fig. 5 is a schematic flowchart of a model application provided in an embodiment of the present application. As shown in fig. 5, the process may include:
step S201, acquiring a target image, and inputting the target image into a target image recognition model.
In the application, the target image recognition model may be a model obtained by training an image recognition model, and the target image recognition model may be applied to an image recognition scene.
Step S202, in the target image recognition model, a target area to which a target object in the target image belongs is recognized.
In this application, the target image may include a target object (e.g., a portrait, an animal, etc.), and the target area may refer to an area in the target image where the target object is identified by the target image identification model, and the target area may be an area surrounded by a boundary of the target object. For example, as described above in the embodiment corresponding to fig. 2, the region P may be a target region, and the region P refers to a region of the target object B in the target image 20a, and it can be seen that the region P is determined by a boundary (edge) of the target object B.
For a specific implementation manner of step S201 to step S202, refer to step S101 in the embodiment corresponding to fig. 3 and the description of the image recognition model recognizing the prediction region to which the sample object belongs; details will not be described here again.
Step S203, mark the boundary of the target area to obtain a marked boundary.
In the present application, the marking method may be a method of displaying the boundary (edge) of the target area in a bold manner, a method of displaying the boundary in an added highlight color, or the like; the present application does not limit the method of marking the boundary. The significance of marking the boundary is to highlight the target area where the target object is located.
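As one non-limiting way to mark the boundary, the OpenCV sketch below draws the contour of the recognized target area onto the target image; the green color and 3-pixel thickness are assumed choices.

```python
import cv2
import numpy as np

def mark_boundary(target_image: np.ndarray, target_area_mask: np.ndarray) -> np.ndarray:
    """Sketch only: target_area_mask is a uint8 binary mask of the target area."""
    contours, _ = cv2.findContours(target_area_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    marked = target_image.copy()
    cv2.drawContours(marked, contours, -1, (0, 255, 0), thickness=3)  # mark boundary
    return marked
```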
And step S204, outputting the target image carrying the mark boundary.
In the present application, after the target image recognition model marks the boundary, the target image carrying the marked boundary may be output; subsequent processing may then be performed based on the target image carrying the marked boundary. For example, a region other than the marked boundary may be obtained in the target image carrying the marked boundary as a background region; subsequently, a material list can be displayed in response to a material adding operation for the background region; further, the background region may be updated to a target background region having target material in response to a material selection operation for the material list, wherein the target material is the material selected by the material selection operation; further, a target image including the target area and the target background area may be output. The material may refer to special-effect material (e.g., a solid-color background material, a quadratic-element effect material, etc.); that is, after the target image carrying the marked boundary is obtained, the background region outside the marked boundary may be extracted, and only the background region is given the special-effect processing, while the target object inside the marked boundary is kept unchanged.
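A background replacement of the kind described above can be sketched as a simple mask composite; `material` is assumed to be a background image already resized to the target image's dimensions.

```python
import cv2
import numpy as np

def replace_background(target_image: np.ndarray, target_area_mask: np.ndarray,
                       material: np.ndarray) -> np.ndarray:
    """Sketch only: keep the target object, swap the background for the material."""
    mask3 = cv2.merge([target_area_mask] * 3).astype(np.float32) / 255.0
    composited = target_image.astype(np.float32) * mask3 + \
                 material.astype(np.float32) * (1.0 - mask3)
    return composited.astype(np.uint8)
```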
For easy understanding, please refer to fig. 6a, and fig. 6a is a schematic view of a scene in which a target image recognition model is applied according to an embodiment of the present application. The service server shown in fig. 6a may be the service server 1000, and the user terminal E shown in fig. 6a may be any one user terminal selected from the user terminal cluster in the embodiment corresponding to fig. 1, for example, the user terminal may be the user terminal 100 a.
As shown in fig. 6a, the user E clicks the image upload button in the game home page interface of the user terminal E, and the user terminal E may respond to the click operation of the user E by creating an image selection interface for the user E; in the image selection interface, the user E may view all images on his or her own device and may select an image. As shown in fig. 6a, if the user E selects the image 50d in the image selection interface, the image 50d may be taken as the target image. As shown in fig. 6a, the user terminal E may create a material selection interface, in which the user E may select a material for updating the background area in the target image 50d to obtain a target background area with the material. As shown in fig. 6a, if the material selected by the user E is material C (zoo background material), material C can be used as the target material. Subsequently, the user terminal E may send the target image 50d to a service server, and the service server may input the target image into a target image recognition model; through the target image recognition model, a target area (for example, the area P shown in fig. 6a) where the target object M is located in the target image may be recognized, and the boundary of the area P may be marked, so as to obtain an area P with a marked boundary. Further, the service server may return the target image 50d with the marked boundary to the user terminal E, and the user terminal E may determine the background region (the region other than the area P) of the target image 50d based on the area P with the marked boundary; the user terminal E may then switch the background region. As shown in fig. 6a, the user terminal E switches the background region of the target image 50d to a target background region with the target material C (zoo background material), so that the target image 50d including the target background region and the target object M may be obtained. The user E can view the target image 50d including the target background area and the target object M on the display page of the user terminal E.
Optionally, it may be understood that, based on the target image carrying the mark boundary, there may be other applications, for example, according to the mark boundary, a region image including a target object may be extracted from the target image, and in the target image recognition model, target part category information of the target object in the region image may be recognized; the target portion type information is a type to which a target portion in the target object belongs, for example, the target object is a portrait, and the target portion may refer to a type (for example, eyes, nose, lips, eyebrows, hands, neck, etc.) to which a portion in the portrait belongs. Then, a material information base can be obtained, wherein the material information base comprises at least two virtual material data, and one virtual material data corresponds to one part type information; in the material information base, virtual material data matched with the target part category information can be acquired as target virtual material data; subsequently, the target portion in the target object may be switched to the target virtual material data to obtain virtual portion data, so that the target object including the virtual portion data may be obtained.
For easy understanding, please refer to fig. 6b, and fig. 6b is a schematic view of a scene in which a target image recognition model is applied according to an embodiment of the present application. The service server shown in fig. 6b may be the service server 1000, and the user terminal V shown in fig. 6b may be any one user terminal selected from the user terminal cluster in the embodiment corresponding to fig. 1, for example, the user terminal may be the user terminal 100 b.
As shown in fig. 6b, the user V uploads a target image 60a through the user terminal V, wherein the target image includes the target object T. The user terminal V may send the target image 60a to a service server, which may input the target image 60a into a target image recognition model. A target area (such as the area R shown in fig. 6b) where the target object T is located can be identified through the target image recognition model, and the boundary of the area R is marked to obtain an area R with a marked boundary; subsequently, a region image including the target object T may be extracted from the target image 60a according to the area R with the marked boundary; subsequently, in the target image recognition model, target part type information in the target object T can be recognized, which, as shown in fig. 6b, is "eyebrow", "eye", "nose", "mouth", and "ear", respectively; the service server may then return the recognition result to the user terminal V.
Further, the user terminal V may acquire a material information library in which target virtual materials corresponding to the target portion category information are acquired, wherein the target virtual materials corresponding to the "eyebrow", "eye", "nose", "mouth", and "ear" may be shown in fig. 6b, and then, the target portions (including the "eyebrow", "eye", "nose", "mouth", and "ear") in the target object T in the region image may be replaced with the corresponding target virtual materials, so that a region image including the target virtual materials may be obtained as shown in fig. 6b, and a target image 60a including the target virtual materials may be obtained. The user terminal V can display the target image 60a containing the target virtual material in an image display interface, and the user V can view the target image 60a containing the target virtual material in the image display interface.
In the embodiment of the application, a sample image is input into an image recognition model, a prediction region of a sample object in the sample image can be output through the image recognition model, and a classification loss value, a region constraint loss value and a pixel constraint loss value can be determined through the prediction region and a region label of the sample image. Wherein the classification loss value can be used to characterize a classification error between the prediction region and the region label; the region constraint loss value is determined by a region mask of the region label, the prediction region and the region label, wherein the region mask is a coverage region corresponding to the region label in the sample image and can effectively position the position of the target object, that is, the region constraint loss value introduces the actual position information of the target object in the sample image, can enhance the edge constraint between the prediction region and the sample object, and further can enhance the discrimination between the prediction region and the region label; the pixel constraint loss value introduces a color channel pixel value (RGB information) of the sample image, so that the pixel constraint between the prediction region and the sample object can be enhanced, and the discrimination between the prediction region and the region label can be further enhanced. In summary, the classification loss value, the region constraint loss value, and the pixel constraint loss value all enhance the discrimination between the prediction region and the region label from different dimensions, so that the target loss value obtained through the three loss values can accurately express the difference between the prediction region and the region label, and the image recognition model is trained through the target loss value, so that the prediction region output by the image recognition model is closer to the region label more and more, that is, the region to which the target object belongs in the input image (e.g., the target image) can be accurately recognized through the target image recognition model obtained through the training of the target loss value.
Referring to fig. 7, fig. 7 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure. As shown in fig. 7, the method may be performed by a user terminal (e.g., the user terminal shown in fig. 1) or a service server (e.g., the service server 1000 in the embodiment corresponding to fig. 1), or may be performed by both the user terminal and the service server. For ease of understanding, the present embodiment is described as an example in which the method is executed by the user terminal described above. The image processing method at least comprises the following steps S301 - S304:
step S301, a sample image is obtained, the sample image is input into an image recognition model, and a prediction region to which a sample object in the sample image belongs is output through the image recognition model.
Step S302, obtaining a label image corresponding to the sample image; the label image includes a region label to which the sample object belongs.
In this application, for a specific implementation manner of steps S301 to S302, refer to the description of steps S101 to S102 in the embodiment corresponding to fig. 3, which will not be described herein again.
Step S303, obtaining a region mask corresponding to the region label, and generating a target loss value of the sample image according to the region mask, the prediction region, and the region label.
In this application, according to the prediction region and the region label, a classification loss value of the sample image may be generated, where the classification loss value may represent a classification error between the prediction region and the region label, and for a specific implementation manner of generating the classification loss value of the sample image, reference may be made to the description of determining the classification loss value in step S103 in the embodiment corresponding to fig. 3, which will not be described again here.
In this application, according to the area mask, the prediction area, and the area label, the area constraint loss value of the sample image may be generated, and for a specific implementation manner of generating the area constraint loss value, reference may be made to the description of determining the area constraint loss value in step S104 in the embodiment corresponding to fig. 3, which will not be described again here.
In this application, the pixel constraint loss value of the sample image may be determined according to the color channel pixel value corresponding to the sample image, that is, the color channel pixel value corresponding to the sample image is obtained, and according to the color channel pixel value, the prediction region, and the region label, the pixel constraint loss value of the sample image may be generated, and for a specific implementation manner of generating the pixel constraint loss value according to the color channel pixel value, the prediction region, and the region label, reference may be made to the description of step S105 in the embodiment corresponding to fig. 3, which will not be described herein again.
Further, the classification loss value, the area constraint loss value and the pixel constraint loss value of the sample image may be fused to obtain a target loss value of the sample image. For a specific implementation manner of generating the target loss value according to the classification loss value, the area constraint loss value, and the pixel constraint loss value of the sample image, reference may be made to the description of determining the target loss value in step S106 in the embodiment corresponding to fig. 3, which will not be described herein again.
And step S304, adjusting the image recognition model according to the target loss value to obtain a target image recognition model, and performing image recognition processing based on the target image recognition model.
In this application, for a specific implementation manner of step S304, refer to the description of the model adjustment in step S106 in the embodiment corresponding to fig. 3, which will not be described herein again.
Further, please refer to fig. 8, where fig. 8 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The image processing apparatus may be a computer program (including program code) running on a computer device, for example, the image processing apparatus is an application software; the apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. As shown in fig. 8, the image processing apparatus 1 may include: a sample image acquisition module 11, a prediction result output module 12, a label image acquisition module 13, a first loss value determination module 14, a mask acquisition module 15, a second loss value determination module 16, a third loss value determination module 17, a target loss value determination module 18, and a model adjustment module 19.
A sample image obtaining module 11, configured to obtain a sample image;
a prediction result output module 12, configured to input the sample image into an image recognition model, and output a prediction region to which the sample object in the sample image belongs through the image recognition model;
a label image obtaining module 13, configured to obtain a label image corresponding to the sample image; the label image comprises a region label to which the sample object belongs;
a first loss value determining module 14, configured to determine a classification loss value of the sample image according to the prediction region and the region label;
a mask acquiring module 15, configured to acquire an area mask of the area label;
a second loss value determining module 16, configured to determine a region constraint loss value of the sample image according to the prediction region and the region mask;
a third loss value determining module 17, configured to determine a pixel constraint loss value of the sample image according to the prediction region and the region label;
a target loss value determining module 18, configured to determine a target loss value of the sample image according to the classification loss value, the pixel constraint loss value, and the region constraint loss value;
and the model adjusting module 19 is configured to adjust the image recognition model according to the target loss value to obtain a target image recognition model, and perform image recognition processing based on the target image recognition model.
For specific implementation manners of the sample image obtaining module 11, the prediction result output module 12, the label image obtaining module 13, the first loss value determining module 14, the mask obtaining module 15, the second loss value determining module 16, the third loss value determining module 17, the target loss value determining module 18, and the model adjusting module 19, reference may be made to the descriptions of step S101 to step S106 in the embodiment corresponding to fig. 3, and details will not be described here.
Wherein the image recognition model comprises a depth-by-depth feature convolution component;
referring to fig. 8, the prediction result output module 12 may include: an image feature extraction unit 121, a feature convolution unit 122, and a prediction result determination unit 123.
An image feature extraction unit 121, configured to input the sample image into an image recognition model, and extract an image feature of the sample image through the image recognition model; the image features comprise at least two image channel features;
the feature convolution unit 122 is configured to input the at least two image channel features into the depth-by-depth feature convolution component, and perform convolution processing on the at least two image channel features through at least two depth-by-depth convolution kernels in the depth-by-depth feature convolution component, respectively, to obtain a convolution image feature; one depth-wise convolution kernel corresponds to one image channel feature;
and a prediction result determining unit 123, configured to determine, according to the feature of the convolved image, a prediction region to which the sample object in the sample image belongs.
For specific implementation manners of the image feature extraction unit 121, the feature convolution unit 122, and the prediction result determination unit 123, reference may be made to the description in step S101 in the embodiment corresponding to fig. 3, and details will not be repeated here.
Wherein the at least two image channel features comprise a first image channel feature and a second image channel feature, and the at least two depth-wise convolution kernels comprise a first depth-wise convolution kernel and a second depth-wise convolution kernel;
referring to fig. 8, the feature convolution unit 122 may include: a feature input subunit 1221, a channel feature convolution subunit 1222, a channel feature convolution subunit 1223, and a feature stitching subunit 1224.
A feature input sub-unit 1221, configured to input features of at least two image channels into the depth-wise feature convolution component;
a channel feature convolution subunit 1222, configured to perform convolution processing on the first image channel feature through a first depth-by-depth convolution kernel in the depth-by-depth feature convolution component, so as to obtain a first convolution channel feature;
the channel feature convolution subunit 1223 is further configured to perform convolution processing on the second image channel feature through a second depth-by-depth convolution kernel in the depth-by-depth feature convolution component to obtain a second convolution channel feature;
and a feature splicing subunit 1224, configured to splice the first convolution channel feature and the second convolution channel feature, so as to generate a convolution image feature.
For a specific implementation manner of the feature input subunit 1221, the channel feature convolution subunit 1222, the channel feature convolution subunit 1223, and the feature splicing subunit 1224, reference may be made to the description in step S101 in the embodiment corresponding to fig. 3, and details will not be described here.
The image recognition model further comprises a feature fusion component;
the prediction result determination unit 123 may include: a normalization subunit 1231, a feature fusion subunit 1232, and a prediction result determination subunit 1233.
A normalization subunit 1231, configured to input the feature of the convolution image into the feature fusion component, and perform normalization processing on the feature of the convolution image through a normalization layer in the feature fusion component to obtain a standard image feature;
a feature fusion subunit 1232, configured to input the standard image features into a feature fusion layer in the feature fusion component, and perform convolution processing on the standard image features in the feature fusion layer to generate fusion image features;
and the prediction result determining subunit 1233 is configured to determine, according to the feature of the fused image, a prediction region to which the sample object in the sample image belongs.
For a specific implementation manner of the normalization subunit 1231, the feature fusion subunit 1232, and the prediction result determination subunit 1233, reference may be made to the description in step S101 in the embodiment corresponding to fig. 3, which will not be described herein again.
Referring to fig. 8, the tag image obtaining module 13 may include: an annotation image acquisition unit 131, a binary processing unit 132, a filter smoothing unit 133, and a label image determination unit 134.
An annotation image obtaining unit 131, configured to obtain a region annotation image corresponding to the sample image; the region labeling image comprises a labeling region to which the sample object belongs;
a binary processing unit 132, configured to perform binarization processing on the region labeling image to obtain a binary labeling image;
a filtering and smoothing unit 133, configured to perform filtering and smoothing processing on an labeled region in the binary labeled image to obtain a region label;
a label image determining unit 134, configured to determine a binary annotation image containing a region label as a label image.
For specific implementation manners of the annotation image obtaining unit 131, the binary processing unit 132, the filtering smoothing unit 133, and the label image determining unit 134, reference may be made to the description in step S102 in the embodiment corresponding to fig. 3, and details will not be repeated here.
Referring to fig. 8, the first loss value determining module 14 may include: a predicted pixel point obtaining unit 141, a labeling point obtaining unit 142, and a first loss value generating unit 143.
A prediction pixel point obtaining unit 141, configured to obtain a prediction pixel point in the prediction region;
a labeling point obtaining unit 142, configured to obtain a region labeling point from the region label;
the first loss value generating unit 143 is configured to obtain a classification loss function, and generate a classification loss value of the sample image according to the prediction pixel, the region labeling point, and the classification loss function.
For specific implementation manners of the predicted pixel point obtaining unit 141, the annotation point obtaining unit 142, and the first loss value generating unit 143, reference may be made to the description of step S103 in the embodiment corresponding to fig. 3, which will not be described herein again.
Referring to fig. 8, the mask acquiring module 15 may include: an expansion processing unit 151, an erosion processing unit 152, an expansion mark point acquisition unit 153, an erosion mark point acquisition unit 154, and a difference value determination unit 155.
An expansion processing unit 151 configured to perform expansion morphological processing on the label image to obtain an expanded label image;
an erosion processing unit 152, configured to perform erosion morphological processing on the label image to obtain an erosion label image;
an expansion labeling point obtaining unit 153, configured to obtain an expansion region labeling point in an expansion region label to which the sample object belongs in the expansion label image;
an erosion marking point obtaining unit 154, configured to obtain, in the erosion label image, an erosion area marking point in an erosion area label to which the sample object belongs;
a difference value determining unit 155, configured to determine a difference value between the expanded region labeling point and the erosion region labeling point as a region mask of the region label.
For a specific implementation manner of the expansion processing unit 151, the erosion processing unit 152, the expansion mark point obtaining unit 153, the erosion mark point obtaining unit 154, and the difference value determining unit 155, reference may be made to the description of obtaining the area mask in step S104 in the embodiment corresponding to fig. 3, which will not be described again here.
Referring to fig. 8, the second loss value determining module 16 may include: a pixel point acquisition unit 161, a gradient feature generation unit 162, a gradient feature generation unit 163, and a second loss value generation unit 164.
A pixel point obtaining unit 161, configured to obtain a predicted pixel point in the predicted region;
the pixel point obtaining unit 161 is further configured to obtain a region labeling point in the region label;
the gradient feature generation unit 162 is configured to obtain a region detection operator, and determine a first gradient feature corresponding to the prediction region according to the region detection operator, the prediction pixel point, and the region mask;
the gradient feature generating unit 163 is further configured to determine a second gradient feature corresponding to the region label according to the region detection operator, the region labeling point, and the region mask;
the second loss value generating unit 164 is configured to obtain a region constraint loss function, and generate a region constraint loss value of the sample image according to the first gradient feature, the second gradient feature, and the region constraint loss function.
For specific implementation manners of the pixel point obtaining unit 161, the gradient feature generating unit 162, the gradient feature generating unit 163, and the second loss value generating unit 164, reference may be made to the description in step S104 in the embodiment corresponding to fig. 3, and details will not be described here.
Referring to fig. 8, the third loss value determining module 17 may include: a color pixel value generating unit 171 and a third loss value generating unit 172.
A color pixel value generating unit 171, configured to obtain a prediction pixel point in the prediction region, and generate a first color pixel value corresponding to the prediction pixel point according to the color channel pixel value;
the color pixel value generating unit 171 is further configured to obtain a region labeling point in the region label, and generate a second color pixel value corresponding to the region labeling point according to the color channel pixel value; (ii) a
The third loss value generating unit 172 is configured to obtain a pixel constraint loss function, and generate a pixel constraint loss value of the sample image according to the first color pixel value, the second color pixel value, and the pixel constraint loss function.
For a specific implementation of the color pixel value generating unit 171 and the third loss value generating unit 172, reference may be made to the description of step S105 in the embodiment corresponding to fig. 3, which will not be repeated herein.
Referring to fig. 8, the target loss value determination module 18 may include: a parameter acquisition unit 181 and an arithmetic processing unit 182.
A parameter obtaining unit 181, configured to obtain a first model balance parameter and a second model balance parameter;
an arithmetic processing unit 182, configured to multiply the first model balance parameter and the area constraint loss value to obtain a first balance loss value;
the operation processing unit 182 is further configured to multiply the second model balance parameter by the pixel constraint loss value to obtain a second balance loss value;
the arithmetic processing unit 182 is further configured to add the classification loss value, the first balance loss value, and the second balance loss value to obtain a target loss value of the sample image.
For a specific implementation manner of the parameter obtaining unit 181 and the operation processing unit 182, reference may be made to the description of determining the target loss value in step S106 in the embodiment corresponding to fig. 3, which will not be described herein again.
Referring to fig. 8, the image processing apparatus 1 may further include: a target image acquisition module 20, a target area identification module 21, a boundary marking module 22 and an image output module 23.
A target image obtaining module 20, configured to obtain a target image, and input the target image into a target image recognition model;
a target area identification module 21, configured to identify, in the target image identification model, a target area to which a target object in the target image belongs;
a boundary marking module 22, configured to mark a boundary of the target area to obtain a marked boundary;
and the image output module 23 is configured to output a target image carrying the mark boundary.
For specific implementation manners of the target image obtaining module 20, the target area identifying module 21, the boundary marking module 22, and the image output module 23, reference may be made to the descriptions of step S201 to step S204 in the embodiment corresponding to fig. 5, and details will not be repeated here.
Referring to fig. 8, the image processing apparatus 1 may further include: a background area determination module 24, a list presentation module 25, an area update module 26, and an output module 27.
A background region determining module 24, configured to obtain, in a target image carrying a mark boundary, a region outside the mark boundary as a background region;
a list display module 25, configured to respond to a material addition operation for the background area, and display a material list;
a region updating module 26, configured to update the background region to a target background region with target materials in response to a material selection operation for the material list; the target material is a material selected by the material selection operation;
and an output module 27, configured to output a target image including a target area and a target background area.
For specific implementation manners of the background area determining module 24, the list displaying module 25, the area updating module 26, and the output module 27, reference may be made to the description in step S204 in the embodiment corresponding to fig. 5, which will not be described herein again.
Referring to fig. 8, the image processing apparatus 1 may further include: a region image extraction module 28, a material information acquisition module 29, a target material determination module 30, a part switching module 31, and an object output module 32.
A region image extracting module 28, configured to extract a region image containing the target object from the target image according to the mark boundary, and identify, in the target image recognition model, target part category information of the target object in the region image; the target part category information indicates the category of the target part in the target object;
a material information acquisition module 29, configured to acquire a material information base; the material information base comprises at least two virtual material data, and one virtual material data corresponds to one part category information;
a target material determining module 30, configured to obtain, from the material information base, virtual material data that matches the target part category information, as target virtual material data;
a part switching module 31, configured to switch the target part in the target object to the target virtual material data to obtain virtual part data;
and an object output module 32, configured to output a target object containing the virtual part data.
For specific implementation manners of the region image extraction module 28, the material information acquisition module 29, the target material determination module 30, the part switching module 31, and the object output module 32, reference may be made to the description in step S204 in the embodiment corresponding to fig. 5, which will not be described herein again.
In the embodiment of the application, a sample image is input into an image recognition model, the prediction region to which a sample object in the sample image belongs is output through the image recognition model, and a classification loss value, a region constraint loss value, and a pixel constraint loss value are determined from the prediction region and the region label of the sample image. The classification loss value characterizes the classification error between the prediction region and the region label. The region constraint loss value is determined from the region mask of the region label together with the prediction region and the region label; because the region mask is the coverage region corresponding to the region label in the sample image, it effectively locates the target object, so the region constraint loss value introduces the actual position information of the target object, strengthens the edge constraint between the prediction region and the sample object, and thereby sharpens the distinction between the prediction region and the region label. The pixel constraint loss value introduces the color channel pixel values (RGB information) of the sample image, which strengthens the pixel-level constraint between the prediction region and the sample object and further sharpens that distinction. In short, the three loss values each measure the difference between the prediction region and the region label from a different dimension, so the target loss value obtained from them can accurately express that difference. Training the image recognition model with the target loss value drives the prediction region output by the model progressively closer to the region label; the target image recognition model obtained by this training can therefore accurately recognize the region to which a target object belongs in an input image.
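Read as a formula, the paragraph above combines three per-image losses into one target loss value. The sketch below assumes PyTorch-style tensors and leaves the three individual loss functions as injected callables, since the patent specifies only the quantities they compare; `alpha` and `beta` stand for the first and second model balance parameters that weight the two constraint losses.

```python
def target_loss(pred_region, region_label, region_mask, sample_rgb,
                cls_loss_fn, region_loss_fn, pixel_loss_fn,
                alpha=1.0, beta=1.0):
    """Combine the three losses discussed above into a single target loss value.

    pred_region:  N x 1 x H x W predicted probability map (prediction region)
    region_label: N x 1 x H x W binary ground truth (region label)
    region_mask:  N x 1 x H x W coverage/edge mask derived from the region label
    sample_rgb:   N x 3 x H x W colour channel pixel values of the sample image
    alpha, beta:  assumed balance parameters for the two constraint losses
    """
    cls_loss = cls_loss_fn(pred_region, region_label)                     # classification error
    region_loss = region_loss_fn(pred_region, region_label, region_mask)  # edge constraint
    pixel_loss = pixel_loss_fn(pred_region, region_label, sample_rgb)     # RGB constraint
    return cls_loss + alpha * region_loss + beta * pixel_loss
```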
Referring to fig. 9, fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The image processing apparatus may be a computer program (including program code) running on a computer device; for example, the image processing apparatus may be application software. The apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. As shown in fig. 9, the image processing apparatus 2 may include: a prediction region determination module 100, a region label acquisition module 200, a target loss value generation module 300, and a target model determination module 400.
A prediction region determining module 100, configured to obtain a sample image, input the sample image into an image recognition model, and output a prediction region to which a sample object in the sample image belongs through the image recognition model;
an area label obtaining module 200, configured to obtain a label image corresponding to the sample image; the label image comprises a region label to which the sample object belongs;
a target loss value generating module 300, configured to obtain a region mask corresponding to the region label, and generate a target loss value of the sample image according to the region mask, the prediction region, and the region label;
and the target model determining module 400 is configured to adjust the image recognition model according to the target loss value to obtain a target image recognition model, and perform image recognition processing based on the target image recognition model.
For specific implementation manners of the prediction region determining module 100, the region label obtaining module 200, the target loss value generating module 300, and the target model determining module 400, reference may be made to the descriptions of step S301 to step S304 in the embodiment corresponding to fig. 7, which will not be described herein again.
It is understood that the image processing apparatus 2 in the embodiment of the present application can perform the image processing method described in the embodiment corresponding to fig. 7, which is not repeated here. Likewise, the beneficial effects of the same method are not described again.
Further, please refer to fig. 10, where fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 10, both the apparatus 1 in the embodiment corresponding to fig. 8 and the apparatus 2 in the embodiment corresponding to fig. 9 may be applied to the computer device 1000. The computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005, and further includes a user interface 1003 and at least one communication bus 1002, where the communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one disk storage device. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 10, the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in fig. 10, the network interface 1004 may provide a network communication function; the user interface 1003 is an interface for providing a user with input; and the processor 1001 may be used to invoke a device control application stored in the memory 1005 to implement:
acquiring a sample image, inputting the sample image into an image recognition model, and outputting a prediction region to which a sample object in the sample image belongs through the image recognition model;
obtaining a label image corresponding to the sample image; the label image comprises a region label to which the sample object belongs;
determining a classification loss value of the sample image according to the prediction region and the region label;
obtaining a region mask of the region label, and determining a region constraint loss value of the sample image according to the prediction region and the region mask;
determining a pixel constraint loss value of the sample image according to the prediction region and the region label;
determining a target loss value of the sample image according to the classification loss value, the pixel constraint loss value and the region constraint loss value, adjusting the image recognition model according to the target loss value to obtain a target image recognition model, and performing image recognition processing based on the target image recognition model.
Or, to implement:
acquiring a sample image, inputting the sample image into an image recognition model, and outputting a prediction region to which a sample object in the sample image belongs through the image recognition model;
obtaining a label image corresponding to the sample image; the label image comprises a region label to which the sample object belongs;
acquiring a region mask corresponding to the region label, and generating a target loss value of the sample image according to the region mask, the prediction region and the region label;
and adjusting the image recognition model according to the target loss value to obtain a target image recognition model, and performing image recognition processing based on the target image recognition model.
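The four steps just listed correspond to one gradient-based adjustment of the image recognition model. The following sketch is illustrative only (a PyTorch model and optimizer are assumed); `target_loss_fn` could be the `target_loss` helper sketched after the summary above, and none of this is the patented implementation itself.

```python
def train_step(model, optimizer, sample_image, region_label, region_mask, target_loss_fn):
    """One adjustment of the image recognition model by the target loss value (PyTorch assumed)."""
    model.train()
    optimizer.zero_grad()
    pred_region = model(sample_image)                          # prediction region of the sample object
    loss = target_loss_fn(pred_region, region_label,
                          region_mask, sample_image)           # target loss value of the sample image
    loss.backward()                                            # gradients of the target loss value
    optimizer.step()                                           # adjust the image recognition model
    return loss.item()
```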
It should be understood that the computer device 1000 described in this embodiment of the application may perform the image processing method described in the embodiment corresponding to fig. 3 or fig. 7, and may also perform the functions of the image processing apparatus 1 in the embodiment corresponding to fig. 8 or the image processing apparatus 2 in the embodiment corresponding to fig. 9, which are not repeated here. Likewise, the beneficial effects of the same method are not described again.
Further, it should be noted that an embodiment of the present application also provides a computer-readable storage medium storing the computer program executed by the aforementioned computer device 1000. The computer program includes program instructions which, when executed by a processor, can perform the image processing method described in the embodiment corresponding to fig. 3 or fig. 7, and details are therefore not repeated here. Likewise, the beneficial effects of the same method are not described again. For technical details not disclosed in the embodiments of the computer-readable storage medium referred to in the present application, reference is made to the description of the method embodiments of the present application.
The computer-readable storage medium may be an internal storage unit of the image processing apparatus or the computer device provided in any of the foregoing embodiments, such as a hard disk or memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart memory card (SMC), a secure digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
In one aspect of the application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by one aspect of the embodiments of the present application.
The terms "first," "second," and the like in the description and in the claims and drawings of the embodiments of the present application are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprises" and any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or apparatus that comprises a list of steps or elements is not limited to the listed steps or modules, but may alternatively include other steps or modules not listed or inherent to such process, method, apparatus, product, or apparatus.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The method and the related apparatus provided by the embodiments of the present application are described with reference to the flowchart and/or the structural diagram of the method provided by the embodiments of the present application, and each flow and/or block of the flowchart and/or the structural diagram of the method, and the combination of the flow and/or block in the flowchart and/or the block diagram can be specifically implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block or blocks of the block diagram. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block or blocks of the block diagram. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block or blocks.
The above disclosure is only a preferred embodiment of the present application and is certainly not intended to limit the scope of the present application; therefore, equivalent variations and modifications made in accordance with the claims of the present application still fall within the scope of the present application.
Claims (15)
1. An image processing method, comprising:
acquiring a sample image, inputting the sample image into an image recognition model, and outputting a prediction region to which a sample object in the sample image belongs through the image recognition model;
acquiring a label image corresponding to the sample image; the label image comprises a region label to which the sample object belongs;
determining a classification loss value of the sample image according to the prediction region and the region label;
acquiring a region mask of the region label, and determining a region constraint loss value of the sample image according to the prediction region and the region mask;
determining a pixel constraint loss value of the sample image according to the prediction region and the region label;
determining a target loss value of the sample image according to the classification loss value, the pixel constraint loss value and the region constraint loss value, adjusting the image recognition model according to the target loss value to obtain a target image recognition model, and performing image recognition processing based on the target image recognition model.
2. The method of claim 1, wherein the image recognition model comprises a depth-wise feature convolution component;
the inputting the sample image into an image recognition model, and outputting the prediction region to which the sample object in the sample image belongs through the image recognition model, includes:
inputting the sample image into the image recognition model, and extracting image features of the sample image through the image recognition model; the image features comprise at least two image channel features;
inputting the at least two image channel features into the depth-wise feature convolution component, and performing convolution processing on the at least two image channel features through at least two depth-wise convolution kernels in the depth-wise feature convolution component to obtain convolution image features; one depth-wise convolution kernel corresponds to one image channel feature;
and determining a prediction region to which the sample object in the sample image belongs according to the convolution image features.
3. The method of claim 2, wherein the at least two image channel features comprise a first image channel feature and a second image channel feature, and wherein the at least two depth-wise convolution kernels comprise a first depth-wise convolution kernel and a second depth-wise convolution kernel;
the inputting the at least two image channel features into the depth-wise feature convolution component, and performing convolution processing on the at least two image channel features respectively through at least two depth-wise convolution kernels in the depth-wise feature convolution component to obtain convolution image features includes:
inputting the at least two image channel features into the depth-wise feature convolution component;
performing convolution processing on the first image channel feature through the first depth-wise convolution kernel in the depth-wise feature convolution component to obtain a first convolution channel feature;
performing convolution processing on the second image channel feature through the second depth-wise convolution kernel in the depth-wise feature convolution component to obtain a second convolution channel feature;
and splicing the first convolution channel feature and the second convolution channel feature to generate the convolution image features.
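Claims 2 and 3 describe what is commonly implemented as a depthwise convolution: each image channel feature is convolved by its own kernel and the per-channel outputs are spliced back together. The sketch below is one plausible PyTorch-style realisation; the `groups` trick, the kernel size, and the explicit per-channel variant are assumptions, not the patent's wording.

```python
import torch
import torch.nn as nn

class DepthwiseFeatureConv(nn.Module):
    """One convolution kernel per image channel feature; outputs spliced along the channel axis."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # groups=channels assigns each input channel its own depth-wise kernel
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)

    def forward(self, image_features):          # N x C x H x W image channel features
        return self.depthwise(image_features)   # convolution image features, still N x C x H x W

# The explicit "first kernel on first channel, second kernel on second channel, splice" view of claim 3:
def depthwise_explicit(image_features, kernels):
    channel_outputs = [k(image_features[:, i:i + 1]) for i, k in enumerate(kernels)]
    return torch.cat(channel_outputs, dim=1)    # splice the per-channel convolution results
```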
4. The method of claim 3, wherein the image recognition model further comprises a feature fusion component;
the determining a prediction region to which a sample object in the sample image belongs according to the convolution image features comprises:
inputting the convolution image features into the feature fusion component, and carrying out normalization processing on the convolution image features through a normalization layer in the feature fusion component to obtain standard image features;
inputting the standard image features into a feature fusion layer in the feature fusion component, and performing convolution processing on the standard image features in the feature fusion layer to generate fused image features;
and determining a prediction region to which the sample object in the sample image belongs according to the fused image features.
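Claim 4's feature fusion component reads naturally as a normalization layer followed by a convolutional fusion layer. The sketch below assumes batch normalization and a point-wise (1x1) convolution for those two layers; the patent does not fix either choice.

```python
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Normalise the convolution image features, then fuse them with a convolution layer."""

    def __init__(self, channels, fused_channels):
        super().__init__()
        self.norm = nn.BatchNorm2d(channels)                 # normalization layer (assumed BatchNorm)
        self.fuse = nn.Conv2d(channels, fused_channels, 1)   # feature fusion layer (assumed 1x1 conv)

    def forward(self, conv_image_features):
        standard_image_features = self.norm(conv_image_features)   # standard image features
        return self.fuse(standard_image_features)                  # fused image features
```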
5. The method of claim 1, wherein the obtaining of the label image corresponding to the sample image comprises:
acquiring a region labeling image corresponding to the sample image; the region labeling image comprises a labeled region to which the sample object belongs;
carrying out binarization processing on the region labeling image to obtain a binary labeling image;
filtering and smoothing the labeled region in the binary labeling image to obtain a region label;
and determining the binary labeling image containing the region label as the label image.
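Claim 5 prepares the label image by binarising the region labeling image and then filtering and smoothing the labeled region. A NumPy/SciPy sketch follows; the threshold value and the filter sizes are assumptions.

```python
import numpy as np
from scipy import ndimage

def make_label_image(region_annotation, threshold=128):
    """Binarise the region labeling image, then filter and smooth the labeled region."""
    binary = (region_annotation >= threshold).astype(np.float32)   # binarization processing
    filtered = ndimage.median_filter(binary, size=5)               # filtering: remove small speckles
    smoothed = ndimage.gaussian_filter(filtered, sigma=1.0)        # smoothing: soften jagged edges
    return (smoothed >= 0.5).astype(np.uint8)                      # binary label image with region label
```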
6. The method of claim 1, wherein determining the classification loss value of the sample image according to the prediction region and the region label comprises:
obtaining a prediction pixel point in the prediction region;
acquiring a region marking point in the region label;
and acquiring a classification loss function, and generating a classification loss value of the sample image according to the prediction pixel point, the region marking point and the classification loss function.
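Claim 6 leaves the classification loss function open; per-pixel binary cross-entropy between the prediction pixel points and the region marking points is one common choice and is what the sketch below assumes.

```python
import numpy as np

def classification_loss(pred_pixels, region_points, eps=1e-7):
    """Per-pixel binary cross-entropy (one assumed classification loss function).

    pred_pixels:   H x W predicted probabilities (prediction region)
    region_points: H x W binary region marking points (region label)
    """
    p = np.clip(pred_pixels, eps, 1.0 - eps)
    y = region_points.astype(np.float32)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))
```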
7. The method of claim 1, wherein obtaining the area mask of the area label comprises:
performing expansion morphological processing on the label image to obtain an expansion label image;
carrying out erosion morphological processing on the label image to obtain an erosion label image;
acquiring an expansion region marking point in an expansion region label to which the sample object belongs in the expansion label image;
obtaining an erosion region marking point in an erosion region label to which the sample object belongs in the erosion label image;
and determining a difference value between the expansion region marking point and the erosion region marking point as the region mask of the region label.
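Claim 7 builds the region mask as the difference between a dilated (expanded) and an eroded version of the region label, i.e. a band along the object edge. A SciPy sketch, with the number of morphological iterations chosen arbitrarily:

```python
import numpy as np
from scipy import ndimage

def region_mask(region_label, iterations=3):
    """Region mask = expanded label minus eroded label: a band along the object edge."""
    dilated = ndimage.binary_dilation(region_label > 0, iterations=iterations)   # expansion label image
    eroded = ndimage.binary_erosion(region_label > 0, iterations=iterations)     # erosion label image
    return dilated.astype(np.uint8) - eroded.astype(np.uint8)                    # difference = region mask
```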
8. The method of claim 1, wherein determining the region constraint loss value for the sample image based on the prediction region and the region mask comprises:
obtaining a prediction pixel point in the prediction region;
acquiring a region marking point in the region label;
acquiring a region detection operator, and determining a first gradient feature corresponding to the prediction region according to the region detection operator, the prediction pixel point and the region mask;
determining a second gradient feature corresponding to the region label according to the region detection operator, the region marking point and the region mask;
and acquiring a region constraint loss function, and generating a region constraint loss value of the sample image according to the first gradient feature, the second gradient feature and the region constraint loss function.
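Claim 8 compares gradient features of the prediction and of the label inside the region mask. The patent does not name the region detection operator or the loss function; the sketch below assumes a Sobel operator and an L1 distance.

```python
import numpy as np
from scipy import ndimage

def gradient_feature(values, mask):
    """Gradient magnitude inside the region mask (Sobel assumed as the region detection operator)."""
    masked = values.astype(np.float32) * mask
    return np.hypot(ndimage.sobel(masked, axis=0), ndimage.sobel(masked, axis=1))

def region_constraint_loss(pred_pixels, region_points, mask):
    first_gradient = gradient_feature(pred_pixels, mask)      # first gradient feature (prediction region)
    second_gradient = gradient_feature(region_points, mask)   # second gradient feature (region label)
    return float(np.mean(np.abs(first_gradient - second_gradient)))   # L1 assumed as the loss function
```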
9. The method of claim 1, wherein determining the pixel constraint loss value for the sample image based on the prediction region and the region label comprises:
in the prediction region, obtaining a prediction pixel point, and generating a first color pixel value corresponding to the prediction pixel point according to a color channel pixel value;
obtaining a region marking point in the region label, and generating a second color pixel value corresponding to the region marking point according to the color channel pixel value;
and acquiring a pixel constraint loss function, and generating a pixel constraint loss value of the sample image according to the first color pixel value, the second color pixel value and the pixel constraint loss function.
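Claim 9 weights the colour channel pixel values by the prediction and by the region label respectively and penalises their difference. The L1 penalty in the sketch below is an assumption.

```python
import numpy as np

def pixel_constraint_loss(pred_pixels, region_points, rgb_image):
    """Compare colour pixel values selected by the prediction with those selected by the region label.

    pred_pixels: H x W probabilities, region_points: H x W binary, rgb_image: H x W x 3
    """
    rgb = rgb_image.astype(np.float32)
    first_color = pred_pixels[..., None] * rgb           # first color pixel values
    second_color = region_points[..., None] * rgb        # second color pixel values
    return float(np.mean(np.abs(first_color - second_color)))   # assumed L1 pixel constraint
```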
10. The method of claim 1, wherein determining the target loss value for the sample image based on the classification loss value, the pixel constraint loss value, and the region constraint loss value comprises:
acquiring a first model balance parameter and a second model balance parameter;
multiplying the first model balance parameter and the area constraint loss value to obtain a first balance loss value;
multiplying the second model balance parameter by the pixel constraint loss value to obtain a second balance loss value;
and adding the classification loss value, the first balance loss value and the second balance loss value to obtain a target loss value of the sample image.
11. The method of claim 1, further comprising:
acquiring a target image, and inputting the target image into the target image recognition model;
in the target image recognition model, recognizing a target area to which a target object in the target image belongs;
marking the boundary of the target area to obtain a marked boundary;
and outputting the target image carrying the mark boundary.
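Claim 11's inference path runs the trained model and draws the boundary of the recognised target area. One simple way to obtain a mark boundary is to keep the region pixels removed by a one-step erosion, as sketched below; this is an illustration, not the patent's marking procedure.

```python
import numpy as np
from scipy import ndimage

def mark_boundary(target_image, target_area, color=(255, 0, 0)):
    """Draw the mark boundary of the recognised target area onto the target image."""
    region = target_area.astype(bool)
    boundary = region & ~ndimage.binary_erosion(region)   # edge pixels removed by a one-step erosion
    marked = target_image.copy()
    marked[boundary] = color                               # overlay the mark boundary
    return marked
```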
12. The method of claim 11, further comprising:
extracting a region image containing the target object from the target image according to the mark boundary, and identifying, in the target image recognition model, target part category information of the target object in the region image; the target part category information indicates the category of the target part in the target object;
acquiring a material information base; the material information base comprises at least two virtual material data, and one virtual material data corresponds to one part category information;
virtual material data matched with the target part category information are obtained from the material information base and serve as target virtual material data;
switching a target part in the target object to the target virtual material data to obtain virtual part data;
outputting a target object containing the virtual part data.
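Claim 12 amounts to looking up virtual material data by part category and compositing it over the detected part. Everything concrete in the sketch below (the layout of the material information base, the RGBA patch, the illustrative "eyebrow" key, and the bounding-box placement) is an assumption made to keep the example self-contained.

```python
import numpy as np

# Assumed shape of the material information base: part category -> RGBA virtual material data.
material_base = {
    "eyebrow": np.zeros((32, 96, 4), dtype=np.uint8),   # placeholder patch, not real material data
}

def switch_part(region_image, part_category, part_box):
    """Replace the target part inside its bounding box with the matching virtual material.

    Assumes the material patch already has the same height/width as the bounding box.
    """
    y0, x0, y1, x1 = part_box
    patch = material_base[part_category]                   # target virtual material data (RGBA)
    alpha = patch[..., 3:4].astype(np.float32) / 255.0     # material transparency
    out = region_image.copy()
    out[y0:y1, x0:x1] = (alpha * patch[..., :3]
                         + (1.0 - alpha) * out[y0:y1, x0:x1]).astype(np.uint8)
    return out
```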
13. An image processing method, comprising:
acquiring a sample image, inputting the sample image into an image recognition model, and outputting a prediction region to which a sample object in the sample image belongs through the image recognition model;
acquiring a label image corresponding to the sample image; the label image comprises a region label to which the sample object belongs;
acquiring a region mask corresponding to the region label, and generating a target loss value of the sample image according to the region mask, the prediction region and the region label;
and adjusting the image recognition model according to the target loss value to obtain a target image recognition model, and performing image recognition processing based on the target image recognition model.
14. A computer device, comprising: a processor, a memory, and a network interface;
the processor is coupled to the memory and the network interface, wherein the network interface is configured to provide network communication functionality, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method of any of claims 1-13.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method of any of claims 1-13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010722268.1A CN111739027B (en) | 2020-07-24 | 2020-07-24 | Image processing method, device, equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111739027A true CN111739027A (en) | 2020-10-02 |
CN111739027B CN111739027B (en) | 2024-04-26 |
Family
ID=72657587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010722268.1A Active CN111739027B (en) | 2020-07-24 | 2020-07-24 | Image processing method, device, equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111739027B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190266418A1 (en) * | 2018-02-27 | 2019-08-29 | Nvidia Corporation | Real-time detection of lanes and boundaries by autonomous vehicles |
US20200012904A1 (en) * | 2018-07-03 | 2020-01-09 | General Electric Company | Classification based on annotation information |
US10304193B1 (en) * | 2018-08-17 | 2019-05-28 | 12 Sigma Technologies | Image segmentation and object detection using fully convolutional neural network |
CN109670429A (en) * | 2018-12-10 | 2019-04-23 | 广东技术师范学院 | A kind of the monitor video multiple target method for detecting human face and system of Case-based Reasoning segmentation |
CN110322445A (en) * | 2019-06-12 | 2019-10-11 | 浙江大学 | A kind of semantic segmentation method based on maximization prediction and impairment correlations function between label |
CN110363138A (en) * | 2019-07-12 | 2019-10-22 | 腾讯科技(深圳)有限公司 | Model training method, image processing method, device, terminal and storage medium |
CN110678873A (en) * | 2019-07-30 | 2020-01-10 | 珠海全志科技股份有限公司 | Attention detection method based on cascade neural network, computer device and computer readable storage medium |
CN110659646A (en) * | 2019-08-21 | 2020-01-07 | 北京三快在线科技有限公司 | Automatic multitask certificate image processing method, device, equipment and readable storage medium |
CN111428604A (en) * | 2020-03-19 | 2020-07-17 | 上海东普信息科技有限公司 | Facial mask recognition method, device, equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
CHUNFENG SONG ET AL.: "Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3131-3140 *
MAO XIAOFENG: "Research on Deep Visual Domain Adaptation Methods Based on Adversarial Learning" (基于对抗学习的深度视觉域适应方法研究), China Master's Theses Full-text Database, Information Science and Technology, pages 1-50 *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112270745A (en) * | 2020-11-04 | 2021-01-26 | 北京百度网讯科技有限公司 | Image generation method, device, equipment and storage medium |
CN112270745B (en) * | 2020-11-04 | 2023-09-29 | 北京百度网讯科技有限公司 | Image generation method, device, equipment and storage medium |
CN113033549A (en) * | 2021-03-09 | 2021-06-25 | 北京百度网讯科技有限公司 | Training method and device for positioning diagram acquisition model |
CN113033549B (en) * | 2021-03-09 | 2022-09-20 | 北京百度网讯科技有限公司 | Training method and device for positioning diagram acquisition model |
CN113486899A (en) * | 2021-05-26 | 2021-10-08 | 南开大学 | Saliency target detection method based on complementary branch network |
CN113822314A (en) * | 2021-06-10 | 2021-12-21 | 腾讯云计算(北京)有限责任公司 | Image data processing method, apparatus, device and medium |
CN113822314B (en) * | 2021-06-10 | 2024-05-28 | 腾讯云计算(北京)有限责任公司 | Image data processing method, device, equipment and medium |
CN113537124A (en) * | 2021-07-28 | 2021-10-22 | 平安科技(深圳)有限公司 | Model training method, device and storage medium |
CN113610856A (en) * | 2021-08-18 | 2021-11-05 | 京东科技信息技术有限公司 | Method and device for training image segmentation model and image segmentation |
CN113610856B (en) * | 2021-08-18 | 2023-11-07 | 京东科技信息技术有限公司 | Method and device for training image segmentation model and image segmentation |
CN113705677A (en) * | 2021-08-27 | 2021-11-26 | 北京三快在线科技有限公司 | Image recognition method and device, electronic equipment and storage medium |
CN114693551A (en) * | 2022-03-25 | 2022-07-01 | 腾讯科技(深圳)有限公司 | Image processing method, device, equipment and readable storage medium |
CN118397435A (en) * | 2024-06-26 | 2024-07-26 | 之江实验室 | Task execution method, device, medium and equipment based on image recognition model |
Also Published As
Publication number | Publication date |
---|---|
CN111739027B (en) | 2024-04-26 |
Similar Documents
Publication | Title |
---|---|
CN111739027B (en) | Image processing method, device, equipment and readable storage medium | |
CN112200062B (en) | Target detection method and device based on neural network, machine readable medium and equipment | |
CN110991380B (en) | Human attribute identification method, device, electronic equipment and storage medium | |
CN111768425B (en) | Image processing method, device and equipment | |
CN111553267B (en) | Image processing method, image processing model training method and device | |
CN109978754A (en) | Image processing method, device, storage medium and electronic equipment | |
EP4137991A1 (en) | Pedestrian re-identification method and device | |
CN112419170A (en) | Method for training occlusion detection model and method for beautifying face image | |
CN113204659B (en) | Label classification method and device for multimedia resources, electronic equipment and storage medium | |
CN113705290A (en) | Image processing method, image processing device, computer equipment and storage medium | |
CN112489143A (en) | Color identification method, device, equipment and storage medium | |
CN113627402B (en) | Image identification method and related device | |
CN112215171A (en) | Target detection method, device, equipment and computer readable storage medium | |
Wang et al. | PalGAN: Image colorization with palette generative adversarial networks | |
CN115115552B (en) | Image correction model training method, image correction device and computer equipment | |
CN105580050A (en) | Providing control points in images | |
WO2023279799A1 (en) | Object identification method and apparatus, and electronic system | |
CN117252947A (en) | Image processing method, image processing apparatus, computer, storage medium, and program product | |
CN115379290A (en) | Video processing method, device, equipment and storage medium | |
CN115967823A (en) | Video cover generation method and device, electronic equipment and readable medium | |
CN113642359B (en) | Face image generation method and device, electronic equipment and storage medium | |
CN113706550A (en) | Image scene recognition and model training method and device and computer equipment | |
CN117351192A (en) | Object retrieval model training, object retrieval method and device and electronic equipment | |
CN111818364B (en) | Video fusion method, system, device and medium | |
CN113947606A (en) | Image processing method, image processing device, electronic equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40030848; Country of ref document: HK |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |