CN112581481A - Image processing method and device, electronic equipment and computer readable storage medium - Google Patents

Image processing method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN112581481A
CN112581481A
Authority
CN
China
Prior art keywords
image
hair
portrait
predicted
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011629334.7A
Other languages
Chinese (zh)
Other versions
CN112581481B (en)
Inventor
王顺飞 (Wang Shunfei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202011629334.7A priority Critical patent/CN112581481B/en
Publication of CN112581481A publication Critical patent/CN112581481A/en
Application granted granted Critical
Publication of CN112581481B publication Critical patent/CN112581481B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application relates to an image processing method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: acquiring a portrait image and a candidate hair image from an image to be processed; binarizing the portrait image and the candidate hair image respectively to obtain a corresponding first portrait mask image and first hair mask image; determining a hair region to be filtered out of the first hair mask image according to the first portrait mask image and the first hair mask image; and screening a target hair image of the portrait subject from the candidate hair image based on the hair region to be filtered. With this method, the hair regions of passers-by in an image can be identified, so that the hair region of the portrait subject is accurately screened out.

Description

Image processing method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Portrait blurring is generally performed by first segmenting the portrait, separating it from the background of the image, and then blurring the background to achieve a background-blur effect similar to that of single-lens reflex photography. To improve the blurring effect, the hair of the portrait is further recognized and segmented. Conventional methods identify the hair region of a portrait inaccurately, which degrades the subsequent portrait blurring.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, electronic equipment and a computer readable storage medium, which can accurately identify a hair area of a portrait.
An image processing method comprising:
acquiring a portrait image and a candidate hair image in an image to be processed;
respectively carrying out binarization processing on the portrait image and the candidate hair image to obtain a corresponding first portrait mask image and a corresponding first hair mask image;
determining a hair region to be filtered in the first hair mask image according to the first portrait mask image and the first hair mask image;
and screening out a target hair image of the portrait subject from the candidate hair images based on the hair region to be filtered.
An image processing apparatus comprising:
the image acquisition module is used for acquiring a portrait image and a candidate hair image in the image to be processed;
the binarization module is used for respectively carrying out binarization processing on the portrait image and the candidate hair image to obtain a corresponding first portrait mask image and a corresponding first hair mask image;
the first determining module is used for determining a hair region to be filtered in the first hair mask image according to the first portrait mask image and the first hair mask image;
and the first screening module is used for screening the target hair image of the portrait main body from the candidate hair image based on the hair region to be filtered.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a portrait image and a candidate hair image in an image to be processed;
respectively carrying out binarization processing on the portrait image and the candidate hair image to obtain a corresponding first portrait mask image and a corresponding first hair mask image;
determining a hair region to be filtered in the first hair mask image according to the first portrait mask image and the first hair mask image;
and screening out a target hair image of the portrait subject from the candidate hair images based on the hair region to be filtered.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a portrait image and a candidate hair image in an image to be processed;
respectively carrying out binarization processing on the portrait image and the candidate hair image to obtain a corresponding first portrait mask image and a corresponding first hair mask image;
determining a hair region to be filtered in the first hair mask image according to the first portrait mask image and the first hair mask image;
and screening out a target hair image of the portrait subject from the candidate hair images based on the hair region to be filtered.
The image processing method and apparatus, the electronic device, and the computer-readable storage medium acquire a portrait image and a candidate hair image from the image to be processed, and binarize the portrait image and the candidate hair image respectively to obtain the corresponding first portrait mask image and first hair mask image. From the first portrait mask image and the first hair mask image, the passer-by hair regions in the first hair mask image, namely the hair regions to be filtered, can be identified. The target hair image is then screened from the candidate hair image based on the hair region to be filtered, so that the passer-by hair regions in the image are removed, the target hair of the portrait subject is obtained, and the hair of the portrait subject is accurately identified. At the same time, the accuracy of hair recognition and segmentation in multi-person scenes is improved.
A method of training an image recognition model, the method comprising:
obtaining a sample image and a label corresponding to the sample image;
performing image segmentation processing on the sample image through an image recognition model to be trained to obtain a predicted portrait image and a first predicted hair image;
respectively carrying out binarization processing on the predicted portrait image and the first predicted hair image to obtain a corresponding predicted portrait mask image and a corresponding predicted hair mask image;
determining a predicted filtered hair region in the predicted hair mask image according to the predicted portrait mask image and the predicted hair mask image;
screening out a second predicted hair image of the portrait subject from the first predicted hair image based on the predicted hair filtering area;
training the image recognition model based on the difference between the second predicted hair image and the label, and obtaining the trained image recognition model when a training stopping condition is met.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
obtaining a sample image and a label corresponding to the sample image;
performing image segmentation processing on the sample image through an image recognition model to be trained to obtain a predicted portrait image and a first predicted hair image;
respectively carrying out binarization processing on the predicted portrait image and the first predicted hair image to obtain a corresponding predicted portrait mask image and a corresponding predicted hair mask image;
determining a predicted filtered hair region in the predicted hair mask image according to the predicted portrait mask image and the predicted hair mask image;
screening out a second predicted hair image of the portrait subject from the first predicted hair image based on the predicted hair filtering area;
training the image recognition model based on the difference between the second predicted hair image and the label, and obtaining the trained image recognition model when a training stopping condition is met.
An apparatus for training an image recognition model, comprising:
the sample acquisition module is used for acquiring a sample image and a label corresponding to the sample image;
the segmentation module is used for carrying out image segmentation processing on the sample image through an image recognition model to be trained to obtain a predicted portrait image and a first predicted hair image;
the processing module is used for respectively carrying out binarization processing on the predicted portrait image and the first predicted hair image to obtain a corresponding predicted portrait mask image and a corresponding predicted hair mask image;
the second determination module is used for determining a predicted filtered hair area in the predicted hair mask image according to the predicted portrait mask image and the predicted hair mask image;
the second screening module is used for screening out a second predicted hair image of the portrait main body from the first predicted hair image based on the predicted hair filtering area;
and the training module is used for training the image recognition model based on the difference between the second predicted hair image and the label, and obtaining the trained image recognition model when the training stopping condition is met.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
obtaining a sample image and a label corresponding to the sample image;
performing image segmentation processing on the sample image through an image recognition model to be trained to obtain a predicted portrait image and a first predicted hair image;
respectively carrying out binarization processing on the predicted portrait image and the first predicted hair image to obtain a corresponding predicted portrait mask image and a corresponding predicted hair mask image;
determining a predicted filtered hair region in the predicted hair mask image according to the predicted portrait mask image and the predicted hair mask image;
screening out a second predicted hair image of the portrait subject from the first predicted hair image based on the predicted hair filtering area;
training the image recognition model based on the difference between the second predicted hair image and the label, and obtaining the trained image recognition model when a training stopping condition is met.
According to the training method and apparatus, electronic device, and computer-readable storage medium for the image recognition model, the model to be trained is trained with sample images and their corresponding labels. The image recognition model predicts the hair image of the portrait subject in the sample image, namely the second predicted hair image; the parameters of the model are adjusted based on the difference between the predicted hair image and the label, and training continues until the training stop condition is met, yielding the trained image recognition model and improving its precision and accuracy. The trained image recognition model can accurately distinguish the hair of the portrait subject from the hair of passers-by in a multi-person scene, so that the hair of the portrait subject is accurately segmented.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a block diagram showing an internal configuration of an electronic apparatus according to an embodiment;
FIG. 2 is a flow diagram of a method of image processing in one embodiment;
FIG. 3 is a flow chart of determining a hair region to be filtered out in a first hair mask image based on a first portrait mask image and the first hair mask image in one embodiment;
FIG. 4 is a flow diagram of a method for training an image recognition model in one embodiment;
FIG. 5 is a flowchart of an image processing method in another embodiment;
FIG. 6 is a diagram illustrating an exemplary image recognition model;
FIG. 7 is a diagram of a second portrait mask in one embodiment;
FIG. 8 is a schematic view of a second hair mask in one embodiment;
FIG. 9 is a block diagram showing the configuration of an image processing apparatus according to an embodiment;
FIG. 10 is a block diagram showing the construction of an image recognition model training apparatus according to an embodiment;
FIG. 11 is a block diagram showing an internal configuration of an electronic device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It will be understood that the terms "first," "second," and the like as used herein may be used to describe various images, but these images are not limited by these terms. These terms are only used to distinguish one image from another. For example, the first hair mask image may be referred to as a second hair mask image, and similarly, the second hair mask image may be referred to as a first hair mask image, without departing from the scope of the present application. Both the first hair mask image and the second hair mask image are hair mask images, but they are not the same hair mask image.
The embodiment of the application provides electronic equipment. The electronic device includes therein an Image Processing circuit, which may be implemented using hardware and/or software components, and may include various Processing units defining an ISP (Image Signal Processing) pipeline. FIG. 1 is a schematic diagram of an image processing circuit in one embodiment. As shown in fig. 1, for convenience of explanation, only aspects of the image processing technology related to the embodiments of the present application are shown.
As shown in fig. 1, the image processing circuit includes an ISP processor 140 and control logic 150. Image data captured by the imaging device 110 is first processed by the ISP processor 140, which analyzes the image data to capture image statistics that may be used to determine and/or control one or more parameters of the imaging device 110. The imaging device 110 may include a camera having one or more lenses 112 and an image sensor 114. The image sensor 114 may include an array of color filters (e.g., Bayer filters), and may acquire the light intensity and wavelength information captured by each imaging pixel and provide a set of raw image data that can be processed by the ISP processor 140. The attitude sensor 120 (e.g., a three-axis gyroscope, Hall sensor, or accelerometer) may provide image-processing parameters for the acquired image (e.g., anti-shake parameters) to the ISP processor 140 based on the interface type of the attitude sensor 120. The attitude sensor 120 interface may be an SMIA (Standard Mobile Imaging Architecture) interface, another serial or parallel camera interface, or a combination of the above.
In addition, the image sensor 114 may also send raw image data to the attitude sensor 120; the attitude sensor 120 may provide the raw image data to the ISP processor 140 based on its interface type, or store the raw image data in the image memory 130.
The ISP processor 140 processes the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 140 may perform one or more image processing operations on the raw image data, gathering statistical information about the image data. Wherein the image processing operations may be performed with the same or different bit depth precision.
The ISP processor 140 may also receive image data from the image memory 130. For example, the attitude sensor 120 interface sends raw image data to the image memory 130, and the raw image data in the image memory 130 is then provided to the ISP processor 140 for processing. The image Memory 130 may be a portion of a Memory device, a storage device, or a separate dedicated Memory within an electronic device, and may include a DMA (Direct Memory Access) feature.
Upon receiving raw image data from the image sensor 114 interface or from the attitude sensor 120 interface or from the image memory 130, the ISP processor 140 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to image memory 130 for additional processing before being displayed. ISP processor 140 receives processed data from image memory 130 and performs image data processing on the processed data in the raw domain and in the RGB and YCbCr color spaces. The image data processed by ISP processor 140 may be output to display 160 for viewing by a user and/or further processed by a Graphics Processing Unit (GPU). Further, the output of the ISP processor 140 may also be sent to the image memory 130, and the display 160 may read image data from the image memory 130. In one embodiment, image memory 130 may be configured to implement one or more frame buffers.
The statistical data determined by the ISP processor 140 may be transmitted to the control logic 150 unit. For example, the statistical data may include image sensor 114 statistics such as gyroscope vibration frequency, auto-exposure, auto-white balance, auto-focus, flicker detection, black level compensation, lens 112 shading correction, and the like. The control logic 150 may include a processor and/or microcontroller that executes one or more routines (e.g., firmware) that may determine control parameters of the imaging device 110 and control parameters of the ISP processor 140 based on the received statistical data. For example, the control parameters of the imaging device 110 may include attitude sensor 120 control parameters (e.g., gain, integration time of exposure control, anti-shake parameters, etc.), camera flash control parameters, camera anti-shake displacement parameters, lens 112 control parameters (e.g., focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (e.g., during RGB processing), as well as lens 112 shading correction parameters.
In one embodiment, the image to be processed is acquired by the lens 112 and image sensor 114 of the imaging device (camera) 110 and sent to the ISP processor 140. After receiving the image to be processed, the ISP processor 140 acquires a portrait image and a candidate hair image from it, and performs binarization processing on the portrait image and the candidate hair image respectively to obtain a corresponding first portrait mask image and first hair mask image. When the ISP processor 140 detects a focusing frame in the current frame image, the target subject of the current frame image is determined from the image within the focusing frame. The ISP processor 140 determines a hair region to be filtered in the first hair mask image according to the first portrait mask image and the first hair mask image, and screens out a target hair image of the portrait subject from the candidate hair image based on the hair region to be filtered.
FIG. 2 is a flow diagram of a method of image processing in one embodiment. The image processing method in this embodiment is described by taking an example of running on a terminal or a server. As shown in fig. 2, the image processing method includes:
step 202, acquiring a portrait image and a candidate hair image in the image to be processed.
The image to be processed can be any one of a color (Red, Green, Blue, RGB) image, a black-and-white image, a raw image, a depth image, an image corresponding to the Y component of a YUV image, and the like, and can be obtained by photographing any scene containing a portrait with a camera. In a YUV image, "Y" represents luminance (Luma), i.e., the gray-scale value, while "U" and "V" represent chrominance (Chroma), which describes the color and saturation of a pixel. A raw image is image data captured by the camera without further processing. The image to be processed may be stored locally by the electronic device, stored by another device, obtained from a network, or captured in real time by the electronic device, without being limited thereto. The portrait image is an image containing the human body and hair in the image to be processed.
Specifically, the ISP processor or the central processing unit of the electronic device may obtain the image to be processed from a local or other device or a network, or obtain the image to be processed by shooting through a camera. The ISP processor or central processor of the electronic device may identify the image to be processed to segment the portrait area and the hair area from the image to be processed.
In one embodiment, acquiring a portrait image and a candidate hair image in an image to be processed includes: acquiring an image to be processed, and identifying a portrait area and a hair area in the image to be processed; and performing image segmentation processing on the image to be processed based on the portrait area and the hair area to obtain a corresponding portrait image and a candidate hair image.
The ISP processor or the central processor of the electronic equipment identifies the image to be processed so as to identify the portrait area and the hair area in the image to be processed. Wherein, the portrait area comprises the hair of the portrait. And the ISP processor or the central processor of the electronic equipment performs image segmentation processing on the image to be processed based on the identified portrait area and the identified hair area to obtain a portrait image corresponding to the portrait area and a candidate hair image corresponding to the hair area.
In one embodiment, before acquiring the portrait image and the candidate hair image in the image to be processed, the method further includes: scaling the image to be processed to obtain an image to be processed of a preset size.
The preset size may be set as required, such as 224 × 224, 256 × 256, or 800 × 600, but is not limited thereto. Reducing the image to be processed to the preset size reduces the amount of data, improves processing efficiency, and lowers system resource consumption.
In one embodiment, before acquiring the portrait image and the candidate hair image in the image to be processed, the method further includes: and carrying out normalization processing on the image to be processed to obtain the normalized image to be processed.
Specifically, in an ISP processor or a central processing unit of the electronic device, the pixel values of the pixels in the image to be processed and the pixel values of the pixels in the depth map are normalized respectively. Further, the threshold value may be subtracted from the R, G, B three-channel value of each pixel point of the image to be processed, and then divided by the threshold value.
In one embodiment, the integer pixel values (0 to 255) of the pixel points in the image to be processed are normalized to floating-point values from -1 to +1, and the pixel values of the pixel points in the depth map are normalized to floating-point values from 0 to 1.
In this embodiment, the pixel values in the image to be processed are respectively normalized, so that the data size can be reduced, and the processor resources consumed by calculation can be saved.
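A minimal sketch of this normalization, assuming the threshold 127.5 used in the worked example later in this description, and a simple min-max scheme for the depth map (the exact depth normalization method is not specified here):

```python
import numpy as np

def normalize_inputs(img_rgb, depth=None):
    """Map integer RGB values in [0, 255] to floats in [-1, +1];
    optionally map a depth map to floats in [0, 1] (min-max, assumed)."""
    img_norm = (img_rgb.astype(np.float32) - 127.5) / 127.5
    depth_norm = None
    if depth is not None:
        d = depth.astype(np.float32)
        depth_norm = (d - d.min()) / max(float(d.max() - d.min()), 1e-6)
    return img_norm, depth_norm
```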
Acquiring the portrait image and the candidate hair image in the image to be processed then includes: acquiring the portrait image and the candidate hair image from the scaled or normalized image to be processed.
And 204, respectively carrying out binarization processing on the portrait image and the candidate hair image to obtain a corresponding first portrait mask image and a corresponding first hair mask image.
Binarization is to obtain a mask image by setting a pixel gray level greater than or equal to a certain critical gray level value, i.e., a threshold value, in an image to a gray level maximum value, e.g., 255, and a pixel gray level less than the critical gray level value to a gray level minimum value, e.g., 0. The portrait mask image is an image filter template used for identifying the portrait in the image, and can shield other parts of the image and screen out the portrait in the image. The hair mask image is an image filter template used for identifying hairs in the image, and can shield other parts of the image and screen out the hairs in the image.
Specifically, an ISP processor or a central processing unit of the electronic device performs binarization processing on the human image to obtain a corresponding first human image mask image. And carrying out binarization processing on the candidate hair image by an ISP processor or a central processing unit of the electronic equipment to obtain a corresponding first hair mask image.
In one embodiment, the ISP processor or the central processor represents the pixel points whose pixel values are greater than or equal to the critical gray value in the portrait image and the candidate hair image by 1, and the pixel points whose pixel values are less than the critical gray value by 0, thereby obtaining the first portrait mask image and the first hair mask image respectively.
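A minimal sketch of this binarization, assuming NumPy arrays; the critical gray values 127.5 and 10 are taken from the worked example later in this description:

```python
import numpy as np

def binarize_mask(gray, threshold):
    """Pixels >= threshold become 1; pixels < threshold become 0."""
    return (gray >= threshold).astype(np.uint8)

# e.g. first_portrait_mask = binarize_mask(portrait_gray, 127.5)
#      first_hair_mask     = binarize_mask(hair_gray, 10)
```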
And step 206, determining a hair region to be filtered in the first hair mask image according to the first portrait mask image and the first hair mask image.
The hair region to be filtered refers to the hair region of the passerby in the first hair mask image. The portrait main body refers to a portrait recognized as a main body in an image, and the passerby refers to a portrait other than the portrait of the main body in the image.
Specifically, the ISP processor or the central processor of the electronic device may determine an overlapping area in the first portrait mask image and the first hair mask image, and identify a hair region to be filtered out in the first hair mask image according to the overlapping area.
And step 208, screening out a target hair image of the portrait subject from the candidate hair images based on the hair region to be filtered.
The target hair image is the hair image of the portrait subject.
Specifically, an ISP processor or a central processing unit of the electronic device determines a corresponding region of a hair region to be filtered in the candidate hair image, and screens out a target hair image of the portrait subject from the candidate hair image according to the corresponding region of the hair region to be filtered in the candidate hair image.
In one embodiment, the method for screening out the target hair image from the candidate hair images based on the hair region to be filtered out comprises the following steps: and removing the corresponding area of the hair area to be filtered in the candidate hair image to obtain the target hair image of the portrait subject.
In one embodiment, the method for screening out the target hair image from the candidate hair images based on the hair region to be filtered out includes: and obtaining the target hair image of the portrait subject by using the hair regions except the corresponding region of the hair region to be filtered in the candidate hair image.
In this embodiment, a portrait image and a candidate hair image in an image to be processed are obtained, and binarization processing is performed on the portrait image and the candidate hair image respectively to obtain a corresponding first portrait mask image and a corresponding first hair mask image. According to the first portrait mask image and the first hair mask image, the hair area of the passerby in the first hair mask image can be identified, namely the hair area to be filtered. And screening out the target hair image from the candidate hair image based on the hair region to be filtered, so that the hair region of the passerby in the image can be removed, the target hair of the portrait main body is obtained, and the hair of the portrait main body is accurately identified. Meanwhile, the accuracy of hair identification and segmentation of a multi-person scene can be improved.
In one embodiment, as shown in fig. 3, determining the hair region to be filtered out in the first hair mask image according to the first portrait mask image and the first hair mask image includes:
step 302, performing connected domain processing on the first portrait mask image and the first hair mask image respectively to obtain a portrait area block in the first portrait mask image and a hair area block in the first hair mask image.
The connected region processing refers to finding and marking each connected region in the image. The hair region block is an image region formed by pixel points which have the same pixel value and are adjacent in position in the first hair mask image. The portrait area block refers to an image area which is formed by pixel points with the same pixel value and adjacent positions in the first portrait mask image.
Specifically, an ISP processor or central processing unit of the electronic device searches the first portrait mask image for image areas composed of adjacent pixels with the same pixel value, obtaining each portrait area block in the first portrait mask image. It likewise searches the first hair mask image for image areas composed of adjacent pixels with the same pixel value, obtaining each hair region block in the first hair mask image.
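A sketch of this connected-domain step, assuming OpenCV's connected-components routine on uint8 0/1 masks and 8-connectivity (the connectivity is not specified in this description):

```python
import cv2

def connected_blocks(mask):
    """Label 8-connected foreground blocks in a 0/1 uint8 mask.
    Returns the label map and the pixel area of each block (label 0 is background)."""
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    areas = [int(stats[i, cv2.CC_STAT_AREA]) for i in range(1, num)]
    return labels, areas
```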
And 304, screening out the portrait region blocks with the area larger than or equal to the area threshold value from the first portrait mask image to obtain a second portrait mask image.
Specifically, the ISP processor or the central processing unit of the electronic device obtains an area threshold, where the area threshold may be set to a × W × H, where a is a constant, for example, 0.05, W is the width of the first portrait mask after downsampling, and H is the height of the first portrait mask after downsampling.
Aiming at each portrait area block in the first portrait mask image, comparing the portrait area block with an area threshold value by an ISP processor or a central processing unit of the electronic equipment, screening out the portrait area blocks with the area larger than or equal to the area threshold value from the first portrait mask image, and forming a second portrait mask image by the screened portrait area blocks.
In this embodiment, the ISP processor or the central processing unit of the electronic device may remove the portrait area blocks in the first portrait mask image whose area is smaller than the area threshold, to obtain the second portrait mask image. Further, the ISP processor or the central processing unit of the electronic device may determine a portrait area block in the first portrait mask image, where the area of the portrait area block is smaller than the area threshold, and set the pixel value of each pixel point in the portrait area block, where the area of the portrait area block is smaller than the area threshold, as a minimum gray value, for example, set to 0.
In one embodiment, the ISP processor or central processor of the electronic device down-samples the first portrait mask, for example, from 800 × 600 to 200 × 150 in size. And determining an area threshold value by an ISP processor or a central processing unit of the electronic equipment according to the size of the downsampled first portrait mask image.
And step 306, screening out hair region blocks with the areas within the preset area range from the first hair mask image to obtain a second hair mask image.
Specifically, an ISP processor or a central processing unit of the electronic device obtains a preset area range [b·w·h, c·w·h), where b and c are threshold constants (e.g., b is set to 0.001 and c is set to 0.01), w is the width of the first hair mask image after downsampling, and h is its height.
And aiming at each hair region block in the first hair mask image, comparing the hair region block with a preset area range by an ISP processor or a central processing unit of the electronic equipment, screening the hair region block with the area within the preset area range from the first hair mask image, and forming a second hair mask image by the screened hair region blocks.
In this embodiment, the ISP processor or the central processing unit of the electronic device may remove the hair region blocks not within the predetermined area range in the first hair mask image to obtain the second hair mask image. Further, the ISP processor or the central processing unit of the electronic device may determine a hair region block in the first hair mask image, the area of which is not within the preset area range, and set the pixel value of each pixel point in the hair region block which is not within the preset area range to the minimum gray value, for example, to 0.
In one embodiment, the ISP processor or central processor of the electronic device downsamples the first hair mask image, for example from 800 × 600 to 200 × 150 in size. The ISP processor or central processing unit then determines the preset area range according to the size of the downsampled first hair mask image.
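A sketch of both area screenings, assuming the masks from the earlier sketches and the constants a = 0.05, b = 0.001, c = 0.01 given above; `first_portrait_mask` and `first_hair_mask` are illustrative names:

```python
import cv2
import numpy as np

def filter_blocks(mask, keep):
    """Keep only the connected blocks whose pixel area satisfies keep(area);
    every other block is zeroed out."""
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    out = np.zeros_like(mask)
    for i in range(1, num):
        if keep(int(stats[i, cv2.CC_STAT_AREA])):
            out[labels == i] = 1
    return out

w, h = 200, 150  # size after downsampling, per the example above
# second_portrait_mask = filter_blocks(first_portrait_mask,
#                                      lambda area: area >= 0.05 * w * h)
# second_hair_mask = filter_blocks(first_hair_mask,
#                                  lambda area: 0.001 * w * h <= area < 0.01 * w * h)
```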
And 308, determining a hair region to be filtered in the first hair mask image based on the overlapping region of the second portrait mask image and the second hair mask image.
Specifically, the ISP processor or the central processor of the electronic device may determine the overlapping region where the second portrait mask image and the second hair mask image overlap, and determine the hair region to be filtered out in the first hair mask image according to the area of the overlapping region.
In this embodiment, the first portrait mask image and the first hair mask image are respectively subjected to connected-domain processing to obtain the portrait area blocks in the first portrait mask image and the hair region blocks in the first hair mask image. Portrait area blocks whose area is greater than or equal to the area threshold are screened from the first portrait mask image to obtain the second portrait mask image, so that the mask of the portrait subject is accurately retained while the masks of passers-by are removed. Hair region blocks whose area lies within the preset area range are screened from the first hair mask image to obtain the second hair mask image, which is most likely to contain the hair masks of passers-by. From the mask of the portrait subject and the passer-by hair masks, the passer-by hair regions in the first hair mask image, namely the hair regions to be filtered, are then accurately identified.
In one embodiment, determining the hair region to be filtered out in the first hair mask image based on the overlapping region of the second portrait mask image and the second hair mask image comprises:
determining the overlapping area of the portrait area block in the second portrait mask image and the hair area block in the second hair mask image; and under the condition that the ratio of the overlapping area to the area of the portrait area block is smaller than a ratio threshold value, marking the hair area block as a hair area to be filtered.
Specifically, the ISP processor or the central processor of the electronic device selects hair region blocks from the second hair mask map, and calculates the overlapping area between each hair region block and each portrait region block in the second portrait mask map. And calculating the ratio of the overlapping area to the area of each portrait area block by an ISP processor or a central processing unit of the electronic equipment, and marking the selected hair area block as a hair area to be filtered under the condition that the ratio of the overlapping area to the area of each portrait area block is smaller than a ratio threshold value.
In the same manner, the ISP processor or central processor of the electronic device may mark the hair regions to be filtered out in the second hair mask image.
In this embodiment, the overlapping area of the portrait area block in the second portrait mask image and the hair area block in the second hair mask image is determined, so as to determine the same area between the mask image of the portrait main body and the hair mask image with a high probability of being a passerby. Under the condition that the ratio of the overlapping area to the area of the portrait area block is smaller than a ratio threshold value, the hair area block of the second hair mask diagram is the hair area block of the passerby, so that the area blocks in the second hair mask diagram can be accurately identified to belong to the hair area block of the passerby, and the hair area block of the passerby can be accurately marked.
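A hedged sketch of this overlap test (one reading of the steps above); `second_portrait`, `second_hair`, and `ratio_threshold` follow the earlier sketches and are illustrative names, and the masks are assumed to be uint8 0/1 arrays:

```python
import cv2
import numpy as np

def mark_hair_to_filter(second_portrait, second_hair, ratio_threshold):
    """Mark hair blocks as passer-by hair: a hair block is kept as subject hair
    only if, for some portrait block, overlap_area / portrait_block_area
    reaches the ratio threshold; otherwise it is marked for filtering."""
    num_p, p_labels, p_stats, _ = cv2.connectedComponentsWithStats(
        second_portrait, connectivity=8)
    num_h, h_labels, _, _ = cv2.connectedComponentsWithStats(
        second_hair, connectivity=8)
    to_filter = np.zeros_like(second_hair)
    for j in range(1, num_h):
        hair_blk = h_labels == j
        subject_hair = False
        for i in range(1, num_p):
            overlap = np.logical_and(hair_blk, p_labels == i).sum()
            if overlap / p_stats[i, cv2.CC_STAT_AREA] >= ratio_threshold:
                subject_hair = True
                break
        if not subject_hair:
            to_filter[hair_blk] = 1
    return to_filter
```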
In one embodiment, the method for screening out the target hair image from the candidate hair images based on the hair region to be filtered out comprises the following steps: and performing expansion processing on the hair region block to be filtered, and removing the corresponding region of the hair region to be filtered after the expansion processing in the candidate hair image to obtain the target hair image of the portrait main body.
Dilation is an operation that finds a local maximum, i.e., it convolves an image or an image region with a convolution kernel in which a reference point (anchor) is set. For example, dilating the hair region block A to be filtered with a kernel B means computing, for each position, the maximum pixel value in the region of A covered by B and assigning that maximum to the pixel specified by the reference point.
The ISP processor or central processing unit of the electronic device dilates the hair region block to be filtered, making the dilated hair region to be filtered more prominent. It then determines the hair region corresponding to the dilated region in the candidate hair image and removes it to obtain the target hair image of the portrait subject.
In this embodiment, the ISP processor or the central processing unit of the electronic device determines the hair region corresponding to the expanded hair region to be filtered in the candidate hair image, and adjusts the pixel value of each pixel point in the corresponding hair region to 0, so as to obtain the target hair image of the portrait main body.
Dilating the hair region block to be filtered expands its neighborhood: the dilated region forms a larger highlighted area in the candidate hair mask image, making the hair region to be filtered more prominent, so that it can be fully removed from the candidate hair image and the resulting target hair region is more accurate. Dilating the marked passer-by hair region makes it more distinct, so the passer-by hair region in the candidate hair image can be removed more precisely, yielding a hair image containing only the portrait subject.
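A minimal sketch of this dilation-and-removal step, assuming OpenCV and an illustrative 5 × 5 elliptical kernel (the kernel shape and size are not specified in this description):

```python
import cv2

def remove_filtered_hair(candidate_hair, to_filter, kernel_size=5):
    """Dilate the marked passer-by hair region, then zero the corresponding
    pixels of the candidate hair image, leaving the subject's hair."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    dilated = cv2.dilate(to_filter, kernel)
    target = candidate_hair.copy()
    target[dilated > 0] = 0  # pixel values in the removed region set to 0
    return target
```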
In one embodiment, the method further comprises: and performing portrait blurring processing on the image to be processed based on the target hair image and the portrait image to obtain a corresponding blurred image.
Specifically, an ISP processor or central processing unit of the electronic equipment accurately identifies the portrait-subject image and the background image of the image to be processed based on the target hair image and the portrait image. It then blurs the background image while leaving the portrait-subject image unblurred, obtaining the corresponding blurred image.
In this embodiment, after removing the hair region of the passerby in the candidate hair image, the hair image of the portrait subject is obtained. The portrait blurring is accurately performed according to the hair image and the portrait main body of the portrait main body, and the accuracy and the stability of the portrait blurring are effectively improved.
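For illustration only, a sketch of the blurring composite, assuming a Gaussian blur (this description does not specify the blur type) and a combined subject mask built from the portrait image and the target hair image:

```python
import cv2
import numpy as np

def portrait_blur(image, subject_mask, ksize=(21, 21)):
    """Blur the background while keeping the portrait subject (body + hair) sharp."""
    blurred = cv2.GaussianBlur(image, ksize, 0)
    m = (subject_mask > 0)[..., None].astype(np.float32)  # (H, W, 1) for broadcasting
    return (image * m + blurred * (1.0 - m)).astype(image.dtype)
```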
In one embodiment, the portrait color-keeping processing is carried out on the image to be processed based on the target hair image and the portrait image, so that the color-keeping stability can be improved. The portrait color-keeping processing means that the color of the portrait main body is kept, and the background color is changed into black and white.
In one embodiment, FIG. 4 is a flow diagram of a method for training an image recognition model in one embodiment. The training method of the image recognition model in this embodiment is described by taking an example in which the training method runs on a terminal or a server. As shown in fig. 4, the training method of the image recognition model includes:
step 402, obtaining a sample image and a label corresponding to the sample image.
The sample image is an image including a portrait subject and a hair region of the portrait subject, and may further include passers-by and hair regions of passers-by. The sample image comprises a sample portrait image and a sample hair image.
Specifically, an ISP processor or a central processor of the electronic device may obtain a plurality of sample images and obtain a label corresponding to each sample image.
In one embodiment, the ISP processor or the central processor obtains the sample hair image from the sample image, and manually removes the hair region of the passerby in the sample hair image, and uses the sample hair image with the hair region of the passerby removed as the label corresponding to the sample image. Or, the hair area of the portrait subject is separated from the sample hair image to be used as the label corresponding to the sample image.
And step 404, performing image segmentation processing on the sample image through the image recognition model to be trained to obtain a predicted portrait image and a first predicted hair image.
Specifically, an ISP processor or central processor of the electronic device inputs a sample image into an image recognition model to be trained. And the image recognition model to be trained performs image recognition on the sample image so as to recognize a portrait area and a hair area in the sample image. A predicted portrait image and a first predicted hair image are segmented from the sample image based on the portrait area.
And step 406, performing binarization processing on the predicted portrait image and the first predicted hair image respectively to obtain a corresponding predicted portrait mask image and a corresponding predicted hair mask image.
Specifically, the image recognition model to be trained performs binarization processing on the predicted portrait image to obtain a corresponding predicted portrait mask image. And carrying out binarization processing on the first predicted hair image to obtain a corresponding predicted hair mask image.
And step 408, determining a predicted filtered hair area in the predicted hair mask image according to the predicted portrait mask image and the predicted hair mask image.
Specifically, the image recognition model respectively performs connected domain processing on the predicted portrait mask image and the predicted hair mask image to obtain a portrait region block in the predicted portrait mask image and a hair region block in the predicted hair mask image. And the image recognition model screens out portrait region blocks with the area larger than or equal to an area threshold value from the predicted portrait mask image, and screens out hair region blocks with the area within a preset area range from the predicted hair mask image.
The image recognition model determines the overlapping area of the portrait area block and the hair area block, and under the condition that the ratio of the overlapping area to the area of the portrait area block is smaller than a ratio threshold value, the image recognition model marks the hair area block as a hair area to be filtered.
And step 410, screening out a second predicted hair image of the portrait subject from the first predicted hair image based on the predicted filtered hair region.
Specifically, the hair region block to be filtered is subjected to expansion processing, the corresponding region of the hair region to be filtered, which is subjected to expansion processing, in the first predicted hair image is removed, and a second predicted hair image of the portrait main body is obtained.
And 412, training the image recognition model based on the difference between the second predicted hair image and the label, and obtaining the trained image recognition model when the training stopping condition is met.
Wherein the trained image recognition model is a trained image recognition model. The trained image recognition model is used to identify a target hair image of the person's body from the images.
Specifically, the electronic device trains the image recognition model to be trained based on the sample image and the corresponding label. And adjusting parameters of the image recognition model in the training process and continuing training until the image recognition model meets the training stopping condition, so as to obtain the trained image recognition model.
In this embodiment, the training stop condition may be that the similarity between the second predicted hair image and the label is greater than or equal to a similarity threshold; that the difference between this similarity and the similarity threshold is less than or equal to a difference threshold; that the loss error of the image recognition model is less than or equal to a loss threshold; or that the number of iterations of the image recognition model reaches a preset number of iterations.
For example, the training stop condition may be that the similarity between the second predicted hair image and the label is greater than or equal to a similarity threshold. And the electronic equipment determines the similarity between the second predicted hair image and the label, and adjusts the parameters of the image recognition model and continues training under the condition that the similarity between the second predicted hair image and the label is smaller than a similarity threshold value. In the case where the similarity between the second predicted hair image and the label is greater than or equal to the similarity threshold, the training is stopped.
Alternatively, the training stop condition is that a difference between the similarity between the second predicted hair image and the label and the similarity threshold is less than or equal to the difference threshold. The electronic device determines a similarity between the second predicted hair image and the label, and adjusts parameters of the image recognition model and continues training when a difference between the similarity between the second predicted hair image and the label and the similarity threshold is greater than a difference threshold. And stopping training when the difference between the similarity between the second predicted hair image and the label and the similarity threshold value is less than or equal to the difference threshold value.
For example, the training stop condition is that the loss error of the image recognition model is less than or equal to a loss threshold. And calculating loss errors generated by the image recognition model in each training, and adjusting the parameters of the image recognition model and continuing the training when the loss errors are larger than a loss threshold value. And stopping training under the condition that the loss error is less than or equal to the loss threshold value to obtain the trained image recognition model.
Or the training stopping condition is that the iteration times of the image recognition model reach the preset iteration times.
And the ISP processor or the central processing unit of the electronic equipment calculates the iteration times of the image recognition model in the training process, and stops training when the iteration times of the image recognition model in the training process reach the preset iteration times to obtain the trained image recognition model.
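A generic sketch of such a loop, assuming PyTorch, an Adam optimizer, and illustrative hyperparameters (the description does not fix the loss function, optimizer, or thresholds):

```python
import torch

def train_model(model, loader, loss_fn, max_iters=100_000, loss_threshold=1e-3):
    """Adjust the model's parameters from the prediction/label difference and
    stop when the loss-error or iteration-count condition is met."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    step = 0
    for sample, label in loader:
        pred_hair = model(sample)         # second predicted hair image
        loss = loss_fn(pred_hair, label)  # difference between prediction and label
        opt.zero_grad()
        loss.backward()
        opt.step()
        step += 1
        if loss.item() <= loss_threshold or step >= max_iters:
            break
    return model
```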
In this embodiment, the image recognition model to be trained is trained with the sample images and their corresponding labels. The model predicts the hair image of the portrait subject in the sample image, namely the second predicted hair image; its parameters are adjusted based on the difference between the predicted hair image and the label, and training continues until the training stop condition is met, yielding the trained image recognition model and improving its precision and accuracy. The trained image recognition model can accurately distinguish the hair of the portrait subject from the hair of passers-by in a multi-person scene, so that the passer-by hair regions are removed and the hair of the portrait subject is accurately segmented.
In one embodiment, as shown in fig. 5, there is provided an image processing method including:
step 502, the ISP processor or the central processing unit of the electronic device obtains the image to be processed, and step 504 is executed, that is, the image to be processed is scaled to obtain the image to be processed of a preset size.
Step 506, an ISP processor or a central processing unit of the electronic device normalizes pixel values of pixel points in the to-be-processed image with a preset size to obtain a normalized to-be-processed image.
And step 508, the ISP processor or the central processing unit of the electronic equipment identifies the normalized image to be processed to obtain a portrait area and a hair area. Next, step 510 and step 512 are performed, respectively.
And 510, performing image segmentation processing on the image to be processed based on the portrait area to obtain a corresponding portrait image.
And step 512, performing image segmentation processing on the image to be processed based on the hair region to obtain a corresponding candidate hair image.
Step 514, down-sampling the portrait image and the candidate hair image, and executing step 516.
And 516, respectively carrying out binarization processing on the down-sampled portrait image and candidate hair image to obtain a corresponding first portrait mask image 518 and a corresponding first hair mask image 520. Next, step 522 is performed.
Step 522, performing connected domain processing on the first portrait mask image and the first hair mask image respectively to obtain a portrait area block in the first portrait mask image and a hair area block in the first hair mask image. And screening out the portrait area blocks with the area larger than or equal to the area threshold value from the first portrait mask image to obtain a second portrait mask image 524. From the first hair mask map, hair region blocks having an area within a predetermined area range are screened out, resulting in a second hair mask map 526. Next, step 528 is performed.
In step 528, the overlapping area of the image area block in the second image mask and the hair area block in the second hair mask is determined.
And step 530, under the condition that the ratio of the overlapping area to the area of the portrait area block is smaller than a ratio threshold value, marking the hair area block as a hair area to be filtered.
Step 532, performing expansion processing on the hair region block to be filtered, performing up-sampling on the hair region to be filtered after the expansion processing, removing the corresponding region of the hair region to be filtered after the up-sampling in the candidate hair image, and executing step 534.
And step 534, normalizing the candidate hair image from which the corresponding region of the hair region to be filtered has been removed, to obtain the target hair image of the portrait subject.
In this embodiment, the portrait image and the candidate hair image are recognized by the image recognition model, and whether a hair region belongs to a passerby in the multi-person scene is accurately judged by combining connected-domain processing with the overlapping-area calculation. The passerby hair regions are then filtered out, which avoids the portrait-blurring and color-retention defects caused by mistakenly segmenting passerby hair in a multi-person scene containing passersby. This scheme improves the output precision of the hair region in multi-person scenes, and thus effectively improves the portrait blurring effect, the color-retention effect, and their stability.
In one embodiment, there is provided an image processing method including:
The electronic device shoots a multi-person scene through the camera to obtain an RGB image. Passersby are present in the RGB image.
The ISP processor or central processor of the electronic device scales the RGB image to obtain an image with a size of 800 × 600.
The R, G, B channel values of each pixel in the 800 × 600 image are determined, and each channel value is normalized by subtracting the first threshold 127.5 and then dividing by the variance threshold 127.5, that is:

y = (x − 127.5) / 127.5

where x is the channel value and y is the normalized value.
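By way of illustration, this normalization is a single NumPy expression; the function name is an assumption for exposition.

```python
import numpy as np

def normalize(image_u8):
    """Map 8-bit channel values into [-1, 1] via (x - 127.5) / 127.5."""
    x = image_u8.astype(np.float32)
    return (x - 127.5) / 127.5
```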
The normalized 800 × 600 image is input to the image recognition model. The image recognition model may be formed by a Convolutional Neural Network (CNN), including but not limited to DeepLab-series segmentation algorithms, U-Net, FCN, and the like. The CNN of the image recognition model comprises one feature Encoder and two feature Decoders; the outputs of the two Decoders are, respectively, a portrait image corresponding to the portrait region and a candidate hair image corresponding to the hair region. Both the portrait image and the candidate hair image are 800 × 600.
As shown in fig. 6, the normalized 800 × 600 image is input to the image recognition model, the feature Encoder of the image recognition model performs feature encoding on the input image, and the feature maps output by the Encoder are fed to the two feature Decoders respectively. The Encoder and the two Decoders are connected by skip connections. The two Decoders decode the input feature maps, one outputting the portrait image and the other outputting the candidate hair image. As can be seen from the portrait image, the passerby is not segmented from the 800 × 600 image, yet the hair of the passerby is segmented from it by the image recognition model. The portrait image and the candidate hair image obtained by segmentation may be RGB images or mask images.
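By way of illustration and not limitation, the following is a minimal PyTorch sketch of one shared encoder feeding two skip-connected decoder heads; the layer sizes and the class name are assumptions for exposition, not the patented architecture.

```python
import torch
import torch.nn as nn

class DualHeadSegNet(nn.Module):
    """One shared encoder, two decoders: a portrait head and a hair head."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up_portrait = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.head_portrait = nn.Conv2d(32, 1, 1)    # portrait mask logits
        self.up_hair = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.head_hair = nn.Conv2d(32, 1, 1)        # hair mask logits

    def forward(self, x):
        f1 = self.enc1(x)                           # skip feature, full resolution
        f2 = self.enc2(f1)                          # encoded feature, half resolution
        p = torch.cat([self.up_portrait(f2), f1], dim=1)  # skip connection, portrait path
        h = torch.cat([self.up_hair(f2), f1], dim=1)      # skip connection, hair path
        return self.head_portrait(p), self.head_hair(h)

# e.g.: portrait_logits, hair_logits = DualHeadSegNet()(torch.rand(1, 3, 600, 800))
```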
The image recognition model performs downsampling processing on the 800 × 600 portrait image and candidate hair image to obtain a 200 × 150 portrait image and candidate hair image.
Binarization processing is performed on the pixel values of the 200 × 150 portrait image and candidate hair image. The 200 × 150 portrait image is binarized with a critical gray value of 127.5 to obtain the first portrait mask image, and the 200 × 150 candidate hair image is binarized with a critical gray value of 10 to obtain the first hair mask image.
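By way of illustration, this binarization maps directly onto cv2.threshold; the random stand-in images are assumptions for exposition, while 127.5 and 10 are the critical gray values quoted above.

```python
import cv2
import numpy as np

# stand-ins for the 200 x 150 grayscale portrait and candidate hair images
portrait_gray = np.random.randint(0, 256, (150, 200), dtype=np.uint8)
hair_gray = np.random.randint(0, 256, (150, 200), dtype=np.uint8)

_, first_portrait_mask = cv2.threshold(portrait_gray, 127.5, 255, cv2.THRESH_BINARY)
_, first_hair_mask = cv2.threshold(hair_gray, 10, 255, cv2.THRESH_BINARY)
```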
Connected-domain processing is performed on the 200 × 150 first portrait mask image and first hair mask image respectively to obtain the portrait area blocks in the first portrait mask image and the hair area blocks in the first hair mask image.
Portrait region blocks with an area smaller than the area threshold 0.05 × 200 × 150 are filtered out of the 200 × 150 first portrait mask map to obtain the second portrait mask map, which corresponds to the portrait subject, as shown in fig. 7.
From the 200 × 150 first hair mask map, hair region blocks with areas within the preset area range [0.001 × 200 × 150, 0.01 × 200 × 150] are retained to obtain the second hair mask map. These blocks have a high probability of being passerby hair regions in the image, as shown in fig. 8.
Any hair region block is selected from the second hair mask image, and the overlapping area I between the selected hair region block and each portrait region block in the second portrait mask image is calculated. The area X of the selected hair region block is obtained, and each ratio I/X is computed. When every ratio I/X is smaller than the threshold T, the selected hair region block is marked as a hair region to be filtered. In the same way, all the hair regions to be filtered in the second hair mask image are obtained.
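By way of illustration, the following Python sketch applies that overlap test to one hair block; the function name, the use of one binary mask per block, and the default value of T are assumptions for exposition.

```python
import numpy as np

def is_passerby_hair(hair_block, portrait_blocks, T=0.3):
    """Return True when every overlap ratio I/X is below the threshold T."""
    X = np.count_nonzero(hair_block)            # area of the selected hair block
    if X == 0:
        return True
    for pb in portrait_blocks:                  # one binary mask per portrait block
        I = np.count_nonzero(np.logical_and(hair_block, pb))  # overlapping area
        if I / X >= T:                          # enough overlap: subject hair, keep it
            return False
    return True                                 # all ratios < T: mark to be filtered
```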
The hair region blocks to be filtered are dilated, and the dilated hair regions to be filtered are up-sampled.
The corresponding regions of the up-sampled hair regions to be filtered in the candidate hair image are determined, the pixel values of those regions are set to 0, normalization processing is performed, and the target hair image of the portrait subject is output.
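By way of illustration, a Python sketch of this clean-up step follows; the 5 × 5 kernel, the nearest-neighbor up-sampling, and the stand-in arrays are assumptions for exposition.

```python
import cv2
import numpy as np

to_filter = np.zeros((150, 200), dtype=np.uint8)        # marked regions in the 200 x 150 mask
candidate_hair = np.random.randint(0, 256, (600, 800), dtype=np.uint8)  # stand-in 800 x 600 image

kernel = np.ones((5, 5), np.uint8)
dilated = cv2.dilate(to_filter, kernel)                 # neighborhood expansion
upsampled = cv2.resize(dilated, (800, 600), interpolation=cv2.INTER_NEAREST)
candidate_hair[upsampled > 0] = 0                       # zero out passerby hair pixels
target_hair = candidate_hair.astype(np.float32) / 255.0 # normalized target hair image
```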
In this embodiment, the portrait image and the candidate hair image are recognized by the image recognition model, and whether a hair region belongs to a passerby in the multi-person scene is accurately judged by combining connected-domain processing with the overlapping-area calculation. The passerby hair regions are then filtered out, which avoids the portrait-blurring and color-retention defects caused by mistakenly segmenting passerby hair in a multi-person scene containing passersby. This scheme improves the output precision of the hair region in multi-person scenes, and thus effectively improves the portrait blurring effect, the color-retention effect, and their stability in subsequent multi-person scenes.
It should be understood that although the various steps in the flowcharts of fig. 2-5 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not limited to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2-5 may include multiple sub-steps or stages that are not necessarily completed at the same moment but may be performed at different times, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Fig. 9 is a block diagram showing the configuration of an image processing apparatus according to an embodiment. As shown in fig. 9, the image processing apparatus includes:
an image obtaining module 902, configured to obtain a portrait image and a candidate hair image in an image to be processed.
A binarization module 904, configured to perform binarization processing on the portrait image and the candidate hair image respectively to obtain a corresponding first portrait mask image and a corresponding first hair mask image.
A first determining module 906, configured to determine a hair region to be filtered out in the first hair mask map according to the first portrait mask map and the first hair mask map.
A first screening module 908, configured to screen out a target hair image of the portrait subject from the candidate hair images based on the hair region to be filtered.
In this embodiment, a portrait image and a candidate hair image in an image to be processed are obtained, and binarization processing is performed on them respectively to obtain a corresponding first portrait mask image and first hair mask image. According to the first portrait mask image and the first hair mask image, the hair region of the passerby in the first hair mask image, namely the hair region to be filtered, can be identified. The target hair image is then screened out from the candidate hair image based on the hair region to be filtered, so that the passerby hair region in the image is removed and the target hair of the portrait subject is obtained, accurately identifying the hair of the portrait subject. Meanwhile, the accuracy of hair recognition and segmentation in multi-person scenes is improved.
In one embodiment, the first determining module 906 is further configured to: respectively carrying out connected domain processing on the first portrait mask image and the first hair mask image to obtain a portrait area block in the first portrait mask image and a hair area block in the first hair mask image; screening out a portrait region block with the area larger than or equal to an area threshold value from the first portrait mask image to obtain a second portrait mask image; screening out hair region blocks with the areas within a preset area range from the first hair mask image to obtain a second hair mask image; and determining the hair region to be filtered in the first hair mask image based on the overlapping region of the second portrait mask image and the second hair mask image.
In this embodiment, connected-domain processing is performed on the first portrait mask image and the first hair mask image respectively to obtain the portrait region blocks in the first portrait mask image and the hair region blocks in the first hair mask image. Portrait region blocks with areas greater than or equal to the area threshold are screened from the first portrait mask image to obtain the second portrait mask image, so that the mask of the portrait subject is accurately retained and the masks of passersby are removed. Hair region blocks with areas within the preset area range are screened from the first hair mask image to obtain the second hair mask image, which with high probability is the hair mask of a passerby. From the mask of the portrait subject and the probable passerby hair mask, the hair region to be filtered in the first hair mask image is then accurately identified.
In one embodiment, the first determining module 906 is further configured to: determining the overlapping area of the portrait area block in the second portrait mask image and the hair area block in the second hair mask image; and under the condition that the ratio of the overlapping area to the area of the portrait area block is smaller than a ratio threshold value, marking the hair area block as a hair area to be filtered.
In this embodiment, the overlapping area between the portrait region blocks in the second portrait mask image and the hair region blocks in the second hair mask image is determined, that is, the common area between the mask of the portrait subject and the probable passerby hair mask. When the ratio of the overlapping area to the area of the portrait region block is smaller than the ratio threshold, the hair region block in the second hair mask map belongs to a passerby, so the region blocks in the second hair mask map that belong to passerby hair can be accurately identified and marked.
In one embodiment, the first screening module 908 is further configured to: perform expansion processing on the hair region block to be filtered, and remove the corresponding region of the expanded hair region to be filtered from the candidate hair image to obtain the target hair image of the portrait subject.
In this embodiment, the expansion processing on the hair region block to be filtered has a neighborhood-expansion effect: the expanded region covers a larger highlighted area in the candidate hair mask image, making the region to be filtered more prominent, so that it can be fully removed from the candidate hair image and the resulting target hair region is more accurate. Because the marked passerby hair region is expanded and thus more conspicuous, the passerby hair region in the candidate hair image can be removed more precisely, yielding a hair image that contains only the portrait subject.
In one embodiment, the apparatus further comprises: and a blurring module. The blurring module is used for performing portrait blurring processing on the image to be processed based on the target hair image and the portrait image to obtain a corresponding blurring image.
In this embodiment, after removing the hair region of the passerby in the candidate hair image, the hair image of the portrait subject is obtained. The portrait blurring is accurately performed according to the hair image and the portrait main body of the portrait main body, and the accuracy and the stability of the portrait blurring are effectively improved.
Fig. 10 is a block diagram showing a configuration of an image recognition model training apparatus according to an embodiment. As shown in fig. 10, the training apparatus for an image recognition model includes:
the sample acquiring module 1002 is configured to acquire a sample image and a label corresponding to the sample image.
And the segmentation module 1004 is configured to perform image segmentation processing on the sample image through an image recognition model to be trained to obtain a predicted portrait image and a first predicted hair image.
A processing module 1006, configured to perform binarization processing on the predicted portrait image and the first predicted hair image respectively to obtain a corresponding predicted portrait mask image and a corresponding predicted hair mask image.
A second determining module 1008, configured to determine a predicted filtered hair region in the predicted hair mask map according to the predicted portrait mask map and the predicted hair mask map.
And a second screening module 1010, configured to screen out a second predicted hair image of the portrait subject from the first predicted hair image based on the predicted filtered hair region.
A training module 1012, configured to train the image recognition model based on a difference between the second predicted hair image and the label, and obtain the trained image recognition model when a training stop condition is satisfied.
In this embodiment, the image recognition model to be trained is trained on the sample image and its corresponding label. The model predicts the hair image of the human body in the sample image, namely the second predicted hair image; the parameters of the model are adjusted based on the difference between the predicted hair image and the label, and training continues until the training stop condition is met, yielding the trained image recognition model with improved precision and accuracy. The trained image recognition model can accurately distinguish the hair of the portrait subject from the hair of passersby in a multi-person scene, so that the hair of the portrait subject is accurately segmented.
The division of the modules in the image processing apparatus and the training apparatus for the image recognition model is only for illustration, and in other embodiments, the image processing apparatus and the training apparatus for the image recognition model may be divided into different modules as needed to complete all or part of the functions of the image processing apparatus and the training apparatus for the image recognition model.
Fig. 11 is a schematic diagram of the internal structure of an electronic device in one embodiment. As shown in fig. 11, the electronic device includes a processor and a memory connected by a system bus. The processor provides computing and control capability and supports the operation of the whole electronic device. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the image processing method and the image recognition model training method provided in the following embodiments. The internal memory provides a cached execution environment for the operating system and the computer program in the non-volatile storage medium. The electronic device may be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like.
The image processing apparatus and the training apparatus for the image recognition model provided in the embodiments of the present application may each be implemented in the form of a computer program. The computer program may run on a terminal or a server, and the program modules constituted by the computer program may be stored in the memory of the terminal or server. When the computer program is executed by a processor, the steps of the methods described in the embodiments of the present application are performed.
The embodiments of the present application also provide a computer-readable storage medium: one or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of the image processing method and/or the training method for the image recognition model.
A computer program product comprising instructions which, when run on a computer, cause the computer to perform the image processing method and/or the training method for the image recognition model.
Any reference to memory, storage, database, or other medium used by the embodiments of the present application may include non-volatile and/or volatile memory. Suitable non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An image processing method, comprising:
acquiring a portrait image and a candidate hair image in an image to be processed;
respectively carrying out binarization processing on the portrait image and the candidate hair image to obtain a corresponding first portrait mask image and a corresponding first hair mask image;
determining a hair region to be filtered in the first hair mask image according to the first portrait mask image and the first hair mask image;
and screening out a target hair image of the portrait subject from the candidate hair images based on the hair region to be filtered.
2. The method according to claim 1, wherein the determining the hair region to be filtered out in the first hair mask image according to the first portrait mask image and the first hair mask image comprises:
respectively carrying out connected domain processing on the first portrait mask image and the first hair mask image to obtain a portrait area block in the first portrait mask image and a hair area block in the first hair mask image;
screening out a portrait region block with the area larger than or equal to an area threshold value from the first portrait mask image to obtain a second portrait mask image;
screening out hair region blocks with the areas within a preset area range from the first hair mask image to obtain a second hair mask image;
and determining a hair region to be filtered in the first hair mask image based on the overlapping region of the second portrait mask image and the second hair mask image.
3. The method according to claim 2, wherein the determining the hair region to be filtered out in the first hair mask image based on the overlapping region of the second portrait mask image and the second hair mask image comprises:
determining the overlapping area of the portrait area blocks in the second portrait mask image and the hair area blocks in the second hair mask image;
and under the condition that the ratio of the overlapping area to the area of the portrait area block is smaller than a ratio threshold value, marking the hair area block as a hair area to be filtered.
4. The method according to claim 1, wherein the step of screening out a target hair image of a portrait subject from the candidate hair images based on the hair region to be filtered out comprises:
and performing expansion processing on the hair region block to be filtered, and removing the corresponding region of the hair region to be filtered after the expansion processing in the candidate hair image to obtain the target hair image of the portrait subject.
5. The method according to any one of claims 1 to 4, further comprising:
and performing portrait blurring processing on the image to be processed based on the target hair image and the portrait image to obtain a corresponding blurred image.
6. A method for training an image recognition model, the method comprising:
obtaining a sample image and a label corresponding to the sample image;
performing image segmentation processing on the sample image through an image recognition model to be trained to obtain a predicted portrait image and a first predicted hair image;
respectively carrying out binarization processing on the predicted portrait image and the first predicted hair image to obtain a corresponding predicted portrait mask image and a corresponding predicted hair mask image;
determining a predicted filtered hair region in the predicted hair mask image according to the predicted portrait mask image and the predicted hair mask image;
screening out a second predicted hair image of the portrait subject from the first predicted hair image based on the predicted filtered hair region;
training the image recognition model based on the difference between the second predicted hair image and the label, and obtaining the trained image recognition model when a training stopping condition is met.
7. An image processing apparatus characterized by comprising:
the image acquisition module is used for acquiring a portrait image and a candidate hair image in the image to be processed;
the binarization module is used for respectively carrying out binarization processing on the portrait image and the candidate hair image to obtain a corresponding first portrait mask image and a corresponding first hair mask image;
the first determining module is used for determining a hair region to be filtered in the first hair mask image according to the first portrait mask image and the first hair mask image;
and the first screening module is used for screening the target hair image of the portrait subject from the candidate hair image based on the hair region to be filtered.
8. An apparatus for training an image recognition model, comprising:
the sample acquisition module is used for acquiring a sample image and a label corresponding to the sample image;
the segmentation module is used for carrying out image segmentation processing on the sample image through an image recognition model to be trained to obtain a predicted portrait image and a first predicted hair image;
the processing module is used for respectively carrying out binarization processing on the predicted portrait image and the first predicted hair image to obtain a corresponding predicted portrait mask image and a corresponding predicted hair mask image;
the second determination module is used for determining a predicted filtered hair area in the predicted hair mask image according to the predicted portrait mask image and the predicted hair mask image;
the second screening module is used for screening out a second predicted hair image of the portrait subject from the first predicted hair image based on the predicted filtered hair region;
and the training module is used for training the image recognition model based on the difference between the second predicted hair image and the label, and obtaining the trained image recognition model when the training stopping condition is met.
9. An electronic device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202011629334.7A 2020-12-30 2020-12-30 Image processing method and device, electronic equipment and computer readable storage medium Active CN112581481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011629334.7A CN112581481B (en) 2020-12-30 2020-12-30 Image processing method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011629334.7A CN112581481B (en) 2020-12-30 2020-12-30 Image processing method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112581481A true CN112581481A (en) 2021-03-30
CN112581481B CN112581481B (en) 2024-04-12

Family

ID=75144483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011629334.7A Active CN112581481B (en) 2020-12-30 2020-12-30 Image processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112581481B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877131A (en) * 2009-04-28 2010-11-03 青岛海信数字多媒体技术国家重点实验室有限公司 Target identification method and device and target identification system
US20180276866A1 (en) * 2013-07-25 2018-09-27 Morphotrust Usa, Llc System and Method for Creating a Virtual Backdrop
CN109117760A (en) * 2018-07-27 2019-01-01 北京旷视科技有限公司 Image processing method, device, electronic equipment and computer-readable medium
CN109978930A (en) * 2019-03-27 2019-07-05 杭州相芯科技有限公司 A kind of stylized human face three-dimensional model automatic generation method based on single image
CN110517278A (en) * 2019-08-07 2019-11-29 北京旷视科技有限公司 Image segmentation and the training method of image segmentation network, device and computer equipment
CN110738595A (en) * 2019-09-30 2020-01-31 腾讯科技(深圳)有限公司 Picture processing method, device and equipment and computer storage medium
CN110991298A (en) * 2019-11-26 2020-04-10 腾讯科技(深圳)有限公司 Image processing method and device, storage medium and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YU Haibin; LIU Jingbiao; LIU Jilin: "Target Region Matching Method for Pedestrian Head Feature Extraction", Journal of Image and Graphics, no. 03, 15 March 2009 (2009-03-15), pages 482-488 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658197A (en) * 2021-08-20 2021-11-16 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113658197B (en) * 2021-08-20 2023-09-05 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and computer readable storage medium
CN114140547A (en) * 2021-12-07 2022-03-04 北京百度网讯科技有限公司 Image generation method and device
CN114140547B (en) * 2021-12-07 2023-03-14 北京百度网讯科技有限公司 Image generation method and device

Also Published As

Publication number Publication date
CN112581481B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
CN110149482B (en) Focusing method, focusing device, electronic equipment and computer readable storage medium
CN110248096B (en) Focusing method and device, electronic equipment and computer readable storage medium
CN113766125B (en) Focusing method and device, electronic equipment and computer readable storage medium
CN110473185B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN110334635B (en) Subject tracking method, apparatus, electronic device and computer-readable storage medium
CN108810418B (en) Image processing method, image processing device, mobile terminal and computer readable storage medium
CN110572573B (en) Focusing method and device, electronic equipment and computer readable storage medium
CN108198152B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN110191287B (en) Focusing method and device, electronic equipment and computer readable storage medium
CN110650291B (en) Target focus tracking method and device, electronic equipment and computer readable storage medium
CN110661977B (en) Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN109712177B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN110660090B (en) Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN107368806B (en) Image rectification method, image rectification device, computer-readable storage medium and computer equipment
US11836903B2 (en) Subject recognition method, electronic device, and computer readable storage medium
CN110881103B (en) Focusing control method and device, electronic equipment and computer readable storage medium
CN113313626A (en) Image processing method, image processing device, electronic equipment and storage medium
CN112581481B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN110365897B (en) Image correction method and device, electronic equipment and computer readable storage medium
CN110378934B (en) Subject detection method, apparatus, electronic device, and computer-readable storage medium
CN113673474B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN113658197B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN113610884A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113223023A (en) Image processing method and device, electronic device and storage medium
CN110688926B (en) Subject detection method and apparatus, electronic device, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant