CN117876394A - Image processing method, electronic device and storage medium - Google Patents

Image processing method, electronic device and storage medium

Info

Publication number
CN117876394A
Authority
CN
China
Prior art keywords
image
feature map
feature
color
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410020516.6A
Other languages
Chinese (zh)
Inventor
敖阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Original Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shuliantianxia Intelligent Technology Co Ltd filed Critical Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority to CN202410020516.6A
Publication of CN117876394A
Legal status: Pending

Classifications

    • G06T 7/11 Region-based segmentation (G06T Image data processing or generation, in general; G06T 7/00 Image analysis; G06T 7/10 Segmentation, edge detection)
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks (G06N Computing arrangements based on specific computational models; G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/0464 Convolutional networks [CNN, ConvNet] (G06N 3/02 Neural networks; G06N 3/04 Architecture)
    • G06N 3/048 Activation functions (G06N 3/02 Neural networks; G06N 3/04 Architecture)
    • G06N 3/084 Backpropagation, e.g. using gradient descent (G06N 3/02 Neural networks; G06N 3/08 Learning methods)
    • G06T 2207/10024 Color image (G06T 2207/00 Indexing scheme for image analysis or image enhancement; G06T 2207/10 Image acquisition modality)
    • G06T 2207/20081 Training; Learning (G06T 2207/20 Special algorithmic details)
    • G06T 2207/20084 Artificial neural networks [ANN] (G06T 2207/20 Special algorithmic details)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application relate to an image processing method, an electronic device and a storage medium. The method includes the following steps: receiving an image to be processed, the image to be processed having three color channels; extracting feature information of the image to be processed to generate a first feature map; performing dilated convolution processing on the first feature map to generate a second feature map; and mapping the second feature map to a segmentation result map through upsampling processing. By performing convolution processing on each color channel of the image to be processed and point convolution processing on the intermediate image obtained after the convolution processing to obtain the first feature map, and then mapping the first feature map to the segmentation result map through the upsampling processing, the method segments the image subject with high accuracy and thereby improves the segmentation accuracy of the image subject in the image.

Description

Image processing method, electronic device and storage medium
[Technical Field]
The present invention relates to the field of image processing technologies, and in particular, to an image processing method, an electronic device, and a storage medium.
[Background Art]
In the digital age, multimedia digital content has become an integral part of people's life, including text, audio, images, and video. With the popularity of mobile devices with cameras and sensors, video has become a new way of communication between internet users.
This trend has driven rapid development of video content understanding technology and its related applications, such as human segmentation for video, because such technology can segment the subject from a complex background and thereby enhance entertainment applications.
However, the scenes in which an image subject moves are often complex, so parts of a complex background are easily mistaken for the subject and segmented together with it, producing artifacts; as a result, the segmentation accuracy of the image subject in the image is low.
[Summary of the Invention]
The image processing method, the electronic device and the storage medium provided in the embodiments of the present application are used to improve the segmentation accuracy of the image subject in an image.
In a first aspect, an embodiment of the present application provides an image processing method. The method comprises the following steps: receiving an image to be processed, wherein the image to be processed has three color channels; extracting feature information of the image to be processed to generate a first feature map; mapping the first feature map into a segmentation result map through up-sampling processing; the extracting the feature information of the image to be processed to generate a first feature map specifically includes: respectively carrying out convolution processing on each color channel, and converting the image to be processed into an intermediate image with the same channel number; and expanding the number of channels of the intermediate image into the number of target channels through point convolution processing to obtain a first feature map.
Optionally, before extracting the feature information of the image to be processed, the method further includes: preprocessing the image to be processed; wherein the preprocessing comprises: determining a reference color channel among the three color channels; respectively carrying out normalization processing on the other two color channels in the three color channels according to the reference color channel to obtain a color correction image; and performing characteristic decoupling processing on the color correction image.
Optionally, the expanding the number of channels of the intermediate image to the number of target channels through point convolution processing, to obtain a first feature map includes: performing point convolution processing on the intermediate image to obtain a second feature map with the number of target channels; and carrying out cavity convolution processing on the second feature map to generate a first feature map.
Optionally, before the mapping the first feature map to the segmentation result map through the upsampling process, the method further includes: performing enhancement processing of edge features on the first feature map; wherein the enhancement processing of the edge feature comprises: performing convolution operation and residual connection on the first feature map to obtain a feature map after residual connection; and stacking the residual connected characteristic diagram and the first characteristic diagram.
Optionally, the performing convolution operation and residual connection on the first feature map to obtain a feature map after residual connection includes: performing convolution operation on the first feature map to obtain a convolved feature map; performing expansion convolution operation on the first feature map to obtain a feature map after expansion convolution; and carrying out residual connection on the characteristic diagram after convolution and the characteristic diagram after expansion convolution to obtain the characteristic diagram after residual connection.
Optionally, the method further comprises: repeatedly performing n iterations on the step of extracting the feature information of the image to be processed and generating a first feature image to obtain n feature images arranged according to a first order, wherein the first order is used for indicating the order of iteration moments from early to late, each feature image in the n feature images corresponds to one iteration moment, and the feature image with the latest iteration moment in the n feature images is determined to be the first feature image.
Optionally, the method further comprises: and repeatedly performing up-sampling processing for n times in the step of mapping the first feature map into the segmentation result map through the up-sampling processing, wherein each feature map in the n feature maps is sequentially connected to the feature map obtained after the corresponding up-sampling processing according to a second order, and the first feature map is mapped into the segmentation result map through the n up-sampling processing, and the second order is used for indicating the sequence from late to early of iteration moments.
Optionally, the normalizing processing is performed on the remaining two color channels of the three color channels according to the reference color channel, so as to obtain a color correction image, including: acquiring a first maximum pixel value in a channel array corresponding to a first color channel, acquiring a second maximum pixel value in a channel array corresponding to a second color channel, and acquiring a third maximum pixel value in the channel array corresponding to the reference color channel, wherein the first color channel and the second color channel are the rest two color channels in the three color channels; calculating a first pixel average value corresponding to the first color channel, a second pixel average value corresponding to the second color channel and a third pixel average value corresponding to the reference color channel based on the image specification of the image to be processed; calculating the ratio between the third maximum pixel value and the first maximum pixel value to obtain a first ratio, and calculating the ratio between the third maximum pixel value and the second maximum pixel value to obtain a second ratio; calculating the ratio between the third pixel average value and the first pixel average value to obtain a third ratio, and calculating the ratio between the third pixel average value and the second pixel average value to obtain a fourth ratio; calculating based on a channel array corresponding to the first color channel, a first preset formula, the first ratio and the third ratio to obtain a first color channel after color correction; calculating based on a channel array corresponding to the second color channel, the first preset formula, the second ratio and the fourth ratio to obtain a second color channel after color correction; and combining the reference color channel, the first color channel after color correction and the second color channel after color correction to obtain a color correction image.
Optionally, the performing feature decoupling processing on the color correction image includes: acquiring a preset feature vector matrix and a preset diagonal matrix; and calculating based on a second preset formula, the feature vector matrix, the diagonal matrix and the color correction image, so as to remove the correlation between the image features in the color correction image.
In a second aspect, embodiments of the present application provide an image processing apparatus. The image processing apparatus includes: the receiving module is used for receiving an image to be processed, and the image to be processed is provided with three color channels; the extraction module is used for extracting the characteristic information of the image to be processed and generating a first characteristic image; the mapping module is used for mapping the first feature map into a segmentation result map through up-sampling processing; the extracting the feature information of the image to be processed to generate a first feature map specifically includes: respectively carrying out convolution processing on each color channel, and converting the image to be processed into an intermediate image with the same channel number; and expanding the number of channels of the intermediate image into the number of target channels through point convolution processing to obtain a first feature map.
In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method as described above.
At least one advantageous aspect of the image processing method provided in the embodiments of the present application is as follows: performing convolution processing on each color channel of the image to be processed and point convolution processing on the intermediate image obtained after the convolution processing to obtain the first feature map, and then mapping the first feature map to the segmentation result map through the upsampling processing, reduces the amount of computation while reducing the loss of image information, transfers the image feature information to the high-dimensional features more effectively, and thus facilitates the extraction of image features, improves the segmentation accuracy of the image subject in the image, and avoids artifacts in the segmented image subject.
[Brief Description of the Drawings]
One or more embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals indicate similar elements; unless otherwise stated, the figures in the drawings do not constitute a scale limitation.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application;
fig. 2 is a method flowchart of an image processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an embodiment of convolution operation provided in an embodiment of the present application;
FIG. 4 is a flow chart of an image processing method according to another embodiment of the present application;
fig. 5 is a schematic flow chart of generating a first feature map from an image to be processed according to an embodiment of the present application;
fig. 6 is a flowchart of mapping a first feature map to a segmentation result map through 3 upsampling processes according to the present embodiment;
fig. 7 is a functional block diagram of an image processing apparatus provided in an embodiment of the present application;
fig. 8 is a functional block diagram of an image processing apparatus according to another embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
[Detailed Description of the Invention]
In order that the invention may be readily understood, a more particular description thereof will be rendered by reference to specific embodiments that are illustrated in the appended drawings. It will be understood that when an element is referred to as being "fixed" to another element, it can be directly on the other element or one or more intervening elements may be present therebetween. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or one or more intervening elements may be present therebetween. The terms "upper," "lower," "inner," "outer," "bottom," and the like as used in this specification are used in an orientation or positional relationship based on that shown in the drawings, merely to facilitate the description of the invention and to simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the invention. Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used in this specification includes any and all combinations of one or more of the associated listed items.
In addition, the technical features mentioned in the different embodiments of the invention described below can be combined with one another as long as they do not conflict with one another.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application. An image subject and a background in an image to be processed are segmented by the image processing method provided in the embodiment of the present application to obtain a segmentation result map, where the segmentation result map has a first type of pixel point set used to indicate the image subject and a second type of pixel point set used to indicate the image background. The image subject may be, for example, a person, an animal or a plant, which is not limited herein. As shown in fig. 1, the image includes a person and a plant; if the person needs to be segmented, the other objects (such as the plant) are determined as the background.
The method and the device can be applied to the field of sports and fitness: the person in each frame of a sports video is segmented so that the human subject is accurately separated, and interactive entertainment can then be realized with the segmented subject.
It should be noted that, for simplicity and ease of presentation, the embodiments of the present application exemplarily show an application scenario of the image processing method in the field of sports and fitness. It will be appreciated by those skilled in the art that, based on similar principles, the image processing method provided in the embodiments of the present application may also be applied to other scenarios in which the image subject and the background need to be segmented.
Fig. 2 is a method flowchart of an image processing method according to an embodiment of the present application. As shown in fig. 2, the image processing method includes the steps of:
S210, receiving an image to be processed, wherein the image to be processed has three color channels.
Wherein, three color channels are respectively: red (R) channel, green (G) channel, and Blue (B) channel.
It should be noted that each color channel corresponds to a channel array, and each channel array includes the pixel value corresponding to each pixel point in the color channel. For example, if the R channel has 5 pixel points and the pixel values of the 5 pixel points are respectively 34, 45, 67, 54 and 60, the channel array corresponding to the R channel is: (34, 45, 67, 54, 60).
S220, extracting feature information of the image to be processed to generate a first feature map, wherein the extracting of the feature information of the image to be processed to generate the first feature map specifically comprises: respectively carrying out convolution processing on each color channel, and converting an image to be processed into an intermediate image with the same channel number; and expanding the number of channels of the intermediate image into the number of target channels through point convolution processing to obtain a first feature map.
It should be noted that the convolution processing is used to perform a convolution operation between each color channel and a convolution kernel (Conv), so as to obtain a feature map, also called an activation map, for each color channel after the convolution processing.
In the convolution operation, the convolution kernel slides over each color channel and performs a weighted summation at each position to generate a new value, which is placed at the corresponding position of the feature map. The size of the feature map is determined by the size and the step size of the convolution kernel. The size of the convolution kernel defines the field of view of the convolution; by way of example and not limitation, the size of the convolution kernel may be 1*1 or 3*3 and may be set according to the actual application scenario. The step size of the convolution kernel defines how far the convolution kernel moves in the image at each step and reflects the extraction precision; by way of example and not limitation, the step size may be 1 or 2 and may be set according to the actual application scenario. As shown in fig. 3, fig. 3 is a schematic diagram of an embodiment of a convolution operation provided in the embodiment of the present application. The R channel has 6*6 pixel points, each pixel point corresponds to a pixel value, the convolution kernel is 3*3 (the values in the convolution kernel are shown in fig. 3 and may be set according to the actual application scenario), and the step size of the convolution kernel is 1, yielding a 4*4 feature map. The convolution kernel is laid over the R channel according to its 3*3 size, and the corresponding pixel values are multiplied and summed; that is, the first new value in the 4*4 feature map is: 3×1+0×0+1×(−1)+1×1+5×0+8×(−1)+2×1+7×0+2×(−1) = −5. The convolution kernel is then moved one unit to the right according to its step size, and similarly the second new value is −4. It should be noted that the movement rule of the convolution kernel is: from left to right, then from top to bottom. The other new values are calculated in the same way and are not described in detail herein.
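For ease of understanding, the sliding-window computation above can be sketched in Python as follows. This is an illustrative sketch only: the kernel values are inferred from the computation of the first new value, and the R-channel values outside the top-left 3*3 patch are hypothetical stand-ins for the values shown in fig. 3, which are not reproduced in this text.

import numpy as np

def conv2d_single_channel(channel: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide the kernel over one color channel (no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = (channel.shape[0] - kh) // stride + 1
    out_w = (channel.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = channel[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # weighted summation at this position
    return out

# Hypothetical 6*6 R channel; only its top-left 3*3 patch is taken from the example above.
r_channel = np.array([
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
])
kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])  # 3*3 kernel, stride 1
feature_map = conv2d_single_channel(r_channel, kernel)
print(feature_map.shape)   # (4, 4)
print(feature_map[0, 0])   # -5.0, matching the first new value computed above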
In some embodiments, when the size of the convolution kernel does not match the size of the image to be convolved, the image to be convolved is padded, so as to avoid losing edge information of the image. For example, if the size of the image to be convolved is 5*5, the size of the convolution kernel is 3*3 and the step size is 1, sliding the convolution kernel over the image only yields a 3*3 feature map: each pixel value at the image edge is convolved only once, while the pixel values in the middle of the image are convolved many times, which loses the edge information of the image. Therefore, to avoid this scenario, the image to be convolved needs to be padded before the convolution processing, so that the pixel values at the image edge can also be convolved many times.
Local features in the image to be processed, such as edges, textures or other visual information, can be extracted through convolution processing, and can be used as input of subsequent layers for feature extraction and image analysis of deeper layers.
It should be noted that in the execution step of the convolution processing for each color channel, the size of the convolution kernel is set to 3*3, the step size of the convolution kernel is set to 1, and the number of convolution kernels for each color channel is 1.
It should be noted that the point convolution process is used to: the number of channels of the input data (intermediate image) is changed.
Point-wise convolution is a special convolution operation whose convolution kernel has a size of 1*1.
Specifically, point convolution multiplies each value of the convolution kernel with the input data (the intermediate image) element by element and sums the products to generate a new value, which is placed at the corresponding location of the feature map. Multiple channels of the input data (the intermediate image) can be processed simultaneously by multiple 1*1 convolution kernels to yield a feature map having a target number of channels, where the target number of channels is the same as the number of 1*1 convolution kernels.
It should be noted that the dimensional transformation and compression of the image features can be achieved by changing the number of channels of the input data (the intermediate image), i.e., increasing or decreasing its number of channels. For example, if the input data (the intermediate image) has 3 channels, the number of 1*1 convolution kernels is 16 and the step size of the convolution kernels is 1, the number of channels of the intermediate image is expanded to the target channel number of 16 by the point convolution processing, thereby obtaining the first feature map.
It should be noted that in the execution step of performing the point convolution processing on the intermediate image, the size of the convolution kernel is set to 1*1, the step size of the convolution kernel is set to 1, and the number of convolution kernels is 16.
For example, an image to be processed having three color channels is converted into a first feature map having 16 channels, where the size of the image to be processed is 256×256. In the prior art, the convolution operation is generally performed directly on the image to be processed with 16 convolution kernels of 3*3 to obtain the first feature map having 16 channels, and the amount of computation in the prior art is: 3×3×3×16×(256−3+1)×(256−3+1) = 27870912. With the method provided in the embodiments of the present application, the image to be processed is first convolved with 3 convolution kernels of 3*3 to obtain an intermediate image with the same number of channels, where each 3*3 convolution kernel corresponds to one color channel of the image to be processed, and the intermediate image is then convolved with 16 convolution kernels of 1*1 to obtain a first feature map having 16 channels. The amount of computation in the embodiment of the present application is: 3×3×3×(256−3+1)×(256−3+1) + 1×1×16×(256−3+1)×(256−3+1) = 1741932 + 1032256 = 2774188. In summary, compared with the prior art, the embodiment of the present application greatly reduces the amount of computation.
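The channel-by-channel convolution followed by the point convolution corresponds to what is commonly called a depthwise separable convolution. The following sketch assumes a PyTorch implementation (the embodiments do not name a framework) and uses padding so that the 256×256 size mentioned later is preserved; the weight counts in the last lines are a simpler comparison than the per-position multiplication counts given above and are not taken from the patent.

import torch
import torch.nn as nn

# Depthwise step: one 3*3 kernel per color channel (groups=3), stride 1; padding=1 is an
# assumption that keeps the 256*256 spatial size, giving an intermediate image with 3 channels.
depthwise = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3,
                      stride=1, padding=1, groups=3, bias=False)
# Pointwise step: 16 kernels of size 1*1 expand the 3 channels to the 16 target channels.
pointwise = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=1,
                      stride=1, bias=False)

image = torch.randn(1, 3, 256, 256)          # image to be processed (batch of 1)
intermediate = depthwise(image)              # (1, 3, 256, 256) intermediate image
first_feature_map = pointwise(intermediate)  # (1, 16, 256, 256) first feature map

# Weight counts of the two factorized layers versus a single standard 3*3 convolution:
separable_weights = 3 * 3 * 3 + 1 * 1 * 3 * 16   # 27 + 48 = 75
standard_weights = 3 * 3 * 3 * 16                # 432
print(separable_weights, standard_weights)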
S230, mapping the first feature map into a segmentation result map through up-sampling processing.
In convolution processing, the output image after feature extraction by the convolution operation tends to become smaller, and the image needs to be restored to its original size for further computation, i.e., mapped from a low resolution to a high resolution; this process is called upsampling. The upsampling processing is therefore used to increase the resolution of the image and improve the expression of the image's detail information.
It should be noted that the segmentation result map has a first type of pixel set for indicating the image subject and a second type of pixel set for indicating the image background.
It should be noted that the up-sampling process is repeatedly performed n times in the step of mapping the first feature map into the division result map by the up-sampling process, where n is a positive integer, it is understood that the up-sampling process is performed only once when n is 1. For example, n may be 1 or 2, which is not limited herein, and may be set according to an actual application scenario.
At least one advantageous aspect of the image processing method provided in the embodiments of the present application is as follows: performing convolution processing on each color channel of the image to be processed and point convolution processing on the intermediate image obtained after the convolution processing to obtain the first feature map, and then mapping the first feature map to the segmentation result map through the upsampling processing, reduces the amount of computation while reducing the loss of image information, transfers the image feature information to the high-dimensional features more effectively, and thus facilitates the extraction of image features, improves the segmentation accuracy of the image subject in the image, and avoids artifacts in the segmented image subject.
Fig. 4 is a flowchart of a method for image processing according to another embodiment of the present application. As shown in fig. 4, the image processing method includes the steps of:
S410, receiving an image to be processed, wherein the image to be processed has three color channels.
In some embodiments, receiving the image to be processed further includes: receiving the image to be processed and processing the image specification of the image to be processed into a preset image specification.
As an example and not by way of limitation, the preset image specification may be 256×256 pixels, or may be 512×512 pixels, and the specific image specification may be set according to the actual application scenario, which is not limited herein.
S420, preprocessing the image to be processed.
Specifically, the pretreatment includes: (1) Among the three color channels, one reference color channel is determined; (2) Respectively carrying out normalization processing on the other two color channels in the three color channels according to the reference color channel to obtain a color correction image; and (3) performing characteristic decoupling processing on the color correction image.
It should be noted that the normalization processing is used to adjust the pixel values of the pixel points in the remaining two color channels so that they are close to the pixel values of the pixel points in the reference color channel.
It should be noted that the feature decoupling processing is used to remove the correlation between the image features in the color correction image.
It will be appreciated that an image typically has the following image features: color, texture, shape, pattern, etc., wherein each pixel in the image has a color value (pixel value) that can be expressed in the form of RGB (red, green, blue) or HSV (hue, saturation, brightness), etc. Texture in an image may be described as a visually repeating pattern, such as brick walls, grass, etc. The shape in the image may be described as a contour or edge of the object. The pattern in the image may be described as a repeating structure or layout of the object, such as a grid, pattern, or the like.
In an image, the correlation between image features may refer to the association between different image features. These image features may be pixel-level features or higher-level semantic features. Illustratively, (1) color and texture: in natural images, certain colors and texture features may be related. For example, a green pixel will typically appear in a grass texture, while a blue pixel may be associated with a sky texture. Thus, in certain areas in the image, there may be some correlation between color and texture features. (2) shape and edge: shape features in an image are typically related to edge features. Edges are regions of apparent brightness or color change in an image, and shape features can be extracted by detecting edges. Thus, there is a close correlation between the shape features and the edge features. (3) objects and contexts: in an image, there may be a correlation between objects and their surrounding context. For example, when we see one cat, the surrounding background and environment may provide useful information about the cat's location, size, and class. Thus, there may be a correlation between the object features and the context features.
Thus, understanding the correlation between image features in an image can help to better understand the image content and provide more accurate information in image processing and computer vision tasks.
In some embodiments, according to the reference color channel, normalization processing is performed on the remaining two color channels of the three color channels, so as to obtain a color correction image, which specifically includes: (1) Acquiring a first maximum pixel value in a channel array corresponding to a first color channel, acquiring a second maximum pixel value in a channel array corresponding to a second color channel, and acquiring a third maximum pixel value in a channel array corresponding to a reference color channel, wherein the first color channel and the second color channel are the rest two color channels in the three color channels; (2) Calculating a first pixel average value corresponding to a first color channel, a second pixel average value corresponding to a second color channel and a third pixel average value corresponding to a reference color channel based on the image specification of the image to be processed; (3) Calculating the ratio between the third maximum pixel value and the first maximum pixel value to obtain a first ratio, and calculating the ratio between the third maximum pixel value and the second maximum pixel value to obtain a second ratio; (4) Calculating the ratio between the third pixel average value and the first pixel average value to obtain a third ratio, and calculating the ratio between the third pixel average value and the second pixel average value to obtain a fourth ratio; (5) Calculating based on a channel array corresponding to the first color channel, a first preset formula, a first ratio and a third ratio to obtain a first color channel after color correction; (6) Calculating based on a channel array corresponding to the second color channel, a first preset formula, a second ratio and a fourth ratio to obtain a second color channel after color correction; (7) And combining the reference color channel, the first color channel after color correction and the second color channel after color correction to obtain a color correction image.
It should be noted that, in the first preset formula, A is used to indicate the channel array corresponding to one of the remaining two color channels among the three color channels, Â is used to indicate that color channel after color correction, X_max is used to indicate the first ratio or the second ratio, X_avg is used to indicate the third ratio or the fourth ratio, α is used to indicate the preset proportional parameter, and c is used to indicate the random parameter. By way of example and not limitation, the preset proportional parameter may be 0.5 or 0.6 and may be set according to the actual application scenario, which is not limited herein. The value range of the random parameter is [−0.005, +0.005].
For example, let F(I) = {R', G', B'}, where I indicates the image to be processed, F(I) indicates the three channel arrays corresponding to the image to be processed I, R' indicates the channel array corresponding to the R channel, G' indicates the channel array corresponding to the G channel, and B' indicates the channel array corresponding to the B channel. Assume that the image specification of the image to be processed is 4*4 and that the G channel is determined as the reference color channel, so the remaining two color channels are the R channel and the B channel. Thus, the G channel, the R channel and the B channel each have 4*4 pixel points, and G' is the 4*4 array of pixel values of the G channel.
The maximum pixel value in the channel array G' corresponding to the G channel is θ_max(G') = 90. In the same manner, the maximum pixel value θ_max(R') in the channel array R' corresponding to the R channel and the maximum pixel value θ_max(B') in the channel array B' corresponding to the B channel are obtained.
The pixel average value of the channel array G' corresponding to the G channel is θ_avg(4, G'), i.e., the sum of the 4*4 pixel values of the G channel divided by 4×4. Similarly, the pixel average value of the channel array R' corresponding to the R channel is θ_avg(4, R'), and the pixel average value of the channel array B' corresponding to the B channel is θ_avg(4, B'). Thus, the ratio between θ_max(G') and θ_max(R') is the first ratio, the ratio between θ_max(G') and θ_max(B') is the second ratio, the ratio between θ_avg(4, G') and θ_avg(4, R') is the third ratio, and the ratio between θ_avg(4, G') and θ_avg(4, B') is the fourth ratio. According to the first preset formula, the color-corrected R channel and the color-corrected B channel are calculated from these ratios, and the color correction image is obtained by combining the G channel, the color-corrected R channel and the color-corrected B channel.
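Steps (1) to (7) above can be sketched in Python as follows. Because the first preset formula itself is not reproduced in this text, the correct_channel helper below combines the two ratios with the proportional parameter α and the random parameter c in one plausible way; it is a labeled assumption, not the patent's actual formula.

import numpy as np

def correct_channel(a, r_max, r_avg, alpha=0.5, c=0.0):
    # ASSUMPTION: the first preset formula is not given in this text; this simply blends
    # the max-based and mean-based ratios with the proportional parameter alpha and the
    # random parameter c (c is drawn from [-0.005, +0.005] in the text).
    return a * (alpha * r_max + (1.0 - alpha) * r_avg + c)

def color_correct(image, ref=1):
    """image: H*W*3 array (R, G, B); ref: index of the reference color channel (G here)."""
    channels = [image[..., i].astype(np.float64) for i in range(3)]
    ref_ch = channels[ref]
    corrected = []
    for i, ch in enumerate(channels):
        if i == ref:
            corrected.append(ch)               # the reference channel is kept unchanged
            continue
        r_max = ref_ch.max() / ch.max()        # first / second ratio
        r_avg = ref_ch.mean() / ch.mean()      # third / fourth ratio
        c = np.random.uniform(-0.005, 0.005)   # random parameter
        corrected.append(correct_channel(ch, r_max, r_avg, alpha=0.5, c=c))
    return np.stack(corrected, axis=-1)        # combined color correction image

corrected = color_correct(np.random.randint(1, 256, (4, 4, 3)))
print(corrected.shape)  # (4, 4, 3)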
In some embodiments, performing the feature decoupling processing on the color correction image specifically includes: (1) acquiring a preset feature vector matrix and a preset diagonal matrix; and (2) calculating based on a second preset formula, the feature vector matrix, the diagonal matrix and the color correction image, so as to remove the correlation between the image features in the color correction image.
It should be noted that the second preset formula can be written as Ĩ = V · D^(−1/2) · V^T · I_c, where Ĩ is used to indicate the image to be processed after the feature decoupling processing, I_c is used to indicate the color correction image, V is used to indicate the feature vector matrix, D is used to indicate the diagonal matrix containing the eigenvalues, D^(−1/2) is the matrix in which each element is the reciprocal of the square root of the corresponding element in the diagonal matrix D, and V^T is used to indicate the transpose of the feature vector matrix V.
It should be noted that, the image to be processed after the feature decoupling processing has the characteristics of zero mean and unit covariance matrix, so that the correlation between the image features in the image can be eliminated.
It should be noted that the feature vector matrix is a matrix composed of feature vectors, with one feature vector per column. An n×m feature vector matrix can be written in the following form: V = [a_11 a_12 ... a_1m; a_21 a_22 ... a_2m; ...; a_n1 a_n2 ... a_nm], where the values of a_11, a_22, ..., a_nm may be the same and may be set according to the actual application scenario, which is not limited herein.
Illustratively, the feature vector matrix V is a 3*3 matrix with three feature vectors: v1 = [1, 0, 0], v2 = [0, 1, 0] and v3 = [0, 0, 1]; thus, the feature vector matrix V is the 3*3 identity matrix whose columns are v1, v2 and v3.
It should be noted that the diagonal matrix is a diagonal matrix composed of eigenvalues, where the elements on the diagonal correspond to the eigenvalues. An n×m diagonal matrix can be written in the following form: D = [λ_1 0 ... 0; 0 λ_2 ... 0; ...; 0 0 ... λ_m], where the values of λ_1, λ_2, ..., λ_m may be the same or different and may be set according to the actual application scenario, which is not limited herein.
Illustratively, the diagonal matrix D is a 3*3 matrix whose eigenvalues are λ_1 = 4, λ_2 = 9 and λ_3 = 16; thus, D = diag(4, 9, 16). Since each element of D^(−1/2) is the reciprocal of the square root of the corresponding element in the diagonal matrix D, D^(−1/2) = diag(1/2, 1/3, 1/4).
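A minimal Python sketch of the feature decoupling step, assuming the second preset formula is the transform V · D^(−1/2) · V^T reconstructed above and that it is applied to each pixel's 3-channel vector (the exact data layout is not specified in the text and is an assumption here):

import numpy as np

def feature_decouple(image, V, D):
    """Apply the decoupling transform V @ D^(-1/2) @ V.T to each pixel's 3-channel vector."""
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))   # element-wise 1/sqrt of the eigenvalues
    W = V @ d_inv_sqrt @ V.T                           # decoupling (whitening) matrix
    h, w, c = image.shape
    pixels = image.reshape(-1, c)                      # (H*W, 3) per-pixel feature vectors
    decoupled = pixels @ W.T                           # transform every pixel
    return decoupled.reshape(h, w, c)

# Values from the example above: V is the 3*3 identity matrix, D = diag(4, 9, 16).
V = np.eye(3)
D = np.diag([4.0, 9.0, 16.0])
color_corrected = np.random.rand(256, 256, 3)          # color correction image (assumed layout)
decoupled = feature_decouple(color_corrected, V, D)
print(decoupled.shape)                                  # (256, 256, 3), size and channels unchanged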
By way of example, assume that the image to be processed has a size of 256×256 and 3 channels; then the color correction image also has a size of 256×256 and 3 channels, and the image to be processed after the feature decoupling processing likewise has a size of 256×256 and 3 channels.
In this embodiment, not only are the RGB channels of the image to be processed adaptively normalized independently of one another, but the whole image is also subjected to feature processing; feature decoupling through the feature vector matrix and the diagonal matrix better highlights the image features, so that the effective feature information in the image can be better distinguished.
S430, extracting feature information of an image to be processed to generate a first feature map, wherein the extracting of the feature information of the image to be processed to generate the first feature map specifically comprises: respectively carrying out convolution processing on each color channel, and converting an image to be processed into an intermediate image with the same channel number; and expanding the number of channels of the intermediate image into the number of target channels through point convolution processing to obtain a first feature map.
It should be noted that, in the execution step of the convolution processing for each color channel, the size of the convolution kernel is set to: 3*3, the step size of the convolution kernel is set to: 1, the number of convolution kernels corresponding to each color channel is: 1.
it should be noted that, in the execution step of performing the point convolution processing on the intermediate image, the size of the convolution kernel is set to: 1*1, the step size of the convolution kernel is set to: 1, the number of convolution kernels is: 16.
Illustratively, based on the example of step S420, the image to be processed after the feature decoupling processing has a size of 256×256 and 3 channels; after the convolution processing is performed on each color channel, the intermediate image has a size of 256×256 and 3 channels; after the point convolution processing, the first feature map has a size of 256×256 and 16 channels.
It should be noted that the size and number of convolution kernels have a significant impact on the performance and feature-extraction ability of the neural network. The horizontal and vertical comparisons are as follows.
1. Horizontal comparison: (1) influence of the size of the convolution kernel: the size of the convolution kernel determines the size of the features the neural network can capture; a small convolution kernel lets the network capture finer details but may lead to overfitting, while a large convolution kernel lets the network capture more features but may lead to a loss of information. (2) Influence of the number of convolution kernels: the number of convolution kernels determines the complexity of the features the neural network can learn; with few convolution kernels the network may fail to capture sufficiently complex features, and with many convolution kernels the network may learn more complex features but may overfit.
2. Vertical comparison: (1) influence of the size of the convolution kernel: smaller convolution kernels can learn features with fewer parameters, reducing the risk of overfitting, while larger convolution kernels can learn more features but require more parameters and may overfit. (2) Influence of the number of convolution kernels: fewer convolution kernels learn simpler features, reducing the risk of overfitting, while more convolution kernels can learn more complex features but require more parameters and may overfit.
In some embodiments, expanding the number of channels of the intermediate image to the target channel number by the point convolution processing to obtain a first feature map specifically includes: (1) performing the point convolution processing on the intermediate image to obtain a second feature map with the target number of channels; and (2) performing dilated convolution processing on the second feature map to generate the first feature map.
Dilated convolution (atrous convolution), also translated as hole convolution or expansion convolution, introduces a new parameter called the "dilation rate" into the convolution layer; the dilation rate defines the spacing between the values sampled by the convolution kernel when it processes the data.
In the execution step of performing the dilated convolution processing on the second feature map, the size of the convolution kernel is set to: 3*3, the step size of the convolution kernel is set to: 1, and the number of convolution kernels is: 16.
In the embodiment of the present application, the 1*1 convolution kernel used in the execution step of the point convolution processing is enlarged to a 3*3 convolution kernel in the execution step of the dilated convolution processing.
Illustratively, the intermediate image is subjected to the point convolution processing to obtain a second feature map with a size of 256×256 and 16 channels; the dilated convolution processing is then performed on the second feature map to obtain a first feature map with a size of 128×128 and 16 channels.
It should be noted that the dilated convolution processing is used to expand the receptive field so as to obtain multi-scale image information. The receptive field is the size of the area on the input image (the second feature map) onto which one pixel point of a feature map in the convolutional neural network is mapped.
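A sketch of the dilated convolution step under the settings just listed (16 kernels of 3*3, stride 1), assuming PyTorch and a dilation rate of 2; the text does not state the actual dilation rate, and the later reduction to 128×128 presumably comes from the stride-2 pooling described below rather than from this layer.

import torch
import torch.nn as nn

second_feature_map = torch.randn(1, 16, 256, 256)

# 3*3 kernels with dilation 2 cover a 5*5 receptive field per position without extra weights.
dilated_conv = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3,
                         stride=1, padding=2, dilation=2, bias=False)
out = dilated_conv(second_feature_map)
print(out.shape)  # torch.Size([1, 16, 256, 256]); spatial size preserved by padding=2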
In some embodiments, before the dilated convolution processing is performed on the second feature map, the method further includes: (1) processing the second feature map sequentially through a preset activation function and a preset batch normalization layer to obtain a normalized second feature map; and (2) performing residual connection between the normalized second feature map and the image to be processed to obtain a second feature map after residual connection.
Illustratively, the second feature map has a size of 256×256 and 16 channels, and the second feature map after residual connection also has a size of 256×256 and 16 channels.
the activation function (Activation Function) is a function added to the neural network that is intended to help the neural network learn complex patterns in the data. Similar to the neuron-based model in the human brain, the activation function ultimately determines what is to be transmitted to the next neuron, i.e., the function running on the neuron of the neural network, responsible for mapping the input of the neuron to the output.
The activation functions include, but are not limited to, a linear rectification function (Rectified Linear Unit, reLU), a squeezing function (squashing function), and a hyperbolic tangent function (Hyperbolic Tangent, tanh), and the specific activation functions may be set according to the actual application scenario, and are not limited herein.
It should be noted that the activation function plays a vital role in the neural network. Its main role is to introduce nonlinear factors, thereby improving the expressive capacity of the neural network model and solving problems that a linear model cannot solve. Without the activation function, the neural network can only represent a linear mapping and cannot handle complex nonlinear problems. Thus, the activation function is an integral part of the neural network.
The role of the activation function can be explained by the following two aspects: (1) introducing a nonlinear factor: the main function of the activation function is to convert the input signal of the neuron into an output signal, which conversion process requires the introduction of a non-linear factor. If the function is not activated, the output of the neuron is a linear function of the input and the nonlinear problem cannot be dealt with. The activation function can perform nonlinear transformation on the input signal, so that the neural network has stronger expression capability and can solve the problem of more complexity. (2) implementing a de-linearization: the activation function may perform a nonlinear transformation on the input signal (i.e., the second feature map obtained after the point convolution process), thereby implementing the de-linearization. This process can make the neural network more stable, reducing the risk of overfitting.
The batch normalization layer (Batch Normalization Layer, BNL) is a widely used way in deep learning, which can accelerate training of the neural network and improve accuracy of the model, and the batch normalization layer makes the pixel value distribution in the second feature map more stable by performing normalization operation on data of each batch (i.e. data output by the activation function).
In conventional neural networks, the output of each layer is obtained by a nonlinear transformation of the output of the previous layer. However, as the depth of the network increases, the output of the previous layer may be excessively compressed or stretched, resulting in loss or duplication of information. In this case, the performance of the network may be affected, and gradient vanishing or gradient explosion may occur.
Residual connection is a common technique in neural networks; its role is to alleviate the gradient vanishing and gradient explosion problems while also helping the model converge faster. Residual connections are commonly applied in neural networks comprising many layers, such as the residual network (ResNet) and the densely connected network (DenseNet).
The residual connection solves the above-mentioned problems by adding a cross-layer connection between the output and the input of a layer. More specifically, the residual connection adds the output of the previous layer directly to the output of the current layer (i.e., the normalized second feature map is residual-connected to the image to be processed), thereby providing a path that bypasses the nonlinear transformation. Thus, the residual connection is used to preserve important image information after the image information has been compressed or stretched, and to avoid gradient vanishing or gradient explosion.
As shown in fig. 5, fig. 5 is a schematic flowchart of generating a first feature map from an image to be processed provided in the embodiment of the present application. The image to be processed is subjected to the convolution processing on each color channel to obtain an intermediate image with the same number of channels; the intermediate image is subjected to the point convolution processing to obtain a second feature map with the target number of channels; the second feature map is processed sequentially by the activation function and the batch normalization layer to obtain a normalized second feature map; the normalized second feature map is residual-connected with the image to be processed to obtain a second feature map after residual connection; and the second feature map after residual connection is subjected to the dilated convolution processing to generate the first feature map.
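Putting the pieces of fig. 5 together, the following Python sketch shows one such encoder block under the stated settings. PyTorch, the ReLU choice, the 1*1 skip projection and the stride-2 average pooling are illustrative assumptions: the text does not spell out how the 3-channel input is residual-connected to the 16-channel feature map or exactly how the pooling is applied.

import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Depthwise conv -> pointwise conv -> activation -> batch norm -> residual -> dilated conv."""
    def __init__(self, in_ch=3, out_ch=16):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=1, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, stride=1)
        self.act = nn.ReLU()                                # preset activation function (assumed ReLU)
        self.bn = nn.BatchNorm2d(out_ch)
        # Assumed 1*1 projection so the input can be residual-connected to the 16-channel map.
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=1)
        self.dilated = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=2, dilation=2)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)   # stride-2 pooling halves the size

    def forward(self, x):
        intermediate = self.depthwise(x)          # intermediate image, same channel number
        second = self.pointwise(intermediate)     # second feature map, target channel number
        second = self.bn(self.act(second))        # normalized second feature map
        second = second + self.skip(x)            # residual connection with the input
        first = self.dilated(second)              # dilated convolution
        return self.pool(first)                   # first feature map at half resolution

block = EncoderBlock()
print(block(torch.randn(1, 3, 256, 256)).shape)   # torch.Size([1, 16, 128, 128])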
In some embodiments, performing the dilated convolution processing on the second feature map to generate the first feature map specifically includes: (1) performing the dilated convolution processing on the second feature map to obtain a feature map after dilated convolution; and (2) performing global average pooling processing on the feature map after dilated convolution to obtain the first feature map.
Global average pooling is a special average pooling operation that divides the input feature map (the feature map after dilated convolution) into non-overlapping rectangular areas and averages the values in each rectangular area; the global average pooling processing is thus used to average the pixel values of the pixel points in the feature map after dilated convolution. Through the global average pooling processing, the size of the feature map can be effectively reduced and the amount of computation decreased, while important feature information is retained.
It should be noted that the step size of the global average pooling is set to: 2.
S440, performing edge feature enhancement processing on the first feature map.
In some embodiments, the enhancement processing of the edge feature specifically includes: (1) Performing convolution operation and residual connection on the first feature map to obtain a feature map after residual connection; (2) And stacking the residual connected characteristic diagram and the first characteristic diagram.
Stacking the feature map after residual connection and the first feature map can be understood as performing a stitching operation on the two feature maps in the channel dimension. Illustratively, the dimensions of the feature map after residual connection are [H, W, C1] and the dimensions of the first feature map are [H, W, C2], where H indicates the height, W indicates the width, and C1 and C2 are the channel numbers of the two feature maps. The two feature maps are spliced in the channel dimension to generate a new first feature map whose size is [H, W, C1+C2], where C1+C2 is the total number of channels after connection. Illustratively, the stitching of feature maps may be performed by stitching functions, including but not limited to torch.stack() and torch.cat(). Assuming that the number of channels of the feature map after residual connection is 3, the number of channels of the first feature map is 4, and both feature maps have a spatial size of 32×32, the new first feature map generated after connection along the channel dimension has the dimensions [32, 32, 7].
By stacking the residual connected feature map and the first feature map, information on channel dimensions of the two feature maps can be fused, so that information of the two feature maps can be simultaneously utilized to perform subsequent feature extraction and processing.
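The stacking described above is a concatenation along the channel dimension. A minimal sketch of the 3-channel plus 4-channel example, assuming PyTorch (whose tensor layout is N, C, H, W, so the channel dimension is dim=1):

import torch

residual_feat = torch.randn(1, 3, 32, 32)   # feature map after residual connection, 3 channels
first_feat = torch.randn(1, 4, 32, 32)      # first feature map, 4 channels

stacked = torch.cat([residual_feat, first_feat], dim=1)  # concatenate along the channel dimension
print(stacked.shape)  # torch.Size([1, 7, 32, 32]) -> C1 + C2 = 7 channels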
In some embodiments, performing convolution operation and residual connection on the first feature map to obtain a feature map after residual connection, which specifically includes: (1) Performing convolution operation on the first feature map to obtain a convolved feature map; (2) Performing expansion convolution operation on the first feature map to obtain a feature map after expansion convolution; (3) And carrying out residual connection on the characteristic map after convolution and the characteristic map after expansion convolution to obtain the characteristic map after residual connection.
It should be noted that, in the step of performing the convolution operation on the first feature map, the size of the convolution kernel is set to 1×1, the step size of the convolution kernel is set to 1, and the number of convolution kernels is 64. In the step of performing the expansion convolution operation on the first feature map, the size of the convolution kernel is set to 3×3, the step size of the convolution kernel is set to 1, the dilation ratio of the convolution is set to a preset value, and the number of convolution kernels is 64.
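Illustratively, a minimal sketch of the convolution operation, the expansion convolution operation and the residual connection is shown below. The element-wise addition used as the "residual connection", the 64 input channels and the dilation rate of 2 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EdgeEnhanceBlock(nn.Module):
    # 1x1 convolution (stride 1, 64 kernels) and 3x3 expansion (dilated) convolution
    # (stride 1, 64 kernels) applied to the same input, combined by element-wise
    # addition as the "residual connection". The 64 input channels and the dilation
    # rate of 2 are illustrative assumptions.
    def __init__(self, channels: int = 64, dilation: int = 2):
        super().__init__()
        self.pointwise = nn.Conv2d(channels, 64, kernel_size=1, stride=1)
        self.dilated = nn.Conv2d(channels, 64, kernel_size=3, stride=1,
                                 padding=dilation, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(x) + self.dilated(x)  # feature map after residual connection

first_feature_map = torch.randn(1, 64, 16, 16)                 # hypothetical first feature map (NCHW)
residual_map = EdgeEnhanceBlock()(first_feature_map)           # feature map after residual connection
stacked = torch.cat([residual_map, first_feature_map], dim=1)  # stacking along the channel dimension
print(stacked.shape)  # torch.Size([1, 128, 16, 16])
```

The embodiments below that repeat the convolution operation and residual connection a preset number of times can apply such a block repeatedly before the stacking step.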
According to this embodiment, the edge feature enhancement processing is performed on the first feature map, which improves the extraction of image details and edge features.
In some embodiments, in the step of performing convolution operation and residual connection on the first feature map, the convolution operation and residual connection on the first feature map are repeatedly performed based on a preset number of times, so as to obtain a feature map after residual connection.
The preset number of times is greater than or equal to 2, which is exemplified by but not limited to 2 or 3, and the specific preset number of times may be set according to the actual application scenario, which is not limited herein.
For example, the preset number of times is 2, and repeatedly performing the convolution operation and residual connection on the first feature map based on the preset number of times to obtain the feature map after residual connection can be understood as follows: (1) the convolution operation and residual connection are performed on the first feature map for the first time to obtain a feature map after the first residual connection; (2) the convolution operation and residual connection are performed again on the feature map after the first residual connection to obtain the feature map after residual connection.
S450, mapping the first feature map into a segmentation result map through up-sampling processing.
In some embodiments, the step of extracting feature information of the image to be processed and generating a first feature map is repeatedly performed for n iterations, so as to obtain n feature maps arranged according to a first order, where the first order is used to indicate an order from early to late of iteration time, each feature map in the n feature maps corresponds to an iteration time, and a feature map in the n feature maps with the latest iteration time is determined as the first feature map.
In some embodiments, the step of mapping the first feature map to the segmentation result map by the upsampling process is repeatedly performed n times, wherein each of the n feature maps is sequentially connected to the feature map obtained after the corresponding respective upsampling process in a second order, and the first feature map is mapped to the segmentation result map by the n upsampling processes, and the second order is used to indicate the order of the iteration moments from late to early.
In some embodiments, mapping the first feature map into the segmentation result map through the upsampling processing specifically includes: (1) performing a convolution operation on the first feature map, and upsampling the feature map after the convolution operation to obtain an upsampled feature map; (2) processing the upsampled feature map sequentially through a preset activation function and a preset batch normalization layer to obtain the segmentation result map.
For example, assume n=3, the size of the image to be processed is 256×256, and the number of channels is 3. Fig. 6 is a schematic flow chart of mapping the first feature map into the segmentation result map through 3 upsampling processes according to this embodiment, as shown in fig. 6. The extraction step is repeated 3 times to obtain 3 feature maps arranged in the first order, namely: feature map a1 (size: 128×128, number of channels: 16), feature map a2 (size: 64×64, number of channels: 32), and feature map a3 (size: 32×32, number of channels: 64), where feature map a3 is the first feature map.
(1) A first convolution operation is performed on the first feature map, and the feature map after the first convolution operation is upsampled to obtain a first upsampled feature map (size: 64×64, number of channels: 64); the first upsampled feature map is connected with feature map a3, and the connected feature map is processed sequentially through a preset activation function and a preset batch normalization layer to obtain a first decoded feature map (size: 64×64, number of channels: 128).
(2) A second convolution operation is performed on the first decoded feature map, and the feature map after the second convolution operation is upsampled to obtain a second upsampled feature map (size: 128×128, number of channels: 32); the second upsampled feature map is connected with feature map a2, and the connected feature map is processed sequentially through the activation function and the batch normalization layer to obtain a second decoded feature map (size: 128×128, number of channels: 64).
(3) A third convolution operation is performed on the second decoded feature map, and the feature map after the third convolution operation is upsampled to obtain a third upsampled feature map (size: 256×256, number of channels: 16); the third upsampled feature map is connected with feature map a1, and the connected feature map is processed sequentially through the activation function and the batch normalization layer to obtain a third decoded feature map (size: 256×256, number of channels: 32).
(4) A fourth convolution operation is performed on the third decoded feature map, and the feature map after the fourth convolution operation is processed through the activation function and the batch normalization layer to obtain the segmentation result map (size: 256×256, number of channels: 1).
In the four convolution operations, the size of each convolution kernel is set to 3×3 and the step size of each convolution kernel is set to 1; the numbers of convolution kernels corresponding to the first, second, third and fourth convolution operations are 64, 32, 16 and 1, respectively.
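Illustratively, a minimal sketch of a single decoding step (convolution, upsampling, connection with an encoder feature map, activation function, batch normalization) is shown below. The ReLU activation, the bilinear upsampling and the channel counts in the usage lines are illustrative assumptions and do not reproduce the exact numbers of the example above.

```python
import torch
import torch.nn as nn

class DecodeStep(nn.Module):
    # One decoding step: convolution, 2x upsampling, connection with an encoder
    # feature map, then activation function and batch normalization. ReLU and
    # bilinear upsampling are illustrative assumptions.
    def __init__(self, in_ch: int, out_ch: int, skip_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.act = nn.ReLU(inplace=True)
        self.bn = nn.BatchNorm2d(out_ch + skip_ch)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(self.conv(x))
        x = torch.cat([x, skip], dim=1)  # connect with the encoder feature map
        return self.bn(self.act(x))

deep = torch.randn(1, 64, 32, 32)   # deepest feature map (the first feature map)
skip = torch.randn(1, 32, 64, 64)   # encoder feature map at the matching resolution
decoded = DecodeStep(in_ch=64, out_ch=64, skip_ch=32)(deep, skip)
print(decoded.shape)  # torch.Size([1, 96, 64, 64])
```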
In some embodiments, after mapping the first feature map to the segmentation result map by the upsampling process, further comprising: (1) Calculating based on a preset loss function formula, a segmentation result diagram and a preset labeling image to obtain a loss value; (2) Repeatedly adjusting the weight parameters of the convolution kernels of all target objects in the preset training model, and processing and operating the image to be processed through the adjusted preset training model until the loss value meets the preset condition, so as to obtain the image segmentation model.
Wherein each target object includes all convolution processing procedures and up-sampling processing procedures in the step of extracting characteristic information of an image to be processed.
It should be noted that the preset labeling image is obtained by labeling a main body of the image to be processed and labeling the background, and the preset labeling image includes a labeled image main body and labeled background.
The loss function formula relates the following quantities: L is used to indicate the loss value; x is used to indicate the preset annotated image; the average value of all pixel values in the segmentation result map; the average standard deviation between the preset annotated image and the segmentation result map; and the average absolute difference between the preset annotated image and the segmentation result map. α is a constant; as an example and not by way of limitation, α may be set to 3.68, and may also be set to 3.5, and the specific α may be set according to the actual application scenario, which is not limited herein.
Illustratively, the average value of all pixel values in the segmentation result map is 10, and the pixel value array corresponding to the preset annotated image is [5, 8, 10, 12, 15]. (1) Calculating the average standard deviation: the first step is to calculate the difference between each pixel value in the pixel value array and the average value of all pixel values in the segmentation result map, i.e., (5-10)=-5, (8-10)=-2, (10-10)=0, (12-10)=2, (15-10)=5; the second step is to calculate the square of each difference, i.e., (-5)²=25, (-2)²=4, 0²=0, 2²=4, 5²=25; the third step is to calculate the average of all the squared values, i.e., (25+4+0+4+25)/5=58/5=11.6; the fourth step is to calculate the square root of this average, i.e., √11.6≈3.41. Thus, the average standard deviation between the annotated image and the segmentation result map is 3.41. (2) Calculating the average absolute difference: the first step is to calculate the absolute value of the difference between each pixel value in the pixel value array and the average value of all pixel values in the segmentation result map, i.e., |5-10|=5, |8-10|=2, |10-10|=0, |12-10|=2, |15-10|=5; the second step is to calculate the average of these absolute values, i.e., (5+2+0+2+5)/5=14/5=2.8. Thus, the average absolute difference between the preset annotated image and the segmentation result map is 2.8.
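Illustratively, the two quantities of the worked example above can be reproduced with the following NumPy sketch (NumPy is used here only for illustration).

```python
import numpy as np

pred_mean = 10.0                                      # mean of all pixel values in the segmentation result map
label = np.array([5, 8, 10, 12, 15], dtype=float)     # pixel value array of the preset annotated image

avg_std = np.sqrt(np.mean((label - pred_mean) ** 2))  # sqrt(58 / 5) = sqrt(11.6) ≈ 3.41
avg_abs = np.mean(np.abs(label - pred_mean))          # 14 / 5 = 2.8
print(round(float(avg_std), 2), float(avg_abs))       # 3.41 2.8
```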
It will be appreciated that a loss function or cost function is a function that maps the value of a random event or its related random variable to a non-negative real number to represent the "risk" or "loss" of the random event. In deep learning, the role of the loss function is to measure the difference or error between the model prediction result (corresponding to the segmentation result map of this embodiment) and the actual label (corresponding to the preset annotated image of this embodiment). It is the objective function of the optimization algorithm, and the parameters of the model are adjusted by minimizing the loss value, so that the target variables can be predicted more accurately. The choice of loss function depends on the nature of the problem and the type of task. Common loss functions include mean square error (Mean Squared Error, MSE), cross entropy (Cross Entropy), log loss (Log Loss), etc.; different loss functions are applicable to different problems, such as regression, classification, and multi-classification.
The preset condition is used for indicating that the loss value is smaller than or equal to a first preset threshold value, and the number of times that the loss value is continuously smaller than or equal to the first preset threshold value is larger than or equal to a second preset threshold value. As an example and not by way of limitation, the first preset threshold may be 1.5, or may be 2, and the specific first preset threshold may be set according to the actual application scenario. As an example and not by way of limitation, the second preset threshold may be 3 or 4, and the specific second preset threshold may be set according to the actual application scenario. For example, the first preset threshold is 2, the second preset threshold is 3, and when the loss value is less than or equal to 2 continuously for 3 times, the image segmentation model is obtained.
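Illustratively, a minimal sketch of checking this preset condition is shown below; the helper function and the threshold values (2 and 3) follow the example above and are illustrative only.

```python
def meets_preset_condition(losses, first_threshold=2.0, second_threshold=3):
    # Returns True once the loss value has been <= first_threshold for
    # second_threshold consecutive iterations.
    consecutive = 0
    for loss in losses:
        consecutive = consecutive + 1 if loss <= first_threshold else 0
        if consecutive >= second_threshold:
            return True
    return False

print(meets_preset_condition([3.0, 1.9, 1.8, 2.0]))  # True: three consecutive values <= 2
```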
The weight parameters of a convolution kernel form a high-dimensional tensor that contains the weight values of the convolution kernel. In the convolution operation, each convolution kernel has its own weight parameters, which are optimized through a back propagation algorithm so as to implement the convolution calculation on the input data.
The back propagation algorithm used in this embodiment is recurrent backpropagation (Recurrent backpropagation), in which the propagation is repeated until a certain threshold is reached; once the threshold is reached, the error is calculated and propagated backward.
The minimization of the loss function is implemented by the back propagation algorithm, and the gradient of the loss function is an important concept in machine learning. The gradient is the rate of change of a function at a point, and it helps locate the minimum or maximum of the function. For the loss function, the minimum is sought, as it represents the smallest prediction error of the model. In machine learning, a gradient descent algorithm is typically used to minimize the loss value. The core of this algorithm is to calculate the gradient of the loss function and to update the model parameters in the opposite direction of the gradient until the loss function is minimized, where the model parameters are the weight parameters of the convolution kernels in the preset training model.
In machine learning, adaptive moment estimation (Adam) may also be used to optimize the weight parameters of the convolution kernels in the preset training model. The Adam algorithm can adaptively adjust the learning rate according to the gradient characteristics of different parameters: for parameters with larger gradients, the learning rate is correspondingly reduced to avoid oscillation caused by overly fast parameter updates; for parameters with smaller gradients, the learning rate is correspondingly increased to accelerate convergence. Illustratively, the number of iterations is set to 100,000, the initial learning rate is set to 0.001, the weight decay is set to 0.0005, and the learning rate decays to 1/10 of its previous value every 1000 iterations until the model converges.
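Illustratively, a minimal PyTorch sketch of this training configuration is shown below; the placeholder model and the shortened loop length are illustrative assumptions.

```python
import torch

model = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)  # placeholder for the preset training model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0005)
# Learning rate decays to 1/10 every 1000 iterations.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.1)

for step in range(3000):  # the embodiment sets 100,000 iterations in total
    # ... forward pass, loss calculation and loss.backward() would go here ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```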
By minimizing the loss value, the deep learning model can gradually optimize itself and improve its prediction on the input data; that is, the preset training model is optimized and the segmentation accuracy of the image main body is improved. The loss function therefore plays a vital role in the deep learning network.
At least one advantageous aspect of the image processing method provided in the embodiments of the present application is: convolution processing is performed on each color channel of the image to be processed, point convolution processing is performed on the intermediate image obtained after the convolution processing to obtain the first feature map, and the first feature map is then mapped into the segmentation result map through the upsampling processing. This reduces the amount of calculation while reducing the loss of image information elements, transfers the image feature information better to the high-dimensional features, is more conducive to the extraction of image features, improves the segmentation accuracy of the image main body in the image, and avoids artifacts in the segmented image main body.
Fig. 7 is a functional block diagram of an image processing apparatus provided in an embodiment of the present application. As shown in fig. 7, the image processing apparatus 700 may include:
a receiving module 710, configured to receive an image to be processed, where the image to be processed has three color channels;
the extracting module 720 is configured to extract feature information of an image to be processed, and generate a first feature map;
a mapping module 730, configured to map the first feature map into a segmentation result map through upsampling;
extracting feature information of an image to be processed to generate a first feature map, wherein the method specifically comprises the following steps:
respectively carrying out convolution processing on each color channel, and converting an image to be processed into an intermediate image with the same channel number;
and expanding the number of channels of the intermediate image into the number of target channels through point convolution processing to obtain a first feature map.
At least one advantageous aspect of the image processing method provided in the embodiments of the present application is: convolution processing is performed on each color channel of the image to be processed, point convolution processing is performed on the intermediate image obtained after the convolution processing to obtain the first feature map, and the first feature map is then mapped into the segmentation result map through the upsampling processing. This reduces the amount of calculation while reducing the loss of image information elements, transfers the image feature information better to the high-dimensional features, is more conducive to the extraction of image features, improves the segmentation accuracy of the image main body in the image, and avoids artifacts in the segmented image main body.
Fig. 8 is a functional block diagram of an image processing apparatus according to another embodiment of the present application. As shown in fig. 8, the image processing apparatus 700 may include:
a receiving module 710, configured to receive an image to be processed, where the image to be processed has three color channels;
the extracting module 720 is configured to extract feature information of an image to be processed, and generate a first feature map;
a mapping module 730, configured to map the first feature map into a segmentation result map through upsampling;
extracting feature information of an image to be processed to generate a first feature map, wherein the method specifically comprises the following steps:
respectively carrying out convolution processing on each color channel, and converting an image to be processed into an intermediate image with the same channel number;
and expanding the number of channels of the intermediate image into the number of target channels through point convolution processing to obtain a first feature map.
Optionally, the image processing apparatus 700 may further include:
a preprocessing module 740, configured to preprocess an image to be processed;
the preprocessing module 740 is specifically configured to:
among the three color channels, one reference color channel is determined;
respectively carrying out normalization processing on the other two color channels in the three color channels according to the reference color channel to obtain a color correction image;
And performing characteristic decoupling processing on the color correction image.
Optionally, the extraction module 720 is specifically configured to:
performing point convolution processing on the intermediate image to obtain a second feature map with the number of target channels;
and carrying out cavity convolution processing on the second feature map to generate a first feature map.
Optionally, the image processing apparatus 700 may further include:
an enhancement processing module 750, configured to perform enhancement processing of edge features on the first feature map;
the enhancement processing module 750 is specifically configured to:
performing convolution operation and residual connection on the first feature map to obtain a feature map after residual connection;
and stacking the residual connected characteristic diagram and the first characteristic diagram.
Optionally, the enhancement processing module 750 is specifically further configured to:
performing convolution operation on the first feature map to obtain a convolved feature map;
performing expansion convolution operation on the first feature map to obtain a feature map after expansion convolution;
and carrying out residual connection on the characteristic map after convolution and the characteristic map after expansion convolution to obtain the characteristic map after residual connection.
Optionally, the step of extracting feature information of the image to be processed and generating a first feature map is repeatedly performed for n times to obtain n feature maps arranged according to a first order, where the first order is used to indicate an order from early to late of iteration time, each feature map in the n feature maps corresponds to one iteration time, and a feature map with the latest iteration time in the n feature maps is determined as the first feature map.
Optionally, the step of mapping the first feature map to the segmentation result map by the upsampling process is repeatedly performed n times, wherein each of the n feature maps is sequentially connected to the feature maps obtained after the corresponding upsampling processes in a second order, and the first feature map is mapped to the segmentation result map by the n upsampling processes, and the second order is used for indicating the order of iteration time from late to early.
Optionally, the preprocessing module 740 is specifically further configured to perform the following color correction steps (a minimal sketch follows this list):
acquiring a first maximum pixel value in a channel array corresponding to a first color channel, acquiring a second maximum pixel value in a channel array corresponding to a second color channel, and acquiring a third maximum pixel value in a channel array corresponding to a reference color channel, wherein the first color channel and the second color channel are the rest two color channels in the three color channels;
calculating a first pixel average value corresponding to a first color channel, a second pixel average value corresponding to a second color channel and a third pixel average value corresponding to a reference color channel based on the image specification of the image to be processed;
calculating the ratio between the third maximum pixel value and the first maximum pixel value to obtain a first ratio, and calculating the ratio between the third maximum pixel value and the second maximum pixel value to obtain a second ratio;
Calculating the ratio between the third pixel average value and the first pixel average value to obtain a third ratio, and calculating the ratio between the third pixel average value and the second pixel average value to obtain a fourth ratio;
calculating based on a channel array corresponding to the first color channel, a first preset formula, a first ratio and a third ratio to obtain a first color channel after color correction;
calculating based on a channel array corresponding to the second color channel, a first preset formula, a second ratio and a fourth ratio to obtain a second color channel after color correction;
and combining the reference color channel, the first color channel after color correction and the second color channel after color correction to obtain a color correction image.
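Illustratively, a minimal sketch of the color correction steps listed above is shown below. The green channel is assumed as the reference color channel, and since the "first preset formula" is not reproduced here, the final per-channel scaling (averaging the max-based and mean-based ratios) is a stand-in assumption rather than the formula of this embodiment.

```python
import numpy as np

def color_correct(image: np.ndarray, ref: int = 1) -> np.ndarray:
    # ref = 1 assumes the green channel is the reference color channel.
    image = image.astype(float)
    max_vals = image.max(axis=(0, 1))    # per-channel maximum pixel values
    mean_vals = image.mean(axis=(0, 1))  # per-channel mean pixel values (based on the image specification H x W)
    corrected = image.copy()
    for c in range(3):
        if c == ref:
            continue
        r_max = max_vals[ref] / max_vals[c]      # first / second ratio
        r_mean = mean_vals[ref] / mean_vals[c]   # third / fourth ratio
        # Stand-in for the "first preset formula": scale the channel by the average
        # of the max-based and mean-based ratios (an assumption, not the patented formula).
        corrected[..., c] = image[..., c] * (r_max + r_mean) / 2.0
    return corrected

img = np.random.randint(0, 256, size=(256, 256, 3))
color_corrected_image = color_correct(img)
```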
Optionally, the preprocessing module 740 is specifically further configured to:
acquiring a preset feature vector matrix and a preset diagonal matrix;
and calculating based on a second preset formula, the feature vector matrix, the diagonal matrix and the color correction image so as to release the correlation between each image feature in the color correction image.
At least one advantageous aspect of the image processing method provided in the embodiments of the present application is: convolution processing is performed on each color channel of the image to be processed, point convolution processing is performed on the intermediate image obtained after the convolution processing to obtain the first feature map, and the first feature map is then mapped into the segmentation result map through the upsampling processing. This reduces the amount of calculation while reducing the loss of image information elements, transfers the image feature information better to the high-dimensional features, is more conducive to the extraction of image features, improves the segmentation accuracy of the image main body in the image, and avoids artifacts in the segmented image main body.
It should be noted that, in the embodiments of the present application, the division of the apparatus into functional modules is taken as an example when describing in detail the method steps implemented by the apparatus provided in the embodiments of the present application. It will be clearly understood by those skilled in the art that, for convenience and brevity of description, for the specific working process of the apparatus and modules described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here. Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented by electronic hardware, by computer software, or by a combination of the two; the composition and steps of the examples have been described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as going beyond the scope of the present application.
When implemented in software, the computer software may be stored in a computer readable storage medium, and the program, when executed, may include the flows of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
Fig. 9 shows a schematic structural diagram of an electronic device according to an embodiment of the present application; the specific embodiments of the present application do not limit the specific implementation of the electronic device.
As shown in fig. 9, the electronic device may include: a processor 902, a communication interface (Communications Interface) 904, a memory 906, and a communication bus 908.
Wherein: processor 902, communication interface 904, and memory 906 communicate with each other via a communication bus 908. A communication interface 904 for communicating with network elements of other devices, such as clients or other servers. The processor 902 is configured to execute the program 910, and may specifically perform relevant steps in the above-described image processing method embodiment.
In particular, the program 910 may include program code comprising computer operation instructions, which may specifically be used to cause the processor 902 to perform the image processing method of any of the method embodiments described above.
In an embodiment of the present application, the processor 902 may be a central processing unit (Central Processing Unit, CPU), the processor 902 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like, depending on the type of hardware used.
The memory 906 is used to store a program 910. Memory 906 may include high-speed RAM memory or may also include non-volatile memory, such as at least one disk memory, flash memory device, or other non-volatile solid-state storage device.
The memory 906 has a program storage area and a data storage area, which are used to store the program 910 and the corresponding data information, respectively; for example, the program storage area stores a non-volatile software program, a non-volatile computer-executable program, and modules.
Embodiments of the present application also provide a computer-readable storage medium. The computer readable storage medium may be a non-volatile computer readable storage medium. The computer readable storage medium stores a computer program.
Wherein the computer program when executed by the processor implements one or more steps of the image processing method disclosed in the embodiments of the present application. The complete computer program product is embodied on one or more computer readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing a computer program as disclosed in embodiments of the present application.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (11)

1. An image processing method, comprising:
receiving an image to be processed, wherein the image to be processed has three color channels;
extracting feature information of the image to be processed to generate a first feature map;
mapping the first feature map into a segmentation result map through up-sampling processing;
the extracting the feature information of the image to be processed to generate a first feature map specifically includes:
respectively carrying out convolution processing on each color channel, and converting the image to be processed into an intermediate image with the same channel number;
and expanding the number of channels of the intermediate image into the number of target channels through point convolution processing to obtain a first feature map.
2. The method according to claim 1, wherein before extracting the feature information of the image to be processed, the method further comprises: preprocessing the image to be processed;
wherein the preprocessing comprises:
determining a reference color channel among the three color channels;
respectively carrying out normalization processing on the other two color channels in the three color channels according to the reference color channel to obtain a color correction image;
And performing characteristic decoupling processing on the color correction image.
3. The method according to claim 1, wherein the expanding the number of channels of the intermediate image to the target number of channels by the point convolution process to obtain the first feature map includes:
performing point convolution processing on the intermediate image to obtain a second feature map with the number of target channels;
and carrying out cavity convolution processing on the second feature map to generate a first feature map.
4. The method of claim 1, wherein prior to said mapping of said first feature map to a segmentation result map by an upsampling process, said method further comprises: performing enhancement processing of edge features on the first feature map;
wherein the enhancement processing of the edge feature comprises:
performing convolution operation and residual connection on the first feature map to obtain a feature map after residual connection;
and stacking the residual connected characteristic diagram and the first characteristic diagram.
5. The method of claim 4, wherein the convolving and residual connecting the first feature map to obtain a residual connected feature map comprises:
Performing convolution operation on the first feature map to obtain a convolved feature map;
performing expansion convolution operation on the first feature map to obtain a feature map after expansion convolution;
and carrying out residual connection on the characteristic diagram after convolution and the characteristic diagram after expansion convolution to obtain the characteristic diagram after residual connection.
6. The method according to claim 1, wherein the method further comprises:
repeatedly performing n iterations on the step of extracting the feature information of the image to be processed and generating a first feature image to obtain n feature images arranged according to a first order, wherein the first order is used for indicating the order of iteration moments from early to late, each feature image in the n feature images corresponds to one iteration moment, and the feature image with the latest iteration moment in the n feature images is determined to be the first feature image.
7. The method of claim 6, wherein the method further comprises:
and repeatedly performing up-sampling processing for n times in the step of mapping the first feature map into the segmentation result map through the up-sampling processing, wherein each feature map in the n feature maps is sequentially connected to the feature map obtained after the corresponding up-sampling processing according to a second order, and the first feature map is mapped into the segmentation result map through the n up-sampling processing, and the second order is used for indicating the sequence from late to early of iteration moments.
8. The method according to claim 2, wherein the normalizing the remaining two color channels of the three color channels according to the reference color channel respectively, to obtain a color corrected image, includes:
acquiring a first maximum pixel value in a channel array corresponding to a first color channel, acquiring a second maximum pixel value in a channel array corresponding to a second color channel, and acquiring a third maximum pixel value in the channel array corresponding to the reference color channel, wherein the first color channel and the second color channel are the rest two color channels in the three color channels;
calculating a first pixel average value corresponding to the first color channel, a second pixel average value corresponding to the second color channel and a third pixel average value corresponding to the reference color channel based on the image specification of the image to be processed;
calculating the ratio between the third maximum pixel value and the first maximum pixel value to obtain a first ratio, and calculating the ratio between the third maximum pixel value and the second maximum pixel value to obtain a second ratio;
calculating the ratio between the third pixel average value and the first pixel average value to obtain a third ratio, and calculating the ratio between the third pixel average value and the second pixel average value to obtain a fourth ratio;
Calculating based on a channel array corresponding to the first color channel, a first preset formula, the first ratio and the third ratio to obtain a first color channel after color correction;
calculating based on a channel array corresponding to the second color channel, the first preset formula, the second ratio and the fourth ratio to obtain a second color channel after color correction;
and combining the reference color channel, the first color channel after color correction and the second color channel after color correction to obtain a color correction image.
9. The method of claim 2, wherein said performing feature decoupling on said color corrected image comprises:
acquiring a preset feature vector matrix and a preset diagonal matrix;
and calculating based on a second preset formula, the eigenvector matrix, the diagonal matrix and the color correction image so as to release the correlation between each image characteristic in the color correction image.
10. An electronic device, the electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor;
Wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method of any one of claims 1-9.
11. A computer-readable storage medium storing computer-executable instructions for causing a computer device to perform the image processing method according to any one of claims 1-9.
CN202410020516.6A 2024-01-04 2024-01-04 Image processing method, electronic device and storage medium Pending CN117876394A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410020516.6A CN117876394A (en) 2024-01-04 2024-01-04 Image processing method, electronic device and storage medium


Publications (1)

Publication Number Publication Date
CN117876394A true CN117876394A (en) 2024-04-12

Family

ID=90584090




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination