WO2021003936A1 - Image segmentation method, electronic device, and computer-readable storage medium - Google Patents

Image segmentation method, electronic device, and computer-readable storage medium

Info

Publication number
WO2021003936A1
WO2021003936A1 (PCT/CN2019/118294)
Authority
WO
WIPO (PCT)
Prior art keywords
sampling
image
pooling feature
feature set
segmented
Prior art date
Application number
PCT/CN2019/118294
Other languages
French (fr)
Chinese (zh)
Inventor
陈玥蓉
韩茂琨
王健宗
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2021003936A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to an image segmentation method and device, an electronic device, and a computer-readable storage medium.
  • For Convolutional Neural Networks (CNN) used for classification, fully connected layers are often added at the end of the network, and the output of the fully connected layer is processed by a softmax function to obtain category probability information. However, this category probability information is one-dimensional: it can only identify the category of the entire picture, not the category of each pixel, and the results are particularly unsatisfactory when processing image edges.
  • The embodiments of the present application provide an image segmentation method and device, an electronic device, and a computer-readable storage medium, which aim to solve the problem of insufficient accuracy of image semantic segmentation in the related art. By replacing the fully connected layer with a deconvolution layer and adding another fully connected layer, each pixel of the image can be classified, further improving the accuracy of image semantic segmentation.
  • In a first aspect, an embodiment of the present application provides an image segmentation method, including: acquiring an image to be segmented; performing convolution, activation, and pooling processing on the image to be segmented to obtain five pooling feature sets; performing up-sampling processing on a specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to a predetermined down-sampling multiple of the image to be segmented; during the up-sampling processing, calculating a total mask score according to the intersection over union of the predicted mask and the actual mask and the mask score of the original network classification of the image to be segmented; and segmenting the final result of the up-sampling processing based on the total mask score through a smooth L2 loss function to obtain a segmented image.
  • In a second aspect, an embodiment of the present application provides an image segmentation device, including: an image acquisition unit for acquiring an image to be segmented; a down-sampling processing unit for performing convolution, activation, and pooling processing on the image to be segmented to obtain five pooling feature sets; an up-sampling processing unit for performing up-sampling processing on a specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to a predetermined down-sampling multiple of the image to be segmented; a mask total score calculation unit for calculating, during the up-sampling processing, a total mask score according to the intersection over union of the predicted mask and the actual mask and the mask score of the original network classification of the image to be segmented; and an image segmentation unit for segmenting the final result of the up-sampling processing based on the total mask score through a smooth L2 loss function to obtain a segmented image.
  • In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are configured to execute the method of any one of the above first aspects.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to execute the method flow described in any one of the first aspects.
  • Through the above technical solutions, and in view of the insufficient accuracy of image semantic segmentation in the related art, each pixel of the image can be classified by replacing the fully connected layer with a deconvolution layer.
  • Specifically, in this technical solution the convolutional neural network includes a convolutional layer, an activation layer, and a pooling layer, as well as a deconvolution layer that replaces the original fully connected layer. After the image to be segmented is obtained, the convolutional layer classifies the features in the image to be segmented, that is, its pixels, according to different feature types or subjects; the activation layer then highlights the important features in the classification result; and the pooling layer shrinks the parameter matrix of the data from the activation layer, reducing the data volume and the number of parameters to be processed in the next step, which both speeds up computation and prevents overfitting.
  • In the convolutional neural network of the related art, the size of the output image gradually decreases after each convolution step, and when the fully connected layer is finally reached, the category probability information obtained is one-dimensional: it can only identify the category of the entire image, not the category of each pixel, and the results are particularly unsatisfactory at image edges. Therefore, in the technical solution of the present application, the fully connected layer is replaced by a deconvolution layer. Deconvolution is equivalent to running an ordinary convolution in reverse: for example, with a 2x2 input matrix and a 3x3 convolution kernel, setting the deconvolution parameters pad=0 and stride=1 produces a 4x4 output matrix, which amounts to completely inverting the convolution. Here, convolution corresponds to down-sampling processing, and deconvolution corresponds to up-sampling processing.
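  • To make the 2x2-to-4x4 example above concrete, the following is a minimal sketch (not part of the application) using a transposed convolution in PyTorch; the single input/output channel and the random weights are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# a 2x2 input matrix, as in the example above (batch and channel dims added)
x = torch.randn(1, 1, 2, 2)

# "deconvolution" with a 3x3 kernel, pad=0, stride=1
deconv = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                            kernel_size=3, stride=1, padding=0, bias=False)

y = deconv(x)
print(y.shape)  # torch.Size([1, 1, 4, 4]) -- the 4x4 output matrix
```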
  • Through the technical solution of the present application, the output image of the convolutional neural network is therefore restored in pixel dimensions, which facilitates effective classification of the features of the output image and improves the accuracy of image semantic segmentation.
  • Fig. 1 shows a flowchart of an image segmentation method according to an embodiment of the present application;
  • Fig. 2 shows a schematic diagram of image segmentation according to an embodiment of the present application;
  • Fig. 3 shows a flowchart of an image segmentation method according to another embodiment of the present application;
  • Fig. 4 shows a block diagram of an image segmentation device according to an embodiment of the present application;
  • Fig. 5 shows a block diagram of an electronic device according to an embodiment of the present application.
  • Fig. 1 shows a flowchart of an image segmentation method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps.
  • Step 102: Acquire an image to be segmented.
  • Step 104: Perform convolution, activation, and pooling processing on the image to be segmented to obtain five pooling feature sets.
  • The convolutional neural network includes a convolutional layer, an activation layer, and a pooling layer, as well as a deconvolution layer that replaces the original fully connected layer. After the image to be segmented is obtained, the convolutional layer classifies the features in the image, that is, its pixels, according to different feature types or subjects; the activation layer then highlights the important features in the classification result; and the pooling layer shrinks the parameter matrix of the data from the activation layer, reducing the data volume and the number of parameters to be processed in the next step, which both speeds up computation and prevents overfitting.
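  • As a hedged illustration of this down-sampling stage, the sketch below stacks five convolution/activation/pooling blocks so that each block halves the width and height, producing the five pooling feature sets; the channel widths and 3x3 kernels are assumptions for the example and are not prescribed by the application.

```python
import torch
import torch.nn as nn

def block(in_ch, out_ch):
    # one convolution -> activation -> pooling stage; pooling halves H and W
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

class DownsamplingBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([
            block(3, 64),     # pool1: w/2,  h/2
            block(64, 128),   # pool2: w/4,  h/4
            block(128, 256),  # pool3: w/8,  h/8
            block(256, 512),  # pool4: w/16, h/16
            block(512, 512),  # pool5: w/32, h/32
        ])

    def forward(self, image):
        feats = []
        x = image
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # [pool1, pool2, pool3, pool4, pool5]

pools = DownsamplingBackbone()(torch.randn(1, 3, 224, 224))
print([tuple(p.shape[-2:]) for p in pools])
# [(112, 112), (56, 56), (28, 28), (14, 14), (7, 7)]
```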
  • Step 106: Perform up-sampling processing on a specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to the predetermined down-sampling multiple of the image to be segmented.
  • Through this up-sampling, the output image of the convolutional neural network is restored in pixel dimensions, which facilitates effective classification of the features of the output image and improves the accuracy of image semantic segmentation.
  • The up-sampling processing includes interpolation processing and deconvolution processing. Interpolation refers to inserting new elements between pixels, on the basis of the original image pixels, using a suitable interpolation algorithm; deconvolution refers to compressing the basic wavelet to improve the vertical resolution of the data. Both methods can effectively improve the accuracy of the image.
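  • The sketch below contrasts the two up-sampling options mentioned here, once with values interpolated between existing pixels and once with a learned transposed convolution; the channel count and kernel size are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

pool5 = torch.randn(1, 512, 7, 7)  # a hypothetical w/32 x h/32 feature map

# interpolation: insert new values between the existing pixels (bilinear here)
up_interp = F.interpolate(pool5, scale_factor=2, mode="bilinear", align_corners=False)

# deconvolution (transposed convolution): a learned 2x up-sampling
up_deconv = nn.ConvTranspose2d(512, 512, kernel_size=2, stride=2)(pool5)

print(up_interp.shape, up_deconv.shape)  # both: torch.Size([1, 512, 14, 14])
```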
  • Step 108: During the up-sampling processing, calculate the total mask score according to the intersection over union of the predicted mask and the actual mask and the mask score of the original network classification of the image to be segmented.
  • Step 110: Segment the final result of the up-sampling processing based on the total mask score using a smooth L2 loss function to obtain a segmented image.
  • Here, the total mask score equals the product of the intersection over union (IoU) of the predicted mask and the actual mask and the mask score of the original network classification of the image to be segmented. In this way, when the classification score is high but the IoU is low, the branch that produces the total mask score is penalized. The total mask score can therefore be trained toward an optimum during the up-sampling process, yielding an optimized up-sampling result.
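  • A minimal sketch of this scoring rule, assuming binary masks of equal size and a classification mask score already produced by the network:

```python
import numpy as np

def mask_iou(pred_mask, gt_mask):
    # intersection over union of two boolean masks
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def total_mask_score(pred_mask, gt_mask, cls_mask_score):
    # total score = IoU(predicted mask, actual mask) * classification mask score,
    # so a high classification score paired with a low IoU is penalized
    return mask_iou(pred_mask, gt_mask) * cls_mask_score
```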
  • The smooth L2 loss function is also known as the least square error: in essence, it minimizes the sum of the squared differences between the target value and the estimated value, which keeps individual feature weights from becoming too large and makes the feature weights more even, helping to obtain an optimized segmented image.
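  • Read literally ("minimize the sum of squares of the difference between the target value and the estimated value"), the loss can be sketched as a plain least-squares term; treating "smooth L2" exactly this way is an assumption for illustration.

```python
import torch

def least_square_error(estimate, target, weight=1.0):
    # sum of squared differences between the estimated and target values,
    # optionally weighted (the description later reports a weight of 1 as best)
    return weight * torch.sum((estimate - target) ** 2)
```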
  • In addition, in one implementation of the present application, the smooth L2 loss function may optionally be combined with the softmax function for image segmentation; that is, the softmax function is applied on top of the segmentation result of the smooth L2 loss function for more precise segmentation.
  • The softmax function, also called the normalized exponential function, normalizes the gradient logarithm of a discrete probability distribution over a finite number of items. Softmax maps the outputs of multiple neurons into the (0, 1) interval, which can be interpreted as the probability that the current output belongs to each category, making it convenient to select the category with the highest probability as the prediction target. Compared with other functions that could be used to select a maximum, softmax uses the exponential, which makes large values larger and small values smaller, increasing the contrast between categories and making the neural network learn more efficiently.
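  • A minimal softmax sketch illustrating the mapping into (0, 1) and the contrast-increasing effect of the exponential (the maximum is subtracted only for numerical stability):

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)   # numerical stability; does not change the result
    e = np.exp(z)
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.argmax())  # probabilities summing to 1; class 0 is selected
```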
  • Through the above technical solutions, replacing the fully connected layer with a deconvolution layer and additionally adding another fully connected layer to classify each pixel of the image can improve the accuracy of image semantic segmentation.
  • Fig. 2 shows a schematic diagram of image segmentation according to an embodiment of the present application.
  • As shown in Fig. 2, w represents the width and h represents the height. The image to be segmented (image), whose width and height are w and h, is convolved and pooled to generate the first pooling feature set (pool1), with width and height reduced to w/2 and h/2. The first pooling feature set is convolved and pooled to generate the second pooling feature set (pool2), with width and height reduced to w/4 and h/4; the second pooling feature set is convolved and pooled to generate the third pooling feature set (pool3), with width and height reduced to w/8 and h/8; the third pooling feature set is convolved and pooled to generate the fourth pooling feature set (pool4), with width and height reduced to w/16 and h/16; and the fourth pooling feature set is convolved and pooled to generate the fifth pooling feature set (pool5), with width and height reduced to w/32 and h/32. At this point, the resolution of the picture has been greatly reduced along with its width and height, resulting in lower image quality.
  • Therefore, deconvolution, that is, up-sampling processing, can be used. Deconvolution is equivalent to running an ordinary convolution in reverse: for example, with a 2x2 input matrix and a 3x3 convolution kernel, setting pad=0 and stride=1 outputs a 4x4 matrix, completely inverting the convolution. Up-sampling can therefore increase the original resolution, and when applied to the pooling feature sets produced by convolution and pooling, it restores their resolution.
  • Specifically, when the predetermined down-sampling multiple of the image to be segmented is 32, the fifth pooling feature set among the five pooling feature sets is subjected to 32x up-sampling processing, and the result of the 32x up-sampling is then segmented with softmax, thereby realizing a 32x restoration of the fifth pooling feature set and improving the accuracy of the result obtained by the 32x up-sampling processing.
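  • A hedged sketch of this 32x branch in the style of a fully convolutional segmentation head (FCN-32s-like): a 1x1 convolution scores pool5 per class, one transposed convolution restores the input resolution in a single 32x step, and softmax is applied per pixel; the class count and kernel/padding choices are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Upsample32x(nn.Module):
    def __init__(self, in_ch=512, num_classes=21):
        super().__init__()
        self.score = nn.Conv2d(in_ch, num_classes, kernel_size=1)
        # kernel=64, stride=32, padding=16 gives an exact 32x enlargement
        self.up32 = nn.ConvTranspose2d(num_classes, num_classes,
                                       kernel_size=64, stride=32, padding=16)

    def forward(self, pool5):
        logits = self.up32(self.score(pool5))   # back to the original w x h
        return F.softmax(logits, dim=1)         # per-pixel class probabilities

out = Upsample32x()(torch.randn(1, 512, 7, 7))
print(out.shape)  # torch.Size([1, 21, 224, 224])
```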
  • When the predetermined down-sampling multiple of the image to be segmented is 16, the fifth pooling feature set among the five pooling feature sets is subjected to 2x up-sampling processing to obtain a first up-sampling feature set; the first up-sampling feature set is fused with the fourth pooling feature set of the five pooling feature sets to obtain the final result of the up-sampling processing, and the final result is then segmented with softmax, thereby realizing the restoration of the fourth pooling feature set and improving the accuracy of the result obtained by the 16x up-sampling processing.
  • Simply restoring the fourth pooling feature set by 16x can improve the accuracy of the result to some extent. However, since the fifth pooling feature set has already been generated, that is, since the fourth pooling feature set has already been further filtered and its features highlighted in the 32x down-sampled fifth pooling feature set, the fifth pooling feature set can be used effectively: up-sampling it by 2x restores its width and height to w/16 and h/16, the same as the fourth pooling feature set, so the two can be fused, and 16x up-sampling is performed after the fusion. The fusion referred to here means merging, one by one, the features of the pixels of the fourth pooling feature set with the features of the pixels obtained by 2x up-sampling of the fifth pooling feature set.
  • Therefore, compared with simply restoring the fourth pooling feature set by 16x, this further improves the accuracy of the up-sampling result, helps to further sharpen the image edges, and improves the accuracy of classification at the image edges.
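  • A hedged sketch of the 16x branch: pool5 is scored and up-sampled 2x, fused here by element-wise addition with the scored pool4 (one common way to merge per-pixel features; the application only says the pixel features are merged one by one), and the fused map is up-sampled 16x. The channel and kernel choices remain assumptions.

```python
import torch
import torch.nn as nn

class Upsample16x(nn.Module):
    def __init__(self, ch4=512, ch5=512, num_classes=21):
        super().__init__()
        self.score4 = nn.Conv2d(ch4, num_classes, kernel_size=1)
        self.score5 = nn.Conv2d(ch5, num_classes, kernel_size=1)
        self.up2 = nn.ConvTranspose2d(num_classes, num_classes,
                                      kernel_size=4, stride=2, padding=1)    # w/32 -> w/16
        self.up16 = nn.ConvTranspose2d(num_classes, num_classes,
                                       kernel_size=32, stride=16, padding=8)  # w/16 -> w

    def forward(self, pool4, pool5):
        first_upsampled = self.up2(self.score5(pool5))   # first up-sampling feature set
        fused = first_upsampled + self.score4(pool4)     # fusion with pool4 (same size)
        return self.up16(fused)                          # final result of the up-sampling

out = Upsample16x()(torch.randn(1, 512, 14, 14), torch.randn(1, 512, 7, 7))
print(out.shape)  # torch.Size([1, 21, 224, 224])
```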
  • When the predetermined down-sampling multiple of the image to be segmented is 8, the fifth pooling feature set among the five pooling feature sets is subjected to 2x up-sampling processing to obtain a first up-sampling feature set; the first up-sampling feature set is fused with the fourth pooling feature set of the five pooling feature sets to obtain a fusion result; the fusion result is subjected to 2x up-sampling processing to obtain a second up-sampling feature set; and the second up-sampling feature set is fused with the third pooling feature set of the five pooling feature sets to obtain the final result of the up-sampling processing, which is then segmented with softmax, thereby realizing the restoration of the third pooling feature set and improving the accuracy of the result obtained by the 8x up-sampling processing.
  • The fusion described here means merging, one by one, the features of the pixels of the fourth pooling feature set with the features of the pixels obtained by 2x up-sampling of the fifth pooling feature set, which amounts to a first feature correction of the pixels in the fourth pooling feature set and makes their features more separable for classification. The fusion result can then be up-sampled by 2x, restoring its width and height to w/8 and h/8, the same as the third pooling feature set, so that it can be fused with the third pooling feature set; 8x up-sampling is performed after this fusion, so that the features of the pixels in the third pooling feature set are corrected by the features filtered and highlighted in the fourth and fifth pooling feature sets, making the pixel features in the final fusion result more accurate and better suited for classification. Therefore, compared with simply restoring the third pooling feature set by 8x, this further improves the accuracy of the up-sampling result, helps to further sharpen the image edges, and improves the accuracy of classification at the image edges.
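  • A hedged sketch of the 8x branch, continuing the same assumptions: pool5 is up-sampled 2x and fused with pool4, the fusion result is up-sampled 2x and fused with pool3, and the second fusion is up-sampled 8x.

```python
import torch
import torch.nn as nn

class Upsample8x(nn.Module):
    def __init__(self, ch3=256, ch4=512, ch5=512, num_classes=21):
        super().__init__()
        self.score3 = nn.Conv2d(ch3, num_classes, kernel_size=1)
        self.score4 = nn.Conv2d(ch4, num_classes, kernel_size=1)
        self.score5 = nn.Conv2d(ch5, num_classes, kernel_size=1)
        self.up2_a = nn.ConvTranspose2d(num_classes, num_classes,
                                        kernel_size=4, stride=2, padding=1)   # w/32 -> w/16
        self.up2_b = nn.ConvTranspose2d(num_classes, num_classes,
                                        kernel_size=4, stride=2, padding=1)   # w/16 -> w/8
        self.up8 = nn.ConvTranspose2d(num_classes, num_classes,
                                      kernel_size=16, stride=8, padding=4)    # w/8 -> w

    def forward(self, pool3, pool4, pool5):
        first = self.up2_a(self.score5(pool5))   # first up-sampling feature set
        fusion = first + self.score4(pool4)      # fuse with pool4
        second = self.up2_b(fusion)              # second up-sampling feature set
        final = second + self.score3(pool3)      # fuse with pool3
        return self.up8(final)                   # final result of the up-sampling

out = Upsample8x()(torch.randn(1, 256, 28, 28),
                   torch.randn(1, 512, 14, 14),
                   torch.randn(1, 512, 7, 7))
print(out.shape)  # torch.Size([1, 21, 224, 224])
```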
  • Fig. 3 shows a flowchart of an image segmentation method according to another embodiment of the present application. As shown in Fig. 3, the method includes the following steps.
  • Step 302: Acquire an image to be segmented.
  • Step 304: Perform convolution, activation, and pooling processing on the image to be segmented to obtain five pooling feature sets.
  • Step 306: Perform up-sampling processing on the specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to the predetermined down-sampling multiple of the image to be segmented.
  • Step 308: Determine whether the number of fusions performed during the up-sampling processing equals the number of fusions specified for the predetermined down-sampling multiple. If so, proceed to step 310; if not, return to step 306 and continue the up-sampling processing, including the fusion process.
  • Each predetermined down-sampling multiple corresponds to a number of fusions that must be completed. Checking the number of fusions performed during up-sampling therefore determines whether the up-sampling step can end and the image segmentation step can begin, and prevents the up-sampling result from being output while the level of feature restoration is still insufficient because the required number of fusions has not been reached. This monitoring of the up-sampling processing further guarantees the accuracy of the final result.
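  • A minimal control-flow sketch of this check; the mapping from down-sampling multiple to required fusions (32x: 0, 16x: 1, 8x: 2) is inferred from the cases described above, and run_upsampling_step is a hypothetical callback that performs one up-sampling step and reports whether it included a fusion.

```python
# hypothetical mapping inferred from the 32x / 16x / 8x cases described above
REQUIRED_FUSIONS = {32: 0, 16: 1, 8: 2}

def upsample_with_fusion_check(downsample_multiple, run_upsampling_step):
    required = REQUIRED_FUSIONS[downsample_multiple]
    fusions_done = run_upsampling_step()        # step 306: initial up-sampling step
    while fusions_done < required:              # step 308: fusion count reached?
        fusions_done += run_upsampling_step()   # no: return to step 306
    return fusions_done                         # yes: proceed to segmentation (step 310)
```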
  • Step 310: Segment the final result of the up-sampling processing through the smooth L2 loss function and the softmax function to obtain a segmented image.
  • Here too, the total mask score equals the product of the IoU of the predicted mask and the actual mask and the mask score of the original network classification of the image to be segmented, so that a high classification score combined with a low IoU penalizes the branch producing the total mask score; the total mask score can thus be trained toward an optimum during up-sampling, yielding an optimized up-sampling result. The smooth L2 loss function, also known as the least square error, minimizes the sum of the squared differences between the target value and the estimated value, which keeps feature weights from becoming too large and makes them more even; combined with replacing the fully connected layer with a deconvolution layer and adding another fully connected layer, this helps to obtain an optimized segmented image.
  • The softmax function is then used for precise segmentation. The softmax function, or normalized exponential function, normalizes the gradient logarithm of a discrete probability distribution over a finite number of items; it maps the outputs of multiple neurons into the (0, 1) interval, which can be interpreted as the probability that the current output belongs to each category, so that the category with the highest probability can be selected as the prediction target. Because softmax uses the exponential, large values become larger and small values smaller, increasing the contrast between categories and making the neural network learn more efficiently.
  • the method of replacing the fully connected layer with the deconvolution layer and adding another fully connected layer to classify each pixel of the image can improve the accuracy of image semantic segmentation.
  • Fig. 4 shows a block diagram of an image segmentation device according to an embodiment of the present application.
  • As shown in Fig. 4, the image segmentation device 400 of an embodiment of the present application includes: an image acquisition unit 402 for acquiring an image to be segmented; a down-sampling processing unit 404 for performing convolution, activation, and pooling processing on the image to be segmented to obtain five pooling feature sets; an up-sampling processing unit 406 for performing up-sampling processing on a specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to the predetermined down-sampling multiple of the image to be segmented; a mask total score calculation unit 408 for calculating, during the up-sampling processing, the total mask score according to the intersection over union of the predicted mask and the actual mask and the mask score of the original network classification of the image to be segmented; and an image segmentation unit 410 for segmenting the final result of the up-sampling processing based on the total mask score through a smooth L2 loss function to obtain a segmented image.
  • In an implementation of the present application, the up-sampling processing unit 406 includes a first processing unit configured to perform 32x up-sampling processing on the fifth pooling feature set among the five pooling feature sets when the predetermined down-sampling multiple of the image to be segmented is 32.
  • In an implementation of the present application, the up-sampling processing unit 406 includes: a second processing unit configured to perform 2x up-sampling processing on the fifth pooling feature set among the five pooling feature sets when the predetermined down-sampling multiple of the image to be segmented is 16, to obtain a first up-sampling feature set; and a first fusion unit configured to fuse the first up-sampling feature set with the fourth pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing.
  • In an implementation of the present application, the up-sampling processing unit 406 includes: a second processing unit configured to perform 2x up-sampling processing on the fifth pooling feature set among the five pooling feature sets when the predetermined down-sampling multiple of the image to be segmented is 8, to obtain a first up-sampling feature set; a first fusion unit configured to fuse the first up-sampling feature set with the fourth pooling feature set among the five pooling feature sets to obtain a fusion result; a third processing unit configured to perform 2x up-sampling processing on the fusion result to obtain a second up-sampling feature set; and a second fusion unit configured to fuse the second up-sampling feature set with the third pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing.
  • the up-sampling processing includes interpolation processing and deconvolution processing.
  • the image segmentation device 400 uses the solution described in any one of the embodiments shown in FIG. 1 to FIG. 3, and therefore, has all the above technical effects, which will not be repeated here.
  • Fig. 5 shows a block diagram of an electronic device according to an embodiment of the present application.
  • As shown in Fig. 5, an electronic device 500 of an embodiment of the present application includes: at least one memory 502; and a processor 504 communicatively connected to the at least one memory 502, wherein the memory stores instructions executable by the at least one processor 504, and the instructions are configured to execute the solution described in any one of the embodiments of Fig. 1 to Fig. 3. The electronic device 500 therefore has the same technical effects as any one of the embodiments of Figs. 1 to 3, and details are not repeated here.
  • the electronic devices in the embodiments of this application exist in various forms, including but not limited to:
  • Mobile communication equipment: this type of equipment is characterized by mobile communication functions, and its main goal is to provide voice and data communication.
  • Such terminals include smart phones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
  • Ultra-mobile personal computer equipment: this type of equipment belongs to the category of personal computers, has computing and processing functions, and generally also has mobile Internet access.
  • Such terminals include PDA, MID, and UMPC devices, such as the iPad.
  • Portable entertainment equipment: this type of equipment can display and play multimedia content.
  • Such devices include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
  • Server: a device that provides computing services.
  • A server includes a processor, hard disk, memory, system bus, and so on.
  • The architecture of a server is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capacity, stability, reliability, security, scalability, and manageability.
  • an embodiment of the present application provides a computer-readable storage medium that stores computer-executable instructions, and the computer-executable instructions are used to execute the method flow described in any one of the above-mentioned embodiments in FIGS. 1 to 3.
  • Through this scheme, the output image of the convolutional neural network can be restored in pixel dimensions, thereby facilitating effective classification of the features of the output image and improving the accuracy of image semantic segmentation.
  • Although the terms first, second, and so on may be used to describe pooling feature sets in the embodiments of the present application, the pooling feature sets should not be limited by these terms; the terms are only used to distinguish the pooling feature sets from one another. For example, the first pooling feature set may also be referred to as the second pooling feature set, and similarly, the second pooling feature set may also be referred to as the first pooling feature set.
  • Feature collection can also be used to describe pooling feature sets in the embodiments of the present application.
  • Depending on the context, the word "if" as used herein can be interpreted as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" can be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
  • the disclosed system, device, and method may be implemented in other ways.
  • The device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and there may be other ways of division in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the above-mentioned integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium.
  • The above-mentioned software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute some of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Abstract

Disclosed are an image segmentation method and apparatus, an electronic device, and a computer-readable storage medium, which relate to the technical field of artificial intelligence. The method comprises: acquiring an image to be segmented (102); performing convolution, activation, and pooling processing on the image to be segmented to obtain five pooling feature sets (104); performing up-sampling processing on a specified pooling feature set among the five pooling feature sets according to an up-sampling mode corresponding to a pre-determined down-sampling multiple of the image to be segmented (106); during the up-sampling processing, calculating a total mask score according to the intersection over union of a predicted mask and an actual mask and a mask score of the original network classification of the image to be segmented (108); and segmenting, by means of a smooth L2 loss function and on the basis of the total mask score, a final result of the up-sampling processing to obtain a segmented image (110). By means of the method, an output image of a convolutional neural network is restored in terms of pixel dimensions, thereby improving the accuracy of semantic image segmentation.

Description

Image segmentation method, electronic device, and computer-readable storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on July 5, 2019, with application number 201910602691.5 and invention title "Image Segmentation Method and Apparatus, Electronic Device, and Computer-readable Storage Medium", the entire contents of which are incorporated into this application by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to an image segmentation method and device, an electronic device, and a computer-readable storage medium.
Background
For Convolutional Neural Networks (CNN) used for classification, fully connected layers are often added at the end of the network, and the output of the fully connected layer is processed by a softmax function to obtain category probability information.
However, this category probability information is one-dimensional: it can only identify the category of the entire picture, not the category of each pixel, and the results are particularly unsatisfactory when processing image edges.
Therefore, how to further improve the accuracy of image semantic segmentation has become a technical problem to be solved urgently.
Summary of the Application
The embodiments of the present application provide an image segmentation method and device, an electronic device, and a computer-readable storage medium, which aim to solve the problem of insufficient accuracy of image semantic segmentation in the related art; by replacing the fully connected layer with a deconvolution layer and adding another fully connected layer, each pixel of the image can be classified, further improving the accuracy of image semantic segmentation.
In a first aspect, an embodiment of the present application provides an image segmentation method, including: acquiring an image to be segmented; performing convolution, activation, and pooling processing on the image to be segmented to obtain five pooling feature sets; performing up-sampling processing on a specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to a predetermined down-sampling multiple of the image to be segmented; during the up-sampling processing, calculating a total mask score according to the intersection over union of the predicted mask and the actual mask and the mask score of the original network classification of the image to be segmented; and segmenting the final result of the up-sampling processing based on the total mask score through a smooth L2 loss function to obtain a segmented image.
In a second aspect, an embodiment of the present application provides an image segmentation device, including: an image acquisition unit for acquiring an image to be segmented; a down-sampling processing unit for performing convolution, activation, and pooling processing on the image to be segmented to obtain five pooling feature sets; an up-sampling processing unit for performing up-sampling processing on a specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to a predetermined down-sampling multiple of the image to be segmented; a mask total score calculation unit for calculating, during the up-sampling processing, a total mask score according to the intersection over union of the predicted mask and the actual mask and the mask score of the original network classification of the image to be segmented; and an image segmentation unit for segmenting the final result of the up-sampling processing based on the total mask score through a smooth L2 loss function to obtain a segmented image.
In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are configured to execute the method of any one of the above first aspects.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to execute the method flow described in any one of the first aspects.
Through the above technical solutions, and in view of the insufficient accuracy of image semantic segmentation in the related art, each pixel of the image can be classified by replacing the fully connected layer with a deconvolution layer.
Specifically, in this technical solution the convolutional neural network includes a convolutional layer, an activation layer, and a pooling layer, as well as a deconvolution layer that replaces the original fully connected layer. After the image to be segmented is obtained, the convolutional layer classifies the features in the image to be segmented, that is, its pixels, according to different feature types or subjects; the activation layer then highlights the important features in the classification result; and the pooling layer shrinks the parameter matrix of the data from the activation layer, reducing the data volume and the number of parameters to be processed in the next step, which both speeds up computation and prevents overfitting.
In the convolutional neural network of the related art, the size of the output image gradually decreases after each convolution step, and when the fully connected layer is finally reached, the category probability information obtained is one-dimensional: it can only identify the category of the entire image, not the category of each pixel, and the results are particularly unsatisfactory at image edges. Therefore, in the technical solution of the present application, the fully connected layer is replaced by a deconvolution layer. Deconvolution is equivalent to running an ordinary convolution in reverse: for example, with a 2x2 input matrix and a 3x3 convolution kernel, setting the deconvolution parameters pad=0 and stride=1 produces a 4x4 output matrix, which amounts to completely inverting the convolution. Here, convolution corresponds to down-sampling processing, and deconvolution corresponds to up-sampling processing.
Therefore, after each deconvolution, that is, each up-sampling step, the dimensions of the output image are gradually restored, so the features of each pixel become more accurate after every deconvolution. Through the technical solution of the present application, the output image of the convolutional neural network is thus restored in pixel dimensions, which facilitates effective classification of the features of the output image and improves the accuracy of image semantic segmentation.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative work.
Fig. 1 shows a flowchart of an image segmentation method according to an embodiment of the present application;
Fig. 2 shows a schematic diagram of image segmentation according to an embodiment of the present application;
Fig. 3 shows a flowchart of an image segmentation method according to another embodiment of the present application;
Fig. 4 shows a block diagram of an image segmentation device according to an embodiment of the present application;
Fig. 5 shows a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to better understand the technical solutions of the present application, the embodiments of the present application are described in detail below with reference to the accompanying drawings.
It should be clear that the described embodiments are only some of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
The terms used in the embodiments of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application. The singular forms "a", "said", and "the" used in the embodiments of the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
Fig. 1 shows a flowchart of an image segmentation method according to an embodiment of the present application.
As shown in Fig. 1, the image segmentation method of an embodiment of the present application includes the following steps.
Step 102: Acquire an image to be segmented.
Step 104: Perform convolution, activation, and pooling processing on the image to be segmented to obtain five pooling feature sets.
The convolutional neural network includes a convolutional layer, an activation layer, and a pooling layer, as well as a deconvolution layer that replaces the original fully connected layer. After the image to be segmented is obtained, the convolutional layer classifies the features in the image, that is, its pixels, according to different feature types or subjects; the activation layer then highlights the important features in the classification result; and the pooling layer shrinks the parameter matrix of the data from the activation layer, reducing the data volume and the number of parameters to be processed in the next step, which both speeds up computation and prevents overfitting.
Step 106: Perform up-sampling processing on a specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to the predetermined down-sampling multiple of the image to be segmented.
In the convolutional neural network of the related art, the size of the output image gradually decreases after each convolution step, and when the fully connected layer is finally reached, the category probability information obtained is one-dimensional: it can only identify the category of the entire image, not the category of each pixel, and the results are particularly unsatisfactory at image edges. Therefore, in the technical solution of the present application, the fully connected layer is replaced by a deconvolution layer. Deconvolution is equivalent to running an ordinary convolution in reverse: for example, with a 2x2 input matrix and a 3x3 convolution kernel, setting the deconvolution parameters pad=0 and stride=1 produces a 4x4 output matrix, which amounts to completely inverting the convolution. Here, convolution corresponds to down-sampling processing, and deconvolution corresponds to up-sampling processing.
Therefore, after each deconvolution, that is, each up-sampling step, the dimensions of the output image are gradually restored, so the features of each pixel become more accurate after every deconvolution. Through the technical solution of the present application, the output image of the convolutional neural network is thus restored in pixel dimensions, which facilitates effective classification of the features of the output image and improves the accuracy of image semantic segmentation.
The up-sampling processing includes interpolation processing and deconvolution processing. Interpolation refers to inserting new elements between pixels, on the basis of the original image pixels, using a suitable interpolation algorithm; deconvolution refers to compressing the basic wavelet to improve the vertical resolution of the data. Both methods can effectively improve the accuracy of the image.
Step 108: During the up-sampling processing, calculate the total mask score according to the intersection over union of the predicted mask and the actual mask and the mask score of the original network classification of the image to be segmented.
Step 110: Segment the final result of the up-sampling processing based on the total mask score using a smooth L2 loss function to obtain a segmented image.
After each deconvolution, that is, each up-sampling step, a fully connected layer is added to predict the mask IoU, and the smooth L2 loss function is then used to regress the mask IoU; the image segmentation effect is best when the weight of the smooth L2 loss function is set to 1. Specifically, during the up-sampling processing, the total mask score needs to be calculated according to the intersection over union (IoU) of the prediction mask and the ground truth mask and the mask score of the original network classification of the image to be segmented. The intersection over union is the ratio of the intersection to the union of two bounding boxes: if the union of the two boxes is region a and their intersection is region b, the IoU is the ratio of b to a. The total mask score then equals the product of the IoU of the predicted mask and the actual mask and the mask score of the original network classification of the image to be segmented, so that when the classification score is high but the IoU used in the calculation is low, the branch producing the total mask score is penalized. The total mask score can therefore be trained toward an optimum during up-sampling, yielding an optimized up-sampling result.
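As a hedged sketch of this step, the fragment below adds a small fully connected head that predicts the mask IoU from an up-sampled feature map and regresses it with a squared-error loss whose weight is 1, as reported above; the feature dimensions and the exact loss form are assumptions for the example.

```python
import torch
import torch.nn as nn

class MaskIoUHead(nn.Module):
    # fully connected layer added after an up-sampling step to predict the mask IoU
    def __init__(self, in_features):
        super().__init__()
        self.fc = nn.Linear(in_features, 1)

    def forward(self, feature_map):
        return self.fc(feature_map.flatten(1)).squeeze(1)  # one IoU prediction per image

def mask_iou_regression_loss(predicted_iou, measured_iou, weight=1.0):
    # regress the predicted IoU toward the IoU measured against the actual mask
    return weight * torch.mean((predicted_iou - measured_iou) ** 2)

head = MaskIoUHead(in_features=21 * 14 * 14)          # assumed flattened feature size
pred = head(torch.randn(2, 21, 14, 14))
loss = mask_iou_regression_loss(pred, torch.tensor([0.8, 0.6]))
print(pred.shape, loss.item())
```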
Finally, the final result of the up-sampling processing is segmented based on the total mask score through a smooth L2 loss function to obtain a segmented image. The smooth L2 loss function is also known as the least square error: in essence, it minimizes the sum of the squared differences between the target value and the estimated value, which keeps feature weights from becoming too large and makes them more even, helping to obtain an optimized segmented image.
In addition, in one implementation of the present application, the smooth L2 loss function may optionally be combined with the softmax function for image segmentation; that is, the softmax function is applied on top of the segmentation result of the smooth L2 loss function for more precise segmentation. The softmax function, or normalized exponential function, normalizes the gradient logarithm of a discrete probability distribution over a finite number of items; it maps the outputs of multiple neurons into the (0, 1) interval, which can be interpreted as the probability that the current output belongs to each category, so that the category with the highest probability can be selected as the prediction target. Compared with other functions that could be used to select a maximum, softmax uses the exponential, which makes large values larger and small values smaller, increasing the contrast between categories and making the neural network learn more efficiently.
Through the above technical solutions, and in view of the insufficient accuracy of image semantic segmentation in the related art, replacing the fully connected layer with a deconvolution layer and additionally adding another fully connected layer to classify each pixel of the image can improve the accuracy of image semantic segmentation.
Fig. 2 shows a schematic diagram of image segmentation according to an embodiment of the present application.
As shown in Fig. 2, w represents the width and h represents the height. The image to be segmented (image), whose width and height are w and h, is convolved and pooled to generate the first pooling feature set (pool1), with width and height reduced to w/2 and h/2; the first pooling feature set is convolved and pooled to generate the second pooling feature set (pool2), with width and height reduced to w/4 and h/4; the second pooling feature set is convolved and pooled to generate the third pooling feature set (pool3), with width and height reduced to w/8 and h/8; the third pooling feature set is convolved and pooled to generate the fourth pooling feature set (pool4), with width and height reduced to w/16 and h/16; and the fourth pooling feature set is convolved and pooled to generate the fifth pooling feature set (pool5), with width and height reduced to w/32 and h/32. At this point, the resolution of the picture has been greatly reduced along with its width and height, resulting in lower image quality.
Therefore, deconvolution, that is, up-sampling processing, can be used. Deconvolution is equivalent to running an ordinary convolution in reverse: for example, with a 2x2 input matrix and a 3x3 convolution kernel, setting pad=0 and stride=1 outputs a 4x4 matrix, which amounts to completely inverting the convolution. Up-sampling can therefore increase the original resolution, and when applied to the pooling feature sets produced by convolution and pooling, it restores their resolution.
Specifically, when the predetermined down-sampling multiple of the image to be segmented is 32, the fifth pooling feature set among the five pooling feature sets is subjected to 32x up-sampling processing, and the result of the 32x up-sampling is then segmented with softmax, thereby realizing a 32x restoration of the fifth pooling feature set and improving the accuracy of the result obtained by the 32x up-sampling processing.
When the predetermined down-sampling multiple of the image to be segmented is 16, the fifth pooling feature set among the five pooling feature sets is subjected to 2x up-sampling processing to obtain a first up-sampling feature set; the first up-sampling feature set is fused with the fourth pooling feature set of the five pooling feature sets to obtain the final result of the up-sampling processing, and the final result is then segmented with softmax, thereby realizing the restoration of the fourth pooling feature set and improving the accuracy of the result obtained by the 16x up-sampling processing.
Simply restoring the fourth pooling feature set by 16x can improve the accuracy of the result to a certain extent. However, since the fifth pooling feature set has already been generated, that is, since the fourth pooling feature set has already been further filtered and its features highlighted in the 32x down-sampled fifth pooling feature set, the fifth pooling feature set can be used effectively: up-sampling it by 2x restores its width and height to w/16 and h/16, the same as the fourth pooling feature set, so the two can be fused, and 16x up-sampling is performed after the fusion. The fusion referred to here means merging, one by one, the features of the pixels of the fourth pooling feature set with the features of the pixels obtained by 2x up-sampling of the fifth pooling feature set.
Therefore, compared with simply restoring the fourth pooling feature set by 16x, this further improves the accuracy of the up-sampling result, helps to further sharpen the image edges, and improves the accuracy of classification at the image edges.
When the predetermined down-sampling multiple of the image to be segmented is 8, 2-fold up-sampling processing is performed on the fifth pooling feature set among the five pooling feature sets to obtain a first up-sampled feature set; the first up-sampled feature set is fused with the fourth pooling feature set among the five pooling feature sets to obtain a fusion result; 2-fold up-sampling processing is performed on the fusion result to obtain a second up-sampled feature set; the second up-sampled feature set is fused with the third pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing, and softmax segmentation is then performed on the final result. This restores the third pooling feature set and improves the accuracy of the result obtained from the 8-fold up-sampling.
Simply up-sampling the third pooling feature set 8-fold can improve the accuracy of the result to a certain extent. However, the fourth and fifth pooling feature sets have already been generated; that is, the third pooling feature set has been further filtered and highlighted in the 16-fold down-sampled fourth pooling feature set, which in turn has been further filtered and highlighted in the 32-fold down-sampled fifth pooling feature set, so these down-sampling results can be put to use. The fifth pooling feature set is first restored 2-fold to a length and width of w/16 and h/16, the same as the fourth pooling feature set, and can therefore be fused directly with the fourth pooling feature set. This fusion merges, pixel by pixel, the features of the fourth pooling feature set with the pixel features obtained by 2-fold up-sampling of the fifth pooling feature set, which completes a first correction of the pixel features of the fourth pooling feature set and makes them more discriminative for classification. The fused result is then up-sampled 2-fold to a length and width of w/8 and h/8, the same as the third pooling feature set, so that it can be fused with the third pooling feature set; 8-fold up-sampling processing is performed after this second fusion. In this way, the features filtered and highlighted through the fourth and fifth pooling feature sets correct the pixel features of the third pooling feature set, so the pixel features in the final fusion result are more accurate and better suited to classification.
Therefore, compared with simply restoring the third pooling feature set 8-fold, this further improves the accuracy of the up-sampling result, helps to sharpen image edges, and improves the accuracy with which image edges are classified.
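Continuing the same assumptions (additive pixel-by-pixel fusion, bilinear up-sampling), the 8-fold branch with its two fusions can be sketched as:

```python
# Sketch of the 8x branch: pool5 corrects pool4 (first fusion), the result is
# up-sampled 2x and corrects pool3 (second fusion), then up-sampled 8x.
import torch.nn.functional as F

def branch_8x(score_pool3, score_pool4, score_pool5):
    up5 = F.interpolate(score_pool5, scale_factor=2,
                        mode='bilinear', align_corners=False)
    fused45 = up5 + score_pool4             # first fusion, at (h/16, w/16)
    up45 = F.interpolate(fused45, scale_factor=2,
                         mode='bilinear', align_corners=False)
    fused345 = up45 + score_pool3           # second fusion, at (h/8, w/8)
    up = F.interpolate(fused345, scale_factor=8,
                       mode='bilinear', align_corners=False)
    return up.softmax(dim=1)
```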
Fig. 3 shows a flowchart of an image segmentation method according to another embodiment of the present application.
As shown in Fig. 3, the flow of the image segmentation method of another embodiment of the present application includes:
Step 302: obtain an image to be segmented.
Step 304: perform convolution, activation and pooling processing on the image to be segmented to obtain five pooling feature sets.
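The down-sampling side of step 304 can be sketched as a stack of five convolution + activation + pooling stages; the channel counts and input size below are illustrative assumptions, not values from the disclosure:

```python
# Hypothetical backbone sketch: each stage halves the spatial size, so an
# h x w input yields pooling feature sets at h/2, h/4, h/8, h/16 and h/32.
import torch
import torch.nn as nn

def stage(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(inplace=True),
                         nn.MaxPool2d(2, 2))

stages = nn.ModuleList([stage(3, 64), stage(64, 128), stage(128, 256),
                        stage(256, 512), stage(512, 512)])

x = torch.randn(1, 3, 224, 224)
pools = []
for s in stages:
    x = s(x)
    pools.append(x)                    # pool1 .. pool5
print([p.shape[-1] for p in pools])    # [112, 56, 28, 14, 7]
```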
Step 306: perform up-sampling processing on a specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to the predetermined down-sampling multiple of the image to be segmented.
Step 308: determine whether the number of fusions performed in the up-sampling processing equals the specified number of fusions for the predetermined down-sampling multiple. If yes, proceed to step 310; if no, return to step 306 and continue the up-sampling processing, including its fusion steps.
With reference to the embodiment shown in Fig. 2, when the predetermined down-sampling multiple of the image to be segmented is 32, there is no subsequent, more refined feature set after the fifth pooling feature set, so a single up-sampling operation suffices and the corresponding specified number of fusions is 0. When the predetermined down-sampling multiple of the image to be segmented is 16, the fourth pooling feature set is followed by the fifth pooling feature set, whose features are more refined, so one fusion with the 2-fold up-sampled fifth pooling feature set is required. Similarly, when the predetermined down-sampling multiple of the image to be segmented is 8, the third pooling feature set is followed by the more refined fourth and fifth pooling feature sets, so two fusions are required.
Therefore, each predetermined down-sampling multiple corresponds to a number of fusions that must be reached. By checking the number of fusions performed during the up-sampling processing, it can be determined whether the up-sampling step may end and the image segmentation step may begin, and outputting an up-sampling result before the fusion count is reached, that is, while the features are insufficiently restored, is avoided. This effective monitoring of the up-sampling processing further guarantees the accuracy of the final result.
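A hypothetical sketch of this check is shown below; the mapping from down-sampling multiple to required fusion count follows the cases discussed above (32: 0 fusions, 16: 1 fusion, 8: 2 fusions):

```python
# Step 308 as a simple check: up-sampling only ends once the number of fusions
# performed matches the count required by the predetermined down-sampling multiple.
REQUIRED_FUSIONS = {32: 0, 16: 1, 8: 2}

def upsampling_finished(downsample_multiple: int, fusions_done: int) -> bool:
    return fusions_done == REQUIRED_FUSIONS[downsample_multiple]
```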
Step 310: segment the final result of the up-sampling processing through the smooth L2 loss function and the softmax function to obtain a segmented image.
After each deconvolution (up-sampling) step, a fully connected layer is added to predict the mask IoU, and the smooth L2 loss function is then used to regress the mask IoU. The image segmentation effect is best when the weight of the smooth L2 loss function is set to 1. Specifically, during the up-sampling processing, the total mask score (mask score) is calculated from the intersection-over-union (IoU) of the prediction mask and the ground-truth mask together with the mask score of the original network classification of the image to be segmented. The IoU is the ratio of the intersection of two bounding boxes to their union: if the union of the two bounding boxes is region a and their intersection is region b, the IoU is b/a. The total mask score then equals the product of the IoU of the prediction mask and the ground-truth mask and the mask score of the original network classification of the image to be segmented, so that a high classification score obtained with a low IoU is penalised in the mask-score branch. In this way, the total mask score can be trained toward its optimum during up-sampling, yielding an optimized up-sampling result.
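A minimal sketch of the mask-score computation described above is given below, assuming binary masks stored as numpy arrays; the function names are illustrative:

```python
# Mask IoU is intersection over union of the prediction mask and the
# ground-truth mask; the total mask score multiplies it by the classification
# score of the original network, so a high class score with a low IoU is penalised.
import numpy as np

def mask_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def total_mask_score(pred_mask, gt_mask, cls_score: float) -> float:
    return mask_iou(pred_mask, gt_mask) * cls_score
```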
Finally, the smooth L2 loss function is used to segment the final result of the up-sampling processing based on the total mask score, yielding the segmented image. The smooth L2 loss function is also known as the least-square error: it minimises the sum of the squares of the differences between target values and estimated values, which keeps the feature weights from becoming too large and makes them more even. In this way, by replacing the fully connected layers with deconvolution layers and adding one extra fully connected layer, a segmented image with an optimized effect is obtained.
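Read as a least-square error, the loss used to regress the predicted mask IoU can be sketched as follows (a hypothetical illustration; the weight of 1 mirrors the setting stated above):

```python
# Squared-error ("smooth L2" in the text) regression of the predicted mask IoU
# toward its target; minimising it keeps feature weights from growing too large.
def l2_loss(predicted_iou: float, target_iou: float, weight: float = 1.0) -> float:
    return weight * (predicted_iou - target_iou) ** 2
```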
Building on the segmentation result of the smooth L2 loss function, the softmax function is then used for precise segmentation. The softmax function, also called the normalized exponential function, is a gradient log-normalisation of a finite discrete probability distribution. Softmax maps the outputs of multiple neurons into the interval (0, 1); the mapped values can be regarded as the probabilities that the current output belongs to each class, which makes it easy to select the class with the highest probability as the prediction target. Compared with other functions that select a maximum, softmax uses exponentiation, which makes large values larger and small values smaller, increases the contrast between classes, and makes the learning of the neural network more efficient.
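A standard softmax implementation illustrates the behaviour described above; the maximum is subtracted only for numerical stability and does not change the result:

```python
# Softmax maps raw scores into (0, 1) so that they sum to 1; exponentiation
# widens the gap between large and small scores.
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    e = np.exp(scores - scores.max())
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # approx. [0.659, 0.242, 0.099]
```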
In summary, by replacing the fully connected layers with deconvolution layers and adding one extra fully connected layer so that every pixel of the image is classified, the accuracy of image semantic segmentation can be improved.
Fig. 4 shows a block diagram of an image segmentation device according to an embodiment of the present application.
As shown in Fig. 4, the image segmentation device 400 of an embodiment of the present application includes: an image acquisition unit 402, configured to acquire an image to be segmented; a down-sampling processing unit 404, configured to perform convolution, activation and pooling processing on the image to be segmented to obtain five pooling feature sets; an up-sampling processing unit 406, configured to perform up-sampling processing on a specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to the predetermined down-sampling multiple of the image to be segmented; a mask total score calculation unit 408, configured to calculate, during the up-sampling processing, a total mask score according to the intersection-over-union of the prediction mask and the ground-truth mask and the mask score of the original network classification of the image to be segmented; and an image segmentation unit 410, configured to segment the final result of the up-sampling processing based on the total mask score through the smooth L2 loss function to obtain a segmented image.
In the above embodiment of the present application, optionally, the up-sampling processing unit 406 includes: a first processing unit, configured to perform 32-fold up-sampling processing on the fifth pooling feature set among the five pooling feature sets when the predetermined down-sampling multiple of the image to be segmented is 32.
In the above embodiment of the present application, optionally, the up-sampling processing unit 406 includes: a second processing unit, configured to perform 2-fold up-sampling processing on the fifth pooling feature set among the five pooling feature sets to obtain a first up-sampled feature set when the predetermined down-sampling multiple of the image to be segmented is 16; and a first fusion unit, configured to fuse the first up-sampled feature set with the fourth pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing.
In the above embodiment of the present application, optionally, the up-sampling processing unit 406 includes: a second processing unit, configured to perform 2-fold up-sampling processing on the fifth pooling feature set among the five pooling feature sets to obtain a first up-sampled feature set when the predetermined down-sampling multiple of the image to be segmented is 8; a first fusion unit, configured to fuse the first up-sampled feature set with the fourth pooling feature set among the five pooling feature sets to obtain a fusion result; a third processing unit, configured to perform 2-fold up-sampling processing on the fusion result to obtain a second up-sampled feature set; and a second fusion unit, configured to fuse the second up-sampled feature set with the third pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing.
In the above embodiment of the present application, optionally, the up-sampling processing includes interpolation processing and deconvolution processing.
The image segmentation device 400 uses the solution described in any one of the embodiments shown in Figs. 1 to 3 and therefore has all of the technical effects described above, which will not be repeated here.
Fig. 5 shows a block diagram of an electronic device according to an embodiment of the present application.
As shown in Fig. 5, an electronic device 500 of an embodiment of the present application includes at least one memory 502 and a processor 504 communicatively connected to the at least one memory 502, wherein the memory stores instructions executable by the at least one processor 504, and the instructions are configured to execute the solution described in any one of the embodiments of Figs. 1 to 3. Therefore, the electronic device 500 has the same technical effects as any one of the embodiments of Figs. 1 to 3, which will not be repeated here.
The electronic devices in the embodiments of the present application exist in various forms, including but not limited to:
(1) Mobile communication devices: characterised by mobile communication functions, with voice and data communication as the main goal. Such terminals include smart phones (e.g. the iPhone), multimedia phones, feature phones and low-end phones.
(2) Ultra-mobile personal computer devices: these belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID and UMPC devices, e.g. the iPad.
(3) Portable entertainment devices: these can display and play multimedia content. Such devices include audio and video players (e.g. the iPod), handheld game consoles, e-book readers, smart toys and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capacity, stability, reliability, security, scalability and manageability.
(5) Other electronic devices with data interaction functions.
In addition, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the method flow described in any one of the embodiments of Figs. 1 to 3.
The technical solutions of the present application have been described in detail above with reference to the accompanying drawings. Through these technical solutions, the output image of the convolutional neural network is restored in the pixel dimension, which facilitates effective classification of the features of the output image and improves the accuracy of image semantic segmentation.
It should be understood that the term "and/or" used herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that, although the terms first, second and so on may be used in the embodiments of the present application to describe pooling feature sets, these pooling feature sets should not be limited by such terms. The terms are only used to distinguish pooling feature sets from one another. For example, without departing from the scope of the embodiments of the present application, a first pooling feature set may also be called a second pooling feature set and, similarly, a second pooling feature set may also be called a first pooling feature set.
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
In the several embodiments provided in this application, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division of the units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, devices or units, and may be electrical, mechanical or of another form.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. Such a software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device and so on) or a processor to execute some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The above are only preferred embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (20)

  1. An image segmentation method, characterized by comprising:
    obtaining an image to be segmented;
    performing convolution, activation and pooling processing on the image to be segmented to obtain five pooling feature sets;
    performing up-sampling processing on a specified pooling feature set among the five pooling feature sets according to an up-sampling mode corresponding to a predetermined down-sampling multiple of the image to be segmented;
    during the up-sampling processing, calculating a total mask score according to the intersection-over-union of a prediction mask and a ground-truth mask and the mask score of the original network classification of the image to be segmented;
    segmenting a final result of the up-sampling processing based on the total mask score through a smooth L2 loss function to obtain a segmented image.
  2. The image segmentation method according to claim 1, wherein the step of performing up-sampling processing on the specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to the predetermined down-sampling multiple of the image to be segmented comprises:
    when the predetermined down-sampling multiple of the image to be segmented is 32, performing 32-fold up-sampling processing on a fifth pooling feature set among the five pooling feature sets.
  3. The image segmentation method according to claim 1, wherein the step of performing up-sampling processing on the specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to the predetermined down-sampling multiple of the image to be segmented comprises:
    when the predetermined down-sampling multiple of the image to be segmented is 16, performing 2-fold up-sampling processing on a fifth pooling feature set among the five pooling feature sets to obtain a first up-sampled feature set;
    fusing the first up-sampled feature set with a fourth pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing.
  4. The image segmentation method according to claim 1, wherein the step of performing up-sampling processing on the specified pooling feature set among the five pooling feature sets according to the up-sampling mode corresponding to the predetermined down-sampling multiple of the image to be segmented comprises:
    when the predetermined down-sampling multiple of the image to be segmented is 8, performing 2-fold up-sampling processing on a fifth pooling feature set among the five pooling feature sets to obtain a first up-sampled feature set;
    fusing the first up-sampled feature set with a fourth pooling feature set among the five pooling feature sets to obtain a fusion result;
    performing 2-fold up-sampling processing on the fusion result to obtain a second up-sampled feature set;
    fusing the second up-sampled feature set with a third pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing.
  5. The image segmentation method according to any one of claims 1 to 4, wherein
    the up-sampling processing comprises interpolation processing and deconvolution processing.
  6. An image segmentation device, characterized by comprising:
    an image acquisition unit, configured to acquire an image to be segmented;
    a down-sampling processing unit, configured to perform convolution, activation and pooling processing on the image to be segmented to obtain five pooling feature sets;
    an up-sampling processing unit, configured to perform up-sampling processing on a specified pooling feature set among the five pooling feature sets according to an up-sampling mode corresponding to a predetermined down-sampling multiple of the image to be segmented;
    a mask total score calculation unit, configured to calculate, during the up-sampling processing, a total mask score according to the intersection-over-union of a prediction mask and a ground-truth mask and the mask score of the original network classification of the image to be segmented;
    an image segmentation unit, configured to segment a final result of the up-sampling processing based on the total mask score through a smooth L2 loss function to obtain a segmented image.
  7. The image segmentation device according to claim 6, wherein the up-sampling processing unit comprises:
    a first processing unit, configured to perform 32-fold up-sampling processing on a fifth pooling feature set among the five pooling feature sets when the predetermined down-sampling multiple of the image to be segmented is 32.
  8. The image segmentation device according to claim 6, wherein the up-sampling processing unit comprises:
    a second processing unit, configured to perform 2-fold up-sampling processing on a fifth pooling feature set among the five pooling feature sets to obtain a first up-sampled feature set when the predetermined down-sampling multiple of the image to be segmented is 16;
    a first fusion unit, configured to fuse the first up-sampled feature set with a fourth pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing.
  9. The image segmentation device according to claim 6, wherein the up-sampling processing unit comprises:
    a second processing unit, configured to perform 2-fold up-sampling processing on a fifth pooling feature set among the five pooling feature sets to obtain a first up-sampled feature set when the predetermined down-sampling multiple of the image to be segmented is 8;
    a first fusion unit, configured to fuse the first up-sampled feature set with a fourth pooling feature set among the five pooling feature sets to obtain a fusion result;
    a third processing unit, configured to perform 2-fold up-sampling processing on the fusion result to obtain a second up-sampled feature set;
    a second fusion unit, configured to fuse the second up-sampled feature set with a third pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing.
  10. The image segmentation device according to any one of claims 6 to 9, wherein
    the up-sampling processing comprises interpolation processing and deconvolution processing.
  11. An electronic device, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor;
    wherein the memory stores instructions executable by the at least one processor, and the instructions are configured to execute the following steps:
    obtaining an image to be segmented;
    performing convolution, activation and pooling processing on the image to be segmented to obtain five pooling feature sets;
    performing up-sampling processing on a specified pooling feature set among the five pooling feature sets according to an up-sampling mode corresponding to a predetermined down-sampling multiple of the image to be segmented;
    during the up-sampling processing, calculating a total mask score according to the intersection-over-union of a prediction mask and a ground-truth mask and the mask score of the original network classification of the image to be segmented;
    segmenting a final result of the up-sampling processing based on the total mask score through a smooth L2 loss function to obtain a segmented image.
  12. The electronic device according to claim 11, wherein the instructions are configured to execute the following steps:
    when the predetermined down-sampling multiple of the image to be segmented is 32, performing 32-fold up-sampling processing on a fifth pooling feature set among the five pooling feature sets.
  13. The electronic device according to claim 11, wherein the instructions are configured to execute the following steps:
    when the predetermined down-sampling multiple of the image to be segmented is 16, performing 2-fold up-sampling processing on a fifth pooling feature set among the five pooling feature sets to obtain a first up-sampled feature set;
    fusing the first up-sampled feature set with a fourth pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing.
  14. The electronic device according to claim 11, wherein the instructions are configured to execute the following steps:
    when the predetermined down-sampling multiple of the image to be segmented is 8, performing 2-fold up-sampling processing on a fifth pooling feature set among the five pooling feature sets to obtain a first up-sampled feature set;
    fusing the first up-sampled feature set with a fourth pooling feature set among the five pooling feature sets to obtain a fusion result;
    performing 2-fold up-sampling processing on the fusion result to obtain a second up-sampled feature set;
    fusing the second up-sampled feature set with a third pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing.
  15. The electronic device according to any one of claims 11 to 14, wherein
    the up-sampling processing comprises interpolation processing and deconvolution processing.
  16. A computer-readable storage medium, characterized in that computer-executable instructions are stored thereon, the computer-executable instructions being used to execute the following steps:
    obtaining an image to be segmented;
    performing convolution, activation and pooling processing on the image to be segmented to obtain five pooling feature sets;
    performing up-sampling processing on a specified pooling feature set among the five pooling feature sets according to an up-sampling mode corresponding to a predetermined down-sampling multiple of the image to be segmented;
    during the up-sampling processing, calculating a total mask score according to the intersection-over-union of a prediction mask and a ground-truth mask and the mask score of the original network classification of the image to be segmented;
    segmenting a final result of the up-sampling processing based on the total mask score through a smooth L2 loss function to obtain a segmented image.
  17. The computer-readable storage medium according to claim 16, wherein the computer-executable instructions are used to execute the following steps:
    when the predetermined down-sampling multiple of the image to be segmented is 32, performing 32-fold up-sampling processing on a fifth pooling feature set among the five pooling feature sets.
  18. The computer-readable storage medium according to claim 16, wherein the computer-executable instructions are used to execute the following steps:
    when the predetermined down-sampling multiple of the image to be segmented is 16, performing 2-fold up-sampling processing on a fifth pooling feature set among the five pooling feature sets to obtain a first up-sampled feature set;
    fusing the first up-sampled feature set with a fourth pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing.
  19. The computer-readable storage medium according to claim 16, wherein the computer-executable instructions are used to execute the following steps:
    when the predetermined down-sampling multiple of the image to be segmented is 8, performing 2-fold up-sampling processing on a fifth pooling feature set among the five pooling feature sets to obtain a first up-sampled feature set;
    fusing the first up-sampled feature set with a fourth pooling feature set among the five pooling feature sets to obtain a fusion result;
    performing 2-fold up-sampling processing on the fusion result to obtain a second up-sampled feature set;
    fusing the second up-sampled feature set with a third pooling feature set among the five pooling feature sets to obtain the final result of the up-sampling processing.
  20. The computer-readable storage medium according to any one of claims 16 to 19, wherein
    the up-sampling processing comprises interpolation processing and deconvolution processing.
PCT/CN2019/118294 2019-07-05 2019-11-14 Image segmentation method, electronic device, and computer-readable storage medium WO2021003936A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910602691.5 2019-07-05
CN201910602691.5A CN110490203B (en) 2019-07-05 2019-07-05 Image segmentation method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2021003936A1 true WO2021003936A1 (en) 2021-01-14

Family

ID=68546051

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118294 WO2021003936A1 (en) 2019-07-05 2019-11-14 Image segmentation method, electronic device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN110490203B (en)
WO (1) WO2021003936A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340820B (en) * 2020-02-10 2022-05-17 中国科学技术大学 Image segmentation method and device, electronic equipment and storage medium
CN111340813B (en) * 2020-02-25 2023-09-01 北京字节跳动网络技术有限公司 Image instance segmentation method and device, electronic equipment and storage medium
CN111523548B (en) * 2020-04-24 2023-11-28 北京市商汤科技开发有限公司 Image semantic segmentation and intelligent driving control method and device
CN113744276A (en) * 2020-05-13 2021-12-03 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, electronic device, and readable storage medium
CN112150470B (en) * 2020-09-22 2023-10-03 平安科技(深圳)有限公司 Image segmentation method, device, medium and electronic equipment
CN113160263A (en) * 2021-03-30 2021-07-23 电子科技大学 Improved method based on YOLACT instance segmentation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171663A (en) * 2017-12-22 2018-06-15 哈尔滨工业大学 The image completion system for the convolutional neural networks that feature based figure arest neighbors is replaced
CN108230329A (en) * 2017-12-18 2018-06-29 孙颖 Semantic segmentation method based on multiple dimensioned convolutional neural networks
CN109636807A (en) * 2018-11-27 2019-04-16 宿州新材云计算服务有限公司 A kind of grape disease blade split plot design of image segmentation and pixel recovery
CN109816011A (en) * 2019-01-21 2019-05-28 厦门美图之家科技有限公司 Generate the method and video key frame extracting method of portrait parted pattern
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN109886971A (en) * 2019-01-24 2019-06-14 西安交通大学 A kind of image partition method and system based on convolutional neural networks

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130189A1 (en) * 2017-10-30 2019-05-02 Qualcomm Incorporated Suppressing duplicated bounding boxes from object detection in a video analytics system
CN109584251A (en) * 2018-12-06 2019-04-05 湘潭大学 A kind of tongue body image partition method based on single goal region segmentation
CN109784283B (en) * 2019-01-21 2021-02-09 陕西师范大学 Remote sensing image target extraction method based on scene recognition task
CN109800735A (en) * 2019-01-31 2019-05-24 中国人民解放军国防科技大学 Accurate detection and segmentation method for ship target


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GDTOP818: "Mask Scoring R-CNN[Detailed]", 6 March 2019 (2019-03-06), pages 1 - 10, XP009525484, Retrieved from the Internet <URL:https://blog.csdn.net/weixin_37993251/article/details/88248361> *

Also Published As

Publication number Publication date
CN110490203A (en) 2019-11-22
CN110490203B (en) 2023-11-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19937242; Country of ref document: EP; Kind code of ref document: A1)
122 Ep: pct application non-entry in european phase (Ref document number: 19937242; Country of ref document: EP; Kind code of ref document: A1)