CN114022679A - Image segmentation method, model training device and electronic equipment - Google Patents

Image segmentation method, model training device and electronic equipment

Info

Publication number
CN114022679A
CN114022679A
Authority
CN
China
Prior art keywords
image
neural network
features
deep neural
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111430927.5A
Other languages
Chinese (zh)
Inventor
刘仕通
雷翔
田鑫钰
何春鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Cisai Tech Co Ltd
Original Assignee
Chongqing Cisai Tech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Cisai Tech Co Ltd filed Critical Chongqing Cisai Tech Co Ltd
Priority to CN202111430927.5A priority Critical patent/CN114022679A/en
Publication of CN114022679A publication Critical patent/CN114022679A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks


Abstract

The application provides an image segmentation method, a model training method and apparatus, and an electronic device, wherein the method comprises the following steps: downsampling an input first image to obtain a second image with a specified resolution; performing feature extraction on the first image and the second image through a deep neural network model to obtain I first-class image features of different sizes and J second-class image features of different sizes; performing feature enhancement on part or all of the I first-class image features and the J second-class image features through the deep neural network, and fusing the enhanced image features to obtain fused image features; and segmenting the fused image features through the deep neural network model and outputting a segmentation result representing the first image. The method improves the detection precision of small targets while still accounting for large and medium targets, thereby improving the robustness and accuracy of the algorithm.

Description

Image segmentation method, model training device and electronic equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image segmentation method, a model training method, an apparatus, and an electronic device.
Background
In the field of machine vision, various real objects can be classified using corresponding machine vision algorithms. For example, when using a two-stage detection algorithm in deep learning, consisting of a Region Proposal Network (RPN) followed by classification and regression, the scale differences of various targets in an image and interference factors such as water stains can make semantic segmentation difficult and degrade detection accuracy.
Disclosure of Invention
An object of the embodiments of the present application is to provide an image segmentation method, a model training method, an image segmentation apparatus, and an electronic device, which can improve the accuracy of target detection as well as the robustness and accuracy of the algorithm.
In order to achieve the above object, embodiments of the present application are implemented as follows:
in a first aspect, an embodiment of the present application provides an image segmentation method, where the method includes: the method comprises the steps of carrying out downsampling on an input first image to obtain a second image with specified resolution, wherein the specified resolution is lower than the resolution of the first image; performing feature extraction on the first image through a tested deep neural network model to obtain I first-class image features with different sizes, and performing feature extraction on the second image through the tested deep neural network model to obtain J second-class image features with different sizes, wherein I, J are integers larger than 0; performing feature enhancement on part or all of the I first-class image features and the J second-class image features through the deep neural network, and fusing the enhanced image features to obtain fused image features; and segmenting the fused image features through the deep neural network model, and outputting a segmentation result representing the first image.
In the above embodiment, feature extraction at different sizes is performed on both the high-resolution first image and the low-resolution second image, which increases the richness of multi-scale features, improves the detection precision of small targets while still accounting for large and medium targets, and thereby improves the robustness and accuracy of the algorithm.
With reference to the first aspect, in some optional embodiments, the deep neural network model comprises a feature pyramid network; performing feature extraction on the first image through the tested deep neural network model to obtain I first-class image features of different sizes comprises the following steps: performing feature extraction on the first image according to a preset size proportion through the feature pyramid network to obtain the 1st image feature; performing feature extraction on the ith image feature according to the preset size proportion through the feature pyramid network to obtain the (i+1)th image feature, where i runs sequentially from 1 to I-1 and extraction stops when I image features are obtained, i being a positive integer, and where, when i is 1, the ith image feature is the 1st image feature; and performing convolution fusion on the I image features to obtain the I first-class image features.
With reference to the first aspect, in some optional embodiments, the deep neural network model comprises a feature pyramid network;
performing feature extraction on the second image through the deep neural network model to obtain J second-class image features with different sizes, wherein the feature extraction comprises the following steps:
performing feature extraction on the second image according to a preset size proportion through the feature pyramid network to obtain the 1st image feature;
performing feature extraction on the jth image feature according to the preset size proportion through the feature pyramid network to obtain the (j+1)th image feature, where j runs sequentially from 1 to J-1 and extraction stops when J image features are obtained, j being a positive integer, and where, when j is 1, the jth image feature is the 1st image feature;
and performing convolution fusion on the J image features to obtain J second-class image features.
With reference to the first aspect, in some optional embodiments, performing feature enhancement on part or all of the I first-class image features and the J second-class image features through the deep neural network, and fusing the enhanced image features to obtain fused image features, where the feature enhancement includes:
performing convolution pooling on part or all of the I first-class image features through the deep neural network to obtain enhanced first-class image features;
performing convolution pooling on part or all of the J second-class image features through the deep neural network to obtain enhanced second-class image features;
and performing convolution fusion on the enhanced first type image features and the enhanced second type image features through the deep neural network to obtain the fused image features.
With reference to the first aspect, in some optional embodiments, before downsampling the input first image, the method further comprises:
acquiring an image set, wherein the image set comprises a plurality of images with preset marks;
performing image division on each training image in the image set to obtain a plurality of first image blocks;
generating a plurality of first image block copies according to the plurality of first image blocks, wherein the first image block copies are images obtained by expanding the first image blocks through a preset data expansion algorithm;
and inputting the plurality of first image blocks and the plurality of first image block copies into a preset deep neural network model for training to obtain a trained deep neural network model.
In the above embodiment, each training image is subjected to image segmentation to obtain image blocks, and then corresponding image block copies are generated, so that training image data can be enriched and training data content can be enhanced. And then, the deep neural network model is trained by utilizing the plurality of first image blocks and the plurality of first image block copies, so that the training effect can be improved, and the accuracy of the trained model on target detection can be improved.
With reference to the first aspect, in some optional embodiments, before downsampling the input first image, the method further comprises:
performing image division on each test image in the image set to obtain a plurality of second image blocks;
generating a plurality of second image block copies according to the plurality of second image blocks, wherein the second image block copies are images obtained by expanding the second image blocks through the preset data expansion algorithm;
and inputting the plurality of second image blocks and the plurality of second image block copies into the trained deep neural network model for testing to obtain the tested deep neural network model.
In a second aspect, the present application further provides a model training method, including:
acquiring an image set, wherein the image set comprises a plurality of images with preset marks;
performing image division on each training image in the image set to obtain a plurality of first image blocks;
generating a plurality of first image block copies according to the plurality of first image blocks, wherein the first image block copies are images obtained by expanding the first image blocks through a preset data expansion algorithm;
and inputting the plurality of first image blocks and the plurality of first image block copies into a preset deep neural network model for training to obtain a trained deep neural network model.
In a third aspect, the present application further provides an image segmentation apparatus, comprising:
the down-sampling unit is used for down-sampling an input first image to obtain a second image with specified resolution, and the specified resolution is lower than the resolution of the first image;
the feature extraction unit is used for performing feature extraction on the first image through a tested deep neural network model to obtain I first-class image features with different sizes, and performing feature extraction on the second image through the tested deep neural network model to obtain J second-class image features with different sizes, wherein I, J are integers larger than 0;
the enhancement fusion unit is used for performing feature enhancement on part or all of the image features in the I first-class image features and the J second-class image features through the deep neural network, and fusing the enhanced image features to obtain fused image features;
and the segmentation unit is used for segmenting the fused image features through the deep neural network model and outputting a segmentation result representing the first image.
In a fourth aspect, the present application further provides a model training apparatus, the apparatus comprising:
an acquisition unit configured to acquire an image set including a plurality of images having preset marks;
the dividing unit is used for carrying out image division on each training image in the image set to obtain a plurality of first image blocks;
the image processing device comprises a copy generating unit, a first image block generating unit and a second image block generating unit, wherein the copy generating unit is used for generating a plurality of first image block copies according to the plurality of first image blocks, and the first image block copies are images obtained by expanding the first image blocks through a preset data expansion algorithm;
and the training unit is used for inputting the plurality of first image blocks and the plurality of first image block copies into a preset deep neural network model for training to obtain a trained deep neural network model.
In a fifth aspect, the present application further provides an electronic device, which includes a processor and a memory coupled to each other, wherein the memory stores a computer program, and when the computer program is executed by the processor, the electronic device executes the image segmentation method or the model training method.
In a sixth aspect, the present application further provides a computer-readable storage medium, in which a computer program is stored, which, when run on a computer, causes the computer to perform the image segmentation method described above, or to perform the model training method described above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flow chart of a model training method provided in an embodiment of the present application.
Fig. 2 is a schematic diagram of image division according to an embodiment of the present application.
Fig. 3 is a schematic network structure diagram of a deep neural network model according to an embodiment of the present disclosure.
Fig. 4 is a block diagram of a model training apparatus according to an embodiment of the present application.
Fig. 5 is a schematic flowchart of an image segmentation method according to an embodiment of the present application.
Fig. 6 is a block diagram of an image segmentation apparatus according to an embodiment of the present application.
Reference numerals: 200 - model training apparatus; 210 - obtaining unit; 220 - dividing unit; 230 - copy generating unit; 240 - training unit; 400 - image segmentation apparatus; 410 - down-sampling unit; 420 - feature extraction unit; 430 - enhancement fusion unit; 440 - segmentation unit.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that the terms "first," "second," and the like are used merely to distinguish one description from another, and are not intended to indicate or imply relative importance. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
First embodiment
Referring to fig. 1, the present application provides a model training method, which can be applied to an electronic device, with the electronic device executing or implementing each step of the method.
Understandably, the electronic device may include a processing module and a memory module. A computer program is stored in the memory module, which when executed by the processing module, enables the electronic device to perform the steps of the image segmentation method or the model training method described below. The electronic device may be, but is not limited to, a personal computer, a server, and the like.
In this embodiment, the model training method may include the following steps:
step S110, obtaining an image set, wherein the image set comprises a plurality of images with preset marks;
step S120, performing image division on each training image in the image set to obtain a plurality of first image blocks;
step S130, generating a plurality of first image block copies according to the plurality of first image blocks, wherein the first image block copies are images obtained by expanding the first image blocks through a preset data expansion algorithm;
step S140, inputting the plurality of first image blocks and the plurality of first image block copies into a preset deep neural network model for training, so as to obtain a trained deep neural network model.
In the above embodiment, each training image is divided into image blocks, and corresponding image block copies are then generated, so that the training image data can be enriched and the training data content enhanced. In addition, in this embodiment, small targets in an image block may also be copied (for example, within the same image block, one or more regions of small targets are copied and the copied regions are fused into the image block) to increase the number and diversity of small targets. What counts as a small target can be flexibly determined according to the actual situation, for example, an inclusion with a small particle size in the image.
And then, the deep neural network model is trained by utilizing the plurality of first image blocks and the plurality of first image block copies, so that the training effect can be improved, the accuracy of the trained model on target detection can be improved, and the low accuracy of model detection caused by small training data volume is avoided.
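For illustration, the small-target copying mentioned above can be sketched as follows (the region box, the number of copies, and the simple overwrite-style fusion are assumptions; the patent leaves these to the actual situation):

```python
import numpy as np

def paste_small_object(block: np.ndarray, box, n_copies=2, rng=None):
    """Copy one small-target region (y0, y1, x0, x1) to random positions
    inside the same image block to increase small-target count and diversity.
    Assumes the patch is smaller than the block."""
    rng = rng or np.random.default_rng()
    y0, y1, x0, x1 = box
    patch = block[y0:y1, x0:x1].copy()
    h, w = patch.shape[:2]
    out = block.copy()
    for _ in range(n_copies):
        ty = rng.integers(0, block.shape[0] - h)
        tx = rng.integers(0, block.shape[1] - w)
        out[ty:ty + h, tx:tx + w] = patch   # simple overwrite-paste fusion
    return out
```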
The individual steps in the process are explained in detail below, as follows:
in step S110, the image set acquired by the electronic device is a set of images prepared by the user in advance for model training and testing. In the image set, each image is provided with a corresponding marker. The marking mode can be flexibly determined according to actual conditions.
For example, in a metal smelting process, when the types of inclusions such as carbon, sulfur, silicon, and boron in the metal to be smelted need to be detected, images of these inclusions are captured, and the region of each inclusion type in the images is marked with the corresponding type. The image set formed by the marked images can then be uploaded to the electronic device by the user.
For another example, when it is necessary to detect whether the smelted metal has defects (e.g., cracks, cavities), the defective parts can be marked manually in the captured images of the smelted metal.
The number of images in the image set may be flexibly determined according to actual situations, and is not particularly limited herein.
In step S120, the manner of dividing the image can be flexibly determined according to the actual situation. For example, when dividing an image, the boundaries of adjacent image blocks may be kept from intersecting, e.g., by dividing the image into blocks in a nine-square-grid manner.
Alternatively, referring to fig. 2, an image A may be divided into 9 image blocks B in a nine-square-grid manner, with overlapping regions between the divided image blocks.
Understandably, since the number of images in an image set is usually limited, if there are overlapping regions between adjacent image blocks, it is beneficial to enrich the image content of the image blocks on the basis of increasing the number of image blocks.
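A minimal sketch of such a nine-square-grid division with overlap (the fixed pixel overlap is an assumption; the patent leaves the division manner to the actual situation):

```python
def nine_grid(image, overlap=32):
    """Divide an H x W image (NumPy array) into 3 x 3 blocks whose
    neighbors overlap by `overlap` pixels along each shared edge."""
    h, w = image.shape[:2]
    bh, bw = h // 3, w // 3
    blocks = []
    for r in range(3):
        for c in range(3):
            y0 = max(r * bh - overlap, 0)
            x0 = max(c * bw - overlap, 0)
            y1 = min((r + 1) * bh + overlap, h)
            x1 = min((c + 1) * bw + overlap, w)
            blocks.append(image[y0:y1, x0:x1])
    return blocks
```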
In step S130, the preset data expansion algorithm can be flexibly determined according to the actual situation. For example, it may be a TTA (Test Time Augmentation) algorithm. Through the TTA algorithm, one image block may be copied to obtain, say, three copies (the number of copies can be flexibly determined according to the actual situation); the first copy is then rotated, the second translated, and the third scaled, yielding three image blocks that differ from the original image block in pose or size. The three image blocks obtained by rotation, translation, and scaling are the image block copies. Together with the original image block, four image blocks are obtained in total, which increases the data amount of image blocks and thereby enhances the data. Each image block may be subjected to the data enhancement described above.
It should be noted that the number of image block copies generated based on one image block may be flexibly determined according to actual situations. In addition, the rotation angle, the scaling ratio and the translation distance of the image block can be flexibly set according to the actual situation.
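For illustration, generating three copies of one image block by rotation, translation, and scaling might look as follows (the angle, offsets, and scale factor are illustrative values, and torchvision is only one possible implementation of the data expansion):

```python
import torchvision.transforms.functional as TF
from PIL import Image

def make_copies(block: Image.Image):
    """Return three copies of an image block: rotated, translated, scaled."""
    rotated = TF.rotate(block, angle=15)
    translated = TF.affine(block, angle=0, translate=(10, 10),
                           scale=1.0, shear=0)
    scaled = TF.affine(block, angle=0, translate=(0, 0),
                       scale=0.8, shear=0)
    return [rotated, translated, scaled]
```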
In step S140, the preset deep neural network model and the trained or tested deep neural network model have the same network structure, and can be flexibly determined according to actual conditions. For example, referring to fig. 3, the deep neural network model includes a feature pyramid network and a path enhancing network.
Referring to fig. 3 again, when the deep neural network model is trained, each image block may be used as an input image; the input image is input to the preset deep neural network model and subjected to learning and training using the feature pyramid network and the path enhancement network, thereby obtaining a deep neural network model capable of identifying targets and segmenting target regions.
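For illustration only, a training loop over the image blocks and their copies could look as follows (the loss function, optimizer, and data loader are not specified by this embodiment; pixel-wise cross-entropy and Adam are stand-ins):

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3):
    """`loader` is assumed to yield (blocks, marks) pairs built from the
    first image blocks and their copies; `marks` are long-typed per-pixel
    class maps derived from the preset marks."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # assumed pixel-wise segmentation loss
    for _ in range(epochs):
        for blocks, marks in loader:
            opt.zero_grad()
            loss = loss_fn(model(blocks), marks)  # (N,C,H,W) vs (N,H,W)
            loss.backward()
            opt.step()
```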
After step S140, the method may further comprise:
step S150, performing image division on each test image in the image set to obtain a plurality of second image blocks;
step S160, generating a plurality of second image block copies according to the plurality of second image blocks, wherein the second image block copies are images obtained by expanding the second image blocks through the preset data expansion algorithm;
step S170, inputting the plurality of second image blocks and the plurality of second image block copies into the trained deep neural network model for testing, so as to obtain the tested deep neural network model.
Understandably, during the testing of the model, the execution of steps S150 and S160 is similar to the above steps S120 and S130, and will not be described again here.
Each original image block and the multiple image block copies generated based on the original image block may be used as a group of image blocks.
In step S170, each group of image blocks is input into the trained deep neural network model to obtain the corresponding segmentation results. All segmentation results of each group are then inversely transformed so as to restore the correspondence with the preset marks (i.e., according to the obtained segmentation results and the preset marks corresponding to the group of image blocks, the model parameters are adjusted so that the output segmentation results match the preset marks). Finally, the union of all copy segmentation results is taken as the segmentation result of the group of image blocks, which also achieves the purpose of testing the model.
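A sketch of this inverse-transform-and-union merge (the model is assumed to return a probability mask as a NumPy array, and each copy is assumed to be paired with a function that undoes its transform, e.g. rotating by -15 degrees to undo a +15 degree copy):

```python
import numpy as np

def merge_group(model, block, copies_with_inverse, thresh=0.5):
    """Union the segmentation results of an original block and its copies.
    `copies_with_inverse` yields (copy_image, inverse_fn) pairs, where
    inverse_fn maps a predicted mask back to the original geometry."""
    union = model(block) > thresh                  # mask of the original block
    for copy_img, inverse_fn in copies_with_inverse:
        mask = model(copy_img) > thresh
        union |= np.asarray(inverse_fn(mask), dtype=bool)
    return union
```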
In this embodiment, after the model is tested, the tested deep neural network model may be used to segment an image to be tested to obtain a segmentation result. The image to be tested can be flexibly determined according to the actual situation. For example, it may include an image of inclusions to be incorporated into the metal being smelted. The segmentation result comprises the region and type identifier corresponding to each inclusion.
For example, in steel making, a plurality of non-metallic inclusions need to be introduced to obtain steels with different properties through smelting. The inclusions may be, but are not limited to, carbon, sulfur, silicon, and boron. The electronic device can identify and segment the captured images of various inclusions using the tested deep neural network model, obtaining the segmented region and type corresponding to each kind of inclusion.
Referring to fig. 4, an embodiment of the present application further provides a model training apparatus 200, which can be applied to the electronic device described above for executing steps of the model training method. The model training apparatus 200 includes at least one software functional module which can be stored in a memory module in the form of software or Firmware (Firmware) or solidified in an Operating System (OS) of an electronic device. The processing module is used for executing executable modules stored in the storage module, such as software functional modules and computer programs included in the model training apparatus 200.
The model training apparatus 200 may include an obtaining unit 210, a dividing unit 220, a duplicate generation unit 230, and a training unit 240, and each unit may have the following functions:
an obtaining unit 210, configured to obtain an image set, where the image set includes a plurality of images with preset marks;
a dividing unit 220, configured to perform image division on each training image in the image set to obtain a plurality of first image blocks;
a copy generating unit 230, configured to generate a plurality of first image block copies according to the plurality of first image blocks, where the first image block copies are images obtained by expanding the first image blocks through a preset data expansion algorithm;
the training unit 240 is configured to input the plurality of first image blocks and the plurality of first image block copies into a preset deep neural network model for training, so as to obtain a trained deep neural network model.
Optionally, the model training apparatus 200 may further include a test unit.
The dividing unit 220 may further be configured to perform image division on each test image in the image set to obtain a plurality of second image blocks;
the copy generating unit 230 may further be configured to generate a plurality of second image block copies according to the plurality of second image blocks, where the second image block copies are images obtained by expanding the second image blocks through the preset data expansion algorithm;
the testing unit is used for inputting the plurality of second image blocks and the plurality of second image block copies into the trained deep neural network model for testing to obtain the tested deep neural network model.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working process of the electronic device and the model training apparatus 200 described above may refer to the corresponding process of each step in the model training method, and will not be described in detail herein.
Second embodiment
Referring to fig. 5, the present application further provides an image segmentation method, which can be applied to the electronic device, and is executed or implemented by the electronic device.
The method may comprise the steps of:
step S310, carrying out down-sampling on an input first image to obtain a second image with specified resolution, wherein the specified resolution is lower than the resolution of the first image;
step S320, performing feature extraction on the first image through the tested deep neural network model to obtain I first-class image features with different sizes, and performing feature extraction on the second image through the tested deep neural network model to obtain J second-class image features with different sizes, wherein I, J are integers which are larger than 0;
step S330, performing feature enhancement on part or all of the image features in the I first-class image features and the J second-class image features through the deep neural network, and fusing the enhanced image features to obtain fused image features;
step S340, segmenting the fused image features through the deep neural network model, and outputting segmentation results representing the first image.
In the above embodiment, feature extraction at different sizes is performed on both the high-resolution first image and the low-resolution second image, which increases the richness of multi-scale features, improves the detection precision of small targets while still accounting for large and medium targets, and thereby improves the robustness and accuracy of the algorithm.
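For illustration, the overall flow of steps S310 to S340 can be read as the following minimal sketch (not the patented implementation: the method names extract, enhance_and_fuse, and segment_head are hypothetical stand-ins for the stages described below, and the 1/2 downsampling ratio is an assumption):

```python
import torch
import torch.nn.functional as F

def segment_first_image(model, first_image: torch.Tensor) -> torch.Tensor:
    """Sketch of steps S310-S340; `model` is assumed to expose the three
    stages described in this embodiment."""
    # S310: downsample the input first image to the specified (lower)
    # resolution; a 1/2 length-width ratio is assumed for illustration.
    second_image = F.interpolate(first_image, scale_factor=0.5,
                                 mode='bilinear', align_corners=False)
    # S320: extract I first-class and J second-class image features.
    first_class_feats = model.extract(first_image)    # e.g. [P2', P3', P4', P5']
    second_class_feats = model.extract(second_image)  # e.g. [Q2', Q3', Q4']
    # S330: enhance part or all of the features and fuse the enhanced ones.
    fused = model.enhance_and_fuse(first_class_feats, second_class_feats)
    # S340: segment the fused image features into the segmentation result.
    return model.segment_head(fused)
```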
The individual steps in the process are explained in detail below, as follows:
the deep neural network model has been trained and tested before the method performs step S310. If the deep neural network model is not trained and tested, before step S310 is executed, a preset deep neural network model may be trained and tested by the model training method described in the first embodiment, so as to obtain a model subjected to the training test. Then, the first image is input into the deep neural network model for image segmentation.
In step S310, the first image may be understood as the initial image or original. The downsampling mode of the first image can be flexibly determined according to the actual situation. For example, the length and width of the first image may be downsampled by a given ratio (e.g., 1/2) to obtain a second image having a resolution lower than that of the first image; in this case, the area of the second image is the square of that ratio (e.g., 1/4) times the area of the first image. The specified resolution may be flexibly determined according to actual conditions and may be understood as the size ratio of the downsampled image to the first image; the specified resolution is not particularly limited herein.
In step S320, the deep neural network model includes a feature pyramid network. I and J can be flexibly set according to actual conditions.
In step S320, performing feature extraction on the first image through the tested deep neural network model to obtain I first-class image features with different sizes, including:
performing feature extraction on the first image according to a preset size proportion through the feature pyramid network to obtain the 1st image feature;
performing feature extraction on the ith image feature according to the preset size proportion through the feature pyramid network to obtain the (i+1)th image feature, where i runs sequentially from 1 to I-1 and extraction stops when I image features are obtained, i being a positive integer, and where, when i is 1, the ith image feature is the 1st image feature;
and performing convolution fusion on the I image features to obtain I first-class image features.
Understandably, the preset size proportion can be flexibly determined according to the actual situation.
Illustratively, referring again to FIG. 3, assume that I is 4 and the preset size proportion is, for example, 1/2. The feature pyramid network scales the input first image to 1/2 of its length and width and performs feature extraction by convolution and activation, obtaining the 1st image feature P2 at 1/2 the length and width of the first image. The 1st image feature P2 is then scaled by 1/2 in length and width again, convolved, and activated, obtaining the 2nd image feature P3 at 1/4 the length and width of the first image. By scaling each obtained image feature by 1/2 in length and width, convolving, and activating in the same way, the 3rd image feature P4 at 1/8 and the 4th image feature P5 at 1/16 of the first image's length and width can be obtained in sequence.
Then, the image feature P5 is convolved and pooled using the feature pyramid network while its size proportion is kept unchanged, obtaining the image feature P5'. The feature pyramid network then upsamples the image feature P5' to the same size as the image feature P4 and fuses the result with P4, obtaining the image feature P4'. By upsampling each image feature layer obtained by such fusion and fusing the upsampled result with the image feature of the same size, the image features P3' and P2' can be obtained in turn. For example, the image feature P3' is obtained by upsampling the image feature P4' to the same size as the image feature P3 and fusing the result with P3.
The obtained image features P2', P3', P4 'and P5' are the 4 first-class image features with different sizes.
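As a minimal PyTorch sketch of such a pyramid (the channel counts, the assumed 1/2 per-level ratio, and the use of a plain top-level feature in place of the convolution-and-pooling that produces P5' are simplifying assumptions, not the patented structure):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniFPN(nn.Module):
    """Sketch of the pyramid in Fig. 3: bottom-up 1/2-per-level extraction
    (P2..P5), then top-down upsample-and-fuse (P2'..P5')."""
    def __init__(self, in_ch=3, ch=64, levels=4):
        super().__init__()
        chans = [in_ch] + [ch] * levels
        self.downs = nn.ModuleList(
            nn.Sequential(nn.Conv2d(chans[i], ch, 3, stride=2, padding=1),
                          nn.ReLU())
            for i in range(levels))
        self.smooth = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        feats = []
        for down in self.downs:          # each level halves length and width
            x = down(x)
            feats.append(x)              # [P2, P3, P4, P5]
        out = [feats[-1]]                # top-level feature (stands in for P5')
        for f in reversed(feats[:-1]):   # upsample and fuse with same size
            up = F.interpolate(out[0], size=f.shape[-2:], mode='nearest')
            out.insert(0, self.smooth(f + up))
        return out                       # [P2', P3', P4', P5']
```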
Similarly, in step S320, performing feature extraction on the second image through the deep neural network model to obtain J second-class image features with different sizes, including:
performing feature extraction on the second image according to a preset size proportion through the feature pyramid network to obtain the 1st image feature;
performing feature extraction on the jth image feature according to the preset size proportion through the feature pyramid network to obtain the (j+1)th image feature, where j runs sequentially from 1 to J-1 and extraction stops when J image features are obtained, j being a positive integer, and where, when j is 1, the jth image feature is the 1st image feature;
and performing convolution fusion on the J image features to obtain J second-class image features.
Referring again to fig. 3, J is illustratively 3. The second-class image features are extracted in a manner similar to the first-class image features. The difference is that, in the second-image feature extraction process, the second image input to the feature pyramid network has, for example, 1/2 the length and width of the first image. Using the same feature extraction as for the first-class image features, the 1st image feature Q2, the 2nd image feature Q3, and the 3rd image feature Q4 can be obtained; these image features are then fused, whereby the image features Q2', Q3', and Q4' can be obtained. To reduce the amount of computation, the fusion producing the image feature Q2' may be omitted, because the low-resolution feature extraction process is more concerned with small-size image features.
Based on step S320, a plurality of image features of different sizes can be obtained from the feature extraction of the first image and the second image, which benefits the diversity of features across sizes.
In step S330, by performing enhanced fusion on features with different sizes, the feature content can be enriched, and the detection and segmentation of the model are facilitated.
In this embodiment, step S330 may include:
performing convolution pooling on part or all of the I first-class image features through the deep neural network to obtain enhanced first-class image features;
performing convolution pooling on part or all of the J second-class image features through the deep neural network to obtain enhanced second-class image features;
and performing convolution fusion on the enhanced first type image features and the enhanced second type image features through the deep neural network to obtain the fused image features.
Illustratively, referring again to fig. 3, the image feature P2' is convolved and pooled using the path enhancement network while its size is maintained, obtaining the image feature N2. The image feature N2 is then downsampled by, for example, 1/2 in length and width, and the resulting image feature is fused with the image feature P3' to obtain the image feature N3. The image feature N3 is likewise downsampled by 1/2 in length and width, and the resulting image feature is fused with the image feature Q3' to obtain the image feature N4. The obtained image features N2, N3, and N4 are then combined by a fusion operator that performs a convolution-connection (concatenate)-convolution operation, obtaining the fused image feature, whose size is the same as that of the input image, so that the deep neural network model can perform detection and segmentation. This fusion operation is well known to those skilled in the art and is not described here.
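A minimal sketch of this enhancement and fusion (the channel counts, the assumed feature sizes, and the use of strided convolution for downsampling are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhanceFuse(nn.Module):
    """N2/N3/N4-style bottom-up path, then conv -> concat -> conv fusion.
    Assumed sizes: p2 (B,C,H,W), p3 (B,C,H/2,W/2), q3 (B,C,H/4,W/4)."""
    def __init__(self, ch=64):
        super().__init__()
        self.enhance = nn.Conv2d(ch, ch, 3, padding=1)     # stands in for conv-pooling
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        self.pre = nn.Conv2d(ch, ch, 1)                    # per-feature convolution
        self.post = nn.Conv2d(3 * ch, ch, 1)               # convolution after concat

    def forward(self, p2, p3, q3):
        n2 = F.relu(self.enhance(p2))                      # N2 keeps its size
        n3 = F.relu(self.down(n2)) + p3                    # downsample, fuse with P3'
        n4 = F.relu(self.down(n3)) + q3                    # downsample, fuse with Q3'
        size = n2.shape[-2:]                               # bring all to one size
        feats = [self.pre(f) for f in (n2, n3, n4)]        # convolution
        feats = [F.interpolate(f, size=size, mode='nearest') for f in feats]
        return self.post(torch.cat(feats, dim=1))          # connection, convolution
```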
In step S340, the deep neural network model can directly perform detection and segmentation based on the fused image features, thereby obtaining the corresponding segmentation result. The segmentation result can be flexibly determined according to the actual situation. For example, when identifying and classifying inclusions during steel smelting, the first image is an image of various inclusions, and the segmentation result may include the category and region corresponding to each inclusion.
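For illustration, the final detection and segmentation can be sketched as a 1x1 convolution over the fused image feature followed by a per-pixel argmax (the channel count and the number of inclusion categories are assumed values):

```python
import torch
import torch.nn as nn

num_classes = 5                          # assumed: background + 4 inclusion types
fused = torch.rand(1, 256, 128, 128)     # assumed fused image feature
seg_head = nn.Conv2d(256, num_classes, kernel_size=1)  # hypothetical 1x1 head
logits = seg_head(fused)                 # (1, num_classes, 128, 128)
segmentation = logits.argmax(dim=1)      # per-pixel category map
```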
Referring to fig. 6, an embodiment of the present application further provides an image segmentation apparatus 400, which can be applied to the electronic device described above for executing the steps of the method. The image segmentation apparatus 400 includes at least one software functional module which can be stored in a memory module in the form of software or Firmware (Firmware) or solidified in an Operating System (OS) of the electronic device. The processing module is used for executing executable modules stored in the storage module, such as software functional modules and computer programs included in the image segmentation apparatus 400.
The image segmentation apparatus 400 may include a down-sampling unit 410, a feature extraction unit 420, an enhancement fusion unit 430, and a segmentation unit 440, and each unit may have the following functions:
a down-sampling unit 410, configured to down-sample an input first image to obtain a second image with a specified resolution, where the specified resolution is lower than a resolution of the first image;
a feature extraction unit 420, configured to perform feature extraction on the first image through a tested deep neural network model to obtain I first-class image features of different sizes, and perform feature extraction on the second image through the tested deep neural network model to obtain J second-class image features of different sizes, where I, J are integers greater than 0;
an enhancement fusion unit 430, configured to perform feature enhancement on part or all of the I first-class image features and the J second-class image features through the deep neural network, and fuse the enhanced image features to obtain fused image features;
a segmentation unit 440, configured to segment the fused image features through the deep neural network model, and output a segmentation result representing the first image.
Optionally, the deep neural network model comprises a feature pyramid network; the feature extraction unit 420 may also be configured to:
performing feature extraction on the first image according to a preset size proportion through the feature pyramid network to obtain the 1st image feature;
performing feature extraction on the ith image feature according to the preset size proportion through the feature pyramid network to obtain the (i+1)th image feature, where i runs sequentially from 1 to I-1 and extraction stops when I image features are obtained, i being a positive integer, and where, when i is 1, the ith image feature is the 1st image feature;
and performing convolution fusion on the I image features to obtain I first-class image features.
Optionally, the feature extraction unit 420 may be further configured to:
performing feature extraction on the second image according to a preset size proportion through the feature pyramid network to obtain the 1st image feature;
performing feature extraction on the jth image feature according to the preset size proportion through the feature pyramid network to obtain the (j+1)th image feature, where j runs sequentially from 1 to J-1 and extraction stops when J image features are obtained, j being a positive integer, and where, when j is 1, the jth image feature is the 1st image feature;
and performing convolution fusion on the J image features to obtain J second-class image features.
Optionally, the enhanced fusion unit 430 may be further configured to:
performing convolution pooling on part or all of the I first-class image features through the deep neural network to obtain enhanced first-class image features;
performing convolution pooling on part or all of the J second-class image features through the deep neural network to obtain enhanced second-class image features;
and performing convolution fusion on the enhanced first type image features and the enhanced second type image features through the deep neural network to obtain the fused image features.
In this embodiment, the processing module may be an integrated circuit chip having signal processing capability. The processing module may be a general purpose processor. For example, the Processor may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Network Processor (NP), or the like; the method, the steps and the logic block diagram disclosed in the embodiments of the present Application may also be implemented or executed by a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
The memory module may be, but is not limited to, a random access memory, a read only memory, a programmable read only memory, an erasable programmable read only memory, an electrically erasable programmable read only memory, and the like. In this embodiment, the storage module may be configured to store the first image, the second image, the deep neural network model, and the like. Of course, the storage module may also be used to store a program, and the processing module executes the program after receiving the execution instruction.
It should be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the electronic device and the image segmentation apparatus 400 described above may refer to the corresponding processes of the steps in the foregoing method, and are not described in detail herein.
The embodiment of the application also provides a computer readable storage medium. The computer-readable storage medium has stored therein a computer program which, when run on a computer, causes the computer to perform a model training method or an image segmentation method as described in the above embodiments.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by hardware, or by software plus a necessary general hardware platform, and based on such understanding, the technical solution of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions to enable a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments of the present application.
In summary, in this scheme, feature extraction at different sizes is performed on the first image and the second image, which increases the richness of multi-scale features, improves the detection precision of small targets while still accounting for large and medium targets, and improves the robustness and accuracy of the algorithm.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus, system, and method may be implemented in other ways. The apparatus, system, and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (11)

1. A method of image segmentation, the method comprising:
the method comprises the steps of carrying out downsampling on an input first image to obtain a second image with specified resolution, wherein the specified resolution is lower than the resolution of the first image;
performing feature extraction on the first image through a tested deep neural network model to obtain I first-class image features with different sizes, and performing feature extraction on the second image through the tested deep neural network model to obtain J second-class image features with different sizes, wherein I, J are integers larger than 0;
performing feature enhancement on part or all of the I first-class image features and the J second-class image features through the deep neural network, and fusing the enhanced image features to obtain fused image features;
and segmenting the fused image features through the deep neural network model, and outputting a segmentation result representing the first image.
2. The method of claim 1, wherein the deep neural network model comprises a feature pyramid network;
performing feature extraction on the first image through a tested deep neural network model to obtain I first-class image features with different sizes, wherein the method comprises the following steps:
performing feature extraction on the first image according to a preset size proportion through the feature pyramid network to obtain the 1st image feature;
performing feature extraction on the ith image feature according to the preset size proportion through the feature pyramid network to obtain the (i+1)th image feature, where i runs sequentially from 1 to I-1 and extraction stops when I image features are obtained, i being a positive integer, and where, when i is 1, the ith image feature is the 1st image feature;
and performing convolution fusion on the I image features to obtain I first-class image features.
3. The method of claim 1, wherein the deep neural network model comprises a feature pyramid network;
performing feature extraction on the second image through the deep neural network model to obtain J second-class image features with different sizes, wherein the feature extraction comprises the following steps:
performing feature extraction on the second image according to a preset size proportion through the feature pyramid network to obtain the 1st image feature;
performing feature extraction on the jth image feature according to the preset size proportion through the feature pyramid network to obtain the (j+1)th image feature, where j runs sequentially from 1 to J-1 and extraction stops when J image features are obtained, j being a positive integer, and where, when j is 1, the jth image feature is the 1st image feature;
and performing convolution fusion on the J image features to obtain J second-class image features.
4. The method according to claim 1, wherein the performing feature enhancement on some or all of the I first-class image features and the J second-class image features through the deep neural network, and fusing the enhanced image features to obtain fused image features comprises:
performing convolution pooling on part or all of the I first-class image features through the deep neural network to obtain enhanced first-class image features;
performing convolution pooling on part or all of the J second-class image features through the deep neural network to obtain enhanced second-class image features;
and performing convolution fusion on the enhanced first type image features and the enhanced second type image features through the deep neural network to obtain the fused image features.
5. The method of claim 1, wherein prior to downsampling the input first image, the method further comprises:
acquiring an image set, wherein the image set comprises a plurality of images with preset marks;
performing image division on each training image in the image set to obtain a plurality of first image blocks;
generating a plurality of first image block copies according to the plurality of first image blocks, wherein the first image block copies are images obtained by expanding the first image blocks through a preset data expansion algorithm;
and inputting the plurality of first image blocks and the plurality of first image block copies into a preset deep neural network model for training to obtain a trained deep neural network model.
6. The method of claim 5, wherein prior to downsampling the input first image, the method further comprises:
performing image division on each test image in the image set to obtain a plurality of second image blocks;
generating a plurality of second image block copies according to the plurality of second image blocks, wherein the second image block copies are images obtained by expanding the second image blocks through the preset data expansion algorithm;
and inputting the plurality of second image blocks and the plurality of second image block copies into the trained deep neural network model for testing to obtain the tested deep neural network model.
7. A method of model training, the method comprising:
acquiring an image set, wherein the image set comprises a plurality of images with preset marks;
performing image division on each training image in the image set to obtain a plurality of first image blocks;
generating a plurality of first image block copies according to the plurality of first image blocks, wherein the first image block copies are images obtained by expanding the first image blocks through a preset data expansion algorithm;
and inputting the plurality of first image blocks and the plurality of first image block copies into a preset deep neural network model for training to obtain a trained deep neural network model.
8. An image segmentation apparatus, characterized in that the apparatus comprises:
the down-sampling unit is used for down-sampling an input first image to obtain a second image with specified resolution, and the specified resolution is lower than the resolution of the first image;
the feature extraction unit is used for performing feature extraction on the first image through a tested deep neural network model to obtain I first-class image features with different sizes, and performing feature extraction on the second image through the tested deep neural network model to obtain J second-class image features with different sizes, wherein I, J are integers larger than 0;
the enhancement fusion unit is used for performing feature enhancement on part or all of the image features in the I first-class image features and the J second-class image features through the deep neural network, and fusing the enhanced image features to obtain fused image features;
and the segmentation unit is used for segmenting the fused image features through the deep neural network model and outputting a segmentation result representing the first image.
9. A model training apparatus, the apparatus comprising:
an acquisition unit configured to acquire an image set including a plurality of images having preset marks;
the dividing unit is used for carrying out image division on each training image in the image set to obtain a plurality of first image blocks;
a copy generating unit, configured to generate a plurality of first image block copies according to the plurality of first image blocks, wherein the first image block copies are images obtained by expanding the first image blocks through a preset data expansion algorithm;
and the training unit is used for inputting the plurality of first image blocks and the plurality of first image block copies into a preset deep neural network model for training to obtain a trained deep neural network model.
10. An electronic device, characterized in that the electronic device comprises a processor and a memory coupled to each other, in which a computer program is stored which, when executed by the processor, causes the electronic device to perform the method of any of claims 1-6 or to perform the method of claim 7.
11. A computer-readable storage medium, in which a computer program is stored which, when run on a computer, causes the computer to perform the method of any one of claims 1-6, or to perform the method of claim 7.
CN202111430927.5A 2021-11-29 2021-11-29 Image segmentation method, model training device and electronic equipment Pending CN114022679A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111430927.5A CN114022679A (en) 2021-11-29 2021-11-29 Image segmentation method, model training device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111430927.5A CN114022679A (en) 2021-11-29 2021-11-29 Image segmentation method, model training device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114022679A true CN114022679A (en) 2022-02-08

Family

ID=80067130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111430927.5A Pending CN114022679A (en) 2021-11-29 2021-11-29 Image segmentation method, model training device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114022679A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071622A (en) * 2023-04-06 2023-05-05 广州思德医疗科技有限公司 Stomach image recognition model construction method and system based on deep learning
CN116071622B (en) * 2023-04-06 2024-01-12 广州思德医疗科技有限公司 Stomach image recognition model construction method and system based on deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination