CN113139546A - Training method of image segmentation model, and image segmentation method and device - Google Patents


Info

Publication number
CN113139546A
CN113139546A (application CN202010058528.XA); granted as CN113139546B
Authority
CN
China
Prior art keywords
image
stage
sample
sample image
semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010058528.XA
Other languages
Chinese (zh)
Other versions
CN113139546B (en)
Inventor
王再冉
刘裕峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010058528.XA priority Critical patent/CN113139546B/en
Publication of CN113139546A publication Critical patent/CN113139546A/en
Application granted granted Critical
Publication of CN113139546B publication Critical patent/CN113139546B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a training method of an image segmentation model, and an image segmentation method and device, and belongs to the technical field of image processing. The method includes: acquiring a plurality of sample images; inputting each sample image into a neural network model and processing it in multiple stages, with multi-layer convolution processing performed in each stage to obtain a feature image; inputting the feature image output by each stage into a corresponding output layer, which outputs the predicted key point position and predicted semantic segmentation information of that stage; obtaining the loss value of each sample image at each stage based on the predicted key point position or predicted semantic segmentation information corresponding to each stage of the sample image and the reference key point position or reference semantic segmentation information of the sample image; and training the neural network model based on the loss value of each stage of each sample image to obtain an image segmentation model. By using the method, the accuracy of image segmentation can be improved.

Description

Training method of image segmentation model, and image segmentation method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a training method for an image segmentation model, and an image segmentation method and apparatus.
Background
Image segmentation technology is widely applied in the field of image processing. A typical flow is roughly as follows: first, an image is input into a pre-trained recognition model to obtain the position information of a target region; then, the image corresponding to the target region is segmented from the input image according to that position information.
In the course of implementing the present application, the inventors found that the related art has at least the following problem:
in practical applications, because the region image corresponding to a target region in an image is generally complex, the region image obtained with such a recognition model is often not accurate enough, so the accuracy of image segmentation is low.
Disclosure of Invention
The application provides a training method of an image segmentation model, and a method and a device for image segmentation, which can overcome the problems in the related art.
According to an aspect of the embodiments of the present application, there is provided a training method of an image segmentation model, including:
obtaining a plurality of sample images, wherein the sample images comprise a first sample image with a reference key point position and a second sample image with reference semantic segmentation information;
inputting each sample image into a neural network model and performing multi-stage processing, with multi-layer convolution processing performed in each stage to obtain a feature image, wherein the first stage performs multi-layer convolution processing on the sample image, and each stage other than the first performs multi-layer convolution processing on the feature image obtained in the previous stage;
inputting the feature image output by each stage into a corresponding output layer, and outputting the position of a predicted key point and predicted semantic segmentation information corresponding to each stage;
obtaining loss values of the sample images at various stages based on the corresponding predicted key point positions or predicted semantic segmentation information of various stages of each sample image and the reference key point positions or reference semantic segmentation information of the sample images;
and training the neural network model based on the loss value of each stage of each sample image to obtain an image segmentation model.
Optionally, the obtaining a loss value of the sample image at each stage based on the predicted keypoint position or the predicted semantic segmentation information corresponding to each stage of each sample image and the reference keypoint position or the reference semantic segmentation information of the sample image includes:
obtaining loss values of the first sample image in each stage based on the corresponding predicted key point position of each stage of each first sample image and the reference key point position of the first sample image;
and obtaining the loss value of the second sample image at each stage based on the prediction semantic segmentation information corresponding to each stage of each second sample image and the reference semantic segmentation information of the second sample image.
Optionally, the training the neural network model based on the loss value of each stage of each sample image to obtain an image segmentation model includes:
calculating an average loss value of each sample image based on the loss value of each stage of each sample image;
and training the neural network model based on the average loss value of each sample image to obtain an image segmentation model.
Optionally, before the acquiring the plurality of sample images, the method further includes:
acquiring a plurality of initial sample images;
for each initial sample image, performing at least one of the following adjustments to obtain a plurality of sample images:
adjusting the resolution to a target resolution, adjusting the noise value to any noise value within a noise value range, adjusting the illumination value to any illumination value within an illumination value range, adjusting the rotation value to any rotation value within a rotation value range, adjusting the translation value to any translation value within a translation value range, adjusting the shear value to any shear value within a shear value range, and adjusting the skew (miscut) value to any skew value within a skew value range.
Optionally, the acquiring a plurality of sample images includes:
acquiring a plurality of initial images, and horizontally flipping each initial image to obtain an image that is mirror-symmetric to the initial image;
and using both the initial image before flipping and the flipped image as sample images.
Optionally, the training the neural network model based on the loss value of each stage of each sample image to obtain an image segmentation model includes:
training the neural network model based on the loss value of each stage of each sample image to obtain a quasi-image segmentation model;
inputting the obtained test image with the reference semantic segmentation information into the quasi-image segmentation model to obtain test semantic segmentation information;
and if the similarity between the reference semantic segmentation information and the test semantic segmentation information of the test image is not less than a similarity threshold value, determining the quasi-image segmentation model as the image segmentation model.
Optionally, before the obtained test image with the reference semantic segmentation information is input into the quasi-image segmentation model to obtain the test semantic segmentation information, the method further includes:
acquiring an initial test image with reference semantic segmentation information;
and adjusting the resolution of the initial test image to a target resolution to obtain a test image with the reference semantic segmentation information.
According to another aspect of embodiments of the present application, there is provided an image segmentation method, including:
acquiring an image to be segmented;
and inputting the image to be segmented into the image segmentation model to obtain semantic segmentation information of the image to be segmented.
According to another aspect of the embodiments of the present application, there is provided an apparatus for training an image segmentation model, the apparatus including:
a first obtaining module configured to perform obtaining a plurality of sample images, wherein the sample images include a first sample image with a reference key point position and a second sample image with reference semantic segmentation information;
a processing module configured to input each sample image into a neural network model and perform multi-stage processing, with multi-layer convolution processing performed in each stage to obtain a feature image, wherein the first stage performs multi-layer convolution processing on the sample image, and each stage other than the first performs multi-layer convolution processing on the feature image obtained in the previous stage;
the output module is configured to input the feature image output by each stage into a corresponding output layer and output the corresponding predicted key point position and the predicted semantic segmentation information of each stage;
a loss value determining module configured to obtain a loss value of each sample image at each stage based on the predicted key point position or predicted semantic segmentation information corresponding to each stage of the sample image and the reference key point position or reference semantic segmentation information of the sample image;
and the model determining module is configured to train the neural network model based on the loss value of each stage of each sample image to obtain an image segmentation model.
Optionally, the loss value determining module is specifically configured to perform:
obtaining loss values of the first sample image in each stage based on the corresponding predicted key point position of each stage of each first sample image and the reference key point position of the first sample image;
and obtaining the loss value of the second sample image at each stage based on the prediction semantic segmentation information corresponding to each stage of each second sample image and the reference semantic segmentation information of the second sample image.
Optionally, the model determining module is specifically configured to perform:
calculating an average loss value of each sample image based on the loss value of each stage of each sample image;
and training the neural network model based on the average loss value of each sample image to obtain an image segmentation model.
Optionally, the apparatus further comprises:
a second acquisition module configured to perform acquiring a plurality of initial sample images;
a preprocessing module configured to perform at least one of the following adjustments on each of the initial sample images to obtain a plurality of sample images:
adjusting the resolution to a target resolution, adjusting the noise value to any noise value within a noise value range, adjusting the illumination value to any illumination value within an illumination value range, adjusting the rotation value to any rotation value within a rotation value range, adjusting the translation value to any translation value within a translation value range, adjusting the shear value to any shear value within a shear value range, and adjusting the skew (miscut) value to any skew value within a skew value range.
Optionally, the first obtaining module is specifically configured to perform:
acquiring a plurality of initial images, and horizontally flipping each initial image to obtain an image that is mirror-symmetric to the initial image;
and using both the initial image before flipping and the flipped image as sample images.
Optionally, the model determining module includes:
the quasi-model determining unit is configured to execute training on the neural network model based on the loss value of each stage of each sample image to obtain a quasi-image segmentation model;
the test unit is configured to input the acquired test image with the reference semantic segmentation information into the quasi image segmentation model to obtain test semantic segmentation information;
and if the similarity between the reference semantic segmentation information and the test semantic segmentation information of the test image is not less than a similarity threshold value, determining the quasi-image segmentation model as the image segmentation model.
Optionally, the model determining module further includes:
an acquisition unit configured to perform acquisition of an initial test image having reference semantic segmentation information;
and the adjusting unit is configured to adjust the resolution of the initial test image to a target resolution to obtain a test image with reference semantic segmentation information.
According to another aspect of embodiments of the present application, there is provided an apparatus for image segmentation, the apparatus including:
an acquisition module configured to perform acquisition of an image to be segmented;
and the determining module is configured to input the image to be segmented into the image segmentation model to obtain semantic segmentation information of the image to be segmented.
According to another aspect of embodiments of the present application, there is provided an electronic device including:
one or more processors;
one or more memories for storing the one or more processor-executable instructions;
wherein the one or more processors are configured to perform the above-described method of training the image segmentation model;
alternatively, the one or more processors are configured to perform the method of image segmentation described above.
According to another aspect of embodiments of the present application, there is provided a non-transitory computer-readable storage medium, wherein instructions of the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the above-mentioned training method for an image segmentation model;
or, when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the image segmentation method.
According to another aspect of embodiments of the present application, there is provided a computer program product comprising one or more instructions executable by a processor of an electronic device to perform the method steps described above.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
in an embodiment of the present application, a training method of an image segmentation model may include: obtaining a plurality of sample images; inputting each sample image into a neural network model and processing it in multiple stages, with multi-layer convolution processing performed in each stage to obtain a feature image; inputting the feature image output by each stage into a corresponding output layer, which outputs the predicted key point position and predicted semantic segmentation information of that stage; obtaining the loss value of each sample image at each stage based on the predicted key point position or predicted semantic segmentation information corresponding to each stage of the sample image and the reference key point position or reference semantic segmentation information of the sample image; and training the neural network model based on the loss value of each stage of each sample image to obtain an image segmentation model. Because the method trains with two kinds of reference information, namely the reference key point position and the reference semantic segmentation information, the resulting image segmentation model is more accurate, and the accuracy of image segmentation can be improved when the model is used for image segmentation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating a method of training an image segmentation model according to an exemplary embodiment.
FIG. 2 is an exemplary diagram illustrating semantic segmentation information in an image according to one exemplary embodiment.
FIG. 3 is a flow chart illustrating a method of training an image segmentation model in accordance with an exemplary embodiment.
FIG. 4 is a flow chart illustrating a method of image segmentation in accordance with an exemplary embodiment.
FIG. 5 is a block diagram illustrating an apparatus for training an image segmentation model according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating an apparatus for training an image segmentation model according to an exemplary embodiment.
FIG. 7 is a block diagram illustrating an apparatus for training an image segmentation model according to an exemplary embodiment.
FIG. 8 is a block diagram illustrating an apparatus for training an image segmentation model according to an exemplary embodiment.
FIG. 9 is a block diagram illustrating an apparatus for image segmentation in accordance with an exemplary embodiment.
FIG. 10 is a block diagram illustrating an electronic device implementing a training method for an image segmentation model or image segmentation, according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Fig. 1 illustrates a training method of an image segmentation model according to an exemplary embodiment. The method may be executed by an electronic device, which may be a terminal or a server; this embodiment is not particularly limited in this respect. As shown in the flow chart of fig. 1, the method may include the following steps.
In step 101, the electronic device acquires a plurality of sample images, which may include a first sample image having a reference key point position and a second sample image having reference semantic segmentation information.
In the method, an image usually contains a plurality of objects, depending on its content; for example, the face, hands, and eyes in a person image can each be referred to as an object in the person image.
The key point position is the position information of a key point of an object in the image, where a key point is a point with actual physical meaning. For example, the key points of a human hand may be the finger joints, the fingertips, the points where the fingers connect to the palm, the joint where the palm connects to the wrist, and the like.
The reference key point position is the key point position marked in the sample image by a technician, as opposed to the predicted key point position, which is the key point position output by the neural network model.
The semantic segmentation information is the area image corresponding to a certain object in the image, for example, as shown in fig. 2, the area image corresponding to a human hand or to a human face in a person image.
The reference semantic segmentation information is the semantic segmentation information labeled in the sample image by a technician, as opposed to the predicted semantic segmentation information, which is the semantic segmentation information output by the neural network model.
In implementation, for convenience of description, a sample image having a reference key point position among the plurality of acquired sample images may be referred to as a first sample image, and a sample image having reference semantic segmentation information may be referred to as a second sample image. A sample image with a reference key point position may have only the reference key point position, or may have both the reference key point position and reference semantic segmentation information. Likewise, a sample image with reference semantic segmentation information may have only the reference semantic segmentation information, or may have both the reference semantic segmentation information and a reference key point position.
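The distinction between the two sample types can be illustrated with a small, hypothetical data representation (the field names `ref_keypoints` and `ref_segmentation` are illustrative choices, not taken from the application); a sample may carry either kind of reference information, or both:

```python
# Hypothetical sample records; field names are illustrative only.
first_sample = {"image": "img_001", "ref_keypoints": [(12, 34), (56, 78)]}
second_sample = {"image": "img_002", "ref_segmentation": "mask_002"}
both_sample = {"image": "img_003",
               "ref_keypoints": [(9, 9)], "ref_segmentation": "mask_003"}

def is_first_sample(sample):
    # A first sample image carries reference key point positions.
    return "ref_keypoints" in sample

def is_second_sample(sample):
    # A second sample image carries reference semantic segmentation information.
    return "ref_segmentation" in sample

# A sample carrying both kinds of labels counts as both a first and a
# second sample image, matching the description above.
```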
In step 102, the electronic device inputs each sample image into the neural network model, performs multi-stage processing, and performs multi-layer convolution processing in each stage to obtain a feature image.
The first stage performs multi-layer convolution processing on the sample image, and each stage other than the first performs multi-layer convolution processing on the feature image obtained in the previous stage.
The multi-stage processing may include two or more stages; the number of stages is not limited in this embodiment, and a technician may set it flexibly according to the actual situation.
For example, referring to fig. 3, which illustrates two stages: in the first stage, a sample image is input into the neural network model and multi-layer convolution processing is performed to obtain a feature image; in the second stage, the feature image obtained in the first stage is taken as input, and multi-layer convolution processing is performed on it to obtain the feature image of the second stage; in the third stage, the feature image obtained in the second stage is taken as input, and multi-layer convolution processing is performed on it to obtain the feature image of the third stage; and so on. In the i-th stage, the feature image obtained in the (i-1)-th stage is taken as input, and multi-layer convolution processing is performed on it to obtain the feature image of the i-th stage, where i is an integer larger than 1.
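The cascade described above can be sketched as follows. This is a minimal illustration of the control flow only, not the application's actual network: each `make_stage` below stands in for a multi-layer convolution block, using a toy element-wise transform instead of real convolutions.

```python
def make_stage(scale):
    # Stand-in for one stage's multi-layer convolution block; a real stage
    # would apply several convolution layers rather than a scalar multiply.
    def stage(feature):
        return [scale * v for v in feature]
    return stage

def run_stages(sample_image, stages):
    # Stage 1 consumes the sample image; every later stage consumes the
    # feature image produced by the previous stage. All per-stage feature
    # images are collected, since each one later feeds its own output layer.
    features = []
    current = sample_image
    for stage in stages:
        current = stage(current)
        features.append(current)
    return features

feats = run_stages([1.0, 2.0],
                   [make_stage(2.0), make_stage(0.5), make_stage(3.0)])
# feats[0], feats[1], feats[2] are the feature images of stages 1, 2 and 3.
```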
In step 103, the electronic device inputs the feature image output at each stage into the corresponding output layer, and outputs the predicted key point position and the predicted semantic segmentation information corresponding to each stage.
The predicted key point position is the key point position of the sample image identified by the neural network model, as opposed to the reference key point position. The predicted semantic segmentation information is the semantic segmentation information of the sample image identified by the neural network model, as opposed to the reference semantic segmentation information.
The output layer may be an output layer of the neural network model, or an output layer corresponding to each stage.
In this way, the predicted key point position and the predicted semantic segmentation information of the sample image are obtained at each stage.
For example, referring to fig. 3, the feature image obtained in the first stage may be divided into two branches: one branch is input into the output layer of the first stage to obtain the predicted key point position and the predicted semantic segmentation information of the first stage, and the other branch serves as the input of the second stage. In the second stage, multi-layer convolution processing is performed on the feature image obtained in the first stage to obtain the feature image of the second stage. If a third stage exists, the feature image of the second stage is likewise divided into two branches: one branch is input into the output layer of the second stage to obtain the predicted key point position and the predicted semantic segmentation information of the second stage, and the other branch serves as the input of the third stage. If no third stage exists, the feature image obtained in the second stage is input only into the output layer of the second stage.
Thus, for each sample image, a set of predicted keypoint location and predicted semantic segmentation information may be obtained in each stage, e.g., if the neural network model includes two stages, then two sets of predicted keypoint location and predicted semantic segmentation information may be obtained for each sample image, and if the neural network model includes three stages, then three sets of predicted keypoint location and predicted semantic segmentation information may be obtained for each sample image.
In step 104, the electronic device obtains a loss value of the sample image at each stage based on the predicted keypoint position or the predicted semantic segmentation information corresponding to each stage of each sample image and the reference keypoint position or the reference semantic segmentation information of the sample image.
In implementation, for each sample image, after obtaining the predicted key point position and the predicted semantic segmentation information corresponding to each stage, the electronic device may calculate the loss value of the sample image at each stage. For example, if the sample image is a first sample image, the electronic device may obtain the loss value of the first sample image at each stage based on the predicted key point position corresponding to each stage of the first sample image and the reference key point position of the first sample image. If the sample image is a second sample image, the electronic device may obtain the loss value of the second sample image at each stage based on the predicted semantic segmentation information corresponding to each stage of the second sample image and the reference semantic segmentation information of the second sample image.
For a sample image with both a reference key point position and reference semantic segmentation information, the loss value may be obtained as a weighted sum of the loss computed from the reference and predicted key point positions and the loss computed from the reference and predicted semantic segmentation information.
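The per-stage loss selection just described can be sketched as follows. The squared-error loss and the 0.5/0.5 weights are hypothetical choices for illustration; the application does not fix a particular loss function or weighting.

```python
def squared_error(pred, ref):
    # Toy loss between a prediction and its reference, both given as
    # flat lists of numbers.
    return sum((p - r) ** 2 for p, r in zip(pred, ref))

def stage_loss(pred_kp, pred_seg, ref_kp=None, ref_seg=None,
               kp_weight=0.5, seg_weight=0.5):
    # First sample images (key point labels only) use the key point loss,
    # second sample images (segmentation labels only) use the segmentation
    # loss, and samples carrying both labels use a weighted sum of the two.
    if ref_kp is not None and ref_seg is not None:
        return (kp_weight * squared_error(pred_kp, ref_kp)
                + seg_weight * squared_error(pred_seg, ref_seg))
    if ref_kp is not None:
        return squared_error(pred_kp, ref_kp)
    return squared_error(pred_seg, ref_seg)

kp_only = stage_loss([1.0, 2.0], [0.5], ref_kp=[1.0, 1.0])
both = stage_loss([1.0, 2.0], [0.5], ref_kp=[1.0, 1.0], ref_seg=[0.0])
```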
In step 105, the electronic device trains the neural network model based on the loss value of each stage of each sample image, and obtains an image segmentation model.
In implementation, after the electronic device obtains the loss values of each sample image at each stage, the electronic device may train the neural network model based on the loss values to obtain an image segmentation model. For example, to simplify parameter tuning of the neural network model, the electronic device may first calculate an average loss value for each sample image based on the loss values for each stage of each sample image. Then, the electronic device trains the neural network model based on the average loss value of each sample image to obtain an image segmentation model.
For example, for each sample image, after the electronic device obtains its loss values at the various stages, the loss values may be added and divided by the number of stages to obtain an average loss value for the sample image. For another example, each stage of the neural network model may be assigned a weight value; for each sample image, the loss value of each stage may be multiplied by the corresponding weight value, and the sum of the products is taken as the average loss value of the sample image.
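Both averaging schemes described above can be sketched in a few lines (the function name `average_stage_loss` is a hypothetical helper, not something named in the patent):

```python
def average_stage_loss(stage_losses, stage_weights=None):
    """Plain mean over stages, or a weighted sum when per-stage weights are given."""
    if stage_weights is None:
        return sum(stage_losses) / len(stage_losses)   # first scheme: simple mean
    return sum(l * w for l, w in zip(stage_losses, stage_weights))  # second scheme
```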
In implementation, after the electronic device obtains the average loss value of each sample image, the parameters of the neural network model can be adjusted by using the average loss value, and finally, the image segmentation model can be obtained.
Based on the above, compared with an image segmentation model obtained by training only with single reference information, the method obtains the image segmentation model by training with two reference information, namely the reference key point position and the reference semantic segmentation information, and the accuracy is higher.
In addition, since the image segmentation model can output not only the predicted semantic segmentation information but also the predicted key point position, the image segmentation model can be used to acquire not only semantic segmentation information but also the key point positions associated with it. That is, the image segmentation model can be used to acquire not only the area image corresponding to a certain object in an image, but also the key point positions of that object. Furthermore, the application scenes of the image segmentation model are enriched, and its application range is expanded.
Optionally, in order to acquire more sample images, the electronic device may acquire a plurality of initial images, derive additional images from them, and use all of the resulting images as sample images. For example, each time the electronic device acquires an initial image, it may horizontally flip the initial image to obtain an image that is mirror-symmetric to it, and then use both the image before and the image after horizontal flipping as sample images. In this way, the number of sample images can be expanded; with more sample images, the accuracy of the image segmentation model can be improved.
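A minimal sketch of this horizontal-flip augmentation follows. The patent only mentions flipping the image itself; mirroring the reference keypoint x-coordinates alongside it is an assumption added here so that the flipped sample stays consistent with its reference information:

```python
def hflip_image(img):
    """Mirror an image (list of pixel rows) left-to-right."""
    return [row[::-1] for row in img]

def hflip_keypoints(kps, width):
    """Mirror (x, y) keypoints to match a horizontally flipped image of given width."""
    return [(width - 1 - x, y) for (x, y) in kps]
```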
Optionally, to further improve the accuracy of the image segmentation model, correspondingly, before the electronic device acquires a plurality of sample images, a plurality of initial sample images may be acquired first, and then, for each initial sample image, at least one of the following adjustments is performed to obtain a plurality of sample images:
adjusting the resolution to a target resolution, adjusting the noise value to any noise value within a range of noise values, adjusting the illumination value to any illumination value within a range of illumination values, adjusting the rotation value to any rotation value within a range of rotation values, adjusting the translation value to any translation value within a range of translation values, adjusting the shear value to any shear value within a range of shear values, and adjusting the miscut value to any miscut value within a range of miscut values.
The illumination processing may include one or more of light intensity processing, gray scale processing, contrast processing, and the like. The rotation processing rotates the content in the sample image. The shearing processing includes cropping an enlarged sample image and padding a reduced sample image. The miscut processing slightly distorts the sample image so that the content in it is slightly deformed.
In implementation, in order to unify the resolution of the sample images, the resolution of each acquired initial sample image may be adjusted to the target resolution. For example, if the target resolution is a × b and the resolution of the initial sample image is c × d (where a is not equal to c and b is not equal to d), the electronic device may first adjust the resolution of the initial sample image to a × d and then to a × b, or first adjust it to c × b and then to a × b; the present embodiment does not specifically limit the adjustment process.
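The two-step adjustment above (one dimension at a time) can be sketched with a simple nearest-neighbor resize; the helper names and the nearest-neighbor choice are assumptions, since the patent does not specify an interpolation method:

```python
def resize_nn(img, out_h, out_w):
    """Nearest-neighbor resize of an image given as a list of pixel rows."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def to_target_resolution(img, a, b):
    """Two-step route c x d -> a x d -> a x b; the other order works equally well."""
    inter = resize_nn(img, a, len(img[0]))   # adjust height first
    return resize_nn(inter, a, b)            # then width
```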
Therefore, when the image segmentation model is trained, the resolution of each sample image is the target resolution, and the problem of inaccurate identification of the image segmentation model caused by the resolution can be avoided, so that the accuracy of the image segmentation model is improved.
In order to reduce the influence of the noise of the image on the image segmentation model, correspondingly, after the electronic device acquires each initial sample image, the noise value of the initial sample image may be adjusted to any noise value within a noise value range, that is, a noise value is randomly selected within the noise value range as the noise value of the initial sample image. In this way, the noise values of the sample images are not completely consistent, and the image segmentation model trained by using the sample images can be applied to the images to be segmented with various noise values within the range of the noise values when in use, so that the influence of the noise values on the image segmentation model can be reduced.
In addition, when the image segmentation model is trained, each sample image is not necessarily used only once, but may be used multiple times. For example, when a certain sample image is used for training for the first time, a first noise value can be selected within the noise value range, and when the sample image is used for training for the second time, a second noise value can be selected within the noise value range. The two resulting sample images have the same content but different noise values, while the reference information is the same, so the trained image segmentation model can reduce the influence of the noise value.
Based on a similar principle, in order to reduce the influence of the illumination of the image on the image segmentation model, correspondingly, after the electronic device acquires each initial sample image, the illumination value of the initial sample image may be adjusted to any illumination value within an illumination value range, that is, an illumination value is randomly selected within the illumination value range as the illumination value of the initial sample image. In this way, the illumination values of the sample images are not completely consistent, and the image segmentation model trained by using the sample images can be applied to the images to be segmented with various illumination values in the illumination value range when in use, so that the influence of the illumination values on the image segmentation model can be reduced.
In addition, when the image segmentation model is trained, each sample image is not necessarily used only once, but may be used multiple times. For example, when a certain sample image is used for training for the first time, a first illumination value can be selected within the illumination value range, and when the sample image is used for training for the second time, a second illumination value can be selected within the illumination value range. The two resulting sample images have the same content but different illumination values, while the reference information is the same, so the trained image segmentation model can reduce the influence of the illumination value.
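The noise and illumination adjustments described above can be sketched together as one photometric augmentation step. The Gaussian noise model, the multiplicative illumination gain, and the default ranges below are all assumptions; the patent only requires that one value be drawn at random from each range per use of the sample image:

```python
import random

def augment_photometric(img, noise_range=(0.0, 5.0), light_range=(0.8, 1.2), rng=None):
    """Apply one randomly drawn illumination gain and noise level to an image."""
    rng = rng or random.Random()
    sigma = rng.uniform(*noise_range)    # one noise value per use of the image
    gain = rng.uniform(*light_range)     # one illumination value per use of the image
    return [[min(255.0, max(0.0, p * gain + rng.gauss(0.0, sigma)))
             for p in row] for row in img]
```

Drawing fresh values on every epoch is what makes repeated uses of the same sample image differ in noise and illumination while keeping the reference information unchanged.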
In order to reduce the influence of the rotation of the image on the image segmentation model, correspondingly, after the electronic device acquires each initial sample image, the rotation value of the initial sample image may be adjusted to any rotation value within the range of rotation values, that is, a rotation value is randomly selected within the range of rotation values as the rotation value of the initial sample image. In this way, the rotation values of the sample images are not completely consistent, and the image segmentation model trained by using the sample images can be applied to the images to be segmented with various rotation values within the rotation value range when in use, thereby reducing the influence of the rotation values on the image segmentation model.
In addition, when the image segmentation model is trained, each sample image is not necessarily used only once, but may be used multiple times. For example, when a certain sample image is used for training for the first time, a first rotation value can be selected within the rotation value range, and when the sample image is used for training for the second time, a second rotation value can be selected within the rotation value range. The two resulting sample images have the same content but different rotation values, while the reference information is the same, so the trained image segmentation model can reduce the influence of the rotation value.
In order to reduce the influence of the image translation on the image segmentation model, correspondingly, after the electronic device acquires each initial sample image, the electronic device may adjust the translation value of the initial sample image to any translation value within the translation value range, that is, randomly select one translation value within the translation value range as the translation value of the initial sample image. In this way, the translation values of the sample images are not completely consistent, and the image segmentation model trained by using the sample images can be applied to the images to be segmented with various translation values in the translation value range when in use, thereby reducing the influence of the translation values on the image segmentation model.
In addition, when the image segmentation model is trained, each sample image is not necessarily used only once, but may be used multiple times. For example, when a certain sample image is used for training for the first time, a first translation value can be selected within the translation value range, and when the sample image is used for training for the second time, a second translation value can be selected within the translation value range. The two resulting sample images have the same content but different translation values, while the reference information is the same, so the trained image segmentation model can reduce the influence of the translation value.
In order to reduce the influence of image shearing on the image segmentation model, correspondingly, after the electronic device acquires each initial sample image, the shear value of the initial sample image may be adjusted to any shear value within a shear value range, that is, a shear value is randomly selected within the shear value range as the shear value of the initial sample image. In this way, the shear values of the sample images are not completely consistent, and the image segmentation model trained with these sample images can be applied to images to be segmented with various shear values within the shear value range, thereby reducing the influence of the shear value on the image segmentation model.
In addition, when the image segmentation model is trained, each sample image is not necessarily used only once, but may be used multiple times. For example, when a certain sample image is used for training for the first time, a first shear value can be selected within the shear value range, and when the sample image is used for training for the second time, a second shear value can be selected within the shear value range. The two resulting sample images have the same content but different shear values, while the reference information is the same, so the trained image segmentation model can reduce the influence of the shear value.
In order to reduce the influence of image miscut on the image segmentation model, correspondingly, after the electronic device acquires each initial sample image, the miscut value of the initial sample image can be adjusted to any miscut value within a miscut value range, that is, a miscut value is randomly selected within the miscut value range to serve as the miscut value of the initial sample image. In this way, the miscut values of the sample images are not completely consistent, and the image segmentation model trained by using the sample images can be applied to the images to be segmented with various miscut values within the range of the miscut values when in use, thereby reducing the influence of the miscut values on the image segmentation model.
In addition, when the image segmentation model is trained, each sample image is not necessarily used only once, but may be used multiple times. For example, when a certain sample image is used for training for the first time, a first miscut value can be selected within the miscut value range, and when the sample image is used for training for the second time, a second miscut value can be selected within the miscut value range. The two resulting sample images have the same content but different miscut values, while the reference information is the same, so the trained image segmentation model can reduce the influence of the miscut value.
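The geometric adjustments above (rotation, translation, and miscut/shear deformation) are commonly composed into a single affine transform. The following sketch is an assumption about one possible implementation — the patent does not prescribe affine matrices — showing how one randomly drawn value per range could be combined and applied to a point (e.g. a reference keypoint, which must be transformed along with the image):

```python
import math

def affine_matrix(angle_deg=0.0, tx=0.0, ty=0.0, shear_x=0.0):
    """2x3 matrix combining rotation, horizontal miscut (shear), and translation."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    # rotation followed by a horizontal shear of factor shear_x
    return [[c + shear_x * s, -s + shear_x * c, tx],
            [s, c, ty]]

def apply_affine(pt, m):
    """Transform a single (x, y) point by a 2x3 affine matrix."""
    x, y = pt
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```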
Based on the above, the image segmentation model trained in this way can be used to segment images with noise values within the noise value range, illumination values within the illumination value range, rotation values within the rotation value range, translation values within the translation value range, shear values within the shear value range, and miscut values within the miscut value range. Thus, not only can the accuracy of image segmentation performed by the image segmentation model be improved, but the application range of the image segmentation model is also expanded.
Optionally, in order to test the accuracy of the trained image segmentation model, correspondingly, the electronic device may train the neural network model based on the loss value of each stage of each sample image to obtain a quasi image segmentation model; and then, the electronic equipment inputs the acquired test image with the reference semantic segmentation information into the quasi-image segmentation model to obtain the test semantic segmentation information.
And if the similarity between the reference semantic segmentation information and the test semantic segmentation information of the test image is not less than the similarity threshold, determining the quasi-image segmentation model as the image segmentation model.
In implementation, during the test, a technician may use a plurality of test images, calculate the similarity corresponding to each test image, and determine the quasi-image segmentation model as the image segmentation model if every similarity is not less than the similarity threshold. Alternatively, the technician may still use a plurality of test images and calculate the similarity corresponding to each; if the proportion of test images whose similarity is not less than the similarity threshold exceeds a preset proportion value, such as 0.9, the quasi-image segmentation model may be determined as the image segmentation model.
If the test result does not meet either of the above conditions, training of the neural network model continues until the test result meets the condition during testing.
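The acceptance criterion above can be sketched as follows. The patent does not name a similarity measure, so the mask IoU used here and the helper names are assumptions; the `min_ratio` default corresponds to the 0.9 proportion value given as an example:

```python
def mask_iou(ref_mask, test_mask):
    """Similarity between two flattened binary masks as intersection-over-union."""
    inter = sum(1 for r, t in zip(ref_mask, test_mask) if r and t)
    union = sum(1 for r, t in zip(ref_mask, test_mask) if r or t)
    return inter / union if union else 1.0

def accept_model(similarities, threshold=0.9, min_ratio=0.9):
    """Accept the quasi-model if enough test images clear the similarity threshold."""
    passed = sum(1 for s in similarities if s >= threshold)
    return passed / len(similarities) > min_ratio
```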
In one possible application, if a technician intends to use a trained image segmentation model to obtain semantic segmentation information, then a test image with reference semantic segmentation information may be used at the time of testing. Whereas if the technician intends to use the trained image segmentation model to acquire keypoint locations, a test image with reference keypoint locations may be used during testing. If the technician intends to use the trained image segmentation model to acquire the semantic segmentation information and the key point positions, the test image with the reference semantic segmentation information and the reference key point positions can be used, or the test image with the reference semantic segmentation information is used for testing, and then the test image with the reference key point positions is used for testing.
Optionally, in order to avoid the influence of the resolution on the test result in the test, correspondingly, before the obtained test image with the reference semantic segmentation information is input into the quasi-image segmentation model and the test semantic segmentation information is obtained, the method may further include that the electronic device obtains an initial test image with the reference semantic segmentation information; and adjusting the resolution of the initial test image to the target resolution to obtain the test image with the reference semantic segmentation information.
That is, before the test image is input into the quasi-image segmentation model, its resolution needs to be adjusted to the target resolution. Unifying the resolution in this way weakens the influence of resolution on the test result.
In an embodiment of the present application, a training method of an image segmentation model may include obtaining a plurality of sample images; inputting each sample image into a neural network model, carrying out multi-stage processing, and carrying out multilayer convolution processing in each stage to obtain a characteristic image; inputting the feature image output by each stage into a corresponding output layer, and outputting the position of a predicted key point and predicted semantic segmentation information corresponding to each stage; obtaining loss values of the sample images at each stage based on the corresponding predicted key point positions or predicted semantic segmentation information of each stage of each sample image and the reference key point positions or reference semantic segmentation information of the sample images; and training the neural network model based on the loss value of each stage of each sample image to obtain an image segmentation model. According to the method, two kinds of reference information, namely the reference key point position and the reference semantic segmentation information, are used for training, the accuracy of the obtained image segmentation model is higher, and when the image segmentation is carried out by the image segmentation model, the accuracy of image segmentation can be improved, so that the accuracy of semantic segmentation can be improved.
The present application further provides an image segmentation method, which may be executed by an electronic device, where this electronic device and the one described above may be the same electronic device or different electronic devices, and the electronic device may be a terminal or a server, which is not limited in this embodiment.
The method may be performed according to the flow shown in fig. 4:
in step 401, the electronic device acquires an image to be segmented.
In implementation, after the electronic device acquires the image to be segmented, the resolution of the image to be segmented may be adjusted to obtain the image to be segmented with the resolution as the target resolution.
In step 402, the electronic device inputs an image to be segmented into an image segmentation model, and obtains semantic segmentation information of the image to be segmented.
The image segmentation model here is the model obtained by the training described above.
In implementation, the electronic device may input the image to be segmented with the target resolution into the image segmentation model, so as to obtain semantic segmentation information of the image to be segmented.
The image segmentation model can output not only the semantic segmentation information but also the key point positions, so when the image to be segmented is segmented, the key point positions associated with the semantic segmentation information can also be obtained.
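The inference flow of steps 401 and 402 can be sketched as follows. Everything here is illustrative: `segment_image` and the stand-in `dummy_model` (a threshold-based placeholder for the trained model, returning a mask and one keypoint) are assumptions, not parts of the patent:

```python
def segment_image(image, model, target_h, target_w):
    """Resize the image to the target resolution, then run the segmentation model."""
    in_h, in_w = len(image), len(image[0])
    resized = [[image[r * in_h // target_h][c * in_w // target_w]
                for c in range(target_w)] for r in range(target_h)]
    return model(resized)  # expected to return (semantic_seg_info, keypoints)

def dummy_model(img):
    """Stand-in model: 'foreground' = bright pixels, keypoint = first foreground pixel."""
    mask = [[1 if p > 128 else 0 for p in row] for row in img]
    kp = next(((c, r) for r, row in enumerate(mask)
               for c, v in enumerate(row) if v), None)
    return mask, kp
```

In practice `dummy_model` would be replaced by the trained image segmentation model, and the resize would use the same target resolution as during training.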
In the embodiment of the application, in image segmentation, an image to be segmented can be input into an image segmentation model which is trained in advance by using reference semantic segmentation information and reference key point positions, so as to obtain the semantic segmentation information. The image segmentation model is obtained by training two kinds of reference information, namely the reference key point position and the reference semantic segmentation information, and is high in accuracy, so that the semantic segmentation information obtained by the image segmentation model is high in accuracy, and the semantic segmentation accuracy can be improved.
The present application further provides an apparatus for training an image segmentation model, as shown in fig. 5, the apparatus may include:
a first obtaining module 510 configured to perform obtaining a plurality of sample images, the sample images including a first sample image having a reference key point position and a second sample image having reference semantic segmentation information;
a processing module 520 configured to perform multi-stage processing by inputting each sample image into the neural network model, and perform multi-layer convolution processing in each stage to obtain a feature image, wherein the first stage is to perform multi-layer convolution processing on the sample image, and the other stages except the first stage are to perform multi-layer convolution processing on the feature image obtained in the previous stage;
an output module 530 configured to perform input of the feature image output at each stage to a corresponding output layer, and output a predicted key point position and predicted semantic segmentation information corresponding to each stage;
a loss value determining module 540 configured to perform a process of obtaining a loss value of each sample image at each stage based on a corresponding predicted key point position or predicted semantic segmentation information of each sample image at each stage and a reference key point position or reference semantic segmentation information of each sample image;
and a model determining module 550 configured to perform training on the neural network model based on the loss value of each stage of each sample image, so as to obtain an image segmentation model.
Optionally, the loss value determining module 540 is specifically configured to perform:
obtaining loss values of the first sample image in each stage based on the corresponding predicted key point position of each stage of each first sample image and the reference key point position of the first sample image;
and obtaining the loss value of the second sample image at each stage based on the prediction semantic segmentation information corresponding to each stage of each second sample image and the reference semantic segmentation information of the second sample image.
Optionally, the model determining module 550 is specifically configured to perform:
calculating an average loss value of each sample image based on the loss value of each stage of each sample image;
and training the neural network model based on the average loss value of each sample image to obtain an image segmentation model.
Optionally, as shown in fig. 6, the apparatus further includes:
a second acquisition module 508 configured to perform acquiring a plurality of initial sample images;
a preprocessing module 509 configured to perform at least one of the following adjustments on each of the initial sample images to obtain a plurality of sample images:
adjusting the resolution to a target resolution, adjusting the noise value to any noise value within a range of noise values, adjusting the illumination value to any illumination value within a range of illumination values, adjusting the rotation value to any rotation value within a range of rotation values, adjusting the translation value to any translation value within a range of translation values, adjusting the shear value to any shear value within a range of shear values, and adjusting the miscut value to any miscut value within a range of miscut values.
Optionally, the first obtaining module 510 is specifically configured to perform:
acquiring a plurality of initial images, and horizontally turning each initial image to obtain an image which is mirror-symmetrical to the initial image;
the initial image before the horizontal inversion and the image after the horizontal inversion are used as sample images.
Optionally, as shown in fig. 7, the model determining module 550 includes:
a quasi-model determining unit 551 configured to perform training on the neural network model based on a loss value of each stage of each sample image, resulting in a quasi-image segmentation model;
a testing unit 552 configured to input the acquired test image with the reference semantic segmentation information into the quasi image segmentation model to obtain test semantic segmentation information;
and if the similarity between the reference semantic segmentation information and the test semantic segmentation information of the test image is not less than a similarity threshold value, determining the quasi-image segmentation model as the image segmentation model.
Optionally, as shown in fig. 8, the model determining module 550 further includes:
an acquisition unit 553 configured to perform acquisition of an initial test image having reference semantic segmentation information;
an adjusting unit 554 configured to perform adjusting the resolution of the initial test image to a target resolution, resulting in a test image with reference semantic segmentation information.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present application further provides an image segmentation apparatus, as shown in fig. 9, the apparatus includes:
an obtaining module 910 configured to perform obtaining an image to be segmented;
the determining module 920 is configured to input the image to be segmented into the image segmentation model, so as to obtain semantic segmentation information of the image to be segmented.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 10 is a schematic structural diagram of an electronic device, which may be a server. The electronic device 1000 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 1001 and one or more memories 1002, where the memory 1002 stores at least one instruction that is loaded and executed by the processor 1001 to implement the above training method of the image segmentation model or the image segmentation method.
There is also provided, in accordance with an embodiment of the present application, a non-transitory computer-readable storage medium, wherein instructions of the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the above-mentioned method for training an image segmentation model or the method for image segmentation.
According to an embodiment of the present application, there is provided a computer program product including one or more instructions executable by a processor of an electronic device to perform the method for training an image segmentation model or the method steps for image segmentation described above.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A method for training an image segmentation model, the method comprising:
obtaining a plurality of sample images, wherein the sample images comprise a first sample image with a reference key point position and a second sample image with reference semantic segmentation information;
inputting each sample image into a neural network model, carrying out multi-stage processing, and carrying out multilayer convolution processing in each stage to obtain a characteristic image, wherein the first stage is to carry out multilayer convolution processing on the sample image, and the other stages except the first stage are to carry out multilayer convolution processing on the characteristic image obtained in the previous stage;
inputting the feature image output by each stage into a corresponding output layer, and outputting the position of a predicted key point and predicted semantic segmentation information corresponding to each stage;
obtaining loss values of the sample images at various stages based on the corresponding predicted key point positions or predicted semantic segmentation information of various stages of each sample image and the reference key point positions or reference semantic segmentation information of the sample images;
and training the neural network model based on the loss value of each stage of each sample image to obtain an image segmentation model.
2. The method according to claim 1, wherein the obtaining the loss value of the sample image at each stage based on the corresponding predicted key point position or predicted semantic segmentation information of each stage of the sample image and the reference key point position or reference semantic segmentation information of the sample image comprises:
obtaining loss values of the first sample image in each stage based on the corresponding predicted key point position of each stage of each first sample image and the reference key point position of the first sample image;
and obtaining the loss value of the second sample image at each stage based on the prediction semantic segmentation information corresponding to each stage of each second sample image and the reference semantic segmentation information of the second sample image.
3. The method of claim 1, wherein training the neural network model based on the loss value of each stage of each sample image to obtain an image segmentation model comprises:
calculating an average loss value of each sample image based on the loss value of each stage of each sample image;
and training the neural network model based on the average loss value of each sample image to obtain an image segmentation model.
4. The method according to any one of claims 1 to 3, wherein the training the neural network model based on the loss value of each stage of each sample image to obtain an image segmentation model comprises:
training the neural network model based on the loss value of each stage of each sample image to obtain a quasi-image segmentation model;
inputting the obtained test image with the reference semantic segmentation information into the quasi-image segmentation model to obtain test semantic segmentation information;
and if the similarity between the reference semantic segmentation information and the test semantic segmentation information of the test image is not less than a similarity threshold value, determining the quasi-image segmentation model as the image segmentation model.
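Claim 4's acceptance test can be sketched as below. The claim leaves "similarity" unspecified; intersection-over-union (IoU) between the reference mask and the test mask is used here purely as one plausible choice, and the names `mask_iou` and `accept_quasi_model` are assumptions.

```python
def mask_iou(pred_mask, ref_mask):
    """Intersection-over-union of two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, r in zip(pred_mask, ref_mask) if p and r)
    union = sum(1 for p, r in zip(pred_mask, ref_mask) if p or r)
    return inter / union if union else 1.0

def accept_quasi_model(test_pairs, similarity_threshold=0.9):
    """Promote the quasi-image segmentation model to the final image
    segmentation model only if every (reference, test) mask pair reaches
    the similarity threshold."""
    return all(mask_iou(test, ref) >= similarity_threshold
               for ref, test in test_pairs)
```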
5. The method according to claim 4, wherein before inputting the acquired test image with the reference semantic segmentation information into the quasi-image segmentation model to obtain the test semantic segmentation information, the method further comprises:
acquiring an initial test image with reference semantic segmentation information;
and adjusting the resolution of the initial test image to a target resolution to obtain a test image with the reference semantic segmentation information.
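The resolution adjustment of claim 5 can be performed with any standard resampling; the patent does not name one. A minimal nearest-neighbour sketch, assuming the image is a row-major 2-D list:

```python
def resize_nearest(image, target_h, target_w):
    """Nearest-neighbour resize of a 2-D list to the target resolution,
    so the test image matches the resolution the model was trained at."""
    h, w = len(image), len(image[0])
    return [[image[i * h // target_h][j * w // target_w]
             for j in range(target_w)]
            for i in range(target_h)]
```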
6. A method of image segmentation, the method comprising:
acquiring an image to be segmented;
inputting the image to be segmented into an image segmentation model trained by the method according to any one of claims 1 to 5, to obtain semantic segmentation information of the image to be segmented.
7. An apparatus for training an image segmentation model, the apparatus comprising:
a first obtaining module configured to perform obtaining a plurality of sample images, wherein the sample images include a first sample image with a reference key point position and a second sample image with reference semantic segmentation information;
a processing module configured to input each sample image into a neural network model for multi-stage processing, wherein multilayer convolution processing is performed in each stage to obtain a feature image, the multilayer convolution processing in the first stage is performed on the sample image, and the multilayer convolution processing in each stage other than the first stage is performed on the feature image obtained in the previous stage;
an output module configured to input the feature image output by each stage into a corresponding output layer and output the predicted key point position or predicted semantic segmentation information corresponding to each stage;
a loss value determining module configured to obtain a loss value of each sample image at each stage based on the predicted key point position or predicted semantic segmentation information corresponding to each stage of the sample image and the reference key point position or reference semantic segmentation information of the sample image;
and a model determining module configured to train the neural network model based on the loss value of each sample image at each stage to obtain an image segmentation model.
8. An apparatus for image segmentation, the apparatus comprising:
an acquisition module configured to perform acquisition of an image to be segmented;
a determining module configured to input the image to be segmented into an image segmentation model trained by the method according to any one of claims 1 to 5, to obtain semantic segmentation information of the image to be segmented.
9. An electronic device, comprising:
one or more processors;
one or more memories for storing instructions executable by the one or more processors;
wherein the one or more processors are configured to perform the method of training the image segmentation model of any of claims 1 to 5;
or, the one or more processors are configured to perform the method of image segmentation of claim 6.
10. A non-transitory computer readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method of training an image segmentation model of any one of claims 1 to 5;
or, when the instructions in the storage medium are executed by a processor of an electronic device, enable the electronic device to perform the method of image segmentation of claim 6.
CN202010058528.XA 2020-01-19 2020-01-19 Training method of image segmentation model, image segmentation method and device Active CN113139546B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010058528.XA CN113139546B (en) 2020-01-19 2020-01-19 Training method of image segmentation model, image segmentation method and device

Publications (2)

Publication Number Publication Date
CN113139546A true CN113139546A (en) 2021-07-20
CN113139546B CN113139546B (en) 2024-08-06

Family

ID=76808733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010058528.XA Active CN113139546B (en) 2020-01-19 2020-01-19 Training method of image segmentation model, image segmentation method and device

Country Status (1)

Country Link
CN (1) CN113139546B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019040A (en) * 2022-06-02 2022-09-06 北京达佳互联信息技术有限公司 Image segmentation method and device and training method and device of image segmentation model

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230346A (en) * 2017-03-30 2018-06-29 北京市商汤科技开发有限公司 For dividing the method and apparatus of image semantic feature, electronic equipment
CN109255296A (en) * 2018-08-06 2019-01-22 广东工业大学 A kind of daily Human bodys' response method based on depth convolutional neural networks
CN109284779A (en) * 2018-09-04 2019-01-29 中国人民解放军陆军工程大学 Object detection method based on deep full convolution network
CN109712144A (en) * 2018-10-29 2019-05-03 百度在线网络技术(北京)有限公司 Processing method, training method, equipment and the storage medium of face-image
CN109816670A (en) * 2019-01-31 2019-05-28 北京字节跳动网络技术有限公司 Method and apparatus for generating Image Segmentation Model
CN109829520A (en) * 2019-01-31 2019-05-31 北京字节跳动网络技术有限公司 Image processing method and device
CN110413826A (en) * 2019-06-27 2019-11-05 南京旷云科技有限公司 Images of items recognition methods and equipment, image processing equipment and medium
CN110443222A (en) * 2019-08-14 2019-11-12 北京百度网讯科技有限公司 Method and apparatus for training face's critical point detection model
CN110503097A (en) * 2019-08-27 2019-11-26 腾讯科技(深圳)有限公司 Training method, device and the storage medium of image processing model

Also Published As

Publication number Publication date
CN113139546B (en) 2024-08-06

Similar Documents

Publication Publication Date Title
US11244435B2 (en) Method and apparatus for generating vehicle damage information
CN109117831B (en) Training method and device of object detection network
CN110378264B (en) Target tracking method and device
CN109840477B (en) Method and device for recognizing shielded face based on feature transformation
CN108830235B (en) Method and apparatus for generating information
CN108229343B (en) Target object key point detection method, deep learning neural network and device
CN107562805B (en) Method and device for searching picture by picture
CN108986169B (en) Method and apparatus for processing image
CN108197618B (en) Method and device for generating human face detection model
EP0506327A2 (en) A system and method for ranking and extracting salient contours for target recognition
CN110688929B (en) Human skeleton joint point positioning method and device
CN109377508B (en) Image processing method and device
CN110245747B (en) Image processing method and device based on full convolution neural network
CN110751179A (en) Focus information acquisition method, focus prediction model training method and ultrasonic equipment
CN113378864B (en) Method, device and equipment for determining anchor frame parameters and readable storage medium
CN113139546B (en) Training method of image segmentation model, image segmentation method and device
CN116935189B (en) Camouflage target detection method and device based on neural network and storage medium
CN116596903A (en) Defect identification method, device, electronic equipment and readable storage medium
CN109034085B (en) Method and apparatus for generating information
CN108446737B (en) Method and device for identifying objects
CN108256451B (en) Method and device for detecting human face
CN111325210B (en) Method and device for outputting information
CN113870210A (en) Image quality evaluation method, device, equipment and storage medium
CN112581001A (en) Device evaluation method and device, electronic device and readable storage medium
CN112418098A (en) Training method of video structured model and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant