WO2022048151A1 - Semantic segmentation model training method and apparatus, and image semantic segmentation method and apparatus - Google Patents

Semantic segmentation model training method and apparatus, and image semantic segmentation method and apparatus

Info

Publication number
WO2022048151A1
WO2022048151A1 · PCT/CN2021/085721 · CN2021085721W
Authority
WO
WIPO (PCT)
Prior art keywords
segmentation
information
semantic
semantic segmentation
image
Prior art date
Application number
PCT/CN2021/085721
Other languages
English (en)
French (fr)
Inventor
赵姗
王氚
刘帅成
Original Assignee
北京迈格威科技有限公司
成都旷视金智科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京迈格威科技有限公司 and 成都旷视金智科技有限公司
Publication of WO2022048151A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/20 — Image preprocessing
    • G06V 10/26 — Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks

Definitions

  • the present disclosure generally relates to the field of image processing, and in particular relates to a semantic segmentation model training method, an image semantic segmentation method, a semantic segmentation model training device, an image semantic segmentation device, an electronic device and a computer-readable storage medium.
  • the semantic segmentation of an image is classification of the image at the pixel level.
  • a semantic segmentation model classifies target content of the same kind in the image into one category; for example, if there is a vehicle in the image, it determines the pixels belonging to the vehicle, segments out all of those pixels, and determines the vehicle's boundary segmentation frame at the pixel level.
  • the target content can be a specific person, object, text, etc.; its pixel-level boundary is determined in the image and segmented out.
  • at present, however, the accuracy of semantic segmentation is poor; especially for natural scene images, training data is scarce and training costs are high, so the accuracy of the trained semantic segmentation model is low.
  • the present disclosure provides a method for training a semantic segmentation model, wherein the method may include: acquiring a training set, wherein the training set may include multiple images and annotation information corresponding to the images, and the annotation information corresponding to any image includes segmentation frame annotation and/or semantic segmentation annotation; performing feature extraction on an image to obtain feature data of the image; obtaining first segmentation frame information and first semantic segmentation information based on the feature data; obtaining second segmentation frame information and second semantic segmentation information of the image based on the feature data, the first segmentation frame information and the first semantic segmentation information; determining a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information; and adjusting the parameters of the semantic segmentation model based on the loss value.
  • the semantic segmentation model may include: a segmentation frame decoding unit and a semantic decoding unit; the obtaining of first segmentation frame information and first semantic segmentation information based on the feature data includes: decoding through the segmentation frame decoding unit based on the feature data to obtain the first segmentation frame information; and decoding through the semantic decoding unit based on the feature data to obtain the first semantic segmentation information.
  • the obtaining of the second segmentation frame information and second semantic segmentation information of the image based on the feature data, the first segmentation frame information and the first semantic segmentation information includes: decoding through the segmentation frame decoding unit based on the feature data and the first semantic segmentation information to obtain the second segmentation frame information; and decoding through the semantic decoding unit based on the feature data and the first segmentation frame information to obtain the second semantic segmentation information.
  • the semantic segmentation model may further include: a coding unit; the performing of feature extraction on an image to obtain feature data of the image includes: encoding through the coding unit based on the image to obtain the feature data of the image.
  • the loss value may include a first loss value; the determining of the loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information, includes: if the annotation information corresponding to the image only includes the segmentation frame annotation, determining the first loss value based on the segmentation frame annotation and the second segmentation frame information; if the annotation information corresponding to the image only includes the semantic segmentation annotation, determining the first loss value based on the semantic segmentation annotation and the second semantic segmentation information; if the annotation information corresponding to the image includes the segmentation frame annotation and the semantic segmentation annotation, determining the first loss value based on the segmentation frame annotation and the second segmentation frame information, and based on the semantic segmentation annotation and the second semantic segmentation information.
  • the loss value may include a second loss value; the determining of the loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information, includes: if the annotation information corresponding to the image includes the segmentation frame annotation, determining the second loss value based on the second semantic segmentation information and the segmentation frame annotation; if the annotation information corresponding to the image does not include the segmentation frame annotation, determining the second loss value based on the second semantic segmentation information and the second segmentation frame information.
  • the loss value may include a third loss value; the determining of the loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information, includes: determining the third loss value based on a conditional random field.
  • the present disclosure provides an image semantic segmentation method, wherein the method may include: acquiring an image; performing feature extraction on the image to obtain feature data of the image; obtaining first segmentation frame information based on the feature data; and obtaining second semantic segmentation information of the image based on the feature data and the first segmentation frame information.
  • the method is applied to a semantic segmentation model, and the semantic segmentation model may include: a segmentation frame decoding unit and a semantic decoding unit; the first segmentation frame information is obtained through the segmentation frame decoding unit based on the feature data, and the second semantic segmentation information of the image is obtained through the semantic decoding unit based on the feature data and the first segmentation frame information.
  • the semantic segmentation model may further include: an encoding unit; and through the encoding unit, feature extraction is performed on the image to obtain feature data of the image.
  • the method may further include: obtaining, by the semantic decoding unit, first semantic segmentation information based on the feature data; and obtaining, by the segmentation frame decoding unit, second segmentation frame information of the image based on the feature data and the first semantic segmentation information.
  • the present disclosure provides an apparatus for training a semantic segmentation model
  • the apparatus may include: a first acquisition module, which may be configured to acquire a training set, wherein the training set may include multiple images and annotation information corresponding to the images, and the annotation information corresponding to any image includes segmentation frame annotation and/or semantic segmentation annotation;
  • a first feature extraction module, which may be configured to perform feature extraction on the image to obtain the feature data of the image;
  • a first semantic module, which may be configured to obtain the first segmentation frame information and the first semantic segmentation information based on the feature data;
  • the first semantic module may also be configured to obtain the second segmentation frame information and the second semantic segmentation information of the image based on the feature data, the first segmentation frame information and the first semantic segmentation information;
  • a loss determination module, which may be configured to determine a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information;
  • an adjustment module, which may be configured to adjust the parameters of the semantic segmentation model based on the loss value.
  • the semantic segmentation model may include: a segmentation frame decoding unit and a semantic decoding unit; the first semantic module is further configured to: perform decoding through the segmentation frame decoding unit based on the feature data to obtain the first segmentation frame information; and perform decoding through the semantic decoding unit based on the feature data to obtain the first semantic segmentation information.
  • the first semantic module is further configured to: perform decoding through the segmentation frame decoding unit based on the feature data and the first semantic segmentation information to obtain the second segmentation frame information; and perform decoding through the semantic decoding unit based on the feature data and the first segmentation frame information to obtain the second semantic segmentation information.
  • the semantic segmentation model may further include: an encoding unit; the first feature extraction module is configured to: perform encoding through the encoding unit based on the image to obtain the feature data of the image.
  • the present disclosure provides an image semantic segmentation device, wherein the device may include: a second acquisition module, which may be configured to acquire an image; a second feature extraction module, which may be configured to perform feature extraction on the image to obtain feature data of the image; and a second semantic module, which may be configured to obtain first segmentation frame information based on the feature data, and further configured to obtain second semantic segmentation information of the image based on the feature data and the first segmentation frame information.
  • the present disclosure provides an electronic device, which may include: a memory, which may be configured to store instructions; and a processor, which may be configured to invoke the instructions stored in the memory to execute the semantic segmentation model training method or the image semantic segmentation method described above.
  • the present disclosure provides a computer-readable storage medium in which instructions are stored; when the instructions are executed by a processor, the above-mentioned semantic segmentation model training method or the above-mentioned image semantic segmentation method is executed.
  • the present disclosure provides a semantic segmentation model training method, an image semantic segmentation method, a semantic segmentation model training device, an image semantic segmentation device, an electronic device, and a computer-readable storage medium.
  • in the training set used to train the semantic segmentation model, the annotations of an image can include only segmentation frame annotations, only semantic segmentation annotations, or both, which makes it easy to expand the training data and reduces training costs.
  • FIG. 1 shows a schematic flowchart of a method for training a semantic segmentation model according to an embodiment of the present disclosure
  • Figure 2A, Figure 2B and Figure 2C show schematic diagrams of segmentation frame information and semantic segmentation information extracted from an image
  • FIG. 3 shows a schematic structural diagram of a semantic segmentation model according to an embodiment of the present disclosure
  • FIG. 4 shows a schematic data flow diagram of a semantic segmentation model according to an embodiment of the present disclosure
  • FIG. 5 shows a schematic flowchart of an image semantic segmentation method according to an embodiment of the present disclosure
  • FIG. 6 shows a schematic diagram of an apparatus for training a semantic segmentation model according to an embodiment of the present disclosure.
  • FIG. 7 shows a schematic diagram of an image semantic segmentation apparatus according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
  • at present, training data for semantic segmentation models is scarce and the training cost is high, resulting in low accuracy of the trained semantic segmentation models.
  • for example, in the scenario of semantically segmenting text in images, text appears in different colors, fonts, shapes and sizes. Early image processing methods based on MSER (Maximally Stable Extremal Regions) and SWT (Stroke Width Transform) detect text only through prior information such as connected regions in the image; lacking a learning-based mechanism, their performance falls far short of what natural scene images require, and their segmentation accuracy is very low.
  • an embodiment of the present disclosure provides a method 10 for training a semantic segmentation model. As shown in FIG. 1 , it may include steps S11 to S16. The above steps will be described in detail below:
  • Step S11 acquiring a training set, wherein the training set includes a plurality of images and annotation information corresponding to the images, and the annotation information corresponding to any image includes segmentation frame annotation and/or semantic segmentation annotation.
  • the acquired images of the training set may be photos, pictures, or video frames with annotation information.
  • the annotation information may include only segmentation frame annotations, only semantic segmentation annotations, or both. Because the annotation information of the images required for the training set can be of various types, it is easy to obtain a large amount of high-quality training data, thereby ensuring the training effect. For semantic segmentation annotation in particular, the cost of manual annotation is too high; the accuracy of model-generated annotation is insufficient, resulting in a poor training effect; and a model trained on synthetic images has low accuracy when semantically segmenting images of real scenes. Therefore, in the embodiments of the present disclosure, both images with semantic segmentation annotations and images with segmentation frame annotations are used as training data, avoiding the defects of a single type of training data being scarce and of insufficient quality.
  • the segmentation frame is the position range of the target, which is generally shown as a rectangular frame or a quadrilateral frame; the semantic segmentation is to perform pixel-level segmentation on a type of target in the image.
  • Fig. 2A is an image of a real scene that includes text content; in Fig. 2B, the region of the text content has been segmented into text boxes, i.e., the obtained text box information; in Fig. 2C, the text content has been segmented at the pixel level, i.e., the obtained semantic segmentation information.
  • Step S12 extracting features from the image to obtain feature data of the image.
  • the semantic recognition model may include a coding unit, and feature extraction is performed on the image through the coding unit to obtain feature data.
  • the coding unit may include one or more convolution layers to perform convolution (Convolution) processing on the image, and may also perform pooling (Pooling) processing on the image at the same time.
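As a concrete illustration only (the patent gives no code), a coding unit of this kind might be sketched in PyTorch as follows; the layer widths and the ReLU activations are assumptions, not taken from the disclosure:

```python
import torch
import torch.nn as nn

class EncodingUnit(nn.Module):
    """Minimal sketch of a coding unit: 3x3 convolutions with pooling."""
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),  # pooling halves the resolution
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # returns the feature data extracted from the image
        return self.features(image)
```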
  • Step S13 based on the feature data, obtain first segmentation frame information and first semantic segmentation information.
  • the first segmentation frame information and the first semantic segmentation information can be obtained respectively through different algorithms or independent calculation by different units based on the feature data.
  • in the related art, if the target position in the image needs to be determined, often only the segmentation frame is obtained; if semantic segmentation is required, the semantic segmentation information is often obtained directly.
  • in the embodiments of the present disclosure, both pieces of information are acquired at the same time, thereby improving the utilization rate of the image and making full use of the information in it.
  • Step S14 based on the feature data, the first segmentation frame information and the first semantic segmentation information, obtain second segmentation frame information and second semantic segmentation information of the image.
  • after the first segmentation frame information and the first semantic segmentation information are initially obtained, they are not used as the final output of image semantic segmentation; instead, this information, together with the extracted feature data, is fed in again, and the two kinds of data guide each other, so that more information is used for semantic segmentation and more accurate second segmentation frame information and second semantic segmentation information are obtained.
  • the semantic segmentation model may include: a segmentation frame decoding unit and a semantic decoding unit; in step S13, based on the feature data, the segmentation frame decoding unit performs decoding to obtain the first segmentation frame information, and, based on the feature data, the semantic decoding unit performs decoding to obtain the first semantic segmentation information.
  • the semantic segmentation model in this embodiment may have two independent units; both the segmentation frame decoding unit and the semantic decoding unit may perform operations such as upsampling and convolution on the feature data, and respectively output the first segmentation frame information and the first semantic segmentation information of the image.
  • obtaining the two kinds of information through two independent units makes full use of the image information while, compared with obtaining both through one network, reducing the complexity and data volume of the model, thereby reducing the computational cost without sacrificing the accuracy of the output results.
  • step S14 may include: decoding through the segmentation frame decoding unit based on the feature data and the first semantic segmentation information to obtain the second segmentation frame information; and decoding through the semantic decoding unit based on the feature data and the first segmentation frame information to obtain the second semantic segmentation information.
  • in this embodiment, the first semantic segmentation information output by the semantic decoding unit, together with the feature data, is input to the segmentation frame decoding unit, which outputs the second segmentation frame information; the first segmentation frame information output by the segmentation frame decoding unit, together with the feature data, is input to the semantic decoding unit, which outputs the second semantic segmentation information.
  • the output of both units is used as the input of the other unit to achieve mutual guidance, and each unit can obtain more information.
  • the two units can be optimized based on any type of annotation information of the image.
  • in order to normalize the unit inputs, in step S13 the feature data and 0-valued semantic segmentation information may be input to the segmentation frame decoding unit, and the feature data and 0-valued segmentation frame information may be input to the semantic decoding unit.
  • this guarantees that the input of each unit has the same form in step S13 and step S14.
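To make the two-pass scheme concrete, the following is a minimal sketch (not the patent's implementation) of how the zero-initialized first pass and the mutually guided second pass could be wired up; `box_decoder` and `sem_decoder` are hypothetical modules assumed to take (feature data, guidance map) and return a probability map:

```python
import torch

def two_pass_decode(feat, box_decoder, sem_decoder):
    # Zero-valued guidance keeps the (features, guidance) input format
    # identical in step S13 and step S14.
    zero = torch.zeros(feat.size(0), 1, feat.size(2), feat.size(3),
                       device=feat.device)
    box_1 = box_decoder(feat, zero)   # first segmentation frame information
    sem_1 = sem_decoder(feat, zero)   # first semantic segmentation information
    # Second pass: each decoder is guided by the other decoder's first output.
    box_2 = box_decoder(feat, sem_1)  # second segmentation frame information
    sem_2 = sem_decoder(feat, box_1)  # second semantic segmentation information
    return box_1, sem_1, box_2, sem_2
```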
  • Step S15 Determine the loss value based on the second segmentation frame information and the labeling information, and/or based on the second semantic segmentation information and the labeling information.
  • the annotation information has various situations, and the corresponding output can be selected according to the type of the annotation information corresponding to the currently input image, and the loss value can be determined. Due to the mutual guidance of segmentation frame information and semantic segmentation information for training, the loss value can be determined for any type of annotation information, and the semantic segmentation model can be optimized accordingly.
  • Step S16 based on the loss value, adjust the parameters of the semantic segmentation model.
  • according to the loss value, the parameters of the semantic segmentation model are adjusted, and through multiple rounds of training the loss value gradually decreases; when the loss value is less than a threshold, training of the semantic segmentation model is stopped.
  • through the semantic segmentation model training method 10 of the above embodiments, images with various annotation types can be used as training data, and for any type of annotation information a loss value can be determined from the corresponding output so as to optimize and adjust the parameters of the model; this makes it convenient to expand the amount of training data and reduces the training cost. Training the model with a large amount of high-quality training data ensures the training effect, so that the trained semantic segmentation model performs semantic segmentation with high accuracy.
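A schematic training loop consistent with steps S15-S16 might look as follows; `model`, `compute_loss` and `loader` are hypothetical names, and the stopping threshold is an assumed placeholder:

```python
import torch

def train(model, loader, compute_loss, lr=1e-3, loss_threshold=0.05):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    while True:  # multiple rounds of training
        total, batches = 0.0, 0
        for image, annotation in loader:
            box_1, sem_1, box_2, sem_2 = model(image)
            # step S15: loss from whichever annotations this image carries
            loss = compute_loss(box_2, sem_2, annotation)
            optimizer.zero_grad()
            loss.backward()   # step S16: adjust the model parameters
            optimizer.step()
            total, batches = total + loss.item(), batches + 1
        if total / batches < loss_threshold:  # stop once the loss is small
            break
```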
  • the semantic segmentation model M can be a neural network model, taking a single image I as an input, and the image can be an RGB image or a grayscale image.
  • the final outputs are the probability map of pixel-level text segmentation, i.e., the semantic segmentation information O_T, and the polygonal text-box segmentation probability map, i.e., the segmentation frame information O_P.
  • the semantic segmentation model M may contain a shared coding unit E and two separate decoding units, the semantic decoding unit D_T and the segmentation frame decoding unit D_P.
  • the encoding unit E extracts the features of the input image and sends them to the two decoding units respectively.
  • the outputs O_T and O_P of the two decoding units also serve as inputs of the other branch task: the output of the semantic decoding unit D_T is used as an input of the segmentation frame decoding unit D_P, and the output of the segmentation frame decoding unit D_P is used as an input of the semantic decoding unit D_T, enabling the dual tasks to guide each other.
  • the semantic decoding unit D_T and the segmentation frame decoding unit D_P in Figure 4 are each actually a single unit in the semantic segmentation model M; Figure 4 shows each of them twice only to represent the training process. As shown in Figure 4, a 0-valued map together with the feature data E_I extracted from the input image by the coding unit E is input to the semantic decoding unit D_T and to the segmentation frame decoding unit D_P respectively; the semantic decoding unit D_T outputs the first semantic segmentation information O_T, and the segmentation frame decoding unit D_P outputs the first segmentation frame information O_P.
  • then, the first semantic segmentation information O_T and the feature data E_I are used as the input of the segmentation frame decoding unit D_P to obtain the second segmentation frame information O'_P, and the first segmentation frame information O_P and the feature data E_I are used as the input of the semantic decoding unit D_T to obtain the second semantic segmentation information O'_T.
  • according to the label type, the corresponding output is used to determine the loss value; since every output passes through both units, training data with any label type can adjust the parameters of the semantic segmentation model.
  • the structure of the coding unit in the semantic segmentation model in any of the above embodiments can be set corresponding to the semantic decoding unit and the segmentation frame decoding unit.
  • for example, the structure of the coding unit can be as shown in Table 1, and the structure of the semantic decoding unit and the segmentation frame decoding unit can be as shown in Table 2.
  • it should be noted that the above structure of the semantic segmentation model, i.e., the hyperparameters of the semantic segmentation model, is only an example, and in practice it can be set according to different image formats, accuracy requirements, and the like.
  • the loss value may include the first loss value
  • step S15 may include: if the annotation information corresponding to the image only includes the segmentation frame annotation, determining the first loss value based on the segmentation frame annotation and the second segmentation frame information; if the annotation information corresponding to the image only includes the semantic segmentation annotation, determining the first loss value based on the semantic segmentation annotation and the second semantic segmentation information; and if the annotation information corresponding to the image includes both the segmentation frame annotation and the semantic segmentation annotation, determining the first loss value based on the segmentation frame annotation and the second segmentation frame information, and on the semantic segmentation annotation and the second semantic segmentation information.
  • the corresponding output is used to determine the first loss value.
  • the first loss value may be "1 - overlap ratio between the output and the label", where the overlap ratio between the output and the label is: the area of the intersection of the output and the label divided by the area of their union. Therefore, the more accurate the output and the closer it is to the label, the higher the overlap ratio and the lower the first loss value.
  • since the annotation information corresponding to an image in the training set may include only the segmentation frame annotation or only the semantic segmentation annotation, in these two cases the first loss value can be calculated from the corresponding output, namely the second segmentation frame information or the second semantic segmentation information. If the input image has both segmentation frame annotation and semantic segmentation annotation, values can be calculated from the second segmentation frame information and the second semantic segmentation information against their respective annotations, and the two values are added up as the first loss value.
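As a sketch of the "1 - overlap ratio" formulation above, a soft intersection-over-union loss on probability maps could be written as follows (an illustration, not the patent's code):

```python
import torch

def first_loss(pred: torch.Tensor, label: torch.Tensor,
               eps: float = 1e-6) -> torch.Tensor:
    """1 - overlap ratio, where the overlap ratio is the (soft) area
    intersection of output and label divided by their area union."""
    inter = (pred * label).sum(dim=(-2, -1))
    union = (pred + label - pred * label).sum(dim=(-2, -1))
    return (1.0 - inter / (union + eps)).mean()
```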
  • the loss value may include a second loss value
  • step S15 may include: if the annotation information corresponding to the image includes the segmentation frame annotation, determining the second loss value based on the second semantic segmentation information and the segmentation frame annotation; if the annotation information corresponding to the image does not include the segmentation frame annotation, determining the second loss value based on the second semantic segmentation information and the second segmentation frame information.
  • the second loss value may represent the relationship between the outputs of the two units.
  • the range of semantic segmentation of the same content should be within the range of the segmentation frame.
  • for example, the second loss value may be the area in which the second semantic segmentation information extends beyond the segmentation frame annotation; based on this, the second loss value can be determined.
  • the case where the annotation information corresponding to the image includes the segmentation frame annotation covers two situations: the annotation information includes only the segmentation frame annotation, or it includes both the segmentation frame annotation and the semantic segmentation annotation; in both situations, the second loss value can be determined from the second semantic segmentation information and the segmentation frame annotation.
  • in the other case, where the annotation information corresponding to the image does not include the segmentation frame annotation, the second loss value can be determined according to the relationship between the second segmentation frame information output by the segmentation frame decoding unit and the second semantic segmentation information.
  • in an ideal recognition result, the second semantic segmentation information should not exceed the range of the second segmentation frame information; calculating the second loss value accordingly allows the parameters of the semantic segmentation model to be optimized and adjusted through the mutual supervision of the two units.
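One way to realize this "area beyond the frame" penalty, assuming both the semantic probability map and the frame are given as pixel maps of the same size, is sketched below (an assumption-laden illustration, not the disclosed implementation):

```python
import torch

def second_loss(sem_prob: torch.Tensor, box_mask: torch.Tensor) -> torch.Tensor:
    """Penalize semantic-segmentation probability mass that falls outside
    the segmentation frame (the box annotation, or the predicted second
    segmentation frame map when no box annotation is available)."""
    outside = sem_prob * (1.0 - box_mask)   # mass beyond the frame
    return outside.sum(dim=(-2, -1)).mean()
```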
  • the loss value may further include a third loss value
  • step S15 may further include: determining the third loss value based on a conditional random field (CRF). Introducing a CRF makes it possible to combine the information of neighboring pixels during pixel-level segmentation, further improving the segmentation effect of the semantic segmentation model.
  • the loss value may include the above-mentioned first loss value, second loss value and third loss value, and corresponding coefficients may be determined according to actual needs, so that the training of the semantic segmentation model is more efficient and the result is more reliable.
  • in a specific example, the loss value can be determined by the following formula: L = L_1 + λ_1·L_2 + λ_2·L_3, where L is the loss value, L_1 is the first loss value, L_2 is the second loss value with coefficient λ_1 (λ_1 ≥ 1; in some embodiments λ_1 = 10), and L_3 is the third loss value with coefficient λ_2 (λ_2 ≤ 1; in some embodiments λ_2 = 0.1).
  • the second loss value better represents the relationship between the segmentation frame information and the semantic segmentation information, and a second loss value determined on this basis can optimize the parameters of the model well, so the coefficient of the second loss value can take a higher value, improving training efficiency and training effect.
  • the third loss value, determined by introducing the conditional random field, is relatively weakly correlated with the segmentation frame decoding unit and the semantic decoding unit, so its coefficient can be relatively small, to prevent parameter adjustment during training from being excessively biased toward it.
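Putting the three terms together per the formula above is then a one-liner; the default coefficients mirror the example values mentioned in the disclosure:

```python
def total_loss(l1, l2, l3, lambda1: float = 10.0, lambda2: float = 0.1):
    """L = L_1 + lambda_1 * L_2 + lambda_2 * L_3, with lambda_1 >= 1 and
    lambda_2 <= 1 as described above."""
    return l1 + lambda1 * l2 + lambda2 * l3
```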
  • an embodiment of the present disclosure further provides a method 20 for semantic image segmentation.
  • the method 20 for semantic image segmentation includes steps S21 to S24. The above steps are described in detail below:
  • Step S21 acquiring an image.
  • images can be acquired in real time, for example through cameras or photographic equipment. It is also possible to obtain images that need semantic segmentation: for example, in some cases target recognition must be performed on an image, or image processing must be performed on a target in the image, and the image first needs to be semantically segmented. Images that need semantic segmentation can also be obtained in batches, for labeling or other purposes.
  • the image may be a photo, or one or more frames in a video.
  • Step S22 extracting features from the image to obtain feature data of the image.
  • the semantic recognition model may include a coding unit, and feature extraction is performed on the image through the coding unit to obtain feature data.
  • the coding unit may include one or more convolution layers to perform convolution (Convolution) processing on the image, and may also perform pooling (Pooling) processing on the image at the same time.
  • Step S23 based on the feature data, obtain first segmentation frame information.
  • Step S24 based on the feature data and the first segmentation frame information, obtain second semantic segmentation information of the image.
  • after the first segmentation frame information is obtained from the feature data, the second semantic segmentation information is obtained from the first segmentation frame information together with the feature data, so that the information in the image is used more fully; the segmentation frame information and the semantic segmentation information are computed independently through different algorithms or different units, and the semantic segmentation information is obtained on the basis of the first segmentation frame information combined with the feature data, thereby improving the accuracy of the semantic segmentation.
  • in the related art, if the target position in the image needs to be determined, often only the segmentation frame is obtained; if semantic segmentation is required, the semantic segmentation information is often obtained directly. In the embodiments of the present disclosure, both pieces of information are acquired, thereby improving the utilization rate of the image, making full use of the information in it, and improving the accuracy of semantic segmentation.
  • the semantic segmentation model applied to the image semantic segmentation method 20 may include: a segmentation frame decoding unit and a semantic decoding unit; step S23 is performed by the segmentation frame decoding unit; step S24 is performed by the semantic decoding unit.
  • the image semantic segmentation method 20 may further include: obtaining first semantic segmentation information based on the feature data through the semantic decoding unit; and obtaining second segmentation frame information of the image based on the feature data and the first semantic segmentation information through the segmentation frame decoding unit.
  • the semantic segmentation model in this embodiment may have two independent units; both the segmentation frame decoding unit and the semantic decoding unit may perform operations such as upsampling and convolution on the feature data, and respectively output the first segmentation frame information and the first semantic segmentation information of the image. Obtaining the two kinds of information through two independent units makes full use of the image information while reducing the complexity and data volume of the model compared with obtaining both through one network, thereby reducing the computational cost without sacrificing the accuracy of the output results.
  • after the first segmentation frame information and the first semantic segmentation information are initially obtained, they are not used as the final output of image semantic segmentation; instead, this information, together with the extracted feature data, is fed in again, so that more information is used for semantic segmentation and more accurate second segmentation frame information and second semantic segmentation information are obtained.
  • the semantic segmentation model applied to the image semantic segmentation method 20 can be obtained by training with the semantic segmentation model training method 10 of any of the foregoing embodiments, which improves the semantic segmentation accuracy of the model; moreover, the training data is easy to obtain, reducing the training cost.
  • the present disclosure also provides a semantic segmentation model training apparatus 100.
  • the semantic segmentation model training apparatus 100 includes: a first acquisition module 110 for acquiring a training set, wherein the training set includes multiple images and annotation information corresponding to the images, and the annotation information corresponding to any image includes segmentation frame annotation and/or semantic segmentation annotation; a first feature extraction module 120 for performing feature extraction on an image to obtain feature data of the image; a first semantic module 130 for obtaining first segmentation frame information and first semantic segmentation information based on the feature data, and further for obtaining second segmentation frame information and second semantic segmentation information of the image based on the feature data, the first segmentation frame information and the first semantic segmentation information; a loss determination module 140 for determining a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information; and an adjustment module 150 for adjusting the parameters of the semantic segmentation model based on the loss value.
  • the semantic segmentation model includes: a segmentation frame decoding unit and a semantic decoding unit; the first semantic module 130 is used for: decoding through the segmentation frame decoding unit based on the feature data to obtain the first segmentation frame information, and decoding through the semantic decoding unit based on the feature data to obtain the first semantic segmentation information.
  • the first semantic module 130 is further used for: decoding through the segmentation frame decoding unit based on the feature data and the first semantic segmentation information to obtain the second segmentation frame information, and decoding through the semantic decoding unit based on the feature data and the first segmentation frame information to obtain the second semantic segmentation information.
  • the semantic segmentation model further includes: an encoding unit; the first feature extraction module 120 is configured to: perform encoding through the encoding unit based on the image to obtain feature data of the image.
  • the loss value includes a first loss value
  • the loss determination module 140 is further configured to: when the annotation information corresponding to the image only includes the segmentation frame annotation, determine the first loss value based on the segmentation frame annotation and the second segmentation frame information; when the annotation information corresponding to the image only includes the semantic segmentation annotation, determine the first loss value based on the semantic segmentation annotation and the second semantic segmentation information; and when the annotation information corresponding to the image includes both the segmentation frame annotation and the semantic segmentation annotation, determine the first loss value based on the segmentation frame annotation and the second segmentation frame information, and on the semantic segmentation annotation and the second semantic segmentation information.
  • the loss value includes a second loss value
  • the loss determination module 140 is further configured to: when the annotation information corresponding to the image includes the segmentation frame annotation, determine the second loss value based on the second semantic segmentation information and the segmentation frame annotation; when the annotation information corresponding to the image does not include the segmentation frame annotation, determine the second loss value based on the second semantic segmentation information and the second segmentation frame information.
  • the loss value includes a third loss value; the loss determination module 140 is further configured to: determine the third loss value based on the conditional random field.
  • the present disclosure also provides an image semantic segmentation device 200.
  • the image semantic segmentation device 200 includes: a second acquisition module 210 for acquiring an image; a second feature extraction module 220 for performing feature extraction on the image to obtain feature data of the image; and a second semantic module 230 for obtaining first segmentation frame information based on the feature data, and further for obtaining second semantic segmentation information of the image based on the feature data and the first segmentation frame information.
  • the image semantic segmentation apparatus 200 is applied to a semantic segmentation model, and the semantic segmentation model includes: a segmentation frame decoding unit and a semantic decoding unit; the segmentation frame decoding unit obtains the first segmentation frame information based on the feature data, and the semantic decoding unit obtains the second semantic segmentation information of the image based on the feature data and the first segmentation frame information.
  • the semantic segmentation model further includes: an encoding unit; and through the encoding unit, feature extraction is performed on the image to obtain feature data of the image.
  • the second semantic module 230 is further configured to: obtain first semantic segmentation information based on the feature data through the semantic decoding unit, and obtain second segmentation frame information of the image based on the feature data and the first semantic segmentation information through the segmentation frame decoding unit.
  • an embodiment of the present disclosure provides an electronic device 300 .
  • the electronic device 300 includes a memory 301 , a processor 302 , and an input/output (I/O) interface 303 .
  • the memory 301 is used for storing instructions.
  • the processor 302 is configured to invoke the instructions stored in the memory 301 to execute the semantic segmentation model training method or the image semantic segmentation method according to the embodiment of the present disclosure.
  • the processor 302 is respectively connected with the memory 301 and the I/O interface 303, for example, it can be connected through a bus system and/or other forms of connection mechanisms (not shown).
  • the memory 301 can be used to store programs and data, including the program of the semantic segmentation model training method or the image semantic segmentation method involved in the embodiments of the present disclosure; the processor 302 runs the programs stored in the memory 301 to execute the various functional applications and data processing of the electronic device 300.
  • the processor 302 may be implemented in at least one hardware form among a digital signal processor (DSP), a field-programmable gate array (FPGA) and a programmable logic array (PLA), and the processor 302 may be a central processing unit (CPU) or one or a combination of several other forms of processing units with data processing capability and/or instruction execution capability.
  • the memory 301 in embodiments of the present disclosure may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include, for example, random access memory (Random Access Memory, RAM) and/or cache memory (cache).
  • the non-volatile memory may include, for example, read-only memory (ROM), flash memory, a hard disk drive (HDD) or a solid-state drive (SSD).
  • the I/O interface 303 can be used to receive input instructions (for example, numeric or character information, and key signal inputs related to user settings and function control of the electronic device 300), and can also output various information (for example, images or sounds) externally.
  • the I/O interface 303 may include one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a mouse, a joystick, a trackball, a microphone, a speaker, a touch panel, and the like.
  • a software module is implemented using a computer program product comprising a computer-readable medium containing computer program code executable by a computer processor for performing any or all of the described steps, operations or procedures.
  • the semantic segmentation model training method, image semantic segmentation method, semantic segmentation model training device, image semantic segmentation device, electronic device and computer-readable storage medium provided by the present disclosure can conveniently expand training data, reduce training costs, and ensure the accuracy of image semantic segmentation results.
  • the semantic segmentation model training method, image semantic segmentation method, semantic segmentation model training device, image semantic segmentation device, electronic device and computer-readable storage medium provided by the present disclosure are reproducible and can be used in a variety of industrial applications, and can be used with any training set for training the semantic segmentation model.

Abstract

A semantic segmentation model training method, an image semantic segmentation method, a semantic segmentation model training device, an image semantic segmentation device, an electronic device and a computer-readable storage medium. The semantic segmentation model training method includes: acquiring a training set, wherein the training set includes multiple images and annotation information corresponding to the images, and the annotation information corresponding to any image includes segmentation frame annotation and/or semantic segmentation annotation (S11); performing feature extraction on an image to obtain feature data of the image (S12); obtaining first segmentation frame information and first semantic segmentation information based on the feature data (S13); obtaining second segmentation frame information and second semantic segmentation information of the image based on the feature data, the first segmentation frame information and the first semantic segmentation information (S14); determining a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information (S15); and adjusting the parameters of the semantic segmentation model based on the loss value (S16). The semantic segmentation model training method makes training data easy to obtain, so that the training effect can be improved through a large amount of high-quality data.

Description

Semantic segmentation model training method and apparatus, and image semantic segmentation method and apparatus
Cross-Reference to Related Applications
The present disclosure claims priority to Chinese patent application No. 202010912041.3, filed with the Chinese Patent Office on September 2, 2020 and entitled "Semantic segmentation model training method and apparatus, and image semantic segmentation method and apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure generally relates to the field of image processing, and in particular to a semantic segmentation model training method, an image semantic segmentation method, a semantic segmentation model training device, an image semantic segmentation device, an electronic device and a computer-readable storage medium.
Background
In the field of image processing, in some application scenarios it is necessary to perform target recognition on certain parts of an image, to perform text recognition on text present in an image, or to replace some content in an image, among other tasks. In many cases, semantic segmentation must be performed on certain target content in an image. Semantic segmentation of an image is pixel-level classification of the image: a semantic segmentation model classifies target content of the same kind in the image into one category. For example, if there is a vehicle in the image, the pixels belonging to the vehicle are determined, all pixels belonging to the vehicle are segmented out, and the boundary segmentation frame of the vehicle is determined at the pixel level. The target content can be a specific person, object or text, etc.; a pixel-level boundary of the target content is determined in the image and segmented.
At present, however, the accuracy of semantic segmentation is poor; especially for natural scene images, training data is scarce and training costs are high, and the trained semantic segmentation models have low accuracy.
Summary
In order to solve the above problems in the prior art, in one or more optional embodiments, the present disclosure provides a semantic segmentation model training method, wherein the method may include: acquiring a training set, wherein the training set may include multiple images and annotation information corresponding to the images, and the annotation information corresponding to any image includes segmentation frame annotation and/or semantic segmentation annotation; performing feature extraction on an image to obtain feature data of the image; obtaining first segmentation frame information and first semantic segmentation information based on the feature data; obtaining second segmentation frame information and second semantic segmentation information of the image based on the feature data, the first segmentation frame information and the first semantic segmentation information; determining a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information; and adjusting the parameters of the semantic segmentation model based on the loss value.
In one or more optional embodiments, the semantic segmentation model may include: a segmentation frame decoding unit and a semantic decoding unit; the obtaining of first segmentation frame information and first semantic segmentation information based on the feature data includes: decoding through the segmentation frame decoding unit based on the feature data to obtain the first segmentation frame information; and decoding through the semantic decoding unit based on the feature data to obtain the first semantic segmentation information.
In one or more optional embodiments, the obtaining of the second segmentation frame information and the second semantic segmentation information of the image based on the feature data, the first segmentation frame information and the first semantic segmentation information includes: decoding through the segmentation frame decoding unit based on the feature data and the first semantic segmentation information to obtain the second segmentation frame information; and decoding through the semantic decoding unit based on the feature data and the first segmentation frame information to obtain the second semantic segmentation information.
In one or more optional embodiments, the semantic segmentation model may further include an encoding unit; the performing of feature extraction on the image to obtain feature data of the image includes: encoding through the encoding unit based on the image to obtain the feature data of the image.
In one or more optional embodiments, the loss value may include a first loss value; the determining of the loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information, includes: if the annotation information corresponding to the image only includes the segmentation frame annotation, determining the first loss value based on the segmentation frame annotation and the second segmentation frame information; if the annotation information corresponding to the image only includes the semantic segmentation annotation, determining the first loss value based on the semantic segmentation annotation and the second semantic segmentation information; if the annotation information corresponding to the image includes the segmentation frame annotation and the semantic segmentation annotation, determining the first loss value based on the segmentation frame annotation and the second segmentation frame information, and based on the semantic segmentation annotation and the second semantic segmentation information.
In one or more optional embodiments, the loss value may include a second loss value; the determining of the loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information, includes: if the annotation information corresponding to the image includes the segmentation frame annotation, determining the second loss value based on the second semantic segmentation information and the segmentation frame annotation; if the annotation information corresponding to the image does not include the segmentation frame annotation, determining the second loss value based on the second semantic segmentation information and the second segmentation frame information.
In one or more optional embodiments, the loss value may include a third loss value; the determining of the loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information, includes: determining the third loss value based on a conditional random field.
In one or more optional embodiments, the present disclosure provides an image semantic segmentation method, wherein the method may include: acquiring an image; performing feature extraction on the image to obtain feature data of the image; obtaining first segmentation frame information based on the feature data; and obtaining second semantic segmentation information of the image based on the feature data and the first segmentation frame information.
In one or more optional embodiments, the method is applied to a semantic segmentation model, and the semantic segmentation model may include: a segmentation frame decoding unit and a semantic decoding unit; the first segmentation frame information is obtained through the segmentation frame decoding unit based on the feature data; and the second semantic segmentation information of the image is obtained through the semantic decoding unit based on the feature data and the first segmentation frame information.
In one or more optional embodiments, the semantic segmentation model may further include an encoding unit, through which feature extraction is performed on the image to obtain the feature data of the image.
In one or more optional embodiments, the method may further include: obtaining first semantic segmentation information through the semantic decoding unit based on the feature data; and obtaining second segmentation frame information of the image through the segmentation frame decoding unit based on the feature data and the first semantic segmentation information.
In one or more optional embodiments, the present disclosure provides a semantic segmentation model training apparatus, wherein the apparatus may include: a first acquisition module, which may be configured to acquire a training set, wherein the training set may include multiple images and annotation information corresponding to the images, and the annotation information corresponding to any image includes segmentation frame annotation and/or semantic segmentation annotation; a first feature extraction module, which may be configured to perform feature extraction on the image to obtain feature data of the image; a first semantic module, which may be configured to obtain first segmentation frame information and first semantic segmentation information based on the feature data, and may further be configured to obtain second segmentation frame information and second semantic segmentation information of the image based on the feature data, the first segmentation frame information and the first semantic segmentation information; a loss determination module, which may be configured to determine a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information; and an adjustment module, which may be configured to adjust the parameters of the semantic segmentation model based on the loss value.
In one or more optional embodiments, the semantic segmentation model may include: a segmentation frame decoding unit and a semantic decoding unit; the first semantic module is further configured to: decode through the segmentation frame decoding unit based on the feature data to obtain the first segmentation frame information; and decode through the semantic decoding unit based on the feature data to obtain the first semantic segmentation information.
In one or more optional embodiments, the first semantic module is further configured to: decode through the segmentation frame decoding unit based on the feature data and the first semantic segmentation information to obtain the second segmentation frame information; and decode through the semantic decoding unit based on the feature data and the first segmentation frame information to obtain the second semantic segmentation information.
In one or more optional embodiments, the semantic segmentation model may further include an encoding unit; the first feature extraction module is configured to: encode through the encoding unit based on the image to obtain the feature data of the image.
In one or more optional embodiments, the present disclosure provides an image semantic segmentation apparatus, wherein the apparatus may include: a second acquisition module, which may be configured to acquire an image; a second feature extraction module, which may be configured to perform feature extraction on the image to obtain feature data of the image; and a second semantic module, which may be configured to obtain first segmentation frame information based on the feature data, and may further be configured to obtain second semantic segmentation information of the image based on the feature data and the first segmentation frame information.
In one or more optional embodiments, the present disclosure provides an electronic device, which may include: a memory, which may be configured to store instructions; and a processor, which may be configured to invoke the instructions stored in the memory to execute the semantic segmentation model training method or the image semantic segmentation method described above.
In one or more optional embodiments, the present disclosure provides a computer-readable storage medium in which instructions are stored; when the instructions are executed by a processor, the above semantic segmentation model training method or the above image semantic segmentation method is executed.
In the training set used to train the semantic segmentation model in the embodiments of the present disclosure, the annotation of an image may include only segmentation frame annotation, only semantic segmentation annotation, or both, which makes it convenient to expand the training data and reduces the training cost. Moreover, by determining the segmentation frame information and the semantic segmentation information of the image and obtaining the final semantic segmentation result based on both, the information in the image is fully utilized, and the recognition results of the segmentation frame and the semantic segmentation promote each other, thereby ensuring the accuracy of the image semantic segmentation result.
Brief Description of the Drawings
The above and other objects, features and advantages of the embodiments of the present disclosure will become easy to understand by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present disclosure are shown by way of example and not limitation, in which:
FIG. 1 is a schematic flowchart of a semantic segmentation model training method according to an embodiment of the present disclosure;
FIGS. 2A, 2B and 2C are schematic diagrams of segmentation frame information and semantic segmentation information extracted from an image;
FIG. 3 is a schematic structural diagram of a semantic segmentation model according to an embodiment of the present disclosure;
FIG. 4 is a schematic data-flow diagram of a semantic segmentation model according to an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of an image semantic segmentation method according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a semantic segmentation model training apparatus according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an image semantic segmentation apparatus according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals denote the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are given only to enable those skilled in the art to better understand and implement the present disclosure, and are not intended to limit the scope of the present disclosure in any way.
It should be noted that although expressions such as "first" and "second" are used herein to describe different modules, steps, data, etc. of the embodiments of the present disclosure, these expressions are used only to distinguish between different modules, steps, data, etc., and do not indicate a particular order or degree of importance. In fact, expressions such as "first" and "second" are completely interchangeable.
At present, training data for semantic segmentation models is scarce and training costs are high, so the trained semantic segmentation models have low accuracy. For example, in the scenario of semantically segmenting text in an image, the text in images comes in different colors, fonts, shapes and sizes. Early image processing methods based on MSER (Maximally Stable Extremal Regions) and SWT (Stroke Width Transform) detect text only through prior information such as connected regions in the image; lacking a learning-based mechanism, their performance falls far short of what natural scene images require, and their segmentation accuracy is very low. In other related art, deep-learning-based methods require a large amount of varied training data; since labeling real-scene pictures is very expensive, the existing high-quality text segmentation data for real scenes is very limited, so the actual semantic segmentation accuracy of the models is low. If synthetic data is used, a gap remains between real and synthetic data that algorithms cannot fully bridge; if manual annotation is used, the cost is very high.
In order to solve the above problems, an embodiment of the present disclosure provides a semantic segmentation model training method 10, which, as shown in FIG. 1, may include steps S11 to S16, each of which is described in detail below:
Step S11: acquire a training set, wherein the training set includes multiple images and annotation information corresponding to the images, and the annotation information corresponding to any image includes segmentation frame annotation and/or semantic segmentation annotation.
In the embodiments of the present disclosure, the images of the acquired training set may be photos, pictures or video frames with annotation information. The annotation information may include only segmentation frame annotation, only semantic segmentation annotation, or both. Since the annotation information of the images required for the training set may be of various types, it is easy to obtain a large amount of high-quality training data, thereby ensuring the training effect. For semantic segmentation annotation in particular, the cost of manual annotation is too high; the accuracy of model-generated annotation is insufficient, resulting in a poor training effect; and a model trained on synthetic images has low accuracy when semantically segmenting real-scene images. Therefore, in the embodiments of the present disclosure, both images with semantic segmentation annotation and images with segmentation frame annotation are used as training data, avoiding the defects of a single type of training data being scarce and of insufficient quality.
In the embodiments of the present disclosure, the segmentation frame is the position range of a target, generally shown as a rectangular or quadrilateral frame; semantic segmentation is pixel-level segmentation of a class of targets in the image. Taking text segmentation as an example, as shown in FIGS. 2A-2C, FIG. 2A is an image of a real scene including text content; in FIG. 2B, the region of the text content has been segmented into text boxes, i.e., the obtained text box information; in FIG. 2C, the text content has been segmented at the pixel level, i.e., the obtained semantic segmentation information.
Step S12: perform feature extraction on the image to obtain feature data of the image.
Feature extraction on the acquired image may be performed by a semantic recognition model, for example by convolving the image, so as to extract the feature information in the image. In an embodiment, the semantic recognition model may include an encoding unit, and feature extraction is performed on the image through the encoding unit to obtain the feature data. The encoding unit may include one or more convolution layers that perform convolution processing on the image, and may also perform pooling processing on the image.
Step S13: obtain first segmentation frame information and first semantic segmentation information based on the feature data.
After feature extraction is performed on the image, the first segmentation frame information and the first semantic segmentation information can be obtained separately, through different algorithms or through independent computation by different units, based on the feature data. In the related art, if the target position in the image needs to be determined, often only the segmentation frame is obtained; if semantic segmentation is required, the semantic segmentation information is often obtained directly. In the embodiments of the present disclosure, both pieces of information are acquired, thereby improving the utilization rate of the image and making full use of the information in it.
Step S14: obtain second segmentation frame information and second semantic segmentation information of the image based on the feature data, the first segmentation frame information and the first semantic segmentation information.
After the first segmentation frame information and the first semantic segmentation information are initially obtained, they are not used as the final output of image semantic segmentation; instead, this information, together with the extracted feature data, is fed in again, and the two kinds of data guide each other, so that more information is used for semantic segmentation and more accurate second segmentation frame information and second semantic segmentation information are obtained.
In an embodiment, the semantic segmentation model may include: a segmentation frame decoding unit and a semantic decoding unit. In step S13, decoding is performed through the segmentation frame decoding unit based on the feature data to obtain the first segmentation frame information, and decoding is performed through the semantic decoding unit based on the feature data to obtain the first semantic segmentation information. The semantic segmentation model in this embodiment may have two independent units; both the segmentation frame decoding unit and the semantic decoding unit may perform operations such as upsampling and convolution on the feature data, and respectively output the first segmentation frame information and the first semantic segmentation information of the image. Obtaining the two kinds of information through two independent units makes full use of the image information while, compared with obtaining both through one network, reducing the complexity and data volume of the model, thereby reducing the computational cost without sacrificing the accuracy of the output results.
In an embodiment, step S14 may include: decoding through the segmentation frame decoding unit based on the feature data and the first semantic segmentation information to obtain the second segmentation frame information; and decoding through the semantic decoding unit based on the feature data and the first segmentation frame information to obtain the second semantic segmentation information. In this embodiment, the first semantic segmentation information output by the semantic decoding unit, together with the feature data, is input to the segmentation frame decoding unit, which outputs the second segmentation frame information; the first segmentation frame information output by the segmentation frame decoding unit, together with the feature data, is input to the semantic decoding unit, which outputs the second semantic segmentation information. The output of each unit serves as an input of the other, achieving mutual guidance, so each unit obtains more information. Meanwhile, during training, both units can be optimized based on any type of annotation information of the image.
In the above embodiment, to normalize the unit inputs, in step S13 the feature data and 0-valued semantic segmentation information may be input to the segmentation frame decoding unit, and the feature data and 0-valued segmentation frame information may be input to the semantic decoding unit, thereby guaranteeing that the input of each unit has the same form in steps S13 and S14.
Step S15: determine a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information.
In the embodiments of the present disclosure, the annotation information may take several forms; the corresponding output can be selected according to the type of annotation information of the currently input image, and the loss value determined. Because the segmentation frame information and the semantic segmentation information guide each other during training, a loss value can be determined for any type of annotation information, and the semantic segmentation model can be optimized accordingly.
Step S16: adjust the parameters of the semantic segmentation model based on the loss value.
According to the loss value, the parameters of the semantic segmentation model are adjusted, and through multiple rounds of training the loss value gradually decreases; when the loss value is less than a threshold, training of the semantic segmentation model is stopped.
Through the semantic segmentation model training method 10 of the above embodiments, images with various annotation types can be used as training data, and for any type of annotation information a loss value is determined from the corresponding output so as to optimize and adjust the parameters of the model. This makes it convenient to expand the amount of training data and reduces the training cost; training the model with a large amount of high-quality training data ensures the training effect, so that the trained semantic segmentation model performs semantic segmentation with high accuracy.
Taking text segmentation in an image as an example, the structure of the semantic segmentation model of an embodiment of the present disclosure and its input and output data may be as shown in FIG. 3. The semantic segmentation model M may be a neural network model taking a single image I as input; the image may be an RGB image or a grayscale image. The final outputs are the probability map of pixel-level text segmentation, i.e., the semantic segmentation information O_T, and the polygonal text-box segmentation probability map, i.e., the segmentation frame information O_P. The semantic segmentation model M may contain one shared encoding unit E and two separate decoding units, the semantic decoding unit D_T and the segmentation frame decoding unit D_P. The encoding unit E extracts the features of the input image and sends them to the two decoding units respectively. The outputs O_T and O_P of the two decoding units also serve as inputs of the other branch task: the output of the semantic decoding unit D_T is used as an input of the segmentation frame decoding unit D_P, and the output of the segmentation frame decoding unit D_P is used as an input of the semantic decoding unit D_T, so that the dual tasks guide each other.
During training, to better represent the input and output of data, reference may be made to FIG. 4. The semantic decoding unit D_T and the segmentation frame decoding unit D_P in FIG. 4 are each actually a single unit in the semantic segmentation model M; FIG. 4 shows each of them twice only to depict the training process. As shown in FIG. 4, a 0-valued map together with the feature data E_I extracted from the input image by the encoding unit E is input to the semantic decoding unit D_T and the segmentation frame decoding unit D_P respectively; the semantic decoding unit D_T outputs the first semantic segmentation information O_T, and the segmentation frame decoding unit D_P outputs the first segmentation frame information O_P. Then, the first semantic segmentation information O_T and the feature data E_I are used as the input of the segmentation frame decoding unit D_P to obtain the second segmentation frame information O'_P, and the first segmentation frame information O_P and the feature data E_I are used as the input of the semantic decoding unit D_T to obtain the second semantic segmentation information O'_T. According to the label type, the corresponding output is used to determine the loss value; since every output passes through both units, training data with any label type can adjust the parameters of the semantic segmentation model.
The structure of the encoding unit in the semantic segmentation model of any of the above embodiments can be set to correspond to the semantic decoding unit and the segmentation frame decoding unit. For example, the structure of the encoding unit can be as shown in Table 1, and the structure of the semantic decoding unit and the segmentation frame decoding unit can be as shown in Table 2.
Table 1 (encoding unit):
     Type         Size  Stride  Channels
1    Convolution  3     1       64
2    Convolution  3     1       64
3    Pooling      2     2       64
4    Convolution  3     1       128
5    Convolution  3     1       128
6    Pooling      2     2       128
7    Convolution  3     1       256
8    Convolution  3     1       256
9    Pooling      2     2       256
10   Convolution  3     1       256
11   Convolution  3     1       256
12   Pooling      2     2       256
13   Convolution  3     1       512
14   Convolution  3     1       512
Table 2 (semantic decoding unit and segmentation frame decoding unit):
     Type         Size  Stride  Channels
1    Upsampling   2     /       512
2    Convolution  3     1       256
3    Convolution  3     1       256
4    Upsampling   2     /       256
5    Convolution  3     1       128
6    Convolution  3     1       128
7    Upsampling   2     /       128
8    Convolution  3     1       64
9    Convolution  3     1       64
10   Upsampling   2     /       64
11   Convolution  3     1       64
12   Convolution  3     1       64
It should be noted that the above structure of the semantic segmentation model, i.e., the hyperparameters of the semantic segmentation model, is only an example, and in practice it can be set according to different image formats, accuracy requirements, and the like.
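Purely to illustrate how the rows of Table 1 map onto layers, the encoding unit could be assembled as below; the ReLU activations are an assumption (the table lists only convolution and pooling layers):

```python
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int):
    # one "Convolution, size 3, stride 1" row of Table 1 / Table 2
    return [nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True)]

# Encoding unit per Table 1: conv pairs with 2x2, stride-2 pooling between them.
encoder = nn.Sequential(
    *conv_block(3, 64),    *conv_block(64, 64),   nn.MaxPool2d(2, 2),
    *conv_block(64, 128),  *conv_block(128, 128), nn.MaxPool2d(2, 2),
    *conv_block(128, 256), *conv_block(256, 256), nn.MaxPool2d(2, 2),
    *conv_block(256, 256), *conv_block(256, 256), nn.MaxPool2d(2, 2),
    *conv_block(256, 512), *conv_block(512, 512),
)
```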
In an embodiment, the loss value may include a first loss value, and step S15 may include: if the annotation information corresponding to the image only includes segmentation frame annotation, determining the first loss value based on the segmentation frame annotation and the second segmentation frame information; if the annotation information corresponding to the image only includes semantic segmentation annotation, determining the first loss value based on the semantic segmentation annotation and the second semantic segmentation information; if the annotation information corresponding to the image includes both segmentation frame annotation and semantic segmentation annotation, determining the first loss value based on the segmentation frame annotation and the second segmentation frame information, and on the semantic segmentation annotation and the second semantic segmentation information.
In this embodiment, the corresponding output is used to determine the first loss value according to the type of annotation information. The first loss value may be "1 - the overlap ratio between the output and the label", where the overlap ratio between the output and the label is: the area of the intersection of the output and the label divided by the area of their union. Thus, the more accurate the output and the closer it is to the label, the higher the overlap ratio and the lower the first loss value.
Since the annotation information corresponding to an image in the training set may include only segmentation frame annotation or only semantic segmentation annotation, in these two cases the first loss value can be calculated from the corresponding output, i.e., the second segmentation frame information or the second semantic segmentation information. If the input image has both segmentation frame annotation and semantic segmentation annotation, values can be calculated from the second segmentation frame information and the second semantic segmentation information against the corresponding segmentation frame annotation and semantic segmentation annotation respectively, and the two values are added up as the first loss value.
In an embodiment, the loss value may include a second loss value, and step S15 may include: if the annotation information corresponding to the image includes segmentation frame annotation, determining the second loss value based on the second semantic segmentation information and the segmentation frame annotation; if the annotation information corresponding to the image does not include segmentation frame annotation, determining the second loss value based on the second semantic segmentation information and the second segmentation frame information.
In this embodiment, the second loss value may represent the relationship between the outputs of the two units: in an image, the semantic segmentation range of the same content should fall within the range of the segmentation frame. For example, the second loss value may be the area in which the second semantic segmentation information extends beyond the segmentation frame annotation; based on this, the second loss value can be determined. The case where the annotation information corresponding to the image includes segmentation frame annotation covers both the case where the annotation includes only segmentation frame annotation and the case where it includes both segmentation frame annotation and semantic segmentation annotation; in both, the second loss value can be determined from the second semantic segmentation information and the segmentation frame annotation. In the other case, where the annotation information corresponding to the image does not include segmentation frame annotation, the second loss value can be determined from the relationship between the second segmentation frame information output by the segmentation frame decoding unit and the second semantic segmentation information: in an ideal recognition result, the second semantic segmentation information should not exceed the range of the second segmentation frame information. Calculating the second loss value accordingly allows the parameters of the semantic segmentation model to be optimized and adjusted through the mutual supervision of the two units.
In an embodiment, the loss value may further include a third loss value, and step S15 may further include: determining the third loss value based on a conditional random field. Introducing a conditional random field (CRF) makes it possible to combine the information of neighboring pixels during pixel-level segmentation, further improving the segmentation effect of the semantic segmentation model.
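The disclosure does not spell out the CRF formulation. One common way to realize a CRF-inspired pairwise term as a differentiable loss is to penalize probability differences between neighboring pixels, down-weighted across strong image edges; the sketch below rests entirely on that assumption:

```python
import torch

def crf_pairwise_loss(prob: torch.Tensor, image: torch.Tensor,
                      sigma: float = 0.1) -> torch.Tensor:
    """Encourage neighboring pixels with similar colors to share labels."""
    def term(p_a, p_b, i_a, i_b):
        affinity = torch.exp(-((i_a - i_b) ** 2).mean(1, keepdim=True) / sigma)
        return (affinity * (p_a - p_b).abs()).mean()
    horizontal = term(prob[..., :, 1:], prob[..., :, :-1],
                      image[..., :, 1:], image[..., :, :-1])
    vertical = term(prob[..., 1:, :], prob[..., :-1, :],
                    image[..., 1:, :], image[..., :-1, :])
    return horizontal + vertical
```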
Combining the above embodiments, the loss value may include the above first loss value, second loss value and third loss value, and the corresponding coefficients may be determined according to actual needs, making the training of the semantic segmentation model more efficient and its results more reliable. In a specific example, the loss value can be determined by the following formula:
L = L_1 + λ_1·L_2 + λ_2·L_3
where L is the loss value; L_1 is the first loss value; L_2 is the second loss value and λ_1 is the coefficient of the second loss value, with λ_1 ≥ 1 (in some embodiments λ_1 = 10); L_3 is the third loss value and λ_2 is the coefficient of the third loss value, with λ_2 ≤ 1 (in some embodiments λ_2 = 0.1). As described above, the second loss value better represents the relationship between the segmentation frame information and the semantic segmentation information, and the second loss value determined on this basis can optimize the parameters of the model well, so the coefficient of the second loss value can take a higher value, improving training efficiency and training effect. Meanwhile, the third loss value determined by introducing the conditional random field is relatively weakly correlated with the segmentation frame decoding unit and the semantic decoding unit, so its coefficient can be relatively small, to avoid parameter adjustment during training being excessively biased toward it.
Based on the same concept, an embodiment of the present disclosure further provides an image semantic segmentation method 20. As shown in Fig. 5, the image semantic segmentation method 20 includes steps S21 to S24, which are described in detail below.
Step S21: acquiring an image.
The image may be acquired in real time, for example captured in real time by a camera or photographic device. An image requiring semantic segmentation may also be acquired: for example, in some cases target recognition is to be performed on the image, or the targets in the image are to undergo image processing, for which the image must first be semantically segmented. Images requiring semantic segmentation may also be acquired in batches, for annotating the images or for other purposes. The image may be a photograph, or one or more frames of a video.
Step S22: performing feature extraction on the image to obtain feature data of the image.
Feature extraction is performed on the acquired image; the image may be processed through the semantic segmentation model by convolution and similar operations, thereby extracting the feature information of the image. In one embodiment, the semantic segmentation model may include an encoding unit, and the feature data is obtained by performing feature extraction on the image through the encoding unit. The encoding unit may include one or more convolutional layers that apply convolution to the image, and may additionally apply pooling and other processing to the image.
Step S23: obtaining first segmentation frame information based on the feature data.
Step S24: obtaining second semantic segmentation information of the image based on the feature data and the first segmentation frame information.
After the first segmentation frame information is obtained from the feature data, the second semantic segmentation information is obtained from the first segmentation frame information together with the feature data, thereby making fuller use of the image information; the segmentation frame information and the semantic segmentation information are computed independently by different algorithms or different units, and the semantic segmentation information is obtained from the feature data on the basis of the first segmentation frame information, which improves the accuracy of the semantic segmentation. In the related art, if the target position in an image needs to be determined, often only a segmentation frame is obtained, and if semantic segmentation is needed, the semantic segmentation information is often obtained directly. In the embodiments of the present disclosure both kinds of information are obtained, which raises the utilization of the image, makes full use of the information in the image, and improves the accuracy of the semantic segmentation.
In one embodiment, the semantic segmentation model applied in the image semantic segmentation method 20 may include: a segmentation frame decoding unit and a semantic decoding unit; step S23 is performed by the segmentation frame decoding unit, and step S24 is performed by the semantic decoding unit.
In one embodiment, the image semantic segmentation method 20 may further include: obtaining first semantic segmentation information based on the feature data through the semantic decoding unit; and obtaining second segmentation frame information of the image based on the feature data and the first semantic segmentation information through the segmentation frame decoding unit.
The semantic segmentation model in this embodiment may have two independent units: both the segmentation frame decoding unit and the semantic decoding unit may perform operations such as upsampling and convolution on the feature data, and they respectively output the first segmentation frame information and the first semantic segmentation information of the image. Obtaining the two kinds of information through two independent units makes full use of the image information while, compared with obtaining both through a single network, reducing the complexity and parameter count of the model; this in turn lowers the computational cost and improves the accuracy of the output results. After the first segmentation frame information and the first semantic segmentation information are initially obtained, they are not taken as the final output of the image semantic segmentation; instead, this information, together with the extracted feature data, is input again so that more information is used for the semantic segmentation, yielding more accurate second segmentation frame information and second semantic segmentation information.
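At inference time, the two-pass model sketched earlier might be used as follows, keeping the second-pass semantic map as the segmentation result; the 256×256 input size, the extra guidance channel, and the 0.5 binarization threshold are illustrative assumptions, and the names follow the earlier hypothetical sketches.

```python
import torch

# Usage sketch with the hypothetical builders defined earlier; the +1 input
# channel of each decoding unit holds the guidance map.
model = DualDecoderSegModel(
    encoder=make_encoder(),
    frame_decoder=make_decoder(in_ch=512 + 1),
    sem_decoder=make_decoder(in_ch=512 + 1),
)
model.eval()
with torch.no_grad():
    image = torch.rand(1, 3, 256, 256)   # stand-in RGB input
    frame_1, sem_1, frame_2, sem_2 = model(image)
    mask = (sem_2 > 0.5).float()         # binarized second semantic segmentation map
```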
In the above embodiments, the semantic segmentation model applied in the image semantic segmentation method 20 may be obtained by training with the semantic segmentation model training method 10 of any of the foregoing embodiments. This improves the semantic segmentation accuracy of the semantic segmentation model, while the training data is easy to acquire and the training cost is reduced.
Based on the same concept, the present disclosure further provides a semantic segmentation model training apparatus 100. As shown in Fig. 6, the semantic segmentation model training apparatus 100 includes: a first acquisition module 110, configured to acquire a training set, where the training set includes multiple images and annotation information corresponding to the images, and the annotation information corresponding to any image includes a segmentation frame annotation and/or a semantic segmentation annotation; a first feature extraction module 120, configured to perform feature extraction on the image to obtain feature data of the image; a first semantic module 130, configured to obtain first segmentation frame information and first semantic segmentation information based on the feature data, and further configured to obtain second segmentation frame information and second semantic segmentation information of the image based on the feature data, the first segmentation frame information, and the first semantic segmentation information; a loss determination module 140, configured to determine a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information; and an adjustment module 150, configured to adjust the parameters of the semantic segmentation model based on the loss value.
In one embodiment, the semantic segmentation model includes: a segmentation frame decoding unit and a semantic decoding unit; the first semantic module 130 is configured to: decode by the segmentation frame decoding unit based on the feature data to obtain the first segmentation frame information; and decode by the semantic decoding unit based on the feature data to obtain the first semantic segmentation information.
In one embodiment, the first semantic module 130 is further configured to: decode by the segmentation frame decoding unit based on the feature data and the first semantic segmentation information to obtain the second segmentation frame information; and decode by the semantic decoding unit based on the feature data and the first segmentation frame information to obtain the second semantic segmentation information.
In one embodiment, the semantic segmentation model further includes: an encoding unit; the first feature extraction module 120 is configured to encode the image through the encoding unit to obtain the feature data of the image.
In one embodiment, the loss value includes a first loss value; the loss determination module 140 is further configured to: when the annotation information corresponding to the image includes only the segmentation frame annotation, determine the first loss value based on the segmentation frame annotation and the second segmentation frame information; when the annotation information corresponding to the image includes only the semantic segmentation annotation, determine the first loss value based on the semantic segmentation annotation and the second semantic segmentation information; and when the annotation information corresponding to the image includes both the segmentation frame annotation and the semantic segmentation annotation, determine the first loss value based on the segmentation frame annotation and the second segmentation frame information as well as on the semantic segmentation annotation and the second semantic segmentation information.
In one embodiment, the loss value includes a second loss value; the loss determination module 140 is further configured to: when the annotation information corresponding to the image includes the segmentation frame annotation, determine the second loss value based on the second semantic segmentation information and the segmentation frame annotation; and when the annotation information corresponding to the image does not include the segmentation frame annotation, determine the second loss value based on the second semantic segmentation information and the second segmentation frame information.
In one embodiment, the loss value includes a third loss value; the loss determination module 140 is further configured to determine the third loss value based on a conditional random field.
Regarding the semantic segmentation model training apparatus 100 of the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
Based on the same concept, the present disclosure further provides an image semantic segmentation apparatus 200. As shown in Fig. 7, the image semantic segmentation apparatus 200 includes: a second acquisition module 210, configured to acquire an image; a second feature extraction module 220, configured to perform feature extraction on the image to obtain feature data of the image; and a second semantic module 230, configured to obtain first segmentation frame information based on the feature data, and further configured to obtain second semantic segmentation information of the image based on the feature data and the first segmentation frame information.
In one embodiment, the image semantic segmentation apparatus 200 is applied to a semantic segmentation model, and the semantic segmentation model includes: a segmentation frame decoding unit and a semantic decoding unit; the first segmentation frame information is obtained through the segmentation frame decoding unit based on the feature data, and the second semantic segmentation information of the image is obtained through the semantic decoding unit based on the feature data and the first segmentation frame information.
In one embodiment, the semantic segmentation model further includes: an encoding unit; feature extraction is performed on the image through the encoding unit to obtain the feature data of the image.
In one embodiment, the second semantic module 230 is further configured to: obtain first semantic segmentation information based on the feature data through the semantic decoding unit; and obtain second segmentation frame information of the image based on the feature data and the first semantic segmentation information through the segmentation frame decoding unit.
Regarding the image semantic segmentation apparatus 200 of the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
As shown in Fig. 8, an embodiment of the present disclosure provides an electronic device 300. The electronic device 300 includes a memory 301, a processor 302, and an Input/Output (I/O) interface 303. The memory 301 is configured to store instructions. The processor 302 is configured to call the instructions stored in the memory 301 to execute the semantic segmentation model training method or the image semantic segmentation method of the embodiments of the present disclosure. The processor 302 is connected to the memory 301 and the I/O interface 303 respectively, for example via a bus system and/or another form of connection mechanism (not shown). The memory 301 may store programs and data, including the programs of the semantic segmentation model training method or the image semantic segmentation method involved in the embodiments of the present disclosure; the processor 302 executes the various functional applications and the data processing of the electronic device 300 by running the programs stored in the memory 301.
In the embodiments of the present disclosure, the processor 302 may be implemented in at least one hardware form among a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA); the processor 302 may be one of, or a combination of several of, a Central Processing Unit (CPU) and other forms of processing units with data processing capability and/or instruction execution capability.
In the embodiments of the present disclosure, the memory 301 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read-Only Memory (ROM), flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD).
In the embodiments of the present disclosure, the I/O interface 303 may receive input instructions (for example numeric or character information, and key signal input related to the user settings and function control of the electronic device 300), and may also output various information (for example images or sounds) to the outside. The I/O interface 303 may include one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a mouse, a joystick, a trackball, a microphone, a speaker, and a touch panel.
It should be understood that although the embodiments of the present disclosure describe operations in a specific order in the drawings, this should not be understood as requiring that the operations be performed in the specific order shown or serially, or that all of the illustrated operations must be performed to obtain the desired results. In certain environments, multitasking and parallel processing may be advantageous.
The methods and apparatuses of the embodiments of the present disclosure can be implemented using standard programming techniques, with rule-based logic or other logic realizing the various method steps. It should also be noted that the words "apparatus" and "module" as used here and in the claims are intended to include implementations using one or more lines of software code and/or hardware implementations and/or devices for receiving inputs.
Any of the steps, operations, or procedures described here may be performed or implemented using one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented using a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor to perform any or all of the described steps, operations, or procedures.
The foregoing description of the implementations of the present disclosure has been given for the purposes of illustration and description. It is not exhaustive and does not limit the present disclosure to the exact forms disclosed; various variations and modifications are possible in light of the above teachings, or may be acquired from the practice of the present disclosure. These embodiments were chosen and described in order to explain the principles of the present disclosure and its practical application, so that those skilled in the art can utilize the present disclosure in various implementations and with various modifications suited to the particular use contemplated.
Industrial Applicability
The semantic segmentation model training method, image semantic segmentation method, semantic segmentation model training apparatus, image semantic segmentation apparatus, electronic device, and computer-readable storage medium provided by the present disclosure make it convenient to expand the training data, lower the training cost, and ensure the accuracy of image semantic segmentation results.
Furthermore, it can be understood that the semantic segmentation model training method, image semantic segmentation method, semantic segmentation model training apparatus, image semantic segmentation apparatus, electronic device, and computer-readable storage medium provided by the present disclosure are reproducible and can be used in a variety of industrial applications. For example, they can be used with any training set for training a semantic segmentation model.

Claims (18)

  1. A semantic segmentation model training method, wherein the method comprises:
    acquiring a training set, wherein the training set comprises multiple images and annotation information corresponding to the images, and the annotation information corresponding to any image comprises a segmentation frame annotation and/or a semantic segmentation annotation;
    performing feature extraction on the image to obtain feature data of the image;
    obtaining first segmentation frame information and first semantic segmentation information based on the feature data;
    obtaining second segmentation frame information and second semantic segmentation information of the image based on the feature data, the first segmentation frame information, and the first semantic segmentation information;
    determining a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information;
    adjusting parameters of the semantic segmentation model based on the loss value.
  2. The semantic segmentation model training method according to claim 1, wherein the semantic segmentation model comprises: a segmentation frame decoding unit and a semantic decoding unit;
    the obtaining first segmentation frame information and first semantic segmentation information based on the feature data comprises:
    decoding by the segmentation frame decoding unit based on the feature data to obtain the first segmentation frame information;
    decoding by the semantic decoding unit based on the feature data to obtain the first semantic segmentation information.
  3. The semantic segmentation model training method according to claim 2, wherein the obtaining second segmentation frame information and second semantic segmentation information of the image based on the feature data, the first segmentation frame information, and the first semantic segmentation information comprises:
    decoding by the segmentation frame decoding unit based on the feature data and the first semantic segmentation information to obtain the second segmentation frame information;
    decoding by the semantic decoding unit based on the feature data and the first segmentation frame information to obtain the second semantic segmentation information.
  4. The semantic segmentation model training method according to any one of claims 1-3, wherein the semantic segmentation model further comprises: an encoding unit;
    the performing feature extraction on the image to obtain feature data of the image comprises:
    encoding the image by the encoding unit to obtain the feature data of the image.
  5. The semantic segmentation model training method according to any one of claims 1-4, wherein the loss value comprises a first loss value;
    the determining a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information comprises:
    if the annotation information corresponding to the image comprises only the segmentation frame annotation, determining the first loss value based on the segmentation frame annotation and the second segmentation frame information;
    if the annotation information corresponding to the image comprises only the semantic segmentation annotation, determining the first loss value based on the semantic segmentation annotation and the second semantic segmentation information;
    if the annotation information corresponding to the image comprises the segmentation frame annotation and the semantic segmentation annotation, determining the first loss value based on the segmentation frame annotation and the second segmentation frame information, and based on the semantic segmentation annotation and the second semantic segmentation information.
  6. The semantic segmentation model training method according to claim 5, wherein the loss value comprises a second loss value;
    the determining a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information comprises:
    if the annotation information corresponding to the image comprises the segmentation frame annotation, determining the second loss value based on the second semantic segmentation information and the segmentation frame annotation;
    if the annotation information corresponding to the image does not comprise the segmentation frame annotation, determining the second loss value based on the second semantic segmentation information and the second segmentation frame information.
  7. The semantic segmentation model training method according to claim 5 or 6, wherein the loss value comprises a third loss value;
    the determining a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information comprises:
    determining the third loss value based on a conditional random field.
  8. An image semantic segmentation method, wherein the method comprises:
    acquiring an image;
    performing feature extraction on the image to obtain feature data of the image;
    obtaining first segmentation frame information based on the feature data;
    obtaining second semantic segmentation information of the image based on the feature data and the first segmentation frame information.
  9. The image semantic segmentation method according to claim 8, wherein the method is applied to a semantic segmentation model, and the semantic segmentation model comprises: a segmentation frame decoding unit and a semantic decoding unit;
    the first segmentation frame information is obtained through the segmentation frame decoding unit based on the feature data;
    the second semantic segmentation information of the image is obtained through the semantic decoding unit based on the feature data and the first segmentation frame information.
  10. The image semantic segmentation method according to claim 9, wherein the semantic segmentation model further comprises: an encoding unit;
    feature extraction is performed on the image through the encoding unit to obtain the feature data of the image.
  11. The image semantic segmentation method according to claim 9 or 10, wherein the method further comprises:
    obtaining first semantic segmentation information based on the feature data through the semantic decoding unit;
    obtaining second segmentation frame information of the image based on the feature data and the first semantic segmentation information through the segmentation frame decoding unit.
  12. A semantic segmentation model training apparatus, wherein the apparatus comprises:
    a first acquisition module, configured to acquire a training set, wherein the training set comprises multiple images and annotation information corresponding to the images, and the annotation information corresponding to any image comprises a segmentation frame annotation and/or a semantic segmentation annotation;
    a first feature extraction module, configured to perform feature extraction on the image to obtain feature data of the image;
    a first semantic module, configured to obtain first segmentation frame information and first semantic segmentation information based on the feature data;
    the first semantic module being further configured to obtain second segmentation frame information and second semantic segmentation information of the image based on the feature data, the first segmentation frame information, and the first semantic segmentation information;
    a loss determination module, configured to determine a loss value based on the second segmentation frame information and the annotation information, and/or based on the second semantic segmentation information and the annotation information;
    an adjustment module, configured to adjust parameters of the semantic segmentation model based on the loss value.
  13. The semantic segmentation model training apparatus according to claim 12, wherein the semantic segmentation model comprises: a segmentation frame decoding unit and a semantic decoding unit;
    the first semantic module is further configured to:
    decode by the segmentation frame decoding unit based on the feature data to obtain the first segmentation frame information;
    decode by the semantic decoding unit based on the feature data to obtain the first semantic segmentation information.
  14. The semantic segmentation model training apparatus according to claim 13, wherein the first semantic module is further configured to:
    decode by the segmentation frame decoding unit based on the feature data and the first semantic segmentation information to obtain the second segmentation frame information;
    decode by the semantic decoding unit based on the feature data and the first segmentation frame information to obtain the second semantic segmentation information.
  15. The semantic segmentation model training apparatus according to any one of claims 12-14, wherein the semantic segmentation model further comprises: an encoding unit;
    the first feature extraction module is configured to:
    encode the image by the encoding unit to obtain the feature data of the image.
  16. An image semantic segmentation apparatus, wherein the apparatus comprises:
    a second acquisition module, configured to acquire an image;
    a second feature extraction module, configured to perform feature extraction on the image to obtain feature data of the image;
    a second semantic module, configured to obtain first segmentation frame information based on the feature data;
    the second semantic module being further configured to obtain second semantic segmentation information of the image based on the feature data and the first segmentation frame information.
  17. An electronic device, wherein the electronic device comprises:
    a memory, configured to store instructions; and
    a processor, configured to call the instructions stored in the memory to execute the semantic segmentation model training method according to any one of claims 1-7 or the image semantic segmentation method according to any one of claims 8-11.
  18. A computer-readable storage medium having instructions stored therein which, when executed by a processor, perform the semantic segmentation model training method according to any one of claims 1-7 or the image semantic segmentation method according to any one of claims 8-11.
PCT/CN2021/085721 2020-09-02 2021-04-06 Semantic segmentation model training method and apparatus, and image semantic segmentation method and apparatus WO2022048151A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010912041.3A CN112232346A (zh) 2020-09-02 2020-09-02 Semantic segmentation model training method and apparatus, and image semantic segmentation method and apparatus
CN202010912041.3 2020-09-02

Publications (1)

Publication Number Publication Date
WO2022048151A1 true WO2022048151A1 (zh) 2022-03-10

Family

ID=74115899

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/085721 WO2022048151A1 (zh) 2020-09-02 2021-04-06 Semantic segmentation model training method and apparatus, and image semantic segmentation method and apparatus

Country Status (2)

Country Link
CN (1) CN112232346A (zh)
WO (1) WO2022048151A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677567A (zh) * 2022-05-27 2022-06-28 成都数联云算科技有限公司 Model training method and apparatus, storage medium, and electronic device
CN114693934A (zh) * 2022-04-13 2022-07-01 北京百度网讯科技有限公司 Training method for semantic segmentation model, and video semantic segmentation method and apparatus
CN115019037A (zh) * 2022-05-12 2022-09-06 北京百度网讯科技有限公司 Object segmentation method, and training method, apparatus, and storage medium for corresponding model
GB2619999A (en) * 2022-03-24 2023-12-27 Supponor Tech Limited Image processing method and apparatus

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232346A (zh) 2020-09-02 2021-01-15 北京迈格威科技有限公司 Semantic segmentation model training method and apparatus, and image semantic segmentation method and apparatus
CN114332104B (zh) * 2022-03-09 2022-07-29 南方电网数字电网研究院有限公司 Joint optimization method for multi-stage RGB point cloud semantic segmentation models of power grid transmission scenes

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190057507A1 (en) * 2017-08-18 2019-02-21 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images
CN109784386A (zh) * 2018-12-29 2019-05-21 天津大学 Method for assisting object detection with semantic segmentation
CN110349167A (zh) * 2019-07-10 2019-10-18 北京悉见科技有限公司 Image instance segmentation method and apparatus
CN110503097A (zh) * 2019-08-27 2019-11-26 腾讯科技(深圳)有限公司 Training method and apparatus for image processing model, and storage medium
CN111062252A (zh) * 2019-11-15 2020-04-24 浙江大华技术股份有限公司 Real-time semantic segmentation method and apparatus for dangerous goods, and storage apparatus
CN112232346A (zh) * 2020-09-02 2021-01-15 北京迈格威科技有限公司 Semantic segmentation model training method and apparatus, and image semantic segmentation method and apparatus

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10402690B2 (en) * 2016-11-07 2019-09-03 Nec Corporation System and method for learning random-walk label propagation for weakly-supervised semantic segmentation
CN108596184B (zh) * 2018-04-25 2021-01-12 清华大学深圳研究生院 Training method for image semantic segmentation model, readable storage medium, and electronic device
CN110188765B (zh) * 2019-06-05 2021-04-06 京东方科技集团股份有限公司 Image semantic segmentation model generation method, apparatus, device, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190057507A1 (en) * 2017-08-18 2019-02-21 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images
CN109784386A (zh) * 2018-12-29 2019-05-21 天津大学 一种用语义分割辅助物体检测的方法
CN110349167A (zh) * 2019-07-10 2019-10-18 北京悉见科技有限公司 一种图像实例分割方法及装置
CN110503097A (zh) * 2019-08-27 2019-11-26 腾讯科技(深圳)有限公司 图像处理模型的训练方法、装置及存储介质
CN111062252A (zh) * 2019-11-15 2020-04-24 浙江大华技术股份有限公司 一种实时危险物品语义分割方法、装置及存储装置
CN112232346A (zh) * 2020-09-02 2021-01-15 北京迈格威科技有限公司 语义分割模型训练方法及装置、图像语义分割方法及装置

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2619999A (en) * 2022-03-24 2023-12-27 Supponor Tech Limited Image processing method and apparatus
CN114693934A (zh) * 2022-04-13 2022-07-01 北京百度网讯科技有限公司 Training method for semantic segmentation model, and video semantic segmentation method and apparatus
CN114693934B (zh) * 2022-04-13 2023-09-01 北京百度网讯科技有限公司 Training method for semantic segmentation model, and video semantic segmentation method and apparatus
CN115019037A (zh) * 2022-05-12 2022-09-06 北京百度网讯科技有限公司 Object segmentation method, and training method, apparatus, and storage medium for corresponding model
CN114677567A (zh) * 2022-05-27 2022-06-28 成都数联云算科技有限公司 Model training method and apparatus, storage medium, and electronic device

Also Published As

Publication number Publication date
CN112232346A (zh) 2021-01-15

Similar Documents

Publication Publication Date Title
WO2022048151A1 (zh) Semantic segmentation model training method and apparatus, and image semantic segmentation method and apparatus
US10452919B2 (en) Detecting segments of a video program through image comparisons
WO2019128646A1 (zh) Face detection method, training method for convolutional neural network parameters, apparatus, and medium
WO2021189889A1 (zh) Text detection method and apparatus for scene images, computer device, and storage medium
WO2020253127A1 (zh) Facial feature extraction model training method, facial feature extraction method, apparatus, device, and storage medium
US20080240575A1 (en) Learning concept templates from web images to query personal image databases
WO2023284608A1 (zh) Character recognition model generation method and apparatus, computer device, and storage medium
US11915058B2 (en) Video processing method and device, electronic equipment and storage medium
CN110555334B (zh) Face feature determination method and apparatus, storage medium, and electronic device
CN112101031B (zh) Entity recognition method, terminal device, and storage medium
WO2023050651A1 (zh) Image semantic segmentation method, apparatus, device, and storage medium
WO2021212601A1 (zh) Image-based assisted writing method, apparatus, medium, and device
WO2023036157A1 (en) Self-supervised spatiotemporal representation learning by exploring video continuity
CN110889437A (zh) Image processing method and apparatus, electronic device, and storage medium
CN113888541A (zh) Image recognition method and apparatus for laparoscopic surgery stages, and storage medium
TWI738045B (zh) Image segmentation method and apparatus, and non-transitory computer-readable medium thereof
TWI803243B (zh) Image augmentation method, computer device, and storage medium
WO2023109086A1 (zh) Character recognition method, apparatus, device, and storage medium
WO2023010701A1 (en) Image generation method, apparatus, and electronic device
WO2023146470A2 (en) Dual-level model for segmentation
US11328179B2 (en) Information processing apparatus and information processing method
CN113781491A (zh) Training of image segmentation model, and image segmentation method and apparatus
CN108764106B (zh) Multi-scale color image face comparison method based on cascade structure
CN107480616B (zh) Skin color detection unit analysis method and system based on image analysis
WO2022227218A1 (zh) Drug name recognition method and apparatus, computer device, and storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.07.2023).

122 Ep: pct application non-entry in european phase

Ref document number: 21863221

Country of ref document: EP

Kind code of ref document: A1