WO2024007958A1 - Image semantic segmentation model optimization method, apparatus, electronic device and storage medium - Google Patents

Image semantic segmentation model optimization method, apparatus, electronic device and storage medium

Info

Publication number
WO2024007958A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
semantic segmentation
segmentation model
evaluation value
Prior art date
Application number
PCT/CN2023/103988
Other languages
English (en)
French (fr)
Inventor
覃杰
吴捷
任玉羲
肖学锋
Original Assignee
北京字跳网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2024007958A1 publication Critical patent/WO2024007958A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7753 - Incorporation of unlabelled data, e.g. multiple instance learning [MIL]

Definitions

  • Embodiments of the present disclosure relate to the field of image processing technology, and in particular, to an image semantic segmentation model optimization method, device, electronic device, and storage medium.
  • Image semantic segmentation technology refers to the technology of segmenting objects that express different meanings in an image into different targets by recognizing the content of the image. Semantic segmentation of images is a basic atomic capability that promotes human-computer understanding and interaction, and is widely used in various multimedia applications.
  • In the related art, an image semantic segmentation model that can achieve the effect of image semantic segmentation is usually obtained through manual sample annotation combined with supervised model training.
  • Embodiments of the present disclosure provide an image semantic segmentation model optimization method, device, electronic device and storage medium.
  • In one aspect, embodiments of the present disclosure provide a method for optimizing an image semantic segmentation model.
  • In another aspect, embodiments of the present disclosure provide an image semantic segmentation model optimization device, including:
  • An evaluation module, used to obtain first unlabeled data and evaluate the first unlabeled data based on a pre-trained target semantic segmentation model to obtain an evaluation value corresponding to the first unlabeled data, where the evaluation value characterizes the effectiveness of using the first unlabeled data to train the target semantic segmentation model;
  • a determination module configured to determine target unlabeled data based on the evaluation value of the first unlabeled data, and generate target labeled data corresponding to the target unlabeled data;
  • An optimization module is used to optimize the target semantic segmentation model based on the target labeled data to obtain an optimized semantic segmentation model.
  • an electronic device including:
  • a processor, and a memory communicatively connected to the processor;
  • the memory stores computer-executable instructions;
  • the processor executes the computer-executable instructions stored in the memory to implement the image semantic segmentation model optimization method described in the first aspect and various possible designs of the first aspect.
  • embodiments of the present disclosure provide a computer-readable storage medium.
  • Computer-executable instructions are stored in the computer-readable storage medium.
  • When the processor executes the computer-executable instructions, the image semantic segmentation model optimization method described in the first aspect and various possible designs of the first aspect is implemented.
  • embodiments of the present disclosure provide a computer program product, including a computer program that, when executed by a processor, implements the image semantic segmentation model optimization method described in the first aspect and various possible designs of the first aspect.
  • embodiments of the present disclosure provide a computer program that, when executed by a processor, implements the image semantic segmentation model optimization method described in the first aspect and various possible designs of the first aspect.
  • In the image semantic segmentation model optimization method, device, electronic device and storage medium provided by this embodiment, first unlabeled data is obtained and evaluated based on the pre-trained target semantic segmentation model to obtain an evaluation value corresponding to the first unlabeled data, where the evaluation value characterizes the effectiveness of using the first unlabeled data to train the target semantic segmentation model; target unlabeled data is determined according to the evaluation value of the first unlabeled data, and target labeled data corresponding to the target unlabeled data is generated; the target semantic segmentation model is then optimized based on the target labeled data to obtain an optimized semantic segmentation model.
  • Before training the target semantic segmentation model, the evaluation values corresponding to the unlabeled data are first used to filter the unlabeled data and obtain labeled data that is more effective for training, and the target semantic segmentation model is then trained based on the target labeled data.
  • Figure 2 is a schematic flow chart 1 of the image semantic segmentation model optimization method provided by an embodiment of the present disclosure
  • Figure 3 is a schematic diagram of a process for generating target labeled data provided by an embodiment of the present disclosure
  • Figure 4 is a flow chart of the specific implementation steps of step S103 in the embodiment shown in Figure 2;
  • Figure 5 is a schematic flowchart 2 of the image semantic segmentation model optimization method provided by an embodiment of the present disclosure
  • Figure 6 is a schematic diagram of a process of data evaluation based on a target semantic segmentation model provided by an embodiment of the present disclosure
  • Figure 7 is a flow chart of the specific implementation steps of step S204 in the embodiment shown in Figure 5;
  • Figure 8 is a schematic diagram of a process for generating supervised loss and unsupervised loss based on a target semantic segmentation model provided by an embodiment of the present disclosure
  • Figure 9 is a flow chart of the specific implementation steps of step S210 in the embodiment shown in Figure 5;
  • Figure 10 is a structural block diagram of an image semantic segmentation model optimization device provided by an embodiment of the present disclosure.
  • Figure 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • Figure 12 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
  • Figure 1 is an application scenario diagram of the image semantic segmentation model optimization method provided by the embodiment of the disclosure.
  • the image semantic segmentation model optimization method provided by the embodiment of the disclosure can be applied to the application scenario of model training before the deployment of the image semantic segmentation model.
  • the method provided by the embodiments of the present disclosure can be applied to terminal devices, servers and other devices used for model training.
  • the server is taken as an example.
  • the server first receives the labeling instructions sent by the terminal device, labels the unlabeled data as labeled data, and then receives the training instructions sent by the terminal device to train the image semantic segmentation model and optimize it.
  • the above process can be repeated multiple times until the model convergence conditions are met, and an image semantic segmentation model that can achieve the effect of image semantic segmentation is obtained. Afterwards, the image semantic segmentation model can be deployed to the server and provide image semantic segmentation services in response to requests from other terminal devices or servers.
  • Embodiments of the present disclosure provide an image semantic segmentation model optimization method to solve the above problems.
  • Figure 2 is a schematic flowchart 1 of an image semantic segmentation model optimization method provided by an embodiment of the present disclosure.
  • the method of this embodiment can be applied to electronic devices with computing capabilities.
  • the image semantic segmentation model optimization method includes:
  • Step S101 Obtain first unlabeled data, and evaluate the first unlabeled data based on the pre-trained target semantic segmentation model to obtain an evaluation value corresponding to the first unlabeled data.
  • The evaluation value characterizes the effectiveness of using the first unlabeled data to train the target semantic segmentation model.
  • Unlabeled data refers to image data without label information, such as photos containing people, scenery, etc.
  • The unlabeled data can be original images captured by a camera, or images processed by image processing techniques such as filters and special effects; this is not limited here. Unlabeled data can be easily obtained through the Internet and other channels, so it has the advantages of being easy to obtain and rich in image content.
  • The first unlabeled data is image data pre-stored in the server and used for training the target semantic segmentation model.
  • the target semantic segmentation model is a semantic segmentation model that has undergone at least one round of training.
  • The target semantic segmentation model is used to process the first unlabeled data input into the model and output the corresponding segmentation result map. For different input data, the model will predict different segmentation results.
  • For example, after inputting the first unlabeled data data_1 into the target semantic segmentation model, the segmentation result result_1 is output, which is similar or identical to the real segmentation result; after inputting the first unlabeled data data_2 into the target semantic segmentation model, the segmentation result result_2 is output, which differs considerably from the real segmentation result.
  • Since data_2 was not correctly predicted by the current model (that is, the model cannot yet segment such image data), it is more effective to use data_2 to train the model.
  • On this basis, the corresponding first unlabeled data can be evaluated according to the output result of the target semantic segmentation model, thereby obtaining an evaluation value.
  • The higher the evaluation value, the higher the effectiveness of using the input first unlabeled data to train the target semantic segmentation model, and the more suitable the first unlabeled data is for training the model; conversely, the lower the evaluation value, the lower the effectiveness of using the input first unlabeled data to train the target semantic segmentation model, and the less suitable it is for training the model.
  • Step S102 Determine target unlabeled data based on the evaluation value of the first unlabeled data, and generate target labeled data corresponding to the target unlabeled data.
  • At least one piece of first unlabeled data is determined as target unlabeled data according to the size of the evaluation value.
  • In one possible implementation, the first unlabeled data are sorted according to the size of their evaluation values, and the top M first unlabeled data are determined as the target unlabeled data, where M is an integer greater than 1; in another possible implementation, each first unlabeled data is screened according to a preset evaluation threshold, and the first unlabeled data whose evaluation values are greater than the evaluation threshold are determined as the target unlabeled data.
  • The above two methods can also be combined; that is, the evaluation values of the first unlabeled data are sorted, and the first unlabeled data whose evaluation values rank in the top M and are greater than the evaluation threshold are determined as the target unlabeled data.
  • The above methods can be set as needed and are not specifically limited here.
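  • The selection strategies described above (top-M ranking, threshold screening, or their combination) can be sketched as follows. This Python sketch is illustrative only; the function and sample names are assumptions, not part of the disclosure.

```python
# Illustrative sketch of step S102: rank the first unlabeled data by
# evaluation value, optionally drop samples at or below a preset
# threshold, and keep the top M as the target unlabeled data.

def select_target_unlabeled(scores, M, threshold=None):
    """scores: mapping from sample id to evaluation value."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if threshold is not None:
        ranked = [s for s in ranked if scores[s] > threshold]
    return ranked[:M]

scores = {"D1": 0.62, "D2": 0.91, "D3": 0.74, "D4": 0.18}
print(select_target_unlabeled(scores, M=3, threshold=0.5))
# -> ['D2', 'D3', 'D1']
```

With a threshold of 0.5, sample D4 is screened out even though M is 3, matching the combined strategy.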
  • the target unlabeled data is then processed to generate corresponding target labeled data.
  • This process is the process of labeling the target unlabeled data, i.e., performing image segmentation to generate a segmentation result map that indicates the location, category and other information of the objects in the image.
  • In one possible implementation, the image content identifier corresponding to the target unlabeled data is identified through a pre-trained image recognition model, and the image segmenter corresponding to that image content identifier is then called to segment the target unlabeled data, obtaining the annotation information corresponding to the target unlabeled data, which is then combined into the target labeled data corresponding to the target unlabeled data.
  • In another possible implementation, after the target unlabeled data is determined, the terminal device realizes segmentation of the target unlabeled data by receiving annotation instructions input by the user, thereby generating annotation information corresponding to the target unlabeled data, which is then combined into the target labeled data corresponding to the target unlabeled data.
  • the above two methods of generating target labeled data can be set as needed, and are not specifically limited here.
  • Figure 3 is a schematic diagram of a process of generating target labeled data provided by an embodiment of the present disclosure.
  • the process of generating target labeled data is further introduced below in conjunction with Figure 3.
  • N groups of unlabeled data shown in the figure are input into the target semantic segmentation model.
  • The target semantic segmentation model outputs the segmentation result map corresponding to each unlabeled data (shown as segmentation result R1 to segmentation result RN). Then, a preset sample evaluation model evaluates each segmentation result and obtains the evaluation value corresponding to each result (shown as evaluation value V1 to evaluation value VN).
  • The evaluation values V1 to VN are sorted; exemplarily, the top three evaluation values are V2, V3, and V1 respectively.
  • The unlabeled data D2, unlabeled data D3, and unlabeled data D1 corresponding to V2, V3, and V1 are determined as target unlabeled data.
  • The target unlabeled data is then annotated to obtain the corresponding target labeled data used to train the target semantic segmentation model (shown as labeled data M1, labeled data M2, and labeled data M3 in the figure).
  • In this embodiment, before the target semantic segmentation model is trained, the current parameters of the model are first used to evaluate the unlabeled data, and the data with good training effect is labeled according to the evaluation results, thereby generating target labeled data with good training effect.
  • Compared with the existing-technology scheme of randomly annotating unlabeled data to obtain labeled data, target labeled data with a better training effect can be obtained; training the model with this target labeled data can improve the model's convergence speed and reduce the demand for labeled data.
  • Step S103 Optimize the target semantic segmentation model based on the target labeled data to obtain an optimized semantic segmentation model.
  • Optimizing the target semantic segmentation model based on the target labeled data can improve the image segmentation capability of the target semantic segmentation model and yield an optimized semantic segmentation model.
  • In one possible implementation, the target semantic segmentation model can be trained in a fully supervised manner based on the target labeled data; that is, only the target labeled data obtained in the above steps is used to train the target semantic segmentation model, and the model parameters are adjusted based on the resulting supervised loss, producing a high-quality optimized semantic segmentation model.
  • In another possible implementation, the target semantic segmentation model can be trained in a semi-supervised manner based on the target labeled data combined with unlabeled data, so that the target semantic segmentation model can be trained with only a small amount of target labeled data, improving the model training effect.
  • the specific implementation steps of step S103 include:
  • Step S1031 Obtain second unlabeled data with the same number as the target labeled data.
  • Step S1032 Conduct semi-supervised training on the image semantic segmentation model through the target labeled data and the second unlabeled data to obtain an optimized semantic segmentation model.
  • Semi-supervised training is a method of training a model by using unlabeled data as a supplement to labeled data.
  • By acquiring a corresponding amount of second unlabeled data and generating pseudo labels for the second unlabeled data, the target labeled data and the second unlabeled data jointly participate in the model training process.
  • The pseudo label of the second unlabeled data can be obtained through model prediction.
  • The specific process of semi-supervised training with labeled and unlabeled data is an existing technique known to those skilled in the art and will not be repeated here.
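  • As an illustration of pseudo-label generation by model prediction, the sketch below takes per-pixel class probabilities and keeps the argmax class as the pseudo label; the data shapes and names are assumptions, not part of the disclosure.

```python
# Illustrative sketch: per-pixel argmax over predicted class
# probabilities yields a pseudo label map for a second unlabeled image.

def pseudo_label(prob_map):
    """prob_map: H x W grid of per-pixel class probability lists."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in prob_map]

# one toy 2x2 "image", 3 classes per pixel
probs = [[[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
         [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]]
print(pseudo_label(probs))  # -> [[1, 0], [2, 1]]
```

The pseudo-labeled pairs can then be mixed with the target labeled data during training.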
  • In this embodiment, first unlabeled data is obtained and evaluated based on the pre-trained target semantic segmentation model to obtain an evaluation value corresponding to the first unlabeled data, where the evaluation value characterizes the effectiveness of using the first unlabeled data to train the target semantic segmentation model; target unlabeled data is determined according to the evaluation value of the first unlabeled data, and the corresponding target labeled data is generated; the target semantic segmentation model is then optimized based on the target labeled data to obtain an optimized semantic segmentation model.
  • Before training the target semantic segmentation model, the evaluation values corresponding to the unlabeled data are first used to filter the unlabeled data and obtain labeled data that is more effective for training; training the target semantic segmentation model based on this target labeled data can improve the training effect and reduce the demand for training samples, thereby obtaining a converged model faster and improving model performance.
  • Figure 5 is a schematic flow chart 2 of an image semantic segmentation model optimization method provided by an embodiment of the present disclosure. This embodiment further refines steps S101 and S103 based on the embodiment shown in Figure 2.
  • the image semantic segmentation model optimization method includes:
  • Step S201 Obtain a plurality of first unlabeled data and a pre-trained target semantic segmentation model.
  • the target semantic segmentation model includes a first encoding and decoding network and a second encoding and decoding network.
  • Step S202 Process each first unlabeled data based on the first encoding and decoding network and the second encoding and decoding network respectively, to obtain, for each first unlabeled data, the first segmentation result map output by the first encoding and decoding network and the second segmentation result map output by the second encoding and decoding network.
  • Figure 6 is a schematic diagram of a process of data evaluation based on a target semantic segmentation model provided by an embodiment of the present disclosure.
  • the target semantic segmentation model includes a first encoding and decoding network and a second encoding and decoding network.
  • The encoding and decoding network is a network structure including an encoder (Encoder) and a decoder (Decoder), which is used to segment image data and output the segmentation result map of the image data.
  • In some embodiments, one or more intermediate layers for feature extraction are also included.
  • Both the first encoding and decoding network and the second encoding and decoding network have the above encoder-decoder structure, but the two networks correspond to different network parameters.
  • Exemplarily, the encoder included in the first encoding and decoding network is Encoder_A and its decoder is Decoder_A, while the encoder included in the second encoding and decoding network is Encoder_B and its decoder is Decoder_B. Therefore, when processing the same image data, the two networks will output different image segmentation results.
  • As shown in Figure 6, after the first unlabeled data Data_A is input into the target semantic segmentation model, the first encoding and decoding network and the second encoding and decoding network each process it based on their respective network parameters and output the corresponding first segmentation result map P1 (shown as P1 in the figure) and second segmentation result map P2 (shown as P2 in the figure).
  • The first segmentation result map P1 and the second segmentation result map P2 are then input into a preset sample evaluation model. The sample evaluation model takes the first segmentation result map P1 and the second segmentation result map P2 as an overall input and evaluates them according to the evaluation strategies in the model (shown as evaluation strategy #1, evaluation strategy #2, evaluation strategy #3, and evaluation strategy #4), obtaining the corresponding feature values. For example, as shown in the figure, the sample evaluation model outputs four feature values: feature value a, feature value b, feature value c, and feature value d, each of which represents the evaluation result of the first unlabeled data Data_A under one evaluation dimension.
  • Based on these feature values, Data_A can be determined as target unlabeled data, and Data_A can then be labeled to generate target labeled data (not shown in the figure).
  • In one possible implementation, the feature value includes at least one of the following: an information entropy evaluation value, a difficulty evaluation value, a diversity evaluation value, and a consistency evaluation value.
  • The information entropy evaluation value is used to characterize the amount of information in the first unlabeled data; the difficulty evaluation value is used to characterize the prediction difficulty of the target semantic segmentation model on the first unlabeled data; the diversity evaluation value is used to characterize the prediction difference between the first segmentation result map and the second segmentation result map; the consistency evaluation value is used to characterize the divergence distance between the first segmentation result map and the second segmentation result map.
  • The information entropy evaluation value is a segmentation-oriented information entropy evaluation index based on salient regions, so it can also be called region-level information entropy.
  • The information entropy evaluation value is based on the entropy of the region-level probability distributions predicted by the two branches (the first encoding and decoding network and the second encoding and decoding network), and is used to measure the amount of information of unlabeled samples (the first unlabeled data).
  • A higher information entropy evaluation value indicates greater uncertainty in the target semantic segmentation model's prediction, which indicates that the first unlabeled data carries a larger amount of information and therefore has a better training effect.
  • Here, y_c represents the predicted value of category c in the segmentation result map, and comparison against a threshold generates the region mask M_c.
  • The information entropy evaluation value is shown in equation (2), where m_ci represents the value of the generated mask M_c, y_ci represents the predicted value of the output Y, and H × W is the size of the segmentation result map.
  • The calculation process of the final weighted score S_RI is shown in equation (3).
  • S_RI represents the degree of information contained in each unlabeled sample (first unlabeled data). A sample with a higher S_RI score contains richer information and is more valuable for subsequent annotation.
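  • Since equations (2) and (3) are not reproduced in this text, the following Python sketch only illustrates the general idea of region-level entropy (mask pixels where a class probability exceeds a threshold, then average the per-pixel entropy over the masked region); the function name, threshold default, and normalization are assumptions.

```python
import math

# Hedged sketch of region-level information entropy: keep pixels whose
# probability for class c exceeds a threshold (the region mask), then
# average the per-pixel entropy over those masked pixels only.

def region_entropy(prob_map, c, tau=0.5):
    vals = []
    for row in prob_map:
        for px in row:
            if px[c] > tau:  # mask value m_ci = 1 for this pixel
                vals.append(-sum(p * math.log(p + 1e-12) for p in px))
    return sum(vals) / len(vals) if vals else 0.0

probs = [[[0.9, 0.05, 0.05], [0.4, 0.3, 0.3]],
         [[0.6, 0.2, 0.2], [0.1, 0.8, 0.1]]]
print(round(region_entropy(probs, c=0), 3))  # -> 0.672
```

A less confident prediction (probabilities spread across classes) raises the entropy and hence the score.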
  • The difficulty evaluation value is an indicator that introduces a region-level difficulty strategy to select unlabeled data that is difficult to predict, in order to measure segmentation-oriented task difficulty.
  • This strategy first follows the region-level information entropy strategy to obtain the region mask M_c corresponding to each category, and then generates the joint mask M_U over all categories; the calculation process is shown in equation (4).
  • In equation (4), M_c is the region mask of category c, N represents the number of semantic categories, and the operator denotes a pixel-wise OR operation.
  • The calculation process of the final region difficulty score, i.e., the difficulty evaluation value S_RD, is shown in equation (6).
  • In equation (6), K represents the number of branches, and S_RD represents the difficulty of the current target semantic segmentation model in predicting the first unlabeled data.
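  • The pixel-wise OR of equation (4) can be illustrated as follows. This is an illustrative Python sketch; the helper name and mask shapes are assumptions, and the branch-wise scoring of equation (6) is not reproduced since its exact form is not given in this text.

```python
# Illustrative sketch of the joint mask M_U: a pixel-wise OR over the
# per-class binary region masks M_c.

def joint_mask(masks):
    """masks: list of per-class binary masks (2-D lists of 0/1)."""
    h, w = len(masks[0]), len(masks[0][0])
    return [[int(any(m[i][j] for m in masks)) for j in range(w)]
            for i in range(h)]

m_person = [[1, 0], [0, 0]]
m_sky    = [[0, 0], [0, 1]]
print(joint_mask([m_person, m_sky]))  # -> [[1, 0], [0, 1]]
```

A pixel belongs to the joint mask if any category's region mask covers it.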
  • The diversity evaluation value, i.e., patch-level diversity, is used to characterize the local correlation between the prediction results of the two branches (the first encoding and decoding network and the second encoding and decoding network).
  • A high diversity evaluation value indicates that the two branches tend to produce different predictions for the same input, suggesting that such samples are valuable to label.
  • Specifically, the output predicted segmentation result map Y ∈ R^(N×H×W) is divided into patches and encoded into a patch-level representation Y_p, each element of which represents the local content predicted at the patch level. The flattened patch-level representation is then used to calculate the cosine similarity between patches and generate an autocorrelation matrix; the calculation process is shown in equation (7).
  • In equation (7), each matrix entry represents the correlation between patch vectors y_pi and y_pj of Y_p.
  • The autocorrelation matrix reflects the local context of the prediction results.
  • The cross-correlation matrix is obtained in the same way as the autocorrelation matrix. The autocorrelation matrix and the cross-correlation matrix are then weighted to calculate the patch-level diversity score, i.e., the diversity evaluation value S_PD; the calculation process is shown in equation (8).
  • In equation (8), the two matrix terms represent the values of the autocorrelation matrix and the cross-correlation matrix respectively, H_p W_p × H_p W_p represents the size of the correlation matrices, and a coefficient balances the autocorrelation matrix against the cross-correlation matrix.
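  • A hedged sketch of the patch-level correlation computation behind equation (7) follows; the patch vectors are toy values, the function names are assumptions, and the weighted combination of equation (8) is not reproduced here.

```python
import math

# Illustrative sketch: cosine similarity between flattened patch vectors
# of a predicted segmentation map yields a correlation matrix. Passing
# the same patch set twice gives the autocorrelation matrix; passing the
# two branches' patch sets gives the cross-correlation matrix.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)) + 1e-12)

def correlation_matrix(patches_a, patches_b):
    return [[cosine(u, v) for v in patches_b] for u in patches_a]

patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
auto = correlation_matrix(patches, patches)  # autocorrelation
print(round(auto[0][2], 3))  # -> 0.707
```

Diagonal entries of the autocorrelation matrix are 1; off-diagonal entries capture how similar two local predictions are.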
  • The consistency evaluation value measures, from the global dimension, the relationship between the prediction distributions of the two branches (the first encoding and decoding network and the second encoding and decoding network).
  • To this end, a global-level consistency score is proposed: an evaluation value based on the global KL divergence distance between the predictions of the two branches. The calculation process of the consistency evaluation value S_GC is shown in equation (9).
  • In equation (9), N × H × W represents the size of the output result.
  • A higher S_GC score means that the current sample (the first unlabeled data) is difficult to predict and needs to be annotated for training.
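  • As an illustration of the KL divergence distance underlying equation (9), a minimal Python sketch follows; the exact normalization and direction of the divergence used in equation (9) may differ, and the distributions below are toy values.

```python
import math

# Illustrative sketch: KL divergence between the two branches' predicted
# distributions (flattened and normalized). Small epsilons guard against
# division by zero and log of zero.

def kl_divergence(p, q):
    return sum(pi * math.log(pi / (qi + 1e-12) + 1e-12)
               for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]  # branch-1 distribution
q = [0.4, 0.4, 0.2]  # branch-2 distribution
print(round(kl_divergence(p, q), 4))  # -> 0.0253
```

Identical distributions give a divergence near zero; the further apart the branches' predictions, the larger the score.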
  • Step S204 Perform weighted fusion on the at least one feature value corresponding to each first unlabeled data to obtain an evaluation value corresponding to each first unlabeled data.
  • Specifically, weighted fusion is performed on the feature values to obtain the evaluation value corresponding to each first unlabeled data.
  • step S204 include:
  • Step S2041 Obtain the weighting coefficient corresponding to each feature value.
  • the weighting coefficient is determined based on the change in cross-entropy loss corresponding to the target semantic segmentation model.
  • Step S2042 Calculate the weighted sum of the feature values according to the weighting coefficients to obtain the evaluation value corresponding to the first unlabeled data.
  • The weighting coefficient corresponding to each feature value can be determined based on experience, or, as in the steps of this embodiment, determined based on the change in the cross-entropy loss corresponding to the target semantic segmentation model.
  • Exemplarily, the evaluation value is calculated as the weighted sum S_All = S_RI + λ1·S_RD + λ2·S_PD + λ3·S_GC, where S_All is the evaluation value obtained after the weighted calculation, S_RI is the information entropy evaluation value, S_RD is the difficulty evaluation value with weighting coefficient λ1, S_PD is the diversity evaluation value with weighting coefficient λ2, and S_GC is the consistency evaluation value with weighting coefficient λ3.
  • The first unlabeled data are sorted by the evaluation value S_All to determine the target unlabeled data. After labeling and generating the target labeled data, the target labeled data is used to train the target semantic segmentation model to obtain the corresponding cross-entropy loss.
  • The weighting coefficients λ1, λ2, and λ3 are then varied within a certain range to adjust them in the direction that reduces the cross-entropy loss, thereby optimizing the values of the weighting coefficients and improving the evaluation accuracy of the evaluation value.
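  • The weighted fusion of step S204 can be sketched as a weighted sum of the four evaluation values; the coefficient values below are illustrative only, since the patent leaves them to be tuned against the cross-entropy loss.

```python
# Illustrative sketch of the fused evaluation value:
# S_All = S_RI + l1 * S_RD + l2 * S_PD + l3 * S_GC.
# The lambda coefficients here are placeholder values.

def fuse(s_ri, s_rd, s_pd, s_gc, lam=(1.0, 0.5, 0.5)):
    l1, l2, l3 = lam
    return s_ri + l1 * s_rd + l2 * s_pd + l3 * s_gc

print(round(fuse(0.4, 0.2, 0.6, 0.1), 2))  # -> 0.95
```

In practice the coefficients would be perturbed in the direction that reduces the training cross-entropy loss, as described above.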
  • Step S205 Determine target unlabeled data based on the evaluation value of each first unlabeled data, and generate target labeled data corresponding to the target unlabeled data.
  • Step S206 Obtain the second unlabeled data with the same number as the target labeled data.
  • Step S207 Input the target labeled data and the corresponding second unlabeled data into the first encoding and decoding network, and obtain the first labeled segmentation result image and the first unlabeled segmentation result image output by the first encoding and decoding network.
  • Step S208 Input the target labeled data and the corresponding second unlabeled data into the second encoding and decoding network, and obtain the second labeled segmentation result image and the second unlabeled segmentation result image output by the second encoding and decoding network.
  • Figure 8 is a schematic diagram of a process for generating supervised loss and unsupervised loss based on a target semantic segmentation model provided by an embodiment of the present disclosure. The above steps will be introduced in detail below in conjunction with Figure 8:
• Exemplarily, the target semantic segmentation model includes a first encoding and decoding network and a second encoding and decoding network. After the target labeled data is obtained, the target labeled data and a corresponding number of second unlabeled data are taken as inputs and fed respectively into the first encoding and decoding network and the second encoding and decoding network of the target semantic segmentation model. The two networks independently process their respective input data and output, respectively, a first labeled segmentation result map and a first unlabeled segmentation result map, and a second labeled segmentation result map and a second unlabeled segmentation result map. For the above process, please refer to the relevant introduction in the embodiment shown in Figure 6, which will not be repeated here.
• After that, the first labeled segmentation result map and the second labeled segmentation result map are calculated with a preset cross-entropy loss function to obtain the corresponding supervision loss. The supervision loss includes a first supervision loss and a second supervision loss: the first supervision loss is generated from the first labeled segmentation result map output by the first encoding and decoding network, and the second supervision loss is generated from the second labeled segmentation result map output by the second encoding and decoding network.
• In addition, an additional consistency regularization loss is applied to the first unlabeled segmentation result map and the second unlabeled segmentation result map to keep the prediction results of the two branches consistent; that is, the unsupervised loss is calculated from the first unlabeled segmentation result map and the second unlabeled segmentation result map.
• The specific implementations of the consistency regularization loss function and the cross-entropy loss function are prior art known to those skilled in the art and will not be repeated here.
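The two loss terms described above could be sketched as follows. This is a minimal NumPy illustration only: the function names are hypothetical, and the mean-squared-error form of the consistency loss is an assumed common choice, since the text only requires a preset cross-entropy loss and a consistency regularization loss.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Pixel-wise cross-entropy between predicted class probabilities
    of shape (H, W, C) and integer ground-truth labels of shape (H, W)."""
    h, w, _ = probs.shape
    # Pick the probability assigned to the true class at every pixel.
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(-np.mean(np.log(picked + eps)))

def consistency(probs_a, probs_b):
    """Mean squared difference between the two branches' soft predictions,
    one common instance of a consistency-regularization loss."""
    return float(np.mean((probs_a - probs_b) ** 2))

def total_loss(p1_lab, p2_lab, labels, p1_unlab, p2_unlab, w_unsup=1.0):
    """Supervised losses of both branches on labeled data, plus the
    unsupervised consistency term on unlabeled data."""
    sup = cross_entropy(p1_lab, labels) + cross_entropy(p2_lab, labels)
    unsup = consistency(p1_unlab, p2_unlab)
    return sup + w_unsup * unsup
```

The relative weight `w_unsup` between supervised and unsupervised terms is likewise an assumed hyperparameter, not specified by the text.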
  • Step S210 Train the target semantic segmentation model based on supervised loss and unsupervised loss to obtain an optimized semantic segmentation model.
  • the first encoding and decoding network has first network parameters
  • the second encoding and decoding network has second network parameters
  • the supervision loss includes a first supervision loss corresponding to the first encoding and decoding network and a second supervision loss corresponding to the second encoding and decoding network.
• Exemplarily, referring to the schematic diagram of the target semantic segmentation model shown in Figure 8, after the first supervised loss and the second supervised loss are obtained, the three loss function values, namely the first supervised loss, the second supervised loss, and the unsupervised loss, are used together to train the target semantic segmentation model.
  • the optimized semantic segmentation model is trained with the target labeled data and the second unlabeled data as samples, so it can achieve the purpose of improving model performance.
  • using the semi-supervised learning method provided in this embodiment to train the target semantic segmentation model can further improve the training effect, so that the optimized semantic segmentation model obtained after training has better performance.
• In one possible implementation, the specific steps of step S210 include:
  • Step S2101 Based on the first supervision loss, optimize the first network parameters to obtain the first optimization parameters.
  • Step S2102 Based on the second supervision loss, optimize the second network parameters to obtain the second optimization parameters.
  • Step S2103 Based on the unsupervised loss, optimize the first optimization parameter and the second optimization parameter to obtain the third optimization parameter corresponding to the first network parameter and the fourth optimization parameter corresponding to the second network parameter.
  • Step S2104 Obtain an optimized semantic segmentation model based on the third optimization parameter and the fourth optimization parameter.
• In this embodiment, based on the first supervision loss and the second supervision loss, the network parameters are first optimized to independently improve the performance of the first encoding and decoding network and the second encoding and decoding network. After that, based on the difference between the first encoding and decoding network and the second encoding and decoding network characterized by the unsupervised loss, the first optimization parameters and the second optimization parameters are further optimized to obtain the third optimization parameters corresponding to the first network parameters and the fourth optimization parameters corresponding to the second network parameters.
  • the target labeled data is used for training first, which can better guide the convergence of the model.
• Then, the first encoding and decoding network and the second encoding and decoding network are further optimized based on the unsupervised loss, which can improve the efficiency of model training and optimization, shorten the training time, and improve the quality of the optimized model.
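The two-stage update of steps S2101–S2104 can be outlined schematically with scalar "networks" standing in for the two branches. All names and the toy gradient functions below are illustrative placeholders for backpropagation, not elements of the patent:

```python
# Schematic of steps S2101-S2104: each branch first takes a supervised step
# on its own loss, then both take a joint step on the consistency loss.

def sgd_step(theta, grad, lr=0.1):
    """One gradient-descent update on a scalar parameter."""
    return theta - lr * grad

def optimize_round(theta1, theta2, grad_sup1, grad_sup2, grad_unsup, lr=0.1):
    # S2101/S2102: supervised updates yield the first and second optimization parameters.
    theta1 = sgd_step(theta1, grad_sup1(theta1), lr)
    theta2 = sgd_step(theta2, grad_sup2(theta2), lr)
    # S2103: a joint unsupervised update, driven by the branches' disagreement,
    # yields the third and fourth optimization parameters.
    g1, g2 = grad_unsup(theta1, theta2)
    theta1 = sgd_step(theta1, g1, lr)
    theta2 = sgd_step(theta2, g2, lr)
    # S2104: the updated pair defines the optimized model.
    return theta1, theta2
```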
• After step S210, the process can return to step S201, taking the obtained optimized semantic segmentation model as a new target semantic segmentation model, and continue optimizing until the obtained optimized semantic segmentation model reaches a preset performance, or the training sample data is exhausted.
  • the specific process of loop optimization is the same as the (primary) optimization process provided in this embodiment, and will not be described again.
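The loop described above can be outlined as follows; `optimize_once`, `evaluate`, and the batch size are hypothetical placeholders for one full optimization round and a performance check:

```python
# Repeat optimization rounds until the model reaches a preset performance
# or the pool of unlabeled training data is exhausted.

def optimize_until_done(model, unlabeled_pool, target_score,
                        optimize_once, evaluate, batch=8):
    while unlabeled_pool and evaluate(model) < target_score:
        picked = unlabeled_pool[:batch]       # select target unlabeled data
        del unlabeled_pool[:batch]            # consume it from the pool
        model = optimize_once(model, picked)  # one optimization round (S201-S210)
    return model
```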
• The implementation of step S205 is the same as that of step S102 in the embodiment shown in FIG. 2 of the present disclosure, and will not be described again here.
  • FIG. 10 is a structural block diagram of an image semantic segmentation model optimization device provided by an embodiment of the present disclosure. For convenience of explanation, only parts related to the embodiments of the present disclosure are shown.
  • the image semantic segmentation model optimization device 3 includes:
• the evaluation module 31 is used to obtain first unlabeled data, and evaluate the first unlabeled data based on the pre-trained target semantic segmentation model to obtain an evaluation value corresponding to the first unlabeled data, where the evaluation value characterizes the effectiveness of using the first unlabeled data to train the target semantic segmentation model;
  • the determination module 32 is configured to determine the target unlabeled data based on the evaluation value of the first unlabeled data, and generate target labeled data corresponding to the target unlabeled data;
  • the optimization module 33 is used to optimize the target semantic segmentation model based on the target labeled data to obtain an optimized semantic segmentation model.
• In one possible implementation, the target semantic segmentation model includes a first encoding and decoding network and a second encoding and decoding network, and the evaluation module 31 is specifically configured to: process the first unlabeled data based on the first encoding and decoding network and the second encoding and decoding network respectively, to obtain a first segmentation result map output by the first encoding and decoding network and a second segmentation result map output by the second encoding and decoding network; process the first segmentation result map and the second segmentation result map based on a preset sample evaluation model to obtain at least one feature value; and perform weighted fusion on the at least one feature value to obtain the evaluation value corresponding to the first unlabeled data.
• In one possible implementation, the feature value includes at least one of the following: an information entropy evaluation value, a difficulty evaluation value, a diversity evaluation value, and a consistency evaluation value; wherein the information entropy evaluation value is used to characterize the amount of information in the first unlabeled data; the difficulty evaluation value is used to characterize the prediction difficulty of the target semantic segmentation model for the first unlabeled data; the diversity evaluation value is used to characterize the prediction difference between the first segmentation result map and the second segmentation result map; and the consistency evaluation value is used to characterize the divergence distance between the first segmentation result map and the second segmentation result map.
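As a rough illustration (not the patent's exact formulas), the four evaluation dimensions could be computed from the two branches' per-pixel class probability maps as follows; Shannon entropy, top-class confidence, argmax disagreement, and symmetric KL divergence are assumed as common choices for these roles:

```python
import numpy as np

def entropy_value(p, eps=1e-12):
    """Information entropy evaluation: mean per-pixel Shannon entropy of
    the prediction p of shape (H, W, C)."""
    return float(np.mean(-np.sum(p * np.log(p + eps), axis=-1)))

def difficulty_value(p):
    """Difficulty evaluation: 1 minus the mean top-class confidence, so
    harder-to-predict samples score higher."""
    return float(1.0 - np.mean(np.max(p, axis=-1)))

def diversity_value(p1, p2):
    """Diversity evaluation: fraction of pixels where the two branches
    predict different classes."""
    return float(np.mean(np.argmax(p1, axis=-1) != np.argmax(p2, axis=-1)))

def consistency_value(p1, p2, eps=1e-12):
    """Consistency evaluation: mean symmetric KL divergence between the
    two branches' prediction maps."""
    kl12 = np.sum(p1 * np.log((p1 + eps) / (p2 + eps)), axis=-1)
    kl21 = np.sum(p2 * np.log((p2 + eps) / (p1 + eps)), axis=-1)
    return float(np.mean(0.5 * (kl12 + kl21)))
```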
• In one possible implementation, when performing weighted fusion on the at least one feature value to obtain the evaluation value corresponding to the first unlabeled data, the evaluation module 31 is specifically used to: obtain the weighting coefficient corresponding to each feature value, the weighting coefficient being determined based on the change in the cross-entropy loss corresponding to the target semantic segmentation model; and calculate, according to each weighting coefficient, the weighted sum of the feature values to obtain the evaluation value corresponding to the first unlabeled data.
• In one possible implementation, the optimization module 33 is specifically used to: obtain second unlabeled data equal in number to the target labeled data; and conduct semi-supervised training on the image semantic segmentation model through the target labeled data and the second unlabeled data to obtain an optimized semantic segmentation model.
• In one possible implementation, the target semantic segmentation model includes a first encoding and decoding network and a second encoding and decoding network, and when conducting semi-supervised training on the image semantic segmentation model through the target labeled data and the second unlabeled data to obtain the optimized semantic segmentation model, the optimization module 33 is specifically used to: input the target labeled data and the corresponding second unlabeled data into the first encoding and decoding network to obtain the first labeled segmentation result map and the first unlabeled segmentation result map output by the first encoding and decoding network; input the target labeled data and the corresponding second unlabeled data into the second encoding and decoding network to obtain the second labeled segmentation result map and the second unlabeled segmentation result map output by the second encoding and decoding network; obtain the supervised loss based on the first labeled segmentation result map and the second labeled segmentation result map; obtain the unsupervised loss based on the first unlabeled segmentation result map and the second unlabeled segmentation result map; and train the target semantic segmentation model based on the supervised loss and the unsupervised loss to obtain the optimized semantic segmentation model.
• In one possible implementation, when obtaining the supervised loss based on the first labeled segmentation result map and the second labeled segmentation result map, the optimization module 33 is specifically used to: calculate the first labeled segmentation result map and the second labeled segmentation result map based on a preset cross-entropy loss function to obtain the supervised loss; and when obtaining the unsupervised loss based on the first unlabeled segmentation result map and the second unlabeled segmentation result map, the optimization module 33 is specifically used to: calculate the first unlabeled segmentation result map and the second unlabeled segmentation result map based on a preset consistency regularization loss function to obtain the unsupervised loss.
  • the first encoding and decoding network has first network parameters
  • the second encoding and decoding network has second network parameters
• In one possible implementation, the supervision loss includes a first supervision loss corresponding to the first encoding and decoding network and a second supervision loss corresponding to the second encoding and decoding network; when training the target semantic segmentation model based on the supervised loss and the unsupervised loss to obtain the optimized semantic segmentation model, the optimization module 33 is specifically used to: optimize the first network parameters based on the first supervision loss to obtain first optimization parameters; optimize the second network parameters based on the second supervision loss to obtain second optimization parameters; optimize the first optimization parameters and the second optimization parameters based on the unsupervised loss to obtain third optimization parameters corresponding to the first network parameters and fourth optimization parameters corresponding to the second network parameters; and obtain the optimized semantic segmentation model based on the third optimization parameters and the fourth optimization parameters.
  • the image semantic segmentation model optimization device 3 provided in this embodiment can execute the technical solution of the above method embodiment. Its implementation principles and technical effects are similar, and will not be described again in this embodiment.
  • FIG 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. As shown in Figure 11, the electronic device 4 includes:
  • Processor 41 and memory 42 communicatively connected to processor 41;
  • Memory 42 stores computer execution instructions
  • the processor 41 executes the computer execution instructions stored in the memory 42 to implement the image semantic segmentation model optimization method in the embodiment shown in FIGS. 2 to 9 .
• The processor 41 and the memory 42 are connected through the bus 43.
  • the electronic device 900 may be a terminal device or a server.
• Terminal devices may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDA), tablet computers (PAD), portable multimedia players (PMP), and vehicle-mounted terminals (such as vehicle-mounted navigation terminals), as well as fixed terminals such as digital televisions (TV) and desktop computers.
  • the electronic device shown in FIG. 12 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
• The electronic device 900 may include a processing device (such as a central processing unit, a graphics processor, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 into a random access memory (RAM) 903.
• In the RAM 903, various programs and data required for the operation of the electronic device 900 are also stored.
  • the processing device 901, ROM 902 and RAM 903 are connected to each other via a bus 904.
  • An input/output (I/O) interface 905 is also connected to bus 904.
• Generally, the following devices can be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 907 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 908 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 909.
  • the communication device 909 may allow the electronic device 900 to communicate wirelessly or wiredly with other devices to exchange data.
  • FIG. 12 illustrates electronic device 900 with various means, it should be understood that implementation or availability of all illustrated means is not required. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
• In such embodiments, the computer program may be downloaded and installed from the network via the communication device 909, or installed from the storage device 908, or installed from the ROM 902. When the computer program is executed by the processing device 901, the above-mentioned functions defined in the method of the embodiments of the present disclosure are performed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device or device, or any combination thereof.
• More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more conductors, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
• A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
• A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium can be transmitted using any appropriate medium, including but not limited to: wires, optical cables, radio frequency (RF), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
• The computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device performs the method shown in the above embodiments.
• Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
• The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
• Each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
• It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure can be implemented in software or hardware.
  • the name of the unit does not constitute a limitation on the unit itself under certain circumstances.
  • the first acquisition unit can also be described as "the unit that acquires at least two Internet Protocol addresses.”
• For example, without limitation, exemplary types of hardware logic components that can be used include: field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
• The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
• Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media may include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
• According to one or more embodiments of the present disclosure, an image semantic segmentation model optimization method is provided, including: obtaining first unlabeled data, and evaluating the first unlabeled data based on a pre-trained target semantic segmentation model to obtain an evaluation value corresponding to the first unlabeled data, where the evaluation value characterizes the effectiveness of using the first unlabeled data to train the target semantic segmentation model; determining target unlabeled data based on the evaluation value of the first unlabeled data, and generating target labeled data corresponding to the target unlabeled data; and optimizing the target semantic segmentation model based on the target labeled data to obtain an optimized semantic segmentation model.
• In one possible implementation, the target semantic segmentation model includes a first encoding and decoding network and a second encoding and decoding network, and evaluating the first unlabeled data to obtain the evaluation value corresponding to the first unlabeled data includes: processing the first unlabeled data based on the first encoding and decoding network and the second encoding and decoding network respectively, to obtain a first segmentation result map output by the first encoding and decoding network and a second segmentation result map output by the second encoding and decoding network; processing the first segmentation result map and the second segmentation result map based on a preset sample evaluation model to obtain at least one feature value, where the feature value represents the evaluation result of the first unlabeled data in the corresponding evaluation dimension; and performing weighted fusion on the at least one feature value to obtain the evaluation value corresponding to the first unlabeled data.
• In one possible implementation, the feature value includes at least one of the following: an information entropy evaluation value, a difficulty evaluation value, a diversity evaluation value, and a consistency evaluation value; wherein the information entropy evaluation value is used to characterize the amount of information in the first unlabeled data; the difficulty evaluation value is used to characterize the prediction difficulty of the target semantic segmentation model for the first unlabeled data; the diversity evaluation value is used to characterize the prediction difference between the first segmentation result map and the second segmentation result map; and the consistency evaluation value is used to characterize the divergence distance between the first segmentation result map and the second segmentation result map.
• In one possible implementation, performing weighted fusion on the at least one feature value to obtain the evaluation value corresponding to the first unlabeled data includes: obtaining a weighting coefficient corresponding to each of the feature values, the weighting coefficient being determined based on the change in the cross-entropy loss corresponding to the target semantic segmentation model; and calculating, according to each of the weighting coefficients, the weighted sum of the feature values to obtain the evaluation value corresponding to the first unlabeled data.
• In one possible implementation, optimizing the target semantic segmentation model based on the target labeled data to obtain the optimized semantic segmentation model includes: obtaining second unlabeled data equal in number to the target labeled data; and conducting semi-supervised training on the image semantic segmentation model using the target labeled data and the second unlabeled data to obtain the optimized semantic segmentation model.
• In one possible implementation, the target semantic segmentation model includes a first encoding and decoding network and a second encoding and decoding network, and conducting semi-supervised training on the image semantic segmentation model through the target labeled data and the second unlabeled data to obtain the optimized semantic segmentation model includes: inputting the target labeled data and the corresponding second unlabeled data into the first encoding and decoding network to obtain the first labeled segmentation result map and the first unlabeled segmentation result map output by the first encoding and decoding network; inputting the target labeled data and the corresponding second unlabeled data into the second encoding and decoding network to obtain the second labeled segmentation result map and the second unlabeled segmentation result map output by the second encoding and decoding network; obtaining a supervised loss based on the first labeled segmentation result map and the second labeled segmentation result map; obtaining an unsupervised loss based on the first unlabeled segmentation result map and the second unlabeled segmentation result map; and training the target semantic segmentation model based on the supervised loss and the unsupervised loss to obtain the optimized semantic segmentation model.
• In one possible implementation, obtaining the supervised loss based on the first labeled segmentation result map and the second labeled segmentation result map includes: calculating the first labeled segmentation result map and the second labeled segmentation result map based on a preset cross-entropy loss function to obtain the supervised loss; and obtaining the unsupervised loss based on the first unlabeled segmentation result map and the second unlabeled segmentation result map includes: calculating the first unlabeled segmentation result map and the second unlabeled segmentation result map based on a preset consistency regularization loss function to obtain the unsupervised loss.
  • the first encoding and decoding network has first network parameters
  • the second encoding and decoding network has second network parameters
• In one possible implementation, the supervision loss includes a first supervision loss corresponding to the first encoding and decoding network and a second supervision loss corresponding to the second encoding and decoding network, and training the target semantic segmentation model based on the supervised loss and the unsupervised loss to obtain the optimized semantic segmentation model includes: optimizing the first network parameters based on the first supervision loss to obtain first optimization parameters; optimizing the second network parameters based on the second supervision loss to obtain second optimization parameters; optimizing the first optimization parameters and the second optimization parameters based on the unsupervised loss to obtain third optimization parameters corresponding to the first network parameters and fourth optimization parameters corresponding to the second network parameters; and obtaining the optimized semantic segmentation model based on the third optimization parameters and the fourth optimization parameters.
• According to one or more embodiments of the present disclosure, an image semantic segmentation model optimization device is provided, including:
• an evaluation module, used to obtain first unlabeled data, and evaluate the first unlabeled data based on a pre-trained target semantic segmentation model to obtain an evaluation value corresponding to the first unlabeled data, where the evaluation value characterizes the effectiveness of using the first unlabeled data to train the target semantic segmentation model;
  • a determination module configured to determine target unlabeled data based on the evaluation value of the first unlabeled data, and generate target labeled data corresponding to the target unlabeled data;
  • An optimization module is used to optimize the target semantic segmentation model based on the target labeled data to obtain an optimized semantic segmentation model.
• In one possible implementation, the target semantic segmentation model includes a first encoding and decoding network and a second encoding and decoding network, and the evaluation module is specifically configured to: process the first unlabeled data based on the first encoding and decoding network and the second encoding and decoding network respectively, to obtain the first segmentation result map output by the first encoding and decoding network and the second segmentation result map output by the second encoding and decoding network; process the first segmentation result map and the second segmentation result map based on a preset sample evaluation model to obtain at least one feature value, where the feature value represents the evaluation result of the first unlabeled data in the corresponding evaluation dimension; and perform weighted fusion on the at least one feature value to obtain the evaluation value corresponding to the first unlabeled data.
• In one possible implementation, the feature value includes at least one of the following: an information entropy evaluation value, a difficulty evaluation value, a diversity evaluation value, and a consistency evaluation value; wherein the information entropy evaluation value is used to characterize the amount of information in the first unlabeled data; the difficulty evaluation value is used to characterize the prediction difficulty of the target semantic segmentation model for the first unlabeled data; the diversity evaluation value is used to characterize the prediction difference between the first segmentation result map and the second segmentation result map; and the consistency evaluation value is used to characterize the divergence distance between the first segmentation result map and the second segmentation result map.
• In one possible implementation, when performing weighted fusion on the at least one feature value to obtain the evaluation value corresponding to the first unlabeled data, the evaluation module is specifically used to: obtain the weighting coefficient corresponding to each of the feature values, the weighting coefficient being determined based on the change in the cross-entropy loss corresponding to the target semantic segmentation model; and calculate, according to each of the weighting coefficients, the weighted sum of the feature values to obtain the evaluation value corresponding to the first unlabeled data.
  • the optimization module is specifically configured to: obtain second unlabeled data of the same quantity as the target labeled data; and conduct semi-supervised training on the image semantic segmentation model through the target labeled data and the second unlabeled data to obtain an optimized semantic segmentation model.
  • the target semantic segmentation model includes a first encoding and decoding network and a second encoding and decoding network.
  • when conducting semi-supervised training on the image semantic segmentation model through the target labeled data and the second unlabeled data to obtain the optimized semantic segmentation model, the optimization module is specifically configured to: input the target labeled data and the corresponding second unlabeled data into the first encoding and decoding network to obtain a first labeled segmentation result map and a first unlabeled segmentation result map output by the first encoding and decoding network; input the target labeled data and the corresponding second unlabeled data into the second encoding and decoding network to obtain a second labeled segmentation result map and a second unlabeled segmentation result map output by the second encoding and decoding network; obtain a supervised loss based on the first labeled segmentation result map and the second labeled segmentation result map; obtain an unsupervised loss based on the first unlabeled segmentation result map and the second unlabeled segmentation result map; and train the target semantic segmentation model based on the supervised loss and the unsupervised loss to obtain the optimized semantic segmentation model.
  • when obtaining the supervised loss based on the first labeled segmentation result map and the second labeled segmentation result map, the optimization module is specifically configured to: calculate the first labeled segmentation result map and the second labeled segmentation result map based on a preset cross-entropy loss function to obtain the supervised loss; and when obtaining the unsupervised loss based on the first unlabeled segmentation result map and the second unlabeled segmentation result map, the optimization module is specifically configured to: calculate the first unlabeled segmentation result map and the second unlabeled segmentation result map based on a preset consistency regularization loss function to obtain the unsupervised loss.
  • the first encoding and decoding network has first network parameters, and the second encoding and decoding network has second network parameters.
  • the supervised loss includes a first supervised loss corresponding to the first encoding and decoding network and a second supervised loss corresponding to the second encoding and decoding network.
  • when training the target semantic segmentation model based on the supervised loss and the unsupervised loss to obtain the optimized semantic segmentation model, the optimization module is specifically configured to: optimize the first network parameters based on the first supervised loss to obtain first optimized parameters; optimize the second network parameters based on the second supervised loss to obtain second optimized parameters; optimize the first optimized parameters and the second optimized parameters based on the unsupervised loss to obtain third optimized parameters corresponding to the first network parameters and fourth optimized parameters corresponding to the second network parameters; and obtain the optimized semantic segmentation model based on the third optimized parameters and the fourth optimized parameters.
  • an electronic device including: a processor, and a memory communicatively connected to the processor;
  • the memory stores computer-executable instructions;
  • the processor executes the computer-executable instructions stored in the memory to implement the image semantic segmentation model optimization method described in the first aspect and various possible designs of the first aspect.
  • a computer-readable storage medium is provided.
  • Computer-executable instructions are stored in the computer-readable storage medium.
  • when a processor executes the computer-executable instructions, the image semantic segmentation model optimization method described in the first aspect and various possible designs of the first aspect is implemented.
  • a computer program product is provided, including a computer program that, when executed by a processor, implements the image semantic segmentation model optimization method described in the first aspect and various possible designs of the first aspect.
  • a computer program is provided, the computer program being used to implement the image semantic segmentation model optimization method described in the first aspect and various possible designs of the first aspect.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

本公开实施例提供一种图像语义分割模型优化方法、装置、电子设备及存储介质,通过获取第一无标数据,并基于预训练的目标语义分割模型,对第一无标数据进行评估,得到第一无标数据对应的评估值,评估值表征利用第一无标数据对目标语义分割模型进行训练的有效性;根据第一无标数据的评估值,确定目标无标数据,并生成目标无标数据对应的目标有标数据;基于目标有标数据对目标语义分割模型进行优化,得到优化语义分割模型。先使用无标数据对应的评估值对无标数据进行筛选,得到了训练效果更好的目标有标数据,进而基于目标有标数据对目标语义分割模型进行训练。

Description

图像语义分割模型优化方法、装置、电子设备及存储介质
相关申请交叉引用
本申请要求于2022年07月06日提交中国专利局、申请号为202210797439.6、发明名称为“图像语义分割模型优化方法、装置、电子设备及存储介质”的中国专利申请的优先权,其全部内容通过引用并入本文。
技术领域
本公开实施例涉及图像处理技术领域,尤其涉及一种图像语义分割模型优化方法、装置、电子设备及存储介质。
背景技术
图像语义分割技术,是指通过对图像中的内容进行识别,从而实现将图像中表达不同含义的物体分割为不同目标的技术,针对图像进行语义分割,是促进人机理解和交互的基本原子能力,广泛应用于各类多媒体应用中。
现有技术中,通常是基于人工样本标注的方式,结合模型监督训练,得到能够实现图像语义分割效果的图像语义分割模型。
发明内容
本公开实施例提供一种图像语义分割模型优化方法、装置、电子设备及存储介质。
第一方面,本公开实施例提供一种图像语义分割模型优化方法,包括:
获取第一无标数据,并基于预训练的目标语义分割模型,对所述第一无标数据进行评估,得到所述第一无标数据对应的评估值,所述评估值表征利用所述第一无标数据对所述目标语义分割模型进行训练的有效性;根据所述第一无标数据的评估值,确定目标无标数据,并生成所述目标无标数据对应的目标有标数据;基于所述目标有标数据对所述目标语义分割模型进行优化,得到优化语义分割模型。
第二方面,本公开实施例提供一种图像语义分割模型优化装置,包括:
评估模块,用于获取第一无标数据,并基于预训练的目标语义分割模型,对所述第一无标数据进行评估,得到所述第一无标数据对应的评估值,所述评估值表征利用所述第一无标数据对所述目标语义分割模型进行训练的有效性;
确定模块,用于根据所述第一无标数据的评估值,确定目标无标数据,并生成所述目标无标数据对应的目标有标数据;
优化模块,用于基于所述目标有标数据对所述目标语义分割模型进行优化,得到优化语义分割模型。
第三方面,本公开实施例提供一种电子设备,包括:
处理器,以及与所述处理器通信连接的存储器;
所述存储器存储计算机执行指令;
所述处理器执行所述存储器存储的计算机执行指令,以实现如上第一方面以及第一方面各种可能的设计所述的图像语义分割模型优化方法。
第四方面,本公开实施例提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机执行指令,当处理器执行所述计算机执行指令时,实现如上第一方面以及第一方面各种可能的设计所述的图像语义分割模型优化方法。
第五方面,本公开实施例提供一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时实现如上第一方面以及第一方面各种可能的设计所述的图像语义分割模型优化方法。
第六方面,本公开实施例提供一种计算机程序,所述计算机程序被处理器执行时,实现如上第一方面以及第一方面各种可能的设计所述的图像语义分割模型优化方法。
本实施例提供的图像语义分割模型优化方法、装置、电子设备及存储介质,通过获取第一无标数据,并基于预训练的目标语义分割模型,对所述第一无标数据进行评估,得到所述第一无标数据对应的评估值,所述评估值表征利用所述第一无标数据对所述目标语义分割模型进行训练的有效性;根据所述第一无标数据的评估值,确定目标无标数据,并生成所述目标无标数据对应的目标有标数据;基于所述目标有标数据对所述目标语义分割模型进行优化,得到优化语义分割模型。由于在对目标语义分割模型训练前,首先使用无标数据对应的评估值对无标数据进行筛选,得到了对训练效果更加有效的有标数据,进而基于目标有标数据对目标语义分割模型进行训练。
附图说明
为了更清楚地说明本公开实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作一简单地介绍,显而易见地,下面描述中的附图是本公开的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1为本公开实施例提供的图像语义分割模型优化方法的一种应用场景图;
图2为本公开实施例提供的图像语义分割模型优化方法的流程示意图一;
图3为本公开实施例提供的一种生成目标有标数据的过程示意图;
图4为图2所示实施例中步骤S103的具体实现步骤流程图;
图5为本公开实施例提供的图像语义分割模型优化方法的流程示意图二;
图6为本公开实施例提供的一种基于目标语义分割模型进行数据评估的过程示意图;
图7为图5所示实施例中步骤S204的具体实现步骤流程图;
图8为本公开实施例提供的一种基于目标语义分割模型生成监督损失和无监督损失的过程示意图;
图9为图5所示实施例中步骤S210的具体实现步骤流程图;
图10为本公开实施例提供的图像语义分割模型优化装置的结构框图;
图11为本公开实施例提供的一种电子设备的结构示意图;
图12为本公开实施例提供的电子设备的硬件结构示意图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本公开一部分实施例,而不是全部的实施例。基于本公开中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本公开保护的范围。
下面对本公开实施例的应用场景进行解释:
图1为本公开实施例提供的图像语义分割模型优化方法的一种应用场景图,本公开实施例提供的图像语义分割模型优化方法,可以应用于图像语义分割模型部署前的模型训练的应用场景。具体地,本公开实施例提供的方法,可以应用于终端设备、服务器等用于模型训练的设备,图1中以服务器为例,如图1所示,示例性地,服务器内预存有无标数据和初始化后的图像语义分割模型,服务器首先接收终端设备发送的标注指令,将无标数据标注为有标数据,之后再接收终端设备发送的训练指令,对图像语义分割模型进行训练,得到优化模型,上述过程可多次重复进行,直至满足模型收敛条件,得到能够实现图像语义分割效果的图像语义分割模型。之后,可以将该图像语义分割模型部署至服务器,并响应于其他终端设备或服务器的请求,提供图像语义分割服务。
现有技术中，对图像语义分割模型进行训练，主要是基于人工样本标注的方式，结合模型的监督训练或半监督训练完成的。该过程中，需要用户通过标注指令，人工完成至少一部分的样本标注。然而，此类面向专家的、昂贵且耗时的标注过程，限制了生成的有标样本的数量，因此，在对图像语义分割模型进行训练的过程中，常导致有标训练样本数量不足的问题，从而影响模型训练效果。相关技术中，通过半监督的训练方式，可以生成带有伪标签的无标注数据，来提高模型训练效果，然而，由于所生成的无标注数据带有一定的随机性，容易导致数据样本类别分布不均衡的问题，从而造成训练后的模型的性能波动范围大，性能稳定性差等问题，影响模型的应用表现。综上，由于人工样本标注成本高昂、效率低下，导致图像语义分割模型训练不充分，影响模型性能。本公开实施例提供一种图像语义分割模型优化方法以解决上述问题。
参考图2,图2为本公开实施例提供的图像语义分割模型优化方法的流程示意图一。本实施例的方法可以应用在具有计算能力的电子设备中,以终端设备为例,该图像语义分割模型优化方法包括:
步骤S101:获取第一无标数据,并基于预训练的目标语义分割模型,对第一无标数据进行评估,得到第一无标数据对应的评估值,评估值表征利用第一无标数据对目标语义分割模型进行训练的有效性。
示例性地,无标数据是指不带有标签信息的图像数据,例如包含人物、风景等内容的照片,该无标数据可以是通过相机拍摄的原始图像,也可以是经过滤镜、特效等图像处理技术处理后的图像,此处不进行限制。无标数据通过互联网等途径可以轻易获取,因此具有获取难度低、获取图像内容丰富等优点。
进一步地,示例性地,第一无标数据是预存在服务器内,用于目标语义分割模型的图像数据。目标语义分割模型是经过至少一轮训练的语义分割模型,目标语义分割模型用于将输入模型的第一无标数据进行处理,输出对应的分割结果图。其中,在输入不同的第一无标数据后,由于第一无标数据的差异,在被模型处理后,模型会预测出不同的 分割结果。例如,将第一无标数据data_1输入目标语义分割模型后,输出分割结果result_1,该结果与真实的分割结果相似或相同;而将第一无标数据data_2输入目标语义分割模型后,输出分割结果result_2,该结果与真实的分割结果相差较大。在之后继续对目标语义分割模型进行优化的过程中,由于data_2未被正确模型正确预测(即模型无法对此类图像数据进行分割),因此使用data_2对模型进行训练的有效性更高。进而,可以根据目标语义分割模型的输出结果对对应的第一无标数据进行评估,从而得到一个评估值,示例性地,该评估值越高,表明利用该输入的第一无标数据对目标语义分割模型进行训练的有效性越高,该第一无标数据更适合于对模型进行训练;反之,该评估值越低,表明利用该输入的第一无标数据对目标语义分割模型进行训练的有效性越低,该第一无标数据更不适合于对模型进行训练。
步骤S102:根据第一无标数据的评估值,确定目标无标数据,并生成目标无标数据对应的目标有标数据。
示例性地，在获得第一无标数据的评估值之后，基于该评估值的大小，确定至少一个第一无标数据，即目标无标数据。具体地，在一种可能的实现方式中，根据各第一无标数据的评估值的大小进行排序，将排序前M个第一无标数据，确定为目标无标数据，其中，M为大于1的整数；在另一种可能的实现方式中，根据预设的评估阈值，对各第一无标数据进行筛选，将评估值大于评估阈值的第一无标数据，确定为目标无标数据。在又一种可能的实现方式中，还可以结合上述两种方法，即将各第一无标数据的评估值的大小进行排序，将评估值排序在前M个，且大于评估阈值的第一无标数据，确定为目标无标数据。上述几种方式可根据需要设置，此处不进行具体限定。
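上述排序与阈值筛选相结合的处理逻辑，可以用如下Python代码示意（其中函数名、样本与阈值取值均为本文示意性假设，非原文限定）：

```python
def select_target_data(samples, scores, m, threshold):
    """按评估值从高到低排序，取排名前 m 且评估值大于阈值的样本作为目标无标数据。"""
    # 将样本与评估值配对后按评估值降序排序
    ranked = sorted(zip(samples, scores), key=lambda p: p[1], reverse=True)
    # 先取前 m 个，再用评估阈值过滤
    return [s for s, v in ranked[:m] if v > threshold]

# 示例：5 个无标数据及其评估值（取值为示意）
samples = ["D1", "D2", "D3", "D4", "D5"]
scores = [0.62, 0.91, 0.85, 0.30, 0.55]
print(select_target_data(samples, scores, m=3, threshold=0.5))  # ['D2', 'D3', 'D1']
```

实际使用时，M与评估阈值可按数据规模与标注预算调整。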
示例性地,之后,对目标无标数据进行处理,生成对应的目标有标数据。该过程即对目标无标数据进行打标签的过程,基于目标无标数据的图像内容,进行图像分割,生成用于指示图像中物体的位置、类别等信息的分割结果图。在一种可能的实现方式中,在确定目标无标数据后,通过预训练的图像识别模型,识别目标无标数据对应的图像内容标识,之后调用与该图像内容标识对应的图像分割器对该目标无标数据进行图像分割,得到该目标无标数据对应的标注信息,进而组合为该目标无标数据对应的目标有标数据。
在另一种可能的实现方式中,在确定目标无标数据后,终端设备通过接收用户输入的标注指令,实现对目标无标数据的分割,从而生成目标无标数据对应的标注信息,进而组合为该目标无标数据对应的目标有标数据。上述两种生成目标有标数据的方式可根据需要设置,此处不进行具体限定。
图3为本公开实施例提供的一种生成目标有标数据的过程示意图，下面结合图3对生成目标有标数据的过程进一步介绍，如图3所示，将N组无标数据（图中示为无标数据D1至无标数据DN）分别输入目标语义分割模型后，目标语义分割模型输出各无标数据对应的分割结果图（图示为分割结果R1至分割结果RN），之后，利用预设的样本评估模型，对各分割结果进行评估，得到各评估结果对应的评估值（图示为评估值V1至评估值VN），将评估值V1至VN进行排序，其中排名前三（示例性地）的评估值分别为V2、V3、V1。之后，将V2、V3、V1对应的无标数据D2、无标数据D3、无标数据D1分别确定为目标无标数据，在之后，对目标无标数据进行标注，得到对应的用于对目标语义分割模型进行训练的目标有标数据（图中示为有标数据M1、有标数据M2、有标数据M3）。
本实施例中,通过在对目标语义分割模型训练前,先利用模型当前的参数,对无标数据进行评估,根据评估结果将其中训练效果好的进行标注,从而生成训练效果好的目标有标数据,相比现有技术中从无标数据中进行随机标注得到有标数据的方案,能够得到训练效果更好的目标有标数据,进而通过该目标有标数据对模型进行训练,可以提高模型收敛速度,减少有标数据的需求量。
步骤S103:基于目标有标数据对目标语义分割模型进行优化,得到优化语义分割模型。
示例性地,在得到目标有标数据后,基于目标有标数据对目标语义分割模型进行训练,即可提高目标语义分割模型的图像分割能力,得到优化的语义分割模型,在一种可能的实现方式中,可以基于目标有标数据对目标语义分割模型进行全监督训练,即仅使用上述步骤中得到的目标有标数据对目标语义分割模型进行训练,基于得到的监督损失对模型参数进行调整,从而得到高质量的优化语义分割模型。
在另一种可能的实现方式中,可以在目标有标数据的基础上,结合无标数据,通过半监督方式对目标语义分割模型进行训练,从而在少量目标有标数据的基础上,实现对目标语义分割模型的训练,提高模型训练效果。示例性地,如图4所示,步骤S103的具体实现步骤包括:
步骤S1031:获取与目标有标数据数量相同的第二无标数据。
步骤S1032:通过目标有标数据和第二无标数据,对图像语义分割模型进行半监督训练,得到优化语义分割模型。
示例性地,半监督训练是一种通过利用无标数据作为有标数据的补充,来对模型进行训练的方法。本实施例中,通过获取对应数量的第二无标数据,并为第二无标数据生成伪标签,从而使目标有标数据和第二无标数据参与到模型训练过程中。其中,该第二无标数据的伪标签,可以通过模型预测而获得,通过有标数据和无标数据对模型进行半监督训练的具体过程为本领域技术人员知晓的现有技术,此处不再赘述。
在本实施例中,通过获取第一无标数据,并基于预训练的目标语义分割模型,对第一无标数据进行评估,得到第一无标数据对应的评估值,评估值表征利用第一无标数据对目标语义分割模型进行训练的有效性;根据第一无标数据的评估值,确定目标无标数据,并生成目标无标数据对应的目标有标数据;基于目标有标数据对目标语义分割模型进行优化,得到优化语义分割模型。由于在对目标语义分割模型训练前,首先使用无标数据对应的评估值对无标数据进行筛选,得到了对训练效果更加有效的有标数据,进而基于目标有标数据对目标语义分割模型进行训练,可以提高模型训练效果,降低训练样本的需求量,从而更快的得到收敛的模型,提高模型性能。
参考图5,图5为本公开实施例提供的图像语义分割模型优化方法的流程示意图二。本实施例在图2所示实施例的基础上,对步骤S101和S103进一步细化,该图像语义分割模型优化方法包括:
步骤S201:获取多个第一无标数据和预训练的目标语义分割模型,目标语义分割模型包括第一编解码网络和第二编解码网络。
步骤S202:基于第一编解码网络和第二编解码网络,分别对各第一无标数据进行处理,得到各第一无标数据对应的由第一编解码网络输出的第一分割结果图和由第二编解码网络输出的第二分割结果图。
步骤S203:基于预设的样本评估模型,处理各第一无标数据对应的第一分割结果图和第二分割结果图,得到各第一无标数据对应的至少一个特征值,特征值表征第一无标数据在对应的评估维度下的评估结果。
示例性地,图6为本公开实施例提供的一种基于目标语义分割模型进行数据评估的过程示意图,如图6所示,目标语义分割模型中包括第一编解码网络和第二编解码网络。其中,编解码网络是一种包含一个编码器(Encoder)和一个解码器(Decoder)的网络结构,用于实现对图像数据的分割,输出图像数据的分割结果图。可选地,在编码器和解码器之间,还包括一个或多个用于特征提取的中间层。第一编解码网络和第二编解码网络均具有上述编解码网络结构,但第一编解码网络和第二编解码网络对应不同的网络参数,示例性地,如图所示,即第一编解码网络中包括的编码器为Encoder_A,解码器为Decoder_A,第二编解码网络中包括的编码器为Encoder_B,解码器为Decoder_B。因此,在处理同一个图像数据时,会输出不同的图像分割结果。
进一步地，在第一无标数据Data_A（图中示为Data_A）分别输入第一编解码网络和第二编解码网络后，第一编解码网络和第二编解码网络基于各自的网络参数，对第一无标数据进行处理，并分别输出对应的第一分割结果图P1（图中示为P1）和第二分割结果图P2（图中示为P2），之后，将第一分割结果图P1和第二分割结果图P2输入至预设的样本评估模型进行处理，样本评估模型以第一分割结果图P1和第二分割结果图P2作为整体的输入值，按照样本评估模型中的评估策略（图中示为评估策略#1、评估策略#2、评估策略#3、评估策略#4）进行评估，得到对应的特征值，示例性地，如图中所示，样本评估模型输出四个特征值，分别为特征值a、特征值b、特征值c和特征值d。其中，特征值a、特征值b、特征值c、特征值d分别表示该第一无标数据Data_A在一个评估维度下的评估结果。之后，根据一个或多个特征值进行判断后，若满足判断条件，可以将Data_A确定为目标无标数据，进而，对Data_A进行标注，生成目标有标数据（该过程图中未示出）。
其中,示例性地,特征值包括以下至少一种:
信息熵评估值、难度评估值、多样性评估值、一致性评估值;其中,信息熵评估值用于表征第一无标数据中的信息量大小;难度评估值用于表征目标语义分割模型对第一无标数据的预测难度;多样性评估值用于表征第一分割结果图和第二分割结果图之间的预测差异;一致性评估值用于表征第一分割结果图和第二分割结果图之间的散度距离。参考图6所示,例如,特征值a为信息熵评估值、特征值b为难度评估值、特征值c为多样性评估值、特征值d为一致性评估值。通过上述特征值的实现方式,可以实现对第一无标数据从多个评估维度的评估,从而准确的确定第一无标数据对目标语义分割模型进行训练的有效性,提高后续步骤中生成的有标数据的质量,提高模型训练效果。
下面对上述各特征值进行详细介绍:
信息熵评估值，是基于显著区域的面向分割的信息熵评测指标，因此也可以称为区域级的信息熵。信息熵评估值是基于双分支（第一编解码网络和第二编解码网络）预测的区域级概率分布的熵，来衡量未标记样本（第一无标数据）的信息量，较高的信息熵评估值表明目标语义分割模型对其预测的不确定性更大，这表明该第一无标数据的信息量更大，因此具有更好的训练效果，应该确定为目标无标数据，以执行后续的标记步骤（生成目标有标数据）。为了适应以像素为重点的分割的特性，该信息熵评估值更加关注前景概念并掩盖了部分背景区域。本实施例中，将预测中的像素值高于阈值τ视为分割核心区域。因此c类别对应的区域掩码Mc如式(1)所示：
mci = 1(yci＞τ)      (1)
其中，yc表示分割结果图中c类别的预测值，τ为阈值，1(·)为指示函数。进而，信息熵评估值如式(2)所示：
SRI(k) = −(1/(H×W))·ΣcΣi mci·yci·log yci      (2)
其中，mci表示生成掩码Mc的值。yci表示输出Y的预测值。H×W是分割结果图的大小。N表示语义类别的数量。SRI(k)代表了第k个分支（k=1时，为第一编解码网络，k=2时，为第二编解码网络）的信息熵，最终加权分数SRI的计算过程如式(3)所示：
SRI = (1/K)·Σk SRI(k)      (3)
其中K表示分支的数量。得到的区域级信息分数SRI表示每个未标注样本(第一无标数据)中包含的信息量程度。SRI分数较高的样本意味着它包含的信息更丰富,对后续标注更有价值。
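上述式(1)至式(3)描述的区域级信息熵计算，可以用如下numpy代码示意（阈值τ的取值与函数命名均为本文示意性假设）：

```python
import numpy as np

def region_info_entropy(probs, tau=0.5, eps=1e-8):
    """单个分支的区域级信息熵：probs 为 (N, H, W) 的类别概率图，
    先按阈值 tau 生成各类别的区域掩码（式(1)），再在掩码区域内累加熵并按图大小归一化（式(2)）。"""
    mask = (probs > tau).astype(np.float64)  # 式(1)：掩码 mci
    _, h, w = probs.shape
    return float(-(mask * probs * np.log(probs + eps)).sum() / (h * w))

def info_score(branch_probs, tau=0.5):
    """式(3)：对 K 个分支的区域级信息熵取平均，得到样本的信息分数 SRI。"""
    return float(np.mean([region_info_entropy(p, tau) for p in branch_probs]))

p = np.array([[[0.9]], [[0.1]]])  # (N=2, H=1, W=1) 的概率图
print(round(info_score([p, p]), 4))  # 0.0948
```

信息分数越高的样本包含的信息越丰富，越值得在后续步骤中进行标注。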
难度评估值，是为了衡量面向分割任务的预测难度，通过引入区域级难度策略来选择难以预测的未标注数据的指标。该策略首先遵循区域级信息熵策略获取每个类别对应的区域掩码Mc。然后生成所有类别的联合掩码MU，计算过程如式(4)所示：
MU = M1∪M2∪…∪MN      (4)
其中,Mc为c类别的区域掩码,N表示语义类别的数量,∪表示逐像素或操作。
区域级分数的计算过程如式(5)所示：
SRD(k) = (Σi mUi·(1−conf(yi)))/(Σi mUi)      (5)
其中，mUi表示联合掩码MU的值，conf(yi)表示具有最大化操作的分割结果图的置信度值。SRD(k)是第k个分支获得的分数。最终区域级难度得分，即难度评估值SRD的计算过程如式(6)所示：
SRD = (1/K)·Σk SRD(k)      (6)
其中,K表示分支的数量,SRD表示当前的目标语义分割模型预测第一无标数据的难度。
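区域级难度分数（式(4)至式(6)）的计算可以用如下代码示意（其中掩码区域内以 1-置信度 度量难度的具体写法为本文假设）：

```python
import numpy as np

def region_difficulty(probs, tau=0.5):
    """单个分支的区域级难度：probs 为 (N, H, W) 概率图。
    先按式(4)对各类别掩码做逐像素或操作得到联合掩码，
    再在掩码区域内用 1-置信度 衡量预测难度（难度度量方式为示意性假设）。"""
    mask_u = (probs > tau).any(axis=0).astype(np.float64)  # 式(4)：联合掩码 MU
    conf = probs.max(axis=0)                               # 最大化操作得到的置信度
    denom = mask_u.sum()
    return float((mask_u * (1.0 - conf)).sum() / denom) if denom > 0 else 0.0

def difficulty_score(branch_probs, tau=0.5):
    """式(6)：对 K 个分支的难度分数取平均，得到难度评估值 SRD。"""
    return float(np.mean([region_difficulty(p, tau) for p in branch_probs]))

p = np.array([[[0.9, 0.2]], [[0.1, 0.8]]])  # (N=2, H=1, W=2)
print(round(difficulty_score([p]), 2))  # 0.15
```

难度分数越高，说明当前模型对该无标数据的预测越不确定，对其标注的价值越大。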
多样性评估值,即补丁级的多样性,用于表征两个分支(第一编解码网络和第二编解码网络)的预测结果之间的局部相关性,多样性评估值高,说明两个分支倾向于对相同的输入产生不同的预测,这表明这些样本进行标注是有价值的。
首先，输出预测的分割结果图Y∈R^(N×H×W)被分成补丁并编码为补丁级表示Yp，Yp中的每个像素表示补丁级别预测的局部内容。然后使用展平后的补丁级别表示计算补丁之间的余弦相似度并生成自相关矩阵Φ∈R^(HpWp×HpWp)，计算过程如式(7)所示：
φij = (ypi·ypj)/(‖ypi‖·‖ypj‖)      (7)
其中φij表示向量ypi和ypj之间的相关性，其中ypi和ypj是Yp的补丁向量。自相关矩阵反映了预测结果的局部上下文关系。另外，互相关矩阵也像自相关矩阵计算方式一样获得。然后将自相关矩阵和互相关矩阵加权，计算出补丁级的多样性分数，即多样性评估值SPD，计算过程如式(8)所示：
SPD = (1/(HpWp×HpWp))·Σi,j(α·|φ1ij−φ2ij|+(1−α)·(1−ψij))      (8)
其中φ1和φ2分别表示自相关矩阵的值,ψ表示互相关矩阵的值。HpWp×HpWp表示相关矩阵的大小。α是平衡自相关和互相关矩阵的系数。较高的SPD意味着两个分支对同一样本的预测存在显著差异,说明当前样本难以区分,此类样本(第一无标数据)进行标注更有价值。
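补丁级多样性分数的计算可以用如下代码示意（其中补丁划分与编码方式，以及式(8)中自相关差异与互相关的加权组合形式，均为本文示意性假设）：

```python
import numpy as np

def patch_repr(probs, patch=2):
    """将 (N, H, W) 的预测图划分为 patch×patch 的补丁，取补丁内均值作为补丁级表示，
    并展平为 (HpWp, N) 的补丁向量序列。"""
    n, h, w = probs.shape
    hp, wp = h // patch, w // patch
    x = probs[:, :hp * patch, :wp * patch]
    x = x.reshape(n, hp, patch, wp, patch).mean(axis=(2, 4))
    return x.reshape(n, hp * wp).T

def cosine_matrix(a, b):
    """式(7)：补丁向量之间的余弦相似度矩阵；a 与 b 来自同一分支时为自相关矩阵，
    来自不同分支时为互相关矩阵。"""
    an = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    bn = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    return an @ bn.T

def diversity_score(p1, p2, alpha=0.5, patch=2):
    """补丁级多样性分数 SPD：对两分支自相关矩阵的差异与互相关矩阵按 alpha 加权求平均。"""
    y1, y2 = patch_repr(p1, patch), patch_repr(p2, patch)
    phi1, phi2 = cosine_matrix(y1, y1), cosine_matrix(y2, y2)
    psi = cosine_matrix(y1, y2)
    return float((alpha * np.abs(phi1 - phi2) + (1 - alpha) * (1 - psi)).sum() / psi.size)

rng = np.random.default_rng(0)
p1, p2 = rng.random((3, 4, 4)), rng.random((3, 4, 4))
print(diversity_score(p1, p2) >= 0.0)  # True
```

两分支对同一样本的预测差异越大，多样性分数越高，该样本越值得标注。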
一致性评估值，是一种从全局维度衡量两个分支（第一编解码网络和第二编解码网络）预测结果分布之间关系的评估值，即通过全局级一致性分数来计算两个预测之间的全局KL散度距离。一致性评估值SGC的计算过程如式(9)所示：
SGC = (1/(N×H×W))·Σi y1i·log(y1i/y2i)      (9)
其中y1i和y2i表示两个分支(第一编解码网络和第二编解码网络)的预测结果的像素值。N×H×W表示输出结果的大小。较高的SGC分数意味着当前样本(第一无标数据)难以预测,需要进行标注用于训练。
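全局一致性分数（式(9)）可以用如下代码示意（KL散度的具体方向与归一化方式为本文假设）：

```python
import numpy as np

def consistency_score(p1, p2, eps=1e-8):
    """全局一致性分数 SGC 的示意实现：p1、p2 为两分支的 (N, H, W) 概率图，
    计算逐元素 KL 散度并在整个输出上取平均。"""
    p1, p2 = p1 + eps, p2 + eps
    return float((p1 * np.log(p1 / p2)).sum() / p1.size)

p = np.full((2, 2, 2), 0.5)
print(consistency_score(p, p))  # 0.0 两分支完全一致时散度为 0
```

分数越高，说明两分支的预测分布差异越大，该样本越难预测、越需要标注。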
步骤S204:对各第一无标数据对应的至少一个特征值进行加权融合,得到各第一无标数据对应的评估值。
示例性地,在获得上述至少一个特征值后,对各特征值进行加权融合,得到各第一无标数据对应的评估值,其中,评估值的分数越高,代表该第一无标数据越有效,越需要被确定为目标无标数据进行后续标注。
在一种可能的实现方式中,如图7所示,步骤S204的具体实现步骤包括:
步骤S2041:获取各特征值对应的加权系数,加权系数是基于目标语义分割模型对应的交叉熵损失的变化量确定的。
步骤S2042:根据各加权系数,计算各特征值的加权和,得到第一无标数据对应的评估值。
示例性地,在对各特征值进行加权融合时,各特征值对应的加权系数,可以基于经验确定,也可以利用本实施例步骤中的方法,基于目标语义分割模型对应的交叉熵损失的变化量确定,具体地,例如,第一无标数据对应的评估值如式(10)所示:
SAll=SRI+λ1SRD+λ2SPD+λ3SGC      (10)
其中，SAll为加权计算后得到的评估值，SRI为信息熵评估值，SRD为难度评估值，λ1为SRD的加权系数，SPD为多样性评估值，λ2为SPD的加权系数，SGC为一致性评估值，λ3为SGC的加权系数。通过该评估值SAll进行排序，确定目标无标数据，进行标注生成目标有标数据后，利用该目标有标数据对目标语义分割模型进行训练，得到对应的交叉熵损失，根据该交叉熵损失的大小，对λ1、λ2、λ3在一定范围内进行调整，使权重系数向令交叉熵损失变小的方向调节，从而优化权重系数的取值，提高评估值的评估准确性。
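式(10)的加权融合可以直接写成如下函数（各特征值与加权系数的取值仅为示意）：

```python
def fused_score(s_ri, s_rd, s_pd, s_gc, lambdas=(1.0, 1.0, 1.0)):
    """式(10)：SAll = SRI + λ1·SRD + λ2·SPD + λ3·SGC。"""
    l1, l2, l3 = lambdas
    return s_ri + l1 * s_rd + l2 * s_pd + l3 * s_gc

# 四个评估维度的特征值与加权系数均为示意取值
print(round(fused_score(0.4, 0.3, 0.2, 0.1, lambdas=(0.5, 0.5, 0.5)), 6))  # 0.7
```

训练过程中可按交叉熵损失的变化在一定范围内调整 lambdas，使融合后的评估值更准确。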
步骤S205:根据各第一无标数据的评估值,确定目标无标数据,并生成目标无标数据对应的目标有标数据。
步骤S206:获取与目标有标数据数量相同的第二无标数据。
步骤S207:将目标有标数据和对应的第二无标数据输入第一编解码网络,得到第一编解码网络输出的第一有标分割结果图和第一无标分割结果图。
步骤S208:将目标有标数据和对应的第二无标数据输入第二编解码网络,得到第二编解码网络输出的第二有标分割结果图和第二无标分割结果图。
步骤S209:基于第一有标分割结果图和第二有标分割结果图,得到监督损失;基于第一无标分割结果图和第二无标分割结果图,得到无监督损失。
图8为本公开实施例提供的一种基于目标语义分割模型生成监督损失和无监督损失的过程示意图,下面结合图8所示,对上述步骤进行详细介绍:
目标语义分割模型中包括第一编解码网络和第二编解码网络,在得到目标有标数据后,将目标有标数据和对应数量的第二无标数据作为输入量,分别输入该目标语义分割模型中的第一编解码网络和第二编解码网络,第一编解码网络和第二编解码网络独立的对各自输入的数据进行处理后,分别输出第一有标分割结果图、第一无标分割结果图;以及第二有标分割结果图、第二无标分割结果图。上述过程可参见图6所示实施例中相关介绍,此处不再赘述。
之后,对于目标有标数据,基于第一有标分割结果图和其对应的第一标注信息,利用预设的交叉熵损失函数,得到对应的监督损失,其中,如图所示,监督损失包括第一监督损失和第二监督损失,第一监督损失由第一编解码网络输出的第一有标分割结果图生成,第二监督损失由第二编解码网络输出的第二有标分割结果图生成。
示例性地，对于第二无标数据，基于第一无标分割结果图和第二无标分割结果图，利用一个额外的一致性正则损失来确保两个分支的预测结果保持一致，即计算第一无标分割结果图和第二无标分割结果图的无监督损失，其中，一致性正则损失函数和交叉熵损失函数的具体实现方法为本领域技术人员知晓的现有技术，此处不再赘述。
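监督损失与无监督损失的计算可以用如下numpy代码示意（其中一致性正则项以均方误差为例，具体函数形式原文未限定，为本文假设）：

```python
import numpy as np

def cross_entropy(pred, target, eps=1e-8):
    """监督损失示意：pred 为 (C, H, W) 概率图，target 为 (H, W) 的类别标注，
    取标注类别对应概率的负对数并求平均。"""
    h, w = target.shape
    px = pred[target, np.arange(h)[:, None], np.arange(w)[None, :]]
    return float(-np.log(px + eps).mean())

def consistency_loss(pred1, pred2):
    """无监督损失示意：以均方误差作为一致性正则，约束两分支对同一无标数据的预测保持一致。"""
    return float(((pred1 - pred2) ** 2).mean())

pred = np.zeros((2, 2, 2))
pred[0], pred[1] = 0.9, 0.1           # 类别 0 的概率为 0.9
target = np.zeros((2, 2), dtype=int)  # 标注全部为类别 0
print(round(cross_entropy(pred, target), 4))  # 0.1054
print(consistency_loss(pred, pred))           # 0.0
```

有标数据贡献监督损失，无标数据贡献一致性无监督损失，二者共同驱动半监督训练。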
步骤S210:基于监督损失和无监督损失对目标语义分割模型进行训练,得到优化语义分割模型。
示例性地，第一编解码网络具有第一网络参数，第二编解码网络具有第二网络参数，监督损失包括第一编解码网络对应的第一监督损失和第二编解码网络对应的第二监督损失。示例性地，参考图8所示的目标语义分割模型的示意图，在获得第一监督损失和第二监督损失之后，对第一监督损失、第二监督损失和无监督损失三个损失函数值进行加权计算后，例如计算三者的平均值，之后进行反向梯度传播，更新第一编解码网络中的第一网络参数和第二编解码网络中的第二网络参数，从而得到优化语义分割模型。优化后的优化语义分割模型由于以目标有标数据和第二无标数据为样本进行了训练，因此可以实现提高模型性能的目的，同时，由于目标有标数据中的内容是经过筛选的高价值内容，因此使用本实施例提供的半监督学习方法对目标语义分割模型进行训练，可以进一步地提高训练效果，使训练后得到的优化语义分割模型具有更好的性能。
在一种可能的实现方式中,如图9所示,步骤S210的具体实现步骤包括:
步骤S2101:基于第一监督损失,对第一网络参数进行优化,得到第一优化参数。
步骤S2102:基于第二监督损失,对第二网络参数进行优化,得到第二优化参数。
步骤S2103:基于无监督损失,对第一优化参数和第二优化参数进行优化,得到第一网络参数对应的第三优化参数,以及第二网络参数对应的第四优化参数。
步骤S2104:基于第三优化参数和第四优化参数,得到优化语义分割模型。
示例性地，本实施例步骤中，在得到第一监督损失和第二监督损失后，首先通过第一监督损失和第二监督损失，对目标语义分割模型中的第一网络参数和第二网络参数进行优化，独立地提高第一编解码网络和第二编解码网络的性能，之后，再基于无监督损失所表征的第一编解码网络和第二编解码网络之间的差异性，在已得到的第一优化参数和第二优化参数的基础上，进一步对第一优化参数和第二优化参数进行优化，得到第一网络参数对应的第三优化参数，以及第二网络参数对应的第四优化参数。本实施例中，由于有标数据的训练效果更好，因此，先使用目标有标数据进行训练，可以更好地引导模型收敛，之后，再基于无监督损失进一步对第一编解码网络和第二编解码网络进行优化，可以提高模型训练和优化的效率，缩短训练时间，提高优化后的模型质量。
可选地,在执行完步骤S210后,可返回步骤S201,将得到的优化语义分割模型作为新的目标语义分割模型,继续进行优化,直至得到的优化语义分割模型达到预设性能,或者训练样本数据消耗完毕。循环优化的具体过程与本实施例提供的(一次)优化过程相同,不再赘述。
本实施例中,步骤S205的实现方式与本公开图2所示实施例中的步骤S102的实现方式相同,在此不再一一赘述。
对应于上文实施例的图像语义分割模型优化方法,图10为本公开实施例提供的图像语义分割模型优化装置的结构框图。为了便于说明,仅示出了与本公开实施例相关的部分。参照图10,图像语义分割模型优化装置3,包括:
评估模块31,用于获取第一无标数据,并基于预训练的目标语义分割模型,对第一无标数据进行评估,得到第一无标数据对应的评估值,评估值表征利用第一无标数据对目标语义分割模型进行训练的有效性;
确定模块32,用于根据第一无标数据的评估值,确定目标无标数据,并生成目标无标数据对应的目标有标数据;
优化模块33,用于基于目标有标数据对目标语义分割模型进行优化,得到优化语义分割模型。
在本公开的一个实施例中,目标语义分割模型包括第一编解码网络和第二编解码网络,评估模块31,具体用于:基于第一编解码网络和第二编解码网络,分别对第一无标数据进行处理,得到第一编解码网络输出的第一分割结果图和第二编解码网络输出的第二分割结果图;基于预设的样本评估模型,处理第一分割结果图和第二分割结果图,得到至少一个特征值,特征值表征第一无标数据在对应的评估维度下的评估结果;对至少一个特征值进行加权融合,得到第一无标数据对应的评估值。
在本公开的一个实施例中,特征值包括以下至少一种:信息熵评估值、难度评估值、多样性评估值、一致性评估值;其中,信息熵评估值用于表征第一无标数据中的信息量大小;难度评估值用于表征目标语义分割模型对第一无标数据的预测难度;多样性评估值用于表征第一分割结果图和第二分割结果图之间的预测差异;一致性评估值用于表征 第一分割结果图和第二分割结果图之间的散度距离。
在本公开的一个实施例中,评估模块31,在对至少一个特征值进行加权融合,得到第一无标数据对应的评估值时,具体用于:获取各特征值对应的加权系数,加权系数是基于目标语义分割模型对应的交叉熵损失的变化量确定的;根据各加权系数,计算各特征值的加权和,得到第一无标数据对应的评估值。
在本公开的一个实施例中,优化模块33,具体用于:获取与目标有标数据数量相同的第二无标数据;通过目标有标数据和第二无标数据,对图像语义分割模型进行半监督训练,得到优化语义分割模型。
在本公开的一个实施例中,目标语义分割模型包括第一编解码网络和第二编解码网络,优化模块33在通过目标有标数据和第二无标数据,对图像语义分割模型进行半监督训练,得到优化语义分割模型时,具体用于:将目标有标数据和对应的第二无标数据输入第一编解码网络,得到第一编解码网络输出的第一有标分割结果图和第一无标分割结果图;将目标有标数据和对应的第二无标数据输入第二编解码网络,得到第二编解码网络输出的第二有标分割结果图和第二无标分割结果图;基于第一有标分割结果图和第二有标分割结果图,得到监督损失;基于第一无标分割结果图和第二无标分割结果图,得到无监督损失;基于监督损失和无监督损失对目标语义分割模型进行训练,得到优化语义分割模型。
在本公开的一个实施例中,优化模块33在基于第一有标分割结果图和第二有标分割结果图,得到监督损失时,具体用于:基于预设的交叉熵损失函数,计算第一有标分割结果图和第二有标分割结果图,得到监督损失;优化模块33在基于第一无标分割结果图和第二无标分割结果图,得到无监督损失时,具体用于:基于预设的一致性正则损失函数,计算第一无标分割结果图和第二无标分割结果图,得到无监督损失。
在本公开的一个实施例中,第一编解码网络具有第一网络参数,第二编解码网络具有第二网络参数,监督损失包括第一编解码网络对应的第一监督损失和第二编解码网络对应的第二监督损失,优化模块33在基于监督损失和无监督损失对目标语义分割模型进行训练,得到优化语义分割模型时,具体用于:基于第一监督损失,对第一网络参数进行优化,得到第一优化参数;基于第二监督损失,对第二网络参数进行优化,得到第二优化参数;基于无监督损失,对第一优化参数和第二优化参数进行优化,得到第一网络参数对应的第三优化参数,以及第二网络参数对应的第四优化参数;基于第三优化参数和第四优化参数,得到优化语义分割模型。
其中,评估模块31、确定模块32、优化模块33依次连接。本实施例提供的图像语义分割模型优化装置3可以执行上述方法实施例的技术方案,其实现原理和技术效果类似,本实施例此处不再赘述。
图11为本公开实施例提供的一种电子设备的结构示意图,如图11所示,该电子设备4包括:
处理器41,以及与处理器41通信连接的存储器42;
存储器42存储计算机执行指令;
处理器41执行存储器42存储的计算机执行指令,以实现如图2-图9所示实施例中的图像语义分割模型优化方法。
其中,可选地,处理器41和存储器42通过总线43连接。
相关说明可以对应参见图2-图9所对应的实施例中的步骤所对应的相关描述和效果进行理解,此处不做过多赘述。
参考图12,其示出了适于用来实现本公开实施例的电子设备900的结构示意图,该电子设备900可以为终端设备或服务器。其中,终端设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、个人数字助理(Personal Digital Assistant,PDA)、平板电脑(Portable Android Device,PAD)、便携式多媒体播放器(Portable Media Player,PMP)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字电视(Television,TV)、台式计算机等等的固定终端。图12示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图12所示,电子设备900可以包括处理装置(例如中央处理器、图形处理器等)901,其可以根据存储在只读存储器(Read Only Memory,ROM)902中的程序或者从存储装置908加载到随机访问存储器(Random Access Memory,RAM)903中的程序而执行各种适当的动作和处理。在RAM 903中,还存储有电子设备900操作所需的各种程序和数据。处理装置901、ROM 902以及RAM 903通过总线904彼此相连。输入/输出(Input/Output,I/O)接口905也连接至总线904。
通常,以下装置可以连接至I/O接口905:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置906;包括例如液晶显示器(Liquid Crystal Display,LCD)、扬声器、振动器等的输出装置907;包括例如磁带、硬盘等的存储装置908;以及通信装置909。通信装置909可以允许电子设备900与其他设备进行无线或有线通信以交换数据。虽然图12示出了具有各种装置的电子设备900,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置909从网络上被下载和安装,或者从存储装置908被安装,或者从ROM 902被安装。在该计算机程序被处理装置901执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)或闪存、光纤、便携式紧凑磁盘只读存储器(Compact Disc Read Only Memory,CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为 载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、射频(Radio Frequency,RF)等等,或者上述的任意合适的组合。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备执行上述实施例所示的方法。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(Local Area Network,LAN)或广域网(Wide Area Network,WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,单元的名称在某种情况下并不构成对该单元本身的限定,例如,第一获取单元还可以被描述为“获取至少两个网际协议地址的单元”。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(Field-Programmable Gate Array,FPGA)、专用集成电路(Application Specific Integrated Circuit,ASIC)、专用标准产品(Application Specific Standard Parts,ASSP)、片上系统(System On Chip,SOC)、复杂可编程逻辑设备(Complex Programmable Logic Device,CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器 可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例可以包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
第一方面,根据本公开的一个或多个实施例,提供了一种图像语义分割模型优化方法,包括:
获取第一无标数据,并基于预训练的目标语义分割模型,对所述第一无标数据进行评估,得到所述第一无标数据对应的评估值,所述评估值表征利用所述第一无标数据对所述目标语义分割模型进行训练的有效性;根据所述第一无标数据的评估值,确定目标无标数据,并生成所述目标无标数据对应的目标有标数据;基于所述目标有标数据对所述目标语义分割模型进行优化,得到优化语义分割模型。
根据本公开的一个或多个实施例,所述目标语义分割模型包括第一编解码网络和第二编解码网络,所述基于预训练的目标语义分割模型,对所述第一无标数据进行评估,得到所述第一无标数据对应的评估值,包括:基于所述第一编解码网络和所述第二编解码网络,分别对所述第一无标数据进行处理,得到所述第一编解码网络输出的第一分割结果图和所述第二编解码网络输出的第二分割结果图;基于预设的样本评估模型,处理所述第一分割结果图和所述第二分割结果图,得到至少一个特征值,所述特征值表征所述第一无标数据在对应的评估维度下的评估结果;对所述至少一个特征值进行加权融合,得到所述第一无标数据对应的评估值。
根据本公开的一个或多个实施例,所述特征值包括以下至少一种:信息熵评估值、难度评估值、多样性评估值、一致性评估值;其中,所述信息熵评估值用于表征所述第一无标数据中的信息量大小;所述难度评估值用于表征所述目标语义分割模型对所述第一无标数据的预测难度;所述多样性评估值用于表征所述第一分割结果图和所述第二分割结果图之间的预测差异;所述一致性评估值用于表征所述第一分割结果图和所述第二分割结果图之间的散度距离。
根据本公开的一个或多个实施例,所述对所述至少一个特征值进行加权融合,得到所述第一无标数据对应的评估值,包括:获取各所述特征值对应的加权系数,所述加权系数是基于所述目标语义分割模型对应的交叉熵损失的变化量确定的;根据各所述加权系数,计算各所述特征值的加权和,得到所述第一无标数据对应的评估值。
根据本公开的一个或多个实施例,所述基于所述目标有标数据对所述目标语义分割模型进行优化,得到优化语义分割模型,包括:获取与所述目标有标数据数量相同的第二无标数据;通过所述目标有标数据和所述第二无标数据,对所述图像语义分割模型进行半监督训练,得到优化语义分割模型。
根据本公开的一个或多个实施例,所述目标语义分割模型包括第一编解码网络和第二编解码网络,所述通过所述目标有标数据和所述第二无标数据,对所述图像语义分割模型进行半监督训练,得到优化语义分割模型,包括:将所述目标有标数据和对应的第二无标数据输入第一编解码网络,得到所述第一编解码网络输出的第一有标分割结果图 和第一无标分割结果图;将所述目标有标数据和对应的第二无标数据输入第二编解码网络,得到所述第二编解码网络输出的第二有标分割结果图和第二无标分割结果图;基于所述第一有标分割结果图和所述第二有标分割结果图,得到监督损失;基于所述第一无标分割结果图和所述第二无标分割结果图,得到无监督损失;基于所述监督损失和所述无监督损失对所述目标语义分割模型进行训练,得到所述优化语义分割模型。
根据本公开的一个或多个实施例,所述基于所述第一有标分割结果图和所述第二有标分割结果图,得到监督损失,包括:基于预设的交叉熵损失函数,计算所述第一有标分割结果图和所述第二有标分割结果图,得到所述监督损失;所述基于所述第一无标分割结果图和所述第二无标分割结果图,得到无监督损失,包括:基于预设的一致性正则损失函数,计算所述第一无标分割结果图和所述第二无标分割结果图,得到所述无监督损失。
根据本公开的一个或多个实施例,所述第一编解码网络具有第一网络参数,所述第二编解码网络具有第二网络参数,所述监督损失包括所述第一编解码网络对应的第一监督损失和所述第二编解码网络对应的第二监督损失,所述基于所述监督损失和所述无监督损失对所述目标语义分割模型进行训练,得到所述优化语义分割模型,包括:基于所述第一监督损失,对所述第一网络参数进行优化,得到第一优化参数;基于所述第二监督损失,对所述第二网络参数进行优化,得到第二优化参数;基于所述无监督损失,对所述第一优化参数和所述第二优化参数进行优化,得到所述第一网络参数对应的第三优化参数,以及所述第二网络参数对应的第四优化参数;基于所述第三优化参数和所述第四优化参数,得到所述优化语义分割模型。
第二方面,根据本公开的一个或多个实施例,提供了一种图像语义分割模型优化装置,包括:
评估模块,用于获取第一无标数据,并基于预训练的目标语义分割模型,对所述第一无标数据进行评估,得到所述第一无标数据对应的评估值,所述评估值表征利用所述第一无标数据对所述目标语义分割模型进行训练的有效性;
确定模块,用于根据所述第一无标数据的评估值,确定目标无标数据,并生成所述目标无标数据对应的目标有标数据;
优化模块,用于基于所述目标有标数据对所述目标语义分割模型进行优化,得到优化语义分割模型。
根据本公开的一个或多个实施例,所述目标语义分割模型包括第一编解码网络和第二编解码网络,所述评估模块,具体用于:基于所述第一编解码网络和所述第二编解码网络,分别对所述第一无标数据进行处理,得到所述第一编解码网络输出的第一分割结果图和所述第二编解码网络输出的第二分割结果图;基于预设的样本评估模型,处理所述第一分割结果图和所述第二分割结果图,得到至少一个特征值,所述特征值表征所述第一无标数据在对应的评估维度下的评估结果;对所述至少一个特征值进行加权融合,得到所述第一无标数据对应的评估值。
根据本公开的一个或多个实施例,所述特征值包括以下至少一种:信息熵评估值、难度评估值、多样性评估值、一致性评估值;其中,所述信息熵评估值用于表征所述第一无标数据中的信息量大小;所述难度评估值用于表征所述目标语义分割模型对所述第 一无标数据的预测难度;所述多样性评估值用于表征所述第一分割结果图和所述第二分割结果图之间的预测差异;所述一致性评估值用于表征所述第一分割结果图和所述第二分割结果图之间的散度距离。
根据本公开的一个或多个实施例,所述评估模块,在对所述至少一个特征值进行加权融合,得到所述第一无标数据对应的评估值时,具体用于:获取各所述特征值对应的加权系数,所述加权系数是基于所述目标语义分割模型对应的交叉熵损失的变化量确定的;根据各所述加权系数,计算各所述特征值的加权和,得到所述第一无标数据对应的评估值。
根据本公开的一个或多个实施例,所述优化模块,具体用于:获取与所述目标有标数据数量相同的第二无标数据;通过所述目标有标数据和所述第二无标数据,对所述图像语义分割模型进行半监督训练,得到优化语义分割模型。
根据本公开的一个或多个实施例,所述目标语义分割模型包括第一编解码网络和第二编解码网络,所述优化模块在通过所述目标有标数据和所述第二无标数据,对所述图像语义分割模型进行半监督训练,得到优化语义分割模型时,具体用于:将所述目标有标数据和对应的第二无标数据输入第一编解码网络,得到所述第一编解码网络输出的第一有标分割结果图和第一无标分割结果图;将所述目标有标数据和对应的第二无标数据输入第二编解码网络,得到所述第二编解码网络输出的第二有标分割结果图和第二无标分割结果图;基于所述第一有标分割结果图和所述第二有标分割结果图,得到监督损失;基于所述第一无标分割结果图和所述第二无标分割结果图,得到无监督损失;基于所述监督损失和所述无监督损失对所述目标语义分割模型进行训练,得到所述优化语义分割模型。
根据本公开的一个或多个实施例,所述优化模块在基于所述第一有标分割结果图和所述第二有标分割结果图,得到监督损失时,具体用于:基于预设的交叉熵损失函数,计算所述第一有标分割结果图和所述第二有标分割结果图,得到所述监督损失;所述优化模块在基于所述第一无标分割结果图和所述第二无标分割结果图,得到无监督损失时,具体用于:基于预设的一致性正则损失函数,计算所述第一无标分割结果图和所述第二无标分割结果图,得到所述无监督损失。
根据本公开的一个或多个实施例,所述第一编解码网络具有第一网络参数,所述第二编解码网络具有第二网络参数,所述监督损失包括所述第一编解码网络对应的第一监督损失和所述第二编解码网络对应的第二监督损失,所述优化模块在基于所述监督损失和所述无监督损失对所述目标语义分割模型进行训练,得到所述优化语义分割模型时,具体用于:基于所述第一监督损失,对所述第一网络参数进行优化,得到第一优化参数;基于所述第二监督损失,对所述第二网络参数进行优化,得到第二优化参数;基于所述无监督损失,对所述第一优化参数和所述第二优化参数进行优化,得到所述第一网络参数对应的第三优化参数,以及所述第二网络参数对应的第四优化参数;基于所述第三优化参数和所述第四优化参数,得到所述优化语义分割模型。
第三方面,根据本公开的一个或多个实施例,提供了一种电子设备,包括:处理器,以及与所述处理器通信连接的存储器;
所述存储器存储计算机执行指令;
所述处理器执行所述存储器存储的计算机执行指令,以实现如上第一方面以及第一方面各种可能的设计所述的图像语义分割模型优化方法。
第四方面,根据本公开的一个或多个实施例,提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机执行指令,当处理器执行所述计算机执行指令时,实现如上第一方面以及第一方面各种可能的设计所述的图像语义分割模型优化方法。
第五方面,根据本公开的一个或多个实施例,提供了一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时实现如上第一方面以及第一方面各种可能的设计所述的图像语义分割模型优化方法。
第六方面,根据本公开的一个或多个实施例,提供了一种计算机程序,所述计算机程序用于实现如上第一方面以及第一方面各种可能的设计所述的图像语义分割模型优化方法。
以上描述仅为本公开的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解,本公开中所涉及的公开范围,并不限于上述技术特征的特定组合而成的技术方案,同时也应涵盖在不脱离上述公开构思的情况下,由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本公开中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。
此外,虽然采用特定次序描绘了各操作,但是这不应当理解为要求这些操作以所示出的特定次序或以顺序次序来执行。在一定环境下,多任务和并行处理可能是有利的。同样地,虽然在上面论述中包含了若干具体实现细节,但是这些不应当被解释为对本公开的范围的限制。在单独的实施例的上下文中描述的某些特征还可以组合地实现在单个实施例中。相反地,在单个实施例的上下文中描述的各种特征也可以单独地或以任何合适的子组合的方式实现在多个实施例中。
尽管已经采用特定于结构特征和/或方法逻辑动作的语言描述了本主题,但是应当理解所附权利要求书中所限定的主题未必局限于上面描述的特定特征或动作。相反,上面所描述的特定特征和动作仅仅是实现权利要求书的示例形式。

Claims (13)

  1. 一种图像语义分割模型优化方法,包括:
    获取第一无标数据,并基于预训练的目标语义分割模型,对所述第一无标数据进行评估,得到所述第一无标数据对应的评估值,所述评估值表征利用所述第一无标数据对所述目标语义分割模型进行训练的有效性;
    根据所述第一无标数据的评估值,确定目标无标数据,并生成所述目标无标数据对应的目标有标数据;
    基于所述目标有标数据对所述目标语义分割模型进行优化,得到优化语义分割模型。
  2. 根据权利要求1所述的方法,其中,所述目标语义分割模型包括第一编解码网络和第二编解码网络,所述基于预训练的目标语义分割模型,对所述第一无标数据进行评估,得到所述第一无标数据对应的评估值,包括:
    基于所述第一编解码网络和所述第二编解码网络,分别对所述第一无标数据进行处理,得到所述第一编解码网络输出的第一分割结果图和所述第二编解码网络输出的第二分割结果图;
    基于预设的样本评估模型,处理所述第一分割结果图和所述第二分割结果图,得到至少一个特征值,所述特征值表征所述第一无标数据在对应的评估维度下的评估结果;
    对所述至少一个特征值进行加权融合,得到所述第一无标数据对应的评估值。
  3. 根据权利要求2所述的方法,其中,所述特征值包括以下至少一种:
    信息熵评估值、难度评估值、多样性评估值、一致性评估值;
    其中,所述信息熵评估值用于表征所述第一无标数据中的信息量大小;
    所述难度评估值用于表征所述目标语义分割模型对所述第一无标数据的预测难度;
    所述多样性评估值用于表征所述第一分割结果图和所述第二分割结果图之间的预测差异;
    所述一致性评估值用于表征所述第一分割结果图和所述第二分割结果图之间的散度距离。
  4. 根据权利要求2或3所述的方法,其中,所述对所述至少一个特征值进行加权融合,得到所述第一无标数据对应的评估值,包括:
    获取各所述特征值对应的加权系数,所述加权系数是基于所述目标语义分割模型对应的交叉熵损失的变化量确定的;
    根据各所述加权系数,计算各所述特征值的加权和,得到所述第一无标数据对应的评估值。
  5. 根据权利要求1至4中任一项所述的方法,其中,所述基于所述目标有标数据对所述目标语义分割模型进行优化,得到优化语义分割模型,包括:
    获取与所述目标有标数据数量相同的第二无标数据;
    通过所述目标有标数据和所述第二无标数据,对所述图像语义分割模型进行半监督训练,得到所述优化语义分割模型。
  6. 根据权利要求5所述的方法,其中,所述目标语义分割模型包括第一编解码网络和第二编解码网络,所述通过所述目标有标数据和所述第二无标数据,对所述图像语义分割模型进行半监督训练,得到所述优化语义分割模型,包括:
    将所述目标有标数据和对应的所述第二无标数据输入所述第一编解码网络,得到所述第一编解码网络输出的第一有标分割结果图和第一无标分割结果图;
    将所述目标有标数据和对应的所述第二无标数据输入所述第二编解码网络,得到所述第二编解码网络输出的第二有标分割结果图和第二无标分割结果图;
    基于所述第一有标分割结果图和所述第二有标分割结果图,得到监督损失;
    基于所述第一无标分割结果图和所述第二无标分割结果图,得到无监督损失;
    基于所述监督损失和所述无监督损失对所述目标语义分割模型进行训练,得到所述优化语义分割模型。
  7. 根据权利要求6所述的方法,其中,所述基于所述第一有标分割结果图和所述第二有标分割结果图,得到监督损失,包括:
    基于预设的交叉熵损失函数,计算所述第一有标分割结果图和所述第二有标分割结果图,得到所述监督损失;
    所述基于所述第一无标分割结果图和所述第二无标分割结果图,得到无监督损失,包括:
    基于预设的一致性正则损失函数,计算所述第一无标分割结果图和所述第二无标分割结果图,得到所述无监督损失。
  8. 根据权利要求6或7所述的方法,其中,所述第一编解码网络具有第一网络参数,所述第二编解码网络具有第二网络参数,所述监督损失包括所述第一编解码网络对应的第一监督损失和所述第二编解码网络对应的第二监督损失,所述基于所述监督损失和所述无监督损失对所述目标语义分割模型进行训练,得到所述优化语义分割模型,包括:
    基于所述第一监督损失,对所述第一网络参数进行优化,得到第一优化参数;
    基于所述第二监督损失,对所述第二网络参数进行优化,得到第二优化参数;
    基于所述无监督损失,对所述第一优化参数和所述第二优化参数进行优化,得到所述第一网络参数对应的第三优化参数,以及所述第二网络参数对应的第四优化参数;
    基于所述第三优化参数和所述第四优化参数,得到所述优化语义分割模型。
  9. 一种图像语义分割模型优化装置,包括:
    评估模块,用于获取第一无标数据,并基于预训练的目标语义分割模型,对所述第一无标数据进行评估,得到所述第一无标数据对应的评估值,所述评估值表征利用所述第一无标数据对所述目标语义分割模型进行训练的有效性;
    确定模块,用于根据所述第一无标数据的评估值,确定目标无标数据,并生成所述目标无标数据对应的目标有标数据;
    优化模块,用于基于所述目标有标数据对所述目标语义分割模型进行优化,得到优化语义分割模型。
  10. 一种电子设备,包括:处理器,以及与所述处理器通信连接的存储器;
    所述存储器存储计算机执行指令;
    所述处理器执行所述存储器存储的计算机执行指令,以实现如权利要求1至8中任一项所述的图像语义分割模型优化方法。
  11. 一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机执行指令,当处理器执行所述计算机执行指令时,实现如权利要求1至8中任一项所述的图像语义 分割模型优化方法。
  12. 一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时实现如权利要求1至8中任一项所述的图像语义分割模型优化方法。
  13. 一种计算机程序,所述计算机程序用于实现如权利要求1至8中任一项所述的图像语义分割模型优化方法。
PCT/CN2023/103988 2022-07-06 2023-06-29 图像语义分割模型优化方法、装置、电子设备及存储介质 WO2024007958A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210797439.6 2022-07-06
CN202210797439.6A CN117409194A (zh) 2022-07-06 2022-07-06 图像语义分割模型优化方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2024007958A1 true WO2024007958A1 (zh) 2024-01-11

Family

ID=89454278

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/103988 WO2024007958A1 (zh) 2022-07-06 2023-06-29 图像语义分割模型优化方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN117409194A (zh)
WO (1) WO2024007958A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381098A (zh) * 2020-11-19 2021-02-19 上海交通大学 基于目标分割领域自学习的半监督学习方法和系统
US20220189149A1 (en) * 2019-09-27 2022-06-16 Fujifilm Corporation Information processing apparatus, method for operating information processing apparatus, and operating program of information processing apparatus
WO2022127071A1 (zh) * 2020-12-18 2022-06-23 上海商汤智能科技有限公司 网络训练方法、图像分割方法、装置、设备、介质及产品

Also Published As

Publication number Publication date
CN117409194A (zh) 2024-01-16

Similar Documents

Publication Publication Date Title
CN111476309B (zh) 图像处理方法、模型训练方法、装置、设备及可读介质
EP3796189A1 (en) Video retrieval method, and method and apparatus for generating video retrieval mapping relationship
WO2020107624A1 (zh) 信息推送方法、装置、电子设备及计算机可读存储介质
WO2020228405A1 (zh) 图像处理方法、装置及电子设备
CN112149699B (zh) 用于生成模型的方法、装置和用于识别图像的方法、装置
CN112364829B (zh) 一种人脸识别方法、装置、设备及存储介质
CN113327599B (zh) 语音识别方法、装置、介质及电子设备
CN111783712A (zh) 一种视频处理方法、装置、设备及介质
CN117437516A (zh) 语义分割模型训练方法、装置、电子设备及存储介质
CN113449070A (zh) 多模态数据检索方法、装置、介质及电子设备
CN115905622A (zh) 视频标注方法、装置、设备、介质及产品
WO2024012255A1 (zh) 语义分割模型训练方法、装置、电子设备及存储介质
CN109359727B (zh) 神经网络的结构确定方法、装置、设备及可读介质
CN114880513A (zh) 一种目标检索方法及相关装置
CN114626551A (zh) 文本识别模型的训练方法、文本识别方法及相关装置
CN114139703A (zh) 知识蒸馏方法及装置、存储介质及电子设备
CN113140012A (zh) 图像处理方法、装置、介质及电子设备
WO2024007958A1 (zh) 图像语义分割模型优化方法、装置、电子设备及存储介质
CN113239215B (zh) 多媒体资源的分类方法、装置、电子设备及存储介质
CN116992947A (zh) 模型训练方法、视频查询方法和装置
CN115937020A (zh) 图像处理方法、装置、设备、介质和程序产品
CN114330239A (zh) 文本处理方法及装置、存储介质及电子设备
CN116883708A (zh) 图像分类方法、装置、电子设备及存储介质
CN110704679B (zh) 视频分类方法、装置及电子设备
CN111339367B (zh) 视频处理方法、装置、电子设备及计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23834725

Country of ref document: EP

Kind code of ref document: A1