TWI778895B - Saliency detection method and model training method, equipment and computer readable storage medium - Google Patents


Info

Publication number
TWI778895B
TWI778895B (Application TW110147598A)
Authority
TW
Taiwan
Prior art keywords
sample image
saliency
image
detection model
target
Prior art date
Application number
TW110147598A
Other languages
Chinese (zh)
Other versions
TW202303446A (en)
Inventor
秦梓鵬
黃健文
黃展鵬
Original Assignee
大陸商深圳市慧鯉科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大陸商深圳市慧鯉科技有限公司
Application granted
Publication of TWI778895B publication Critical patent/TWI778895B/en
Publication of TW202303446A publication Critical patent/TW202303446A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a saliency detection method, together with a training method for a saliency detection model, a device, and a computer-readable storage medium. The training method of the saliency detection model includes: acquiring at least one sample image, wherein the at least one sample image includes a target sample image belonging to a preset image type; filtering the target sample image based on missing contours of the salient region in the target sample image; detecting the filtered sample images with the saliency detection model to obtain predicted position information of the salient region in each sample image; and adjusting the parameters of the saliency detection model based on the annotated position information and the predicted position information of the salient region of the sample image. By first filtering the sample images and then training on the retained sample images, the above solution improves the accuracy of the saliency detection model's output.

Description

Saliency detection method and model training method, device, and computer-readable storage medium

The present invention relates to the field of image processing, and in particular to a saliency detection method, a training method for a saliency detection model, a device, and a computer-readable storage medium.

At present, model training simply draws a certain number of sample images from a sample image database and trains the model on them directly. However, some sample images are themselves defective; if such images are used for training, the trained model will process images with low accuracy.

Embodiments of the present invention provide at least a saliency detection method, a training method for a saliency detection model, a device, and a computer-readable storage medium.

Embodiments of the present invention provide a training method for a saliency detection model, including: acquiring at least one sample image, wherein the at least one sample image includes a target sample image belonging to a preset image type; filtering the target sample image based on missing contours of the salient region in the target sample image; detecting the filtered sample images with the saliency detection model to obtain predicted position information of the salient region in each sample image; and adjusting the parameters of the saliency detection model based on the annotated position information and the predicted position information of the salient region of the sample image.

Therefore, by filtering the acquired target sample images of the preset image type according to the missing contours of their salient regions, the salient regions in the retained sample images are relatively complete. Training the saliency detection model on these retained, higher-quality sample images makes the trained model's subsequent detection results more accurate.

In some embodiments, filtering the target sample image based on the missing contour of its salient region includes: filling the contour of the salient region in the target sample image to obtain a filled sample image; obtaining the difference between the filled sample image and the target sample image with respect to the salient region; and filtering out the target sample image if the difference meets a preset requirement.

Therefore, filtering the sample images according to missing contours leaves sample images whose salient-region contours are of higher quality. In addition, the missing contour of the salient region can be determined quickly from the difference between the filled sample image and the target sample image.

In some embodiments, the preset requirement is that the difference is greater than a preset difference value. Filling the contour of the salient region in the target sample image to obtain the filled sample image includes: performing a morphological closing operation on the target sample image. Obtaining the difference between the filled sample image and the target sample image with respect to the salient region includes: obtaining a first area of the salient region in the filled sample image and a second area of the salient region in the target sample image, and determining the difference between the first area and the second area as the difference.

Therefore, if the contour of the salient region in the target sample image has a large gap, the area of the salient region before and after filling may differ considerably; whether the contour of the salient region is missing can thus be determined from the area difference before and after filling.
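The area-difference filter above can be sketched as follows. This is a minimal pure-NumPy sketch assuming binary salient-region masks; the 3×3 structuring element, the number of closing iterations, and the 50-pixel threshold are illustrative assumptions, not values given in the description.

```python
import numpy as np

def dilate(mask: np.ndarray) -> np.ndarray:
    """Binary dilation with a 3x3 structuring element (pure NumPy)."""
    p = np.pad(mask, 1, mode="constant")
    out = np.zeros_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= p[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask: np.ndarray) -> np.ndarray:
    """Binary erosion as the complement of dilating the complement."""
    return ~dilate(~mask)

def close(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Morphological closing: dilation then erosion, which fills small
    gaps in the salient region's contour."""
    for _ in range(iterations):
        mask = dilate(mask)
    for _ in range(iterations):
        mask = erode(mask)
    return mask

def keep_sample(mask: np.ndarray, max_area_diff: int = 50) -> bool:
    """Filter rule of the embodiment: compare the salient-region area
    before (second area) and after (first area) filling, and discard the
    sample if the difference exceeds a preset value (hypothetical 50 px)."""
    filled = close(mask, iterations=2)
    first_area = int(filled.sum())   # area after contour filling
    second_area = int(mask.sum())    # area before filling
    return (first_area - second_area) <= max_area_diff
```

A mask with a badly broken contour gains a lot of area when closed, so it is filtered out; a complete region is nearly unchanged and is kept.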

In some embodiments, after filtering the target sample image based on the missing contour of its salient region, the method further includes: obtaining the annotated position information of the salient region of the target sample image based on the position information of the salient region in the filled sample image.

Therefore, determining the annotated position information of the salient region of the target sample image from the position information of the salient region in the filled sample image ensures the completeness of the salient region.

In some embodiments, the at least one sample image includes multiple image types.

Therefore, training the saliency detection model on sample images of multiple image types enables the trained model to process multiple types of images, improving its applicability.

In some embodiments, the multiple image types include at least two of: images captured of real objects, hand-drawn drawings, and cartoon drawings.

Therefore, using sample images of common image types to train the image processing model makes the trained model more applicable in daily life and work.

In some embodiments, adjusting the parameters of the saliency detection model based on the annotated and predicted position information of the salient region of the sample image includes: obtaining a first loss for each pixel of the sample image based on the annotated and predicted position information; weighting the first losses of the pixels to obtain a second loss for the sample image; and adjusting the parameters of the saliency detection model based on the second loss.

Therefore, weighting the first loss of each pixel makes adjusting the parameters of the saliency detection model with the weighted second loss more accurate.

In some embodiments, the weight of a pixel's first loss is related to the pixel's boundary distance, i.e., the distance between the pixel and the boundary of the true salient region, where the true salient region is the salient region of the sample image defined by the annotated position information.

Therefore, determining the weights from the pixels' boundary distances makes adjusting the parameters of the saliency detection model with the weighted second loss more accurate.

In some embodiments, the smaller a pixel's boundary distance, the greater the weight of its first loss.

Therefore, a pixel's boundary distance is negatively correlated with the weight of its first loss, making the resulting second loss more accurate.
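The boundary-weighted loss described above can be sketched as follows, assuming a binary ground-truth mask and per-pixel binary cross-entropy as the first loss. The chessboard-distance approximation and the 1/(1+d) weight (which shrinks as the boundary distance grows, matching the stated negative correlation) are hypothetical choices, not formulas from the description.

```python
import numpy as np

def _dilate(mask: np.ndarray) -> np.ndarray:
    """3x3 binary dilation (pure NumPy)."""
    p = np.pad(mask, 1, mode="constant")
    out = np.zeros_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= p[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def boundary_distance(gt: np.ndarray, max_d: int = 10) -> np.ndarray:
    """Chessboard distance from every pixel to the boundary of the true
    salient region, grown by repeated dilation and capped at max_d."""
    p = np.pad(gt, 1, mode="edge")
    interior = (p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:])
    boundary = gt & ~interior          # foreground pixels touching background
    dist = np.full(gt.shape, float(max_d))
    reached = boundary.copy()
    for d in range(max_d):
        dist = np.where(reached & (dist > d), float(d), dist)
        reached = _dilate(reached)
    return dist

def weighted_loss(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Second loss of the embodiment: the per-pixel first loss (binary
    cross-entropy) weighted so pixels nearer the true boundary weigh more."""
    g = gt.astype(np.float64)
    p = np.clip(pred, eps, 1.0 - eps)
    first_loss = -(g * np.log(p) + (1.0 - g) * np.log(1.0 - p))
    weights = 1.0 / (1.0 + boundary_distance(gt))  # hypothetical weighting
    return float((weights * first_loss).sum() / weights.sum())
```

Pixels on the annotated boundary get weight 1, and the weight decays with distance, so boundary errors dominate the second loss.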

In some embodiments, the saliency detection model satisfies at least one of the following: the saliency detection model uses the MobileNetV3 network structure; the saliency detection model includes a feature extraction sub-network, a first detection sub-network, and a second detection sub-network. Detecting the filtered sample images with the saliency detection model to obtain the predicted position information of the salient region includes: performing feature extraction on the sample image with the feature extraction sub-network to obtain a feature map corresponding to the sample image; performing initial detection on the feature map with the first detection sub-network to obtain initial position information of the salient region in the sample image; fusing the feature map with the initial position information to obtain a fusion result; and performing final detection on the fusion result with the second detection sub-network to obtain the predicted position information of the sample image.

Therefore, because the MobileNetV3 network structure is simple, using it speeds up detection and allows devices with limited processing power to run the saliency detection model. In addition, performing initial detection on the feature map with the first detection sub-network and then final detection on the initial result with the second detection sub-network improves detection accuracy.
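The two-stage data flow (feature extraction, initial detection, fusion, final detection) can be sketched as follows. The sub-networks here are random-projection stubs standing in for learned convolutional networks (e.g. a MobileNetV3 backbone); only the wiring between the stages follows the description.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_extractor(image: np.ndarray) -> np.ndarray:
    """Map an HxWx3 image to an HxWxC feature map (stub: random projection)."""
    w = rng.standard_normal((3, 8))
    return np.tanh(image @ w)

def first_detector(features: np.ndarray) -> np.ndarray:
    """Initial per-pixel saliency estimate from the feature map."""
    w = rng.standard_normal(features.shape[-1])
    return 1.0 / (1.0 + np.exp(-(features @ w)))

def second_detector(fused: np.ndarray) -> np.ndarray:
    """Final per-pixel saliency estimate from the fused input."""
    w = rng.standard_normal(fused.shape[-1])
    return 1.0 / (1.0 + np.exp(-(fused @ w)))

def predict(image: np.ndarray) -> np.ndarray:
    """Two-stage detection of the embodiment:
    features -> initial detection -> fusion -> final detection."""
    features = feature_extractor(image)                          # HxWxC
    initial = first_detector(features)                           # HxW
    fused = np.concatenate([features, initial[..., None]], -1)   # HxW(C+1)
    return second_detector(fused)                                # HxW in (0, 1)
```

Fusing the initial saliency map back with the feature map lets the second stage refine the first stage's estimate instead of detecting from scratch.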

In some embodiments, before detecting the filtered sample images with the saliency detection model to obtain the predicted position information of the salient region, the method further includes: performing data augmentation on the filtered sample images, where the data augmentation includes filling the background region of the sample image outside the salient region.

Therefore, performing data augmentation on the sample images improves the applicability of the saliency detection model.
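The background-filling augmentation can be sketched as follows, assuming a binary salient-region mask; filling the background with a random solid colour is a hypothetical choice of fill content, not one specified by the description.

```python
import numpy as np

def fill_background(image: np.ndarray, mask: np.ndarray,
                    rng: np.random.Generator) -> np.ndarray:
    """Data augmentation of the embodiment: keep the salient region of the
    sample image and fill everything outside it (the background) with new
    content - here a random solid colour."""
    colour = rng.integers(0, 256, size=3, dtype=np.uint8)
    # Where the mask is True keep the original pixels; elsewhere use the fill.
    return np.where(mask[..., None], image, colour).astype(np.uint8)
```

Varying only the background teaches the model that saliency depends on the object, not on its surroundings.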

An embodiment of the present invention provides a saliency detection method, including: acquiring an image to be processed; and processing the image to be processed with a saliency detection model to obtain predicted position information of the salient region in the image content, where the saliency detection model is trained by the training method of the saliency detection model described above.

Therefore, detecting the image to be processed with a saliency detection model trained by the above training method improves the accuracy of the predicted position information of the salient region.

In some embodiments, after processing the image to be processed with the saliency detection model to obtain the predicted position information of the salient region, the method further includes: performing skeleton extraction on the salient region using the predicted position information to obtain a target skeleton; selecting a skeleton model for the target skeleton as a source skeleton; and transferring first animation driving data associated with the source skeleton to the target skeleton to obtain second animation driving data for the target skeleton.

Therefore, performing skeleton extraction on the salient region using the predicted position information improves the accuracy of the target skeleton.
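The retargeting step can be sketched as follows. Representing animation driving data as per-joint keyframe lists, and copying frames only for joints shared between the source skeleton and the extracted target skeleton, are illustrative assumptions, not details from the description.

```python
def retarget(first_animation: dict, target_joints: list) -> dict:
    """Transfer the source skeleton's driving data (first animation data)
    to the target skeleton, keeping only joints the target actually has;
    the result is the target skeleton's second animation driving data."""
    return {joint: frames
            for joint, frames in first_animation.items()
            if joint in target_joints}
```

Because the target skeleton was extracted from the detected salient region, the joint set may differ from the source model's; dropping unmatched joints is the simplest hypothetical resolution.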

An embodiment of the present invention provides a training device for a saliency detection model, including: a first acquisition module configured to acquire at least one sample image, where the at least one sample image includes a target sample image belonging to a preset image type; a screening module configured to filter the target sample image based on the missing contour of its salient region; a first detection module configured to detect the filtered sample images with the saliency detection model to obtain predicted position information of the salient region in the sample image; and an adjustment module configured to adjust the parameters of the saliency detection model based on the annotated and predicted position information of the salient region of the sample image.

In some embodiments, the screening module is configured to filter the target sample image based on the missing contour of its salient region by: filling the contour of the salient region in the target sample image to obtain a filled sample image; obtaining the difference between the filled sample image and the target sample image with respect to the salient region; and filtering out the target sample image if the difference meets a preset requirement.

In some embodiments, the preset requirement is that the difference is greater than a preset difference value. The screening module is configured to fill the contour of the salient region by performing a morphological closing operation on the target sample image to obtain the filled sample image, and to obtain the difference by: obtaining a first area of the salient region in the filled sample image and a second area of the salient region in the target sample image, and taking the difference between the first area and the second area as the difference.

In some embodiments, after filtering the target sample image based on the missing contour of its salient region, the screening module is further configured to: obtain the annotated position information of the salient region of the target sample image based on the position information of the salient region in the filled sample image.

In some embodiments, the at least one sample image includes multiple image types.

In some embodiments, the multiple image types include at least two of: images captured of real objects, hand-drawn drawings, and cartoon drawings.

In some embodiments, the adjustment module is configured to adjust the parameters of the saliency detection model by: obtaining a first loss for each pixel of the sample image based on the annotated and predicted position information; weighting the first losses of the pixels to obtain a second loss for the sample image; and adjusting the parameters of the saliency detection model based on the second loss.

In some embodiments, the weight of a pixel's first loss is related to the pixel's boundary distance, i.e., the distance between the pixel and the boundary of the true salient region, where the true salient region is the salient region of the sample image defined by the annotated position information.

In some embodiments, the smaller a pixel's boundary distance, the greater the weight of its first loss.

In some embodiments, the saliency detection model satisfies at least one of the following: the saliency detection model uses the MobileNetV3 network structure; the saliency detection model includes a feature extraction sub-network, a first detection sub-network, and a second detection sub-network. The first detection module is configured to detect the filtered sample images with the saliency detection model to obtain the predicted position information of the salient region by: performing feature extraction on the sample image with the feature extraction sub-network to obtain a feature map corresponding to the sample image; performing initial detection on the feature map with the first detection sub-network to obtain initial position information of the salient region in the sample image; fusing the feature map with the initial position information to obtain a fusion result; and performing final detection on the fusion result with the second detection sub-network to obtain the predicted position information of the sample image.

In some embodiments, before the first detection module detects the filtered sample images with the saliency detection model to obtain the predicted position information of the salient region, the screening module is further configured to: perform data augmentation on the filtered sample images, where the data augmentation includes filling the background region of the sample image outside the salient region.

An embodiment of the present invention provides a saliency detection device, including: a second acquisition module configured to acquire an image to be processed; and a second detection module configured to process the image to be processed with a saliency detection model to obtain predicted position information of the salient region in the image content, where the saliency detection model is trained by the training method of the saliency detection model described above.

In some embodiments, after the image to be processed is processed with the saliency detection model to obtain the predicted position information of the salient region, the saliency detection device further includes a function module configured to: perform skeleton extraction on the salient region using the predicted position information to obtain a target skeleton; select a skeleton model for the target skeleton as a source skeleton; and transfer first animation driving data associated with the source skeleton to the target skeleton to obtain second animation driving data for the target skeleton.

An embodiment of the present invention provides an electronic device including a memory and a processor, the processor being configured to execute program instructions stored in the memory to implement the above training method for a saliency detection model and/or the above saliency detection method.

An embodiment of the present invention provides a computer-readable storage medium storing program instructions that, when executed by a processor, implement the above training method for a saliency detection model and/or the above saliency detection method.

An embodiment of the present invention further provides a computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor of the electronic device executes the training method for a saliency detection model and/or the saliency detection method of any of the above embodiments.

Embodiments of the present invention provide at least a saliency detection method, a training method for a saliency detection model, a device, and a computer-readable storage medium. By filtering the acquired target sample images of the preset image type according to the missing contours of their salient regions, the salient regions in the retained sample images are relatively complete; training the saliency detection model on these retained, higher-quality sample images makes the trained model's subsequent detection results more accurate.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present invention.

To make the above objects, features, and advantages of the present invention more apparent and comprehensible, preferred embodiments are described in detail below in conjunction with the accompanying drawings.

201: Sample image acquisition terminal

202: Network

203: Control terminal

30: Training device for the saliency detection model

31: First acquisition module

32: Screening module

33: First detection module

34: Adjustment module

40: Saliency detection device

41: Second acquisition module

42: Second detection module

50: Electronic device

51: Memory

53: Processor

60: Computer-readable storage medium

61: Program instructions

S11~S14, S21~S22: Steps

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the technical solutions of the present invention.

FIG. 1 is a schematic flowchart of an embodiment of the training method for a saliency detection model according to an embodiment of the present invention; FIG. 2 is a schematic diagram of a system architecture to which the training method for a saliency detection model of an embodiment of the present invention can be applied; FIG. 3 is a schematic diagram of an image captured of a target, shown in an embodiment of the training method for a saliency detection model of the present invention; FIG. 4 is a schematic diagram of a hand-drawn drawing shown in an embodiment of the training method for a saliency detection model of the present invention; FIG. 5 is a schematic diagram of a cartoon drawing shown in an embodiment of the training method for a saliency detection model of the present invention; FIG. 6 is a schematic diagram of a hand-drawn drawing with a missing salient region, shown in an embodiment of the training method for a saliency detection model of the present invention; FIG. 7 is a schematic diagram of a filled hand-drawn drawing shown in an embodiment of the training method for a saliency detection model of the present invention; FIG. 8 is a schematic diagram of a sample image shown in an embodiment of the training method for a saliency detection model of the present invention; FIG. 9 is a schematic diagram of a saliency map shown in an embodiment of the training method for a saliency detection model of the present invention; FIG. 10 is a schematic flowchart of an embodiment of the saliency detection method of the present invention; FIG. 11 is a first schematic diagram of a mapping relationship shown in an embodiment of the saliency detection method of the present invention; FIG. 12 is a second schematic diagram of a mapping relationship shown in an embodiment of the saliency detection method of the present invention; FIG. 13 is a third schematic diagram of a mapping relationship shown in an embodiment of the saliency detection method of the present invention; FIG. 14 is a schematic structural diagram of an embodiment of the training device for a saliency detection model of the present invention; FIG. 15 is a schematic structural diagram of an embodiment of the saliency detection device of the present invention; FIG. 16 is a schematic structural diagram of an embodiment of the electronic device of the present invention; FIG. 17 is a schematic structural diagram of an embodiment of the computer-readable storage medium of the present invention.

The solutions of the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

In the following description, specific details such as particular system structures, interfaces, and techniques are set forth for the purpose of illustration rather than limitation, in order to provide a thorough understanding of the present invention.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects. Furthermore, "multiple" herein means two or more. The term "at least one" herein means any one of multiple items, or any combination of at least two of multiple items; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.

The present invention can be applied to a device with image processing capability. The device may have an image capture or video capture function; for example, it may include components such as a camera for capturing images or videos. Alternatively, the device may obtain the required video stream or image from other devices through data transmission or data interaction, or access the required video stream or image in the storage resources of other devices. In this case, the other devices have image capture or video capture functions and a communication connection with the device; for example, the device may transmit or exchange data with the other devices via Bluetooth, a wireless network, or the like. The communication mode between the two is not limited here and may include, but is not limited to, the cases exemplified above. In an implementation, the device may include a mobile phone, a tablet computer, an interactive screen, etc., which are not limited herein.

請參閱圖1,圖1是本發明實施例的顯著性檢測模型的訓練方法一實施例的流程示意圖。所述顯著性檢測模型的訓練方法可以包括如下步驟。 Please refer to FIG. 1. FIG. 1 is a schematic flowchart of an embodiment of a training method for a saliency detection model according to an embodiment of the present invention. The training method of the saliency detection model may include the following steps.

步驟S11:獲取至少一張樣本圖像,其中,至少一張樣本圖像包括屬於預設圖像類型的目標樣本圖像。 Step S11: Acquire at least one sample image, wherein the at least one sample image includes a target sample image belonging to a preset image type.

"At least one" may be one or more. There are several ways to obtain the sample images. For example, the storage location of the sample images in the device executing this training method may be obtained, and the sample images are then read from that storage location; alternatively, the sample images may be obtained from other devices via Bluetooth, a wireless network, or other transmission methods.

步驟S12:基於目標樣本圖像中顯著性區域的輪廓缺失情況,對目標樣本圖像進行過濾。 Step S12: Filtering the target sample image based on the lack of contours of the salient region in the target sample image.

If the missing contour of the salient region in a target sample image satisfies the deletion condition, the target sample image is deleted from the sample images; if it does not satisfy the deletion condition, the target sample image is retained among the sample images. In other words, a target sample image whose contour loss is severe is deleted, while one whose contour loss is slight is retained. Whether the loss is severe or slight may be determined according to the specific situation, and no specific rule is imposed here.

步驟S13:利用顯著性檢測模型對經過濾後的樣本圖像進行檢測,得到樣本圖像中關於顯著性區域的預測位置資訊。 Step S13: Detect the filtered sample image by using the saliency detection model to obtain predicted position information about the saliency area in the sample image.

其中,顯著性檢測模型可以同時對各樣本圖像進行處理,得到一個批次的預測結果,也可以分時對各樣本圖像進行處理,分別得到各樣本圖像對應的預測結果。 The saliency detection model can process each sample image at the same time to obtain a batch of prediction results, or it can process each sample image in a time-sharing manner to obtain prediction results corresponding to each sample image respectively.

步驟S14:基於樣本圖像關於顯著性區域的標注位置資訊與預測位置資訊,調整顯著性檢測模型的參數。 Step S14: Adjust the parameters of the saliency detection model based on the marked position information and predicted position information of the saliency region of the sample image.

其中,可以根據顯著性區域的標注位置資訊與預測位置資訊之間的損失,調整顯著性檢測模型的參數。 The parameters of the saliency detection model can be adjusted according to the loss between the marked position information of the saliency region and the predicted position information.

In the above solution, the acquired target sample images of the preset image type are filtered according to the missing contours of their salient regions, so that the salient regions in the retained sample images are relatively complete. Training the saliency detection model with these retained, higher-quality sample images makes the subsequent detection results of the trained saliency detection model more accurate.

Fig. 2 is a schematic diagram of a system architecture to which the training method of the saliency detection model according to an embodiment of the present invention can be applied. As shown in Fig. 2, the system architecture includes a sample image acquisition terminal 201, a network 202 and a control terminal 203. To support an exemplary application, the sample image acquisition terminal 201 and the control terminal 203 establish a communication connection through the network 202. The sample image acquisition terminal 201 reports at least one sample image to the control terminal 203 through the network 202. The control terminal 203 responds to the target sample image among the at least one sample image, filters the target sample image based on the missing contour of the salient region in the target sample image, and then detects the filtered sample images with the saliency detection model to obtain predicted position information about the salient region in the sample images; finally, the parameters of the saliency detection model are adjusted based on the annotated position information and the predicted position information of the salient region of the sample images. Finally, the control terminal 203 uploads the adjusted parameters to the network 202 and sends them to the sample image acquisition terminal 201 through the network 202.

As an example, the sample image acquisition terminal 201 may include an image acquisition device, and the control terminal 203 may include a visual processing device or a remote server with visual information processing capability. The network 202 may use a wired or wireless connection. When the control terminal 203 is a visual processing device, the sample image acquisition terminal 201 may communicate with the visual processing device through a wired connection, for example data communication over a bus; when the control terminal 203 is a remote server, the sample image acquisition terminal 201 may exchange data with the remote server through a wireless network.

或者,在一些場景中,樣本圖像獲取終端201可以是帶有視頻採集模組的視覺處理設備,可以是帶有攝影頭的主機。這時,本發明實施例的圖像優化模型的訓練方法可以由樣本圖像獲取終端201執行,上述系統架構可以不包含網路202和控制終端203。 Or, in some scenarios, the sample image acquisition terminal 201 may be a visual processing device with a video acquisition module, or a host with a camera. At this time, the training method of the image optimization model according to the embodiment of the present invention may be executed by the sample image acquisition terminal 201 , and the above-mentioned system architecture may not include the network 202 and the control terminal 203 .

In some disclosed embodiments, the at least one sample image includes multiple image types, for example two, three or more. Training the saliency detection model with sample images of multiple image types enables the trained model to process multiple types of images, thereby improving the applicability of the saliency detection model. Optionally, the image types include at least two of: images captured of a target, hand drawings, and cartoon images. Images captured of a target can be further divided into visible-light images, infrared images, and so on. A hand drawing may be a drawing made by hand on paper and then photographed, or a drawing made in drawing software, for example a simple Mickey Mouse drawn by an artist on a drawing tablet. In the embodiments of the present invention, a hand drawing is further restricted to a drawing with a preset background color and a preset foreground color whose foreground consists of monochrome lines; for example, the background is white and the foreground is the outline of Mickey Mouse drawn in black lines. A cartoon image may be a virtual image with multiple foreground colors.

To better understand the images captured of a target, the hand drawings and the cartoon images described in the embodiments of the present invention, please refer to Fig. 3 to Fig. 5. Fig. 3 is a schematic diagram of an image captured of a target shown in an embodiment of the training method of the saliency detection model of the present invention, Fig. 4 is a schematic diagram of a hand drawing shown in an embodiment of the training method of the saliency detection model of the present invention, and Fig. 5 is a schematic diagram of a cartoon image shown in an embodiment of the training method of the saliency detection model of the present invention. Fig. 3 is an image captured of a real apple, Fig. 4 is a sketch of an apple drawn on real paper, and Fig. 5 is a cartoon image of an apple. Using sample images of these common image types to train the saliency detection model makes the trained model more applicable in daily life and work. In the embodiments of the present invention, around ten thousand images captured of targets, around twenty thousand hand drawings and around twenty thousand cartoon images are selected for training.

In some disclosed embodiments, the preset image type is the hand drawing. Since break points are likely to appear while a hand drawing is being drawn, filtering hand drawings according to missing contours leaves hand drawings whose salient-region contours are of better quality. Based on the missing contour of the salient region in the target sample image, the target sample image may be filtered as follows: the contour of the salient region in the target sample image is filled to obtain a filled sample image; then the difference between the salient regions of the filled sample image and the target sample image is obtained. If the contour of the salient region in the target sample image is not missing, or is only slightly missing, the salient region of the filled sample image is the same as that of the target sample image before filling, or the difference is within a preset range. If the contour of the salient region in the target sample image is largely missing, the difference between the salient regions of the filled sample image and the target sample image before filling is large. If the difference meets the preset requirement, the target sample image is filtered out. By obtaining the difference between the salient regions of the filled sample image and the target sample image, the missing contour of the salient region can be determined quickly. Since target sample images with defective salient regions need to be removed from the sample images, in the embodiments of the present invention the preset requirement is that the difference is greater than a preset difference value.

為更好地理解存在缺失的顯著性區域的手繪圖和填補之後的手繪圖之間的差異,請參考圖6和圖7,圖6是本發明顯著性檢測模型的訓練方法一實施例中示出顯著性區域存在缺失的手繪圖的示意圖,圖7是本發明顯著性檢測模型的訓練方法一實施例中示出填補後的手繪圖的示意圖。 In order to better understand the difference between the hand drawing of the missing saliency region and the hand drawing after filling, please refer to Fig. 6 and Fig. 7, Fig. 6 shows the training method of the saliency detection model of the present invention in one embodiment. Fig. 7 is a schematic diagram showing the filled-in hand-drawing in an embodiment of a training method for a saliency detection model of the present invention.

As shown in Fig. 6 and Fig. 7, the contour of the salient region in the hand drawing before filling is a circular arc, and the angle subtended at the center of the circle by its two endpoints is 45°; the area of the salient region can be obtained by connecting the gap with a line segment, giving an area smaller than that of the full circle. After filling, the contour of the salient region is the full circle, and the area of the salient region is the area of the full circle. Clearly, the area of the salient region after filling differs considerably from that before filling; in this case, the hand drawing before filling can be removed and excluded from the training of the model.

The contour of the salient region in the target sample image may be filled by performing a closing operation on the target sample image to obtain the filled sample image. The closing operation first performs a dilation operation on the target sample image and then an erosion (or scaling) operation. A closing operation can fill small holes and bridge small cracks while leaving the overall position and shape unchanged: the dilation operation bridges gaps in the contour of the salient region, and the erosion or scaling operation reduces the thickness of the contour. As mentioned above, a hand drawing may take the form of black lines on a white background, where the salient region of the hand drawing is the area surrounded by the black lines and the contour of the salient region is the black lines themselves. Performing a closing operation on the target sample image may, for example, mean performing a closing operation on the contour of the salient region: the black lines are first dilated, and the dilated black lines are then scaled or eroded, so that the contour thickness of the salient region in the filled sample image is the same as, or differs within a preset range from, that in the target sample image before filling. In this way, the contour-thickness difference between the two can be ignored when obtaining the difference between the salient regions of the filled sample image and the target sample image.

The difference between the salient regions of the filled sample image and the target sample image may be obtained by obtaining a first area of the salient region in the filled sample image and a second area of the salient region in the target sample image. Any common method of measuring the area of a region may be used, and no specific limitation is imposed here. For example, the second area may be obtained by connecting the two ends of the contour gap with a line segment to form a closed region and computing its area. Alternatively, taking each end of the gap as an origin, a horizontal and a vertical straight line may be drawn from each; the four lines may have two intersection points. For each intersection point, the area of the closed region formed by its two lines and the salient region is computed, and the smaller closed area is taken as the second area. The difference between the first area and the second area is taken as the difference; for example, the result of subtracting the first area from the second area is taken as the difference between the salient regions of the filled sample image and the target sample image. In some disclosed embodiments, the difference between the areas occupied by the contours of the salient region before and after filling may be taken as the difference. If the contour of the salient region in the target sample image has a large gap, the areas of the salient region before and after filling may differ considerably; therefore, whether the contour of the salient region in the target sample image is missing can be determined from the area difference of the salient region before and after filling.
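The filtering step above can be sketched on a tiny binary "hand drawing": a closing operation (dilation followed by erosion) bridges the stroke gap, and the salient area before and after filling is compared against a threshold. The 3x3 structuring element, the flood-fill area measure and the preset difference value of 10 are illustrative assumptions, not values prescribed by the text.

```python
import numpy as np

def dilate(img, k=3):
    """Binary dilation with a k x k square structuring element."""
    pad = k // 2
    p = np.pad(img, pad)
    h, w = img.shape
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def erode(img, k=3):
    """Binary erosion with a k x k square structuring element."""
    pad = k // 2
    p = np.pad(img, pad, constant_values=1)
    h, w = img.shape
    out = np.ones_like(img)
    for dy in range(k):
        for dx in range(k):
            out &= p[dy:dy + h, dx:dx + w]
    return out

def closing(img, k=3):
    """Closing = dilation then erosion: bridges small contour gaps."""
    return erode(dilate(img, k), k)

def salient_area(outline):
    """Stroke pixels plus interior pixels not reachable from the border."""
    h, w = outline.shape
    outside = np.zeros((h, w), dtype=bool)
    stack = [(y, x) for y in range(h) for x in (0, w - 1)]
    stack += [(y, x) for y in (0, h - 1) for x in range(w)]
    while stack:  # flood fill of the background from the image border
        y, x = stack.pop()
        if 0 <= y < h and 0 <= x < w and not outside[y, x] and outline[y, x] == 0:
            outside[y, x] = True
            stack += [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
    enclosed = int(((outline == 0) & ~outside).sum())
    return enclosed + int(outline.sum())

# A square outline with a 2-pixel gap in the top stroke, as left by an
# interrupted pen stroke.
img = np.zeros((11, 11), dtype=np.uint8)
img[2, 2:9] = img[8, 2:9] = 1
img[2:9, 2] = img[2:9, 8] = 1
img[2, 4:6] = 0  # the gap

filled = closing(img)
diff = salient_area(filled) - salient_area(img)
discard = diff > 10  # hypothetical preset difference value
```

For an intact outline (no gap), closing leaves the enclosed area unchanged and the difference is 0, so the sample is kept; here the bridged gap seals the interior, the difference is large, and the sample is discarded.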

In some disclosed embodiments, after the target sample images are filtered, the training method of the saliency detection model further includes: obtaining the annotated position information of the salient region of the target sample image based on the position information of the salient region of the filled sample image. For example, the contour of the salient region of the filled sample image is taken as the annotated position information of the contour of the salient region of the target sample image, and the contour together with the area it encloses is taken as the salient region. Determining the annotated position information of the salient region of the target sample image from the position information of the salient region of the filled sample image guarantees the completeness of the salient region.

In some disclosed embodiments, before the filtered sample images are detected with the saliency detection model to obtain the predicted position information about the salient region, the training method of the saliency detection model further includes: performing data augmentation on the filtered sample images. There are various ways of data augmentation, for example filling the background region of the sample image other than the salient region. The filling may use a preset pixel value, for example uniformly filling with the value 0, or uniformly filling with some other pixel value; of course, different pixel positions may also be filled with different pixel values, and no specific rule on the filling method is imposed here. In some disclosed embodiments, data augmentation may also be at least one of adding noise, Gaussian blurring, cropping and rotation. Gaussian blurring, also called Gaussian smoothing, mainly reduces image noise and the level of detail; it adjusts pixel values according to a Gaussian curve to selectively blur the image. Cropping refers to cropping the training sample images into images of different sizes, for example 1024*2048 or 512*512; of course, these sizes are only examples, and images may be cropped to other sizes in other embodiments, so no specific crop size is prescribed here. Rotation may rotate the training sample image by 90°, 180° or 270°. Of course, in other embodiments, data augmentation may also adjust the resolution, and so on. Performing data augmentation on the sample images can improve the applicability of the saliency detection model.
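The augmentations listed above can be sketched as a few small NumPy routines. The background fill value, the noise level, the crop size, and the box blur standing in for a true Gaussian blur are all illustrative choices, not values fixed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def fill_background(img, mask, value=0):
    """Fill every pixel outside the salient-region mask with a preset value."""
    out = img.copy()
    out[mask == 0] = value
    return out

def add_noise(img, sigma=5.0):
    """Additive Gaussian noise, clipped back to the valid pixel range."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def box_blur(img, k=3):
    """Simple box blur as a cheap stand-in for Gaussian smoothing."""
    p = np.pad(img.astype(np.float64), k // 2, mode="edge")
    h, w = img.shape
    acc = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            acc += p[dy:dy + h, dx:dx + w]
    return (acc / (k * k)).astype(np.uint8)

def random_crop(img, size):
    """Crop a size x size window at a random position."""
    h, w = img.shape
    y = int(rng.integers(0, h - size + 1))
    x = int(rng.integers(0, w - size + 1))
    return img[y:y + size, x:x + size]

def rotate(img, quarter_turns):
    """Rotate by 90, 180 or 270 degrees (1, 2 or 3 quarter turns)."""
    return np.rot90(img, quarter_turns)

img = rng.integers(0, 256, (8, 8)).astype(np.uint8)
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1  # toy salient-region mask
```

Each routine returns a new array, so the augmentations can be chained freely when building a training batch.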

In some disclosed embodiments, the saliency detection model uses the MobileNetV3 network structure. The saliency detection model includes a feature extraction sub-network, a first detection sub-network and a second detection sub-network, where the first and second detection sub-networks are cascaded: the output of the first detection sub-network serves as the input of the second detection sub-network. In some embodiments, the first and second detection sub-networks have the same structure. Detecting the filtered sample image with the saliency detection model to obtain the predicted position information about the salient region may proceed as follows: the feature extraction sub-network extracts features from the sample image to obtain a feature map corresponding to the sample image; the first detection sub-network then performs an initial detection on the feature map to obtain initial position information about the salient region in the sample image, which may be presented in the form of a saliency map. The feature map and the initial position information are then fused, for example by multiplying the feature map by the initial position information, to obtain a fusion result. The second detection sub-network performs a final detection on the fusion result to obtain the predicted position information of the sample image; the final predicted position information may also be presented as a saliency map. To better understand the saliency map, please refer to Fig. 8 and Fig. 9: Fig. 8 is a schematic diagram of a sample image shown in an embodiment of the training method of the saliency detection model of the present invention, and Fig. 9 is a schematic diagram of a saliency map shown in an embodiment of the training method of the saliency detection model of the present invention. As shown in Fig. 8 and Fig. 9, the sample image contains a table and a toy duck on the table; the saliency detection model detects the sample image, and the output initial position information (saliency map) is shown in Fig. 9, where the pixel value at the position of the toy duck is 1 and the pixel value elsewhere is 0. The position of the toy duck in the sample image can thus be clearly obtained. Because the MobileNetV3 network structure is simple, using it speeds up detection and allows devices with limited processing power to run the saliency detection model; moreover, performing an initial detection on the feature map with the first detection sub-network and then a final detection on the initial result with the second detection sub-network improves detection accuracy.
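The cascade described above (features, then an initial map, then multiplicative fusion, then a final map) can be sketched shape-wise as follows. The random "backbone" and the 1x1-convolution weights `w1`/`w2` are placeholders for the real MobileNetV3 feature extractor and the two detection sub-networks, not the patent's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 32, 32, 8  # toy spatial size and channel count

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backbone(image):
    """Placeholder for the MobileNetV3 feature-extraction sub-network."""
    return rng.normal(size=(H, W, C))

# Placeholder 1x1-conv weights of the two detection sub-networks.
w1 = rng.normal(0.0, 0.1, size=C)
w2 = rng.normal(0.0, 0.1, size=C)

features = backbone(None)                  # (H, W, C) feature map
initial_map = sigmoid(features @ w1)       # first detection sub-network -> (H, W) saliency map
fused = features * initial_map[..., None]  # fuse: multiply features by the initial map
final_map = sigmoid(fused @ w2)            # second detection sub-network -> final saliency map
```

The fusion is a per-pixel multiplication, so regions the first sub-network already considers non-salient are suppressed before the second sub-network refines the prediction.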

In some disclosed embodiments, the sample images are each processed with the saliency detection model to obtain the predicted position information about the salient region in the sample images, and the parameters of the saliency detection model are adjusted based on the annotated position information and the predicted position information of the salient region as follows.

Several sample images are selected from the multiple sample images as the current sample images, where "several" means one or more: one sample image may be selected from the multiple sample images as the current sample image, or two or more sample images may be selected as the current sample images. In some embodiments, the image types of the selected sample images cover all the image types of the multiple sample images; for example, when the multiple sample images include all three image types mentioned above, the selected sample images also include these three image types. The number of sample images of each image type may be the same or different. The saliency detection model then processes the current sample images to obtain their prediction results; for example, the current sample images are taken as one batch, and the saliency detection model processes this batch of sample images to obtain a batch of prediction results. The parameters of the saliency detection model are then adjusted based on the annotation results and prediction results of the current sample images. Optionally, the parameters of the model may be adjusted using the loss between each annotation result in a batch and its corresponding prediction result, which requires adjusting the parameters several times; alternatively, the losses between all annotation results and their corresponding prediction results may be combined, which requires adjusting the parameters of the model only once. The selection of several sample images from the multiple sample images as the current sample images and the subsequent steps are repeated until the saliency detection model meets a preset requirement. The preset requirement here may be the size of the error between the prediction results given by the model and the annotation results; the specific error size is determined by actual needs and is not prescribed here. Optionally, the several sample images selected each time may partly coincide with those selected the previous time; in other disclosed embodiments, the sample images selected each time are all different. Selecting several sample images from the multiple sample images as the current sample images and processing them with the saliency detection model can increase the training speed.
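The batch-selection rule above (every image type present in the pool appears in the current batch) might be sketched as follows; the `per_type` count, the dictionary-based grouping and the type labels are illustrative assumptions.

```python
import random

def sample_batch(samples, per_type=2, seed=0):
    """Pick a batch that covers every image type present in the sample pool."""
    rng = random.Random(seed)
    by_type = {}
    for s in samples:
        by_type.setdefault(s["type"], []).append(s)
    batch = []
    for imgs in by_type.values():
        batch += rng.sample(imgs, min(per_type, len(imgs)))
    return batch

# Toy pool with the three image types discussed in the text.
pool = (
    [{"type": "photo", "id": i} for i in range(5)]
    + [{"type": "hand_drawing", "id": i} for i in range(5)]
    + [{"type": "cartoon", "id": i} for i in range(5)]
)
batch = sample_batch(pool)
```

In a real training loop, such a batch would be fed to the saliency detection model and the parameters updated from the batch loss, either per sample or once per batch as the text describes.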

一些公開實施例中,樣本圖像的標注資訊還包括樣本圖像的真實圖像類型,樣本圖像的預測結果包括樣本圖像的預測圖像類型。其中,在顯著性檢測模型為目標分類模型的情況下,顯著性檢測模型的預測結果包括目標的預測類別以及樣本圖像的預測圖像類型。在顯著性檢測模型為顯著性檢測模型的情況下,預測位置資訊為樣本圖像中目標的預測類別以及樣本圖像的預測圖像類型。通過使用關於樣本圖像的內容的標注位置資訊與其內容的預測位置資訊,和/或樣本圖像的真實圖像類型以及樣本圖像的預測圖像類型,對顯著性檢測模型的參數進行調整,使得調整之後的顯著性檢測模型的適用性更強。 In some disclosed embodiments, the annotation information of the sample image further includes the real image type of the sample image, and the prediction result of the sample image includes the predicted image type of the sample image. Wherein, when the saliency detection model is a target classification model, the prediction result of the saliency detection model includes the predicted category of the target and the predicted image type of the sample image. When the saliency detection model is a saliency detection model, the predicted position information is the predicted category of the target in the sample image and the predicted image type of the sample image. Adjusting the parameters of the saliency detection model by using annotated location information about the content of the sample image and predicted location information of its content, and/or the real image type of the sample image and the predicted image type of the sample image, This makes the adjusted saliency detection model more applicable.

In some disclosed embodiments, based on the annotated position information and the predicted position information of the salient region of the sample image, the parameters of the saliency detection model may be adjusted as follows: a first loss of each pixel in the sample image is obtained based on the annotated position information and the predicted position information; the first losses of the pixels in the sample image are weighted to obtain a second loss of the sample image; and the parameters of the saliency detection model are adjusted based on the second loss. The first loss may be obtained by taking the difference between the annotated position information and the predicted position information. Weighting the first loss of each pixel makes adjusting the parameters of the saliency detection model with the weighted second loss more accurate.
The weight of a pixel's first loss is related to the pixel's boundary distance, i.e., the distance between the pixel and the boundary of the true salient region, where the true salient region is the region of the sample image defined by the annotated position information. The distance here may be the minimum distance to the boundary of the salient region. For example, if the pixel at the top-left corner of the sample image is at position (0, 0) and the boundary of the true salient region includes (0, 1), (0, 2), and so on, the distance between that pixel and the boundary is 1. Determining the weights from the pixels' boundary distances makes the parameter adjustment driven by the weighted second loss more accurate.

In some embodiments, the smaller a pixel's boundary distance, the larger the weight of its first loss; that is, the weight of a pixel's first loss is negatively correlated with its boundary distance. This negative correlation makes the resulting second loss more accurate.
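The boundary-weighted loss described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the binary ground-truth mask, the absolute-difference first loss, and the `1 / (1 + distance)` weighting are assumptions of this sketch, and the brute-force distance computation is only suitable for small images.

```python
import numpy as np

def boundary_distance_weights(mask, alpha=1.0):
    """Per-pixel weights that decay with distance to the boundary of the true
    salient region, so pixels near the boundary weigh most. `mask` is the
    binary salient region defined by the annotated position information."""
    h, w = mask.shape
    # Boundary pixels: foreground pixels with at least one background 4-neighbour.
    padded = np.pad(mask, 1, constant_values=0)
    neigh_min = np.minimum.reduce([padded[:-2, 1:-1], padded[2:, 1:-1],
                                   padded[1:-1, :-2], padded[1:-1, 2:]])
    boundary = (mask == 1) & (neigh_min == 0)
    by, bx = np.nonzero(boundary)
    ys, xs = np.mgrid[0:h, 0:w]
    # Minimum Euclidean distance from every pixel to the boundary (brute force).
    d = np.sqrt((ys[..., None] - by) ** 2 + (xs[..., None] - bx) ** 2).min(axis=-1)
    return 1.0 / (1.0 + alpha * d)   # negatively correlated with boundary distance

def weighted_second_loss(pred, target, weights):
    first_loss = np.abs(pred - target)          # per-pixel "difference" first loss
    return (weights * first_loss).sum() / weights.sum()
```

A pixel lying on the boundary gets weight 1.0, while a pixel one step inside (distance 1) gets weight 0.5 with the default `alpha`.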

In some disclosed embodiments, the parameters of the saliency detection model may also be adjusted as follows: obtain a third loss based on the true image type and the predicted image type, and then adjust the parameters of the saliency detection model based on the second loss and the third loss. For example, the third loss is obtained from the error between the true image type and the predicted image type. For example, a second loss is determined from the errors between a batch of predicted position information and the corresponding annotation information, and a third loss is determined from the errors between the batch's predicted image types and true image types; the second and third losses are then combined to adjust the parameters of the saliency detection model. Adjusting the parameters using both the second loss, between the annotated and predicted position information of the sample image's content, and the third loss, based on the true and predicted image types, can improve the applicability of the saliency detection model.

For example, the second loss optimizes the model parameters so that the predicted position information produced by the saliency detection model moves closer to the annotated position information, i.e., the error between them shrinks. Adjusting the parameters with the third loss pulls the feature vectors of images that depict the same object but belong to different image types closer together in feature space, so that the feature vectors of images of different image types all lie near one another. For example, the trained saliency detection model extracts feature vectors for a hand drawing of an apple, a cartoon of an apple, and a photograph of an apple that lie close together in feature space.

In some disclosed embodiments, adjusting the parameters based on the second loss and the third loss may proceed as follows: obtain the loss difference between the second loss and the third loss, and then adjust the parameters of the saliency detection model using the loss difference and the third loss. For example, the loss difference is obtained by subtracting the third loss from the second loss. When adjusting the parameters with the loss difference and the third loss, the model parameters may first be adjusted with one of them and then with the other. Adjusting the parameters of the saliency detection model with the loss difference between the second and third losses together with the third loss can improve the applicability of the saliency detection model.

In some disclosed embodiments, the saliency detection model further includes an image-type classification sub-network.

The image-type classification sub-network is connected to the feature extraction sub-network. The sample image is classified by image type using the image-type classification sub-network, yielding the predicted image type of the sample image. In some embodiments, the feature map extracted by the feature extraction sub-network is fed into the image-type classification sub-network to obtain the predicted image type of the sample image. Adjusting the parameters of the saliency detection model with the loss difference and the third loss may then proceed as follows: the third loss is used to adjust the parameters of the image-type classification sub-network, while the loss difference is used to adjust the parameters of the feature extraction sub-network, the first detection sub-network, and the second detection sub-network; both adjustments are applied in the usual (positive) direction. Using the loss difference to adjust the feature extraction sub-network and the two detection sub-networks makes the predicted position information of the sample image's content more accurate, and using the third loss to adjust the image-type classification sub-network improves that sub-network's accuracy.
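The routing of losses described above, where the third loss updates only the image-type classifier and the loss difference updates the feature extractor and detection sub-networks, can be sketched with toy stand-ins. Everything below is an assumption for illustration only: each sub-network is reduced to a bare parameter vector, the two losses are arbitrary quadratic placeholders rather than the patented losses, and gradients are taken numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the trainable parts of the model.
params = {
    "feature_extractor": rng.normal(size=4),
    "detection_heads":   rng.normal(size=4),   # first + second detection sub-networks
    "type_classifier":   rng.normal(size=4),   # image-type classification sub-network
}

def second_loss(p):   # position loss: depends on extractor and detection heads
    return np.sum((p["feature_extractor"] - 1.0) ** 2) + np.sum(p["detection_heads"] ** 2)

def third_loss(p):    # image-type loss: depends on extractor and type classifier
    return np.sum((p["feature_extractor"] + p["type_classifier"]) ** 2)

def numerical_grad(loss_fn, p, name, eps=1e-5):
    # Central-difference gradient of loss_fn with respect to p[name].
    g = np.zeros_like(p[name])
    for i in range(len(g)):
        p[name][i] += eps; hi = loss_fn(p)
        p[name][i] -= 2 * eps; lo = loss_fn(p)
        p[name][i] += eps
        g[i] = (hi - lo) / (2 * eps)
    return g

def training_step(p, lr=0.05):
    # The third loss adjusts only the image-type classification sub-network...
    p["type_classifier"] -= lr * numerical_grad(third_loss, p, "type_classifier")
    # ...while the loss difference (second loss minus third loss) adjusts the
    # feature extraction sub-network and the two detection sub-networks.
    loss_diff = lambda q: second_loss(q) - third_loss(q)
    for name in ("feature_extractor", "detection_heads"):
        p[name] -= lr * numerical_grad(loss_diff, p, name)
```

Descending on the loss difference pushes the feature extractor toward features that keep the position loss low while *raising* the type-classification loss, which is what pulls the features of different image types together.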

In some disclosed embodiments, the trained saliency detection model can be deployed on mobile phones or AR/VR devices for image processing. The saliency detection method can also be applied in software such as camera apps and video-recording filters.

In the above scheme, the acquired target sample images of the preset image type are filtered according to how incomplete the contours of their salient regions are, so that the salient regions in the retained sample images are relatively complete. Training the saliency detection model on these retained, higher-quality sample images makes the trained model's subsequent detection results more accurate.

The training method for the saliency detection model may be executed by a training apparatus for the saliency detection model; for example, it may be executed by a terminal device, a server, or another processing device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, and so on. In some possible implementations, the training method may be implemented by a processor calling computer-readable instructions stored in a memory.

Please refer to FIG. 10, which is a schematic flowchart of an embodiment of the saliency detection method of the present invention. As shown in FIG. 10, the saliency detection method provided by this embodiment of the present invention includes the following steps.

Step S21: acquire the image to be processed.

The image to be processed may be acquired in many ways: for example, it may be captured by a camera component of the device executing the saliency detection method, or obtained from other devices via various communication channels. The image type of the image to be processed may be one of multiple image types; for example, it may be one or more of a photograph of the target, a hand drawing, or a cartoon. In some disclosed embodiments, the image to be processed may also be obtained from a video: for example, a video is fed into the saliency detection model, which takes each video frame of the video as an image to be processed.

Step S22: process the image to be processed with the saliency detection model to obtain the predicted position information of the salient region in the content of the image to be processed, where the saliency detection model is trained with the training method for the saliency detection model.

The saliency detection model in this embodiment of the present invention includes a feature extraction sub-network, a first detection sub-network, and a second detection sub-network, and is trained on sample images of multiple image types. For example, the image to be processed is fed into the saliency detection model through its input, and the model processes it to obtain the predicted position information of the salient region in the image content.

In the above scheme, processing the image to be processed with a saliency detection model trained by the above training method improves the accuracy of image processing.

In some disclosed embodiments, after the image to be processed has been processed by the saliency detection model to obtain the predicted position information of the salient region in its content, the saliency detection method further includes at least one of the following steps.

1. Display the predicted position information on the interface that displays the image to be processed. This can be done in various ways: for example, the predicted position information may be drawn onto the image to be processed so that both are shown together on the display interface, or the image to be processed and its predicted position information may be shown in separate areas of the interface. In some disclosed embodiments, if there are two or more images to be processed, the corresponding images and their predicted position information may be displayed in different areas of the interface, or shown page by page. When the image to be processed is obtained from a video, it is checked whether the predicted position information of a preset number of consecutive video frames is the same; if so, the predicted position information is considered correct, otherwise it is considered incorrect. One may choose to output only the correct predicted position information and suppress the incorrect, or to annotate the correct and incorrect predicted position information accordingly and output both. The preset number of frames may be 5, 10, and so on, depending on the specific usage scenario.
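The consecutive-frame check described above can be sketched as follows. This is a minimal illustration under stated assumptions: predictions are compared by plain equality, whereas a practical system would likely compare predicted regions with an overlap threshold such as IoU.

```python
from collections import deque

def consistency_filter(frame_predictions, window=5):
    """Marks a frame's predicted position information as correct when the
    predictions of `window` consecutive frames agree (equality here is a
    simplifying assumption of this sketch)."""
    recent = deque(maxlen=window)
    results = []
    for pred in frame_predictions:
        recent.append(pred)
        ok = len(recent) == window and all(p == recent[0] for p in recent)
        results.append((pred, ok))
    return results
```

With `window=3`, the third identical prediction in a row is the first one flagged correct; a differing frame resets the check.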

2. Using the predicted position information, perform skeleton extraction on the salient region to obtain a target skeleton, and select a skeleton model for the target skeleton as the source skeleton, where animation driving data is attached to the source skeleton. Then transfer the first animation driving data associated with the source skeleton onto the target skeleton to obtain second animation driving data for the target skeleton. The target skeleton is obtained by skeleton extraction from the target in the image to be processed.

In some disclosed embodiments, using the predicted position information to extract the target skeleton from the salient region may proceed as follows: extract the contour of the salient region to obtain the contour of the target, use that contour to generate a three-dimensional mesh model for the target, and finally extract the target skeleton from the three-dimensional mesh model.

The source skeleton may be obtained as follows: classify the image to be processed to obtain the category of the target object, and select a skeleton model matching that category as the source skeleton, where the target skeleton is the skeleton of the target object. For example, embodiments of the present invention may use predicted-label mapping or dataset-label mapping. With predicted-label mapping, the classification result for the target object includes its predicted skeleton topology type, e.g., biped or quadruped; that is, predicted-label mapping mainly predicts the topological characteristics of the target object's skeleton. Dataset-label mapping, by contrast, must give the specific kind of target object in the input image, e.g., a cat, dog, giant panda, or bear. Embodiments of the present invention use predicted-label mapping. In a concrete application, if the target object is a giant panda, predicted-label mapping yields the category "quadruped", and a skeleton model matching that category is selected as the initial source skeleton, say a four-legged bear. Although a giant panda and a bear differ, they have roughly the same skeleton topology, so transferring the bear's animation driving data onto the giant panda still appears natural and reasonable. In other words, even though predicted-label mapping cannot give the exactly correct category of the target object, this does not affect driving the final target skeleton; and because predicted-label mapping does not determine the specific category of the target object, the computational cost is reduced.

After the source skeleton matching the target skeleton is determined, skeleton-node mapping is performed between the source skeleton and the target skeleton to obtain the node mapping relationship between them. In some disclosed embodiments, this may proceed as follows: determine, for each node of the source and target skeletons, the number of skeleton branches the node belongs to, and map the nodes of the two skeletons in descending order of that branch count. The node belonging to the most skeleton branches is generally called the root node, and for brevity the number of skeleton branches a node belongs to is called its degree; that is, the mapping between the higher-degree nodes of the two skeletons is built first, and then the mapping between the lower-degree nodes. Alternatively, the mapping may follow the principle of minimizing the skeleton-branch mapping error. If the source and target skeletons have different numbers of nodes, the minimal many-to-one mapping with the lowest cost is chosen, for example by performing one-to-one joint matching on the sequences where many-to-one or skipped mappings occur.
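The degree-ordered pairing described above can be sketched as follows. This sketch assumes the two skeletons have the same node count and that pairing nodes rank by rank is sufficient; the patented method additionally handles many-to-one mappings with a minimum-cost matching when the counts differ.

```python
def degree_ordered_mapping(src_adj, tgt_adj):
    """Pairs source and target skeleton nodes in descending order of degree
    (the number of skeleton branches a node belongs to), so the root node,
    which has the highest degree, is mapped first. Skeletons are given as
    adjacency dictionaries {node: [neighbour, ...]}."""
    def by_degree(adj):
        return sorted(adj, key=lambda n: len(adj[n]), reverse=True)
    return list(zip(by_degree(src_adj), by_degree(tgt_adj)))
```

For two star-shaped skeletons, the high-degree centres are paired first and the leaf nodes follow.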

In some disclosed embodiments, the final target skeleton has the same node topology as the final source skeleton; alternatively, the nodes of the final target skeleton map one-to-one onto nodes of the final source skeleton. That is, the node topologies of the final target and final source skeletons can relate in two ways: either they are completely identical, or every node of the final target skeleton has a corresponding node in the final source skeleton while the final source skeleton still contains some nodes for which no mapping has been built. In other words, it must be guaranteed that, after the animation transfer, every node of the final target skeleton has corresponding animation driving data.

After the node mapping relationship between the two skeletons is obtained, topology alignment and node alignment are performed.

Topology alignment may include at least one of the following steps.

First, when multiple nodes of one skeleton map to the same node of the other between the source and target skeletons, the node topology of one of the skeletons is updated so that, after the update, the nodes of the two skeletons map one-to-one. Updating the node topology turns the case where multiple nodes map to the same node into a one-to-one node mapping between the two skeletons, reducing unreasonable results when the animation later drives the final target skeleton.

Updating the node topology of one of the skeletons covers several cases. In the first case, when the multiple nodes lie on the same skeleton branch, the first skeleton, which contains those nodes, is updated, where one of the first and second skeletons is the source skeleton and the other is the target skeleton. Updating the first skeleton turns the many-to-one mapping between the two skeletons into a one-to-one mapping, again reducing unreasonable results when the animation later drives the final target skeleton. Optionally, the multiple nodes of the first skeleton are merged into a single first node, which retains the mapping relationships of the nodes merged into it and whose position is the average of the positions of all merged nodes.

Referring also to FIG. 11, which is a first schematic diagram of a mapping relationship in an embodiment of the saliency detection method of the present invention: as shown in FIG. 11, the second and third nodes of the target skeleton both map to the second node of the source skeleton. In this case, the second and third nodes of the target skeleton are merged into one first node, whose position is the average of the positions of the second and third nodes of the target skeleton. When the first skeleton is the source skeleton, its nodes carry animation driving data, so after the nodes are merged the animation driving data of the first node must be obtained; at this point, the animation driving data of all merged nodes can be merged. For example, animation driving data can generally be represented as a matrix, and matrices are merged by matrix multiplication: multiplying the animation driving data together yields the animation driving data of the first node. In the second case, when the multiple nodes lie on different skeleton branches, the second skeleton, which does not contain those nodes, is updated, where again one of the first and second skeletons is the source skeleton and the other is the target skeleton. Optionally, the second node at which the skeleton branches containing the multiple nodes converge is found in the first skeleton, for example by traversing parent nodes in turn; then the third node mapped to that second node is found in the second skeleton. The node topology corresponding to the multiple nodes is then identified, and at least one skeleton branch is added at the third node. In embodiments of the present invention, the parent of a node is the node on the same skeleton branch that is adjacent to it and closer to the root node. The multiple nodes then map one-to-one onto the nodes of the branch newly added at the third node and of the original branch. The added branch may be a copy of the original branch; the copied content includes the animation data and the transform between each node and its parent. For example, if the original skeleton branch contains three nodes, the added branch also contains three nodes, and the animation driving data of its three nodes is obtained by copying the animation data of the corresponding nodes of the original branch.
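The node merge just described, with positions averaged and animation matrices combined by multiplication, can be sketched as follows. The 4x4 homogeneous-transform representation of the animation driving data and the left-to-right multiplication order are assumptions of this sketch.

```python
import numpy as np

def merge_nodes(positions, anim_mats):
    """Merges several skeleton nodes into one first node: its position is the
    mean of the merged nodes' positions, and its animation driving data is the
    product of the merged nodes' transform matrices."""
    merged_pos = np.mean(positions, axis=0)
    merged_mat = np.eye(4)
    for m in anim_mats:
        merged_mat = merged_mat @ m   # matrix "merge" via multiplication
    return merged_pos, merged_mat
```

For two pure translations, the merged transform is the composed translation, as the multiplication of homogeneous matrices implies.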

Referring also to FIG. 12, which is a second schematic diagram of a mapping relationship in an embodiment of the saliency detection method of the present invention: as shown in FIG. 12, the node topology on the left is that of the source skeleton and the one on the right is that of the target skeleton. In FIG. 12, the first node of the target skeleton maps to the first node of the source skeleton, and the second node of the target skeleton maps to the second node of the source skeleton; below its second node the target skeleton has two branches, a left branch and a right branch, where the first nodes of the left and right branches both map to the third node of the source skeleton, and the second nodes of the left and right branches both map to the fourth node of the source skeleton. Thus two nodes of the target skeleton, on different branches, map to the third node of the source skeleton, and two nodes, also on different branches, map to its fourth node. The two branches converge at the second node of the target skeleton. The node of the source skeleton mapped to the second node of the target skeleton is found to be its own second node. Following the node topology corresponding to these target-skeleton nodes, a new skeleton branch is added at the second node of the source skeleton, and that branch contains two nodes. At this point, every node of the target skeleton corresponds one-to-one with a node of the source skeleton. In this way, while a one-to-one node mapping is achieved, the node topology of the first skeleton is also preserved to the greatest extent.

Second, when a skeleton contains nodes without a mapping relationship, the node topology of the skeleton containing those unmapped nodes is updated, where the two skeletons are the source skeleton and the target skeleton and, after the update, their nodes map one-to-one. Updating the node topology of the skeleton containing unmapped nodes reduces the number of unmapped nodes so that the nodes of the two updated skeletons map one-to-one, reducing unreasonable results when the animation later drives the final target skeleton. Optionally, an unmapped node is merged into an adjacent node that has a mapping relationship, where the adjacent node is the unmapped node's parent or child within its skeleton. In embodiments of the present invention, unmapped nodes are merged into their parent nodes.

Please refer to FIG. 13, which is a third schematic diagram of a mapping relationship in an embodiment of the saliency detection method of the present invention. As shown in FIG. 13, the first node of the target skeleton maps to the first node of the source skeleton, the second node of the target skeleton maps to the third node of the source skeleton, and the third node of the target skeleton maps to the fourth node of the source skeleton. The second node of the source skeleton has no mapping relationship and may be merged into its parent node, i.e., into the first node of the source skeleton. Of course, merging nodes in the source skeleton is always accompanied by merging their animation driving data, which is not repeated here.
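The FIG. 13 procedure, folding an unmapped node into its parent while merging its animation driving data, can be sketched as follows. The parent-dictionary skeleton representation and the 4x4 transform per node are assumptions of this sketch, as is the multiplication order when folding the data into the parent.

```python
import numpy as np

def merge_unmapped_into_parents(parents, mapped, anim):
    """Removes every non-root node without a mapping relationship by merging
    it into its parent: the removed node's children are re-parented, and its
    animation driving data is folded into the parent's by matrix product.
    `parents` maps node -> parent (None for the root), `mapped` is the set of
    nodes that have a mapping, `anim` maps node -> 4x4 transform."""
    for node in [n for n in parents if n not in mapped and parents[n] is not None]:
        p = parents[node]
        for child, par in parents.items():
            if par == node:
                parents[child] = p      # re-parent the removed node's children
        anim[p] = anim[p] @ anim.pop(node)
        del parents[node]
    return parents, anim
```

After the merge, every remaining node has a mapping, and the parent carries the combined animation driving data.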

Node alignment is performed mainly to determine the first pose transform relationship between the source skeleton and the target skeleton.

For example, in order from the root source node to the leaf source nodes, each source node of the final source skeleton is aligned with the target node it maps to in the final target skeleton, yielding the first pose transform relationship between each source node and its mapped target node. As above, the root node is the node belonging to the most skeleton branches; the root source node is the root node of the final source skeleton and, likewise, the root target node is the root node of the final target skeleton. The final source skeleton and final target skeleton are the source and target skeletons after topology alignment. A leaf node is a node that has a parent node but no child nodes; a leaf source node is a leaf node of the final source skeleton, and a leaf target node is a leaf node of the final target skeleton. That is, the root source node is aligned first with the root target node mapped to it; then each leaf source node connected to the root source node is aligned with the leaf target node it maps to, and so on, until every node of the final target skeleton is aligned one-to-one with a node of the final source skeleton. In some disclosed embodiments, the root target node of the final target skeleton may directly serve as the origin of the first coordinate system.

The pose transformation relationship is the transformation relationship, in the first coordinate system, between a source node and its mapped target node. By translating both the root source node of the final source skeleton and the root target node of the final target skeleton to the origin of the first coordinate system, the offset between them can be obtained. For example, for each source node in the final source skeleton, the offset required to align the source node with its mapped target node is obtained, where the offset includes a translation component and a rotation component; in general, the translation component also includes a scaling component. The first pose transformation relationship of the source node is then obtained based on the offset corresponding to that source node.
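To make the offset computation concrete, a deliberately reduced sketch that handles only the translation component (the rotation and scaling components mentioned above are omitted; the function name is an assumption):

```python
import numpy as np

def node_offset(src, tgt, src_root, tgt_root):
    """Translate both skeletons so their root nodes sit at the origin,
    then return the residual translation that aligns a source node with
    its mapped target node. Rotation/scale handling is omitted here."""
    return (tgt - tgt_root) - (src - src_root)
```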

If the topology of the source skeleton changes, the animation data on the source skeleton changes correspondingly. For example, if two source nodes in the source skeleton are merged, the animation data corresponding to those nodes is merged as well.

In this way, the animation data on the source skeleton can be transferred to the target skeleton, so as to drive the target in the image to be processed to move.

Performing at least one of the above steps after the prediction information is obtained improves convenience in use.

Moreover, skeleton extraction is performed on the salient region output by the saliency detection model trained with the above training method to obtain the target skeleton, which makes the obtained target skeleton more accurate.

In the above solution, processing the image to be processed with the saliency detection model trained by the above training method can improve the accuracy of image processing.

The saliency detection method may be executed by a saliency detection apparatus; for example, it may be executed by a terminal device, a server, or another processing device, where the terminal device may be a user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the saliency detection method may be implemented by a processor invoking computer-readable instructions stored in a memory.

Please refer to FIG. 14, which is a schematic structural diagram of an embodiment of a training apparatus for a saliency detection model of the present invention. The training apparatus 30 for the saliency detection model includes a first acquisition module 31, a screening module 32, a first detection module 33, and an adjustment module 34. The first acquisition module 31 is configured to acquire at least one sample image, where the at least one sample image includes a target sample image belonging to a preset image type. The screening module 32 is configured to filter the target sample image based on the absence of the contour of the salient region in the target sample image. The first detection module 33 is configured to detect the filtered sample image with the saliency detection model to obtain predicted position information about the salient region in the sample image. The adjustment module 34 is configured to adjust parameters of the saliency detection model based on annotated position information about the salient region of the sample image and the predicted position information.

In the above solution, the acquired target sample images of the preset image type are filtered according to the absence of the contours of their salient regions, so that the salient regions in the retained sample images are relatively complete. Training the saliency detection model with these retained, higher-quality sample images makes the trained model's subsequent detection results on images more accurate.

In some embodiments, the screening module 32 is configured to filter the target sample image based on the absence of the contour of the salient region in the target sample image by: filling the contour of the salient region in the target sample image to obtain a filled sample image; acquiring the difference regarding the salient region between the filled sample image and the target sample image; and filtering out the target sample image when the difference meets a preset requirement.

In the above solution, sample images are filtered according to the absence of contours, so that the contours of the salient regions in the remaining sample images are of higher quality. In addition, the absence of the contour of the salient region can be determined quickly by acquiring the difference regarding the salient region between the filled sample image and the target sample image.

In some embodiments, the preset requirement is that the difference is greater than a preset difference value. The screening module 32 is configured to fill the contour of the salient region in the target sample image to obtain the filled sample image by performing a morphological closing operation on the target sample image, and to acquire the difference regarding the salient region between the filled sample image and the target sample image by: acquiring a first area of the salient region in the filled sample image and a second area of the salient region in the target sample image, and taking the difference between the first area and the second area as the difference.

In the above solution, if the contour of the salient region in the target sample image has a large gap, the areas of the salient region before and after filling may differ considerably; thus, whether the contour of the salient region in the target sample image is missing can be determined from the area difference before and after filling.
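The closing-then-area-difference filter described above can be sketched as follows. This is an illustrative implementation, not the patent's; the 3x3 structuring element, the border handling, and all names are assumptions (in practice a library call such as OpenCV's `cv2.morphologyEx` with `cv2.MORPH_CLOSE` would typically be used):

```python
import numpy as np

def _shift_or(mask):
    """OR over a 3x3 neighbourhood (binary dilation with a 3x3 kernel)."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def _shift_and(mask):
    """AND over a 3x3 neighbourhood (binary erosion; the border is padded
    with foreground so closing does not erode the image edge)."""
    p = np.pad(mask, 1, constant_values=True)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out &= p[dy:dy + h, dx:dx + w]
    return out

def keep_sample(mask, max_area_diff, iterations=1):
    """Morphological closing (dilate then erode) of the saliency mask,
    then compare the salient area before and after: when closing fills
    more than `max_area_diff` pixels, the contour had a large gap and
    the sample is filtered out (returns False)."""
    closed = mask.astype(bool)
    for _ in range(iterations):
        closed = _shift_or(closed)
    for _ in range(iterations):
        closed = _shift_and(closed)
    area_diff = int(closed.sum()) - int(mask.sum())
    return area_diff <= max_area_diff
```

A mask whose salient region contains a one-pixel gap gains exactly one pixel of area after closing, so it is kept or dropped depending on the preset threshold.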

In some embodiments, after the target sample image is filtered based on the absence of the contour of the salient region in the target sample image, the screening module 32 is further configured to obtain the annotated position information of the target sample image regarding the salient region based on the position information of the salient region of the filled sample image.

In the above solution, determining the annotated position information of the target sample image regarding the salient region from the position information of the salient region of the filled sample image ensures the integrity of the salient region.

In some embodiments, the at least one sample image includes multiple image types.

In the above solution, training the saliency detection model with sample images of multiple image types enables the trained model to process multiple types of images, thereby improving the applicability of the saliency detection model.

In some embodiments, the multiple image types include at least two of images captured of real objects, hand-drawn drawings, and cartoon drawings.

In the above solution, using sample images of common image types to train the image processing model makes the trained model more applicable in daily life and work.

In some embodiments, the adjustment module 34 is configured to adjust the parameters of the saliency detection model based on the annotated position information and the predicted position information of the salient region of the sample image by: obtaining a first loss of each pixel in the sample image based on the annotated position information and the predicted position information; weighting the first losses of the pixels in the sample image to obtain a second loss of the sample image; and adjusting the parameters of the saliency detection model based on the second loss.

In the above solution, weighting the first loss of each pixel makes adjusting the parameters of the saliency detection model with the weighted second loss more accurate.

In some embodiments, the weight of a pixel's first loss is related to the pixel's boundary distance, the boundary distance being the distance between the pixel and the boundary of the true salient region, where the true salient region is the salient region of the sample image defined by the annotated position information.

In the above solution, determining the weights according to the pixels' boundary distances makes adjusting the parameters of the saliency detection model with the weighted second loss more accurate.

In some embodiments, the smaller a pixel's boundary distance, the greater the weight of that pixel's first loss.

In the above solution, a pixel's boundary distance is negatively correlated with the weight of its first loss, which makes the resulting second loss more accurate.
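The boundary-distance-weighted loss described above can be sketched as follows. The patent does not fix the weighting function; the choice `w = 1/(1 + d)` (which decreases with distance, so pixels near the ground-truth boundary weigh more), the brute-force distance computation, and the per-pixel cross-entropy as the "first loss" are all assumptions made for the example:

```python
import numpy as np

def boundary_distance(gt):
    """Per-pixel Euclidean distance to the nearest boundary pixel of the
    ground-truth salient region (brute force; fine for small masks)."""
    g = gt.astype(bool)
    pad = np.pad(g, 1)
    # boundary = foreground pixels with at least one background 4-neighbour
    b = g & ~(pad[:-2, 1:-1] & pad[2:, 1:-1] & pad[1:-1, :-2] & pad[1:-1, 2:])
    ys, xs = np.nonzero(b)
    yy, xx = np.indices(g.shape)
    d = np.sqrt((yy[..., None] - ys) ** 2 + (xx[..., None] - xs) ** 2)
    return d.min(axis=-1)

def weighted_loss(pred, gt, eps=1e-7):
    """Per-pixel binary cross-entropy (the 'first loss'), weighted so that
    pixels close to the ground-truth boundary count more, then averaged
    into a single scalar (the 'second loss')."""
    w = 1.0 / (1.0 + boundary_distance(gt))
    p = np.clip(pred, eps, 1 - eps)
    first = -(gt * np.log(p) + (1 - gt) * np.log(1 - p))
    return (w * first).sum() / w.sum()
```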

In some embodiments, the saliency detection model satisfies at least one of the following: the saliency detection model adopts the network structure of MobileNetV3; the saliency detection model includes a feature extraction sub-network, a first detection sub-network, and a second detection sub-network. The first detection module 33 is configured to detect the filtered sample image with the saliency detection model to obtain the predicted position information about the salient region in the sample image by: performing feature extraction on the sample image with the feature extraction sub-network to obtain a feature map corresponding to the sample image; performing initial detection on the feature map with the first detection sub-network to obtain initial position information about the salient region in the sample image; fusing the feature map with the initial position information to obtain a fusion result; and performing final detection on the fusion result with the second detection sub-network to obtain the predicted position information of the sample image.

In the above solution, because the network structure of MobileNetV3 is simple, using it speeds up detection and allows devices with limited processing power to run the saliency detection model. In addition, performing initial detection on the feature map with the first detection sub-network and then final detection on the initial result with the second detection sub-network improves detection accuracy.
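The two-stage structure (backbone features, initial map, fusion, final map) can be illustrated with a toy NumPy pipeline. This is only a shape-level sketch: the 1x1 channel-mixing stands in for the real MobileNetV3 backbone and the two detection heads, and all weights and names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1x1 convolution as a channel-mixing matmul: x is (C_in, H, W),
    w is (C_out, C_in), result is (C_out, H, W)."""
    return np.tensordot(w, x, axes=([1], [0]))

def detect(image, w_feat, w_head1, w_head2):
    """Two-stage sketch of the pipeline described above: extract features
    (stand-in for the MobileNetV3 backbone), predict an initial saliency
    map, fuse it with the features by channel concatenation, then predict
    the final map with a second head."""
    feats = np.maximum(conv1x1(image, w_feat), 0)          # backbone features
    initial = 1 / (1 + np.exp(-conv1x1(feats, w_head1)))   # first detection head
    fused = np.concatenate([feats, initial], axis=0)       # feature/initial fusion
    final = 1 / (1 + np.exp(-conv1x1(fused, w_head2)))     # second detection head
    return initial[0], final[0]
```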

In some embodiments, before the first detection module 33 detects the filtered sample image with the saliency detection model to obtain the predicted position information about the salient region, the screening module 32 is further configured to perform data augmentation on the filtered sample image, where the data augmentation includes filling the background region of the sample image other than the salient region.

The above solution can improve the applicability of the saliency detection model by performing data augmentation on the sample images.
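The background-filling augmentation mentioned above can be sketched as follows; replacing the background with either a solid colour or random noise is an illustrative choice, not specified by the patent:

```python
import numpy as np

def fill_background(image, mask, fill=None, rng=None):
    """Keep the salient region of an HxWxC uint8 image and replace the
    background (pixels where `mask` is False) with a solid colour, or
    with random noise when `fill` is None. Returns a new image."""
    rng = rng or np.random.default_rng()
    out = image.copy()
    bg = ~mask.astype(bool)
    if fill is None:
        out[bg] = rng.integers(0, 256, size=(int(bg.sum()), image.shape[-1]))
    else:
        out[bg] = fill
    return out
```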

Please refer to FIG. 15, which is a schematic structural diagram of an embodiment of a saliency detection apparatus of the present invention. The saliency detection apparatus 40 includes a second acquisition module 41 and a second detection module 42. The second acquisition module 41 is configured to acquire an image to be processed; the second detection module 42 is configured to process the image to be processed with the saliency detection model to obtain predicted position information about the salient region in the content of the image to be processed, where the saliency detection model is trained by the above training method for the saliency detection model.

In the above solution, detecting the image to be processed with the saliency detection model trained by the training method for the saliency detection model improves the accuracy of the obtained predicted position information about the salient region.

In some embodiments, after the image to be processed is processed with the saliency detection model to obtain the predicted position information about the salient region in its content, the saliency detection apparatus further includes a functional module (not shown) configured to: perform skeleton extraction on the salient region using the predicted position information to obtain a target skeleton; select a skeleton model for the target skeleton as a source skeleton; and transfer first animation driving data related to the source skeleton to the target skeleton to obtain second animation driving data of the target skeleton.

The above solution improves the accuracy of the target skeleton by performing skeleton extraction on the salient region using the predicted position information.
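The skeleton-extraction step can be illustrated with classical morphological thinning, which reduces the salient region's mask to a one-pixel-wide skeleton from which bone nodes could then be sampled. The patent does not specify the extraction algorithm; Zhang-Suen thinning is used here purely as an illustrative assumption:

```python
import numpy as np

def skeletonize(mask):
    """Zhang-Suen thinning of a binary mask: iteratively peel boundary
    pixels (in two alternating sub-passes) until only a thin skeleton
    remains. Returns a boolean array the same shape as `mask`."""
    img = np.pad(mask.astype(np.uint8), 1)

    def neighbours(y, x):
        # P2..P9: 8-neighbourhood starting north, going clockwise
        return [img[y-1, x], img[y-1, x+1], img[y, x+1], img[y+1, x+1],
                img[y+1, x], img[y+1, x-1], img[y, x-1], img[y-1, x-1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_del = []
            ys, xs = np.nonzero(img)
            for y, x in zip(ys, xs):
                P = neighbours(y, x)
                A = sum(P[i] == 0 and P[(i + 1) % 8] == 1 for i in range(8))
                B = sum(P)
                if step == 0:
                    cond = P[0]*P[2]*P[4] == 0 and P[2]*P[4]*P[6] == 0
                else:
                    cond = P[0]*P[2]*P[6] == 0 and P[0]*P[4]*P[6] == 0
                if 2 <= B <= 6 and A == 1 and cond:
                    to_del.append((y, x))
            for y, x in to_del:
                img[y, x] = 0
            changed |= bool(to_del)
    return img[1:-1, 1:-1].astype(bool)
```

In practice a library routine such as `skimage.morphology.skeletonize` would typically be preferred over hand-rolled thinning.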

Please refer to FIG. 16, which is a schematic structural diagram of an embodiment of an electronic device of the present invention. The electronic device 50 includes a memory 51 and a processor 52, and the processor 52 is configured to execute program instructions stored in the memory 51 to implement the steps of any of the above embodiments of the training method for the saliency detection model and/or the steps of the embodiments of the saliency detection method. In one implementation scenario, the electronic device 50 may include, but is not limited to, medical equipment, a microcomputer, a desktop computer, or a server; in addition, the electronic device 50 may also include mobile devices such as a notebook computer or a tablet computer, which is not limited here.

The processor 52 is configured to control itself and the memory 51 to implement the steps of any of the above embodiments of the training method for the saliency detection model. The processor 52 may also be called a CPU (Central Processing Unit). The processor 52 may be an integrated circuit chip with signal processing capability. The processor 52 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. In addition, the processor 52 may be jointly implemented by integrated circuit chips.

In the above solution, the acquired target sample images of the preset image type are filtered according to the absence of the contours of their salient regions, so that the salient regions in the retained sample images are relatively complete. Training the saliency detection model with these retained, higher-quality sample images makes the trained model's subsequent detection results on images more accurate.

Please refer to FIG. 17, which is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present invention. The computer-readable storage medium 60 stores program instructions 61 executable by a processor, and the program instructions 61 are used to implement the steps of any of the above embodiments of the training method for the saliency detection model and/or the steps of the embodiments of the saliency detection method.

In the above solution, the acquired target sample images of the preset image type are filtered according to the absence of the contours of their salient regions, so that the salient regions in the retained sample images are relatively complete. Training the saliency detection model with these retained, higher-quality sample images makes the trained model's subsequent detection results on images more accurate.

In some embodiments, the functions of, or modules included in, the apparatus provided in the embodiments of the present invention may be used to execute the methods described in the above method embodiments, and for implementation, reference may be made to the descriptions of the above method embodiments.

The above descriptions of the embodiments tend to emphasize the differences between them; for their similarities, reference may be made to one another, and for brevity, details are not repeated here.

In the several embodiments provided by the present invention, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus implementations described above are merely illustrative; the division into modules or units is only a division by logical function, and in actual implementation there may be other divisions. For example, units or elements may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Industrial Applicability

The embodiments of the present invention disclose a saliency detection method, a training method for its model, a device, and a computer-readable storage medium. The training method for the saliency detection model includes: acquiring at least one sample image, where the at least one sample image includes a target sample image belonging to a preset image type; filtering the target sample image based on the absence of the contour of the salient region in the target sample image; detecting the filtered sample image with the saliency detection model to obtain predicted position information about the salient region in the sample image; and adjusting parameters of the saliency detection model based on annotated position information about the salient region of the sample image and the predicted position information. In the above solution, screening the sample images and then training the saliency detection model with the screened sample images can improve the accuracy of the model's output.

S11–S14: Steps

Claims (14)

1. A training method for a saliency detection model, the method being executed by an electronic device and comprising: acquiring at least one sample image, wherein the at least one sample image includes a target sample image belonging to a preset image type; filtering the target sample image based on the absence of the contour of the salient region in the target sample image; detecting the filtered sample image with a saliency detection model to obtain predicted position information about the salient region in the sample image; and adjusting parameters of the saliency detection model based on annotated position information about the salient region of the sample image and the predicted position information; wherein the filtering the target sample image based on the absence of the contour of the salient region in the target sample image comprises: filling the contour of the salient region in the target sample image to obtain a filled sample image; acquiring a difference regarding the salient region between the filled sample image and the target sample image; and filtering out the target sample image when the difference meets a preset requirement.
2. The method according to claim 1, wherein the preset requirement is that the difference is greater than a preset difference value; the filling the contour of the salient region in the target sample image to obtain a filled sample image comprises: performing a closing operation on the target sample image to obtain the filled sample image; and the acquiring the difference regarding the salient region between the filled sample image and the target sample image comprises: acquiring a first area of the salient region in the filled sample image and a second area of the salient region in the target sample image; and determining the difference between the first area and the second area as the difference.

3. The method according to claim 2, wherein, after the filtering the target sample image based on the absence of the contour of the salient region in the target sample image, the method further comprises: obtaining annotated position information of the target sample image regarding the salient region based on position information of the salient region of the filled sample image.

4. The method according to claim 1 or 2, wherein the at least one sample image includes multiple image types.

5. The method according to claim 4, wherein the multiple image types include at least two of images captured of real objects, hand-drawn drawings, and cartoon drawings.
6. The method according to claim 1, wherein the adjusting the parameters of the saliency detection model based on the annotated position information and the predicted position information of the salient region of the sample image comprises: obtaining a first loss of each pixel in the sample image based on the annotated position information and the predicted position information of the salient region of the sample image; weighting the first losses of the pixels in the sample image to obtain a second loss of the sample image; and adjusting the parameters of the saliency detection model based on the second loss.

7. The method according to claim 6, wherein the weight of a pixel's first loss is related to the pixel's boundary distance, the boundary distance being the distance between the pixel and the boundary of the true salient region, and the true salient region being the salient region of the sample image defined by the annotated position information.

8. The method according to claim 7, wherein the smaller a pixel's boundary distance, the greater the weight of the pixel's first loss.
9. The method according to claim 1 or 2, wherein the saliency detection model satisfies at least one of the following: the saliency detection model adopts the network structure of MobileNetV3; the saliency detection model includes a feature extraction sub-network, a first detection sub-network, and a second detection sub-network; and the detecting the filtered sample image with the saliency detection model to obtain the predicted position information about the salient region in the sample image comprises: performing feature extraction on the sample image with the feature extraction sub-network to obtain a feature map corresponding to the sample image; performing initial detection on the feature map with the first detection sub-network to obtain initial position information about the salient region in the sample image; fusing the feature map with the initial position information to obtain a fusion result; and performing final detection on the fusion result with the second detection sub-network to obtain the predicted position information of the sample image.

10. The method according to claim 1 or 2, wherein, before the detecting the filtered sample image with the saliency detection model to obtain the predicted position information about the salient region in the sample image, the method further comprises: performing data augmentation on the filtered sample image, wherein the data augmentation includes filling the background region of the sample image other than the salient region.
A saliency detection method, comprising: acquiring an image to be processed; and processing the image to be processed with a saliency detection model to obtain predicted position information of a salient region in the content of the image to be processed, wherein the saliency detection model is trained by the method of any one of claims 1 to 10.

The method according to claim 11, wherein after processing the image to be processed with the saliency detection model to obtain the predicted position information of the salient region in the content of the image to be processed, the method further comprises: performing skeleton extraction on the salient region using the predicted position information to obtain a target skeleton; selecting a skeleton model for the target skeleton as a source skeleton; and transferring first animation driving data associated with the source skeleton onto the target skeleton to obtain second animation driving data of the target skeleton.

An electronic device, comprising a memory and a processor, the processor being configured to execute program instructions stored in the memory to implement the method of any one of claims 1 to 12.

A computer-readable storage medium on which program instructions are stored, wherein the program instructions, when executed by a processor, implement the method of any one of claims 1 to 12.
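The animation transfer of claim 12 can be sketched with a toy data model. Everything here is an assumption for illustration: the "driving data" is represented as per-joint rotation tracks plus a 2D root-translation track, joint names are assumed to match once a skeleton model has been selected as the source, and rescaling root translation by the skeletons' height ratio is one common retargeting heuristic, not the patented method.

```python
def retarget_animation(anim, source_height, target_height):
    """Transfer driving data from a source skeleton to the target skeleton
    extracted from the salient region. Joint rotations carry over unchanged;
    root translation is rescaled so the target covers proportional distances.

    anim: {"rotations": {joint_name: [angle, ...]},
           "root_translation": [(x, y), ...]}
    """
    scale = target_height / source_height
    return {
        "rotations": {joint: list(track)
                      for joint, track in anim["rotations"].items()},
        "root_translation": [(x * scale, y * scale)
                             for x, y in anim["root_translation"]],
    }
```

For example, driving data authored on a source skeleton twice the target's height would have its root motion halved, while joint angles are reused as-is.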
TW110147598A 2021-06-30 2021-12-17 Saliency detection method and model training method, equipment and computer readable storage medium TWI778895B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110735893.4A CN113505799B (en) 2021-06-30 2021-06-30 Significance detection method and training method, device, equipment and medium of model thereof
CN202110735893.4 2021-06-30

Publications (2)

Publication Number Publication Date
TWI778895B true TWI778895B (en) 2022-09-21
TW202303446A TW202303446A (en) 2023-01-16

Family

ID=78009429

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110147598A TWI778895B (en) 2021-06-30 2021-12-17 Saliency detection method and model training method, equipment and computer readable storage medium

Country Status (3)

Country Link
CN (1) CN113505799B (en)
TW (1) TWI778895B (en)
WO (1) WO2023273069A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505799B (en) * 2021-06-30 2022-12-23 深圳市慧鲤科技有限公司 Significance detection method and training method, device, equipment and medium of model thereof
CN114419341B (en) * 2022-01-20 2024-04-26 大连海事大学 Convolutional neural network image recognition method based on transfer learning improvement
CN117478806A (en) * 2022-07-22 2024-01-30 索尼集团公司 Information processing apparatus and method, and computer-readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751157A (en) * 2019-10-18 2020-02-04 厦门美图之家科技有限公司 Image saliency segmentation and image saliency model training method and device
CN110866897A (en) * 2019-10-30 2020-03-06 上海联影智能医疗科技有限公司 Image detection method and computer readable storage medium
CN111476292A (en) * 2020-04-03 2020-07-31 北京全景德康医学影像诊断中心有限公司 Small sample element learning training method for medical image classification processing artificial intelligence
CN112164129A (en) * 2020-09-02 2021-01-01 北京电影学院 No-pairing action migration method based on deep convolutional network

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574866A (en) * 2015-12-15 2016-05-11 努比亚技术有限公司 Image processing method and apparatus
US10007971B2 (en) * 2016-03-14 2018-06-26 Sensors Unlimited, Inc. Systems and methods for user machine interaction for image-based metrology
US10699151B2 (en) * 2016-06-03 2020-06-30 Miovision Technologies Incorporated System and method for performing saliency detection using deep active contours
CN107103608B (en) * 2017-04-17 2019-09-27 大连理工大学 A kind of conspicuousness detection method based on region candidate samples selection
CN108647634A (en) * 2018-05-09 2018-10-12 深圳壹账通智能科技有限公司 Framing mask lookup method, device, computer equipment and storage medium
CN109146847B (en) * 2018-07-18 2022-04-05 浙江大学 Wafer map batch analysis method based on semi-supervised learning
CN111325217B (en) * 2018-12-14 2024-02-06 京东科技信息技术有限公司 Data processing method, device, system and medium
CN110570442A (en) * 2019-09-19 2019-12-13 厦门市美亚柏科信息股份有限公司 Contour detection method under complex background, terminal device and storage medium
CN111161246B (en) * 2019-12-30 2024-05-14 歌尔股份有限公司 Product defect detection method, device and system
CN112734775B (en) * 2021-01-19 2023-07-07 腾讯科技(深圳)有限公司 Image labeling, image semantic segmentation and model training methods and devices
CN113505799B (en) * 2021-06-30 2022-12-23 深圳市慧鲤科技有限公司 Significance detection method and training method, device, equipment and medium of model thereof


Also Published As

Publication number Publication date
TW202303446A (en) 2023-01-16
CN113505799B (en) 2022-12-23
WO2023273069A1 (en) 2023-01-05
CN113505799A (en) 2021-10-15

Similar Documents

Publication Publication Date Title
TWI778895B (en) Saliency detection method and model training method, equipment and computer readable storage medium
US20190347767A1 (en) Image processing method and device
WO2018036293A1 (en) Image segmentation method, apparatus, and fully convolutional network system
JP6897335B2 (en) Learning program, learning method and object detector
CN110379020B (en) Laser point cloud coloring method and device based on generation countermeasure network
TWI821671B (en) A method and device for positioning text areas
CN110866871A (en) Text image correction method and device, computer equipment and storage medium
WO2022105608A1 (en) Rapid face density prediction and face detection method and apparatus, electronic device, and storage medium
CN110705583A (en) Cell detection model training method and device, computer equipment and storage medium
CN108960115B (en) Multidirectional text detection method based on angular points
JP2021103555A (en) Image detection method, device, electronic apparatus, storage medium, and program
CN110889824A (en) Sample generation method and device, electronic equipment and computer readable storage medium
CN112101386B (en) Text detection method, device, computer equipment and storage medium
WO2022242038A1 (en) Animation migration method and apparatus, device, storage medium, and computer program product
CN108447060B (en) Foreground and background separation method based on RGB-D image and foreground and background separation device thereof
CN110232418B (en) Semantic recognition method, terminal and computer readable storage medium
WO2022227218A1 (en) Drug name recognition method and apparatus, and computer device and storage medium
CN113139544A (en) Saliency target detection method based on multi-scale feature dynamic fusion
CN114445651A (en) Training set construction method and device of semantic segmentation model and electronic equipment
CN111709317B (en) Pedestrian re-identification method based on multi-scale features under saliency model
WO2024078399A1 (en) Transfer method and apparatus
CN112258610A (en) Image labeling method and device, storage medium and electronic equipment
CN115937626A (en) Automatic generation method of semi-virtual data set based on instance segmentation
CN109166172B (en) Clothing model construction method and device, server and storage medium
CN113012030A (en) Image splicing method, device and equipment

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent