WO2023273069A1 - Saliency detection method, and training method, apparatus, device, medium and program for a saliency detection model - Google Patents

Saliency detection method, and training method, apparatus, device, medium and program for a saliency detection model

Info

Publication number
WO2023273069A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample image
image
saliency
position information
detection model
Prior art date
Application number
PCT/CN2021/127459
Other languages
English (en)
French (fr)
Inventor
秦梓鹏
黄健文
黄展鹏
Original Assignee
深圳市慧鲤科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市慧鲤科技有限公司
Publication of WO2023273069A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present application relates to the technical field of image processing, in particular to a saliency detection method and its model training method, device, equipment, medium and program.
  • In the related art, sample images are simply obtained from a sample image database, and the model is trained directly using these sample images.
  • However, some sample images have defects; if these sample images are used to train the model, the accuracy of the results obtained by the trained model when processing images will be low.
  • Embodiments of the present application at least provide a saliency detection method and its model training method, device, equipment, medium and program.
  • An embodiment of the present application provides a training method for a saliency detection model, including: acquiring at least one sample image, where the at least one sample image includes a target sample image belonging to a preset image type; filtering the target sample image based on the missing contour of the salient region in the target sample image; detecting the filtered sample image with the saliency detection model to obtain predicted position information of the salient region in the sample image; and adjusting the parameters of the saliency detection model based on the marked position information and predicted position information of the salient region in the sample image.
  • In this way, the salient region in the retained sample images is relatively complete, and training the saliency detection model with these retained high-quality sample images makes the results of the trained saliency detection model on subsequent image detection more accurate.
  • the filtering of the target sample image based on the missing contour of the salient region in the target sample image includes: filling the contour of the salient region in the target sample image to obtain a filled sample image; obtaining the difference between the filled sample image and the target sample image with respect to the salient region; and filtering out the target sample image when the difference meets a preset requirement.
  • the quality of the contour of the salient region in the remaining sample image is better.
  • In this way, the degree to which the contour of the salient region is missing can be quickly determined.
  • the preset requirement is that the difference is greater than a preset difference value; filling the contour of the salient region in the target sample image to obtain the filled sample image includes: performing a closing operation on the target sample image to obtain the filled sample image; obtaining the difference between the filled sample image and the target sample image with respect to the salient region includes: obtaining a first area of the salient region in the filled sample image and a second area of the salient region in the target sample image, and determining the difference between the first area and the second area as the difference.
  • In this way, whether the contour of the salient region in the target sample image is missing can be determined according to the area difference of the salient region before and after filling.
  • the method further includes: obtaining the marked position information of the salient region in the target sample image based on the position information of the salient region in the filled sample image.
  • the integrity of the salient area can be guaranteed.
  • At least one sample image includes multiple image types.
  • the trained saliency detection model can perform image processing on various types of images, thereby improving the applicability of the saliency detection model.
  • the plurality of image types includes at least two of images taken from real objects, hand-drawn drawings, and cartoon images.
  • In this way, the trained saliency detection model is more applicable in daily life or work.
  • adjusting the parameters of the saliency detection model based on the marked position information and predicted position information of the salient region in the sample image includes: obtaining a first loss of each pixel in the sample image based on the marked position information and predicted position information of the salient region in the sample image; weighting the first loss of each pixel in the sample image to obtain a second loss of the sample image; and adjusting the parameters of the saliency detection model based on the second loss.
  • the weight of the first loss of a pixel is related to the boundary distance of the pixel; the boundary distance of the pixel is the distance between the pixel and the boundary of the real salient region, where the real salient region is the salient region defined by the marked position information in the sample image.
  • the smaller the boundary distance of the pixel, the greater the weight of the first loss of the pixel.
  • the pixel's boundary distance is negatively correlated with the weight of the pixel's first loss, making the resulting second loss more accurate.
  • the saliency detection model satisfies at least one of the following: the saliency detection model uses the network structure of MobileNetV3; the saliency detection model includes a feature extraction sub-network, a first detection sub-network and a second detection sub-network;
  • using the saliency detection model to detect the filtered sample image and obtain the predicted position information of the salient region in the sample image includes: using the feature extraction sub-network to perform feature extraction on the sample image to obtain the feature map corresponding to the sample image; using the first detection sub-network to perform initial detection on the feature map to obtain the initial position information of the salient region in the sample image; fusing the feature map and the initial position information to obtain a fusion result; and using the second detection sub-network to perform final detection on the fusion result to obtain the predicted position information of the sample image.
  • In this way, detection efficiency can be improved, and devices with limited processing capability can also use this saliency detection model to achieve saliency detection; in addition, performing initial detection on the feature map with the first detection sub-network and then performing final detection on the initial detection result with the second detection sub-network can improve the accuracy of detection.
  • before using the saliency detection model to detect the filtered sample image to obtain the predicted position information of the salient region in the sample image, the method further includes: performing data augmentation on the filtered sample image, where the data augmentation includes filling the background region of the sample image other than the salient region.
  • An embodiment of the present application provides a saliency detection method, including: acquiring an image to be processed; and using a saliency detection model to process the image to be processed to obtain predicted position information of the salient region in the content of the image to be processed, where the saliency detection model is trained by the above-mentioned training method of the saliency detection model.
  • the accuracy of obtaining predicted position information about the saliency region can be improved.
  • the method further includes: using the predicted position information to extract the skeleton of the salient region to obtain a target bone; selecting a bone model for the target bone as a source bone; and migrating first animation driving data related to the source bone to the target bone to obtain second animation driving data of the target bone.
  • the accuracy of the target skeleton can be improved.
  • An embodiment of the present application provides a training device for a saliency detection model, including: a first acquisition module configured to acquire at least one sample image, where the at least one sample image includes a target sample image belonging to a preset image type; a screening module configured to filter the target sample image based on the missing contour of the salient region in the target sample image; a first detection module configured to use the saliency detection model to detect the filtered sample image to obtain the predicted position information of the salient region in the sample image; and an adjustment module configured to adjust the parameters of the saliency detection model based on the marked position information and predicted position information of the salient region in the sample image.
  • the screening module is configured to filter the target sample image based on the missing contour of the salient region in the target sample image, including: filling the contour of the salient region in the target sample image to obtain a filled sample image; obtaining the difference between the filled sample image and the target sample image with respect to the salient region; and filtering the target sample image when the difference meets the preset requirement.
  • the preset requirement is that the difference is greater than the preset difference value
  • the screening module is configured to fill the contour of the salient region in the target sample image to obtain the filled sample image, including: performing a closing operation on the target sample image to obtain the filled sample image; and to obtain the difference between the filled sample image and the target sample image with respect to the salient region, including: obtaining the first area of the salient region in the filled sample image and the second area of the salient region in the target sample image, and taking the difference between the first area and the second area as the difference.
  • the screening module is further configured to: obtain the marked position information of the salient region in the target sample image based on the position information of the salient region in the filled sample image.
  • At least one sample image includes multiple image types.
  • the plurality of image types includes at least two of images taken from real objects, hand-drawn drawings, and cartoon images.
  • the adjustment module is configured to adjust the parameters of the saliency detection model based on the marked position information and predicted position information of the salient region in the sample image, including: obtaining the first loss of each pixel in the sample image based on the marked position information and predicted position information; weighting the first loss of each pixel in the sample image to obtain the second loss of the sample image; and adjusting the parameters of the saliency detection model based on the second loss.
  • the weight of the first loss of a pixel is related to the boundary distance of the pixel; the boundary distance of the pixel is the distance between the pixel and the boundary of the real salient region, where the real salient region is the salient region defined by the marked position information in the sample image.
  • the smaller the boundary distance of the pixel, the greater the weight of the first loss of the pixel.
  • the saliency detection model satisfies at least one of the following: the saliency detection model uses the network structure of MobileNetV3; the saliency detection model includes a feature extraction sub-network, a first detection sub-network and a second detection sub-network;
  • the detection module is configured to use the saliency detection model to detect the filtered sample image and obtain the predicted position information of the salient region in the sample image, including: using the feature extraction sub-network to perform feature extraction on the sample image to obtain the corresponding feature map; using the first detection sub-network to perform initial detection on the feature map to obtain the initial position information of the salient region in the sample image; fusing the feature map and the initial position information to obtain the fusion result; and using the second detection sub-network to perform final detection on the fusion result to obtain the predicted position information of the sample image.
  • before the first detection module uses the saliency detection model to detect the filtered sample image to obtain the predicted position information of the salient region in the sample image, the screening module is further configured to: perform data augmentation on the filtered sample image, where the data augmentation includes filling the background region of the sample image other than the salient region.
  • An embodiment of the present application provides a saliency detection device, including: a second acquisition module configured to acquire an image to be processed; and a second detection module configured to use a saliency detection model to process the image to be processed to obtain predicted position information of the salient region in the content of the image to be processed, where the saliency detection model is trained by the above-mentioned training method of the saliency detection model.
  • after using the saliency detection model to process the image to be processed to obtain the predicted position information of the salient region in the content of the image to be processed, the saliency detection device further includes a functional module configured to: use the predicted position information to extract the skeleton of the salient region to obtain a target bone; select a bone model for the target bone as a source bone; and migrate the first animation driving data related to the source bone to the target bone to obtain the second animation driving data of the target bone.
  • An embodiment of the present application provides an electronic device, including a memory and a processor, and the processor is configured to execute program instructions stored in the memory, so as to implement the above-mentioned training method of a saliency detection model and/or a saliency detection method.
  • An embodiment of the present application provides a computer-readable storage medium on which program instructions are stored; when the program instructions are executed by a processor, the above-mentioned training method for a saliency detection model and/or the saliency detection method are implemented.
  • An embodiment of the present disclosure also provides a computer program, where the computer program includes computer-readable code, and when the computer-readable code runs in an electronic device, the processor of the electronic device executes the training method of the saliency detection model and/or the saliency detection method of any of the above embodiments.
  • the embodiment of the present application at least provides a saliency detection method and its model training method, device, medium and program; the acquired target sample images of the preset image type are filtered according to the missing contour of their salient region, so that the salient region in the retained sample images is relatively complete, and the retained high-quality sample images are then used to train the saliency detection model, which makes the results of the trained saliency detection model on subsequent image detection more accurate.
  • Fig. 1 is a schematic flow chart of an embodiment of a method for training a saliency detection model according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of the system architecture of the training method of the saliency detection model that can be applied to the embodiment of the present application;
  • Fig. 3 is a schematic diagram of an image obtained by photographing a target in an embodiment of the training method of the saliency detection model of the present application
  • Fig. 4 is a schematic diagram of a hand drawing shown in an embodiment of the training method of the saliency detection model of the present application
  • Fig. 5 is a schematic diagram of the cartoon diagram shown in an embodiment of the training method of the saliency detection model of the present application
  • Fig. 6 is a schematic diagram of a hand drawing showing a missing salient region in an embodiment of the training method of the saliency detection model of the present application;
  • Fig. 7 is a schematic diagram showing a filled hand-drawn drawing in an embodiment of the training method of the saliency detection model of the present application
  • Fig. 8 is a schematic diagram showing a sample image of an embodiment of the training method of the saliency detection model of the present application.
  • Fig. 9 is a schematic diagram showing a saliency map of an embodiment of the training method of the saliency detection model of the present application.
  • Fig. 10 is a schematic flow chart of an embodiment of the saliency detection method of the present application.
  • Fig. 11 is a first schematic diagram showing a mapping relationship according to an embodiment of the saliency detection method of the present application.
  • FIG. 12 is a second schematic diagram showing the mapping relationship in an embodiment of the saliency detection method of the present application.
  • Fig. 13 is a third schematic diagram showing the mapping relationship in an embodiment of the saliency detection method of the present application.
  • Fig. 14 is a schematic structural diagram of an embodiment of a training device for a saliency detection model of the present application.
  • Fig. 15 is a schematic structural diagram of an embodiment of the saliency detection device of the present application.
  • FIG. 16 is a schematic structural diagram of an embodiment of the electronic device of the present application.
  • Fig. 17 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
  • the device may have an image collection or video collection function, for example, the device may include components such as a camera for collecting images or videos. Alternatively, the device can obtain the required video stream or images from other devices through data transmission or data interaction with other devices, or access the required video stream or images from the storage resources of other devices.
  • other devices have image acquisition or video acquisition functions, and have communication connections with this device.
  • this device can perform data transmission or data interaction with other devices through Bluetooth, a wireless network, and so on; the manner of communication between the two is not limited here, and may include but is not limited to the situations listed above.
  • the device may include a mobile phone, a tablet computer, an interactive screen, etc., which is not limited herein.
  • FIG. 1 is a schematic flowchart of an embodiment of a method for training a saliency detection model according to an embodiment of the present application.
  • the training method of the saliency detection model may include the following steps:
  • Step S11 Acquiring at least one sample image, wherein at least one sample image includes a target sample image belonging to a preset image type.
  • Here, "at least one" means one or more.
  • There are several ways to obtain sample images. For example, the storage location of the sample images in the execution device that executes the training method is obtained, and the sample images are then obtained by accessing that storage location; alternatively, the sample images are obtained from other devices through Bluetooth, a wireless network or other transmission methods.
  • Step S12 Filter the target sample image based on the lack of outline of the salient region in the target sample image.
  • If the missing contour of the salient region in the target sample image meets the deletion condition, the target sample image is deleted from the sample images; if the deletion condition is not met, the target sample image is retained in the sample images. Specifically, a target sample image whose contour is severely missing is deleted, and one whose contour is only slightly missing is retained; what counts as severe or slight can be determined according to the specific circumstances and is not specified here.
  • Step S13 Use the saliency detection model to detect the filtered sample image, and obtain predicted position information about the saliency region in the sample image.
  • the saliency detection model can process the sample images together to obtain a batch of prediction results, or process each sample image separately to obtain the prediction result corresponding to each sample image.
  • Step S14 Adjust the parameters of the saliency detection model based on the marked position information and predicted position information of the salient region in the sample image.
  • the parameters of the saliency detection model can be adjusted according to the loss between the marked location information and the predicted location information of the saliency region.
  • In the above solution, the acquired target sample images of the preset image type are filtered according to the missing contour of the salient region, so that the salient region in the retained sample images is relatively complete; training the saliency detection model with these retained high-quality sample images makes the results of the trained saliency detection model on subsequent image detection more accurate.
  • FIG. 2 is a schematic diagram of a system architecture that can be applied to a training method of a saliency detection model according to an embodiment of the present application; as shown in FIG. 2 , the system architecture includes: a sample image acquisition terminal 201 , a network 202 and a control terminal 203 .
  • the sample image acquisition terminal 201 and the control terminal 203 establish a communication connection through the network 202.
  • the sample image acquisition terminal 201 reports at least one sample image to the control terminal 203 through the network 202; the control terminal 203 determines the target sample image in the at least one sample image, filters the target sample image based on the missing contour of the salient region in the target sample image, then uses the saliency detection model to detect the filtered sample image to obtain the predicted position information of the salient region in the sample image, and finally adjusts the parameters of the saliency detection model based on the marked position information and predicted position information of the salient region in the sample image.
  • the control terminal 203 uploads the adjusted parameters to the network 202 and sends them to the sample image acquisition terminal 201 through the network 202 .
  • the sample image acquisition terminal 201 may include an image acquisition device, and the control terminal 203 may include a vision processing device or a remote server capable of processing visual information.
  • the network 202 may be connected in a wired or wireless manner.
  • when the control terminal 203 is a vision processing device, the sample image acquisition terminal 201 can communicate with the vision processing device through a wired connection, for example performing data communication through a bus; when the control terminal 203 is a remote server, the sample image acquisition terminal 201 can perform data interaction with the remote server through a wireless network.
  • the sample image acquisition terminal 201 may be a vision processing device with a video capture module, or a host with a camera.
  • the training method of the saliency detection model of the embodiment of the present application may be executed by the sample image acquisition terminal 201, and in that case the above-mentioned system architecture may not include the network 202 and the control terminal 203.
  • The at least one sample image includes multiple image types, for example two, three or more image types.
  • the trained saliency detection model can perform image processing on various types of images, thereby improving the applicability of the saliency detection model.
  • the image type includes at least two of an image taken of the target, a hand-drawn drawing and a cartoon drawing.
  • the images obtained by photographing a target can be divided into visible light images and infrared images.
  • A hand-drawn drawing can be a drawing made by hand on paper and then photographed, or a drawing made in drawing software, for example a simple Mickey Mouse drawn by an artist on a drawing tablet.
  • the hand-drawn drawing is further defined as a picture with a preset background color and a preset foreground color, where the foreground is composed of monochromatic lines, for example the background is white and the foreground is the outline of Mickey Mouse composed of black lines.
  • a cartoon can be a virtual image with multiple foreground colors.
  • Fig. 3 is a schematic diagram of an image obtained by photographing a target in an embodiment of the training method of the saliency detection model of the present application, Fig. 4 is a schematic diagram of a hand-drawn drawing in an embodiment of the training method of the saliency detection model of the present application, and Fig. 5 is a schematic diagram of a cartoon image in an embodiment of the training method of the saliency detection model of the present application. As shown in the figures, Fig. 3 is an image taken of a real apple, Fig. 4 is a sketch of an apple drawn on real paper, and Fig. 5 is a cartoon image of an apple.
  • the trained saliency detection model is more applicable in daily life or work.
  • For example, about 10,000 images obtained by photographing targets, about 20,000 hand-drawn drawings and about 20,000 cartoon images are selected for training.
  • the preset image type is hand drawing. Since there may be breakpoints in the drawing process of the hand-drawn drawing, by filtering the hand-drawn drawing according to the missing contour, the quality of the outline of the salient area in the remaining hand-drawn drawing is better.
  • the manner of filtering the target sample image may be: filling the contour of the salient region in the target sample image to obtain a filled sample image, and then obtaining the difference of the salient region between the filled sample image and the target sample image.
  • If the contour of the salient region in the target sample image is not missing or is only slightly missing, the salient region in the filled sample image is the same as that in the target sample image before filling, or the difference between them is within a preset range. If the contour of the salient region in the target sample image is largely missing, the difference between the salient region in the filled sample image and that in the target sample image before filling is large. When the difference meets the preset requirement, the target sample image is filtered out. By obtaining the difference between the filled sample image and the target sample image with respect to the salient region, the degree to which the contour of the salient region is missing can be quickly determined.
  • the preset requirement is that the difference is greater than the preset difference value.
  • Fig. 6 is a schematic diagram of a hand-drawn drawing with a missing salient-region contour in an embodiment of the training method of the saliency detection model of the present application, and Fig. 7 is a schematic diagram of the hand-drawn drawing after filling in an embodiment of the training method of the saliency detection model of the present application.
  • the outline of the salient area in the hand-drawn drawing before filling is a circular arc, and the angle between the two endpoints and the center of the circle is 45°.
  • Before filling, the area of the salient region can be obtained by connecting the two endpoints of the arc with a line segment, which gives an area smaller than that of the full circle; after filling, the contour of the salient region is a full circle, so the area of the salient region is the area of the full circle. Obviously, the area of the salient region after filling differs considerably from the area of the salient region before filling.
  • the hand drawing before filling can be removed to prevent it from participating in the training of the model.
  • the manner of filling the outline of the salient region in the target sample image to obtain the filled sample image may be: performing a closing operation on the target sample image to obtain the filled sample image.
  • the closing operation refers to first performing a dilation operation on the target sample image and then performing an erosion (shrinking) operation.
  • the closing operation can fill small lakes (that is, small holes) and bridge small cracks, while the overall position and shape remain unchanged.
  • the contour gap of the salient region can be bridged by the dilation operation, and the thickness of the contour of the salient region can be reduced by the erosion (shrinking) operation.
  • the hand drawing may be in the form of black lines on a white background, wherein the salient area of the hand drawing is the area surrounded by black lines, and the outline of the salient area is the black line.
  • Performing the closing operation on the target sample image may be, for example, performing the closing operation on the contour of the salient region, that is, first dilating the black lines and then shrinking or eroding the dilated black lines, so that the contour thickness of the salient region in the filled sample image is the same as the contour thickness of the salient region in the target sample image before filling, or the difference between them is within a preset range. In this way, when obtaining the difference between the filled sample image and the target sample image with respect to the salient region, the contour difference between the two can be ignored.
  • the way of obtaining the difference between the filled sample image and the target sample image with respect to the salient region may be to obtain the first area of the salient region in the filled sample image and the second area of the salient region in the target sample image.
  • Any method of obtaining the area of a region is acceptable; the way of obtaining the area of the salient region is not specifically limited here.
  • the way to obtain the second area can be to use a line segment to connect the two ends of the contour gap to form a closed area, so as to calculate the area of the closed area.
  • If more than one closed region can be formed by the connecting line segments and the contour of the salient region, the area of each closed region is calculated separately, and the area of the smaller closed region is taken as the second area.
  • the difference between the first area and the second area is used as the difference between the filled sample image and the target sample image with respect to the salient region.
  • the difference in area occupied by the outline of the salient region before and after filling may be taken as the difference.
  • In this way, whether the contour of the salient region in the target sample image is missing can be determined according to the area difference of the salient region before and after filling.
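  • The filtering rule described above can be sketched in Python with OpenCV roughly as follows, assuming the target sample image is a hand-drawn drawing with dark lines on a light background; the binarization threshold, kernel size, relative-difference threshold and the use of the largest external contour as the region area are illustrative choices rather than values given by the present disclosure.

    import cv2
    import numpy as np

    def should_discard(drawing_gray: np.ndarray, rel_diff_threshold: float = 0.2) -> bool:
        """Discard a hand-drawn sample whose salient-region contour is badly broken."""
        # Foreground lines (dark strokes on a light background) as a binary mask.
        _, lines = cv2.threshold(drawing_gray, 200, 255, cv2.THRESH_BINARY_INV)

        # "Filling": a morphological closing (dilation then erosion) bridges small
        # gaps in the contour without changing its overall position and shape.
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
        closed = cv2.morphologyEx(lines, cv2.MORPH_CLOSE, kernel)

        def enclosed_area(mask: np.ndarray) -> float:
            # Area enclosed by the largest external contour; for a broken loop this
            # collapses towards the stroke area (the disclosure instead joins the gap
            # endpoints with a straight segment), which still makes the before/after
            # difference large for badly broken contours.
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            return max((cv2.contourArea(c) for c in contours), default=0.0)

        first_area = enclosed_area(closed)   # first area: after filling
        second_area = enclosed_area(lines)   # second area: before filling
        if first_area <= 0:
            return True
        return (first_area - second_area) / first_area > rel_diff_threshold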
  • the training method of the saliency detection model further includes the following step: obtaining the marked position information of the salient region in the target sample image based on the position information of the salient region in the filled sample image.
  • the contour of the salient region in the filled sample image is acquired as the marked position information of the contour of the salient region in the target sample image, and the contour together with the region it encloses is taken as the salient region.
  • before using the saliency detection model to detect the filtered sample image to obtain the predicted position information of the salient region in the sample image, the training method of the saliency detection model further includes the following step: performing data augmentation on the filtered sample image. There are many ways of data augmentation, for example, filling the background region of the sample image other than the salient region.
  • For filling, a preset pixel value can be used, for example, a pixel value of 0 is used uniformly, or another pixel value is used uniformly.
  • different pixel positions can also be filled with different pixel values, and there is no specific regulation on the filling method here.
  • the manner of data enhancement may also be at least one of noise addition, Gaussian blur processing, cropping and rotation.
  • Gaussian blur processing can also be called Gaussian smoothing.
  • the main function is to reduce image noise and reduce the level of detail.
  • the main method is to adjust the pixel color value according to the Gaussian curve to selectively blur the image.
  • Cropping refers to cropping the training sample image into images of different sizes, for example, cropping the training sample image into an image with a size of 1024*2048 or 512*512. Of course, these sizes are only examples; in other embodiments the images can be cropped to other sizes, so the crop size is not specifically limited here.
  • the rotation can be to rotate the training sample image by 90°, 180° or 270°.
  • the data enhancement manner may also be adjusting resolution and the like.
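  • A minimal sketch of the data augmentation steps mentioned above is given below; the noise level, blur kernel, crop size and rotation angle are illustrative, and the sample is assumed to be at least 512x512 pixels with a binary salient-region mask available.

    import cv2
    import numpy as np

    def augment(sample: np.ndarray, salient_mask: np.ndarray) -> np.ndarray:
        """Apply the augmentations mentioned above to one training sample."""
        out = sample.copy()
        # Fill the background region outside the salient area with a preset value (0 here).
        out[salient_mask == 0] = 0
        # Add Gaussian noise.
        out = np.clip(out.astype(np.float32) + np.random.normal(0, 5, out.shape), 0, 255).astype(np.uint8)
        # Gaussian blur (Gaussian smoothing) to reduce noise and detail.
        out = cv2.GaussianBlur(out, (5, 5), sigmaX=1.0)
        # Random 512x512 crop (one of the example sizes), then a 90-degree rotation.
        h, w = out.shape[:2]
        y = np.random.randint(0, h - 512 + 1)
        x = np.random.randint(0, w - 512 + 1)
        out = out[y:y + 512, x:x + 512]
        out = cv2.rotate(out, cv2.ROTATE_90_CLOCKWISE)
        return out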
  • the saliency detection model is the network structure of MobileNetV3.
  • the saliency detection model includes a feature extraction subnetwork, a first detection subnetwork and a second detection subnetwork.
  • the first detection subnetwork and the second detection subnetwork adopt a cascade structure. That is, the output of the first detection sub-network is used as the input of the second detection sub-network.
  • the first detection subnetwork and the second detection subnetwork have the same structure.
  • the method of using the saliency detection model to detect the filtered sample image to obtain the predicted position information of the salient region in the sample image can be: use the feature extraction sub-network to extract features from the sample image to obtain the feature map corresponding to the sample image, and then use the first detection sub-network to perform initial detection on the feature map to obtain the initial position information of the salient region in the sample image.
  • the initial position information may be presented in the form of a saliency map.
  • the feature map and the initial position information are fused to obtain the fusion result.
  • the fusion method may be to perform a multiplication operation on the feature map and the initial position information to obtain the fusion result.
  • the second detection sub-network is used to perform final detection on the fusion result to obtain the predicted position information of the sample image.
  • the final predicted location information can also be presented in the form of a saliency map.
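  • The cascade of the feature extraction sub-network, the first detection sub-network and the second detection sub-network can be sketched roughly as follows in PyTorch; the backbone is assumed to be any feature extractor (for example a MobileNetV3 trunk) that outputs a feature map with feat_channels channels, and the two convolutional heads are illustrative rather than the exact structure of the disclosure.

    import torch
    import torch.nn as nn

    class CascadedSaliencyNet(nn.Module):
        """Feature extraction sub-network plus two cascaded detection sub-networks;
        the initial saliency map gates the features that the second head refines."""

        def __init__(self, backbone: nn.Module, feat_channels: int):
            super().__init__()
            self.backbone = backbone            # e.g. a MobileNetV3 feature extractor

            def make_head() -> nn.Module:
                return nn.Sequential(
                    nn.Conv2d(feat_channels, feat_channels, 3, padding=1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(feat_channels, 1, 1),
                    nn.Sigmoid(),
                )

            self.first_head = make_head()       # first detection sub-network
            self.second_head = make_head()      # second detection sub-network

        def forward(self, image: torch.Tensor):
            feats = self.backbone(image)        # feature map of the sample image
            initial = self.first_head(feats)    # initial position information (saliency map)
            fused = feats * initial             # fusion by element-wise multiplication
            final = self.second_head(fused)     # predicted position information
            return final, initial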
  • Figure 8 is a schematic diagram showing a sample image of an embodiment of the training method of the saliency detection model of the present application
  • Fig. 9 is a schematic diagram of a saliency map in an embodiment of the training method of the saliency detection model of the present application.
  • As shown in Fig. 8, the sample image includes a table and a toy duck on the table. The saliency detection model detects the sample image, and the output initial position information (saliency map) is shown in Fig. 9: the pixel value at the position of the toy duck is 1 and the pixel value at other positions is 0, so the position of the toy duck in the sample image can be clearly obtained.
  • In this way, detection efficiency can be improved, and devices with limited processing capability can also use this saliency detection model to achieve saliency detection; in addition, performing initial detection on the feature map with the first detection sub-network and then performing final detection on the initial detection result with the second detection sub-network can improve the accuracy of detection.
  • the manner of using the saliency detection model to process the sample image to obtain the predicted position information of the salient region in the sample image, and of adjusting the parameters of the saliency detection model based on the marked position information and predicted position information of the salient region in the sample image, includes the following:
  • Several sample images are selected from the plurality of sample images as the current sample images.
  • the image types to which the selected sample images belong include all image types of the multiple sample images.
  • For example, if the image types of the plurality of sample images include the above three image types in total, the several sample images selected from the plurality of sample images also include the above three image types.
  • the number of sample images of each image type may be the same or different. Then, the current sample image is processed by using the saliency detection model to obtain the prediction result of the current sample image.
  • the current sample images are taken as a batch, and the saliency detection model is used to process the batch of sample images to obtain a batch of prediction results. Then adjust the parameters of the saliency detection model based on the annotation results and prediction results of the current sample image.
  • The parameters of the model can be adjusted by using the loss between each labeling result in the batch and its corresponding prediction result, or by using the overall loss between the batch of labeling results and prediction results; in the latter case, the parameters of the model only need to be adjusted once per batch. The step of selecting several sample images from the multiple sample images as the current sample images, and the subsequent steps, are repeated until the saliency detection model meets the preset requirement.
  • the preset requirement here may be the size of the error between the prediction result given by the model and the labeling result.
  • the specific error size is determined according to actual needs, and is not specified here.
  • several sample images selected each time from the multiple sample images may be the same as some sample images selected last time.
  • Alternatively, the sample images selected from the multiple sample images each time are all different. Selecting several sample images from the multiple sample images as the current sample images and using the saliency detection model to process the current sample images can improve the training speed.
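  • A minimal sketch of selecting a current batch that covers all image types is shown below; the grouping of samples by image type and the per-type count are illustrative assumptions.

    import random

    def sample_batch(images_by_type: dict, per_type: int = 4) -> list:
        """Draw a current batch that contains every image type present in the data.
        images_by_type maps an image type to a list of (image, annotation) pairs."""
        batch = []
        for image_type, samples in images_by_type.items():
            batch.extend(random.sample(samples, min(per_type, len(samples))))
        random.shuffle(batch)
        return batch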
  • the annotation information of the sample image further includes the real image type of the sample image
  • the prediction result of the sample image includes the predicted image type of the sample image.
  • the prediction result of the saliency detection model includes the predicted category of the object and the predicted image type of the sample image.
  • That is, besides the predicted position information, the prediction result also gives the predicted category of the object in the sample image and the predicted image type of the sample image.
  • Adjusting the parameters of the saliency detection model by using the marked position information about the content of the sample image and the predicted position information of its content, and/or the real image type of the sample image and the predicted image type of the sample image, makes the adjusted saliency detection model more widely applicable.
  • the way to adjust the parameters of the saliency detection model based on the marked position information and predicted position information of the salient region of the sample image may be: obtaining the first loss of each pixel in the sample image based on the marked position information and predicted position information.
  • the first loss of each pixel in the sample image is weighted to obtain the second loss of the sample image.
  • the parameters of the saliency detection model are adjusted.
  • the way to obtain the first loss may be to make a difference between the labeled position information and the predicted position information to obtain the first loss. By weighting the first loss of each pixel, it is more accurate to use the weighted second loss to adjust the parameters of the saliency detection model.
  • the weight of the first loss of the pixel is related to the boundary distance of the pixel.
  • the pixel boundary distance is the distance between the pixel and the boundary of the real salient region, which is the salient region defined by the labeled position information in the sample image.
  • the distance between the pixel and the border of the real salient region may be the minimum distance from the border of the salient region.
  • For example, the pixel position of the upper-left corner of the sample image is (0, 0); if the boundary of the real salient region includes (0, 1), (0, 2) and so on, the distance between this pixel and the boundary of the real salient region is 1.
  • the smaller the boundary distance of the pixel, the greater the weight of the first loss of the pixel; that is, the weight of the first loss of a pixel is negatively correlated with the boundary distance of the pixel, which makes the resulting second loss more accurate.
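  • The boundary-distance weighting of the first loss can be sketched as follows; the L1 per-pixel loss, the Canny-based boundary extraction and the exponential weighting function are illustrative stand-ins, the only property preserved being that a smaller boundary distance yields a larger weight.

    import cv2
    import numpy as np

    def boundary_weighted_loss(pred: np.ndarray, gt_mask: np.ndarray,
                               alpha: float = 5.0, sigma: float = 10.0) -> float:
        """Second loss: per-pixel first losses weighted by distance to the boundary
        of the real salient region (pred and gt_mask are HxW arrays in [0, 1])."""
        # First loss of each pixel (a simple L1 term as a stand-in).
        first_loss = np.abs(pred - gt_mask)

        # Boundary of the real salient region, then each pixel's minimum distance to it.
        boundary = cv2.Canny((gt_mask * 255).astype(np.uint8), 100, 200)
        dist = cv2.distanceTransform((boundary == 0).astype(np.uint8), cv2.DIST_L2, 3)

        # Smaller boundary distance -> larger weight (negative correlation).
        weights = 1.0 + alpha * np.exp(-dist / sigma)

        # Weighted aggregation over the image gives the second loss.
        return float((weights * first_loss).sum() / weights.sum())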
  • the method of adjusting the parameters of the saliency detection model may be: obtaining a third loss based on the real image type and the predicted image type, and then adjusting the parameters of the saliency detection model based on the second loss and the third loss. For example, the third loss is obtained based on the error between the real image type and the predicted image type; the second loss is determined by combining the errors between a batch of predicted position information and the corresponding marked information, and the third loss is determined by combining the errors between a batch of predicted image types and the real image types. The second loss and the third loss are then combined to adjust the parameters of the saliency detection model.
  • By adjusting the parameters of the saliency detection model using the second loss between the marked position information about the content of the sample image and the predicted position information of its content, together with the third loss based on the real image type and the predicted image type, the applicability of the saliency detection model can be improved.
  • the second loss optimizes the parameters of the model so that the predicted location information obtained by the saliency detection model is closer to the labeled location information, that is, the error between the two becomes smaller.
  • By using the third loss to adjust the parameters of the model, the feature vectors of images that represent the same object but belong to different image types become closer in the feature space, so that the feature vectors of images of different image types lie close to one another in the feature space.
  • For example, when the trained saliency detection model performs feature extraction on a hand-drawn drawing representing an apple, a cartoon image representing an apple and an image obtained by photographing an apple, the resulting feature vectors are close to each other in the feature space.
  • a manner of adjusting parameters of the saliency detection model may be: obtaining a loss difference between the second loss and the third loss.
  • the parameters of the saliency detection model are then tuned using the loss difference and the third loss.
  • For example, the loss difference is obtained by subtracting the third loss from the second loss.
  • When using the loss difference and the third loss to adjust the parameters of the saliency detection model, one of the losses may be used to adjust the model parameters first, and then the other loss is used to adjust the model parameters.
  • the saliency detection model further includes an image type classification sub-network.
  • the image type classification subnetwork connects the feature extraction subnetwork.
  • An image type classification network is used to classify the image type of the sample image, and the predicted image type of the sample image is obtained.
  • the feature map extracted by the feature extraction sub-network is input into the image type classification network to obtain the predicted image type of the sample image.
  • using the loss difference and the third loss to adjust the parameters of the saliency detection model may be: using the third loss to adjust the parameters of the image type classification sub-network, and using the loss difference to adjust the parameters of the feature extraction sub-network, the first detection sub-network and the second detection sub-network.
  • the ways to adjust the parameters using the loss difference and the third loss are both positive adjustments.
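  • A rough PyTorch sketch of one training step using the second loss, the third loss and their difference is given below; the concrete loss functions, the optimizer choice and the split of parameters into a detector group and a classifier group are assumptions for illustration.

    import torch
    import torch.nn.functional as F

    def train_step(model, opt_detector, opt_classifier, batch):
        """One update using the second loss, the third loss and their difference.
        `model(images)` is assumed to return (pred_maps, pred_types); opt_detector
        covers the feature extraction and detection sub-networks, opt_classifier
        covers the image type classification sub-network."""
        images, gt_maps, gt_types = batch
        pred_maps, pred_types = model(images)

        second_loss = F.binary_cross_entropy(pred_maps, gt_maps)  # position loss (stand-in)
        third_loss = F.cross_entropy(pred_types, gt_types)        # image type loss

        # The third loss adjusts only the image type classification sub-network.
        opt_classifier.zero_grad()
        third_loss.backward(retain_graph=True)
        opt_classifier.step()

        # The loss difference adjusts the feature extraction and detection
        # sub-networks, pulling feature vectors of different image types closer.
        loss_diff = second_loss - third_loss
        opt_detector.zero_grad()
        loss_diff.backward()
        opt_detector.step()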
  • the trained saliency detection model can be deployed on a mobile phone or on an AR/VR device to perform image processing.
  • the saliency detection method can also be applied to software such as camera and video recording filters.
  • In the above solution, the acquired target sample images of the preset image type are filtered according to the missing contour of the salient region, so that the salient region in the retained sample images is relatively complete; training the saliency detection model with these retained high-quality sample images makes the results of the trained saliency detection model on subsequent image detection more accurate.
  • the executor of the training method of the saliency detection model may be the training device of the saliency detection model; for example, the training method of the saliency detection model may be executed by a terminal device, a server or other processing device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, and so on.
  • the method for training the saliency detection model may be implemented by calling a computer-readable instruction stored in a memory by a processor.
  • FIG. 10 is a schematic flowchart of an embodiment of the saliency detection method of the present application.
  • the significance detection method provided by the embodiment of the present application includes the following steps:
  • Step S21 Acquiring images to be processed.
  • the image to be processed can be acquired by the camera component in the execution device executing the saliency detection method, or the image to be processed can be acquired from other devices according to various communication methods.
  • the image type of the image to be processed may be one of multiple image types.
  • the image type of the image to be processed may be any one of an image obtained by photographing a target, a hand-drawn drawing and a cartoon image.
  • the image to be processed can also be obtained from the video.
  • a video is input to the saliency detection model, and the saliency detection model obtains each video frame in the video, and uses each video frame as an image to be processed.
  • Step S22 Use the saliency detection model to process the image to be processed, and obtain the predicted position information about the saliency region in the content of the image to be processed, wherein the saliency detection model is obtained by the training method of the saliency detection model.
  • the saliency detection model in the embodiment of the present application includes a feature extraction subnetwork, a first detection subnetwork and a second detection subnetwork.
  • the saliency detection model utilizes sample images of various image types for training.
  • the image to be processed is input into the saliency detection model from the input end of the saliency detection model.
  • the saliency detection model processes the image to be processed to obtain the predicted position information of the salient region in the content of the image to be processed.
  • the accuracy of image processing can be improved by using the saliency detection model trained by the saliency detection model training method to process the image to be processed.
  • the saliency detection method further includes at least the following steps:
  • Display predicted position information on an interface displaying images to be processed.
  • There are many display manners: for example, the predicted position information is marked on the image to be processed, so that the image to be processed and the corresponding predicted position information are displayed together on the display interface; alternatively, the image to be processed and the corresponding predicted position information are displayed separately in different areas of the display interface, or the image to be processed and its predicted position information are displayed in the form of page turning.
  • When the image to be processed is obtained from a video, it is judged whether the predicted position information of a preset number of consecutive video frames is the same; if so, the predicted position information is considered correct, and if not, the predicted position information is considered incorrect.
  • the correct predicted position information may be selected to be output, and the wrong predicted position information may not be output, or the correct and wrong predicted position information may be selected to be annotated correspondingly and output.
  • the preset number of frames may be 5 frames, 10 frames, etc., which may be determined according to specific usage scenarios.
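  • A simple sketch of the consecutive-frame consistency check is shown below; exact equality of the predicted position information between frames is assumed for simplicity.

    def check_video_predictions(frame_preds: list, window: int = 5) -> list:
        """Flag each frame's predicted position information as correct when the
        last `window` consecutive frames produced the same prediction."""
        flags = []
        for i in range(len(frame_preds)):
            group = frame_preds[max(0, i - window + 1):i + 1]
            flags.append(len(group) == window and all(p == group[0] for p in group))
        return flags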
  • the step of extracting the skeleton of the salient region by using the predicted position information to obtain the target bone may be: performing contour extraction on the salient region to obtain the contour of the target, then using the contour to generate a 3D mesh model for the target, and finally extracting the target bone from the 3D mesh model.
  • the way to obtain the source bone may be: classify the image to be processed, obtain the category of the target object, and select the bone model matching the category as the source bone.
  • the target bone is the bone of the target object.
  • the embodiment of the present application may use prediction label mapping, or may use data set label mapping.
  • the classification result of the target object by the predicted label mapping includes the predicted skeletal topology type of the target object, for example, the predicted skeletal topology type includes biped, quadruped and so on. That is, the process of predicting label mapping is mainly to predict the skeletal topological structure characteristics of the target object.
  • the classification result of the dataset label mapping needs to give the specific type of the target object in the input image, for example, the target object is a cat, a dog, a giant panda, a bear, and so on.
  • the embodiment of this application chooses to use the predicted label mapping.
  • For example, the target object is a giant panda, the target object category given by the predicted label mapping is quadruped, the bone model matching the category is selected as the initial source bone, and the initial source bone chosen is that of a quadruped bear. Although giant pandas and bears are different, they actually have roughly the same bone topology, so migrating the animation driving data of the bear to the giant panda can still appear in a natural and reasonable form. That is, although the completely correct category of the target object cannot be obtained by the predicted label mapping, this does not affect the driving of the final target bone; at the same time, the computational cost is reduced because the predicted label mapping does not further learn the specific category of the target object.
  • the way to obtain the node mapping relationship between the two may be: determine the number of bone branches where each node in the source bone and the target bone is located.
  • the nodes in the source bone and the target bone are mapped sequentially in descending order of the number of bone branches.
  • the node with the largest number of bone branches is generally called the root node.
  • the number of skeletal branches where the nodes are located is called the degree. That is, first construct the mapping relationship between nodes with larger degrees in the two bones, and then construct the mapping relationship between nodes with less degrees.
  • In this order, the joints are matched one to one; many-to-one mappings or skipped mappings may occur during the matching.
  • the final target bone is consistent with the node topology of the source bone.
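  • The degree-based node mapping can be sketched as follows; representing each bone as a list of parent-child joint pairs and breaking ties arbitrarily among nodes of equal degree are illustrative simplifications.

    from collections import defaultdict

    def map_nodes_by_degree(source_edges: list, target_edges: list) -> dict:
        """Map target-bone nodes to source-bone nodes in descending order of the
        number of bone branches (degree) at each node; edges are (parent, child)
        joint-name pairs."""
        def degrees(edges):
            deg = defaultdict(int)
            for parent, child in edges:
                deg[parent] += 1
                deg[child] += 1
            return deg

        src_order = sorted(degrees(source_edges).items(), key=lambda kv: -kv[1])
        tgt_order = sorted(degrees(target_edges).items(), key=lambda kv: -kv[1])

        mapping = {}
        # The highest-degree node (typically the root) is matched first; leftover
        # source nodes simply stay unmapped, which the description above allows.
        for (tgt_node, _), (src_node, _) in zip(tgt_order, src_order):
            mapping[tgt_node] = src_node
        return mapping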
  • the one-to-one node mapping between the final target bone and the final source bone may exist in two forms: one is that the node topology of the final target bone is completely consistent with that of the final source bone, and the other is that every node of the final target bone has a corresponding node in the final source bone while some nodes in the final source bone have no mapping relationship. That is, it is necessary to ensure that, after the animation migration, the nodes of the final target bone all have corresponding animation driving data.
  • the method for performing topology alignment may include at least one of the following steps:
  • One is to update the node topology of one of the bones when multiple nodes of one bone are mapped to the same node of the other between the source bone and the target bone, so that the nodes between the updated two bones are mapped one to one. By updating the node topology of the bone, the situation in which multiple nodes between two bones are mapped to the same node can be adjusted to a one-to-one node mapping between the two bones, so as to reduce the occurrence of unreasonable situations when the animation subsequently drives the final target bone.
  • updating the node topology structure of one of the bones can be divided into multiple cases: the first case is to update the first bone where multiple nodes are located when multiple nodes are located in the same bone branch. Wherein, one of the first bone and the second bone is the source bone, and the other is the target bone.
  • By updating the first bone where the multiple nodes are located, the situation in which multiple nodes between the two bones are mapped to the same node is adjusted to a one-to-one node mapping between the two bones, thereby reducing the occurrence of unreasonable situations when the animation subsequently drives the final target bone.
  • a manner of updating the first bone where multiple nodes are located may be to merge multiple nodes in the first bone into one first node. Wherein, the first node retains the mapping relationship of multiple nodes before merging. And, the position of the first node is the average value of the positions of all merged nodes.
  • FIG. 11 is a first schematic diagram showing a mapping relationship according to an embodiment of the saliency detection method of the present application.
  • As shown in FIG. 11, when the second node and the third node in the target bone both map to the second node in the source bone, the second and third nodes of the target bone are merged into one first node, whose position is the average of the positions of those two nodes.
  • When the first bone is the source bone, the nodes of the source bone carry animation driving data, so after the nodes are merged the animation driving data of the first node must be obtained; in this case, the animation driving data of all merged nodes can be combined. The animation driving data can generally be represented by matrices, and combining them can be expressed as matrix multiplication; that is, the animation driving data of the first node is obtained by multiplying the animation driving data of the merged nodes, as sketched below.
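The following sketch merges several nodes of one branch into a single first node: positions are averaged and the animation driving matrices are combined by multiplication. The dictionary layout of a node (position, anim, mapped_to) is an assumption made for illustration.

```python
import numpy as np

def merge_nodes(nodes):
    """Merge several nodes of one bone branch into a single first node.

    Each node is assumed to be a dict with a 3D `position`, an optional 4x4
    `anim` matrix of animation driving data, and a `mapped_to` list of node
    names in the other skeleton; the embodiment only states that positions
    are averaged and driving matrices are combined by multiplication.
    """
    position = np.mean([n["position"] for n in nodes], axis=0)
    anim = np.eye(4)
    for n in nodes:
        if "anim" in n:
            anim = anim @ n["anim"]   # combine driving data by matrix product
    mapped_to = [m for n in nodes for m in n.get("mapped_to", [])]
    return {"position": position, "anim": anim, "mapped_to": mapped_to}
```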
  • the second case is to update the second bone that does not include the multiple nodes when the multiple nodes are located on different bone branches.
  • one of the first bone and the second bone is the source bone, and the other is the target bone.
  • Optionally, a second node at which the bone branches containing the multiple nodes converge is found in the first bone.
  • One specific way to do this is to traverse parent nodes in turn until the second node is reached, and then to find, in the second bone, the third node that maps to the second node. Next, according to the node topology corresponding to the multiple nodes, at least one new bone branch is added at the third node.
  • a parent node of a node refers to a node adjacent to the node and closer to the root node than the node in a skeletal branch.
  • The multiple nodes are then mapped one to one with the nodes of the newly added bone branch at the third node and of the original bone branch.
  • the newly-added bone branch may copy the original bone branch.
  • The copied content includes the animation data and the transformation relationship between each node and its parent node. For example, if the original bone branch includes three nodes, the new bone branch also includes three nodes, and the animation driving data of the three nodes in the new branch is obtained by copying the animation data of the corresponding nodes in the original branch, as in the sketch below.
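A minimal sketch of such branch duplication is given below; the field names (anim, local_transform, parent) are illustrative assumptions about how a node might be stored, not structures defined by this document.

```python
import copy

def duplicate_branch(branch_nodes, attach_at):
    """Add a new bone branch at `attach_at` by copying an existing branch.

    Each copied node keeps the original node's animation data and its
    transform relative to its parent; the new nodes are chained so that the
    branch hangs from `attach_at`.
    """
    new_branch = []
    parent = attach_at
    for node in branch_nodes:
        clone = {
            "name": node["name"] + "_copy",
            "anim": copy.deepcopy(node.get("anim")),
            "local_transform": copy.deepcopy(node["local_transform"]),
            "parent": parent,
        }
        new_branch.append(clone)
        parent = clone
    return new_branch
```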
  • FIG. 12 is a second schematic diagram showing a mapping relationship according to an embodiment of the saliency detection method of the present application. As shown in FIG. 12, the node topology on the left is that of the source bone, and the node topology on the right is that of the target bone.
  • In FIG. 12, the first node of the target bone maps to the first node of the source bone, and the second node of the target bone maps to the second node of the source bone. Below its second node, the target bone has two branches, a left branch and a right branch; the first node of the left branch and the first node of the right branch both map to the third node of the source bone, and the second node of the left branch and the second node of the right branch both map to the fourth node of the source bone. These two branches converge at the second node of the target bone, so a new bone branch containing two nodes is added at the second node of the source bone, after which every node of the target bone corresponds one to one with a node of the source bone; in this way one-to-one mapping is achieved while the node topology of the first bone is preserved as much as possible.
  • The second step is, when a skeleton contains nodes without a mapping relationship, to update the node topology of the bone where those unmapped nodes are located. Here the two bones are the source bone and the target bone, and after the update the nodes of the two bones are mapped one to one. Updating the node topology of the bone containing unmapped nodes reduces the number of nodes without a mapping relationship, so that the nodes of the two updated skeletons are mapped one to one, thereby reducing unreasonable results when the animation subsequently drives the final target bone.
  • Optionally, a node without a mapping relationship is merged into an adjacent node that does have a mapping relationship, where the adjacent node is the parent node or child node of the unmapped node in its own bone. In the embodiments of the present application, nodes without a mapping relationship are merged into their parent nodes, as sketched below.
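The sketch below collapses unmapped nodes into their parents; the dictionary-based skeleton layout is an illustrative assumption, and bookkeeping such as updating the parents' child lists is omitted for brevity.

```python
def merge_unmapped_into_parent(skeleton, mapping):
    """Collapse nodes that have no mapping relationship into their parents.

    `skeleton` maps a node name to a dict with `parent`, optional `anim`
    (a 4x4 matrix) and `children`; `mapping` contains the names that do have
    a counterpart in the other bone. This layout is an assumption made for
    illustration, and the parent of an unmapped node is assumed to be kept.
    """
    for name in [n for n in skeleton if n not in mapping]:
        node = skeleton[name]
        parent = skeleton[node["parent"]]
        if "anim" in node and "anim" in parent:
            parent["anim"] = parent["anim"] @ node["anim"]   # merge driving data
        for child in node.get("children", []):
            skeleton[child]["parent"] = node["parent"]       # re-parent children
        del skeleton[name]
    return skeleton
```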
  • FIG. 13 is a third schematic diagram showing a mapping relationship according to an embodiment of the saliency detection method of the present application.
  • As shown in FIG. 13, the first node of the target bone maps to the first node of the source bone, the second node of the target bone maps to the third node of the source bone, and the third node of the target bone maps to the fourth node of the source bone, while the second node of the source bone has no mapping relationship. The second node of the source bone can therefore be merged into its parent node, that is, into the first node of the source bone.
  • the merging of nodes in the source skeleton will be accompanied by the merging of animation-driven data, and the merging of animation-driven data will not be repeated here.
  • the node alignment is mainly to determine the first pose transformation relationship between the source bone and the target bone.
  • For example, in order from the root source node to the leaf source nodes, each source node in the final source bone is aligned with its mapped target node in the final target bone, so as to obtain the first pose transformation relationship between each source node and the mapped target node. As mentioned above, the root node is the node located on the largest number of bone branches; the root source node refers to the root node of the final source bone, and likewise the root target node refers to the root node of the final target bone. The final source bone and final target bone refer to the source bone and the target bone after topology alignment.
  • a leaf node refers to a node that has a parent node but no child nodes.
  • A leaf source node refers to a leaf node in the final source bone, and a leaf target node refers to a leaf node in the final target bone. That is, the root source node is first aligned with the root target node that has a mapping relationship with it; then the leaf source node connected to the root source node is aligned with the leaf target node that has a mapping relationship with it, and so on, until all nodes in the final target bone are aligned with the nodes of the final source bone.
  • the root target node of the final target bone can be directly used as the origin of the first coordinate system.
  • the pose transformation relationship is the transformation relationship between the source node and the mapped target node in the first coordinate system.
  • By translating the root source node of the final source bone and the root target node of the final target bone to the origin of the first coordinate system, the offset between these two root nodes can be obtained. For example, for each source node in the final source bone, the offset required to align that source node with its mapped target node is obtained.
  • the offset includes a translation component and a rotation component.
  • In general, the translation component includes a scaling component. Then, based on the offset corresponding to a source node, the first pose transformation relationship of that source node is obtained, as sketched below.
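A rough sketch of such a per-node offset is shown below, assuming both skeletons have already been translated so that their root nodes coincide with the origin of the first coordinate system; the way the scaling component is folded into the translation step is a simplifying assumption, not the embodiment's exact formulation.

```python
import numpy as np

def node_offset(source_pos, target_pos, source_rot, target_rot):
    """Per-node offset used for node alignment (illustrative sketch).

    `source_pos` / `target_pos` are 3-vectors expressed in the first
    coordinate system, and `source_rot` / `target_rot` are 3x3 rotation
    matrices. Returns the translation, rotation and scale components that
    bring the source node onto its mapped target node.
    """
    scale = np.linalg.norm(target_pos) / (np.linalg.norm(source_pos) + 1e-8)
    translation = target_pos - scale * source_pos          # translation (+ scaling) component
    rotation = target_rot @ np.linalg.inv(source_rot)      # rotation component
    return translation, rotation, scale
```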
  • If the topology of the source bone has changed, the animation data on the source bone changes accordingly. For example, if two source nodes of the source bone are merged, the animation data corresponding to those nodes is also merged.
  • the animation data on the source bone can be migrated to the target bone to drive the target in the image to be processed to move.
  • the accuracy of image processing can be improved by using the saliency detection model trained by the saliency detection model training method to process the image to be processed.
  • The saliency detection method may be executed by a saliency detection apparatus, for example by a terminal device, a server or another processing device, where the terminal device may be a user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
  • In some possible implementations, the saliency detection method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • FIG. 14 is a schematic structural diagram of an embodiment of a training device for a saliency detection model of the present application.
  • The training apparatus 30 for the saliency detection model includes a first acquisition module 31, a screening module 32, a first detection module 33 and an adjustment module 34.
  • The first acquisition module 31 is configured to acquire at least one sample image, where the at least one sample image includes a target sample image belonging to a preset image type; the screening module 32 is configured to filter the target sample image based on the absence of the contour of the salient region in the target sample image; the first detection module 33 is configured to detect the filtered sample image by using the saliency detection model to obtain predicted position information of the salient region in the sample image; and the adjustment module 34 is configured to adjust the parameters of the saliency detection model based on the annotated position information and the predicted position information of the salient region in the sample image.
  • In the above solution, the target sample image of the preset image type is filtered according to whether the contour of its salient region is missing, so that the salient region in the retained sample images is relatively complete; training the saliency detection model with these retained high-quality sample images makes the detection results of the trained model on subsequent images more accurate.
  • The screening module 32 is configured to filter the target sample image based on the absence of the contour of the salient region in the target sample image by: filling the contour of the salient region in the target sample image to obtain a filled sample image; obtaining the difference between the filled sample image and the target sample image with respect to the salient region; and filtering out the target sample image when the difference meets a preset requirement.
  • In this way, the sample images are filtered according to whether the contour is missing, so that the contours of the salient regions in the remaining sample images are of better quality. In addition, by obtaining the difference between the filled sample image and the target sample image with respect to the salient region, the degree to which the contour of the salient region is missing can be determined quickly.
  • In some embodiments, the preset requirement is that the difference is greater than a preset difference value. The screening module 32 is configured to fill the contour of the salient region in the target sample image by performing a closing operation on the target sample image to obtain the filled sample image, and to obtain the difference between the filled sample image and the target sample image with respect to the salient region by obtaining a first area of the salient region in the filled sample image and a second area of the salient region in the target sample image and taking the difference between the first area and the second area as the difference, as illustrated in the sketch below.
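A minimal OpenCV-based sketch of this filtering rule is given below; the kernel size and the use of the largest external contour as the area measure are illustrative assumptions, while the closing operation and the before/after area comparison follow the description above.

```python
import cv2
import numpy as np

def keep_sample(drawing_mask: np.ndarray, diff_threshold: float) -> bool:
    """Filter a hand-drawn target sample by contour completeness.

    `drawing_mask` is a binary image in which the drawn strokes are
    foreground (255). The closing operation bridges small gaps in the
    contour; the salient-region area is then measured before and after
    filling, and the sample is kept only when the two areas are close.
    """
    def region_area(mask):
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        return max((cv2.contourArea(c) for c in contours), default=0.0)

    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    filled = cv2.morphologyEx(drawing_mask, cv2.MORPH_CLOSE, kernel)
    first_area = region_area(filled)          # area after filling
    second_area = region_area(drawing_mask)   # area before filling
    return abs(first_area - second_area) <= diff_threshold
```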
  • In some embodiments, after the target sample image is filtered based on the absence of the contour of its salient region, the screening module 32 is further configured to obtain the annotated position information of the salient region in the target sample image based on the position information of the salient region in the filled sample image. Determining the annotated position information of the salient region from the filled sample image in this way ensures the integrity of the salient region.
  • At least one sample image includes multiple image types.
  • the trained saliency detection model can perform image processing on various types of images, thereby improving the applicability of the saliency detection model.
  • the plurality of image types includes at least two of images taken from real objects, hand-drawn drawings, and cartoon images.
  • the trained image processing model is more applicable in daily life or work.
  • the adjustment module 34 is configured to adjust the parameters of the saliency detection model based on the marked position information and predicted position information of the salient region in the sample image, including: based on the marked position information and predicted position information, obtaining a first loss of each pixel; weighting the first loss of each pixel in the sample image to obtain a second loss of the sample image; and adjusting parameters of a saliency detection model based on the second loss.
  • In some embodiments, the weight of the first loss of a pixel is related to the boundary distance of that pixel, where the boundary distance of the pixel is the distance between the pixel and the boundary of the real salient region, and the real salient region is the salient region defined by the annotated position information in the sample image.
  • The smaller the boundary distance of the pixel, the greater the weight of the first loss of the pixel.
  • the boundary distance of the pixel is negatively correlated with the weight of the first loss of the pixel, so that the obtained second loss is more accurate.
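The sketch below is one way such a boundary-distance-weighted loss could be computed with NumPy/SciPy; the binary-cross-entropy first loss and the 1/(1+d) weighting function are assumptions, since the embodiment only requires a per-pixel first loss whose weight decreases as the boundary distance grows.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def weighted_saliency_loss(pred, gt, eps=1e-7):
    """Boundary-distance-weighted loss (illustrative sketch).

    `pred` and `gt` are HxW arrays in [0, 1]; `gt` is the annotated real
    salient region. The per-pixel first loss is a binary cross-entropy,
    and its weight is larger for pixels closer to the region boundary.
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    first_loss = -(gt * np.log(pred) + (1 - gt) * np.log(1 - pred))

    # Distance of every pixel to the boundary of the real salient region.
    boundary_dist = np.minimum(distance_transform_edt(gt),
                               distance_transform_edt(1 - gt))
    weight = 1.0 / (1.0 + boundary_dist)      # smaller distance -> larger weight

    second_loss = (weight * first_loss).sum() / (weight.sum() + eps)
    return second_loss
```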
  • In some embodiments, at least one of the following holds: the saliency detection model uses the MobileNetV3 network structure; the saliency detection model includes a feature extraction sub-network, a first detection sub-network and a second detection sub-network.
  • The first detection module 33 is configured to use the saliency detection model to detect the filtered sample image and obtain the predicted position information of the salient region by: using the feature extraction sub-network to perform feature extraction on the sample image to obtain a feature map corresponding to the sample image; using the first detection sub-network to perform initial detection on the feature map to obtain initial position information of the salient region in the sample image; fusing the feature map and the initial position information to obtain a fusion result; and using the second detection sub-network to perform final detection on the fusion result to obtain the predicted position information of the sample image.
  • Because the MobileNetV3 network structure is simple, detection efficiency is improved and devices with limited processing capability can also run the saliency detection model; in addition, after the first detection sub-network performs initial detection on the feature map, the second detection sub-network performs final detection on the initial result, which improves detection accuracy.
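The cascaded structure described above can be sketched in PyTorch as follows; the single-convolution heads are placeholders standing in for the two detection sub-networks, and multiplying the feature map by the initial saliency map follows the fusion described above. Everything else is an assumption rather than the exact architecture of the embodiment.

```python
import torch
import torch.nn as nn

class CascadedSaliencyHead(nn.Module):
    """Sketch of the two-stage (initial + final) saliency detection."""

    def __init__(self, backbone: nn.Module, channels: int):
        super().__init__()
        # `backbone` is assumed to be a MobileNetV3-style feature extractor
        # returning a CxHxW feature map; the heads below are simplified.
        self.backbone = backbone
        self.first_head = nn.Conv2d(channels, 1, kernel_size=1)
        self.second_head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        features = self.backbone(image)                       # feature extraction
        initial = torch.sigmoid(self.first_head(features))    # initial position info
        fused = features * initial                            # fuse by multiplication
        final = torch.sigmoid(self.second_head(fused))        # final predicted positions
        return final
```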
  • In some embodiments, before the first detection module 33 uses the saliency detection model to detect the filtered sample image and obtain the predicted position information of the salient region, the screening module 32 is further configured to perform data augmentation on the filtered sample image, where the data augmentation includes filling the background region of the sample image other than the salient region.
  • the applicability of the saliency detection model can be improved by performing data enhancement on the sample image.
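One possible form of the background-filling augmentation is sketched below; using a single constant fill value is only one of the options mentioned above, since different pixel positions may also be filled with different values.

```python
import numpy as np

def fill_background(image: np.ndarray, salient_mask: np.ndarray, value=0):
    """Data-augmentation sketch: fill everything outside the salient region.

    `salient_mask` is an HxW binary mask of the salient region; `value` is
    the fill value applied to the background. Returns a copy of the image
    with the background replaced.
    """
    augmented = image.copy()
    augmented[salient_mask == 0] = value
    return augmented
```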
  • FIG. 15 is a schematic structural diagram of an embodiment of a saliency detection device of the present application.
  • The saliency detection apparatus 40 includes a second acquisition module 41 and a second detection module 42.
  • the second acquisition module 41 is configured to acquire the image to be processed;
  • The second detection module 42 is configured to process the image to be processed by using the saliency detection model to obtain the predicted position information of the salient region in the content of the image to be processed, where the saliency detection model is trained by the above-mentioned training method of the saliency detection model.
  • the accuracy of obtaining the predicted position information about the salient region can be improved.
  • In some embodiments, the saliency detection apparatus further includes a functional module (not shown in the figure) configured to: use the predicted position information to perform skeleton extraction on the salient region to obtain a target bone; select a bone model for the target bone as the source bone; and migrate the first animation driving data related to the source bone onto the target bone to obtain the second animation driving data of the target bone.
  • the accuracy of the target skeleton can be improved by using the predicted position information to extract the skeleton of the salient region.
  • FIG. 16 is a schematic structural diagram of an embodiment of the electronic device of the present application.
  • The electronic device 50 includes a memory 51 and a processor 52. The processor 52 is configured to execute the program instructions stored in the memory 51, so as to implement the steps in any of the above embodiments of the training method for the saliency detection model and/or the steps in the embodiments of the saliency detection method.
  • the electronic device 50 may include but not limited to: medical equipment, microcomputers, desktop computers, and servers.
  • the electronic device 50 may also include mobile devices such as notebook computers and tablet computers, which are not limited here.
  • the processor 52 is used to control itself and the memory 51 to implement the steps in any one of the above embodiments of the method for training a saliency detection model.
  • the processor 52 may also be called a CPU (Central Processing Unit, central processing unit).
  • the processor 52 may be an integrated circuit chip with signal processing capability.
  • The processor 52 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the processor 52 may be jointly realized by an integrated circuit chip.
  • In the above solution, the target sample image of the preset image type is filtered according to whether the contour of its salient region is missing, so that the salient region in the retained sample images is relatively complete; training the saliency detection model with these retained high-quality sample images makes the detection results of the trained model on subsequent images more accurate.
  • FIG. 17 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
  • The computer-readable storage medium 60 stores program instructions 61 that can be executed by a processor, and the program instructions 61 are used to implement the steps in any of the above embodiments of the training method for the saliency detection model and/or the steps in the embodiments of the saliency detection method.
  • In the above solution, the target sample image of the preset image type is filtered according to whether the contour of its salient region is missing, so that the salient region in the retained sample images is relatively complete; training the saliency detection model with these retained high-quality sample images makes the detection results of the trained model on subsequent images more accurate.
  • the functions or modules included in the apparatus provided in the embodiments of the present application can be used to execute the methods described in the above method embodiments, and the implementation can refer to the descriptions of the above method embodiments.
  • the disclosed methods and devices may be implemented in other ways.
  • the device implementations described above are only illustrative.
  • The division into modules or units is only a logical functional division; in actual implementation there may be other division manners, for example units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units. If the integrated unit is realized in the form of a software function unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods in the embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • The embodiments of the present application disclose a saliency detection method and a training method for its model, as well as an apparatus, a device, a medium and a program. The training method for the saliency detection model includes: acquiring at least one sample image, where the at least one sample image includes a target sample image belonging to a preset image type; filtering the target sample image based on the absence of the contour of the salient region in the target sample image; detecting the filtered sample image by using the saliency detection model to obtain predicted position information of the salient region in the sample image; and adjusting the parameters of the saliency detection model based on the annotated position information and the predicted position information of the salient region in the sample image.
  • the accuracy of the output result of the saliency detection model can be improved by screening the sample images and then using the screened sample images to train the saliency detection model.

Abstract

一种显著性检测方法及其模型的训练方法和装置、设备、介质,显著性检测模型的训练方法包括:获取至少一张样本图像,其中,至少一张样本图像包括属于预设图像类型的目标样本图像(S11);基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤(S12);利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息(S13);基于样本图像关于显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数(S14)。上述方案,通过对样本图像进行筛选再利用筛选后的样本图像对显著性检测模型进行训练,能够提高显著性检测模型输出结果的准确度。

Description

显著性检测方法及其模型的训练方法和装置、设备、介质及程序
相关申请的交叉引用
本专利申请要求2021年06月30日提交的中国专利申请号为202110735893.4、申请人为深圳市慧鲤科技有限公司,申请名称为“显著性检测方法及其模型的训练方法和装置、设备、介质”的优先权,该申请文件以引用的方式并入本申请中。
技术领域
本申请涉及图像处理技术领域,特别是涉及一种显著性检测方法及其模型的训练方法和装置、设备、介质及程序。
背景技术
目前,在对模型进行训练的过程中,只是简单从样本图像数据库中获取一定数据的样本图像,并直接使用这部分样本图像对模型进行训练。但是有的样本图像本身存在一定的缺陷,若使用这部分样本图像对模型进行训练,会导致训练后的模型对图像进行处理得到的结果的准确度不高。
发明内容
本申请实施例至少提供一种显著性检测方法及其模型的训练方法和装置、设备、介质及程序。
本申请实施例提供了一种显著性检测模型的训练方法,包括:获取至少一张样本图像,其中,至少一张样本图像包括属于预设图像类型的目标样本图像;基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤;利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息;基于样本图像关于显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数。
因此,通过对获取到的预设图像类型的目标样本图像进行按照其显著性区域的轮廓缺失情况,对目标样本图像进行过滤,使得保留下的样本图像中显著性区域较为完整,进而利用这种保留下的质量较高的样本图像对显著性检测模型进行训练,可以使得训练得到的显著性检测模型后续对图像进行检测的结果更准确。
在一些实施例中,基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤,包括:对目标样本图像中显著性区域的轮廓进行填补,得到填补样本图像;获取填补样本图像与目标样本图像中关于显著性区域的差异;在差异满足预设要求的情况下,过滤目标样本图像。
因此,通过对样本图像按照轮廓缺失的情况进行过滤,使得留下的样本图像中显著性区域轮廓的质量更好。另外,通过获取填补样本图像与目标样本图像中关于显著性区域的差异能够较快的获取显著性区域的轮廓缺失情况。
在一些实施例中,预设要求为差异大于预设差异值;对目标样本图像中显著性区域的轮廓进行填补,得到填补样本图像,包括:对目标样本图像进行闭运算,得到填补样本图像;获取填补样本图像与目标样本图像中关于显著性区域的差异,包括:获取填补样本图像关于显著性区域的第一面积,以及目标样本图像中关于显著性区域的第二面积;将第一面积和第二面积之差,确定为差异。
因此,若目标样本图像中的显著性区域的轮廓存在较大的缺口,则填补前后的显著性区域的面积可能存在较大的差异,从而根据填补前后显著性区域的面积差,即可确定目标样本图像 中显著性区域的轮廓是否存在缺失。
在一些实施例中,在基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤之后,方法还包括:基于填补样本图像的显著性区域的位置信息,得到目标样本图像关于显著性区域的标注位置信息。
因此,通过填补样本图像的显著性区域的位置信息,确定目标样本图像关于显著性区域的标注位置信息,能够保障显著性区域的完整性。
在一些实施例中,至少一张样本图像包括多种图像类型。
因此,通过使用多种图像类型的样本图像对显著性检测模型进行训练,使得训练得到的显著性检测模型能够对多种类型的图像进行图像处理,从而提高了显著性检测模型的适用性。
在一些实施例中,多种图像类型包括对真实物体拍摄得到的图像、手绘图以及卡通图中的至少两种。
因此,通过将常见的图像类型对应的样本图像用于对图像处理模型进行训练,使得训练得到的图像处理模型在日常生活或工作中更为适用。
在一些实施例中,基于样本图像关于显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数,包括:基于样本图像关于显著性区域的标注位置信息和预测位置信息,获取样本图像中各像素的第一损失;将样本图像中各像素的第一损失进行加权,得到样本图像的第二损失;基于第二损失,调整显著性检测模型的参数。
因此,通过对各像素的第一损失进行加权,使得利用加权后的第二损失调整显著性检测模型的参数更准确。
在一些实施例中,像素的第一损失的权重与像素的边界距离相关,像素的边界距离为像素与真实显著性区域的边界之间的距离,真实显著性区域为样本图像中由标注位置信息定义的显著性区域。
因此,通过根据像素的边界距离确定权重,使得利用加权后的第二损失调整显著性检测模型的参数更准确。
在一些实施例中,像素的边界距离越小,像素的第一损失的权重越大。
因此,像素的边界距离与像素的第一损失的权重呈负相关,使得得到的第二损失更准确。
在一些实施例中,显著性检测模型至少包括以下至少之一:显著性检测模型为MobileNetV3的网络结构、显著性检测模型包括特征提取子网络和第一检测子网络和第二检测子网络;利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息,包括:利用特征提取子网络对样本图像进行特征提取,得到样本图像对应的特征图;利用第一检测子网络对特征图进行初始检测,得到样本图像中关于显著性区域的初始位置信息;将特征图和初始位置信息进行融合,得到融合结果;利用第二检测子网络对融合结果进行最终检测,得到样本图像的预测位置信息。
因此,因MobileNetV3的网络结构简单,通过使用MobileNetV3的网络结构,能够加快检测效率,而且可以使得处理能力较小的设备也可使用该显著性检测模型实现显著性检测;另,通过第一检测子网络对特征图进行初始检测之后,再使用第二检测子网络对初始检测结果进行最终检测,能够提高检测的准确度。
在一些实施例中,在利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息之前,方法还包括:对经过滤后的样本图像进行数据增强;其中,数据增强的方式包括对样本图像中除显著性区域以外的背景区域进行填充。
因此,通过对样本图像进行数据增强,能够提高显著性检测模型的适用性。
本申请实施例提供了一种显著性检测方法,包括:获取待处理图像;利用显著性检测模型对待处理图像进行处理,得到待处理图像内容中关于显著性区域的预测位置信息,其中,显著性检测模型是由上述显著性检测模型的训练方法训练得到的。
因此,通过使用显著性检测模型的训练方法训练得到的显著性检测模型对待处理图像进行检测,能够提高得到关于显著性区域的预测位置信息的准确度。
在一些实施例中,在利用显著性检测模型对待处理图像进行处理,得到待处理图像内容中关于显著性区域的预测位置信息之后,方法还包括:利用预测位置信息,对显著性区域进行骨骼提取,得到目标骨骼;为目标骨骼选择一骨骼模型作为源骨骼;将与源骨骼相关的第一动画 驱动数据迁移至目标骨骼上,得到目标骨骼的第二动画驱动数据。
因此,通过利用预测位置信息,对显著性区域进行骨胳提取,能够提高目标骨骼的准确度。
本申请实施例提供了一种显著性检测模型的训练装置,包括:第一获取模块,配置为获取至少一张样本图像,其中,至少一张样本图像包括属于预设图像类型的目标样本图像;筛选模块,配置为基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤;第一检测模块,配置为利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息;调整模块,配置为基于样本图像关于显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数。
在一些实施例中,筛选模块配置为基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤,包括:对目标样本图像中显著性区域的轮廓进行填补,得到填补样本图像;获取填补样本图像与目标样本图像中关于显著性区域的差异;在差异满足预设要求的情况下,过滤目标样本图像。
在一些实施例中,预设要求为差异大于预设差异值;筛选模块配置为对目标样本图像中显著性区域的轮廓进行填补,得到填补样本图像,包括:对目标样本图像进行闭运算,得到填补样本图像;获取填补样本图像与目标样本图像中关于显著性区域的差异,包括:获取填补样本图像关于显著性区域的第一面积,以及目标样本图像中关于显著性区域的第二面积;将第一面积和第二面积之差作为差异。
在一些实施例中,在基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤之后,筛选模块还配置为:基于填补样本图像的显著性区域的位置信息,得到目标样本图像关于显著性区域的标注位置信息。
在一些实施例中,至少一张样本图像包括多种图像类型。
在一些实施例中,多种图像类型包括对真实物体拍摄得到的图像、手绘图以及卡通图中的至少两种。
在一些实施例中,调整模块配置为基于样本图像关于显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数,包括:基于标注位置信息和预测位置信息,获取样本图像中各像素的第一损失;将样本图像中各像素的第一损失进行加权,得到样本图像的第二损失;基于第二损失,调整显著性检测模型的参数。
在一些实施例中,像素的第一损失的权重与像素的边界距离相关,像素的边界距离为像素与真实显著性区域的边界之间的距离,真实显著性区域为样本图像中由标注位置信息定义的显著性区域。
在一些实施例中,像素的边界距离越小,像素的第一损失的权重越大。
在一些实施例中,显著性检测模型至少包括以下至少一个:显著性检测模型为MobileNetV3的网络结构、显著性检测模型包括特征提取子网络和第一检测子网络和第二检测子网络;第一检测模块配置为利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息,包括:利用特征提取子网络对样本图像进行特征提取,得到样本图像对应的特征图;利用第一检测子网络对特征图进行初始检测,得到样本图像中关于显著性区域的初始位置信息;将特征图和初始位置信息进行融合,得到融合结果;利用第二检测子网络对融合结果进行最终检测,得到样本图像的预测位置信息。
在一些实施例中,第一检测模块配置为在利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息之前,筛选模块还配置为:对经过滤后的样本图像进行数据增强;其中,数据增强的方式包括对样本图像中除显著性区域以外的背景区域进行填充。
本申请实施例提供了一种显著性检测装置,包括:第二获取模块,配置为获取待处理图像;第二检测模块,配置为利用显著性检测模型对待处理图像进行处理,得到待处理图像内容中关于显著性区域的预测位置信息,其中,显著性检测模型是由上述显著性检测模型的训练方法训练得到的。
在一些实施例中,在利用显著性检测模型对待处理图像进行处理,得到待处理图像内容中关于显著性区域的预测位置信息之后,显著性检测装置还包括功能模块,功能模块配置为:利用预测位置信息,对显著性区域进行骨骼提取,得到目标骨骼;为目标骨骼选择一骨骼模型作 为源骨骼;将与源骨骼相关的第一动画驱动数据迁移至目标骨骼上,得到目标骨骼的第二动画驱动数据。
本申请实施例提供了一种电子设备,包括存储器和处理器,处理器用于执行存储器中存储的程序指令,以实现上述显著性检测模型的训练方法和/或显著性检测方法。
本申请实施例提供了一种计算机可读存储介质,其上存储有程序指令,程序指令被处理器执行时实现上述显著性检测模型的训练方法和/或显著性检测方法。
本公开实施例还提供一种计算机程序,所述计算机程序包括计算机可读代码,在所述计算机可读代码在电子设备中运行的情况下,所述电子设备的处理器执行上述任一实施例所述的显著性检测模型的训练方法和/或显著性检测方法。
本申请实施例至少提供一种显著性检测方法及其模型的训练方法和装置、设备、介质及程序,通过对获取到的预设图像类型的目标样本图像进行按照其显著性区域的轮廓缺失情况,对目标样本图像进行过滤,使得保留下的样本图像中显著性区域较为完整,进而利用这种保留下的质量较高的样本图像对显著性检测模型进行训练,可以使得训练得到的显著性检测模型后续对图像进行检测的结果更准确。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本申请。
为使本申请的上述目的、特征和优点能更明显易懂,下文特举较佳实施例,并配合所附附图,作详细说明如下。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,这些附图示出了符合本申请的实施例,并与说明书一起用于说明本申请的技术方案。
图1是本申请实施例的显著性检测模型的训练方法一实施例的流程示意图;
图2为可以应用本申请实施例的显著性检测模型的训练方法的系统架构示意图;
图3是本申请显著性检测模型的训练方法一实施例中示出对目标拍摄得到的图像的示意图;
图4是本申请显著性检测模型的训练方法一实施例中示出的手绘图的示意图;
图5是本申请显著性检测模型的训练方法一实施例中示出的卡通图的示意图;
图6是本申请显著性检测模型的训练方法一实施例中示出显著性区域存在缺失的手绘图的示意图;
图7是本申请显著性检测模型的训练方法一实施例中示出填补后的手绘图的示意图;
图8是本申请显著性检测模型的训练方法一实施例示出样本图像的示意图;
图9是本申请显著性检测模型的训练方法一实施例示出显著图的示意图;
图10是本申请显著性检测方法一实施例的流程示意图;
图11是本申请显著性检测方法一实施例示出映射关系的第一示意图;
图12是本申请显著性检测方法一实施例示出映射关系的第二示意图;
图13是本申请显著性检测方法一实施例示出映射关系的第三示意图;
图14是本申请显著性检测模型的训练装置一实施例的结构示意图;
图15是本申请显著性检测装置一实施例的结构示意图;
图16是本申请电子设备一实施例的结构示意图;
图17是本申请计算机可读存储介质一实施例的结构示意图。
具体实施方式
下面结合说明书附图,对本申请实施例的方案进行详细说明。
以下描述中,为了说明而不是为了限定,提出了诸如特定系统结构、接口、技术之类的具体细节,以便透彻理解本申请。
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本 文中字符“/”,一般表示前后关联对象是一种“或”的关系。此外,本文中的“多”表示两个或者多于两个。另外,本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合,例如,包括A、B、C中的至少一种,可以表示包括从A、B和C构成的集合中选择的任意一个或多个元素。
本申请可应用于具备图像处理能力的设备。此外,该设备可以具备图像采集或是视频采集功能,比如,该设备可以包括诸如摄像头等用于采集图像或是视频的部件。或是该设备可以通过与其他设备进行数据传输或是数据交互的方式,以从其他设备中获取所需的视频流或是图像,或是从其他设备的存储资源中访问所需的视频流或是图像等。其中,其他设备具备图像采集或是视频采集功能,且与该设备之间具备通信连接,比如,该设备可以与其他设备之间通过蓝牙、无线网络等方式进行数据传输或是数据交互,在此对于二者之间的通信方式不予限定,可以包括但不限于上述例举的情况。在一种实现方式中,该设备可以包括手机、平板电脑、可交互屏幕等,在此不予限定。
请参阅图1,图1是本申请实施例的显著性检测模型的训练方法一实施例的流程示意图。所述显著性检测模型的训练方法可以包括如下步骤:
步骤S11:获取至少一张样本图像,其中,至少一张样本图像包括属于预设图像类型的目标样本图像。
至少一张可以是一张及以上。获取样本图像的方式有多种。例如,获取样本图像在执行本训练方法的执行设备中的存储位置,然后通过访问该存储位置以获得样本图像,或者通过蓝牙、无线网络等传输方式从其他设备中获取样本图像。
步骤S12:基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤。
其中,如果目标样本图像中显著性区域的轮廓缺失的情况满足删除条件,则将该目标样本图像从样本图像中删除。目标样本图像中显著性区域的轮廓缺失的情况不满足删除条件,则将该目标样本图像保留在样本图像中。其中,轮廓缺失较为严重,则进行删除,若较为轻微,则保留。其中,严重或轻微的认定,可根据具体情况认定,此处不做具体规定。
步骤S13:利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息。
其中,显著性检测模型可以同时对各样本图像进行处理,得到一个批次的预测结果,也可以分时对各样本图像进行处理,分别得到各样本图像对应的预测结果。
步骤S14:基于样本图像关于显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数。
其中,可以根据显著性区域的标注位置信息与预测位置信息之间的损失,调整显著性检测模型的参数。
上述方案,通过对获取到的预设图像类型的目标样本图像进行按照其显著性区域的轮廓缺失情况,对目标样本图像进行过滤,使得保留下的样本图像中显著性区域较为完整,进而利用这种保留下的质量较高的样本图像对显著性检测模型进行训练,可以使得训练得到的显著性检测模型后续对图像进行检测的结果更准确。
图2为可以应用本申请实施例的显著性检测模型的训练方法的系统架构示意图;如图2所示,该系统架构中包括:样本图像获取终端201、网络202和控制终端203。为实现支撑一个示例性应用,样本图像获取终端201和控制终端203通过网络202建立通信连接样本图像获取终端201通过网络202向控制终端203上报至少一张样本图像,控制终端203响应至少一张样本图像中的目标样图像,并基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤,再利用显著性检测模型对经过滤后的所述样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息;最后基于样本图像关于所述显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数。最后,控制终端203将调整后的参数上传至网络202,并通过网络202发送给样本图像获取终端201。
作为示例,样本图像获取终端201可以包括图像采集设备,控制终端203可以包括具有视觉信息处理能力的视觉处理设备或远程服务器。网络202可以采用有线或无线连接方式。其中,当控制终端203为视觉处理设备时,样本图像获取终端201可以通过有线连接的方式与视觉处理设备通信连接,例如通过总线进行数据通信;当控制终端203为远程服务器时,样本图像获 取终端201可以通过无线网络与远程服务器进行数据交互。
或者,在一些场景中,样本图像获取终端201可以是带有视频采集模组的视觉处理设备,可以是带有摄像头的主机。这时,本申请实施例的图像优化模型的训练方法可以由样本图像获取终端201执行,上述系统架构可以不包含网络202和控制终端203。
一些公开实施例中,至少一张样本图像包括多种图像类型。例如,包括两种、三种或三种以上等等。通过使用多种图像类型的样本图像对显著性检测模型进行训练,使得训练得到的显著性检测模型能够对多种类型的图像进行图像处理,从而提高了显著性检测模型的适用性。可选地,图像类型包括对目标拍摄得到的图像、手绘图以及卡通图中的至少两种。对目标拍摄得到的图像又可分为可见光图像以及红外图像等。手绘图可以是在纸上手绘的图,并对其拍摄得到手绘图,还可以是在绘图软件上绘制的图,例如,画师在手绘板上画制的简易米老鼠。本申请实施例中,手绘图进一步限定为预设背景颜色以及预设前景颜色的图,以及前景是由单色的线条构成,例如,背景为白色,前景是由黑色线条构成的米老鼠的轮廓。卡通图可以是具备多种前景颜色的虚拟图像。
为更好地理解本申请实施例所述的对目标拍摄得到的图像、手绘图以及卡通图,请同时参考图3至图5,图3是本申请显著性检测模型的训练方法一实施例中示出对目标拍摄得到的图像的示意图,图4是本申请显著性检测模型的训练方法一实施例中示出的手绘图的示意图,图5是本申请显著性检测模型的训练方法一实施例中示出的卡通图的示意图。如图3所示,图3是对真实存在的苹果拍摄得到的图像,图4是在真实的纸上绘制的苹果草图,图5是苹果的卡通形象。通过将常见的图像类型对应的样本图像用于对显著性检测模型进行训练,使得训练得到的显著性检测模型在日常生活或工作中更为适用。本申请实施例中,选择使用一万张上下的对目标拍摄得到的图像、两万张上下的手绘图以及两万张上下的卡通图进行训练。
一些公开实施例中,预设图像类型为手绘图。由于手绘图在绘制过程中很可能出现断点,通过对手绘图按照轮廓缺失的情况进行过滤,使得留下的手绘图中显著性区域轮廓的质量更好。其中,基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤的方式可以是:对目标样本图像中显著性区域的轮廓进行填补,得到填补样本图像。然后,获取填补样本图像与目标样本图像中关于显著性区域的差异。其中,若目标样本图像中显著性区域的轮廓不存在缺失或缺失较小,则填补样本图像与填补前的目标样本图像中的显著性区域相同或差异在预设范围内。若目标样本图像中显著性区域的轮廓存在较大缺失,则填补样本图像与填补前的目标样本图像中的显著性区域之间的差异较大。在差异满足预设要求的情况下,过滤目标样本图像。通过获取填补样本图像与目标样本图像中关于显著性区域的差异能够较快的获取显著性区域的轮廓缺失情况。其中,因为需要去掉样本图像中,显著性区域存在缺陷的目标样本图像,所以,本申请实施例中,预设要求为该差异大于预设差异值。
为更好地理解存在缺失的显著性区域的手绘图和填补之后的手绘图之间的差异,请参考图6和图7,图6是本申请显著性检测模型的训练方法一实施例中示出显著性区域存在缺失的手绘图的示意图,图7是本申请显著性检测模型的训练方法一实施例中示出填补后的手绘图的示意图。
如图6和图7所示,填补前的手绘图中显著性区域的轮廓为圆弧,两个端点与圆心的夹角为45°,显著性区域的面积可以是将缺口用线段进行连接,得到小于整圆的面积,而填补后显著性区域的轮廓为整圆。显著性区域的面积即为整圆的面积。很明显,填补后的显著性区域的面积与填补前的显著性区域的面积相差较大,此时,可以将填补前的手绘图去除,不让其参与模型的训练。
其中,对目标样本图像中显著性区域的轮廓进行填补,得到填补样本图像的方式可以是:对目标样本图像进行闭运算,得到填补样本图像。其中,闭运算指的是先对目标样本图像进行膨胀运算,再进行腐蚀运算或缩放运算。其中,闭运算能够小湖(即小孔),弥合小裂缝,而总的位置和形状不变。通过膨胀运算能够使得显著性区域的轮廓缺口弥合,通过缩放运算能够减少显著性区域的轮廓的厚度。如上述,手绘图可以是白底黑线条的形式,其中,手绘图的显著性区域为黑线条包围的区域,而显著性区域的轮廓即为黑色线条。对目标样本图像进行闭运算例如可以是对显著性区域的轮廓进行闭运算。也就是先对黑色线条进行膨胀,再对膨胀之后的黑线条进行缩放或腐蚀,使得填补样本图像中显著性区域的轮廓粗细与填补前目标样本图像中 显著性区域的轮廓粗细相同或差异在预设范围内。通过此种方式,使得在获取填补样本图像与目标样本图像中关于显著性区域的差异的过程中,可以忽略二者之间的轮廓差异。
其中,获取填补样本图像与目标样本图像中关于显著性区域的差异的方式可以是获取填补样本图像关于显著性区域的第一面积,以及目标样本图像中关于显著性区域的第二面积。一般获取区域面积的方式均可,此处不对获取显著性区域的面积的方式做具体限定。例如,获取第二面积的方式可以是使用线段连接轮廓缺口两端,形成封闭区域,从而计算封闭区域的面积,当然,还可以是以缺口两端分别作为原点,分别画横向和纵向两条直线,四条直线可能存在两个的交点。分别计算每个交点连接的两条直线与显著性区域形成的封闭区域的面积,将较小的封闭区域的面积作为第二面积。将第一面积和第二面积之差作为差异。例如,将第二面积减去第一面积的差作为填补样本图像与目标样本图像中关于显著性区域的差异。一些公开实施例中,可以将填补前后显著性区域的轮廓所占面积之差作为差异。若目标样本图像中的显著性区域的轮廓存在较大的缺口,则填补前后的显著性区域的面积可能存在较大的差异,从而根据填补前后显著性区域的面积差,即可确定目标样本图像中显著性区域的轮廓是否存在缺失。
一些公开实施例中,对目标样本图像进行过滤之后,显著性检测模型的训练方法还包括以下步骤:基于填补样本图像的显著性区域的位置信息,得到目标样本图像关于显著性区域的标注位置信息。例如,获取填补样本图像的显著性区域的轮廓,作为目标样本图像关于显著性区域的轮廓的标注位置信息。以及,将轮廓及其包围的区域作为显著性区域。通过填补样本图像的显著性区域的位置信息,确定目标样本图像关于显著性区域的标注位置信息,能够保障显著性区域的完整性。
一些公开实施例中,在利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息之前,显著性检测模型的训练方法还包括以下步骤:对经过滤后的样本图像进行数据增强。其中,数据增强的方式有多种,例如包括对样本图像中除显著性区域以外的背景区域进行填充。其中,可以使用预设像素值进行填充。例如,统一使用0像素进行填充,或统一使用其他像素值进行填充。当然,不同的像素位置还可以使用不同的像素值进行填充,关于填充方式此处不做具体规定。一些公开实施例中,数据增强的方式还可以是增加噪声、高斯模糊处理、裁剪以及旋转中的至少一种。其中,高斯模糊处理又可称之为高斯平滑,主要作用就是减少图像噪声以及降低细节层次,主要的做法是根据高斯曲线调节像素色值,有选择地模糊图像。裁剪,指的是将训练样本图像裁剪为不同大小的图像,例如将训练样本图像裁剪成尺寸为1024*2048或512*512大小的图像,当然,这尺寸仅是举例,在其他实施例中完全可以采取裁剪为其他尺寸的图像,因此,关于裁剪的尺寸此处不做具体规定。旋转可以是将训练样本图像旋转90°、180°或270°。当然,在其他实施例中,数据增强方式还可以是调整分辨率等。通过对样本图像进行数据增强,能够提高显著性检测模型的适用性。
一些公开实施例中,显著性检测模型为MobileNetV3的网络结构。其中,显著性检测模型包括特征提取子网络和第一检测子网络和第二检测子网络。其中,第一检测子网络和第二检测子网络采用级联结构。即,第一检测子网络的输出作为第二检测子网络的输入。在一些实施例中,第一检测子网络和第二检测子网络的结构相同。其中,利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息的方式可以是:利用特征提取子网络对样本图像进行特征提取,得到样本图像对应的特征图。然后在利用第一检测子网络对特征图进行初始检测,得到样本图像中关于显著性区域的初始位置信息。其中,初始位置信息可以是以显著图的形式呈现。然后将特征图和初始位置信息进行融合,得到融合结果。例如,融合的方式可以是将特征图与初始位置信息做乘法操作,得到融合结果。再利用第二检测子网络对融合结果进行最终检测,得到样本图像的预测位置信息。最终的预测位置信息也可以显著图的形式呈现。为更好地理解显著图,请参见图8和图9,图8是本申请显著性检测模型的训练方法一实施例示出样本图像的示意图,图9是本申请显著性检测模型的训练方法一实施例示出显著图的示意图。如图8和图9所示,样本图像中包括一张桌子以及位于桌子上的玩具鸭,显著性检测模型对样本图像进行检测,输出的初始位置信息(显著图)如图9所示,玩具鸭所在位置的像素值为1,其余位置的像素值为0。由此,可以清楚地得到玩具鸭在样本图像中的位置。因MobileNetV3的网络结构简单,通过使用MobileNetV3的网络结构,能够加快检测效率,而且可以使得处理能力较小的设备也可使用该显著性检测模型实现显著性检测;另,通 过第一检测子网络对特征图进行初始检测之后,再使用第二检测子网络对初始检测结果进行最终检测,能够提高检测的准确度。
一些公开实施例中,分别利用显著性检测模型对样本图像进行处理,得到样本图像中关于显著性区域的预测位置信息,基于样本图像关于显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数的方式包括:
从多张样本图像中选择若干样本图像作为当前样本图像。其中,若干指的是1及以上。也就是,这里可以从多张样本图像中选择其中一张样本图像作为当前样本图像,也可以是选择两张及以上的样本图像作为当前样本图像。在一些实施例中,选择出的若干样本图像所属的图像类型包含多张样本图像的所有图像类型。例如,在多张样本图像的图像类型一共包括上述三种图像类型时,从多张样本图像中选择出的若干张样本图像也包含上述三种图像类型。其中,每种图像类型的样本图像的数量可以相同,也可以是不同。然后,利用显著性检测模型对当前样本图像进行处理,得到当前样本图像的预测结果。例如,将当前样本图像作为一个批次,利用显著性检测模型对这一个批次的样本图像进行处理,得到一个批次的预测结果。再基于当前样本图像的标注结果和预测结果,调整显著性检测模型的参数。可选地,可以使用分别利用一个批次中各个标注结果与其对应的预测结果之间的损失对模型的参数进行调整,这种方式需要对参数调整若干次,还可以是结合各标注结果与其对应的预测结果之间的损失对模型的参数进行调整,这种方式只需要对模型的参数调整一次。重复执行从多张样本图像选择若干样本图像作为当前样本图像以及后续步骤,直到显著性检测模型满足预设要求。其中,这里的预设要求可以是模型给出的预测结果与标注结果之间的误差大小。具体误差大小根据实际需求确定,此处不做规定。可选地,每次从多张样本图像中选择的若干样本图像可以与上一次选择的部分样本图像相同。另一些公开实施例中,每次从多张样本图像中选择的若干样本图像均不相同。从多张样本图像中选择若干样本图像作为当前样本图像,并利用显著性检测模型对当前样本图像进行处理,能够提高训练速度。
一些公开实施例中,样本图像的标注信息还包括样本图像的真实图像类型,样本图像的预测结果包括样本图像的预测图像类型。其中,在显著性检测模型为目标分类模型的情况下,显著性检测模型的预测结果包括目标的预测类别以及样本图像的预测图像类型。在显著性检测模型为显著性检测模型的情况下,预测位置信息为样本图像中目标的预测类别以及样本图像的预测图像类型。通过使用关于样本图像的内容的标注位置信息与其内容的预测位置信息,和/或样本图像的真实图像类型以及样本图像的预测图像类型,对显著性检测模型的参数进行调整,使得调整之后的显著性检测模型的适用性更强。
一些公开实施例中,基于样本图像关于显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数的方式可以是:基于标注位置信息和预测位置信息,获取样本图像中各像素的第一损失。将样本图像中各像素的第一损失进行加权,得到样本图像的第二损失。基于第二损失,调整显著性检测模型的参数。获取第一损失的方式可以是将标注位置信息与预测位置信息进行作差,得到第一损失。通过对各像素的第一损失进行加权,使得利用加权后的第二损失调整显著性检测模型的参数更准确。
其中,像素的第一损失的权重与像素的边界距离相关。像素的边界距离为像素与真实显著性区域的边界之间的距离,真实显著性区域为样本图像中由标注位置信息定义的显著性区域。其中,这里的像素与真实显著性区域的边界之间的距离可以是与显著性区域的边界最小距离。例如,样本图像的左上角的像素位置为(0,0),真实显著性区域的边界包括(0,1)、(0,2)等,该像素位置与真实显著性区域的边界之间的距离为1。通过根据像素的边界距离确定权重,使得利用加权后的第二损失调整显著性检测模型的参数更准确。
在一些实施例中,像素点的边界距离越小,像素的第一损失的权重越大。即,像素点的第一损失的权重与像素点的边界距离呈负相关。像素的边界距离与像素的第一损失的权重呈负相关,使得得到的第二损失更准确。
一些公开实施例中,基于样本图像关于显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数的方式可以是:基于真实图像类型和预测图像类型,得到第三损失。然后,基于第二损失和第三损失,调整显著性检测模型的参数。例如,基于真实图像类型和预测图像类型之间的误差,得到第三损失。例如,通过结合一个批次的预测位置信息与对应的标注 信息之间的误差,确定一个第二损失,以及结合一个批次的预测图像类型与真实的图像类型之间的误差,确定一个第三损失。结合第二损失和第三损失,调整显著性检测模型的参数。通过使用关于样本图像的内容的标注位置信息与其内容的预测位置信息之间的第二损失以及基于真实图像类型和预测图像类型的第三损失,调整显著性检测模型的参数,能够提高显著性检测模型的适用性。
例如,第二损失对模型的参数进行优化,使得显著性检测模型得到的预测位置信息更接近标注位置信息,也就是二者之间的误差变小。通过使用第三损失对模型的参数进行调整,使得表示同一物体但属于不同图像类型的图像的特征向量在特征空间中的距离更接近,从而使得不同图像类型的的图像的特征向量都在距离较近的特征空间中。例如,训练得到的显著性检测模型对表示苹果的手绘图、卡通图以及对苹果进行拍摄得到的图像进行特征提取得到的特征向量在特征空间的距离更为接近。
一些公开实施例中,基于第二损失和第三损失,调整显著性检测模型的参数的方式可以是:获取第二损失与第三损失之间的损失差。然后利用损失差和第三损失,对显著性检测模型的参数进行调整。例如,该损失差为第二损失和第三损失作差得到。利用第二损失差和第三损失差,对显著性检测模型的参数进行调整可以是先使用其中一个损失对模型的参数进行调整,再使用另一个损失对模型的参数进行调整。通过使用第二损失和第三损失的损失差以及第三损失对显著性检测模型的参数进行调整,能够提高显著性检测模型的适用性。
一些公开实施例中,显著性检测模型还包括图像类型分类子网络。
其中,图像类型分类子网络连接特征提取子网络。利用图像类型分类网络对样本图像进行图像类型分类,得到样本图像的预测图像类型。在一些实施例中,将特征提取子网络提取得到的特征图输入图像类型分类网络,得到关于样本图像的预测图像类型。其中,利用损失差和第三损失,对显著性检测模型的参数进行调整的方式可以是:利用第三损失对图像类型分类子网络的参数进行调整。以及利用损失差,对特征提取子网络、第一检测子网络及第二检测子网络的参数进行调整。使用损失差和第三损失对参数进行调整的方式均为正向调整。通过使用损失差对显著性检测模型中的特征提取子网络、第一检测子网络及第二检测子网络进行调整,使得显著性检测模型得到的关于样本图像的内容的预测位置信息更准确,以及使用第三损失对图像类型分类网络的参数进行调整,能够提高图像类型分类网络的准确度。
一些公开实施例中,训练得到的显著性检测模型能够部署到手机端,AR/VR端进行图像处理。显著性检测方法还可应用于拍照、视频录制滤镜等软件中。
上述方案,通过对获取到的预设图像类型的目标样本图像进行按照其显著性区域的轮廓缺失情况,对目标样本图像进行过滤,使得保留下的样本图像中显著性区域较为完整,进而利用这种保留下的质量较高的样本图像对显著性检测模型进行训练,可以使得训练得到的显著性检测模型后续对图像进行检测的结果更准确。
其中,显著性检测模型的训练方法的执行主体可以是显著性检测模型的训练装置,例如,显著性检测模型的训练方法可以由终端设备或服务器或其它处理设备执行,其中,终端设备可以为用户设备(User Equipment,UE)、移动设备、用户终端、终端、蜂窝电话、无绳电话、个人数字处理(Personal Digital Assistant,PDA)、手持设备、计算设备、车载设备、可穿戴设备等。在一些可能的实现方式中,该显著性检测模型的训练方法可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。
请参见图10,图10是本申请显著性检测方法一实施例的流程示意图。如图10所示,本申请实施例提供的显著性检测方法包括以下步骤:
步骤S21:获取待处理图像。
其中,获取待处理图像的方式有多种,例如,通过执行显著性检测方法的执行设备中的摄像组件进行拍摄得到,也可以是根据各种通信方式从其他设备中获取待处理图像。其中,待处理图像的图像类型可以是多种图像类型中的一种。例如,待处理图像的图像类型可以是对目标拍摄得到的图像、手绘图卡通图中的一种或多种。一些公开实施例中,还可从视频中获取待处理图像。例如,将一段视频输入显著性检测模型,显著性检测模型获取视频中的每一帧视频帧,并将每一帧视频帧作为待处理图像。
步骤S22:利用显著性检测模型对待处理图像进行处理,得到关于待处理图像的内容中关 于显著性区域的预测位置信息,其中,显著性检测模型是显著性检测模型的训练方法训练得到的。
本申请实施例中的显著性检测模型包括特征提取子网络、第一检测子网络以及第二检测子网络。其中,该显著性检测模型利用了多种图像类型的样本图像进行训练。例如,将待处理图像从显著性检测模型的输入端输入该显著性检测模型。显著性检测模型对待处理图像进行处理得到待处理图像内容中关于显著性区域的预测位置信息。
上述方案,通过使用上述显著性检测模型的训练方法训练得到的显著性检测模型对待处理图像进行处理,能够提高图像处理的准确度。
一些公开实施例中,利用显著性检测模型对待处理图像进行处理,得到待处理图像内容中关于显著性区域的预测位置信息之后,显著性检测方法还包括以下至少步骤:
1、在显示待处理图像的界面上显示预测位置信息。其中,显示的方式有多种,例如将预测位置信息标注在待处理图像上,以便将待处理图像和对应的预测位置信息一起在显示界面上显示,当然,还可以是在显示界面的不同区域分别显示待处理图像和对应的预测位置信息。一些公开实施例中,若待处理图像为两个及以上时,可以在显示界面的不同区域显示对应的待处理图像及其预测位置信息,或者以翻页的形式显示待处理图像及其预测位置信息。其中,在待处理图像是从视频中获取时,判断连续预设数量帧的视频帧的预测位置信息是否相同,若是,则认为预测位置信息正确。若否,则认为预测位置信息不正确。其中,可以选择将正确的预测位置信息输出,将错误的预测位置信息不输出,也可以选择将正确和错误的预测位置信息进行对应的批注,并输出。其中,预设数量帧可以是5帧、10帧等等,可根据具体使用场景确定。
2、利用预测位置信息,对显著性区域进行骨骼提取,得到目标骨骼。以及,为目标骨骼选择一骨骼模型作为源骨骼。其中,源骨骼上设置有动画驱动数据。然后将与源骨骼相关的第一动画驱动数据迁移至目标骨骼上,得到目标骨骼的第二动画驱动数据。其中,目标骨骼是基于待处理图像中目标进行骨骼提取得到的。
一些公开实施例中,利用预测位置信息,对显著性区域进行骨骼提取,得到目标骨骼的步骤可以是:对显著性区域进行轮廓提取,得到目标的轮廓,然后利用该轮廓,为目标生成三维网格模型。最后,从三维网格模型中提取得到目标骨骼。
获取源骨骼的方式可以是:对待处理图像进行分类,得到目标对象的类别,并选择与类别匹配的骨骼模型作为源骨骼。其中,目标骨骼为目标对象的骨骼。例如,本申请实施例可以采用预测标签映射,也可以采用数据集标签映射。预测标签映射对目标对象的分类结果包括目标对象的预测骨骼拓扑结构类型,例如预测骨骼拓扑结构类型包括二足、四足等等。也就是,预测标签映射的过程主要是预测目标对象的骨骼拓扑结构特点。数据集标签映射的分类结果需要给出输入图像中目标对象的具体种类,例如目标对象为猫、狗、大熊猫、狗熊等等。本申请实施例选择采用预测标签映射,具体应用过程中,若目标对象为大熊猫,而预测标签映射给出的目标对象类别为四足,并选择与类别匹配的骨骼模型作为初始源骨骼,若选择的初始源骨骼为四足的狗熊。虽然大熊猫和狗熊不同,但是他们实际上具有大致相同的骨骼拓扑结构,因此,将狗熊的动画驱动数据迁移到大熊猫上也能够以自然合理的形式出现。也就是通过预测标签映射虽然无法得到完全正确的目标对象的类别,但是也不影响对最终目标骨骼的驱动。同时,因为预测标签映射没有进一步获知目标对象的具体类别,从而降低了计算成本。
确定与目标骨骼匹配的源骨骼后,将源骨骼与目标骨骼进行之间进行骨骼节点映射,得到二者之间的节点映射关系。一些公开实施例中,得到二者之间的节点映射关系的方式可以是:确定源骨骼和目标骨骼中各节点所在的骨骼分支数量。按照骨骼分支数量从多到少的顺序,依序对源骨骼和目标骨骼中的节点进行映射。其中,所在的骨骼分支数量最多的节点一般称之为根节点。其中,暂且将节点所在的骨骼分支数量称之为度数。也就是先构建两个骨骼中度数较大的节点之间的映射关系,再构建度数较少的节点之间的映射关系。又或者,可以采用骨骼分支映射误差值最小的原则进行映射。其中,如果源骨骼和目标骨骼之间的节点数不同,则选择成本最低的最小多对一映射。例如,可以通过在发生多对一或跳过映射的序列中执行一对一的联合匹配的方式进行映射。
一些公开实施例中,最终的目标骨骼与源骨骼的节点拓扑结构一致。或,最终目标骨骼与最终源骨骼之间的节点一一映射。也就是,最终的目标骨骼与最终的源骨骼的节点拓扑结构可 能存在两种形式,一种是最终的目标骨骼与最终的源骨骼的节点拓扑结构完全一致,另一种是最终的目标骨骼中的节点均有最终的源骨骼的节点与之对应,但是最终的源骨骼中存在一些没有构建映射关系的节点。即,需要保证在动画迁移后,最终的目标骨骼的节点上均有对应的动画驱动数据。
在获得二者之间的节点映射关系之后,进行拓扑结构对齐以及节点对齐。
其中,进行拓扑结构对齐的方式可以包括以下至少一步:
一是在源骨骼和目标骨骼之间存在多个节点映射于同一节点的情况下,更新其中一个骨骼的节点拓扑结构。其中,经更新之后的两个骨骼之间的节点一一映射。通过更新骨骼的节点拓扑结构能够使得两个骨骼之间的多个节点映射于同一节点的情况调整为两个骨骼之间的节点一一映射,以减少后续动画驱动最终目标骨骼的过程中出现不合理的情况出现。
其中,更新其中一个骨骼的节点拓扑结构又可分为多种情况:第一种情况是在多个节点位于同一骨骼分支的情况下,更新多个节点所在的第一骨骼。其中,第一骨骼和第二骨骼中的其中一个为源骨骼,另一个为目标骨骼。通过更新多个节点所在的第一骨骼,使得两个骨骼之间的多个节点映射于同一节点的情况调整为两个骨骼之间的节点一一映射,进而减少后续动画驱动最终目标骨骼的过程中出现不合理的情况出现。可选地,更新多个节点所在的第一骨骼的方式可以是将第一骨骼中的多个节点合并为一个第一节点。其中,第一节点保留合并前多个节点的映射关系。并且,第一节点的位置取所有被合并节点的位置的平均值。
同时参见图11,图11是本申请显著性检测方法一实施例示出映射关系的第一示意图。如图11所示,目标骨骼中的第二个节点和第三个节点同时映射于源骨骼中的第二个节点时。在这种情况下,将目标骨骼中的第二个节点和第三个节点进行合并为一个第一节点。其中,第一节点的位置取目标骨骼中第二个节点和第三个节点的位置的平均值。其中,当第一骨骼为源骨骼时,因为源骨骼中的节点携带有动画驱动数据,所以当节点合并之后,需要获取第一节点的动画驱动数据,此时,可以将被合并的所有节点的动画驱动数据进行合并。例如,动画驱动数据一般可以用矩阵表示,矩阵的合并可以用矩阵乘法表示,即将动画驱动数据进行相乘,即可得到第一节点的动画驱动数据。第二种情况是在多个节点位于不同骨骼分支的情况下,更新不包括多个节点的第二骨骼。其中,第一骨骼和第二骨骼中的其中一个为源骨骼,另一个为目标骨骼。可选地,在第一骨骼中查找出多个节点所在的骨骼分支汇合的第二节点。具体做法可以是依次父节点遍历,从而得到第二节点。并在第二骨骼中查找出映射于第二节点的第三节点。然后找到多个节点对应的节点拓扑结构,在第三节点处新增至少一条骨骼分支。本申请实施例中,一个节点的父节点指的是在一条骨骼分支中,与该节点相邻且比该节点更靠近根节点的节点。其中,多个节点与第三节点处新增的骨骼分支和原始的骨骼分支中的节点一一映射。其中,新增的骨骼分支可以是复制原始的骨骼分支。复制的内容包括动画数据、以及该节点与其父节点之间的变换关系。例如,原始的骨骼分支中包括三个节点,则新增的骨骼分支中也包括三个节点,且新增的骨骼分支中的三个节点的动画驱动数据是通过复制原始的骨骼分支中对应节点的动画数据得到。
同时参见图12,图12是本申请显著性检测方法一实施例示出映射关系的第二示意图。如图12所示,左边的节点拓扑结构为源骨骼的节点拓扑结构,右边的节点拓扑结构为目标骨骼的节点拓扑结构。图12中,目标骨骼的第一个节点映射于源骨骼的第一个节点,目标骨骼的第二个节点映射于源骨骼的第二个节点,目标骨骼的第二个节点下包括两个分支,即左分支与右分支,其中,左分支中的第一个节点和右分支中的第一个节点映射于源骨骼的第三个节点,左分支中的第二个节点和右分支中的第二个节点映射于源骨骼的第四个节点。这也就出现了目标骨骼中两个节点映射于源骨骼的第三个节点,且这两个节点属于不同的分支,以及目标骨骼中两个节点映射于源骨骼的第四个节点,且这两个节点属于不同的分支。其中,这两个分支汇合在目标骨骼的第二个节点。在源骨骼中找出映射于目标骨骼的第二个节点为第二个节点。按照目标骨骼这两个节点对应的节点拓扑结构,在源骨骼的第二个节点处新增一条骨骼分支。其中,新增的一条骨骼分支中的节点有两个。此时,目标骨骼中所有的节点均一一对应与源骨骼中的节点。因此,通过此种方式在实现节点一一映射的情况下,还能够最大化的保留第一骨骼的节点拓扑结构。
二是在骨骼中存在未有映射关系的情况下,更新未有映射关系的节点所在骨骼的节点拓扑 结构。其中,两个骨骼包括源骨骼和目标骨骼,经更新之后的两个骨骼之间的节点一一映射。通过更新没有映射关系的节点所在骨骼的节点拓扑结构,减少没有映射关系的节点,使得更新后的两个骨骼之间的节点一一映射,从而减少后续动画驱动最终目标骨骼的过程中出现不合理的情况出现。可选地,将未有映射关系的节点合并至具有映射关系的相邻节点。其中,相邻节点为未有映射关系的节点在所在骨骼中的父节点或子节点。本申请实施例中将未有映射关系的节点向其父节点合并。
请参见图13,图13是本申请显著性检测方法一实施例示出映射关系的第三示意图。如图13所示,目标骨骼的第一个节点映射于源骨骼的第一个节点,目标骨骼的第二个节点映射于源骨骼的第三个节点,目标骨骼的第三个节点映射于源骨骼的第四个节点。其中,源骨骼的第二个节点没有映射关系。可以将源骨骼的第二个节点向其父节点合并,也就是向源骨骼的第一个节点合并。当然,源骨骼中的节点合并都会伴随着动画驱动数据之间的合并,关于动画驱动数据之间的合并此处不再赘述。
其中,进行节点对齐,主要是为了确定源骨骼和目标骨骼之间的第一位姿变换关系。
例如,按照从根源节点到叶源节点的顺序,分别将最终源骨骼中的各源节点与最终目标骨骼中对应映射的目标节点进行对齐,以得到各源节点与映射的目标节点之间的第一位姿变换关系。如上述,根节点为所在的骨骼分支数量最多的节点。则根源节点指的是最终源骨骼中的根节点,同理,根目标节点指的是最终目标骨骼的根节点。最终源骨骼和最终目标骨骼指的是经过拓扑结构对齐后的源骨骼和目标骨骼。其中,叶节点指的是具有父节点但没有子节点的节点。叶源节点指的是最终源骨骼中的叶节点,叶目标节点指的是最终目标骨骼中的叶节点。即,先对齐根源节点以及与根源节点有映射关系的根目标节点。然后再对齐与根源节点连接的叶源节点以及与该叶源节点之间具备映射关系的叶目标节点,以此类推,直至最终目标骨骼中所有节点均与最终源骨骼的节点一一对齐为止。一些公开实施例中,可以直接将最终目标骨骼的根目标节点作为第一坐标系原点。
位姿变换关系为源节点与映射的目标节点在第一坐标系中的变换关系。通过最终源骨骼的根源节点和最终目标骨骼的根目标节点均平移至第一坐标系的原点,能够获取最终源骨骼的根源节点和最终目标骨骼的根目标节点之间的偏移量。例如,对于最终源骨骼中的每个源节点,获取使源节点对齐于映射的目标节点所需的偏移量。其中,偏移量包括平移分量和旋转分量。一般而言,平移分量中包括缩放分量。然后基于源节点对应的偏移量,得到源节点的第一位姿变换关系。
其中,若源骨骼的拓扑结构有发生改变,则源骨骼上的动画数据也对应发生改变。例如,源骨骼中某两个源节点发生合并,则将其节点对应的动画数据也进行合并。
由此,可以将源骨骼上的动画数据迁移到目标骨骼上,以驱动待处理图像中的目标进行运动。
通过在得到预测信息之后,还执行上述至少一步,提高了使用过程中的便捷性。
以及通过使用上述显著性检测模型的训练方法训练得到的显著性检测模型输出的显著性区域,并以此对显著性区域进行骨骼提取得到目标骨骼,使得得到的目标骨骼更为准确。
上述方案,通过使用上述显著性检测模型的训练方法训练得到的显著性检测模型对待处理图像进行处理,能够提高图像处理的准确度。
其中,显著性检测方法的执行主体可以是显著性检测装置,例如,显著性检测方法可以由终端设备或服务器或其它处理设备执行,其中,终端设备可以为用户设备(User Equipment,UE)、移动设备、用户终端、终端、蜂窝电话、无绳电话、个人数字处理(Personal Digital Assistant,PDA)、手持设备、计算设备、车载设备、可穿戴设备等。在一些可能的实现方式中,该显著性检测方法可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。
请参阅图14,图14是本申请显著性检测模型的训练装置一实施例的结构示意图。显著性检测模型的训练装置30包括第一获取模块31、筛选模块、第一检测模块32以及调整模块33。第一获取模块31,配置为获取至少一张样本图像,其中,至少一张样本图像包括属于预设图像类型的目标样本图像;筛选模块32,配置为基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤;第一检测模块33,配置为利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息;调整模块34,配置为基于样本 图像关于显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数。
上述方案,通过对获取到的预设图像类型的目标样本图像进行按照其显著性区域的轮廓缺失情况,对目标样本图像进行过滤,使得保留下的样本图像中显著性区域较为完整,进而利用这种保留下的质量较高的样本图像对显著性检测模型进行训练,可以使得训练得到的显著性检测模型后续对图像进行检测的结果更准确。
在一些实施例中,筛选模块32配置为基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤,包括:对目标样本图像中显著性区域的轮廓进行填补,得到填补样本图像;获取填补样本图像与目标样本图像中关于显著性区域的差异;在差异满足预设要求的情况下,过滤目标样本图像。
上述方案,样本图像按照轮廓缺失的情况进行过滤,使得留下的样本图像中显著性区域轮廓的质量更好。另外,通过获取填补样本图像与目标样本图像中关于显著性区域的差异能够较快的获取显著性区域的轮廓缺失情况。
在一些实施例中,预设要求为差异大于预设差异值;筛选模块32配置为对目标样本图像中显著性区域的轮廓进行填补,得到填补样本图像,包括:对目标样本图像进行闭运算,得到填补样本图像;获取填补样本图像与目标样本图像中关于显著性区域的差异,包括:获取填补样本图像关于显著性区域的第一面积,以及目标样本图像中关于显著性区域的第二面积;将第一面积和第二面积之差作为差异。
上述方案,若目标样本图像中的显著性区域的轮廓存在较大的缺口,则填补前后的显著性区域的面积可能存在较大的差异,从而根据填补前后显著性区域的面积差,即可确定目标样本图像中显著性区域的轮廓是否存在缺失。
在一些实施例中,在基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤之后,筛选模块32还配置为:基于填补样本图像的显著性区域的位置信息,得到目标样本图像关于显著性区域的标注位置信息。
上述方案,通过填补样本图像的显著性区域的位置信息,确定目标样本图像关于显著性区域的标注位置信息,能够保障显著性区域的完整性。
在一些实施例中,至少一张样本图像包括多种图像类型。
上述方案,通过使用多种图像类型的样本图像对显著性检测模型进行训练,使得训练得到的显著性检测模型能够对多种类型的图像进行图像处理,从而提高了显著性检测模型的适用性。
在一些实施例中,多种图像类型包括对真实物体拍摄得到的图像、手绘图以及卡通图中的至少两种。
上述方案,通过将常见的图像类型对应的样本图像用于对图像处理模型进行训练,使得训练得到的图像处理模型在日常生活或工作中更为适用。
在一些实施例中,调整模块34配置为基于样本图像关于显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数,包括:基于标注位置信息和预测位置信息,获取样本图像中各像素的第一损失;将样本图像中各像素的第一损失进行加权,得到样本图像的第二损失;基于第二损失,调整显著性检测模型的参数。
上述方案,通过对各像素的第一损失进行加权,使得利用加权后的第二损失调整显著性检测模型的参数更准确。
在一些实施例中,像素的第一损失的权重与像素的边界距离相关,像素的边界距离为像素与真实显著性区域的边界之间的距离,真实显著性区域为样本图像中由标注位置信息定义的显著性区域。
上述方案,通过根据像素的边界距离确定权重,使得利用加权后的第二损失调整显著性检测模型的参数更准确。
在一些实施例中,像素的边界距离越小,像素的第一损失的权重越大。
上述方案,像素的边界距离与像素的第一损失的权重呈负相关,使得得到的第二损失更准确。
在一些实施例中,显著性检测模型至少包括以下至少一个:显著性检测模型为MobileNetV3的网络结构、显著性检测模型包括特征提取子网络和第一检测子网络和第二检测子网络;第一检测模块33配置为利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于 显著性区域的预测位置信息,包括:利用特征提取子网络对样本图像进行特征提取,得到样本图像对应的特征图;利用第一检测子网络对特征图进行初始检测,得到样本图像中关于显著性区域的初始位置信息;将特征图和初始位置信息进行融合,得到融合结果;利用第二检测子网络对融合结果进行最终检测,得到样本图像的预测位置信息。
上述方案,因MobileNetV3的网络结构简单,通过使用MobileNetV3的网络结构,能够加快检测效率,而且可以使得处理能力较小的设备也可使用该显著性检测模型实现显著性检测;另,通过第一检测子网络对特征图进行初始检测之后,再使用第二检测子网络对初始检测结果进行最终检测,能够提高检测的准确度。
在一些实施例中,第一检测模块33配置为在利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息之前,筛选模块32还配置为:对经过滤后的样本图像进行数据增强;其中,数据增强的方式包括对样本图像中除显著性区域以外的背景区域进行填充。
上述方案,通过对样本图像进行数据增强,能够提高显著性检测模型的适用性。
请参阅图15,图15是本申请显著性检测装置一实施例的结构示意图。显著性检测装置40包括第二获取模块41以及第二检测模块42。第二获取模块41,配置为获取待处理图像;第二检测模块42,配置为利用显著性检测模型对待处理图像进行处理,得到待处理图像内容中关于显著性区域的预测位置信息,其中,显著性检测模型是由上述显著性检测模型的训练方法训练得到的。
上述方案,通过使用显著性检测模型的训练方法训练得到的显著性检测模型对待处理图像进行检测,能够提高得到关于显著性区域的预测位置信息的准确度。
在一些实施例中,在利用显著性检测模型对待处理图像进行处理,得到待处理图像内容中关于显著性区域的预测位置信息之后,显著性检测装置还包括功能模块(图未示),功能模块配置为:利用预测位置信息,对显著性区域进行骨骼提取,得到目标骨骼;为目标骨骼选择一骨骼模型作为源骨骼;将与源骨骼相关的第一动画驱动数据迁移至目标骨骼上,得到目标骨骼的第二动画驱动数据。
上述方案,通过利用预测位置信息,对显著性区域进行骨胳提取,能够提高目标骨骼的准确度。
请参阅图16,图16是本申请电子设备一实施例的结构示意图。电子设备50包括存储器51和处理器52,处理器52用于执行存储器51中存储的程序指令,以实现上述任一显著性检测模型的训练方法实施例中的步骤和/或显著性检测方法实施例中的步骤。在一个实施场景中,电子设备50可以包括但不限于:医疗设备、微型计算机、台式电脑、服务器,此外,电子设备50还可以包括笔记本电脑、平板电脑等移动设备,在此不做限定。
处理器52用于控制其自身以及存储器51以实现上述任一显著性检测模型的训练方法实施例中的步骤。处理器52还可以称为CPU(Central Processing Unit,中央处理单元)。处理器52可能是一种集成电路芯片,具有信号的处理能力。处理器52还可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。另外,处理器52可以由集成电路芯片共同实现。
上述方案,通过对获取到的预设图像类型的目标样本图像进行按照其显著性区域的轮廓缺失情况,对目标样本图像进行过滤,使得保留下的样本图像中显著性区域较为完整,进而利用这种保留下的质量较高的样本图像对显著性检测模型进行训练,可以使得训练得到的显著性检测模型后续对图像进行检测的结果更准确。
请参阅图17,图17是本申请计算机可读存储介质一实施例的结构示意图。计算机可读存储介质60存储有能够被处理器运行的程序指令61,程序指令61用于实现上述任一显著性检测模型的训练方法实施例中的步骤和/或显著性检测方法实施例中的步骤。
上述方案,通过对获取到的预设图像类型的目标样本图像进行按照其显著性区域的轮廓缺失情况,对目标样本图像进行过滤,使得保留下的样本图像中显著性区域较为完整,进而利用这种保留下的质量较高的样本图像对显著性检测模型进行训练,可以使得训练得到的显著性检 测模型后续对图像进行检测的结果更准确。
在一些实施例中,本申请实施例提供的装置具有的功能或包含的模块可以用于执行上文方法实施例描述的方法,其实现可以参照上文方法实施例的描述。
上文对各个实施例的描述倾向于强调各个实施例之间的不同之处,其相同或相似之处可以互相参考,为了简洁,本文不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的方法和装置,可以通过其它的方式实现。例如,以上所描述的装置实施方式仅仅是示意性的,例如,模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性、机械或其它的形式。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本申请各个实施方式方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
工业实用性
本申请实施例公开了一种显著性检测方法及其模型的训练方法和装置、设备、介质及程序,显著性检测模型的训练方法包括:获取至少一张样本图像,其中,至少一张样本图像包括属于预设图像类型的目标样本图像;基于目标样本图像中显著性区域的轮廓缺失情况,对目标样本图像进行过滤;利用显著性检测模型对经过滤后的样本图像进行检测,得到样本图像中关于显著性区域的预测位置信息;基于样本图像关于显著性区域的标注位置信息与预测位置信息,调整显著性检测模型的参数。上述方案,通过对样本图像进行筛选再利用筛选后的样本图像对显著性检测模型进行训练,能够提高显著性检测模型输出结果的准确度。

Claims (18)

  1. 一种显著性检测模型的训练方法,所述方法由电子设备执行,所述方法包括:
    获取至少一张样本图像,其中,所述至少一张样本图像包括属于预设图像类型的目标样本图像;
    基于所述目标样本图像中显著性区域的轮廓缺失情况,对所述目标样本图像进行过滤;
    利用显著性检测模型对经过滤后的所述样本图像进行检测,得到所述样本图像中关于显著性区域的预测位置信息;
    基于所述样本图像关于所述显著性区域的标注位置信息与所述预测位置信息,调整所述显著性检测模型的参数。
  2. 根据权利要求1所述的方法,其中,所述基于所述目标样本图像中显著性区域的轮廓缺失情况,对所述目标样本图像进行过滤,包括:
    对所述目标样本图像中所述显著性区域的轮廓进行填补,得到填补样本图像;
    获取所述填补样本图像与所述目标样本图像中关于所述显著性区域的差异;
    在所述差异满足预设要求的情况下,过滤所述目标样本图像。
  3. 根据权利要求2所述的方法,其中,所述预设要求为所述差异大于预设差异值;
    所述对所述目标样本图像中所述显著性区域的轮廓进行填补,得到填补样本图像,包括:
    对所述目标样本图像进行闭运算,得到所述填补样本图像;
    所述获取所述填补样本图像与所述目标样本图像中关于所述显著性区域的差异,包括:
    获取所述填补样本图像关于所述显著性区域的第一面积,以及所述目标样本图像中关于所述显著性区域的第二面积;
    将所述第一面积和所述第二面积之差,确定为所述差异。
  4. 根据权利要求2或3所述的方法,其中,在所述基于所述目标样本图像中显著性区域的轮廓缺失情况,对所述目标样本图像进行过滤之后,所述方法还包括:
    基于所述填补样本图像的显著性区域的位置信息,得到所述目标样本图像关于所述显著性区域的标注位置信息。
  5. 根据权利要求1至4任一项所述的方法,其中,所述至少一张样本图像包括多种图像类型。
  6. 根据权利要求5所述的方法,其中,所述多种图像类型包括对真实物体拍摄得到的图像、手绘图以及卡通图中的至少两种。
  7. 根据权利要求1所述的方法,其中,所述基于所述样本图像关于所述显著性区域的标注位置信息与所述预测位置信息,调整所述显著性检测模型的参数,包括:
    基于所述样本图像关于所述显著性区域的标注位置信息和所述预测位置信息,获取所述样本图像中各像素的第一损失;
    将所述样本图像中所述各像素的第一损失进行加权,得到所述样本图像的第二损失;
    基于所述第二损失,调整所述显著性检测模型的参数。
  8. 根据权利要求7所述的方法,其中,所述像素的第一损失的权重与所述像素的边界距离相关,所述像素的边界距离为所述像素与真实显著性区域的边界之间的距离,所述真实显著性区域为所述样本图像中由所述标注位置信息定义的显著性区域。
  9. 根据权利要求8所述的方法,其中,所述像素的边界距离越小,所述像素的第一损失的权重越大。
  10. 根据权利要求1至9任一项所述的方法,其中,所述显著性检测模型至少包括以下之一:所述显著性检测模型为MobileNetV3的网络结构、所述显著性检测模型包括特征提取子网络和第一检测子网络和第二检测子网络;
    所述利用显著性检测模型对经过滤后的所述样本图像进行检测,得到所述样本图像中关于显著性区域的预测位置信息,包括:
    利用所述特征提取子网络对所述样本图像进行特征提取,得到所述样本图像对应的特征图;
    利用所述第一检测子网络对所述特征图进行初始检测,得到所述样本图像中关于所述显著 性区域的初始位置信息;
    将所述特征图和所述初始位置信息进行融合,得到融合结果;
    利用所述第二检测子网络对所述融合结果进行最终检测,得到所述样本图像的所述预测位置信息。
  11. 根据权利要求1至10任一项所述的方法,其中,在所述利用显著性检测模型对经过滤后的所述样本图像进行检测,得到所述样本图像中关于显著性区域的预测位置信息之前,所述方法还包括:
    对经过滤后的所述样本图像进行数据增强;
    其中,所述数据增强的方式包括对所述样本图像中除所述显著性区域以外的背景区域进行填充。
  12. 一种显著性检测方法,其中,包括:
    获取待处理图像;
    利用显著性检测模型对所述待处理图像进行处理,得到所述待处理图像内容中关于显著性区域的预测位置信息,其中,所述显著性检测模型是由权利要求1至11任一项方法训练得到的。
  13. 根据权利要求12所述的方法,其中,在所述利用显著性检测模型对所述待处理图像进行处理,得到所述待处理图像内容中关于显著性区域的预测位置信息之后,所述方法还包括:
    利用所述预测位置信息,对所述显著性区域进行骨骼提取,得到目标骨骼;
    为所述目标骨骼选择一骨骼模型作为源骨骼;
    将与所述源骨骼相关的第一动画驱动数据迁移至所述目标骨骼上,得到所述目标骨骼的第二动画驱动数据。
  14. 一种显著性检测模型的训练装置,其中,包括:
    第一获取模块,配置为获取至少一张样本图像,其中,所述至少一张样本图像包括属于预设图像类型的目标样本图像;
    筛选模块,配置为基于所述目标样本图像中显著性区域的轮廓缺失情况,对所述目标样本图像进行过滤;
    第一检测模块,配置为利用显著性检测模型对经过滤后的所述样本图像进行检测,得到所述样本图像中关于显著性区域的预测位置信息;
    调整模块,配置为基于所述样本图像关于所述显著性区域的标注位置信息与所述预测位置信息,调整所述显著性检测模型的参数。
  15. 一种显著性检测装置,其中,包括:
    第二获取模块,配置为获取待处理图像;
    第二检测模块,配置为利用显著性检测模型对所述待处理图像进行处理,得到所述待处理图像内容中关于显著性区域的预测位置信息,其中,所述显著性检测模型是由权利要求1至11任一项方法训练得到的。
  16. 一种电子设备,其中,包括存储器和处理器,所述处理器用于执行所述存储器中存储的程序指令,以实现权利要求1至13任一项所述的方法。
  17. 一种计算机可读存储介质,其上存储有程序指令,其中,所述程序指令被处理器执行时实现权利要求1至13任一项所述的方法。
  18. 一种计算机程序,所述计算机程序包括计算机可读代码,在所述计算机可读代码在电子设备中运行的情况下,所述电子设备的处理器执行用于实现如权利要求1至13任一项所述的方法。
PCT/CN2021/127459 2021-06-30 2021-10-29 显著性检测方法及其模型的训练方法和装置、设备、介质及程序 WO2023273069A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110735893.4A CN113505799B (zh) 2021-06-30 2021-06-30 显著性检测方法及其模型的训练方法和装置、设备、介质
CN202110735893.4 2021-06-30

Publications (1)

Publication Number Publication Date
WO2023273069A1 true WO2023273069A1 (zh) 2023-01-05

Family

ID=78009429

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/127459 WO2023273069A1 (zh) 2021-06-30 2021-10-29 显著性检测方法及其模型的训练方法和装置、设备、介质及程序

Country Status (3)

Country Link
CN (1) CN113505799B (zh)
TW (1) TWI778895B (zh)
WO (1) WO2023273069A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505799B (zh) * 2021-06-30 2022-12-23 深圳市慧鲤科技有限公司 显著性检测方法及其模型的训练方法和装置、设备、介质
CN114419341B (zh) * 2022-01-20 2024-04-26 大连海事大学 一种基于迁移学习改进的卷积神经网络图像识别方法
CN117478806A (zh) * 2022-07-22 2024-01-30 索尼集团公司 信息处理设备和方法、计算机可读存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574866A (zh) * 2015-12-15 2016-05-11 努比亚技术有限公司 一种实现图像处理的方法及装置
US20170351941A1 (en) * 2016-06-03 2017-12-07 Miovision Technologies Incorporated System and Method for Performing Saliency Detection Using Deep Active Contours
CN108647634A (zh) * 2018-05-09 2018-10-12 深圳壹账通智能科技有限公司 图像边框查找方法、装置、计算机设备及存储介质
CN112734775A (zh) * 2021-01-19 2021-04-30 腾讯科技(深圳)有限公司 图像标注、图像语义分割、模型训练方法及装置
CN113505799A (zh) * 2021-06-30 2021-10-15 深圳市慧鲤科技有限公司 显著性检测方法及其模型的训练方法和装置、设备、介质

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10007971B2 (en) * 2016-03-14 2018-06-26 Sensors Unlimited, Inc. Systems and methods for user machine interaction for image-based metrology
CN107103608B (zh) * 2017-04-17 2019-09-27 大连理工大学 一种基于区域候选样本选择的显著性检测方法
CN109146847B (zh) * 2018-07-18 2022-04-05 浙江大学 一种基于半监督学习的晶圆图批量分析方法
CN111325217B (zh) * 2018-12-14 2024-02-06 京东科技信息技术有限公司 数据处理方法、装置、系统和介质
CN110570442A (zh) * 2019-09-19 2019-12-13 厦门市美亚柏科信息股份有限公司 一种复杂背景下轮廓检测方法、终端设备及存储介质
CN110751157B (zh) * 2019-10-18 2022-06-24 厦门美图之家科技有限公司 图像显著性分割、图像显著性模型训练方法及装置
CN110866897B (zh) * 2019-10-30 2022-10-14 上海联影智能医疗科技有限公司 一种图像检测方法及计算机可读存储介质
CN111476292B (zh) * 2020-04-03 2021-02-19 北京全景德康医学影像诊断中心有限公司 医学图像分类处理人工智能的小样本元学习训练方法
CN112164129A (zh) * 2020-09-02 2021-01-01 北京电影学院 基于深度卷积网络的无配对动作迁移方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574866A (zh) * 2015-12-15 2016-05-11 努比亚技术有限公司 一种实现图像处理的方法及装置
US20170351941A1 (en) * 2016-06-03 2017-12-07 Miovision Technologies Incorporated System and Method for Performing Saliency Detection Using Deep Active Contours
CN108647634A (zh) * 2018-05-09 2018-10-12 深圳壹账通智能科技有限公司 图像边框查找方法、装置、计算机设备及存储介质
CN112734775A (zh) * 2021-01-19 2021-04-30 腾讯科技(深圳)有限公司 图像标注、图像语义分割、模型训练方法及装置
CN113505799A (zh) * 2021-06-30 2021-10-15 深圳市慧鲤科技有限公司 显著性检测方法及其模型的训练方法和装置、设备、介质

Also Published As

Publication number Publication date
TWI778895B (zh) 2022-09-21
CN113505799B (zh) 2022-12-23
TW202303446A (zh) 2023-01-16
CN113505799A (zh) 2021-10-15

Similar Documents

Publication Publication Date Title
WO2023273069A1 (zh) 显著性检测方法及其模型的训练方法和装置、设备、介质及程序
CN109493350B (zh) 人像分割方法及装置
US11151723B2 (en) Image segmentation method, apparatus, and fully convolutional network system
JP6636154B2 (ja) 顔画像処理方法および装置、ならびに記憶媒体
CN108961303B (zh) 一种图像处理方法、装置、电子设备和计算机可读介质
CN110379020B (zh) 一种基于生成对抗网络的激光点云上色方法和装置
CN110889824A (zh) 一种样本生成方法、装置、电子设备及计算机可读存储介质
WO2020207203A1 (zh) 一种前景数据生成及其应用方法、相关装置和系统
AU2019477545B2 (en) Methods for handling occlusion in augmented reality applications using memory and device tracking and related apparatus
JP2018045693A (ja) 動画像背景除去方法及び動画像背景除去システム
WO2016112797A1 (zh) 一种用于确定图片陈列信息的方法及设备
CN115699082A (zh) 缺陷检测方法及装置、存储介质及电子设备
CN112001274A (zh) 人群密度确定方法、装置、存储介质和处理器
CN111476710A (zh) 基于移动平台的视频换脸方法及系统
CN112101386B (zh) 文本检测方法、装置、计算机设备和存储介质
WO2020063835A1 (zh) 模型生成
CN114445651A (zh) 一种语义分割模型的训练集构建方法、装置及电子设备
WO2022194079A1 (zh) 天空区域分割方法、装置、计算机设备和存储介质
CN113516697B (zh) 图像配准的方法、装置、电子设备及计算机可读存储介质
WO2024041108A1 (zh) 图像矫正模型训练及图像矫正方法、装置和计算机设备
WO2023174063A1 (zh) 背景替换的方法和电子设备
CN113012030A (zh) 图像拼接方法、装置及设备
JP2018190394A (ja) 監視ビデオにおけるデータの拡張方法及び装置
WO2022185403A1 (ja) 画像処理装置、画像処理方法、およびプログラム
KR102661488B1 (ko) 생성형 ai 모델을 이용한 특수효과 합성 및 3d 모델 생성 서비스 제공 서버, 시스템, 방법 및 프로그램

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE