WO2022194079A1

WO2022194079A1 - Sky region segmentation method and apparatus, computer device, and storage medium

Info

Publication number: WO2022194079A1
Application number: PCT/CN2022/080598
Authority: WO
Inventors: 贾配洋; 林晓帆
Original assignee: 影石创新科技股份有限公司
Priority date: 2021-03-19
Filing date: 2022-03-14
Publication date: 2022-09-22
Also published as: CN113034514A

Abstract

A sky region segmentation method and apparatus, a computer device, and a storage medium. A computer device determines, according to the proportion of sky elements in a target panoramic video, whether to perform sky segmentation on the target panoramic video; if yes, extracts M panoramic image frames of the target panoramic video, and inputs each of the M panoramic image frames into a preset model to obtain a mask image of each of the M panoramic image frames; and performs, according to the mask image, sky segmentation on the panoramic image frame in the M panoramic image frames corresponding to the mask image. Before performing sky segmentation on the target panoramic video, a determination is made on the target panoramic video first, and by considering the sky proportion of the panoramic video, scenarios which have relatively small sky regions and are unsuitable for sky segmentation can be effectively filtered, thereby improving the sky segmentation effect; moreover, the accurate recognition of the panoramic video by the preset model can improve the effect of sky segmentation for the panoramic video.

Description

Sky area segmentation method, apparatus, computer equipment and storage medium

Technical field

The present application relates to the technical field of image processing, and in particular, to a sky area segmentation method, apparatus, computer equipment and storage medium.

Background art

With the development of image processing technology, sky segmentation technology has emerged, which can distinguish sky pixels and non-sky pixels in an image, so as to realize the special effect of sky replacement for the image.

In the traditional technology, when sky segmentation is performed to achieve sky replacement, a camera or mobile phone can send the captured planar image to the cloud; the cloud performs sky segmentation on the planar image using a pre-stored sky segmentation algorithm and returns the segmentation result to the camera or mobile phone. Alternatively, the camera or mobile phone can perform sky segmentation on the planar image using a locally stored sky segmentation algorithm.

Technical problem

However, sky segmentation algorithms in the prior art are relatively slow and have low accuracy at segmentation edges; moreover, current sky segmentation algorithms process only planar images, and their segmentation results on panoramic images or panoramic videos are relatively poor.

Technical solution

Based on this, it is necessary to provide a sky area segmentation method, device, computer equipment and storage medium that can realize accurate sky segmentation of panoramic images and panoramic videos in view of the above technical problems.

In a first aspect, a method for segmenting a sky area is provided, the method comprising:

According to the proportion of sky elements in the target panoramic video, determine whether to perform sky segmentation processing on the target panoramic video;

In the case where it is determined to perform sky segmentation processing on the target panoramic video, extract M frames of panoramic images of the target panoramic video, input each of the M frames of panoramic images into a preset model, and obtain a mask image of each of the M frames of panoramic images; the mask image includes a sky area and a non-sky area;

According to the mask image, sky segmentation processing is performed on the panoramic image corresponding to the mask image among the M frames of panoramic images.

In one embodiment, determining whether to perform sky segmentation processing on the target panoramic video according to the proportion of sky elements in the target panoramic video includes:

Determine N frames of panoramic images from the M frames of panoramic images;

Inputting the N frames of panoramic images into the preset model to obtain N mask images corresponding to the N frames of panoramic images;

According to the proportion of sky pixels in the N mask images, it is determined whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, determining whether to perform sky segmentation processing on the target panoramic video according to the proportion of sky pixels in the N mask images includes:

Determine the proportion of the sky pixels according to the number of sky pixels in the N mask images and the number of non-sky pixels in the N mask images;

According to the relationship between the proportion of the sky pixels and the first preset threshold, it is determined whether to perform sky segmentation processing on the target panoramic video.

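In one embodiment, determining whether to perform sky segmentation processing on the target panoramic video according to the proportion of sky elements in the target panoramic video includes:

Determine N frames of panoramic images from the M frames of target panoramic images;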

The N frames of panoramic images are identified by the second preset model, and N output results are obtained; the input of the second preset model is a panoramic image, and the output is that the panoramic image contains the sky or the panoramic image does not contain the sky;

According to the proportion of panoramic images containing the sky in the N output results, it is determined whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, determining whether to perform sky segmentation processing on the target panoramic video according to the proportion of panoramic images containing the sky in the N output results includes:

According to the number of panoramic images that contain the sky in the N output results and the number of panoramic images that do not contain the sky in the N output results, determine the proportion of the panoramic images that contain the sky;

According to the relationship between the proportion of the panoramic image including the sky and the second preset threshold, it is determined whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, the training sample set of the preset model includes multiple frames of panoramic images and a reference image corresponding to each of those frames, where the reference image indicates the sky area of the panoramic image to which it corresponds.

In one embodiment, the training process of the preset model includes:

The panoramic images in the training sample set are input into a neural network model, and the parameters of the neural network model are adjusted according to the loss value between the output of the neural network model and the reference image, to obtain the preset model.

In one embodiment, the training process of the preset model further includes:

The neural network model is processed with a model compression algorithm to update the neural network model.

In a second aspect, a device for segmenting a sky area is provided, the device comprising:

The judgment module is used to judge whether to perform sky segmentation processing on the target panoramic video according to the proportion of sky elements in the target panoramic video;

The acquisition module is used to, when it is determined to perform sky segmentation processing on the target panoramic video, extract M frames of panoramic images of the target panoramic video, input each of the M frames of panoramic images into a preset model, and obtain a mask image of each of the M frames of panoramic images; the mask image includes a sky area and a non-sky area;

A segmentation module, configured to perform sky segmentation processing on the panoramic image corresponding to the mask image in the M frames of panoramic images according to the mask image.

In a third aspect, a computer device is provided, comprising a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:

Extracting M frames of panoramic images of the target panoramic video, inputting each frame of panoramic images in the M frames of panoramic images into a preset model, and obtaining a mask image of each frame of panoramic images in the M frames of panoramic images; the mask image including sky areas and non-sky areas;

In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:

Technical effect

With the above sky area segmentation method, apparatus, computer device and storage medium, the computer device determines, according to the proportion of sky elements in the target panoramic video, whether to perform sky segmentation processing on the target panoramic video. When it is determined to perform sky segmentation processing, the computer device extracts M frames of panoramic images of the target panoramic video, inputs each of the M frames into a preset model, and obtains a mask image of each frame, where the mask image includes a sky area and a non-sky area; it then performs sky segmentation processing, according to each mask image, on the corresponding panoramic image among the M frames. In other words, before sky segmentation is performed, the target panoramic video is first checked for suitability, and the preset model is applied only when the video is suitable for sky segmentation. This avoids the poor segmentation results that arise in the prior art when sky segmentation is still performed on a panoramic video containing little sky area: by considering the sky proportion of the panoramic video, scenes with little sky area that are unsuitable for sky segmentation are filtered out, which improves the sky segmentation effect. Furthermore, the preset model accurately obtains the mask image corresponding to each frame of panoramic image, which improves the edge processing accuracy for panoramic images and the processing speed of sky segmentation, realizes accurate recognition of panoramic images and panoramic videos, and thereby improves the sky segmentation effect for panoramic images and panoramic videos.

Description of drawings

FIG. 1 is an application environment diagram of the sky area segmentation method in one embodiment;

FIG. 2 is a schematic flowchart of a method for segmenting a sky area in one embodiment;

FIG. 3 is a schematic flowchart of a method for segmenting a sky area in another embodiment;

FIG. 4 is a schematic flowchart of a method for segmenting a sky area in another embodiment;

FIG. 5 is a structural block diagram of an apparatus for segmenting a sky area in one embodiment;

FIG. 6 is a diagram of the internal structure of a computer device in one embodiment.

Embodiments of the present invention

In order to make the purpose, technical solutions and advantages of the present application more clearly understood, the present application will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.

The sky area segmentation method provided in this application can be applied to the computer device shown in FIG. 1. The computer device can be, but is not limited to, any type of terminal capable of image processing or video processing, such as an ordinary camera, a panoramic camera, a smartphone, a personal computer, a notebook computer, a tablet computer, or VR glasses. The internal structure of the computer device is shown in FIG. 1 and includes a processor, a memory, a communication interface, a display screen and an input device connected through a system bus.

In one embodiment, as shown in FIG. 2, a method for segmenting a sky area is provided, and the method is applied to the computer device in FIG. 1 as an example for description, including the following steps:

Step 201, according to the proportion of sky elements in the target panoramic video, determine whether to perform sky segmentation processing on the target panoramic video.

Here, the target panoramic video is the panoramic video to be processed, on which sky segmentation is to be performed.

Before performing sky segmentation on the target panoramic video, the computer device first needs to determine whether the target panoramic video is suitable for sky segmentation; that is, the sky segmentation operation is performed only when the target panoramic video is suitable for it. Optionally, whether to perform sky segmentation processing on the target panoramic video can be determined according to the proportion of sky elements in the target panoramic video. The target panoramic video may include multiple frames of panoramic images, and each frame of panoramic image may include sky elements and non-sky elements. Optionally, a color panoramic image can be converted into a grayscale image through operations such as preprocessing, normalization, subtracting the mean, and dividing by the variance, so as to reduce the amount of data to be processed.

Optionally, when determining the proportion of sky elements in the target panoramic video, image processing technology can be used to mark the sky area and the non-sky area of each frame of panoramic image. The total sky area of the target panoramic video is then the sum of the sky areas of all frames of panoramic images, and the total area of the target panoramic video is the sum of the areas of all frames of panoramic images; dividing the total sky area by the total area gives the proportion of sky elements in the target panoramic video.

Further, optionally, whether to perform sky segmentation processing on the target panoramic video can be determined from the relationship between the proportion of sky elements in the target panoramic video and a preset threshold: when the proportion of sky elements is greater than the preset threshold, it is determined to perform sky segmentation processing on the target panoramic video; when the proportion of sky elements is not greater than the preset threshold, sky segmentation processing is not performed on the target panoramic video.
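The area-based decision described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names and the representation of each frame as a `(sky_area, frame_area)` pair are assumptions made for the example, and the threshold value is left to the caller because the text does not specify one.

```python
from typing import Sequence, Tuple

# Hypothetical per-frame measurement: (sky_area, frame_area), both in pixels.
FrameAreas = Sequence[Tuple[int, int]]


def sky_element_proportion(frames: FrameAreas) -> float:
    """Total sky area of all frames divided by the total area of all frames."""
    total_sky = sum(sky for sky, _ in frames)
    total_area = sum(area for _, area in frames)
    if total_area == 0:
        raise ValueError("the video contains no pixels")
    return total_sky / total_area


def should_segment(frames: FrameAreas, threshold: float) -> bool:
    """Segment only when the proportion is strictly greater than the threshold."""
    return sky_element_proportion(frames) > threshold
```

Note that a proportion exactly equal to the threshold does not trigger segmentation, matching the "not greater than" wording of the text.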

Step 202: if yes, extract M frames of panoramic images of the target panoramic video, input each of the M frames of panoramic images into a preset model, and obtain a mask image of each of the M frames of panoramic images; the mask image includes a sky area and a non-sky area.

The M frames of panoramic images of the target panoramic video are all the panoramic images obtained by converting the target panoramic video into video frame images according to a preset video frame conversion ratio; that is, M is the total number of frames of panoramic images. In addition, the preset model includes a panoramic sky segmentation algorithm for accurately identifying panoramic images: its input is a panoramic image, and its output is the mask image corresponding to that panoramic image, which includes a sky area and a non-sky area.

After step 201, when the computer device determines to perform sky segmentation processing on the target panoramic video, that is, when the proportion of sky elements in the target panoramic video is large and the video is suitable for sky segmentation, it extracts the M frames of panoramic images of the target panoramic video and inputs each of them into the preset model to obtain the mask image corresponding to each frame, that is, M mask images.

Step 203: Perform sky segmentation processing on the panoramic image corresponding to the mask image among the M frames of panoramic images according to the mask image.

After the mask image corresponding to each frame of panoramic image is obtained, sky segmentation processing can be performed on that frame according to its mask image: the pixel values of the panoramic image that correspond to the sky area in the mask image are removed, and the pixel values that correspond to the non-sky area are retained, which yields a non-sky-area image with the sky area segmented out.
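As a rough sketch of step 203, the snippet below applies a mask to one frame. It assumes that images and masks are nested lists of equal size, that sky pixels are marked 255 in the mask, and that "removing" a sky pixel means replacing it with a fill value; the text does not say how removed pixels are represented, so `fill` and the function name are illustrative only.

```python
def remove_sky(image, mask, sky_value=255, fill=(0, 0, 0)):
    """Return a copy of `image` in which every sky pixel is replaced by `fill`.

    `image` is a list of rows of pixel values and `mask` is a list of rows
    of labels with the same shape (assumption: sky is labelled `sky_value`).
    """
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same size")
    return [
        [fill if label == sky_value else pixel
         for pixel, label in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
```

In practice an array library would usually replace the nested lists; plain lists are used here only to keep the sketch dependency-free.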

In this embodiment, the computer device determines, according to the proportion of sky elements in the target panoramic video, whether to perform sky segmentation processing on the target panoramic video. If it is determined to perform sky segmentation processing, the computer device extracts M frames of panoramic images of the target panoramic video, inputs each of the M frames into a preset model, and obtains a mask image of each frame, where the mask image includes a sky area and a non-sky area; it then performs sky segmentation processing, according to each mask image, on the corresponding panoramic image among the M frames. In other words, before sky segmentation is performed, the target panoramic video is first checked for suitability, and the preset model is applied only when the video is suitable for sky segmentation. This avoids the poor segmentation results that arise in the prior art when sky segmentation is still performed on a panoramic video containing little sky area: by considering the sky proportion of the panoramic video, scenes with little sky area that are unsuitable for sky segmentation are filtered out, which improves the sky segmentation effect. Furthermore, the preset model accurately obtains the mask image corresponding to each frame of panoramic image, which improves the edge processing accuracy for panoramic images and the processing speed of sky segmentation, realizes accurate recognition of panoramic images and panoramic videos, and thereby improves the sky segmentation effect for panoramic images and panoramic videos.
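Steps 201 to 203 can be tied together in a short sketch. Everything named here is hypothetical: `model` stands for the preset model as any callable that maps a frame to a 0/255 mask, the first `n` frames are used for the suitability check purely for simplicity, frames are grayscale nested lists, and `toy_model` is a brightness rule that exists only so the example runs.

```python
def segment_panoramic_video(frames, model, n, threshold, sky_value=255):
    """Return the non-sky image of every frame, or None if the video is skipped."""
    # Step 201: estimate the sky proportion from the masks of n probe frames.
    probe_masks = [model(frame) for frame in frames[:n]]
    sky = sum(v == sky_value for mask in probe_masks for row in mask for v in row)
    total = sum(len(row) for mask in probe_masks for row in mask)
    if total == 0 or sky / total <= threshold:
        return None  # too little sky: do not segment this video

    segmented = []
    for frame in frames:
        mask = model(frame)  # Step 202: one mask image per frame
        # Step 203: drop sky pixels (set to 0 here), keep non-sky pixels.
        segmented.append([
            [0 if label == sky_value else pixel
             for pixel, label in zip(frame_row, mask_row)]
            for frame_row, mask_row in zip(frame, mask)
        ])
    return segmented


def toy_model(frame):
    """Stand-in for the preset model: treats very bright pixels as sky."""
    return [[255 if pixel > 200 else 0 for pixel in row] for row in frame]
```

A real deployment would replace `toy_model` with the trained segmentation network and would likely cache the probe masks instead of recomputing them.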

FIG. 3 is a schematic flowchart of a method for segmenting a sky region in another embodiment. This embodiment relates to an optional implementation process of judging whether to perform sky segmentation processing on the target panoramic video according to the proportion of sky elements in the target panoramic video; on the basis of the above embodiment, as shown in FIG. 3 , the above step 201 includes:

Step 301: Determine N frames of panoramic images from the M frames of panoramic images.

In an optional implementation of this embodiment, when determining the proportion of sky elements in the target panoramic video, a subset of panoramic images may be selected from all frames of panoramic images corresponding to the target panoramic video, and whether to perform sky segmentation processing on the target panoramic video is determined according to the proportion of sky elements in that subset, so as to reduce the amount of data processed by the computer device and its memory usage. That is, N frames of panoramic images can be determined from the M frames of panoramic images; optionally, N consecutive frames can be randomly selected from the M frames, or N non-consecutive frames can be randomly selected from the M frames. The manner of acquiring the N frames of panoramic images is not limited in this embodiment.
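The two selection strategies mentioned above can be sketched with the standard `random` module. The function names are illustrative, frames are referred to by index (0 to M-1), and the text leaves the actual selection method open, so this is only one possible reading.

```python
import random


def pick_consecutive(m: int, n: int, rng: random.Random) -> list:
    """Pick n consecutive frame indices starting at a random position."""
    if not 0 < n <= m:
        raise ValueError("n must satisfy 0 < n <= m")
    start = rng.randrange(m - n + 1)
    return list(range(start, start + n))


def pick_scattered(m: int, n: int, rng: random.Random) -> list:
    """Pick n distinct frame indices at random, not necessarily adjacent."""
    if not 0 < n <= m:
        raise ValueError("n must satisfy 0 < n <= m")
    return sorted(rng.sample(range(m), n))
```

Passing an explicit `random.Random` instance keeps the selection reproducible when a seed is fixed, which can be convenient for testing.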

Step 302: Input the N frames of panoramic images into the preset model to obtain N mask images corresponding to the N frames of panoramic images.

Step 303, according to the proportion of sky pixels in the N mask images, determine whether to perform sky segmentation processing on the target panoramic video.

In an optional implementation of this embodiment, the mask images corresponding to the N frames of panoramic images each include a sky area and a non-sky area; that is, the pixels of the sky area and the non-sky area are marked with binary values such as 0 and 1, or 0 and 255. Whether to perform sky segmentation processing on the target panoramic video can therefore be determined according to the proportion of sky pixels in the N mask images. Optionally, the proportion of sky pixels is determined from the number of sky pixels and the number of non-sky pixels in the N mask images, and the decision is made according to the relationship between this proportion and a first preset threshold. For example, if the number of sky pixels in the N mask images is denoted sky_pixels and the number of non-sky pixels in the N mask images is denoted background_pixels, the proportion of sky pixels, sky_ratio, can be expressed as: sky_ratio = sky_pixels / (sky_pixels + background_pixels). Optionally, when sky_ratio is greater than the first preset threshold, it is determined to perform sky segmentation processing on the target panoramic video; when sky_ratio is not greater than the first preset threshold, it is determined not to perform sky segmentation processing on the target panoramic video.

In this embodiment, the computer device determines N frames of panoramic images from the M frames of panoramic images, inputs the N frames into the preset model to obtain N corresponding mask images, and then determines, according to the proportion of sky pixels in the N mask images, whether to perform sky segmentation processing on the target panoramic video. That is, the computer device selects a subset of the panoramic images and uses the proportion of sky pixels in that subset as the basis for deciding whether to segment the target panoramic video, which reduces the amount of data the computer device has to process. In addition, because the preset model can accurately identify the sky area and the non-sky area in a panoramic image, the proportion of sky pixels obtained from the N mask images is reliable, and making the decision from this proportion greatly improves the accuracy of the judgment on the target panoramic video.
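The sky_ratio computation can be written out directly from the formula in the text. The variable names `sky_pixels`, `background_pixels` and `sky_ratio` come from the description; the function name, the nested-list mask representation and the default sky label of 255 are assumptions for this sketch.

```python
def sky_pixel_ratio(masks, sky_value=255):
    """sky_ratio = sky_pixels / (sky_pixels + background_pixels) over N masks."""
    sky_pixels = 0
    background_pixels = 0
    for mask in masks:
        for row in mask:
            for value in row:
                if value == sky_value:
                    sky_pixels += 1
                else:
                    background_pixels += 1
    total = sky_pixels + background_pixels
    if total == 0:
        raise ValueError("no pixels in the given masks")
    return sky_pixels / total
```

The decision is then a single comparison, for example `sky_pixel_ratio(masks) > first_threshold`, where `first_threshold` stands for the first preset threshold, whose value the text does not give.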

FIG. 4 is a schematic flowchart of a method for segmenting a sky area in another embodiment. This embodiment relates to another optional implementation of determining whether to perform sky segmentation processing on the target panoramic video according to the proportion of sky elements in the target panoramic video; on the basis of the above embodiment, as shown in FIG. 4, the above step 201 includes:

Step 401: Determine N frames of panoramic images from the M frames of target panoramic images.

Refer to the description of step 301; details are not repeated here.

Step 402, using the second preset model to identify the N frames of panoramic images to obtain N output results.

The input of the second preset model is a panoramic image, and the output indicates either that the panoramic image contains the sky or that it does not. Optionally, positive sample images containing a sky area and negative sample images not containing a sky area can be input into any traditional classification algorithm model, a sky classification algorithm model is obtained by training, and this sky classification algorithm model is used as the second preset model. The traditional classification algorithm model may be based on classification algorithms such as the support vector machine (SVM), the k-nearest neighbors algorithm (KNN), and deep learning convolutional neural networks (CNN); the deep learning CNN algorithms may include residual neural networks (ResNet), the high-resolution network (HRNet), efficient models for mobile and embedded vision applications (MobileNet), and so on. This embodiment does not limit which traditional classification algorithm model is trained to obtain the second preset model.

In an optional implementation of this embodiment, after the N frames of panoramic images are obtained, they can each be input into the second preset model, which identifies each frame separately and produces a corresponding output result, giving N output results in total. Each output result indicates whether the corresponding frame is a panoramic image that contains the sky or one that does not.

Step 403 , according to the proportion of panoramic images including the sky in the N output results, determine whether to perform sky segmentation processing on the target panoramic video.

In an optional implementation of this embodiment, after the N output results are obtained, whether to perform sky segmentation processing on the target panoramic video can be determined according to the proportion of panoramic images containing the sky among the N output results. Optionally, this proportion is determined from the number of panoramic images that contain the sky and the number of panoramic images that do not contain the sky in the N output results, and the decision is made according to the relationship between this proportion and a second preset threshold. For example, if the number of panoramic images containing the sky in the N output results is denoted sky_nums and the number of panoramic images not containing the sky is denoted background_nums, the proportion of panoramic images containing the sky, sky_ratio, can be expressed as: sky_ratio = sky_nums / (sky_nums + background_nums). Optionally, when sky_ratio is greater than the second preset threshold, it is determined to perform sky segmentation processing on the target panoramic video; when sky_ratio is not greater than the second preset threshold, it is determined not to perform sky segmentation processing on the target panoramic video.

In this embodiment, the computer device determines N frames of panoramic images from the M frames of target panoramic images, uses the second preset model to identify the N frames and obtain N output results, and then determines, according to the proportion of panoramic images containing the sky among the N output results, whether to perform sky segmentation processing on the target panoramic video. That is, the computer device selects a subset of the panoramic images and uses the proportion of sky-containing images in that subset as the basis for deciding whether to segment the target panoramic video, which reduces the amount of data the computer device has to process. In addition, the pre-trained second preset model can accurately identify whether a panoramic image contains the sky, so determining the proportion of sky-containing images from its output results and deciding on that basis improves the accuracy of the judgment on the target panoramic video.
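The frame-level variant of the ratio can be sketched in the same way. The names `sky_nums`, `background_nums` and `sky_ratio` follow the text; representing each output result of the second preset model as a boolean (True meaning "contains the sky") and the function name are assumptions for the example.

```python
from typing import Sequence


def sky_frame_ratio(outputs: Sequence[bool]) -> float:
    """sky_ratio = sky_nums / (sky_nums + background_nums) over N outputs."""
    sky_nums = sum(1 for contains_sky in outputs if contains_sky)
    background_nums = len(outputs) - sky_nums
    total = sky_nums + background_nums
    if total == 0:
        raise ValueError("no output results to evaluate")
    return sky_nums / total
```

As with the pixel-based check, the video is segmented only when the ratio is strictly greater than the second preset threshold.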

In an optional embodiment of the present application, the training sample set of the preset model may include multiple frames of panoramic images and a reference image corresponding to each of those frames, where the reference image indicates the sky area of its corresponding panoramic image; that is, the sky area and the non-sky area of the panoramic image can be distinguished according to the reference image. In this embodiment, the sky area of a panoramic image can be marked in any form, as long as the resulting reference image distinguishes the sky area in the panoramic image. Optionally, the reference image can be a mask image corresponding to each frame of panoramic image, and the mask image can be annotated with suitable software; for example, a frame of panoramic image can be annotated with PS software to distinguish the sky area from the non-sky area and obtain the mask image for that frame, where the sky area in the mask image can be marked as 1 or 255 and the non-sky area as 0. The reference image can also take the form of a dividing line that encloses the sky area, which likewise yields a reference image marking the sky area. In addition, the multiple frames of panoramic images may include panoramic images of different scenes, panoramic images of different resolutions, and panoramic images with distortion or with an indistinct boundary between sky and non-sky; this embodiment does not limit this. Collecting different types of panoramic images can greatly improve the processing effect of the preset model on panoramic images and its edge processing accuracy.
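To illustrate how a dividing-line annotation could be turned into a mask-style reference image, the sketch below fills everything above a per-column boundary as sky. This is only one assumed interpretation: the `horizon` list (the row index of the first non-sky pixel in each column), the function name, and the premise that the sky lies above the line are not taken from the text.

```python
def mask_from_horizon(height, horizon, sky_value=255):
    """Build a 0/sky_value mask from a per-column dividing line.

    horizon[x] is the row index of the first non-sky pixel in column x
    (assumption: the sky occupies the rows above the line).
    """
    width = len(horizon)
    return [
        [sky_value if y < horizon[x] else 0 for x in range(width)]
        for y in range(height)
    ]
```

For example, `mask_from_horizon(3, [1, 2])` marks one sky row in the first column and two in the second.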

In an optional embodiment of the present application, the training process of the preset model includes: inputting the panoramic images in the training sample set into a neural network model, and adjusting the parameters of the neural network model according to the loss value between the output of the neural network model and the reference image, to obtain the preset model. Optionally, a part of the panoramic images in the training sample set, together with the reference image corresponding to each of them, can be input into the neural network model for model training to obtain a panoramic sky segmentation algorithm. Another part of the panoramic images in the training sample set is then input into the trained neural network model to test the panoramic sky segmentation algorithm and obtain the output of the neural network model, that is, the mask image corresponding to each panoramic image. The parameters of the neural network model are adjusted according to the loss value between the mask image and the reference image corresponding to each panoramic image, so as to further optimize the neural network model; the neural network model whose panoramic sky segmentation algorithm produces more accurate output is used as the preset model. Here, the loss value is the loss between the output of the neural network model, that is, the mask image, and the reference image; the larger the loss value, the larger the error between the mask image and the reference image. The neural network model is optimized according to the loss value until the loss value reaches its minimum, that is, until the mask image output by the neural network model has the smallest error with respect to the corresponding reference image, which makes the output of the neural network model more accurate.

The neural network model can be a deep-learning-based convolutional neural network model such as a Fully Convolutional Network (FCN), DeepLab, Pyramid Scene Parsing Network (PSPNet), or High-Resolution Net (HRNet); this is not limited in this embodiment of the present application.
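The application does not name a specific loss function. As one possible illustration of a loss value between an output mask image and its reference image, the sketch below uses per-pixel binary cross-entropy, a common choice for two-class segmentation. The function name `bce_loss`, the nested-list image representation, and the choice of binary cross-entropy itself are assumptions of this example.

```python
import math
from typing import Sequence


def bce_loss(
    pred_mask: Sequence[Sequence[float]],  # predicted sky probability per pixel, in [0, 1]
    ref_mask: Sequence[Sequence[int]],     # reference image: 1 = sky, 0 = non-sky
    eps: float = 1e-7,
) -> float:
    """Mean per-pixel binary cross-entropy between a predicted mask and its reference."""
    if len(pred_mask) != len(ref_mask):
        raise ValueError("mask and reference must have the same height")
    total, count = 0.0, 0
    for pred_row, ref_row in zip(pred_mask, ref_mask):
        if len(pred_row) != len(ref_row):
            raise ValueError("mask and reference must have the same width")
        for p, y in zip(pred_row, ref_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    if count == 0:
        raise ValueError("empty mask")
    return total / count
```

A mask that agrees closely with the reference yields a smaller value than one that disagrees, which matches the statement that a larger loss value indicates a larger error between the mask image and the reference image.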

Optionally, after the trained preset model is obtained, that is, after the trained panoramic sky segmentation algorithm is obtained, the performance of the preset model can be evaluated by calculating its accuracy and measuring its actual processing speed on the computer device. Optionally, any number of panoramic images can be input into the preset model to output the mask images corresponding to those panoramic images, and the accuracy of the preset model is calculated from these mask images. Calculating the accuracy may include calculating the mean Intersection over Union (mIoU) of the preset model and calculating its pixel accuracy (acc), among others. The average time the preset model takes to process one panoramic image, that is, the processing speed of the model, is also calculated.
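The two accuracy measures named above can be sketched for a single pair of binary masks as follows. The function names and the nested-list mask representation (1 = sky, 0 = non-sky) are assumptions of this example; in practice the counts would typically be accumulated over a whole evaluation set before averaging.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def pixel_accuracy(pred: Mask, ref: Mask) -> float:
    """Fraction of pixels whose predicted label equals the reference label."""
    correct = total = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            correct += int(p == r)
            total += 1
    if total == 0:
        raise ValueError("empty mask")
    return correct / total


def mean_iou(pred: Mask, ref: Mask, classes: Sequence[int] = (0, 1)) -> float:
    """Average of the per-class intersection-over-union values."""
    ious = []
    for c in classes:
        inter = union = 0
        for pred_row, ref_row in zip(pred, ref):
            for p, r in zip(pred_row, ref_row):
                in_pred, in_ref = p == c, r == c
                inter += int(in_pred and in_ref)
                union += int(in_pred or in_ref)
        if union:  # skip a class that appears in neither mask
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class present in either mask")
    return sum(ious) / len(ious)
```

For example, with `pred = [[1, 1, 0], [0, 0, 0]]` and `ref = [[1, 1, 1], [0, 0, 0]]`, five of six pixels match, the sky class has an IoU of 2/3 and the non-sky class an IoU of 3/4.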

In an optional embodiment of the present application, a model compression algorithm may also be used to process the neural network model to update the neural network model. Optionally, before the panoramic images in the training sample set are input into the neural network model, the neural network model can be compressed with a model compression algorithm to reduce the amount of data the model has to process; the panoramic sky segmentation algorithm model is then trained on the compressed neural network model, which increases the speed at which the computer device executes the panoramic sky segmentation algorithm and achieves real-time processing. Optionally, the model compression algorithm may include model pruning, reducing the number of channels, reducing the number of network layers, reducing the input resolution, knowledge distillation, ablation experiments, and replacing the feature extraction of the neural network model with a more lightweight network, so as to reduce the parameters and computation of the neural network model and increase its processing speed to meet real-time requirements while ensuring that its accuracy is not significantly reduced. The model compression algorithm is not specifically limited in this embodiment, as long as the processing speed of the neural network model can be improved without significantly reducing its accuracy.

It should be understood that although the steps in the flowcharts of FIGS. 2-4 are shown in sequence according to the arrows, these steps are not necessarily executed in the sequence shown by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and the steps may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but may be executed at different times; nor are they necessarily executed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 5, a sky area segmentation device is provided, including: a judgment module 501, an acquisition module 502 and a segmentation module 503, wherein:

The judgment module 501 is configured to judge whether to perform sky segmentation processing on the target panoramic video according to the proportion of sky elements in the target panoramic video.

The acquisition module 502 is configured to, when it is judged that sky segmentation processing is to be performed on the target panoramic video, extract M frames of panoramic images of the target panoramic video and input each frame of panoramic image in the M frames of panoramic images into a preset model to obtain a mask image of each frame of panoramic image in the M frames of panoramic images; the mask image includes a sky area and a non-sky area.

The segmentation module 503 is configured to perform sky segmentation processing on the panoramic image corresponding to the mask image in the M frames of panoramic images according to the mask image.
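As a rough illustration of what segmenting a panoramic image according to its mask image can look like, the sketch below splits one frame into a sky layer and a non-sky layer. It assumes the mask convention mentioned earlier in the description (sky marked 1 or 255, non-sky marked 0); the function name `segment_sky`, the `fill` value, and the nested-list image representation are hypothetical.

```python
from typing import List, Sequence, Tuple


def segment_sky(
    image: Sequence[Sequence[int]],
    mask: Sequence[Sequence[int]],  # non-zero (1 or 255) = sky, 0 = non-sky
    fill: int = 0,                  # value written where a layer has no pixel
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (sky_layer, non_sky_layer) for one frame and its mask image."""
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same shape")
    sky_layer = [
        [px if m else fill for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
    non_sky_layer = [
        [fill if m else px for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
    return sky_layer, non_sky_layer
```

Once the two layers are separated, an effect such as the sky replacement mentioned in the background section could be applied to the sky layer alone.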

In one embodiment, the above judgment module 501 is specifically configured to, before judging whether to perform sky segmentation processing on the target panoramic video according to the proportion of sky elements in the target panoramic video: determine N frames of panoramic images from the M frames of panoramic images; input the N frames of panoramic images into the preset model to obtain N mask images corresponding to the N frames of panoramic images; and judge, according to the proportion of sky pixels in the N mask images, whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, the above judgment module 501 is specifically configured to: determine the proportion of sky pixels according to the number of sky pixels in the N mask images and the number of non-sky pixels in the N mask images; and judge, according to the relationship between the proportion of sky pixels and the first preset threshold, whether to perform sky segmentation processing on the target panoramic video.
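The pixel-level judgment can be sketched as below. The text states only that the proportion is determined from the sky and non-sky pixel counts; this example assumes the proportion is sky / (sky + non-sky), that segmentation proceeds when the proportion exceeds the first preset threshold, and uses an illustrative threshold of 0.1. The function names are hypothetical.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def sky_pixel_ratio(masks: Sequence[Mask]) -> float:
    """Sky pixels divided by all pixels, counted over the N mask images."""
    sky = non_sky = 0
    for mask in masks:
        for row in mask:
            for value in row:
                if value:  # sky is marked 1 or 255
                    sky += 1
                else:      # non-sky is marked 0
                    non_sky += 1
    total = sky + non_sky
    if total == 0:
        raise ValueError("no pixels to count")
    return sky / total


def should_segment_by_pixels(masks: Sequence[Mask], first_threshold: float = 0.1) -> bool:
    # Assumption: sky segmentation proceeds when the ratio exceeds the threshold.
    return sky_pixel_ratio(masks) > first_threshold
```

Counting over all N masks together, rather than averaging per-frame ratios, gives the same result when every frame has the same resolution.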

In one embodiment, the above judgment module 501 is further configured to, before judging whether to perform sky segmentation processing on the target panoramic video according to the proportion of sky elements in the target panoramic video: determine N frames of panoramic images from the M frames of target panoramic images; identify the N frames of panoramic images using the second preset model to obtain N output results, where the input of the second preset model is a panoramic image and the output is either that the panoramic image contains the sky or that the panoramic image does not contain the sky; and judge, according to the proportion of panoramic images containing the sky among the N output results, whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, the above judgment module 501 is specifically configured to: determine the proportion of panoramic images containing the sky according to the number of panoramic images containing the sky in the N output results and the number of panoramic images not containing the sky in the N output results; and judge, according to the relationship between the proportion of panoramic images containing the sky and the second preset threshold, whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, the apparatus further includes a model training module. The model training module is configured to input the panoramic images in the training sample set into a neural network model, and to adjust the parameters of the neural network model according to the loss value between the output of the neural network model and the reference image, to obtain the preset model.

In one embodiment, the model training module is further configured to process the neural network model by using a model compression algorithm to update the neural network model.

For the specific limitations of the sky area segmentation device, reference may be made to the limitations of the sky area segmentation method above, which are not repeated here. Each module in the above sky area segmentation device can be implemented in whole or in part by software, hardware, or a combination thereof. The above modules can be embedded in, or be independent of, the processor in the computer device in the form of hardware, or be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided. The computer device may be the above-mentioned terminal, and its internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized by Wi-Fi, an operator network, NFC (Near Field Communication) or other technologies. The computer program, when executed by the processor, implements a sky area segmentation method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a button, trackball or touchpad provided on the housing of the computer device, or an external keyboard, touchpad or mouse.

Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of the part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

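In one embodiment, a computer device is provided, including a memory and a processor. A computer program is stored in the memory, and the processor implements the following steps when executing the computer program:

judging, according to the proportion of sky elements in the target panoramic video, whether to perform sky segmentation processing on the target panoramic video;

if so, extracting M frames of panoramic images of the target panoramic video, and inputting each frame of panoramic image in the M frames of panoramic images into a preset model to obtain a mask image of each frame of panoramic image in the M frames of panoramic images, the mask image including a sky area and a non-sky area;

performing, according to the mask image, sky segmentation processing on the panoramic image corresponding to the mask image in the M frames of panoramic images.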

In one embodiment, the processor further implements the following steps when executing the computer program: determining N frames of panoramic images from the M frames of panoramic images; inputting the N frames of panoramic images into the preset model to obtain N mask images corresponding to the N frames of panoramic images; and judging, according to the proportion of sky pixels in the N mask images, whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, the processor further implements the following steps when executing the computer program: determining the proportion of sky pixels according to the number of sky pixels in the N mask images and the number of non-sky pixels in the N mask images; and judging, according to the relationship between the proportion of sky pixels and the first preset threshold, whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, the processor further implements the following steps when executing the computer program: determining N frames of panoramic images from the M frames of target panoramic images; identifying the N frames of panoramic images using the second preset model to obtain N output results, where the input of the second preset model is a panoramic image and the output is either that the panoramic image contains the sky or that the panoramic image does not contain the sky; and judging, according to the proportion of panoramic images containing the sky among the N output results, whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, the processor further implements the following steps when executing the computer program: determining the proportion of panoramic images containing the sky according to the number of panoramic images containing the sky in the N output results and the number of panoramic images not containing the sky in the N output results; and judging, according to the relationship between the proportion of panoramic images containing the sky and the second preset threshold, whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, when the processor executes the computer program, the training sample set of the preset model includes multiple frames of panoramic images and a reference image corresponding to each frame of panoramic image in the multiple frames of panoramic images, and the reference image is used to indicate the sky area of the panoramic image corresponding to the reference image.

In one embodiment, the processor further implements the following steps when executing the computer program: inputting the panoramic images in the training sample set into a neural network model, and adjusting the parameters of the neural network model according to the loss value between the output of the neural network model and the reference image, to obtain the preset model.

In one embodiment, the processor further implements the following steps when executing the computer program: processing the neural network model with a model compression algorithm to update the neural network model.

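In one embodiment, a computer-readable storage medium is provided on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:

judging, according to the proportion of sky elements in the target panoramic video, whether to perform sky segmentation processing on the target panoramic video;

if so, extracting M frames of panoramic images of the target panoramic video, and inputting each frame of panoramic image in the M frames of panoramic images into a preset model to obtain a mask image of each frame of panoramic image in the M frames of panoramic images, the mask image including a sky area and a non-sky area;

performing, according to the mask image, sky segmentation processing on the panoramic image corresponding to the mask image in the M frames of panoramic images.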

In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: determining N frames of panoramic images from the M frames of panoramic images; inputting the N frames of panoramic images into the preset model to obtain N mask images corresponding to the N frames of panoramic images; and judging, according to the proportion of sky pixels in the N mask images, whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: determining the proportion of sky pixels according to the number of sky pixels in the N mask images and the number of non-sky pixels in the N mask images; and judging, according to the relationship between the proportion of sky pixels and the first preset threshold, whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: determining N frames of panoramic images from the M frames of target panoramic images; identifying the N frames of panoramic images using the second preset model to obtain N output results, where the input of the second preset model is a panoramic image and the output is either that the panoramic image contains the sky or that the panoramic image does not contain the sky; and judging, according to the proportion of panoramic images containing the sky among the N output results, whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: determining the proportion of panoramic images containing the sky according to the number of panoramic images containing the sky in the N output results and the number of panoramic images not containing the sky in the N output results; and judging, according to the relationship between the proportion of panoramic images containing the sky and the second preset threshold, whether to perform sky segmentation processing on the target panoramic video.

In one embodiment, when the computer program is executed by the processor, the training sample set of the preset model includes multiple frames of panoramic images and a reference image corresponding to each frame of panoramic image in the multiple frames of panoramic images, and the reference image is used to indicate the sky area of the panoramic image corresponding to the reference image.

In one embodiment, when the computer program is executed by the processor, the following steps are further implemented: inputting the panoramic images in the training sample set into a neural network model, and adjusting the parameters of the neural network model according to the loss value between the output of the neural network model and the reference image, to obtain the preset model.

In one embodiment, the computer program, when executed by the processor, further implements the step of: processing the neural network model with a model compression algorithm to update the neural network model.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it may include the processes of the above method embodiments. Any reference to memory, storage, database or other media used in the various embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory or optical memory, etc. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope described in this specification.

The above-mentioned embodiments represent only several implementations of the present application. Their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be pointed out that those skilled in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the patent of the present application shall be subject to the appended claims.

Claims

1. A method for segmenting a sky area, characterized in that the method comprises:

judging, according to the proportion of sky elements in a target panoramic video, whether to perform sky segmentation processing on the target panoramic video;

if so, extracting M frames of panoramic images of the target panoramic video, and inputting each frame of panoramic image in the M frames of panoramic images into a preset model to obtain a mask image of each frame of panoramic image in the M frames of panoramic images, the mask image including a sky area and a non-sky area;

performing, according to the mask image, sky segmentation processing on the panoramic image corresponding to the mask image in the M frames of panoramic images.

2. The method according to claim 1, wherein judging, according to the proportion of sky elements in the target panoramic video, whether to perform sky segmentation processing on the target panoramic video comprises:

determining N frames of panoramic images from the M frames of panoramic images;

inputting the N frames of panoramic images into the preset model to obtain N mask images corresponding to the N frames of panoramic images;

judging, according to the proportion of sky pixels in the N mask images, whether to perform sky segmentation processing on the target panoramic video.

3. The method according to claim 2, wherein judging, according to the proportion of sky pixels in the N mask images, whether to perform sky segmentation processing on the target panoramic video comprises:

determining the proportion of sky pixels according to the number of sky pixels in the N mask images and the number of non-sky pixels in the N mask images;

judging, according to the relationship between the proportion of sky pixels and a first preset threshold, whether to perform sky segmentation processing on the target panoramic video.

4. The method according to claim 1, wherein judging, according to the proportion of sky elements in the target panoramic video, whether to perform sky segmentation processing on the target panoramic video comprises:

determining N frames of panoramic images from the M frames of target panoramic images;

identifying the N frames of panoramic images using a second preset model to obtain N output results, wherein the input of the second preset model is a panoramic image, and the output is that the panoramic image contains the sky or that the panoramic image does not contain the sky;

judging, according to the proportion of panoramic images containing the sky among the N output results, whether to perform sky segmentation processing on the target panoramic video.

5. The method according to claim 4, wherein judging, according to the proportion of panoramic images containing the sky among the N output results, whether to perform sky segmentation processing on the target panoramic video comprises:

determining the proportion of panoramic images containing the sky according to the number of panoramic images containing the sky in the N output results and the number of panoramic images not containing the sky in the N output results;

judging, according to the relationship between the proportion of panoramic images containing the sky and a second preset threshold, whether to perform sky segmentation processing on the target panoramic video.

6. The method according to claim 1, wherein the training sample set of the preset model comprises multiple frames of panoramic images and a reference image corresponding to each frame of panoramic image in the multiple frames of panoramic images, and the reference image is used to indicate the sky area of the panoramic image corresponding to the reference image.

7. The method according to claim 6, wherein the training process of the preset model comprises:

inputting the panoramic images in the training sample set into a neural network model, and adjusting the parameters of the neural network model according to the loss value between the output of the neural network model and the reference image, to obtain the preset model.

8. The method according to claim 7, wherein the training process of the preset model further comprises:

processing the neural network model using a model compression algorithm to update the neural network model.

9. A sky area segmentation device, characterized in that the device comprises:

a judgment module, configured to judge whether to perform sky segmentation processing on a target panoramic video according to the proportion of sky elements in the target panoramic video;

an acquisition module, configured to extract M frames of panoramic images of the target panoramic video, and input each frame of panoramic image in the M frames of panoramic images into a preset model to obtain a mask image of each frame of panoramic image in the M frames of panoramic images, the mask image including a sky area and a non-sky area;

a segmentation module, configured to perform, according to the mask image, sky segmentation processing on the panoramic image corresponding to the mask image in the M frames of panoramic images.

10. A computer device, comprising a memory and a processor, wherein the memory stores a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 8 when executing the computer program.

11. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 8 are implemented.