WO2023051362A1 - Image area processing method and device

Info

Publication number: WO2023051362A1
Application number: PCT/CN2022/120322
Authority: WO
Inventors: 朱渊略; 黄佳斌; 王一同
Original assignee: 北京字跳网络技术有限公司
Priority date: 2021-09-30
Filing date: 2022-09-21
Publication date: 2023-04-06
Also published as: CN115908792A

Abstract

Provided in the embodiments of the present disclosure are an image area processing method and a device. The method comprises: acquiring a target image, and a device posture of a terminal when photographing the target image; performing image area recognition on the target image, so as to obtain an initial recognition result, wherein an image area comprises at least one of a ceiling area, a wall area and a ground area; and correcting the initial recognition result according to the device posture, so as to obtain a corrected recognition result. Therefore, by means of correcting an image area recognition result on the basis of a device posture, the accuracy of image area recognition is improved.

Description

Image area processing method and device

Related Application Cross Reference

This disclosure claims the priority of the Chinese patent application with application number 202111168828.4 and titled "Image Area Processing Method and Device" filed on September 30, 2021, the entire contents of which are incorporated herein by reference.

Technical field

Embodiments of the present disclosure relate to the field of computer technology, and in particular, to an image region processing method and device.

Background

Deep learning algorithms based on convolutional neural networks support end-to-end learning, perform well, and have broad application prospects in the field of image recognition. Recognizing indoor areas such as ceilings, walls, and floors is one of the research directions in this field.

When a ceiling, wall, or floor is shot at close range, the lack of reference objects makes their planes look very similar, and it is difficult for deep learning algorithms to distinguish whether the plane is a ceiling, a wall, or a floor. Traditional algorithms based on edge detection and spatial geometric information, in turn, rely on boundaries for plane segmentation, and the segmented planes are smooth. For videos or images with blurred edges, this approach is prone to failure.

It can be seen that the accuracy of image region recognition needs to be improved.

Summary of the invention

Embodiments of the present disclosure provide an image region processing method and device to overcome the problem of low accuracy of image region recognition in an image.

In a first aspect, an embodiment of the present disclosure provides an image region processing method, including:

obtaining a target image and a device posture of a terminal when capturing the target image;

performing image region recognition on the target image to obtain an initial recognition result, where the image region includes at least one of a ceiling region, a wall region, and a floor region;

correcting the initial recognition result according to the device posture to obtain a corrected recognition result.

In a second aspect, an embodiment of the present disclosure provides an image processing device, including:

an acquisition unit, configured to acquire a target image and a device posture of a terminal when capturing the target image;

a recognition unit, configured to perform image region recognition on the target image to obtain an initial recognition result, where the image region includes at least one of a ceiling region, a wall region, and a floor region;

a correction unit, configured to correct the initial recognition result according to the device posture to obtain a corrected recognition result.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the image region processing method described in the above first aspect or in the various possible designs of the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the image region processing method described in the above first aspect or in the various possible designs of the first aspect is implemented.

In a fifth aspect, according to one or more embodiments of the present disclosure, a computer program product is provided. The computer program product includes computer-executable instructions, and when a processor executes the computer-executable instructions, the image region processing method described in the above first aspect or in the various possible designs of the first aspect is implemented.

In a sixth aspect, according to one or more embodiments of the present disclosure, a computer program is provided, the computer program is used to implement the image region processing method described in the first aspect or various possible designs of the first aspect.

The image region processing method and device provided in this embodiment perform image region recognition on the target image to obtain an initial recognition result, where the image region includes at least one of a ceiling region, a wall region, and a floor region, and then correct the initial recognition result according to the device posture to obtain a corrected recognition result. Therefore, by first recognizing the image region and then correcting the initial recognition result based on the device posture corresponding to the image, the accuracy of image region recognition is improved.

Brief description of the drawings

In order to illustrate the technical solutions in the embodiments of the present disclosure or the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present disclosure. Those skilled in the art can also obtain other drawings based on these drawings without any creative effort.

FIG. 1 is a schematic diagram of an application scenario applicable to an embodiment of the present disclosure;

FIG. 2 is a first schematic flow diagram of an image region processing method provided by an embodiment of the present disclosure;

FIG. 3 is a second schematic flow diagram of an image region processing method provided by an embodiment of the present disclosure;

FIG. 4 is a structural block diagram of an image area processing device provided by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present disclosure.

Detailed description

In order to make the purpose, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below in conjunction with the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are some of the embodiments of the present disclosure, but not all of them. Based on the embodiments in the present disclosure, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.

When identifying image areas such as ceilings, walls, and floors in an image, there are several ways:

Method 1: identify image regions using a deep learning model based on a convolutional neural network. The deep learning model can be trained end to end and performs well. However, in the case of close-up shooting, due to the lack of reference objects, the similarity between the ceiling area, wall area, and ground area is high, and it is difficult for the deep learning model to distinguish between the ceiling area, wall area, and ground area in the image.

Method 2: identify image regions using an algorithm based on edge detection and spatial geometric information. For videos or images with blurred edges, the accuracy of image region recognition in this manner is low.

In order to improve the accuracy of image area recognition, the present disclosure provides an image area processing method and device. The method takes into account that the device posture of the terminal when capturing an image has a great influence on the position distribution of the ceiling area, wall area, and ground area in the image. After image area recognition is performed on the image, the recognition result is corrected based on the device posture of the terminal when it captured the image, so as to improve the recognition accuracy of the ceiling area, wall area, and ground area in the image.

Referring to FIG. 1 , FIG. 1 is a schematic diagram of an application scenario applicable to an embodiment of the present disclosure.

The application scenario shown in FIG. 1 is an image processing scenario, and in this application scenario, the involved devices include an image processing device 101 for performing image region recognition on an image. Wherein, the image processing device 101 may be a terminal or a server, and the server is taken as an example in FIG. 1 .

Optionally, the devices involved in the application scenario further include an image capturing device 102 configured to capture images and send the captured images to the image processing device 101 .

Wherein, the image capturing device 102 is a terminal with a camera function, such as a camera, a handheld device with a camera (such as a smart phone or a tablet computer), a computing device with a camera (such as a personal computer, PC for short), a wearable device with a camera (such as a smart watch), or a smart home device with a camera.

Wherein, the image processing device 101 and the image capturing device 102 may be the same device. For example, the image processing device 101 and the image capturing device 102 are the same smart phone, and the smart phone captures images and performs real-time or non-real-time image area recognition on them. Alternatively, the image processing device 101 and the image capturing device 102 are different devices: the image capturing device 102 sends the captured image to the image processing device 101, for example through a network, and the image processing device 101 performs image area recognition on the image. For example, the image captured by a smart phone is sent to the server, and the server recognizes the image area.

Optionally, the image is a scene image of an indoor scene. Therefore, based on the image region processing method provided by the embodiments of the present disclosure, the accuracy of image region recognition can be improved for indoor scene images in which the ceiling region, wall region, and floor region are difficult to recognize.

Exemplarily, the image region processing method provided by the embodiments of the present disclosure may be applied to electronic devices, such as terminals and servers. Wherein, the terminal can be a personal digital assistant (PDA for short) device, a handheld device (such as a smart phone or a tablet computer), a computing device (such as a personal computer), a vehicle-mounted device, a wearable device (such as a smart watch or a smart band), a smart home device (such as a smart display device), and so on. The server may be a distributed server, a centralized server, a cloud server, or the like.

Referring to FIG. 2 , FIG. 2 is a first schematic flowchart of an image region processing method provided by an embodiment of the present disclosure. As shown in Figure 2, the image region processing method includes:

S201. Acquire a target image and a device posture of the terminal when capturing the target image.

The device posture includes a device angle (also referred to as a device orientation). In the same scene, different device postures of the terminal result in different captured image contents. The target image may be an image captured by the terminal in real time, or an image stored in a database, or an image input by a user. The target image may also be a video frame in the target video, and the target video may be a video captured by the terminal in real time, or a video stored in a database, or a video input by a user.

In an example, the target image captured by the terminal in real time and the device posture of the terminal when capturing the target image are acquired, so as to perform real-time image area recognition on the target image.

In another example, the target image and the device posture of the terminal when shooting the target image are acquired from an online or offline database, so as to perform image region recognition on a target image pre-stored in the database. The target image is acquired, for example, by obtaining target images randomly or sequentially from the database, or by obtaining a user-specified target image from the database.

In yet another example, the target image input by the user and the device posture of the terminal input by the user when capturing the target image are acquired, so as to perform image region recognition on the target image input by the user.

S202. Perform image region recognition on the target image to obtain an initial recognition result, where the image region includes at least one of a ceiling region, a wall region, and a floor region.

Wherein, the initial recognition result may include the image region recognized in the target image, specifically, the initial recognition result includes at least one of the recognized ceiling region, wall region and floor region.

In this embodiment, an image recognition model can be used to identify at least one of the ceiling area, wall area, and ground area in the target image, so as to obtain at least one of the ceiling area, wall area, and ground area included in the target image. Wherein, the image recognition model is a deep learning model for image region recognition, such as a convolutional neural network.

Optionally, before performing image region recognition on the target image, a preprocessing operation is performed on the target image, so that the target image meets the requirements of the deep learning model for input data, and at the same time, the image quality of the target image is improved. Wherein, the preprocessing operation includes one or more of the following operations: size scaling operation, cropping operation, flipping operation, and image enhancement operation, and the image enhancement operation includes enhancement of one or more aspects of image contrast, image saturation, and image tone.

As an example, the preprocessing process of the target image includes: first, randomly scaling the target image to a size within a preset multiple range; then, randomly cropping the scaled target image to the target size; next, randomly flipping the cropped target image horizontally; finally, performing data enhancement on the contrast, saturation, and hue of the flipped target image.
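The four preprocessing steps above can be sketched as follows. This is only an illustrative sketch using the Python standard library, not the implementation used in the disclosure. The function names, the nearest-neighbour resize, the scale range (1.0, 2.0), and the jitter magnitudes are all assumptions chosen for the example, and the image is assumed to be a list of rows of (r, g, b) tuples with values in [0, 1].

```python
import colorsys
import random


def random_scale(img, lo, hi, rng):
    """Nearest-neighbour resize by a random factor drawn from [lo, hi]."""
    s = rng.uniform(lo, hi)
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * s)), max(1, round(w * s))
    return [[img[min(h - 1, int(y / s))][min(w - 1, int(x / s))]
             for x in range(nw)] for y in range(nh)]


def random_crop(img, out_h, out_w, rng):
    """Crop a random out_h x out_w window; the image must be at least that large."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - out_h)
    left = rng.randint(0, w - out_w)
    return [row[left:left + out_w] for row in img[top:top + out_h]]


def random_hflip(img, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    return [row[::-1] for row in img] if rng.random() < p else img


def color_jitter(img, rng, contrast=0.2, saturation=0.2, hue=0.05):
    """Randomly perturb contrast, saturation, and hue (magnitudes are assumptions)."""
    c = 1 + rng.uniform(-contrast, contrast)
    s = 1 + rng.uniform(-saturation, saturation)
    dh = rng.uniform(-hue, hue)

    def clamp(v):
        return min(1.0, max(0.0, v))

    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            # Contrast: scale each channel around mid-grey (0.5).
            r, g, b = (clamp(0.5 + (v - 0.5) * c) for v in (r, g, b))
            hh, ss, vv = colorsys.rgb_to_hsv(r, g, b)
            # Saturation is scaled, hue is shifted and wrapped into [0, 1).
            new_row.append(colorsys.hsv_to_rgb((hh + dh) % 1.0, clamp(ss * s), vv))
        out.append(new_row)
    return out


def preprocess(img, out_h, out_w, rng, scale_range=(1.0, 2.0)):
    """Scale, crop, flip, then jitter, in the order given in the example above."""
    img = random_scale(img, scale_range[0], scale_range[1], rng)
    img = random_crop(img, out_h, out_w, rng)
    img = random_hflip(img, rng)
    return color_jitter(img, rng)
```

Passing an explicit `random.Random` instance keeps the sketch reproducible. The scale range starts at 1.0 so that the scaled image is never smaller than the crop window, provided the original image already is not.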

S203. Correct the initial recognition result according to the device posture to obtain a corrected recognition result.

In this embodiment, since the planes of the ceiling area, the wall area, and the ground area are relatively similar, and the planes of the ceiling area and the ground area in particular are relatively similar, recognition errors may occur when the target image is recognized by the image recognition model. For example, ceiling areas are misidentified as ground areas and ground areas are misidentified as ceiling areas. Considering that the device posture of the terminal when shooting the target image has a great influence on the position distribution of the ceiling area, wall area, and ground area in the target image, after the initial recognition result is obtained, the misrecognized areas in the initial recognition result are corrected according to the device posture of the terminal when shooting the target image, so as to improve the accuracy of image region recognition.

For example, when the terminal captures the target image, one or more factors in the device posture, such as the distance from the ground and the tilt angle, affect whether the terminal can capture the ceiling, wall, and ground, and in turn affect whether the target image includes the ceiling area, wall area, and ground area. If it can be determined based on the device posture of the terminal that the terminal cannot capture the ground when shooting the target image, the target image should not contain a ground area. In that case, if the initial recognition result includes a ground area recognized in the target image, the ground area in the initial recognition result is a misrecognized area.

Optionally, when correcting a misrecognized area in the initial recognition result, the misrecognized area can be corrected, based on the device posture of the terminal when shooting the target image, to whichever of the remaining two area types among ceiling, wall, and ground is most likely to appear in the target image, thereby improving the accuracy of image region recognition. For example, suppose the ground area identified in the initial recognition result is a misrecognized area. If the device posture of the terminal when capturing the target image indicates that the most likely image area in the target image is the ceiling area, the ground area identified in the initial recognition result is corrected to the ceiling area; if it indicates that the most likely image area in the target image is the wall area, the ground area identified in the initial recognition result is corrected to the wall area.

Optionally, when correcting a misrecognized area in the initial recognition result, the target image can also be re-identified using the image recognition model used in the initial recognition, or using a deep learning model that is more complex than that image recognition model. Based on the recognition result obtained from the re-identification, the misrecognized area in the initial recognition result is corrected, thereby improving the accuracy of image area recognition.

In the embodiment of the present disclosure, after the image recognition model performs image area recognition on the target image and produces the initial recognition result, the initial recognition result is corrected based on the device posture of the terminal when the image was taken. This addresses the low recognition accuracy for one or more of the ceiling area, wall area, and ground area in the image, and improves the accuracy of image area recognition.

In some embodiments, the device posture of the terminal includes the elevation angle of the terminal and/or the depression angle of the terminal. The elevation angle and depression angle of the terminal refer to the angle between the shooting direction (which can also be understood as the line of sight) of the camera on the terminal and the horizontal line. When the shooting direction of the camera on the terminal is above the horizontal line, the angle between the shooting direction and the horizontal line is the elevation angle of the terminal; when the shooting direction of the camera on the terminal is below the horizontal line, the angle between the shooting direction and the horizontal line is the depression angle of the terminal. Based on this, refer to FIG. 3. FIG. 3 is a second schematic flow diagram of an image region processing method provided by an embodiment of the present disclosure. As shown in FIG. 3, the image region processing method includes:

S301. Acquire a target image and a device posture of the terminal when capturing the target image, where the device posture of the terminal includes an elevation angle of the terminal and/or a depression angle of the terminal.

In this embodiment, the elevation angle and/or depression angle of the terminal when capturing the target image can be obtained through a sensor on the terminal. Wherein, the sensor may be an angle sensor, a gravity sensor and the like. For example, when the terminal is a mobile phone, the sensor is an inertial measurement unit (Inertial Measurement Unit, IMU) in the terminal.
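As a rough illustration of how an elevation or depression angle might be derived from a gravity reading, the sketch below computes the signed angle between the camera's optical axis and the horizontal plane. It rests on assumptions that are not stated in the disclosure: the gravity vector is assumed to point toward the ground, both vectors are assumed to be expressed in the same device coordinate frame, and the default optical axis (0, 0, -1) is a hypothetical choice for a rear camera. Real sensor APIs differ in sign and axis conventions, so these would need to be checked per platform.

```python
import math


def camera_pitch_deg(gravity, optical_axis=(0.0, 0.0, -1.0)):
    """Signed angle in degrees between the optical axis and the horizontal plane.

    Assumption: `gravity` points toward the ground and shares a frame with
    `optical_axis`. A positive result means the camera looks above the horizon.
    """
    dot = sum(g * a for g, a in zip(gravity, optical_axis))
    norm = math.hypot(*gravity) * math.hypot(*optical_axis)
    if norm == 0:
        raise ValueError("gravity and optical_axis must be non-zero vectors")
    # Clamp to guard against rounding slightly outside [-1, 1].
    sin_pitch = max(-1.0, min(1.0, -dot / norm))
    return math.degrees(math.asin(sin_pitch))


def elevation_and_depression(pitch_deg):
    """Split a signed pitch into (elevation, depression); at most one is non-zero."""
    return max(pitch_deg, 0.0), max(-pitch_deg, 0.0)
```

For instance, under these assumptions a camera whose optical axis points directly opposite to gravity yields a pitch of 90 degrees, which `elevation_and_depression` reports as an elevation angle of 90 and a depression angle of 0.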

S302. Perform image region recognition on the target image to obtain an initial recognition result, where the image region includes at least one of a ceiling region, a wall region, and a floor region.

Wherein, the implementation principle and technical effect of S302 may refer to the foregoing embodiments, and details are not repeated here.

S303. Compare the elevation angle of the terminal and/or the depression angle of the terminal with an angle threshold.

Wherein, the angle threshold includes an angle threshold for comparison with the elevation angle of the terminal and an angle threshold for comparison with the depression angle of the terminal. These two angle thresholds can be the same or different.

In this embodiment, the elevation angle of the terminal may be compared with a corresponding angle threshold to obtain a comparison result; and/or the depression angle of the terminal may be compared with a corresponding angle threshold to obtain a comparison result.

S304. Correct the initial recognition result according to the comparison result to obtain a corrected recognition result.

In this embodiment, according to the comparison result between the elevation angle of the terminal and the corresponding angle threshold, and/or according to the comparison result between the depression angle of the terminal and the corresponding angle threshold, it is determined whether there is a misrecognized area in the initial recognition result. If so, the misrecognized area is corrected to obtain the corrected recognition result.

In some embodiments, the angle threshold compared with the elevation angle of the terminal includes a first threshold. At this time, a possible implementation of S304 includes: if the comparison result between the elevation angle of the terminal and the first threshold is that the elevation angle of the terminal is greater than the first threshold, the ground area in the initial recognition result is re-identified as the ceiling area, and the corrected recognition result is obtained.

Specifically, if the elevation angle of the terminal is greater than the first threshold, it indicates that the shooting direction of the camera on the terminal is strongly inclined toward the ceiling, and the terminal cannot capture the ground area. At this time, if the initial recognition result includes an identified ground area, the ground area in the initial recognition result is determined to be a misrecognized area. Considering that the plane similarity between the ground area and the ceiling area is higher than that between the ground area and the wall area, the ground area in the initial recognition result is re-identified as the ceiling area, and the corrected recognition result is obtained. Thus, the accuracy of image region recognition is improved.

In some embodiments, the angle threshold compared with the elevation angle of the terminal includes a third threshold, and the third threshold is greater than the first threshold. At this time, another possible implementation of S304 includes: if the comparison result between the elevation angle of the terminal and the third threshold is that the elevation angle of the terminal is greater than the third threshold, the ground area and the wall area in the initial recognition result are re-identified as the ceiling area to obtain the corrected recognition result.

Specifically, compared with the first threshold, the value of the third threshold is larger. If the elevation angle of the terminal is greater than the third threshold, it indicates that the shooting direction of the camera on the terminal is even more inclined toward the ceiling, and the terminal cannot capture the ground area or the wall area. At this time, if the initial recognition result includes an identified ground area and/or wall area, the ground area and the wall area in the initial recognition result are determined to be misrecognized areas, both are re-identified as the ceiling area, and the corrected recognition result is obtained. Thus, the accuracy of image region recognition is improved.

In some embodiments, the angle threshold compared with the depression angle of the terminal includes a second threshold, where the second threshold may be equal to the first threshold, or may be different from the first threshold. At this time, another possible implementation of S304 includes: if the comparison result between the depression angle of the terminal and the second threshold is that the depression angle of the terminal is greater than the second threshold, the ceiling area in the initial recognition result is re-identified as the ground area, and the corrected recognition result is obtained.

Specifically, if the depression angle of the terminal is greater than the second threshold, it indicates that the shooting direction of the camera on the terminal is strongly inclined toward the ground, and the terminal cannot capture the ceiling area. At this time, if the initial recognition result includes an identified ceiling area, the ceiling area in the initial recognition result is determined to be a misrecognized area. Considering that the plane similarity between the ceiling area and the ground area is higher than that between the ceiling area and the wall area, the ceiling area in the initial recognition result is re-identified as the ground area, and the corrected recognition result is obtained. Thus, the accuracy of image region recognition is improved.

In some embodiments, the angle threshold compared with the depression angle of the terminal includes a fourth threshold, the fourth threshold is greater than the second threshold, and the fourth threshold may be equal to or different from the third threshold. At this time, another possible implementation of S304 includes: if the comparison result of the depression angle of the terminal and the fourth threshold is that the depression angle of the terminal is greater than the fourth threshold, re-identify the wall area and the ceiling area in the initial identification result as the ground area to get the corrected recognition result.

Specifically, compared with the second threshold, the value of the fourth threshold is larger. If the depression angle of the terminal is greater than the fourth threshold, it indicates that the shooting direction of the camera on the terminal is even more inclined toward the ground, and the terminal cannot capture the ceiling area or the wall area. At this time, if the initial recognition result includes an identified ceiling area and/or wall area, the ceiling area and the wall area in the initial recognition result are determined to be misrecognized areas, both are re-identified as the ground area, and the corrected recognition result is obtained. Thus, the accuracy of image region recognition is improved.
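The four threshold rules described above can be combined into one relabeling step, sketched below on a per-pixel label map. The representation of the initial recognition result as a 2D list of string labels, the function name, and the default threshold values of 45 and 60 degrees are assumptions made for illustration. The disclosure only requires that the third threshold exceed the first and that the fourth exceed the second.

```python
CEILING, WALL, GROUND = "ceiling", "wall", "ground"


def correct_labels(labels, elevation=0.0, depression=0.0,
                   first=45.0, second=45.0, third=60.0, fourth=60.0):
    """Relabel a 2D label map according to the elevation/depression thresholds.

    The larger thresholds (third, fourth) are checked first because they
    imply the stronger correction.
    """
    if elevation > third:
        remap = {GROUND: CEILING, WALL: CEILING}
    elif elevation > first:
        remap = {GROUND: CEILING}
    elif depression > fourth:
        remap = {CEILING: GROUND, WALL: GROUND}
    elif depression > second:
        remap = {CEILING: GROUND}
    else:
        remap = {}  # No threshold exceeded: keep the initial result.
    return [[remap.get(label, label) for label in row] for row in labels]
```

With the assumed defaults, an elevation angle of 50 degrees turns ground pixels into ceiling pixels and leaves wall pixels untouched, while an elevation angle of 70 degrees turns both ground and wall pixels into ceiling pixels.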

In some embodiments, the initial recognition result may include area probabilities corresponding to multiple pixel points of the target image, and the area probabilities include at least one of ceiling probability, wall probability, and ground probability. The ceiling probability corresponding to a pixel point is the probability that the area where the pixel point is located is the ceiling area, the wall probability corresponding to a pixel point is the probability that the area where the pixel point is located is the wall area, and the ground probability corresponding to a pixel point is the probability that the area where the pixel point is located is the ground area. During image region recognition, the area probabilities corresponding to multiple pixel points in the target image can be obtained through the image recognition model.

At this time, another possible implementation of S304 includes: adjusting, in the initial recognition result, the area probabilities corresponding to the multiple pixel points of the target image according to the comparison result between the elevation angle and/or depression angle of the terminal and the angle threshold; and obtaining a corrected recognition result according to the adjusted area probabilities corresponding to the multiple pixel points of the target image. Wherein, the angle threshold may include one or more angle thresholds for comparison with the elevation angle of the terminal and one or more angle thresholds for comparison with the depression angle of the terminal, and is not limited to the above-mentioned first threshold, second threshold, third threshold, and fourth threshold.

Specifically, if the elevation angle of the terminal is greater than the angle threshold, it indicates that the probability of the terminal photographing the ceiling is relatively high and the probability of photographing the ground is small. The ceiling probabilities corresponding to multiple pixel points in the target image can be increased, and/or the ground probabilities corresponding to multiple pixel points in the target image can be decreased. If the depression angle of the terminal is greater than the angle threshold, it indicates that the probability of the terminal capturing the ceiling is small and the probability of capturing the ground is relatively high. The ceiling probabilities corresponding to multiple pixel points in the target image can be decreased, and/or the ground probabilities corresponding to multiple pixel points in the target image can be increased. On the basis of adjusting the ceiling probability and/or the ground probability, the wall probabilities corresponding to multiple pixel points in the target image may also be adjusted.

After adjusting the area probability, the area where the pixel is located can be determined according to the maximum probability value in the area probability corresponding to the pixel point in the target image. For example, the area probability corresponding to the pixel point has the largest ceiling probability, and the area where the pixel point is located is the ceiling area. Furthermore, an image area in the target image is determined based on the areas where the plurality of pixel points in the target image are located.
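Selecting the area with the maximum probability for each pixel can be sketched as below. The per-pixel dictionary representation and the function names are assumptions for illustration; a real model would more likely output an array of scores.

```python
def label_from_probs(probs):
    """Return the class name with the largest probability for one pixel."""
    return max(probs, key=probs.get)


def label_map(prob_map):
    """Apply the per-pixel maximum rule to a 2D grid of probability dicts."""
    return [[label_from_probs(p) for p in row] for row in prob_map]
```

Note that when two classes tie, `max` returns whichever key it meets first, so a tie-breaking policy would have to be chosen separately.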

Optionally, there are multiple angle thresholds, and different angle thresholds may correspond to different probability adjustment amounts. The larger the angle threshold, the larger the corresponding probability adjustment amount. The details are as follows:

There are multiple angle thresholds used for comparison with the elevation angle of the terminal, and different angle thresholds correspond to different probability adjustment amounts. For example, if the elevation angle of the terminal is greater than 45 degrees, increase the ceiling probability corresponding to multiple pixels on the target image by 10% respectively, and decrease the ground probability corresponding to multiple pixel points by 10%; if the terminal elevation angle is greater than 60 degrees , the ceiling probabilities corresponding to multiple pixel points on the target image are increased by 20% respectively, and the ground probabilities corresponding to multiple pixel points are respectively decreased by 20%.

And/or, there are multiple angle thresholds used for comparison with the depression angle of the terminal, and different angle thresholds correspond to different probability adjustment amounts. Here, we will not give examples one by one.

Optionally, if the elevation angle of the terminal is greater than the first threshold, considering that the terminal cannot capture the ground area at this time, the ground probability corresponding to multiple pixels on the target image is reduced to 0, and the ceiling probability corresponding to each pixel is increased by the ground probability that the pixel had before the adjustment.

Optionally, if the elevation angle of the terminal is greater than the third threshold, considering that the terminal cannot capture the ground area and the wall area, the ground probability and the wall probability corresponding to multiple pixels on the target image are both reduced to 0, and the ceiling probability corresponding to the pixels is adjusted to 100%.

Optionally, if the depression angle of the terminal is greater than the second threshold, considering that the terminal cannot capture the ceiling area at this time, the ceiling probability corresponding to multiple pixels on the target image is reduced to 0, and the ground probability corresponding to each pixel is increased by the ceiling probability that the pixel had before the adjustment.

Optionally, if the depression angle of the terminal is greater than the fourth threshold, considering that the terminal cannot capture the ceiling area and the wall area, the ceiling probability and the wall probability corresponding to multiple pixels on the target image are both reduced to 0, and the ground probability corresponding to the pixels is adjusted to 100%.

Wherein, the first threshold, the second threshold, the third threshold and the fourth threshold may refer to the foregoing embodiments.
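The four optional threshold rules above can be sketched for a single pixel as follows. The function name and the numeric default thresholds are assumptions chosen only for illustration; the text requires only that the third threshold is greater than the first and the fourth is greater than the second.

```python
from typing import Dict


def apply_hard_override(probs: Dict[str, float], elevation_deg: float,
                        depression_deg: float,
                        first: float = 60.0, second: float = 60.0,
                        third: float = 80.0, fourth: float = 80.0) -> Dict[str, float]:
    """Zero out areas the terminal cannot capture (threshold values are hypothetical)."""
    c, w, g = probs["ceiling"], probs["wall"], probs["ground"]
    if elevation_deg > third:
        # Neither ground nor wall can be captured.
        return {"ceiling": 1.0, "wall": 0.0, "ground": 0.0}
    if depression_deg > fourth:
        # Neither ceiling nor wall can be captured.
        return {"ceiling": 0.0, "wall": 0.0, "ground": 1.0}
    if elevation_deg > first:
        # Ground cannot be captured: its mass is added to the ceiling.
        return {"ceiling": c + g, "wall": w, "ground": 0.0}
    if depression_deg > second:
        # Ceiling cannot be captured: its mass is added to the ground.
        return {"ceiling": 0.0, "wall": w, "ground": c + g}
    return dict(probs)
```

The stricter thresholds are checked first so that a very steep angle is not handled by the milder rule.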

Optionally, when the region probability corresponding to a pixel includes ceiling probability, wall probability and ground probability, the sum of ceiling probability, wall probability and ground probability is 1 for the same pixel point. At this time, while increasing the probability of the ceiling, the probability of the ground needs to be reduced, and the probability of the wall can be further reduced. While increasing the probability of the ground, the probability of the ceiling needs to be reduced, and the probability of the wall can be further reduced.
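One way to keep the three probabilities summing to 1 is to raise the target probability and then renormalize, which lowers the other two proportionally. This is only one possible scheme, assumed here for illustration; the function name and the example values are hypothetical.

```python
from typing import Dict


def boost_and_renormalize(probs: Dict[str, float], target: str,
                          delta: float) -> Dict[str, float]:
    """Raise one area's probability, then rescale so the sum is 1 again."""
    out = dict(probs)
    out[target] += delta
    total = sum(out.values())
    # Dividing by the new total reduces the non-target probabilities.
    return {name: value / total for name, value in out.items()}
```

For instance, boosting the ceiling of a pixel with probabilities 0.4, 0.2 and 0.4 (ceiling, wall, ground) by 0.25 gives approximately 0.52, 0.16 and 0.32.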

Therefore, by adjusting the region probability corresponding to the pixel in the target image based on the comparison result, the flexibility and accuracy of image region correction can be improved, and the accuracy of image region recognition can be improved.

In the embodiment of the present disclosure, after the image recognition model performs image area recognition on the target image to obtain the initial recognition result, the initial recognition result is corrected based on the result of comparing the elevation angle and/or depression angle of the terminal when capturing the image with the angle threshold. This addresses the problem of low recognition accuracy when one or more of the ceiling area, the wall area, and the ground area are identified in an image, and effectively improves the accuracy of image area recognition.

Based on any of the foregoing embodiments, optionally, the image recognition model is a deep learning model trained through model distillation, which not only improves the recognition accuracy of the image recognition model but also reduces its model size. In particular, a lightweight image recognition model can be trained, which facilitates deploying the image recognition model to various terminals and enables real-time recognition of image areas in images and/or videos on the terminal. For example, when a user walks around holding the terminal and shoots a video, the terminal uses the image area processing method provided by any embodiment to identify the image areas of each video frame while the video is being shot, effectively improving user experience.

Optionally, when the image recognition model is trained by means of model distillation, the teacher model is first trained multiple times using the training data and the loss function of the teacher model to obtain a trained teacher model. The student model is then trained multiple times using the training data, the trained teacher model, and the loss function of the student model to obtain a trained student model, and the trained student model is determined as the image recognition model for image region recognition. The model scale of the teacher model is larger than that of the student model, and both the teacher model and the student model are deep learning models.

The training data may include multiple training images, and at least one of the ground area, the ceiling area, and the wall area may be pre-marked on each training image. Therefore, when the teacher model is trained, the difference between the image region recognition result output by the teacher model and the image region marked on the training image may be determined based on the loss function of the teacher model, and the model parameters of the teacher model are then adjusted based on the difference.

When the student model is trained, the training image can be input into both the student model and the teacher model. Based on the loss function of the student model, the difference between the image region recognition result output by the student model and the image region recognition result output by the teacher model, as well as the difference between the image region recognition result output by the student model and the image region marked on the training image, are determined, and the model parameters of the student model are adjusted based on these differences.

Optionally, the main network structure of the teacher model adopts deeplab v3, and the image region recognition accuracy of the teacher model is improved through deeplab v3, thereby improving the image region recognition accuracy of the student model.

Optionally, the loss function of the teacher model adopts a binary cross entropy (Binary Cross Entropy, BCE) loss function, and the model training effect of the teacher model is improved through the binary cross entropy loss function.

Optionally, the loss function of the teacher model can also be obtained by weighted summation of the BCE loss function and the Regional Mutual Information (RMI) loss function, so as to improve the model performance and reduce the missing segmentation of the teacher model.

Further, in the weighted summation, the weight ratio of the BCE loss function and the RMI loss function is the same.
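The equally weighted combination can be illustrated with a small pure-Python sketch. The binary cross entropy is computed directly; the RMI term is not implemented here and is passed in as a precomputed number, which is an assumption made to keep the example short. The function names and the weights of 0.5 each are likewise hypothetical (any pair of equal weights would satisfy the statement above).

```python
import math
from typing import Sequence


def bce_loss(pred: Sequence[float], target: Sequence[float],
             eps: float = 1e-7) -> float:
    """Mean binary cross entropy over per-pixel probabilities in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def teacher_loss(pred: Sequence[float], target: Sequence[float],
                 rmi_value: float, w_bce: float = 0.5,
                 w_rmi: float = 0.5) -> float:
    """Weighted sum of BCE and an externally computed RMI loss value."""
    return w_bce * bce_loss(pred, target) + w_rmi * rmi_value
```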

Optionally, the main network structure of the student model (that is, the image recognition model) adopts the network structure of Ghostnet, wherein Ghostnet is a lightweight network structure, which is convenient to be deployed on lightweight devices. Therefore, the use of Ghostnet as the main network structure of the student model is beneficial to reduce the model size of the student model, facilitate the deployment of the trained student model to the user terminal, increase the range of devices that the model can apply to, and improve user experience.

Optionally, the loss function of the student model may include a loss function obtained by weighting the BCE loss function and the RMI loss function, and may also include a distillation loss function. The loss function obtained by weighting the BCE loss function and the RMI loss function is used to determine the difference between the image region recognition result output by the student model and the image region marked on the training image, and the distillation loss function is used to determine the difference between the image region recognition result output by the student model and the image region recognition result output by the teacher model. Therefore, on the one hand, cases in which the student model produces missed segmentation are reduced; on the other hand, the training of the student model is guided by the trained teacher model. As a result, the model performance of the student model is effectively improved.
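A minimal sketch of this two-part student loss is given below. It assumes a Kullback-Leibler divergence as the distillation term, takes the supervised (BCE plus RMI) part as a precomputed number, and uses a hypothetical `distill_weight` parameter; none of these choices is prescribed by the text.

```python
import math
from typing import Sequence


def kl_divergence(teacher: Sequence[float], student: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(teacher || student) for one pixel's class distribution."""
    return sum(t * math.log((t + eps) / (s + eps))
               for t, s in zip(teacher, student) if t > 0)


def student_loss(supervised_loss: float,
                 teacher_probs: Sequence[Sequence[float]],
                 student_probs: Sequence[Sequence[float]],
                 distill_weight: float = 1.0) -> float:
    """Supervised loss plus the mean per-pixel distillation loss."""
    per_pixel = [kl_divergence(t, s)
                 for t, s in zip(teacher_probs, student_probs)]
    distill = sum(per_pixel) / len(per_pixel)
    return supervised_loss + distill_weight * distill
```

When the student output matches the teacher output exactly, the distillation term vanishes and only the supervised part remains.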

Further, the distillation loss function can use the KL (Kullback-Leibler) divergence loss function, which can improve the value of the image segmentation index achieved by the student model; in other words, the KL divergence loss function enables the student model to achieve better image segmentation results. The image segmentation index is, for example, the Mean Intersection over Union (MIoU) index.
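For reference, the MIoU index can be computed from flattened label sequences as sketched below. The integer class encoding and the choice to skip classes absent from both prediction and ground truth are assumptions of this example.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int) -> float:
    """Mean Intersection over Union across classes that occur at least once."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both sequences
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```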

Based on any of the foregoing embodiments, optionally, after the corrected recognition result is obtained, the corrected recognition result may be displayed on the target image. When the execution subject is a server, the server may send the corrected recognition result to the user terminal, and the user terminal may display the corrected recognition result on the target image. When the execution subject is a terminal, the terminal can display the image area recognition result corresponding to the video frame or image in real time on the video frame or image while shooting the video or image. Therefore, the user can intuitively see each identified region on the target image, thereby improving user experience.

Further, when displaying, different image regions may be marked on the target image with different colors to improve the display effect.
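Marking regions with different colors can be sketched as a simple alpha blend over RGB pixels. The color table, the `alpha` parameter, and the nested-list image representation are assumptions made for this illustration only.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]

# Hypothetical color assignment for each image area.
REGION_COLORS: Dict[str, Pixel] = {
    "ceiling": (255, 0, 0),
    "wall": (0, 255, 0),
    "ground": (0, 0, 255),
}


def overlay(image: List[List[Pixel]], labels: List[List[str]],
            alpha: float = 0.5) -> List[List[Pixel]]:
    """Blend each pixel with the color of its recognized area."""
    out = []
    for img_row, lab_row in zip(image, labels):
        row = []
        for pixel, lab in zip(img_row, lab_row):
            color = REGION_COLORS[lab]
            # Truncate each blended channel to an integer value.
            blended = tuple(int((1 - alpha) * p + alpha * c)
                            for p, c in zip(pixel, color))
            row.append(blended)
        out.append(row)
    return out
```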

Corresponding to the image area processing method in the above embodiments, FIG. 4 is a structural block diagram of an image area processing device provided in an embodiment of the present disclosure. For ease of description, only the parts related to the embodiments of the present disclosure are shown. Referring to FIG. 4 , the image area processing device includes: an acquisition unit 401 , an identification unit 402 and a correction unit 403 .

An acquisition unit 401, configured to acquire the target image and the equipment posture of the terminal when capturing the target image;

The recognition unit 402 is configured to perform image region recognition on the target image to obtain an initial recognition result, where the image region includes at least one of a ceiling region, a wall region, and a floor region;

The correction unit 403 is configured to correct the initial recognition result according to the posture of the device to obtain a corrected recognition result.

In an embodiment of the present disclosure, the device posture includes the elevation angle of the terminal and/or the depression angle of the terminal, and the correction unit 403 is further configured to: compare the elevation angle of the terminal and/or the depression angle of the terminal with an angle threshold; and correct the initial recognition result according to the comparison result to obtain the corrected recognition result.

In an embodiment of the present disclosure, the correction unit 403 is further configured to: if the comparison result shows that the elevation angle of the terminal is greater than the first threshold, re-identify the ground area in the initial identification result as the ceiling area to obtain a corrected identification result.

In an embodiment of the present disclosure, the correction unit 403 is further configured to: if the comparison result shows that the depression angle of the terminal is greater than the second threshold, re-identify the ceiling area in the initial identification result as the ground area to obtain a corrected identification result.

In an embodiment of the present disclosure, the correction unit 403 is further configured to: if the comparison result is that the elevation angle of the terminal is greater than the third threshold, re-identify the ground area and the wall area in the initial recognition result as the ceiling area to obtain the corrected recognition result, wherein the third threshold is greater than the first threshold.

In an embodiment of the present disclosure, the correction unit 403 is further configured to: if the comparison result is that the depression angle of the terminal is greater than the fourth threshold, re-identify the wall area and the ceiling area in the initial recognition result as the ground area to obtain the corrected recognition result, wherein the fourth threshold is greater than the second threshold.
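The label-level re-identification performed by the correction unit under the four thresholds can be sketched as a remapping of per-pixel labels. The function name and the numeric threshold defaults are hypothetical; only the ordering (third greater than first, fourth greater than second) comes from the text.

```python
from typing import Dict, List


def correct_labels(labels: List[List[str]], elevation_deg: float,
                   depression_deg: float,
                   first: float = 60.0, second: float = 60.0,
                   third: float = 80.0, fourth: float = 80.0) -> List[List[str]]:
    """Re-identify areas that the terminal cannot capture at its current posture."""
    remap: Dict[str, str]
    if elevation_deg > third:
        remap = {"ground": "ceiling", "wall": "ceiling"}
    elif elevation_deg > first:
        remap = {"ground": "ceiling"}
    elif depression_deg > fourth:
        remap = {"ceiling": "ground", "wall": "ground"}
    elif depression_deg > second:
        remap = {"ceiling": "ground"}
    else:
        remap = {}
    # Labels not present in the remap table are kept unchanged.
    return [[remap.get(lab, lab) for lab in row] for row in labels]
```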

In an embodiment of the present disclosure, the initial recognition result includes area probabilities corresponding to multiple pixels of the target image, where the area probabilities include at least one of ceiling probability, wall probability, and ground probability, and the correction unit 403 is further configured to: adjust, according to the comparison result, the area probabilities corresponding to the multiple pixels of the target image in the initial recognition result; and obtain the corrected recognition result according to the adjusted area probabilities corresponding to the multiple pixels of the target image.

In an embodiment of the present disclosure, the recognition unit 402 is further configured to: use an image recognition model to perform image region recognition on the target image to obtain an initial recognition result, and the image recognition model is a deep learning model trained through model distillation.

In an embodiment of the present disclosure, the image area processing device further includes: a display unit 404, configured to display the corrected recognition result on the target image.

The device provided in this embodiment can be used to implement the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, so this embodiment will not repeat them here.

Referring to FIG. 5 , it shows a schematic structural diagram of an electronic device 500 suitable for implementing the embodiments of the present disclosure. The electronic device 500 may be a terminal device or a server. Among them, the terminal equipment may include but not limited to mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (Personal Digital Assistant, PDA for short), tablet computers (Portable Android Device, PAD for short), portable multimedia players (Portable Media Player, PMP for short), mobile terminals such as vehicle-mounted terminals (such as vehicle-mounted navigation terminals), and fixed terminals such as digital televisions (Television, TV), desktop computers, and the like. The electronic device shown in FIG. 5 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 5, the electronic device 500 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 501, which can execute various appropriate actions and processes according to a program stored in a read-only memory (Read Only Memory, ROM for short) 502 or a program loaded from a storage device 508 into a random access memory (Random Access Memory, RAM for short) 503. In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

Generally, the following devices can be connected to the I/O interface 505: an input device 506 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device including, for example, a speaker, a vibrator, etc.; a storage device 508 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 509. The communication device 509 may allow the electronic device 500 to perform wireless or wired communication with other devices to exchange data. While FIG. 5 shows the electronic device 500 having various devices, it is to be understood that implementing or having all of the devices shown is not a requirement. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program codes for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 509, or from storage means 508, or from ROM 502. When the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.

It should be noted that the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (Compact Disc Read Only Memory, CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
The program code contained on the computer readable medium can be transmitted by any appropriate medium, including but not limited to: electric wire, optical cable, radio frequency (Radio Frequency, RF), etc., or any suitable combination of the above.

The above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.

The above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device is made to execute the methods shown in the above-mentioned embodiments.

Computer program code for carrying out the operations of the present disclosure can be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as "C" or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it can be connected to an external computer (for example, via the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.

The units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the unit does not constitute a limitation of the unit itself under certain circumstances, for example, the first obtaining unit may also be described as "a unit for obtaining at least two Internet Protocol addresses".

The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so on.

In the first aspect, according to one or more embodiments of the present disclosure, there is provided an image region processing method, including: acquiring a target image and the device posture of the terminal when capturing the target image; performing image region recognition on the target image to obtain an initial recognition result, where the image region includes at least one of a ceiling area, a wall area, and a ground area; and correcting the initial recognition result according to the device posture to obtain a corrected recognition result.

According to one or more embodiments of the present disclosure, the device posture includes an elevation angle of the terminal and/or a depression angle of the terminal, and correcting the initial recognition result according to the device posture to obtain the corrected recognition result includes: comparing the elevation angle of the terminal and/or the depression angle of the terminal with an angle threshold; and correcting the initial recognition result according to the comparison result to obtain the corrected recognition result.

According to one or more embodiments of the present disclosure, the correcting the initial identification result according to the comparison result to obtain the corrected identification result includes: if the comparison result is that the elevation angle of the terminal is greater than the first threshold, then re-identify the ground area in the initial identification result as the ceiling area to obtain the corrected identification result.

According to one or more embodiments of the present disclosure, the correcting the initial recognition result according to the comparison result to obtain the corrected recognition result includes: if the comparison result is that the depression angle of the terminal is greater than the second threshold, the ceiling area in the initial identification result is re-identified as a floor area to obtain the corrected identification result.

According to one or more embodiments of the present disclosure, the correcting the initial identification result according to the comparison result to obtain the corrected identification result includes: if the comparison result is that the elevation angle of the terminal is greater than the third threshold, re-identify the floor area and the wall area in the initial identification result as the ceiling area to obtain the corrected identification result, wherein the third threshold is greater than the first threshold.

According to one or more embodiments of the present disclosure, the correcting the initial recognition result according to the comparison result to obtain the corrected recognition result includes: if the comparison result is that the depression angle of the terminal is greater than the fourth threshold, re-identify the wall area and the ceiling area in the initial identification result as the floor area to obtain the corrected identification result, wherein the fourth threshold is greater than the second threshold.

According to one or more embodiments of the present disclosure, the initial recognition result includes area probabilities corresponding to multiple pixels of the target image, and the area probabilities include at least one of ceiling probability, wall probability, and ground probability. Correcting the initial recognition result according to the comparison result to obtain the corrected recognition result includes: adjusting, according to the comparison result, the area probabilities corresponding to the multiple pixels of the target image in the initial recognition result; and obtaining the corrected recognition result according to the adjusted area probabilities corresponding to the multiple pixels of the target image.

According to one or more embodiments of the present disclosure, performing image region recognition on the target image to obtain an initial recognition result includes: performing image region recognition on the target image through an image recognition model to obtain the initial recognition result, where the image recognition model is a deep learning model trained through model distillation.

According to one or more embodiments of the present disclosure, after the corrected recognition result is obtained, the method further includes: displaying the corrected recognition result on the target image.

In a second aspect, according to one or more embodiments of the present disclosure, there is provided an image area processing device, including: an acquisition unit configured to acquire a target image and a device posture of a terminal when capturing the target image; a recognition unit configured to perform image region recognition on the target image to obtain an initial recognition result, where the image region includes at least one of a ceiling region, a wall region, and a floor region; and a correction unit configured to correct the initial recognition result according to the device posture to obtain a corrected recognition result.

According to one or more embodiments of the present disclosure, the device posture includes an elevation angle of the terminal and/or a depression angle of the terminal, and the correction unit is further configured to: compare the elevation angle of the terminal and/or the depression angle of the terminal with an angle threshold; and correct the initial recognition result according to the comparison result to obtain the corrected recognition result.

According to one or more embodiments of the present disclosure, the correction unit is further configured to: if the comparison result is that the elevation angle of the terminal is greater than a first threshold, re-identify the ground area in the initial identification result as a ceiling region to obtain the corrected recognition result.

According to one or more embodiments of the present disclosure, the correction unit is further configured to: if the comparison result is that the depression angle of the terminal is greater than a second threshold, re-identify the ceiling area in the initial identification result as the ground region to obtain the corrected recognition result.

According to one or more embodiments of the present disclosure, the correction unit is further configured to: if the comparison result is that the elevation angle of the terminal is greater than a third threshold, re-identify the ground area and the wall area in the initial recognition result as a ceiling area to obtain the corrected recognition result, wherein the third threshold is greater than the first threshold.

According to one or more embodiments of the present disclosure, the correction unit is further configured to: if the comparison result is that the depression angle of the terminal is greater than a fourth threshold, re-identify the wall area and the ceiling area in the initial recognition result as a ground area to obtain the corrected recognition result, wherein the fourth threshold is greater than the second threshold.

According to one or more embodiments of the present disclosure, the initial recognition result includes area probabilities corresponding to multiple pixels of the target image, and the area probabilities include at least one of ceiling probability, wall probability, and ground probability. The correction unit is further configured to: adjust, according to the comparison result, the area probabilities corresponding to the multiple pixels of the target image in the initial recognition result; and obtain the corrected recognition result according to the adjusted area probabilities corresponding to the multiple pixels of the target image.

According to one or more embodiments of the present disclosure, the recognition unit is further configured to: use an image recognition model to perform image region recognition on the target image to obtain the initial recognition result, where the image recognition model is a deep learning model trained through model distillation.

According to one or more embodiments of the present disclosure, the image area processing device further includes: a display unit configured to display the corrected recognition result on the target image.

In a third aspect, according to one or more embodiments of the present disclosure, an electronic device is provided, including: at least one processor and a memory;

the memory stores computer-executable instructions;

In a fourth aspect, according to one or more embodiments of the present disclosure, a computer-readable storage medium is provided. The computer-readable storage medium stores computer-executable instructions, and when a processor executes the computer-executable instructions, the image region processing method described in the above first aspect or in the various possible designs of the first aspect is implemented.

The above description is only a preferred embodiment of the present disclosure and an illustration of the applied technical principle. Those skilled in the art should understand that the scope of disclosure involved in this disclosure is not limited to the technical solutions formed by the specific combination of the above-mentioned technical features, but also covers other technical solutions formed by any combination of the above-mentioned technical features or their equivalent features. For example, a technical solution formed by replacing the above-mentioned features with (but not limited to) technical features with similar functions disclosed in this disclosure.

In addition, while operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or performed in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims

1. An image area processing method, comprising:

acquiring a target image and a device posture of a terminal when capturing the target image;

performing image area recognition on the target image to obtain an initial recognition result, wherein the image area includes at least one of a ceiling area, a wall area, and a ground area; and

correcting the initial recognition result according to the device posture to obtain a corrected recognition result.
2. The image area processing method according to claim 1, wherein the device posture includes an elevation angle of the terminal and/or a depression angle of the terminal, and correcting the initial recognition result according to the device posture to obtain the corrected recognition result comprises:

comparing the elevation angle of the terminal and/or the depression angle of the terminal with an angle threshold; and

correcting the initial recognition result according to the comparison result to obtain the corrected recognition result.
3. The image area processing method according to claim 2, wherein correcting the initial recognition result according to the comparison result to obtain the corrected recognition result comprises:

if the comparison result is that the elevation angle of the terminal is greater than a first threshold, re-identifying the ground area in the initial recognition result as a ceiling area to obtain the corrected recognition result.
4. The image area processing method according to claim 2, wherein correcting the initial recognition result according to the comparison result to obtain the corrected recognition result comprises:

if the comparison result is that the depression angle of the terminal is greater than a second threshold, re-identifying the ceiling area in the initial recognition result as a ground area to obtain the corrected recognition result.
5. The image area processing method according to claim 3, wherein correcting the initial recognition result according to the comparison result to obtain the corrected recognition result comprises:

if the comparison result is that the elevation angle of the terminal is greater than a third threshold, re-identifying the ground area and the wall area in the initial recognition result as a ceiling area to obtain the corrected recognition result, wherein the third threshold is greater than the first threshold.
6. The image area processing method according to claim 4, wherein correcting the initial recognition result according to the comparison result to obtain the corrected recognition result comprises:

if the comparison result is that the depression angle of the terminal is greater than a fourth threshold, re-identifying the wall area and the ceiling area in the initial recognition result as a ground area to obtain the corrected recognition result, wherein the fourth threshold is greater than the second threshold.
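The threshold-based correction recited in claims 3 to 6 can be illustrated with a minimal Python sketch. The function name `correct_labels`, the label strings, the nested-list label map, and the default threshold values (30 and 60 degrees) are all assumptions made for illustration; the claims do not fix any of them. The sketch also assumes that the elevation angle and the depression angle are not both large at the same time, and checks the elevation angle first.

```python
CEILING, WALL, GROUND = "ceiling", "wall", "ground"


def correct_labels(labels, elevation_deg, depression_deg,
                   first=30.0, second=30.0, third=60.0, fourth=60.0):
    """Return a corrected copy of a per-pixel label map (list of rows).

    The threshold defaults are hypothetical; the claims only require
    third > first and fourth > second.
    """
    if third <= first or fourth <= second:
        raise ValueError("third must exceed first and fourth must exceed second")

    if elevation_deg > third:
        # Camera points steeply upward: ground and wall become ceiling.
        remap = {GROUND: CEILING, WALL: CEILING}
    elif elevation_deg > first:
        # Camera points moderately upward: only ground becomes ceiling.
        remap = {GROUND: CEILING}
    elif depression_deg > fourth:
        # Camera points steeply downward: ceiling and wall become ground.
        remap = {CEILING: GROUND, WALL: GROUND}
    elif depression_deg > second:
        # Camera points moderately downward: only ceiling becomes ground.
        remap = {CEILING: GROUND}
    else:
        remap = {}  # Posture gives no evidence; keep the initial result.

    return [[remap.get(label, label) for label in row] for row in labels]
```

For example, with an initial result containing all three labels, an elevation angle between the first and third thresholds changes only the ground pixels, while an elevation angle above the third threshold turns every wall and ground pixel into ceiling.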
7. The image area processing method according to any one of claims 2 to 6, wherein the initial recognition result includes area probabilities corresponding to a plurality of pixels of the target image, the area probabilities include at least one of a ceiling probability, a wall probability, and a ground probability, and correcting the initial recognition result according to the comparison result to obtain the corrected recognition result comprises:

adjusting, according to the comparison result, the area probabilities corresponding to the plurality of pixels of the target image in the initial recognition result; and

obtaining the corrected recognition result according to the adjusted area probabilities corresponding to the plurality of pixels of the target image.
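Claim 7 does not state how the area probabilities are adjusted. One possible reading, shown here purely as an assumption, is to move the probability mass of the classes ruled out by the device posture onto the class the posture favors, and then take the most probable class per pixel. The names `adjust_probabilities` and `labels_from_probabilities`, and the representation of each pixel as a dictionary, are hypothetical.

```python
def adjust_probabilities(probs, suppressed, favored):
    """Move the probability of each suppressed class onto the favored class.

    probs is a list of rows; each pixel is a dict mapping a class name to
    its probability. This transfer rule is an assumed adjustment, not one
    specified by the claims.
    """
    if favored in suppressed:
        raise ValueError("the favored class cannot also be suppressed")
    adjusted = []
    for row in probs:
        new_row = []
        for pixel in row:
            q = dict(pixel)  # copy so the initial result stays intact
            moved = sum(q.get(name, 0.0) for name in suppressed)
            for name in suppressed:
                q[name] = 0.0
            q[favored] = q.get(favored, 0.0) + moved
            new_row.append(q)
        adjusted.append(new_row)
    return adjusted


def labels_from_probabilities(probs):
    """Pick the most probable class for every pixel."""
    return [[max(pixel, key=pixel.get) for pixel in row] for row in probs]
```

Under this assumption, a large elevation angle would call `adjust_probabilities(probs, suppressed=["ground"], favored="ceiling")`. Because the total probability per pixel is unchanged, a pixel that was confidently labeled as wall stays wall, while a pixel that leaned toward ground becomes ceiling.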
8. The image area processing method according to any one of claims 1 to 7, wherein performing image area recognition on the target image to obtain the initial recognition result comprises:

performing image area recognition on the target image by an image recognition model to obtain the initial recognition result, wherein the image recognition model is a deep learning model obtained through model distillation training.
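Claim 8 names model distillation training without describing the loss. As a hedged illustration only, the sketch below computes a common soft-label distillation loss for a single pixel: a temperature-scaled KL divergence between teacher and student predictions, blended with the usual cross-entropy on the true class. The function names, the `temperature` and `alpha` parameters, and the choice of this particular loss are assumptions, not details taken from the disclosure.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_sum for s in scaled]


def distillation_loss(student_logits, teacher_logits, true_index,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-label term and a hard-label term for one pixel.

    Soft term: KL(teacher || student) at the given temperature, scaled by
    temperature squared. Hard term: cross-entropy of the student against
    the true class. Both weights are hypothetical defaults.
    """
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    soft = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    hard = -log_softmax(student_logits)[true_index]
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard
```

In such a setup the three logits per pixel would correspond to the ceiling, wall, and ground classes, and the loss would be averaged over all pixels of a training image.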
9. The image area processing method according to any one of claims 1 to 8, further comprising, after obtaining the corrected recognition result:

displaying the corrected recognition result on the target image.
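One simple way to display a recognition result on the target image is to tint each pixel with a color assigned to its area. The sketch below assumes an image stored as rows of RGB tuples and a label map of the same shape; the color table, the `overlay` function, and the `alpha` blending weight are illustrative choices rather than part of the claimed method.

```python
# Hypothetical display colors for each recognized area.
AREA_COLORS = {
    "ceiling": (255, 0, 0),
    "wall": (0, 255, 0),
    "ground": (0, 0, 255),
}


def overlay(image, labels, alpha=0.5):
    """Blend an area color into every labeled pixel of an RGB image.

    image and labels are lists of rows with matching shapes. Pixels whose
    label has no assigned color are returned unchanged.
    """
    blended = []
    for image_row, label_row in zip(image, labels):
        row = []
        for pixel, label in zip(image_row, label_row):
            color = AREA_COLORS.get(label)
            if color is None:
                row.append(tuple(pixel))
            else:
                row.append(tuple(
                    round((1.0 - alpha) * p + alpha * c)
                    for p, c in zip(pixel, color)
                ))
        blended.append(row)
    return blended
```

A lower `alpha` keeps more of the original image visible, which may be preferable when the result is shown as a live preview.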
10. An image area processing device, comprising:

an acquisition unit, configured to acquire a target image and a device posture of a terminal when capturing the target image;

a recognition unit, configured to perform image area recognition on the target image to obtain an initial recognition result, wherein the image area includes at least one of a ceiling area, a wall area, and a ground area; and

a correction unit, configured to correct the initial recognition result according to the device posture to obtain a corrected recognition result.
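The three units of the device can be pictured as three pluggable callables wired together in sequence. The following sketch is only a structural illustration: the class name `ImageAreaProcessingDevice`, its field names, and the use of a single pitch angle as the device posture are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

LabelMap = List[List[str]]


@dataclass
class ImageAreaProcessingDevice:
    """Chains an acquisition unit, a recognition unit, and a correction unit."""

    # Returns the target image and the device posture (assumed: pitch in degrees).
    acquire: Callable[[], Tuple[object, float]]
    # Maps the target image to an initial per-pixel label map.
    recognize: Callable[[object], LabelMap]
    # Corrects the initial label map using the device posture.
    correct: Callable[[LabelMap, float], LabelMap]

    def run(self) -> LabelMap:
        image, posture = self.acquire()
        initial = self.recognize(image)
        return self.correct(initial, posture)
```

Keeping the units as separate callables means the recognition model or the correction rule can be replaced without touching the other two.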
11. An electronic device, comprising: at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the image area processing method according to any one of claims 1 to 9.
12. A computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the image area processing method according to any one of claims 1 to 9 is implemented.
13. A computer program product, comprising computer-executable instructions, wherein when a processor executes the computer-executable instructions, the image area processing method according to any one of claims 1 to 9 is implemented.
14. A computer program for implementing the image area processing method according to any one of claims 1 to 9.