CN115908792A - Image area processing method and equipment

Info

Publication number: CN115908792A
Application number: CN202111168828.4A
Authority: CN (China)
Prior art keywords: image, area, terminal, recognition result
Original language: Chinese (zh)
Inventors: 朱渊略, 黄佳斌, 王一同
Applicant and current assignee: Beijing Zitiao Network Technology Co Ltd
Related PCT application: PCT/CN2022/120322 (WO2023051362A1)
Legal status: Pending (the listed status is an assumption, not a legal conclusion)

Classifications

    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/096: Transfer learning
    • G06N3/098: Distributed learning, e.g. federated learning
    • G06T7/00: Image analysis
    • G06V10/24: Aligning, centring, orientation detection or correction of the image
    • G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/82: Image or video recognition or understanding using neural networks


Abstract

The embodiments of the present disclosure provide an image area processing method and device. The method includes: acquiring a target image and the device posture of a terminal when the target image is captured; performing image area recognition on the target image to obtain an initial recognition result, where the image area includes at least one of a ceiling area, a wall area, and a ground area; and correcting the initial recognition result according to the device posture to obtain a corrected recognition result. The accuracy of image area recognition is thereby improved by correcting the image area recognition result based on the device posture.

Description

Image area processing method and device
Technical Field
The embodiments of the present disclosure relate to the field of computer technology, and in particular to an image area processing method and device.
Background
Deep learning algorithms based on convolutional neural networks support end-to-end learning, offer strong performance, and have broad application prospects in the field of image recognition. Among these applications, recognition of areas such as indoor ceilings, walls, and floors is one of the research directions in the field of image recognition.
In a close-up shot of a ceiling, a wall, or a floor, the planes look very similar due to the lack of reference objects, and it is difficult for a deep learning algorithm to distinguish whether a region is the ceiling, a wall, or the floor. Traditional algorithms based on edge detection and spatial geometric information rely on boundaries for plane segmentation and assume the segmented planes are smooth, so they tend to fail on videos or images with blurred edges.
It can be seen that the accuracy of image region identification in an image needs to be improved.
Disclosure of Invention
The embodiments of the present disclosure provide an image area processing method and device to address the problem of low accuracy in recognizing image areas in an image.
In a first aspect, an embodiment of the present disclosure provides an image area processing method, including:
acquiring a target image and a device posture of a terminal when the target image is captured;
performing image area recognition on the target image to obtain an initial recognition result, where the image area includes at least one of a ceiling area, a wall area, and a ground area; and
correcting the initial recognition result according to the device posture to obtain a corrected recognition result.
In a second aspect, an embodiment of the present disclosure provides an image area processing apparatus, including:
an acquisition unit, configured to acquire a target image and a device posture of a terminal when the target image is captured;
an identification unit, configured to perform image area recognition on the target image to obtain an initial recognition result, where the image area includes at least one of a ceiling area, a wall area, and a ground area; and
a correction unit, configured to correct the initial recognition result according to the device posture to obtain a corrected recognition result.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the image area processing method according to the first aspect or the various possible designs of the first aspect.
In a fourth aspect, the present disclosure provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the image area processing method according to the first aspect or various possible designs of the first aspect is implemented.
In a fifth aspect, according to one or more embodiments of the present disclosure, there is provided a computer program product comprising computer executable instructions which, when executed by a processor, implement the image area processing method according to the first aspect or various possible designs of the first aspect.
With the image area processing method and device provided by this embodiment, image area recognition is performed on a target image to obtain an initial recognition result, where the image area includes at least one of a ceiling area, a wall area, and a ground area, and the initial recognition result is corrected according to the device posture to obtain a corrected recognition result. The accuracy of image area recognition is thereby improved by first performing a preliminary recognition of the image areas and then correcting the preliminary recognition result based on the device posture corresponding to the image.
Drawings
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present disclosure, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an application scenario in which embodiments of the present disclosure are applicable;
fig. 2 is a first flowchart of an image area processing method according to an embodiment of the present disclosure;
fig. 3 is a schematic flow chart of an image region processing method according to an embodiment of the present disclosure;
fig. 4 is a block diagram of an image area processing apparatus provided in an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments that a person skilled in the art can derive from the disclosed embodiments without creative effort fall within the protection scope of the present disclosure.
When image areas such as ceilings, walls, and floors are recognized in an image, the following approaches may be used:
In the first approach, a deep learning model based on a convolutional neural network is used to recognize the image areas. The deep learning model can be trained end to end and performs well; however, in close-range shots, the ceiling, wall, and ground areas are highly similar due to the lack of reference objects, and the deep learning model has difficulty distinguishing them in the image.
In the second approach, the image areas are recognized with an algorithm based on edge detection and spatial geometric information. For videos or images with blurred edges, the accuracy of image area recognition is low.
To improve the accuracy of image area recognition, the present disclosure provides an image area processing method and device. The method builds on the observation that the device posture of the terminal when capturing an image strongly influences where the ceiling, wall, and ground areas appear in the image: after the image areas are recognized, the recognition result is corrected based on the device posture of the terminal at capture time, thereby improving the recognition accuracy for the ceiling, wall, and ground areas in the image.
Referring to fig. 1, fig. 1 is a schematic diagram of an application scenario to which the embodiment of the present disclosure is applicable.
The application scenario shown in fig. 1 is an image processing scenario, in which the devices involved include an image processing device 101 for performing image region identification on an image. The image processing apparatus 101 may be a terminal or a server, and fig. 1 illustrates a server as an example.
Optionally, the devices involved in the application scenario further include an image capturing device 102, configured to capture an image and send the captured image to the image processing device 101.
The image capturing device 102 is a terminal with an image capturing function, including: cameras, handheld devices with cameras (e.g., smartphones and tablets), computing devices with cameras (e.g., Personal Computers (PCs)), wearable devices with cameras (e.g., smartwatches), and smart home devices with cameras.
The image processing device 101 and the image capturing device 102 may be the same device; for example, both are the same smartphone, which captures an image and performs real-time or non-real-time image area recognition on it. Alternatively, the image processing device 101 and the image capturing device 102 are different devices: the image capturing device 102 transmits a captured image to the image processing device 101, for example over a network, and the image processing device 101 performs image area recognition on the image; for instance, a smartphone transmits the captured image to a server, and the server performs the image area recognition.
Optionally, the image is a scene image of an indoor scene. The image area processing method provided by the embodiments of the present disclosure can thus improve recognition accuracy for indoor scene images in which the ceiling area, the wall area, and the ground area are difficult to distinguish.
For example, the image area processing method provided by the embodiment of the present disclosure may be applied to electronic devices, such as a terminal and a server. The terminal may be a Personal Digital Assistant (PDA) device, a handheld device (e.g., a smart phone or a tablet), a computing device (e.g., a personal computer), an in-vehicle device, a wearable device (e.g., a smart watch or a smart band), and a smart home device (e.g., a smart display device). The server may be a distributed server, a centralized server, a cloud server, and the like.
Referring to fig. 2, fig. 2 is a first flowchart illustrating an image area processing method according to an embodiment of the disclosure. As shown in fig. 2, the image area processing method includes:
s201, acquiring a target image and an equipment posture of the terminal when the target image is shot.
The device posture includes a device angle (which may also be called a device direction); in the same scene, different device postures of the terminal yield different captured image content. The target image may be an image captured by the terminal in real time, an image stored in a database, or an image input by the user. The target image may also be a video frame of a target video, where the target video may be a video captured by the terminal in real time, a video stored in a database, or a video input by the user.
In one example, a target image shot by a terminal in real time and the equipment posture of the terminal when the target image is shot are acquired, so that the target image is subjected to real-time image area recognition.
In another example, the target image and the device posture of the terminal when the target image is captured are obtained from an online or offline database to perform image area recognition on the target image pre-stored in the database, where the target image is obtained by, for example: the target images are randomly or sequentially acquired from the database, or the target images specified by the user are acquired from the database.
In yet another example, a target image input by a user and a device posture of a terminal input by the user when the target image is captured are acquired to perform image area recognition on the target image input by the user.
S202, carrying out image area identification on the target image to obtain an initial identification result, wherein the image area comprises at least one of a ceiling area, a wall area and a ground area.
The initial recognition result may include an image area recognized in the target image, and specifically, the initial recognition result includes at least one of a ceiling area, a wall area, and a floor area recognized.
In this embodiment, an image recognition model may be used to recognize at least one of the ceiling area, the wall area, and the ground area in the target image, obtaining whichever of these areas the target image contains. The image recognition model is a deep learning model for image area recognition, such as a convolutional neural network.
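As a minimal, non-limiting sketch of this step: a generic segmentation network can produce per-pixel region probabilities and an initial label map. The specific network (torchvision's DeepLabV3 with a MobileNetV3 backbone), the input size, and the class ids 0/1/2 for ceiling/wall/ground are illustrative assumptions rather than requirements of this disclosure:

```python
import torch
from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large

# Illustrative backbone only: any segmentation network with three output
# channels (ceiling, wall, ground) could stand in; the class ids are assumed.
model = deeplabv3_mobilenet_v3_large(num_classes=3)
model.eval()

image = torch.rand(1, 3, 512, 512)           # stand-in for the preprocessed target image
with torch.no_grad():
    logits = model(image)["out"]             # (1, 3, H, W) raw scores
    probs = torch.softmax(logits, dim=1)[0]  # per-pixel region probabilities
initial_result = probs.argmax(dim=0)         # (H, W) initial recognition result
```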
Optionally, before image area recognition is performed on the target image, a preprocessing operation is applied to the target image so that it meets the input requirements of the deep learning model while its image quality is improved. The preprocessing operation includes one or more of the following: a size scaling operation, a cropping operation, a flipping operation, and an image enhancement operation, where the image enhancement operation enhances one or more of image contrast, image saturation, and image hue.
As an example, the preprocessing of the target image includes: first, randomly scaling the size of the target image within a preset multiple range; then, randomly cropping the scaled target image to a target size; then, randomly horizontally flipping the cropped target image; and finally, applying data enhancement to the contrast, saturation, and hue of the flipped target image.
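For illustration, this preprocessing pipeline could be sketched with torchvision transforms as follows; the scale range (0.5x to 2.0x), crop size (512), flip probability, and jitter strengths are illustrative assumptions, and in actual segmentation training the same geometric transforms would also need to be applied to the label mask:

```python
import random
import torchvision.transforms as T
import torchvision.transforms.functional as TF

class RandomScale:
    """Randomly rescale a PIL image by a factor drawn from [lo, hi]."""
    def __init__(self, lo=0.5, hi=2.0):
        self.lo, self.hi = lo, hi

    def __call__(self, img):
        factor = random.uniform(self.lo, self.hi)
        w, h = img.size
        return TF.resize(img, [int(h * factor), int(w * factor)])

preprocess = T.Compose([
    RandomScale(0.5, 2.0),                  # random scaling within a preset multiple range
    T.RandomCrop(512, pad_if_needed=True),  # random crop to the target size
    T.RandomHorizontalFlip(p=0.5),          # random horizontal flip
    T.ColorJitter(contrast=0.4, saturation=0.4, hue=0.1),  # contrast/saturation/hue enhancement
    T.ToTensor(),
])
```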
And S203, correcting the initial recognition result according to the equipment posture to obtain a corrected recognition result.
In this embodiment, since the planes of the ceiling area, the wall area, and the ground area are similar, and especially the planes of the ceiling area and the ground area are similar, recognition errors may occur when the target image is recognized by the image recognition model; for example, the ceiling area may be misrecognized as the ground area, or the ground area misrecognized as the ceiling area. Considering that the positions of the ceiling, wall, and ground areas in the target image are strongly influenced by the device posture of the terminal when the target image was captured, after the initial recognition result is obtained, the misrecognized regions in the initial recognition result can be corrected according to that device posture, thereby improving the accuracy of image area recognition.
For example, factors in the device posture such as the terminal's height above the ground and its tilt angle when capturing the target image determine whether the terminal can capture the ceiling, the walls, and the ground, and thus whether the target image can contain a ceiling area, a wall area, or a ground area. If, based on the device posture, the terminal could not have captured the ground when the target image was taken, the target image should not contain a ground area; in that case, if the initial recognition result includes a recognized ground area, that ground area is a misrecognized region.
Optionally, when the misrecognized region in the initial recognition result is corrected, the misrecognized region may be corrected to an image region that is most likely to appear in the remaining two regions except the misrecognized region in the ceiling, the wall, and the ground based on the device posture of the terminal when the target image is captured, so that the accuracy of image region recognition is improved. For example, the ground area identified in the initial identification result is an erroneous identification area, and if it is determined that the image area most likely to appear in the target image is a ceiling area based on the device posture when the target image is captured by the terminal, the ground area identified in the initial identification result is corrected to be the ceiling area; and if the most probable image area in the target image is determined to be the wall area based on the equipment posture when the terminal shoots the target image, correcting the ground area identified in the initial identification result into the wall area.
Optionally, when the misrecognized region in the initial recognition result is corrected, the target image may be re-recognized through an image recognition model used in the initial recognition or a deep learning model more complex than the image recognition model used in the initial recognition, and the misrecognized region in the initial recognition result is corrected based on the recognition result obtained by re-recognition, so that the accuracy of image region recognition is improved.
In the embodiment of the disclosure, on the basis of obtaining the initial recognition result by performing image area recognition on the target image through the image recognition model, the initial recognition result is corrected based on the shooting posture of the terminal when shooting the image, so that the problem of low recognition accuracy when recognizing one or more of a ceiling area, a wall area and a ground area of the image is solved, and the accuracy of image area recognition is improved.
In some embodiments, the device posture of the terminal includes an elevation angle of the terminal and/or a depression angle of the terminal, both of which refer to the angle between the shooting direction (which can also be understood as the line of sight) of the camera on the terminal and the horizontal. When the shooting direction of the camera is above the horizontal, the angle between the shooting direction and the horizontal is the elevation angle of the terminal; when the shooting direction is below the horizontal, the angle is the depression angle of the terminal. Based on this, referring to fig. 3, fig. 3 is a schematic flowchart of the image area processing method provided by an embodiment of the present disclosure. As shown in fig. 3, the image area processing method includes:
s301, acquiring a target image and the equipment posture of the terminal when the target image is shot, wherein the equipment posture of the terminal comprises the elevation angle of the terminal and/or the depression angle of the terminal.
In this embodiment, the elevation angle and/or depression angle of the terminal when shooting the target image may be obtained by a sensor on the terminal. Wherein the sensor may be an angle sensor, a gravity sensor, or the like. For example, when the terminal is a mobile phone, the sensor is an Inertial Measurement Unit (IMU) in the terminal.
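As a hedged illustration of how such a sensor reading could be turned into an elevation or depression angle: given the gravity vector expressed in the camera coordinate frame, the pitch of the optical axis follows from simple trigonometry. The axis convention below (+z as the optical axis) is an assumption; real IMU APIs differ across platforms:

```python
import math

def pitch_from_gravity(g_cam):
    """Angle in degrees between the camera's optical axis and the horizontal.

    g_cam: gravity vector measured in the camera coordinate frame, with the
    optical axis taken as +z (an assumed convention; IMU axis conventions
    vary by platform). Positive values mean the camera looks above the
    horizon (elevation angle); negative values mean below (depression angle).
    """
    gx, gy, gz = g_cam
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    angle_to_down = math.degrees(math.acos(gz / norm))  # angle between optical axis and gravity
    return angle_to_down - 90.0

pitch = pitch_from_gravity((0.0, 2.0, -9.6))  # camera tilted upward: about +78 degrees
elevation = max(pitch, 0.0)
depression = max(-pitch, 0.0)
```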
S302, carrying out image area identification on the target image to obtain an initial identification result, wherein the image area comprises at least one of a ceiling area, a wall area and a ground area.
The implementation principle and the technical effect of S302 may refer to the foregoing embodiments, and are not described again.
And S303, comparing the elevation angle of the terminal and/or the depression angle of the terminal with an angle threshold value.
The angle threshold value comprises an angle threshold value used for comparing with the elevation angle of the terminal and an angle threshold value used for comparing with the depression angle of the terminal, and the angle threshold value used for comparing with the elevation angle of the terminal and the angle threshold value used for comparing with the depression angle of the terminal can be the same or different.
In this embodiment, the elevation angle of the terminal may be compared with a corresponding angle threshold to obtain a comparison result; and/or comparing the depression angle of the terminal with the corresponding angle threshold value to obtain a comparison result.
And S304, correcting the initial recognition result according to the comparison result to obtain a corrected recognition result.
In this embodiment, whether the initial recognition result has the misrecognized region is determined according to the comparison result between the elevation angle of the terminal and the corresponding angle threshold and/or according to the comparison result between the depression angle of the terminal and the corresponding angle threshold, and if so, the misrecognized region is corrected to obtain a corrected recognition result.
In some embodiments, the angle threshold compared to the elevation angle of the terminal comprises a first threshold, and in this case, one possible implementation of S304 comprises: and if the comparison result of the elevation angle of the terminal and the first threshold value is that the elevation angle of the terminal is larger than the first threshold value, re-identifying the ground area in the initial identification result as the ceiling area to obtain a corrected identification result.
Specifically, if the elevation angle of the terminal is greater than the first threshold, the shooting direction of the camera on the terminal is tilted steeply toward the ceiling and the terminal cannot capture the ground. In that case, if the initial recognition result includes a recognized ground area, that ground area is determined to be a misrecognized region; and since the plane similarity between the ground area and the ceiling area is higher than that between the ground area and the wall area, the ground area in the initial recognition result is re-identified as the ceiling area, yielding the corrected recognition result. The accuracy of image area recognition is thus improved.
In some embodiments, the angle threshold compared to the elevation angle of the terminal includes a third threshold, and the third threshold is greater than the first threshold, at this time, another possible implementation manner of S304 includes: and if the comparison result of the elevation angle of the terminal and the third threshold value is that the elevation angle of the terminal is larger than the third threshold value, re-identifying the ground area and the wall area in the initial identification result as the ceiling area to obtain a corrected identification result.
Specifically, the third threshold is greater than the first threshold; if the elevation angle of the terminal exceeds the third threshold, the shooting direction of the camera is tilted even more steeply toward the ceiling, and the terminal can capture neither the ground area nor the wall area. In that case, if the initial recognition result includes a recognized ground area and/or wall area, both the ground area and the wall area in the initial recognition result are determined to be misrecognized regions and are re-identified as the ceiling area, yielding the corrected recognition result. The accuracy of image area recognition is thus improved.
In some embodiments, the angle threshold compared with the depression angle of the terminal includes a second threshold, where the second threshold may be equal to or different from the first threshold. At this time, another possible implementation manner of S304 includes: and if the comparison result of the depression angle of the terminal and the second threshold value is that the depression angle of the terminal is larger than the second threshold value, re-identifying the ceiling area in the initial identification result as the ground area to obtain a corrected identification result.
Specifically, if the depression angle of the terminal is greater than the second threshold, the shooting direction of the camera on the terminal is tilted steeply toward the ground and the terminal cannot capture the ceiling. In that case, if the initial recognition result includes a recognized ceiling area, that ceiling area is determined to be a misrecognized region; and since the plane similarity between the ceiling area and the ground area is higher than that between the ceiling area and the wall area, the ceiling area in the initial recognition result is re-identified as the ground area, yielding the corrected recognition result. The accuracy of image area recognition is thus improved.
In some embodiments, the angle threshold compared with the depression angle of the terminal includes a fourth threshold, where the fourth threshold is greater than the second threshold, and the fourth threshold may be equal to or different from the third threshold. At this time, another possible implementation manner of S304 includes: and if the comparison result of the depression angle of the terminal and the fourth threshold value shows that the depression angle of the terminal is larger than the fourth threshold value, re-identifying the wall area and the ceiling area in the initial identification result as the ground area to obtain a corrected identification result.
Specifically, the fourth threshold is greater than the second threshold; if the depression angle of the terminal exceeds the fourth threshold, the shooting direction of the camera is tilted even more steeply toward the ground, and the terminal can capture neither the ceiling area nor the wall area. In that case, if the initial recognition result includes a recognized ceiling area and/or wall area, both the ceiling area and the wall area in the initial recognition result are determined to be misrecognized regions and are re-identified as the ground area, yielding the corrected recognition result. The accuracy of image area recognition is thus improved.
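The four correction rules above can be expressed as a single pass over the initial label map. The following is a minimal sketch, where the class ids and the concrete threshold values are illustrative assumptions (the embodiments only require that the third threshold exceed the first and the fourth exceed the second):

```python
import numpy as np

CEILING, WALL, GROUND = 0, 1, 2  # assumed class ids

def correct_labels(mask, elevation, depression,
                   t1=45.0, t2=45.0, t3=60.0, t4=60.0):
    """Apply the four pose-based correction rules above to a per-pixel
    (H, W) label mask. Threshold values are illustrative assumptions."""
    out = mask.copy()
    if elevation > t3:     # steep upward tilt: neither ground nor wall visible
        out[(out == GROUND) | (out == WALL)] = CEILING
    elif elevation > t1:   # upward tilt: ground not visible
        out[out == GROUND] = CEILING
    if depression > t4:    # steep downward tilt: neither ceiling nor wall visible
        out[(out == CEILING) | (out == WALL)] = GROUND
    elif depression > t2:  # downward tilt: ceiling not visible
        out[out == CEILING] = GROUND
    return out

corrected = correct_labels(np.array([[0, 1], [2, 2]]), elevation=50.0, depression=0.0)
# ground pixels are re-identified as ceiling -> [[0, 1], [0, 0]]
```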
In some embodiments, the initial recognition result may include region probabilities corresponding to a plurality of pixel points of the target image, where the region probabilities include at least one of a ceiling probability, a wall probability, and a ground probability. The ceiling probability corresponding to a pixel point is the probability that the area where the pixel point is located is the ceiling area; the wall probability corresponding to the pixel point is the probability that the area where the pixel point is located is the wall area; and the ground probability corresponding to the pixel point is the probability that the area where the pixel point is located is the ground area. During image area recognition, the region probabilities corresponding to the pixel points in the target image can be obtained from the image recognition model.
At this time, another possible implementation manner of S304 includes: according to the comparison result of the elevation angle and/or depression angle of the terminal and the angle threshold value, adjusting the region probability corresponding to a plurality of pixel points of the target image in the initial identification result; and obtaining a corrected recognition result according to the adjusted region probability corresponding to the plurality of pixel points of the target image. The angle threshold may include one or more angle thresholds for comparing with the elevation angle of the terminal and one or more angle thresholds for comparing with the depression angle of the terminal, which are not limited to the first threshold, the second threshold, the third threshold and the fourth threshold.
Specifically, if the elevation angle of the terminal is greater than the angle threshold, the terminal has a high probability of capturing the ceiling and a low probability of capturing the ground; the ceiling probabilities corresponding to the pixel points in the target image can therefore be increased and/or the ground probabilities corresponding to those pixel points decreased. If the depression angle of the terminal is greater than the angle threshold, the terminal has a low probability of capturing the ceiling and a high probability of capturing the ground; the ceiling probabilities corresponding to the pixel points in the target image can be decreased and/or the ground probabilities increased. On the basis of adjusting the ceiling probabilities and/or the ground probabilities, the wall probabilities corresponding to the pixel points in the target image can also be adjusted.
After the region probability is adjusted, the region where the pixel point is located can be determined according to the maximum probability value in the region probability corresponding to the pixel point in the target image, for example, if the ceiling probability is the maximum in the region probability corresponding to the pixel point, the region where the pixel point is located is the ceiling region. And then, determining an image area in the target image based on the areas where the plurality of pixel points in the target image are located.
Optionally, the angle threshold is multiple, and different angle thresholds may correspond to different probability adjustment amounts.
The larger the angle threshold, the larger the corresponding probability adjustment amount. Specifically:
the angle thresholds used for comparing with the elevation angle of the terminal are multiple, and different angle thresholds correspond to different probability adjustment amounts. For example, if the elevation angle of the terminal is greater than 45 degrees, the ceiling probabilities corresponding to the multiple pixel points on the target image are respectively increased by 10%, and the ground probabilities corresponding to the multiple pixel points are respectively decreased by 10%; and if the elevation angle of the terminal is more than 60 degrees, respectively increasing the ceiling probability corresponding to the plurality of pixel points on the target image by 20 percent, and respectively reducing the ground probability corresponding to the plurality of pixel points by 20 percent.
And/or there are multiple angle thresholds for comparison with the depression angle of the terminal, with different thresholds corresponding to different probability adjustment amounts; examples are omitted here.
Optionally, if the elevation angle of the terminal is greater than the first threshold, considering that the terminal cannot shoot the ground area at this time, the ground probabilities corresponding to the multiple pixel points on the target image are all reduced to 0, and the ground probability before the pixel point adjustment is added to the ceiling probability corresponding to the pixel point.
Optionally, if the elevation angle of the terminal is greater than the third threshold, considering that the terminal cannot shoot the ground area and the wall area, the ground probability and the wall probability corresponding to the plurality of pixel points on the target image are both reduced to 0, and the ceiling probability corresponding to the pixel points is adjusted to 100%.
Optionally, if the depression angle of the terminal is greater than the second threshold, considering that the terminal cannot shoot a ceiling region at this time, the ceiling probabilities corresponding to the plurality of pixel points on the target image are all reduced to 0, and the ceiling probability before the pixel point adjustment is added to the ground probability corresponding to the pixel point.
Optionally, if the depression angle of the terminal is greater than the fourth threshold, considering that the terminal cannot shoot the ceiling area and the wall area, the ceiling probability and the wall probability corresponding to a plurality of pixel points on the target image are both reduced to 0, and the ground probability corresponding to the pixel points is adjusted to 100%.
The first threshold, the second threshold, the third threshold and the fourth threshold may refer to the foregoing embodiments.
Optionally, when the region probability corresponding to a pixel point includes a ceiling probability, a wall probability, and a ground probability, the three probabilities sum to 1 for the same pixel point. Therefore, when the ceiling probability is increased, the ground probability needs to be decreased, and the wall probability may be decreased further; likewise, when the ground probability is increased, the ceiling probability needs to be decreased, and the wall probability may be decreased further.
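Combining the adjustment rules with the final per-pixel decision, a minimal sketch might look as follows. The channel order (ceiling, wall, ground), the thresholds, and the adjustment amounts mirror the 45-degree/10% and 60-degree/20% example above and are illustrative assumptions:

```python
import numpy as np

def adjust_probs(probs, elevation, depression):
    """probs: (H, W, 3) per-pixel probabilities, channels (ceiling, wall, ground).

    Only the elevation branch is shown; the depression branch is symmetric,
    moving mass from the ceiling channel to the ground channel instead.
    """
    p = probs.copy()
    delta = 0.20 if elevation > 60.0 else 0.10 if elevation > 45.0 else 0.0
    if delta:
        moved = np.minimum(p[..., 2], delta)  # take at most delta from the ground channel
        p[..., 2] -= moved                    # decrease ground probability
        p[..., 0] += moved                    # increase ceiling probability
    p /= p.sum(axis=-1, keepdims=True)        # keep each pixel's probabilities summing to 1
    return p

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=(4, 4))           # dummy (H, W, 3) probability map
labels = adjust_probs(probs, 50.0, 0.0).argmax(axis=-1)  # region with the maximum probability
```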
Therefore, by means of adjusting the region probability corresponding to the pixel points in the target image based on the comparison result, flexibility and accuracy of image region correction are improved, and accuracy of image region identification is improved.
In the embodiment of the disclosure, on the basis of obtaining the initial recognition result by performing image area recognition on the target image through the image recognition model, the initial recognition result is corrected based on the comparison result of the elevation angle and/or the depression angle of the terminal when shooting the image and the angle threshold, so that the problem of low recognition accuracy when recognizing one or more of a ceiling area, a wall area and a ground area of the image is solved, and the accuracy of image area recognition is effectively improved.
Based on any of the foregoing embodiments, the image recognition model is optionally a deep learning model obtained through model distillation training. This not only improves the recognition accuracy of the image recognition model but also reduces its model scale; in particular, a lightweight image recognition model can be obtained through training, which is convenient to deploy on various terminals and enables real-time image area recognition on images and/or videos on the terminal.
Optionally, when the image recognition model is trained by model distillation, a teacher model is first trained over multiple iterations using the training data and the teacher model's loss function to obtain a trained teacher model; a student model is then trained over multiple iterations using the training data, the trained teacher model, and the student model's loss function to obtain a trained student model; and the trained student model is used as the image recognition model for image area recognition. The model scale of the teacher model is larger than that of the student model, and both are deep learning models.
The training data may include a plurality of training images, and at least one of a ground area, a ceiling area, and a wall area may be pre-labeled on the training images, so that, when the teacher model is trained, a difference between an image area recognition result output by the teacher model and an image area labeled on the training images may be determined based on a loss function of the teacher model, and then, based on the difference, a model parameter of the teacher model is adjusted.
When the student model is trained, the training image can be input into both the student model and the teacher model; the loss function of the student model measures the differences between the image area recognition result output by the student model and, respectively, the image area recognition result output by the teacher model and the image area labeled on the training image; and the model parameters of the student model are then adjusted based on those differences.
Optionally, the backbone network structure of the teacher model adopts DeepLabv3, which improves the image area recognition accuracy of the teacher model and, in turn, that of the student model.
Optionally, the loss function of the teacher model adopts a Binary Cross Entropy (BCE) loss function, and the model training effect of the teacher model is improved through the Binary Cross Entropy loss function.
Optionally, the loss function of the teacher model may also be obtained by performing weighted summation on a BCE loss function and a Regional Mutual Information (RMI) loss function, so as to improve the model performance and reduce the occurrence of missing segmentation of the teacher model.
Further, in the weighted summation, the BCE loss function and the RMI loss function are given equal weights.
Optionally, the backbone network structure of the student model (i.e., the image recognition model) adopts GhostNet, a lightweight network structure that is convenient to deploy on lightweight devices. Adopting GhostNet helps reduce the model scale of the student model and makes it convenient to deploy the trained student model on user terminals, which broadens the range of devices the model can run on and improves user experience.
Optionally, the loss function of the student model may include a loss function weighted by the BCE loss function and the RMI loss function, and further include a distillation loss function. The loss function obtained by weighting the BCE loss function and the RMI loss function is used for determining the difference between the image area recognition result output by the student model and the image area marked on the training image, and the distillation loss function is used for determining the difference between the image area recognition result output by the student model and the image area recognition result output by the teacher model. Therefore, on one hand, the condition that the student model is missed in segmentation is reduced, and on the other hand, the training of the student model is guided through the trained teacher model. Furthermore, the model performance of the student model is effectively improved.
Furthermore, the distillation loss function can adopt a KL (Kullback-Leibler) divergence loss function. The KL divergence loss improves the image segmentation metrics the student model can achieve; in other words, it helps the student model reach a better segmentation effect. An example segmentation metric is the Mean Intersection over Union (MIoU).
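To make the structure of this loss concrete, here is a minimal PyTorch-style sketch assuming the equal BCE/RMI weights described above. The temperature, the unit distillation weight, and in particular the rmi_loss stand-in (the real Region Mutual Information loss models joint distributions over pixel neighbourhoods and is considerably more involved) are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def rmi_loss(logits, target):
    # Placeholder only: a plain BCE term stands in for the actual
    # Region Mutual Information loss, which is far more involved.
    return F.binary_cross_entropy_with_logits(logits, target)

def student_loss(student_logits, teacher_logits, target, temperature=1.0):
    """Sketch of the student's training loss: equally weighted BCE + RMI
    supervision plus a KL-divergence distillation term.

    student_logits, teacher_logits: (N, C, H, W); target: one-hot (N, C, H, W).
    """
    supervised = (0.5 * F.binary_cross_entropy_with_logits(student_logits, target)
                  + 0.5 * rmi_loss(student_logits, target))
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits.detach() / temperature, dim=1),  # teacher guides, no gradient
        reduction="batchmean",
    ) * temperature ** 2
    return supervised + kd
```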
Based on any of the foregoing embodiments, optionally, after obtaining the corrected recognition result, the corrected recognition result may be displayed on the target image. When the execution subject is the server, the server may transmit the corrected recognition result to the user terminal, and the user terminal may display the corrected recognition result on the target image. When the execution subject is a terminal, the terminal can display the image area identification result corresponding to the video frame or image on the video frame or image in real time while shooting the video or image. Therefore, the user can visually see each area recognized on the target image, and the user experience is improved.
Further, when displaying, different image areas can be marked on the target image through different colors, so that the display effect is improved.
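For instance, a minimal sketch of such a color-coded display, assuming numpy arrays and an illustrative three-color palette (the disclosure does not prescribe specific colors or class ids):

```python
import numpy as np

PALETTE = np.array([[255, 80, 80],   # ceiling (colors are illustrative)
                    [80, 255, 80],   # wall
                    [80, 80, 255]],  # ground
                   dtype=np.float32)

def overlay_regions(image, labels, alpha=0.5):
    """Blend a color-coded label map onto the target image.

    image: (H, W, 3) uint8; labels: (H, W) with assumed values 0/1/2.
    """
    color = PALETTE[labels]                          # per-pixel region color
    blended = (1.0 - alpha) * image + alpha * color  # semi-transparent overlay
    return blended.astype(np.uint8)
```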
Fig. 4 is a block diagram of an image area processing apparatus according to an embodiment of the present disclosure, which corresponds to the image area processing method according to the above embodiment. For ease of illustration, only portions that are relevant to embodiments of the present disclosure are shown. Referring to fig. 4, the image area processing apparatus includes: an acquisition unit 401, a recognition unit 402 and a correction unit 403.
An acquisition unit 401 configured to acquire a target image and an apparatus posture of a terminal when the target image is captured;
an identifying unit 402, configured to perform image area identification on the target image to obtain an initial identification result, where the image area includes at least one of a ceiling area, a wall area, and a ground area;
a correcting unit 403, configured to correct the initial recognition result according to the device posture, so as to obtain a corrected recognition result.
In an embodiment of the present disclosure, the device attitude includes an elevation angle of the terminal and/or a depression angle of the terminal, and the correction unit 403 is further configured to: comparing the elevation angle of the terminal and/or the depression angle of the terminal with an angle threshold; and correcting the initial recognition result according to the comparison result to obtain a corrected recognition result.
In an embodiment of the present disclosure, the modification unit 403 is further configured to: and if the comparison result is that the elevation angle of the terminal is larger than the first threshold value, re-identifying the ground area in the initial identification result as the ceiling area to obtain a corrected identification result.
In an embodiment of the present disclosure, the modification unit 403 is further configured to: and if the comparison result is that the depression angle of the terminal is larger than the second threshold value, re-identifying the ceiling area in the initial identification result as the ground area to obtain a corrected identification result.
In an embodiment of the present disclosure, the modification unit 403 is further configured to: and if the comparison result is that the elevation angle of the terminal is larger than a third threshold value, re-identifying the ground area and the wall area in the initial identification result as the ceiling area to obtain a corrected identification result, wherein the third threshold value is larger than the first threshold value.
In an embodiment of the present disclosure, the modification unit 403 is further configured to: and if the comparison result is that the depression angle of the terminal is larger than a fourth threshold value, re-identifying the wall area and the ceiling area in the initial identification result as the ground area to obtain a corrected identification result, wherein the fourth threshold value is larger than the second threshold value.
In an embodiment of the present disclosure, the initial recognition result includes region probabilities corresponding to a plurality of pixel points of the target image, the region probabilities include at least one of a ceiling probability, a wall probability, and a ground probability, and the modifying unit 403 is further configured to: according to the comparison result, in the initial identification result, adjusting the region probability corresponding to a plurality of pixel points of the target image; and obtaining a corrected recognition result according to the adjusted region probability corresponding to the plurality of pixel points of the target image.
In an embodiment of the present disclosure, the identifying unit 402 is further configured to: and carrying out image region identification on the target image through an image identification model to obtain an initial identification result, wherein the image identification model is a deep learning model obtained through model distillation training.
In one embodiment of the present disclosure, the image area processing apparatus further includes: a display unit 404, configured to display the corrected recognition result on the target image.
The device provided in this embodiment may be used to implement the technical solution of the above method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
Referring to fig. 5, a schematic structural diagram of an electronic device 500 suitable for implementing the embodiments of the present disclosure is shown; the electronic device 500 may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, Personal Digital Assistants (PDAs), tablet computers (PADs), Portable Multimedia Players (PMPs), and in-vehicle terminals (e.g., car navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in fig. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation of the unit itself; for example, the first obtaining unit may also be described as "a unit obtaining at least two internet protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a first aspect, according to one or more embodiments of the present disclosure, there is provided an image area processing method, including: acquiring a target image and a device posture of a terminal when the target image is shot; performing image area recognition on the target image to obtain an initial recognition result, wherein the image area comprises at least one of a ceiling area, a wall area and a ground area; and correcting the initial recognition result according to the device posture to obtain a corrected recognition result.
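By way of illustration only, the following Python sketch shows one way this flow could be wired together, assuming a hypothetical segmentation model callable that returns per-pixel class probabilities and a pitch angle already read from the terminal's motion sensors; all names, threshold values, and the simple two-rule correction are assumptions, not the disclosed implementation.

```python
import numpy as np

CEILING, WALL, GROUND = 0, 1, 2  # illustrative class indices

def process_target_image(image: np.ndarray, pitch_deg: float, model) -> np.ndarray:
    """Recognize ceiling/wall/ground regions, then correct by device posture.

    image     -- H x W x 3 target image
    pitch_deg -- device pitch: positive = elevation, negative = depression
    model     -- callable mapping the image to H x W x 3 class probabilities
    """
    probs = model(image)                    # initial recognition result
    labels = probs.argmax(axis=-1)          # per-pixel region labels (H x W)
    if pitch_deg > 45.0:                    # camera tilted well above horizontal:
        labels[labels == GROUND] = CEILING  #   ground is implausible
    elif pitch_deg < -45.0:                 # camera tilted well below horizontal:
        labels[labels == CEILING] = GROUND  #   ceiling is implausible
    return labels                           # corrected recognition result
```

A dummy model such as `model = lambda img: np.random.dirichlet(np.ones(3), size=img.shape[:2])` is enough to exercise the sketch end to end.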
According to one or more embodiments of the present disclosure, the device posture includes an elevation angle of the terminal and/or a depression angle of the terminal, and the correcting the initial recognition result according to the device posture to obtain a corrected recognition result includes: comparing the elevation angle of the terminal and/or the depression angle of the terminal with an angle threshold; and correcting the initial recognition result according to the comparison result to obtain the corrected recognition result.
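As a sketch of how the elevation or depression angle might be obtained and compared, assume the gravity vector is available in device coordinates from the terminal's accelerometer and the rear camera's optical axis is the device's +z axis; both are assumptions, and axis conventions vary between platforms.

```python
import numpy as np

OPTICAL_AXIS = np.array([0.0, 0.0, 1.0])  # assumed rear-camera direction

def camera_pitch_deg(gravity: np.ndarray) -> float:
    """Pitch of the optical axis above (+) or below (-) the horizontal
    plane, computed from a device-frame gravity reading at rest."""
    up = -gravity / np.linalg.norm(gravity)              # unit "up" vector
    s = float(np.clip(np.dot(OPTICAL_AXIS, up), -1.0, 1.0))
    return float(np.degrees(np.arcsin(s)))               # >0 elevation, <0 depression

def compare_with_threshold(pitch_deg: float, threshold_deg: float = 45.0) -> str:
    """Compare the elevation/depression angle against an angle threshold."""
    if pitch_deg > threshold_deg:
        return "elevation exceeds threshold"
    if pitch_deg < -threshold_deg:
        return "depression exceeds threshold"
    return "within threshold"
```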
According to one or more embodiments of the present disclosure, the correcting the initial recognition result according to the comparison result to obtain the corrected recognition result includes: if the comparison result is that the elevation angle of the terminal is greater than a first threshold, re-identifying the ground area in the initial recognition result as a ceiling area to obtain the corrected recognition result.
According to one or more embodiments of the present disclosure, the correcting the initial recognition result according to the comparison result to obtain the corrected recognition result includes: if the comparison result is that the depression angle of the terminal is greater than a second threshold, re-identifying the ceiling area in the initial recognition result as the ground area to obtain the corrected recognition result.
According to one or more embodiments of the present disclosure, the correcting the initial recognition result according to the comparison result to obtain the corrected recognition result includes: if the comparison result is that the elevation angle of the terminal is greater than a third threshold, re-identifying the ground area and the wall area in the initial recognition result as the ceiling area to obtain the corrected recognition result, wherein the third threshold is greater than the first threshold.
According to one or more embodiments of the present disclosure, the correcting the initial recognition result according to the comparison result to obtain the corrected recognition result includes: if the comparison result is that the depression angle of the terminal is greater than a fourth threshold, re-identifying the wall area and the ceiling area in the initial recognition result as the ground area to obtain the corrected recognition result, wherein the fourth threshold is greater than the second threshold.
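The four correction rules above can be consolidated into a single routine. The sketch below assumes a per-pixel label map and uses illustrative threshold values; the embodiments only require that the third threshold exceed the first and the fourth exceed the second.

```python
import numpy as np

CEILING, WALL, GROUND = 0, 1, 2  # illustrative class indices

def correct_by_posture(labels: np.ndarray, pitch_deg: float,
                       t1: float = 30.0, t2: float = 30.0,
                       t3: float = 70.0, t4: float = 70.0) -> np.ndarray:
    """Apply the threshold-based correction rules to an H x W label map."""
    out = labels.copy()
    elevation, depression = pitch_deg, -pitch_deg
    if elevation > t3:    # extreme upward tilt: re-identify ground and wall as ceiling
        out[(out == GROUND) | (out == WALL)] = CEILING
    elif elevation > t1:  # moderate upward tilt: re-identify ground as ceiling
        out[out == GROUND] = CEILING
    elif depression > t4: # extreme downward tilt: re-identify wall and ceiling as ground
        out[(out == CEILING) | (out == WALL)] = GROUND
    elif depression > t2: # moderate downward tilt: re-identify ceiling as ground
        out[out == CEILING] = GROUND
    return out
```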
According to one or more embodiments of the present disclosure, the correcting the initial recognition result according to the comparison result to obtain the corrected recognition result includes: adjusting, according to the comparison result, the region probabilities corresponding to a plurality of pixel points of the target image in the initial recognition result; and obtaining the corrected recognition result according to the adjusted region probabilities corresponding to the plurality of pixel points of the target image.
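A sketch of this probability-based variant follows: rather than relabeling pixels outright, the implausible class is down-weighted and the probabilities renormalized before taking the per-pixel maximum. The multiplicative penalty and threshold are illustrative assumptions.

```python
import numpy as np

CEILING, WALL, GROUND = 0, 1, 2  # illustrative class indices

def correct_by_probability(probs: np.ndarray, pitch_deg: float,
                           threshold_deg: float = 30.0,
                           penalty: float = 0.5) -> np.ndarray:
    """Adjust H x W x 3 region probabilities by device posture, then
    derive the corrected recognition result as a per-pixel argmax."""
    adjusted = probs.copy()
    if pitch_deg > threshold_deg:     # looking up: ground is unlikely
        adjusted[..., GROUND] *= penalty
    elif pitch_deg < -threshold_deg:  # looking down: ceiling is unlikely
        adjusted[..., CEILING] *= penalty
    adjusted /= adjusted.sum(axis=-1, keepdims=True)  # renormalize per pixel
    return adjusted.argmax(axis=-1)
```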
According to one or more embodiments of the present disclosure, the performing image area recognition on the target image to obtain an initial recognition result includes: performing image area recognition on the target image through an image recognition model to obtain the initial recognition result, wherein the image recognition model is a deep learning model obtained through model distillation training.
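The disclosure does not spell out the distillation procedure. A common formulation, sketched below with PyTorch and illustrative hyperparameters, trains a compact student segmentation network against both the ground-truth labels and a larger teacher's temperature-softened per-pixel distributions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Combined hard-label / soft-label loss for per-pixel distillation.

    student_logits, teacher_logits -- N x C x H x W
    labels                         -- N x H x W integer class map
    """
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients are comparable across temperatures
    return alpha * hard + (1.0 - alpha) * soft
```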
According to one or more embodiments of the present disclosure, after obtaining the corrected recognition result, the method further includes: displaying the corrected recognition result on the target image.
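Displaying the corrected recognition result can be as simple as alpha-blending a color-coded label map over the target image; the palette and blending factor in this sketch are illustrative choices.

```python
import numpy as np

PALETTE = np.array([[255, 210, 120],   # ceiling
                    [120, 220, 120],   # wall
                    [120, 140, 255]],  # ground
                   dtype=np.float32)

def overlay_result(image: np.ndarray, labels: np.ndarray,
                   alpha: float = 0.4) -> np.ndarray:
    """Blend the per-pixel region labels onto the target image for display."""
    color = PALETTE[labels]                              # H x W x 3 color map
    blended = (1.0 - alpha) * image.astype(np.float32) + alpha * color
    return np.clip(blended, 0, 255).astype(np.uint8)
```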
In a second aspect, according to one or more embodiments of the present disclosure, there is provided an image area processing apparatus, including: an acquisition unit, configured to acquire a target image and a device posture of a terminal when the target image is shot; a recognition unit, configured to perform image area recognition on the target image to obtain an initial recognition result, wherein the image area comprises at least one of a ceiling area, a wall area and a ground area; and a correction unit, configured to correct the initial recognition result according to the device posture to obtain a corrected recognition result.
According to one or more embodiments of the present disclosure, the device posture includes an elevation angle of the terminal and/or a depression angle of the terminal, and the correction unit is further configured to: compare the elevation angle of the terminal and/or the depression angle of the terminal with an angle threshold; and correct the initial recognition result according to the comparison result to obtain the corrected recognition result.
According to one or more embodiments of the present disclosure, the correction unit is further configured to: if the comparison result is that the elevation angle of the terminal is greater than a first threshold, re-identify the ground area in the initial recognition result as a ceiling area to obtain the corrected recognition result.
According to one or more embodiments of the present disclosure, the correction unit is further configured to: if the comparison result is that the depression angle of the terminal is greater than a second threshold, re-identify the ceiling area in the initial recognition result as the ground area to obtain the corrected recognition result.
According to one or more embodiments of the present disclosure, the correction unit is further configured to: if the comparison result is that the elevation angle of the terminal is greater than a third threshold, re-identify the ground area and the wall area in the initial recognition result as the ceiling area to obtain the corrected recognition result, wherein the third threshold is greater than the first threshold.
According to one or more embodiments of the present disclosure, the correction unit is further configured to: if the comparison result is that the depression angle of the terminal is greater than a fourth threshold, re-identify the wall area and the ceiling area in the initial recognition result as the ground area to obtain the corrected recognition result, wherein the fourth threshold is greater than the second threshold.
According to one or more embodiments of the present disclosure, the initial recognition result includes region probabilities corresponding to a plurality of pixel points of the target image, the region probabilities include at least one of a ceiling probability, a wall probability, and a ground probability, and the correction unit is further configured to: adjust, according to the comparison result, the region probabilities corresponding to the plurality of pixel points of the target image in the initial recognition result; and obtain the corrected recognition result according to the adjusted region probabilities corresponding to the plurality of pixel points of the target image.
According to one or more embodiments of the present disclosure, the recognition unit is further configured to: perform image area recognition on the target image through an image recognition model to obtain the initial recognition result, wherein the image recognition model is a deep learning model obtained through model distillation training.
According to one or more embodiments of the present disclosure, the image area processing apparatus further includes: and the display unit is used for displaying the corrected recognition result on the target image.
In a third aspect, according to one or more embodiments of the present disclosure, there is provided an electronic device including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the image area processing method as set forth in the first aspect or the various possible designs of the first aspect above.
In a fourth aspect, according to one or more embodiments of the present disclosure, there is provided a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement the image area processing method according to the first aspect or various possible designs of the first aspect.
In a fifth aspect, according to one or more embodiments of the present disclosure, there is provided a computer program product comprising computer executable instructions which, when executed by a processor, implement the image area processing method according to the first aspect or various possible designs of the first aspect.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (13)

1. An image area processing method, comprising:
acquiring a target image and a device posture of a terminal when the target image is shot;
performing image area recognition on the target image to obtain an initial recognition result, wherein the image area comprises at least one of a ceiling area, a wall area and a ground area;
and correcting the initial recognition result according to the device posture to obtain a corrected recognition result.
2. The image area processing method according to claim 1, wherein the device posture comprises an elevation angle of the terminal and/or a depression angle of the terminal, and the correcting the initial recognition result according to the device posture to obtain a corrected recognition result comprises:
comparing the elevation angle of the terminal and/or the depression angle of the terminal with an angle threshold;
and correcting the initial recognition result according to the comparison result to obtain the corrected recognition result.
3. The image area processing method according to claim 2, wherein the correcting the initial recognition result according to the comparison result to obtain the corrected recognition result comprises:
if the comparison result is that the elevation angle of the terminal is greater than a first threshold, re-identifying the ground area in the initial recognition result as a ceiling area to obtain the corrected recognition result.
4. The image area processing method according to claim 2, wherein the correcting the initial recognition result according to the comparison result to obtain the corrected recognition result comprises:
if the comparison result is that the depression angle of the terminal is greater than a second threshold, re-identifying the ceiling area in the initial recognition result as the ground area to obtain the corrected recognition result.
5. The image area processing method according to claim 3, wherein the correcting the initial recognition result according to the comparison result to obtain the corrected recognition result comprises:
if the comparison result is that the elevation angle of the terminal is greater than a third threshold, re-identifying the ground area and the wall area in the initial recognition result as the ceiling area to obtain the corrected recognition result, wherein the third threshold is greater than the first threshold.
6. The image area processing method according to claim 4, wherein the correcting the initial recognition result according to the comparison result to obtain the corrected recognition result comprises:
if the comparison result is that the depression angle of the terminal is greater than a fourth threshold, re-identifying the wall area and the ceiling area in the initial recognition result as the ground area to obtain the corrected recognition result, wherein the fourth threshold is greater than the second threshold.
7. The image area processing method according to any one of claims 2 to 6, wherein the initial recognition result includes region probabilities corresponding to a plurality of pixel points of the target image, the region probabilities include at least one of a ceiling probability, a wall probability, and a ground probability, and the correcting the initial recognition result according to the comparison result to obtain the corrected recognition result comprises:
adjusting, according to the comparison result, the region probabilities corresponding to the plurality of pixel points of the target image in the initial recognition result;
and obtaining the corrected recognition result according to the adjusted region probabilities corresponding to the plurality of pixel points of the target image.
8. The image area processing method according to any one of claims 1 to 6, wherein the performing image area recognition on the target image to obtain an initial recognition result comprises:
performing image area recognition on the target image through an image recognition model to obtain the initial recognition result, wherein the image recognition model is a deep learning model obtained through model distillation training.
9. The image area processing method according to any one of claims 1 to 6, further comprising, after obtaining the corrected recognition result:
and displaying the corrected recognition result on the target image.
10. An image area processing apparatus comprising:
an acquisition unit, configured to acquire a target image and a device posture of a terminal when the target image is shot;
a recognition unit, configured to perform image area recognition on the target image to obtain an initial recognition result, wherein the image area comprises at least one of a ceiling area, a wall area and a ground area;
and a correction unit, configured to correct the initial recognition result according to the device posture to obtain a corrected recognition result.
11. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the image area processing method of any one of claims 1 to 9.
12. A computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the image area processing method of any one of claims 1 to 9.
13. A computer program product comprising computer executable instructions which, when executed by a processor, implement the image area processing method of any of claims 1 to 9.
CN202111168828.4A 2021-09-30 2021-09-30 Image area processing method and equipment Pending CN115908792A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111168828.4A CN115908792A (en) 2021-09-30 2021-09-30 Image area processing method and equipment
PCT/CN2022/120322 WO2023051362A1 (en) 2021-09-30 2022-09-21 Image area processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111168828.4A CN115908792A (en) 2021-09-30 2021-09-30 Image area processing method and equipment

Publications (1)

Publication Number Publication Date
CN115908792A (en) 2023-04-04

Family

ID=85746892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111168828.4A Pending CN115908792A (en) 2021-09-30 2021-09-30 Image area processing method and equipment

Country Status (2)

Country Link
CN (1) CN115908792A (en)
WO (1) WO2023051362A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013080390A (en) * 2011-10-04 2013-05-02 Nippon Telegr & Teleph Corp <Ntt> Image analysis method, image analysis device, and computer program
US9613423B2 (en) * 2015-06-12 2017-04-04 Google Inc. Using a depth map of a monitored scene to identify floors, walls, and ceilings
WO2021147113A1 (en) * 2020-01-23 2021-07-29 华为技术有限公司 Plane semantic category identification method and image data processing apparatus
CN113034655A (en) * 2021-03-11 2021-06-25 北京字跳网络技术有限公司 Shoe fitting method and device based on augmented reality and electronic equipment

Also Published As

Publication number Publication date
WO2023051362A1 (en) 2023-04-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination