CN111275020A - Room state identification method - Google Patents

Room state identification method

Info

Publication number
CN111275020A
CN111275020A
Authority
CN
China
Prior art keywords
image
room
identification
area
training
Prior art date
Legal status
Pending
Application number
CN202010172320.0A
Other languages
Chinese (zh)
Inventor
姚国庆
蒲庆
陈浩
高靖
崔岩
卢述奇
Current Assignee
Qingwutong Co ltd
Original Assignee
Qingwutong Co ltd
Priority date
Filing date
Publication date
Application filed by Qingwutong Co ltd
Priority to CN202010172320.0A
Publication of CN111275020A

Classifications

    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/23213: Pattern recognition; non-hierarchical clustering techniques using statistics or function optimisation, e.g. modelling of probability density functions, with a fixed number of clusters, e.g. K-means clustering
    • G06N 3/084: Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06V 20/41: Scenes; scene-specific elements in video content; higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Abstract

The invention discloses a room state identification method and relates to the technical field of room image identification and processing. One embodiment of the method comprises: acquiring a room image to be identified; inputting the room image into a semantic segmentation network, which divides the component members of the room (wall surface, floor surface, ceiling surface) to obtain a first identification image; and inputting the room image into an image identification network, which identifies the problem areas of the room (stained areas, damaged areas, defective areas) to obtain a second identification image. By combining semantic segmentation with target detection, the embodiment automatically identifies and analyzes room images and scores or evaluates the decoration condition of the wall, floor and ceiling surfaces, together with any problems present, under a uniform standard. The method places few demands on the user, is simple to operate, fast in processing, uniform in standard, and accurate and reliable in its results; it also provides visualization of the identification result, which makes manual error correction convenient.

Description

Room state identification method
Technical Field
The invention relates to the technical field of room image identification and processing, and in particular to a room state identification method.
Background
After a house is built, the house survey is one of the key steps: subsequent related matters (acceptance, maintenance, sale, decoration, valuation, and the like) are all carried out with reference to the survey result.
On-site house inspection is the mode commonly used at present. During an on-site inspection, each room is examined in turn, and the house is then scored or evaluated according to the inspection result for each room. The scored or evaluated content includes, but is not limited to: construction condition, room problems, finishing condition, and finishing problems.
Because house-survey experience takes a long time to accumulate, most people know little about it, lack sufficient knowledge and experience, and have to turn to professionals with relatively rich experience. The following problems therefore arise:
Manual on-site survey depends mainly on the practical experience of the surveyor, so the survey standard is difficult to unify, and whether the score or evaluation produced by a manual on-site survey is accurate and credible is itself difficult to assess in a unified way.
Because surveyors differ in experience, even experienced surveyors, having mastered different skills, may reach different conclusions when surveying the same house.
In the absence of a unified evaluation standard, managing room information, maintaining room problems, and tracing the history of a room are all difficult to realize, which hinders the standardization and automation of house acceptance and house management.
In the prior art, appointed objects can be identified in video frames using opencv and the captured object pictures saved, but a corresponding classifier must be trained first and the trained classifier then used for identification. The core idea is to call the opencv api functions to realize room state recognition (determination of the room state). The disadvantage of this solution is that identifying images by calling algorithms through the opencv image processing library places a ceiling on the identification accuracy, which remains low. If room state identification is to be performed on 1000 pictures of a room, this approach achieves an identification accuracy of about 67% and processes only 1 to 2 frames per second, which makes commercial value difficult to realize.
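For reference, the following is a minimal sketch of the opencv-based prior-art approach described above. The classifier file, video file and detection parameters are hypothetical; a cascade classifier of this kind must be trained separately beforehand.

    import cv2

    # Load a cascade classifier trained beforehand (hypothetical file name).
    classifier = cv2.CascadeClassifier("room_object_cascade.xml")

    video = cv2.VideoCapture("room_walkthrough.mp4")  # hypothetical video file
    frame_idx = 0
    while True:
        ok, frame = video.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Identify the appointed objects in the frame and save the captured pictures.
        for i, (x, y, w, h) in enumerate(classifier.detectMultiScale(gray, 1.1, 4)):
            cv2.imwrite("capture_%d_%d.jpg" % (frame_idx, i), frame[y:y + h, x:x + w])
        frame_idx += 1
    video.release()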
Disclosure of Invention
In view of this, embodiments of the present invention provide a room state identification method that automatically identifies and analyzes room images based on a combination of semantic segmentation and target detection, and scores or evaluates the decoration condition of the wall, floor and ceiling surfaces, together with any problems present, under a uniform standard. The method places few demands on the user, is simple to operate, fast in processing, uniform in standard, and accurate and reliable in its results; it also provides visualization of the identification result, which makes manual error correction convenient.
To achieve the above object, according to one aspect of the embodiments of the present invention, there is provided a room state identification method comprising:
acquiring a room image to be identified;
inputting the room image to be identified into a semantic segmentation network, and dividing the component members of the room in the room image to obtain a first identification image;
the component members include at least: wall surface, floor surface, ceiling surface;
inputting the room image to be identified into an image identification network, and identifying the problem areas of the room in the room image to obtain a second identification image;
the problem areas include at least: stained areas, damaged areas, defective areas.
Further, the room image comes from a still picture or a video clip.
The room image is an image of the interior condition of the room, where the interior condition refers to the condition of the indoor wall, floor and ceiling surfaces and includes at least any one, some, or all of the following:
the condition of the walls inside the room,
the condition of the floor inside the room,
the condition of the ceiling inside the room.
There is at least one room image; when there are multiple room images, together they completely cover the wall, floor and ceiling surfaces of the room.
The room images are shot from the same angle or from different angles,
and the wall, floor or ceiling shown in one room image may partially or fully repeat that shown in another.
Further, a stained area is an area where stains are present,
a damaged area is an area where cracks or damage are present,
and a defective area is an area where part of the structure is missing.
During the target detection processing, the finishing state of the room is also identified in the room image; the finishing state is divided into: blank (unfinished) room, finely decorated (hardcover) room.
Still further, the method further comprises: performing an IOU calculation on the first identification image and the second identification image, and outputting the correspondence between the component members and the problem areas.
Further, the correspondence includes:
the area coordinates and category of each region after division into wall surface, floor surface and ceiling surface,
the coordinates of each problem area, typically the coordinate values of its upper-left and lower-right points,
and a judgment, based on its center point, of whether each problem area lies on the wall surface, the floor surface, or the ceiling surface.
Still further, the method further comprises visualizing the correspondence between the component members and the problem areas, which at least comprises:
forming a first prediction frame around each component member region,
and forming a second prediction frame around each problem area.
Still further, the method further comprises manual error correction processing: selecting the first prediction frame or the second prediction frame, marking any wrongly displayed content to form a manual error-correction image, and transmitting the manual error-correction image to the client or the cloud.
Furthermore, before the room image to be identified is input, the semantic segmentation network and the image recognition network each go through the following processing in sequence:
training processing, namely inputting training images into the semantic segmentation network or the image recognition network and extracting features from the images to be trained through multiple rounds of forward propagation and backward propagation, finally forming a prediction model;
test processing, namely inputting test images into the semantic segmentation network or image recognition network that has finished training, analyzing the test images with the prediction model to form prediction results, and evaluating the accuracy of the prediction model on the basis of those results;
and storing the trained parameters of any prediction model that meets the accuracy requirement, together with a model file that contains the structure of the neural network and the weight parameters learned during training.
An accuracy of not less than 93% is considered to meet the accuracy requirement.
For the semantic segmentation network, the accuracy is measured by the mean intersection-over-union (mIOU);
for the image recognition network, the accuracy is measured by the mean average precision (mAP).
Furthermore, during the training processing, the training set images are selected so that the input resolution is not lower than 1333 x 800.
The training set images are selected so that, as far as possible:
the wall, floor and ceiling surfaces are free of occluding objects,
and overexposure, underexposure, blur and noise in the image are avoided.
During training, the learning rate defaults to 0.01 and the number of training iterations defaults to 600,000.
During the test processing, the test set images are selected under the same requirements as the training set images.
Furthermore, for the training set images and the test set images, a corresponding identification label is generated for each image with a labeling tool,
and at least the following information is stored in the label file:
the image name, the name of each labeling frame, and the position of each labeling frame.
When the identification labels are generated, two or more labeling frames may only be distributed linearly along the vertical or horizontal direction, with no intersection between labeling frames, and a labeled problem area may not branch or fork.
One embodiment of the above invention has the following advantages or benefits:
The room state identification method provided by the invention automatically identifies and analyzes room images based on the combination of semantic segmentation and target detection, and scores or evaluates the decoration condition of the wall, floor and ceiling surfaces, together with any problems present, under a uniform standard; it places few demands on the user, is simple to operate, fast in processing, uniform in standard, accurate and reliable in its results, provides visualization of the identification result, and makes manual error correction convenient.
The room state identification method disclosed by the invention is novel in design: by combining semantic segmentation and target detection and applying them to room state identification, it improves both identification accuracy and identification efficiency, processing 6 to 8 frames per second with an accuracy of 93.5%, which makes it suitable for commercial production application.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
Fig. 1 is a flowchart of embodiment 1 of the room state identification method of the present invention;
Fig. 2 is a flowchart of embodiment 2 of the room state identification method of the present invention;
Fig. 3 is a flowchart of embodiment 3 of the room state identification method of the present invention;
Fig. 4 is a schematic diagram of the labeling frames corresponding to an identification label.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of an embodiment of a room status identification method according to an embodiment of the present invention, and as shown in fig. 1, an embodiment of the present invention provides a room status identification method, including:
acquiring a room image to be identified;
the room image is from a still picture or video clip, such as: still pictures taken by a camera or a video camera, for example: a video clip taken by the camera is displayed,
the camera can be a mobile phone camera and can be a camera arranged on a holder;
as an alternative embodiment, the room image is an image of the interior condition of the room, where the interior condition refers to the condition of the indoor wall, floor and ceiling surfaces and includes at least any one, some, or all of the following:
the condition of the walls inside the room, where the walls include load-bearing walls and partition walls,
the condition of the floor inside the room,
the condition of the ceiling inside the room, i.e. the top surface;
note that the room image is not an image of the house facade;
as an alternative embodiment, there is at least one room image, and there may be more than one; preferably, the room images completely cover the wall, floor and ceiling surfaces of the room, that is: when there are multiple room images, together they completely cover the wall, floor and ceiling surfaces of the room;
as an alternative embodiment, the room images are shot from the same angle or from different angles, for example: room images taken from the same angle at different magnification scales, or room images taken from different angles at the same magnification;
whether the shooting angles are the same or different, the magnification scale can be chosen as required and adjusted through a zoom function, in particular optical zoom;
as an alternative embodiment, the wall, floor or ceiling shown in one room image may partially or fully repeat that shown in another, for example: when a section of wall A appears in several room images, one room image may contain only a local part of wall A while another contains the whole of it;
semantic segmentation processing: inputting the room image to be identified into a semantic segmentation network, and dividing the component members of the room in the room image to obtain a first identification image;
the component members include at least: wall surface, floor surface, ceiling surface;
as an alternative embodiment, the component members may further include: beams, columns, stairs, guardrails, doors, windows, and steps;
target detection processing: inputting the room image to be identified into an image identification network, and identifying the problem areas of the room in the room image to obtain a second identification image;
the problem areas include at least: stained areas, damaged areas, defective areas;
a stained area is an area where stains are present, such as stains caused by pen scratches, stains caused by shoe marks or ball marks, stains caused by other substances (glue, mold, and the like) adhering to the wall, floor or ceiling, stains caused by water marks, and so on;
a damaged area is an area where cracks or damage are present, such as cracking, peeling, bubbling, or chipping;
a defective area is an area where part of the structure is missing, such as an unsealed wall opening, or a suspended ceiling with a missing corner or a missing piece (a missing aluminum gusset plate, a missing integrated ceiling module, and the like).
Further, during the target detection processing, the finishing state of the room is also identified in the room image; by default the finishing state is one of: blank (unfinished) room, finely decorated (hardcover) room.
It should be noted that the problem areas do not distinguish between finishing states of the room, that is: whether the wall, floor and ceiling surfaces are in good or bad condition is judged directly from the identified problem areas, independently of the finishing state of the room. In addition, different problems in different areas are handled by the same strategy in both the training and prediction stages, without special-case treatment.
On the basis of the above technical solution, as shown in fig. 2, the method further comprises: performing an IOU (Intersection over Union) calculation on the first identification image and the second identification image, and outputting the correspondence between the component members and the problem areas.
As an alternative embodiment, the correspondence includes:
the area coordinates and category of each region after division into wall surface, floor surface and ceiling surface, where the category marks a region as wall, floor or ceiling, the coordinates mark the range and outline of the region with several coordinate values, and the region division can be done with rectangles;
the coordinates of each problem area, typically the coordinate values of its upper-left and lower-right points;
and a judgment, based on its center point, of whether each problem area lies on the wall surface, the floor surface, or the ceiling surface.
By calculating the IOU and outputting the correspondence between the component members and the problem areas, the method automatically indicates whether each component member contains a problem area and, if so, of what kind. The evaluation method and scoring standard are thereby unified, the decoration condition and problems of the wall, floor and ceiling surfaces can be scored or evaluated under a single standard, and house-related standardization and automation are facilitated.
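As an illustration, a minimal sketch of the IOU calculation and the center-point judgment described above. The box format (x1, y1, x2, y2) with upper-left and lower-right points, and the rectangular member regions, follow the description; the function names and the dictionary layout are merely illustrative.

    def iou(box_a, box_b):
        # Boxes are (x1, y1, x2, y2): upper-left and lower-right point coordinates.
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def member_of(problem_box, member_regions):
        # member_regions: e.g. {"wall": box, "floor": box, "ceiling": box}.
        # The problem area is judged to lie on the component member region
        # that contains the center point of its box.
        cx = (problem_box[0] + problem_box[2]) / 2
        cy = (problem_box[1] + problem_box[3]) / 2
        for category, (x1, y1, x2, y2) in member_regions.items():
            if x1 <= cx <= x2 and y1 <= cy <= y2:
                return category
        return None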
On the basis of the above technical solution, as shown in fig. 3, the method further comprises visualizing the correspondence between the component members and the problem areas, which at least comprises:
forming a first prediction frame around each component member region,
and forming a second prediction frame around each problem area.
The visualization makes it convenient to check the correspondence between component members and problem areas intuitively and quickly, improves the user experience, and simplifies the management and maintenance of problem areas.
On the basis of the above technical solution, as shown in fig. 3, the method further comprises manual error correction processing: selecting the first prediction frame or the second prediction frame, marking any wrongly displayed content to form a manual error-correction image, and transmitting the manual error-correction image to the client or the cloud.
The manual error correction processing further ensures that the room state identification result is accurate and credible; combined with the visualization processing, it reduces the complexity of manual error correction and improves its efficiency.
On the basis of the above technical solution, before the room image to be identified is input, the semantic segmentation network and the image recognition network each go through the following processing in sequence:
training processing, namely inputting training images into the semantic segmentation network or the image recognition network and extracting features from the images to be trained through multiple rounds of forward propagation and backward propagation, finally forming a prediction model;
test processing, namely inputting test images into the semantic segmentation network or image recognition network that has finished training, analyzing the test images with the prediction model to form prediction results, and evaluating the accuracy of the prediction model on the basis of those results;
and storing the trained parameters of any prediction model that meets the accuracy requirement, together with a model file that contains the structure of the neural network and the weight parameters learned during training.
As an alternative embodiment, an accuracy of not less than 93% is considered to meet the accuracy requirement.
As an alternative embodiment, for the semantic segmentation network, the accuracy is measured by the mean intersection-over-union (mIOU);
for the image recognition network, the accuracy is measured by the mean average precision (mAP).
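As an illustration, a minimal sketch of the mIOU computation for the segmentation accuracy. The class indexing and the skipping of classes absent from both maps are assumptions; mAP for the detection network follows the usual per-class average precision and is omitted here.

    import numpy as np

    def mean_iou(pred, gt, num_classes):
        # pred and gt are integer label maps of the same shape.
        ious = []
        for c in range(num_classes):
            inter = np.logical_and(pred == c, gt == c).sum()
            union = np.logical_or(pred == c, gt == c).sum()
            if union > 0:                  # skip classes absent from both maps
                ious.append(inter / union)
        return float(np.mean(ious))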
As an alternative embodiment, the training set images are selected during the training processing so that the input resolution is large enough and the images are clear enough, because some relatively small problems (e.g., small holes, fine cracks) can only be identified in sufficiently clear images. For room problem identification, for example, the input resolution is not lower than 1333 x 800 pixels.
As an alternative embodiment, the training set images are selected during the training processing so that, as far as possible, the wall, floor and ceiling surfaces are free of occluding objects, and overexposure, underexposure, blur and noise in the image are avoided.
As an alternative embodiment, during training the learning rate defaults to 0.01 and the number of training iterations defaults to 600,000. Because the types of room problems are relatively complex, the initial learning rate must be large enough and the number of iterations sufficient; a warmup + cosine (warm-up phase plus cosine annealing) learning rate schedule is used so that the training can approach the global optimal solution.
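A minimal sketch of such a warm-up plus cosine-annealing schedule follows. The warm-up length and the floor learning rate are assumptions; base_lr = 0.01 and the 600,000 iterations follow the defaults stated above.

    import math

    def learning_rate(step, base_lr=0.01, warmup_steps=1000,
                      total_steps=600000, min_lr=1e-5):
        if step < warmup_steps:                 # linear warm-up phase
            return base_lr * step / warmup_steps
        progress = (step - warmup_steps) / (total_steps - warmup_steps)
        # cosine annealing from base_lr down to min_lr
        return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))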
As an alternative embodiment, during the test processing the test set images are selected under the same requirements as the training set images.
As an alternative embodiment, for the training set images and the test set images, a corresponding identification label is generated for each image with a labeling tool. The labeling tool may be the labelImg tool, the identification label is a label file in xml or json format, and at least the following information is stored in the label file:
the image name, the name of each labeling frame, and the position of each labeling frame. The name of a labeling frame corresponds to the type of the problem area; for example, if an image contains a crack, the name of the labeling frame is 'crack'.
For example, during training, the training images used as input to the semantic segmentation network form the segmentation data set, in which each training image corresponds to a label file in json format,
and the training images used as input to the image recognition network form the recognition data set, in which each training image corresponds to a label file in xml format.
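For illustration, a minimal sketch of reading one such xml label file in the Pascal VOC style produced by labelImg. The tag names filename, object, name and bndbox follow that format; error handling is omitted.

    import xml.etree.ElementTree as ET

    def read_label(path):
        # Returns the image name and a list of (labeling frame name, box) pairs,
        # where box = (xmin, ymin, xmax, ymax).
        root = ET.parse(path).getroot()
        boxes = []
        for obj in root.findall("object"):
            bb = obj.find("bndbox")
            box = tuple(int(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
            boxes.append((obj.findtext("name"), box))
        return root.findtext("filename"), boxes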
As an alternative embodiment, when the identification labels are generated, two or more labeling frames may only be distributed linearly along the vertical or horizontal direction, with no intersection between labeling frames, and a labeled problem area may not branch or fork. Referring to fig. 4, in this example the labeling frames are distributed linearly along the vertical direction, there is no intersection between them, and the problem area has no branch or fork. A problem area that contains branches or forks does not generalize well, which is why two or more labeling frames may only be distributed linearly along the vertical or horizontal direction. Although fig. 4 only shows the labeling frames for a crack, other problem areas, such as bubbling, stains, or scribbles, can be labeled with frames in the same way as the crack.
As an alternative embodiment, during the training processing, label smoothing (label_smoothing) is applied to the labels. Specifically: the multi-class labels are converted into one-hot vectors; by default each class weight after one-hot encoding is 1, and these weights are then modified to soften the one-hot labels, which effectively suppresses overfitting when the loss function is calculated.
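A minimal sketch of the smoothing step, assuming a smoothing weight eps = 0.1 (the patent does not state the modified weights):

    import numpy as np

    def smooth_labels(class_ids, num_classes, eps=0.1):
        # One-hot encode, then soften: the true class gets 1 - eps and the
        # remaining probability mass eps is spread over all classes.
        one_hot = np.eye(num_classes)[np.asarray(class_ids)]
        return one_hot * (1.0 - eps) + eps / num_classes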
As an alternative embodiment, during the training processing the training images are divided into a training set and a verification set in a certain proportion, that is, the training set images and the test set images are divided in a certain proportion.
As an alternative embodiment, the training set accounts for 80% and the verification set for 20%.
Furthermore, the total number of training images is 50 to 1000.
The training images are divided into blank-room training images and finely-decorated-room training images, with at least 50 training images for each type of problem area on each component member.
Furthermore, to improve generalization and to address unbalanced and insufficient data, a mixup interpolation operation is also performed during training: two pictures are fused together in a certain proportion and used as input. Specifically:
the images are divided into a first image set containing problem areas and a second image set containing none; one image is taken from each set, the two images are fused in a certain proportion by the mixup interpolation operation and used as input, and the additional data samples (image samples) generated in this mixup fashion are used for training together with the original data set.
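A minimal sketch of the mixup fusion, with the mixing proportion drawn from a Beta distribution. The Beta parameter is an assumption; the patent only says 'a certain proportion', and both images are assumed to have been resized to the same shape.

    import numpy as np

    def mixup(img_a, label_a, img_b, label_b, alpha=1.5):
        # Fuse one image containing a problem area with one containing none.
        lam = np.random.beta(alpha, alpha)
        img = lam * img_a.astype(np.float32) + (1.0 - lam) * img_b.astype(np.float32)
        label = lam * label_a + (1.0 - lam) * label_b
        return img, label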
Furthermore, a test-time augmentation (TTA) operation is also performed: several different transformations are applied to a sample image to obtain several different prediction results, which are then averaged to improve precision. For example, 3x TTA comprises: a random crop of the picture with a random horizontal flip, a random crop of the picture with a random vertical flip, and the original picture.
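A minimal sketch of the 3x TTA just described, assuming the model accepts variable-sized inputs and returns fixed-size class scores; the crop ratio and the averaging of raw scores are assumptions.

    import numpy as np

    def predict_with_tta(model, img, crop_ratio=0.9):
        h, w = img.shape[:2]
        ch, cw = int(h * crop_ratio), int(w * crop_ratio)

        def random_crop(x):
            y0 = np.random.randint(0, h - ch + 1)
            x0 = np.random.randint(0, w - cw + 1)
            return x[y0:y0 + ch, x0:x0 + cw]

        views = [img,
                 random_crop(img)[:, ::-1],   # random crop + horizontal flip
                 random_crop(img)[::-1, :]]   # random crop + vertical flip
        return np.mean([model(v) for v in views], axis=0)  # average predictions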
On the basis of the above technical solution, the semantic segmentation network is any one of: Deeplabv3, Mask-RCNN, Unet;
the image recognition network is any one of: Fast RCNN, Cascade RCNN.
Further, when the Cascade RCNN image recognition network is used, at least 9 anchor boxes of different sizes are obtained through a distance-based k-means clustering algorithm, and the generated anchor boxes are written into the configuration file used for training.
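A minimal sketch of the distance-based k-means step over the labeled box sizes. Plain Euclidean distance over (width, height) pairs is an assumption; a YOLO-style 1 - IOU distance would also fit the description. At least k labeled boxes are assumed.

    import numpy as np

    def kmeans_anchors(box_wh, k=9, iters=100):
        # box_wh: array of shape (N, 2) holding (width, height) of labeled boxes.
        wh = np.asarray(box_wh, dtype=np.float64)
        centers = wh[np.random.choice(len(wh), k, replace=False)]
        for _ in range(iters):
            dist = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
            assign = dist.argmin(axis=1)
            for i in range(k):
                if np.any(assign == i):
                    centers[i] = wh[assign == i].mean(axis=0)
        return centers  # k anchor box sizes to write into the training config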
On the basis of the above technical solution, during the training processing, after the training images have been iterated a specified number of times, the loss function value (loss) and accuracy (acc) on the training set and the verification set are output, and a loss/acc curve is drawn and saved when training finishes,
where:
in the semantic segmentation network, the loss function is the cross-entropy loss function;
in the image recognition network, the loss functions are the cross-entropy loss function, the smooth L1 loss function, and the sum_ohem loss function.
Because the samples include many hard-to-learn examples, the sample imbalance problem is addressed through the sum_ohem (online hard example mining) loss function.
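For illustration, a minimal sketch of an OHEM-style loss of the kind named above, keeping only the hardest fraction of per-sample losses. The keep ratio is an assumption, and this is a generic illustration rather than the exact sum_ohem implementation.

    import torch
    import torch.nn.functional as F

    def ohem_cross_entropy(logits, targets, keep_ratio=0.25):
        # Per-sample cross-entropy, then average only the hardest examples.
        losses = F.cross_entropy(logits, targets, reduction="none")
        k = max(1, int(keep_ratio * losses.numel()))
        return losses.topk(k).values.mean()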
The following is a specific implementation example.
A room image in jpg format is input, and the output format is as follows:
[{"image_id": "public-20190906-FvPEmPiZk1Hj0OlGnL81FIvoEYNz",
"tag": "wall_tag",
"status": 1,
"error": []}]
where:
image_id: the unique id of the image;
tag: which type of room component member the output refers to; the wall surface is wall_tag, the floor surface is floor_tag, and the ceiling surface is top_tag;
status: the result output after the IOU calculation that follows the semantic segmentation model and the image recognition model, where 1 means 'good, no problem' and 0 means 'problem present';
error: error information output.
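For illustration, a minimal sketch of assembling one output record in the format above. The helper name and the problem-list argument are illustrative, not taken from the patent text.

    def build_record(image_id, tag, problem_areas):
        # tag: "wall_tag", "floor_tag" or "top_tag"; problem_areas: the problem
        # boxes whose center points fall inside this component member region.
        return {
            "image_id": image_id,
            "tag": tag,
            "status": 0 if problem_areas else 1,  # 1 = good, no problem; 0 = problem
            "error": [],
        }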
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A room state identification method, comprising:
acquiring a room image to be identified;
inputting the room image to be identified into a semantic segmentation network, and dividing the component members of the room in the room image to obtain a first identification image;
the component members include at least: wall surface, floor surface, ceiling surface;
inputting the room image to be identified into an image identification network, and identifying the problem areas of the room in the room image to obtain a second identification image;
the problem areas include at least: stained areas, damaged areas, defective areas.
2. The identification method according to claim 1, characterized in that the room image comes from a still picture or a video clip,
the room image is an image of the interior condition of the room, where the interior condition refers to the condition of the indoor wall, floor and ceiling surfaces and includes at least any one, some, or all of the following:
the condition of the walls inside the room,
the condition of the floor inside the room,
the condition of the ceiling inside the room;
there is at least one room image, and when there are multiple room images, together they completely cover the wall, floor and ceiling surfaces of the room;
the room images are shot from the same angle or from different angles;
and the wall, floor or ceiling shown in one room image may partially or fully repeat that shown in another.
3. The identification method according to claim 1, wherein a stained area is an area where stains are present,
a damaged area is an area where cracks or damage are present,
and a defective area is an area where part of the structure is missing;
during the target detection processing, the finishing state of the room is identified in the room image, the finishing state being divided into: blank (unfinished) room, finely decorated (hardcover) room.
4. The identification method according to claim 1, further comprising: performing an IOU calculation on the first identification image and the second identification image, and outputting the correspondence between the component members and the problem areas.
5. The identification method according to claim 4, wherein the correspondence comprises:
the area coordinates and category of each region after division into wall surface, floor surface and ceiling surface,
the coordinates of each problem area, typically the coordinate values of its upper-left and lower-right points,
and a judgment, based on its center point, of whether each problem area lies on the wall surface, the floor surface, or the ceiling surface.
6. The identification method according to claim 4, further comprising visualizing the correspondence between the component members and the problem areas, which at least comprises:
forming a first prediction frame around each component member region,
and forming a second prediction frame around each problem area.
7. The identification method according to claim 6, further comprising manual error correction processing: selecting the first prediction frame or the second prediction frame, marking any wrongly displayed content to form a manual error-correction image, and transmitting the manual error-correction image to the client or the cloud.
8. The identification method according to claim 1, wherein, before the room image to be identified is input, the semantic segmentation network and the image recognition network each go through the following processing in sequence:
training processing, namely inputting training images into the semantic segmentation network or the image recognition network and extracting features from the images to be trained through multiple rounds of forward propagation and backward propagation, finally forming a prediction model;
test processing, namely inputting test images into the semantic segmentation network or image recognition network that has finished training, analyzing the test images with the prediction model to form prediction results, and evaluating the accuracy of the prediction model on the basis of those results;
storing the trained parameters of any prediction model that meets the accuracy requirement, together with a model file containing the structure of the neural network and the weight parameters learned during training;
an accuracy of not less than 93% is considered to meet the accuracy requirement;
for the semantic segmentation network, the accuracy is measured by the mean intersection-over-union mIOU,
and for the image recognition network, the accuracy is measured by the mean average precision mAP.
9. The identification method according to claim 8, wherein, during the training processing, the training set images are selected so that the input resolution is not lower than 1333 x 800;
the training set images are selected so that, as far as possible:
the wall, floor and ceiling surfaces are free of occluding objects,
and overexposure, underexposure, blur and noise in the image are avoided;
during training, the learning rate defaults to 0.01 and the number of training iterations defaults to 600,000;
during the test processing, the test set images are selected under the same requirements as the training set images.
10. The identification method according to claim 9, wherein, for the training set images and the test set images, a corresponding identification label is generated for each image with a labeling tool,
and at least the following information is stored in the label file:
the image name, the name of each labeling frame, and the position of each labeling frame;
when the identification labels are generated, two or more labeling frames may only be distributed linearly along the vertical or horizontal direction, with no intersection between labeling frames, and a labeled problem area may not branch or fork.
CN202010172320.0A 2020-03-12 2020-03-12 Room state identification method Pending CN111275020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010172320.0A CN111275020A (en) 2020-03-12 2020-03-12 Room state identification method


Publications (1)

Publication Number Publication Date
CN111275020A true CN111275020A (en) 2020-06-12

Family

ID=70997784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010172320.0A Pending CN111275020A (en) 2020-03-12 2020-03-12 Room state identification method

Country Status (1)

Country Link
CN (1) CN111275020A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230339A (en) * 2018-01-31 2018-06-29 浙江大学 A kind of gastric cancer pathological section based on pseudo label iteration mark marks complementing method
CN109859171A (en) * 2019-01-07 2019-06-07 北京工业大学 A kind of flooring defect automatic testing method based on computer vision and deep learning
CN109840883A (en) * 2019-01-10 2019-06-04 深圳前海达闼云端智能科技有限公司 A kind of method, apparatus and calculating equipment of trained object identification neural network
CN110059625A (en) * 2019-04-18 2019-07-26 重庆大学 A kind of face training and recognition methods based on mixup
CN110223349A (en) * 2019-05-05 2019-09-10 华南农业大学 A kind of picking independent positioning method
CN110738207A (en) * 2019-09-10 2020-01-31 西南交通大学 character detection method for fusing character area edge information in character image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUI ZHANG ET AL.: "Automatic Kidney Lesion Detection for CT Images Using Morphological Cascade Convolutional Neural Networks" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232349A (en) * 2020-09-23 2021-01-15 成都佳华物链云科技有限公司 Model training method, image segmentation method and device
CN112232349B (en) * 2020-09-23 2023-11-03 成都佳华物链云科技有限公司 Model training method, image segmentation method and device
CN114255247A (en) * 2021-12-22 2022-03-29 中国农业科学院农业资源与农业区划研究所 Hilly land block depth segmentation and extraction method based on improved Unet + + network model

Similar Documents

Publication Publication Date Title
CN109615611B (en) Inspection image-based insulator self-explosion defect detection method
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN111784685A (en) Power transmission line defect image identification method based on cloud edge cooperative detection
CN111862119A (en) Semantic information extraction method based on Mask-RCNN
TW202027007A (en) Computer-executed method and apparatus for assessing vehicle damage
CN109767422A (en) Pipe detection recognition methods, storage medium and robot based on deep learning
CN109671071A (en) A kind of underground piping defect location and grade determination method based on deep learning
CN113971779A (en) Water gauge automatic reading method based on deep learning
CN111368682B (en) Method and system for detecting and identifying station caption based on master RCNN
CN111258433A (en) Teaching interactive system based on virtual scene
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy
CN111696079B (en) Surface defect detection method based on multitask learning
CN111275020A (en) Room state identification method
CN114022761A (en) Detection and positioning method and device for power transmission line tower based on satellite remote sensing image
CN111027538A (en) Container detection method based on instance segmentation model
CN114359235A (en) Wood surface defect detection method based on improved YOLOv5l network
CN111612747A (en) Method and system for rapidly detecting surface cracks of product
CN116071294A (en) Optical fiber surface defect detection method and device
CN114881987A (en) Improved YOLOv 5-based hot-pressing light guide plate defect visual detection method
CN110751170A (en) Panel quality detection method, system, terminal device and computer readable medium
CN113962929A (en) Photovoltaic cell assembly defect detection method and system and photovoltaic cell assembly production line
CN111738264A (en) Intelligent acquisition method for data of display panel of machine room equipment
CN116740758A (en) Bird image recognition method and system for preventing misjudgment
CN114926675A (en) Method and device for detecting shell stain defect, computer equipment and storage medium
CN113673534A (en) RGB-D image fruit detection method based on fast RCNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200612)