CN111967443A - Image processing and BIM-based method for analyzing interested area in archive - Google Patents

Image processing and BIM-based method for analyzing interested area in archive

Info

Publication number
CN111967443A
Authority
CN
China
Prior art keywords
human body
showcase
archive
bim
visitors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010920952.0A
Other languages
Chinese (zh)
Inventor
邵传宏
徐彩营
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202010920952.0A priority Critical patent/CN111967443A/en
Publication of CN111967443A publication Critical patent/CN111967443A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The invention relates to a method, based on image processing and BIM (building information modeling), for analyzing areas of interest in an archive. The method first constructs a BIM of the archive together with its information exchange module and collects images of visitors' bodies. A DNN neural network then performs multi-target pose estimation on these images to obtain a bounding box for each human target, detects the position of each target's feet as a pair of foot key points, and projects the foot key points onto the BIM floor of the archive in real time using a projective transformation. From the foot key points and the change of their coordinates over time, the method derives each visitor's stay duration and viewing state in a showcase area. Finally, it judges visitors' degree of interest in the showcase area from the total number of visitors entering the area within a set time period, their stay durations, and their viewing states. The method can quickly analyze visitors' degree of interest in the showcases of an archive, with high accuracy and strong real-time performance.

Description

Image processing and BIM-based method for analyzing interested area in archive
Technical field:
The invention relates to a method for analyzing areas of interest in an archive, and in particular to a method for analyzing areas of interest in an archive based on image processing and BIM.
Background art:
Archives are historical records of preservation value, in many forms, formed directly by people in various social activities, and are among the most important witnesses of human history. Their carriers are diverse: besides paper documents, charts and photographic records, they may also be audio recordings, video recordings, physical objects, or the electronic files of the current digital era.
Most people come to an archive to consult material, especially files related to themselves, such as the personnel files of themselves or their relatives, the files of units they have worked for, or personal real-estate files. Archives also open some precious historical documents, regularly or irregularly, for people to visit and study; to protect these documents from damage, they are placed in dedicated file showcases. To learn which historical documents and materials interest visitors, archives do some statistical work, generally per showcase: counting changes in the number and density of visitors in front of a showcase, or the dwell time of each target. This rough approach, however, easily leads to wrong judgments, and the results are neither accurate nor reliable. In practice, some people stay in a showcase area for a long time without watching the showcase items at all; they are not interested in the showcase, yet judging by target counts, density and stay duration would inflate the estimated interest. For example, a tour group or a class of students may crowd a showcase area, but the number of people actually interested in the exhibits is far smaller than the counts and densities suggest; likewise, some visitors in front of a showcase merely hurry past while others stand and watch carefully. Using only the number, density and dwell time of visitors in front of a showcase to reflect their degree of interest is therefore inappropriate; a proper measure must also account for how long visitors stand in front of the showcase and whether they are actually viewing the files, that is, the visitors' state.
Because the number of visitors is large, the existing manual counting or ordinary electronic counting methods can only roughly estimate visitors' interests and cannot accurately reflect the degree of interest in a particular showcase.
Summary of the invention:
The technical problem to be solved by the invention is as follows: to provide a method for analyzing areas of interest in an archive based on image processing and BIM that can quickly analyze visitors' degree of interest in the showcases of an archive, with high accuracy and strong real-time performance.
The technical solution of the invention is as follows:
A method for analyzing areas of interest in an archive based on image processing and BIM comprises the following steps:
Step 1, construct a BIM (building information model) of the archive and its information exchange module. The BIM mainly comprises the acquired camera perception information, the geographical position information of the archive, and the current environment information of the archive; the information exchange module is an access module for a CIM (city information model) database.
Computer-based visual detection has the notable advantages of being non-contact, efficient and economical, and has broad application prospects in many detection and management applications, so combining BIM with computer vision can effectively improve monitoring efficiency. The visual detection results are uploaded as information to a WebGIS and visualized there; supervisors can search, query and analyze on the web, monitor crowd flow in the archive in real time, and learn the popularity of each showcase and the interests of the crowd.
Step 2, use a camera in the archive to photograph the visitors in any showcase area and collect images of their bodies;
Step 3, in the images captured by the camera the number of human bodies is unknown, and the bodies have complex interrelations such as contact, occlusion and articulation, which make it difficult to establish the relations between joints; moreover, the running speed of general methods depends strongly on the number of targets in the image, so real-time performance is hard to achieve. A DNN neural network is therefore used to perform multi-target pose estimation on the collected images of visitors' bodies, obtaining the human targets in a multi-target scene, marking each human target in the image with a bounding box, and obtaining the total number P of people entering the showcase area within a set time period, thereby realizing multi-target detection: human body detection, people counting and stay-duration sensing.
The specifics of training the DNN neural network as the human body detection network are as follows:
(1) the data collection adopts the human body image of the visitor in the archive, which is shot by the camera, and the human body image comprises a large-density visitor group image in the archive;
(2) the data labels are x, y, w and h, where x and y are the coordinates of the center of the bounding box, w is its width and h is its height; during labeling, partially occluded human bodies should also be annotated with bounding boxes; the labels x, y, w and h need to be normalized;
(3) the loss function is a mean square error loss function.
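As a rough illustration of the label normalization described above, the box values can be divided by the image dimensions so that all labels lie in [0, 1]. The image size and box values below are hypothetical, not taken from the patent:

```python
def normalize_bbox(x, y, w, h, img_w, img_h):
    """Normalize a bounding-box label (center x, center y, width, height)
    to the [0, 1] range, as is common for detection training labels."""
    return x / img_w, y / img_h, w / img_w, h / img_h

# Hypothetical example: a 100x200-pixel box centered at (320, 240)
# in a 640x480 image.
print(normalize_bbox(320, 240, 100, 200, 640, 480))
# (0.5, 0.5, 0.15625, 0.4166666666666667)
```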
Step 4, detect the foot regions in the image corresponding to each human target with a DNN neural network, and mark them in the image as key points. The foot key points are chosen for the following reason: the human body is three-dimensional in space, and the projection converts an oblique viewing angle into a top-down view referenced to the ground. All other key points of the body lie at some spatial distance above the ground and would always produce a large error after projection, possibly even making the projected point fall into another showcase area and corrupting the judgment. Choosing the two foot key points reduces the error between the projected position and the actual position as much as possible.
The specifics of training the DNN neural network as the two-foot key point detection network are as follows:
(1) the data set adopts cut human body images of visitors, which should include human body images with two blocked feet;
(2) the labels are the key points of the two feet of the human body: a left-foot key point and a right-foot key point. The labeling process is as follows: each key point type corresponds to a single channel; the pixel position of the key point is marked in that channel, and Gaussian blur is then applied to form a key point hot spot at the marked point. Two types of key points are used in total, the left-foot key point and the right-foot key point, so the label image comprises two channels; key points of occluded bodies are also labeled;
(3) the loss function is a mean square error loss function.
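One plausible way to build the two-channel label described above is to place a unit peak at each foot's pixel and spread it with a Gaussian; the label size and the Gaussian width below are assumptions for illustration, not values from the patent:

```python
import numpy as np

def keypoint_heatmap(h, w, cx, cy, sigma=3.0):
    """Single-channel label: a Gaussian 'hot spot' centered on the
    keypoint pixel (cx, cy), peaking at 1.0."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

# Two channels: left-foot and right-foot key points (positions hypothetical).
label = np.stack([keypoint_heatmap(64, 64, 20, 50),
                  keypoint_heatmap(64, 64, 30, 50)])
print(label.shape)       # (2, 64, 64)
print(label[0, 50, 20])  # 1.0 -- the peak at the left-foot keypoint
```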
Step 5, after the two foot key points of each target in the image are obtained, project the foot key point positions from the image of step 4 onto the BIM floor in real time using a projective transformation according to the imaging principle, so that the overall judgment can be made from a top-down view in the BIM space.
Step 6, judge the stay duration of each human target from the change of the foot key point coordinates over time, and obtain the visitor's stay coefficient in the showcase area for the subsequent interest evaluation.
Human body detection thus provides target counting and tracking.
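Step 5's projection can be sketched with a planar homography from the image plane to the BIM floor plane, estimated from four known image-to-floor correspondences. All coordinates below are made up for illustration; the patent does not give calibration values:

```python
import numpy as np

def fit_homography(src, dst):
    """Solve the 3x3 homography H mapping image points to floor-plan
    points from four point correspondences (DLT with h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pt):
    """Apply H to an image point, returning BIM floor coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

# Hypothetical calibration: four image points on the visible floor and
# their known positions on the BIM floor plan (in metres).
img_pts = [(100, 400), (540, 400), (620, 80), (20, 80)]
floor_pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0), (0.0, 6.0)]
H = fit_homography(img_pts, floor_pts)
print(project(H, (100, 400)))  # ≈ (0.0, 0.0)
```

A production system would more likely use `cv2.getPerspectiveTransform`, but the linear algebra is the same.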
Step 7, obtain the orientation of the human body from the two foot key points and judge whether the body faces the showcase; if it does, the body is considered to be in a viewing state. The ratio of the viewing duration to the stay duration gives the visitor's viewing coefficient in the showcase area.
Connect the two foot key points: the midpoint of the connecting line represents the overall position of the body, the perpendicular to the connecting line represents the forward direction of the body, and the orientation is identified from the distinction between the left foot and the right foot.
The purpose of obtaining the orientation is to make the judgment more accurate: even if a visitor stays in the showcase area for a long time, if he is not facing the showcase, his estimated interest in the showcase area should be reduced.
Whether the body faces the showcase is judged as follows: the coordinates (X0, Y0) of the showcase are obtained from the BIM. After the foot key points are projected onto the BIM floor, their coordinates and the orientation are obtained, with the horizontal axis as the X axis and the vertical axis as the Y axis. Let the left-foot key point be (Xa, Ya) and the right-foot key point be (Xb, Yb). If Xa < Xb and Ya > Yb, the body faces the upper right; if in addition Xa < X0 and Yb < Y0, the person is considered to be viewing the showcase.
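The coordinate test above can be written out directly. This is a literal transcription of the one quadrant the patent spells out (facing upper right); a real deployment would need the symmetric cases for the other orientations, which the text only implies:

```python
def facing_upper_right(left, right):
    """Body faces upper-right when Xa < Xb and Ya > Yb."""
    (xa, ya), (xb, yb) = left, right
    return xa < xb and ya > yb

def viewing(left, right, showcase):
    """Viewing state per the patent's example: facing upper-right with
    the showcase at (X0, Y0) such that Xa < X0 and Yb < Y0."""
    (xa, ya), (xb, yb) = left, right
    x0, y0 = showcase
    return facing_upper_right(left, right) and xa < x0 and yb < y0

# Hypothetical BIM floor coordinates: left foot at (1.0, 2.0),
# right foot at (1.5, 1.5), showcase at (3.0, 3.0).
print(viewing((1.0, 2.0), (1.5, 1.5), (3.0, 3.0)))  # True
```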
Step 8, accumulate the degrees of interest of all visitors in the showcase area to obtain the total degree of interest, and hence the popularity level, of the showcase area. The number of people appearing in the showcase area is obtained from the IDs of the target boxes in the image detection, and each visitor's degree of interest is the viewing time within that visitor's stay time, i.e. the product of the stay coefficient and the viewing coefficient.
The overall interest level L of the showcase area is:
L = Σ_{i=1}^{P} T_i · M_i
where P is the total number of people entering the showcase area within the set time period, T_i is the stay coefficient of the i-th visitor in the showcase area, and M_i is the viewing coefficient of the i-th visitor in the showcase area.
In step 1, the information exchange form of the information exchange module is RESTful or MQ.
In step 3, two methods are commonly used for multi-target pose estimation: the top-down method and the bottom-up method. The top-down method first detects the targets with a detector and then estimates the pose of each detected target individually. The bottom-up method first detects all key points and then groups them into individuals according to Part Affinity Fields (PAFs). The invention uses the top-down method for multi-target pose estimation: first detect the visitor's body to obtain a bounding box, then crop the image to obtain a cropped image, send the cropped image to the two-foot detection module to obtain the target's foot key points, and finally project them onto the BIM floor using the projective transformation.
The center point, width and height of the human bounding box are obtained by regression using the CenterNet method.
Multi-target tracking is realized by computing the intersection over union (IoU) of bounding boxes in adjacent frames, which also avoids detecting the same target repeatedly with different bounding boxes: the IoU of two adjacent bounding boxes is computed, and when its value is greater than 0.7 the two boxes are judged to be detections of the same target.
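The adjacent-frame association rule can be sketched as follows, with boxes given as (x1, y1, x2, y2) corners; the 0.7 threshold is from the text, the box values are hypothetical:

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def same_target(a, b, thresh=0.7):
    """Boxes in adjacent frames are treated as one target if IoU > 0.7."""
    return iou(a, b) > thresh

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))          # 1.0
print(same_target((0, 0, 10, 10), (1, 0, 11, 10)))  # True (IoU ≈ 0.818)
```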
The invention has the beneficial effects that:
1. The invention combines computer vision detection with BIM and determines visitors' degree of interest in the showcases of an archive by analyzing and computing the time a visitor faces a showcase within the total stay time, giving more accurate results that match the actual situation.
2. A DNN neural network processes the collected multi-target images of visitors' bodies; even when visitors are numerous and their bodies have complex interrelations such as contact, occlusion and articulation, all human targets in the image can be marked with bounding boxes, achieving real-time analysis with strong real-time performance.
3. The analysis is fast, and the results can be uploaded to a supervision network where supervisors can search, query and analyze on the web, overcoming the shortcomings of existing manual counting or ordinary electronic counting methods.
Description of the drawings:
FIG. 1 is a schematic diagram of a method for determining whether a human body faces a showcase in an image processing and BIM-based method for analyzing an area of interest in an archive;
In the figure: 1, left-foot key point; 2, right-foot key point; 3, center point; 4, orientation; 5, showcase.
Detailed description of embodiments:
The method for analyzing areas of interest in an archive based on image processing and BIM comprises the following steps:
Step 1, construct a BIM (building information model) of the archive and its information exchange module. The BIM mainly comprises the acquired camera perception information, the geographical position information of the archive, and the current environment information of the archive; the information exchange module is an access module for a CIM (city information model) database.
Computer-based visual detection has the notable advantages of being non-contact, efficient and economical, and has broad application prospects in many detection and management applications, so combining BIM with computer vision can effectively improve monitoring efficiency. The visual detection results are uploaded as information to a WebGIS and visualized there; supervisors can search, query and analyze on the web, monitor crowd flow in the archive in real time, and learn the popularity of each showcase 5 and the interests of the crowd.
Step 2, use a camera in the archive to photograph the visitors in any area of the showcase 5 and collect images of their bodies;
Step 3, in the images captured by the camera the number of human bodies is unknown, and the bodies have complex interrelations such as contact, occlusion and articulation, which make it difficult to establish the relations between joints; moreover, the running speed of general methods depends strongly on the number of targets in the image, so real-time performance is hard to achieve. A DNN neural network is therefore used to perform multi-target pose estimation on the collected images of visitors' bodies, obtaining the human targets in a multi-target scene, marking each human target in the image with a bounding box, and obtaining the total number P of people entering the area of the showcase 5 within a set time period, thereby realizing multi-target detection: human body detection, people counting and stay-duration sensing.
The specifics of training the DNN neural network as the human body detection network are as follows:
(1) the data collection adopts the human body image of the visitor in the archive, which is shot by the camera, and the human body image comprises a large-density visitor group image in the archive;
(2) the data labels are x, y, w and h, where x and y are the coordinates of the center of the bounding box, w is its width and h is its height; during labeling, partially occluded human bodies should also be annotated with bounding boxes; the labels x, y, w and h need to be normalized;
(3) the loss function is a mean square error loss function.
Step 4, detect the foot regions in the image corresponding to each human target with a DNN neural network, and mark them in the image as key points. The foot key points are chosen for the following reason: the human body is three-dimensional in space, and the projection converts an oblique viewing angle into a top-down view referenced to the ground. All other key points of the body lie at some spatial distance above the ground and would always produce a large error after projection, possibly even making the projected point fall into the area of another showcase 5 and corrupting the judgment. Choosing the two foot key points reduces the error between the projected position and the actual position as much as possible.
The specifics of training the DNN neural network as the two-foot key point detection network are as follows:
(1) the data set adopts cut human body images of visitors, which should include human body images with two blocked feet;
(2) the labels are the key points of the two feet of the human body: a left-foot key point 1 and a right-foot key point 2. The labeling process is as follows: each key point type corresponds to a single channel; the pixel position of the key point is marked in that channel, and Gaussian blur is then applied to form a key point hot spot at the marked point. Two types of key points are used in total, the left-foot key point 1 and the right-foot key point 2, so the label image comprises two channels; key points of occluded bodies are also labeled;
(3) the loss function is a mean square error loss function.
Step 5, after the two foot key points of each target in the image are obtained, project the foot key point positions from the image of step 4 onto the BIM floor in real time using a projective transformation according to the imaging principle, so that the overall judgment can be made from a top-down view in the BIM space.
Step 6, judge the stay duration of each human target from the change of the foot key point coordinates over time, and obtain the visitor's stay coefficient in the area of the showcase 5 for the subsequent interest evaluation.
Human body detection thus provides target counting and tracking. For a target newly appearing in the image, its mark value starts at 1; the system samples once every 3 s, and if the projected coordinates of the target's key points still lie inside the set area of the showcase 5 in the next sampled frame, the mark value is incremented by 1. The stay duration of a target is obtained by accumulating its mark value. For stays from a single sample up to 10 minutes (i.e. remaining in the area of the showcase 5 for 10 minutes or less), the mark value ranges over [1, 200]; normalizing it yields the stay coefficient T.
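With the 3 s sampling period, the count-to-coefficient mapping just described can be sketched as follows. Clamping counts above 200 (stays longer than 10 minutes) is an assumption consistent with the stated [1, 200] range, not something the patent spells out:

```python
def stay_coefficient(mark_value, max_marks=200):
    """Normalize the per-target presence count (one count per 3 s sample,
    range [1, 200] for stays of up to 10 minutes) to a coefficient in (0, 1]."""
    return min(mark_value, max_marks) / max_marks

print(stay_coefficient(1))    # 0.005 -- a single 3 s sample
print(stay_coefficient(200))  # 1.0   -- a full 10-minute stay
```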
Step 7, obtain the orientation 4 of the human body from the two foot key points and judge whether the body faces the showcase 5; if it does, the body is considered to be in a viewing state. The ratio of the viewing duration to the stay duration gives the visitor's viewing coefficient in the area of the showcase 5.
Connect the two foot key points: the midpoint 3 of the connecting line represents the overall position of the body, the perpendicular to the connecting line represents the forward direction of the body, and the orientation 4 is identified from the distinction between the left foot and the right foot, as shown in FIG. 1.
The purpose of obtaining the orientation 4 is to make the judgment more accurate: even if a visitor stays in the area of the showcase 5 for a long time, if he is not facing the showcase 5, his estimated interest in the area should be reduced.
Whether the body faces the showcase 5 is judged as follows: the coordinates (X0, Y0) of the showcase 5 are obtained from the BIM. After the foot key points are projected onto the BIM floor, their coordinates and the orientation 4 are obtained; as shown in FIG. 1, with the horizontal axis as the X axis and the vertical axis as the Y axis, let the left-foot key point 1 be (Xa, Ya) and the right-foot key point 2 be (Xb, Yb). If Xa < Xb and Ya > Yb, the body faces the upper right; if in addition Xa < X0 and Yb < Y0, the person is considered to be viewing the showcase 5.
Assume the sampling period is 3 s, and each sampled frame is classified as either viewing or not viewing. If a target stays for 5 minutes, that corresponds to 100 frames; if 90 of those frames are counted as the viewing state, the viewing coefficient is M = 90/100 = 0.9.
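Following the worked example above (3 s sampling, a 5-minute stay of 100 frames, 90 of them in the viewing state):

```python
def viewing_coefficient(viewing_frames, total_frames):
    """Ratio of viewing time to stay time, per the patent's definition."""
    return viewing_frames / total_frames

M = viewing_coefficient(90, 100)
print(M)  # 0.9
```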
Step 8, accumulate the degrees of interest of all visitors in the area of the showcase 5 to obtain the total degree of interest, and hence the popularity level, of the area. The number of people appearing in the area of the showcase 5 is obtained from the IDs of the target boxes in the image detection, and each visitor's degree of interest is the viewing time within that visitor's stay time, i.e. the product of the stay coefficient and the viewing coefficient.
The overall interest level L of the area of the showcase 5 is:
L = Σ_{i=1}^{P} T_i · M_i
where P is the total number of people entering the area of the showcase 5 within the set time period, T_i is the stay coefficient of the i-th visitor in the area of the showcase 5, and M_i is the viewing coefficient of the i-th visitor in the area of the showcase 5.
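The accumulation in step 8 is then a simple sum of per-visitor products; the coefficient values below are hypothetical:

```python
def overall_interest(stay_coeffs, view_coeffs):
    """L = sum over the P detected visitors of T_i * M_i."""
    return sum(t * m for t, m in zip(stay_coeffs, view_coeffs))

# Three hypothetical visitors in the showcase area: one short viewer,
# one long attentive viewer, one passer-by who never faces the case.
T = [0.5, 1.0, 0.1]
M = [0.9, 0.8, 0.0]
print(overall_interest(T, M))  # 0.5*0.9 + 1.0*0.8 + 0.1*0.0 = 1.25
```

Note that the passer-by contributes nothing: a long stay without the viewing state does not inflate L, which is the point of combining the two coefficients.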
In step 1, the information exchange form of the information exchange module is RESTful or MQ.
In step 3, two methods are commonly used for multi-target pose estimation: the top-down method and the bottom-up method. The top-down method first detects the targets with a detector and then estimates the pose of each detected target individually. The bottom-up method first detects all key points and then groups them into individuals according to Part Affinity Fields (PAFs). The invention uses the top-down method for multi-target pose estimation: first detect the visitor's body to obtain a bounding box, then crop the image to obtain a cropped image, send the cropped image to the two-foot detection module to obtain the target's foot key points, and finally project them onto the BIM floor using the projective transformation.
The center point, width and height of the human bounding box are obtained by regression using the CenterNet method.
Multi-target tracking is realized by computing the intersection over union (IoU) of bounding boxes in adjacent frames, which also avoids detecting the same target repeatedly with different bounding boxes: the IoU of two adjacent bounding boxes is computed, and when its value is greater than 0.7 the two boxes are judged to be detections of the same target.

Claims (5)

1. A method for analyzing areas of interest in an archive based on image processing and BIM, characterized by comprising the following steps:
step 1, constructing a BIM of an archive and an information exchange module thereof;
step 2, shooting visitors in any showcase area by using a camera in the archive, and collecting images of human bodies of the visitors;
step 3, carrying out multi-target posture estimation on the collected images of the human bodies of the visitors by using a DNN neural network to obtain human body targets under a multi-target scene, marking the human body targets in the images in a form of a surrounding frame, and obtaining the total number P of people entering the showcase area within a set time period;
step 4, detecting the feet parts in the image corresponding to each human body target by using a DNN neural network, and marking the feet parts in the image in a key point mode;
step 5, projecting the positions of the key points of the two feet in the image in the step 4 on the BIM ground in real time by using projection transformation;
step 6, judging the stay duration of the human target according to the change of the two-foot key point coordinates over time, to obtain the stay coefficient of the visitor in the showcase area;
step 7, obtaining the orientation of the human body by using the key points of the two feet, and judging whether the human body faces the showcase or not; if the human body faces the showcase, the human body is considered to be in a watching state; calculating the ratio of the watching time length and the stay time length of the human body to obtain the watching coefficient of the visitor in the showcase area;
step 8, accumulating the interest degrees of all visitors in the showcase area to obtain the total interest degree of the showcase area,
the overall interest level L of the showcase area being:
L = Σ_{i=1}^{P} T_i · M_i
wherein P is the total number of people entering the showcase area within a set time period, T_i is the stay coefficient of the i-th visitor in the showcase area, and M_i is the viewing coefficient of the i-th visitor in the showcase area.
2. The image processing and BIM based method for analyzing regions of interest in an archive as claimed in claim 1, wherein: in the step 1, the information exchange form of the information exchange module is RESTful or MQ.
3. The image processing and BIM based method for analyzing regions of interest in an archive as claimed in claim 1, wherein: in said step 3, a top-down method is used for multi-target pose estimation: first detecting the visitor's body to obtain a bounding box, and then cropping the image to obtain a cropped image.
4. The image processing and BIM based method of analyzing regions of interest in an archive as claimed in claim 3, wherein: the center point, width and height of the human bounding box are obtained by regression using the CenterNet method.
5. The image processing and BIM based method of analyzing regions of interest in an archive as claimed in claim 4, wherein: the intersection over union of two adjacent bounding boxes is computed, and when it is greater than 0.7 the two adjacent bounding boxes are judged to be detections of the same target.
CN202010920952.0A 2020-09-04 2020-09-04 Image processing and BIM-based method for analyzing interested area in archive Withdrawn CN111967443A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010920952.0A CN111967443A (en) 2020-09-04 2020-09-04 Image processing and BIM-based method for analyzing interested area in archive

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010920952.0A CN111967443A (en) 2020-09-04 2020-09-04 Image processing and BIM-based method for analyzing interested area in archive

Publications (1)

Publication Number Publication Date
CN111967443A true CN111967443A (en) 2020-11-20

Family

ID=73392158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010920952.0A Withdrawn CN111967443A (en) 2020-09-04 2020-09-04 Image processing and BIM-based method for analyzing interested area in archive

Country Status (1)

Country Link
CN (1) CN111967443A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642471A (en) * 2021-08-16 2021-11-12 百度在线网络技术(北京)有限公司 Image identification method and device, electronic equipment and storage medium
WO2023178729A1 (en) * 2022-03-24 2023-09-28 香港大学深圳研究院 Bim and video surveillance-based museum visit analysis method and system
CN117708373A (en) * 2024-02-06 2024-03-15 成都数之联科技股份有限公司 Animation interaction method and system
CN117708373B (en) * 2024-02-06 2024-04-05 成都数之联科技股份有限公司 Animation interaction method and system

Similar Documents

Publication Publication Date Title
Bao et al. Toward coherent object detection and scene layout understanding
CN111967443A (en) Image processing and BIM-based method for analyzing interested area in archive
Guan et al. Iterative tensor voting for pavement crack extraction using mobile laser scanning data
Ellis Performance metrics and methods for tracking in surveillance
CN107924461B (en) Method, circuit, equipment, system and the correlation computer executable code for being registrated and tracking for multifactor characteristics of image
CN102376061B (en) Omni-directional vision-based consumer purchase behavior analysis device
CN109190508B (en) Multi-camera data fusion method based on space coordinate system
US10043097B2 (en) Image abstraction system
CN102833486B (en) The method and device of face displaying ratio in a kind of real-time adjusting video images
CN106504274A (en) A kind of visual tracking method and system based under infrared camera
CN113256731A (en) Target detection method and device based on monocular vision
Shalaby et al. Algorithms and applications of structure from motion (SFM): A survey
Kukolj et al. Road edge detection based on combined deep learning and spatial statistics of LiDAR data
Heinrich et al. Yield prognosis for the agrarian management of vineyards using deep learning for object counting
Fehr et al. Reshaping our model of the world over time
CN113011359A (en) Method for simultaneously detecting plane structure and generating plane description based on image and application
Aziz et al. Head detection based on skeleton graph method for counting people in crowded environments
Irvine et al. Real-time video image quality estimation supports enhanced tracker performance
CN104123569A (en) Video person number information statistics method based on supervised learning
Liu et al. Automated building change detection using UltraCamD images and existing CAD data
Liu et al. Outdoor camera calibration method for a GPS & camera based surveillance system
Girisha et al. Tracking humans using novel optical flow algorithm for surveillance videos
Irvine et al. Video image quality analysis for enhancing tracker performance
Baca et al. Automated data annotation for 6-dof ai-based navigation algorithm development
Copeland et al. Computational models for search and discrimination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201120