CN115410121A - Video-based automatic determination method for close contacts, electronic device and storage medium - Google Patents

Video-based automatic determination method for close contacts, electronic device and storage medium

Info

Publication number
CN115410121A
CN115410121A
Authority
CN
China
Prior art keywords
image
human body
target object
bid
coordinate system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210987107.4A
Other languages
Chinese (zh)
Inventor
杨玉春
胡东平
叶新江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Merit Interactive Co Ltd
Original Assignee
Merit Interactive Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Merit Interactive Co Ltd filed Critical Merit Interactive Co Ltd
Priority to CN202210987107.4A
Publication of CN115410121A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention provides a video-based automatic determination method for close contacts. The method first acquires the corresponding surveillance video based on information related to a target object, then obtains from the surveillance video all images containing the target object, generates the action tracks of the target object based on all acquired images containing the target object, and then, for each frame image in each action track, acquires the persons who spatially intersect with the target object. The invention also provides an electronic device and a storage medium. The invention can quickly and accurately find the persons who intersect with the target object in time and space.

Description

Video-based automatic determination method for close contacts, electronic device and storage medium
Technical Field
The present invention relates to the field of data processing, and in particular to a video-based automatic determination method for close contacts, an electronic device, and a storage medium.
Background
In some application scenarios, it is necessary to identify the people who intersect with a target person in time and space. There are currently two main ways to acquire such related people. The first is to manually trace the action track of the target person and then interview as many of the related people on that track as possible. The second is to determine whether a person is a related person through an identifier set on a mobile terminal, and then acquire the related person's information based on the acquired action information and the action track of the target person. The first method can acquire relatively accurate personnel information but is time-consuming and labor-intensive. Compared with the first method, the second uses informational means and therefore saves time and labor, but the acquired information about related people is inaccurate because of low information density and large position errors.
Disclosure of Invention
In view of the above technical problem, the technical solution adopted by the present invention is as follows:
An embodiment of the present invention provides a video-based automatic determination method for close contacts, comprising the following steps:
S100, acquiring a first surveillance video about a target object;
S110, decoding the first surveillance video to obtain n1 frames of images, and identifying the human-body information in any frame image i among the n1 frames to obtain an identification information table C_i of image i, where the j-th row of C_i contains the identification information (BID_ij, P_ij, G_ij, t_i, CID_i) of the j-th human body identified in image i, BID_ij is the ID of the j-th human body identified in image i, P_ij is the image feature vector of the j-th human body identified in image i, G_ij is the position of the j-th human body identified in image i, t_i is the shooting time of image i, and CID_i is the ID of the camera that captured image i; j ranges from 1 to m(i), where m(i) is the number of human bodies identified in image i; i ranges from 1 to n1;
S120, based on any identification information table C_i, obtaining a similarity set S_i = (S_i1, S_i2, …, S_ij, …, S_im(i)), where S_ij is the similarity between P_ij in C_i and the image feature vector P in the image information of the target object;
S130, if max(S_i) ≥ K, storing image i into a first target image set, where K is a set similarity threshold;
S140, acquiring the action track sequences of the target object based on the first target image set;
S150, for any frame image r among the t(p) frame images corresponding to any action track sequence p, acquiring, based on the identification information table C_r of image r, the distance in the world coordinate system between the target object and any human body in image r other than the target object;
S160, acquiring the identification information corresponding to the distances within a set distance threshold, and generating a first set;
S170, deduplicating the information in the first set to obtain a first target set.
The present invention also provides a non-transitory computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the aforementioned method.
The invention also provides an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.
The invention has at least the following beneficial effects:
the automatic judgment method for the tight-lock person based on the video comprises the steps of firstly obtaining a corresponding monitoring video based on relevant information of a target object, then obtaining all images containing the target object from the monitoring video, then generating action tracks of the target object based on all the obtained images including the target object, and then obtaining a person with space intersection with the target object for each frame of image in each action track.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a video-based automatic determination method for close contacts according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a video-based automatic determination method for close contacts according to an embodiment of the present invention.
An embodiment of the present invention provides a video-based automatic determination method for close contacts, comprising the following steps:
S100, a first surveillance video about the target object is obtained.
In an exemplary embodiment of the invention, the target object may be a person whose freedom of movement needs to be restricted for a particular period of time; for example, in one specific application scenario, the target object may be a person diagnosed with an infectious disease.
In an embodiment of the present invention, the first surveillance video may be obtained based on the identity information of the target object and the geographic areas the target object passed through within a set time period. The identity information of the target object includes image information of the target object. Those skilled in the art know that acquiring the corresponding surveillance video based on the identity information of the target object and the geographic areas passed through within a set time period is prior art. The set time period may be determined based on actual conditions and may be, for example, 7 days or 14 days.
Further, the corresponding image feature vector P may also be acquired based on the image information of the target object. A high-dimensional feature vector can be extracted from the image information by an image feature extractor to obtain the corresponding image feature vector. Those skilled in the art know that extracting the feature vector of an image may be prior art, for example, extraction by means of a deep neural network.
S110, decoding the first surveillance video to obtain n1 frames of images, and identifying the human-body information in any frame image i among the n1 frames to obtain an identification information table C_i of image i; the j-th row of C_i contains the identification information (BID_ij, P_ij, G_ij, t_i, CID_i) of the j-th human body identified in image i, where BID_ij is the ID of the j-th human body identified in image i, P_ij is the image feature vector of the j-th human body identified in image i, G_ij is the position of the j-th human body identified in image i, t_i is the shooting time of image i, and CID_i is the ID of the camera that captured image i; j ranges from 1 to m(i), where m(i) is the number of human bodies identified in image i, and i ranges from 1 to n1.
In an embodiment of the present invention, the ID of each human body may be customized or generated in the order of first appearance in the video data, for example, assigned sequentially as natural numbers.
In the embodiment of the invention, each human body in each frame image is automatically detected by a neural-network detector, and a detection frame of the human body is identified. G_ij is the midpoint of the bottom edge of the detection frame of the j-th pedestrian identified in image i: G_ij = (x_ij, y_ij), with x_ij the abscissa and y_ij the ordinate, in pixels, of the j-th human body in image i in the image coordinate system.
S120, based on any identification information table C_i, obtaining a similarity set S_i = (S_i1, S_i2, …, S_ij, …, S_im(i)), where S_ij is the similarity between P_ij in C_i and the image feature vector P in the image information of the target object.
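As an illustrative sketch only (the `(x1, y1, x2, y2)` box format and the function name are assumptions for illustration, not from the patent), taking the position G_ij as the midpoint of the bottom edge of a detection frame can be written as:

```python
def bottom_midpoint(box):
    """Midpoint of the bottom edge of a detection frame.

    The box is assumed to be (x1, y1, x2, y2) in pixels with the origin
    at the top-left of the image, so the bottom edge has the larger y.
    """
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, max(y1, y2))

# Example: a pedestrian box from (100, 50) to (140, 250).
print(bottom_midpoint((100, 50, 140, 250)))  # → (120.0, 250)
```

The bottom edge is chosen because, for a standing pedestrian, it approximates the point where the person touches the ground plane, which is what the later image-to-world mapping assumes.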
In the embodiment of the present invention, the similarity between two image feature vectors may be calculated by a set similarity measure, for example, cosine similarity, Euclidean distance, or Hamming distance.
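A minimal sketch of one of the measures mentioned above, cosine similarity between two feature vectors (plain Python, no particular feature-extraction library assumed):

```python
import math

def cosine_similarity(p, q):
    """Cosine similarity between two image feature vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

# Vectors with the same direction give 1.0; orthogonal vectors give 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # → 0.0
```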
In the embodiment of the invention, the image capture device may be a surveillance camera.
S130, if max(S_i) ≥ K, storing image i into the first target image set; K is a set similarity threshold.
If max(S_i) ≥ K, it indicates that the human body corresponding to the target object has been found in image i. In the embodiment of the invention, the first target image set is stored as ordered data, initialized as a list containing a header element, and the length of the list can be expanded.
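The selection rule of S120-S130 can be sketched as follows; the data layout (a dict from frame index to the list of per-body feature vectors) and the function names are assumptions for illustration, not from the patent:

```python
import math

def cosine(p, q):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))

def first_target_images(frames, target_vec, k):
    """Return indices of frames whose best-matching body reaches threshold k.

    `frames` maps frame index i -> list of feature vectors P_ij (one per
    detected body); a frame is kept when max(S_i) >= k, as in S120-S130.
    """
    selected = []
    for i, vectors in frames.items():
        if vectors and max(cosine(v, target_vec) for v in vectors) >= k:
            selected.append(i)
    return selected

frames = {
    1: [[1.0, 0.0], [0.0, 1.0]],   # frame 1 contains a perfect match
    2: [[0.0, 1.0]],               # frame 2 does not
}
print(first_target_images(frames, [1.0, 0.0], 0.9))  # → [1]
```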
In the embodiment of the present invention, K may be determined in an existing manner; for example, K may be 0.85 to 0.95.
S140, acquiring the action track sequences of the target object based on the first target image set.
In the embodiment of the present invention, image slices of the target object that are temporally and spatially continuous can be integrated into a continuous track sequence based on a target tracking algorithm; those skilled in the art know that this may be an existing method. Each action track sequence contains, for any image i in the first target image set, the shooting time of image i and the ID, image feature vector, and position of the human body corresponding to max(S_i) in image i.
Through S140, the data of the target object in continuous frames can be deduplicated, improving subsequent processing efficiency.
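A crude, hedged stand-in for the track integration of S140 (the patent relies on an existing target tracking algorithm; here detections of the target object are merely grouped by a maximum time gap, and `max_gap` is an assumed parameter, not from the patent):

```python
def split_into_tracks(timestamps, max_gap=2.0):
    """Group sorted shooting times into action track sequences.

    A new sequence starts whenever consecutive detections of the target
    object are more than `max_gap` seconds apart. This is only a toy
    stand-in for a real target tracking algorithm.
    """
    tracks = []
    for t in sorted(timestamps):
        if tracks and t - tracks[-1][-1] <= max_gap:
            tracks[-1].append(t)
        else:
            tracks.append([t])
    return tracks

print(split_into_tracks([0.0, 1.0, 2.0, 10.0, 11.0]))
# → [[0.0, 1.0, 2.0], [10.0, 11.0]]
```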
S150, for any frame image r among the t(p) frame images corresponding to any action track sequence p, acquiring, based on the identification information table C_r of image r, the distance in the world coordinate system between the target object and any human body in image r other than the target object.
Further, S150 may specifically include:
S151, for any frame image r among the t(p) frame images corresponding to any action track sequence p, obtaining, based on the identification information table C_r of image r, a position information set RG^p_r = (RG^p_r1, RG^p_r2, …, RG^p_rs, …, RG^p_rm(r)), where RG^p_rs is the position in the world coordinate system to which the position G_rs of the s-th human body in image r is mapped; r ranges from 1 to t(p), s ranges from 1 to m(r), p ranges from 1 to H, and H is the number of action track sequences.
Those skilled in the art know that obtaining the position in the world coordinate system from the position in the image coordinate system may be prior art, for example by means of camera calibration. Camera calibration may be prior art and may, for example, comprise the following steps: (1) making a calibration chessboard; (2) shooting an image of the calibration chessboard with the camera as a first test image; (3) calculating the camera intrinsic parameters based on the first test image; (4) arranging a plurality of coordinate marks on the ground of the area actually shot by the camera; (5) shooting an image of the ground of the actually shot area as a second test image; (6) acquiring the camera extrinsic parameters from the camera intrinsic parameters, the positions of the coordinate marks in the world coordinate system, and their positions in the second test image; (7) deriving a coordinate transformation formula based on the camera intrinsic and extrinsic parameters.
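Once calibration yields a transformation from the image plane to the ground plane, a detected position can be mapped to world coordinates. As a simplified sketch (assuming the calibration result can be reduced to a 3x3 image-to-ground homography `h`, which holds for points lying on a planar ground; the matrix below is a made-up example):

```python
import numpy as np

def pixel_to_world(h, g):
    """Map an image position g = (x, y) to ground-plane world coordinates.

    `h` is a 3x3 image-to-ground homography obtained from camera
    calibration; the point is lifted to homogeneous coordinates,
    transformed, and dehomogenized.
    """
    x, y = g
    p = h @ np.array([x, y, 1.0])
    return (float(p[0] / p[2]), float(p[1] / p[2]))

# With a pure scaling homography, 100 px corresponds to 1 world unit.
h = np.array([[0.01, 0.0, 0.0],
              [0.0, 0.01, 0.0],
              [0.0, 0.0, 1.0]])
print(pixel_to_world(h, (100.0, 250.0)))  # → (1.0, 2.5)
```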
S152, based on any position information set RG^p_r, obtaining a distance set D^p_r = (D^p_r1, D^p_r2, …, D^p_rt, …, D^p_r(m(r)-1)), where D^p_rt is the distance between the position in the world coordinate system of the t-th human body in image r other than the target object and the position of the target object in the world coordinate system; t ranges from 1 to m(r)-1.
In an embodiment of the present invention, the distance between the two positions may be a euclidean distance.
S160, acquiring the identification information corresponding to the distances within a set distance threshold, and generating a first set.
Further, S160 may specifically include:
traversing D^p_r: if D^p_rt ≤ D, storing the identification information of the human body corresponding to D^p_rt into the first set, thereby generating the first set; D is a set distance threshold.
In the embodiment of the invention, the first set stores ordered data, initialized as a set containing a header element, and its length can be expanded. The set distance threshold may be determined based on actual conditions and may be, for example, 5 meters.
S170, deduplicating the information in the first set to obtain a first target set.
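Steps S160-S170 amount to a distance filter followed by deduplication. A minimal sketch (the list-of-pairs input layout and function name are assumptions for illustration):

```python
import math

def close_contact_ids(world_positions, target_pos, d=5.0):
    """Collect IDs of bodies within distance d of the target in the world
    coordinate system (S160), then deduplicate while preserving
    first-seen order (S170).

    `world_positions` is a list of (body_id, (x, y)) pairs gathered over
    all frames of all action track sequences.
    """
    first_set = []
    for bid, (x, y) in world_positions:
        if math.dist((x, y), target_pos) <= d:
            first_set.append(bid)
    # Deduplication: dict keys preserve insertion order in Python 3.7+.
    return list(dict.fromkeys(first_set))

positions = [("B1", (1.0, 1.0)), ("B2", (40.0, 0.0)), ("B1", (2.0, 2.0))]
print(close_contact_ids(positions, (0.0, 0.0)))  # → ['B1']
```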
The technical effect of S100 to S170 is that, based on images of the target object, the persons who spatially intersect with the target object while the target object is present are obtained, so that the persons associated with the target object in time and space (hereinafter referred to as first associated persons for convenience of description) can be found quickly and accurately.
The method provided by the embodiment of the invention may further comprise the following steps:
s200, aiming at any BID in the first target set q The following operations are performed:
s210, acquiring BID q And a second monitoring video of the geographic area where the corresponding human body passes in a second specified time period.
The second specified time period may be determined based on actual circumstances, and may be, for example, 7 days or 14 days, etc.
S220, decoding the second surveillance video to obtain n2 frames of images, and identifying the human-body information in any frame image a among the n2 frames to obtain an identification information table C_a of image a; the b-th row of C_a contains the identification information (BID_ab, P_ab, G_ab, t_a, CID_a) of the b-th human body identified in image a, where BID_ab is the ID of the b-th human body identified in image a, P_ab is the image feature vector of the b-th human body identified in image a, G_ab is the position of the b-th human body identified in image a, t_a is the shooting time of image a, and CID_a is the ID of the camera that captured image a; b ranges from 1 to m(a), where m(a) is the number of human bodies identified in image a, and a ranges from 1 to n2.
S230, based on any identification information table C_a, obtaining a similarity set S_a = (S_a1, S_a2, …, S_ab, …, S_a(m(a)-1)), where S_ab is the similarity between P_ab in C_a and the image feature vector P_q corresponding to BID_q.
S240, if max(S_a) ≥ K, storing image a into a second target image set; K is the set similarity threshold and may be, for example, 0.85 to 0.95.
S250, acquiring, based on the second target image set, the action track sequences of the human body corresponding to BID_q.
The specific implementation of S220 to S250 can refer to the foregoing S110 to S140, and detailed descriptions thereof are omitted for avoiding redundancy.
S260, for any frame image e among the t(p1) frame images corresponding to any action track sequence p1 acquired in S250, obtaining, based on the identification information table C_e of image e, the distance in the world coordinate system between the human body corresponding to BID_q and any human body in image e other than the target object and the human body corresponding to BID_q, acquiring the identification information corresponding to the distances within the set distance threshold, and generating a second set.
Further, S260 may specifically include:
s261, if C e Including the ID of the target object, a first information table C is obtained 1 a Executing S262; otherwise, executing S263; c 1 a Is C a And an information table obtained after the identification information of the target object is deleted.
S262, obtaining a first position information set RG1^p1_e = (RG1^p1_e1, RG1^p1_e2, …, RG1^p1_es1, …, RG1^p1_e(m(e)-1)), where RG1^p1_es1 is the position in the world coordinate system to which the position G_es1 of the s1-th human body in C1_a is mapped; e ranges from 1 to t(p1), s1 ranges from 1 to m(e)-1, p1 ranges from 1 to H1, and H1 is the number of action track sequences obtained in S250; S264 is executed.
S263, obtaining a second position information set RG2^p1_e = (RG2^p1_e1, RG2^p1_e2, …, RG2^p1_es2, …, RG2^p1_em(e)), where RG2^p1_es2 is the position in the world coordinate system to which the position G_es2 of the s2-th human body in C_a is mapped; s2 ranges from 1 to m(e); S265 is executed.
S264, based on any first position information set RG1^p1_e, obtaining a first distance set D1^p1_e = (D1^p1_e1, D1^p1_e2, …, D1^p1_et1, …, D1^p1_e(m(e)-2)), where D1^p1_et1 is the distance between the position in the world coordinate system of the t1-th human body in image e other than the human body corresponding to BID_q and the position in the world coordinate system of the human body corresponding to BID_q; t1 ranges from 1 to m(e)-2; S266 is executed.
S265, based on any second position information set RG2^p1_e, obtaining a second distance set D2^p1_e = (D2^p1_e1, D2^p1_e2, …, D2^p1_et2, …, D2^p1_e(m(e)-1)), where D2^p1_et2 is the distance between the position in the world coordinate system of the t2-th human body in image e other than the human body corresponding to BID_q and the position in the world coordinate system of the human body corresponding to BID_q; t2 ranges from 1 to m(e)-1; S267 is executed.
S266, traversing D1^p1_e: if D1^p1_et1 ≤ D, storing the identification information of the human body corresponding to D1^p1_et1 into the second set.
S267, traversing D2^p1_e: if D2^p1_et2 ≤ D, storing the identification information of the human body corresponding to D2^p1_et2 into the second set.
In the embodiment of the invention, the second set stores ordered data, initialized as a set containing a header element, and its length can be expanded.
The technical effect of S210 to S260 is that, compared with the foregoing embodiment, the persons associated with each first associated person, i.e., the second associated persons of the target object, can be further identified, so that all persons associated with the target object can be found as completely as possible, providing accurate information for the relevant departments.
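In outline, S200-S270 rerun the per-person pipeline of S210-S260 for every ID in the first target set and merge the deduplicated results. A toy sketch with the per-person pipeline abstracted as a function argument (`contacts_of` and the graph below are hypothetical stand-ins, not part of the patent):

```python
def second_target_set(first_target_set, contacts_of):
    """For each ID in the first target set, run the same video pipeline
    (abstracted here as `contacts_of`) and merge the results into a
    deduplicated second target set, as in S200-S270.
    """
    second = []
    for bid in first_target_set:
        for other in contacts_of(bid):
            if other not in second:
                second.append(other)
    return second

# Toy stand-in for the per-person video pipeline of S210-S260.
graph = {"B1": ["B2", "B3"], "B3": ["B2", "B4"]}
print(second_target_set(["B1", "B3"], lambda b: graph.get(b, [])))
# → ['B2', 'B3', 'B4']
```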
Embodiments of the present invention also provide a non-transitory computer-readable storage medium, which may be configured in an electronic device to store at least one instruction or at least one program for implementing a method of the method embodiments, where the at least one instruction or the at least one program is loaded into and executed by a processor to implement the method provided by the above embodiments.
Embodiments of the present invention also provide an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.
Embodiments of the present invention also provide a computer program product comprising program code which, when the program product is run on an electronic device, causes the electronic device to carry out the steps of the methods according to the various exemplary embodiments of the invention described above.
Although some specific embodiments of the present invention have been described in detail by way of illustration, it should be understood by those skilled in the art that the above illustration is only for the purpose of illustration and is not intended to limit the scope of the invention. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (9)

1. A video-based automatic determination method for close contacts, characterized by comprising the following steps:
S100, acquiring a first surveillance video about a target object;
S110, decoding the first surveillance video to obtain n1 frames of images, and identifying the human-body information in any frame image i among the n1 frames to obtain an identification information table C_i of image i, wherein the j-th row of C_i contains the identification information (BID_ij, P_ij, G_ij, t_i, CID_i) of the j-th human body identified in image i, BID_ij is the ID of the j-th human body identified in image i, P_ij is the image feature vector of the j-th human body identified in image i, G_ij is the position of the j-th human body identified in image i, t_i is the shooting time of image i, and CID_i is the ID of the camera that captured image i; j ranges from 1 to m(i), m(i) being the number of human bodies identified in image i; i ranges from 1 to n1;
S120, based on any identification information table C_i, obtaining a similarity set S_i = (S_i1, S_i2, …, S_ij, …, S_im(i)), wherein S_ij is the similarity between P_ij in C_i and the image feature vector P in the image information of the target object;
S130, if max(S_i) ≥ K, storing image i into a first target image set, K being a set similarity threshold;
S140, acquiring the action track sequences of the target object based on the first target image set;
S150, for any frame image r among the t(p) frame images corresponding to any action track sequence p, acquiring, based on the identification information table C_r of image r, the distance in the world coordinate system between the target object and any human body in image r other than the target object;
S160, acquiring the identification information corresponding to the distances within a set distance threshold, and generating a first set;
S170, deduplicating the information in the first set to obtain a first target set.
2. The method according to claim 1, wherein in S100 the first surveillance video is obtained based on the identity information of the target object and the geographic areas passed through within a set time period.
3. The method according to claim 1, wherein S150 specifically comprises:
S151, for any frame image r among the t(p) frame images corresponding to any action track sequence p, obtaining, based on the identification information table C_r of image r, a position information set RG^p_r = (RG^p_r1, RG^p_r2, …, RG^p_rs, …, RG^p_rm(r)), wherein RG^p_rs is the position in the world coordinate system to which the position G_rs of the s-th human body in image r is mapped; r ranges from 1 to t(p), s ranges from 1 to m(r), p ranges from 1 to H, and H is the number of action track sequences;
S152, based on any position information set RG^p_r, obtaining a distance set D^p_r = (D^p_r1, D^p_r2, …, D^p_rt, …, D^p_r(m(r)-1)), wherein D^p_rt is the distance between the position in the world coordinate system of the t-th human body in image r other than the target object and the position of the target object in the world coordinate system; t ranges from 1 to m(r)-1.
4. The method according to claim 1, wherein S160 specifically comprises:
traversing D^p_r: if D^p_rt ≤ D, storing the identification information of the human body corresponding to D^p_rt into the first set, thereby generating the first set; D is a set distance threshold.
5. The method of claim 1, further comprising the following steps:
S200, for any BID_q in the first target set, performing the following operations:
S210, acquiring a second surveillance video of the geographic areas passed through, within a second specified time period, by the human body corresponding to BID_q;
S220, decoding the second surveillance video to obtain n2 frames of images, and identifying the human-body information in any frame image a among the n2 frames to obtain an identification information table C_a of image a, wherein the b-th row of C_a contains the identification information (BID_ab, P_ab, G_ab, t_a, CID_a) of the b-th human body identified in image a, BID_ab is the ID of the b-th human body identified in image a, P_ab is the image feature vector of the b-th human body identified in image a, G_ab is the position of the b-th human body identified in image a, t_a is the shooting time of image a, and CID_a is the ID of the camera that captured image a; b ranges from 1 to m(a), m(a) being the number of human bodies identified in image a; a ranges from 1 to n2;
S230, based on any identification information table C_a, obtaining a similarity set S_a = (S_a1, S_a2, …, S_ab, …, S_a(m(a)-1)), wherein S_ab is the similarity between P_ab in C_a and the image feature vector P_q corresponding to BID_q;
S240, if max(S_a) ≥ K, storing image a into a second target image set;
S250, acquiring, based on the second target image set, the action track sequences of the human body corresponding to BID_q;
S260, for any frame image e among the t(p1) frame images corresponding to any action track sequence p1 acquired in S250, obtaining, based on the identification information table C_e of image e, the distance in the world coordinate system between the human body corresponding to BID_q and any human body in image e other than the target object and the human body corresponding to BID_q, acquiring the identification information corresponding to the distances within the set distance threshold, and generating a second set;
S270, deduplicating the information in the second set to obtain a second target set.
6. The method according to claim 5, wherein S260 specifically comprises:
S261, if C_e includes the ID of the target object, obtaining a first information table C1_e and executing S262; otherwise, executing S263; C1_e is the information table obtained after deleting the identification information of the target object from C_e;
S262, obtaining a first position information set RG1^p1_e = (RG1^p1_e1, RG1^p1_e2, …, RG1^p1_es1, …, RG1^p1_e(m(e)-1)), wherein RG1^p1_es1 is the position obtained by mapping G_es1, the position of the s1-th human body in C1_e, into the world coordinate system; e ranges from 1 to t(p1), s1 ranges from 1 to m(e)-1, and p1 ranges from 1 to H1, where H1 is the number of action track sequences obtained in S250; executing S264;
S263, obtaining a second position information set RG2^p1_e = (RG2^p1_e1, RG2^p1_e2, …, RG2^p1_es2, …, RG2^p1_em(e)), wherein RG2^p1_es2 is the position obtained by mapping G_es2, the position of the s2-th human body in C_e, into the world coordinate system; s2 ranges from 1 to m(e); executing S265;
S264, based on any first position information set RG1^p1_e, obtaining a first distance set D1^p1_e = (D1^p1_e1, D1^p1_e2, …, D1^p1_et1, …, D1^p1_e(m(e)-2)), wherein D1^p1_et1 is the distance in the world coordinate system between the position of the t1-th human body in image e other than the human body corresponding to BID_q and the position of the human body corresponding to BID_q; t1 ranges from 1 to m(e)-2; executing S266;
S265, based on any second position information set RG2^p1_e, obtaining a second distance set D2^p1_e = (D2^p1_e1, D2^p1_e2, …, D2^p1_et2, …, D2^p1_e(m(e)-1)), wherein D2^p1_et2 is the distance in the world coordinate system between the position of the t2-th human body in image e other than the human body corresponding to BID_q and the position of the human body corresponding to BID_q; t2 ranges from 1 to m(e)-1; executing S267;
S266, traversing D1^p1_e; if D1^p1_et1 ≤ D, storing the identification information of the human body corresponding to D1^p1_et1 into the second set;
S267, traversing D2^p1_e; if D2^p1_et2 ≤ D, storing the identification information of the human body corresponding to D2^p1_et2 into the second set.
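The branching in S261–S267 reduces to one operation per frame: drop BID_q (and, when present, the target object) from the table, map the remaining positions into the world frame, and keep every ID whose distance to BID_q is at most D; S270 then removes duplicates across frames. A minimal sketch, in which `to_world` is a stand-in for the image-to-world mapping (the claim does not specify how that mapping is obtained) and the function names are hypothetical:

```python
from math import dist  # Euclidean distance, Python 3.8+

def close_bodies(table, bid_q, target_id, to_world, D):
    """S261-S267 sketch: IDs of bodies within D of BID_q in the world frame.

    table:     list of (bid, image_position) pairs for one frame (C_e).
    bid_q:     ID of the tracked human body.
    target_id: ID of the target object, or None if absent from C_e.
    to_world:  callable mapping an image position to world coordinates.
    D:         distance threshold in world units.
    """
    positions = {bid: to_world(g) for bid, g in table}
    if bid_q not in positions:
        return set()
    ref = positions[bid_q]
    return {bid for bid, w in positions.items()
            if bid not in (bid_q, target_id) and dist(ref, w) <= D}

def second_target_set(per_frame_sets):
    """S270 sketch: the union of the per-frame sets de-duplicates the IDs."""
    out = set()
    for s in per_frame_sets:
        out |= s
    return out
```

Passing `target_id=None` covers the S263/S265/S267 branch, where the target object does not appear in C_e.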
7. The method according to claim 1, wherein the image of each human body identified in image i is marked by a detection box, and G_ij is the midpoint of the bottom edge of the detection box of the j-th pedestrian identified in image i.
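The bottom-edge midpoint of claim 7 is the conventional ground-contact point for mapping a pedestrian onto the ground plane. Assuming the camera's ground-plane homography H is known (the patent does not state how the image-to-world mapping is calibrated), the projection can be sketched as follows; both function names are illustrative:

```python
def bottom_midpoint(box):
    """Ground-contact point G_ij of a detection box (x1, y1, x2, y2):
    the midpoint of the bottom edge, per claim 7."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, max(y1, y2))

def apply_homography(H, pt):
    """Map an image point onto the ground plane with a 3x3 homography H
    (nested lists), via homogeneous coordinates."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)
```

With one homography per camera (indexed by CID_a), the same code serves both S262 and S263.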
8. A non-transitory computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the method of any one of claims 1-7.
9. An electronic device comprising a processor and the non-transitory computer readable storage medium of claim 8.
CN202210987107.4A 2022-08-17 2022-08-17 Video-based automatic determination method for joint seal person, electronic device and storage medium Pending CN115410121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210987107.4A CN115410121A (en) 2022-08-17 2022-08-17 Video-based automatic determination method for joint seal person, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210987107.4A CN115410121A (en) 2022-08-17 2022-08-17 Video-based automatic determination method for joint seal person, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN115410121A true CN115410121A (en) 2022-11-29

Family

ID=84159628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210987107.4A Pending CN115410121A (en) 2022-08-17 2022-08-17 Video-based automatic determination method for joint seal person, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115410121A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115966313A (en) * 2023-03-09 2023-04-14 创意信息技术股份有限公司 Integrated management platform based on face recognition
CN115966313B (en) * 2023-03-09 2023-06-09 创意信息技术股份有限公司 Integrated management platform based on face recognition

Similar Documents

Publication Publication Date Title
US11113526B2 (en) Training methods for deep networks
JP6614611B2 (en) Apparatus, program, and method for tracking object in consideration of similarity between images
US11205276B2 (en) Object tracking method, object tracking device, electronic device and storage medium
US20180101732A1 (en) Image processing apparatus, image processing system, method for image processing, and computer program
CN103679742B (en) Method for tracing object and device
CN111512317A (en) Multi-target real-time tracking method and device and electronic equipment
US9489582B2 (en) Video anomaly detection based upon a sparsity model
CN111104925B (en) Image processing method, image processing apparatus, storage medium, and electronic device
JP6280020B2 (en) Moving object tracking device
Bashar et al. Multiple object tracking in recent times: A literature review
CN110675426B (en) Human body tracking method, device, equipment and storage medium
CN104376575A (en) Pedestrian counting method and device based on monitoring of multiple cameras
US11354923B2 (en) Human body recognition method and apparatus, and storage medium
Han et al. Deep learning-based multi-cattle tracking in crowded livestock farming using video
CN115410121A (en) Video-based automatic determination method for joint seal person, electronic device and storage medium
JP7488674B2 (en) OBJECT RECOGNITION DEVICE, OBJECT RECOGNITION METHOD, AND OBJECT RECOGNITION PROGRAM
CN112381774A (en) Cow body condition scoring method and system based on multi-angle depth information fusion
CN113743293B (en) Fall behavior detection method and device, electronic equipment and storage medium
CN115760922A (en) Target tracking method, device, system, terminal equipment and storage medium
Chung et al. Low-complexity and reliable moving objects detection and tracking for aerial video surveillance with small uavs
CN111862153B (en) Long-time multi-target tracking method for pedestrians
Huang et al. Motion characteristics estimation of animals in video surveillance
CN111898471A (en) Pedestrian tracking method and device
CN111191524A (en) Sports people counting method
CN114092516B (en) Multi-target tracking detection method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination