CN114267018A - Cross-shot target object extraction method and device, electronic equipment and storage medium


Info

Publication number
CN114267018A
Authority
CN
China
Prior art keywords
sample
target object
position information
monitoring node
mapping model
Prior art date
Legal status
Pending
Application number
CN202111633193.0A
Other languages
Chinese (zh)
Inventor
夏凤君
郑新想
汪昊
周斌
Current Assignee
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd
Priority to CN202111633193.0A
Publication of CN114267018A
Legal status: Pending


Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a cross-shot target object extraction method and device, an electronic device and a storage medium, for improving the efficiency of cross-shot target object image extraction. In an embodiment of the application, first position information of a target object in a first image is obtained; the first position information is then input into a position relation mapping model to obtain second position information, output by the model, of the target object in the second images acquired by each second monitoring node; finally, for any second monitoring node, the image of the target object is extracted based on the second position information of the target object in the second image acquired by that node. Because the position of the target object in each second monitoring node is obtained directly from its position information in the first monitoring node and the relation mapping model, no calculation, extraction or comparison of human-body or face features is needed, so computational cost is greatly reduced and efficiency is improved.

Description

Cross-shot target object extraction method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for cross-shot target object extraction, an electronic device, and a storage medium.
Background
Entrances of subways, airports, large exhibitions and the like often use a plurality of monitoring nodes to monitor the same scene. Once an abnormal situation occurs, monitoring personnel must collect and organize information on the target object by browsing multiple monitoring cameras, and such manual processing is clearly inefficient.
The related art also provides a method for automatically searching for the same target object across different monitoring nodes in the same scene, mainly by human-body feature comparison. That is, all pedestrian targets appearing in a certain period are detected by a detection algorithm, the human-body features of each pedestrian are extracted and compared by a feature comparison algorithm, the same target object is found through the similarity of these features, and the same target object in different monitoring nodes is then associated. However, this method needs to analyze multiple video channels and constantly calculate, extract and compare pedestrian features, so its computational cost is high.
Disclosure of Invention
The application aims to provide a cross-shot target object extraction method, a cross-shot target object extraction device, an electronic device and a storage medium, which are used for reducing the computational cost of cross-shot target object image extraction.
In a first aspect, an embodiment of the present application provides a cross-shot target object extraction method, where a first monitoring node and at least one second monitoring node have a common shooting area, and the method includes:
acquiring first position information of a target object in a first image, wherein the first image is an image acquired by the first monitoring node;
inputting the first position information into a position relation mapping model to obtain second position information of the target object output by the position relation mapping model in second images acquired by the second monitoring nodes respectively;
and for any second monitoring node, extracting the image of the target object based on second position information of the target object in the second image acquired by the second monitoring node.
Because the position of the target object in each second monitoring node is obtained from its position information in the first monitoring node and the relation mapping model, no calculation, extraction or comparison of human-body or face features is needed, and the computational cost is greatly reduced.
In some possible embodiments, the following is performed separately for any of the at least one second monitoring node:
constructing a first training sample, wherein the first training sample comprises a first sample position of a sample object in a first sample frame acquired by the first monitoring node and a second sample position of the sample object in a second sample frame acquired by each second monitoring node, and the acquisition time of the first sample frame is the same as that of each second sample frame;
inputting the first sample position to the relational mapping model, and training the relational mapping model with the second sample position of each of the second sample frames as a desired output.
In the method and the apparatus, by training the relational mapping model, the position information of the same target object in a second monitoring node is obtained from its position information in the first monitoring node, which greatly saves computational cost.
In some possible embodiments, the inputting the first sample position to the relational mapping model, and training the relational mapping model with the second sample position of each of the second sample frames as a desired output, includes:
inputting the first sample position into the relational mapping model, and taking the second sample position of each second sample frame as expected output to obtain generated position information output by the relational mapping model and aiming at each second sample position;
determining a loss value according to the generation position information and the second sample position;
and carrying out parameter adjustment on the relation mapping model based on the loss value.
In some possible embodiments, the method further comprises:
acquiring generation position information, which is output by the relational mapping model based on the first sample position and aims at each second sample position, and constructing a second training sample, wherein the second training sample comprises the first sample position and each generation position information;
determining an intersection ratio of each of the generated position information and the corresponding second sample position;
if the intersection ratio of each generated position information and the corresponding second sample position is greater than or equal to a first preset value, constructing a third training sample by using the generated position information and the first sample position;
and training the relational mapping model by adopting the third training sample.
In the application, the relational mapping model is trained by constructing the third training sample set, so that the relational mapping model is trained more comprehensively, and the accuracy of the relational mapping model is improved.
In some possible embodiments, after determining the intersection ratio of each of the generated position information and the corresponding second sample position, the method further includes:
if the intersection ratio corresponding to any generated position information is smaller than a second preset value, adopting the second sample position and the first sample position to construct a fourth training sample;
combining the third training sample and the fourth training sample into a fifth training sample;
and training the relational mapping model by adopting a fifth training sample.
In the application, the relational mapping model is trained by constructing the fifth training sample set, so that the accuracy of the relational mapping model is further improved.
In some possible embodiments, obtaining first position information of the target object in the first image includes:
and carrying out target detection on a target object in the first image to obtain position information of the target object in the first image, wherein the position information comprises vertex coordinates of a rectangular frame and length information and width information of the rectangular frame.
In the application, the target object can be accurately locked by acquiring the rectangular frame of the target object.
In some possible embodiments, the target detection of the target object in the first image includes:
and carrying out target detection on the target object in the first image by adopting a human face detection method or a human body detection method.
The present application also provides a cross-shot target object extraction apparatus, where a first monitoring node and at least one second monitoring node have a common shooting area, the apparatus comprising:
the first position information acquisition module is used for acquiring first position information of a target object in a first image, and the first image is an image acquired by the first monitoring node;
the second position information acquisition module is used for inputting the first position information into a position relation mapping model to obtain second position information of the target object output by the position relation mapping model in second images acquired by the second monitoring nodes respectively;
and the extraction module is used for extracting the image of the target object based on second position information of the target object in the second image acquired by the second monitoring node aiming at any second monitoring node.
In some possible embodiments, the second location information obtaining module performs, for any of the at least one second monitoring node:
constructing a first training sample, wherein the first training sample comprises a first sample position of a sample object in a first sample frame acquired by the first monitoring node and a second sample position of the sample object in a second sample frame acquired by each second monitoring node, and the acquisition time of the first sample frame is the same as that of each second sample frame;
inputting the first sample position to the relational mapping model, and training the relational mapping model with the second sample position of each of the second sample frames as a desired output.
In some possible embodiments, the second position information obtaining module, when executing the input of the first sample position to the relational mapping model and training the relational mapping model with the second sample position of each of the second sample frames as the expected output, is configured to:
inputting the first sample position into the relational mapping model, and taking the second sample position of each second sample frame as expected output to obtain generated position information output by the relational mapping model and aiming at each second sample position;
determining a loss value according to the generation position information and the second sample position;
and carrying out parameter adjustment on the relation mapping model based on the loss value.
In some possible embodiments, the apparatus further comprises:
a second training sample construction module, configured to obtain generation position information, which is output by the relational mapping model based on the first sample position and is respectively for each second sample position, and construct a second training sample, where the second training sample includes the first sample position and each generation position information;
the intersection ratio determining module is used for determining the intersection ratio of each piece of generated position information and the corresponding second sample position;
a third training sample construction module, configured to construct a third training sample by using the generated position information and the first sample position if an intersection ratio of each generated position information to the corresponding second sample position is greater than or equal to a first preset value;
and the training module is used for training the relational mapping model by adopting the third training sample.
In some possible embodiments, after determining the intersection ratio of each of the generated position information and the corresponding second sample position, the intersection ratio determining module is further configured to:
if the intersection ratio corresponding to any generated position information is smaller than a second preset value, adopting the second sample position and the first sample position to construct a fourth training sample;
combining the third training sample and the fourth training sample into a fifth training sample;
and training the relational mapping model by adopting a fifth training sample.
In some possible embodiments, the first position information acquiring module, when executing acquiring the first position information of the target object in the first image, is configured to:
and carrying out target detection on a target object in the first image to obtain position information of the target object in the first image, wherein the position information comprises vertex coordinates of a rectangular frame and length information and width information of the rectangular frame.
In some possible embodiments, the first position information obtaining module, when performing target detection on a target object in the first image, is configured to: and carrying out target detection on the target object in the first image by adopting a human face detection method or a human body detection method.
In a third aspect, another embodiment of the present application further provides an electronic device, including at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform any one of the methods provided by the embodiments of the first aspect of the present application.
In a fourth aspect, another embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program is configured to cause a computer to execute any one of the methods provided in the first aspect of the present application.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1A is an application scene diagram of a cross-shot target object extraction method according to an embodiment of the present application;
fig. 1B is a schematic diagram of a common area of a cross-shot target object extraction method according to an embodiment of the present application;
fig. 2A is an overall flowchart of a cross-shot target object extraction method according to an embodiment of the present disclosure;
fig. 2B is a schematic diagram of acquiring a first image of a target object according to the method for extracting a target object across lenses provided in the embodiment of the present application;
fig. 3 is a flowchart of a training relationship mapping model of a cross-shot target object extraction method according to an embodiment of the present application;
fig. 4 is a flowchart of a method for cross-shot target object extraction according to an embodiment of the present disclosure for constructing a first training sample set;
fig. 5A is a schematic diagram of a common area of a cross-shot target object extraction method according to an embodiment of the present application;
fig. 5B is a schematic diagram of determining a first sample frame position of a sample object at a monitoring node according to the cross-shot target object extraction method provided in the embodiment of the present application;
fig. 5C is a schematic diagram of a relational mapping model of a cross-shot target object extraction method according to an embodiment of the present application;
fig. 6A is an input/output schematic diagram of a relational mapping model of a cross-shot target object extraction method according to an embodiment of the present application;
fig. 6B is an internal schematic view of a relational mapping model of a cross-shot target object extraction method according to an embodiment of the present application;
fig. 7 is a schematic diagram of a training relationship mapping model of a cross-shot target object extraction method according to an embodiment of the present application;
fig. 8 is a schematic diagram of a training relationship mapping model of a cross-shot target object extraction method according to an embodiment of the present application;
fig. 9 is a schematic device diagram of a cross-shot target object extraction method according to an embodiment of the present application;
fig. 10 is a schematic view of an electronic device of a cross-shot target object extraction method according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
It is noted that the terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
Entrances of subways, airports, large exhibitions and the like often use a plurality of monitoring nodes to monitor the same scene. Once an abnormal situation occurs, monitoring personnel must collect and organize information on the target object by browsing multiple monitoring cameras, and such manual processing is clearly inefficient. The related art also provides a method for automatically searching for the same target object across different monitoring nodes in the same scene, mainly by comparing and associating human-body features. That is, all pedestrian targets appearing in a certain period are detected by a detection algorithm, the human-body features of each pedestrian are extracted and compared by a feature comparison algorithm, the same target object is found through the similarity of these features, and the same target object in different monitoring nodes is then associated. However, this method needs to analyze multiple video channels and constantly calculate, extract and compare the human-body or face features of pedestrians, so its computational cost is high. In addition, in some scenes a plurality of cameras monitor cooperatively, so associating the same target across cameras at the same moment is important: doing so efficiently allows cross-shot data on a suspicious target to be located and collected quickly, and greatly shortens the time spent searching the original video.
In view of the above, the present application provides a cross-shot target object extraction method, apparatus, electronic device and storage medium to solve the above problem. The inventive concept of the present application can be summarized as follows: for a common shooting area covered by a plurality of monitoring nodes, a position relation mapping model can be established; after a target object is captured by any monitoring node, the position information of the target object in the images of the other monitoring nodes covering the common shooting area can be obtained from its position information in that node's image.
In this way, the positions of the same target object in the images of the different monitoring nodes covering the common shooting area are determined directly; the target object does not need to be searched for and compared over the whole area of each monitoring node image, and can instead be extracted according to the position information. Compared with the related art, in which target detection and human-body feature comparison are performed on the full images of every monitoring channel, the complexity is low, and the scheme provided by the embodiments of the application reduces the demand on the computing capacity of the equipment.
For convenience of understanding, a cross-shot target object extraction method provided by the embodiments of the present application is described in detail below with reference to the accompanying drawings:
Taking four monitoring nodes with a common shooting area as an example, fig. 1A shows an application scene of the cross-shot target object extraction method in the embodiment of the present application. The figure includes: a network 10, a server 20, a storage 30, a first monitoring node 40 and a second monitoring node set 50, where the second monitoring node set 50 comprises a second monitoring node 501, a second monitoring node 502 and a second monitoring node 503. The first monitoring node 40 and the second monitoring nodes 501, 502 and 503 have a common shooting area, as shown in fig. 1B, wherein:
the server 20 first obtains first position information of the target object in a first image, where the first image is an image collected by the first monitoring node 40; then, inputting the first position information into the position relation mapping model to obtain second position information of the target object output by the position relation mapping model in second images acquired by a second monitoring node 501, a second monitoring node 502 and a second monitoring node 503 respectively; wherein the relational mapping model is installed in the memory 30; and aiming at any second monitoring node, extracting the image of the target object based on second position information of the target object in a second image acquired by the second monitoring node.
Although this application describes only a single server and individual monitoring nodes in detail, it will be understood by those skilled in the art that the first monitoring node 40, the second monitoring node 501, the second monitoring node 502, the second monitoring node 503, the server 20 and the memory 30 shown are intended to represent the terminal devices, servers and memories involved in the technical solution of this application. The individual servers and memories are described in detail for purposes of illustration only and do not imply any limitation on the number, type or location of monitoring nodes and servers. It should be noted that the underlying concepts of the example embodiments of the present application would not be altered if additional modules were added to or removed from the illustrated environment. In addition, although fig. 1A shows a bidirectional arrow between the memory 30 and the server 20 for convenience of explanation, it will be understood by those skilled in the art that the data transmission and reception described above also need to be implemented through the network 10.
It should be noted that the storage in the embodiment of the present application may be, for example, a cache system, or a hard disk storage, a memory storage, and the like. In addition, the method for extracting the cross-shot target object provided by the application is not only suitable for the application scene shown in fig. 1A, but also suitable for any device with a cross-shot target object extraction requirement.
As shown in fig. 2A, a schematic flowchart of a cross-shot target object extraction method provided in an embodiment of the present application is shown, where:
in step 201: acquiring first position information of a target object in a first image, wherein the first image is an image acquired by a first monitoring node;
In some embodiments, obtaining the first position information of the target object in the first image may be implemented as follows: target detection is performed on the target object in the first image to obtain the position information of the target object in the first image, where the position information includes the vertex coordinates of a rectangular frame and the length and width of the rectangular frame. For example, as shown in fig. 2B, target detection on the target object in the first image yields the length h, the width w and the vertex coordinates (x, y) of the target object's rectangular frame, so the position information of the target object in the first image is (x, y, w, h).
In implementation, the first position information of the target object in the first monitoring node may be detected by methods such as face detection or human-body detection, in combination with comparison against the known characteristics of the target object.
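A minimal Python sketch of this step is given below; the detector interface and its output format are assumptions made for illustration and are not specified in this application.
```python
def get_first_position(first_image, detector):
    """Run face/body detection on the first node's image and return the target
    box as (x, y, w, h): rectangular-frame vertex plus width and length."""
    boxes = detector(first_image)   # assumed to return a list of (x, y, w, h) boxes
    if not boxes:
        return None                 # target object not present in this frame
    x, y, w, h = boxes[0]           # assumed: the box matched to the target object
    return (x, y, w, h)
```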
In step 202: inputting the first position information into the position relation mapping model to obtain second position information of the target object output by the position relation mapping model in second images acquired by each second monitoring node;
in step 203: and aiming at any second monitoring node, extracting the image of the target object based on second position information of the target object in a second image acquired by the second monitoring node.
For ease of understanding, the steps in fig. 2A are described in detail below:
in order to realize cross-shot extraction of image information of a target object, a relational mapping model is trained in the application, wherein the training steps of the relational mapping model are shown in fig. 3:
in step 301: constructing a first training sample set;
in some embodiments, the construction of the first training sample set may be embodied as the steps shown in fig. 4:
in step 401: acquiring a first image which is acquired by a first monitoring node and contains a sample object to form a first sample frame; acquiring a first sample position of the sample object in the first sample frame;
For example, as shown in fig. 5A, monitoring nodes A, B, C and D cover the same common area. Monitoring node A is used as the first monitoring node and monitoring nodes B, C and D as the second monitoring nodes. While the sample object moves through the common area, images are collected by nodes A, B, C and D at the same times; assuming the nodes collect images of the sample object at times 1, 2, 3, 4 and 5, each image collected by node A at times 1, 2, 3, 4 and 5 constitutes a first sample frame. As shown in fig. 5B, the position of the sample object in each first sample frame of node A is then determined; the position of the sample object in each such frame is that frame's first sample position.
In step 402: for any second monitoring node, acquiring a second image which is acquired by the second monitoring node and contains the sample object and corresponds to the first image to form a second sample frame; acquiring a second sample position of the sample object in a second sample frame;
Continuing with the example of fig. 5A, the images acquired by monitoring nodes B, C and D form their respective second sample frames. Taking monitoring node B as an example, as in fig. 5B, node B acquires images of the sample object at times 1, 2, 3, 4 and 5, and the image corresponding to each of these times forms one second sample frame. The position of the sample object in each second sample frame of node B is then determined; the position of the sample object in each second sample frame is the second sample position corresponding to that frame.
In step 403: a first set of training samples is constructed from the first sample positions and the second sample positions of each second sample frame.
In summary, the first training sample includes a first sample position of the sample object in the first sample frame collected by the first monitoring node, and a second sample position of the sample object in the second sample frame collected by each second monitoring node, where the collection time of the first sample frame is the same as that of each second sample frame. For example, taking fig. 5A and 5B, the first training sample includes the position of the sample object in the image acquired by monitoring node A at time 1, the position of the sample object in the image acquired by monitoring node B at time 1, the position of the sample object in the image acquired by monitoring node C at time 1, and the position of the sample object in the image acquired by monitoring node D at time 1.
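The construction of steps 401 to 403 can be sketched as follows; the node names, the frame containers and the position_of() helper are illustrative assumptions, and only the (x, y, w, h) position convention comes from this application.
```python
def build_first_training_set(frames_a, frames_bcd, position_of):
    """frames_a: {time: frame collected by the first node A}
    frames_bcd: {node_name: {time: frame}} for second nodes B, C, D
    position_of(frame) -> (x, y, w, h) of the sample object in that frame."""
    samples = []
    for t, frame_a in frames_a.items():
        first_pos = position_of(frame_a)                        # first sample position
        second_pos = {node: position_of(frames[t])              # second sample positions,
                      for node, frames in frames_bcd.items()}   # same acquisition time t
        samples.append({"input": first_pos, "target": second_pos})
    return samples
```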
In step 302: and inputting the first sample position to the relational mapping model, and outputting the training relational mapping model by taking the second sample position of each second sample frame as the expected output.
For example, with four monitoring nodes sharing a common shooting area, as shown in fig. 5C, the relational mapping model is a one-input, three-output model: the position of the target object in one monitoring node is input, and the positions of the target object in the other three monitoring nodes are output. In implementation, the relational mapping model may be a deep neural network (DNN) model.
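As an illustration only, a one-input, three-output DNN of this kind might be sketched in PyTorch as below; the layer sizes and activation choices are assumptions, since this application only states that a DNN model may be used.
```python
import torch
from torch import nn

class RelationMappingModel(nn.Module):
    def __init__(self, num_second_nodes: int = 3, hidden: int = 64):
        super().__init__()
        self.num_second_nodes = num_second_nodes
        self.net = nn.Sequential(
            nn.Linear(4, hidden),                      # input: (x, y, w, h) in the first node
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 4 * num_second_nodes),   # one (x, y, w, h) per second node
        )

    def forward(self, first_pos: torch.Tensor) -> torch.Tensor:
        out = self.net(first_pos)
        # reshape to (batch, num_second_nodes, 4): one box per second monitoring node
        return out.view(-1, self.num_second_nodes, 4)
```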
As shown in fig. 6A, the first sample positions of the sample object collected by monitoring node A and the second sample positions collected by monitoring nodes B, C and D are used to train the relational mapping model, with the second sample positions from nodes B, C and D as the expected output. Further, so that the relational mapping model can, in use, map the position of the target object in any monitoring node from its position in another monitoring node, after the model has been trained with node A's first sample positions and nodes B, C and D's second sample positions, training can continue with nodes B, C and D each taking the role of the first monitoring node in turn and the remaining nodes acting as second monitoring nodes. For example: first, node A is used as the first monitoring node and nodes B, C and D as the second monitoring nodes, and the relational mapping model is trained with node A's first sample positions and nodes B, C and D's second sample positions; then node B is used as the first monitoring node and nodes A, C and D as the second monitoring nodes, and the relational mapping model is trained with node B's first sample positions and nodes A, C and D's second sample positions.
Finally, fig. 6B shows an internal schematic diagram of the relational mapping model after training has converged; the scene has 4 monitoring nodes and each of them can serve as the first monitoring node. Training convergence means, for example, that the number of training iterations reaches a specified number, or that the position precision output by the position relation mapping model reaches a specified precision.
In the embodiment of the present application, data generated by the relational mapping model inevitably has a slight error, and in order to further improve the accuracy of the relational mapping model, the further training of the relational mapping model in the present application may be specifically implemented as the steps shown in fig. 7:
in step 701: acquiring generation position information, which is output by the relational mapping model based on the first sample position and aims at each second sample position, and constructing a second training sample, wherein the second training sample comprises the first sample position and each generation position information;
in step 702: determining the intersection ratio of each generated position information and the corresponding second sample position;
In some embodiments, the intersection ratio (intersection over union, IoU) may be determined using equation 1:

$$\mathrm{IOU} = \frac{\left|P_b' \cap P_b\right|}{\left|P_b' \cup P_b\right|} \qquad (1)$$

where $P_b'$ is the generated position information, $P_b$ is the second sample position, and IOU is their intersection ratio.
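For reference, equation 1 can be computed for axis-aligned boxes given as (x, y, w, h), the convention used throughout this application; the sketch below is illustrative and the helper name is an assumption.
```python
def iou(box_a, box_b):
    """Intersection over union of a generated box and the corresponding second sample position."""
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    # width and height of the overlap rectangle (zero if the boxes do not overlap)
    ix = max(0.0, min(xa + wa, xb + wb) - max(xa, xb))
    iy = max(0.0, min(ya + ha, yb + hb) - max(ya, yb))
    inter = ix * iy
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0
```
For example, iou((0, 0, 10, 10), (5, 5, 10, 10)) evaluates to 25/175, roughly 0.143.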
In step 703: if the intersection ratio of each generated position information and the corresponding second sample position is greater than or equal to a first preset value, constructing a third training sample by using the generated position information and the first sample position;
If the intersection ratio is greater than or equal to the first preset value, the generated position information produced this time is sufficiently accurate and can serve as a high-quality sample; it is therefore used, together with the first sample position, to construct a third training sample in the present application.
In step 704: and training the relational mapping model by using a third training sample.
For example: monitoring nodes A, B, C and D cover the same common area, with node A as the first monitoring node and nodes B, C and D as the second monitoring nodes. The first training sample includes the positions of the sample object at times 1, 2, 3, 4 and 5 in the second sample frames of nodes B, C and D (the second sample positions) and the positions of the sample object at times 1, 2, 3, 4 and 5 in the first sample frames of node A (the first sample positions). If, for the second sample frames at time 2 of nodes B, C and D, the intersection ratio between each second sample position and the corresponding generated position information is greater than or equal to the first preset value, the third training sample is: the position of the sample object at time 2 in the second sample frames of nodes B, C and D, and the position of the sample object at time 2 in the first sample frame of node A.
In some embodiments, the sample data consist of coordinate positions of sample objects collected at the same time in different monitoring nodes, while the pedestrians in the monitored scene may differ in height. The collected sample data therefore often do not cover the full range of sample-object heights, so the samples are not rich enough and the trained model contains errors. For this reason, in the present application, in order to further improve the accuracy of the relational mapping model, after the intersection ratio between each piece of generated position information and the corresponding second sample position is determined, the steps shown in fig. 8 may be performed:
in step 801: if the intersection ratio corresponding to any generated position information is smaller than a second preset value, adopting the second sample position and the first sample position to construct a fourth training sample;
In specific implementation, if the intersection ratio is smaller than the second preset value, the mapping at this time is an erroneous mapping. The second sample position and first sample position corresponding to that generated position information therefore need to be collected, and the relational mapping model is trained further using them; however, because there are relatively few samples corresponding to erroneous mappings, step 802 is performed to expand the sample set.
In step 802: combining the third training sample and the fourth training sample into a fifth training sample;
in step 803: and training the relational mapping model by adopting a fifth training sample.
For example: monitoring nodes A, B, C and D cover the same common area, with node A as the first monitoring node and nodes B, C and D as the second monitoring nodes. The third position information of the sample object at times 1, 2, 3, 4 and 5 in the second sample frames of nodes B, C and D is first determined with a target detection algorithm. The generated position information for the sample object at times 1, 2, 3, 4 and 5 in the second sample frames of nodes B, C and D is then determined from the second training sample set. Next, the intersection ratio between the third position information at time 1 of node B and the generated position information corresponding to time 1 of node B is determined and compared with the preset value; if the intersection ratio is smaller than the second preset value, the generated position information produced by the relational mapping model is inaccurate, and the second sample position and first sample position corresponding to that generated position information are therefore collected.
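Steps 701 to 704 and 801 to 803 can be sketched as follows: accurate generated positions are kept as extra (third) samples, and the real positions of mis-mapped cases are re-collected as (fourth) samples. The threshold values and the iou() argument are illustrative assumptions; concrete preset values are not given in this application.
```python
def build_extra_samples(first_pos, generated, second_pos, iou,
                        first_preset=0.7, second_preset=0.3):
    """generated / second_pos: {node_name: (x, y, w, h)} for each second node."""
    third, fourth = [], []
    ious = {node: iou(generated[node], second_pos[node]) for node in generated}
    if all(v >= first_preset for v in ious.values()):
        # accurate mapping: the generated positions become a high-quality sample
        third.append({"input": first_pos, "target": generated})
    if any(v < second_preset for v in ious.values()):
        # erroneous mapping: keep the real second sample positions for retraining
        fourth.append({"input": first_pos, "target": second_pos})
    fifth = third + fourth   # combined set used for further training (step 802)
    return third, fourth, fifth
```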
In the present application, since the process of training the relational mapping model with each sample is the same, for ease of understanding the following describes in detail, as an example, the process of inputting the first sample position into the relational mapping model and training the model with the second sample position of each second sample frame as the expected output:
First, the first sample position is input into the relational mapping model, with the second sample position of each second sample frame taken as the expected output, and the generated position information output by the model for each second sample position is obtained; then a loss value is determined from the generated position information and the second sample positions; finally, the parameters of the relational mapping model are adjusted based on the loss value.
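Under the same assumptions as the model sketch earlier, a single training step might look as follows; the smooth L1 loss and the Adam optimizer are illustrative choices, as this application only requires a loss value computed from the generated position information and the second sample positions.
```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 12))  # 3 second nodes x 4
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.SmoothL1Loss()   # assumed loss between generated and expected positions

def training_step(first_pos, second_pos):
    """first_pos: (batch, 4); second_pos: (batch, 12), the expected output."""
    generated = model(first_pos)              # generated position information
    loss = criterion(generated, second_pos)   # loss value from generated vs. second sample positions
    optimizer.zero_grad()
    loss.backward()                           # parameter adjustment based on the loss value
    optimizer.step()
    return loss.item()
```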
In conclusion, the position of the target object in each second monitoring node is obtained from its position information in the first monitoring node and the relation mapping model; no calculation, extraction or comparison of human-body or face features is needed, so the computational cost is greatly reduced and efficiency is improved.
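For completeness, the final extraction step (step 203) can be sketched as a simple crop of the second node's frame at the mapped position, with no feature extraction or comparison; the numpy frame layout (H, W, C array) is an assumption made for illustration.
```python
import numpy as np

def extract_target(second_frame: np.ndarray, second_pos):
    """second_pos is (x, y, w, h) in the second monitoring node's image coordinates."""
    x, y, w, h = (int(round(v)) for v in second_pos)
    h_img, w_img = second_frame.shape[:2]
    x0, y0 = max(0, x), max(0, y)                   # clamp the rectangular frame to the image
    x1, y1 = min(w_img, x + w), min(h_img, y + h)
    return second_frame[y0:y1, x0:x1]
```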
As shown in fig. 9, based on the same inventive concept, a cross-shot target object extracting apparatus 900 is proposed, which includes:
a first position information obtaining module 9001, configured to obtain first position information of a target object in a first image, where the first image is an image collected by the first monitoring node;
a second location information acquiring module 9002, configured to input the first location information to a location relationship mapping model, so as to obtain second location information of the target object output by the location relationship mapping model in second images acquired by each second monitoring node, respectively;
an extracting module 9003, configured to, for any of the second monitoring nodes, extract an image of the target object based on second position information of the target object in the second image acquired by the second monitoring node.
In some possible embodiments, the following is performed separately for any of the at least one second monitoring node:
constructing a first training sample, wherein the first training sample comprises a first sample position of a sample object in a first sample frame acquired by the first monitoring node and a second sample position of the sample object in a second sample frame acquired by each second monitoring node, and the acquisition time of the first sample frame is the same as that of each second sample frame;
inputting the first sample position to the relational mapping model, and training the relational mapping model with the second sample position of each of the second sample frames as a desired output.
In some possible embodiments, the second position information obtaining module, when executing the input of the first sample position to the relational mapping model and training the relational mapping model with the second sample position of each of the second sample frames as the expected output, is configured to:
inputting the first sample position into the relational mapping model, and taking the second sample position of each second sample frame as expected output to obtain generated position information output by the relational mapping model and aiming at each second sample position;
determining a loss value according to the generation position information and the second sample position;
and carrying out parameter adjustment on the relation mapping model based on the loss value.
In some possible embodiments, the apparatus further comprises:
a second training sample construction module, configured to obtain generation position information, which is output by the relational mapping model based on the first sample position and is respectively for each second sample position, and construct a second training sample, where the second training sample includes the first sample position and each generation position information;
the intersection ratio determining module is used for determining the intersection ratio of each piece of generated position information and the corresponding second sample position;
a third training sample construction module, configured to construct a third training sample by using the generated position information and the first sample position if an intersection ratio of each generated position information to the corresponding second sample position is greater than or equal to a first preset value;
and the training module is used for training the relational mapping model by adopting the third training sample.
In some possible embodiments, after determining the intersection ratio of each of the generated position information and the corresponding second sample position, the intersection ratio determining module is further configured to:
if the intersection ratio corresponding to any generated position information is smaller than a second preset value, adopting the second sample position and the first sample position to construct a fourth training sample;
combining the third training sample and the fourth training sample into a fifth training sample;
and training the relational mapping model by adopting a fifth training sample.
In some possible embodiments, the first position information acquiring module, when executing acquiring the first position information of the target object in the first image, is configured to:
and carrying out target detection on a target object in the first image to obtain position information of the target object in the first image, wherein the position information comprises vertex coordinates of a rectangular frame and length information and width information of the rectangular frame.
In some possible embodiments, the first position information obtaining module, when performing target detection on a target object in the first image, is configured to: and carrying out target detection on the target object in the first image by adopting a human face detection method or a human body detection method.
Having described the cross-shot target object extraction method and apparatus of the exemplary embodiments of the present application, an electronic device according to another exemplary embodiment of the present application is next described.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
In some possible implementations, an electronic device according to the present application may include at least one processor, and at least one memory. Wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the cross-shot target object extraction method according to various exemplary embodiments of the present application described above in the present specification.
The electronic apparatus 130 according to this embodiment of the present application is described below with reference to fig. 10. The electronic device 130 shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 10, the electronic device 130 is represented in the form of a general electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131).
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, the aspects of a cross-shot target object extraction method provided by the present application may also be implemented in the form of a program product including program code for causing a computer device to perform the steps of a cross-shot target object extraction method according to various exemplary embodiments of the present application described above in this specification when the program product is run on the computer device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for cross-shot target object extraction of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the consumer electronic device, partly on the consumer electronic device, as a stand-alone software package, partly on the consumer electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of remote electronic devices, the remote electronic devices may be connected to the consumer electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external electronic device (e.g., through the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functions of two or more units described above may be embodied in one unit, according to embodiments of the application. Conversely, the features and functions of one unit described above may be further divided into embodiments by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A cross-shot target object extraction method, wherein a first monitoring node and at least one second monitoring node have a common monitoring area, the method comprising:
acquiring first position information of a target object in a first image, wherein the first image is an image acquired by the first monitoring node;
inputting the first position information into a position relation mapping model to obtain second position information, output by the position relation mapping model, of the target object in second images respectively acquired by the second monitoring nodes;
and for any second monitoring node, extracting an image of the target object based on the second position information of the target object in the second image acquired by the second monitoring node.
2. The method according to claim 1, characterized in that the following is performed for the at least one second monitoring node:
constructing a first training sample, wherein the first training sample comprises a first sample position of a sample object in a first sample frame acquired by the first monitoring node and a second sample position of the sample object in a second sample frame acquired by each second monitoring node, and the acquisition time of the first sample frame is the same as that of each second sample frame;
inputting the first sample position to the relational mapping model, and training the relational mapping model with the second sample position of each of the second sample frames as a desired output.
3. The method of claim 2, wherein inputting the first sample position to the relational mapping model, training the relational mapping model with the second sample position of each of the second sample frames as a desired output, comprises:
inputting the first sample position into the relational mapping model, and taking the second sample position of each second sample frame as the expected output, to obtain generated position information output by the relational mapping model for each second sample position;
determining a loss value according to the generated position information and the second sample position;
and adjusting the parameters of the relational mapping model based on the loss value.
4. The method of claim 2, further comprising:
acquiring generated position information, which is output by the relational mapping model based on the first sample position, for each second sample position, and constructing a second training sample, wherein the second training sample comprises the first sample position and each piece of generated position information;
determining an intersection-over-union ratio between each piece of generated position information and the corresponding second sample position;
if the intersection-over-union ratio between each piece of generated position information and the corresponding second sample position is greater than or equal to a first preset value, constructing a third training sample by using the generated position information and the first sample position;
and training the relational mapping model by using the third training sample.
5. The method of claim 4, wherein after determining the intersection-over-union ratio between each piece of generated position information and the corresponding second sample position, the method further comprises:
if the intersection-over-union ratio corresponding to any piece of generated position information is smaller than a second preset value, constructing a fourth training sample by using each second sample position and the first sample position;
combining the third training sample and the fourth training sample into a fifth training sample;
and training the relational mapping model by using the fifth training sample.
6. The method of any one of claims 1-5, wherein acquiring the first position information of the target object in the first image comprises:
performing target detection on the target object in the first image to obtain the position information of the target object in the first image, wherein the position information comprises vertex coordinates of a rectangular frame, and length information and width information of the rectangular frame.
7. The method of claim 6, wherein performing target detection on the target object in the first image comprises:
performing target detection on the target object in the first image by using a face detection method or a human body detection method.
8. A cross-shot target object extraction apparatus, wherein a first monitoring node and at least one second monitoring node have a common monitoring area, the apparatus comprising:
the first position information acquisition module is used for acquiring first position information of a target object in a first image, and the first image is an image acquired by the first monitoring node;
the second position information acquisition module is used for inputting the first position information into a position relation mapping model to obtain second position information, output by the position relation mapping model, of the target object in second images respectively acquired by the second monitoring nodes;
and the extraction module is used for extracting, for any second monitoring node, an image of the target object based on the second position information of the target object in the second image acquired by the second monitoring node.
9. An electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
10. A computer storage medium, characterized in that the computer storage medium stores a computer program for causing a computer to perform the method of any one of claims 1-7.
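
As an aid to reading the claims above, the following is a minimal, hypothetical Python sketch of the flow of claims 1, 6 and 7. A small regression network stands in for the position relation mapping model: it maps one rectangular frame (vertex coordinates plus length and width) detected by the first monitoring node to a predicted frame for each second monitoring node, and the target object is then cropped from each second image. The class and function names, the multilayer-perceptron architecture, and the use of PyTorch are assumptions made only for illustration; the application does not disclose a particular model structure.

```python
# Illustrative sketch only; the MLP architecture, PyTorch, and all identifiers
# below are assumptions, not the implementation disclosed in the application.
import torch
import torch.nn as nn


class PositionRelationMappingModel(nn.Module):
    """Maps a box (x, y, w, h) seen by the first monitoring node to a predicted
    box for each of the N second monitoring nodes."""

    def __init__(self, num_second_nodes: int, hidden_dim: int = 64):
        super().__init__()
        self.num_second_nodes = num_second_nodes
        self.net = nn.Sequential(
            nn.Linear(4, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 4 * num_second_nodes),
        )

    def forward(self, first_position: torch.Tensor) -> torch.Tensor:
        # first_position: (batch, 4) -> (batch, num_second_nodes, 4)
        return self.net(first_position).view(-1, self.num_second_nodes, 4)


def extract_target_crops(second_images, second_positions):
    """Crops the target object from each second node's frame (last step of claim 1).

    second_images:    list of H x W x C frames, one per second monitoring node.
    second_positions: iterable of (x, y, w, h) boxes predicted by the model.
    """
    crops = []
    for image, (x, y, w, h) in zip(second_images, second_positions):
        x, y, w, h = int(x), int(y), int(w), int(h)
        crops.append(image[y:y + h, x:x + w])
    return crops
```

In this sketch the first position information may come from any face detection or human body detection method (claim 7), provided it yields the vertex coordinates and the length and width information of a rectangular frame (claim 6); no feature extraction or comparison is performed on the second images themselves.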
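
The sample-selection rule of claims 4 and 5 can be pictured with a similarly hedged helper. The overlap measure and the two thresholds, which stand in for the first preset value and the second preset value, are assumptions chosen for illustration; the claims only require comparing each piece of generated position information with the corresponding second sample position.

```python
# Hedged sketch of the sample construction in claims 4 and 5; thresholds and
# data layout are illustrative assumptions.
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def build_extra_training_samples(first_pos, generated, second_positions,
                                 first_preset=0.9, second_preset=0.5):
    """Returns additional (input, expected output) pairs for further training.

    generated:        boxes output by the mapping model, one per second node.
    second_positions: ground-truth boxes for the same second sample frames.
    """
    ious = [iou(g, t) for g, t in zip(generated, second_positions)]
    samples = []
    # Claim 4: every generated box overlaps its ground truth closely enough,
    # so the generated boxes themselves serve as the expected output (third sample).
    if all(v >= first_preset for v in ious):
        samples.append((first_pos, list(generated)))
    # Claim 5: at least one generated box is too far off, so fall back to the
    # true boxes (fourth sample); with the third sample this forms the fifth sample.
    if any(v < second_preset for v in ious):
        samples.append((first_pos, list(second_positions)))
    return samples
```

Training then follows claim 3: the first sample position is fed to the model, a regression loss is computed between the model output and the expected boxes, and the model parameters are adjusted from the resulting loss value.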
CN202111633193.0A 2021-12-29 2021-12-29 Cross-shot target object extraction method and device, electronic equipment and storage medium Pending CN114267018A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111633193.0A CN114267018A (en) 2021-12-29 2021-12-29 Cross-shot target object extraction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111633193.0A CN114267018A (en) 2021-12-29 2021-12-29 Cross-shot target object extraction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114267018A true CN114267018A (en) 2022-04-01

Family

ID=80831257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111633193.0A Pending CN114267018A (en) 2021-12-29 2021-12-29 Cross-shot target object extraction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114267018A (en)

Similar Documents

Publication Publication Date Title
TWI795667B (en) A target tracking method, device, system, and computer accessible storage medium
US11379696B2 (en) Pedestrian re-identification method, computer device and readable medium
WO2022170742A1 (en) Target detection method and apparatus, electronic device and storage medium
EP3445044A1 (en) Video recording method, server, system, and storage medium
US8442307B1 (en) Appearance augmented 3-D point clouds for trajectory and camera localization
US11538286B2 (en) Method and apparatus for vehicle damage assessment, electronic device, and computer storage medium
CN112488073A (en) Target detection method, system, device and storage medium
CN107886048A (en) Method for tracking target and system, storage medium and electric terminal
CN116188821B (en) Copyright detection method, system, electronic device and storage medium
CN110826594A (en) Track clustering method, equipment and storage medium
CN102231820B (en) Monitoring image processing method, device and system
JPWO2014199505A1 (en) Video surveillance system, surveillance device
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
CN109902681B (en) User group relation determining method, device, equipment and storage medium
CN114898416A (en) Face recognition method and device, electronic equipment and readable storage medium
CN113989696A (en) Target tracking method and device, electronic equipment and storage medium
CN113239792A (en) Big data analysis processing system and method
KR101942646B1 (en) Feature point-based real-time camera pose estimation method and apparatus therefor
US20220172377A1 (en) Method, apparatus, computing device and computer-readable storage medium for correcting pedestrian trajectory
Gao et al. Multi-object tracking with Siamese-RPN and adaptive matching strategy
CN113010731B (en) Multimodal video retrieval system
CN112183431A (en) Real-time pedestrian number statistical method and device, camera and server
CN114267018A (en) Cross-shot target object extraction method and device, electronic equipment and storage medium
CN113869163B (en) Target tracking method and device, electronic equipment and storage medium
CN110849380A (en) Map alignment method and system based on collaborative VSLAM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination