CN114283454A - Training method of position relation mapping model and related device - Google Patents

Training method of position relation mapping model and related device

Info

Publication number: CN114283454A
Authority: CN (China)
Prior art keywords: sample, monitoring node, monitoring, training, position information
Legal status: Pending (an assumption, not a legal conclusion)
Application number: CN202111639017.8A
Other languages: Chinese (zh)
Inventors: 夏凤君, 汪昊, 周斌
Current Assignee: Chongqing Unisinsight Technology Co Ltd
Original Assignee: Chongqing Unisinsight Technology Co Ltd
Application filed by: Chongqing Unisinsight Technology Co Ltd
Priority: CN202111639017.8A
Publication: CN114283454A

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a training method and a related device for a position relationship mapping model, used to provide multi-lens dimensional information about a captured human body. In the embodiment of the application, a position relationship mapping model can be established for a common shooting area covered by a plurality of monitoring nodes; after a target object is captured by any one monitoring node, the position of the target object in the images of the other monitoring nodes covering the common shooting area can be obtained from its position in that node's image. Because the position relationship mapping model is trained on images collected by monitoring nodes distributed in the same target scene, the trained model can accurately derive the position of the target object in the other monitoring nodes from its position in one monitoring node; the method is simple to operate and easy to implement.

Description

Training method of position relation mapping model and related device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and a related apparatus for training a position relationship mapping model.
Background
Supported by technologies such as AI and big data, the security industry has developed rapidly across hundreds of sectors, and its functions have become more diversified, complex, and user-oriented. When abnormal situations occur, single-dimensional human-body feature acquisition cannot meet users' requirements.
In the related art, the front end uses several independent cameras, such as a face-capture camera and a high-definition structured camera, to acquire target features, and a back-end software platform associates human-body features across multiple scenes. However, within the same time and space, the angles and dimensions of the collected target features are generally single and limited, and synchronization among multiple face-capture cameras is difficult to achieve.
Disclosure of Invention
The application aims to provide a training method and a related device for a position relationship mapping model, intended to provide multi-lens dimensional information about a captured human body.
In a first aspect, an embodiment of the present application provides a method for training a position relationship mapping model, where the position relationship mapping model is applied to a target scene, the target scene includes a plurality of monitoring nodes, and the monitoring nodes are synchronized in time, and the method includes:
controlling the plurality of monitoring nodes to synchronously acquire videos; wherein the sample object moves within a common monitoring range of the plurality of monitoring nodes;
screening an image sequence containing a sample object from a video acquired by each monitoring node, and extracting position information of the sample object in each frame of image of the image sequence;
determining position information of sample objects acquired by each monitoring node at the same time to construct a first training sample based on the time stamp of each frame of image; the first training sample comprises a first sample position of a sample object in an image acquired by a first monitoring node and a second sample position of the sample object in an image acquired by each second monitoring node, wherein the first monitoring node is any monitoring node in the plurality of monitoring nodes, and the second monitoring node is a monitoring node except the first monitoring node in the plurality of monitoring nodes;
and training the position relation mapping model by adopting the first training sample.
In the application, the position relationship mapping model is trained on images acquired by a plurality of monitoring nodes distributed in the same target scene, so that the trained model can accurately derive the position of the target object in the other monitoring nodes from its position in one monitoring node; the method is simple to operate and easy to implement.
In some possible embodiments, the screening out, from the video acquired by the monitoring node, of an image sequence including the sample object includes:
performing target detection on each frame of image in the video collected by the monitoring node aiming at the sample object, and screening out the image containing the sample object;
and sequencing the screened images according to the time stamps of the images to obtain the image sequence.
In this method, screening the images that contain the sample object out of the captured frames further improves efficiency and avoids wasting resources on processing useless images in subsequent steps.
In some possible embodiments, the method further comprises:
for any one monitoring node, determining, based on the position information of all sample objects included in the video acquired by the monitoring node, the ratio of the sampling range corresponding to the acquired position information to the monitoring range;
and if the ratio is smaller than a preset ratio threshold, continuing to acquire video.
In the application, determining the proportion of the monitoring range covered by the collected position information verifies the comprehensiveness of the collected samples, making the trained position relationship mapping model more accurate.
In some possible embodiments, the determining, based on the location information of all sample objects included in the video acquired by the monitoring node, a ratio of a sampling range corresponding to the acquired location information to the common monitoring range includes:
determining a convex hull containing the collected position information and a sampling position density in the convex hull;
and if the sampling position density is greater than a preset density threshold, determining the ratio of the convex hull area to the common monitoring range.
In the application, the convex hull of the collected position information can be used to determine its proportion of the common monitoring range, further improving the accuracy of judging how comprehensive the collected samples are.
In some possible embodiments, the method further comprises:
acquiring the generated position information that the position relationship mapping model outputs, based on the first sample position, for each second sample position, and constructing a second training sample, where the second training sample includes the first sample position and each piece of generated position information;
determining Euclidean distances between each of the generated position information and the corresponding second sample position;
if the Euclidean distance between each piece of generated position information and the corresponding second sample position is larger than or equal to a first preset value, constructing a third training sample by using the generated position information and the first sample position;
and training the position relation mapping model by adopting the third training sample.
In some possible embodiments, after determining the euclidean distance between each of the generated location information and the corresponding second sample location, the method further comprises:
if the Euclidean distance corresponding to any generated position information is smaller than a second preset value, adopting the second sample position and the first sample position to construct a fourth training sample;
combining the third training sample and the fourth training sample into a fifth training sample;
and training the position relation mapping model by adopting a fifth training sample.
In some possible embodiments, the training the position relationship mapping model using a fifth training sample includes:
inputting the generated position information in the fifth training sample into the position relation mapping model, and taking the second sample position in the fifth training sample as expected output to obtain the position information output by the position relation mapping model corresponding to the generated position information;
determining a Euclidean distance between the output position information and the second sample position;
determining a loss value according to the Euclidean distance;
and carrying out parameter adjustment on the position relation mapping model based on the loss value.
In a second aspect, the present application further provides a distributed multi-lens camera applied to a target scene, including a plurality of monitoring nodes, an integration module, and a master control module, where the monitoring nodes are time-synchronized and have a common monitoring range, wherein:
the master control module is used for responding to an acquisition instruction and sending a clock synchronization signal to each monitoring node;
each monitoring node is used for responding to a clock synchronization signal sent by the main control module to synchronously acquire a video to obtain a serialized signal;
the integration module is used for controlling the monitoring nodes to expose synchronously, deserializing the serialized signals sent by the monitoring nodes, and sending the deserialized videos to the master control module;
the main control module is also used for screening out an image sequence containing a sample object from the video sent by the integration module, extracting the position information of the sample object in each frame of image of the image sequence, and determining the position information of the sample object collected by each monitoring node at the same time to construct a first training sample based on the timestamp of each frame of image; the first training sample comprises a first sample position of a sample object in an image acquired by a first monitoring node and a second sample position of the sample object in an image acquired by each second monitoring node, wherein the first monitoring node is any monitoring node in the plurality of monitoring nodes, and the second monitoring node is a monitoring node except the first monitoring node in the plurality of monitoring nodes; and training the position relation mapping model by adopting the first training sample.
In a third aspect, the present application further provides a device for training a position relationship mapping model, where the position relationship mapping model is applied to a target scene, the target scene includes a plurality of monitoring nodes, and the monitoring nodes are synchronized in time, and the device includes:
the acquisition module is used for controlling the plurality of monitoring nodes to synchronously acquire videos; wherein the sample object moves within a common monitoring range of the plurality of monitoring nodes;
the extraction module is used for screening out an image sequence containing a sample object from videos collected by the monitoring nodes aiming at each monitoring node and extracting position information of the sample object in each frame of image of the image sequence;
the sample construction module is used for determining the position information of the sample object acquired by each monitoring node at the same time to construct a first training sample based on the time stamp of each frame of image; the first training sample comprises a first sample position of a sample object in an image acquired by a first monitoring node and a second sample position of the sample object in an image acquired by each second monitoring node, wherein the first monitoring node is any monitoring node in the plurality of monitoring nodes, and the second monitoring node is a monitoring node except the first monitoring node in the plurality of monitoring nodes;
and the first training module is used for training the position relation mapping model by adopting the first training sample.
In some possible embodiments, when screening out, from the video collected by the monitoring node, the image sequence including the sample object, the extraction module is configured to:
performing target detection on each frame of image in the video collected by the monitoring node aiming at the sample object, and screening out the image containing the sample object;
and sequencing the screened images according to the time stamps of the images to obtain the image sequence.
In some possible embodiments, the apparatus further comprises:
the proportion determining module is used for determining, for any one monitoring node, the ratio of the sampling range corresponding to the acquired position information relative to the monitoring range according to the position information of all sample objects included in the video acquired by the monitoring node;
and the video acquisition module is used for continuing to acquire video if the ratio is smaller than a preset ratio threshold.
In some possible embodiments, the occupation ratio determining module, when determining the occupation ratio of the sampling range corresponding to the acquired position information with respect to the common monitoring range based on the position information of all the sample objects included in the video acquired by the monitoring node, is configured to:
determining a convex hull containing the collected position information and a sampling position density in the convex hull;
and if the sampling position density is greater than a preset density threshold, determining the ratio of the convex hull area to the common monitoring range.
In some possible embodiments, the apparatus further comprises:
an obtaining module, configured to obtain the generated position information that the position relationship mapping model outputs, based on the first sample position, for each second sample position, and to construct a second training sample, where the second training sample includes the first sample position and each piece of generated position information;
the Euclidean distance determining module is used for determining the Euclidean distance between each piece of generated position information and the corresponding second sample position;
a third training sample construction module, configured to construct a third training sample by using the generated position information and the first sample position if the euclidean distance between each generated position information and the corresponding second sample position is greater than or equal to a first preset value;
and the second training module is used for training the position relation mapping model by adopting the third training sample.
In some possible embodiments, after determining the euclidean distance between each of the generated location information and the corresponding second sample location, the euclidean distance determining module is further configured to:
if the Euclidean distance corresponding to any generated position information is smaller than a second preset value, adopting the second sample position and the first sample position to construct a fourth training sample;
combining the third training sample and the fourth training sample into a fifth training sample;
and training the position relation mapping model by adopting a fifth training sample.
In some possible embodiments, the euclidean distance determining module, when performing the training of the position relationship mapping model using a fifth training sample, is configured to:
inputting the generated position information in the fifth training sample into the position relation mapping model, and taking the second sample position in the fifth training sample as expected output to obtain the position information output by the position relation mapping model corresponding to the generated position information;
determining a Euclidean distance between the output position information and the second sample position;
determining a loss value according to the Euclidean distance;
and carrying out parameter adjustment on the position relation mapping model based on the loss value.
In a fourth aspect, another embodiment of the present application further provides an electronic device, including at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform any one of the methods provided by the embodiments of the first aspect of the present application.
In a fifth aspect, another embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program is configured to cause a computer to execute any one of the methods provided in the embodiment of the first aspect of the present application.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is an application scenario diagram of the position relationship mapping model training method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a common shooting area in the position relationship mapping model training method according to an embodiment of the present application;
fig. 3 is a flowchart of training the relationship mapping model in the position relationship mapping model training method according to an embodiment of the present application;
fig. 4 is a flowchart of constructing the first training sample set in the position relationship mapping model training method provided in the embodiment of the present application;
fig. 5A is a schematic diagram of a common area in the position relationship mapping model training method according to an embodiment of the present application;
fig. 5B is a schematic diagram of determining the first sample frame position of a sample object at a monitoring node in the position relationship mapping model training method provided in the embodiment of the present application;
fig. 5C is a schematic diagram of the relationship mapping model in the position relationship mapping model training method according to an embodiment of the present application;
fig. 6A is a schematic diagram of the input and output of the relationship mapping model according to an embodiment of the present application;
fig. 6B is an internal schematic diagram of the relationship mapping model according to an embodiment of the present application;
fig. 7 is a schematic diagram of training the relationship mapping model according to an embodiment of the present application;
fig. 8 is a further schematic diagram of training the relationship mapping model according to an embodiment of the present application;
fig. 9 is an overall flowchart of the position relationship mapping model training method according to an embodiment of the present application;
fig. 10 is a flowchart of the sample object screening method in the position relationship mapping model training method according to an embodiment of the present application;
fig. 11 is a schematic diagram of the sample object screening method according to an embodiment of the present application;
fig. 12 is a schematic diagram of determining the sampling range according to an embodiment of the present application;
fig. 13 is a schematic diagram of the distributed multi-lens camera according to an embodiment of the present application;
fig. 14 is a schematic diagram of the training apparatus according to an embodiment of the present application;
fig. 15 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
It is noted that the terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The inventor has found through research that, supported by technologies such as AI and big data, the security industry has developed rapidly across all fields, and its functions have become more diversified, complex, and user-oriented. The higher demands on artificial intelligence and the bottlenecks faced by current intelligent algorithms have become a principal contradiction in today's security industry, which places higher requirements on the acquisition side: more complex functions require more data, where "more" refers not only to quantity but also to dimensionality. Single-dimensional human-feature acquisition cannot meet the requirements of security users.
In the related art, the front end uses several independent cameras, such as a face-capture camera and a high-definition structured camera, to acquire target features, and a back-end software platform associates human-body features across multiple scenes. However, within the same time and space, the angles and dimensions of the target features that can be collected are generally single and limited. In security services such as portrait clustering, the environment, lighting, shooting angles of the collected pictures, target clothing changes, and the like make the image algorithms and feature comparison used for human-feature extraction very demanding, and synchronization among multiple face-capture cameras is difficult to achieve.
In view of the above, the present application provides a training method and a related apparatus for a position relationship mapping model to solve the above problems. The inventive concept of the present application can be summarized as follows: for a common shooting area covered by a plurality of monitoring nodes, a position relationship mapping model can be established; after a target object is captured by any one monitoring node, the position of the target object in the images of the other monitoring nodes covering the common shooting area can be obtained from its position in that node's image.
For convenience of understanding, the following describes in detail a training method of a position relationship mapping model provided in an embodiment of the present application with reference to the accompanying drawings:
taking an example that four monitoring nodes have a common shooting area, as shown in fig. 1, an application scenario diagram of a training method of a position relationship mapping model in the embodiment of the present application is shown. The figure includes: the position relation mapping model provided by the embodiment of the application is built in the distributed multi-lens camera 20; the monitoring system comprises a memory 30, a first monitoring node 40 and a second monitoring node set 50, wherein the second monitoring node set comprises a second monitoring node 501, a second monitoring node 502 and a second monitoring node 503, the first monitoring node is any monitoring node in a plurality of monitoring nodes, and the second monitoring node is a monitoring node in the plurality of monitoring nodes except the first monitoring node 40; the first monitoring node 40 and the second monitoring node 501, the second monitoring node 502, and the second monitoring node 503 have a common shooting area and a plurality of monitoring nodes are time-synchronized, wherein the common shooting area is as shown in fig. 2, and wherein:
the distributed multi-lens camera 20 firstly controls a plurality of monitoring nodes to synchronously acquire videos; wherein the sample object moves within a common monitoring range of a plurality of monitoring nodes; for each monitoring node, screening an image sequence containing a sample object from a video acquired by the monitoring node, and extracting position information of the sample object in each frame of image of the image sequence; determining position information of sample objects acquired by each monitoring node at the same time to construct a first training sample based on the time stamp of each frame of image; the first training sample includes a first sample position of the sample object in the image acquired by the first monitoring node 40, and a second sample position of the sample object in the image acquired by each second monitoring node, and the first training sample is used to train the position relationship mapping model.
Only a single distributed multi-lens camera is detailed in the description of the present application, but those skilled in the art will understand that the illustrated distributed multi-lens camera and monitoring nodes are intended to represent the operations of the distributed multi-lens camera, the monitoring nodes, and the memory involved in the technical solution of the present application. The detailed description of a single distributed multi-lens camera, monitoring node, and memory is for convenience of explanation and does not imply any limitation on the number, type, or location of distributed multi-lens cameras, monitoring nodes, and so on. It should be noted that the underlying concepts of the example embodiments of the present application are not altered if additional modules are added to or removed from the illustrated environment. In addition, although fig. 1 shows a bidirectional arrow between the memory 30 and the distributed multi-lens camera 20 for convenience of explanation, those skilled in the art will understand that the above data transmission and reception also need to be implemented through the network 10.
It should be noted that the memory in the embodiment of the present application may be, for example, a cache system, hard disk storage, memory storage, and the like. In addition, the training method of the position relationship mapping model provided in the application is suitable not only for the application scenario shown in fig. 1 but also for any device that needs to train a position relationship mapping model.
In order to facilitate understanding of the training method of the position relationship mapping model provided in the embodiment of the present application, the following first describes the training method of the relationship mapping model in detail, and the training steps of the relationship mapping model are shown in fig. 3:
in step 301: constructing a first training sample set;
in some embodiments, the construction of the first training sample set may be embodied as the steps shown in fig. 4:
in step 401: acquiring a first image which is acquired by a first monitoring node and contains a sample object to form a first sample frame; acquiring a first sample position of the sample object in the first sample frame;
in the embodiments of the present application, for example: as shown in fig. 5A, the monitoring nodes A, B, C, D cover the same public area, the monitoring node a is used as a first monitoring node, the monitoring node B, C, D is used as a second monitoring node, images of the monitoring node A, B, C, D collected at the same time in the course of the sample object moving in the public area are collected, and assuming that the monitoring nodes A, B, C, D collect the images of the sample object at the time 1, 2, 3, 4, and 5, the images collected by the monitoring node a at the time 1, 2, 3, 4, and 5 respectively constitute a first sample frame. As shown in fig. 5B, the position of the sample object in each first sample frame of the monitoring node a is then determined, and the position of the sample object in each sample frame is the corresponding first sample position of the sample frame.
In step 402: for any second monitoring node, acquiring a second image which is acquired by the second monitoring node and contains the sample object and corresponds to the first image to form a second sample frame; acquiring a second sample position of the sample object in a second sample frame;
Continuing with the example of fig. 5A, the images acquired by monitoring nodes B, C, and D respectively form their second sample frames. Taking monitoring node B as an example, the images of the sample object acquired by monitoring node B at times 1, 2, 3, 4, and 5 each constitute a second sample frame. The position of the sample object in each second sample frame of monitoring node B is then determined; the position of the sample object in each second sample frame is the second sample position corresponding to that sample frame.
In step 403: a first set of training samples is constructed from the first sample positions and the second sample positions of each second sample frame.
In summary, the first training sample includes a first sample position of the sample object in the first sample frame collected by the first monitoring node and a second sample position of the sample object in the second sample frame collected by each second monitoring node, where the first sample frame and each second sample frame are collected at the same time. For example, taking figs. 5A and 5B, the first training sample includes the position of the sample object at time 1 in the image acquired by monitoring node A, together with its positions at time 1 in the images acquired by monitoring nodes B, C, and D.
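As a rough illustration of this alignment step, the following is a minimal Python sketch that builds first training samples by matching per-node detections on identical timestamps; the data structures, node names, and example values are illustrative assumptions rather than anything specified by the patent.

```python
def build_first_training_samples(detections, first_node, second_nodes):
    """Align per-node detections on identical timestamps.

    detections: dict mapping node id -> list of (timestamp, position),
    where position is e.g. an (x, y, w, h) box of the sample object.
    Returns a list of (first_sample_position, {node: second_sample_position}).
    """
    # Index each second node's detections by timestamp for direct lookup.
    by_time = {node: dict(detections[node]) for node in second_nodes}
    samples = []
    for t, first_pos in detections[first_node]:
        # Keep only timestamps at which every second node also saw the object.
        if all(t in by_time[node] for node in second_nodes):
            samples.append((first_pos,
                            {node: by_time[node][t] for node in second_nodes}))
    return samples

# Example with the four nodes A, B, C, D from the description:
detections = {
    "A": [(1, (10, 20, 30, 60)), (2, (12, 22, 30, 60))],
    "B": [(1, (40, 25, 28, 58)), (2, (42, 27, 28, 58))],
    "C": [(1, (70, 30, 26, 55))],                      # node C missed time 2
    "D": [(1, (15, 80, 30, 60)), (2, (17, 82, 30, 60))],
}
samples = build_first_training_samples(detections, "A", ["B", "C", "D"])
# Only time 1 yields a sample, since node C has no detection at time 2.
```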
In step 302: inputting the first sample position into the relational mapping model and training the relational mapping model with the second sample position of each second sample frame as the expected output.
For example, with four monitoring nodes sharing a common shooting area, as shown in fig. 5C, the relational mapping model is a one-input, three-output model: the position of the target object in one monitoring node is the input, and the positions of the target object in the other three monitoring nodes are the output. In implementation, the relational mapping model may be a Deep Neural Network (DNN) model.
As shown in fig. 6A, the relational mapping model is trained using the first sample position of the sample object collected by monitoring node A as input, with the second sample positions of the sample object collected by monitoring nodes B, C, and D as the expected output. Further, so that the relational mapping model can, in use, map the position of the target object in any monitoring node from its position in the other monitoring nodes, after training with monitoring node A as the first monitoring node, monitoring nodes B, C, and D can in turn each serve as the first monitoring node, with the remaining nodes as second monitoring nodes, and training of the relational mapping model continues. For example: first, with monitoring node A as the first monitoring node and monitoring nodes B, C, and D as the second monitoring nodes, the relational mapping model is trained using the first sample position collected by monitoring node A and the second sample positions collected by monitoring nodes B, C, and D; then, with monitoring node B as the first monitoring node and monitoring nodes A, C, and D as the second monitoring nodes, the relational mapping model is trained using the first sample position collected by monitoring node B and the second sample positions collected by monitoring nodes A, C, and D.
Finally, fig. 6B shows an internal schematic diagram of the relationship mapping model after training converges; the scene has four monitoring nodes, and each of them can serve as the first monitoring node. Training convergence means, for example, that the number of training iterations reaches a specified count, or that the position precision output by the position relationship mapping model reaches a specified accuracy.
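The description identifies the relational mapping model only as a DNN. The following PyTorch sketch shows one plausible one-input, three-output layout; the (x, y, w, h) box encoding, the layer widths, and the per-node regression heads are all illustrative assumptions, not the patent's specified architecture.

```python
import torch
import torch.nn as nn

class RelationMappingModel(nn.Module):
    """Maps the sample object's box in the first monitoring node to its
    predicted boxes in each of the three second monitoring nodes."""

    def __init__(self, box_dim=4, num_second_nodes=3, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(box_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One regression head per second monitoring node.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, box_dim) for _ in range(num_second_nodes)]
        )

    def forward(self, first_position):
        features = self.backbone(first_position)
        return [head(features) for head in self.heads]

model = RelationMappingModel()
# Position of the target object in monitoring node A -> boxes for B, C, D.
predicted = model(torch.tensor([[10.0, 20.0, 30.0, 60.0]]))
```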
In the embodiment of the present application, the data generated by the relational mapping model inevitably contains small errors. To further improve the accuracy of the relational mapping model, further training may be implemented as the steps shown in fig. 7:
In step 701: acquiring the generated position information that the relational mapping model outputs, based on the first sample position, for each second sample position, and constructing a second training sample, where the second training sample includes the first sample position and each piece of generated position information;
In step 702: determining the intersection-over-union (IoU) ratio between each piece of generated position information and the corresponding second sample position;
In some embodiments, the IoU ratio may be determined using Equation 1:
IOU = Area(P'_b ∩ P_b) / Area(P'_b ∪ P_b)    (Equation 1)

where P'_b is the generated position information, P_b is the corresponding second sample position, and IOU is the intersection-over-union ratio.
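For reference, Equation 1 can be computed as in the following minimal Python sketch for axis-aligned boxes; treating each position as an (x, y, w, h) box is an assumption.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x, y, w, h) boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Width/height of the overlap rectangle; zero if the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# e.g. iou((0, 0, 10, 10), (5, 5, 10, 10)) == 25 / 175 ≈ 0.143
```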
In step 703: if the intersection ratio of each generated position information and the corresponding second sample position is greater than or equal to a first preset value, constructing a third training sample by using the generated position information and the first sample position;
if the intersection ratio is greater than or equal to the first preset value, it is indicated that the generated position information generated this time is more accurate, and therefore the generated position information can be used as a high-quality sample, and therefore the generated position information is used as a third training sample in the application.
In step 704: training the relational mapping model using the third training sample.
For example: the monitor node A, B, C, D covers the same public area, and uses monitor node a as the first monitor node and monitor node B, C, D as the second monitor node, the first training sample includes: the position of the sample object at time 1, 2, 3, 4, 5 in the second sample frame of monitoring node B, C, D, the second sample position, and the position of the sample object at time 1, 2, 3, 4, 5 in the first sample frame of monitoring node a, i.e., the first sample position. If the intersection ratio between the second sample position at the second sample frame 2 time of the monitoring node B, C, D and the corresponding generated position information is greater than the first preset value, it is determined that the third training sample is: the position of the sample object at time 2 in the second sample frame of monitoring node B, C, D and the position of the sample object at time 2 in the first sample frame of monitoring node a.
In some embodiments, the coordinate position data of sample objects collected at the same time by different monitoring nodes is affected by differences in pedestrian height within the monitored scene. The collected sample data therefore often fails to cover the full range of sample-object heights, so the samples are not rich enough and the trained model has errors. Accordingly, in the present application, to further improve the accuracy of the relational mapping model, after the intersection-over-union ratio between each piece of generated position information and the corresponding second sample position is determined, the steps shown in fig. 8 may be performed:
In step 801: if the intersection-over-union ratio corresponding to any piece of generated position information is smaller than a second preset value, constructing a fourth training sample using the second sample position and the first sample position;
In a specific implementation, if the intersection-over-union ratio is smaller than the second preset value, the current mapping is an erroneous mapping. The second sample position and first sample position corresponding to the generated position information therefore need to be collected, and training of the relational mapping model continues with them; because there are few samples corresponding to erroneous mappings, step 802 is performed to expand the sample set.
In step 802: combining the third training sample and the fourth training sample into a fifth training sample;
In step 803: training the relational mapping model using the fifth training sample.
For example: monitoring node A, B, C, D covers the same common area, with monitoring node a being the first monitoring node, using the monitor node B, C, D as a second monitor node, determining the third position information of the sample object at the time points 1, 2, 3, 4 and 5 in the second sample frame of the monitor node B, C, D by using a target detection algorithm, then, the generated position information of the sample objects in the second training sample set at the time points 1, 2, 3, 4 and 5 in the second sample frame of the monitoring node B, C, D is determined, the intersection ratio between the third position information at the time point 1 of the monitoring node B and the generated position information corresponding to the time point 1 of the monitoring node B is determined, and the intersection ratio is compared with the preset value, if the intersection ratio is smaller than the second preset value, it indicates that the generation position information generated by the relational mapping model is not accurate, and therefore, the second sample position and the first sample position corresponding to the generation position information are collected.
Having introduced the training of the relationship mapping model, the following describes in detail the overall training method of the position relationship mapping model provided in the embodiment of the present application; fig. 9 shows its overall flowchart, where:
in step 901: controlling a plurality of monitoring nodes to synchronously acquire videos; wherein the sample object moves within a common monitoring range of a plurality of monitoring nodes;
in step 902: for each monitoring node, screening an image sequence containing a sample object from a video acquired by the monitoring node, and extracting position information of the sample object in each frame of image of the image sequence;
In the embodiment of the present application, the video acquired by a monitoring node may contain images that do not include the sample object; if those images were also processed subsequently, computing resources would be wasted. The steps shown in fig. 10 may therefore be adopted:
In step 1001: performing target detection for the sample object on each frame of the video collected by the monitoring node, and screening out the images containing the sample object;
In step 1002: sorting the screened images by their timestamps to obtain the image sequence.
For example, as shown in fig. 11, for frames 1-5 acquired by monitoring node A, target detection is performed on each frame; frames 1, 2, 3, and 5 are determined to contain the sample object, screened out of the video acquired by monitoring node A, and sorted by frame number to obtain the image sequence.
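A minimal sketch of this screening step; the per-frame detector is assumed to be a callable returning the sample object's position or None, and is not specified by the patent.

```python
def screen_image_sequence(frames, detect):
    """frames: list of (timestamp, image); detect: callable that returns the
    sample object's position in an image, or None if it is absent.
    Keeps only frames containing the sample object, ordered by timestamp."""
    kept = [(t, img, box)
            for t, img in frames
            if (box := detect(img)) is not None]
    kept.sort(key=lambda item: item[0])  # order by timestamp
    return kept
```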
In step 903: determining position information of sample objects acquired by each monitoring node at the same time to construct a first training sample based on the time stamp of each frame of image; the first training sample comprises a first sample position of the sample object in the image acquired by the first monitoring node and a second sample position of the sample object in the image acquired by each second monitoring node;
the method for constructing the training sample is the same as the relational mapping model, and is not described herein again.
In step 904: training the position relationship mapping model using the first training sample.
In the present application, to simplify training and further reduce the computational effort of training the position relationship mapping model, the intersection-over-union criterion used for the relational mapping model is replaced by the Euclidean distance between the generated position information of the position relationship mapping model and the second sample position. In some embodiments, the generated position information that the position relationship mapping model outputs, based on the first sample position, for each second sample position may be acquired, and a second training sample constructed, the second training sample including the first sample position and each piece of generated position information. The Euclidean distance between each piece of generated position information and the corresponding second sample position is then determined. If the Euclidean distance between each piece of generated position information and the corresponding second sample position is greater than or equal to a first preset value, a third training sample is constructed using the generated position information and the first sample position, and the position relationship mapping model is trained with the third training sample.
For example: the monitor node A, B, C, D covers the same public area, and uses monitor node a as the first monitor node and monitor node B, C, D as the second monitor node, the first training sample includes: the position of the sample object at time 1, 2, 3, 4, 5 in the second sample frame of monitoring node B, C, D, the second sample position, and the position of the sample object at time 1, 2, 3, 4, 5 in the first sample frame of monitoring node a, i.e., the first sample position. If the euclidean distances between the second sample position of the second sample frame 2 of the monitoring node B, C, D and the corresponding generated position information are all greater than the first preset value, determining that the third training sample is: the position of the sample object at time 2 in the second sample frame of monitoring node B, C, D and the position of the sample object at time 2 in the first sample frame of monitoring node a.
In other embodiments, to make the position relationship mapping model more accurate, the present application proceeds as follows: after the Euclidean distance between each piece of generated position information and the corresponding second sample position is determined, if the Euclidean distance corresponding to any piece of generated position information is smaller than a second preset value, a fourth training sample is constructed using each second sample position and the first sample position; the third training sample and the fourth training sample are combined into a fifth training sample, and the position relationship mapping model is trained with the fifth training sample.
For example: monitoring node A, B, C, D covers the same common area, with monitoring node a being the first monitoring node, using the monitor node B, C, D as a second monitor node, determining the third position information of the sample object at the time points 1, 2, 3, 4 and 5 in the second sample frame of the monitor node B, C, D by using a target detection algorithm, then, the generated position information of the sample objects in the second training sample set at the time points 1, 2, 3, 4 and 5 in the second sample frame of the monitoring node B, C, D is determined, the euclidean distance between the third position information at the time point 1 of the monitoring node B and the generated position information corresponding to the time point 1 of the monitoring node B is determined, and the euclidean distance is compared with a preset value, if the euclidean distance is smaller than a second preset value, it indicates that the generation position information generated by the relational mapping model is not accurate, and therefore, the second sample position and the first sample position corresponding to the generation position information are collected.
In a specific implementation, the process of training the position relationship mapping model with each sample set is the same; for ease of understanding, training with the fifth training sample is described below as an example:
During training, the generated position information in the fifth training sample is first input into the position relationship mapping model, with the second sample position in the fifth training sample as the expected output, and the position information output by the model for the generated position information is acquired; the Euclidean distance between the output position information and the second sample position is determined; a loss value is determined from that Euclidean distance; and the parameters of the position relationship mapping model are adjusted based on the loss value.
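A hedged PyTorch sketch of one such update step, reusing the RelationMappingModel layout assumed earlier; the description specifies only that the loss is derived from the Euclidean distance, so the mean per-sample distance below is one plausible choice.

```python
import torch

def train_step(model, optimizer, input_positions, expected_positions):
    """One parameter update of the position relationship mapping model.

    input_positions: (batch, 4) tensor of positions fed to the model;
    expected_positions: list of (batch, 4) tensors, one per second node.
    The loss is the mean Euclidean distance between each output position
    and the corresponding expected second sample position."""
    optimizer.zero_grad()
    outputs = model(input_positions)          # one tensor per second node
    loss = sum(torch.norm(out - exp, dim=-1).mean()
               for out, exp in zip(outputs, expected_positions))
    loss.backward()
    optimizer.step()
    return float(loss)

# e.g. optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```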
In some possible embodiments, a more accurate mapping from the position relationship mapping model is desired, so sample acquisition during training should cover as much of the common monitoring range as possible; this may be implemented as the steps shown in fig. 12:
In step 1201: for any one monitoring node, determining, based on the position information of all sample objects included in the video acquired by the monitoring node, the ratio of the sampling range corresponding to the acquired position information to the monitoring range;
In the embodiment of the present application, the ratio can be determined in either of the following ways:
1. Determining the ratio from a convex hull
In the present application, a convex hull containing the collected position information, and the density of sampling positions within the convex hull, may be determined; if the sampling position density is greater than the preset density threshold, the ratio of the convex hull area to the common monitoring range is determined.
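A minimal sketch of the convex-hull variant, assuming positions are reduced to 2-D points and using scipy's ConvexHull (the point representation and area units are assumptions):

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_coverage_ratio(sample_points, monitoring_area, density_threshold):
    """sample_points: (N, 2) array-like of collected positions (N >= 3);
    monitoring_area: area of the common monitoring range, same units.
    Returns the hull-area ratio, or None while sampling is still too sparse."""
    hull = ConvexHull(np.asarray(sample_points, dtype=float))
    density = len(sample_points) / hull.volume  # in 2-D, .volume is the area
    if density <= density_threshold:
        return None  # keep collecting before judging coverage
    return hull.volume / monitoring_area
```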
2. Determining the ratio from covered area
In the application, each piece of position information can be taken as a circle center and a circle of preset diameter drawn around it; the area covered by the circles for all position information is calculated, and the proportion of that area within the common monitoring range is determined.
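A sketch of the circle-based variant; it approximates the covered area on a grid so that overlapping circles are not double-counted, and assumes the common monitoring range is the axis-aligned rectangle spanned by x_range and y_range (both assumptions):

```python
import numpy as np

def circle_coverage_ratio(points, radius, x_range, y_range, grid=200):
    """Fraction of a rectangular monitoring range covered by circles of the
    preset radius centred on each sampled position (grid approximation)."""
    xs = np.linspace(x_range[0], x_range[1], grid)
    ys = np.linspace(y_range[0], y_range[1], grid)
    gx, gy = np.meshgrid(xs, ys)
    covered = np.zeros_like(gx, dtype=bool)
    for px, py in points:
        covered |= (gx - px) ** 2 + (gy - py) ** 2 <= radius ** 2
    return covered.mean()  # share of grid cells covered by at least one circle
```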
In step 1202: if the ratio is smaller than the preset ratio threshold, continuing to acquire video.
In this application, a distributed multi-lens camera is also provided. As shown in fig. 13, the camera is applied to a target scene and includes multiple monitoring nodes 13001-1300n, an integration module 1302, and a master control module 1303, where the monitoring nodes are time-synchronized and have a common monitoring range, wherein:
the main control module 1303 is used for responding to the acquisition instruction and sending a clock synchronization signal to each monitoring node;
each monitoring node 13001-1300n is respectively used for synchronously acquiring a video in response to a clock synchronization signal sent by the master control module to obtain a serialized signal;
the integration module 1302 is configured to control the monitoring nodes to synchronously expose, deserialize the serialized signals sent by the monitoring nodes, and send the deserialized videos to the main control module;
the main control module 1303 further screens out an image sequence containing the sample object from the video sent by the integration module, extracts position information of the sample object in each frame of image of the image sequence, and determines the position information of the sample object collected by each monitoring node at the same time to construct a first training sample based on the timestamp of each frame of image; the first training sample comprises a first sample position of a sample object in an image acquired by a first monitoring node and a second sample position of the sample object in an image acquired by each second monitoring node, the first monitoring node is any monitoring node in the plurality of monitoring nodes, and the second monitoring node is a monitoring node except the first monitoring node in the plurality of monitoring nodes; and training a position relation mapping model by adopting the first training sample.
As shown in fig. 14, based on the same inventive concept, a training apparatus 1400 for a position relationship mapping model is provided, where the position relationship mapping model is applied to a target scene, the target scene includes a plurality of monitoring nodes, and the monitoring nodes are synchronized in time, and the apparatus includes:
an acquisition module 14001, configured to control the multiple monitoring nodes to acquire videos synchronously; wherein the sample object moves within a common monitoring range of the plurality of monitoring nodes;
an extracting module 14002, configured to, for each monitoring node, screen out an image sequence including a sample object from a video acquired by the monitoring node, and extract position information of the sample object in each frame of image of the image sequence;
the sample construction module 14003 is configured to determine, based on the timestamp of each frame of image, position information of a sample object acquired by each monitoring node at the same time to construct a first training sample; the first training sample comprises a first sample position of a sample object in an image acquired by a first monitoring node and a second sample position of the sample object in an image acquired by each second monitoring node, wherein the first monitoring node is any monitoring node in the plurality of monitoring nodes, and the second monitoring node is a monitoring node except the first monitoring node in the plurality of monitoring nodes;
a first training module 14004 is configured to train the position relationship mapping model using the first training sample.
In some possible embodiments, when screening out, from the video collected by the monitoring node, the image sequence including the sample object, the extraction module is configured to:
performing target detection on each frame of image in the video collected by the monitoring node aiming at the sample object, and screening out the image containing the sample object;
and sequencing the screened images according to the time stamps of the images to obtain the image sequence.
In some possible embodiments, the apparatus further comprises:
the proportion determining module is used for determining, for any one monitoring node, the ratio of the sampling range corresponding to the acquired position information relative to the monitoring range according to the position information of all sample objects included in the video acquired by the monitoring node;
and the video acquisition module is used for continuing to acquire video if the ratio is smaller than a preset ratio threshold.
In some possible embodiments, the occupation ratio determining module, when determining the occupation ratio of the sampling range corresponding to the acquired position information with respect to the common monitoring range based on the position information of all the sample objects included in the video acquired by the monitoring node, is configured to:
determining a convex hull containing the collected position information and a sampling position density in the convex hull;
and if the sampling position density is greater than a preset density threshold, determining the ratio of the convex hull area to the common monitoring range.
In some possible embodiments, the apparatus further comprises:
an obtaining module, configured to obtain the generated position information that the position relationship mapping model outputs, based on the first sample position, for each second sample position, and to construct a second training sample, where the second training sample includes the first sample position and each piece of generated position information;
the Euclidean distance determining module is used for determining the Euclidean distance between each piece of generated position information and the corresponding second sample position;
a third training sample construction module, configured to construct a third training sample by using the generated position information and the first sample position if the Euclidean distance between each piece of generated position information and the corresponding second sample position is greater than or equal to a first preset value;
and the second training module is used for training the position relation mapping model by adopting the third training sample.
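One hedged reading of the second- and third-sample construction above, as a sketch: model is assumed to map a first sample position to a dict of generated positions keyed by second monitoring node, and the sample layout follows the earlier sketch; none of these names come from the patent.

```python
import numpy as np

def build_third_training_samples(model, first_training_samples, first_preset):
    # first_training_samples: [(first_pos, {node_id: second_pos}), ...].
    third = []
    for first_pos, second_positions in first_training_samples:
        generated = model(first_pos)   # {node_id: generated (x, y)} positions
        distances = [
            np.linalg.norm(np.subtract(generated[n], second_positions[n]))
            for n in second_positions  # Euclidean distance per second node
        ]
        # Only when every generated position is still far from its ground
        # truth does the pair become a third training sample.
        if all(d >= first_preset for d in distances):
            third.append((generated, first_pos))
    return third
```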
In some possible embodiments, after determining the Euclidean distance between each piece of generated position information and the corresponding second sample position, the Euclidean distance determining module is further configured to:
if the Euclidean distance corresponding to any generated position information is smaller than a second preset value, adopting the second sample position and the first sample position to construct a fourth training sample;
combining the third training sample and the fourth training sample into a fifth training sample;
and training the position relation mapping model by adopting a fifth training sample.
In some possible embodiments, when training the position relationship mapping model using the fifth training sample, the Euclidean distance determining module is configured to:
inputting the generated position information in the fifth training sample into the position relation mapping model, and taking the second sample position in the fifth training sample as expected output to obtain the position information output by the position relation mapping model corresponding to the generated position information;
determining a Euclidean distance between the output position information and the second sample position;
determining a loss value according to the Euclidean distance;
and carrying out parameter adjustment on the position relation mapping model based on the loss value.
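The patent does not fix a network architecture or state exactly how the loss value is derived from the Euclidean distance, so the following PyTorch sketch is one assumed reading (the distance used directly as the loss, generated position as input, second sample position as expected output); the model, sample tensors, and learning rate are illustrative.

```python
import torch

def finetune_on_fifth_samples(model, fifth_samples, lr=1e-3):
    # fifth_samples: list of (generated_pos, second_pos) float tensors,
    # i.e. the combination of the third and fourth training samples above.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for generated_pos, second_pos in fifth_samples:
        output = model(generated_pos)           # position the model maps to
        loss = torch.norm(output - second_pos)  # Euclidean distance as the loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                        # parameter adjustment
```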
Having described the method and apparatus for training a position relationship mapping model according to an exemplary embodiment of the present application, an electronic device according to another exemplary embodiment of the present application is described next.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
In some possible implementations, an electronic device according to the present application may include at least one processor, and at least one memory. The memory stores program code, and the program code, when executed by the processor, causes the processor to execute the steps of the training method of the position relationship mapping model according to the various exemplary embodiments of the present application described above in the present specification.
The electronic apparatus 130 according to this embodiment of the present application is described below with reference to fig. 15. The electronic device 130 shown in fig. 15 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 15, the electronic device 130 is represented in the form of a general electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131).
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, the aspects of a method for training a position relationship mapping model provided in the present application may also be implemented in the form of a program product including program code for causing a computer device to perform the steps of a method for training a position relationship mapping model according to various exemplary embodiments of the present application described above in this specification when the program product is run on the computer device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for training a position relationship mapping model according to an embodiment of the present application may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the consumer electronic device, partly on the consumer electronic device, as a stand-alone software package, partly on the consumer electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of remote electronic devices, the remote electronic device may be connected to the consumer electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external electronic device (e.g., through the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided into and embodied by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (11)

1. A training method of a position relation mapping model is characterized in that the position relation mapping model is applied to a target scene, the target scene comprises a plurality of monitoring nodes, the monitoring nodes are time-synchronized, and the method comprises the following steps:
controlling the plurality of monitoring nodes to synchronously acquire videos; wherein the sample object moves within a common monitoring range of the plurality of monitoring nodes;
screening an image sequence containing a sample object from a video acquired by each monitoring node, and extracting position information of the sample object in each frame of image of the image sequence;
determining position information of sample objects acquired by each monitoring node at the same time to construct a first training sample based on the time stamp of each frame of image; the first training sample comprises a first sample position of a sample object in an image acquired by a first monitoring node and a second sample position of the sample object in an image acquired by each second monitoring node, wherein the first monitoring node is any monitoring node in the plurality of monitoring nodes, and the second monitoring node is a monitoring node except the first monitoring node in the plurality of monitoring nodes;
and training the position relation mapping model by adopting the first training sample.
2. The method of claim 1, wherein the screening out, from the video acquired by the monitoring node, of the image sequence containing the sample object comprises:
performing target detection for the sample object on each frame of image in the video collected by the monitoring node, and screening out the images containing the sample object;
and sequencing the screened images according to the time stamps of the images to obtain the image sequence.
3. The method of claim 1, further comprising:
for any one monitoring node, determining, based on the position information of all sample objects included in the video acquired by the monitoring node, the ratio of the sampling range corresponding to the acquired position information to the common monitoring range;
and if the ratio is smaller than a preset ratio threshold, continuing to acquire the video.
4. The method according to claim 3, wherein the determining, based on the position information of all sample objects included in the video acquired by the monitoring node, of the ratio of the sampling range corresponding to the acquired position information to the common monitoring range comprises:
determining a convex hull containing the collected position information and a sampling position density in the convex hull;
and if the sampling position density is greater than a preset density threshold, determining the ratio of the area of the convex hull to the common monitoring range.
5. The method of claim 1, further comprising:
acquiring the generated position information that the position relation mapping model outputs, based on the first sample position, for each second sample position, and constructing a second training sample, wherein the second training sample comprises the first sample position and each piece of generated position information;
determining Euclidean distances between each of the generated position information and the corresponding second sample position;
if the Euclidean distance between each piece of generated position information and the corresponding second sample position is larger than or equal to a first preset value, constructing a third training sample by using the generated position information and the first sample position;
and training the position relation mapping model by adopting the third training sample.
6. The method of claim 5, wherein after determining the Euclidean distance between each of the generated position information and the corresponding second sample position, the method further comprises:
if the Euclidean distance corresponding to any piece of generated position information is smaller than a second preset value, constructing a fourth training sample by using the second sample positions and the first sample position;
combining the third training sample and the fourth training sample into a fifth training sample;
and training the position relation mapping model by adopting a fifth training sample.
7. The method of claim 6, wherein the training the position relationship mapping model using a fifth training sample comprises:
inputting the generated position information in the fifth training sample into the position relation mapping model, and taking the second sample position in the fifth training sample as expected output to obtain the position information output by the position relation mapping model corresponding to the generated position information;
determining a Euclidean distance between the output position information and the second sample position;
determining a loss value according to the Euclidean distance;
and carrying out parameter adjustment on the position relation mapping model based on the loss value.
8. A distributed multi-lens camera applied to a target scene, the camera comprising: a plurality of monitoring nodes, an integration module, and a main control module, wherein the plurality of monitoring nodes are time-synchronized and have a common monitoring range, and wherein:
the main control module is used for responding to an acquisition instruction and sending clock synchronization signals to each monitoring node
Each monitoring node is used for responding to a clock synchronization signal sent by the main control module to synchronously acquire a video to obtain a serialized signal;
the integrated module is used for controlling the monitoring nodes to synchronously expose, deserializing the serialized signals sent by the monitoring nodes and sending the deserialized videos to the main control module;
the main control module is also used for screening out an image sequence containing a sample object from the video sent by the integration module, extracting the position information of the sample object in each frame of image of the image sequence, and determining the position information of the sample object collected by each monitoring node at the same time to construct a first training sample based on the timestamp of each frame of image; the first training sample comprises a first sample position of a sample object in an image acquired by a first monitoring node and a second sample position of the sample object in an image acquired by each second monitoring node, wherein the first monitoring node is any monitoring node in the plurality of monitoring nodes, and the second monitoring node is a monitoring node except the first monitoring node in the plurality of monitoring nodes; and training the position relation mapping model by adopting the first training sample.
9. A device for training a position relationship mapping model is characterized in that the position relationship mapping model is applied to a target scene, the target scene comprises a plurality of monitoring nodes, the monitoring nodes are time-synchronized, and the device comprises:
the acquisition module is used for controlling the plurality of monitoring nodes to synchronously acquire videos; wherein the sample object moves within a common monitoring range of the plurality of monitoring nodes;
the extraction module is used for screening out, for each monitoring node, an image sequence containing the sample object from the video collected by the monitoring node, and extracting position information of the sample object in each frame of image of the image sequence;
the sample construction module is used for determining the position information of the sample object acquired by each monitoring node at the same time to construct a first training sample based on the time stamp of each frame of image; the first training sample comprises a first sample position of a sample object in an image acquired by a first monitoring node and a second sample position of the sample object in an image acquired by each second monitoring node, wherein the first monitoring node is any monitoring node in the plurality of monitoring nodes, and the second monitoring node is a monitoring node except the first monitoring node in the plurality of monitoring nodes;
and the training module is used for training the position relation mapping model by adopting the first training sample.
10. An electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
11. A computer storage medium, characterized in that the computer storage medium stores a computer program for causing a computer to execute the method of any one of claims 1-7.
CN202111639017.8A 2021-12-29 2021-12-29 Training method of position relation mapping model and related device Pending CN114283454A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111639017.8A CN114283454A (en) 2021-12-29 2021-12-29 Training method of position relation mapping model and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111639017.8A CN114283454A (en) 2021-12-29 2021-12-29 Training method of position relation mapping model and related device

Publications (1)

Publication Number Publication Date
CN114283454A true CN114283454A (en) 2022-04-05

Family

ID=80877894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111639017.8A Pending CN114283454A (en) 2021-12-29 2021-12-29 Training method of position relation mapping model and related device

Country Status (1)

Country Link
CN (1) CN114283454A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination