CN111553228A

CN111553228A - Method, device, equipment and storage medium for detecting personal bag relationship

Info

Publication number: CN111553228A
Application number: CN202010318852.0A
Authority: CN
Inventors: 李昆明; 冯琰一; 张少文; 李德紘
Original assignee: Guangdong Huazhiyuan Information Engineering Co ltd; Guangzhou Jiadu Technology Software Development Co ltd; Guangzhou Xinke Jiadu Technology Co Ltd; PCI Suntek Technology Co Ltd
Current assignee: Guangdong Huazhiyuan Information Engineering Co ltd; Guangzhou Jiadu Technology Software Development Co ltd; Guangzhou Xinke Jiadu Technology Co Ltd; PCI Suntek Technology Co Ltd
Priority date: 2020-04-21
Filing date: 2020-04-21
Publication date: 2020-08-18
Anticipated expiration: 2040-04-21
Also published as: CN111553228B

Abstract

The embodiment of the application discloses a method, a device, equipment and a storage medium for detecting a personal bag relationship. According to the technical scheme, people and bags in the image to be analyzed are identified through the neural network structure, the positions of the people and the bags are obtained, meanwhile, the people association embedding vector and the bag association embedding vector are obtained, the association embedding relation cost and the prior cost are calculated, a people and bag relation corresponding cost matrix is constructed based on the association embedding relation cost and the prior cost, the identified corresponding relation between the people and the bags can be obtained after the people and bag relation corresponding cost matrix is solved, and the corresponding efficiency and accuracy of the people and the bags in a crowded scene are improved.

Description

Method, device, equipment and storage medium for detecting personal bag relationship

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a method, a device, equipment and a storage medium for detecting a personal bag relationship.

Background

Generally, the existing target detection technology is relatively mature and widely applied, but for specific scenes such as the correspondence relation of a person package needing to be determined, only the detection of a specific target cannot meet the requirement.

For example, in airports, stations, etc., the use of people and bag detection techniques enables the detection of bags and people, but does not give which person the bag belongs to, which bags the person has. Although the analysis based on the positional relationship can solve the problem to some extent, such as determining the affiliation of a package and a person when the relative distance between the package and the person is less than a certain threshold. However, the method for corresponding the human package relationship has the problems that the accuracy is not high, and the method cannot be applied to relatively crowded scenes.

Disclosure of Invention

The embodiment of the application provides a method, a device, equipment and a storage medium for detecting a personal bag relationship, so as to improve the accuracy rate of personal bag correspondence.

In a first aspect, an embodiment of the present application provides a method for detecting a personal bag relationship, including:

extracting a person position frame, a package position frame, a person correlation embedding vector and a package correlation embedding vector in an image to be analyzed through a person-package relationship detection network, wherein the person position frame corresponds to the person correlation embedding vector, and the package position frame corresponds to the package correlation embedding vector one by one;

calculating an association embedding relation cost according to the human association embedding vector and the packet association embedding vector;

calculating a prior cost from the person location box and the package location box;

and constructing a person-to-package relationship corresponding cost matrix based on the associated embedding relationship cost and the prior cost, and determining a person-to-package corresponding relationship based on the person-to-package relationship corresponding cost matrix.

Further, the calculating an association embedding relationship cost according to the human association embedding vector and the package association embedding vector includes:

obtaining vector values of the person-associated embedded vectors

And the vector value of the packet associated embedded vector

Vector values based on the person-associated embedded vector

And the vector value of the packet associated embedded vector

Calculating the associated embedding relationship cost according to the following formula:

where i denotes the ith individual, j denotes the jth package,

indicating the associated embedding distance, T₁、T₂Is a predetermined threshold, H is a predetermined constant, f_ass() Indicating monotonous incrementsA mapping of embedding distance to embedding loss is correlated.

Further, said calculating a priori cost from said person location box and said package location box comprises:

based on the relative relationship of the people position box and the bag position box, calculating a prior cost according to the following formula:

wherein i represents the ith person, j represents the jth package, Θ represents the relative relationship of the person location box and the package location box, f_prior() Representing a mapping of a priori knowledge to a priori loss.

Further, the constructing a cost matrix corresponding to the personal package relationship based on the associated embedding relationship cost and the prior cost, and determining the personal package corresponding relationship based on the cost matrix corresponding to the personal package relationship, includes:

constructing a cost matrix corresponding to the personal bag relationship based on the associated embedding relationship cost and the prior cost, and calculating the corresponding cost of the personal bag relationship according to the following formula:

wherein, λ ∈ [0, 1]I denotes the ith person, j denotes the jth package,

the associated embedding relationship cost is represented as,

representing a priori cost;

constructing a cost matrix corresponding to the personal bag relationship based on the cost corresponding to the personal bag relationship;

and determining the corresponding relationship of the person and the bag based on the corresponding cost matrix of the person and the bag relationship.

Further, the determining the personal bag corresponding relationship based on the personal bag corresponding cost matrix includes:

and solving the cost matrix corresponding to the personal bag relationship by an assignment problem algorithm so as to determine the personal bag relationship.

Further, before extracting the human position frame, the package position frame, the human association embedded vector and the package association embedded vector in the image to be analyzed through the human-package relationship detection network, the method further includes:

establishing a human packet relation detection network based on a neural network structure;

training the human-bag relationship detection network by using a training sample image until a loss function of the human-bag relationship detection network in a training process meets a training requirement, wherein the training sample image is marked with a human position frame, a bag position frame and a human-bag corresponding relationship.

Further, the human-package relationship detection network comprises a backbone network, a position regression branch, a classification branch and an associated embedding vector branch;

the backbone network is used for outputting a feature map to the position regression branch, the classification branch and the association embedded vector branch;

the classification branch outputs a feature classification based on the feature map, the feature classification including persons and bags;

the position regression branch outputs a position frame based on the feature map, and determines the type of the position frame according to the output result of the classification branch, wherein the type of the position frame comprises a human position frame and a bag position frame;

and the association embedding vector branch outputs association embedding vectors based on the feature map, and determines the types of the association embedding vectors according to the output result of the classification branch, wherein the types of the association embedding vectors comprise human association embedding vectors and packet association embedding vectors.

Further, the loss function includes a regression loss L_regClass loss L_clsAnd associated embedding loss L_ass；

The regression Loss is obtained through smooth-L1-Loss function, IOU-Loss function or GIou-Loss function calculation;

the classification loss is obtained through cross-entropy-loss function calculation;

the associated embedding loss is calculated by the following formula:

L_ass＝μL_pull+νL_push

wherein,

s represents the total number of packages and people in the current affiliation, N is the number of affiliations of the situation that all the packages exist in the current image, N' is the number of all the affiliations, and delta₂Is a preset distance threshold;

expressing the associated embedding vector values of the people and the bags in the current affiliation, and expressing the weighting coefficients mu and v;

the loss function is calculated by the following formula:

L＝αL_cls+βL_reg+ηL_ass

where α, β, η represent the loss weight.

In a second aspect, an embodiment of the present application provides a personal bag relationship detection apparatus, including a detection network extraction module, an association embedding cost calculation module, a priori cost calculation module, and a corresponding relationship determination module, where:

the detection network extraction module is used for extracting a person position frame, a packet position frame, a person correlation embedding vector and a packet correlation embedding vector in the image to be analyzed through a person-packet relationship detection network, wherein the person position frame corresponds to the person correlation embedding vector, and the packet position frame corresponds to the packet correlation embedding vector one by one;

an association embedding cost calculation module for calculating an association embedding relationship cost according to the human association embedding vector and the package association embedding vector;

a priori cost calculation module for calculating a priori cost according to the human location box and the package location box;

and the corresponding relation determining module is used for constructing a corresponding cost matrix of the personal package relation based on the incidence embedding relation cost and the prior cost, and determining the corresponding relation of the personal package based on the corresponding cost matrix of the personal package relation.

Further, the associated embedding cost calculation module is specifically configured to:

obtaining vector values of the person-associated embedded vectors

And the vector value of the packet associated embedded vector

Vector values based on the person-associated embedded vector

And the vector value of the packet associated embedded vector

where i denotes the ith individual, j denotes the jth package,

indicating the associated embedding distance, T₁、T₂Is a predetermined threshold, H is a predetermined constant, f_ass() Representing a monotonically increasing mapping of the associated embedding distance to the embedding loss.

Further, the prior cost calculation module is specifically configured to:

Further, the correspondence determining module is specifically configured to:

wherein, λ ∈ [0, 1]I denotes the ith person, j denotes the jth package,

the associated embedding relationship cost is represented as,

representing a priori cost;

Further, when the correspondence determining module determines the personal bag correspondence based on the personal bag correspondence cost matrix, the correspondence determining module specifically includes:

Further, the apparatus further comprises a neural network creation module, the neural network creation module is configured to:

the associated embedding loss is calculated by the following formula:

L_ass＝μL_pull+νL_push

wherein,

the loss function is calculated by the following formula:

L＝αL_cls+βL_reg+ηL_ass

where α, β, η represent the loss weight.

In a third aspect, an embodiment of the present application provides a computer device, including: a memory and one or more processors;

the memory for storing one or more programs;

when executed by the one or more processors, cause the one or more processors to implement the person-package relationship detection method according to the first aspect.

In a fourth aspect, embodiments of the present application provide a storage medium containing computer-executable instructions for performing the person-package relationship detection method according to the first aspect when executed by a computer processor.

According to the embodiment of the application, people and bags in the image to be analyzed are identified through the neural network structure, the positions of the people and the bags are obtained, meanwhile, the people association embedding vector and the bag association embedding vector are obtained, the association embedding relation cost and the prior cost are calculated, the corresponding cost matrix of the people and bags relation is established based on the association embedding relation cost and the prior cost, the corresponding relation of the identified people and bags can be obtained after the corresponding cost matrix of the people and bags relation is solved, and the corresponding efficiency and accuracy of the people and bags in a crowded scene are improved.

Drawings

Fig. 1 is a flowchart of a method for detecting a personal bag relationship according to an embodiment of the present application;

FIG. 2 is a flowchart of another method for detecting a personal bag relationship according to an embodiment of the present application;

fig. 3 is a schematic diagram illustrating a relationship between a person and a bag according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of another method for detecting a personal bag relationship according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a human-bag relationship detection network provided in an embodiment of the present application;

fig. 6 is a schematic structural diagram of a human bag relationship detection apparatus according to an embodiment of the present application;

fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, specific embodiments of the present application will be described in detail with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some but not all of the relevant portions of the present application are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

Fig. 1 is a flowchart of a human package relationship detection method according to an embodiment of the present application, where the human package relationship detection method according to the embodiment of the present application may be executed by a human package relationship detection apparatus, and the human package relationship detection apparatus may be implemented in a hardware and/or software manner and integrated in a computer device.

The following description will be given taking as an example a case where the human-bag relationship detection apparatus performs the human-bag relationship detection method. Referring to fig. 1, the person-package relationship detection method includes:

s101: and extracting a person position frame, a package position frame, a person correlation embedding vector and a package correlation embedding vector in the image to be analyzed through a person-package relation detection network, wherein the person position frame corresponds to the person correlation embedding vector, and the package position frame corresponds to the package correlation embedding vector one to one.

Herein, the term bag is to be understood in a broad sense and generally refers to luggage such as luggage, backpacks, handbags, satchels, cartons and the like in public, and the relationship of a person to a bag is to be understood as the relationship of membership of the bag to a person. The source of the image to be analyzed can be a video frame returned by the monitoring camera, or an image needing to be input based on the human package analysis.

The human packet relation detection network is composed of human packet detection and associated embedded vectors. The human packet detection may be performed based on a one-stage or two-stage detection framework of a neural network (e.g., a deep neural network, a convolutional neural network, a recurrent neural network, etc.), and identify the positions and categories of the human and the packet in the image (whether the positions correspond to the human or the packet), and output the positions in a position frame manner, for example, outputting a corner point (e.g., upper left corner) of the position frame and the height and width of the position frame. The human correlation embedding vector and the packet correlation embedding vector are obtained through neural network embedding and used for representing the relationship attributes of the corresponding human or packet.

It can be understood that the human position frame and the human association embedding vector, and the package position frame and the package association embedding vector are in a one-to-one correspondence relationship, and for each human position frame and the package position frame, the corresponding human association embedding vector and the corresponding package association embedding vector correspond to each other.

For example, after receiving an image to be analyzed, the image to be analyzed is input into a person-bag relationship detection network, the person-bag relationship detection network analyzes the image to be analyzed, outputs an identification object and a position frame in the image to be analyzed, classifies the identification object, that is, distinguishes the identification object as a person or a bag, and determines the corresponding position frame as a person position frame and a bag position frame. Meanwhile, the person-package relation detection network outputs a person-to-package correlation embedded vector and a package correlation embedded vector corresponding to the person and the package.

S102: and calculating the associated embedding relation cost according to the human associated embedding vector and the packet associated embedding vector.

Illustratively, after determining the person associated embedding vector and the packet associated embedding vector corresponding to the image to be processed, the associated embedding relationship cost is calculated according to the associated embedding vector distance between each person and each packet. It can be understood that the weaker the correspondence between a person and a package, the larger the association embedding vector distance between the corresponding person and the package, and the higher the calculated association embedding relationship cost.

S103: calculating a prior cost from the person location box and the package location box.

The prior cost is obtained by using the prior knowledge of human as judgment and mapping the position relation of the person package, so that the accuracy of judging the corresponding relation of the person package can be further improved. For example, the bag may not normally be present on top of the head of the person to whom it belongs, the bag may not be too far from the person, and so on. The relative relationship between the human location box and the package location box may be mapped to a priori costs based on experience with the human package relationship. It will be appreciated that the weaker the correspondence between a person and a bag, the higher the corresponding a priori cost.

Illustratively, after the human position frame and the bag position frame are determined, based on every two opposite human position frames and bag position frames, the prior cost corresponding to the two opposite human position frames and bag position frames is calculated according to the mapping relation between the position frames and the prior cost.

S104: and constructing a person-to-package relationship corresponding cost matrix based on the associated embedding relationship cost and the prior cost, and determining a person-to-package corresponding relationship based on the person-to-package relationship corresponding cost matrix.

Illustratively, for two opposite people and packages, the correlation embedding relationship cost and the prior cost obtained by the calculation are summed according to a preset proportion relationship to obtain the corresponding cost of each person for each package, and a corresponding cost matrix of the person-package relationship is constructed based on the corresponding cost of the person-package relationship.

It can be understood that the weaker the correspondence between the person and the package, the higher the corresponding person-package relationship corresponding cost, and the larger the corresponding value in the person-package relationship corresponding cost matrix. For example, the weaker the correspondence between the fifth person and the sixth packet, the higher the corresponding person-packet-relationship correspondence cost, and the larger the value corresponding to M (5, 6) in the person-packet-relationship correspondence cost matrix M (the abscissa represents the person number and the ordinate represents the packet number).

Further, the corresponding cost of the person-package relationship between each person and the package can be judged according to the corresponding cost matrix of the person-package relationship, at this time, the correspondence between the person and the package is converted from a non-standard assignment problem into a standard assignment problem, the optimal assignment result between the person and the package can be obtained based on a standard assignment solution, and the person-package corresponding relationship is determined based on the assignment result.

Further, after the corresponding relationship of the person package is determined, the person position frame, the package position frame and the corresponding relationship of the person package in the image to be analyzed are output. For example, a person position frame and a bag position frame may be marked in the form of a box on the screen to be analyzed, and the position frames of persons and bags having corresponding (subordinate) relationships may be displayed in the same color.

The method comprises the steps of identifying people and bags in an image to be analyzed through a neural network structure, obtaining positions of the people and the bags, obtaining a people association embedding vector and a bag association embedding vector at the same time, calculating association embedding relation cost and prior cost, constructing a people and bag relation corresponding cost matrix based on the association embedding relation cost and the prior cost, solving the people and bag relation corresponding cost matrix to obtain the corresponding relation of the identified people and bags, and improving the corresponding efficiency and accuracy of the people and bags in a crowded scene.

Fig. 2 is a flowchart of another method for detecting a personal bag relationship according to an embodiment of the present application, where the method for detecting a personal bag relationship is an embodiment of the method for detecting a personal bag relationship. Referring to fig. 2, the person-package relationship detection method includes:

s201: and extracting a person position frame, a package position frame, a person correlation embedding vector and a package correlation embedding vector in the image to be analyzed through a person-package relation detection network, wherein the person position frame corresponds to the person correlation embedding vector, and the package position frame corresponds to the package correlation embedding vector one to one.

S202: obtaining vector values of the person-associated embedded vectors

And the vector value of the packet associated embedded vector

Specifically, after the human-associated embedded vector and the packet-associated embedded vector are obtained, the vector values of the currently detected human-associated embedded vector are obtained respectively

Vector values corresponding to packet-associated embedded vectors

Wherein

Vector values representing the person associated embedded vectors corresponding to the ith person,

and associating the vector value of the embedded vector by the packet corresponding to the jth packet.

S203: vector values based on the person-associated embedded vector

And the vector value of the packet associated embedded vector

And calculating the associated embedding relation cost according to the associated embedding relation cost formula.

Specifically, the associated embedding relationship cost formula is:

where i denotes the ith individual, j denotes the jth package,

Specifically, for each person associated embedding vector, the associated embedding distance delta between each person associated embedding vector and each packet associated embedding vector is calculated₁And according to Δ₁With a predetermined threshold value T₁And T₂Determines the associated embedding relationship cost C between the corresponding person and the package_ass。

For example, when Δ₁Less than or equal to T₁Then, the relationship cost C is embedded in the association_assIs determined as 0 when Δ₁Greater than T₂Then, the relationship cost C is embedded in the association_assIs determined as H, and at Δ₁At T₁And T₂Determining the associated embedding relation cost C according to the mapping relation from the associated embedding distance to the embedding loss_assThe value of (c).

Wherein the monotonically increasing mapping of the associated embedding distance to the embedding loss f_ass() It should be understood that the larger the associated embedding distance, the larger the embedding loss, and the higher the corresponding associated embedding relationship cost. The mapping of the different values of the associated embedding distance to the embedding loss (or associated embedding relationship cost) may be set according to the actual situation, for example, the associated embedding is determined according to the range of the values of the associated embedding distanceThe relationship cost, or the mapping function is determined according to the mapping relationship between the associated embedding distance and the associated embedding relationship cost.

S204: and calculating prior cost according to a prior cost formula based on the relative relation between the human position frame and the bag position frame.

The prior cost utilizes the prior knowledge of human beings as judgment, the position relation of the person package is obtained by mapping, and the accuracy of judgment of the corresponding relation of the person package can be improved by utilizing the prior cost.

Specifically, the prior cost formula is:

Specifically, for each personal position box, the relative relation theta between the personal position box and each package position box is obtained, and the mapping relation function f from priori knowledge to priori loss is substituted according to the relative relation theta_prior() To determine the a priori cost between the corresponding person and the pack.

Wherein the mapping f from a priori knowledge to a priori loss is_prior() And determining according to the corresponding relation between the person and the bag in the prior knowledge. It will be appreciated that the weaker the correspondence between a person and a bag, the higher the corresponding a priori cost.

For example, it is often the case that a package owned by a person does not appear on top of the person's head, and from a priori knowledge, the location of the occurrence of the package location box generally does not appear above the person location box with the corresponding (dependent) relationship, then it can be determined that the a priori cost is greater when the package location box is above the person location box. At this time, the up-down positional relationship between the package position frame and the person position frame can be used as the relative relationship Θ between the person position frame and the package position frame. For another example, in general, the distance between the corresponding person and the package is not too far, and it can be known from the prior knowledge that the appearance position of the package position frame is not generally beyond the threshold distance from the person position frame having the corresponding (dependent) relationship, and it can be determined that the farther the package position frame is from the person position frame, the greater the corresponding prior cost. At this time, the distance position relationship between the package position frame and the person position frame can be used as the relative relationship Θ between the person position frame and the package position frame.

It can be understood that the type of the relative relationship Θ can be determined according to actual needs, and when the relative relationship Θ includes a plurality of types, corresponding weights can be set for the prior costs calculated by the different types of relative relationships Θ, and the prior costs calculated under the different types of relative relationships are summed according to the corresponding weight ratios and taken as the final prior cost C_prior。

S205: and constructing a cost matrix corresponding to the personal packet relationship based on the associated embedding relationship cost and the prior cost, and calculating the cost corresponding to the personal packet relationship according to a cost formula corresponding to the personal packet relationship.

Specifically, the formula of the corresponding cost of the personal bag relationship is as follows:

wherein, λ ∈ [0, 1]I denotes the ith person, j denotes the jth package,

the associated embedding relationship cost is represented as,

representing a priori costs.

After the detected associated embedding relation cost and prior cost between each person and each packet are obtained, the associated embedding relation cost C is obtained according to the serial numbers of the persons and the packets_assAnd a priori cost C_priorSubstituting into the corresponding cost formula of the relationship between people and bags to obtain the corresponding cost C of the relationship between people and bags_judge。

S206: and constructing a cost matrix corresponding to the personal package relationship based on the cost corresponding to the personal package relationship.

Specifically, the corresponding cost of the person-package relationship between each person and each package is obtainedC_judgeThen, a person-to-package relationship corresponding cost matrix is constructed according to the serial numbers of the person and the package, for example, the serial number of the person is used as an abscissa, the serial number of the package is used as an ordinate, and the person-to-package relationship corresponding cost is used as a value.

Assuming that p persons, b packages, are detected in the currently detected image to be analyzed, a p × b person-package relationship corresponding cost matrix M can be created:

it can be understood that the higher the corresponding cost of the person-package relationship is, the larger the value of the corresponding sequence number in the cost matrix M corresponding to the person-package relationship is, which means the weaker the corresponding relationship between the person and the package is.

S207: and solving the cost matrix corresponding to the personal bag relationship by an assignment problem algorithm so as to determine the personal bag relationship.

Specifically, after the cost matrix corresponding to the person-bag relationship is constructed, the correspondence between the person and the bag is converted from a non-standard assignment problem into a standard assignment problem, an optimal assignment result between the person and the bag can be obtained based on a standard assignment solution, and the person-bag correspondence relationship is determined based on the assignment result.

The optimal assignment result for assigning a packet to a person is found, for example, by the hungarian algorithm or the Kuhn-Munkras algorithm, and the correspondence of the person and the packet is determined according to the assignment relationship of the person and the packet indicated in the assignment result, i.e., the packet assigned to a person is determined to correspond to (be subordinate to) the person. It can be understood that, in an actual situation, there is a case where one person corresponds to a plurality of packages, and the cost matrix corresponding to the package relationship of the person can be solved based on a solving method of an assignment problem that one person can do several things (which is equivalent to that a plurality of packages can be assigned to the same person at the same time).

Further, after the optimal solution of the cost matrix corresponding to the person-bag relationship is solved, the person position frame, the bag position frame and the person-bag corresponding relationship are output based on the optimal solution of the corresponding cost matrix.

Fig. 3 is a schematic diagram illustrating a relationship between persons and bags according to an embodiment of the present invention, and as shown in fig. 3, it is assumed that three persons Person _ A, Person _ B and Person _ C are detected in one image to be analyzed, four bags of Bag _ A, Bag _ B, Bag _ C and Bag _ D are detected, a Person position frame of three persons and a Bag position frame of four bags are output after detection and analysis by the network for detecting and analyzing the relationship between persons and bags, and finally the relationship between persons and bags existing in the image to be analyzed is solved by the cost matrix for corresponding to the relationship between persons and bags, that is, Bag _ a corresponds to Person _ a, Bag _ C and Bag _ D correspond to Person _ C, and Bag _ B and Person _ B do not correspond to each other because the bags are above the persons or are too far away from the persons.

The method comprises the steps of identifying people and bags in an image to be analyzed through a neural network structure, obtaining positions of the people and the bags, obtaining a people association embedding vector and a bag association embedding vector at the same time, calculating association embedding relation cost and prior cost, constructing a people and bag relation corresponding cost matrix based on the association embedding relation cost and the prior cost, solving the people and bag relation corresponding cost matrix to obtain the corresponding relation of the identified people and bags, and improving the corresponding efficiency and accuracy of the people and bags in a crowded scene. Meanwhile, the optimal solution of the cost matrix corresponding to the person-to-bag relationship is solved based on an assignment problem solving method, and the accuracy of the person-to-bag relationship is effectively improved.

Fig. 4 is a flowchart of another human-bag relationship detection method provided in an embodiment of the present application, where the human-bag relationship detection method is an embodiment of the human-bag relationship detection method. Referring to fig. 4, the person-package relationship detection method includes:

s301: a person-to-package relationship detection network is created based on a neural network structure.

Specifically, a human package relationship detection network is created based on a deep neural network, a convolutional neural network, a cyclic neural network, and the like (for example, vgg, a resnet, a densnet, and other network structures), and fig. 5 is a schematic structural diagram of the human package relationship detection network provided in the embodiment of the present application. As shown in fig. 5, the human package relationship detection network includes a backbone network, a location regression branch, a classification branch, and an associated embedding vector branch.

The position regression branch, the classification branch and the associated embedded vector branch share the same backbone network, the backbone network is used for receiving an image to be analyzed, a feature map (the resolution is a matrix of C multiplied by H multiplied by W, C is the number of channels, for RGB images, C is 3, and H and W are height and width) is generated according to the image to be analyzed, the backbone network identifies people and packages, generates the feature map based on the identified people or packages, and outputs the feature map to the position regression branch, the classification branch and the associated embedded vector branch.

Further, the classification branch outputs a feature classification based on the feature map, and the feature classification includes people and bags. The position regression branch outputs the position frame based on the feature map, and determines the type of the position frame according to the output result of the classification branch, wherein the type of the position frame comprises a person position frame and a bag position frame. And the associated embedding vector branch outputs an associated embedding vector based on the feature map, and determines the type of the associated embedding vector according to the output result of the classification branch, wherein the type of the associated embedding vector comprises a human associated embedding vector and a packet associated embedding vector.

The position frame obtained by position regression branch regression does not know whether the position frame is a human position frame or a packet position frame, the determination of the position frame type depends on the classification branch, and the output of the classification branch indicates the frame type. For the associated embedding vector branch, the output embedding vector value has no class attribute, namely whether the associated embedding vector is a package or a person is not known, or whether the type of the associated embedding vector is determined by depending on the classification branch. In summary, the type of the location regression branch and the associated embedded vector branch outputs are specified in dependence on the classification branch output. In the associated embedding vector branch, the associated embedding vector branch calculates an associated embedding vector for each anchor frame in the feature map, and judges the feature classification corresponding to the feature map by combining the classification branch, so that whether the corresponding position frame is a human position frame or a packet position frame can be determined.

For example, for a feature map with a resolution of m × n, a feature vector map of m × n × q may be generated when the feature map passes through the associated embedded vector branch, where q is the number of anchor frames to which each anchor point belongs, taking the feature map output by the backbone network at a certain scale as an example, coordinates of one anchor point of the feature map are (i, j), at which 3 different anchor frames are preset, 3 associated embedded vectors will be generated after passing through the associated embedded vector branchQuantity [ k ]₁，k₂，k₃]And the 3 vectors represent the relational attributes of the 3 anchor boxes.

S302: training the human-bag relationship detection network by using a training sample image until a loss function of the human-bag relationship detection network in the training process meets the training requirement.

Specifically, a training set is established by collecting a large number of training sample images, and the corresponding relationship among the human position frame, the bag position frame and the human bag in the training sample images is marked in a manual marking mode, so that the corresponding relationship among the human position frame, the bag position frame and the human bag is marked on the training sample images. And inputting the marked training set into a human-bag relationship detection network for recognition training until a loss function in the training process of the human-bag relationship detection network meets the training requirement or reaches the minimum value.

Wherein the loss function comprises a regression loss L_regClass loss L_clsAnd associated embedding loss L_assThe regression Loss can be calculated by a smooth-L1-Loss function, an IOU-Loss function or a GIou-Loss function, and the classification Loss can be calculated by a cross-entropy-Loss function.

Further, the associated embedding loss is calculated by the following formula:

L_ass＝μL_pull+νL_push

wherein,

and the values of the associated embedding vectors of the people and the packages in the current affiliation are represented, and mu and nu represent weighting coefficients.

Further, the regression loss L is calculated_regClass loss L_clsAnd associated embedding loss L_assThen, the loss function L can be obtained by calculating the following formula:

L＝αL_cls+βL_reg+ηL_ass

where α, β, η represent the loss weight.

It can be understood that the process of training the human packet relationship detection network is actually a process of training the loss function, and the goal is to minimize the loss function (which can be solved by a gradient descent algorithm).

S303: and extracting a person position frame, a package position frame, a person correlation embedding vector and a package correlation embedding vector in the image to be analyzed through a person-package relation detection network, wherein the person position frame corresponds to the person correlation embedding vector, and the package position frame corresponds to the package correlation embedding vector one to one.

S304: and calculating the associated embedding relation cost according to the human associated embedding vector and the packet associated embedding vector.

S305: calculating a prior cost from the person location box and the package location box.

S306: and constructing a person-to-package relationship corresponding cost matrix based on the associated embedding relationship cost and the prior cost, and determining a person-to-package corresponding relationship based on the person-to-package relationship corresponding cost matrix.

The method comprises the steps of identifying people and bags in an image to be analyzed through a neural network structure, obtaining positions of the people and the bags, obtaining a people association embedding vector and a bag association embedding vector at the same time, calculating association embedding relation cost and prior cost, constructing a people and bag relation corresponding cost matrix based on the association embedding relation cost and the prior cost, solving the people and bag relation corresponding cost matrix to obtain the corresponding relation of the identified people and bags, and improving the corresponding efficiency and accuracy of the people and bags in a crowded scene. Meanwhile, the position frame, the feature classification and the associated embedded vector are respectively output through the position regression branch, the classification branch and the associated embedded vector branch, the detection efficiency and the detection accuracy of the person-in-package relationship detection network are improved, and the detection accuracy of the person-in-package relationship detection network is further improved through reasonably determining the loss function.

Fig. 6 is a schematic structural diagram of a human bag relationship detection apparatus according to an embodiment of the present application. Referring to fig. 6, the personal bag relationship detecting apparatus provided in this embodiment includes a detecting network extracting module 61, an associated embedded cost calculating module 62, a priori cost calculating module 63, and a corresponding relationship determining module 64.

The detection network extraction module 61 is configured to extract a person position frame, a package position frame, a person association embedding vector and a package association embedding vector in an image to be analyzed through a person-package relationship detection network, where the person position frame corresponds to the person association embedding vector, and the package position frame corresponds to the package association embedding vector one to one; an association embedding cost calculation module 62 for calculating an association embedding relationship cost from the person association embedding vector and the package association embedding vector; a priori cost calculation module 63 for calculating a priori cost from the person location box and the package location box; and a corresponding relation determining module 64, configured to construct a cost matrix corresponding to the personal package relation based on the associated embedding relation cost and the prior cost, and determine a personal package corresponding relation based on the cost matrix corresponding to the personal package relation.

In a possible embodiment, the associated embedded cost calculation module 62 is specifically configured to:

obtaining the person association embeddingVector value of vector

And the vector value of the packet associated embedded vector

Vector values based on the person-associated embedded vector

And the vector value of the packet associated embedded vector

where i denotes the ith individual, j denotes the jth package,

In a possible embodiment, the a priori cost calculation module 63 is specifically configured to:

In a possible embodiment, the correspondence determining module 64 is specifically configured to:

wherein, λ ∈ [0, 1]I denotes the ith person, j denotes the jth package,

the associated embedding relationship cost is represented as,

representing a priori cost;

In a possible embodiment, when the correspondence determining module 64 determines the personal bag correspondence based on the personal bag correspondence cost matrix, specifically:

In one possible embodiment, the apparatus further comprises a neural network creation module configured to:

In one possible embodiment, the human package relationship detection network comprises a backbone network, a location regression branch, a classification branch and an associated embedding vector branch;

In one possible embodiment, the loss function includes a regression loss L_regClass loss L_clsAnd associated embedding loss L_ass；

the associated embedding loss is calculated by the following formula:

L_ass＝μL_pull+νL_push

wherein,

s represents the total number of packages and people in the current affiliation, N is the number of affiliations of the situation that all the packages exist in the current image, and N' is all the affiliationsNumber of relationships, Δ₂Is a preset distance threshold;

the loss function is calculated by the following formula:

L＝αL_cls+βL_reg+ηL_ass

where α, β, η represent the loss weight.

The embodiment of the application also provides computer equipment which can integrate the person-bag relationship detection device provided by the embodiment of the application. Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application. Referring to fig. 7, the computer apparatus includes: an input device 73, an output device 74, a memory 72, and one or more processors 71; the memory 72 for storing one or more programs; when the one or more programs are executed by the one or more processors 71, the one or more processors 71 are caused to implement the person-package relationship detection method provided in the above embodiment. The input device 73, the output device 74, the memory 72 and the processor 71 may be connected by a bus or other means, and fig. 7 illustrates the example of the bus connection.

The memory 72 is a storage medium readable by a computing device, and can be used for storing software programs, computer executable programs, and modules, such as program instructions/modules corresponding to the human package relationship detection method according to any embodiment of the present application (for example, the detection network extracting module 61, the associated embedded cost calculating module 62, the prior cost calculating module 63, and the corresponding relationship determining module 64 in the human package relationship detection apparatus). The memory 72 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 72 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 72 may further include memory located remotely from the processor 71, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 73 may be used to receive input numeric or character information and generate key signal inputs relating to user settings and function control of the apparatus. The output device 74 may include a display device such as a display screen.

The processor 71 executes various functional applications and data processing of the device by running software programs, instructions and modules stored in the memory 72, that is, the human package relationship detection method described above is realized.

The human package relationship detection device and the computer provided by the embodiment can be used for executing the human package relationship detection method provided by the embodiment, and have corresponding functions and beneficial effects.

Embodiments of the present application further provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the method for detecting a person-package relationship provided in the foregoing embodiments, where the method for detecting a person-package relationship includes: extracting a person position frame, a package position frame, a person correlation embedding vector and a package correlation embedding vector in an image to be analyzed through a person-package relationship detection network, wherein the person position frame corresponds to the person correlation embedding vector, and the package position frame corresponds to the package correlation embedding vector one by one; calculating an association embedding relation cost according to the human association embedding vector and the packet association embedding vector; calculating a prior cost from the person location box and the package location box; and constructing a person-to-package relationship corresponding cost matrix based on the associated embedding relationship cost and the prior cost, and determining a person-to-package corresponding relationship based on the person-to-package relationship corresponding cost matrix.

Storage medium-any of various types of memory devices or storage devices. The term "storage medium" is intended to include: mounting media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Lanbas (Rambus) RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., hard disk or optical storage); registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or may be located in a different second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that may reside in different locations, such as in different computer systems that are connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.

Of course, the storage medium provided in the embodiments of the present application and containing computer-executable instructions is not limited to the above-mentioned human package relationship detection method, and may also perform related operations in the human package relationship detection method provided in any embodiments of the present application.

The human package relationship detection device, the apparatus, and the storage medium provided in the foregoing embodiments may execute the human package relationship detection method provided in any embodiment of the present application, and refer to the human package relationship detection method provided in any embodiment of the present application without detailed technical details described in the foregoing embodiments.

The foregoing is considered as illustrative of the preferred embodiments of the invention and the technical principles employed. The present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the claims.

Claims

1. A person-package relationship detection method is characterized by comprising the following steps:

2. The method according to claim 1, wherein the calculating an association embedding relationship cost according to the human association embedding vector and the package association embedding vector comprises:

obtaining vector values of the person-associated embedded vectors

And the vector value of the packet associated embedded vector

Vector values based on the person-associated embedded vector

And the vector value of the packet associated embedded vector

where i denotes the ith individual, j denotes the jth package,

3. The method according to claim 1, wherein the calculating a priori cost from the human location box and the package location box comprises:

4. The method according to claim 1, wherein the constructing a bag-for-person relationship corresponding cost matrix based on the associated embedding relationship cost and the prior cost, and determining a bag-for-person relationship based on the bag-for-person relationship corresponding cost matrix comprises:

wherein，λ∈[0，1]I denotes the ith person, j denotes the jth package,

the associated embedding relationship cost is represented as,

representing a priori cost;

5. The method according to claim 4, wherein the determining a person-to-package correspondence based on the person-to-package correspondence cost matrix comprises:

6. The people-package relationship detection method according to any one of claims 1 to 5, wherein before extracting the people position box, the package position box, the people association embedding vector and the package association embedding vector in the image to be analyzed through the people-package relationship detection network, the method further comprises:

7. The personal bag relationship detection method of claim 6, wherein the personal bag relationship detection network comprises a backbone network, a location regression branch, a classification branch and an association embedding vector branch;

8. The people-package relationship detection method of claim 6, wherein the loss function comprises a regression loss L_regClass loss L_clsAnd associated embedding loss L_ass；

the associated embedding loss is calculated by the following formula:

L_ass＝μL_pull+νL_push

wherein,

s represents the total number of packages and people in the current affiliation, and N is the affiliation of the situation that all the packages exist in the current imageThe number of (2), N' is the number of all affiliations, Δ₂Is a preset distance threshold;

the loss function is calculated by the following formula:

L＝αL_cls+βL_reg+ηL_ass

where α, β, η represent the loss weight.

9. The person-bag relation detection device is characterized by comprising a detection network extraction module, an associated embedding cost calculation module, a priori cost calculation module and a corresponding relation determination module, wherein:

10. A computer device, comprising: a memory and one or more processors;

the memory for storing one or more programs;

when executed by the one or more processors, cause the one or more processors to implement the person-package relationship detection method of any one of claims 1-8.

11. A storage medium containing computer-executable instructions for performing the person-package relationship detection method of any one of claims 1-8 when executed by a computer processor.