CN111553228A - Method, device, equipment and storage medium for detecting personal bag relationship - Google Patents


Info

Publication number
CN111553228A
Authority
CN
China
Prior art keywords: relationship, person, package, embedding, cost
Prior art date
Legal status: Granted
Application number
CN202010318852.0A
Other languages
Chinese (zh)
Other versions
CN111553228B (en)
Inventor
李昆明
冯琰一
张少文
李德紘
Current Assignee
Guangdong Huazhiyuan Information Engineering Co ltd
Guangzhou Jiadu Technology Software Development Co ltd
Guangzhou Xinke Jiadu Technology Co Ltd
PCI Suntek Technology Co Ltd
Original Assignee
Guangdong Huazhiyuan Information Engineering Co ltd
Guangzhou Jiadu Technology Software Development Co ltd
Guangzhou Xinke Jiadu Technology Co Ltd
PCI Suntek Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Huazhiyuan Information Engineering Co ltd, Guangzhou Jiadu Technology Software Development Co ltd, Guangzhou Xinke Jiadu Technology Co Ltd, PCI Suntek Technology Co Ltd filed Critical Guangdong Huazhiyuan Information Engineering Co ltd
Priority to CN202010318852.0A priority Critical patent/CN111553228B/en
Publication of CN111553228A publication Critical patent/CN111553228A/en
Application granted granted Critical
Publication of CN111553228B publication Critical patent/CN111553228B/en
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/38 Outdoor scenes


Abstract

The embodiments of the present application disclose a method, apparatus, device and storage medium for detecting person-bag relationships. In the technical scheme, persons and bags in an image to be analyzed are identified by a neural network, yielding the positions of the persons and bags together with a person association embedding vector and a bag association embedding vector; an association embedding relationship cost and a prior cost are calculated; a person-bag correspondence cost matrix is constructed from these two costs; and solving the cost matrix yields the correspondence between the identified persons and bags. This improves the efficiency and accuracy of person-bag correspondence in crowded scenes.

Description

Method, device, equipment and storage medium for detecting personal bag relationship
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a method, apparatus, device and storage medium for detecting person-bag relationships.
Background
Existing object detection technology is relatively mature and widely applied, but for scenarios that require determining person-bag correspondences, detecting individual targets alone cannot meet the requirement.
For example, in airports and stations, person and bag detection can locate bags and people, but it cannot tell which person a bag belongs to or which bags a person carries. Analysis based on positional relationships can solve the problem to some extent, for example by assigning a bag to a person when their relative distance is below a threshold, but such correspondence is not very accurate and breaks down in relatively crowded scenes.
Disclosure of Invention
The embodiments of the present application provide a method, apparatus, device and storage medium for detecting person-bag relationships, in order to improve the accuracy of person-bag correspondence.
In a first aspect, an embodiment of the present application provides a method for detecting a personal bag relationship, including:
extracting a person position frame, a package position frame, a person association embedding vector and a package association embedding vector from an image to be analyzed through a person-package relationship detection network, wherein the person position frames are in one-to-one correspondence with the person association embedding vectors, and the package position frames are in one-to-one correspondence with the package association embedding vectors;
calculating an association embedding relation cost according to the human association embedding vector and the packet association embedding vector;
calculating a prior cost from the person location box and the package location box;
and constructing a person-to-package relationship corresponding cost matrix based on the associated embedding relationship cost and the prior cost, and determining a person-to-package corresponding relationship based on the person-to-package relationship corresponding cost matrix.
Further, the calculating an association embedding relationship cost according to the human association embedding vector and the package association embedding vector includes:
obtaining a vector value e_i^p of the person association embedding vector and a vector value e_j^b of the package association embedding vector;
based on the vector value e_i^p of the person association embedding vector and the vector value e_j^b of the package association embedding vector, calculating the association embedding relationship cost according to the following formula:
C_ass(i,j) = 0, if d_ij ≤ T1; C_ass(i,j) = f_ass(d_ij), if T1 < d_ij ≤ T2; C_ass(i,j) = H, if d_ij > T2
where i denotes the i-th person, j denotes the j-th package, d_ij = |e_i^p − e_j^b| denotes the association embedding distance, T1 and T2 are preset thresholds, H is a preset constant, and f_ass(·) denotes a monotonically increasing mapping from the association embedding distance to the embedding cost.
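A small numerical sketch of this cost can be written as follows. The array shapes, the threshold values and the choice of f_ass as the identity mapping on the distance are illustrative assumptions; the text only requires f_ass to be monotonically increasing, with thresholds T1, T2 and a cap constant H.

```python
import numpy as np

def association_cost(person_emb, bag_emb, t1=0.2, t2=1.0, big_h=10.0):
    """Pairwise association-embedding cost (illustrative sketch).

    person_emb: (P, D) array of person association embedding values
    bag_emb:    (B, D) array of bag association embedding values
    t1, t2, big_h: illustrative stand-ins for the patent's T1, T2, H.
    f_ass is taken here to be the identity on the distance, one of many
    possible monotonically increasing mappings.
    """
    # d_ij: distance between person i's and bag j's embeddings
    d = np.linalg.norm(person_emb[:, None, :] - bag_emb[None, :, :], axis=-1)
    cost = np.where(d <= t1, 0.0, d)      # near-identical embeddings cost nothing
    cost = np.where(d > t2, big_h, cost)  # cap the cost at H beyond T2
    return cost
```

A stronger person-bag affiliation yields closer embeddings and hence a lower entry in the resulting P x B cost array.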
Further, said calculating a priori cost from said person location box and said package location box comprises:
based on the relative relationship of the people position box and the bag position box, calculating a prior cost according to the following formula:
C_prior(i,j) = f_prior(Θ_ij)
wherein i denotes the i-th person, j denotes the j-th package, Θ_ij denotes the relative relationship between the person position frame and the package position frame, and f_prior(·) denotes a mapping from prior knowledge to prior cost.
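One possible f_prior can be sketched as follows. The two concrete rules (a bag rarely floats above its owner's head, and a distant bag is a weak match) and all numeric constants are assumptions standing in for the prior knowledge described later in the text; the patent does not fix a particular f_prior.

```python
import numpy as np

def prior_cost(person_box, bag_box):
    """Illustrative prior cost from the relative geometry of a person
    position frame and a bag position frame.

    Boxes are (x, y, w, h) tuples with (x, y) the top-left corner.
    """
    px, py, pw, ph = person_box
    bx, by, bw, bh = bag_box
    p_cx, p_cy = px + pw / 2, py + ph / 2
    b_cx, b_cy = bx + bw / 2, by + bh / 2
    cost = 0.0
    # A bag is unlikely to sit entirely above the head of its owner.
    if by + bh < py:
        cost += 5.0
    # The farther the bag is from the person, the weaker the affiliation;
    # normalise the centre distance by the person's height.
    dist = np.hypot(p_cx - b_cx, p_cy - b_cy)
    cost += dist / max(ph, 1)
    return cost
```

Any mapping with these qualitative properties (higher cost for implausible geometry) fits the role described here.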
Further, the constructing a cost matrix corresponding to the personal package relationship based on the associated embedding relationship cost and the prior cost, and determining the personal package corresponding relationship based on the cost matrix corresponding to the personal package relationship, includes:
constructing a cost matrix corresponding to the personal bag relationship based on the associated embedding relationship cost and the prior cost, and calculating the corresponding cost of the personal bag relationship according to the following formula:
C(i,j) = λ·C_ass(i,j) + (1 − λ)·C_prior(i,j)
wherein λ ∈ [0, 1], i denotes the i-th person, j denotes the j-th package, C_ass(i,j) denotes the association embedding relationship cost, and C_prior(i,j) denotes the prior cost;
constructing a cost matrix corresponding to the personal bag relationship based on the cost corresponding to the personal bag relationship;
and determining the corresponding relationship of the person and the bag based on the corresponding cost matrix of the person and the bag relationship.
Further, the determining the personal bag corresponding relationship based on the personal bag corresponding cost matrix includes:
and solving the cost matrix corresponding to the personal bag relationship by an assignment problem algorithm so as to determine the personal bag relationship.
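Solving the cost matrix as an assignment problem can be sketched as follows. The cost values are hypothetical, and a brute-force solver is used purely for clarity; a real system would use the Hungarian algorithm (e.g. scipy.optimize.linear_sum_assignment), which runs in O(n^3) rather than O(n!).

```python
from itertools import permutations

def solve_assignment(cost):
    """Minimum-cost assignment over a square cost matrix by brute force."""
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):  # perm[i] = bag assigned to person i
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(enumerate(best_perm)), best_total

# Hypothetical 3-person x 3-bag cost matrix: entry [i][j] is the
# person-bag correspondence cost of person i and bag j.
M = [
    [0.1, 2.0, 3.0],
    [2.5, 0.2, 2.0],
    [3.0, 2.0, 0.3],
]
pairs, total = solve_assignment(M)
# pairs → [(0, 0), (1, 1), (2, 2)]: each person is matched to the bag
# with which they share the lowest correspondence cost.
```

The lowest-total-cost matching is exactly the assignment-problem optimum the text refers to.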
Further, before extracting the human position frame, the package position frame, the human association embedded vector and the package association embedded vector in the image to be analyzed through the human-package relationship detection network, the method further includes:
establishing a human packet relation detection network based on a neural network structure;
training the human-bag relationship detection network by using a training sample image until a loss function of the human-bag relationship detection network in a training process meets a training requirement, wherein the training sample image is marked with a human position frame, a bag position frame and a human-bag corresponding relationship.
Further, the human-package relationship detection network comprises a backbone network, a position regression branch, a classification branch and an associated embedding vector branch;
the backbone network is used for outputting a feature map to the position regression branch, the classification branch and the association embedded vector branch;
the classification branch outputs a feature classification based on the feature map, the feature classification including persons and bags;
the position regression branch outputs a position frame based on the feature map, and determines the type of the position frame according to the output result of the classification branch, wherein the type of the position frame comprises a human position frame and a bag position frame;
and the association embedding vector branch outputs association embedding vectors based on the feature map, and determines the types of the association embedding vectors according to the output result of the classification branch, wherein the types of the association embedding vectors comprise human association embedding vectors and packet association embedding vectors.
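The three-branch structure above can be sketched in miniature. The dimensions are toy values, and each branch is modelled as a 1x1 convolution, i.e. a per-pixel linear map over the backbone feature map; this is one common way to implement such heads, assumed here rather than prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: the backbone yields an H x W x C feature map.
H, W, C = 8, 8, 32
feature_map = rng.standard_normal((H, W, C))  # stand-in backbone output

# Each branch as a 1x1 convolution: a shared linear map applied per pixel.
w_cls = rng.standard_normal((C, 2))  # classification: person / bag scores
w_reg = rng.standard_normal((C, 4))  # position regression: box parameters
w_ass = rng.standard_normal((C, 1))  # association embedding value

cls_out = feature_map @ w_cls  # (H, W, 2) feature classification
reg_out = feature_map @ w_reg  # (H, W, 4) position frames
ass_out = feature_map @ w_ass  # (H, W, 1) association embeddings
```

The classification output then decides, per location, whether the regressed frame and embedding are read as a person's or a bag's.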
Further, the loss function includes a regression loss L_reg, a classification loss L_cls and an association embedding loss L_ass;
the regression loss is calculated by a smooth-L1 loss, IoU loss or GIoU loss function;
the classification loss is calculated by a cross-entropy loss function;
the association embedding loss is calculated by the following formulas:
L_pull = (1/N) Σ_n (1/s_n) Σ_k (e_k^n − ē_n)²
L_push = (1/(N′(N′ − 1))) Σ_n Σ_{n′≠n} max(0, Δ − |ē_n − ē_{n′}|)
L_ass = μ·L_pull + ν·L_push
wherein ē_n = (1/s_n) Σ_k e_k^n is the mean embedding of the n-th affiliation, s_n denotes the total number of packages and persons in the current affiliation, N is the number of affiliations for which packages exist in the current image, N′ is the number of all affiliations, Δ is a preset distance threshold, e_k^n denotes the association embedding vector values of the person and packages in the current affiliation, and μ and ν are weighting coefficients;
the loss function is calculated by the following formula:
L = α·L_cls + β·L_reg + η·L_ass
where α, β and η denote the loss weights.
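The pull/push structure of the association embedding loss can be sketched as follows, using a common associative-embedding formulation with scalar embeddings; the exact grouping and normalisation details are assumptions.

```python
import numpy as np

def embedding_losses(groups, delta=1.0):
    """Pull/push losses over association embeddings.

    groups: list of 1-D arrays; each array holds the embedding values of
    one affiliation (a person together with his or her bags).
    delta: margin below which two affiliation means are penalised.
    """
    means = [g.mean() for g in groups]
    # Pull: embeddings within one affiliation should collapse to the mean.
    l_pull = float(np.mean([np.mean((g - m) ** 2) for g, m in zip(groups, means)]))
    # Push: means of different affiliations should stay at least delta apart.
    n = len(means)
    push_terms = [max(0.0, delta - abs(means[a] - means[b]))
                  for a in range(n) for b in range(n) if a != b]
    l_push = float(np.mean(push_terms)) if push_terms else 0.0
    return l_pull, l_push
```

Training drives both terms toward zero: identical embeddings within an affiliation and well-separated means across affiliations.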
In a second aspect, an embodiment of the present application provides a personal bag relationship detection apparatus, including a detection network extraction module, an association embedding cost calculation module, a priori cost calculation module, and a corresponding relationship determination module, where:
the detection network extraction module is used for extracting a person position frame, a package position frame, a person association embedding vector and a package association embedding vector from the image to be analyzed through a person-package relationship detection network, wherein the person position frames are in one-to-one correspondence with the person association embedding vectors, and the package position frames are in one-to-one correspondence with the package association embedding vectors;
an association embedding cost calculation module for calculating an association embedding relationship cost according to the human association embedding vector and the package association embedding vector;
a priori cost calculation module for calculating a priori cost according to the human location box and the package location box;
and the corresponding relation determining module is used for constructing a corresponding cost matrix of the personal package relation based on the incidence embedding relation cost and the prior cost, and determining the corresponding relation of the personal package based on the corresponding cost matrix of the personal package relation.
Further, the associated embedding cost calculation module is specifically configured to:
obtaining a vector value e_i^p of the person association embedding vector and a vector value e_j^b of the package association embedding vector;
based on the vector value e_i^p of the person association embedding vector and the vector value e_j^b of the package association embedding vector, calculating the association embedding relationship cost according to the following formula:
C_ass(i,j) = 0, if d_ij ≤ T1; C_ass(i,j) = f_ass(d_ij), if T1 < d_ij ≤ T2; C_ass(i,j) = H, if d_ij > T2
where i denotes the i-th person, j denotes the j-th package, d_ij = |e_i^p − e_j^b| denotes the association embedding distance, T1 and T2 are preset thresholds, H is a preset constant, and f_ass(·) denotes a monotonically increasing mapping from the association embedding distance to the embedding cost.
Further, the prior cost calculation module is specifically configured to:
based on the relative relationship of the people position box and the bag position box, calculating a prior cost according to the following formula:
C_prior(i,j) = f_prior(Θ_ij)
wherein i denotes the i-th person, j denotes the j-th package, Θ_ij denotes the relative relationship between the person position frame and the package position frame, and f_prior(·) denotes a mapping from prior knowledge to prior cost.
Further, the correspondence determining module is specifically configured to:
constructing a cost matrix corresponding to the personal bag relationship based on the associated embedding relationship cost and the prior cost, and calculating the corresponding cost of the personal bag relationship according to the following formula:
C(i,j) = λ·C_ass(i,j) + (1 − λ)·C_prior(i,j)
wherein λ ∈ [0, 1], i denotes the i-th person, j denotes the j-th package, C_ass(i,j) denotes the association embedding relationship cost, and C_prior(i,j) denotes the prior cost;
constructing a cost matrix corresponding to the personal bag relationship based on the cost corresponding to the personal bag relationship;
and determining the corresponding relationship of the person and the bag based on the corresponding cost matrix of the person and the bag relationship.
Further, when the correspondence determining module determines the personal bag correspondence based on the personal bag correspondence cost matrix, the correspondence determining module specifically includes:
and solving the cost matrix corresponding to the personal bag relationship by an assignment problem algorithm so as to determine the personal bag relationship.
Further, the apparatus further comprises a neural network creation module, the neural network creation module is configured to:
establishing a human packet relation detection network based on a neural network structure;
training the human-bag relationship detection network by using a training sample image until a loss function of the human-bag relationship detection network in a training process meets a training requirement, wherein the training sample image is marked with a human position frame, a bag position frame and a human-bag corresponding relationship.
Further, the human-package relationship detection network comprises a backbone network, a position regression branch, a classification branch and an associated embedding vector branch;
the backbone network is used for outputting a feature map to the position regression branch, the classification branch and the association embedded vector branch;
the classification branch outputs a feature classification based on the feature map, the feature classification including persons and bags;
the position regression branch outputs a position frame based on the feature map, and determines the type of the position frame according to the output result of the classification branch, wherein the type of the position frame comprises a human position frame and a bag position frame;
and the association embedding vector branch outputs association embedding vectors based on the feature map, and determines the types of the association embedding vectors according to the output result of the classification branch, wherein the types of the association embedding vectors comprise human association embedding vectors and packet association embedding vectors.
Further, the loss function includes a regression loss L_reg, a classification loss L_cls and an association embedding loss L_ass;
the regression loss is calculated by a smooth-L1 loss, IoU loss or GIoU loss function;
the classification loss is calculated by a cross-entropy loss function;
the association embedding loss is calculated by the following formulas:
L_pull = (1/N) Σ_n (1/s_n) Σ_k (e_k^n − ē_n)²
L_push = (1/(N′(N′ − 1))) Σ_n Σ_{n′≠n} max(0, Δ − |ē_n − ē_{n′}|)
L_ass = μ·L_pull + ν·L_push
wherein ē_n = (1/s_n) Σ_k e_k^n is the mean embedding of the n-th affiliation, s_n denotes the total number of packages and persons in the current affiliation, N is the number of affiliations for which packages exist in the current image, N′ is the number of all affiliations, Δ is a preset distance threshold, e_k^n denotes the association embedding vector values of the person and packages in the current affiliation, and μ and ν are weighting coefficients;
the loss function is calculated by the following formula:
L = α·L_cls + β·L_reg + η·L_ass
where α, β and η denote the loss weights.
In a third aspect, an embodiment of the present application provides a computer device, including: a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the person-package relationship detection method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a storage medium containing computer-executable instructions for performing the person-package relationship detection method according to the first aspect when executed by a computer processor.
According to the embodiments of the present application, persons and bags in the image to be analyzed are identified by a neural network, yielding their positions together with person and bag association embedding vectors; the association embedding relationship cost and the prior cost are calculated; a person-bag correspondence cost matrix is constructed from the two costs; and solving the matrix yields the correspondence between the identified persons and bags, improving the efficiency and accuracy of person-bag correspondence in crowded scenes.
Drawings
Fig. 1 is a flowchart of a method for detecting a personal bag relationship according to an embodiment of the present application;
FIG. 2 is a flowchart of another method for detecting a personal bag relationship according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating a relationship between a person and a bag according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of another method for detecting a personal bag relationship according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a human-bag relationship detection network provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a human bag relationship detection apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, specific embodiments of the present application will be described in detail with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some but not all of the relevant portions of the present application are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Fig. 1 is a flowchart of a human package relationship detection method according to an embodiment of the present application, where the human package relationship detection method according to the embodiment of the present application may be executed by a human package relationship detection apparatus, and the human package relationship detection apparatus may be implemented in a hardware and/or software manner and integrated in a computer device.
The following description will be given taking as an example a case where the human-bag relationship detection apparatus performs the human-bag relationship detection method. Referring to fig. 1, the person-package relationship detection method includes:
s101: and extracting a person position frame, a package position frame, a person correlation embedding vector and a package correlation embedding vector in the image to be analyzed through a person-package relation detection network, wherein the person position frame corresponds to the person correlation embedding vector, and the package position frame corresponds to the package correlation embedding vector one to one.
Herein, the term bag is to be understood broadly: it covers luggage carried in public places, such as suitcases, backpacks, handbags, satchels and cartons, and the person-bag relationship is to be understood as the affiliation of a bag to a person. The image to be analyzed may come from a video frame returned by a surveillance camera, or may be any image supplied for person-bag analysis.
The person-package relationship detection network combines person-package detection with association embedding. Person-package detection may be performed by a one-stage or two-stage detection framework built on a neural network (e.g. a deep neural network, convolutional neural network or recurrent neural network); it identifies the positions and categories of persons and packages in the image (whether a position corresponds to a person or a package) and outputs each position as a position frame, for example as a corner point (e.g. the upper-left corner) together with the height and width of the frame. The person association embedding vector and the package association embedding vector are obtained through neural network embedding and represent the relationship attributes of the corresponding person or package.
It can be understood that the person position frames and the person association embedding vectors, and likewise the package position frames and the package association embedding vectors, are in one-to-one correspondence: each detected position frame has exactly one association embedding vector of the matching type.
For example, after receiving an image to be analyzed, the image to be analyzed is input into a person-bag relationship detection network, the person-bag relationship detection network analyzes the image to be analyzed, outputs an identification object and a position frame in the image to be analyzed, classifies the identification object, that is, distinguishes the identification object as a person or a bag, and determines the corresponding position frame as a person position frame and a bag position frame. Meanwhile, the person-package relation detection network outputs a person-to-package correlation embedded vector and a package correlation embedded vector corresponding to the person and the package.
S102: and calculating the associated embedding relation cost according to the human associated embedding vector and the packet associated embedding vector.
Illustratively, after determining the person associated embedding vector and the packet associated embedding vector corresponding to the image to be processed, the associated embedding relationship cost is calculated according to the associated embedding vector distance between each person and each packet. It can be understood that the weaker the correspondence between a person and a package, the larger the association embedding vector distance between the corresponding person and the package, and the higher the calculated association embedding relationship cost.
S103: calculating a prior cost from the person location box and the package location box.
The prior cost is obtained by using the prior knowledge of human as judgment and mapping the position relation of the person package, so that the accuracy of judging the corresponding relation of the person package can be further improved. For example, the bag may not normally be present on top of the head of the person to whom it belongs, the bag may not be too far from the person, and so on. The relative relationship between the human location box and the package location box may be mapped to a priori costs based on experience with the human package relationship. It will be appreciated that the weaker the correspondence between a person and a bag, the higher the corresponding a priori cost.
Illustratively, after the person position frames and the package position frames are determined, the prior cost for each pair of person and package position frames is calculated according to the mapping between position frames and prior cost.
S104: and constructing a person-to-package relationship corresponding cost matrix based on the associated embedding relationship cost and the prior cost, and determining a person-to-package corresponding relationship based on the person-to-package relationship corresponding cost matrix.
Illustratively, for each person-package pair, the association embedding relationship cost and the prior cost obtained above are summed in a preset proportion to obtain the correspondence cost of that person for that package, and the person-package correspondence cost matrix is constructed from these costs.
It can be understood that the weaker the correspondence between the person and the package, the higher the corresponding person-package relationship corresponding cost, and the larger the corresponding value in the person-package relationship corresponding cost matrix. For example, the weaker the correspondence between the fifth person and the sixth packet, the higher the corresponding person-packet-relationship correspondence cost, and the larger the value corresponding to M (5, 6) in the person-packet-relationship correspondence cost matrix M (the abscissa represents the person number and the ordinate represents the packet number).
Further, the corresponding cost of the person-package relationship between each person and the package can be judged according to the corresponding cost matrix of the person-package relationship, at this time, the correspondence between the person and the package is converted from a non-standard assignment problem into a standard assignment problem, the optimal assignment result between the person and the package can be obtained based on a standard assignment solution, and the person-package corresponding relationship is determined based on the assignment result.
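When the numbers of persons and packages differ, the cost matrix is rectangular and the problem is non-standard. One common way to reduce it to a standard (square) assignment problem, assumed here for illustration rather than taken from the text, is to pad with dummy entries whose cost exceeds any acceptable real cost:

```python
import numpy as np

def pad_to_square(cost, dummy_cost):
    """Pad a rectangular person-package cost matrix with dummy rows or
    columns so that a standard square assignment solver applies.

    A match against a dummy entry means 'left unassigned'; dummy_cost
    should exceed any real cost one is willing to accept.
    """
    p, b = cost.shape
    n = max(p, b)
    square = np.full((n, n), dummy_cost, dtype=float)
    square[:p, :b] = cost  # real costs in the top-left block
    return square
```

After solving the padded matrix, any pair involving a dummy row or column is discarded, leaving only genuine person-package correspondences.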
Further, after the corresponding relationship of the person package is determined, the person position frame, the package position frame and the corresponding relationship of the person package in the image to be analyzed are output. For example, a person position frame and a bag position frame may be marked in the form of a box on the screen to be analyzed, and the position frames of persons and bags having corresponding (subordinate) relationships may be displayed in the same color.
The method comprises the steps of identifying people and bags in an image to be analyzed through a neural network structure, obtaining positions of the people and the bags, obtaining a people association embedding vector and a bag association embedding vector at the same time, calculating association embedding relation cost and prior cost, constructing a people and bag relation corresponding cost matrix based on the association embedding relation cost and the prior cost, solving the people and bag relation corresponding cost matrix to obtain the corresponding relation of the identified people and bags, and improving the corresponding efficiency and accuracy of the people and bags in a crowded scene.
Fig. 2 is a flowchart of another person-package relationship detection method according to an embodiment of the present application, which is a refinement of the foregoing embodiment. Referring to fig. 2, the person-package relationship detection method includes:
s201: and extracting a person position frame, a package position frame, a person correlation embedding vector and a package correlation embedding vector in the image to be analyzed through a person-package relation detection network, wherein the person position frame corresponds to the person correlation embedding vector, and the package position frame corresponds to the package correlation embedding vector one to one.
S202: obtain the vector value h_i of each person-associated embedding vector and the vector value h′_j of each package-associated embedding vector.

Specifically, after the person-associated embedding vectors and the package-associated embedding vectors are obtained, the vector values of the currently detected embedding vectors are obtained respectively, where h_i denotes the vector value of the person-associated embedding vector corresponding to the ith person, and h′_j denotes the vector value of the package-associated embedding vector corresponding to the jth package.
S203: vector values based on the person-associated embedded vector
Figure BDA0002460564580000102
And the vector value of the packet associated embedded vector
Figure BDA0002460564580000103
And calculating the associated embedding relation cost according to the associated embedding relation cost formula.
Specifically, the associated embedding relationship cost formula is:
Figure BDA0002460564580000104
where i denotes the ith individual, j denotes the jth package,
Figure BDA0002460564580000105
indicating the associated embedding distance, T1、T2Is a predetermined threshold, H is a predetermined constant, fass() Representing a monotonically increasing mapping of the associated embedding distance to the embedding loss.
Specifically, for each person-associated embedding vector, the associated embedding distance Δ1 between it and each package-associated embedding vector is calculated, and the associated embedding relationship cost C_ass between the corresponding person and package is determined according to the magnitude of Δ1 relative to the preset thresholds T1 and T2.

For example, when Δ1 is less than or equal to T1, the associated embedding relationship cost C_ass is determined to be 0; when Δ1 is greater than T2, C_ass is determined to be H; and when Δ1 lies between T1 and T2, C_ass is determined according to the mapping from the associated embedding distance to the embedding loss.
The monotonically increasing mapping f_ass() from the associated embedding distance to the embedding loss should be understood as follows: the larger the associated embedding distance, the larger the embedding loss, and the higher the corresponding associated embedding relationship cost. The mapping from different values of the associated embedding distance to the embedding loss (or associated embedding relationship cost) may be set according to the actual situation; for example, the associated embedding relationship cost may be determined according to the range in which the associated embedding distance falls, or a mapping function may be determined according to the mapping relationship between the associated embedding distance and the associated embedding relationship cost.
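As a sketch of the piecewise cost just described: zero below T1, the constant H above T2, and a monotonically increasing mapping in between. The identity mapping used for f_ass here is an illustrative assumption, not the patent's actual function:

```python
def association_cost(delta1, t1, t2, h, f_ass=lambda d: d):
    """Piecewise associated-embedding relationship cost:
    0 when delta1 <= t1, the constant h when delta1 > t2, and a
    monotonically increasing mapping f_ass of delta1 in between."""
    if delta1 <= t1:
        return 0.0
    if delta1 > t2:
        return float(h)
    return float(f_ass(delta1))

print(association_cost(0.1, 0.5, 2.0, 10.0))  # 0.0
print(association_cost(1.2, 0.5, 2.0, 10.0))  # 1.2
print(association_cost(3.0, 0.5, 2.0, 10.0))  # 10.0
```

Any monotonically increasing f_ass (e.g. a scaled linear or exponential ramp between T1 and T2) fits this structure.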
S204: and calculating prior cost according to a prior cost formula based on the relative relation between the human position frame and the bag position frame.
The prior cost uses human prior knowledge as a basis for judgment, mapping the positional relationship between person and package to a cost; using the prior cost can improve the accuracy of the person-package correspondence judgment.
Specifically, the prior cost formula is:

C_prior(i, j) = f_prior(Θ(i, j))

where i denotes the ith person, j denotes the jth package, Θ denotes the relative relationship between the person position box and the package position box, and f_prior() represents a mapping from prior knowledge to the prior loss.
Specifically, for each person position box, the relative relationship Θ between it and each package position box is obtained, and Θ is substituted into the mapping function f_prior() from prior knowledge to prior loss to determine the prior cost between the corresponding person and package.

The mapping f_prior() from prior knowledge to prior loss is determined according to the correspondence between persons and packages in the prior knowledge. It can be understood that the weaker the correspondence between a person and a package, the higher the corresponding prior cost.
For example, it is often the case that a package owned by a person does not appear on top of the person's head, and from a priori knowledge, the location of the occurrence of the package location box generally does not appear above the person location box with the corresponding (dependent) relationship, then it can be determined that the a priori cost is greater when the package location box is above the person location box. At this time, the up-down positional relationship between the package position frame and the person position frame can be used as the relative relationship Θ between the person position frame and the package position frame. For another example, in general, the distance between the corresponding person and the package is not too far, and it can be known from the prior knowledge that the appearance position of the package position frame is not generally beyond the threshold distance from the person position frame having the corresponding (dependent) relationship, and it can be determined that the farther the package position frame is from the person position frame, the greater the corresponding prior cost. At this time, the distance position relationship between the package position frame and the person position frame can be used as the relative relationship Θ between the person position frame and the package position frame.
It can be understood that the type of the relative relationship Θ can be determined according to actual needs. When the relative relationship Θ includes a plurality of types, corresponding weights can be set for the prior costs calculated from the different types of relative relationship, and the prior costs calculated under the different types are summed according to the corresponding weight ratios to obtain the final prior cost C_prior.
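A minimal sketch of combining two types of relative relationship Θ from the examples above (the above/below relation and the distance relation) into one weighted prior cost. The weights, penalty values, distance scale and box convention are all assumptions for illustration, not the patent's formulas:

```python
def prior_cost(person_box, bag_box, w_above=0.5, w_dist=0.5,
               penalty=5.0, max_dist=200.0):
    """Illustrative prior cost: a penalty when the package box lies above
    the person box (packages rarely appear over a person's head), plus a
    term that grows with the distance between the two boxes.
    Boxes are (x_center, y_center, w, h); larger y is lower in the image."""
    person_top = person_box[1] - person_box[3] / 2
    above = penalty if bag_box[1] < person_top else 0.0
    dx = person_box[0] - bag_box[0]
    dy = person_box[1] - bag_box[1]
    dist = (dx * dx + dy * dy) ** 0.5
    return w_above * above + w_dist * min(dist / max_dist, 1.0) * penalty

person = (100.0, 100.0, 40.0, 120.0)
bag_near = (110.0, 130.0, 30.0, 30.0)    # beside/below the person
bag_above = (100.0, 20.0, 30.0, 30.0)    # above the person's head
print(prior_cost(person, bag_near) < prior_cost(person, bag_above))  # True
```

The two weighted terms correspond to two types of Θ; further relation types would simply add more weighted terms to the sum.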
S205: and constructing a cost matrix corresponding to the personal packet relationship based on the associated embedding relationship cost and the prior cost, and calculating the cost corresponding to the personal packet relationship according to a cost formula corresponding to the personal packet relationship.
Specifically, the person-package relationship corresponding cost formula is:

C_judge(i, j) = λ·C_ass(i, j) + (1 − λ)·C_prior(i, j)

where λ ∈ [0, 1], i denotes the ith person, j denotes the jth package, C_ass(i, j) denotes the associated embedding relationship cost, and C_prior(i, j) denotes the prior cost.

After the associated embedding relationship cost and prior cost between each detected person and each detected package are obtained, C_ass and C_prior are substituted into the person-package relationship corresponding cost formula according to the serial numbers of the persons and packages to obtain the person-package relationship corresponding cost C_judge.
S206: and constructing a cost matrix corresponding to the personal package relationship based on the cost corresponding to the personal package relationship.
Specifically, after the person-package relationship corresponding cost C_judge between each person and each package is obtained, the person-package relationship corresponding cost matrix is constructed according to the serial numbers of the persons and packages, for example with the person serial number as the abscissa, the package serial number as the ordinate, and the person-package relationship corresponding cost as the value.

Assuming that p persons and b packages are detected in the current image to be analyzed, a p × b person-package relationship corresponding cost matrix M can be created:

M = [ C_judge(1, 1)  C_judge(1, 2)  …  C_judge(1, b)
      C_judge(2, 1)  C_judge(2, 2)  …  C_judge(2, b)
      …
      C_judge(p, 1)  C_judge(p, 2)  …  C_judge(p, b) ]
it can be understood that the higher the corresponding cost of the person-package relationship is, the larger the value of the corresponding sequence number in the cost matrix M corresponding to the person-package relationship is, which means the weaker the corresponding relationship between the person and the package is.
S207: and solving the cost matrix corresponding to the personal bag relationship by an assignment problem algorithm so as to determine the personal bag relationship.
Specifically, after the cost matrix corresponding to the person-bag relationship is constructed, the correspondence between the person and the bag is converted from a non-standard assignment problem into a standard assignment problem, an optimal assignment result between the person and the bag can be obtained based on a standard assignment solution, and the person-bag correspondence relationship is determined based on the assignment result.
For example, the optimal assignment of packages to persons is found by the Hungarian algorithm or the Kuhn-Munkres algorithm, and the correspondence between persons and packages is determined according to the assignment relationships indicated in the assignment result, i.e., a package assigned to a person is determined to correspond to (be subordinate to) that person. It can be understood that in practice one person may correspond to a plurality of packages; the person-package relationship corresponding cost matrix can then be solved with an assignment-problem formulation in which one person can be assigned several tasks (equivalent to allowing a plurality of packages to be assigned to the same person at the same time).
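For small matrices, the optimal assignment mentioned above can be demonstrated with a brute-force search standing in for the Hungarian / Kuhn-Munkres algorithm (illustrative only; real implementations use the polynomial-time algorithm, e.g. scipy's linear_sum_assignment):

```python
from itertools import permutations

def solve_assignment(cost):
    """Brute-force optimal assignment for a small square cost matrix:
    try every one-to-one mapping of rows (persons) to columns (packages)
    and keep the cheapest. O(n!) in general, so only for demo sizes."""
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total

cost = [[0.5, 3.5, 2.0],
        [5.5, 0.5, 4.0],
        [2.0, 3.0, 0.5]]
perm, total = solve_assignment(cost)
print(perm, total)   # [0, 1, 2] 1.5
```

Here person i is assigned package perm[i], so each person keeps the package with the lowest combined cost.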
Further, after the optimal solution of the cost matrix corresponding to the person-bag relationship is solved, the person position frame, the bag position frame and the person-bag corresponding relationship are output based on the optimal solution of the corresponding cost matrix.
Fig. 3 is a schematic diagram of a person-package correspondence according to an embodiment of the present application. As shown in fig. 3, assume that three persons Person_A, Person_B and Person_C and four bags Bag_A, Bag_B, Bag_C and Bag_D are detected in one image to be analyzed. After detection and analysis by the person-package relationship detection network, the person position frames of the three persons and the bag position frames of the four bags are output, and the person-package correspondences existing in the image to be analyzed are finally obtained by solving the person-package relationship corresponding cost matrix: Bag_A corresponds to Person_A, Bag_C and Bag_D correspond to Person_C, and Bag_B corresponds to no person because it is above the persons or too far away from them.
The method comprises the steps of identifying people and bags in an image to be analyzed through a neural network structure, obtaining positions of the people and the bags, obtaining a people association embedding vector and a bag association embedding vector at the same time, calculating association embedding relation cost and prior cost, constructing a people and bag relation corresponding cost matrix based on the association embedding relation cost and the prior cost, solving the people and bag relation corresponding cost matrix to obtain the corresponding relation of the identified people and bags, and improving the corresponding efficiency and accuracy of the people and bags in a crowded scene. Meanwhile, the optimal solution of the cost matrix corresponding to the person-to-bag relationship is solved based on an assignment problem solving method, and the accuracy of the person-to-bag relationship is effectively improved.
Fig. 4 is a flowchart of another person-package relationship detection method provided in an embodiment of the present application, which is a refinement of the foregoing embodiment. Referring to fig. 4, the person-package relationship detection method includes:

S301: create a person-package relationship detection network based on a neural network structure.
Specifically, a person-package relationship detection network is created based on a deep neural network, a convolutional neural network, a recurrent neural network, or the like (for example, network structures such as VGG, ResNet and DenseNet). Fig. 5 is a schematic structural diagram of the person-package relationship detection network provided in the embodiment of the present application. As shown in fig. 5, the person-package relationship detection network includes a backbone network, a position regression branch, a classification branch, and an associated embedding vector branch.
The position regression branch, the classification branch and the associated embedding vector branch share the same backbone network. The backbone network receives an image to be analyzed (a C × H × W tensor, where C is the number of channels, 3 for RGB images, and H and W are the height and width), identifies persons and packages, generates a feature map based on the identified persons or packages, and outputs the feature map to the position regression branch, the classification branch and the associated embedding vector branch.
Further, the classification branch outputs a feature classification based on the feature map, and the feature classification includes people and bags. The position regression branch outputs the position frame based on the feature map, and determines the type of the position frame according to the output result of the classification branch, wherein the type of the position frame comprises a person position frame and a bag position frame. And the associated embedding vector branch outputs an associated embedding vector based on the feature map, and determines the type of the associated embedding vector according to the output result of the classification branch, wherein the type of the associated embedding vector comprises a human associated embedding vector and a packet associated embedding vector.
The position frame obtained by the position regression branch does not by itself indicate whether it is a person position frame or a package position frame; the determination of the position frame type depends on the classification branch, whose output indicates the frame type. Likewise, the embedding vector values output by the associated embedding vector branch carry no class attribute, i.e., whether an associated embedding vector belongs to a package or a person is also determined by the classification branch. In summary, the types of the outputs of the position regression branch and the associated embedding vector branch are specified by the classification branch output. The associated embedding vector branch calculates an associated embedding vector for each anchor box in the feature map, and the feature classification corresponding to the feature map is judged in combination with the classification branch, so that whether the corresponding position frame is a person position frame or a package position frame can be determined.
For example, for a feature map with resolution m × n, passing it through the associated embedding vector branch may generate an m × n × q feature vector map, where q is the number of anchor boxes belonging to each anchor point. Taking a feature map output by the backbone network at a certain scale as an example, for an anchor point with coordinates (i, j) at which 3 different anchor boxes are preset, 3 associated embedding vector values [k1, k2, k3] will be generated after the associated embedding vector branch, and these 3 values represent the relationship attributes of the 3 anchor boxes.
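The m × n × q shape of the associated embedding vector map can be illustrated with a toy tensor (the sizes below are arbitrary, chosen only to match the q = 3 anchor-box example):

```python
import numpy as np

# Shape sketch of the associated-embedding branch output: for an m x n
# feature map with q anchor boxes per anchor point, the branch yields one
# embedding value per anchor box, i.e. an m x n x q tensor.
m, n, q = 4, 4, 3
embedding_map = np.random.rand(m, n, q)

i, j = 2, 1                        # one anchor point of the feature map
k1, k2, k3 = embedding_map[i, j]   # the 3 embeddings [k1, k2, k3] at (i, j)
print(embedding_map.shape)         # (4, 4, 3)
```

Each of k1, k2, k3 is then matched against the classification branch's output at the same anchor box to decide whether it is a person or a package embedding.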
S302: training the human-bag relationship detection network by using a training sample image until a loss function of the human-bag relationship detection network in the training process meets the training requirement.
Specifically, a training set is established by collecting a large number of training sample images, and the person position frames, package position frames and person-package correspondences in the training sample images are annotated manually. The annotated training set is then input into the person-package relationship detection network for recognition training until the loss function of the network during training meets the training requirement or reaches its minimum.
The loss function comprises a regression loss L_reg, a classification loss L_cls and an associated embedding loss L_ass. The regression loss can be calculated by a smooth-L1 loss, IoU loss or GIoU loss function, and the classification loss can be calculated by a cross-entropy loss function.
Further, the associated embedding loss is calculated by the following formulas:

L_pull = (1/N) Σ_{n=1..N} (1/s_n) Σ_{k=1..s_n} (h_{n,k} − h̄_n)²

L_push = (1/(N′·(N′ − 1))) Σ_{n=1..N′} Σ_{m≠n} max(0, Δ2 − |h̄_n − h̄_m|)

L_ass = μ·L_pull + ν·L_push

where h̄_n represents the mean of the associated embedding vector values of the person and packages in the nth affiliation, h_{n,k} represents the associated embedding vector value of the kth member (person or package) of the nth affiliation, s_n represents the total number of packages and persons in that affiliation, N is the number of affiliations in which packages exist in the current image, N′ is the number of all affiliations, Δ2 is a preset distance threshold, and μ and ν represent weighting coefficients.
Further, after the regression loss L_reg, the classification loss L_cls and the associated embedding loss L_ass are calculated, the loss function L can be obtained by the following formula:

L = α·L_cls + β·L_reg + η·L_ass

where α, β and η represent the loss weights.
It can be understood that the process of training the human packet relationship detection network is actually a process of training the loss function, and the goal is to minimize the loss function (which can be solved by a gradient descent algorithm).
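A rough numpy sketch of the pull/push associated embedding loss described above, following the general associative-embedding formulation: pull members of each affiliation (a person and its packages) toward the group's mean embedding, and push the means of different affiliations at least Δ2 apart. The normalization and the use of scalar (1-D) embeddings are assumptions for illustration, not the patent's exact formula:

```python
import numpy as np

def pull_push_loss(groups, delta2=1.0, mu=1.0, nu=1.0):
    """groups: list of affiliations, each a list of scalar embedding
    values for one person and its packages. Returns mu*L_pull + nu*L_push."""
    means = [np.mean(g) for g in groups]
    # pull: squared deviation of each member from its affiliation's mean
    l_pull = np.mean([np.mean((np.asarray(g) - m) ** 2)
                      for g, m in zip(groups, means)])
    # push: hinge penalty when two affiliation means are closer than delta2
    n = len(means)
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    l_push = (np.mean([max(0.0, delta2 - abs(means[a] - means[b]))
                       for a, b in pairs]) if pairs else 0.0)
    return mu * l_pull + nu * l_push

tight = [[1.0, 1.0], [3.0, 3.0]]   # compact groups, well-separated means
loose = [[1.0, 3.0], [1.2, 3.2]]   # spread groups, overlapping means
print(pull_push_loss(tight) < pull_push_loss(loose))  # True
```

Training drives the embeddings toward the "tight" configuration, which is exactly what makes the associated embedding distance Δ1 a usable matching signal at inference time.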
S303: and extracting a person position frame, a package position frame, a person correlation embedding vector and a package correlation embedding vector in the image to be analyzed through a person-package relation detection network, wherein the person position frame corresponds to the person correlation embedding vector, and the package position frame corresponds to the package correlation embedding vector one to one.
S304: and calculating the associated embedding relation cost according to the human associated embedding vector and the packet associated embedding vector.
S305: calculating a prior cost from the person location box and the package location box.
S306: and constructing a person-to-package relationship corresponding cost matrix based on the associated embedding relationship cost and the prior cost, and determining a person-to-package corresponding relationship based on the person-to-package relationship corresponding cost matrix.
The method comprises the steps of identifying people and bags in an image to be analyzed through a neural network structure, obtaining positions of the people and the bags, obtaining a people association embedding vector and a bag association embedding vector at the same time, calculating association embedding relation cost and prior cost, constructing a people and bag relation corresponding cost matrix based on the association embedding relation cost and the prior cost, solving the people and bag relation corresponding cost matrix to obtain the corresponding relation of the identified people and bags, and improving the corresponding efficiency and accuracy of the people and bags in a crowded scene. Meanwhile, the position frame, the feature classification and the associated embedded vector are respectively output through the position regression branch, the classification branch and the associated embedded vector branch, the detection efficiency and the detection accuracy of the person-in-package relationship detection network are improved, and the detection accuracy of the person-in-package relationship detection network is further improved through reasonably determining the loss function.
Fig. 6 is a schematic structural diagram of a human bag relationship detection apparatus according to an embodiment of the present application. Referring to fig. 6, the personal bag relationship detecting apparatus provided in this embodiment includes a detecting network extracting module 61, an associated embedded cost calculating module 62, a priori cost calculating module 63, and a corresponding relationship determining module 64.
The detection network extraction module 61 is configured to extract a person position frame, a package position frame, a person association embedding vector and a package association embedding vector in an image to be analyzed through a person-package relationship detection network, where the person position frame corresponds to the person association embedding vector, and the package position frame corresponds to the package association embedding vector one to one; an association embedding cost calculation module 62 for calculating an association embedding relationship cost from the person association embedding vector and the package association embedding vector; a priori cost calculation module 63 for calculating a priori cost from the person location box and the package location box; and a corresponding relation determining module 64, configured to construct a cost matrix corresponding to the personal package relation based on the associated embedding relation cost and the prior cost, and determine a personal package corresponding relation based on the cost matrix corresponding to the personal package relation.
The method comprises the steps of identifying people and bags in an image to be analyzed through a neural network structure, obtaining positions of the people and the bags, obtaining a people association embedding vector and a bag association embedding vector at the same time, calculating association embedding relation cost and prior cost, constructing a people and bag relation corresponding cost matrix based on the association embedding relation cost and the prior cost, solving the people and bag relation corresponding cost matrix to obtain the corresponding relation of the identified people and bags, and improving the corresponding efficiency and accuracy of the people and bags in a crowded scene.
In a possible embodiment, the associated embedding cost calculation module 62 is specifically configured to:

obtain the vector value h_i of the person-associated embedding vector and the vector value h′_j of the package-associated embedding vector; and

based on the vector values h_i and h′_j, calculate the associated embedding relationship cost according to the following formula:

C_ass(i, j) = 0,                if Δ1(i, j) ≤ T1
C_ass(i, j) = f_ass(Δ1(i, j)),  if T1 < Δ1(i, j) ≤ T2
C_ass(i, j) = H,                if Δ1(i, j) > T2

where i denotes the ith person, j denotes the jth package, Δ1(i, j) = |h_i − h′_j| denotes the associated embedding distance, T1 and T2 are preset thresholds, H is a preset constant, and f_ass() represents a monotonically increasing mapping from the associated embedding distance to the embedding loss.
In a possible embodiment, the a priori cost calculation module 63 is specifically configured to:
based on the relative relationship between the person position box and the package position box, calculate the prior cost according to the following formula:

C_prior(i, j) = f_prior(Θ(i, j))

where i denotes the ith person, j denotes the jth package, Θ denotes the relative relationship between the person position box and the package position box, and f_prior() represents a mapping from prior knowledge to the prior loss.
In a possible embodiment, the correspondence determining module 64 is specifically configured to:
calculate the person-package relationship corresponding cost based on the associated embedding relationship cost and the prior cost according to the following formula:

C_judge(i, j) = λ·C_ass(i, j) + (1 − λ)·C_prior(i, j)

where λ ∈ [0, 1], i denotes the ith person, j denotes the jth package, C_ass(i, j) denotes the associated embedding relationship cost, and C_prior(i, j) denotes the prior cost;

construct the person-package relationship corresponding cost matrix based on the person-package relationship corresponding costs; and

determine the person-package correspondence based on the person-package relationship corresponding cost matrix.
In a possible embodiment, when the correspondence determining module 64 determines the personal bag correspondence based on the personal bag correspondence cost matrix, specifically:
and solving the cost matrix corresponding to the personal bag relationship by an assignment problem algorithm so as to determine the personal bag relationship.
In one possible embodiment, the apparatus further comprises a neural network creation module configured to:
establishing a human packet relation detection network based on a neural network structure;
training the human-bag relationship detection network by using a training sample image until a loss function of the human-bag relationship detection network in a training process meets a training requirement, wherein the training sample image is marked with a human position frame, a bag position frame and a human-bag corresponding relationship.
In one possible embodiment, the human package relationship detection network comprises a backbone network, a location regression branch, a classification branch and an associated embedding vector branch;
the backbone network is used for outputting a feature map to the position regression branch, the classification branch and the association embedded vector branch;
the classification branch outputs a feature classification based on the feature map, the feature classification including persons and bags;
the position regression branch outputs a position frame based on the feature map, and determines the type of the position frame according to the output result of the classification branch, wherein the type of the position frame comprises a human position frame and a bag position frame;
and the association embedding vector branch outputs association embedding vectors based on the feature map, and determines the types of the association embedding vectors according to the output result of the classification branch, wherein the types of the association embedding vectors comprise human association embedding vectors and packet association embedding vectors.
In one possible embodiment, the loss function includes a regression loss L_reg, a classification loss L_cls and an associated embedding loss L_ass;

the regression loss is obtained through a smooth-L1 loss, IoU loss or GIoU loss function calculation;

the classification loss is obtained through a cross-entropy loss function calculation;

the associated embedding loss is calculated by the following formulas:

L_pull = (1/N) Σ_{n=1..N} (1/s_n) Σ_{k=1..s_n} (h_{n,k} − h̄_n)²

L_push = (1/(N′·(N′ − 1))) Σ_{n=1..N′} Σ_{m≠n} max(0, Δ2 − |h̄_n − h̄_m|)

L_ass = μ·L_pull + ν·L_push

where h̄_n represents the mean of the associated embedding vector values of the person and packages in the nth affiliation, h_{n,k} represents the associated embedding vector value of the kth member (person or package) of the nth affiliation, s_n represents the total number of packages and persons in that affiliation, N is the number of affiliations in which packages exist in the current image, N′ is the number of all affiliations, Δ2 is a preset distance threshold, and μ and ν represent weighting coefficients;

the loss function is calculated by the following formula:

L = α·L_cls + β·L_reg + η·L_ass

where α, β and η represent the loss weights.
The embodiment of the application also provides computer equipment which can integrate the person-bag relationship detection device provided by the embodiment of the application. Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application. Referring to fig. 7, the computer apparatus includes: an input device 73, an output device 74, a memory 72, and one or more processors 71; the memory 72 for storing one or more programs; when the one or more programs are executed by the one or more processors 71, the one or more processors 71 are caused to implement the person-package relationship detection method provided in the above embodiment. The input device 73, the output device 74, the memory 72 and the processor 71 may be connected by a bus or other means, and fig. 7 illustrates the example of the bus connection.
The memory 72 is a storage medium readable by a computing device, and can be used for storing software programs, computer executable programs, and modules, such as program instructions/modules corresponding to the human package relationship detection method according to any embodiment of the present application (for example, the detection network extracting module 61, the associated embedded cost calculating module 62, the prior cost calculating module 63, and the corresponding relationship determining module 64 in the human package relationship detection apparatus). The memory 72 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 72 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 72 may further include memory located remotely from the processor 71, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 73 may be used to receive input numeric or character information and generate key signal inputs relating to user settings and function control of the apparatus. The output device 74 may include a display device such as a display screen.
The processor 71 executes various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the memory 72, thereby implementing the person-package relationship detection method described above.
The person-package relationship detection apparatus and the computer device provided by this embodiment can be used to execute the person-package relationship detection method provided by the above embodiments, and have the corresponding functions and beneficial effects.
Embodiments of the present application further provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the person-package relationship detection method provided in the foregoing embodiments, the method including: extracting a person position frame, a package position frame, a person association embedding vector, and a package association embedding vector in an image to be analyzed through a person-package relationship detection network, wherein the person position frames correspond one-to-one to the person association embedding vectors, and the package position frames correspond one-to-one to the package association embedding vectors; calculating an association embedding relationship cost according to the person association embedding vector and the package association embedding vector; calculating a prior cost from the person position frame and the package position frame; and constructing a person-package relationship correspondence cost matrix based on the association embedding relationship cost and the prior cost, and determining the person-package correspondence based on the person-package relationship correspondence cost matrix.
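The four steps above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the patented implementation: the function name, the scalar per-object embeddings, the fusion weight `lam`, and the brute-force assignment are all assumptions for illustration.

```python
from itertools import permutations

def person_package_matching(ass_cost, prior_cost, lam=0.5):
    """Fuse association-embedding costs and prior costs into one matrix,
    then pick the person-to-package assignment with minimal total cost.
    ass_cost / prior_cost: 2-D lists indexed [person][package]."""
    n_p, n_b = len(ass_cost), len(ass_cost[0])
    # weighted fusion of the two cost terms (lam is a hypothetical weight)
    cost = [[lam * ass_cost[i][j] + (1 - lam) * prior_cost[i][j]
             for j in range(n_b)] for i in range(n_p)]
    best, best_pairs = float("inf"), []
    # brute-force search over assignments; an assignment-problem solver
    # (e.g. the Hungarian method) would be used at scale
    for perm in permutations(range(n_b), n_p):  # assumes n_p <= n_b
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best:
            best, best_pairs = total, list(enumerate(perm))
    return best_pairs
```

For two people and two packages whose embedding costs favor the diagonal pairing, the sketch returns person 0 with package 0 and person 1 with package 1.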
Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: mounting media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., a hard disk), or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or may be located in a different second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that may reside in different locations, such as in different computer systems that are connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.
Of course, the storage medium provided in the embodiments of the present application and containing computer-executable instructions is not limited to the above-mentioned human package relationship detection method, and may also perform related operations in the human package relationship detection method provided in any embodiments of the present application.
The person-package relationship detection apparatus, device, and storage medium provided in the foregoing embodiments may execute the person-package relationship detection method provided in any embodiment of the present application; for technical details not described in detail in the foregoing embodiments, refer to the person-package relationship detection method provided in any embodiment of the present application.
The foregoing is merely illustrative of the preferred embodiments of the present application and the technical principles employed. The present application is not limited to the particular embodiments described herein; various obvious changes, rearrangements, and substitutions may be made by those skilled in the art without departing from the scope of the present application. Therefore, although the present application has been described in some detail with reference to the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from the spirit of the present application, its scope being determined by the scope of the appended claims.

Claims (11)

1. A person-package relationship detection method is characterized by comprising the following steps:
extracting a person position frame, a package position frame, a person association embedding vector, and a package association embedding vector in an image to be analyzed through a person-package relationship detection network, wherein the person position frames correspond one-to-one to the person association embedding vectors, and the package position frames correspond one-to-one to the package association embedding vectors;
calculating an association embedding relation cost according to the human association embedding vector and the packet association embedding vector;
calculating a prior cost from the person location box and the package location box;
and constructing a person-to-package relationship corresponding cost matrix based on the associated embedding relationship cost and the prior cost, and determining a person-to-package corresponding relationship based on the person-to-package relationship corresponding cost matrix.
2. The method according to claim 1, wherein the calculating an association embedding relationship cost according to the human association embedding vector and the package association embedding vector comprises:
obtaining the vector value of the person association embedding vector (formula image FDA0002460564570000011) and the vector value of the package association embedding vector (formula image FDA0002460564570000012);
based on the vector value of the person association embedding vector (formula image FDA0002460564570000013) and the vector value of the package association embedding vector (formula image FDA0002460564570000014), calculating the association embedding relationship cost according to the formula of image FDA0002460564570000015,
wherein i denotes the i-th person, j denotes the j-th package, formula image FDA0002460564570000016 denotes the association embedding distance, T1 and T2 are preset thresholds, H is a preset constant, and f_ass(·) represents a monotonically increasing mapping from the association embedding distance to the embedding loss.
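As a hedged illustration of how thresholds T1, T2 and constant H might enter such a cost, the sketch below maps a scalar embedding distance to a matching cost through a monotonically increasing ramp. The exact mapping f_ass appears in the claim only as a formula image, so this particular ramp is an assumption.

```python
def association_cost(e_person, e_package, t1=0.2, t2=1.0, h=1e6):
    """Map the association-embedding distance to a matching cost.
    t1/t2 mirror the claim's thresholds T1, T2; h mirrors the constant H.
    The specific monotone mapping below is illustrative only."""
    d = abs(e_person - e_package)        # association embedding distance
    if d <= t1:
        return 0.0                       # near-identical embeddings: free match
    if d >= t2:
        return h                         # distant embeddings: prohibitive cost
    return (d - t1) / (t2 - t1)          # monotone increase in between
```

A pair with nearly equal embeddings thus costs 0, while a pair whose distance exceeds t2 is effectively forbidden by the large constant.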
3. The method according to claim 1, wherein the calculating a priori cost from the human location box and the package location box comprises:
based on the relative relationship between the person position box and the package position box, calculating the prior cost according to the formula of image FDA0002460564570000017,
wherein i denotes the i-th person, j denotes the j-th package, Θ denotes the relative relationship between the person position box and the package position box, and f_prior(·) represents a mapping from prior knowledge to the prior loss.
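The claim leaves f_prior and the relative relationship Θ abstract (the formula survives only as an image). One plausible stand-in prior, sketched below under that assumption, is a normalized distance between the two box centers; the real mapping may differ.

```python
def prior_cost(person_box, package_box, norm=1000.0):
    """Illustrative prior: normalized center distance between the person
    position box and the package position box, each given as (x1, y1, x2, y2).
    The actual f_prior mapping of the claim is not disclosed in this text."""
    px = (person_box[0] + person_box[2]) / 2   # person box center x
    py = (person_box[1] + person_box[3]) / 2   # person box center y
    bx = (package_box[0] + package_box[2]) / 2
    by = (package_box[1] + package_box[3]) / 2
    return ((px - bx) ** 2 + (py - by) ** 2) ** 0.5 / norm
```

Identical boxes yield a prior cost of 0; boxes far apart in the image yield a proportionally larger cost.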
4. The method according to claim 1, wherein the constructing a person-package relationship correspondence cost matrix based on the association embedding relationship cost and the prior cost, and determining the person-package correspondence based on the person-package relationship correspondence cost matrix comprises:
calculating the person-package relationship correspondence cost according to the formula of image FDA0002460564570000021,
wherein λ ∈ [0, 1], i denotes the i-th person, j denotes the j-th package, formula image FDA0002460564570000022 denotes the association embedding relationship cost, and formula image FDA0002460564570000023 denotes the prior cost;
constructing the person-package relationship correspondence cost matrix based on the person-package relationship correspondence costs;
and determining the person-package correspondence based on the person-package relationship correspondence cost matrix.
5. The method according to claim 4, wherein the determining a person-to-package correspondence based on the person-to-package correspondence cost matrix comprises:
and solving the person-package relationship correspondence cost matrix by an assignment problem algorithm, so as to determine the person-package correspondence.
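A standard solver for such an assignment problem is the Hungarian method, which SciPy exposes as `scipy.optimize.linear_sum_assignment`. The tiny 2-person by 3-package cost matrix below is made up purely for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# hypothetical 2-person x 3-package correspondence cost matrix
cost = np.array([[0.1, 5.0, 3.0],
                 [4.0, 0.2, 6.0]])
rows, cols = linear_sum_assignment(cost)          # minimizes total cost
pairs = list(zip(rows.tolist(), cols.tolist()))   # person i matched to package j
```

Here the minimal-cost assignment matches person 0 with package 0 and person 1 with package 1; package 2 is left unmatched, which handles the case of more packages than people.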
6. The people-package relationship detection method according to any one of claims 1 to 5, wherein before extracting the people position box, the package position box, the people association embedding vector and the package association embedding vector in the image to be analyzed through the people-package relationship detection network, the method further comprises:
establishing a human packet relation detection network based on a neural network structure;
training the human-bag relationship detection network by using a training sample image until a loss function of the human-bag relationship detection network in a training process meets a training requirement, wherein the training sample image is marked with a human position frame, a bag position frame and a human-bag corresponding relationship.
7. The person-package relationship detection method of claim 6, wherein the person-package relationship detection network comprises a backbone network, a position regression branch, a classification branch, and an association embedding vector branch;
the backbone network is used for outputting a feature map to the position regression branch, the classification branch and the association embedded vector branch;
the classification branch outputs a feature classification based on the feature map, the feature classification including persons and bags;
the position regression branch outputs a position frame based on the feature map, and determines the type of the position frame according to the output result of the classification branch, wherein the type of the position frame comprises a human position frame and a bag position frame;
and the association embedding vector branch outputs association embedding vectors based on the feature map, and determines the types of the association embedding vectors according to the output result of the classification branch, wherein the types of the association embedding vectors comprise human association embedding vectors and packet association embedding vectors.
8. The people-package relationship detection method of claim 6, wherein the loss function comprises a regression loss L_reg, a classification loss L_cls, and an association embedding loss L_ass;
the regression loss is calculated through a smooth-L1-Loss function, an IOU-Loss function, or a GIoU-Loss function;
the classification loss is calculated through a cross-entropy-loss function;
the associated embedding loss is calculated by the following formula:
the pull term L_pull (formula image FDA0002460564570000031) and the push term L_push (formula image FDA0002460564570000032), with
L_ass = μ·L_pull + ν·L_push,
wherein, in the formulas (see also image FDA0002460564570000033), S represents the total number of packages and persons in the current affiliation, N is the number of affiliations in which a package exists in the current image, N' is the number of all affiliations, and Δ2 is a preset distance threshold; formula image FDA0002460564570000034 represents the association embedding vector values of the persons and packages in the current affiliation; and μ and ν are weighting coefficients;
the loss function is calculated by the following formula:
L = α·L_cls + β·L_reg + η·L_ass,
where α, β, and η represent the loss weights.
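The pull/push structure named in the claim (L_pull draws the embeddings of one person-package affiliation together, L_push separates the means of different affiliations) can be sketched in scalar form. The normalizations and margin handling below are assumptions, since the claim's formulas survive only as images.

```python
from itertools import combinations

def associative_embedding_loss(groups, delta=1.0, mu=1.0, nu=1.0):
    """groups: list of affiliations; each inner list holds the scalar
    embedding values of the persons and packages in one affiliation."""
    means = [sum(g) / len(g) for g in groups]
    # pull: mean squared distance of each member to its affiliation mean
    pull = sum((e - m) ** 2 for g, m in zip(groups, means) for e in g)
    pull /= sum(len(g) for g in groups)
    # push: hinge penalty when two affiliation means are closer than delta
    pairs = list(combinations(means, 2))
    push = sum(max(0.0, delta - abs(a - b)) ** 2 for a, b in pairs)
    if pairs:
        push /= len(pairs)
    return mu * pull + nu * push
```

Two tight, well-separated affiliations incur zero loss, while a single spread-out affiliation is penalized by the pull term alone.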
9. A person-package relationship detection apparatus, characterized by comprising a detection network extraction module, an association embedding cost calculation module, a prior cost calculation module, and a correspondence determination module, wherein:
the detection network extraction module is used for extracting a person position frame, a package position frame, a person association embedding vector, and a package association embedding vector in an image to be analyzed through a person-package relationship detection network, wherein the person position frames correspond one-to-one to the person association embedding vectors, and the package position frames correspond one-to-one to the package association embedding vectors;
the association embedding cost calculation module is used for calculating an association embedding relationship cost according to the person association embedding vector and the package association embedding vector;
the prior cost calculation module is used for calculating a prior cost according to the person position frame and the package position frame;
and the correspondence determination module is used for constructing a person-package relationship correspondence cost matrix based on the association embedding relationship cost and the prior cost, and determining the person-package correspondence based on the person-package relationship correspondence cost matrix.
10. A computer device, comprising: a memory and one or more processors;
the memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the person-package relationship detection method of any one of claims 1-8.
11. A storage medium containing computer-executable instructions for performing the person-package relationship detection method of any one of claims 1-8 when executed by a computer processor.
CN202010318852.0A 2020-04-21 2020-04-21 Method, device, equipment and storage medium for detecting personal bag relationship Active CN111553228B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010318852.0A CN111553228B (en) 2020-04-21 2020-04-21 Method, device, equipment and storage medium for detecting personal bag relationship

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010318852.0A CN111553228B (en) 2020-04-21 2020-04-21 Method, device, equipment and storage medium for detecting personal bag relationship

Publications (2)

Publication Number Publication Date
CN111553228A true CN111553228A (en) 2020-08-18
CN111553228B CN111553228B (en) 2021-10-01

Family

ID=72005822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010318852.0A Active CN111553228B (en) 2020-04-21 2020-04-21 Method, device, equipment and storage medium for detecting personal bag relationship

Country Status (1)

Country Link
CN (1) CN111553228B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998575A (en) * 2022-06-29 2022-09-02 支付宝(杭州)信息技术有限公司 Method and apparatus for training and using target detection models

Citations (7)

Publication number Priority date Publication date Assignee Title
CN105303549A (en) * 2015-06-29 2016-02-03 北京格灵深瞳信息技术有限公司 Method of determining position relation between detected objects in video image and device
CN105740891A (en) * 2016-01-27 2016-07-06 北京工业大学 Target detection method based on multilevel characteristic extraction and context model
CN107316317A (en) * 2017-05-23 2017-11-03 深圳市深网视界科技有限公司 A kind of pedestrian's multi-object tracking method and device
CN107392254A (en) * 2017-07-28 2017-11-24 深圳市唯特视科技有限公司 A kind of semantic segmentation method by combining the embedded structural map picture from pixel
CN109740573A (en) * 2019-01-24 2019-05-10 北京旷视科技有限公司 Video analysis method, apparatus, equipment and server
CN110188603A (en) * 2019-04-17 2019-08-30 特斯联(北京)科技有限公司 A kind of privacy divulgence prevention method and its system for intelligence community
CN110245564A (en) * 2019-05-14 2019-09-17 平安科技(深圳)有限公司 A kind of pedestrian detection method, system and terminal device


Non-Patent Citations (3)

Title
CEWU LU ET AL.: "Visual Relationship Detection with Language Priors", arXiv:1608.00187v1 *
HAN HU ET AL.: "Relation Networks for Object Detection", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
MA ZENGYAN: "Object Detection Based on Convolutional Neural Network and Context Model", China Security & Protection Technology and Application *


Also Published As

Publication number Publication date
CN111553228B (en) 2021-10-01

Similar Documents

Publication Publication Date Title
CN103098076B (en) Gesture recognition system for TV control
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
US20120189207A1 (en) Identifying descriptor for person and object in an image (as amended)
CN103020985B (en) A kind of video image conspicuousness detection method based on field-quantity analysis
CN111985621A (en) Method for building neural network model for real-time detection of mask wearing and implementation system
CN113822153B (en) Unmanned aerial vehicle tracking method based on improved DeepSORT algorithm
CN109993061B (en) Face detection and recognition method, system and terminal equipment
CN109902576B (en) Training method and application of head and shoulder image classifier
CN112257799A (en) Method, system and device for detecting household garbage target
CN111274964A (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
KR20220098312A (en) Method, apparatus, device and recording medium for detecting related objects in an image
CN111553228B (en) Method, device, equipment and storage medium for detecting personal bag relationship
CN114298187B (en) Target detection method integrating improved attention mechanism
CN111553337A (en) Hyperspectral multi-target detection method based on improved anchor frame
CN110070044A (en) Pedestrian's attribute recognition approach based on deep learning
CN117152838A (en) Gesture recognition method based on multi-core dynamic attention mechanism
CN116524314A (en) Unmanned aerial vehicle small target detection method based on anchor-free frame algorithm
CN111104921A (en) Multi-mode pedestrian detection model and method based on Faster rcnn
Jourdheuil et al. Heterogeneous adaboost with real-time constraints-application to the detection of pedestrians by stereovision
CN116311345A (en) Transformer-based pedestrian shielding re-recognition method
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
CN112149598A (en) Side face evaluation method and device, electronic equipment and storage medium
Ninomiya et al. An evaluation on robustness and brittleness of HOG features of human detection
CN111582107A (en) Training method and recognition method of target re-recognition model, electronic equipment and device
CN114155475B (en) Method, device and medium for identifying end-to-end personnel actions under view angle of unmanned aerial vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 306, zone 2, building 1, Fanshan entrepreneurship center, Panyu energy saving technology park, No. 832 Yingbin Road, Donghuan street, Panyu District, Guangzhou City, Guangdong Province

Applicant after: Jiadu Technology Group Co.,Ltd.

Applicant after: Guangzhou Jiadu Technology Software Development Co.,Ltd.

Applicant after: GUANGZHOU XINKE JIADU TECHNOLOGY Co.,Ltd.

Applicant after: Guangdong Huazhiyuan Information Engineering Co.,Ltd.

Address before: Room 306, zone 2, building 1, Fanshan entrepreneurship center, Panyu energy saving technology park, No. 832 Yingbin Road, Donghuan street, Panyu District, Guangzhou City, Guangdong Province

Applicant before: PCI-SUNTEKTECH Co.,Ltd.

Applicant before: Guangzhou Jiadu Technology Software Development Co.,Ltd.

Applicant before: GUANGZHOU XINKE JIADU TECHNOLOGY Co.,Ltd.

Applicant before: Guangdong Huazhiyuan Information Engineering Co.,Ltd.

GR01 Patent grant