CN111008558A - Picture/video important person detection method combining deep learning and relational modeling - Google Patents

Picture/video important person detection method combining deep learning and relational modeling

Info

Publication number
CN111008558A
Authority
CN
China
Prior art keywords
picture
importance
relation
relationship
important
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911042034.6A
Other languages
Chinese (zh)
Other versions
CN111008558B (en)
Inventor
郑伟诗 (Wei-Shi Zheng)
洪发挺 (Fa-Ting Hong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat-sen University
2019-10-30 Priority to CN201911042034.6A
2020-04-14 Publication of CN111008558A
2023-05-30 Application granted
2023-05-30 Publication of CN111008558B
Legal status: Active

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a picture/video important person detection method combining deep learning and relational modeling, which comprises the following steps: S1, extracting the appearance information and the geometric information of each person in the picture/video and fusing them into a personal feature representing high-level semantics; S2, by mining the relations between people and between people and the scene, computing relation features that the personal features alone cannot express, or cannot highly express; and S3, performing importance classification: performing an important/unimportant binary classification on the final feature expression of each person extracted from the relation computation model, and taking the probability of each person being classified into the important category as the importance score, where the person with the highest score is the important person identified by the relation computation model. By the method, the relationships between the people in a picture/video and between the people and the events in the picture can be constructed autonomously through learning, and the importance degree of each person can be inferred automatically.

Description

Picture/video important person detection method combining deep learning and relational modeling
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a picture/video important person detection method combining deep learning and relational modeling.
Background
Picture/video important person detection means that, given a picture containing multiple people, the important person in the picture/video is identified according to the people's clothing, actions, positions, interaction information and the scene they are in. The technology can assist scene understanding and the development of industries such as text live broadcasting, film and television shooting and security monitoring. For example, in text live broadcasting, what happens in a scene can be judged from the behavior of the video's central character, and a text description can be generated directly. In live sports broadcasting, the method can detect important people in a sports scene, such as the ball handler in a basketball or football match, who can then be tracked by a camera, reducing labor costs. In security work, important person detection in video surveillance can monitor the important protected objects in a scene, analyze persons with abnormally high importance scores, and carry out an appropriate prevention and control plan.
The existing picture/video important person detection methods mainly comprise the following three types:
1) Ranking based on pedestrian pairs: to automatically detect the important pedestrian in a picture, the most direct way is to form a pedestrian pair from every two pedestrians in the picture and predict the relative importance of the pair. It has therefore been proposed in the prior art to use a regression model to infer the importance relationship between two different people in a picture, and to infer the most important face in the picture from such pairwise importance relationships.
2) Ranking based on a perceptron: the most important people in a picture or video play a large role in the recognition and detection of events in the video. It has been proposed in the prior art to extract action features and appearance features of the different players in a basketball game and to compute each player's importance with a perceptron, thereby improving the accuracy of event recognition and detection in basketball games.
3) Ranking based on a multilayer hybrid relation graph: judging whether a person is the most important person in a scene does not depend only on that person's appearance and action information; more important is the interaction information between people. Therefore, the prior art constructs a hybrid relation graph over the pedestrians detected in a picture, using different features to model the relations between the pedestrians, and improves the well-known ranking algorithm PageRank to rank the pedestrians' importance on the multilayer hybrid relation graph, finally detecting the most important pedestrian in the picture.
However, the above existing important person detection methods have many disadvantages. The technology based on ranking pedestrian pairs extracts spatial features and salient features of the pedestrians' faces and ranks pedestrian pairs in order to rank the pedestrians' importance; when doing so, it ignores the importance of the other people and the influence of the relations among pedestrians on importance, and it also ignores the role of context information, motion information, appearance information and attention information in important pedestrian detection. The technology based on perceptron ranking uses a perceptron to compute each pedestrian's importance directly from that pedestrian's own features, which ignores the effect of the relations between pedestrians on the importance analysis, as well as the role of spatial information and attention information. The technology based on the multilayer hybrid relation graph adopts features pre-trained on other tasks, which cannot be well expressed at a high semantic level, and it only considers the relations between people while ignoring the relations between people and the scene.
Disclosure of Invention
The invention mainly aims to overcome the defects of the prior art and provide a picture/video important person detection method combining deep learning and relational modeling, which can autonomously construct the relationships between the people in a picture/video and between the people and the events in the picture through learning, and automatically infer the importance degree of each person.
In order to achieve the purpose, the invention adopts the following technical scheme:
The invention discloses a picture/video important person detection method combining deep learning and relational modeling, which comprises the following steps:
S1, extracting the appearance information and the geometric information of each person in the picture/video, fusing them into a personal feature representing high-level semantics, and extracting the information of the whole picture/video as a global feature;
S2, computing, by mining the relations between people and between people and the scene, relation features that the personal features alone cannot express, or cannot highly express, and fusing the relation feature rfeat with the personal feature pfeat to generate an importance feature that highly expresses the importance of the individual in the scene, wherein the relation feature rfeat contains the information of the relations between people and between people and the scene;
and S3, performing importance classification: performing an important/unimportant binary classification on the final feature expression of each person extracted from the relation computation model, and taking the probability of each person being classified into the important category as the importance score, wherein the person with the highest score is the important person identified by the relation computation model.
As a preferred technical solution, step S1 specifically includes:
inputting the picture into a face detector or a pedestrian detector to extract the detection boxes of the faces or pedestrians in the picture:

$\{p_i = [x_{p_i}, y_{p_i}, w_{p_i}, h_{p_i}]\}_{i=1}^{n}$

wherein $[x_{p_i}, y_{p_i}, w_{p_i}, h_{p_i}]$ is the detection box of $p_i$, $[x_{p_i}, y_{p_i}]$ is the position of pedestrian $p_i$ in the picture, and $[w_{p_i}, h_{p_i}]$ are the width and height of the box detected for $p_i$ in the picture.
As a preferable embodiment, in step S1, the personal information of the person is characterized by the following method:
based on a pedestrian detection frame, personal information pfeat is automatically extracted from bottom to top by using a convolutional neural network, and meanwhile, in order to research the relationship between people and scenes, the global feature gfeat of the whole picture is also extracted:
Figure BDA0002253112630000041
wherein ,
Figure BDA0002253112630000042
which represents the global feature gfeat,
Figure BDA0002253112630000043
representing personal characteristics pfeat, foA representative feature extraction module, I represents the whole picture information, piRepresenting personal information, [ theta ]oAre parameters of the feature extraction module.
As a preferred technical solution, in step S1, the method for fusing the personal features representing high-level semantics includes:
in the feature space, a plurality of features are concatenated and then convolved together, thereby generating a personal feature with high-level semantics.
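To make this fusion step concrete, the following is a minimal PyTorch sketch (the feature shapes, channel counts and class names are illustrative assumptions, not the patent's exact implementation): several per-person feature maps are concatenated along the channel dimension and convolved together into one personal feature vector with high-level semantics.

    # Minimal sketch (assumed shapes and names): fuse several per-person
    # feature maps by concatenation along channels, then convolve them
    # together into a single high-level-semantics personal feature.
    import torch
    import torch.nn as nn

    class FeatureFusion(nn.Module):
        def __init__(self, in_channels=3 * 256, out_dim=512):
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Conv2d(in_channels, out_dim, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),  # collapse spatial dims to a vector
            )

        def forward(self, interior, exterior, location):
            # each input: (batch, 256, H, W); concatenate along the channel dim
            x = torch.cat([interior, exterior, location], dim=1)
            return self.fuse(x).flatten(1)  # (batch, out_dim)

    if __name__ == "__main__":
        fusion = FeatureFusion()
        feats = [torch.randn(4, 256, 14, 14) for _ in range(3)]
        print(fusion(*feats).shape)  # torch.Size([4, 512])

Concatenation keeps each information source intact, while the shared convolution learns how to weigh and mix them.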
Preferably, in step S2,
the relations between people and between people and the scene are modeled, which specifically comprises the following steps:
S21, computing the relation between people: the two personal features are each projected by a matrix, the projections are added, and the sum is projected by another matrix to obtain a scalar representing the strength of the connection between the two people; finally, a truncation operation forcibly sets values less than 0 to 0;
S22, computing the relation between a person and the scene: the personal feature and the scene feature are added and then projected by a matrix to obtain a scalar representing the strength of the connection between the person and the scene; finally, a truncation operation forcibly sets values less than 0 to 0;
S23, fusing the relations obtained in steps S21 and S22 to obtain the pairwise importance relation: the two scalars are multiplied, so that if either value is small, the result is small;
S24, obtaining from step S23 an n×n matrix, wherein the ith row shows the relations of all persons to the ith person and is used to integrate the importance relations of all persons to the ith person;
and S25, computing the relation feature corresponding to each person.
As a preferred technical solution, the relation feature rfeat corresponding to each person is computed by the following formulas:
a) computing the relation between people:

$\varepsilon^{p2p}_{ij} = \max\big(0,\; W^{p2p}(W_a f^{p}_i + W_b f^{p}_j)\big)$

b) computing the relation between a person and the scene:

$\varepsilon^{p2e}_{i} = \max\big(0,\; W^{p2e}(f^{p}_i + f^{g})\big)$

c) fusing the multiple relations:

$\varepsilon_{ij} = \varepsilon^{p2p}_{ij} \cdot \varepsilon^{p2e}_{j}$

d) computing the importance relation, integrating the relations of all persons to the ith person by row-wise normalization:

$\alpha_{ij} = \varepsilon_{ij} \big/ \sum_{k=1}^{n} \varepsilon_{ik}$

e) computing the relation feature rfeat:

$f^{r}_i = \sum_{j=1}^{n} \alpha_{ij}\, W_v f^{p}_j$

f) constructing the importance feature ifeat for importance evaluation:

$f^{I}_i = \mathrm{Concat}\big(f^{p}_i,\; f^{r,1}_i, \ldots, f^{r,r}_i\big)$

All $W$ in the above formulas are matrices, every $f$ is a feature vector, and each relation $\varepsilon$ is a scalar; the superscripts $1, \ldots, r$ in f) indicate that there are $r$ relation computation modules, since the modules can be stacked, and Concat denotes the splicing operation. The whole relation computation module is modeled as:

$f^{I}_i = f_r\big(\{f^{p}_j\}_{j=1}^{n}, f^{g};\; \theta_r\big)$
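As an illustration, the following PyTorch sketch mirrors formulas a) through e) under stated assumptions: the layer names (Wa, Wb, Wp2p, Wp2e, Wv), the feature dimension and the simple row normalization used for d) are assumptions for illustration rather than the patent's exact implementation.

    # Sketch of one relation sub-module following formulas a)-e) above.
    # Assumptions: n persons with d-dimensional features; values below 0
    # are truncated with ReLU; d) normalizes each row of the n x n matrix.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RelationSubModule(nn.Module):
        def __init__(self, d=512):
            super().__init__()
            self.Wa = nn.Linear(d, d, bias=False)    # projects person i
            self.Wb = nn.Linear(d, d, bias=False)    # projects person j
            self.Wp2p = nn.Linear(d, 1, bias=False)  # person-person edge scalar
            self.Wp2e = nn.Linear(d, 1, bias=False)  # person-event edge scalar
            self.Wv = nn.Linear(d, d, bias=False)    # value projection for e)

        def forward(self, fp, fg):
            # fp: (n, d) personal features; fg: (d,) global scene feature
            # a) project, add pairwise, project to a scalar, truncate at 0
            pair = self.Wa(fp).unsqueeze(1) + self.Wb(fp).unsqueeze(0)  # (n, n, d)
            eps_p2p = F.relu(self.Wp2p(pair)).squeeze(-1)               # (n, n)
            # b) add person and scene features, project to a scalar, truncate
            eps_p2e = F.relu(self.Wp2e(fp + fg)).squeeze(-1)            # (n,)
            # c) fuse by multiplication: a weak edge in either graph stays weak
            eps = eps_p2p * eps_p2e.unsqueeze(0)                        # (n, n)
            # d) integrate the relations of all persons to each person (row i)
            alpha = eps / (eps.sum(dim=1, keepdim=True) + 1e-8)         # (n, n)
            # e) relation feature rfeat: relation-weighted sum of projections
            return alpha @ self.Wv(fp)                                  # (n, d)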
as a preferred technical solution, in step S3, the relationship module specifically includes:
based on
Figure BDA0002253112630000057
And different characteristics and relation functions for constructing a relation graph of human and scene
Figure BDA0002253112630000058
And interpersonal relationship diagram
Figure BDA0002253112630000059
Through the relationship graphs, the relationship characteristics rfeat between persons and scenes are calculated, and then the relationship characteristics rfeat are added and fused with the original personal characteristics pfeat into the importance characteristics ifeat which the importance scores can be judged.
Preferably, in step S3,
the acquired importance feature ifeat is input into a neural network consisting of fully connected layers for classification, and the classification score of the important-person class is taken as the importance score, computed as:

$s_i = f_s(f^{I}_i;\, \theta_s)$

wherein $f_s$ is the importance classification module and $\theta_s$ the corresponding parameters; up to this point, the whole network framework can be formulated as:

$s_i = f_s\big(f_r(f_o(I, p_i;\, \theta_o);\, \theta_r);\, \theta_s\big)$
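A minimal sketch of this classification head, assuming a two-way fully connected network whose softmax probability for the important class serves as the importance score (the layer sizes are assumptions):

    # Sketch: two-class fully connected head; the softmax probability of
    # the "important" class is each person's importance score.
    import torch
    import torch.nn as nn

    class ImportanceClassifier(nn.Module):
        def __init__(self, d_ifeat=1536, hidden=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(d_ifeat, hidden), nn.ReLU(inplace=True),
                nn.Linear(hidden, 2),  # logits: [not important, important]
            )

        def forward(self, ifeat):                 # ifeat: (n, d_ifeat)
            return self.mlp(ifeat).softmax(dim=-1)[:, 1]

    if __name__ == "__main__":
        scores = ImportanceClassifier()(torch.randn(5, 1536))
        print(scores, scores.argmax().item())  # highest score: important person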
compared with the prior art, the invention has the following advantages and beneficial effects:
1. By learning the parameters of the framework autonomously through a deep learning algorithm, a better parameter set can be selected.
2. The relation computation module can automatically learn the relation graphs between people and between people and the scene, adaptively encode the relation features, and understand the relations in the scene from a higher level.
3. The invention requires little additional manual annotation: it needs neither pedestrian pose annotations nor computation of each person's sharpness in the picture. Given pedestrians detected by a detector, fast training only requires annotating which pedestrian is the important one, which previous research does not offer.
4. The relation computation module of the invention is embeddable and stackable (iterable).
Drawings
FIG. 1 is a flow diagram of the present invention;
FIG. 2 is a feature representation diagram of the present invention;
FIG. 3 shows the relation graphs constructed by the present invention;
FIGS. 4(a) and 4(b) show important pedestrian detection results of the present invention;
FIG. 5 is a schematic diagram of the relation module of the model of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
As shown in FIG. 1, the method of the present invention is POINT (deeP impOrtance relatIon NeTwork). First, all pedestrians in the picture are detected with a face detector; then (a) a feature expression module extracts personal features and global features. These features are input into (b) a relation computation module, which consists of r sub-relation modules; in each sub-relation module, a person-to-person (p2p) relation graph and a person-to-event (p2e) relation graph are constructed, the importance relation is estimated from the two graphs, and the relation features are encoded and spliced with the original personal feature pfeat to obtain the importance feature. Finally, the importance features are input into (c) an importance classification module, which scores the importance of each person.
Specifically, the method for detecting important persons in pictures/videos by combining deep learning and relational modeling includes the following steps:
(1) detecting pedestrians and extracting important characteristics of the pedestrians in the picture;
given a value of oneFirstly, inputting the image into a face detector or a pedestrian detector to extract a detection frame of the face or the pedestrian in the image
Figure BDA0002253112630000071
wherein [xpi,ypi,wpi,hpi]Is piDetection frame, wherein [ x ]pi,ypi]Is a pedestrian piPosition in the picture, [ w ]pi,hpi]Is piThe width and height of the frame detected in the picture. In particular, the individual face or body information is not sufficient to characterize the overall information of the person, e.g., the position geometry information. The present application therefore references contextual information and location of a persona in order to better characterize the persona's personal information. Based on the pedestrian detection box, the embodiment utilizes the convolutional neural network to automatically extract personal information pfeat from bottom to top, and simultaneously, in order to research the relationship between people and scenes, the global feature gfeat of the whole picture is also extracted:
Figure BDA0002253112630000081
wherein ,
Figure BDA0002253112630000082
which represents the global feature gfeat,
Figure BDA0002253112630000083
representing personal characteristics pfeat, foA representative feature extraction module, I represents the whole picture information, piRepresenting personal information, [ theta ]oAre parameters of the feature extraction module. The specific feature extraction operation flow is shown in fig. 2, wherein the appearance information is divided into an internal part and an external part, the internal area extracts more appearance information inherent to the portrait, and the external area extracts more context information of the portrait appearance and the surrounding environment, so that diversification of the portrait information is ensured. At the same time, the map represented by a 01 value represents all the position information of the person, and the global scene information of the whole photoFeature extraction is achieved by a convolutional neural network.
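To make the 0/1 location map concrete, the following small sketch (the map resolution and function name are illustrative assumptions) rasterizes a detection box [x, y, w, h] into a binary mask that can be fed to the convolutional network alongside the appearance crops:

    # Sketch: rasterize a detection box [x, y, w, h] into a 0/1 location
    # map encoding the person's position for the feature extractor.
    import numpy as np

    def location_map(box, img_w, img_h, out=64):
        x, y, w, h = box
        m = np.zeros((out, out), dtype=np.float32)
        # scale box coordinates from image space to map resolution
        x0, y0 = int(x / img_w * out), int(y / img_h * out)
        x1, y1 = int((x + w) / img_w * out), int((y + h) / img_h * out)
        m[max(y0, 0):min(y1, out), max(x0, 0):min(x1, out)] = 1.0
        return m

    if __name__ == "__main__":
        m = location_map([320, 180, 128, 256], img_w=1280, img_h=720)
        print(m.shape, int(m.sum()))  # (64, 64) and the box area in map cells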
(2) Relation computation:
Relying on the pfeat and gfeat obtained in the previous step, an embeddable and stackable relation computation module is designed to model the relations between people and between people and the scene, and to compute the relation feature rfeat corresponding to each person:
S21, computing the relation between people: the two personal features are each projected by a matrix, the projections are added, and the sum is projected by another matrix to obtain a scalar representing the strength of the connection between the two people; finally, a truncation operation forcibly sets values less than 0 to 0;
S22, computing the relation between a person and the scene: the personal feature and the scene feature are added and then projected by a matrix to obtain a scalar representing the strength of the connection between the person and the scene; finally, a truncation operation forcibly sets values less than 0 to 0;
S23, fusing the relations obtained in steps S21 and S22 to obtain the pairwise importance relation: the two scalars are multiplied, so that if either value is small, the result is small;
S24, obtaining from step S23 an n×n matrix, wherein the ith row shows the relations of all persons to the ith person and is used to integrate the importance relations of all persons to the ith person;
and S25, computing the relation feature corresponding to each person.
The calculation process is as follows:
a) First, the relation between people is computed:

$\varepsilon^{p2p}_{ij} = \max\big(0,\; W^{p2p}(W_a f^{p}_i + W_b f^{p}_j)\big)$

b) the relation between a person and the scene is computed:

$\varepsilon^{p2e}_{i} = \max\big(0,\; W^{p2e}(f^{p}_i + f^{g})\big)$

c) the multiple relations are fused:

$\varepsilon_{ij} = \varepsilon^{p2p}_{ij} \cdot \varepsilon^{p2e}_{j}$

d) the importance relation is computed:

$\alpha_{ij} = \varepsilon_{ij} \big/ \sum_{k=1}^{n} \varepsilon_{ik}$

e) the relation feature rfeat is computed:

$f^{r}_i = \sum_{j=1}^{n} \alpha_{ij}\, W_v f^{p}_j$

f) the importance feature ifeat for importance evaluation is constructed:

$f^{I}_i = \mathrm{Concat}\big(f^{p}_i,\; f^{r,1}_i, \ldots, f^{r,r}_i\big)$

The whole process of constructing the importance feature ifeat is shown in FIG. 5, where Eq. 3 is process c) and Eq. 4 is process d). The constructed relation graphs are shown in FIG. 3: in each graph, all relations between people and all relations between people and the scene are presented as numerical values; it can be observed that non-important people point clearly toward important people, and the values pointing toward important people are clearly higher than those pointing toward others. The relation computation module computes the edge values of the different relation graphs from the features produced by the feature expression module and then fuses them into the importance relation. All $W$ in the above formulas are matrices, every $f$ is a feature vector, and each relation $\varepsilon$ is a scalar; the superscripts $1, \ldots, r$ in f) indicate that there are $r$ relation computation modules, since the modules can be stacked, and Concat denotes the splicing operation. The entire relation computation module can be modeled as:

$f^{I}_i = f_r\big(\{f^{p}_j\}_{j=1}^{n}, f^{g};\; \theta_r\big)$
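Continuing the illustrative sketch from the summary above, stacking the r relation sub-modules and splicing their outputs with the personal feature pfeat into the importance feature ifeat could look like this (RelationSubModule is the assumed class sketched earlier):

    # Sketch: stack r relation sub-modules and splice their relation
    # features with the personal feature into the importance feature ifeat.
    # RelationSubModule is the illustrative class sketched earlier.
    import torch
    import torch.nn as nn

    class RelationModule(nn.Module):
        def __init__(self, d=512, r=2):
            super().__init__()
            self.subs = nn.ModuleList(RelationSubModule(d) for _ in range(r))

        def forward(self, fp, fg):
            # fp: (n, d) personal features; fg: (d,) global feature
            rfeats = [sub(fp, fg) for sub in self.subs]  # r tensors of (n, d)
            return torch.cat([fp, *rfeats], dim=1)       # ifeat: (n, (r+1)*d)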
(3) Importance classification:
Based on the personal features $\{f^{p}_i\}_{i=1}^{n}$, the global feature $f^{g}$ and different feature and relation functions, a person-event relation graph $G^{p2e} = (V, E^{p2e})$ and a person-person relation graph $G^{p2p} = (V, E^{p2p})$ are constructed. Through these relation graphs, the relation features rfeat between persons and between person and scene are computed, and the relation features rfeat are then fused with the original personal features pfeat into the importance feature ifeat, from which the importance score can be judged. The importance feature ifeat acquired in the previous step is input into a neural network consisting of fully connected layers for classification, and the score of the important-person class is taken as the person's importance score:

$s_i = f_s(f^{I}_i;\, \theta_s)$

where $f_s$ is the importance classification module and $\theta_s$ the corresponding parameters. To this end, the entire network framework can be formulated as:

$s_i = f_s\big(f_r(f_o(I, p_i;\, \theta_o);\, \theta_r);\, \theta_s\big)$
as shown in fig. 4(a) -4 (b), the important pedestrian detection result based on the technology of the present invention. FIG. 4(a) shows the results on NCAABasketbill Image Dataset, and FIG. 4(b) shows the results on Multi-scene Image Dataset. The detection result of the invention is higher than the accuracy of the best algorithm (PersonRank) at present by more than 23.2 percent (NCAA)/7 percent (MS).
All parameters of the invention are deep network parameters, optimized autonomously by stochastic gradient descent.
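A schematic training step under the illustrative modules sketched above; the cross-entropy loss over important/unimportant labels and the SGD hyperparameters are assumptions for illustration:

    # Schematic end-to-end training step: every module is an ordinary
    # differentiable layer, so the whole framework is optimized by
    # stochastic gradient descent. RelationModule and ImportanceClassifier
    # are the illustrative classes sketched earlier; the loss and the
    # hyperparameters below are assumptions for illustration.
    import torch
    import torch.nn.functional as F

    relation_net = RelationModule(d=512, r=2)
    classifier = ImportanceClassifier(d_ifeat=3 * 512)  # (r + 1) * d
    params = list(relation_net.parameters()) + list(classifier.parameters())
    optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9)

    def train_step(fp, fg, labels):
        # fp: (n, 512) personal features; fg: (512,) global feature;
        # labels: (n,) long tensor, 1 for the annotated important person
        ifeat = relation_net(fp, fg)
        logits = classifier.mlp(ifeat)       # raw class logits for the loss
        loss = F.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()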
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (8)

1. A picture/video important person detection method combining deep learning and relational modeling, characterized by comprising the following steps:
S1, extracting the appearance information and the geometric information of each person in the picture/video, fusing them into a personal feature representing high-level semantics, and extracting the information of the whole picture/video as a global feature;
S2, computing, by mining the relations between people and between people and the scene, relation features that the personal features alone cannot express, or cannot highly express, and fusing the relation feature rfeat with the personal feature pfeat to generate an importance feature that highly expresses the importance of the individual in the scene, wherein the relation feature rfeat contains the information of the relations between people and between people and the scene;
and S3, performing importance classification: performing an important/unimportant binary classification on the final feature expression of each person extracted from the relation computation model, and taking the probability of each person being classified into the important category as the importance score, wherein the person with the highest score is the important person identified by the relation computation model.
2. The method for detecting important persons in pictures/videos by combining deep learning and relational modeling as claimed in claim 1, wherein the step S1 is specifically as follows:
inputting the picture into a face detector or a pedestrian detector to extract the detection boxes of the faces or pedestrians in the picture:

$\{p_i = [x_{p_i}, y_{p_i}, w_{p_i}, h_{p_i}]\}_{i=1}^{n}$

wherein $[x_{p_i}, y_{p_i}, w_{p_i}, h_{p_i}]$ is the detection box of $p_i$, $[x_{p_i}, y_{p_i}]$ is the position of pedestrian $p_i$ in the picture, and $[w_{p_i}, h_{p_i}]$ are the width and height of the box detected for $p_i$ in the picture.
3. The picture/video important person detection method combining deep learning and relational modeling according to claim 2, wherein in step S1, the personal information of the person is characterized by the following method:
based on a pedestrian detection frame, personal information pfeat is automatically extracted from bottom to top by using a convolutional neural network, and meanwhile, in order to research the relationship between people and scenes, the global feature gfeat of the whole picture is also extracted:
Figure FDA0002253112620000012
wherein ,
Figure FDA0002253112620000013
which represents the global feature gfeat,
Figure FDA0002253112620000014
representing personal characteristics pfeat, foA representative feature extraction module, I represents the whole picture information, piRepresenting personal information, [ theta ]oAre parameters of the feature extraction module.
4. The method for detecting important persons in pictures/videos by combining deep learning and relational modeling as claimed in claim 1, wherein in step S1, the method for fusing the personal features representing high-level semantics is as follows:
in the feature space, a plurality of features are concatenated and then convolved together, thereby generating a personal feature with high-level semantics.
5. The picture/video important person detection method combining deep learning and relational modeling according to claim 1, wherein in step S2,
the relations between people and between people and the scene are modeled, which specifically comprises the following steps:
S21, computing the relation between people: the two personal features are each projected by a matrix, the projections are added, and the sum is projected by another matrix to obtain a scalar representing the strength of the connection between the two people; finally, a truncation operation forcibly sets values less than 0 to 0;
S22, computing the relation between a person and the scene: the personal feature and the scene feature are added and then projected by a matrix to obtain a scalar representing the strength of the connection between the person and the scene; finally, a truncation operation forcibly sets values less than 0 to 0;
S23, fusing the relations obtained in steps S21 and S22 to obtain the pairwise importance relation: the two scalars are multiplied, so that if either value is small, the result is small;
S24, obtaining from step S23 an n×n matrix, wherein the ith row shows the relations of all persons to the ith person and is used to integrate the importance relations of all persons to the ith person;
and S25, computing the relation feature corresponding to each person.
6. The picture/video important person detection method combining deep learning and relational modeling according to claim 5, wherein the relation feature rfeat corresponding to each person is computed by the following formulas:
a) computing the relation between people:

$\varepsilon^{p2p}_{ij} = \max\big(0,\; W^{p2p}(W_a f^{p}_i + W_b f^{p}_j)\big)$

b) computing the relation between a person and the scene:

$\varepsilon^{p2e}_{i} = \max\big(0,\; W^{p2e}(f^{p}_i + f^{g})\big)$

c) fusing the multiple relations:

$\varepsilon_{ij} = \varepsilon^{p2p}_{ij} \cdot \varepsilon^{p2e}_{j}$

d) computing the importance relation:

$\alpha_{ij} = \varepsilon_{ij} \big/ \sum_{k=1}^{n} \varepsilon_{ik}$

e) computing the relation feature rfeat:

$f^{r}_i = \sum_{j=1}^{n} \alpha_{ij}\, W_v f^{p}_j$

f) constructing the importance feature ifeat for importance evaluation:

$f^{I}_i = \mathrm{Concat}\big(f^{p}_i,\; f^{r,1}_i, \ldots, f^{r,r}_i\big)$

wherein all $W$ in the above formulas are matrices, every $f$ is a feature vector, and each $\varepsilon$ is a scalar; the superscripts $1, \ldots, r$ in f) indicate that there are $r$ relation computation modules, since the modules can be stacked, and Concat denotes the splicing operation; the whole relation computation module is modeled as:

$f^{I}_i = f_r\big(\{f^{p}_j\}_{j=1}^{n}, f^{g};\; \theta_r\big)$
7. The method for detecting important persons in pictures/videos by combining deep learning and relational modeling as claimed in claim 1, wherein in step S3, the relation module is specifically:
based on the personal features $\{f^{p}_i\}_{i=1}^{n}$, the global feature $f^{g}$ and different feature and relation functions, constructing a person-event relation graph $G^{p2e} = (V, E^{p2e})$ and a person-person relation graph $G^{p2p} = (V, E^{p2p})$; through these relation graphs, the relation features rfeat between persons and between person and scene are computed, and the relation features rfeat are then fused with the original personal features pfeat into the importance feature ifeat, from which the importance score can be judged.
8. The picture/video important person detection method combining deep learning and relational modeling according to claim 7, wherein in step S3,
the acquired importance feature ifeat is input into a neural network consisting of fully connected layers for classification, and the classification score of the important-person class is taken as the importance score, computed as:

$s_i = f_s(f^{I}_i;\, \theta_s)$

wherein $f_s$ is the importance classification module and $\theta_s$ the corresponding parameters; up to this point, the whole network framework can be formulated as:

$s_i = f_s\big(f_r(f_o(I, p_i;\, \theta_o);\, \theta_r);\, \theta_s\big)$
CN201911042034.6A 2019-10-30 2019-10-30 Picture/video important person detection method combining deep learning and relational modeling Active CN111008558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911042034.6A CN111008558B (en) 2019-10-30 2019-10-30 Picture/video important person detection method combining deep learning and relational modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911042034.6A CN111008558B (en) 2019-10-30 2019-10-30 Picture/video important person detection method combining deep learning and relational modeling

Publications (2)

Publication Number Publication Date
CN111008558A (en) 2020-04-14
CN111008558B CN111008558B (en) 2023-05-30

Family

ID=70111750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911042034.6A Active CN111008558B (en) 2019-10-30 2019-10-30 Picture/video important person detection method combining deep learning and relational modeling

Country Status (1)

Country Link
CN (1) CN111008558B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967312A (en) * 2020-07-06 2020-11-20 中央民族大学 Method and system for identifying important persons in picture

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551916A (en) * 2009-04-16 2009-10-07 浙江大学 Method and system of three-dimensional scene modeling based on ontology
CN101957835A (en) * 2010-08-16 2011-01-26 无锡市浏立方科技有限公司 Complicated relationship and context information-oriented semantic data model
CN108416314A (en) * 2018-03-16 2018-08-17 中山大学 The important method for detecting human face of picture
CN108446625A (en) * 2018-03-16 2018-08-24 中山大学 The important pedestrian detection method of picture based on graph model
CN110232330A (en) * 2019-05-23 2019-09-13 复钧智能科技(苏州)有限公司 A kind of recognition methods again of the pedestrian based on video detection

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551916A (en) * 2009-04-16 2009-10-07 浙江大学 Method and system of three-dimensional scene modeling based on ontology
CN101957835A (en) * 2010-08-16 2011-01-26 无锡市浏立方科技有限公司 Complicated relationship and context information-oriented semantic data model
CN108416314A (en) * 2018-03-16 2018-08-17 中山大学 The important method for detecting human face of picture
CN108446625A (en) * 2018-03-16 2018-08-24 中山大学 The important pedestrian detection method of picture based on graph model
CN110232330A (en) * 2019-05-23 2019-09-13 复钧智能科技(苏州)有限公司 A kind of recognition methods again of the pedestrian based on video detection

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967312A (en) * 2020-07-06 2020-11-20 中央民族大学 Method and system for identifying important persons in picture

Also Published As

Publication number Publication date
CN111008558B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN111709409B (en) Face living body detection method, device, equipment and medium
Kumar et al. Multimodal gait recognition with inertial sensor data and video using evolutionary algorithm
Wang et al. Binge watching: Scaling affordance learning from sitcoms
Ma et al. Depth-based human fall detection via shape features and improved extreme learning machine
CN110569772B (en) Method for detecting state of personnel in swimming pool
CN108154075A (en) The population analysis method learnt via single
CN111753747B (en) Violent motion detection method based on monocular camera and three-dimensional attitude estimation
WO2021203667A1 (en) Method, system and medium for identifying human behavior in a digital video using convolutional neural networks
Asif et al. Privacy preserving human fall detection using video data
CN111191667A (en) Crowd counting method for generating confrontation network based on multiple scales
CN114582030A (en) Behavior recognition method based on service robot
CN113111767A (en) Fall detection method based on deep learning 3D posture assessment
CN112200176B (en) Method and system for detecting quality of face image and computer equipment
Xu et al. Group activity recognition by using effective multiple modality relation representation with temporal-spatial attention
CN106548194A (en) The construction method and localization method of two dimensional image human joint pointses location model
Asif et al. Sshfd: Single shot human fall detection with occluded joints resilience
Hua et al. Falls prediction based on body keypoints and seq2seq architecture
CN117218709A (en) Household old man real-time state monitoring method based on time deformable attention mechanism
Taghvaei et al. Autoregressive-moving-average hidden Markov model for vision-based fall prediction—An application for walker robot
Khraief et al. Convolutional neural network based on dynamic motion and shape variations for elderly fall detection
CN116402811B (en) Fighting behavior identification method and electronic equipment
CN111008558B (en) Picture/video important person detection method combining deep learning and relational modeling
CN113158791A (en) Human-centered image description labeling method, system, terminal and medium
CN116958769A (en) Method and related device for detecting crossing behavior based on fusion characteristics
Zhan et al. Pictorial structures model based human interaction recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant