CN111008558B - Picture/video important person detection method combining deep learning and relational modeling


Info

Publication number
CN111008558B
Authority
CN
China
Prior art keywords: relation, importance, picture, features, person
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911042034.6A
Other languages
Chinese (zh)
Other versions
CN111008558A (en)
Inventor
郑伟诗 (Zheng Weishi)
洪发挺 (Hong Fating)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201911042034.6A
Publication of CN111008558A
Application granted
Publication of CN111008558B
Legal status: Active
Anticipated expiration

Classifications

    • G06V 40/10: Recognition of biometric, human-related or animal-related patterns in image or video data; human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Pattern recognition; classification techniques
    • G06N 3/02, G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/454: Extraction of image or video features; local feature extraction integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 40/168: Human faces, e.g. facial parts, sketches or expressions; feature extraction; face representation
    • G06V 40/172: Human faces, e.g. facial parts, sketches or expressions; classification, e.g. identification


Abstract

The invention discloses a picture/video important person detection method combining deep learning and relational modeling, which comprises the following steps: S1, extract features from the appearance information and geometric information of each person in the picture/video and fuse them into a personal feature representing high-level semantics; S2, compute relation features, i.e. information that individual features cannot express or cannot express well, by mining the relations among people in the scene and between people and the scene; S3, perform importance classification: the final feature representation of each person extracted by the relation computation model is classified as important or unimportant, the probability of the important class is taken as the importance score, and the person with the highest score is the important person identified by the relation computation model. Through learning, the invention can autonomously construct the relations among the people in a picture/video and between the people and the events in the picture, and automatically infer each person's degree of importance.

Description

Picture/video important person detection method combining deep learning and relational modeling
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a picture/video important person detection method combining deep learning and relational modeling.
Background
Important person detection in pictures/videos refers to identifying, in a given picture containing multiple people, the important person according to clothing, actions, positions, interaction information, and the scene the people are in. The technique supports scene understanding and industries such as live text commentary, film and television shooting, and security monitoring. For example, in live text commentary, what is happening in a scene can be judged from the behavior of the central person in the video, and a textual description can be generated directly. In live sports broadcasting, the method can detect the important person in a sports scene, such as the ball handler in a basketball or football game, who can then be tracked with a camera, reducing manual effort. In security work, important person detection in video surveillance can monitor key protection targets in a scene and analyze people with abnormally high importance scores so that appropriate prevention and control plans can be made.
Existing important person detection for pictures/videos mainly comprises the following three types:
1) Ranking based on pedestrian pairs: to detect important pedestrians in a picture automatically, the most direct approach is to form a pair from every two pedestrians in the picture and predict the relative importance within each pair. The prior art therefore proposes using a regression model to infer the importance relation between two different people in a picture, and from these pairwise importance relations the most important face in the picture is inferred.
2) Ranking based on perceptrons: the most important people in a picture or video strongly affect the recognition and detection of events in the video. The prior art extracts the action features and appearance features of the different players in a basketball game and computes the importance of each player with a perceptron, thereby improving the accuracy of event recognition and detection in basketball games.
3) Judgment based on a multi-layer hybrid relation graph: whether a person is the most important one in the scene depends more on the interaction information between people than on appearance and action information alone. The prior art therefore also proposes constructing, from different features, a hybrid relation graph over the pedestrians detected in a picture to model the relations between them, and improving the well-known ranking algorithm PageRank so that it can rank the importance of pedestrians over the multi-layer hybrid relation graph, finally detecting the most important pedestrian in the picture.
However, the above important person detection methods have many shortcomings. The technique based on pedestrian-pair ranking extracts spatial features and saliency features of pedestrian faces and ranks pedestrian pairs to order the pedestrians' importance; when doing so, it ignores the importance of the other people and the influence of the relations among pedestrians on importance, and it also ignores the effects of context information, motion information, appearance information, and attention information on important pedestrian detection. The technique based on perceptron ranking computes each pedestrian's importance directly with a perceptron from that pedestrian's own features, ignoring the effect of the relations between pedestrians on the importance analysis as well as the effects of spatial information and attention information. The features adopted in the technique based on the multi-layer hybrid relation graph are pre-trained on other tasks and cannot express high-level semantic information well; moreover, that technique considers only the relations between people, not the relations between people and the scene.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing a picture/video important person detection method combining deep learning and relational modeling which, through learning, can autonomously construct the relations among the people in a picture/video and between the people and the events in the picture, and automatically infer each person's degree of importance.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
The invention discloses a picture/video important person detection method combining deep learning and relational modeling, which comprises the following steps:
S1, extract features from the appearance information and geometric information of each person in the picture/video, fuse them into a personal feature representing high-level semantics, and extract information of the whole picture/video as a global feature;
S2, compute relation features, i.e. information that individual features cannot express or cannot express well, by mining the relations among people in the scene and between people and the scene, and fuse the relation feature rfeat with the personal feature pfeat to generate an importance feature that strongly expresses the importance of each individual in the scene, where the relation feature rfeat contains information on the relations among people and between people and the scene;
S3, perform importance classification: the final feature representation of each person extracted by the relation computation model is classified as important or unimportant, the probability of the important class is taken as the importance score, and the person with the highest score is the important person identified by the relation computation model.
As a preferred technical solution, step S1 specifically includes:
inputting the picture into a face detector or a pedestrian detector to extract the faces or pedestrians in the picture:

$$\mathcal{P} = \{p_i\}_{i=1}^{n}, \qquad p_i = [x_{p_i},\ y_{p_i},\ w_{p_i},\ h_{p_i}]$$

where $[x_{p_i}, y_{p_i}, w_{p_i}, h_{p_i}]$ is the detection box of pedestrian $p_i$: $[x_{p_i}, y_{p_i}]$ is the position of $p_i$ in the picture and $[w_{p_i}, h_{p_i}]$ are the width and height of the detection box.
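For illustration only, the detection interface assumed in this step can be sketched as follows; detect_boxes is a hypothetical placeholder for any off-the-shelf face or pedestrian detector, and only its output format matters here:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # one detection box (x, y, w, h)

def detect_boxes(image) -> List[Box]:
    """Hypothetical wrapper around an off-the-shelf face or pedestrian
    detector; returns one [x, y, w, h] box per person in the picture."""
    raise NotImplementedError("plug in a real detector here")

# boxes = detect_boxes(image)   # the set P = {p_1, ..., p_n}
```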
As a preferred technical solution, in step S1, the personal information of each person is characterized by the following method:
based on the pedestrian detection boxes, the personal feature pfeat is automatically extracted bottom-up with a convolutional neural network; meanwhile, to study the relations between people and the scene, the global feature gfeat of the whole picture is also extracted:

$$f^{g},\ \{f^{p}_{i}\}_{i=1}^{n} = f_o\big(I,\ \{p_i\}_{i=1}^{n};\ \theta_o\big)$$

where $f^{g}$ denotes the global feature gfeat, $f^{p}_{i}$ denotes the personal feature pfeat, $f_o$ denotes the feature extraction module, $I$ denotes the whole picture, $p_i$ denotes the personal information, and $\theta_o$ are the parameters of the feature extraction module.
As a preferred technical solution, in step S1, the method for fusing into a personal feature representing high-level semantics is as follows:
at the feature-space level, the multiple features are concatenated and then convolved together, thereby generating the high-level semantic personal feature.
As a preferred technical solution, in step S2,
the relations among people and between people and the scene are modeled as follows:
S21, compute the relation between every two people: project the two personal features with matrices and add them, project the sum with another matrix to obtain a value representing the connection strength between the two, and finally apply a truncation operation that forces values smaller than 0 to 0;
S22, compute the relation between each person and the scene: project the person feature and the scene feature with matrices and add them, project the sum with another matrix to obtain a value representing the connection strength between the person and the scene, and finally apply a truncation operation that forces values smaller than 0 to 0;
S23, fuse the relations obtained in steps S21 and S22 into a pairwise importance relation by multiplying the two values, so that if either value is small the result is small;
S24, collect the values from step S23 into an n×n matrix whose i-th row represents the relations of all people to the i-th person and is used to integrate the importance relations of all people to the i-th person;
S25, compute the relation feature corresponding to each person.
As a preferred technical solution, the relation feature rfeat corresponding to each person is computed by the following formulas:
a) Compute the relation between people:

$$\varepsilon^{p}_{ij} = \max\!\big(0,\ W_p(W_a f^{p}_{i} + W_b f^{p}_{j})\big)$$

b) Compute the relation between a person and the scene:

$$\varepsilon^{e}_{j} = \max\!\big(0,\ W_e(W_c f^{p}_{j} + W_d f^{g})\big)$$

c) Fuse the two relations:

$$\varepsilon_{ij} = \varepsilon^{p}_{ij}\,\varepsilon^{e}_{j}$$

d) Compute the importance relation:

$$w_{ij} = \frac{\varepsilon_{ij}}{\sum_{k=1}^{n}\varepsilon_{ik}}$$

e) Compute the relation feature rfeat:

$$f^{r}_{i} = \sum_{j=1}^{n} w_{ij}\, W_r f^{p}_{j}$$

f) Construct the importance feature ifeat used for importance judgment:

$$f^{ifeat}_{i} = \mathrm{Concat}\big(f^{p}_{i},\ f^{r,1}_{i},\ \ldots,\ f^{r,r}_{i}\big)$$
All W in the above formulas are matrices, f denotes a feature vector, and the relation ε is a scalar; the superscripts 1, …, r in f) index the r relation computation modules, which can be stacked, and Concat denotes the concatenation operation. The whole relation computation module is modeled as:

$$\{f^{ifeat}_{i}\}_{i=1}^{n} = f_r\big(f^{g},\ \{f^{p}_{i}\}_{i=1}^{n};\ \theta_r\big)$$
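A minimal PyTorch sketch of one relation computation module, following steps a) to f) above, is given below. It is an illustrative reading of the formulas rather than the patented implementation itself: the feature dimension is an assumed value, and the projections W_a, W_b, W_p, W_c, W_d, W_e, W_r are realized as linear layers.

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """One sub-relation module: pairwise person-person relations,
    person-scene relations, their fusion, row normalisation, and
    the resulting per-person relation feature rfeat."""
    def __init__(self, dim=512):
        super().__init__()
        self.Wa = nn.Linear(dim, dim)  # projects person i          (step a)
        self.Wb = nn.Linear(dim, dim)  # projects person j          (step a)
        self.Wp = nn.Linear(dim, 1)    # person-person strength     (step a)
        self.Wc = nn.Linear(dim, dim)  # projects person j          (step b)
        self.Wd = nn.Linear(dim, dim)  # projects the scene feature (step b)
        self.We = nn.Linear(dim, 1)    # person-scene strength      (step b)
        self.Wr = nn.Linear(dim, dim)  # projects features mixed in (step e)

    def forward(self, pfeat, gfeat):
        # pfeat: (n, dim) personal features; gfeat: (dim,) global feature
        n = pfeat.size(0)
        # a) pairwise relation, truncated at zero
        pair = self.Wa(pfeat).unsqueeze(1) + self.Wb(pfeat).unsqueeze(0)
        eps_p = torch.relu(self.Wp(pair)).squeeze(-1)                 # (n, n)
        # b) person-scene relation, truncated at zero
        eps_e = torch.relu(self.We(self.Wc(pfeat) + self.Wd(gfeat)))  # (n, 1)
        # c) fusion by multiplication: small if either factor is small
        eps = eps_p * eps_e.view(1, n)                                # (n, n)
        # d) importance relation: normalise each row i over all j
        w = eps / eps.sum(dim=1, keepdim=True).clamp_min(1e-8)
        # e) relation feature rfeat as a weighted sum of projected features
        return w @ self.Wr(pfeat)                                     # (n, dim)

# f) stack r such modules and concatenate their outputs with pfeat:
# ifeat = torch.cat([pfeat] + [m(pfeat, gfeat) for m in modules], dim=1)
```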
as a preferred technical solution, in step S3, the relationship module specifically includes:
based on
Figure BDA0002253112630000057
And different features and relation functions, constructing a person-scene relation diagram +.>
Figure BDA0002253112630000058
And person-to-person relationship diagram->
Figure BDA0002253112630000059
Through the relation diagrams, the relation characteristic rfeats between people and scenes are calculated, and then the relation characteristic rfeats are fused with the original personal characteristic pfeats to judge the importance characteristic if eats of the importance score.
As a preferred technical solution, in step S3,
the obtained importance features ifeat are input into a neural network composed of fully connected layers for classification, and the score of the important-person class is taken as the importance score; the score computation can be written as:

$$s_i = f_s\big(f^{ifeat}_{i};\ \theta_s\big)$$

where $f_s$ is the importance classification module and $\theta_s$ its corresponding parameters. The whole network framework can be formulated as:

$$\{s_i\}_{i=1}^{n} = f_s\Big(f_r\big(f_o(I,\ \{p_i\}_{i=1}^{n};\ \theta_o);\ \theta_r\big);\ \theta_s\Big)$$
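As a minimal sketch of this classification head (the layer widths ifeat_dim and hidden are illustrative assumptions), a small fully connected network produces two-way logits, and the softmax probability of the important class serves as the score $s_i$:

```python
import torch
import torch.nn as nn

class ImportanceClassifier(nn.Module):
    """f_s: map each importance feature ifeat to an important/unimportant
    decision; the important-class probability is the importance score."""
    def __init__(self, ifeat_dim=1024, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ifeat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 2),  # two classes: unimportant / important
        )

    def forward(self, ifeat):                      # ifeat: (n, ifeat_dim)
        logits = self.mlp(ifeat)
        return torch.softmax(logits, dim=1)[:, 1]  # P(important) per person

# The person with the highest score is selected:
# scores = classifier(ifeat); most_important = scores.argmax()
```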
compared with the prior art, the invention has the following advantages and beneficial effects:
1. Hand-tuning of framework parameters is eliminated: the deep learning algorithm learns the parameters autonomously, so a better set of parameters can be selected.
2. The relation computation module can autonomously learn the relation graphs among people in the scene and between people and the scene, adaptively encode the relation features, and understand the relations in the scene from a higher level.
3. The invention requires little additional manual annotation: it needs neither annotations of pedestrian poses nor computation of how sharply each person appears in the picture. Given pedestrians detected by a detector, annotating only the important pedestrians suffices for fast training, which previous studies could not offer.
4. The relation computation module is embeddable and stackable.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 illustrates the feature representation of the present invention;
FIG. 3 illustrates the construction of the relation graphs in the present invention;
FIG. 4 (a) and FIG. 4 (b) show important pedestrian detection results of the present invention;
FIG. 5 is a schematic diagram of the relation module of the model in the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Examples
As shown in FIG. 1, the method of the present invention is POINT (deep importance relatIons NeTworks). First, all pedestrians in the picture are detected by a face detector; then (a) the feature representation module extracts personal features and global features, which are input into the relation computation module (b). The whole relation module consists of r sub-relation modules; in each sub-relation module, a person-to-person (p2p) relation graph and a person-to-event (p2e) relation graph are constructed, the importance relations are estimated from the two graphs, the relation features are then encoded, and they are concatenated with the original personal features pfeat to obtain the importance features. Finally, the importance features are input into the importance classification module (c), which scores the importance of each person.
Specifically, the method for detecting important figures in pictures/videos by combining deep learning and relational modeling in this embodiment includes the following steps:
(1) Pedestrian detection and important pedestrian feature extraction in the picture;
given a picture containing multiple pedestrians, the invention firstly inputs the picture into a face detector or a detection frame of the pedestrian detector for extracting the face or the pedestrian in the picture
Figure BDA0002253112630000071
wherein [xpi ,y pi ,w pi ,h pi ]Is p i Detection frame, wherein [ x ] pi ,y pi ]Is a pedestrian p i Position in picture, [ w ] pi ,h pi ]Is p i The width and height of the detected box in the picture. In particular, face or personal information alone is not sufficient to characterize the overall information of a person, e.g., location geometry information. The present application refers to contextual information and location of a persona in order to better characterize the persona's personal information. Based on the pedestrian detection frame, the embodiment utilizes the convolutional neural network to automatically extract the personal information pfeat from bottom to top, and simultaneously, in order to study the relationship between people and scenes, the global feature gfeat of the whole picture is also extracted:
$$f^{g},\ \{f^{p}_{i}\}_{i=1}^{n} = f_o\big(I,\ \{p_i\}_{i=1}^{n};\ \theta_o\big)$$

where $f^{g}$ denotes the global feature gfeat, $f^{p}_{i}$ denotes the personal feature pfeat, $f_o$ denotes the feature extraction module, $I$ denotes the whole picture, $p_i$ denotes the personal information, and $\theta_o$ are the parameters of the feature extraction module. The specific feature extraction flow is shown in FIG. 2: the appearance information is divided into an interior part and an exterior part, where the interior region extracts the person's intrinsic appearance information and the exterior region extracts the person's appearance together with the context information of the surrounding environment, ensuring diverse person information. Meanwhile, a location map expressed with 0/1 values represents the position information of each person, and the global scene information of the whole photo is likewise extracted with a convolutional neural network.
(2) Relation computation:
Relying on the pfeat and gfeat obtained in the previous step, an embeddable and stackable relation computation module is designed to model the relations among people and between people and the scene and to compute the relation feature rfeat corresponding to each person:
S21, compute the relation between every two people: project the two personal features with matrices and add them, project the sum with another matrix to obtain a value representing the connection strength between the two, and finally apply a truncation operation that forces values smaller than 0 to 0;
S22, compute the relation between each person and the scene: project the person feature and the scene feature with matrices and add them, project the sum with another matrix to obtain a value representing the connection strength between the person and the scene, and finally apply a truncation operation that forces values smaller than 0 to 0;
S23, fuse the relations obtained in steps S21 and S22 into a pairwise importance relation by multiplying the two values, so that if either value is small the result is small;
S24, collect the values from step S23 into an n×n matrix whose i-th row represents the relations of all people to the i-th person and is used to integrate the importance relations of all people to the i-th person;
S25, compute the relation feature corresponding to each person.
The calculation process is as follows:
a) First, compute the relation between people:

$$\varepsilon^{p}_{ij} = \max\!\big(0,\ W_p(W_a f^{p}_{i} + W_b f^{p}_{j})\big)$$

b) Compute the relation between a person and the scene:

$$\varepsilon^{e}_{j} = \max\!\big(0,\ W_e(W_c f^{p}_{j} + W_d f^{g})\big)$$

c) Fuse the two relations:

$$\varepsilon_{ij} = \varepsilon^{p}_{ij}\,\varepsilon^{e}_{j}$$

d) Compute the importance relation:

$$w_{ij} = \frac{\varepsilon_{ij}}{\sum_{k=1}^{n}\varepsilon_{ik}}$$

e) Compute the relation feature rfeat:

$$f^{r}_{i} = \sum_{j=1}^{n} w_{ij}\, W_r f^{p}_{j}$$

f) Construct the importance feature ifeat used for importance judgment:

$$f^{ifeat}_{i} = \mathrm{Concat}\big(f^{p}_{i},\ f^{r,1}_{i},\ \ldots,\ f^{r,r}_{i}\big)$$
The whole process of constructing the importance feature ifeat is shown in FIG. 5, where Eq. 3 corresponds to process c) and Eq. 4 to process d). The constructed relation graphs are shown in FIG. 3: in each graph, the relations among people and between people and the scene are presented as numerical values, and it can be observed that non-important people point clearly toward the important person, with pointing values notably higher than those toward other people. The edge values of the different relation graphs are computed from the features produced by the feature representation module and fused to form the importance relation. All W in the above formulas are matrices, f denotes a feature vector, and the relation ε is a scalar; the superscripts 1, …, r in f) index the r relation computation modules, and Concat denotes the concatenation operation. The whole relation computation module can be modeled as:

$$\{f^{ifeat}_{i}\}_{i=1}^{n} = f_r\big(f^{g},\ \{f^{p}_{i}\}_{i=1}^{n};\ \theta_r\big)$$
(3) Importance classification:
based on
Figure BDA0002253112630000101
And different features and relation functions, constructing a person-scene relation diagram +.>
Figure BDA0002253112630000102
And person-to-person relationship diagram->
Figure BDA0002253112630000103
Through the relation diagrams, the relation characteristic rfeats between people and scenes are calculated, and then the relation characteristic rfeats are fused with the original personal characteristic pfeats to judge the importance characteristic if eats of the importance score. The importance feature ifeat obtained in the previous step is input into a neural network composed of all connected layers for classification, and the score divided into important characters is regarded as an importance score. The score calculation may be written as:
$$s_i = f_s\big(f^{ifeat}_{i};\ \theta_s\big)$$

where $f_s$ is the importance classification module and $\theta_s$ its corresponding parameters. With this, the whole network framework can be formulated as:

$$\{s_i\}_{i=1}^{n} = f_s\Big(f_r\big(f_o(I,\ \{p_i\}_{i=1}^{n};\ \theta_o);\ \theta_r\big);\ \theta_s\Big)$$
As shown in FIG. 4 (a) and FIG. 4 (b), which present important pedestrian detection results based on the technique of the present invention, FIG. 4 (a) shows results on the NCAA Basketball Image Dataset and FIG. 4 (b) shows results on the Multi-scene Important People Image Dataset. The detection accuracy of the invention exceeds that of the best existing algorithm (PersonRank) by more than 23.2% (NCAA) and 7% (MS).
All parameters of the invention are deep network parameters and are optimized autonomously by stochastic gradient descent.
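A minimal training sketch under illustrative assumptions (per-person two-class cross-entropy loss, PyTorch SGD); the model, data loader, and hyper-parameters are placeholders, not values specified by the invention:

```python
import torch
import torch.nn as nn

# model: feature extraction + relation modules + classifier, end to end
# loader: yields (image, boxes, labels), labels in {0, 1} for each person
def train(model, loader, epochs=30, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for image, boxes, labels in loader:
            logits = model(image, boxes)   # (n, 2) per-person class logits
            loss = ce(logits, labels)      # important vs unimportant
            opt.zero_grad()
            loss.backward()
            opt.step()                     # stochastic gradient descent step
```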
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and falls within the protection scope of the present invention.

Claims (6)

1. A picture/video important person detection method combining deep learning and relational modeling, characterized by comprising the following steps:
S1, extract features from the appearance information and geometric information of each person in the picture/video, fuse them into a personal feature representing high-level semantics, and extract information of the whole picture/video as a global feature;
S2, compute relation features, i.e. information that individual features cannot express or cannot express well, by mining the relations among people in the scene and between people and the scene, and fuse the relation feature rfeat with the personal feature pfeat to generate an importance feature that strongly expresses the importance of each individual in the scene, where the relation feature rfeat contains information on the relations among people and between people and the scene;
in step S2,
the relations among people and between people and the scene are modeled as follows:
S21, compute the relation between every two people: project the two personal features with matrices and add them, project the sum with another matrix to obtain a value representing the connection strength between the two, and finally apply a truncation operation that forces values smaller than 0 to 0;
S22, compute the relation between each person and the scene: project the person feature and the scene feature with matrices and add them, project the sum with another matrix to obtain a value representing the connection strength between the person and the scene, and finally apply a truncation operation that forces values smaller than 0 to 0;
S23, fuse the relations obtained in steps S21 and S22 into a pairwise importance relation by multiplying the two values, so that if either value is small the result is small;
S24, collect the values from step S23 into an n×n matrix whose i-th row represents the relations of all people to the i-th person and is used to integrate the importance relations of all people to the i-th person;
S25, compute the relation feature corresponding to each person;
the relation feature rfeat corresponding to each person is computed by the following formulas:
a) Compute the relation between people:

$$\varepsilon^{p}_{ij} = \max\!\big(0,\ W_p(W_a f^{p}_{i} + W_b f^{p}_{j})\big)$$

b) Compute the relation between a person and the scene:

$$\varepsilon^{e}_{j} = \max\!\big(0,\ W_e(W_c f^{p}_{j} + W_d f^{g})\big)$$

c) Fuse the two relations:

$$\varepsilon_{ij} = \varepsilon^{p}_{ij}\,\varepsilon^{e}_{j}$$

d) Compute the importance relation:

$$w_{ij} = \frac{\varepsilon_{ij}}{\sum_{k=1}^{n}\varepsilon_{ik}}$$

e) Compute the relation feature rfeat:

$$f^{r}_{i} = \sum_{j=1}^{n} w_{ij}\, W_r f^{p}_{j}$$

f) Construct the importance feature ifeat used for importance judgment:

$$f^{ifeat}_{i} = \mathrm{Concat}\big(f^{p}_{i},\ f^{r,1}_{i},\ \ldots,\ f^{r,r}_{i}\big)$$
all W in the above formulas are matrices, f denotes a feature vector, and the relation ε is a scalar; the superscripts 1, …, r in f) index the r relation computation modules, which can be stacked, and Concat denotes the concatenation operation; the whole relation computation module is modeled as:

$$\{f^{ifeat}_{i}\}_{i=1}^{n} = f_r\big(f^{g},\ \{f^{p}_{i}\}_{i=1}^{n};\ \theta_r\big)$$

S3, perform importance classification: the final feature representation of each person extracted by the relation computation model is classified as important or unimportant, the probability of the important class is taken as the importance score, and the person with the highest score is the important person identified by the relation computation model.
2. The picture/video important person detection method combining deep learning and relational modeling according to claim 1, wherein step S1 specifically comprises:
inputting the picture into a face detector or a pedestrian detector to extract the faces or pedestrians in the picture:

$$\mathcal{P} = \{p_i\}_{i=1}^{n}, \qquad p_i = [x_{p_i},\ y_{p_i},\ w_{p_i},\ h_{p_i}]$$

where $[x_{p_i}, y_{p_i}, w_{p_i}, h_{p_i}]$ is the detection box of pedestrian $p_i$: $[x_{p_i}, y_{p_i}]$ is the position of $p_i$ in the picture and $[w_{p_i}, h_{p_i}]$ are the width and height of the detection box.
3. The picture/video important person detection method combining deep learning and relational modeling according to claim 2, wherein, in step S1, the personal information of each person is characterized by the following method:
based on the pedestrian detection boxes, the personal feature pfeat is automatically extracted bottom-up with a convolutional neural network; meanwhile, to study the relations between people and the scene, the global feature gfeat of the whole picture is also extracted:

$$f^{g},\ \{f^{p}_{i}\}_{i=1}^{n} = f_o\big(I,\ \{p_i\}_{i=1}^{n};\ \theta_o\big)$$

where $f^{g}$ denotes the global feature gfeat, $f^{p}_{i}$ denotes the personal feature pfeat, $f_o$ denotes the feature extraction module, $I$ denotes the whole picture, $p_i$ denotes the personal information, and $\theta_o$ are the parameters of the feature extraction module.
4. The picture/video important person detection method combining deep learning and relational modeling according to claim 1, wherein, in step S1, the method for fusing into a personal feature representing high-level semantics is:
at the feature-space level, the multiple features are concatenated and then convolved together, thereby generating the high-level semantic personal feature.
5. The picture/video important person detection method combining deep learning and relational modeling according to claim 1, wherein, in step S3, the relation module specifically comprises:
based on the extracted features $f^{g}$ and $\{f^{p}_{i}\}_{i=1}^{n}$ and different relation functions, constructing a person-scene relation graph $\mathcal{G}^{p2e}$ and a person-person relation graph $\mathcal{G}^{p2p}$; through these relation graphs, the relation features rfeat among people and between people and the scene are computed and then fused with the original personal features pfeat to obtain the importance features ifeat used to judge the importance score.
6. The picture/video important person detection method combining deep learning and relational modeling according to claim 5, wherein, in step S3,
the obtained importance features ifeat are input into a neural network composed of fully connected layers for classification, and the score of the important-person class is taken as the importance score; the score computation can be written as:

$$s_i = f_s\big(f^{ifeat}_{i};\ \theta_s\big)$$

where $f_s$ is the importance classification module and $\theta_s$ its corresponding parameters; the whole network framework can be formulated as:

$$\{s_i\}_{i=1}^{n} = f_s\Big(f_r\big(f_o(I,\ \{p_i\}_{i=1}^{n};\ \theta_o);\ \theta_r\big);\ \theta_s\Big)$$
CN201911042034.6A 2019-10-30 2019-10-30 Picture/video important person detection method combining deep learning and relational modeling Active CN111008558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911042034.6A CN111008558B (en) 2019-10-30 2019-10-30 Picture/video important person detection method combining deep learning and relational modeling


Publications (2)

Publication Number Publication Date
CN111008558A CN111008558A (en) 2020-04-14
CN111008558B (en) 2023-05-30

Family

ID=70111750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911042034.6A Active CN111008558B (en) 2019-10-30 2019-10-30 Picture/video important person detection method combining deep learning and relational modeling

Country Status (1)

Country Link
CN (1) CN111008558B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967312B (en) * 2020-07-06 2023-03-24 中央民族大学 Method and system for identifying important persons in picture

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551916A (en) * 2009-04-16 2009-10-07 浙江大学 Method and system of three-dimensional scene modeling based on ontology
CN101957835A (en) * 2010-08-16 2011-01-26 无锡市浏立方科技有限公司 Complicated relationship and context information-oriented semantic data model
CN108416314A (en) * 2018-03-16 2018-08-17 中山大学 Important face detection method for pictures
CN108446625A (en) * 2018-03-16 2018-08-24 中山大学 Graph-model-based important pedestrian detection method for pictures
CN110232330A (en) * 2019-05-23 2019-09-13 复钧智能科技(苏州)有限公司 Pedestrian re-identification method based on video detection


Also Published As

Publication number Publication date
CN111008558A (en) 2020-04-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant