CN111008558B - Picture/video important person detection method combining deep learning and relational modeling
- Publication number: CN111008558B (application CN201911042034.6A)
- Authority: CN (China)
- Prior art keywords: relation, importance, picture, features, person
- Legal status: Active
Classifications
- G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
- G06V40/168 — Human faces: feature extraction; face representation
- G06V40/172 — Human faces: classification, e.g. identification
- G06V10/454 — Local feature extraction: integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/24 — Pattern recognition: classification techniques
- G06N3/08 — Neural networks: learning methods
Abstract
The invention discloses a picture/video important person detection method combining deep learning and relational modeling, which comprises the following steps: S1, extracting features from the appearance information and geometric information of each person in the picture/video and fusing them into a personal feature that expresses high-level semantics; S2, computing relation features that the personal features cannot express, or cannot express well, by mining the relations between the people in the scene and between the people and the scene; S3, performing importance classification: the final feature expression of each person extracted by the relation computation model is classified as important or unimportant, the probability of being classified as important is taken as the importance score, and the person with the highest score is the important person identified by the relation computation model. Through learning, the invention can autonomously construct the relations between the people in a picture/video and between the people and the events in the picture, and automatically infer the importance of each person.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a picture/video important person detection method combining deep learning and relational modeling.
Background
Important person detection in pictures/videos means that, given a picture containing multiple people, the important people in it are identified from clothing, actions, positions, interaction information and the scene the people are in. The technology can support scene understanding and industries such as live text broadcasting, film and television shooting, and security monitoring. For example, in live text broadcasting, what is happening in a scene can be judged from the behaviour of the central person of the video and a textual description can be generated directly. In live sports broadcasting, the method can detect the important people in a sports scene, such as the ball handler in a basketball or football game, who can then be tracked with a camera, reducing manual effort. In security work, important person detection in video surveillance can monitor the key objects of protection in a scene, and people with abnormally high importance scores can be analyzed so that appropriate prevention and control plans are made.
Existing important face/person detection methods for pictures/videos mainly fall into the following categories:
1) Ranking based on pedestrian pairs: to detect the important pedestrians in a picture automatically, the most direct way is to form a pair from every two pedestrians in the picture and predict the relative importance of each pair. The prior art therefore proposes using a regression model to infer the importance relation between two different people in a picture, and the most important face in the picture is then inferred from these pairwise importance relations.
2) Ranking based on a perceptron: the most important people in a picture or video contribute greatly to recognizing and detecting the events in it. The prior art extracts the action features and appearance features of the different players in a basketball game and computes the importance of each player with a perceptron, which improves the accuracy of recognizing and detecting events in the game.
3) Based on a multi-layer hybrid relation graph: judging whether a person is the most important one in a scene benefits more from the interaction information between people than from appearance and action information alone. The prior art therefore also proposes building, from different features, a hybrid relation graph over the pedestrians detected in a picture to model the relations between them, and extending the well-known PageRank ranking algorithm so that it can rank the importance of pedestrians on the multi-layer hybrid relation graph and thereby detect the most important pedestrian in the picture.
However, the above important person detection methods have many shortcomings. The technique based on ranking pedestrian pairs extracts the spatial and saliency features of the pedestrians' faces and ranks pedestrian pairs to order pedestrian importance. When ranking, it ignores the importance of the other people and the influence that the relations between pedestrians have on importance. It also ignores the effect of context information, motion information, appearance information and attention information on important pedestrian detection. The technique based on perceptron ranking computes each pedestrian's importance directly from that pedestrian's own features with a perceptron, which ignores the effect of the relations between pedestrians on the importance analysis, as well as the effects of spatial information and attention information. The features used in the technique based on the multi-layer hybrid relation graph are pre-trained on other tasks and cannot express high-level semantic information well; moreover, it considers only the relations between people, not the relations between people and the scene.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art, and provides a picture/video important person detection method combining deep learning and relational modeling, which can, through learning, autonomously construct the relations between the people in a picture/video and between the people and the events in the picture, and automatically infer the importance of each person.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention discloses a picture/video important person detection method combining deep learning and relational modeling, which comprises the following steps:
s1, extracting features of appearance information and geometric information of a portrait in a picture/video, fusing the appearance information and the geometric information of the portrait into a personal feature representing high-level semantics, and extracting information of the whole picture/video as global features;
s2, calculating relation features which cannot be expressed or cannot be highly expressed by individual features by excavating relations among people in a scene and among people in the scene, and fusing the relation features rfeat and personal information pfeat to generate importance features which can highly express the importance of the individuals in the scene, wherein the relation features rfeat contain information of relations among people and between people and the scene;
s3, importance classification is carried out, and the probability that each portrait is classified into important category is taken as an importance score by carrying out important or unimportant two classifications on the final feature expression of each portrait extracted in the relation calculation model, and the portrait with the highest score is the important person identified by the relation calculation model.
As a preferred technical solution, step S1 specifically includes:
inputting the picture into a face detector or a pedestrian detector to extract the faces or pedestrians in the picture as detection boxes p_i = [x_{p_i}, y_{p_i}, w_{p_i}, h_{p_i}], wherein [x_{p_i}, y_{p_i}] is the position of pedestrian p_i in the picture and [w_{p_i}, h_{p_i}] are the width and height of the detection box of p_i in the picture.
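By way of illustration only, the detection-box record p_i can be held in a simple data structure; the following Python sketch (the name PersonBox is hypothetical and not part of the invention) shows one possible form:

```python
from dataclasses import dataclass

@dataclass
class PersonBox:
    """Detection box p_i = [x, y, w, h] as returned by the face or
    pedestrian detector: (x, y) is the position of the person in the
    picture, (w, h) the width and height of the detection box."""
    x: float
    y: float
    w: float
    h: float

# e.g. one detected pedestrian
p1 = PersonBox(x=120.0, y=64.0, w=55.0, h=170.0)
```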
As a preferable technical solution, in step S1, the personal information of the person is characterized by the following method:
based on the pedestrian detection boxes, the personal features pfeat are extracted automatically, bottom-up, with a convolutional neural network, and at the same time, in order to study the relations between people and the scene, the global feature gfeat of the whole picture is also extracted:

f^g = f_o(I; θ_o),  f^p_i = f_o(I, p_i; θ_o)

where f^g denotes the global feature gfeat, f^p_i denotes the personal feature pfeat of person p_i, f_o denotes the feature extraction module, I denotes the whole picture, p_i denotes the personal information, and θ_o are the parameters of the feature extraction module.
In step S1, as a preferred technical solution, the method for fusing the features into a personal feature representing high-level semantics is as follows:
at the feature space level, multiple features are concatenated and then convolved together, thereby generating a high-level semantic personal feature.
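For illustration, a minimal PyTorch sketch of this fusion step follows; the channel sizes and the 1x1 kernel are assumptions, as the invention fixes neither:

```python
import torch
import torch.nn as nn

# Hypothetical feature maps for a batch of 8 persons; the invention does
# not fix the number of input features or their dimensions.
appearance = torch.randn(8, 256, 7, 7)  # interior appearance features
context = torch.randn(8, 256, 7, 7)     # exterior appearance + context features
location = torch.randn(8, 64, 7, 7)     # features of the 0/1 location map

# Concatenate in feature space, then convolve the stacked maps together
# to produce the high-level semantic personal feature pfeat.
fuse = nn.Conv2d(256 + 256 + 64, 512, kernel_size=1)
pfeat = fuse(torch.cat([appearance, context, location], dim=1))  # (8, 512, 7, 7)
```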
As a preferred technical solution, in step S2 the relations between people and between people and the scene are modeled, specifically:
S21, computing the relations between people: the features of each pair of persons are projected by matrices and added, then projected by another matrix to a single value that represents the connection strength of the pair, and finally a truncation operation forces values smaller than 0 to 0;
S22, computing the relations between people and the scene: the features of a person and of the scene are projected by matrices and added, then projected to a single value that represents the connection strength between the person and the scene, and finally the same truncation forces values smaller than 0 to 0;
S23, fusing the relations obtained in steps S21 and S22 into a pairwise importance relation: the two values are multiplied, so that if either value is small the fused value is small;
S24, step S23 yields an n×n matrix whose ith row represents the relations of all persons to the ith person and integrates the importance relations of all persons towards the ith person;
S25, computing the relation feature corresponding to each person.
As a preferred technical solution, the relation feature rfeat corresponding to each person is calculated by the following formulas:

a) calculating the relation between people: ε^{p2p}_{ij} = max(0, w_p^T (W_a f^p_i + W_b f^p_j));

b) calculating the relation between a person and the scene: ε^{p2e}_j = max(0, w_e^T (W_c f^p_j + W_d f^g));

c) fusing the two relations: ε_{ij} = ε^{p2p}_{ij} · ε^{p2e}_j;

d) calculating the importance relation by integrating, for each person i, the relations of all persons towards i: e_{ij} = ε_{ij} / Σ_k ε_{ik};

e) calculating the relation feature rfeat: f^r_i = Σ_j e_{ij} (W_r f^p_j);

f) constructing the importance feature ifeat used for the importance judgment: f^imp_i = Concat(f^p_i, f^{r,1}_i, …, f^{r,r}_i).

In the above formulas all W are matrices and w are projection vectors, f denotes feature vectors, and the relation ε is a scalar value; the superscripts 1, …, r in f) indicate that there are r relation calculation modules, because they can be stacked, and Concat denotes the splicing operation. The whole relation calculation module is modeled as {f^imp_i} = f_r({f^p_i}, f^g; θ_r).
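A minimal sketch of steps a) to d) is given below, assuming PyTorch; the hidden size, the row normalization in step d) and the class name RelationStrength are illustrative assumptions, not choices fixed by the invention:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationStrength(nn.Module):
    """Edge strengths of the person-person and person-scene graphs and the
    fused importance-relation matrix (steps S21-S24 / formulas a)-d))."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.Wa = nn.Linear(dim, hidden)  # projects person i (person-person)
        self.Wb = nn.Linear(dim, hidden)  # projects person j (person-person)
        self.Wc = nn.Linear(dim, hidden)  # projects person j (person-scene)
        self.Wd = nn.Linear(dim, hidden)  # projects the global feature gfeat
        self.wp = nn.Linear(hidden, 1)    # person-person strength head
        self.we = nn.Linear(hidden, 1)    # person-scene strength head

    def forward(self, pfeat: torch.Tensor, gfeat: torch.Tensor) -> torch.Tensor:
        # pfeat: (n, dim) personal features; gfeat: (dim,) global feature
        # a) pairwise add after projection, project to a scalar, clip below 0
        pp = F.relu(self.wp(self.Wa(pfeat)[:, None, :]
                            + self.Wb(pfeat)[None, :, :])).squeeze(-1)  # (n, n)
        # b) person-scene strength, same recipe against gfeat
        pe = F.relu(self.we(self.Wc(pfeat) + self.Wd(gfeat)[None, :])).squeeze(-1)  # (n,)
        # c) fuse by multiplication: a weak link in either graph
        #    suppresses the fused importance relation
        eps = pp * pe[None, :]  # (n, n)
        # d) row i integrates the relations of all persons towards person i;
        #    here each row is normalized to sum to one (an assumption)
        return eps / eps.sum(dim=1, keepdim=True).clamp_min(1e-8)
```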
as a preferred technical solution, in step S3, the relationship module specifically includes:
based on the personal features pfeat, the global feature gfeat and different relation functions, a person-scene relation graph G^{p2e} and a person-person relation graph G^{p2p} are constructed; through these relation graphs, the relation features rfeat between people and between people and the scene are calculated and then fused with the original personal features pfeat to obtain the importance features ifeat from which the importance score is judged.
As a preferred technical solution, in step S3,
inputting the obtained importance features ifeat into a neural network composed of fully connected layers for classification, and taking the score of the important-person class as the importance score, wherein the score computation can be written as

s_i = f_s(f^imp_i; θ_s)

where f_s is the importance classification module and θ_s are its parameters; the entire network framework can be formulated as

s_i = f_s(f_r(f_o(I, p_i; θ_o); θ_r); θ_s).
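For concreteness, a hedged sketch of this classifier follows (PyTorch assumed; the hidden width and the name ImportanceClassifier are illustrative only):

```python
import torch
import torch.nn as nn

class ImportanceClassifier(nn.Module):
    """Two-way (important / unimportant) classifier f_s over the importance
    features ifeat; the probability of the 'important' class is used as
    the importance score."""

    def __init__(self, dim: int, hidden: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 2),  # logits: [unimportant, important]
        )

    def forward(self, ifeat: torch.Tensor) -> torch.Tensor:
        # ifeat: (n, dim) importance features; returns (n,) scores in [0, 1]
        return self.mlp(ifeat).softmax(dim=1)[:, 1]
```

The person with the highest returned score is then reported as the important person.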
compared with the prior art, the invention has the following advantages and beneficial effects:
1. The framework learns its parameters autonomously through a deep learning algorithm, so that a better set of parameters can be found.
2. The relation calculation module can autonomously learn the relation graphs between the people in the scene and between the people and the scene, adaptively encode the relation features, and understand the relations in the scene from a higher level.
3. The invention requires little additional manual labelling: it needs no pedestrian pose annotation and no computation of how sharply each person is imaged; once the pedestrians have been detected with a detector, fast training is possible by labelling only the important pedestrians, which previous research does not offer.
4. The relation calculation module is embeddable and can be stacked.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of the feature expression of the present invention;
FIG. 3 is a relation graph constructed by the present invention;
FIG. 4 (a) and FIG. 4 (b) are important pedestrian detection result graphs of the present invention;
FIG. 5 is a schematic diagram of the relation module of the model in the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Examples
As shown in fig. 1, the method of the present invention is POINT (deeP impOrtance relatIon NeTwork). First, all pedestrians in the picture are detected with a face detector; then (a) the feature expression module extracts personal features and global features. These features are input into the relation computation module (b), which is composed of r sub-relation modules; in each sub-relation module, a person-to-person (p2p) relation graph and a person-to-event (p2e) relation graph are constructed, the importance relations are estimated from the two graphs, the relation features are then encoded, and the importance features are obtained by splicing them with the original personal features pfeat. Finally, the importance features are input into the importance classification module (c), which scores the importance of each person.
Specifically, the method for detecting important figures in pictures/videos by combining deep learning and relational modeling in this embodiment includes the following steps:
(1) Pedestrian detection and important pedestrian feature extraction in the picture;
given a picture containing multiple pedestrians, the invention firstly inputs the picture into a face detector or a detection frame of the pedestrian detector for extracting the face or the pedestrian in the picture wherein [xpi ,y pi ,w pi ,h pi ]Is p i Detection frame, wherein [ x ] pi ,y pi ]Is a pedestrian p i Position in picture, [ w ] pi ,h pi ]Is p i The width and height of the detected box in the picture. In particular, face or personal information alone is not sufficient to characterize the overall information of a person, e.g., location geometry information. The present application refers to contextual information and location of a persona in order to better characterize the persona's personal information. Based on the pedestrian detection frame, the embodiment utilizes the convolutional neural network to automatically extract the personal information pfeat from bottom to top, and simultaneously, in order to study the relationship between people and scenes, the global feature gfeat of the whole picture is also extracted:
wherein ,representing the global feature gfeatl->Representing personal characteristics pfeat, f o Representing feature extraction module, I represents whole picture information, p i Representing personal information, θ o Is a parameter of the feature extraction module. The specific feature extraction operation flow is shown in fig. 2, wherein the appearance information is divided into an internal part and an external part, the internal area is used for extracting the inherent appearance information of the portrait, and the external area is used for extracting the appearance of the portrait and the context information of the surrounding environment, so that the diversification of the portrait information is ensured. Meanwhile, a map chart expressed by a 01 value represents all the information of the positions of the characters, and in addition, the global scene information of the whole photo can also realize feature extraction through a convolutional neural network.
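The following sketch illustrates one possible realization of this feature expression module; the ResNet-18 backbone, the layer sizes and the class name FeatureExpression are assumptions for illustration, not components fixed by the invention:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def location_map(box, height: int, width: int) -> torch.Tensor:
    """Binary 0/1 map encoding where the person is; box = (x, y, w, h)."""
    x, y, w, h = (int(v) for v in box)
    m = torch.zeros(1, height, width)
    m[:, y:y + h, x:x + w] = 1.0
    return m

class FeatureExpression(nn.Module):
    """Sketch of f_o: interior and exterior appearance crops plus the 0/1
    location map give pfeat; the whole picture gives gfeat. dim must match
    the backbone output width (512 for ResNet-18)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        backbone = resnet18(weights=None)    # torchvision >= 0.13 assumed
        backbone.fc = nn.Identity()          # 512-d appearance descriptor
        self.cnn = backbone
        self.loc = nn.Sequential(            # small CNN over the 0/1 map
            nn.Conv2d(1, 16, kernel_size=7, stride=4),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, dim),
        )
        self.fuse = nn.Linear(3 * dim, dim)  # splice the three parts -> pfeat

    def forward(self, interior, exterior, locmap, picture):
        # interior/exterior: (N, 3, H, W) crops; locmap: (N, 1, H, W);
        # picture: (N, 3, H, W) whole image
        pfeat = self.fuse(torch.cat(
            [self.cnn(interior), self.cnn(exterior), self.loc(locmap)], dim=1))
        gfeat = self.cnn(picture)            # global scene feature gfeat
        return pfeat, gfeat
```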
(2) Relation computation:
relying on the pfeat and gfeat obtained in the previous step, an embeddable and stackable relation computation module is designed that models the relations between people and between people and the scene and computes the relation feature rfeat corresponding to each person:
S21, computing the relations between people: the features of each pair of persons are projected by matrices and added, then projected by another matrix to a single value that represents the connection strength of the pair, and finally a truncation operation forces values smaller than 0 to 0;
S22, computing the relations between people and the scene: the features of a person and of the scene are projected by matrices and added, then projected to a single value that represents the connection strength between the person and the scene, and finally the same truncation forces values smaller than 0 to 0;
S23, fusing the relations obtained in steps S21 and S22 into a pairwise importance relation: the two values are multiplied, so that if either value is small the fused value is small;
S24, step S23 yields an n×n matrix whose ith row represents the relations of all persons to the ith person and integrates the importance relations of all persons towards the ith person;
S25, computing the relation feature corresponding to each person.
The calculation process is as follows:
a) First, the relation between people is calculated: ε^{p2p}_{ij} = max(0, w_p^T (W_a f^p_i + W_b f^p_j));

b) the relation between a person and the scene is calculated: ε^{p2e}_j = max(0, w_e^T (W_c f^p_j + W_d f^g));

c) the two relations are fused: ε_{ij} = ε^{p2p}_{ij} · ε^{p2e}_j (Eq. 3);

d) the importance relation is calculated: e_{ij} = ε_{ij} / Σ_k ε_{ik} (Eq. 4);

e) the relation feature rfeat is calculated: f^r_i = Σ_j e_{ij} (W_r f^p_j);

f) the importance feature ifeat used for the importance judgment is constructed: f^imp_i = Concat(f^p_i, f^{r,1}_i, …, f^{r,r}_i).
the entire process of constructing the importance feature ifeat is shown in fig. 5, where Eq. 3 is process c) and Eq. 4 is process d). The constructed relation graphs are shown in fig. 3: in each graph the relations between people, and between people and the scene, are presented as numerical values, and it can be seen that the non-important people point markedly towards the important person, with pointing values clearly higher than those towards other people. The edge values of the different relation graphs are calculated from the features produced by the feature expression module and are fused to form the importance relation. In the above formulas all W are matrices and w are projection vectors, f denotes feature vectors, the relation ε is a scalar value, the superscripts 1, …, r in f) indicate the r relation calculation modules, and Concat denotes the splicing operation. The overall relation calculation module can be modeled as {f^imp_i} = f_r({f^p_i}, f^g; θ_r).
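Continuing the sketches above, steps e) and f) can be realized as follows (again under assumed dimensions; eps stands for an n×n importance-relation matrix such as the one produced in steps a) to d)):

```python
import torch
import torch.nn as nn

class RelationFeature(nn.Module):
    """Steps e) and f): aggregate the projected features of the other persons
    with the importance-relation weights to obtain rfeat, then splice rfeat
    with pfeat into the importance feature ifeat."""

    def __init__(self, dim: int):
        super().__init__()
        self.Wr = nn.Linear(dim, dim)  # projection W_r of step e)

    def forward(self, pfeat: torch.Tensor, eps: torch.Tensor) -> torch.Tensor:
        # pfeat: (n, dim); eps: (n, n), row i holds the relations of all
        # persons towards person i
        rfeat = eps @ self.Wr(pfeat)             # (n, dim), step e)
        return torch.cat([pfeat, rfeat], dim=1)  # (n, 2*dim) = ifeat, step f)

# With r stacked relation modules, step f) splices all of their rfeats:
# ifeat = Concat(pfeat, rfeat^1, ..., rfeat^r).
```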
(3) Importance classification:
based on the personal features pfeat, the global feature gfeat and different relation functions, a person-scene relation graph G^{p2e} and a person-person relation graph G^{p2p} are constructed. Through these relation graphs, the relation features rfeat between people and between people and the scene are calculated and then fused with the original personal features pfeat into the importance features ifeat from which the importance score is judged. The importance features ifeat obtained in the previous step are input into a neural network composed of fully connected layers for classification, and the score of the important-person class is taken as the importance score. The score computation can be written as

s_i = f_s(f^imp_i; θ_s)

where f_s is the importance classification module and θ_s are its parameters. At this point, the entire network framework can be formulated as

s_i = f_s(f_r(f_o(I, p_i; θ_o); θ_r); θ_s).
As shown in fig. 4 (a) and fig. 4 (b), important pedestrian detection results are obtained with the technology of the present invention: fig. 4 (a) shows results on the NCAA Basketball Image Dataset and fig. 4 (b) results on the Multi-scene Important People Image Dataset. The detection accuracy of the invention exceeds that of the best prior-art algorithm (PersonRank) by more than 23.2% (NCAA) and 7% (MS).
All parameters of the invention are deep network parameters and are optimized autonomously by stochastic gradient descent.
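As an illustration of this optimization, a minimal stochastic-gradient-descent training step might look as follows; the loss, learning rate and batch construction are assumptions, since the patent states SGD but fixes none of them:

```python
import torch
import torch.nn as nn

def train_step(model: nn.Module, optimiser: torch.optim.Optimizer,
               ifeat: torch.Tensor, labels: torch.Tensor) -> float:
    """One SGD step on the two-way importance classification;
    `model` maps ifeat to two logits [unimportant, important]."""
    logits = model(ifeat)  # (n, 2)
    loss = nn.functional.cross_entropy(logits, labels)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()

# usage sketch: six detected persons, one labelled important
model = nn.Linear(1024, 2)  # stand-in for the classifier f_s
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
ifeat = torch.randn(6, 1024)
labels = torch.tensor([0, 0, 1, 0, 0, 0])
train_step(model, opt, ifeat, labels)
```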
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be an equivalent replacement and is included in the protection scope of the present invention.
Claims (6)
1. A picture/video important person detection method combining deep learning and relational modeling is characterized by comprising the following steps:
s1, extracting features of appearance information and geometric information of a portrait in a picture/video, fusing the appearance information and the geometric information of the portrait into a personal feature representing high-level semantics, and extracting information of the whole picture/video as global features;
s2, calculating relation features which cannot be expressed or cannot be highly expressed by individual features by excavating relations among people in a scene and among people in the scene, and fusing the relation features rfeat and personal information pfeat to generate importance features which can highly express the importance of the individuals in the scene, wherein the relation features rfeat contain information of relations among people and between people and the scene;
in step S2, the relations between people and between people and the scene are modeled, specifically:
S21, computing the relations between people: the features of each pair of persons are projected by matrices and added, then projected by another matrix to a single value that represents the connection strength of the pair, and finally a truncation operation forces values smaller than 0 to 0;
S22, computing the relations between people and the scene: the features of a person and of the scene are projected by matrices and added, then projected to a single value that represents the connection strength between the person and the scene, and finally the same truncation forces values smaller than 0 to 0;
S23, fusing the relations obtained in steps S21 and S22 into a pairwise importance relation: the two values are multiplied, so that if either value is small the fused value is small;
S24, step S23 yields an n×n matrix whose ith row represents the relations of all persons to the ith person and integrates the importance relations of all persons towards the ith person;
S25, computing the relation feature corresponding to each person;
calculating the relation feature rfeat corresponding to each person, wherein the specific formulas are as follows:

a) calculating the relation between people: ε^{p2p}_{ij} = max(0, w_p^T (W_a f^p_i + W_b f^p_j));

b) calculating the relation between a person and the scene: ε^{p2e}_j = max(0, w_e^T (W_c f^p_j + W_d f^g));

c) fusing the two relations: ε_{ij} = ε^{p2p}_{ij} · ε^{p2e}_j;

d) calculating the importance relation: e_{ij} = ε_{ij} / Σ_k ε_{ik};

e) calculating the relation feature rfeat: f^r_i = Σ_j e_{ij} (W_r f^p_j);

f) constructing the importance feature ifeat used for the importance judgment: f^imp_i = Concat(f^p_i, f^{r,1}_i, …, f^{r,r}_i);

in the above formulas all W are matrices and w are projection vectors, f denotes feature vectors, and the relation ε is a scalar value; the superscripts 1, …, r in f) indicate that there are r relation calculation modules, because they can be stacked, and Concat denotes the splicing operation; the whole relation calculation module is modeled as {f^imp_i} = f_r({f^p_i}, f^g; θ_r);
s3, importance classification is carried out, and the probability that each portrait is classified into important category is taken as an importance score by carrying out important or unimportant two classifications on the final feature expression of each portrait extracted in the relation calculation model, and the portrait with the highest score is the important person identified by the relation calculation model.
2. The method for detecting important figures in pictures/videos by combining deep learning and relational modeling according to claim 1, wherein step S1 is specifically:
inputting the picture into a face detector or a pedestrian detector to extract the faces or pedestrians in the picture as detection boxes p_i = [x_{p_i}, y_{p_i}, w_{p_i}, h_{p_i}], wherein [x_{p_i}, y_{p_i}] is the position of pedestrian p_i in the picture and [w_{p_i}, h_{p_i}] are the width and height of the detection box of p_i in the picture.
3. The method for detecting important figures of picture/video combining deep learning and relational modeling as in claim 2, wherein in step S1, the personal information of the figures is characterized by the following method:
based on the pedestrian detection boxes, the personal features pfeat are extracted automatically, bottom-up, with a convolutional neural network, and at the same time, in order to study the relations between people and the scene, the global feature gfeat of the whole picture is also extracted: f^g = f_o(I; θ_o), f^p_i = f_o(I, p_i; θ_o).
4. The method for detecting important figures in pictures/videos by combining deep learning and relational modeling according to claim 1, wherein in step S1, the method for merging into a personal feature representing high-level semantics is as follows:
at the feature space level, multiple features are concatenated and then convolved together, thereby generating a high-level semantic personal feature.
5. The method for detecting important figures in pictures/videos by combining deep learning and relational modeling according to claim 1, wherein in step S3, the relational module specifically comprises:
based on the personal features pfeat, the global feature gfeat and different relation functions, a person-scene relation graph G^{p2e} and a person-person relation graph G^{p2p} are constructed; through these relation graphs, the relation features rfeat between people and between people and the scene are calculated and then fused with the original personal features pfeat to obtain the importance features ifeat from which the importance score is judged.
6. The method for detecting a picture/video important person combining deep learning and relational modeling as recited in claim 5, wherein in step S3,
inputting the obtained importance features ifeat into a neural network composed of fully connected layers for classification, and taking the score of the important-person class as the importance score, wherein the score computation can be written as s_i = f_s(f^imp_i; θ_s), where f_s is the importance classification module and θ_s are its parameters; the entire network framework can be formulated as s_i = f_s(f_r(f_o(I, p_i; θ_o); θ_r); θ_s).
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201911042034.6A | 2019-10-30 | 2019-10-30 | Picture/video important person detection method combining deep learning and relational modeling
Publications (2)

Publication Number | Publication Date
---|---
CN111008558A | 2020-04-14
CN111008558B | 2023-05-30
Family ID: 70111750
Families Citing this family (1)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN111967312B | 2020-07-06 | 2023-03-24 | 中央民族大学 | Method and system for identifying important persons in picture
Patent Citations (5)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN101551916A | 2009-04-16 | 2009-10-07 | 浙江大学 | Method and system of three-dimensional scene modeling based on ontology
CN101957835A | 2010-08-16 | 2011-01-26 | 无锡市浏立方科技有限公司 | Semantic data model oriented to complicated relationships and context information
CN108416314A | 2018-03-16 | 2018-08-17 | 中山大学 | Picture important face detection method
CN108446625A | 2018-03-16 | 2018-08-24 | 中山大学 | Graph-model-based picture important pedestrian detection method
CN110232330A | 2019-05-23 | 2019-09-13 | 复钧智能科技(苏州)有限公司 | Pedestrian re-identification method based on video detection
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant