CN116469167A - Method and system for obtaining character action fragments based on character actions in video - Google Patents

Method and system for obtaining character action fragments based on character actions in video

Info

Publication number
CN116469167A
Authority
CN
China
Prior art keywords
video
character
action
numbers
clips
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310395288.6A
Other languages
Chinese (zh)
Inventor
韩继泽
刘永辉
谢恩鹏
王志亮
杜浩
温连龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Langchao Ultra Hd Intelligent Technology Co ltd
Original Assignee
Shandong Langchao Ultra Hd Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Langchao Ultra Hd Intelligent Technology Co ltd filed Critical Shandong Langchao Ultra Hd Intelligent Technology Co ltd
Priority to CN202310395288.6A priority Critical patent/CN116469167A/en
Publication of CN116469167A publication Critical patent/CN116469167A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V 40/23 - Recognition of whole body movements, e.g. for sport training
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for acquiring character action fragments based on character actions in video, belongs to the technical field of big data processing, and aims to solve the technical problem of how to quickly acquire character actions in a video and give the accurate starting time of each action. The method comprises the following steps: dividing an acquired video stream into a plurality of video clips at equal time intervals, numbering the clips to obtain video frame numbers, and recording the starting time of the video frames in each clip; performing character position recognition based on the video image set corresponding to each video clip to obtain the characters in the video images and the position of each character; performing character action recognition using the video image set corresponding to each video clip and the position of each character to obtain the action category of each character, and numbering each character to obtain the video frame character numbers; and performing feature matching on the video clips based on the character feature vectors, and merging video clips of the same character and the same action category to obtain new video clips.

Description

Method and system for obtaining character action fragments based on character actions in video
Technical Field
The invention relates to the technical field of big data processing, in particular to a method and a system for acquiring a character action fragment based on a character action in a video.
Background
With the progress of AI technology, video action recognition has developed rapidly on the basis of image classification and object detection. For a given video clip, it can be identified whether the clip contains a given character action. This recognition is independent of the length of the video: even if the video contains redundant parts, or the character action appears only partially, recognition can still be performed, although incompletely. However, there is no effective way to identify the exact starting position of the action.
How to quickly acquire the actions of characters in a video and give the accurate starting time of each action is a technical problem to be solved.
Disclosure of Invention
The technical task of the invention is to provide a method and a system for acquiring character action fragments based on character actions in video, so as to solve the technical problem of how to quickly acquire character actions in a video and give the accurate starting time of each action.
In a first aspect, the present invention provides a method for obtaining a character action segment based on a character action in a video, comprising the steps of:
dividing the acquired video stream into a plurality of video clips at equal time intervals, constructing a video image set based on video images corresponding to the video clips for each video clip, numbering the video clips to obtain video frame numbers, and recording the starting time of video frames in the video clips;
for each video clip, carrying out character position identification based on the video image set corresponding to the video clip to obtain the characters in the video image and the character position corresponding to each character;
for each video clip, carrying out character action recognition by using a video image set corresponding to the video clip and the character position of each character to obtain the action type of each character, and numbering each character to obtain the character number of the video frame;
and for all video clips in the video stream, carrying out feature matching on the video clips based on the character feature vectors, combining the video clips with the same character and the same action category to obtain a new video clip, and updating the video frame numbers, the video frame starting time, the video frame character numbers and the action category and the character feature vector corresponding to each video frame number corresponding to the new video clip.
Preferably, for each video clip, the video image set corresponding to the video clip is taken as input, and character position recognition is carried out through a target recognition model constructed based on the YOLO algorithm to obtain the characters and character positions in the video clip.
Preferably, for each video clip, the video image set and the character positions corresponding to the video clip are taken as input, and action recognition is carried out through an action recognition model constructed based on the SlowFast algorithm to obtain the action of each character.
Preferably, for all video clips in the video stream, feature comparison is performed on the video clips based on time sequence, and the video clips with the same person and the same action result are combined, including the following steps:
for the characters in each frame of video image in the video clip, carrying out feature extraction, based on the character position, on the image area at the character position in the video image through a Siamese (twin) network to obtain a multi-dimensional feature vector as the character feature vector, each character corresponding to its own character feature vector;
constructing characterization information based on the video frame number, the video frame start time, the video frame character numbers, the action results corresponding to the video frame character numbers, and the character feature vectors of the video clip;
according to the time sequence of the video stream, carrying out feature comparison on two adjacent video clips in the video stream based on the character feature vectors, and applying the following feature comparison principle: if the comparison result of the character feature vectors of the two adjacent video clips meets the threshold, the two clips are judged to contain the same character and their action results are compared; if the action comparison result also meets the threshold, the two clips are judged to be a continuation of the same action of the same character and are merged into a new video clip, the video frame start time corresponding to the new video clip is updated, and the video frame number, the video frame character numbers, and the actions and character feature vectors corresponding to the video frame character numbers take the values of the earlier video clip in the time sequence;
and carrying out feature comparison between the new video clip and the next video clip that has not yet been compared, based on the character feature vectors, and applying the feature comparison principle until the character feature comparison result does not meet the threshold, or the character feature comparison result meets the threshold but the action comparison result does not meet the threshold.
In a second aspect, the present invention provides a system for obtaining character action fragments based on character actions in video, which obtains character action fragments by the method for obtaining character action fragments based on character actions in video according to any one of the implementations of the first aspect, the system comprising:
the video segment acquisition module is used for dividing an acquired video stream into a plurality of video segments at equal time intervals, constructing a video image set based on video images corresponding to the video segments for each video segment, numbering the video segments to obtain video frame numbers, and recording the starting time of video frames in the video segments;
the character recognition module is used for recognizing the character positions of each video clip based on the video image set corresponding to the video clip to obtain the characters in the video image and the character positions corresponding to each character;
the motion recognition module is used for recognizing the motions of the characters according to the video image set corresponding to the video clips and the character positions of the characters to obtain the motion types of the characters, numbering the characters and obtaining the character numbers of the video frames;
and the video segment merging module is used for carrying out feature matching on the video segments based on the character feature vectors, merging the video segments with the same character and the same action category to obtain a new video segment, and updating the video frame numbers, the video frame starting time, the video frame character numbers and the action category and the character feature vectors corresponding to each video frame number corresponding to the new video segment.
Preferably, for each video clip, the person recognition module is configured to perform person position recognition by using a video image set corresponding to the video clip as input and using a target recognition model constructed based on YOLO algorithm to obtain a person and a person position in the video clip.
Preferably, for each video clip, the action recognition module is configured to perform action recognition by using the video image set and the character positions corresponding to the video clip as input, through an action recognition model constructed based on the SlowFast algorithm, so as to obtain the action of each character.
Preferably, the video clip merging module is configured to perform the following:
for the characters in each frame of video image in the video clip, carrying out feature extraction, based on the character position, on the image area at the character position in the video image through a Siamese (twin) network to obtain a multi-dimensional feature vector as the character feature vector, each character corresponding to its own character feature vector;
constructing characterization information based on the video frame number, the video frame start time, the video frame character numbers, the action results corresponding to the video frame character numbers, and the character feature vectors of the video clip;
according to the time sequence of the video stream, carrying out feature comparison on two adjacent video clips in the video stream based on the character feature vectors, and applying the following feature comparison principle: if the comparison result of the character feature vectors of the two adjacent video clips meets the threshold, the two clips are judged to contain the same character and their action results are compared; if the action comparison result also meets the threshold, the two clips are judged to be a continuation of the same action of the same character and are merged into a new video clip, the video frame start time corresponding to the new video clip is updated, and the video frame number, the video frame character numbers, and the actions and character feature vectors corresponding to the video frame character numbers take the values of the earlier video clip in the time sequence;
and carrying out feature comparison between the new video clip and the next video clip that has not yet been compared, based on the character feature vectors, and applying the feature comparison principle until the character feature comparison result does not meet the threshold, or the character feature comparison result meets the threshold but the action comparison result does not meet the threshold.
The method and system for acquiring character action fragments based on character actions in video according to the present invention have the following advantages: the video stream is segmented at equal time intervals into a plurality of video clips; for each video clip, the video frames are numbered, and the character positions and action categories of the characters in the clip are obtained; character feature vectors are extracted based on the character positions; feature matching is performed on the video clips through the character feature vectors, and video clips of the same character and the same action category are merged into a new video clip. In this way, accurate clip information of a character action is obtained, character actions can be acquired quickly, and the starting time of each action can be given accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a block flow diagram of a method for obtaining character action fragments based on character actions in video according to embodiment 1;
FIG. 2 is a block flow diagram of motion recognition in a method for obtaining a character motion segment based on a character motion in a video according to embodiment 1;
FIG. 3 is a flowchart of character feature comparison and video clip merging in the method for obtaining character action fragments based on character actions in video according to embodiment 1.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples, so that those skilled in the art can better understand the invention and implement it, but the examples are not meant to limit the invention, and the technical features of the embodiments of the invention and the examples can be combined with each other without conflict.
The embodiment of the invention provides a method and a system for acquiring character action fragments based on character actions in video, which are used to solve the technical problem of how to quickly acquire character actions in a video and give the accurate starting time of each action.
Example 1:
The invention discloses a method for acquiring character action fragments based on character actions in video, which comprises the following steps:
s100, segmenting an acquired video stream into a plurality of video clips at equal time intervals, constructing a video image set based on video images corresponding to the video clips for each video clip, numbering the video clips to obtain video frame numbers, and recording the starting time of video frames in the video clips;
s200, for each video clip, carrying out character position identification based on a view image set corresponding to the video clip to obtain characters in the video image and character positions corresponding to each character;
s300, for each video clip, carrying out character action recognition by using a video image set corresponding to the video clip and the character position of each character to obtain the action type of each character, and numbering each character to obtain the character number of the video frame;
s400, for all video clips in the video stream, performing feature matching on the video clips based on the character feature vectors, merging the video clips with the same character and the same action category to obtain a new video clip, and updating the video frame numbers, the video frame start time, the video frame character numbers and the action category and the character feature vectors corresponding to each video frame number corresponding to the new video clip.
In this embodiment, step S100 intercepts small video clips from the video stream at a fixed interval, for example 2 seconds per clip, uniquely numbers the video frame picture set of each clip to obtain the video frame number, and records the start and end positions of the video frames, where the start position is understood as the start time.
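By way of illustration only, the following is a minimal sketch of this clip-segmentation step, assuming OpenCV is available and that the stream can be opened from a file path or URL; the 2-second clip length, the VideoClip container and its field names are illustrative assumptions, not part of the patent.

```python
# Minimal sketch of step S100 (assumed implementation, not the patented method itself):
# split a video stream into equal-length clips, number each clip, and record its start time.
import cv2
from dataclasses import dataclass, field

@dataclass
class VideoClip:                 # hypothetical container for one small clip
    clip_number: int             # "video frame number" (unique clip id)
    start_time: float            # start time of the clip's first frame, in seconds
    frames: list = field(default_factory=list)

def split_stream(source: str, clip_seconds: float = 2.0) -> list:
    cap = cv2.VideoCapture(source)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    frames_per_clip = max(1, int(round(fps * clip_seconds)))
    clips, frame_idx = [], 0
    current = VideoClip(clip_number=0, start_time=0.0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx and frame_idx % frames_per_clip == 0:
            clips.append(current)
            current = VideoClip(clip_number=len(clips), start_time=frame_idx / fps)
        current.frames.append(frame)
        frame_idx += 1
    if current.frames:
        clips.append(current)
    cap.release()
    return clips
```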
In step S200 of this embodiment, for each video clip, a set of video images corresponding to the video clip is taken as input, and person position recognition is performed through a target recognition model constructed based on YOLO algorithm, so as to obtain a person and a person position in the video clip.
For the target recognition model, a model is first built according to the YOLO algorithm; a sample set consisting of video images is then acquired, and character categories and character position information are marked in the video images as label information. Model training and testing are carried out on the target recognition model based on the sample set and the label information to obtain a trained target recognition model, and character position recognition is carried out through the trained target recognition model to obtain the characters and character positions in the video clips.
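As a hedged illustration of the inference side of this step, the sketch below uses the Ultralytics YOLO package with pretrained COCO weights and keeps only the "person" class; the patent only requires a target recognition model built on the YOLO algorithm, so the specific package, weights file and class filtering are assumptions.

```python
# Sketch of person detection for the frames of one clip (assumed ultralytics-based implementation).
from ultralytics import YOLO

detector = YOLO("yolov8n.pt")   # assumed pretrained weights; a custom-trained model could be loaded instead

def detect_persons(frames):
    """Return, per frame, a list of (x1, y1, x2, y2) person bounding boxes."""
    boxes_per_frame = []
    for frame in frames:
        result = detector(frame, verbose=False)[0]
        persons = [tuple(map(int, box.xyxy[0].tolist()))
                   for box in result.boxes
                   if int(box.cls) == 0]     # COCO class 0 is "person"
        boxes_per_frame.append(persons)
    return boxes_per_frame
```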
In step S300 of this embodiment, for each video clip, the video image set and the character positions corresponding to the video clip are taken as input, and action recognition is performed through an action recognition model constructed based on the SlowFast algorithm, so as to obtain the action of each character.
For the action recognition model, a model is first built according to the SlowFast algorithm; a sample set consisting of video images is then acquired, and character categories and character position information are marked in the video images as label information. Model training and testing are carried out on the action recognition model based on the sample set and the label information to obtain a trained action recognition model, and character action recognition is carried out through the trained action recognition model to obtain the action category of each character in the video clips.
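To illustrate how a SlowFast model consumes a clip, the sketch below packs the clip's frames into the slow and fast pathways and runs the pretrained slowfast_r50 checkpoint published through torch.hub by the PyTorchVideo project; the speed ratio alpha, the frame count and resolution, and the use of this particular checkpoint are assumptions, not requirements of the patent.

```python
# Sketch of SlowFast inference for one clip (assumed use of the public slowfast_r50 checkpoint).
import torch

model = torch.hub.load("facebookresearch/pytorchvideo", "slowfast_r50", pretrained=True)
model.eval()

def pack_pathways(frames: torch.Tensor, alpha: int = 4):
    """frames: (C, T, H, W) normalized clip tensor (e.g. T=32, 256x256 for this checkpoint).
    Returns [slow, fast] pathway tensors with a batch dimension added."""
    fast = frames
    slow_idx = torch.linspace(0, frames.shape[1] - 1, frames.shape[1] // alpha).long()
    slow = torch.index_select(frames, 1, slow_idx)   # slow pathway keeps 1/alpha of the frames
    return [slow.unsqueeze(0), fast.unsqueeze(0)]

@torch.no_grad()
def classify_clip(frames: torch.Tensor) -> int:
    """Return the index of the most likely action class for one clip."""
    logits = model(pack_pathways(frames))
    return int(logits.argmax(dim=1))
```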
In this embodiment, step S400 performs character feature comparison and video clip merging; as a specific implementation, it includes the following operations:
(1) For the characters in each frame of video image in the video clip, feature extraction is carried out, based on the character position, on the image area at the character position in the video image through a Siamese (twin) network to obtain a multi-dimensional feature vector as the character feature vector, each character corresponding to its own character feature vector;
(2) Characterization information is constructed based on the video frame number, the video frame start time, the video frame character numbers, the action results corresponding to the video frame character numbers, and the character feature vectors of the video clip;
(3) According to the time sequence of the video stream, feature comparison is carried out on two adjacent video clips in the video stream based on the character feature vectors, and the following feature comparison principle is applied: if the comparison result of the character feature vectors of the two adjacent video clips meets the threshold, the two clips are judged to contain the same character and their action results are compared; if the action comparison result also meets the threshold, the two clips are judged to be a continuation of the same action of the same character and are merged into a new video clip, the video frame start time corresponding to the new video clip is updated, and the video frame number, the video frame character numbers, and the actions and character feature vectors corresponding to the video frame character numbers take the values of the earlier video clip in the time sequence;
(4) Feature comparison is then carried out between the new video clip and the next video clip that has not yet been compared, based on the character feature vectors, and the feature comparison principle is applied until the character feature comparison result does not meet the threshold, or the character feature comparison result meets the threshold but the action comparison result does not meet the threshold.
When the character feature vectors are compared, the character feature vectors are normalized, a multiplication operation is then carried out on the two normalized character feature vectors, and whether the two character feature vectors are identical or similar is judged based on the multiplication result.
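Normalizing both vectors and multiplying them is equivalent to computing their cosine similarity; a minimal sketch, assuming NumPy feature vectors and an illustrative threshold value:

```python
# Sketch of the character feature comparison: L2-normalize both vectors, take the inner
# product (cosine similarity), and compare it against a threshold.
import numpy as np

def same_character(vec_a: np.ndarray, vec_b: np.ndarray, threshold: float = 0.8) -> bool:
    a = vec_a / (np.linalg.norm(vec_a) + 1e-12)
    b = vec_b / (np.linalg.norm(vec_b) + 1e-12)
    return float(np.dot(a, b)) >= threshold   # 0.8 is an assumed threshold, not from the patent
```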
Based on the above operations, in the specific implementation process, the picture area at the human body position in each frame picture of a small video clip is represented as features according to the character position, and a multi-dimensional feature vector is obtained. If there are several characters, several feature vectors are generated, and these are combined with the video frame character numbers produced by the video character action recognition step to obtain the final characterization information, which comprises the video frame number, the video frame start and end positions, the video frame character numbers, and the character action and character feature vector corresponding to each video frame character number.
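The characterization information described above can be held in a simple per-clip record; the field names below are illustrative assumptions rather than terms defined by the patent:

```python
# Sketch of the per-clip characterization information (hypothetical field names).
from dataclasses import dataclass

@dataclass
class ClipRecord:
    frame_number: int          # video frame (clip) number
    start_time: float          # video frame start position, in seconds
    end_time: float            # video frame end position, in seconds
    character_numbers: list    # video frame character numbers
    actions: dict              # character number -> action category
    features: dict             # character number -> feature vector (e.g. numpy array)
```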
Then, according to the time sequence of the video, feature matching is performed on adjacent small video clips. If the character features of two adjacent small video clips are very similar and their action categories are the same, the two small clips can be regarded as a continuation of the same action of the same character; the information of the two clips is merged and the video frame lengths are combined to obtain new video frame start and end positions, while the other fields, namely the video frame number, the video frame character numbers, and the actions and character feature vectors corresponding to the video frame character numbers, take the values of the earlier clip in the time sequence. If the feature comparison shows that the features are dissimilar, the clips are interpreted as actions of different characters, no merging operation is performed, and they are treated as separate, different actions. Note that if the character features match but the action categories differ, the clips are regarded as different actions of the same character, and likewise no merging operation is performed. In this way, video clips of the same action are merged, and the accurate start and end positions of the action clip are obtained.
The above operation is carried out cyclically: the human body features of the new video clip and the next video clip are compared, and clips of the same action of the same character are merged, so as to obtain the start and end positions of the whole action.
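Putting the pieces together, the following is a minimal sketch of the cyclic comparison and merging, reusing the ClipRecord and same_character helpers assumed above; consistent with the rule described above, a merge only extends the end position while the earlier clip's numbering, character numbers, actions and feature vectors are kept.

```python
# Sketch of merging adjacent clips that show the same character performing the same action.
def merge_clips(records: list) -> list:
    """records: ClipRecord list in time order; returns the merged list of clips."""
    merged = []
    for rec in records:
        if merged:
            prev = merged[-1]
            # Continuation: some character in the previous clip matches a character in this
            # clip (feature comparison) and performs the same action category.
            continuation = any(
                same_character(prev.features[p], rec.features[c]) and prev.actions[p] == rec.actions[c]
                for p in prev.character_numbers
                for c in rec.character_numbers
            )
            if continuation:
                prev.end_time = rec.end_time   # extend the clip; earlier fields are retained
                continue
        merged.append(rec)
    return merged
```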
According to the method, the small action clips appearing in the video are stored and their features are generated; the small clips of the same action are then merged through feature matching to generate a large clip containing the whole action, so that the actions of characters in the video can be acquired quickly and the accurate start and end positions of the actions can be given.
Example 2:
The invention discloses a system for acquiring character action fragments based on character actions in video, which comprises a video segment acquisition module, a character recognition module, an action recognition module and a video segment merging module.
The video segment acquisition module is used for dividing an acquired video stream into a plurality of video segments at equal time intervals, constructing a video image set based on video images corresponding to the video segments for each video segment, numbering the video segments to obtain video frame numbers, and recording the starting time of video frames in the video segments.
In this embodiment, the video segment acquisition module is configured to intercept small video clips from the video stream at a fixed interval, for example 2 seconds per clip, to uniquely number the video frame picture set of each clip to obtain the video frame number, and to record the start and end positions of the video frames, where the start position is understood as the start time.
And for each video clip, the person identification module is used for carrying out person position identification based on the video image set corresponding to the video clip, so as to obtain the person in the video image and the person position corresponding to each person.
In this embodiment, for each video clip, the person recognition module is configured to perform person position recognition by using a set of video images corresponding to the video clip as input and using a target recognition model constructed based on YOLO algorithm to obtain a person and a person position in the video clip.
For the target recognition model, the person recognition module is configured to perform the following operations: a model is first built according to the YOLO algorithm; a sample set consisting of video images is then acquired, and character categories and character position information are marked in the video images as tag information; model training and testing are carried out on the target recognition model based on the sample set and the tag information to obtain a trained target recognition model; and character position recognition is carried out through the trained target recognition model to obtain the characters and character positions in the video clips. Alternatively, the person recognition module is configured with a target recognition model already trained through the above operations, and the trained target recognition model is called to recognize the character category and character position.
And for each video clip, the action recognition module is used for carrying out character action recognition according to the video image set corresponding to the video clip and the character position of each character to obtain the action type of each character, and numbering each character to obtain the character number of the video frame.
For each video clip, the action recognition module in this embodiment takes the video image set and the character positions corresponding to the video clip as input, and performs action recognition through an action recognition model constructed based on the SlowFast algorithm to obtain the action of each character.
For the action recognition model, the action recognition module is configured to perform the following operations: a model is first built according to the SlowFast algorithm; a sample set consisting of video images is then acquired, and character categories and character position information are marked in the video images as tag information; model training and testing are carried out on the action recognition model based on the sample set and the tag information to obtain a trained action recognition model; and character action recognition is carried out through the trained action recognition model to obtain the action category of each character in the video clips. Alternatively, the action recognition module is configured with an action recognition model already trained through the above operations, and the trained action recognition model is called to recognize the action category.
And for all video clips in the video stream, the video clip merging module is used for carrying out feature comparison on the video clips based on time sequence, merging the video clips with the same person and the same action result to obtain a new video clip, and updating the video frame number, the video frame person number and the video frame starting time corresponding to the new video clip.
In this embodiment, the video segment merging module is configured to compare character features and merge video segments; as a specific implementation, it is configured to perform the following operations:
(1) For the characters in each frame of video image in the video clip, feature extraction is carried out, based on the character position, on the image area at the character position in the video image through a Siamese (twin) network to obtain a multi-dimensional feature vector as the character feature vector, each character corresponding to its own character feature vector;
(2) Characterization information is constructed based on the video frame number, the video frame start time, the video frame character numbers, the action results corresponding to the video frame character numbers, and the character feature vectors of the video clip;
(3) According to the time sequence of the video stream, feature comparison is carried out on two adjacent video clips in the video stream based on the character feature vectors, and the following feature comparison principle is applied: if the comparison result of the character feature vectors of the two adjacent video clips meets the threshold, the two clips are judged to contain the same character and their action results are compared; if the action comparison result also meets the threshold, the two clips are judged to be a continuation of the same action of the same character and are merged into a new video clip, the video frame start time corresponding to the new video clip is updated, and the video frame number, the video frame character numbers, and the actions and character feature vectors corresponding to the video frame character numbers take the values of the earlier video clip in the time sequence;
(4) Feature comparison is then carried out between the new video clip and the next video clip that has not yet been compared, based on the character feature vectors, and the feature comparison principle is applied until the character feature comparison result does not meet the threshold, or the character feature comparison result meets the threshold but the action comparison result does not meet the threshold.
When the character feature vectors are compared, the character feature vectors are normalized, a multiplication operation is then carried out on the two normalized character feature vectors, and whether the two character feature vectors are identical or similar is judged based on the multiplication result.
Based on the specific operation, the module workflow is as follows:
Firstly, the picture area at the human body position in each frame picture of a small video clip is represented as features according to the character position, and a multi-dimensional feature vector is obtained. If there are several characters, several feature vectors are generated, and these are combined with the video frame character numbers produced by the video character action recognition step to obtain the final characterization information, which comprises the video frame number, the video frame start and end positions, the video frame character numbers, and the character action and character feature vector corresponding to each video frame character number;
Then, according to the time sequence of the video, feature matching is performed on adjacent small video clips. If the character features of two adjacent small video clips are very similar and their action categories are the same, the two small clips can be regarded as a continuation of the same action of the same character; the information of the two clips is merged and the video frame lengths are combined to obtain new video frame start and end positions, while the other fields, namely the video frame number, the video frame character numbers, and the actions and character feature vectors corresponding to the video frame character numbers, take the values of the earlier clip in the time sequence. If the feature comparison shows that the features are dissimilar, the clips are interpreted as actions of different characters, no merging operation is performed, and they are treated as separate, different actions. Note that if the character features match but the action categories differ, the clips are regarded as different actions of the same character, and likewise no merging operation is performed. In this way, video clips of the same action are merged, and the accurate start and end positions of the action clip are obtained.
Finally, the above operation is carried out cyclically: the human body features of the new video clip and the next video clip are compared, and clips of the same action of the same character are merged, so as to obtain the start and end positions of the whole action.
While the invention has been illustrated and described in detail in the drawings and in the preferred embodiments, the invention is not limited to the disclosed embodiments, and it will be appreciated by those skilled in the art that the technical features of the various embodiments described above may be combined to produce further embodiments of the invention, which are also within the scope of the invention.

Claims (8)

1. A method for obtaining character action fragments based on character actions in video, comprising the steps of:
dividing the acquired video stream into a plurality of video clips at equal time intervals, constructing a video image set based on video images corresponding to the video clips for each video clip, numbering the video clips to obtain video frame numbers, and recording the starting time of video frames in the video clips;
for each video clip, carrying out character position identification based on the video image set corresponding to the video clip to obtain characters in the video image and character positions corresponding to each character;
for each video clip, carrying out character action recognition by using a video image set corresponding to the video clip and the character position of each character to obtain the action type of each character, and numbering each character to obtain the character number of the video frame;
and for all video clips in the video stream, carrying out feature matching on the video clips based on the character feature vectors, combining the video clips with the same character and the same action category to obtain a new video clip, and updating the video frame numbers, the video frame starting time, the video frame character numbers and the action category and the character feature vector corresponding to each video frame number corresponding to the new video clip.
2. The method for obtaining a character action segment based on a character action in video according to claim 1, wherein for each video segment, a set of video images corresponding to the video segment is taken as input, and character position recognition is performed through a target recognition model constructed based on YOLO algorithm, so as to obtain a character and a character position in the video segment.
3. The method for obtaining character action fragments based on character actions in video according to claim 1, wherein for each video clip, action recognition is performed through an action recognition model constructed based on the SlowFast algorithm, with the video image set and the character positions corresponding to the video clip as inputs, to obtain the action of each character.
4. A method for obtaining character action fragments based on character actions in video according to any one of claims 1-3, wherein for all video fragments in a video stream, feature comparison is performed on the video fragments based on time sequence, and the video fragments of the same character and having the same action result are combined, comprising the steps of:
for the characters in each frame of video image in the video clip, carrying out feature extraction, based on the character position, on the image area at the character position in the video image through a Siamese (twin) network to obtain a multi-dimensional feature vector as the character feature vector, each character corresponding to its own character feature vector;
constructing characterization information based on the video frame number, the video frame start time, the video frame character numbers, the action results corresponding to the video frame character numbers, and the character feature vectors of the video clip;
according to the time sequence of the video stream, carrying out feature comparison on two adjacent video clips in the video stream based on the character feature vectors, and applying the following feature comparison principle: if the comparison result of the character feature vectors of the two adjacent video clips meets the threshold, the two clips are judged to contain the same character and their action results are compared; if the action comparison result also meets the threshold, the two clips are judged to be a continuation of the same action of the same character and are merged into a new video clip, the video frame start time corresponding to the new video clip is updated, and the video frame number, the video frame character numbers, and the actions and character feature vectors corresponding to the video frame character numbers take the values of the earlier video clip in the time sequence;
and carrying out feature comparison between the new video clip and the next video clip that has not yet been compared, based on the character feature vectors, and applying the feature comparison principle until the character feature comparison result does not meet the threshold, or the character feature comparison result meets the threshold but the action comparison result does not meet the threshold.
5. A system for obtaining a character action fragment based on a character action in a video, wherein the system comprises:
the video segment acquisition module is used for dividing an acquired video stream into a plurality of video segments at equal time intervals, constructing a video image set based on video images corresponding to the video segments for each video segment, numbering the video segments to obtain video frame numbers, and recording the starting time of video frames in the video segments;
the character recognition module is used for recognizing the character positions of each video clip based on the video image set corresponding to the video clip to obtain characters in the video image and the character positions corresponding to each character;
the motion recognition module is used for recognizing the motions of the characters according to the video image set corresponding to the video clips and the character positions of the characters to obtain the motion types of the characters, numbering the characters and obtaining the character numbers of the video frames;
and the video segment merging module is used for carrying out feature matching on the video segments based on the character feature vectors, merging the video segments with the same character and the same action category to obtain a new video segment, and updating the video frame numbers, the video frame starting time, the video frame character numbers and the action category and the character feature vectors corresponding to each video frame number corresponding to the new video segment.
6. The system for capturing segments of human actions based on human actions in video of claim 5 wherein for each video segment, said human recognition module is configured to perform human location recognition on a target recognition model constructed based on YOLO algorithm with a set of video images corresponding to said video segment as input to obtain a human and a human location in the video segment.
7. The system for obtaining character action fragments based on character actions in video according to claim 5, wherein for each video clip, the action recognition module is configured to perform action recognition through an action recognition model constructed based on the SlowFast algorithm, with the video image set and the character positions corresponding to the video clip as inputs, to obtain the action of each character.
8. The system for capturing character action segments based on character actions in video according to any one of claims 5-7, wherein the video segment merging module is configured to perform the following:
for the characters in each frame of video image in the video clip, carrying out feature extraction, based on the character position, on the image area at the character position in the video image through a Siamese (twin) network to obtain a multi-dimensional feature vector as the character feature vector, each character corresponding to its own character feature vector;
constructing characterization information based on the video frame number, the video frame start time, the video frame character numbers, the action results corresponding to the video frame character numbers, and the character feature vectors of the video clip;
according to the time sequence of the video stream, carrying out feature comparison on two adjacent video clips in the video stream based on the character feature vectors, and applying the following feature comparison principle: if the comparison result of the character feature vectors of the two adjacent video clips meets the threshold, the two clips are judged to contain the same character and their action results are compared; if the action comparison result also meets the threshold, the two clips are judged to be a continuation of the same action of the same character and are merged into a new video clip, the video frame start time corresponding to the new video clip is updated, and the video frame number, the video frame character numbers, and the actions and character feature vectors corresponding to the video frame character numbers take the values of the earlier video clip in the time sequence;
and carrying out feature comparison between the new video clip and the next video clip that has not yet been compared, based on the character feature vectors, and applying the feature comparison principle until the character feature comparison result does not meet the threshold, or the character feature comparison result meets the threshold but the action comparison result does not meet the threshold.
CN202310395288.6A 2023-04-10 2023-04-10 Method and system for obtaining character action fragments based on character actions in video Pending CN116469167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310395288.6A CN116469167A (en) 2023-04-10 2023-04-10 Method and system for obtaining character action fragments based on character actions in video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310395288.6A CN116469167A (en) 2023-04-10 2023-04-10 Method and system for obtaining character action fragments based on character actions in video

Publications (1)

Publication Number Publication Date
CN116469167A true CN116469167A (en) 2023-07-21

Family

ID=87183725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310395288.6A Pending CN116469167A (en) 2023-04-10 2023-04-10 Method and system for obtaining character action fragments based on character actions in video

Country Status (1)

Country Link
CN (1) CN116469167A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117676245A (en) * 2024-01-31 2024-03-08 深圳市积加创新技术有限公司 Context video generation method and device
CN117676245B (en) * 2024-01-31 2024-06-11 深圳市积加创新技术有限公司 Context video generation method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination