CN113269854B - Method for intelligently generating interview-type comprehensive programs - Google Patents

Method for intelligently generating interview-type comprehensive programs

Info

Publication number
CN113269854B
CN113269854B (application CN202110803384.0A, published as CN113269854A)
Authority
CN
China
Prior art keywords
face
frame
video
channel
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110803384.0A
Other languages
Chinese (zh)
Other versions
CN113269854A (en)
Inventor
袁琦
李�杰
杨瀚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sobei Video Cloud Computing Co ltd
Original Assignee
Chengdu Sobei Video Cloud Computing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sobei Video Cloud Computing Co ltd filed Critical Chengdu Sobei Video Cloud Computing Co ltd
Priority to CN202110803384.0A priority Critical patent/CN113269854B/en
Publication of CN113269854A publication Critical patent/CN113269854A/en
Application granted granted Critical
Publication of CN113269854B publication Critical patent/CN113269854B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation

Abstract

The invention discloses a method for intelligently generating interview-type variety programs, which comprises the following steps: S1, recording, through multichannel recording software, the program videos shot by a plurality of cameras at the program site; S2, setting the role played by each channel material according to the picture shot by its camera; S3, extracting video features from each channel material; S4, generating a plurality of candidate video clips for each channel according to the extracted video features; and S5, selecting candidate video clips according to predefined rules and synthesizing the rough cut of the program. The invention can quickly generate a rough cut that later-stage editors can quickly trim into the finished film, reducing the manual workload.

Description

Method for intelligently generating interview-type comprehensive programs
Technical Field
The invention relates to the field of video program synthesis, and in particular to a method for intelligently generating interview-type variety programs.
Background
An interview-type program is a television format with a relaxed and pleasant atmosphere, built mainly around conversation between a host and guests on a given topic. An interview-type variety program is an interview program aimed chiefly at entertainment and relaxation, with additional variety elements and comedic situation design added for dramatic effect, and is therefore widely favored. Its guests are mainly entertainment celebrities and sports stars, so such programs tend to be very popular among young audiences. Although, unlike other variety programs, they are usually shot on a single set and stage, a large number of cameras must still be arranged on site. During shooting, the pictures taken from different angles and at different scene scales are used to synthesize the rough cut of the program through a series of complicated operations, such as real-time coordination between the on-site director and each camera crew and shot switching, which requires the director to have rich command experience and strong on-site ability.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for intelligently generating interview-type variety programs, which can quickly generate a rough cut for later-stage editors to trim into the finished film, thereby reducing the manual workload.
The purpose of the invention is realized by the following scheme:
a method for intelligently generating interview-type integrated art programs comprises the following steps:
s1, recording program video materials shot by a plurality of cameras on a program site through multichannel recording software;
s2, setting the role played by each channel material according to the camera shooting picture in the program video;
s3, extracting video characteristics of each channel material;
s4, generating a plurality of candidate video clips in each channel according to the extracted video features;
and S5, selecting candidate video clips according to predefined rules, and synthesizing the rough cut of the program.
Further, in step S2, the setting of the role played by each channel material includes the following steps: the channel materials are divided into three categories according to shot scale, namely close shot, medium shot and long shot; the picture of a close shot is a close-up of a guest or the host; the picture of a medium shot is interaction between guests, between guests and the host, or between hosts; the picture of a long shot is the whole stage.
Further, in step S3, the following steps are included:
s31, establishing a face library containing the host and the guest of the field program;
s32, performing face recognition analysis on the video material of each channel, and extracting face frame coordinates, face 68 key point coordinates and corresponding names in each frame;
s33, performing picture stability analysis on the video material of each channel, and marking a blurred picture caused by camera movement or focusing error;
and S34, using the data in the step S31 and the face key point data of the same person in continuous time dimension, carrying out mouth shape analysis and judging whether the person is speaking in the set time.
Further, in step S31, assuming the program involves M persons in total, single photos of the host and guests appearing in the program are collected from the Internet, one photo per person, and 512-dimensional face features are extracted through a face recognition network as the representation of each person, yielding an M×512 feature matrix E and an M×1 name matrix P, where M is an integer and E_{i,j} and P_{i,j} denote the element in row i and column j of E and P, respectively.
Further, in step S32, suppose there are N channels of video material, each consisting of K frames, with all frames aligned on the timeline. Face recognition is performed on the t-th frame image I_{n,t} of the n-th material V_n to obtain the processing result set of the frame,

R_{n,t} = {E_t, B_t, L_t, N_t},

where E_t denotes the face feature matrix extracted from frame t, with m_t the number of detected faces; E_t^j denotes the j-th face feature extracted from frame t; B_t denotes all the face frames detected in frame t, and B_t^j the j-th face frame; L_t denotes all the face key points detected in frame t, and L_t^j the key points of the j-th face; N_t denotes the names recognized for the faces detected in frame t, and N_t^j the name of the j-th person, obtained as

N_t^j = P_k, with k = argmax_i sim(E_t^j, E_i),

that is, the name with the highest similarity in the face library is taken as the name corresponding to the face, where P_k denotes the k-th name, argmax takes the index corresponding to the maximum value, and sim(·,·) is the similarity function. The result of extracting video features from all materials is expressed as R = {R_{n,t} | n = 1, …, N; t = 1, …, K}.
Further, in step S33, for the t-th frame image I_{n,t} of the n-th material V_n, with width W and height H, a picture stability score S_{n,t} is computed to characterize whether the frame picture is stable:

F = fftshift(FFT(gray(I_{n,t}))),
A = |F|,
θ = α · max(A),
S_{n,t} = C / (W · H),

where gray(·) converts the frame image to a grayscale image, FFT(·) denotes the Fourier transform, fftshift(·) shifts the zero-frequency component to the center of the spectrum, |·| takes the absolute value, A is the absolute value of F, F is the result of transforming the grayscale image of I_{n,t} to the frequency domain and shifting the zero-frequency component to the center of the spectrum, θ is a threshold set as a fixed fraction α of the maximum value in A, and C is the number of pixels in A greater than the threshold. When S_{n,t} is larger than a set empirical value, the picture of image I_{n,t} is considered stable.
Further, in step S34, for the n-th material V_n, a fixed time window of size T (i.e., a fixed duration T) is taken, and the face key point data of the same person p within the window, {L_p^1, L_p^2, …, L_p^T}, is used to compute the mouth area at each time step,

A_p^j = area(L_p^j),

and then the variance of the person's mouth area over the window,

V_p = (1/T) · Σ_{j=1}^{T} (A_p^j − Ā_p)²,

where Ā_p denotes the mean mouth area of the person over the window, L_p^j denotes the face key points of person p at time j, and area(·) computes the mouth area from the key points. When V_p is larger than a set empirical value, the person named p is considered to be speaking during the time period and is marked as a speaker.
Further, in step S4, the following steps are included:
S41, generating initial candidate video clips for each channel according to the picture stability results obtained by analyzing each channel material in step S33: for the all-frame analysis results {S_{n,1}, S_{n,2}, …, S_{n,K}} of the n-th material V_n, all results are traversed; when S_{n,t} is greater than the set empirical value, frame t is marked as the in point of the current candidate segment; the subsequent results are traversed continuously, and when S_{n,t} is less than or equal to the set empirical value, frame t is marked as the out point of the current candidate segment; and so on, generating for material V_n an initial candidate segment list C_n = {c_1, c_2, …, c_{Q_n}} containing Q_n candidate segments;
S42, traversing the initial candidate segment list generated in S41
Figure 843844DEST_PATH_IMAGE068
Comparing the current segment
Figure 145512DEST_PATH_IMAGE069
Out point of
Figure 302824DEST_PATH_IMAGE070
With the next segment
Figure 221102DEST_PATH_IMAGE071
In the point of entry
Figure 372596DEST_PATH_IMAGE072
If, if
Figure 59929DEST_PATH_IMAGE073
If the value is larger than the set empirical value, the segment is divided
Figure 388142DEST_PATH_IMAGE069
And fragments thereof
Figure 262557DEST_PATH_IMAGE071
Are combined into
Figure 967208DEST_PATH_IMAGE074
At the point of entry is
Figure 509048DEST_PATH_IMAGE069
In the point of entry
Figure 8162DEST_PATH_IMAGE075
At the point of departure is
Figure 635453DEST_PATH_IMAGE071
Out point of
Figure 612636DEST_PATH_IMAGE076
And so on, generating a final candidate segment list
Figure 510447DEST_PATH_IMAGE077
Further, in step S5, the following steps are included:
S51, setting a priority for each channel material according to the scale of its shot picture;
S52, combining the final candidate segment lists C'_1, …, C'_N of the N channel materials from step S42 with the speaker marking results of step S34, the segments in the final candidate list of each channel material are filled into the rough-cut timeline according to the following rules (the higher the priority, the earlier the rule is applied) to obtain the final composite video:
the segment is a close shot, there is a speaker, and the speaker is a guest;
the segment is a close shot, there is a speaker, and the speaker is the host;
the segment is a medium shot, there are speakers, and the number of speakers is not more than 3;
the segment is a long shot.
Further, in step S51, the priority is set as: close shot > medium shot > long shot. Further, in step S52, a timeline gap-filling method is adopted according to the above rules: at the current time, the most suitable candidate segment is selected from the final candidate lists C'_n, the segment is filled into the timeline used to generate the rough cut, the current time is updated to the time corresponding to the out point of that candidate segment, and this is repeated until the timeline for generating the rough cut is completely filled.
The beneficial effects of the invention include:
(1) By observing the on-site directing and shot-switching logic used when shooting interview-type variety programs, the method of the invention provides a rough-cut generation approach based on video face recognition, speaker recognition and picture stability analysis, which extracts the most appropriate shot segments from pictures taken from different angles and automatically generates the rough cut of an interview-type variety program, thereby reducing the workload of the director and of later-stage program editors.
(2) The invention provides a simple and efficient method for automatically synthesizing the rough cut of an interview-type variety program with only a small amount of presetting. Specifically, roles are assigned to the pictures shot by the different on-site cameras according to shot scale, the host and guests are labeled through face recognition, speakers are labeled through mouth-shape analysis, invalid shots are filtered out by computing picture stability scores to generate candidate video clip lists, and finally all candidate video clips are combined according to the rules to generate the rough cut of the program. The method thereby quickly generates a rough cut that later-stage editors can trim into the finished film, reducing the manual workload.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of steps in an embodiment of a method of the present invention;
FIG. 2 is a flow chart of the method embodiment of the present invention for extracting visual features from a channel material.
Detailed Description
All features disclosed in the embodiments of this specification, and all steps of any method or process disclosed, may be combined, expanded or substituted in any way, except where features and/or steps are mutually exclusive.
As shown in fig. 1 and 2, a method for intelligently generating interview-type variety programs includes the following steps:
s1, recording program videos shot by a plurality of cameras on a program site through multichannel recording software;
for example, in this step, videos of programs shot by 6 cameras at the scene of the program "when going on the spring and evening" are recorded, respectively
Figure 592355DEST_PATH_IMAGE079
(ii) a Other programs may also be recorded, and the number of cameras may be 8, 10, 12, and the like, which is not described herein again.
S2, setting the role played by each channel material according to the camera shooting picture in the program video;
The role played by each channel material is set according to the pictures shot by its camera. Specifically, the channels whose fixed cameras shoot close-ups are set as close shots, the channels whose fixed cameras shoot medium shots are set as medium shots, the channels whose fixed cameras shoot the whole stage are set as long shots, and the channel shot by the rocker-arm camera is also set as a long shot.
In step S2, the setting of the role played by each channel material includes the following steps: the channel materials are divided into three categories according to shot scale, namely close shot, medium shot and long shot; the picture of a close shot is a close-up of a guest or the host; the picture of a medium shot is interaction between guests, between guests and the host, or between hosts; the picture of a long shot is the whole stage.
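A minimal sketch of how this per-channel role configuration might be represented in practice; the channel identifiers, the exact assignment of shots to channels and the priority values are illustrative assumptions rather than values fixed by the method:

```python
# Illustrative channel-role configuration (assumed identifiers and assignment).
# Shot scale: "close" (close-up), "medium" (interaction), "long" (whole stage).
CHANNEL_ROLES = {
    "V1": {"shot": "close",  "camera": "fixed"},
    "V2": {"shot": "close",  "camera": "fixed"},
    "V3": {"shot": "medium", "camera": "fixed"},
    "V4": {"shot": "medium", "camera": "fixed"},
    "V5": {"shot": "long",   "camera": "fixed"},
    "V6": {"shot": "long",   "camera": "rocker-arm"},
}

# Priority used later in step S51: close shot > medium shot > long shot.
SHOT_PRIORITY = {"close": 3, "medium": 2, "long": 1}
```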
S3, extracting video features for each channel material, in step S3, the method includes the following steps:
s31, establishing a face library containing the host and the guest of the field program;
In step S31, assuming the program involves M persons in total, single photos of the host and guests appearing in the program are collected from the Internet, one photo per person, and 512-dimensional face features are extracted through a face recognition network as the representation of each person, yielding an M×512 feature matrix E and an M×1 name matrix P, where M is an integer and E_{i,j} and P_{i,j} denote the element in row i and column j of E and P, respectively.
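The following is a minimal sketch of building such a face library, in which the hypothetical extract_embedding function stands in for any 512-dimensional face recognition network; it is an assumed placeholder, not a component specified by the method:

```python
import numpy as np

def extract_embedding(photo_path: str) -> np.ndarray:
    """Placeholder for a face recognition network that returns a
    512-dimensional embedding for the single face in the photo."""
    raise NotImplementedError("plug in any 512-d face embedding model here")

def build_face_library(photos: dict) -> tuple:
    """photos maps a person's name to the path of a single photo.
    Returns (E, names): an M x 512 feature matrix and the list of M names."""
    names = list(photos.keys())                      # name matrix P (M x 1)
    feats = [extract_embedding(photos[n]) for n in names]
    E = np.stack(feats).astype(np.float32)           # feature matrix E (M x 512)
    # L2-normalise so that a dot product acts as cosine similarity later.
    E /= np.linalg.norm(E, axis=1, keepdims=True) + 1e-12
    return E, names
```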
S32, performing face recognition analysis on the video material of each channel, and extracting face frame coordinates, face 68 key point coordinates and corresponding names in each frame; in step S32, if there is any
Figure 756041DEST_PATH_IMAGE011
The video materials of the channels, here, N =6 (may be other numbers), each of which is a video material of one channel
Figure 546143DEST_PATH_IMAGE012
Frames, each frame having been aligned on a timeline, are passed through
Figure 566051DEST_PATH_IMAGE013
An individual material
Figure 99801DEST_PATH_IMAGE014
To (1) a
Figure 154344DEST_PATH_IMAGE015
Frame image
Figure 584189DEST_PATH_IMAGE016
Face recognition processing is carried out to obtain a processing result set of the frame
Figure 861367DEST_PATH_IMAGE017
Figure 933228DEST_PATH_IMAGE018
Wherein
Figure 842279DEST_PATH_IMAGE019
Denotes the first
Figure 974183DEST_PATH_IMAGE015
A face feature matrix obtained by frame extraction,
Figure 703104DEST_PATH_IMAGE020
in order to detect the number of faces,
Figure 578656DEST_PATH_IMAGE021
is shown as
Figure 607792DEST_PATH_IMAGE015
First of frame extraction
Figure 645018DEST_PATH_IMAGE010
The characteristics of the individual's face are,
Figure 893859DEST_PATH_IMAGE022
denotes the first
Figure 307523DEST_PATH_IMAGE015
All the face frames detected by the frame are,
Figure 925586DEST_PATH_IMAGE023
is shown as
Figure 399293DEST_PATH_IMAGE015
First of frame detection
Figure 368386DEST_PATH_IMAGE010
The number of the face frames is one,
Figure 585741DEST_PATH_IMAGE024
denotes the first
Figure 58310DEST_PATH_IMAGE015
The key points of all the faces detected by the frame,
Figure 702918DEST_PATH_IMAGE025
is shown as
Figure 159308DEST_PATH_IMAGE015
First of frame detection
Figure 413309DEST_PATH_IMAGE010
The key points of the face of the individual,
Figure 271544DEST_PATH_IMAGE026
denotes the first
Figure 821474DEST_PATH_IMAGE015
The face detected by the frame corresponds to the identified name,
Figure 765159DEST_PATH_IMAGE027
is shown as
Figure 58737DEST_PATH_IMAGE015
First of frame detection
Figure 771478DEST_PATH_IMAGE010
The name of the person corresponding to the individual person,
Figure 757889DEST_PATH_IMAGE028
namely, the name with the highest similarity in the face database is taken as the name corresponding to the face,
Figure 188870DEST_PATH_IMAGE029
is shown as
Figure 787604DEST_PATH_IMAGE030
The name of the individual person is used,
Figure 354851DEST_PATH_IMAGE031
indicating that the index corresponding to the maximum value is taken,
Figure 246584DEST_PATH_IMAGE032
representing a similarity calculation function. The result of extracting video special frames from all the materials is expressed as
Figure 899282DEST_PATH_IMAGE084
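A minimal sketch of the per-frame name assignment, assuming sim(·,·) is cosine similarity (the similarity function is not fixed by the text) and that a face detector has already produced the embeddings for the faces in one frame:

```python
import numpy as np

def assign_names(face_feats: np.ndarray, E: np.ndarray, names: list) -> list:
    """face_feats: m_t x 512 embeddings of the faces detected in one frame.
    E, names: the face library built in step S31 (E is L2-normalised).
    Returns the recognised name for each detected face (argmax of similarity)."""
    if face_feats.size == 0:
        return []
    f = face_feats / (np.linalg.norm(face_feats, axis=1, keepdims=True) + 1e-12)
    sims = f @ E.T                    # m_t x M cosine similarities
    best = sims.argmax(axis=1)        # index k with the highest similarity
    return [names[k] for k in best]
```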
S33, performing picture stability analysis on the video material of each channel, and marking a blurred picture caused by camera movement or focusing error; in step S33, for the second step
Figure 800242DEST_PATH_IMAGE013
An individual material
Figure 487575DEST_PATH_IMAGE034
To (1) a
Figure 815789DEST_PATH_IMAGE015
Frame image
Figure 955783DEST_PATH_IMAGE016
Given its width of
Figure 394855DEST_PATH_IMAGE035
High is
Figure 429370DEST_PATH_IMAGE036
By counting the picture stability scores
Figure 928485DEST_PATH_IMAGE037
To characterize whether the frame of image picture is stable,
Figure 290196DEST_PATH_IMAGE038
,
Figure 532958DEST_PATH_IMAGE039
,
Figure 194884DEST_PATH_IMAGE040
,
Figure 864900DEST_PATH_IMAGE041
,
wherein the content of the first and second substances,
Figure 713907DEST_PATH_IMAGE042
is to show to
Figure 760360DEST_PATH_IMAGE016
The frame image is taken as a gray-scale image,
Figure 778257DEST_PATH_IMAGE043
which represents the fourier transform of the signal,
Figure 619174DEST_PATH_IMAGE044
representing the conversion of the 0 frequency component to the center of the spectrum,
Figure 689898DEST_PATH_IMAGE045
it is indicated that the absolute value is taken,
Figure 274463DEST_PATH_IMAGE046
is composed of
Figure 910981DEST_PATH_IMAGE047
The absolute value of (a) is,
Figure 922799DEST_PATH_IMAGE047
is composed of
Figure 746399DEST_PATH_IMAGE016
The grayscale map of (a) is transformed to the frequency domain and the 0-frequency component is converted to the result of the center of the frequency spectrum,
Figure 869076DEST_PATH_IMAGE048
is a threshold value set as
Figure 360100DEST_PATH_IMAGE046
Of medium maximum value
Figure 41355DEST_PATH_IMAGE049
Figure 352250DEST_PATH_IMAGE050
Is composed of
Figure 13039DEST_PATH_IMAGE046
The number of pixels greater than the threshold value in
Figure 624149DEST_PATH_IMAGE037
When the image is larger than a certain preset value, the image is represented
Figure 712191DEST_PATH_IMAGE016
And (5) stabilizing the picture. In the present embodiment, for example, the preset value is taken as
Figure 510382DEST_PATH_IMAGE085
I.e. by
Figure 974862DEST_PATH_IMAGE086
Then represent the image
Figure 440478DEST_PATH_IMAGE016
And (5) stabilizing the picture.
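A minimal sketch of this frequency-domain stability score using NumPy; the fraction used for the spectrum threshold (1/1000 of the maximum magnitude) is an assumption for illustration, while the 0.002 decision value is the example given in the embodiment:

```python
import numpy as np

def stability_score(frame_rgb: np.ndarray) -> float:
    """Picture stability score S for one frame (H x W x 3 array).
    S = (number of spectrum magnitudes above a threshold) / (W * H)."""
    gray = frame_rgb.mean(axis=2)                # simple grayscale conversion
    spec = np.fft.fftshift(np.fft.fft2(gray))    # centred 2-D spectrum
    mag = np.abs(spec)
    theta = mag.max() / 1000.0                   # assumed fraction of the maximum
    count = int((mag > theta).sum())
    h, w = gray.shape
    return count / float(w * h)

def is_stable(frame_rgb: np.ndarray, preset: float = 0.002) -> bool:
    """Frame is considered stable when the score exceeds the preset value."""
    return stability_score(frame_rgb) > preset
```

Sharp frames keep more high-frequency energy, so a larger share of spectrum magnitudes exceeds the threshold and the score rises above the preset value, while blurred frames fall below it.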
And S34, using the data from step S31 together with the face key point data of the same person over a continuous time dimension, mouth-shape analysis is performed to judge whether the person is speaking within the set time. In step S34, for the n-th material V_n, a fixed time window of duration T is taken, and the face key point data of the same person p within the window, {L_p^1, L_p^2, …, L_p^T}, is used to compute the mouth area at each time step,

A_p^j = area(L_p^j),

and then the variance of the person's mouth area over the window,

V_p = (1/T) · Σ_{j=1}^{T} (A_p^j − Ā_p)²,

where Ā_p denotes the mean mouth area of the person over the window, L_p^j denotes the face key points of person p at time j, and area(·) computes the mouth area from the key points. When V_p is greater than a predetermined value (which may be 500, for example), the person named p is considered to be speaking during the time period and is marked as a speaker. In this embodiment T may, for example, be 250 units, selected according to the actual conditions.
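A minimal sketch of the mouth-area variance test, assuming the common 68-point landmark layout in which the outer lip contour occupies indices 48 to 59; this landmark convention and the shoelace-formula area are illustrative choices, not details fixed by the method:

```python
import numpy as np

MOUTH_IDX = list(range(48, 60))   # outer-lip points in the usual 68-point layout

def mouth_area(landmarks: np.ndarray) -> float:
    """Polygon area of the outer lip contour (landmarks: 68 x 2 array),
    computed with the shoelace formula."""
    pts = landmarks[MOUTH_IDX]
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def is_speaking(landmark_seq: list, var_threshold: float = 500.0) -> bool:
    """landmark_seq: one person's 68-point landmarks over a window of T frames.
    The person is marked as a speaker when the variance of the mouth area
    over the window exceeds the threshold (500 in the embodiment)."""
    areas = np.array([mouth_area(lm) for lm in landmark_seq])
    return float(areas.var()) > var_threshold
```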
S4, generating a plurality of candidate video clips in each channel according to the extracted video features; in step S4, the method includes the steps of:
S41, generating initial candidate video clips for each channel according to the picture stability results obtained by analyzing each channel material in step S33: for the all-frame analysis results {S_{n,1}, S_{n,2}, …, S_{n,K}} of the n-th material V_n, all results are traversed; when S_{n,t} is greater than a certain preset value (here the preset value can be 0.002, depending on the program), frame t is marked as the in point of the current candidate segment; the subsequent results are traversed continuously, and when S_{n,t} is less than or equal to the preset value (again 0.002 here, depending on the program), frame t is marked as the out point of the current candidate segment; and so on, generating for material V_n an initial candidate segment list C_n = {c_1, c_2, …, c_{Q_n}} containing Q_n candidate segments.
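A minimal sketch of turning the per-frame stability scores of one channel into initial candidate segments; the (in point, out point) frame-index convention used below is an assumption for illustration:

```python
def initial_segments(scores, threshold: float = 0.002):
    """scores: per-frame stability scores S_{n,1..K} of one channel.
    Returns a list of (in_frame, out_frame) pairs for runs of frames
    whose score exceeds the threshold."""
    segments, in_point = [], None
    for t, s in enumerate(scores):
        if s > threshold and in_point is None:
            in_point = t                        # open a candidate segment
        elif s <= threshold and in_point is not None:
            segments.append((in_point, t - 1))  # close it at the last stable frame
            in_point = None
    if in_point is not None:                    # segment still open at the end
        segments.append((in_point, len(scores) - 1))
    return segments
```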
S42, traversing the initial candidate segment list generated in S41
Figure 957380DEST_PATH_IMAGE068
Comparing the current segment
Figure 678211DEST_PATH_IMAGE069
Out point of
Figure 843613DEST_PATH_IMAGE070
With the next segment
Figure 940882DEST_PATH_IMAGE071
In the point of entry
Figure 773709DEST_PATH_IMAGE072
If, if
Figure 665442DEST_PATH_IMAGE073
Above a certain preset value (here, 50 frames) the segment is segmented
Figure 318140DEST_PATH_IMAGE069
And fragments thereof
Figure 219100DEST_PATH_IMAGE071
Are combined into
Figure 407898DEST_PATH_IMAGE074
At the point of entry is
Figure 736111DEST_PATH_IMAGE069
In the point of entry
Figure 610526DEST_PATH_IMAGE075
At the point of departure is
Figure 315177DEST_PATH_IMAGE071
Out point of
Figure 591438DEST_PATH_IMAGE076
And so on, generating a final candidate segment list
Figure 90552DEST_PATH_IMAGE077
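A minimal sketch of this merging pass, assuming the compared quantity is the gap between the next segment's in point and the current segment's out point and that the merge fires when it exceeds the empirical value, as stated above:

```python
def merge_segments(segments, gap_threshold: int = 50):
    """segments: list of (in_frame, out_frame) pairs sorted by in point.
    Adjacent segments are merged, following the rule described above, when the
    difference between the next in point and the current out point exceeds the
    threshold (50 frames in the embodiment)."""
    if not segments:
        return []
    merged = [segments[0]]
    for s_next, o_next in segments[1:]:
        s_cur, o_cur = merged[-1]
        if s_next - o_cur > gap_threshold:
            merged[-1] = (s_cur, o_next)  # keep c_i's in point, take c_{i+1}'s out point
        else:
            merged.append((s_next, o_next))
    return merged
```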
And S5, selecting candidate video clips according to predefined rules, and synthesizing the rough cut of the program. In step S5, the method includes the steps of:
S51, setting a priority for each channel material according to the scale of its shot picture. Specifically, among the 6 channel materials V_1 to V_6, the close-shot channels are given the highest priority, the medium-shot channels the second priority, and the long-shot channels the lowest priority;
S52, combining the final candidate segment lists C'_1, …, C'_N of the N channel materials from step S42 with the speaker marking results of step S34, the segments in the final candidate list of each channel material are filled into the rough-cut timeline according to the following rules to obtain the final composite video:
the segment is a close shot, there is a speaker, and the speaker is a guest;
the segment is a close shot, there is a speaker, and the speaker is the host;
the segment is a medium shot, there are speakers, and the number of speakers is not more than 3;
the segment is a long shot.
Further, in step S51, the priority is set as: close shot > medium shot > long shot. Further, in step S52, a timeline gap-filling method is adopted according to the above rules: at the current time, the most suitable candidate segment is selected from the final candidate lists C'_n, the segment is filled into the timeline used to generate the rough cut, the current time is updated to the time corresponding to the out point of that candidate segment, and this is repeated until the timeline for generating the rough cut is completely filled.
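A minimal sketch of this greedy timeline gap filling, assuming each candidate segment carries its channel, shot scale, time range and the marked speakers; the data layout, the role labels ("guest"/"host") and the scoring order are illustrative assumptions that mirror the rules above rather than a prescribed implementation:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Candidate:
    channel: str
    shot: str                  # "close", "medium" or "long"
    in_time: float
    out_time: float
    speakers: List[str] = field(default_factory=list)
    speaker_roles: List[str] = field(default_factory=list)  # "guest" / "host"

def rule_score(c: Candidate) -> int:
    """Higher score = higher-priority rule, mirroring the rules listed above."""
    if c.shot == "close" and c.speakers and "guest" in c.speaker_roles:
        return 4
    if c.shot == "close" and c.speakers and "host" in c.speaker_roles:
        return 3
    if c.shot == "medium" and 0 < len(c.speakers) <= 3:
        return 2
    if c.shot == "long":
        return 1
    return 0

def fill_timeline(candidates: List[Candidate], total_duration: float) -> List[Candidate]:
    """Greedy gap filling: from the current time, pick the best-scoring candidate
    that covers it, append it to the rough cut and jump to its out point."""
    timeline, now = [], 0.0
    while now < total_duration:
        covering = [c for c in candidates if c.in_time <= now < c.out_time]
        if not covering:
            now += 1.0            # no candidate covers this moment; skip ahead
            continue
        best = max(covering, key=rule_score)
        timeline.append(best)
        now = best.out_time
    return timeline
```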
The parts not involved in the present invention are the same as or can be implemented using the prior art.
The above-described embodiment is only one embodiment of the present invention, and it will be apparent to those skilled in the art that various modifications and variations can be easily made based on the application and principle of the present invention disclosed in the present application, and the present invention is not limited to the method described in the above-described embodiment of the present invention, so that the above-described embodiment is only preferred, and not restrictive.
Other embodiments than the above examples may be devised by those skilled in the art based on the foregoing disclosure, or by adapting and using knowledge or techniques of the relevant art, and features of various embodiments may be interchanged or substituted and such modifications and variations that may be made by those skilled in the art without departing from the spirit and scope of the present invention are intended to be within the scope of the following claims.
The functionality of the present invention, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), or an optical disk.

Claims (2)

1. A method for intelligently generating interview-type comprehensive programs is characterized by comprising the following steps:
s1, recording program videos shot by a plurality of cameras on a program site through multichannel recording software;
s2, setting the role played by each channel material according to the camera shooting picture in the program video; in step S2, the setting of the role played by each channel material includes the following steps: the channel materials are divided into three categories according to shot scale, namely close shot, medium shot and long shot; the picture of a close shot is a close-up of a guest or the host; the picture of a medium shot is interaction between guests, between guests and the host, or between hosts; the picture of a long shot is the whole stage;
s3, extracting video characteristics of each channel material; in step S3, the method includes the steps of:
S31, establishing a face library containing the host and guests of the program: assuming the program involves M persons in total, single photos of the host and guests appearing in the program are collected from the Internet, one photo per person, and 512-dimensional face features are extracted through a face recognition network as the representation of each person, yielding an M×512 feature matrix E and an M×1 name matrix P, where M is an integer and E_{i,j} and P_{i,j} denote the element in row i and column j of E and P, respectively;
S32, performing face recognition analysis on the video material of each channel, and extracting the face frame coordinates, the 68 face key point coordinates and the corresponding name in each frame: supposing there are N channels of video material, each consisting of K frames, with all frames aligned on the timeline, face recognition is performed on the t-th frame image I_{n,t} of the n-th material V_n to obtain the processing result set of the frame,

R_{n,t} = {E_t, B_t, L_t, N_t},

where E_t denotes the face feature matrix extracted from frame t, with m_t the number of detected faces; E_t^j denotes the j-th face feature extracted from frame t; B_t denotes all the face frames detected in frame t, and B_t^j the j-th face frame; L_t denotes all the face key points detected in frame t, and L_t^j the key points of the j-th face; N_t denotes the names recognized for the faces detected in frame t, and N_t^j the name of the j-th person, obtained as

N_t^j = P_k, with k = argmax_i sim(E_t^j, E_i),

that is, the name with the highest similarity in the face library is taken as the name corresponding to the face, where P_k denotes the k-th name, argmax takes the index corresponding to the maximum value, and sim(·,·) is the similarity function; the result of extracting video features from all materials is expressed as R = {R_{n,t} | n = 1, …, N; t = 1, …, K};
S33, performing picture stability analysis on the video material of each channel, and marking a blurred picture caused by camera movement or focusing error; in step S33, for the second step
Figure 883603DEST_PATH_IMAGE013
An individual material
Figure 344671DEST_PATH_IMAGE034
To (1) a
Figure 977778DEST_PATH_IMAGE015
Frame image
Figure 422666DEST_PATH_IMAGE016
Given its width of
Figure 166631DEST_PATH_IMAGE035
High is
Figure 13364DEST_PATH_IMAGE036
By counting the picture stability scores
Figure 817372DEST_PATH_IMAGE037
To characterize whether the frame of image picture is stable,
Figure 483977DEST_PATH_IMAGE038
,
Figure 757264DEST_PATH_IMAGE039
,
Figure 192925DEST_PATH_IMAGE040
,
Figure 167834DEST_PATH_IMAGE041
,
wherein the content of the first and second substances,
Figure 321735DEST_PATH_IMAGE042
is to show to
Figure 407503DEST_PATH_IMAGE016
The frame image is taken as a gray-scale image,
Figure 228828DEST_PATH_IMAGE043
which represents the fourier transform of the signal,
Figure 374639DEST_PATH_IMAGE044
representing the conversion of the 0 frequency component to the center of the spectrum,
Figure 15836DEST_PATH_IMAGE045
it is indicated that the absolute value is taken,
Figure 905294DEST_PATH_IMAGE046
is composed of
Figure 581126DEST_PATH_IMAGE047
The absolute value of (a) is,
Figure 163417DEST_PATH_IMAGE047
is composed of
Figure 288981DEST_PATH_IMAGE016
The grayscale map of (a) is transformed to the frequency domain and the 0-frequency component is converted to the result of the center of the frequency spectrum,
Figure 716551DEST_PATH_IMAGE048
is a threshold value set as
Figure 512469DEST_PATH_IMAGE046
Of medium maximum value
Figure 734502DEST_PATH_IMAGE049
Figure 350292DEST_PATH_IMAGE050
Is composed of
Figure 847132DEST_PATH_IMAGE046
The number of pixels greater than the threshold value in
Figure 497556DEST_PATH_IMAGE037
If the value is larger than the set empirical value, the image is represented
Figure 421650DEST_PATH_IMAGE016
The picture is stable;
S34, using the data from step S31 together with the face key point data of the same person over a continuous time dimension, performing mouth-shape analysis to judge whether the person is speaking within the set time: for the n-th material V_n, a fixed time window of size T is taken, and the face key point data of the same person p within the window, {L_p^1, L_p^2, …, L_p^T}, is used to compute the mouth area at each time step,

A_p^j = area(L_p^j),

and then the variance of the person's mouth area over the window,

V_p = (1/T) · Σ_{j=1}^{T} (A_p^j − Ā_p)²,

where Ā_p denotes the mean mouth area of the person over the window, L_p^j denotes the face key points of person p at time j, and area(·) computes the mouth area from the key points; when V_p is larger than a set empirical value, the person named p is considered to be speaking during the time period and is marked as a speaker;
s4, generating a plurality of candidate video clips in each channel according to the extracted video features; in step S4, the method includes the steps of:
S41, generating initial candidate video clips for each channel according to the picture stability results obtained by analyzing each channel material in step S33: for the all-frame analysis results {S_{n,1}, S_{n,2}, …, S_{n,K}} of the n-th material V_n, all results are traversed; when S_{n,t} is greater than the set empirical value, frame t is marked as the in point of the current candidate segment; the subsequent results are traversed continuously, and when S_{n,t} is less than or equal to the set empirical value, frame t is marked as the out point of the current candidate segment; and so on, generating for material V_n an initial candidate segment list C_n = {c_1, c_2, …, c_{Q_n}} containing Q_n candidate segments;
S42, traversing the initial candidate segment list generated in S41
Figure 891781DEST_PATH_IMAGE068
Comparing the current segment
Figure 924064DEST_PATH_IMAGE069
Out point of
Figure 949789DEST_PATH_IMAGE070
With the next segment
Figure 685664DEST_PATH_IMAGE071
In the point of entry
Figure 353405DEST_PATH_IMAGE072
If, if
Figure 225547DEST_PATH_IMAGE073
If the value is larger than the set empirical value, the segment is divided
Figure 422173DEST_PATH_IMAGE069
And fragments thereof
Figure 645344DEST_PATH_IMAGE071
Are combined into
Figure 585618DEST_PATH_IMAGE074
At the point of entry is
Figure 577845DEST_PATH_IMAGE069
In the point of entry
Figure 210951DEST_PATH_IMAGE075
At the point of departure is
Figure 655839DEST_PATH_IMAGE071
Out point of
Figure 399804DEST_PATH_IMAGE076
And so on, generating a final candidate segment list
Figure 246537DEST_PATH_IMAGE077
S5, selecting candidate video clips according to predefined rules and synthesizing the rough cut of the program; in step S5, the method includes the following steps:
S51, setting a priority for each channel material according to the scale of its shot picture;
S52, combining the final candidate segment lists C'_1, …, C'_N of the N channel materials from step S42 with the speaker marking results of step S34, the segments in the final candidate list of each channel material are filled into the rough-cut timeline according to the following rules to obtain the final composite video:
the segment is a close shot, there is a speaker, and the speaker is a guest;
the segment is a close shot, there is a speaker, and the speaker is the host;
the segment is a medium shot, there are speakers, and the number of speakers is not more than 3;
the segment is a long shot.
2. The method for intelligently generating interview-like variety programs according to claim 1, wherein in step S51, priority is set as: short shot > medium shot > long shot.
CN202110803384.0A 2021-07-16 2021-07-16 Method for intelligently generating interview-type comprehensive programs Active CN113269854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110803384.0A CN113269854B (en) 2021-07-16 2021-07-16 Method for intelligently generating interview-type comprehensive programs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110803384.0A CN113269854B (en) 2021-07-16 2021-07-16 Method for intelligently generating interview-type comprehensive programs

Publications (2)

Publication Number Publication Date
CN113269854A CN113269854A (en) 2021-08-17
CN113269854B true CN113269854B (en) 2021-10-15

Family

ID=77236586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110803384.0A Active CN113269854B (en) 2021-07-16 2021-07-16 Method for intelligently generating interview-type comprehensive programs

Country Status (1)

Country Link
CN (1) CN113269854B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115174962A (en) * 2022-07-22 2022-10-11 湖南芒果无际科技有限公司 Rehearsal simulation method and device, computer equipment and computer readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005091211A1 (en) * 2004-03-16 2005-09-29 3Vr Security, Inc. Interactive system for recognition analysis of multiple streams of video
CN104732991A (en) * 2015-04-08 2015-06-24 成都索贝数码科技股份有限公司 System and method for rapidly sorting, selecting and editing entertainment program massive materials
CN105307028A (en) * 2015-10-26 2016-02-03 新奥特(北京)视频技术有限公司 Video editing method and device specific to video materials of plurality of lenses
CN106682617A (en) * 2016-12-28 2017-05-17 电子科技大学 Image definition judgment and feature extraction method based on frequency spectrum section information
CN108875602A (en) * 2018-05-31 2018-11-23 珠海亿智电子科技有限公司 Monitor the face identification method based on deep learning under environment
CN111191484A (en) * 2018-11-14 2020-05-22 普天信息技术有限公司 Method and device for recognizing human speaking in video image

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9818136B1 (en) * 2003-02-05 2017-11-14 Steven M. Hoffberg System and method for determining contingent relevance
US8095466B2 (en) * 2006-05-15 2012-01-10 The Directv Group, Inc. Methods and apparatus to conditionally authorize content delivery at content servers in pay delivery systems
US20170032559A1 (en) * 2015-10-16 2017-02-02 Mediatek Inc. Simulated Transparent Device
CN110691258A (en) * 2019-10-30 2020-01-14 中央电视台 Program material manufacturing method and device, computer storage medium and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005091211A1 (en) * 2004-03-16 2005-09-29 3Vr Security, Inc. Interactive system for recognition analysis of multiple streams of video
CN104732991A (en) * 2015-04-08 2015-06-24 成都索贝数码科技股份有限公司 System and method for rapidly sorting, selecting and editing entertainment program massive materials
CN105307028A (en) * 2015-10-26 2016-02-03 新奥特(北京)视频技术有限公司 Video editing method and device specific to video materials of plurality of lenses
CN106682617A (en) * 2016-12-28 2017-05-17 电子科技大学 Image definition judgment and feature extraction method based on frequency spectrum section information
CN108875602A (en) * 2018-05-31 2018-11-23 珠海亿智电子科技有限公司 Monitor the face identification method based on deep learning under environment
CN111191484A (en) * 2018-11-14 2020-05-22 普天信息技术有限公司 Method and device for recognizing human speaking in video image

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"索贝AI剪辑应用于总台综艺访谈类节目";无;《现代电视技术》;20200229(第2期);第160页 *
F'elicien Vallet等."ROBUST VISUAL FEATURES FOR THE MULTIMODAL IDENTIFICATION OF UNREGISTERED SPEAKERS IN TV TALK-SHOWS".《2010 IEEE 17th International Conference on Image Processing》.2010, *
无."索贝AI剪辑应用于总台综艺访谈类节目".《现代电视技术》.2020,(第2期), *
说话人辨认中有效参数的研究;王炳锡等;《应用声学》;19920431;第11卷(第02期);第20-23页 *

Also Published As

Publication number Publication date
CN113269854A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
CN107707931B (en) Method and device for generating interpretation data according to video data, method and device for synthesizing data and electronic equipment
JP7252362B2 (en) Method for automatically editing video and portable terminal
JP7228682B2 (en) Gating model for video analysis
Chen et al. What comprises a good talking-head video generation?: A survey and benchmark
CN111683209B (en) Mixed-cut video generation method and device, electronic equipment and computer-readable storage medium
Kang Affective content detection using HMMs
US7949188B2 (en) Image processing apparatus, image processing method, and program
US8879788B2 (en) Video processing apparatus, method and system
JP5510167B2 (en) Video search system and computer program therefor
WO2022184117A1 (en) Deep learning-based video clipping method, related device, and storage medium
TWI253860B (en) Method for generating a slide show of an image
CN107430780B (en) Method for output creation based on video content characteristics
CN109218629B (en) Video generation method, storage medium and device
CN112367551B (en) Video editing method and device, electronic equipment and readable storage medium
WO2011015909A1 (en) System for creating a capsule representation of an instructional video
JPH11514479A (en) Method for computerized automatic audiovisual dubbing of movies
US20170213576A1 (en) Live Comics Capturing Camera
CN110505498A (en) Processing, playback method, device and the computer-readable medium of video
WO2023197979A1 (en) Data processing method and apparatus, and computer device and storage medium
WO2022061806A1 (en) Film production method, terminal device, photographing device, and film production system
CN113269854B (en) Method for intelligently generating interview-type comprehensive programs
Zhang et al. Detecting and removing visual distractors for video aesthetic enhancement
US9542976B2 (en) Synchronizing videos with frame-based metadata using video content
JP6389296B1 (en) VIDEO DATA PROCESSING DEVICE, VIDEO DATA PROCESSING METHOD, AND COMPUTER PROGRAM
CN113255628B (en) Scene identification recognition method for news scene

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant