CN113269854B - Method for intelligently generating interview-type comprehensive programs - Google Patents
- Publication number
- CN113269854B (Application CN202110803384.0A)
- Authority
- CN
- China
- Prior art keywords
- face
- frame
- video
- channel
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
Abstract
The invention discloses a method for intelligently generating interview-type variety programs, comprising the following steps: S1, recording, through multichannel recording software, the program videos shot by a plurality of cameras at the program site; S2, setting the role played by each channel's material according to the camera pictures in the program video; S3, extracting video features from each channel's material; S4, generating a plurality of candidate video segments in each channel according to the extracted video features; S5, selecting candidate video segments according to predefined rules and synthesizing the program's initial cut. The invention can quickly generate an initial cut, allowing post-production editors to edit and finalize quickly, thereby reducing the manual workload.
Description
Technical Field
The invention relates to the field of video program synthesis, and in particular to a method for intelligently generating interview-type variety programs.
Background
An interview-type program is a television program format with a relaxed and pleasant atmosphere, conducted around a certain theme between hosts and guests with conversation as the main form. An interview-type variety program is an interview program aimed chiefly at entertainment, relaxation, and leisure fun, to which more variety elements and comedic situation design are added to achieve a dramatic effect. Its guests are mainly entertainment celebrities and sports stars, so such programs tend to be very popular among young people. Although, unlike other variety programs, they are usually shot on a single scene and stage, a large number of cameras must still be arranged on site. During shooting, the pictures shot at different scales and angles are exploited, through a series of complicated operations such as real-time coordination between the on-site director and each camera crew and live shot cutting, to synthesize the initial program — work that often demands rich directing experience and on-site command ability.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for intelligently generating interview-type variety programs, which can quickly generate an initial cut and allow post-production editors to edit and finalize quickly, thereby reducing the manual workload.
The purpose of the invention is realized by the following scheme:
a method for intelligently generating interview-type integrated art programs comprises the following steps:
s1, recording program video materials shot by a plurality of cameras on a program site through multichannel recording software;
s2, setting the role played by each channel material according to the camera shooting picture in the program video;
s3, extracting video characteristics of each channel material;
s4, generating a plurality of candidate video clips in each channel according to the extracted video features;
and S5, selecting candidate video clips according to predefined rules, and synthesizing the program initial clips.
Further, in step S2, setting the role played by each channel material comprises the following steps: dividing the channel materials into three categories according to shot scale, namely close shot, medium shot, and long shot; the close shot's picture is a close-up of a guest or a host; the medium shot's picture is the interaction between guests, between guests and hosts, or between hosts; the long shot's picture is the whole stage.
Further, in step S3, the following steps are included:
s31, establishing a face library containing the host and the guest of the field program;
S32, performing face recognition analysis on the video material of each channel, and extracting, for each frame, the face box coordinates, the coordinates of the 68 facial key points, and the corresponding names;
s33, performing picture stability analysis on the video material of each channel, and marking a blurred picture caused by camera movement or focusing error;
and S34, using the data from step S31 together with the face key point data of the same person over a continuous time span, performing mouth-shape analysis to judge whether the person is speaking within the set time.
Further, in step S31, supposing the program involves M persons in total, single-person photos of the hosts and guests related to the program are collected through the Internet, one photo per person, and 512-dimensional face features are extracted through a face recognition network to serve as each person's representation, yielding a feature matrix F of size M×512 and a name matrix N of size M×1; i and j are integers, and F(i,j) and N(i) respectively denote the element in row i and column j of the matrices F and N.
Further, in step S32, suppose there are C channels of video material, each consisting of L frames, with all frames aligned on a common timeline. Face recognition processing is performed on the t-th frame image I(c,t) of the c-th material V(c), yielding the processing result set of that frame,

R(c,t) = {E(c,t), B(c,t), K(c,t), D(c,t)},

where E(c,t) denotes the face feature matrix extracted from the frame, k is the number of detected faces, E(c,t,j) denotes the feature of the j-th face, B(c,t) denotes all face boxes detected in the frame, B(c,t,j) denotes the j-th face box, K(c,t) denotes all face key points detected in the frame, K(c,t,j) denotes the key points of the j-th face, D(c,t) denotes the names recognized for the detected faces, and D(c,t,j) denotes the name corresponding to the j-th face,

D(c,t,j) = N(argmax_i sim(F(i), E(c,t,j))),

that is, the name with the highest similarity in the face library is taken as the name corresponding to the face, where N(i) denotes the i-th name, argmax denotes taking the index corresponding to the maximum value, and sim denotes a similarity calculation function. The result of extracting video features from all materials is expressed as R = {R(c,t) | 1 ≤ c ≤ C, 1 ≤ t ≤ L}.
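As an illustration of the name-assignment step above, the following sketch matches detected face features against the pre-built library by cosine similarity and takes the argmax, as the patent describes. All function and variable names are illustrative, and the 4-dimensional vectors stand in for the 512-dimensional features.

```python
import numpy as np

def assign_names(face_feats, library_feats, library_names):
    """Assign a name to each detected face: highest cosine similarity
    against the face library (one feature vector per person)."""
    # Normalize rows so a plain dot product equals cosine similarity.
    lib = library_feats / np.linalg.norm(library_feats, axis=1, keepdims=True)
    det = face_feats / np.linalg.norm(face_feats, axis=1, keepdims=True)
    sims = det @ lib.T                  # shape (k_faces, M_persons)
    best = np.argmax(sims, axis=1)      # index of the most similar library entry
    return [library_names[i] for i in best]

# Toy example: a 2-person library and one detected face.
library = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]])
names = ["host", "guest"]
detected = np.array([[0.9, 0.1, 0.0, 0.0]])
print(assign_names(detected, library, names))  # ['host']
```

In practice the library features would come from the face recognition network of step S31, one 512-dimensional vector per person.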
Further, in step S33, for the t-th frame image I(c,t) of the c-th material V(c), given its width W and height H, a picture stability score s(c,t) is computed to characterize whether the frame picture is stable:

G = |fftshift(FFT(gray(I(c,t))))|,  s(c,t) = n / (W·H),

where gray(·) denotes converting the frame image to a grayscale image, FFT denotes the Fourier transform, fftshift denotes shifting the 0-frequency component to the center of the spectrum, |·| denotes taking the absolute value, G is the magnitude spectrum thus obtained, θ is a threshold set relative to the maximum value in G, and n is the number of pixels in G greater than θ. If s(c,t) is greater than a set empirical value, the image I(c,t) is regarded as picture-stable.
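The stability score above can be sketched with NumPy as follows. The max-relative threshold ratio (here max/1000) is an assumption, since the patent elides the exact value, and the names are illustrative; the idea is that sharp frames retain more high-frequency spectral energy than blurred ones.

```python
import numpy as np

def stability_score(frame_gray):
    """Fraction of spectrum pixels whose magnitude exceeds a
    max-relative threshold; low scores indicate a blurred frame."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(frame_gray)))
    theta = spec.max() / 1000.0          # assumed max-relative threshold
    h, w = frame_gray.shape
    return np.count_nonzero(spec > theta) / (w * h)

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))             # noise-like stand-in for a sharp frame
blurred = np.full((64, 64), 0.5)         # flat stand-in for a defocused frame
print(stability_score(sharp) > stability_score(blurred))  # True
```

A frame would then be kept when its score exceeds the empirical value chosen for the program.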
Further, in step S34, for the c-th material V(c), a fixed time window of size T (i.e., a fixed duration of T frames) is taken over the face key point data of the same person p, i.e.

A(p) = (1/T) · Σ_{τ=t}^{t+T−1} area(mouth(K(p,τ))),

where A(p) denotes the average mouth area of person p over the window, K(p,τ) denotes the face key points of person p at time τ, and area(·) denotes computing the mouth area from those key points. When A(p) is greater than a set empirical value, the person named p is considered to be speaking during the time period and is marked as a speaker.
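A minimal sketch of the mouth-shape analysis, assuming the common 68-landmark layout in which the outer lip occupies points 48–59, and using the shoelace formula for the mouth area. The names, landmark layout, and thresholds are illustrative assumptions, not the patent's exact implementation.

```python
import numpy as np

MOUTH = slice(48, 60)   # outer-lip points in the common 68-landmark layout

def polygon_area(pts):
    """Shoelace formula for a closed 2-D polygon given as an (n, 2) array."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def is_speaking(landmark_seq, area_threshold):
    """Average the outer-mouth area over a fixed window of frames;
    an average above the threshold marks the person as a speaker."""
    areas = [polygon_area(lm[MOUTH]) for lm in landmark_seq]
    return float(np.mean(areas)) > area_threshold

# Toy frames: a 68x2 landmark array whose mouth is a unit square vs. a closed mouth.
open_lm = np.zeros((68, 2))
open_lm[48:52] = [[0, 0], [1, 0], [1, 1], [0, 1]]   # square outer lip, area 1
closed_lm = np.zeros((68, 2))                        # degenerate mouth, area 0
print(is_speaking([open_lm] * 5, 0.5))    # True
print(is_speaking([closed_lm] * 5, 0.5))  # False
```

The window length and area threshold would be tuned per program, as the embodiment's example values (T = 250, V = 500) suggest.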
Further, in step S4, the following steps are included:
S41, generating initial candidate video segments of each channel according to the picture stability results obtained by analyzing the video material of each channel in step S33: for the c-th material V(c), traverse its all-frame analysis results s(c,1), …, s(c,L); when s(c,t) is greater than a set empirical value, mark t as the in-point of a new candidate segment and continue traversing subsequent results; when s(c,t) becomes less than or equal to the set empirical value, mark t as the out-point of that candidate segment; proceeding in this way generates, for material V(c), an initial list of n(c) candidate segments;

S42, traversing the initial candidate segment list generated in S41 and comparing the out-point o(i) of the current segment with the in-point i(i+1) of the next segment; if the two are within a set empirical value of each other, the two segments are combined into one segment whose in-point is the in-point of the current segment and whose out-point is the out-point of the next segment; proceeding in this way generates the final candidate segment list.
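Steps S41 and S42 can be sketched as two small scans over the per-frame stability scores. The merge condition (gap within a maximum number of frames) is an assumed reading of the patent's merging rule, and all names and thresholds are illustrative.

```python
def initial_segments(scores, threshold):
    """S41 sketch: each run of frames whose stability score exceeds
    the threshold becomes one (in_point, out_point) candidate segment."""
    segments, start = [], None
    for t, s in enumerate(scores):
        if s > threshold and start is None:
            start = t                       # in-point of a new segment
        elif s <= threshold and start is not None:
            segments.append((start, t - 1)) # out-point reached
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))
    return segments

def merge_segments(segments, max_gap):
    """S42 sketch: merge adjacent candidates whose gap is within
    max_gap frames, keeping the earlier in-point and later out-point."""
    if not segments:
        return []
    merged = [segments[0]]
    for in_pt, out_pt in segments[1:]:
        last_in, last_out = merged[-1]
        if in_pt - last_out <= max_gap:
            merged[-1] = (last_in, out_pt)
        else:
            merged.append((in_pt, out_pt))
    return merged

scores = [0.01, 0.01, 0.0, 0.01, 0.0, 0.0, 0.0, 0.01]
print(initial_segments(scores, 0.002))              # [(0, 1), (3, 3), (7, 7)]
print(merge_segments([(0, 1), (3, 3), (7, 7)], 2))  # [(0, 3), (7, 7)]
```

With the embodiment's example values, the threshold would be 0.002 and the gap 50 frames.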
Further, in step S5, the following steps are included:
s51, setting priority according to the scene according to the shooting picture category of each channel material;
S52, combining the final candidate segment lists of the channel materials from step S42 with the speaker marking results from step S34, the segments in each channel's final candidate list are filled into the final-cut timeline according to the following rules (higher priority first), so as to obtain the final composite video:
the segment is a close shot, there is a speaker, and the speaker is a guest;
the segment is a close shot, there is a speaker, and the speaker is the moderator;
the segment is a medium scene, speakers exist, and the number of the speakers is not higher than 3;
the segment is a perspective.
Further, in step S51, the priority is set as: close shot > medium shot > long shot. Further, in step S52, a timeline gap-filling method is adopted: at the current time, the most suitable candidate segment is selected according to the above rules and filled into the corresponding position of the initial-cut timeline; the current time is then updated to the time corresponding to that candidate segment's out-point, and the process repeats until the entire initial-cut timeline is filled.
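The priority rules and the timeline gap-filling loop above might be sketched as below. The candidate-segment dictionary fields, the role labels, and the tie-breaking behavior are assumptions for illustration, not the patent's exact data model.

```python
CLOSE, MEDIUM, LONG = 0, 1, 2   # shot scales; lower rank value = higher priority

def rule_priority(seg):
    """Rank a candidate per the stated rules: close shot of a speaking guest,
    then speaking host, then medium shot with <=3 speakers, then long shot."""
    if seg["scale"] == CLOSE and seg["speakers"] and seg["speaker_role"] == "guest":
        return 0
    if seg["scale"] == CLOSE and seg["speakers"] and seg["speaker_role"] == "host":
        return 1
    if seg["scale"] == MEDIUM and 0 < seg["speakers"] <= 3:
        return 2
    if seg["scale"] == LONG:
        return 3
    return 4   # segments matching no rule are used last

def fill_timeline(candidates, total_frames):
    """Greedy gap filling: at the current time, pick the best-ranked
    candidate covering it, advance past its out-point, and repeat."""
    timeline, t = [], 0
    while t < total_frames:
        covering = [c for c in candidates if c["in"] <= t <= c["out"]]
        if not covering:
            t += 1          # no material covers this instant; skip ahead
            continue
        best = min(covering, key=rule_priority)
        timeline.append((t, best["out"], best["channel"]))
        t = best["out"] + 1
    return timeline

cands = [
    {"in": 0, "out": 9, "channel": 5, "scale": LONG, "speakers": 0, "speaker_role": None},
    {"in": 0, "out": 4, "channel": 1, "scale": CLOSE, "speakers": 1, "speaker_role": "guest"},
]
print(fill_timeline(cands, 10))   # [(0, 4, 1), (5, 9, 5)]
```

Here the close shot of the speaking guest wins the opening frames and the long shot fills the remaining gap, matching the fallback ordering described above.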
The beneficial effects of the invention include:
(1) By observing the on-site directing and shot-cutting logic used when shooting interview-type variety programs, the method of the invention provides a program initial-cut generation approach based on video face recognition, speaker recognition, and picture stability analysis. It extracts the most appropriate shot segments from pictures shot at different angles and automatically generates the initial cut of an interview-type variety program, reducing the workload of the director and post-production editors.
(2) The invention provides a simple and efficient method that automatically synthesizes the initial cut of an interview-type variety program with only a small amount of presetting. Specifically, roles are assigned by shot scale to the pictures shot by the different on-site cameras in the recording system; hosts and guests are labeled through face recognition; speakers are labeled through mouth-shape analysis; invalid shots are filtered out by computing picture stability scores to generate a candidate video segment list; and finally all candidate video segments are combined by rule to generate the program's initial cut. The method thus quickly generates an initial cut, enables post-production editors to edit and finalize quickly, and reduces the manual workload.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of steps in an embodiment of a method of the present invention;
FIG. 2 is a flow chart of the method embodiment of the present invention for extracting visual features from a channel material.
Detailed Description
All features disclosed in all embodiments in this specification, or all methods or process steps implicitly disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.
As shown in fig. 1 and 2, a method for intelligently generating interview-type integrated art programs includes the steps:
s1, recording program videos shot by a plurality of cameras on a program site through multichannel recording software;
For example, in this step, the program videos shot by 6 cameras at the scene of the program "when going on the spring and evening" are recorded, denoted V1 through V6. Other programs may also be recorded, and the number of cameras may be 8, 10, 12, and so on, which is not described again here.
S2, setting the role played by each channel material according to the camera shooting picture in the program video;
The role played by each channel material is set according to the pictures shot by its camera. Specifically, some cameras are fixed and their shot pictures are close shots; some are fixed and shoot medium shots; some are fixed and shoot long shots; and one is a rocker-arm (jib) camera whose shot picture is a long shot.
In step S2, the setting of the role played by each channel material includes the following steps: dividing each channel material into three categories according to the scene, namely a close scene, a middle scene and a long scene; the shooting picture of the close shot is close-up of a guest and a host; the shot picture of the middle scene is the interaction between the guests and the guests, between the guests and the host and between the host and the host; the shot picture of the long shot is the whole stage.
S3, extracting video features for each channel material, in step S3, the method includes the following steps:
s31, establishing a face library containing the host and the guest of the field program;
In step S31, supposing the program involves M persons in total, single-person photos of the hosts and guests related to the program are collected through the Internet, one photo per person, and 512-dimensional face features are extracted through a face recognition network to serve as each person's representation, yielding a feature matrix F of size M×512 and a name matrix N of size M×1; i and j are integers, and F(i,j) and N(i) respectively denote the element in row i and column j of the matrices F and N.
S32, performing face recognition analysis on the video material of each channel, and extracting, for each frame, the face box coordinates, the coordinates of the 68 facial key points, and the corresponding names. In step S32, suppose there are C channels of video material (here C = 6, though it may be another number), each consisting of L frames, with all frames aligned on a common timeline. Face recognition processing is performed on the t-th frame image I(c,t) of the c-th material V(c), yielding the processing result set of that frame,

R(c,t) = {E(c,t), B(c,t), K(c,t), D(c,t)},

where E(c,t) denotes the face feature matrix extracted from the frame, k is the number of detected faces, E(c,t,j) denotes the feature of the j-th face, B(c,t) denotes all face boxes detected in the frame, B(c,t,j) denotes the j-th face box, K(c,t) denotes all face key points detected in the frame, K(c,t,j) denotes the key points of the j-th face, D(c,t) denotes the names recognized for the detected faces, and D(c,t,j) denotes the name corresponding to the j-th face,

D(c,t,j) = N(argmax_i sim(F(i), E(c,t,j))),

that is, the name with the highest similarity in the face library is taken as the name corresponding to the face, where N(i) denotes the i-th name, argmax denotes taking the index corresponding to the maximum value, and sim denotes a similarity calculation function. The result of extracting video features from all materials is expressed as R = {R(c,t) | 1 ≤ c ≤ C, 1 ≤ t ≤ L}.
S33, performing picture stability analysis on the video material of each channel, and marking blurred pictures caused by camera movement or focusing errors. In step S33, for the t-th frame image I(c,t) of the c-th material V(c), given its width W and height H, a picture stability score s(c,t) is computed to characterize whether the frame picture is stable:

G = |fftshift(FFT(gray(I(c,t))))|,  s(c,t) = n / (W·H),

where gray(·) denotes converting the frame image to a grayscale image, FFT denotes the Fourier transform, fftshift denotes shifting the 0-frequency component to the center of the spectrum, |·| denotes taking the absolute value, G is the magnitude spectrum thus obtained, θ is a threshold set relative to the maximum value in G, and n is the number of pixels in G greater than θ. When s(c,t) is greater than a preset value, the image I(c,t) is regarded as picture-stable. In the present embodiment, the preset value is chosen empirically.
And S34, using the data from step S31 together with the face key point data of the same person over a continuous time span, mouth-shape analysis is performed to judge whether the person is speaking within the set time. In step S34, for the c-th material V(c), a fixed window of duration T is taken over the face key point data of the same person p, i.e.

A(p) = (1/T) · Σ_{τ=t}^{t+T−1} area(mouth(K(p,τ))),

where A(p) denotes the average mouth area of person p over the window, K(p,τ) denotes the face key points of person p at time τ, and area(·) denotes computing the mouth area. When A(p) is greater than a preset value V, where V may be taken as 500, the person named p is considered to be speaking during the time period and is marked as a speaker. In this embodiment, T may be taken as 250, for example, selected according to actual conditions.
S4, generating a plurality of candidate video clips in each channel according to the extracted video features; in step S4, the method includes the steps of:
S41, generating initial candidate video segments of each channel according to the picture stability results obtained by analyzing the video material of each channel in step S33: for the c-th material V(c), traverse its all-frame analysis results s(c,1), …, s(c,L); when s(c,t) is greater than a preset value (here the preset value may be 0.002, depending on the program), mark t as the in-point of a new candidate segment and continue traversing subsequent results; when s(c,t) becomes less than or equal to the preset value, mark t as the out-point of that candidate segment; proceeding in this way generates, for material V(c), an initial list of n(c) candidate segments;
S42, traversing the initial candidate segment list generated in S41 and comparing the out-point o(i) of the current segment with the in-point i(i+1) of the next segment; if the two are within a preset value of each other (here, 50 frames), the two segments are combined into one segment whose in-point is the in-point of the current segment and whose out-point is the out-point of the next segment; proceeding in this way generates the final candidate segment list.
And S5, selecting candidate video clips according to predefined rules, and synthesizing the program initial clips. In step S5, the method includes the steps of:
S51, setting priority according to shot scale for each channel material's picture category; specifically, among the 6 channel materials V1 to V6, the close-shot materials have the highest priority, the medium-shot materials the second, and the long-shot materials the lowest;
S52, combining the final candidate segment lists of the channel materials from step S42 with the speaker marking results from step S34, the segments in each channel's final candidate list are filled into the final-cut timeline according to the following rules to obtain the final composite video:
the segment is a close shot, there is a speaker, and the speaker is a guest;
the segment is a close shot, there is a speaker, and the speaker is the moderator;
the segment is a medium scene, speakers exist, and the number of the speakers is not higher than 3;
the segment is a perspective.
Further, in step S51, the priority is set as: close shot > medium shot > long shot. Further, in step S52, a timeline gap-filling method is adopted: at the current time, the most suitable candidate segment is selected according to the above rules and filled into the corresponding position of the initial-cut timeline; the current time is then updated to the time corresponding to that candidate segment's out-point, and the process repeats until the entire initial-cut timeline is filled.
The parts not involved in the present invention are the same as or can be implemented using the prior art.
The above-described embodiment is only one embodiment of the present invention, and it will be apparent to those skilled in the art that various modifications and variations can be easily made based on the application and principle of the present invention disclosed in the present application, and the present invention is not limited to the method described in the above-described embodiment of the present invention, so that the above-described embodiment is only preferred, and not restrictive.
Other embodiments than the above examples may be devised by those skilled in the art based on the foregoing disclosure, or by adapting and using knowledge or techniques of the relevant art, and features of various embodiments may be interchanged or substituted and such modifications and variations that may be made by those skilled in the art without departing from the spirit and scope of the present invention are intended to be within the scope of the following claims.
If the functionality of the present invention is implemented in the form of software functional units and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product stored in a storage medium, such that all or part of the steps of the methods according to the embodiments of the present invention are executed by a computer device (which may be a personal computer, a server, or a network device) running the corresponding software. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), or an optical disk.
Claims (2)
1. A method for intelligently generating interview-type comprehensive programs is characterized by comprising the following steps:
s1, recording program videos shot by a plurality of cameras on a program site through multichannel recording software;
S2, setting the role played by each channel material according to the camera pictures in the program video; in step S2, setting the role played by each channel material comprises the following steps: dividing the channel materials into three categories according to shot scale, namely close shot, medium shot, and long shot; the close shot's picture is a close-up of a guest or a host; the medium shot's picture is the interaction between guests, between guests and hosts, or between hosts; the long shot's picture is the whole stage;
s3, extracting video characteristics of each channel material; in step S3, the method includes the steps of:
S31, establishing a face library containing the hosts and guests of the on-site program; in step S31, supposing the program involves M persons in total, single-person photos of the hosts and guests related to the program are collected through the Internet, one photo per person, and 512-dimensional face features are extracted through a face recognition network to serve as each person's representation, yielding a feature matrix F of size M×512 and a name matrix N of size M×1, where i and j are integers and F(i,j) and N(i) respectively denote the element in row i and column j of the matrices F and N;
S32, performing face recognition analysis on the video material of each channel, and extracting, for each frame, the face box coordinates, the coordinates of the 68 facial key points, and the corresponding names; in step S32, suppose there are C channels of video material, each consisting of L frames, with all frames aligned on a common timeline; face recognition processing is performed on the t-th frame image I(c,t) of the c-th material V(c), yielding the processing result set of that frame,

R(c,t) = {E(c,t), B(c,t), K(c,t), D(c,t)},

where E(c,t) denotes the face feature matrix extracted from the frame, k is the number of detected faces, E(c,t,j) denotes the feature of the j-th face, B(c,t) denotes all face boxes detected in the frame, B(c,t,j) denotes the j-th face box, K(c,t) denotes all face key points detected in the frame, K(c,t,j) denotes the key points of the j-th face, D(c,t) denotes the names recognized for the detected faces, and D(c,t,j) denotes the name corresponding to the j-th face,

D(c,t,j) = N(argmax_i sim(F(i), E(c,t,j))),

that is, the name with the highest similarity in the face library is taken as the name corresponding to the face, where N(i) denotes the i-th name, argmax denotes taking the index corresponding to the maximum value, and sim denotes a similarity calculation function; the result of extracting video features from all materials is expressed as R = {R(c,t) | 1 ≤ c ≤ C, 1 ≤ t ≤ L};
S33, performing picture stability analysis on the video material of each channel, and marking blurred pictures caused by camera movement or focusing errors; in step S33, for the t-th frame image I(c,t) of the c-th material V(c), given its width W and height H, a picture stability score s(c,t) is computed to characterize whether the frame picture is stable:

G = |fftshift(FFT(gray(I(c,t))))|,  s(c,t) = n / (W·H),

where gray(·) denotes converting the frame image to a grayscale image, FFT denotes the Fourier transform, fftshift denotes shifting the 0-frequency component to the center of the spectrum, |·| denotes taking the absolute value, G is the magnitude spectrum thus obtained, θ is a threshold set relative to the maximum value in G, and n is the number of pixels in G greater than θ; if s(c,t) is greater than a set empirical value, the image I(c,t) is regarded as picture-stable;
S34, using the data from step S31 together with the face key point data of the same person over a continuous time span, performing mouth-shape analysis to judge whether the person is speaking within the set time; in step S34, for the c-th material V(c), a fixed time window of size T is taken over the face key point data of the same person p, i.e.

A(p) = (1/T) · Σ_{τ=t}^{t+T−1} area(mouth(K(p,τ))),

where A(p) denotes the average mouth area of person p over the window, K(p,τ) denotes the face key points of person p at time τ, and area(·) denotes computing the mouth area from those key points; when A(p) is greater than a set empirical value, the person named p is considered to be speaking during the time period and is marked as a speaker;
s4, generating a plurality of candidate video clips in each channel according to the extracted video features; in step S4, the method includes the steps of:
S41, generating initial candidate video segments of each channel according to the picture stability results obtained by analyzing the video material of each channel in step S33: for the c-th material V(c), traverse its all-frame analysis results s(c,1), …, s(c,L); when s(c,t) is greater than a set empirical value, mark t as the in-point of a new candidate segment and continue traversing subsequent results; when s(c,t) becomes less than or equal to the set empirical value, mark t as the out-point of that candidate segment; proceeding in this way generates, for material V(c), an initial list of n(c) candidate segments;
S42, traversing the initial candidate segment list generated in S41 and comparing the out-point o(i) of the current segment with the in-point i(i+1) of the next segment; if the two are within a set empirical value of each other, the two segments are combined into one segment whose in-point is the in-point of the current segment and whose out-point is the out-point of the next segment; proceeding in this way generates the final candidate segment list;
S5, selecting candidate video clips according to predefined rules to compose a primary program, in step S5, the method includes the following steps:
s51, setting priority according to the scene according to the shooting picture category of each channel material;
S52, combining the final candidate segment lists of the channel materials from step S42 with the speaker marking results from step S34, the segments in each channel's final candidate list are filled into the final-cut timeline according to the following rules to obtain the final composite video:
the segment is a close shot, there is a speaker, and the speaker is a guest;
the segment is a close shot, there is a speaker, and the speaker is the moderator;
the segment is a medium scene, speakers exist, and the number of the speakers is not higher than 3;
the segment is a perspective.
2. The method for intelligently generating interview-type variety programs according to claim 1, wherein in step S51, the priority is set as: close shot > medium shot > long shot.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110803384.0A CN113269854B (en) | 2021-07-16 | 2021-07-16 | Method for intelligently generating interview-type comprehensive programs |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113269854A CN113269854A (en) | 2021-08-17 |
CN113269854B true CN113269854B (en) | 2021-10-15 |
Family
ID=77236586
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110803384.0A Active CN113269854B (en) | 2021-07-16 | 2021-07-16 | Method for intelligently generating interview-type comprehensive programs |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113269854B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115174962A (en) * | 2022-07-22 | 2022-10-11 | 湖南芒果无际科技有限公司 | Rehearsal simulation method and device, computer equipment and computer readable storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005091211A1 (en) * | 2004-03-16 | 2005-09-29 | 3Vr Security, Inc. | Interactive system for recognition analysis of multiple streams of video |
CN104732991A (en) * | 2015-04-08 | 2015-06-24 | 成都索贝数码科技股份有限公司 | System and method for rapidly sorting, selecting and editing entertainment program massive materials |
CN105307028A (en) * | 2015-10-26 | 2016-02-03 | 新奥特(北京)视频技术有限公司 | Video editing method and device specific to video materials of plurality of lenses |
CN106682617A (en) * | 2016-12-28 | 2017-05-17 | 电子科技大学 | Image definition judgment and feature extraction method based on frequency spectrum section information |
CN108875602A (en) * | 2018-05-31 | 2018-11-23 | 珠海亿智电子科技有限公司 | Monitor the face identification method based on deep learning under environment |
CN111191484A (en) * | 2018-11-14 | 2020-05-22 | 普天信息技术有限公司 | Method and device for recognizing human speaking in video image |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9818136B1 (en) * | 2003-02-05 | 2017-11-14 | Steven M. Hoffberg | System and method for determining contingent relevance |
US8095466B2 (en) * | 2006-05-15 | 2012-01-10 | The Directv Group, Inc. | Methods and apparatus to conditionally authorize content delivery at content servers in pay delivery systems |
US20170032559A1 (en) * | 2015-10-16 | 2017-02-02 | Mediatek Inc. | Simulated Transparent Device |
CN110691258A (en) * | 2019-10-30 | 2020-01-14 | 中央电视台 | Program material manufacturing method and device, computer storage medium and electronic equipment |
Non-Patent Citations (3)
Title |
---|
无 [Anonymous], "索贝AI剪辑应用于总台综艺访谈类节目" [Sobey AI editing applied to CMG variety interview programs], 《现代电视技术》 [Advanced Television Engineering], no. 2, Feb. 2020, p. 160 * |
Félicien Vallet et al., "Robust Visual Features for the Multimodal Identification of Unregistered Speakers in TV Talk-Shows", 2010 IEEE 17th International Conference on Image Processing, 2010 * |
王炳锡 [Wang Bingxi] et al., "说话人辨认中有效参数的研究" [A study of effective parameters in speaker identification], 《应用声学》 [Applied Acoustics], vol. 11, no. 2, 1992, pp. 20-23 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107707931B (en) | Method and device for generating interpretation data according to video data, method and device for synthesizing data and electronic equipment | |
JP7252362B2 (en) | Method for automatically editing video and portable terminal | |
JP7228682B2 (en) | Gating model for video analysis | |
Chen et al. | What comprises a good talking-head video generation?: A survey and benchmark | |
CN111683209B (en) | Mixed-cut video generation method and device, electronic equipment and computer-readable storage medium | |
Kang | Affective content detection using HMMs | |
US7949188B2 (en) | Image processing apparatus, image processing method, and program | |
US8879788B2 (en) | Video processing apparatus, method and system | |
JP5510167B2 (en) | Video search system and computer program therefor | |
WO2022184117A1 (en) | Deep learning-based video clipping method, related device, and storage medium | |
TWI253860B (en) | Method for generating a slide show of an image | |
CN107430780B (en) | Method for output creation based on video content characteristics | |
CN109218629B (en) | Video generation method, storage medium and device | |
CN112367551B (en) | Video editing method and device, electronic equipment and readable storage medium | |
WO2011015909A1 (en) | System for creating a capsule representation of an instructional video | |
JPH11514479A (en) | Method for computerized automatic audiovisual dubbing of movies | |
US20170213576A1 (en) | Live Comics Capturing Camera | |
CN110505498A (en) | Processing, playback method, device and the computer-readable medium of video | |
WO2023197979A1 (en) | Data processing method and apparatus, and computer device and storage medium | |
WO2022061806A1 (en) | Film production method, terminal device, photographing device, and film production system | |
CN113269854B (en) | Method for intelligently generating interview-type comprehensive programs | |
Zhang et al. | Detecting and removing visual distractors for video aesthetic enhancement | |
US9542976B2 (en) | Synchronizing videos with frame-based metadata using video content | |
JP6389296B1 (en) | VIDEO DATA PROCESSING DEVICE, VIDEO DATA PROCESSING METHOD, AND COMPUTER PROGRAM | |
CN113255628B (en) | Scene identification recognition method for news scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||