CN114979705A - Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning - Google Patents
Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning
- Publication number
- CN114979705A CN114979705A CN202210383218.4A CN202210383218A CN114979705A CN 114979705 A CN114979705 A CN 114979705A CN 202210383218 A CN202210383218 A CN 202210383218A CN 114979705 A CN114979705 A CN 114979705A
- Authority
- CN
- China
- Prior art keywords
- video
- text
- quality
- hake
- self
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23424—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
- Television Signal Processing For Recording (AREA)
Abstract
The invention discloses an automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning, comprising the following steps: establishing a video material library for the field to be publicized; training the RVM with the constructed database; establishing a primitive library for the video content to be described; organizing the primitives with the HAKE logical inference engine; establishing the text types that require semantic understanding; training a transformer on a data set to obtain a text understanding network; inputting the video to be automatically clipped into the RVM network; inputting the result into the HAKE video understanding engine and outputting the labeled video; inputting the editing-requirement text into the transformer model and outputting labels arranged in semantic order; comparing and matching the obtained labels; ranking the videos by matching result; and integrating the steps into a single system to simplify operation for the user. The invention addresses the problems that existing editing technology has a high entry threshold, cannot edit multiple videos simultaneously, and consumes large amounts of human and time resources.
Description
Technical Field
The invention relates to the technical field of automatic video editing based on deep learning, and in particular to an automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning.
Background
With the development of science and technology, public content consumption has shifted dramatically from text to pictures and from pictures to video. Compared with pictures and text, video is more vivid and intuitive, and has become an important window connecting people with society. Meanwhile, faced with a massive influx of information, the public's demands on content quality keep rising, and video has changed from a purely entertainment medium into an important channel for acquiring news and knowledge. Therefore, with China's social and economic development and the change of communication media, research on the automatic generation of video footage urgently needs to advance.
In automatic video generation, classifying footage by its feature values is the key to the whole generation process. In traditional video editing, the type and characteristics of each video segment are determined mainly by the editor's experience and subjective judgment, and fitted into a structure the editor has conceived in advance. Although the industry's entry threshold has risen and editing software keeps improving, which has raised the quality and efficiency of video production to some extent, editing grows ever more complex as footage accumulates and quality requirements increase. Traditional methods therefore cannot meet current demand, whereas neural networks have clear advantages in processing such information and in classification problems with fuzzy labels. Existing editing technology has a high threshold and cannot edit multiple videos simultaneously, so it consumes large amounts of human and time resources.
Disclosure of Invention
The invention aims to provide an automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning, solving the problems that existing editing technology has a high entry threshold, cannot edit multiple videos simultaneously, and consumes large amounts of human and time resources.
In order to achieve the above object, the present invention provides an automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning, comprising the following steps:
S1, establishing a video material library for the field to be publicized, and using the RVM to segment out low-quality video segments, the library containing both low-quality original videos and manually edited high-quality videos;
S2, training the RVM with the database built in step S1 to obtain a network architecture suited to the task, performing supervised training of the original RVM on a larger data set containing low-quality segments and their corresponding high-quality segments, to obtain a network that separates high-quality from low-quality video;
S3, establishing a primitive library for the video content to be described;
S4, organizing the primitives with the HAKE logical inference engine to obtain a series of labels consistent with semantic logic;
S5, establishing the text types requiring semantic understanding, mainly based on a manually annotated data set;
S6, training the transformer with the data set of step S5 to obtain a text understanding network accurate enough to parse editing requirements;
S7, inputting the video to be automatically clipped into the RVM network trained in step S2, and removing flawed parts caused by human error or environmental factors to obtain high-quality video;
S8, inputting the high-quality video obtained in step S7 into the HAKE video understanding engine, and outputting the labeled video;
S9, inputting the editing-requirement text into the transformer model trained in step S6, and outputting labels arranged in semantic order;
S10, comparing and matching the labels obtained in steps S8 and S9;
S11, ranking the videos according to the matching result of step S10;
and S12, integrating the above steps into a single system to simplify operation for the user.
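The twelve steps above can be sketched as a single orchestration function. This is a hypothetical illustration only: the `rvm`, `hake`, and `text_model` objects and their methods stand in for the trained networks of steps S1-S6 and are not APIs defined by this document.

```python
# Hypothetical sketch of steps S7-S11; model objects are stand-ins.
def auto_edit(raw_videos, requirement_text, rvm, hake, text_model):
    """Return video clips ordered to match the editing-requirement text."""
    # S7: keep only high-quality segments (drop flawed footage)
    clean = [seg for v in raw_videos for seg in rvm.split_high_quality(v)]
    # S8: label each segment with a semantic primitive via HAKE
    video_tags = [(seg, hake.label(seg)) for seg in clean]
    # S9: turn the editing brief into an ordered label sequence
    wanted = text_model.semantic_labels(requirement_text)
    # S10-S11: match segment labels to the brief and sort by its order
    matched = [(seg, wanted.index(tag)) for seg, tag in video_tags
               if tag in wanted]
    matched.sort(key=lambda pair: pair[1])
    return [seg for seg, _ in matched]
```

The stand-in methods would be backed by the RVM, HAKE, and transformer networks trained in steps S1-S6.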
Preferably, outputting the labeled video consists of first collecting a large number of unprocessed video segments as input to the multi-channel pre-trained RVM network, deleting low-quality segments caused by mis-operation or environmental factors, and outputting defect-free high-quality segments;
secondly, after the high-quality video clips are obtained, they enter HAKE as input. HAKE understands the video content through three stages of work: a primitive library for the relevant field is built and its capacity expanded as needed; logical inference rules combine primitives according to linguistic logic; and a CNN labels the video content, outputting the labeled video.
Preferably, the construction of the primitive library is divided into three steps:
the first step is to recognize two types of entities: entities at different levels of a hierarchy, and entities at the same level;
the second step is hierarchy-aware knowledge graph embedding. HAKE consists of two parts, a modulus part and a phase part, which model the two classes of entities separately. To distinguish the embeddings of the two parts, the modulus part uses e_m and r_m for entity and relation embeddings, while the phase part uses e_p and r_p. HAKE combines the modulus and phase parts by mapping entities into a polar coordinate system, where the radial and angular coordinates correspond to the modulus and phase parts respectively. HAKE maps an entity h to [h_m; h_p], where [·;·] denotes the concatenation of two vectors, and uses the scoring function d_r,m(h, t) = ||h_m ∘ r_m − t_m||_2 (∘ denoting the element-wise product) to evaluate the effect of modulus and phase;
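The polar-coordinate scoring just described can be written out numerically. The modulus-part distance d_r,m(h, t) = ||h_m ∘ r_m − t_m||_2 comes directly from the text; the phase-part distance below follows the published HAKE model and is an assumption not spelled out in this document.

```python
import numpy as np

def hake_modulus_distance(h_m, r_m, t_m):
    # d_{r,m}(h, t) = ||h_m o r_m - t_m||_2 (element-wise product)
    return np.linalg.norm(h_m * r_m - t_m, ord=2)

def hake_phase_distance(h_p, r_p, t_p):
    # Phases live on a circle, so a sine distance compares them
    # (assumed form, following the published HAKE model).
    return np.abs(np.sin((h_p + r_p - t_p) / 2.0)).sum()
```

A head entity whose modulus embedding, scaled by the relation, lands exactly on the tail entity scores a distance of zero.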
the third step is parallel to the video segmentationPerforming text semantic segmentation, and completing the task by adopting a Transformer which consists of self-attention and Feed Forward Neural Network only, wherein in an encoder of the Transformer, data is firstly subjected to a module called self-attention to obtain a weighted feature vector Z which is expressed as a self-attention moduleAfter obtaining Z, it is sent to the next module of the encoder, i.e. a Feed Forward Neural Network, which is fully connected with two layers, the activation function of the first layer is ReLU and the second layer is a linear activation function, which can be expressed as ffn (Z) ═ max (0, ZW) 1 +b 1 )W 2 +b 2 The two attentions are respectively used for calculating input and output weights, inputting a required text of a user into a transformer, outputting semantic labels which are logically arranged according to the text semantics through understanding the text, and mostly presenting in a primitive form;
and finally, the video labels are compared with the text labels using a common comparison algorithm and matched within a certain fault-tolerance range; the video contents are then ordered according to the sequence of the text labels, finally yielding high-quality video clips arranged in the semantic order of the text, ready for use as the edited footage.
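The comparison-and-matching step is not pinned down to a specific algorithm in the text. One plausible sketch, using a string-similarity ratio as the fault-tolerance measure (an assumption, not the document's stated method), is:

```python
# Hypothetical matching step: compare each video tag against the ordered
# text tags, tolerate near-misses via a similarity threshold, and sort
# clips by the position of their matched tag.
from difflib import SequenceMatcher

def order_clips(video_tags, text_tags, tolerance=0.8):
    """video_tags: list of (clip_id, tag); text_tags: ordered tag list."""
    ordered = []
    for clip, tag in video_tags:
        for pos, want in enumerate(text_tags):
            if SequenceMatcher(None, tag, want).ratio() >= tolerance:
                ordered.append((pos, clip))
                break
    return [clip for _, clip in sorted(ordered)]
```

Lowering `tolerance` widens the fault-tolerance range at the cost of more false matches.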
Therefore, the automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning solves the problems that existing editing technology has a high entry threshold, cannot edit multiple videos simultaneously, and consumes large amounts of human and time resources.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a schematic flow chart illustrating an embodiment of an automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning according to the present invention;
FIG. 2 is a schematic RVM structure diagram of an embodiment of an automatic clipping method based on deep learning, self-attention mechanism and symbolic reasoning according to the present invention;
FIG. 3 is a schematic diagram of a knowledge graph embedding diagram of an embodiment of an automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning according to the present invention;
FIG. 4 is a schematic structural diagram of the Transformer in an embodiment of the automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning.
Detailed Description
The technical solution of the present invention is further illustrated by the accompanying drawings and examples.
Examples
The invention provides an automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning, comprising the following steps:
S1, establishing a video material library for the field to be publicized, and using the RVM to segment out low-quality video segments, the library containing both low-quality original videos and manually edited high-quality videos;
S2, training the RVM with the database built in step S1 to obtain a network architecture suited to the task, performing supervised training of the original RVM on a larger data set containing low-quality segments and their corresponding high-quality segments, to obtain a network that separates high-quality from low-quality video;
S3, establishing a primitive library for the video content to be described;
S4, organizing the primitives with the HAKE logical inference engine to obtain a series of labels consistent with semantic logic;
S5, establishing the text types requiring semantic understanding, mainly based on a manually annotated data set;
S6, training the transformer with the data set of step S5 to obtain a text understanding network accurate enough to parse editing requirements;
S7, inputting the video to be automatically clipped into the RVM network trained in step S2, and removing flawed parts caused by human error or environmental factors to obtain high-quality video;
S8, inputting the high-quality video obtained in step S7 into the HAKE video understanding engine, and outputting the labeled video;
S9, inputting the editing-requirement text into the transformer model trained in step S6, and outputting labels arranged in semantic order;
S10, comparing and matching the labels obtained in steps S8 and S9;
S11, ranking the videos according to the matching result of step S10;
and S12, integrating the above steps into a single system to simplify operation for the user.
In the method of the invention, a large number of unprocessed video segments are first collected and fed as input into the multi-channel pre-trained RVM network; low-quality segments caused by mis-operation or environmental factors are deleted, and defect-free high-quality segments are output.
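The quality-filtering idea in this paragraph reduces to thresholding a per-segment quality score. The sketch below is a stand-in: `quality_score` represents whatever the pre-trained RVM network predicts, which this document does not expose as a concrete API.

```python
# Minimal sketch of the filtering step above. `quality_score` is a
# hypothetical callable standing in for the RVM network's prediction.
def filter_segments(segments, quality_score, threshold=0.5):
    """Keep only segments whose predicted quality clears the threshold."""
    return [s for s in segments if quality_score(s) >= threshold]
```

The threshold would be tuned on the library of step S1, where low-quality originals and manually edited high-quality versions provide the supervision signal.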
Secondly, after the high-quality video clips are obtained, they enter HAKE as input. HAKE understands the video content through three stages of work: a primitive library for the relevant field is built and its capacity expanded as needed; logical inference rules combine primitives according to linguistic logic; and a CNN labels the video content, outputting the labeled video.
The construction of the primitive library is divided into three steps:
The first step is to recognize two types of entities: entities at different levels of a hierarchy, and entities at the same level.
The second step is hierarchy-aware knowledge graph embedding. HAKE consists of two parts, a modulus part and a phase part, which model the two classes of entities separately. To distinguish the embeddings of the two parts, the modulus part uses e_m and r_m for entity and relation embeddings, while the phase part uses e_p and r_p. HAKE combines the modulus and phase parts by mapping entities into a polar coordinate system, where the radial and angular coordinates correspond to the modulus and phase parts respectively. HAKE maps an entity h to [h_m; h_p], where [·;·] denotes the concatenation of two vectors, and uses the scoring function d_r,m(h, t) = ||h_m ∘ r_m − t_m||_2 (∘ denoting the element-wise product) to evaluate the effect of modulus and phase.
The third step performs text semantic segmentation in parallel with the video segmentation. This task is completed with a Transformer, which consists of self-attention and feed-forward neural network layers. In the Transformer encoder, the data first passes through a module called "self-attention" to obtain a weighted feature vector Z = softmax(Q K^T / sqrt(d_k)) V. After Z is obtained, it is sent to the next module of the encoder, a feed-forward neural network: a two-layer fully connected network whose first layer uses a ReLU activation and whose second layer is linear, which can be expressed as FFN(Z) = max(0, Z W_1 + b_1) W_2 + b_2. Two kinds of attention compute the input and output weights respectively: self-attention captures the relationship between the current element and its context, and encoder-decoder attention captures the relationship between the current decoding step and the encoded feature vectors. This structure is well suited to semantic understanding: the user's requirement text is input into the transformer, and semantic labels logically arranged according to the text semantics are output, mostly in primitive form.
Finally, the video labels are compared with the text labels using a common comparison algorithm and matched within a certain fault-tolerance range; the video contents are then ordered according to the sequence of the text labels, finally yielding high-quality video clips arranged in the semantic order of the text, ready for use as the edited footage.
Therefore, the automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning solves the problems that existing editing technology has a high entry threshold, cannot edit multiple videos simultaneously, and consumes large amounts of human and time resources.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the invention without departing from the spirit and scope of the invention.
Claims (3)
1. An automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning is characterized by comprising the following steps:
S1, establishing a video material library for the field to be publicized, and using the RVM to segment out low-quality video segments, the library containing both low-quality original videos and manually edited high-quality videos;
S2, training the RVM with the database built in step S1 to obtain a network architecture suited to the task, performing supervised training of the original RVM on a larger data set containing low-quality segments and their corresponding high-quality segments, to obtain a network that separates high-quality from low-quality video;
S3, establishing a primitive library for the video content to be described;
S4, organizing the primitives with the HAKE logical inference engine to obtain a series of labels consistent with semantic logic;
S5, establishing the text types requiring semantic understanding, mainly based on a manually annotated data set;
S6, training the transformer with the data set of step S5 to obtain a text understanding network accurate enough to parse editing requirements;
S7, inputting the video to be automatically clipped into the RVM network trained in step S2, and removing flawed parts caused by human error or environmental factors to obtain high-quality video;
S8, inputting the high-quality video obtained in step S7 into the HAKE video understanding engine, and outputting the labeled video;
S9, inputting the editing-requirement text into the transformer model trained in step S6, and outputting labels arranged in semantic order;
S10, comparing and matching the labels obtained in steps S8 and S9;
S11, ranking the videos according to the matching result of step S10;
and S12, integrating the above steps into a single system to simplify operation for the user.
2. The automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning according to claim 1, wherein:
outputting the labeled video specifically comprises first collecting a large number of unprocessed video segments as input to the multi-channel pre-trained RVM network, deleting low-quality segments caused by mis-operation or environmental factors, and outputting defect-free high-quality segments;
secondly, after the high-quality video clips are obtained, they enter HAKE as input, and HAKE understands the video content through three stages of work: a primitive library for the relevant field is built and its capacity expanded as needed; logical inference rules combine primitives according to linguistic logic; and a CNN labels the video content, outputting the labeled video.
3. The automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning according to claim 1, wherein the building work of the primitive library is divided into three steps:
the first step is to recognize two types of entities: entities at different levels of a hierarchy, and entities at the same level;
the second step is hierarchy-aware knowledge graph embedding, wherein HAKE consists of two parts, a modulus part and a phase part, which model the two classes of entities separately; to distinguish the embeddings of the two parts, the modulus part uses e_m and r_m for entity and relation embeddings while the phase part uses e_p and r_p; HAKE combines the modulus and phase parts by mapping entities into a polar coordinate system, where the radial and angular coordinates correspond to the modulus and phase parts respectively; HAKE maps an entity h to [h_m; h_p], where [·;·] denotes the concatenation of two vectors, and uses the scoring function d_r,m(h, t) = ||h_m ∘ r_m − t_m||_2 (∘ denoting the element-wise product) to evaluate the effect of modulus and phase;
the third step performs text semantic segmentation in parallel with the video segmentation, completing the task with a Transformer consisting of self-attention and feed-forward neural network layers, wherein in the Transformer encoder the data first passes through a module called "self-attention" to obtain a weighted feature vector Z = softmax(Q K^T / sqrt(d_k)) V; after Z is obtained, it is sent to the next module of the encoder, a feed-forward neural network, which is a two-layer fully connected network whose first layer uses a ReLU activation and whose second layer is linear, expressed as FFN(Z) = max(0, Z W_1 + b_1) W_2 + b_2; the two attention mechanisms compute input and output weights respectively; the user's requirement text is input into the transformer, and semantic labels logically arranged according to the text semantics are output, displayed in primitive form;
and finally, the video tags are compared with the text tags using a common comparison algorithm and matched within a certain fault-tolerance range; the video contents are ordered according to the sequence of the text tags, finally yielding high-quality video clips arranged in the semantic order of the text, ready for use as the edited footage.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210383218.4A CN114979705A (en) | 2022-04-12 | 2022-04-12 | Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210383218.4A CN114979705A (en) | 2022-04-12 | 2022-04-12 | Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114979705A true CN114979705A (en) | 2022-08-30 |
Family
ID=82978212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210383218.4A Pending CN114979705A (en) | 2022-04-12 | 2022-04-12 | Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114979705A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117692676A (en) * | 2023-12-08 | 2024-03-12 | 广东创意热店互联网科技有限公司 | Video quick editing method based on artificial intelligence technology |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018040059A1 (en) * | 2016-09-02 | 2018-03-08 | Microsoft Technology Licensing, Llc | Clip content categorization |
CN110334219A (en) * | 2019-07-12 | 2019-10-15 | 电子科技大学 | The knowledge mapping for incorporating text semantic feature based on attention mechanism indicates learning method |
CN112423023A (en) * | 2020-12-09 | 2021-02-26 | 珠海九松科技有限公司 | Intelligent automatic video mixed-cutting method |
CN113821679A (en) * | 2021-07-09 | 2021-12-21 | 腾讯科技(深圳)有限公司 | Video frame positioning method, electronic equipment and computer readable storage medium |
CN113870010A (en) * | 2021-09-30 | 2021-12-31 | 平安科技(深圳)有限公司 | Data processing method, device and equipment based on machine learning and storage medium |
CN114026874A (en) * | 2020-10-27 | 2022-02-08 | 深圳市大疆创新科技有限公司 | Video processing method and device, mobile device and readable storage medium |
CN114297440A (en) * | 2021-12-30 | 2022-04-08 | 深圳市富之富信息科技有限公司 | Video automatic generation method and device, computer equipment and storage medium |
- 2022-04-12: Application filed as CN202210383218.4A (CN114979705A), status Pending
Non-Patent Citations (3)
Title |
---|
ASHISH VASWANI ET AL.: "Attention Is All You Need", 《PROCEEDINGS OF THE 31ST INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING SYSTEMS》 * |
YONG-LU LI ET AL.: "HAKE: A Knowledge Engine Foundation for Human Activity Understanding", 《ARXIV》, pages 4 - 5 * |
ZHANQIU ZHANG ET AL.: "Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction", 《ARXIV》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108984683B (en) | Method, system, equipment and storage medium for extracting structured data | |
WO2023065617A1 (en) | Cross-modal retrieval system and method based on pre-training model and recall and ranking | |
CN111709518A (en) | Method for enhancing network representation learning based on community perception and relationship attention | |
CN109874053A (en) | The short video recommendation method with user's dynamic interest is understood based on video content | |
CN108427740B (en) | Image emotion classification and retrieval algorithm based on depth metric learning | |
CN112580362B (en) | Visual behavior recognition method, system and computer readable medium based on text semantic supervision | |
US11876986B2 (en) | Hierarchical video encoders | |
CN111651566B (en) | Multi-task small sample learning-based referee document dispute focus extraction method | |
CN113051914A (en) | Enterprise hidden label extraction method and device based on multi-feature dynamic portrait | |
CN116049397A (en) | Sensitive information discovery and automatic classification method based on multi-mode fusion | |
CN111353314A (en) | Story text semantic analysis method for animation generation | |
WO2023124647A1 (en) | Summary determination method and related device thereof | |
CN113657115A (en) | Multi-modal Mongolian emotion analysis method based on ironic recognition and fine-grained feature fusion | |
CN112800263A (en) | Video synthesis system, method and medium based on artificial intelligence | |
CN115563342A (en) | Method, system, equipment and storage medium for video theme retrieval | |
CN114979705A (en) | Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning | |
CN114328939A (en) | Natural language processing model construction method based on big data | |
CN112528642B (en) | Automatic implicit chapter relation recognition method and system | |
CN117173730A (en) | Document image intelligent analysis and processing method based on multi-mode information | |
CN107491814B (en) | Construction method of process case layered knowledge model for knowledge push | |
CN112800259B (en) | Image generation method and system based on edge closure and commonality detection | |
CN114842301A (en) | Semi-supervised training method of image annotation model | |
CN111708896B (en) | Entity relationship extraction method applied to biomedical literature | |
CN110659390A (en) | Video content retrieval method based on deep convolutional network | |
CN116611514B (en) | Value orientation evaluation system construction method based on data driving |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20220830 |