CN114979705A - Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning - Google Patents

Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning Download PDF

Info

Publication number
CN114979705A
Authority
CN
China
Prior art keywords: video, text, quality, hake, self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210383218.4A
Other languages
Chinese (zh)
Inventor
Zhou Jinglin (周景林)
Cao Hanyang (曹瀚洋)
Zhou Yixi (周奕希)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202210383218.4A
Publication of CN114979705A
Legal status: Pending (current)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Biophysics (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses an automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning, comprising the following steps: establishing a video material library for the field to be promoted; training an RVM network on the constructed database; establishing a primitive library for the video content to be described; organizing the primitives with the HAKE logical inference engine; establishing the text types that require semantic understanding; training a transformer on a data set to obtain a text understanding network; inputting the video to be automatically clipped into the RVM network; inputting the resulting video into the HAKE video understanding engine and outputting labeled video; inputting the clipping requirement text into the transformer model and outputting labels arranged in semantic order; comparing and matching the obtained labels; sorting the videos by the matching results; and integrating the above steps into a single system that simplifies user-facing operation. The invention addresses the problems that existing editing techniques demand a high level of skill, cannot clip multiple videos simultaneously, and consume large amounts of human and time resources.

Description

Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning
Technical Field
The invention relates to the technical field of automatic video editing methods based on deep learning, and in particular to an automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning.
Background
With the development of science and technology, public content consumption has undergone enormous changes, from text to pictures and from pictures to video. Compared with pictures and text, video is more three-dimensional and intuitive, and has become an important window connecting people with society. Meanwhile, facing a massive influx of information, the public's demands on content are gradually rising. Video has changed from a purely entertainment medium into an important channel for acquiring news and knowledge. Therefore, with socio-economic development and the transformation of communication media, research on automatic video generation urgently needs to advance.
In automatic video generation, classification according to feature values of the footage is the key to the whole generation process. In traditional video editing, the type and characteristics of each video segment are determined mainly by the editor's experience and subjective judgment, and the segments are fitted into a video logic the editor has conceived in advance. Although rising industry entry thresholds and continuously upgraded editing software have improved the quality and efficiency of video production to some extent, editing becomes ever more complex as footage accumulates and requirements grow. Traditional methods therefore can no longer meet current demands, whereas neural networks have clear advantages in processing such information and in solving classification problems with fuzzy labels. Existing editing techniques demand a high level of skill, cannot clip multiple videos simultaneously, and consume large amounts of human and time resources.
Disclosure of Invention
The invention aims to provide an automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning, solving the problems that existing editing techniques demand a high level of skill, cannot clip multiple videos simultaneously, and consume large amounts of human and time resources.
In order to achieve the above object, the present invention provides an automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning, comprising the following steps:
S1, establishing a video material library for the field to be promoted, and using an RVM to segment out low-quality video segments, wherein the library is required to contain both low-quality original videos and manually edited high-quality videos;
S2, training the RVM on the database built in step S1 to obtain a network architecture suited to the task, in which the original RVM is supervised-trained on a larger data set containing low-quality segments and their corresponding high-quality segments, yielding a network suited to separating video segments of high and low quality;
S3, establishing a primitive library for the video content to be described;
S4, organizing the primitives with the HAKE logical inference engine to obtain a series of labels conforming to semantic logic;
S5, establishing the text types that require semantic understanding, mainly drawing on manually annotated related data sets;
S6, training a transformer on the data set of step S5 to obtain a text understanding network accurate enough to parse clipping requirements;
S7, inputting the video to be automatically clipped into the RVM network trained in step S2, removing flawed parts caused by human error or environmental factors, and obtaining high-quality video;
S8, inputting the high-quality video obtained in step S7 into the HAKE video understanding engine, and outputting labeled video;
S9, inputting the clipping requirement text into the transformer model trained in step S6, and outputting labels arranged in semantic order;
S10, comparing and matching the labels obtained in steps S8 and S9;
S11, sorting the videos according to the matching results of step S10;
and S12, integrating the above steps into a single system that simplifies user-facing operation.
Preferably, outputting labeled video specifically comprises: first, collecting a large number of unprocessed video segments as input to the multi-channel pre-trained RVM network, deleting low-quality segments presumed to result from operator error or environmental factors, and outputting defect-free high-quality segments;
secondly, after the high-quality video clips are obtained, they enter HAKE as input; HAKE understands the video content through three stages of work: building a primitive library for the relevant field and continuously expanding its capacity as required, combining primitives according to linguistic logic using logical inference rules, and labeling the video content with a CNN; labeled video is then output.
Preferably, the construction of the primitive library is divided into three steps:
in the first step, two types of entities are identified: entities at different levels of the hierarchy and entities at the same level;
in the second step, hierarchy-aware knowledge graph embedding is performed; HAKE consists of two parts, a modulus part and a phase part, which model the two different classes of entities respectively; to distinguish the embeddings of the two parts, the modulus part uses e_m and r_m to denote entity embedding and relation embedding, while the phase part uses e_p and r_p; HAKE combines the modulus and phase parts, mapping the entities into a polar coordinate system in which the radial coordinate corresponds to the modulus part and the angular coordinate to the phase part; HAKE maps an entity h to [h_m; h_p], where [·;·] denotes the concatenation of two vectors, and uses the scoring function d_{r,m}(h, t) = ||h_m ∘ r_m - t_m||_2 (∘ being the element-wise product) to evaluate the effect of modulus and phase (a numerical sketch of this scoring appears at the end of this section);
the third step is parallel to the video segmentationPerforming text semantic segmentation, and completing the task by adopting a Transformer which consists of self-attention and Feed Forward Neural Network only, wherein in an encoder of the Transformer, data is firstly subjected to a module called self-attention to obtain a weighted feature vector Z which is expressed as a self-attention module
Figure BDA0003592662180000031
After obtaining Z, it is sent to the next module of the encoder, i.e. a Feed Forward Neural Network, which is fully connected with two layers, the activation function of the first layer is ReLU and the second layer is a linear activation function, which can be expressed as ffn (Z) ═ max (0, ZW) 1 +b 1 )W 2 +b 2 The two attentions are respectively used for calculating input and output weights, inputting a required text of a user into a transformer, outputting semantic labels which are logically arranged according to the text semantics through understanding the text, and mostly presenting in a primitive form;
and finally, comparing the video labels with the text labels by adopting a common comparison algorithm, matching within a certain fault-tolerant range, sequencing the video contents according to the sequence of the text labels, and finally obtaining high-quality video fragments which are arranged according to the text semantic sequence, wherein the video fragments can be used as fragments.
Therefore, the automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning solves the problems that existing editing techniques demand a high level of skill, cannot clip multiple videos simultaneously, and consume large amounts of human and time resources.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a schematic flow chart illustrating an embodiment of an automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning according to the present invention;
FIG. 2 is a schematic RVM structure diagram of an embodiment of an automatic clipping method based on deep learning, self-attention mechanism and symbolic reasoning according to the present invention;
FIG. 3 is a schematic diagram of a knowledge graph embedding diagram of an embodiment of an automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning according to the present invention;
FIG. 4 is a schematic structural diagram of a transformer in an embodiment of an automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning according to the present invention.
Detailed Description
The technical solution of the present invention is further illustrated by the accompanying drawings and examples.
Examples
The invention provides an automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning, comprising the following steps:
S1, establishing a video material library for the field to be promoted, and using an RVM to segment out low-quality video segments, wherein the library is required to contain both low-quality original videos and manually edited high-quality videos;
S2, training the RVM on the database built in step S1 to obtain a network architecture suited to the task, in which the original RVM is supervised-trained on a larger data set containing low-quality segments and their corresponding high-quality segments, yielding a network suited to separating video segments of high and low quality;
S3, establishing a primitive library for the video content to be described;
S4, organizing the primitives with the HAKE logical inference engine to obtain a series of labels conforming to semantic logic;
S5, establishing the text types that require semantic understanding, mainly drawing on manually annotated related data sets;
S6, training a transformer on the data set of step S5 to obtain a text understanding network accurate enough to parse clipping requirements;
S7, inputting the video to be automatically clipped into the RVM network trained in step S2, removing flawed parts caused by human error or environmental factors, and obtaining high-quality video;
S8, inputting the high-quality video obtained in step S7 into the HAKE video understanding engine, and outputting labeled video;
S9, inputting the clipping requirement text into the transformer model trained in step S6, and outputting labels arranged in semantic order;
S10, comparing and matching the labels obtained in steps S8 and S9;
S11, sorting the videos according to the matching results of step S10;
and S12, integrating the above steps into a single system that simplifies user-facing operation (a code skeleton of steps S7 to S11 follows below).
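As an orientation aid, steps S7 through S11 can be read as the following Python skeleton. The four component functions stand for the trained RVM, the HAKE engine, the transformer, and the matcher; they are passed in as parameters because this sketch does not commit to any concrete implementation, and the file names and tags in the smoke test are invented for illustration.

def auto_edit(raw_videos, requirement_text,
              rvm_filter, hake_label, transformer_tags, match_and_sort):
    # Steps S7-S11 of the method, expressed over four injected components
    clips = []
    for video in raw_videos:
        for segment in rvm_filter(video):       # S7: keep high-quality segments
            clips.append(hake_label(segment))   # S8: attach semantic labels
    text_tags = transformer_tags(requirement_text)  # S9: labels in semantic order
    return match_and_sort(clips, text_tags)         # S10-S11: compare, match, sort

# Tiny smoke test with stand-in components
print(auto_edit(
    ["raw.mp4"], "opening speech then applause",
    rvm_filter=lambda v: [v + "#seg1"],
    hake_label=lambda s: (s, ["speech"]),
    transformer_tags=lambda t: ["speech", "applause"],
    match_and_sort=lambda clips, tags: [c for c, _ in clips],
))  # -> ['raw.mp4#seg1']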
In the method provided by the invention, a large number of unprocessed video segments are first collected as input to the multi-channel pre-trained RVM network; low-quality segments presumed to result from operator error or environmental factors are deleted, and defect-free high-quality segments are output.
Secondly, after the high-quality video clips are obtained, they enter HAKE as input; HAKE understands the video content through three stages of work: building a primitive library for the relevant field and continuously expanding its capacity as required, combining primitives according to linguistic logic using logical inference rules, and labeling the video content with a CNN; labeled video is then output.
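As a toy illustration of "combining primitives according to linguistic logic", the following sketch composes hypothetical detected primitives into semantic labels with one-step lookup rules. A real HAKE inference engine chains such rules; every primitive and label name here is invented for the example.

# Hypothetical inference rules: a set of co-occurring primitives -> label
RULES = {
    frozenset({"person", "hold", "microphone"}): "interview",
    frozenset({"person", "stand", "podium"}): "speech",
}

def label_clip(detected_primitives):
    # Emit every label whose rule premises are all present in the clip
    found = set(detected_primitives)
    return [label for premises, label in RULES.items() if premises <= found]

print(label_clip({"person", "hold", "microphone", "indoor"}))  # ['interview']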
Finally, the construction of the primitive library is divided into three steps:
in the first step, two types of entities are identified: entities at different levels of the hierarchy and entities at the same level;
in the second step, hierarchy-aware knowledge graph embedding is performed; HAKE consists of two parts, a modulus part and a phase part, which model the two different classes of entities respectively; to distinguish the embeddings of the two parts, the modulus part uses e_m and r_m to denote entity embedding and relation embedding, while the phase part uses e_p and r_p; HAKE combines the modulus and phase parts, mapping the entities into a polar coordinate system in which the radial coordinate corresponds to the modulus part and the angular coordinate to the phase part; HAKE maps an entity h to [h_m; h_p], where [·;·] denotes the concatenation of two vectors, and uses the scoring function d_{r,m}(h, t) = ||h_m ∘ r_m - t_m||_2 (∘ being the element-wise product) to evaluate the effect of modulus and phase.
In the third step, parallel to the video segmentation, text semantic segmentation is performed; this task is completed with a Transformer, which consists only of self-attention and feed-forward neural network modules. In the Transformer encoder, the data first passes through the self-attention module to obtain a weighted feature vector Z, expressed as
Z = Attention(Q, K, V) = softmax(QK^T / √d_k) V.
After Z is obtained, it is sent to the next module of the encoder, a feed-forward neural network, which is a two-layer fully connected network whose first layer uses a ReLU activation and whose second layer is linear; it can be expressed as FFN(Z) = max(0, Z W_1 + b_1) W_2 + b_2. Two kinds of attention are used to compute the weights of input and output respectively: self-attention captures the relationship between the current translation and the already-translated context, while encoder-decoder attention captures the relationship between the current translation and the encoded feature vectors. This structure is well suited to the task of semantic understanding: the user's requirement text is input into the transformer, which outputs semantic labels logically arranged according to the text semantics, mostly presented in primitive form (both computations are sketched in code below).
And finally, a common comparison algorithm compares the video labels with the text labels, matching within a certain fault-tolerance range; the video contents are then ordered according to the sequence of the text labels, finally yielding high-quality video clips arranged in the text's semantic order, which can be used as the final cut (a sketch of this matching step follows after the attention example below).
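The attention and feed-forward computations described above can be written compactly as follows. This is a minimal NumPy sketch of a single encoder step (single head, no layer normalization or residual connections), with all shapes and weights chosen purely for illustration.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Z = softmax(Q K^T / sqrt(d_k)) V, the weighted feature vector Z above
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def ffn(Z, W1, b1, W2, b2):
    # FFN(Z) = max(0, Z W1 + b1) W2 + b2: ReLU layer, then a linear layer
    return np.maximum(0.0, Z @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))                   # 5 tokens, model width 16
W_q, W_k, W_v = (rng.standard_normal((16, 16)) for _ in range(3))
Z = self_attention(X, W_q, W_k, W_v)
out = ffn(Z, rng.standard_normal((16, 32)), np.zeros(32),
          rng.standard_normal((32, 16)), np.zeros(16))
print(out.shape)                                   # (5, 16)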
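For the final comparison step, the text does not name a specific algorithm; the sketch below uses Python's standard-library difflib as one plausible "common comparison algorithm", with the fault-tolerance range expressed as a similarity threshold. Clip IDs, tags, and the threshold value are illustrative assumptions.

from difflib import SequenceMatcher

def tag_similarity(video_tags, text_tags):
    # Ratio in [0, 1] between two tag sequences; 1.0 means identical
    return SequenceMatcher(None, video_tags, text_tags).ratio()

def order_clips(clips, text_tags, threshold=0.5):
    # clips: list of (clip_id, tag list). Keep clips whose tags are similar
    # enough to the text labels, then order them by where their first tag
    # appears in the text label sequence (the text's semantic order).
    kept = []
    for clip_id, v_tags in clips:
        if tag_similarity(v_tags, text_tags) >= threshold:
            pos = min((text_tags.index(t) for t in v_tags if t in text_tags),
                      default=len(text_tags))
            kept.append((pos, clip_id))
    return [clip_id for _, clip_id in sorted(kept)]

clips = [("b.mp4", ["crowd", "applause"]), ("a.mp4", ["speaker", "podium"])]
text_tags = ["speaker", "podium", "crowd", "applause"]
print(order_clips(clips, text_tags))  # ['a.mp4', 'b.mp4']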
Therefore, the automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning solves the problems that existing editing techniques demand a high level of skill, cannot clip multiple videos simultaneously, and consume large amounts of human and time resources.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the invention without departing from the spirit and scope of the invention.

Claims (3)

1. An automatic editing method based on deep learning, a self-attention mechanism and symbolic reasoning, characterized by comprising the following steps:
S1, establishing a video material library for the field to be promoted, and using an RVM to segment out low-quality video segments, wherein the library is required to contain both low-quality original videos and manually edited high-quality videos;
S2, training the RVM on the database built in step S1 to obtain a network architecture suited to the task, in which the original RVM is supervised-trained on a larger data set containing low-quality segments and their corresponding high-quality segments, yielding a network suited to separating video segments of high and low quality;
S3, establishing a primitive library for the video content to be described;
S4, organizing the primitives with the HAKE logical inference engine to obtain a series of labels conforming to semantic logic;
S5, establishing the text types that require semantic understanding, mainly drawing on manually annotated related data sets;
S6, training a transformer on the data set of step S5 to obtain a text understanding network accurate enough to parse clipping requirements;
S7, inputting the video to be automatically clipped into the RVM network trained in step S2, removing flawed parts caused by human error or environmental factors, and obtaining high-quality video;
S8, inputting the high-quality video obtained in step S7 into the HAKE video understanding engine, and outputting labeled video;
S9, inputting the clipping requirement text into the transformer model trained in step S6, and outputting labels arranged in semantic order;
S10, comparing and matching the labels obtained in steps S8 and S9;
S11, sorting the videos according to the matching results of step S10;
and S12, integrating the above steps into a single system that simplifies user-facing operation.
2. The automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning according to claim 1, wherein:
outputting labeled video specifically comprises: first, collecting a large number of unprocessed video segments as input to the multi-channel pre-trained RVM network, deleting low-quality segments caused by operator error or environmental factors, and outputting defect-free high-quality segments;
secondly, after the high-quality video clips are obtained, they enter HAKE as input; HAKE understands the video content through three stages of work: building a primitive library for the relevant field and continuously expanding its capacity as required, combining primitives according to linguistic logic using logical inference rules, and labeling the video content with a CNN; labeled video is then output.
3. The automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning according to claim 1, wherein the construction of the primitive library is divided into three steps:
in the first step, two types of entities are identified: entities at different levels of the hierarchy and entities at the same level;
in the second step, hierarchy-aware knowledge graph embedding is performed; HAKE consists of two parts, a modulus part and a phase part, which model the two different classes of entities respectively; to distinguish the embeddings of the two parts, the modulus part uses e_m and r_m to denote entity embedding and relation embedding, while the phase part uses e_p and r_p; HAKE combines the modulus and phase parts, mapping the entities into a polar coordinate system in which the radial coordinate corresponds to the modulus part and the angular coordinate to the phase part; HAKE maps an entity h to [h_m; h_p], where [·;·] denotes the concatenation of two vectors, and uses the scoring function d_{r,m}(h, t) = ||h_m ∘ r_m - t_m||_2 (∘ being the element-wise product) to evaluate the effect of modulus and phase;
in the third step, parallel to the video segmentation, text semantic segmentation is performed; this task is completed with a Transformer, which consists only of self-attention and feed-forward neural network modules; in the Transformer encoder, the data first passes through the self-attention module to obtain a weighted feature vector Z, expressed as
Z = Attention(Q, K, V) = softmax(QK^T / √d_k) V;
after Z is obtained, it is sent to the next module of the encoder, a feed-forward neural network, which is a two-layer fully connected network whose first layer uses a ReLU activation and whose second layer is linear, expressed as FFN(Z) = max(0, Z W_1 + b_1) W_2 + b_2; the two kinds of attention are used to compute input and output weights respectively; the user's requirement text is input into the transformer, which, by understanding the text, outputs semantic labels logically arranged according to the text semantics, presented in primitive form;
and finally, a common comparison algorithm compares the video labels with the text labels, matching within a certain fault-tolerance range; the video contents are ordered according to the sequence of the text labels, finally yielding high-quality video clips arranged in the text's semantic order, which can be used as the final cut.
CN202210383218.4A 2022-04-12 2022-04-12 Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning Pending CN114979705A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210383218.4A CN114979705A (en) 2022-04-12 2022-04-12 Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210383218.4A CN114979705A (en) 2022-04-12 2022-04-12 Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning

Publications (1)

Publication Number Publication Date
CN114979705A true CN114979705A (en) 2022-08-30

Family

ID=82978212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210383218.4A Pending CN114979705A (en) 2022-04-12 2022-04-12 Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning

Country Status (1)

Country Link
CN (1) CN114979705A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117692676A (en) * 2023-12-08 2024-03-12 广东创意热店互联网科技有限公司 Video quick editing method based on artificial intelligence technology

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018040059A1 (en) * 2016-09-02 2018-03-08 Microsoft Technology Licensing, Llc Clip content categorization
CN110334219A (en) * 2019-07-12 2019-10-15 电子科技大学 The knowledge mapping for incorporating text semantic feature based on attention mechanism indicates learning method
CN112423023A (en) * 2020-12-09 2021-02-26 珠海九松科技有限公司 Intelligent automatic video mixed-cutting method
CN113821679A (en) * 2021-07-09 2021-12-21 腾讯科技(深圳)有限公司 Video frame positioning method, electronic equipment and computer readable storage medium
CN113870010A (en) * 2021-09-30 2021-12-31 平安科技(深圳)有限公司 Data processing method, device and equipment based on machine learning and storage medium
CN114026874A (en) * 2020-10-27 2022-02-08 深圳市大疆创新科技有限公司 Video processing method and device, mobile device and readable storage medium
CN114297440A (en) * 2021-12-30 2022-04-08 深圳市富之富信息科技有限公司 Video automatic generation method and device, computer equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018040059A1 (en) * 2016-09-02 2018-03-08 Microsoft Technology Licensing, Llc Clip content categorization
CN110334219A (en) * 2019-07-12 2019-10-15 电子科技大学 The knowledge mapping for incorporating text semantic feature based on attention mechanism indicates learning method
CN114026874A (en) * 2020-10-27 2022-02-08 深圳市大疆创新科技有限公司 Video processing method and device, mobile device and readable storage medium
CN112423023A (en) * 2020-12-09 2021-02-26 珠海九松科技有限公司 Intelligent automatic video mixed-cutting method
CN113821679A (en) * 2021-07-09 2021-12-21 腾讯科技(深圳)有限公司 Video frame positioning method, electronic equipment and computer readable storage medium
CN113870010A (en) * 2021-09-30 2021-12-31 平安科技(深圳)有限公司 Data processing method, device and equipment based on machine learning and storage medium
CN114297440A (en) * 2021-12-30 2022-04-08 深圳市富之富信息科技有限公司 Video automatic generation method and device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ASHISH VASWANI ET AL.: "Attention Is All You Need", Proceedings of the 31st International Conference on Neural Information Processing Systems *
YONG-LU LI ET AL.: "HAKE: A Knowledge Engine Foundation for Human Activity Understanding", arXiv, pages 4-5 *
ZHANQIU ZHANG ET AL.: "Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction", arXiv *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117692676A (en) * 2023-12-08 2024-03-12 广东创意热店互联网科技有限公司 Video quick editing method based on artificial intelligence technology

Similar Documents

Publication Publication Date Title
CN108984683B (en) Method, system, equipment and storage medium for extracting structured data
WO2023065617A1 (en) Cross-modal retrieval system and method based on pre-training model and recall and ranking
CN111709518A (en) Method for enhancing network representation learning based on community perception and relationship attention
CN109874053A (en) The short video recommendation method with user's dynamic interest is understood based on video content
CN108427740B (en) Image emotion classification and retrieval algorithm based on depth metric learning
CN112580362B (en) Visual behavior recognition method, system and computer readable medium based on text semantic supervision
US11876986B2 (en) Hierarchical video encoders
CN111651566B (en) Multi-task small sample learning-based referee document dispute focus extraction method
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
CN116049397A (en) Sensitive information discovery and automatic classification method based on multi-mode fusion
CN111353314A (en) Story text semantic analysis method for animation generation
WO2023124647A1 (en) Summary determination method and related device thereof
CN113657115A (en) Multi-modal Mongolian emotion analysis method based on ironic recognition and fine-grained feature fusion
CN112800263A (en) Video synthesis system, method and medium based on artificial intelligence
CN115563342A (en) Method, system, equipment and storage medium for video theme retrieval
CN114979705A (en) Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning
CN114328939A (en) Natural language processing model construction method based on big data
CN112528642B (en) Automatic implicit chapter relation recognition method and system
CN117173730A (en) Document image intelligent analysis and processing method based on multi-mode information
CN107491814B (en) Construction method of process case layered knowledge model for knowledge push
CN112800259B (en) Image generation method and system based on edge closure and commonality detection
CN114842301A (en) Semi-supervised training method of image annotation model
CN111708896B (en) Entity relationship extraction method applied to biomedical literature
CN110659390A (en) Video content retrieval method based on deep convolutional network
CN116611514B (en) Value orientation evaluation system construction method based on data driving

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220830

RJ01 Rejection of invention patent application after publication