CN115134660A - Video editing method and device, computer equipment and storage medium - Google Patents

Video editing method and device, computer equipment and storage medium

Info

Publication number
CN115134660A
Authority
CN
China
Prior art keywords
video
text
knowledge point
target
source file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210737613.8A
Other languages
Chinese (zh)
Inventor
马亿凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202210737613.8A
Publication of CN115134660A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/2343 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 21/234336 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by media transcoding, e.g. video is transformed into a slideshow of still pictures or audio is converted into text
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440236 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/854 Content authoring
    • H04N 21/8549 Creating video summaries, e.g. movie trailer

Abstract

The present application relates to the technical field of audio and video processing, and provides a video clipping method and apparatus, a computer device, and a storage medium. The method comprises: when a video clipping instruction sent by a first target account is received, reading a target video source file according to the instruction, extracting target audio information from the file, and obtaining the corresponding target video text through text conversion; determining at least one knowledge point text from the target video text based on a preset dictionary; determining the video clip corresponding to each knowledge point text according to its position on the time axis of the target video source file; and splicing the video clips corresponding to the same kind of knowledge point text according to a preset video splicing strategy, outputting short videos corresponding to different knowledge point texts. Because the target video source file is clipped on the basis of the knowledge point texts it contains, video clipping efficiency is improved.

Description

Video editing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of audio and video processing technologies, and in particular, to a video editing method, a video editing apparatus, a computer device, and a storage medium.
Background
Video editing means nonlinearly editing a video source with software: added materials such as pictures, background music, special effects, and scenes are remixed with the video, the video source is cut and recombined, and new videos with different expressive force are generated through secondary encoding.
In many enterprises today, staff who lack video editing skills must organize video source files in advance and write them up as short-video scripts, which are then submitted to a vendor that performs the editing according to those scripts. This process is cumbersome. Moreover, for video source files such as training materials, if the enterprise staff understand the training content only incompletely, the resulting short-video scripts are easily unclear, which degrades the editing result. In summary, the editing efficiency of existing video clipping schemes is low.
Disclosure of Invention
Therefore, it is necessary to provide a video clipping method that solves the problem of low editing efficiency in existing video clipping schemes.
A first aspect of an embodiment of the present application provides a video clipping method, including:
reading a target video source file in response to a video clip instruction sent by a first target account;
extracting target audio information from the target video source file, and performing text conversion on the target audio information by adopting a preset voice recognition model to obtain a target video text;
matching at least one knowledge point text from the target video text based on a preset dictionary; the preset dictionary comprises keywords matched with the knowledge point text;
determining a video clip corresponding to each knowledge point text according to the position of each knowledge point text on a time axis corresponding to the target video source file;
splicing the video clips corresponding to the same kind of knowledge point texts according to a preset video splicing strategy, and outputting short videos corresponding to different knowledge point texts; knowledge point texts of the same kind are those matched to the same keyword, and knowledge point texts of different kinds are those matched to different keywords.
A second aspect of embodiments of the present application provides a video clipping apparatus, including:
a response module, configured to read a target video source file in response to a video clipping instruction sent by a first target account;
a conversion module, configured to extract target audio information from the target video source file and to perform text conversion on the target audio information with a preset speech recognition model to obtain a target video text;
a matching module, configured to match at least one knowledge point text from the target video text based on a preset dictionary, the preset dictionary comprising keywords matched with the knowledge point texts;
a determination module, configured to determine the video clip corresponding to each knowledge point text according to the position of each knowledge point text on the time axis corresponding to the target video source file;
an output module, configured to splice the video clips corresponding to the same kind of knowledge point texts according to a preset video splicing strategy and to output short videos corresponding to different knowledge point texts, knowledge point texts of the same kind being those matched to the same keyword and knowledge point texts of different kinds being those matched to different keywords.
A third aspect of embodiments of the present application provides a computer device comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, the processor implementing the above video clipping method when executing the computer readable instructions.
A fourth aspect of embodiments of the present application provides one or more readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the video clipping method as described above.
The implementation of the video clipping method, the video clipping device, the computer device and the storage medium provided by the embodiment of the application has the following beneficial effects:
the application provides a video clipping method, when a video clipping instruction sent by a first target account is received, a target video source file is read according to the video clipping instruction, in order to clip the target video source file more accurately according to a knowledge point in the target video source file, target audio information is extracted from the target video source file, and a target video text corresponding to a target video source file is obtained through text conversion. The preset dictionary comprises keywords matched with the knowledge point texts, so that at least one knowledge point text can be determined from the target video text based on the preset dictionary, then the video clips corresponding to the knowledge point texts are determined according to the position of each knowledge point text on the time axis corresponding to the target video source file, the video clips corresponding to the same knowledge point text are spliced according to a preset video splicing strategy, and short videos corresponding to different knowledge point texts are output. According to the scheme, after the video clipping instruction is received, the target video source file is obtained according to the video clipping instruction, video clipping is carried out on the target video source file based on the knowledge point text, intervention of a supplier is not needed, knowledge point classification can be carried out on the target video source file more accurately, and the video clipping efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments of the present application will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a schematic diagram of an application environment of a video editing method in an embodiment of the present application;
FIG. 2 is a schematic flow chart of an implementation of a video clipping method in an embodiment of the present application;
FIG. 3 is a schematic diagram of a video editing apparatus according to an embodiment of the present application;
fig. 4 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic view of an application environment of a video clipping method in an embodiment of the present application, and as shown in fig. 1, a first target account sends a video clipping instruction through a client, a server receives and responds to the video clipping instruction, reads a target video source file, clips the target video source file into short videos corresponding to different knowledge point texts, and returns the short videos to the client. The client includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server may be implemented by an independent server or a server cluster composed of a plurality of servers, or may be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like. The user terminals of different service systems can interact with the server simultaneously or with a specific server in the server cluster.
The embodiments of the present application can acquire and process the related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by one, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
In specific implementation, the server responds to a video clipping instruction sent by the first target account and reads the target video source file according to the instruction. The server then performs text conversion on the target audio information of the target video source file with a preset speech recognition model to obtain a target video text. Based on a preset dictionary configured according to the target video source file, the server matches at least one knowledge point text from the target video text, determines the video clip corresponding to each knowledge point text according to its position on the time axis of the target video source file, splices the video clips corresponding to the same kind of knowledge point text according to a preset video splicing strategy, and outputs short videos corresponding to different knowledge point texts. Because the scheme clips the target video source file on the basis of its knowledge point texts once the clipping instruction is received, no vendor intervention is needed, the file's knowledge points can be classified more accurately, and video clipping efficiency is improved.
Referring to fig. 2, fig. 2 is a flowchart of an implementation of the video clipping method in an embodiment of the present application, described here as applied to the server in fig. 1. The method includes the following steps:
s11: and reading the target video source file in response to the video clip instruction sent by the first target account.
In step S11, the first target account refers to an account registered at the enterprise client, through which videos can be uploaded, addresses of video source files can be uploaded, videos can be viewed, videos can be evaluated, a video clip instruction can be sent, and the like. The video clip instruction contains address information of the video source file. The target video source file refers to a video source file to be clipped corresponding to the video clipping instruction.
In this embodiment, video sources are diverse: they may come from external websites or be recorded offline by the enterprise. Acquiring the video source file directly through an interface-based docking scheme is therefore impractical. Instead, enterprise staff can upload the address information of the target video source file at the user terminal, which packages the address information into the video clipping instruction and sends it to the server; the server receives the instruction and obtains the target video source file by parsing it.
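As a rough illustration, the following sketch shows how a server might parse such a clipping instruction and fetch the source file. The JSON field names ("account_id", "source_url") are assumptions; the text only states that the instruction carries the address information of the video source file.

```python
# A minimal sketch of instruction parsing, under assumed field names.
import json
import urllib.request

def read_target_source_file(instruction_bytes: bytes, dest_path: str) -> str:
    """Parse a video-clipping instruction and fetch the target video source file."""
    instruction = json.loads(instruction_bytes)
    source_url = instruction["source_url"]   # address info packed by the client (assumed key)
    account_id = instruction["account_id"]   # the first target account (assumed key)
    # Download the source file to local storage for subsequent processing.
    urllib.request.urlretrieve(source_url, dest_path)
    return dest_path
```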
S12: and extracting target audio information from the target video source file, and performing text conversion on the target audio information by adopting a preset voice recognition model to obtain a target video text.
In step S12, the target video source file includes target audio information, images, and the like. The preset Speech Recognition model may be an ASR (Automatic Speech Recognition) model, a Wav2Vec model, or the like, for converting audio into text.
In this embodiment, in order to clip the target video source file according to its content, that content must first be obtained. The scheme therefore performs text conversion on the target audio information of the target video source file with the preset speech recognition model to obtain the target video text, which is represented as vectors so that the server can process it. As one implementation, since noise during conversion may affect the resulting text in a practical setting, the preset speech recognition model can be optimized with the LMS (Least Mean Square) algorithm to remove environmental noise during conversion and thereby improve the efficiency and accuracy of text recognition. The LMS algorithm is common in adaptive filtering: it has low computational complexity, converges well when the signal is stationary, converges in expectation to the Wiener solution without bias, and remains stable under finite-precision arithmetic, which makes it well suited to denoising.
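A minimal sketch of this step follows, assuming ffmpeg is available for the audio-extraction part. The `transcribe` function is a hypothetical placeholder for whichever preset speech recognition model (ASR, Wav2Vec, or another) is actually deployed, and the word-level timestamp output format is an assumption motivated by step S14 below.

```python
# Sketch of step S12: extract audio with ffmpeg, then hand it to an ASR model.
import subprocess

def extract_audio(video_path: str, audio_path: str = "audio.wav") -> str:
    """Strip the video stream and keep 16 kHz mono PCM audio for ASR."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn",
         "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", audio_path],
        check=True,
    )
    return audio_path

def transcribe(audio_path: str) -> list[dict]:
    """Hypothetical placeholder for the preset speech recognition model.

    Assumed to return word-level results with timestamps, e.g.
    [{"word": "customer list", "start": 600.0, "end": 601.2}, ...],
    since step S14 needs each word's position on the time axis.
    """
    raise NotImplementedError("plug in the deployed ASR model here")
```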
S13: and matching at least one knowledge point text from the target video text based on a preset dictionary.
In step S13, the preset dictionary is configured according to the target video source file and comprises the keywords against which knowledge point texts are matched.
In this embodiment, relevant enterprise personnel may, in advance, label keywords on the knowledge framework, professional terms, and similar elements in the electronic materials (in PPT, Word, and similar formats) that correspond to the target video source file. The labeled keywords are taken as the possible knowledge points of the corresponding video, and the preset dictionary is built from them. After the target video text is obtained, similarity matching between it and each keyword in the preset dictionary determines at least one knowledge point text. It should be noted that each keyword in the preset dictionary is represented by a vector.
As an example, Table 1 below shows a preset dictionary configured according to a target video source file. The dictionary contains different topic keywords and, under each topic, the corresponding content keywords; both the topic keywords and the content keywords are used for knowledge point text matching. (A minimal sketch of such a dictionary in code follows the table.)
TABLE 1 [reproduced as an image in the original publication; it lists the topic keywords and the content keywords under each topic]
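The sketch below illustrates one possible in-memory form of such a dictionary. Because the published table is an image, the topics and keywords are invented placeholders; only the two-level structure, topic keywords mapping to content keywords, follows the text.

```python
# A minimal sketch of the preset dictionary structure (entries are illustrative).
PRESET_DICTIONARY: dict[str, list[str]] = {
    # topic keyword: content keywords under that topic (placeholder values)
    "customer management": ["customer list", "target group", "demand analysis"],
    "product training": ["policy terms", "payment method"],
}

def all_keywords(dictionary: dict[str, list[str]]) -> list[str]:
    """Flatten topic and content keywords; both are used for matching."""
    keywords = list(dictionary)
    for content_keywords in dictionary.values():
        keywords.extend(content_keywords)
    return keywords
```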
As an embodiment of the present application, the matching of at least one knowledge point text from the target video text based on a preset dictionary includes: preprocessing the target video text through a preset natural language processing model to obtain at least one noun text; calculating the relevance score of each noun text against each keyword in the preset dictionary; and, when a relevance score is larger than a preset threshold value, determining the noun text corresponding to that relevance score as a knowledge point text.
In this embodiment, a preset natural language processing model may be used for similarity calculation; it may be based on one-hot coding, BOW (Bag of Words), an N-gram language model, or the like. In NLP (Natural Language Processing), similarity is usually computed in one of three ways: with statistical indices such as cosine similarity, the Pearson correlation coefficient, or Euclidean distance; with text-distance measures such as edit distance, WMD, or BM25; or with deep-matching models such as DSSM (Deep Structured Semantic Models).
Because the target video text is a continuous text produced by the speech recognition model and contains many words irrelevant to knowledge point matching, it is first preprocessed with the preset natural language processing model to extract all noun texts. The relevance score of each noun text against the preset dictionary is then computed: the similarity (for example, Euclidean distance or cosine similarity) between the noun text and every keyword in the dictionary is calculated, and the highest similarity is taken as the noun text's relevance score. If a noun text's relevance score exceeds a preset threshold, some keyword in the dictionary matches it, and the noun text is judged to be a knowledge point text. The relevance score lies between 0 and 10 inclusive, and the preset threshold may be, for example, 6 or 7.
It should be noted that all the noun texts and the keywords in the preset dictionary are represented by vectors.
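A sketch of this scoring step is given below, assuming noun texts and keywords are already embedded as vectors and using cosine similarity rescaled to the 0-10 range mentioned above. The rescaling assumes non-negative embeddings and is illustrative, not prescribed by the text.

```python
# Sketch of relevance scoring against the preset dictionary, using cosine similarity.
import numpy as np

def relevance_score(noun_vec: np.ndarray, keyword_vecs: np.ndarray) -> tuple[float, int]:
    """Return the 0-10 relevance score and the index of the best-matching keyword."""
    sims = keyword_vecs @ noun_vec / (
        np.linalg.norm(keyword_vecs, axis=1) * np.linalg.norm(noun_vec) + 1e-12
    )
    best = int(np.argmax(sims))
    # Cosine lies in [0, 1] for non-negative embeddings (an assumption here),
    # so multiplying by 10 maps it onto the 0-10 score range from the text.
    return float(sims[best]) * 10.0, best

def knowledge_points(nouns, noun_vecs, keywords, keyword_vecs, threshold=6.0):
    """Keep nouns whose best keyword match clears the preset threshold."""
    matched = []
    for noun, vec in zip(nouns, noun_vecs):
        score, idx = relevance_score(vec, keyword_vecs)
        if score > threshold:
            matched.append((noun, keywords[idx], score))
    return matched
```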
As an embodiment of the present application, the preprocessing of the target video text through a preset natural language processing model to obtain at least one noun text includes: performing word segmentation on the target video text with the preset natural language processing model to obtain at least one text vocabulary; performing part-of-speech tagging on all the text vocabularies to obtain tagged text vocabularies; and extracting the texts tagged with the noun part of speech from the tagged text vocabularies to obtain at least one noun text.
In this embodiment, since the target video text may contain many modal particles and filler words that are irrelevant to subsequent knowledge point matching, word segmentation is performed on the target video text to obtain at least one text vocabulary, for example by a forward maximum matching algorithm. After segmentation, each text vocabulary is tagged with its part of speech according to grammar rules, yielding tagged text vocabularies: adjective texts, adverb texts, noun texts, and so on. All noun texts are then extracted from the tagged vocabularies for subsequent knowledge point matching.
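The following sketch uses the open-source jieba tokenizer as one possible stand-in for the preset natural language processing model (the text does not name a specific segmenter); jieba's part-of-speech tags beginning with "n" denote nouns.

```python
# Sketch of the preprocessing step: segment, tag parts of speech, keep the nouns.
import jieba.posseg as pseg

def extract_noun_texts(target_video_text: str) -> list[str]:
    """Segment the transcript, tag each word's part of speech, return the nouns."""
    return [
        word for word, flag in pseg.cut(target_video_text)
        if flag.startswith("n")   # jieba noun tags: n, nr, ns, nt, nz, ...
    ]
```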
S14: and determining a video clip corresponding to each knowledge point text according to the position of each knowledge point text on a time axis corresponding to the target video source file.
In step S14, the time axis corresponding to the target video source file, that is, the playing time axis of the video corresponding to the target video source file.
In this embodiment, time points are identified synchronously while the target video source file undergoes text conversion through the preset speech recognition model, so the position of each knowledge point text in the target video text on the time axis can be determined, and the video clip corresponding to each knowledge point text follows from those time points. Note that the same knowledge point may appear in different time periods, and the video segments corresponding to two different knowledge points may overlap.
As an embodiment of the present application, the determining, according to the position of each knowledge point text on the time axis corresponding to the target video source file, a video clip corresponding to each knowledge point text includes: identifying a time stamp of each knowledge point text in the target video text; the time stamp comprises a start time stamp and an end time stamp; and determining a video segment corresponding to each knowledge point text according to the starting time stamp and the ending time stamp corresponding to each knowledge point text.
In this embodiment, the time point at which a knowledge point text first appears on the time axis is that knowledge point text's start timestamp; when the knowledge point text does not reappear within a preset time after some occurrence, that occurrence's time point is taken as its end timestamp. The preset time is customizable, for example 3 minutes or 5 minutes. The start and end timestamps of the second and every further knowledge point text are determined in the same way, and the video segment between a knowledge point text's start timestamp and end timestamp is the video segment corresponding to that knowledge point text. (A sketch of this timestamp merging follows Table 2 below.)
It should be noted that the video segments corresponding to two different knowledge point texts may overlap, and the same knowledge point text may appear in different time periods. As an example, Table 2 below shows the start and end timestamps of 4 knowledge point texts on the same topic: the video segment corresponding to the first "customer list" overlaps with the video segment corresponding to the "target group" knowledge point text, and the same knowledge point text, "customer list", appears in two different time periods.
TABLE 2 [reproduced as an image in the original publication; it lists the start and end timestamps of the four knowledge point texts discussed above]
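The sketch below implements the timestamp merging just described: occurrences of a knowledge point text are folded into runs, and a run ends once the text has not reappeared within the preset gap. The 3-minute default and the input format are assumptions consistent with the text.

```python
# Sketch of step S14: fold timestamped occurrences into (start, end) segments.
from collections import defaultdict

def segments_for_knowledge_points(occurrences, gap=180.0):
    """occurrences: iterable of (keyword, timestamp_seconds), in time order.

    Returns {keyword: [(start, end), ...]}. Segments of different keywords may
    overlap, and one keyword may yield several segments in different periods
    (cf. Table 2).
    """
    segments = defaultdict(list)
    for keyword, t in occurrences:
        runs = segments[keyword]
        if runs and t - runs[-1][1] <= gap:
            runs[-1] = (runs[-1][0], t)   # keyword reappeared in time: extend run
        else:
            runs.append((t, t))           # gap exceeded (or first hit): new run
    return dict(segments)
```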
S15: and splicing the video clips corresponding to the texts with the same knowledge points according to a preset video splicing strategy, and outputting short videos corresponding to the texts with different knowledge points.
In step S15, knowledge point texts of the same kind are those matched to the same keyword; knowledge point texts of different kinds are those matched to different keywords. For example, if the two knowledge point texts "client list" and "customer list" both match the keyword "customer list" in the preset dictionary, they are judged to be the same kind of knowledge point text. The preset video splicing strategy expresses the rule by which the video clips corresponding to a knowledge point text are assembled.
In this embodiment, since the obtained video segments for each knowledge point text are scattered, the segments corresponding to the same kind of knowledge point text are assembled according to the preset video splicing strategy, so that the first target account can study the material knowledge point by knowledge point.
As an embodiment of the present application, the splicing of video clips corresponding to the same kind of knowledge point texts according to a preset video splicing strategy and the outputting of short videos corresponding to different knowledge point texts include: classifying the video clips according to the keywords corresponding to the knowledge point texts to obtain at least one group of video clip sets, each group corresponding to one knowledge point text; sequentially identifying the sub-video clips based on the time order of each sub-video clip in the video clip set to obtain identified sub-video clips; and splicing the identified sub-video clips according to their order to obtain short videos corresponding to different knowledge point texts.
In this embodiment, according to the keywords in the preset dictionary that the knowledge point texts match, knowledge point texts matching the same keyword are grouped as the same kind, and the video clips of each kind are placed into one video clip set. Because the sub-video clips in a set occupy different positions on the time axis, they are identified in time order so that the output short video plays coherently. For example, if a set contains two sub-video clips whose start timestamps are the 10th minute and the 30th minute, the former is identified as 1 and the latter as 2. As another embodiment, sub-video clips may be identified by the set's keyword combined with the clip's time order. Finally, the identified sub-video clips are spliced into one complete short video according to their identifiers; that short video corresponds to one knowledge point text.
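As a rough sketch of this splicing step, the example below uses moviepy (a tooling choice of this illustration, not prescribed by the text; moviepy 1.x API) to sort each knowledge point's sub-clips by start timestamp and concatenate them into one short video per knowledge point text.

```python
# Sketch of step S15: group segments per keyword, sort by time, concatenate.
from moviepy.editor import VideoFileClip, concatenate_videoclips

def splice_short_videos(source_path: str, segments_by_keyword: dict) -> dict:
    """segments_by_keyword: {keyword: [(start_s, end_s), ...]} from step S14."""
    source = VideoFileClip(source_path)
    outputs = {}
    for keyword, segments in segments_by_keyword.items():
        # Sort sub-clips by their order on the original time axis so the
        # output short video plays coherently.
        subclips = [source.subclip(s, e) for s, e in sorted(segments)]
        out_path = f"{keyword}.mp4"   # illustrative naming scheme
        concatenate_videoclips(subclips).write_videofile(out_path)
        outputs[keyword] = out_path
    return outputs
```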
As one implementation, if the target video source file covers several topics, topic-level short videos may first be produced according to the method above, and each topic's short videos then further divided according to the subdivided knowledge point texts under that topic.
The present application provides a video clipping method. When a video clipping instruction sent by a first target account is received, a target video source file is read according to the instruction. So that the file can be clipped accurately according to the knowledge points it contains, target audio information is extracted from it and converted into the corresponding target video text. The preset dictionary is configured according to the target video source file and comprises the keywords against which knowledge point texts are matched, so at least one knowledge point text can be determined from the target video text based on that dictionary; the video clip corresponding to each knowledge point text is then determined according to its position on the time axis of the target video source file, the clips corresponding to the same kind of knowledge point text are assembled according to the preset video splicing strategy, and short videos corresponding to different knowledge point texts are output. Since the scheme clips the target video source file on the basis of its knowledge point texts once the clipping instruction is received, no vendor intervention is needed, the file's knowledge points can be classified more accurately, and video clipping efficiency is improved.
As another embodiment of the present application, after the step of splicing the video segments corresponding to the same kind of knowledge point texts according to a preset video splicing strategy and outputting the short videos corresponding to different knowledge point texts, the method further includes: receiving an evaluation result of a second target account for each short video, and updating the relevance score of the knowledge point text corresponding to each short video according to the evaluation result, the second target account including the first target account; and deleting content from the short videos whose relevance scores are falling, to obtain the trimmed short videos.
In this embodiment, the second target account likewise refers to an account registered on the enterprise client; it includes the first target account as well as other target accounts that lack the authority to send video clipping instructions. After watching the short videos corresponding to the knowledge point texts, a second target account can evaluate them, and the server updates the relevance score of the knowledge point text corresponding to each short video according to the second target account's evaluation results.
Specifically, if the second target account's evaluation of a short video is positive, the relevance score of the short video's knowledge point text is raised; if the evaluation is negative, the score is lowered. For short videos whose knowledge point text relevance scores have fallen, the first target account periodically extracts video segments from them and checks whether each segment's knowledge point text is consistent with the knowledge point text of the short video it belongs to. If not, the segment is marked as waste content; the server deletes the marked segment from its short video to obtain the trimmed short video and sends it to the enterprise client, where second target accounts can continue to watch and evaluate it.
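A minimal sketch of this score update follows. The step size and default starting score are illustrative assumptions; the text only states that approval raises and disapproval lowers the relevance score.

```python
# Sketch of the evaluation feedback loop; step and default values are assumed.
def update_relevance(scores: dict, keyword: str, approved: bool, step: float = 0.5):
    """Raise or lower the 0-10 relevance score for one short video's keyword."""
    delta = step if approved else -step
    scores[keyword] = min(10.0, max(0.0, scores.get(keyword, 5.0) + delta))
    return scores[keyword]
```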
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by functions and internal logic of the process, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In one embodiment, a video clipping device 300 is provided, which corresponds one-to-one to the video clipping methods in the above embodiments. As shown in fig. 3, the video clipping device comprises a response module 301, a conversion module 302, a matching module 303, a determination module 304 and an output module 305. The functional modules are explained in detail as follows:
the response module 301, configured to read a target video source file in response to a video clipping instruction sent by a first target account;
the conversion module 302, configured to extract target audio information from the target video source file and to perform text conversion on the target audio information with a preset speech recognition model to obtain a target video text;
the matching module 303, configured to match at least one knowledge point text from the target video text based on a preset dictionary, the preset dictionary being configured according to the target video source file and comprising keywords matched with the knowledge point texts;
the determination module 304, configured to determine the video clip corresponding to each knowledge point text according to the position of each knowledge point text on the time axis corresponding to the target video source file;
the output module 305, configured to splice the video clips corresponding to the same kind of knowledge point texts according to a preset video splicing strategy and to output short videos corresponding to different knowledge point texts, knowledge point texts of the same kind being those matched to the same keyword and knowledge point texts of different kinds being those matched to different keywords.
For specific limitations of the video clipping apparatus, reference may be made to the limitations of the video clipping method above, and further description is omitted here. The various modules in the video clipping device described above may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent of a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer readable instructions, and a database. The internal memory provides an environment for the operating system and execution of computer-readable instructions in the readable storage medium. The database of the computer device is used for storing data related to the video clipping method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer readable instructions, when executed by a processor, implement a video clipping method. The readable storage media provided by the present embodiment include nonvolatile readable storage media and volatile readable storage media.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a readable storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer readable instructions. The internal memory provides an environment for the operating system and execution of computer-readable instructions in the readable storage medium. The network interface of the computer device is used for communicating with an external server through a network connection. The computer readable instructions, when executed by a processor, implement a video clipping method. The readable storage media provided by the present embodiment include nonvolatile readable storage media and volatile readable storage media.
In one embodiment, a computer device is provided, comprising a memory, a processor, and computer readable instructions stored on the memory and executable on the processor, the processor when executing the computer readable instructions implementing the steps of:
reading a target video source file in response to a video clip instruction sent by a first target account;
extracting target audio information from the target video source file, and performing text conversion on the target audio information by adopting a preset voice recognition model to obtain a target video text;
matching at least one knowledge point text from the target video text based on a preset dictionary; the preset dictionary is configured according to the target video source file and comprises keywords matched with the knowledge point texts;
determining a video clip corresponding to each knowledge point text according to the position of each knowledge point text on a time axis corresponding to the target video source file;
splicing the video clips corresponding to the same kind of knowledge point texts according to a preset video splicing strategy, and outputting short videos corresponding to different knowledge point texts; knowledge point texts of the same kind are those matched to the same keyword, and knowledge point texts of different kinds are those matched to different keywords.
In one embodiment, one or more computer-readable storage media storing computer-readable instructions are provided, the readable storage media provided by the embodiments including non-volatile readable storage media and volatile readable storage media. The readable storage medium has stored thereon computer readable instructions which, when executed by one or more processors, perform the steps of:
reading a target video source file in response to a video clip instruction sent by a first target account;
extracting target audio information from the target video source file, and performing text conversion on the target audio information by adopting a preset voice recognition model to obtain a target video text;
matching at least one knowledge point text from the target video text based on a preset dictionary; the preset dictionary is configured according to the target video source file and comprises keywords matched with the knowledge point texts;
determining a video clip corresponding to each knowledge point text according to the position of each knowledge point text on a time axis corresponding to the target video source file;
splicing the video clips corresponding to the same kind of knowledge point texts according to a preset video splicing strategy, and outputting short videos corresponding to different knowledge point texts; knowledge point texts of the same kind are those matched to the same keyword, and knowledge point texts of different kinds are those matched to different keywords.
It will be understood by those of ordinary skill in the art that all or part of the processes of the methods of the above embodiments may be implemented by hardware related to computer readable instructions, which may be stored in a non-volatile readable storage medium or a volatile readable storage medium, and when executed, the computer readable instructions may include processes of the above embodiments of the methods. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A video clipping method, characterized in that the video clipping method comprises:
reading a target video source file in response to a video clipping instruction sent by a first target account;
extracting target audio information from the target video source file, and performing text conversion on the target audio information by adopting a preset voice recognition model to obtain a target video text;
matching at least one knowledge point text from the target video text based on a preset dictionary; the preset dictionary comprises keywords matched with the knowledge point text;
determining a video clip corresponding to each knowledge point text according to the position of each knowledge point text on a time axis corresponding to the target video source file;
splicing the video clips corresponding to the same kind of knowledge point texts according to a preset video splicing strategy, and outputting short videos corresponding to different knowledge point texts; knowledge point texts of the same kind are those matched to the same keyword, and knowledge point texts of different kinds are those matched to different keywords.
2. The video clipping method of claim 1, wherein the matching of at least one knowledge point text from the target video text based on a preset dictionary comprises:
preprocessing the target video text through a preset natural language processing model to obtain at least one noun text;
calculating the relevance score of each noun text against each keyword in the preset dictionary;
and, when a relevance score is larger than a preset threshold value, determining the noun text corresponding to that relevance score as the knowledge point text.
3. The video clipping method of claim 2, wherein the preprocessing of the target video text through the preset natural language processing model to obtain at least one noun text comprises:
performing word segmentation on the target video text with the preset natural language processing model to obtain at least one text vocabulary;
performing part-of-speech tagging on all the text vocabularies to obtain tagged text vocabularies;
and extracting the texts tagged with the noun part of speech from the tagged text vocabularies to obtain at least one noun text.
4. The video clipping method of claim 1, wherein the determining the video clip corresponding to each knowledge point text according to the position of each knowledge point text on the time axis corresponding to the target video source file comprises:
identifying a time stamp of each knowledge point text in the target video text; the time stamps comprise a start time stamp and an end time stamp;
and determining a video segment corresponding to each knowledge point text according to the starting time stamp and the ending time stamp corresponding to each knowledge point text.
5. The video clipping method of claim 1, wherein the splicing of the video clips corresponding to the same kind of knowledge point texts according to a preset video splicing strategy and the outputting of the short videos corresponding to different knowledge point texts comprise:
classifying the video clips according to keywords corresponding to the texts of the knowledge points to obtain at least one group of video clip sets; each group of video clip sets corresponds to a knowledge point text;
sequentially identifying the sub-video clips based on the time sequence of each sub-video clip in the video clip set to obtain identified sub-video clips;
and splicing the identified sub-video clips according to the sequence of the identified sub-video clips to obtain short videos corresponding to different knowledge point texts.
6. The video clipping method of claim 2, wherein after the step of splicing the video segments corresponding to the same knowledge point text according to a preset video splicing strategy and outputting the short videos corresponding to the different knowledge point texts, the method further comprises:
receiving an evaluation result of a second target account on each short video, and updating the relevance score of the knowledge point text corresponding to each short video according to the evaluation result;
and deleting content from the short videos whose relevance scores are in a descending state, to obtain the trimmed short videos.
7. The video clipping method of claim 6, wherein the updating of the relevance score of the knowledge point text corresponding to each short video according to the evaluation result comprises:
if the evaluation result of the second target account on the short video is positive, raising the relevance score of the knowledge point text corresponding to the short video;
and if the evaluation result of the second target account on the short video is negative, lowering the relevance score of the knowledge point text corresponding to the short video.
8. A video clipping apparatus, characterized in that the video clipping apparatus comprises:
a response module, configured to read a target video source file in response to a video clipping instruction sent by a first target account;
a conversion module, configured to extract target audio information from the target video source file and to perform text conversion on the target audio information with a preset speech recognition model to obtain a target video text;
a matching module, configured to match at least one knowledge point text from the target video text based on a preset dictionary, the preset dictionary comprising keywords matched with the knowledge point texts;
a determination module, configured to determine the video clip corresponding to each knowledge point text according to the position of each knowledge point text on the time axis corresponding to the target video source file;
an output module, configured to splice the video clips corresponding to the same kind of knowledge point texts according to a preset video splicing strategy and to output short videos corresponding to different knowledge point texts, knowledge point texts of the same kind being those matched to the same keyword and knowledge point texts of different kinds being those matched to different keywords.
9. A computer device comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, wherein the computer readable instructions, when executed by the processor, implement the video clipping method of any one of claims 1 to 7.
10. One or more readable storage media storing computer readable instructions which, when executed by a processor, implement the video clipping method of any one of claims 1-7.
CN202210737613.8A 2022-06-27 2022-06-27 Video editing method and device, computer equipment and storage medium Pending CN115134660A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210737613.8A CN115134660A (en) 2022-06-27 2022-06-27 Video editing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210737613.8A CN115134660A (en) 2022-06-27 2022-06-27 Video editing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115134660A true CN115134660A (en) 2022-09-30

Family

ID=83379496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210737613.8A Pending CN115134660A (en) 2022-06-27 2022-06-27 Video editing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115134660A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115767174A (en) * 2022-10-31 2023-03-07 上海卓越睿新数码科技股份有限公司 Online video editing method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858005A (en) * 2019-03-07 2019-06-07 百度在线网络技术(北京)有限公司 Document updating method, device, equipment and storage medium based on speech recognition
CN110134761A (en) * 2019-04-16 2019-08-16 深圳壹账通智能科技有限公司 Adjudicate document information retrieval method, device, computer equipment and storage medium
CN112929744A (en) * 2021-01-22 2021-06-08 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for segmenting video clips
CN113254708A (en) * 2021-06-28 2021-08-13 北京乐学帮网络技术有限公司 Video searching method and device, computer equipment and storage medium
CN113286173A (en) * 2021-05-19 2021-08-20 北京沃东天骏信息技术有限公司 Video editing method and device
CN113709384A (en) * 2021-03-04 2021-11-26 腾讯科技(深圳)有限公司 Video editing method based on deep learning, related equipment and storage medium
CN114357996A (en) * 2021-12-06 2022-04-15 北京网宿科技有限公司 Time sequence text feature extraction method and device, electronic equipment and storage medium
CN114449310A (en) * 2022-02-15 2022-05-06 平安科技(深圳)有限公司 Video editing method and device, computer equipment and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858005A (en) * 2019-03-07 2019-06-07 百度在线网络技术(北京)有限公司 Document updating method, device, equipment and storage medium based on speech recognition
CN110134761A (en) * 2019-04-16 2019-08-16 深圳壹账通智能科技有限公司 Adjudicate document information retrieval method, device, computer equipment and storage medium
CN112929744A (en) * 2021-01-22 2021-06-08 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for segmenting video clips
CN113709384A (en) * 2021-03-04 2021-11-26 腾讯科技(深圳)有限公司 Video editing method based on deep learning, related equipment and storage medium
CN113286173A (en) * 2021-05-19 2021-08-20 北京沃东天骏信息技术有限公司 Video editing method and device
CN113254708A (en) * 2021-06-28 2021-08-13 北京乐学帮网络技术有限公司 Video searching method and device, computer equipment and storage medium
CN114357996A (en) * 2021-12-06 2022-04-15 北京网宿科技有限公司 Time sequence text feature extraction method and device, electronic equipment and storage medium
CN114449310A (en) * 2022-02-15 2022-05-06 平安科技(深圳)有限公司 Video editing method and device, computer equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115767174A (en) * 2022-10-31 2023-03-07 上海卓越睿新数码科技股份有限公司 Online video editing method

Similar Documents

Publication Publication Date Title
CN110765244B (en) Method, device, computer equipment and storage medium for obtaining answering operation
US9923860B2 (en) Annotating content with contextually relevant comments
US11468239B2 (en) Joint intent and entity recognition using transformer models
CN110444198B (en) Retrieval method, retrieval device, computer equipment and storage medium
US11163936B2 (en) Interactive virtual conversation interface systems and methods
CN113901320A (en) Scene service recommendation method, device, equipment and storage medium
CN114556328A (en) Data processing method and device, electronic equipment and storage medium
US20230289514A1 (en) Speech recognition text processing method and apparatus, device, storage medium, and program product
US20200387534A1 (en) Media selection based on content topic & sentiment
CN112650842A (en) Human-computer interaction based customer service robot intention recognition method and related equipment
US20210004602A1 (en) Method and apparatus for determining (raw) video materials for news
CN113254613A (en) Dialogue question-answering method, device, equipment and storage medium
CN111126084B (en) Data processing method, device, electronic equipment and storage medium
WO2021001517A1 (en) Question answering systems
CN115438149A (en) End-to-end model training method and device, computer equipment and storage medium
US9747891B1 (en) Name pronunciation recommendation
CN115134660A (en) Video editing method and device, computer equipment and storage medium
US11437038B2 (en) Recognition and restructuring of previously presented materials
US11972759B2 (en) Audio mistranscription mitigation
CN112307738A (en) Method and device for processing text
CN116909435A (en) Data processing method and device, electronic equipment and storage medium
CN116028626A (en) Text matching method and device, storage medium and electronic equipment
CN115169345A (en) Training method, device and equipment for text emotion analysis model and storage medium
CN110276001B (en) Checking page identification method and device, computing equipment and medium
CN114449310A (en) Video editing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination