CN111046839B - Video segmentation method and device - Google Patents

Video segmentation method and device

Info

Publication number
CN111046839B
CN111046839B CN201911352570.6A
Authority
CN
China
Prior art keywords
text
video
determining
caption
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911352570.6A
Other languages
Chinese (zh)
Other versions
CN111046839A (en)
Inventor
干紫乔
冯晓峰
王思梦
赵金鑫
秦瑞雄
胡智
杜嘉
吴想想
熊威
蔡晨
祁缘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp filed Critical China Construction Bank Corp
Priority to CN201911352570.6A priority Critical patent/CN111046839B/en
Publication of CN111046839A publication Critical patent/CN111046839A/en
Application granted granted Critical
Publication of CN111046839B publication Critical patent/CN111046839B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides a video segmentation method and device, wherein the method comprises the following steps: acquiring subtitle text corresponding to a video file and generating a text vector based on the subtitle text; clustering the text vector to obtain a hierarchical clustering tree diagram; determining an entropy function for calculating the entropy value corresponding to any group of nodes that conforms to the constraint conditions on the hierarchical clustering tree diagram; determining a target group node on the hierarchical clustering tree diagram according to the entropy function; and segmenting the video file based on the target group node. The invention makes the content of the segmentation result more accurate and the video length closer to user expectations, achieving intelligent and practical micro-class segmentation.

Description

Video segmentation method and device
Technical Field
The invention relates to the technical field of video processing, in particular to a video segmentation method and device.
Background
With the continuous development of network applications, more and more users choose to study online through network video. Online learning resources are mainly long videos, whose duration is usually between 20 and 180 minutes. Constrained by the time users have available for learning, long videos are harder to absorb than the newly popular short videos, whose duration is usually between 3 and 10 minutes.
To improve the user experience, long videos are segmented into short videos. At present this is done with a semantic clustering algorithm: similar sentences are merged into paragraphs, and the candidate cut points for optimal segmentation lie between the paragraphs.
However, segmentation based only on a semantic clustering algorithm can produce short videos that are far too long or too short, which reduces segmentation accuracy and limits the applicability of the approach.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a video segmentation method and a video segmentation device, which make the content of the segmentation result more accurate and the video length closer to user expectations, achieving intelligent and practical micro-class segmentation.
To solve the above technical problems, the invention provides the following technical solutions:
in a first aspect, the present invention provides a video segmentation method, including:
acquiring a subtitle text corresponding to a video file and generating a text vector based on the subtitle text;
clustering is carried out based on the text vector to obtain a hierarchical clustering tree diagram;
determining an entropy function for calculating the entropy value corresponding to any group of nodes that conforms to the constraint conditions on the hierarchical clustering tree diagram; determining a target group node on the hierarchical clustering tree diagram according to the entropy function;
And dividing the video file based on the target group node.
Further, after the video file is segmented based on the target group node, the method further includes:
and determining the sub video files obtained by segmentation, and establishing the association relationship between the sub video files and the video files.
The method for acquiring the caption text corresponding to the video file and generating the text vector based on the caption text comprises the following steps:
extracting caption text from the video file in a voice recognition mode; wherein the caption text includes a plurality of caption units;
converting each caption unit in the caption text into a corresponding caption vector through a language characterization model; wherein all subtitle vectors constitute text vectors.
Wherein the language characterization model is a BERT model.
The clustering processing based on the text vector is performed to obtain a hierarchical clustering tree diagram, which comprises the following steps:
and clustering the text vectors by adopting a hierarchical aggregation clustering algorithm to obtain a hierarchical clustering tree diagram.
The method for clustering the text vectors by using the hierarchical aggregation clustering algorithm to obtain a hierarchical clustering tree diagram comprises the following steps:
Determining the duration of each caption unit and the pause interval between adjacent caption units in the caption text according to the time data in the caption text;
determining a similarity distance in a hierarchical aggregation clustering algorithm based on the duration of each caption unit and the pause interval between adjacent caption units;
and clustering the text vector based on the similarity distance to obtain a hierarchical clustering tree diagram.
The determining of an entropy function for calculating the entropy value corresponding to any group of nodes that meets the constraint conditions on the hierarchical clustering tree diagram comprises the following steps:
determining the cost value corresponding to each node in any group of nodes that conforms to the constraint conditions on the hierarchical clustering tree diagram;
wherein the entropy function determines the entropy value corresponding to the group of nodes from the individual cost values.
The determining the cost value corresponding to each node in any group of nodes meeting constraint conditions on the hierarchical clustering tree-graph comprises the following steps:
determining video time length corresponding to each node in any group of nodes conforming to constraint conditions on the hierarchical clustering tree diagram;
and determining the cost value corresponding to each node according to the video time length corresponding to each node and the target segmentation time length.
The entropy function determining, from the cost values, the entropy value corresponding to the group of nodes comprises:
the entropy function summing the cost values and taking the summation result as the entropy value corresponding to the group of nodes.
The determining the target group node on the hierarchical clustering tree diagram according to the entropy function comprises the following steps:
solving the entropy function to determine the minimum entropy value which can be obtained by the entropy function;
and determining a group of nodes corresponding to the minimum entropy value as target group nodes.
In a second aspect, the present invention provides a video segmentation apparatus comprising:
the conversion unit is used for acquiring the caption text corresponding to the video file and generating a text vector based on the caption text;
the clustering unit is used for carrying out clustering processing based on the text vector to obtain a hierarchical clustering tree diagram;
the entropy function unit is used for determining an entropy function for calculating entropy values corresponding to any group of nodes according to the constraint condition on the hierarchical clustering tree-graph; the selecting unit is used for determining a target group node on the hierarchical clustering tree diagram according to the entropy function;
and the segmentation unit is used for segmenting the video file based on the target group node.
Further, the method further comprises the following steps:
and the link unit is used for determining the sub-video files obtained by segmentation and establishing the association relationship between the sub-video files and the video files.
Wherein the conversion unit includes:
an extraction subunit, configured to extract a subtitle text from the video file by using a speech recognition manner; wherein the caption text includes a plurality of caption units;
the characterization subunit is used for converting each caption unit in the caption text into a corresponding caption vector through a language characterization model; wherein all subtitle vectors constitute text vectors.
Wherein the language characterization model is a BERT model.
Wherein the clustering unit includes:
and the clustering subunit is used for carrying out clustering processing on the text vectors by adopting a hierarchical aggregation clustering algorithm to obtain a hierarchical clustering tree diagram.
Wherein the clustering subunit comprises:
the text module is used for determining the duration of each caption unit and the pause interval between adjacent caption units in the caption text according to the time data in the caption text;
the similarity module is used for determining a similarity distance in the hierarchical aggregation clustering algorithm based on the duration of each caption unit and the pause interval between adjacent caption units;
And the clustering module is used for carrying out clustering processing on the text vectors based on the similarity distance to obtain a hierarchical clustering tree diagram.
Wherein the entropy function unit includes:
a calculating subunit, configured to determine a cost value corresponding to each node in any group of nodes that conform to constraint conditions on the hierarchical clustering tree-graph;
and the function subunit is used for determining entropy values corresponding to the group of nodes for each cost value by the entropy function.
Wherein the computing subunit comprises:
the first calculation module is used for determining video time length corresponding to each node in any group of nodes meeting constraint conditions on the hierarchical clustering tree diagram;
and the second calculation module is used for determining the cost value corresponding to each node according to the video time length and the target segmentation time length corresponding to each node.
Wherein the selecting unit includes:
the minimum subunit is used for solving the entropy function to determine the minimum entropy value which can be obtained by the entropy function;
and the selecting subunit is used for determining a group of nodes corresponding to the minimum entropy value as target group nodes.
In a third aspect, the present invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the video segmentation method when executing the program.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the video segmentation method.
According to the technical solutions above, the video segmentation method provided by the invention acquires the subtitle text corresponding to a video file and generates a text vector based on the subtitle text; clusters the text vector to obtain a hierarchical clustering tree diagram; determines an entropy function for calculating the entropy value corresponding to any group of nodes that conforms to the constraint conditions on the tree diagram; determines a target group node on the tree diagram according to the entropy function; and segments the video file based on the target group node, so that the content of the segmentation result is more accurate, the video length better matches user expectations, and intelligent, practical micro-class segmentation is achieved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a video segmentation method according to an embodiment of the invention.
Fig. 2 is a schematic flow chart of determining a similarity distance in a video segmentation method according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating a method for determining an entropy function in a video segmentation method according to an embodiment of the present invention.
Fig. 4 is a second flowchart of a video segmentation method according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a video segmentation apparatus according to an embodiment of the invention.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides an embodiment of a video segmentation method, referring to fig. 1, the video segmentation method specifically comprises the following steps:
S101: acquiring a subtitle text corresponding to a video file and generating a text vector based on the subtitle text;
In this step, the video file is the long video to be segmented. This embodiment uses a video file in MP4 format. First the subtitles are extracted: audio information is extracted from the video file and converted into subtitle text by speech recognition, and the subtitle text is stored in SRT format. From subtitle text in this format, the time length of each caption and the pause interval (pause duration) between adjacent captions can be read directly; each caption is one caption unit, and all caption units together form the subtitle text.
After the subtitle text is determined, each caption unit in it is converted into a corresponding caption vector by a language characterization model, preferably the Chinese pre-trained BERT model released by Google.
Each caption unit in the subtitle text is fed into the BERT Chinese pre-trained model, which outputs the caption vector corresponding to that caption unit; all caption vectors together form the text vector.
S102: clustering is carried out based on the text vector to obtain a hierarchical clustering tree diagram;
In this step, a hierarchical agglomerative clustering (HAC) algorithm is used to cluster the text vector and obtain a hierarchical clustering tree diagram; clustering the text vector partitions the caption units it contains.
Referring to fig. 2, when clustering is performed using a hierarchical aggregation clustering algorithm (HAC), a required similarity distance is determined in the following manner, including:
s1021: determining the duration of each caption unit and the pause interval between adjacent caption units in the caption text according to the time data in the caption text;
In this step, from the SRT-format subtitle text, the duration corresponding to each caption unit, that is, its time length, can be determined, as can the pause interval between adjacent caption units, that is, the length of time between the end of one caption unit and the start of the next.
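As an illustration of this step, both quantities can be read straight from SRT timestamps. The following sketch is a minimal stand-alone example; the helper names and the simplified SRT parsing are illustrative, not taken from the patent.

```python
# Sketch: derive per-caption durations and pause intervals from SRT timestamps.
import re

TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})")

def to_seconds(ts: str) -> float:
    """Convert an SRT timestamp like '00:01:02,500' to seconds."""
    h, m, s, ms = map(int, TIME_RE.match(ts).groups())
    return h * 3600 + m * 60 + s + ms / 1000.0

def parse_srt(srt_text: str):
    """Return a list of (start, end, text) tuples, one per caption unit."""
    units = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        start_ts, end_ts = lines[1].split(" --> ")
        units.append((to_seconds(start_ts), to_seconds(end_ts),
                      " ".join(lines[2:])))
    return units

def durations_and_pauses(units):
    """Duration of each caption and the pause before each following caption."""
    durations = [end - start for start, end, _ in units]
    pauses = [units[i + 1][0] - units[i][1] for i in range(len(units) - 1)]
    return durations, pauses
```

For two captions spanning 0–2 s and 3–5.5 s, this yields durations [2.0, 2.5] and a single pause interval of 1.0 s.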
S1022: determining a similarity distance in a hierarchical aggregation clustering algorithm based on the duration of each caption unit and the pause interval between adjacent caption units;
In this step, the caption vector corresponding to each caption unit is determined, the cosine similarity between adjacent caption vectors is calculated, and the difference between the value 1 and that cosine similarity is taken as the text-vector cosine distance, which is attributed to the earlier (in time order) of the two adjacent caption units.
The pause interval distance between adjacent caption units is determined from the duration of each caption unit and the pause intervals between adjacent caption units. Specifically: the later (in time order) of two adjacent caption units is taken as the target caption unit, and the pause interval between the target caption unit and the preceding caption unit, as well as the pause interval between the target caption unit and the following caption unit, are determined; from these two pause intervals, the pause interval distance between the target caption unit and the preceding caption unit is determined and used as the pause interval distance between the adjacent caption units.
Because the pause interval distance corresponds to a length of time, it is normalized so that it is of the same order of magnitude as the text-vector cosine distance. The normalized pause interval distance and the text-vector cosine distance are then combined by weighted summation to obtain the similarity distance used in the hierarchical agglomerative clustering algorithm.
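The weighted combination described above can be sketched as follows; the normalization scheme (dividing by the maximum pause) and the weight value 0.3 are assumptions for illustration, since the patent does not fix them.

```python
# Sketch: fuse semantic and temporal cues into one similarity distance
# (1 - cosine similarity, plus a normalized pause distance, weighted sum).
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def similarity_distances(vectors, pauses, weight=0.3):
    """Distance between each pair of adjacent caption units.

    vectors: one embedding per caption unit; pauses[i]: pause (seconds)
    between unit i and unit i+1. `weight` balances the two terms and is
    a hypothetical choice, not specified in the patent.
    """
    text_dists = [1.0 - cosine_similarity(vectors[i], vectors[i + 1])
                  for i in range(len(vectors) - 1)]
    # Normalize pauses to [0, 1] so both terms share the same scale.
    max_pause = max(pauses) or 1.0
    pause_dists = [p / max_pause for p in pauses]
    return [(1 - weight) * t + weight * p
            for t, p in zip(text_dists, pause_dists)]
```

With identical adjacent embeddings and no pause, the distance is 0; with orthogonal embeddings and the longest pause, it reaches 1.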
S1023: and clustering the text vector based on the similarity distance to obtain a hierarchical clustering tree diagram.
In the step, clustering processing is carried out on the text vectors according to the similarity distance to obtain a hierarchical clustering tree diagram.
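A minimal sketch of the clustering step, under the assumption that only adjacent caption clusters may merge (so every cluster covers a contiguous stretch of video); the patent's actual HAC linkage rule may differ.

```python
# Sketch: bottom-up clustering of adjacent caption units into a merge tree
# (dendrogram). Merging the adjacent pair with the smallest boundary
# distance is an illustrative simplification of the HAC step.

def agglomerate(distances):
    """distances[i] separates caption i from caption i+1.

    Returns the merge order as (left_cluster, right_cluster) tuples,
    where a cluster is a (start, end) caption-index range, end exclusive.
    """
    n = len(distances) + 1
    clusters = [(i, i + 1) for i in range(n)]   # one cluster per caption
    gaps = list(distances)                      # gap i between clusters i, i+1
    merges = []
    while len(clusters) > 1:
        i = gaps.index(min(gaps))               # closest adjacent pair
        left, right = clusters[i], clusters[i + 1]
        merges.append((left, right))
        clusters[i:i + 2] = [(left[0], right[1])]
        del gaps[i]                             # drop the merged boundary
    return merges
```

Each entry of the returned list is one internal node of the hierarchical clustering tree diagram, built from the leaves upward.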
S103: determining an entropy function for calculating entropy values corresponding to any group of nodes according to any group of nodes conforming to constraint conditions on the hierarchical clustering tree diagram;
In this step, there are multiple node combinations on the hierarchical clustering tree diagram that conform to the constraint conditions; each combination corresponds to one group of nodes, and each group of nodes on the tree diagram determines one segmentation scheme. An entropy function for calculating the entropy value corresponding to any such group of nodes is defined; the entropy function measures how good the different segmentation schemes on the tree diagram are, so that by finding the optimal solution of the entropy function, the group of nodes corresponding to that optimal solution, and hence the segmentation scheme used for the video, can be determined.
The constraint condition is that every caption (sentence) in the video is covered, and covered exactly once.
It should be noted that the entropy function calculates the entropy value corresponding to any group of nodes that conforms to the constraint conditions on the hierarchical clustering tree diagram.
Referring to fig. 3, the specific steps of determining the entropy function include:
S1031: determining the cost value corresponding to each node in any group of nodes conforming to constraint conditions on the hierarchical clustering tree-graph;
In this step, any group of nodes that conforms to the constraint conditions on the hierarchical clustering tree diagram contains multiple nodes. To determine the entropy value corresponding to such a group, the cost value of each node in the group must first be determined; the cost value of a node is determined from the video duration corresponding to that node and the target segmentation duration.
It should be noted that the preset duration is the target duration of a short video and can be set according to the segmentation requirements.
In implementation, if the video duration corresponding to a node is longer than the preset duration, the cost value of the node is the natural constant e raised to the first duration difference, where the first duration difference is the video duration corresponding to the node minus the preset duration;
if the video duration corresponding to a node is less than or equal to the preset duration, the cost value of the node is e raised to the second duration difference, where the second duration difference is the preset duration minus the video duration corresponding to the node. In both cases the cost value is e raised to the absolute difference between the node's video duration and the preset duration.
S1032: the entropy function is used for determining entropy values corresponding to the group of nodes for each cost value.
In this step, the entropy function sums the individual cost values; in this embodiment the summation result is the entropy value corresponding to the group of nodes.
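The cost and entropy definitions above reduce to e raised to the absolute duration gap, summed over the group. A direct sketch (the choice of time unit is an assumption):

```python
# Sketch of the cost and entropy definitions in the text: each candidate
# segment is penalized by e ** |duration - target|, and the entropy of a
# group of nodes is the sum of those penalties.
import math

def cost(duration, target):
    """e ** |duration - target|, covering both symmetric cases above."""
    return math.exp(abs(duration - target))

def entropy(durations, target):
    """Entropy value of one candidate segmentation (group of nodes)."""
    return sum(cost(d, target) for d in durations)
```

For a 10-minute video and a 5-minute target, the even cut gives entropy([5, 5], 5) == 2.0, while the lopsided cut entropy([1, 9], 5) is about 109, so cuts near the target duration win.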
S104: determining a target group node on the hierarchical clustering tree diagram according to the entropy function;
In this step, the entropy function measures how good the different segmentation schemes on the hierarchical clustering tree diagram are. Finding the optimal solution of the entropy function means solving for the minimum entropy value the function can attain; the group of nodes corresponding to that minimum entropy value is determined as the target group node, which represents the optimal segmentation scheme.
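Because the constraint forces the chosen nodes to form a cut through the tree (every caption covered exactly once), the minimum entropy value can be found recursively: for each node, either keep it whole or split it and solve both children. The tree encoding below is illustrative, not the patent's data structure; the cost matches the e ** |duration - target| rule above.

```python
# Sketch: choose the minimum-entropy group of nodes on the cluster tree.
import math

def best_cut(node, target):
    """node = (duration, left, right), with left = right = None for a leaf.

    Returns (entropy, segments): the minimum entropy value and the list
    of chosen node durations, in order.
    """
    duration, left, right = node
    keep = (math.exp(abs(duration - target)), [duration])
    if left is None:                       # leaf: cannot split further
        return keep
    le, ls = best_cut(left, target)
    re_, rs = best_cut(right, target)
    split = (le + re_, ls + rs)
    return min(keep, split, key=lambda t: t[0])
```

For a root of 12 minutes with children of 5 and 7 and a 5-minute target, keeping the root costs e ** 7 while splitting costs 1 + e ** 2, so the split is chosen.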
S105: and dividing the video file based on the target group node.
In this step, the nodes of a group jointly determine a segmentation scheme, and the nodes of the target group form the optimal segmentation scheme for the video file. In this embodiment, each node in the target group node is determined, and the video file is cut at each of these nodes to obtain the sub-video files.
As can be seen from the above description, the video segmentation method provided by the embodiment of the invention acquires the subtitle text corresponding to a video file and generates a text vector based on it; clusters the text vector to obtain a hierarchical clustering tree diagram; determines an entropy function for calculating the entropy value corresponding to any group of nodes that conforms to the constraint conditions on the tree diagram; determines a target group node on the tree diagram according to the entropy function; and segments the video file based on the target group node, so that the content of the segmentation result is more accurate, the video length better matches user expectations, and intelligent, practical micro-class segmentation is achieved.
In an embodiment of the present invention, referring to fig. 4, step S106 is further included after step S105 of the video segmentation method, which specifically includes the following steps:
S106: and determining the sub video files obtained by segmentation, and establishing the association relationship between the sub video files and the video files.
Further, the video file is segmented into a plurality of short videos; the output formats of the short videos are MP4, M3U8 (a file format encrypted with AES-128) and ZIP.
In this embodiment, a one-to-many association between the long video and its short videos is established, so that related content can be displayed together, knowledge is presented in complementary levels of detail, users are given more choices, and recommendation and intelligent-search applications are improved through the association.
It should be noted that the short videos split from a long video are strongly correlated with it: when searching for or watching the long video, a user may want the associated short videos for reasons of timeliness or emphasis, and likewise, when learning from a short video, a user may want the more complete and comprehensive knowledge points of the long video.
The segmented short videos can be uploaded to the platform resource library to become platform learning resources; when uploading, information such as the title, visibility scope, keywords and associated courses must be set. The title and keywords are an important basis for search queries, so to improve search accuracy the keywords should be set as comprehensively and accurately as possible.
Furthermore, to ensure video security, the cut short videos are stored and used with playback separated from source-file download.
Each cut short video automatically generates a ZIP package and a transcoded, encrypted (M3U8) file. The ZIP file decompresses to MP4 format and is used for downloading; only users within the authorized scope have download permission, and anti-hotlinking protects against source-file leakage. The transcoded file is encrypted, transcoded and stored, with a public address and an on-demand playback interface; on-demand playback uses user-verified anti-hotlinking to prevent leakage through directly playing a copied link. Because the playback file is in an encrypted format, even if a user captures it, the locally saved file cannot be played without the decryption key.
The embodiment of the invention provides a specific implementation manner of a video segmentation device capable of realizing all contents in the video segmentation method, and referring to fig. 5, the video segmentation device specifically comprises the following contents:
a conversion unit 10, configured to obtain a subtitle text corresponding to a video file and generate a text vector based on the subtitle text;
a clustering unit 20, configured to perform clustering processing based on the text vector to obtain a hierarchical clustering tree diagram;
an entropy function unit 30, configured to determine an entropy function for calculating entropy values corresponding to any group of nodes according to the constraint condition on the hierarchical clustering tree-graph; the method comprises the steps of carrying out a first treatment on the surface of the
A selecting unit 40, configured to determine a target group node on the hierarchical clustering tree-graph according to the entropy function;
a dividing unit 50, configured to divide the video file based on the target group node.
Further, the method further comprises the following steps:
and a link unit 60, configured to determine the sub-video files obtained by segmentation, and establish an association relationship between the sub-video files and the video files.
Wherein the conversion unit 10 includes:
an extraction subunit, configured to extract a subtitle text from the video file by using a speech recognition manner; wherein the caption text includes a plurality of caption units;
a characterization subunit, configured to convert each caption unit in the caption text into a corresponding caption vector through a language characterization model; wherein all caption vectors constitute the text vector.
Wherein the language characterization model is a BERT model.
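The characterization subunit maps each caption unit to a vector and stacks the per-caption vectors into the text vector. The sketch below illustrates only that shape of the data flow: a toy bag-of-hashed-words embedding stands in for the BERT model (the dimension, tokenization, and embedding function are placeholders, not the patent's model), so the example runs without model weights:

```python
# Illustrative stand-in for the characterization subunit: each caption unit
# becomes a fixed-length vector, and the caption vectors are stacked into the
# text vector. A toy hashed bag-of-words replaces BERT for this sketch.
import numpy as np

DIM = 32  # placeholder dimension; BERT-base would produce 768-d vectors

def embed_caption(caption: str) -> np.ndarray:
    vec = np.zeros(DIM)
    for word in caption.lower().split():
        vec[hash(word) % DIM] += 1.0  # BERT would yield a contextual embedding instead
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def text_vector(caption_units: list[str]) -> np.ndarray:
    # All caption vectors together constitute the text vector.
    return np.stack([embed_caption(c) for c in caption_units])

captions = ["welcome to the course", "today we cover clustering", "first, distances"]
print(text_vector(captions).shape)  # (3, 32)
```

A real implementation would replace `embed_caption` with a forward pass through a pretrained BERT encoder, pooling the token embeddings of each caption unit.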
Wherein the clustering unit 20 includes:
and the clustering subunit is used for carrying out clustering processing on the text vectors by adopting a hierarchical aggregation clustering algorithm to obtain a hierarchical clustering tree diagram.
Wherein the clustering subunit comprises:
the text module is used for determining the duration of each caption unit and the pause interval between adjacent caption units in the caption text according to the time data in the caption text;
the similarity module is used for determining a similarity distance in the hierarchical aggregation clustering algorithm based on the duration of each caption unit and the pause interval between adjacent caption units;
and the clustering module is used for carrying out clustering processing on the text vectors based on the similarity distance to obtain a hierarchical clustering tree diagram.
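The text, similarity, and clustering modules above can be sketched end to end: caption-vector distances are combined with timing information and handed to a hierarchical agglomerative clustering routine. The rule that folds the pause interval into the similarity distance is an assumption for illustration (the patent leaves the exact formula to the similarity module), and the vectors and pause values are toy data:

```python
# Sketch of the clustering subunit: vector distance between captions is
# widened by the pause interval between them, then fed to hierarchical
# agglomerative clustering to build the dendrogram (tree diagram).
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
vectors = rng.normal(size=(6, 8))             # 6 toy caption vectors
pauses = np.array([0.2, 0.1, 3.0, 0.3, 0.2])  # seconds between adjacent captions

n = len(vectors)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = np.linalg.norm(vectors[i] - vectors[j])
        # Assumed rule: a long pause anywhere between captions i and j
        # increases their distance, encouraging a topic boundary there.
        d += pauses[i:j].max()
        dist[i, j] = dist[j, i] = d

tree = linkage(squareform(dist), method="average")  # hierarchical clustering tree
print(tree.shape)  # (n - 1, 4): each row records one merge
```

Each row of `tree` records one agglomerative merge; the resulting dendrogram is the structure on which the entropy function later selects a group of nodes.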
Wherein the entropy function unit 30 comprises:
a calculating subunit, configured to determine the cost value corresponding to each node in any group of nodes conforming to the constraint condition on the hierarchical clustering tree diagram;
and a function subunit, configured such that the entropy function determines, from each cost value, the entropy value corresponding to the group of nodes.
Wherein the computing subunit comprises:
the first calculation module is used for determining video time length corresponding to each node in any group of nodes meeting constraint conditions on the hierarchical clustering tree diagram;
and the second calculation module is used for determining the cost value corresponding to each node according to the video time length and the target segmentation time length corresponding to each node.
Wherein the selecting unit 40 includes:
the minimum subunit is used for solving the entropy function to determine the minimum entropy value which can be obtained by the entropy function;
and the selecting subunit is used for determining a group of nodes corresponding to the minimum entropy value as target group nodes.
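Taken together, the calculating, function, minimum, and selecting subunits describe a small optimization, which can be sketched as follows. Here a node's cost is taken as the absolute gap between its video duration and the target segmentation duration, the entropy of a group is the sum of its nodes' costs (per claim 8), and the group with the minimum entropy is selected; the candidate groups, durations, and 300-second target are all illustrative:

```python
# Sketch of the entropy function unit and selecting unit: among candidate
# groups of cut nodes, pick the one whose segment durations deviate least,
# in total, from the target segmentation duration.
TARGET = 300.0  # target segmentation duration in seconds (assumed)

def entropy(durations: list[float]) -> float:
    # Cost per node: deviation of its video duration from the target length;
    # entropy of the group: sum of the nodes' cost values.
    return sum(abs(d - TARGET) for d in durations)

# Each candidate is one group of nodes satisfying the constraint condition,
# represented here only by the durations of the sub-videos it would produce.
candidates = {
    "coarse": [620.0, 580.0],
    "medium": [310.0, 290.0, 600.0],
    "fine":   [310.0, 290.0, 305.0, 295.0],
}

best = min(candidates, key=lambda k: entropy(candidates[k]))
print(best, entropy(candidates[best]))  # fine 30.0
```

The selected group of nodes ("fine" here) is the target group node set at which the video file is actually cut.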
The embodiment of the video segmentation apparatus provided by the present invention may be specifically used to execute the processing flow of the embodiment of the video segmentation method in the above embodiment, and the functions thereof are not described herein again, and reference may be made to the detailed description of the above method embodiment.
As can be seen from the above description, the video segmentation apparatus provided by the embodiment of the present invention acquires the caption text corresponding to a video file and generates a text vector based on the caption text; performs clustering processing based on the text vector to obtain a hierarchical clustering tree diagram; determines, for any group of nodes conforming to the constraint condition on the hierarchical clustering tree diagram, an entropy function for calculating the entropy value corresponding to the group of nodes; determines a target group node on the hierarchical clustering tree diagram according to the entropy function; and segments the video file based on the target group node, so that the segmentation result is more accurate in content, the video length better matches user expectations, and intelligent, practical micro-lecture segmentation optimization is realized.
The application provides an embodiment of an electronic device for implementing all or part of content in the video segmentation method, wherein the electronic device specifically comprises the following contents:
a processor (processor), a memory (memory), a communication interface (Communications Interface), and a bus; the processor, the memory and the communication interface complete communication with each other through the bus; the communication interface is used for realizing information transmission between related devices; the electronic device may be a desktop computer, a tablet computer, a mobile terminal, etc., and the embodiment is not limited thereto. In this embodiment, the electronic device may be implemented with reference to the embodiment for implementing the video segmentation method and the embodiment for implementing the video segmentation apparatus, and the contents thereof are incorporated herein and are not repeated here.
Fig. 6 is a schematic block diagram of the system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 6, the electronic device 9600 may include a central processor 9100 and a memory 9140, the memory 9140 being coupled to the central processor 9100. Notably, fig. 6 is exemplary; other types of structures may also be used, in addition to or in place of this structure, to implement telecommunication functions or other functions.
In one embodiment, the video segmentation functionality may be integrated into the central processor 9100. The central processor 9100 may be configured to perform the following control:
acquiring a subtitle text corresponding to a video file and generating a text vector based on the subtitle text;
clustering is carried out based on the text vector to obtain a hierarchical clustering tree diagram;
determining an entropy function for calculating entropy values corresponding to any group of nodes according to any group of nodes conforming to constraint conditions on the hierarchical clustering tree diagram; determining a target group node on the hierarchical clustering tree diagram according to the entropy function;
and dividing the video file based on the target group node.
As can be seen from the above description, the electronic device provided in the embodiments of the present application acquires the caption text corresponding to a video file and generates a text vector based on the caption text; performs clustering processing based on the text vector to obtain a hierarchical clustering tree diagram; determines, for any group of nodes conforming to the constraint condition on the hierarchical clustering tree diagram, an entropy function for calculating the entropy value corresponding to the group of nodes; determines a target group node on the hierarchical clustering tree diagram according to the entropy function; and segments the video file based on the target group node, so that the segmentation result is more accurate in content, the video length better matches user expectations, and intelligent, practical micro-lecture segmentation optimization is realized.
In another embodiment, the video segmentation device may be configured separately from the central processor 9100, for example, the video segmentation device may be configured as a chip connected to the central processor 9100, and the video segmentation function is implemented under the control of the central processor.
As shown in fig. 6, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 need not include all of the components shown in fig. 6; in addition, the electronic device 9600 may further include components not shown in fig. 6, and reference may be made to the related art.
As shown in fig. 6, the central processor 9100, sometimes referred to as a controller or operation control unit, may include a microprocessor or other processor device and/or logic device; the central processor 9100 receives input and controls the operation of each component of the electronic device 9600.
The memory 9140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. Relevant information may be stored therein, and a program for processing that information may also be stored, and the central processor 9100 can execute the program stored in the memory 9140 to realize information storage, processing, and the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. The power supply 9170 is used to provide power to the electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, but not limited to, an LCD display.
The memory 9140 may be a solid-state memory, such as a read-only memory (ROM), a random-access memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered down, that can be selectively erased, and that can be provided with further data, an example of which is sometimes referred to as an EPROM or the like. The memory 9140 may also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, the application/function storage portion 9142 storing application programs and function programs, or a flow for executing operations of the electronic device 9600 by the central processor 9100.
The memory 9140 may also include a data store 9143, the data store 9143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, address book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. A communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, as in the case of conventional mobile communication terminals.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and to receive audio input from the microphone 9132 to implement usual telecommunications functions. The audio processor 9130 can include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100 so that sound can be recorded locally through the microphone 9132 and sound stored locally can be played through the speaker 9131.
An embodiment of the present invention also provides a computer-readable storage medium capable of implementing all the steps of the video segmentation method in the above embodiment, the computer-readable storage medium storing thereon a computer program which, when executed by a processor, implements all the steps of the video segmentation method in the above embodiment, for example, the processor implementing the following steps when executing the computer program:
Acquiring a subtitle text corresponding to a video file and generating a text vector based on the subtitle text;
clustering is carried out based on the text vector to obtain a hierarchical clustering tree diagram;
determining an entropy function for calculating entropy values corresponding to any group of nodes according to any group of nodes conforming to constraint conditions on the hierarchical clustering tree diagram; determining a target group node on the hierarchical clustering tree diagram according to the entropy function;
and dividing the video file based on the target group node.
As can be seen from the above description, the computer-readable storage medium provided by the embodiments of the present invention acquires the caption text corresponding to a video file and generates a text vector based on the caption text; performs clustering processing based on the text vector to obtain a hierarchical clustering tree diagram; determines, for any group of nodes conforming to the constraint condition on the hierarchical clustering tree diagram, an entropy function for calculating the entropy value corresponding to the group of nodes; determines a target group node on the hierarchical clustering tree diagram according to the entropy function; and segments the video file based on the target group node, so that the segmentation result is more accurate in content, the video length better matches user expectations, and intelligent, practical micro-lecture segmentation optimization is realized.
Although the invention provides the method operational steps as described in the embodiments or flowcharts, more or fewer operational steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible orders of execution and does not represent the only order. When implemented by an actual device or client product, the steps may be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded processing environment) according to the methods shown in the embodiments or figures.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, apparatus (system) or computer program product. Accordingly, the present specification embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In this specification, each embodiment is described in a progressive manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively simple; for relevant parts, reference may be made to the description of the method embodiments.

In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.

The orientation or positional relationships indicated by terms such as "upper" and "lower" are based on the orientations or positional relationships shown in the drawings, are merely for convenience and simplicity of description, and do not indicate or imply that the apparatus or elements in question must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present invention. Unless expressly specified or limited otherwise, the terms "mounted," "connected," and "coupled" are to be construed broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; and direct, indirect through an intermediate medium, or an internal communication between two elements.
The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The present invention is not limited to any single aspect, nor to any single embodiment, nor to any combination and/or permutation of these aspects and/or embodiments. Moreover, each aspect and/or embodiment of the invention may be used alone or in combination with one or more other aspects and/or embodiments.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention and are intended to be included within the scope of the claims and description.

Claims (17)

1. A method of video segmentation, comprising:
acquiring a subtitle text corresponding to a video file and generating a text vector based on the subtitle text;
clustering is carried out based on the text vector to obtain a hierarchical clustering tree diagram;
determining an entropy function for calculating entropy values corresponding to any group of nodes according to any group of nodes conforming to constraint conditions on the hierarchical clustering tree diagram; determining a target group node on the hierarchical clustering tree diagram according to the entropy function;
dividing the video file based on the target group node;
the determining an entropy function for calculating the entropy value corresponding to any group of nodes conforming to a constraint condition on the hierarchical clustering tree diagram comprises the following steps:
determining the cost value corresponding to each node in any group of nodes conforming to the constraint condition on the hierarchical clustering tree diagram;
wherein the entropy function is used for determining, from each cost value, the entropy value corresponding to the group of nodes;
the determining the target group node on the hierarchical clustering tree diagram according to the entropy function comprises the following steps:
solving the entropy function to determine the minimum entropy value which can be obtained by the entropy function;
and determining a group of nodes corresponding to the minimum entropy value as target group nodes.
2. The video segmentation method as set forth in claim 1, further comprising, after the segmenting the video file based on the target group node:
and determining the sub video files obtained by segmentation, and establishing the association relationship between the sub video files and the video files.
3. The video segmentation method according to claim 1, wherein the acquiring the subtitle text corresponding to the video file and generating the text vector based on the subtitle text comprises:
Extracting caption text from the video file in a voice recognition mode; wherein the caption text includes a plurality of caption units;
converting each caption unit in the caption text into a corresponding caption vector through a language characterization model; wherein all subtitle vectors constitute text vectors.
4. A video segmentation method according to claim 3, wherein the language characterization model is a BERT model.
5. The video segmentation method according to claim 1, wherein the clustering based on the text vector to obtain a hierarchical clustering tree diagram comprises:
and clustering the text vectors by adopting a hierarchical aggregation clustering algorithm to obtain a hierarchical clustering tree diagram.
6. The video segmentation method according to claim 5, wherein the clustering the text vectors using a hierarchical aggregation clustering algorithm to obtain a hierarchical clustering tree diagram comprises:
determining the duration of each caption unit and the pause interval between adjacent caption units in the caption text according to the time data in the caption text;
determining a similarity distance in a hierarchical aggregation clustering algorithm based on the duration of each caption unit and the pause interval between adjacent caption units;
And clustering the text vector based on the similarity distance to obtain a hierarchical clustering tree diagram.
7. The video segmentation method according to claim 1, wherein the determining a cost value corresponding to each node in any group of nodes meeting a constraint condition on the hierarchical clustering tree diagram comprises:
determining video time length corresponding to each node in any group of nodes conforming to constraint conditions on the hierarchical clustering tree diagram;
and determining the cost value corresponding to each node according to the video time length corresponding to each node and the target segmentation time length.
8. The video segmentation method according to claim 1, wherein the entropy function is configured to determine, for each of the cost values, the entropy value corresponding to the group of nodes, comprising:
and the entropy function is used for summing the cost values, and determining that the summation result is the entropy value corresponding to the group of nodes.
9. A video segmentation apparatus, comprising:
the conversion unit is used for acquiring the caption text corresponding to the video file and generating a text vector based on the caption text;
the clustering unit is used for carrying out clustering processing based on the text vector to obtain a hierarchical clustering tree diagram;
the entropy function unit is used for determining an entropy function for calculating the entropy value corresponding to any group of nodes conforming to a constraint condition on the hierarchical clustering tree diagram; the selecting unit is used for determining a target group node on the hierarchical clustering tree diagram according to the entropy function;
the segmentation unit is used for segmenting the video file based on the target group node;
the entropy function unit includes:
a calculating subunit, configured to determine the cost value corresponding to each node in any group of nodes conforming to the constraint condition on the hierarchical clustering tree diagram;
the function subunit is used for determining entropy values corresponding to the group of nodes for each cost value by the entropy function;
the selecting unit includes:
the minimum subunit is used for solving the entropy function to determine the minimum entropy value which can be obtained by the entropy function;
and the selecting subunit is used for determining a group of nodes corresponding to the minimum entropy value as target group nodes.
10. The video segmentation apparatus as set forth in claim 9, further comprising:
and the link unit is used for determining the sub-video files obtained by segmentation and establishing the association relationship between the sub-video files and the video files.
11. The video segmentation apparatus according to claim 9, wherein the conversion unit includes:
an extraction subunit, configured to extract a subtitle text from the video file by using a speech recognition manner; wherein the caption text includes a plurality of caption units;
the characterization subunit is used for converting each caption unit in the caption text into a corresponding caption vector through a language characterization model; wherein all subtitle vectors constitute text vectors.
12. The video segmentation apparatus according to claim 11, wherein the language characterization model is a BERT model.
13. The video segmentation apparatus according to claim 9, wherein the clustering unit includes:
and the clustering subunit is used for carrying out clustering processing on the text vectors by adopting a hierarchical aggregation clustering algorithm to obtain a hierarchical clustering tree diagram.
14. The video segmentation apparatus as set forth in claim 13, wherein the clustering subunit comprises:
the text module is used for determining the duration of each caption unit and the pause interval between adjacent caption units in the caption text according to the time data in the caption text;
The similarity module is used for determining a similarity distance in the hierarchical aggregation clustering algorithm based on the duration of each caption unit and the pause interval between adjacent caption units;
and the clustering module is used for carrying out clustering processing on the text vectors based on the similarity distance to obtain a hierarchical clustering tree diagram.
15. The video segmentation apparatus of claim 9, wherein the calculating subunit comprises:
the first calculation module is used for determining video time length corresponding to each node in any group of nodes meeting constraint conditions on the hierarchical clustering tree diagram;
and the second calculation module is used for determining the cost value corresponding to each node according to the video time length and the target segmentation time length corresponding to each node.
16. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the video segmentation method according to any one of claims 1 to 8 when executing the program.
17. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor realizes the steps of the video segmentation method according to any of claims 1 to 8.
CN201911352570.6A 2019-12-25 2019-12-25 Video segmentation method and device Active CN111046839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911352570.6A CN111046839B (en) 2019-12-25 2019-12-25 Video segmentation method and device


Publications (2)

Publication Number Publication Date
CN111046839A CN111046839A (en) 2020-04-21
CN111046839B true CN111046839B (en) 2023-05-19

Family

ID=70240142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911352570.6A Active CN111046839B (en) 2019-12-25 2019-12-25 Video segmentation method and device

Country Status (1)

Country Link
CN (1) CN111046839B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111866608B (en) * 2020-08-05 2022-08-16 北京华盛互联科技有限公司 Video playing method, device and system for teaching

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719144A (en) * 2009-11-04 2010-06-02 中国科学院声学研究所 Method for segmenting and indexing scenes by combining captions and video image information
CN103631786A (en) * 2012-08-22 2014-03-12 腾讯科技(深圳)有限公司 Clustering method and device for video files
CN106845390A (en) * 2017-01-18 2017-06-13 腾讯科技(深圳)有限公司 Video title generation method and device
CN109783656A (en) * 2018-12-06 2019-05-21 北京达佳互联信息技术有限公司 Recommended method, system and the server and storage medium of audio, video data
CN110147846A (en) * 2019-05-23 2019-08-20 软通智慧科技有限公司 Methods of video segmentation, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8451292B2 (en) * 2009-11-23 2013-05-28 National Cheng Kung University Video summarization method based on mining story structure and semantic relations among concept entities thereof


Also Published As

Publication number Publication date
CN111046839A (en) 2020-04-21


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220920

Address after: 25 Financial Street, Xicheng District, Beijing 100033

Applicant after: CHINA CONSTRUCTION BANK Corp.

Address before: 25 Financial Street, Xicheng District, Beijing 100033

Applicant before: CHINA CONSTRUCTION BANK Corp.

Applicant before: Jianxin Financial Science and Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant