CN112288047A - Broadcast television news stripping method based on probability distribution transformation clustering - Google Patents

Broadcast television news stripping method based on probability distribution transformation clustering

Info

Publication number
CN112288047A
CN112288047A (application CN202011555578.5A; granted as CN112288047B)
Authority
CN
China
Prior art keywords
data
current
feature
clustering
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011555578.5A
Other languages
Chinese (zh)
Other versions
CN112288047B (en)
Inventor
陈锋
温序铭
张�诚
杨瀚
彭军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sobey Digital Technology Co Ltd
Original Assignee
Chengdu Sobey Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sobey Digital Technology Co Ltd filed Critical Chengdu Sobey Digital Technology Co Ltd
Priority claimed from application CN202011555578.5A
Publication of CN112288047A
Application granted
Publication of CN112288047B
Legal status: Active (granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a broadcast television news stripping method based on probability distribution transformation clustering, which comprises the following steps: S1, converting a news program video into data and extracting feature data; S2, calculating the importance ratio of each feature and multiplying each feature by its importance ratio to obtain the new data of that feature; S3, normalizing the weighted feature data; S4, clustering the normalized feature data into an in-point class and a non-in-point class using probability distribution transformation clustering; S5, segmenting news stories according to the in-point and non-in-point data obtained by clustering. The method addresses the large error and low accuracy of traditional clustering algorithms in broadcast television news stripping and is of great significance for improving the accuracy of clustering algorithms in television news program stripping applications.

Description

Broadcast television news stripping method based on probability distribution transformation clustering
Technical Field
The invention relates to the field of broadcast television news stripping, in particular to a broadcast television news stripping method based on probability distribution transformation clustering.
Background
In recent years, with the rapid development of the broadcast television news industry, television news programs have become continuous, around-the-clock (7 x 24 hour) productions. These programs typically contain multiple news stories, and audiences such as television editors and viewers are usually interested in only a small portion of them, so a continuous news program needs to be split into multiple independent news stories. Splitting news stories manually is time-consuming and labor-intensive. It is therefore necessary to find a method for automatically stripping television news that extracts individual news stories from whole news material.
In conventional engineering applications, splitting news stories is generally treated as a sequence-labeling problem: segments of a news story are labeled as BS (begin scene), MS (middle scene), ES (end scene), or SS (single scene), and a labeling algorithm then produces these labels to complete the split. However, such labeling algorithms are supervised learning algorithms that require a large amount of manually annotated labels, which restricts their rapid deployment.
Clustering algorithms, as unsupervised learning algorithms, are typically used when data labels are absent. The essence of news story splitting is to find the in-point of each news story in the television news program material; once the in-point of a news story is found, the story itself is naturally determined. The in-points can therefore be treated as one class and the non-in-points as another, turning news story splitting into a binary clustering problem.
However, in practical engineering applications of news story splitting, traditional clustering algorithms are of limited effectiveness, mainly because they perform cluster analysis directly in the original data space. For example, the k-means algorithm determines the class of each sample by iterating with Euclidean distances computed directly in the original data space. When the in-point and non-in-point data are not well separated in the original data space, direct clustering performs poorly, producing large errors in the in- and out-points of the split news stories.
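The direct-clustering behavior described above can be made concrete with one k-means assignment step in the raw feature space: each point joins the center nearest by Euclidean distance. The points and centers below are illustrative values, not data from the patent.

```python
import numpy as np

# One k-means assignment step in the original data space: each point is
# assigned to the nearest center by squared Euclidean distance. When the
# in-point and non-in-point distributions overlap in this space, this
# assignment is exactly where the errors the patent targets arise.
points = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
d = ((points[:, None] - centers[None]) ** 2).sum(-1)  # (4, 2) distances
print(d.argmin(1).tolist())  # [0, 0, 1, 1]
```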
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a broadcast television news stripping method based on probability distribution transformation clustering. It addresses the large error and low accuracy of traditional clustering algorithms in broadcast television news stripping and is of great significance for improving the accuracy of clustering algorithms in television news program stripping applications.
The purpose of the invention is realized by the following scheme:
A broadcast television news stripping method based on probability distribution transformation clustering comprises the following steps:
S1, extracting feature data from the news program video data;
S2, calculating the importance ratio of each extracted feature, then multiplying each feature by its importance ratio to obtain weighted feature data;
S3, normalizing the weighted feature data;
S4, clustering the normalized feature data into an in-point class and a non-in-point class using probability distribution transformation clustering;
S5, segmenting news stories according to the in-point and non-in-point data obtained by clustering.
Further, step S1 includes the steps of:
S101, cutting the video at the audio pause points in the news program to obtain multiple cut segments, where all audio pause points serve as candidate cut points of news stories;
S102, extracting visual feature data for each cut segment from its video information; the visual feature data include: whether the current cut segment contains a studio shot, whether the preceding and following cut segments contain a studio shot, the number of faces appearing in the studio, whether the current cut segment is a continuous studio shot, and whether teaser/trailer content appears;
S103, extracting audio feature data for each cut segment from its audio information; the audio feature data include: whether music appears in the current cut segment, whether the preceding and following cut segments contain music, and the ASR (automatic speech recognition) transcript text of the current cut segment;
S104, manually judging whether each current cut segment and its preceding cut segment belong to different news stories, setting the label to 1 if they do and to 0 if they do not; this manual judgment serves as the ground-truth result when computing feature importance in the subsequent step S2.
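The per-segment data assembled in S101-S104 can be pictured as one row per cut segment whose columns are the binary flags and counts above plus the manual label. All column names below are illustrative assumptions; the patent names the features but not any identifiers.

```python
import numpy as np

# One row per cut segment; columns mirror the S102-S103 features and the
# S104 manual label. Names and values are illustrative, not patent data.
columns = [
    "has_studio",            # S102: studio shot in current segment
    "prev_has_studio",       # S102: studio shot in previous segment
    "next_has_studio",       # S102: studio shot in next segment
    "num_faces_in_studio",   # S102
    "is_continuous_studio",  # S102
    "has_teaser",            # S102: teaser/trailer content detected
    "has_music",             # S103
    "prev_has_music",        # S103
    "next_has_music",        # S103
    "label_is_new_story",    # S104: manual 0/1 ground truth
]
segments = np.array([
    [1, 0, 1, 1, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 0, 1, 0, 1, 0],
    [1, 0, 1, 2, 1, 0, 0, 1, 0, 1],
])
features, labels = segments[:, :-1], segments[:, -1]
print(features.shape, labels.tolist())  # (3, 9) [1, 0, 1]
```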
Further, step S2 includes the steps of:
S201, numbering the feature data extracted in step S1 in time order; then taking the features one at a time in numbered order as the current feature and judging whether it is continuous or discrete; if it is continuous, discretizing its values into n bins by equal-frequency binning, where 2 <= n <= 5;
S202, selecting a bin of the current feature, denoted bin i; counting the number of cut-class samples in bin i (i.e., samples set to 1 in step S104), denoted n_cut^i, and the number of non-cut-class samples (set to 0 in step S104), denoted n_non^i; then computing the ratio of cut-class samples in the current bin to the total number of cut-class samples of the current feature, p_cut^i = n_cut^i / N_cut, and likewise the non-cut ratio p_non^i = n_non^i / N_non; the logarithm of their quotient is the weight of evidence of the bin:
WOE_i = ln(p_cut^i / p_non^i)
S203, computing the difference between the cut-class ratio and the non-cut-class ratio of the current bin from step S202 and multiplying it by WOE_i, denoted IV_i:
IV_i = (p_cut^i - p_non^i) * WOE_i
S204, repeating steps S202-S203 for all n bins and summing the IV_i of all bins to obtain the IV value of the current feature:
IV = sum_{i=1}^{n} IV_i
S205, after the IV values of all features are computed, calculating for each feature the ratio of its IV value to the sum of all IV values; this ratio is taken as the weight of the current feature, and the feature's values are multiplied by it to obtain the weighted feature data.
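Steps S201-S205 are the standard weight-of-evidence / Information Value calculation. A minimal sketch follows; the epsilon guard against empty bins is our assumption, since the patent does not say how empty bins are handled.

```python
import numpy as np

def information_value(feature, labels, n_bins=4):
    """Information Value of one feature (steps S201-S205).

    labels: 0/1 array, 1 = cut class (new story) from step S104.
    A small epsilon guards empty bins; this detail is an assumption.
    """
    eps = 1e-6
    # S201: equal-frequency binning of a continuous feature.
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, feature, side="right") - 1,
                   0, n_bins - 1)
    total_cut = max(labels.sum(), eps)
    total_non = max((1 - labels).sum(), eps)
    iv = 0.0
    for i in range(n_bins):
        in_bin = bins == i
        p_cut = labels[in_bin].sum() / total_cut + eps        # S202
        p_non = (1 - labels[in_bin]).sum() / total_non + eps  # S202
        woe = np.log(p_cut / p_non)                           # S202
        iv += (p_cut - p_non) * woe                           # S203, S204
    return iv

# S205: each feature's weight is its IV over the sum of all IVs.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 200)
X = np.column_stack([labels + 0.3 * rng.normal(size=200),  # informative
                     rng.normal(size=200)])                 # pure noise
ivs = np.array([information_value(X[:, k], labels) for k in range(2)])
weights = ivs / ivs.sum()
print(weights[0] > weights[1])  # the informative feature gets more weight
```

With these weights, each feature column is then multiplied by its weight before normalization, as the text describes.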
Further, in step S3, the min-max algorithm is selected as the normalization method for the weighted feature data; the calculation formula is:
x'_j = (x_j - min(X)) / (max(X) - min(X))
where j is the data index, x_j and x'_j are the data before and after normalization, and X is all the data of a given feature.
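The min-max formula above, applied column-wise, is a few lines of numpy. Guarding a constant column against division by zero is our assumption, not specified in the patent.

```python
import numpy as np

def min_max_normalize(X):
    """Column-wise min-max scaling to [0, 1] (step S3)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    # A constant column would divide by zero; the guard is an assumption.
    return (X - lo) / np.maximum(hi - lo, 1e-12)

X = np.array([[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]])
print(min_max_normalize(X))  # rows map to [[0, 0], [0.5, 1], [1, 0.5]]
```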
Further, step S4 includes the steps of:
S401, taking the normalized data of a news program material from step S3 as the base data X; randomly selecting two segments of X as initial center segments and taking them as the current optimal center segments;
S402, treating each row of the base data X as a segment, computing the Euclidean distance from each segment to the two initial center segments of step S401, and assigning each segment to the class of the nearer center segment; the two classes are denoted class a and class b;
S403, solving for the data transformation matrix A that minimizes the difference between the marginal probability distributions of the transformed data of the two classes, P(A^T X_a) and P(A^T X_b); the transformation matrix A is computed from the MMD distance between the two classes, where X_a and X_b denote the base data of classes a and b respectively;
S404, using a Gaussian kernel function to raise the dimension of the base data X, giving the raised-dimension data X~; the Gaussian kernel function is:
K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*sigma^2))
S405, computing A^T X~, the data in the new data space after the dimension raising;
S406, applying the k-means algorithm to the data in the new data space, clustering it into two new classes a' and b', and recording the indexes of the data in each class;
S407, using the indexes of the two classes from step S406 to find the corresponding rows of the base data X, which become the new classes a and b of the base data X;
S408, computing new cluster center segments for a and b from the new classes of the base data X and comparing them with the current optimal cluster center segments; if the new a and b center segments have moved relative to the current optimal ones, the clustering still has room for iterative optimization, so go to step S403; if they have not moved, the optimal clustering has been found by iteration and the algorithm ends.
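The loop S401-S408 can be sketched compactly. This is a reading of the patent under stated assumptions: a TCA-style eigen-solve for A (leading eigenvectors of (KMK + mu I)^{-1} KHK), a plain 2-means, and parameter names (mu, dim, sigma) that are illustrative only.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    # Step S404: K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def transform_matrix(K, in_a, mu=0.1, dim=2):
    # Steps S403/S405: TCA-style solve for A as the leading eigenvectors
    # of (K M K + mu I)^{-1} K H K, with M the MMD matrix from the current
    # a/b split and H the centering matrix. Our reading, not verbatim.
    n = K.shape[0]
    e = np.where(in_a, 1.0 / in_a.sum(), -1.0 / (~in_a).sum())
    M = np.outer(e, e)                       # MMD matrix
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = np.linalg.solve(K @ M @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(B)
    return vecs[:, np.argsort(-vals.real)[:dim]].real

def two_means(Z, iters=50, seed=0):
    # Step S406: plain 2-means on the transformed data.
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), 2, replace=False)].copy()
    for _ in range(iters):
        assign = ((Z[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for c in (0, 1):
            if (assign == c).any():
                centers[c] = Z[assign == c].mean(0)
    return assign.astype(bool)

def pdt_cluster(X, rounds=5, seed=0):
    # Outer loop S401-S408: re-estimate A from the current split, then
    # re-cluster in the transformed space until the split stops moving.
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, 2, len(X)).astype(bool)
    K = gaussian_kernel(X)
    for _ in range(rounds):
        A = transform_matrix(K, assign)
        new = two_means(K @ A)               # S405: data in the new space
        if (new == assign).all() or (new != assign).all():
            return new                       # S408: split stopped moving
        assign = new
    return assign

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
split = pdt_cluster(X)
print(split.sum(), (~split).sum())
```

The convergence test here compares assignments rather than center movement (S408); for a two-class split the two tests coincide once assignments are stable.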
Further, in step S403, the MMD distance, i.e. the distance between the class centers of the two classes a and b, is computed as:
MMD(X_a, X_b) = || (1/n) * sum_{i=1}^{n} A^T x_i^a - (1/m) * sum_{j=1}^{m} A^T x_j^b ||^2
where n and m are the numbers of samples in a and b, and i and j index the samples of a and b;
the MMD distance is then transformed subject to the constraint that the variance of the data of a and b is unchanged before and after the transformation; meanwhile, a regularization term is added to prevent overfitting; in summary, the objective function is:
min_A  tr(A^T X M X^T A) + lambda * tr(A^T A)   s.t.  A^T X H X^T A = I
where tr() is the trace of a matrix, M is the MMD matrix, H is the centering matrix, I is the identity matrix, and lambda is the regularization coefficient;
the objective function is then solved by the Lagrange method to obtain the transformation matrix A; the solution satisfies:
(X M X^T + lambda * I) A = X H X^T A Phi
where Phi is the matrix of Lagrange multipliers and X is the base data.
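The empirical MMD between the two classes, read as the patent reads it (the squared distance between class centers), is a one-liner. Values below are illustrative.

```python
import numpy as np

def mmd_distance(Xa, Xb):
    """Empirical MMD as the squared distance between the class means,
    matching the 'distance between the class centers' reading above."""
    return float(((Xa.mean(axis=0) - Xb.mean(axis=0)) ** 2).sum())

Xa = np.array([[0.0, 0.0], [0.0, 2.0]])   # mean (0, 1)
Xb = np.array([[3.0, 1.0], [5.0, 1.0]])   # mean (4, 1)
print(mmd_distance(Xa, Xb))  # squared center distance: 16.0
```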
Further, in step S5, the number of studio shots appearing in each of the two clustering result classes a and b is counted; the class with more studio shots is taken as the cut (in-point) class and the class with fewer as the non-cut class, giving the final news program splitting result.
Further, in step S1, the audio pause points are used as candidate points for news story segmentation.
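The class-selection rule of step S5 (more studio shots means cut class) reduces to a count and an argmax. The arrays below are illustrative; `has_studio` stands for the studio feature from S102.

```python
import numpy as np

# S5: whichever cluster holds more studio segments is the cut (in-point)
# class. Values are illustrative; 'has_studio' is the S102 studio flag.
has_studio = np.array([1, 0, 1, 1, 0, 0])
cluster = np.array([0, 1, 0, 0, 1, 1])       # clustering result a/b as 0/1
studio_counts = [int(has_studio[cluster == c].sum()) for c in (0, 1)]
cut_class = int(np.argmax(studio_counts))
print(cut_class)  # cluster 0 holds 3 studio segments -> 0
```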
Further, the method comprises the steps of:
extracting the topic distribution of each segment with a multi-label topic classification model from the ASR transcript text extracted in step S103, then using the topic distributions of the current segment and the two adjacent segments to compute the topic cosine similarity, and likewise the Jaccard similarity, between the current segment and the segments before and after it;
extracting keywords of each segment from the ASR transcript text extracted in step S103, then using the keywords of the current segment and of the preceding and following segments, combined with a word2vec model, to compute the mean, maximum, minimum and variance of the keyword similarities between the current segment and its neighbors;
extracting the time entities of each segment from the ASR transcript text extracted in step S103 and recording whether a time entity can be extracted.
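The two similarity measures named above are standard. A minimal sketch follows; the topic vectors and keyword sets are invented examples, since the patent's topic model and keyword extractor are not available here.

```python
import numpy as np

def cosine_sim(p, q):
    """Cosine similarity of two topic distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

def jaccard_sim(a, b):
    """Jaccard similarity of two keyword sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Illustrative inputs only; real inputs come from the multi-label topic
# model and keyword extractor the patent names, which we do not have.
print(round(cosine_sim([0.8, 0.2, 0.0], [0.6, 0.4, 0.0]), 3))  # 0.942
print(jaccard_sim({"flood", "rescue"}, {"flood", "weather"}))  # 1/3
```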
The invention has the beneficial effects that:
(1) The invention provides a new clustering method and a news stripping method that address the large error and low accuracy of traditional clustering algorithms in broadcast television news stripping. Specifically, the broadcast television news stripping method based on probability distribution transformation clustering first transforms the original data space so that the distribution difference between the in-point data and the non-in-point data becomes larger, then clusters again using the data in the transformed space, thereby distinguishing in-point data from non-in-point data. The method transforms the original data space according to the distributions of the in-point and non-in-point data so that data of the same class differ less, and then determines the class of each sample by clustering, thereby finding the in-point class and the non-in-point class. This is of great significance for improving the accuracy of clustering algorithms in television news program stripping applications.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a probability distribution transform clustering algorithm;
FIG. 2 is a flow chart of the method steps of the present invention.
Detailed Description
All of the features disclosed in the specification for all of the embodiments (including any accompanying claims, abstract and drawings), or all of the steps of a method or process so disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.
As shown in fig. 1 and 2, the broadcast television news stripping method based on probability distribution transformation clustering comprises steps S1 to S5, together with the sub-steps S101-S104, S201-S205 and S401-S408 and the further features, as set forth above.
In other embodiments of the present invention, a broadcast television news stripping method based on probability distribution transformation clustering comprises the following steps:
Step 1: converting the news program videos into data. More than 50 news program videos are obtained, and feature data (such as whether a studio shot is present, the semantic similarity to adjacent segments, the keyword similarity to adjacent segments, and so on) are extracted from them.
Step 2: calculating the weights of the news program feature data. The importance ratio of each feature is calculated using the Information Value algorithm, and each feature's data is then multiplied by the feature's importance ratio to form the new data of that feature.
Step 3: normalizing the news program feature data. The data of each feature are normalized to [0, 1] using the min-max method.
Step 4: clustering the news program feature data. The in-point class and the non-in-point class in the feature data are clustered using the probability distribution transformation clustering algorithm.
Step 5: segmenting the news stories. News stories are cut out according to the in-point and non-in-point data obtained by clustering.
In other embodiments of the present invention, a method for splitting news of broadcast television based on probability distribution transformation clustering is provided, where fig. 1 shows the whole process steps of extracting video data of broadcast television news into segments by using a clustering algorithm, and the method includes the following steps:
step one: video digitization of the news program;
step two: calculating the weight of the feature data of the news program;
step three: normalizing the weight characteristic data of the news program;
step four: clustering news program characteristic data;
step five: and segmenting the news stories based on the clustering point-in data.
In step one of the above scheme, video digitization of the news program refers to obtaining historical video material of news programs from multiple television channels. Considering that a short audio pause occurs when switching between different news stories, this embodiment adopts audio pause points as candidate cut points for segmenting news stories. The essence of news story splitting is then to find the true news story cut points among these candidate audio cut points.
In view of the above considerations, the specific implementation steps of step one are as follows:
step 101: the video is first cut from audio pause points in the news program, all of which are candidate cut points for news stories.
Step 102: and extracting the visual feature data of the segments according to the video information of each cut segment. The visual feature data includes: whether the current cutting segment appears in a studio; whether the front and rear cutting segments of the current cutting segment contain a studio or not; the number of faces appearing in the studio; whether it is a continuous studio; whether the chipping information appears or not.
Step 103: and extracting the audio characteristic data of each cut segment according to the audio information of each cut segment. The audio feature data includes: whether music appears in the current cutting segment; whether the front and rear cut sections of the current cut section contain music; ASR phonetic text information.
Step 104: and extracting the theme distribution of each segment by using a theme model according to the ASR speech character information extracted in the step 103, and then calculating the theme cosine and jaccard similarity of the current segment and the two segments by using the theme distribution of the current segment and the two segments.
Step 105: extracting the keywords of each segment by using the keyword model according to the ASR speech character information extracted in the step 103, and then calculating the average value, the maximum value, the minimum value and the variance of the similarity of the keywords of the current segment and the keywords of the front segment and the back segment by using the keywords of the current segment and the keywords of the front segment and the back segment in combination with the word2 vent model.
Step 106: and (4) extracting the entity time of each segment by using an entity recognition model according to the ASR speech character information extracted in the step 103, and judging whether the entity time can be extracted or not.
Step 107: and manually judging whether each current clip and the previous clip belong to different news stories, if so, setting 1, and otherwise, setting 0. And the manual judgment result is used as a real result in the subsequent step II for calculating the feature importance.
In step two of the above scheme, the weight of each feature extracted in step one needs to be calculated. The purpose of the feature weights is to enlarge the contribution of important features and reduce that of unimportant features when computing feature distances. The present invention calculates the importance of each feature using the Information Value method. The specific formulas and procedure are as follows:
step 201: judging whether the current feature is a continuous feature or a discrete feature; if it is continuous, discretize the continuous values into n bins using the equal-frequency binning method, where 2 ≤ n ≤ 5.
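A minimal sketch of equal-frequency binning via quantile cut points; the feature values below are illustrative:

```python
import numpy as np

def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins of (roughly) equal counts,
    using quantile cut points."""
    values = np.asarray(values, dtype=float)
    # interior quantile edges split the sorted data into n_bins equal parts
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1))[1:-1]
    return np.digitize(values, edges, right=False)

# Illustrative continuous feature values for ten cut segments
scores = [0.05, 0.10, 0.21, 0.33, 0.42, 0.55, 0.67, 0.80, 0.91, 0.99]
bins = equal_frequency_bins(scores, 5)   # bin ids in 0..4, two values per bin
```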
Step 202: selecting a certain box with current characteristics, recording the box as i, and counting the number of the box slitting classes (namely the data of the 1 in the step 107) ((I))
Figure 705546DEST_PATH_IMAGE027
) And the number of non-slitted classes (i.e. data set 0 in step 107) ((ii))
Figure 227794DEST_PATH_IMAGE028
) Then respectively calculating the current characteristicsThe number of the cutting types of the front box accounts for the total number of the cutting types of the current characteristics (
Figure 34951DEST_PATH_IMAGE029
) The ratio of (a) and the number of non-slice classes of the current box of the current feature in the total number of non-slice classes of the current feature ((
Figure 984453DEST_PATH_IMAGE030
) The quotient of the ratio, the logarithm, and the notation
Figure 745735DEST_PATH_IMAGE031
Figure 71674DEST_PATH_IMAGE032
Step 203: the difference between the slice class fraction and the non-slice class fraction of the current bin of the current feature in step 202 is calculated and then multiplied by this difference
Figure 234802DEST_PATH_IMAGE031
It is recorded as
Figure 355205DEST_PATH_IMAGE033
Figure 134942DEST_PATH_IMAGE034
Step 204: the 202-203 steps are repeated to directly complete the calculation of all the bins, and then all the bins (n bins in total) are calculated
Figure 733414DEST_PATH_IMAGE033
And adding to obtain the Information Value of the feature.
Figure 249584DEST_PATH_IMAGE035
Step 205: after the Information values of all the features are calculated, the ratio of the Information Value of each feature to the sum of the total Information values is calculated. And taking the ratio as the weight of the feature, and multiplying the value of the feature by the weight to obtain new data of the feature.
In step three of the above scheme, the weighted feature data of the news program calculated in step two need to be normalized. The purpose of normalization is to scale the values of every feature into the interval [0, 1], so that features with inconsistent dimensions do not distort the computed distances. The min-max algorithm is selected as the normalization method for the weighted feature data, calculated as follows:
$$x'_j = \frac{x_j - \min(X)}{\max(X) - \min(X)}$$

where j represents the data index, $x_j$ and $x'_j$ respectively represent the data before and after normalization, and X represents all data of a certain feature.
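A direct sketch of the min-max step on one illustrative weighted feature column:

```python
import numpy as np

def min_max_normalize(column):
    """Scale one feature column into [0, 1]: x' = (x - min) / (max - min)."""
    x = np.asarray(column, dtype=float)
    span = x.max() - x.min()
    if span == 0:               # constant feature: map everything to 0
        return np.zeros_like(x)
    return (x - x.min()) / span

weighted_feature = [3.0, 7.0, 5.0, 11.0]
normalized = min_max_normalize(weighted_feature)   # [0.0, 0.5, 0.25, 1.0]
```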
In step four of the above scheme, data clustering is carried out with the probability distribution transformation clustering algorithm on the data normalized in step three, so that the clustering separates two classes: the split class and the non-split class. Step four is implemented in the following steps.
Step 401: taking the normalized news program material data in the third step as basic data X, randomly selecting two segments in the basic data X as initial central segments, and taking the two initial central segments as current optimal central segments.
Step 402: the euclidean distances are calculated for the segments in the basis data from the two initial center segments in step 401, and the closer the basis data is to which initial center segment, the segments are classified as the center segment class. These two classes are referred to as a and b, respectively.
Step 403: solving the data transfer matrix A to make the data of a and bEdge probability
Figure 613383DEST_PATH_IMAGE039
And
Figure 341168DEST_PATH_IMAGE040
as close as possible. The data transfer matrix a is calculated using the MMD distances of the two classes. The MMD distance is essentially the distance to compute the class center for the two classes a, b, which is defined as follows:
Figure 829918DEST_PATH_IMAGE020
wherein n and m represent data volumes of a and b, and i and j represent data indexes of a and b.
The MMD distance is then rewritten by algebraic manipulation:

$$MMD(a,b) = \mathrm{tr}\left(A^T X M X^T A\right)$$

where M is the MMD matrix:

$$M_{ij} = \begin{cases} \dfrac{1}{n^2}, & x_i, x_j \in a \\ \dfrac{1}{m^2}, & x_i, x_j \in b \\ -\dfrac{1}{nm}, & \text{otherwise} \end{cases}$$

That is, when two segment data both belong to class a, $M_{ij} = 1/n^2$; when both belong to class b, $M_{ij} = 1/m^2$; and when the two segments belong to different classes, $M_{ij} = -1/(nm)$.
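The MMD matrix and the trace identity above can be checked numerically; the 2x4 data matrix below is illustrative (columns are segments, the first two in class a):

```python
import numpy as np

def mmd_matrix(n, m):
    """MMD matrix M: 1/n^2 where both samples are in class a (the first n
    columns), 1/m^2 where both are in class b, -1/(n*m) otherwise."""
    N = n + m
    M = np.full((N, N), -1.0 / (n * m))
    M[:n, :n] = 1.0 / n ** 2
    M[n:, n:] = 1.0 / m ** 2
    return M

X = np.array([[1.0, 2.0, 3.0, 5.0],
              [0.0, 0.0, 1.0, 1.0]])
# With A = I, tr(A^T X M X^T A) equals the squared distance between class means
mmd = np.trace(X @ mmd_matrix(2, 2) @ X.T)
```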
although the data of the two classes a and b are as close as possible in the transformed data space, the misclassified data can be closer to the correct class, but the risk that the correctly clustered data is misclassified is also increased. In order to resist the risk, the constraint condition that the respective variances of the data of the a and the b are not changed before and after the transformation data space is added. Meanwhile, in order to prevent overfitting, a regularization term needs to be added.
In summary, the objective function of the algorithm of this embodiment is as follows:

$$\min_{A} \; \mathrm{tr}\left(A^T X M X^T A\right) + \lambda \|A\|_F^2 \quad \text{s.t.} \quad A^T X H X^T A = I$$

where, in the constraint condition, $H = I - \frac{1}{N}\mathbf{1}\mathbf{1}^T$ is the centering matrix, $I$ is the identity matrix, and $\lambda$ is the regularization coefficient.
Then the constrained objective function is solved with the Lagrangian method, yielding the transformation matrix A. The solution satisfies:

$$\left(X M X^T + \lambda I\right) A = X H X^T A \Phi$$

where $\Phi$ is the diagonal matrix of Lagrange multipliers and X is the basic data; A is formed from the eigenvectors of this generalized eigenproblem with the smallest eigenvalues.
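Under this reading, the constrained minimization reduces to a generalized eigenproblem, as in transfer component analysis (TCA). A sketch, where the data, λ, and the output dimension are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import eigh

def solve_transform(X, M, lam=0.1, dim=2):
    """Solve min tr(A^T X M X^T A) + lam*||A||^2 s.t. A^T X H X^T A = I
    via the generalized eigenproblem (X M X^T + lam*I) a = phi (X H X^T) a,
    keeping the eigenvectors with the smallest eigenvalues."""
    d, N = X.shape
    H = np.eye(N) - np.ones((N, N)) / N          # centering matrix
    left = X @ M @ X.T + lam * np.eye(d)
    right = X @ H @ X.T + 1e-8 * np.eye(d)       # small jitter for stability
    _, vecs = eigh(left, right)                  # eigenvalues in ascending order
    return vecs[:, :dim]                         # columns of A

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 6))                      # 3 features x 6 segments
n = m = 3                                        # first 3 columns = class a
M = np.full((6, 6), -1.0 / (n * m))
M[:3, :3], M[3:, 3:] = 1.0 / n ** 2, 1.0 / m ** 2
A = solve_transform(X, M)
```

`scipy.linalg.eigh(a, b)` returns eigenvectors that are b-orthonormal, so the constraint $A^T X H X^T A = I$ holds by construction.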
Step 404: using a Gaussian kernel function to carry out dimensionality raising on the basic data X to obtain dimensionality-raised data
Figure 964916DEST_PATH_IMAGE015
. The gaussian kernel function is given by:
Figure 838235DEST_PATH_IMAGE016
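The kernel step can be sketched as a Gram-matrix computation over the segment columns of X (σ is an illustrative bandwidth choice):

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Gram matrix K over the columns of X:
    K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X  # pairwise squared distances
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

# Two segments at Euclidean distance 1
X = np.array([[0.0, 1.0],
              [0.0, 0.0]])
K = gaussian_kernel(X)
```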
step 405: with the up-dimensioned data K obtained in step 404, calculate $Z = A^T K$, the data in the new data space after the dimension raising, where K denotes the up-dimensioned data.
Step 406: data for new data space after upscaling
Figure 243306DEST_PATH_IMAGE053
Using the kmeans algorithm, the data is divided
Figure 132765DEST_PATH_IMAGE053
Clustering as new
Figure 339755DEST_PATH_IMAGE054
And recording indexes of the data to which the two types belong.
Step 407: according to step 406
Figure 656467DEST_PATH_IMAGE054
And the indexes of the data belonging to the two types find the corresponding data of the basic data X as new types a and b of the basic data X.
Step 408: according to the new classes a and b of the basic data X, respectively calculate the cluster center segments of the new classes a and b.
Step 409: compare the new a and b cluster center segments with the current optimal cluster center segments. If the new a and b cluster center segments have moved relative to the current optimal cluster center segments, the clustering still has room for iterative optimization: take the new center segments as the current optimal center segments and go to step 403. If they have not moved, the optimal clustering has been found through iteration, and the algorithm ends.
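Steps 401-409 can be sketched end to end. Note that the text solves A against the raw data X but then applies it to the kernelized data K; the sketch below adopts a self-consistent kernelized (TCA-style) reading, solving A from K directly. All data and parameters are illustrative:

```python
import numpy as np
from scipy.linalg import eigh

def _kmeans2(pts, iters=50, seed=0):
    """Minimal k-means with k=2 on the rows of pts (stands in for step 406)."""
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), 2, replace=False)]
    labels = np.zeros(len(pts), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((pts[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        if len(np.unique(labels)) < 2:
            break
        new = np.stack([pts[labels == k].mean(axis=0) for k in (0, 1)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def pdt_cluster(X, lam=0.1, dim=2, sigma=1.0, max_iter=10, seed=0):
    """Probability-distribution-transform clustering sketch (steps 401-409):
    assign segments to two centers, learn a transform A that aligns the two
    classes' distributions, re-cluster A^T K, and repeat until stable."""
    d, N = X.shape
    rng = np.random.default_rng(seed)
    # steps 401-402: random initial center segments and nearest-center labels
    init = X[:, rng.choice(N, 2, replace=False)]
    labels = np.argmin(((X[:, :, None] - init[:, None, :]) ** 2).sum(axis=0), axis=1)
    sq = (X ** 2).sum(axis=0)                     # step 404: Gaussian kernel
    K = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2 * X.T @ X, 0) / (2 * sigma ** 2))
    H = np.eye(N) - np.ones((N, N)) / N
    for _ in range(max_iter):
        a_idx, b_idx = np.where(labels == 0)[0], np.where(labels == 1)[0]
        n, m = len(a_idx), len(b_idx)
        if n == 0 or m == 0:
            break
        M = np.zeros((N, N))                      # step 403: MMD matrix
        M[np.ix_(a_idx, a_idx)] = 1.0 / n ** 2
        M[np.ix_(b_idx, b_idx)] = 1.0 / m ** 2
        M[np.ix_(a_idx, b_idx)] = -1.0 / (n * m)
        M[np.ix_(b_idx, a_idx)] = -1.0 / (n * m)
        left = K @ M @ K + lam * np.eye(N)
        right = K @ H @ K + 1e-6 * np.eye(N)
        _, vecs = eigh(left, right)               # solve for the transform A
        A = vecs[:, :dim]
        Z = (A.T @ K).T                           # step 405: transformed data
        new_labels = _kmeans2(Z, seed=seed)       # step 406
        if np.array_equal(new_labels, labels):    # step 409: assignment stable
            break
        labels = new_labels                       # steps 407-408
    return labels

# Illustrative data: two blobs of 4-dim normalized features, 10 segments each
rng = np.random.default_rng(1)
data = np.hstack([rng.normal(0.2, 0.05, size=(4, 10)),
                  rng.normal(0.8, 0.05, size=(4, 10))])
labels = pdt_cluster(data)
```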
Step five: and counting the number of the studios in each category according to the clustering results a and b, taking the studios with more studios as the slitting categories, and taking the studios with less studios as the non-slitting categories.
Data are extracted from the news program, and the split class and non-split class are obtained with the probability distribution transformation clustering algorithm, thereby yielding the final news program splitting result.
The functionality of the present invention, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product stored in a storage medium, including instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), or an optical disk.

Claims (8)

1. A broadcast television news splitting method based on probability distribution transformation clustering is characterized by comprising the following steps:
s1, extracting characteristic data in the news program video data;
s2, calculating the importance ratio of each extracted feature data, and then multiplying each feature data by the importance ratio of the feature data to obtain data with weighted features;
s3, normalizing each data with the weight characteristics;
s4, clustering the normalized feature data into an in-point class and a non-in-point class by probability distribution transformation clustering;
and S5, segmenting the news story according to the data of the in-point class and the non-in-point class obtained by clustering.
2. The method for breaking news items in broadcast television based on probability distribution transformation clustering as claimed in claim 1, wherein the step S1 includes the steps of:
s101, cutting a video from audio pause points in a news program to obtain a plurality of cut segments, wherein all the audio pause points are used as candidate cut points of a news story;
s102, extracting visual feature data of each cut segment according to the video information of each cut segment; the visual feature data comprising: judgment result data of whether the current cut segment shows a studio, judgment result data of whether the cut segments before and after the current cut segment contain a studio, data of the number of faces appearing in the studio, judgment result data of whether it is a continuous studio shot, and judgment result data of whether promo/sting (trailer) material appears;
s103, extracting audio feature data of each cut segment according to the audio information of each cut segment; the audio feature data comprising: judgment result data of whether music appears in the current cut segment, judgment result data of whether the cut segments before and after the current cut segment contain music, and the ASR (automatic speech recognition) text data of the current cut segment;
s104, manually judging whether each current cutting segment and the previous cutting segment belong to different news stories, setting 1 if the current cutting segment and the previous cutting segment belong to different news stories, and setting 0 if the current cutting segment and the previous cutting segment do not belong to different news stories; the result of the manual judgment will be used in calculating the feature importance in the subsequent step S2 as a true result.
3. The method for splitting broadcast television news based on probability distribution transformation clustering according to claim 1 or 2, wherein the step S2 comprises the steps of:
s201, numbering the feature data extracted in step S1 in time order, then taking the features one by one in numbered order as the current feature, judging whether the current feature is continuous or discrete, and if it is continuous, discretizing the continuous values into n bins by the equal-frequency binning method, wherein 2 ≤ n ≤ 5;
s202, selecting a certain bin of the current feature data, recording it as bin i, counting the number of split-class samples of bin i, namely the data set to 1 in step S104, denoted $n_i^1$, and counting the number of non-split-class samples, namely the data set to 0 in step S104, denoted $n_i^0$; then respectively calculating the proportion of the current bin's split-class count in the current feature's total split-class count, $p_i^1 = n_i^1 / N^1$, and the proportion of the current bin's non-split-class count in the current feature's total non-split-class count, $p_i^0 = n_i^0 / N^0$; taking the logarithm of their quotient and recording it as $WOE_i$:

$$WOE_i = \ln\frac{p_i^1}{p_i^0} = \ln\frac{n_i^1 / N^1}{n_i^0 / N^0}$$
S203, calculating the difference between the split-class proportion and the non-split-class proportion of the current bin of the current feature in step S202, and multiplying the difference by $WOE_i$, recording the result as $IV_i$:

$$IV_i = \left(p_i^1 - p_i^0\right) \cdot WOE_i$$
S204, repeating steps S202-S203 to complete the calculation for all bins, and then adding the $IV_i$ of all bins, namely the n bins, to obtain the IV value of the current feature data:

$$IV = \sum_{i=1}^{n} IV_i$$

S205, after the IV values of all the feature data are calculated, calculating the ratio of each feature's IV value to the sum of all IV values; taking the ratio as the weight of the current feature data, and multiplying the value of the current feature data by the ratio to obtain the weighted data of the current feature.
4. The method for splitting broadcast television news based on probability distribution transformation clustering according to claim 1, wherein in step S3, the min-max algorithm is selected as the normalization method for the weighted feature data, with the following calculation formula:

$$x'_j = \frac{x_j - \min(X)}{\max(X) - \min(X)}$$

where j represents the data index, $x_j$ and $x'_j$ respectively represent the data before and after normalization, and X represents all data of a certain feature.
5. The method for breaking news items in broadcast television based on probability distribution transformation clustering as claimed in claim 1, wherein the step S4 comprises the steps of:
s401, taking the normalized news program material data from step S3 as basic data X, randomly selecting two segments in the basic data X as initial center segments, and taking the two initial center segments as the current optimal center segments;
s402, taking each row in the basic data X as a segment, calculating the Euclidean distance between each segment and each of the two initial center segments in step S401, assigning each segment to the class of the nearer initial center segment, and recording the two classes as a and b respectively;
s403, solving for the data transformation matrix A such that the difference between the marginal probabilities $P(A^T X_a)$ and $P(A^T X_b)$ of the data of a and b is minimal; calculating the data transformation matrix A using the MMD distance of the two classes; $X_a$ and $X_b$ respectively represent the basic data of a and b;
s404, using a Gaussian kernel function to raise the dimension of the basic data X, obtaining the up-dimensioned data K; the Gaussian kernel function is as follows:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
s405, calculating $Z = A^T K$, the data in the new data space after the dimension raising;
S406, applying the kmeans algorithm to the data Z of the new data space after the dimension raising, clustering Z into two new classes, and recording the indexes of the data of the two classes;
s407: according to the two classes obtained in step S406 and the indexes of their data, finding the corresponding data of the basic data X as the new classes a and b of the basic data X;
s408: respectively calculating the new a and b cluster center segments according to the new classes a and b of the basic data X, and comparing the new a and b cluster center segments with the current optimal cluster center segments; if the new a and b cluster center segments have moved relative to the current optimal cluster center segments, the clustering still has room for iterative optimization, and the method goes to step S403; if they have not moved, the optimal clustering has been found through iteration, and the algorithm ends.
6. The broadcast television news splitting method based on probability distribution transformation clustering according to claim 5, wherein in step S403, the MMD distance, i.e., the distance between the class centers of the two classes a and b, is calculated as follows:

$$MMD(a,b) = \left\| \frac{1}{n}\sum_{i=1}^{n} A^T x_i - \frac{1}{m}\sum_{j=1}^{m} A^T x_j \right\|^2$$

where n and m respectively represent the data volumes of a and b, and i and j respectively represent the data indexes of a and b;
then the MMD distance is transformed, subject to the constraint condition that the variances of the data of a and b are unchanged before and after the transformation; meanwhile, to prevent overfitting, a regularization term is added; in summary, the objective function is as follows:

$$\min_{A} \; \mathrm{tr}\left(A^T X M X^T A\right) + \lambda \|A\|_F^2 \quad \text{s.t.} \quad A^T X H X^T A = I$$

where tr() is the trace of a matrix, M is the MMD matrix, H is the centering matrix, I is the identity matrix, and $\lambda$ is the regularization coefficient;
then, solving the objective function by the Lagrangian method yields the transformation matrix A; the solution satisfies:

$$\left(X M X^T + \lambda I\right) A = X H X^T A \Phi$$

where $\Phi$ is the matrix of Lagrange multipliers and X is the basic data.
7. The method for splitting broadcast television news based on probability distribution transformation clustering according to claim 5, wherein in step S5, according to the clustering results a and b, the number of studio segments appearing in each class is counted; the class in which more studio segments appear is taken as the split class and the class with fewer as the non-split class, thereby obtaining the final news program splitting result.
8. The broadcast television news splitting method based on probability distribution transformation clustering according to claim 1, wherein in step S1, audio pause points are used as candidate points for news story segmentation.
CN202011555578.5A 2020-12-25 2020-12-25 Broadcast television news stripping method based on probability distribution transformation clustering Active CN112288047B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011555578.5A CN112288047B (en) 2020-12-25 2020-12-25 Broadcast television news stripping method based on probability distribution transformation clustering


Publications (2)

Publication Number Publication Date
CN112288047A true CN112288047A (en) 2021-01-29
CN112288047B CN112288047B (en) 2021-04-09

Family

ID=74426352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011555578.5A Active CN112288047B (en) 2020-12-25 2020-12-25 Broadcast television news stripping method based on probability distribution transformation clustering

Country Status (1)

Country Link
CN (1) CN112288047B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113997989A (en) * 2021-11-29 2022-02-01 中国人民解放军国防科技大学 Safety detection method, device, equipment and medium for single-point suspension system of maglev train

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007036888A2 (en) * 2005-09-29 2007-04-05 Koninklijke Philips Electronics N.V. A method and apparatus for segmenting a content item
CN102523536A (en) * 2011-12-15 2012-06-27 清华大学 Video semantic visualization method
CN102547139A (en) * 2010-12-30 2012-07-04 北京新岸线网络技术有限公司 Method for splitting news video program, and method and system for cataloging news videos
KR101382904B1 (en) * 2012-12-14 2014-04-08 포항공과대학교 산학협력단 Methods of online video segmentation and apparatus for performing the same
CN104182421A (en) * 2013-05-27 2014-12-03 华东师范大学 Video clustering method and detecting method
CN104780388A (en) * 2015-03-31 2015-07-15 北京奇艺世纪科技有限公司 Video data partitioning method and device
CN107341429A (en) * 2016-04-28 2017-11-10 富士通株式会社 Cutting method, cutting device and the electronic equipment of hand-written adhesion character string
CN108710860A (en) * 2018-05-23 2018-10-26 北京奇艺世纪科技有限公司 A kind of news-video dividing method and device
CN109086830A (en) * 2018-08-14 2018-12-25 江苏大学 Typical association analysis based on sample punishment closely repeats video detecting method
CN110110739A (en) * 2019-03-25 2019-08-09 中山大学 A kind of domain self-adaptive reduced-dimensions method based on samples selection
CN111126126A (en) * 2019-10-21 2020-05-08 武汉大学 Intelligent video strip splitting method based on graph convolution neural network
CN111160099A (en) * 2019-11-28 2020-05-15 福建省星云大数据应用服务有限公司 Intelligent segmentation method for video image target
CN111222499A (en) * 2020-04-22 2020-06-02 成都索贝数码科技股份有限公司 News automatic bar-splitting conditional random field algorithm prediction result back-flow training method
CN111242110A (en) * 2020-04-28 2020-06-05 成都索贝数码科技股份有限公司 Training method of self-adaptive conditional random field algorithm for automatically breaking news items
US20200320306A1 (en) * 2019-04-08 2020-10-08 Baidu.Com Times Technology (Beijing) Co., Ltd. Method and apparatus for generating information


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BOGDAN MOCANU 等: "Automatic Segmentation of TV News into Stories Using Visual and Temporal Information", 《INTERNATIONAL CONFERENCE ON ADVANCED CONCEPTS FOR INTELLIGENT VISION SYSTEM(ACIVS)》 *
RAGHVENDRA KANNAO 等: "A system for semantic segmentation of TV news broadcast videos", 《MULTIMEDIA TOOLS AND APPLICATIONS》 *
WINSTON H.HSU 等: "Discovery and fusion of salient multimodal features toward news story segmentation", 《THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING》 *
LIU Zhikang: "Research and Application of a News Event Clustering Algorithm Based on Semantic Relation Graphs", China Master's Theses Full-text Database, Information Science and Technology Series *
LI Chenjie et al.: "A News Splitting Algorithm Based on Audio and Video Features", Microcomputer Applications *


Also Published As

Publication number Publication date
CN112288047B (en) 2021-04-09

Similar Documents

Publication Publication Date Title
CN106599029B (en) Chinese short text clustering method
CN106919619B (en) Commodity clustering method and device and electronic equipment
US20210382937A1 (en) Image processing method and apparatus, and storage medium
US20120123978A1 (en) Learning Tags for Video Annotation Using Latent Subtags
CN108932318B (en) Intelligent analysis and accurate pushing method based on policy resource big data
Bouguila A model-based approach for discrete data clustering and feature weighting using MAP and stochastic complexity
WO2023065642A1 (en) Corpus screening method, intention recognition model optimization method, device, and storage medium
CN111242110B (en) Training method of self-adaptive conditional random field algorithm for automatically breaking news items
CN110928981A (en) Method, system and storage medium for establishing and perfecting iteration of text label system
CN112288047B (en) Broadcast television news stripping method based on probability distribution transformation clustering
EP3340069A1 (en) Automated characterization of scripted narratives
CN111538846A (en) Third-party library recommendation method based on mixed collaborative filtering
CN114328939B (en) Natural language processing model construction method based on big data
CN113656373A (en) Method, device, equipment and storage medium for constructing retrieval database
CN114579768A (en) Maintenance method for realizing intelligent operation and maintenance knowledge base of equipment
CN114579739A (en) Topic detection and tracking method for text data stream
CN107609570B (en) Micro video popularity prediction method based on attribute classification and multi-view feature fusion
CN116561230B (en) Distributed storage and retrieval system based on cloud computing
Liang et al. An efficient hierarchical near-duplicate video detection algorithm based on deep semantic features
CN117033961A (en) Multi-mode image-text classification method for context awareness
CN116737936A (en) AI virtual personage language library classification management system based on artificial intelligence
CN115587231A (en) Data combination processing and rapid storage and retrieval method based on cloud computing platform
CN112988953B (en) Adaptive broadcast television news keyword standardization method
CN115935579A (en) Language model pre-training method, commodity information processing method and related device
Velivelli et al. Automatic video annotation by mining speech transcripts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant