WO2023000782A1 - Method and apparatus for acquiring video hotspot, readable medium, and electronic device

Info

Publication number
WO2023000782A1
WO2023000782A1 PCT/CN2022/092514 CN2022092514W WO2023000782A1 WO 2023000782 A1 WO2023000782 A1 WO 2023000782A1 CN 2022092514 W CN2022092514 W CN 2022092514W WO 2023000782 A1 WO2023000782 A1 WO 2023000782A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
cluster
texts
video
clustering
Prior art date
Application number
PCT/CN2022/092514
Other languages
French (fr)
Chinese (zh)
Inventor
佘琪
沈铮阳
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023000782A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Definitions

  • the present disclosure relates to the technical field of the Internet, and in particular, to a method, device, readable medium and electronic equipment for acquiring video hotspots.
  • currently, video hotspots are mainly obtained through manual summarization or hotspot discovery models (e.g., a latent Dirichlet model or a latent semantic analysis model).
  • obtaining video hotspots through manual summarization consumes a large amount of human resources as the data flow keeps growing, with low efficiency and poor real-time performance.
  • obtaining video hotspots through a hotspot discovery model incurs a high computational cost as the amount of data increases and is prone to ambiguous expressions, which reduces the accuracy of the acquired video hotspots.
  • the present disclosure provides a method for acquiring video hotspots, the method comprising:
  • for each of the first text clusters, determining a second preset classification number corresponding to the first text cluster, and clustering the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters;
  • a video hotspot corresponding to the at least one video page is determined according to the cluster center of each second text cluster.
  • the present disclosure provides a device for acquiring video hotspots, the device comprising:
  • An acquisition module configured to identify the page information of at least one video page, and obtain multiple texts corresponding to the at least one video page;
  • the first clustering module is used to cluster a plurality of said texts to obtain the first text clusters of the first preset classification quantity;
  • the second clustering module is configured to, for each of the first text clusters, determine a second preset classification number corresponding to the first text cluster, and cluster the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters;
  • the determination module is configured to determine the video hotspot corresponding to the at least one video page according to the cluster center of each of the second text clusters.
  • the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the method described in the first aspect of the present disclosure are implemented.
  • an electronic device including:
  • a processing device configured to execute the computer program in the storage device to implement the steps of the method described in the first aspect of the present disclosure.
  • the present disclosure first identifies the page information of at least one video page to obtain multiple texts corresponding to the at least one video page, and then clusters the multiple texts to obtain a first preset classification number of first text clusters. Next, for each first text cluster, it determines the second preset classification number corresponding to that first text cluster and clusters the texts in that first text cluster according to the second preset classification number, obtaining the second preset classification number of second text clusters. Finally, it determines the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster.
  • Fig. 1 is a flow chart showing a method for acquiring video hotspots according to an exemplary embodiment
  • Fig. 2 is a flow chart showing a step 102 according to the embodiment shown in Fig. 1;
  • Fig. 3 is a flow chart showing a step 103 according to the embodiment shown in Fig. 1;
  • Fig. 4 is a block diagram of a device for acquiring video hotspots according to an exemplary embodiment
  • Fig. 5 is a block diagram showing a first clustering module according to the embodiment shown in Fig. 4;
  • Fig. 6 is a block diagram showing a second clustering module according to the embodiment shown in Fig. 4;
  • Fig. 7 is a block diagram of an acquisition module according to the embodiment shown in Fig. 4;
  • Fig. 8 is a block diagram of an electronic device according to an exemplary embodiment.
  • Fig. 1 is a flow chart of a method for acquiring video hotspots according to an exemplary embodiment. As shown in Figure 1, the method may include the following steps:
  • Step 101 Identify the page information of at least one video page to obtain multiple texts corresponding to the at least one video page.
  • the manner of recognizing the page information of at least one video page to obtain multiple texts may be: performing text recognition on the text information of each video page to obtain the page text corresponding to each video page.
  • audio recognition can be performed on the audio information of each video page to obtain the audio text corresponding to each video page.
  • the page text and the audio text can be used as multiple texts.
  • OCR (Optical Character Recognition)
  • for example, when the video pages are live pages, text recognition can be performed on texts such as the live room title, live introduction text, live comments, and live barrage (bullet comments) of each live page to obtain the page text corresponding to each live page.
  • Step 102: cluster the multiple texts to obtain a first preset classification number of first text clusters.
  • the first preset classification number can be set in advance, and according to the first preset classification number, a preset clustering algorithm is used to cluster a plurality of texts to obtain the first preset classification number of first text clusters.
  • the preset clustering algorithm can be, for example, the K-Means clustering algorithm
  • the first preset classification number can be set manually based on experience, or can be selected based on the words in the coarse-grained texts among the multiple texts (for example, texts such as titles and topic introductions).
  • the process of obtaining the first preset classification number of first text clusters can be understood as a coarse-grained clustering process.
  • the clustering granularity of the first text clusters is relatively coarse, and each first text cluster contains one type of text.
  • the three first text clusters can respectively contain texts about sports, film and television, and games; that is, the clustering granularity of the first text clusters is at the level of sports, film and television, and games.
  • the second preset classification number corresponding to each first text cluster can be determined; it can be a preset fixed value, or it can be selected based on the texts in the first text cluster. Then, according to the second preset classification number corresponding to each first text cluster, the texts in that first text cluster can be clustered using the preset clustering algorithm to obtain the second preset classification number of second text clusters corresponding to that first text cluster. The total number of second text clusters finally obtained is therefore the sum of the second preset classification numbers corresponding to all first text clusters. Obtaining the second preset classification number of second text clusters corresponding to each first text cluster can be understood as a fine-grained clustering process.
  • the clustering granularity of the second text clustering is relatively fine.
  • the second preset classification number corresponding to the first text clustering can be set to 3
  • the three second text clusters corresponding to the first text cluster can respectively contain track and field, football and basketball texts, that is, the clustering granularity of the second text cluster is at the level of track and field, football and basketball.
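The coarse-then-fine procedure of steps 102 and 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: each text is assumed to be already represented as a numeric vector, the tiny pure-Python K-Means uses a deterministic farthest-first initialisation chosen here for reproducibility, and `k2_for` is a hypothetical callback that returns the second preset classification number for a coarse cluster of a given size.

```python
import math

def kmeans(points, k, iters=50):
    """Tiny K-Means; deterministic farthest-first initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from every centre chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster happens to be empty.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels, centers

def two_level_clusters(vectors, k1, k2_for):
    """Coarse clustering into k1 groups, then fine clustering inside each group."""
    coarse_labels, _ = kmeans(vectors, k1)
    fine_clusters = []
    for j in range(k1):
        idx = [i for i, lab in enumerate(coarse_labels) if lab == j]
        if not idx:
            continue
        k2 = min(k2_for(len(idx)), len(idx))  # never ask for more clusters than texts
        fine_labels, fine_centers = kmeans([vectors[i] for i in idx], k2)
        for f in range(k2):
            members = [idx[m] for m, lab in enumerate(fine_labels) if lab == f]
            fine_clusters.append({"coarse": j, "members": members, "center": fine_centers[f]})
    return fine_clusters

# Toy 2-D "text vectors": two coarse groups, each containing two fine groups.
vectors = [(0, 0), (0, 1), (10, 0), (10, 1), (100, 100), (100, 101), (110, 100), (110, 101)]
clusters = two_level_clusters(vectors, k1=2, k2_for=lambda size: 2)
print(sorted(c["members"] for c in clusters))
```

The total number of fine clusters equals the sum of the per-cluster values returned by `k2_for`, matching the description above. In practice a library K-Means would normally replace the hand-written one.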
  • Step 104: determine, according to the cluster center of each second text cluster, the video hotspot corresponding to the at least one video page.
  • for each second text cluster, a first number of texts in the second text cluster that are closest to its cluster center can be used as target texts, and a second number of words with the largest TF-IDF in the second text cluster can be used as target words. The target texts and target words can then be used as video hotspots.
  • because video hotspots are selected from the second text clusters, they take a clear form of expression, which is convenient for subsequent processing and analysis.
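The selection in step 104 can be illustrated with a short sketch. It assumes that each text in a second text cluster is available as a token list with an aligned text vector, and that a word-to-TF-IDF mapping exists; the function name `pick_hotspot`, its parameters, and the toy data are hypothetical.

```python
import math

def pick_hotspot(texts, vectors, center, tfidf, n_texts=1, n_words=2):
    """Return (target texts, target words) for one second-level text cluster.

    texts   : list of token lists belonging to the cluster
    vectors : text vectors aligned with `texts`
    center  : cluster centre of this cluster
    tfidf   : {word: TF-IDF weight}
    """
    # Target texts: the n_texts texts whose vectors lie closest to the centre.
    order = sorted(range(len(texts)), key=lambda i: math.dist(vectors[i], center))
    target_texts = [texts[i] for i in order[:n_texts]]
    # Target words: the n_words words in the cluster with the largest TF-IDF
    # (ties broken alphabetically so the result is deterministic).
    vocab = {w for tokens in texts for w in tokens}
    target_words = sorted(vocab, key=lambda w: (-tfidf.get(w, 0.0), w))[:n_words]
    return target_texts, target_words

texts = [["football", "final"], ["football", "goal", "replay"], ["transfer", "news"]]
vectors = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
tfidf = {"football": 0.5, "goal": 0.3, "final": 0.2, "replay": 0.1, "transfer": 0.05, "news": 0.05}
hot_texts, hot_words = pick_hotspot(texts, vectors, center=(0.9, 0.1), tfidf=tfidf, n_texts=2)
print(hot_texts, hot_words)
```

Here `n_texts` and `n_words` play the roles of the "first number" and "second number" in the description.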
  • the method for acquiring video hotspots in the present disclosure can be applied not only to acquiring video and live broadcast hotspots, but also to acquiring other types of hotspots, for example hotspots in images; the present disclosure does not specifically limit this.
  • Fig. 2 is a flow chart showing a step 102 according to the embodiment shown in Fig. 1 .
  • step 102 may include the following steps:
  • Step 1021 determine the TF-IDF of each word in the multiple texts.
  • text preprocessing can be performed on the multiple texts after they are acquired, so as to remove information irrelevant to video hotspots (such as punctuation marks and stop words) and sensitive information from each text.
  • word segmentation can be performed on the preprocessed texts, a vocabulary corresponding to the multiple texts can then be constructed from the word segmentation results (the vocabulary includes all words in the multiple texts), and the TF-IDF of each word in the vocabulary can be calculated.
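As a rough illustration of step 1021, the sketch below computes one TF-IDF weight per vocabulary word over already segmented texts. The source does not state which TF-IDF variant is used, so the formula here is an assumption: term frequency is the word's share of all tokens in the corpus, and the inverse document frequency is smoothed as ln((1 + N) / (1 + df)) + 1. The function name `corpus_tf_idf` is hypothetical.

```python
import math
from collections import Counter

def corpus_tf_idf(tokenized_texts):
    """Return {word: TF-IDF} for every word in the corpus vocabulary."""
    n_texts = len(tokenized_texts)
    # How often each word occurs in the whole corpus.
    term_counts = Counter(w for tokens in tokenized_texts for w in tokens)
    # In how many texts each word occurs at least once.
    doc_counts = Counter(w for tokens in tokenized_texts for w in set(tokens))
    total_terms = sum(term_counts.values())
    weights = {}
    for word, count in term_counts.items():
        tf = count / total_terms
        idf = math.log((1 + n_texts) / (1 + doc_counts[word])) + 1  # assumed smoothing
        weights[word] = tf * idf
    return weights

tokenized_texts = [["football", "match"], ["football", "goal"], ["movie", "premiere"]]
weights = corpus_tf_idf(tokenized_texts)
print(sorted(weights.items(), key=lambda kv: -kv[1]))
```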
  • Step 1022 for each text, according to the TF-IDF of each word in the multiple texts and the word vector corresponding to each word in the text, determine the text vector corresponding to the text.
  • Step 1023 According to the first preset number of categories, use a preset clustering algorithm to cluster the text vectors corresponding to the multiple texts to obtain the first preset number of first text clusters.
  • for each text, the word vectors corresponding to the words in the text can be averaged, weighted by the TF-IDF of each word, to obtain the text vector corresponding to the text, that is, the text features of the text.
  • the text vectors corresponding to the multiple texts may be clustered by using a preset clustering algorithm to obtain the first preset classification number of first text clusters.
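The weighted average in step 1022 can be sketched as below. The word vectors are toy values invented for the example; where real word vectors come from (for instance a pretrained embedding model) is not specified in this section, and the function name `text_vector` is hypothetical. The resulting text vectors would then be passed to the preset clustering algorithm in step 1023.

```python
def text_vector(tokens, tfidf, word_vectors):
    """TF-IDF-weighted average of the word vectors of one text."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in tokens:
        if word not in tfidf or word not in word_vectors:
            continue  # skip words without a weight or an embedding
        weight = tfidf[word]
        total = [t + weight * x for t, x in zip(total, word_vectors[word])]
        weight_sum += weight
    # A text with no usable words maps to the zero vector.
    return [t / weight_sum for t in total] if weight_sum else total

tfidf = {"football": 3.0, "match": 1.0}
word_vectors = {"football": [1.0, 0.0], "match": [0.0, 1.0]}
vec = text_vector(["football", "match"], tfidf, word_vectors)
print(vec)  # [0.75, 0.25]
```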
  • step 103 can be implemented in the following manner:
  • the second preset number of categories corresponding to the first text cluster may be determined according to the central sentence and keywords of the first text cluster.
  • the central sentences and keywords of each first text cluster can be fed back to the user, who can determine the category of the texts contained in the first text cluster from them and set the corresponding second preset classification number for the first text cluster according to that category. Here, the central sentences can be the several texts in the first text cluster closest to its cluster center, and the keywords can be the several words with the largest TF-IDF in the first text cluster.
  • the second preset classification number corresponding to the first text cluster may be determined according to the number of texts in the first text cluster. Specifically, a larger second preset classification number may be set for a first text cluster containing more texts. For example, when the first preset classification number is 4 and the four first text clusters contain 100, 10, 20, and 50 texts respectively, the second preset classification number can be set to 5 for the cluster with 100 texts, 2 for the cluster with 10 texts, 3 for the cluster with 20 texts, and 4 for the cluster with 50 texts.
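The source gives only the example values above and no formula for mapping cluster size to the second preset classification number. Purely as an illustration, the hypothetical rule below (floor(log2(n)) - 1, with a minimum of 1) grows slowly with cluster size and happens to reproduce the four example values; it should not be read as the disclosed rule.

```python
def second_k_from_size(n_texts):
    """Hypothetical rule: floor(log2(n_texts)) - 1, but never below 1."""
    # For n >= 1, int.bit_length() equals floor(log2(n)) + 1.
    return max(1, n_texts.bit_length() - 2)

sizes = [100, 10, 20, 50]
ks = [second_k_from_size(n) for n in sizes]
print(ks)  # [5, 2, 3, 4]
```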
  • FIG. 3 is a flow chart of step 103 according to the embodiment shown in FIG. 1 .
  • there may be multiple second preset classification numbers, and step 103 may include the following steps:
  • Step 1031: for each second preset classification number, use a preset clustering algorithm to cluster the texts in the first text cluster according to that second preset classification number, obtaining the second preset classification number of candidate text clusters.
  • Step 1032: according to the candidate text clusters, determine a target preset classification number from the multiple second preset classification numbers.
  • Step 1033: use the candidate text clusters corresponding to the target preset classification number as the second preset classification number of second text clusters.
  • the clustering effect of each candidate text cluster set can be evaluated using indicators such as the silhouette coefficient, the elbow method, and the CH coefficient (Calinski-Harabasz index), and the second preset classification number corresponding to the candidate text cluster set with the best clustering effect is used as the target preset classification number.
  • the candidate text clusters in the candidate text cluster set corresponding to the target preset classification number are used as the second preset classification number of second text clusters.
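Steps 1031 to 1033 can be sketched with the silhouette coefficient as the quality indicator; the elbow method or the Calinski-Harabasz index could be substituted. This is a minimal pure-Python sketch under stated assumptions: the K-Means helper uses a deterministic farthest-first initialisation chosen for reproducibility, every candidate number is assumed to be at least 2, and the names `silhouette` and `choose_k` are hypothetical.

```python
import math

def kmeans(points, k, iters=50):
    """Tiny K-Means; deterministic farthest-first initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better separated clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # a singleton cluster scores 0 by convention
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(d) / len(d)
            for d in ([math.dist(p, q) for q, lab in zip(points, labels) if lab == other]
                      for other in set(labels) if other != labels[i])
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, candidate_ks):
    """Cluster once per candidate k and keep the result with the best silhouette."""
    best = None
    for k in candidate_ks:
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best

# Three well-separated toy groups of text vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
best_k, best_score, best_labels = choose_k(points, [2, 3, 4])
print(best_k, round(best_score, 3))
```

The labels returned for the winning candidate correspond to the second text clusters of step 1033.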
  • the present disclosure first identifies the page information of at least one video page to obtain multiple texts corresponding to the at least one video page, and then clusters the multiple texts to obtain a first preset classification number of first text clusters. Next, for each first text cluster, it determines the second preset classification number corresponding to that first text cluster and clusters the texts in that first text cluster according to the second preset classification number, obtaining the second preset classification number of second text clusters. Finally, it determines the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster.
  • by clustering the texts in the video page multiple times, the present disclosure acquires video hotspots efficiently: it can ensure the real-time performance of video hotspots, requires no manual participation, has a low computational cost, and can avoid ambiguous expressions, thereby improving the accuracy of the acquired video hotspots.
  • Fig. 4 is a block diagram of an apparatus for acquiring video hotspots according to an exemplary embodiment. As shown in Figure 4, the device 200 includes:
  • the first clustering module 202 is configured to cluster multiple texts to obtain a first preset number of first text clusters.
  • Determining module 204 is used for determining the video hotspot corresponding to at least one video page according to the cluster center of each second text cluster.
  • Fig. 5 is a block diagram of a first clustering module according to the embodiment shown in Fig. 4 .
  • the first clustering module 202 includes:
  • the second determination sub-module 2021 is configured to determine the TF-IDF of each word in the multiple texts.
  • the second determination sub-module 2021 is further configured for each text, according to the TF-IDF of each word in the multiple texts and the word vector corresponding to each word in the text, to determine the text vector corresponding to the text.
  • the first clustering sub-module 2022 is further configured to use a preset clustering algorithm to cluster the text vectors corresponding to a plurality of texts according to the first preset classification number to obtain the first preset classification number of first text clusters .
  • the second clustering module 203 is used for:
  • the second preset classification number corresponding to the first text cluster is determined.
  • the second clustering sub-module 2031 is configured to, for each second preset classification number, cluster the texts in the first text cluster using a preset clustering algorithm according to that second preset classification number, to obtain the second preset classification number of candidate text clusters.
  • the third determination sub-module 2032 is configured to determine the target preset number of categories from multiple second preset numbers of categories according to the candidate text clusters.
  • the third determination sub-module 2032 is further configured to use the candidate text clusters corresponding to the target preset classification number as the second preset classification number of second text clusters.
  • the determination module 204 is used for:
  • for each second text cluster, determine the target text corresponding to the second text cluster according to the distance between each text in the second text cluster and the cluster center of the second text cluster, and determine the target word corresponding to the second text cluster according to the TF-IDF of each word in the second text cluster.
  • the recognition sub-module 2011 is configured to perform text recognition on the text information of each video page to obtain the page text corresponding to each video page.
  • the identification sub-module 2011 is further configured to perform audio identification on the audio information of each video page to obtain the corresponding audio text of each video page.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 309, or from storage means 308, or from ROM 302.
  • when the computer program is executed by the processing device 301, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: identify the page information of at least one video page to obtain multiple texts corresponding to the at least one video page; cluster the multiple texts to obtain a first preset classification number of first text clusters; for each of the first text clusters, determine a second preset classification number corresponding to the first text cluster, and cluster the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters; and determine, according to the cluster center of each second text cluster, the video hotspot corresponding to the at least one video page.
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
  • Field Programmable Gate Arrays (FPGAs)
  • Application Specific Integrated Circuits (ASICs)
  • Application Specific Standard Products (ASSPs)
  • Systems on Chip (SOCs)
  • Complex Programmable Logic Devices (CPLDs)
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • Example 1 provides a method for acquiring video hotspots, including: identifying the page information of at least one video page to obtain multiple texts corresponding to the at least one video page; clustering the multiple texts to obtain a first preset classification number of first text clusters; for each of the first text clusters, determining a second preset classification number corresponding to the first text cluster, and clustering the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters; and determining, according to the cluster center of each of the second text clusters, the video hotspot corresponding to the at least one video page.
  • Example 3 provides the method of Example 1, wherein determining the second preset classification number corresponding to the first text cluster includes: determining the second preset classification number corresponding to the first text cluster according to the texts in the first text cluster.
  • Example 9 provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the methods described in Example 1 to Example 7 are implemented.
  • Example 10 provides an electronic device, including: a storage device on which a computer program is stored; and a processing device configured to execute the computer program in the storage device to implement the steps of the method described in Example 1 to Example 7.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a method and apparatus for acquiring a video hotspot, a readable medium, and an electronic device. The method comprises: identifying page information of at least one video page to obtain a plurality of texts; clustering the plurality of texts to obtain a first preset classification quantity of first text clusters; for each first text cluster, determining a second preset classification quantity corresponding to the first text cluster, and clustering texts in the first text cluster according to the second preset classification quantity, to obtain a second preset classification quantity of second text clusters; and determining, according to a cluster center of each second text cluster, a video hotspot corresponding to the at least one video page.

Description

获取视频热点的方法、装置、可读介质和电子设备Method, device, readable medium and electronic equipment for acquiring video hotspots
本公开要求于2021年07月21日提交的,申请名称为“获取视频热点的方法、装置、可读介质和电子设备”的、中国专利申请号为“202110825848.8”的优先权,该中国专利申请的全部内容通过引用结合在本公开中。This disclosure claims the priority of the Chinese patent application number "202110825848.8" filed on July 21, 2021, with the application name "Method, device, readable medium and electronic equipment for obtaining video hotspots", the Chinese patent application The entire contents of are incorporated by reference in this disclosure.
技术领域technical field
本公开涉及互联网技术领域,具体地,涉及一种获取视频热点的方法、装置、可读介质和电子设备。The present disclosure relates to the technical field of the Internet, and in particular, to a method, device, readable medium and electronic equipment for acquiring video hotspots.
背景技术Background technique
随着互联网技术以及多媒体技术的不断发展,网络视频正逐渐成为网络生活中不可或缺的重要组成部分,发掘网络视频的视频热点对于增强用户粘性以及实现舆情监控有着重要作用。目前,主要是通过人工总结或热点发掘模型(例如,隐狄利克雷模型或潜在语义分析模型),来获取视频热点。然而,采用人工总结的方式来获取视频热点,随着数据流的不断增加,会耗费大量的人力资源,效率较低而且实时性差。而通过热点发掘模型来获取视频热点,随着数据量的增大,计算成本较高,并且容易产生歧义表达,降低了获取的视频热点的准确度。With the continuous development of Internet technology and multimedia technology, online video is gradually becoming an indispensable part of online life. Discovering video hotspots of online video plays an important role in enhancing user stickiness and realizing public opinion monitoring. At present, video hotspots are mainly obtained through manual summarization or hotspot discovery models (eg, latent Dirichlet model or latent semantic analysis model). However, using artificial summarization to obtain video hotspots will consume a lot of human resources as the data flow continues to increase, and the efficiency is low and the real-time performance is poor. However, using the hotspot mining model to obtain video hotspots, as the amount of data increases, the calculation cost is high, and ambiguous expressions are prone to occur, which reduces the accuracy of the acquired video hotspots.
Summary
This Summary is provided to introduce, in simplified form, concepts that are described in detail in the Detailed Description below. This Summary is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
In a first aspect, the present disclosure provides a method for acquiring video hotspots, the method comprising:
identifying page information of at least one video page to obtain multiple texts corresponding to the at least one video page;
clustering the multiple texts to obtain first text clusters, the number of which equals a first preset category count;
for each first text cluster, determining a second preset category count corresponding to that first text cluster, and clustering the texts in that first text cluster according to the second preset category count to obtain second text clusters, the number of which equals the second preset category count; and
determining, according to the cluster center of each second text cluster, the video hotspot corresponding to the at least one video page.
In a second aspect, the present disclosure provides an apparatus for acquiring video hotspots, the apparatus comprising:
an acquisition module, configured to identify page information of at least one video page to obtain multiple texts corresponding to the at least one video page;
a first clustering module, configured to cluster the multiple texts to obtain first text clusters, the number of which equals a first preset category count;
a second clustering module, configured to, for each first text cluster, determine a second preset category count corresponding to that first text cluster, and cluster the texts in that first text cluster according to the second preset category count to obtain second text clusters, the number of which equals the second preset category count; and
a determination module, configured to determine, according to the cluster center of each second text cluster, the video hotspot corresponding to the at least one video page.
In a third aspect, the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processing device, implements the steps of the method according to the first aspect of the present disclosure.
In a fourth aspect, the present disclosure provides an electronic device, comprising:
a storage device storing a computer program; and
a processing device, configured to execute the computer program in the storage device to implement the steps of the method according to the first aspect of the present disclosure.
With the above technical solution, the present disclosure first identifies the page information of at least one video page to obtain multiple texts corresponding to the at least one video page, and then clusters the multiple texts to obtain first text clusters, the number of which equals the first preset category count. Next, for each first text cluster, it determines the second preset category count corresponding to that first text cluster and clusters the texts in that first text cluster accordingly, obtaining second text clusters, the number of which equals the second preset category count. Finally, it determines the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster. By clustering the texts of video pages more than once, the present disclosure acquires video hotspots efficiently and keeps them timely; it requires no manual participation, has a low computational cost, avoids ambiguous expressions, and improves the accuracy of the video hotspots obtained.
Other features and advantages of the present disclosure are described in detail in the Detailed Description below.
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following Detailed Description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale. In the drawings:
Fig. 1 is a flowchart of a method for acquiring video hotspots according to an exemplary embodiment;
Fig. 2 is a flowchart of step 102 according to the embodiment shown in Fig. 1;
Fig. 3 is a flowchart of step 103 according to the embodiment shown in Fig. 1;
Fig. 4 is a block diagram of an apparatus for acquiring video hotspots according to an exemplary embodiment;
Fig. 5 is a block diagram of a first clustering module according to the embodiment shown in Fig. 4;
Fig. 6 is a block diagram of a second clustering module according to the embodiment shown in Fig. 4;
Fig. 7 is a block diagram of an acquisition module according to the embodiment shown in Fig. 4;
Fig. 8 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, the method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
It should be noted that terms such as "first" and "second" in the present disclosure are used only to distinguish different devices, modules, or units, and are not intended to limit the order of, or the interdependence between, the functions performed by these devices, modules, or units.
It should be noted that the modifiers "one" and "multiple" in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they are to be read as "one or more".
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
Fig. 1 is a flowchart of a method for acquiring video hotspots according to an exemplary embodiment. As shown in Fig. 1, the method may include the following steps:
Step 101: identify the page information of at least one video page to obtain multiple texts corresponding to the at least one video page.
For example, a video page contains a large amount of page information that effectively summarizes the video's hotspots, so this page information can be used to discover video hotspots automatically. Specifically, at least one video page for which hotspots are to be discovered, together with the page information of each video page, may first be obtained. The video page may be the display page of an online video or the live page of a webcast, and the page information includes at least one of text information and audio information. For example, when the video page is a video display page, the text information may be the information corresponding to texts such as the video's title, topic introduction, subtitles, and bullet comments, and the audio information may be the information corresponding to the sound emitted while the video is playing. When the video page is a live page, the text information may be the information corresponding to texts such as the live room title, live introduction, live comments, and live bullet comments, and the audio information may be the information corresponding to the sound emitted during the live broadcast.
The page information of the at least one video page may be identified to obtain the multiple texts as follows: text recognition is performed on the text information of each video page to obtain the page text corresponding to that video page. Audio recognition may also be performed on the audio information of each video page to obtain the audio text corresponding to that video page. Finally, the page texts and the audio texts may together be taken as the multiple texts. For example, when the video page is a live page, within a specified period (for example, 2 hours), OCR (Optical Character Recognition) technology or existing video data may be used to perform text recognition on texts such as the live room title, live introduction, live comments, and live bullet comments of each live page, to obtain the page text corresponding to each live page. Within the same period, the sound emitted during the broadcast on each live page may be captured and converted into text using speech recognition technology, to obtain the audio text corresponding to each live page. Finally, the page text and the audio text corresponding to each live page may be taken as the multiple texts.
Step 102: cluster the multiple texts to obtain first text clusters, the number of which equals a first preset category count.
For example, the first preset category count may be set in advance, and a preset clustering algorithm may be used to cluster the multiple texts according to that count, obtaining that many first text clusters. The preset clustering algorithm may be, for example, the K-Means clustering algorithm. The first preset category count may be set manually based on experience, or may be chosen according to the words in the coarse-grained texts among the multiple texts (for example, the texts corresponding to titles and topic introductions). Obtaining the first text clusters can be understood as a coarse-grained clustering process. The first text clusters have a coarse granularity, and each first text cluster contains texts of one category. For example, when the first preset category count is 3, the three first text clusters may contain sports, film and television, and game texts respectively; that is, the granularity of the first text clusters is at the level of sports, film and television, and games.
Step 103: for each first text cluster, determine the second preset category count corresponding to that first text cluster, and cluster the texts in that first text cluster according to the second preset category count to obtain second text clusters, the number of which equals the second preset category count.
In this step, the second preset category count corresponding to each first text cluster may first be determined; it may be a fixed value set in advance, or it may be chosen according to the texts in that first text cluster. Then, using the preset clustering algorithm, the texts in each first text cluster may be clustered according to the second preset category count corresponding to that first text cluster, to obtain that many second text clusters for that first text cluster. The total number of second text clusters finally obtained is therefore the sum of the second preset category counts of all the first text clusters. Obtaining the second text clusters for each first text cluster can be understood as a fine-grained clustering process. The second text clusters have a fine granularity. For example, when the texts in a certain first text cluster belong to the sports category, the second preset category count for that first text cluster may be set to 3, and the three corresponding second text clusters may contain track and field, football, and basketball texts respectively; that is, the granularity of the second text clusters is at the level of track and field, football, and basketball.
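The coarse-then-fine flow of steps 102 and 103 can be sketched as follows. This is only an illustrative outline, not the implementation of the disclosure: `two_stage_cluster`, `cluster_fn`, and `choose_second_count` are hypothetical names, and `cluster_fn(vectors, k)` stands in for whatever preset clustering algorithm (for example, K-Means) is used, returning one label per vector.

```python
def two_stage_cluster(vectors, first_count, choose_second_count, cluster_fn):
    """Return second-level clusters, each a list of indices into `vectors`.

    cluster_fn(vectors, k) -> list of labels (hypothetical stand-in for the
    preset clustering algorithm); choose_second_count(members) -> int gives
    the second preset category count for one first-level cluster.
    """
    # Coarse-grained pass (step 102): split all texts into first_count groups.
    first_labels = cluster_fn(vectors, first_count)
    groups = {}
    for idx, label in enumerate(first_labels):
        groups.setdefault(label, []).append(idx)

    # Fine-grained pass (step 103): re-cluster each first-level group.
    second_clusters = []
    for members in groups.values():
        k = min(choose_second_count(members), len(members))
        sub_labels = cluster_fn([vectors[i] for i in members], k)
        sub_groups = {}
        for idx, label in zip(members, sub_labels):
            sub_groups.setdefault(label, []).append(idx)
        second_clusters.extend(sub_groups.values())
    return second_clusters
```

The number of clusters returned is at most the sum of the per-group counts, which matches the statement above that the total is the sum of the second preset category counts.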
Step 104: determine the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster.
Specifically, after the second text clusters corresponding to each first text cluster have been obtained, the target text of each second text cluster may be determined according to the distance between each text in that second text cluster and its cluster center. A vocabulary may also be built for that second text cluster (containing all the words of that cluster), the TF-IDF (term frequency-inverse document frequency) of each word in the cluster may be determined from that vocabulary, and the target words of the cluster may be determined according to these TF-IDF values. For example, a first number of texts closest to the cluster center of the second text cluster may be taken as the target texts, and a second number of words with the largest TF-IDF in the second text cluster may be taken as the target words. The target texts and target words may then be taken as the video hotspot. Because the video hotspots are selected from the second text clusters, they are expressed clearly, which facilitates subsequent processing and analysis.
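The selection described in step 104 can be sketched for a single second text cluster as follows. The sketch assumes Euclidean distance to the cluster center and takes per-word TF-IDF values as given; the function name `hotspot_for_cluster` and its parameters are hypothetical and do not appear in the disclosure.

```python
import math


def hotspot_for_cluster(texts, vectors, center, word_scores,
                        num_texts=1, num_words=2):
    """Pick target texts and target words for one second text cluster.

    texts/vectors: the cluster's texts and their text vectors (same order).
    center: the cluster center. word_scores: {word: TF-IDF} for the cluster.
    num_texts / num_words play the role of the "first number" and
    "second number" in the description (values here are arbitrary).
    """
    # Target texts: the texts whose vectors lie closest to the cluster center.
    order = sorted(range(len(texts)),
                   key=lambda i: math.dist(vectors[i], center))
    target_texts = [texts[i] for i in order[:num_texts]]

    # Target words: the words with the largest TF-IDF in the cluster.
    target_words = sorted(word_scores, key=word_scores.get,
                          reverse=True)[:num_words]
    return target_texts, target_words
```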
It should be noted that the method for acquiring video hotspots of the present disclosure can be applied not only to acquiring the hotspots of videos and live broadcasts but also to acquiring other types of hotspots, for example hotspots in images; the present disclosure does not specifically limit this.
In summary, the present disclosure first identifies the page information of at least one video page to obtain multiple texts corresponding to the at least one video page, and then clusters the multiple texts to obtain first text clusters, the number of which equals the first preset category count. Next, for each first text cluster, it determines the second preset category count corresponding to that first text cluster and clusters the texts in that first text cluster accordingly, obtaining second text clusters, the number of which equals the second preset category count. Finally, it determines the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster. By clustering the texts of video pages more than once, the present disclosure acquires video hotspots efficiently and keeps them timely; it requires no manual participation, has a low computational cost, avoids ambiguous expressions, and improves the accuracy of the video hotspots obtained.
Fig. 2 is a flowchart of step 102 according to the embodiment shown in Fig. 1. As shown in Fig. 2, step 102 may include the following steps:
Step 1021: determine the TF-IDF of each word in the multiple texts.
For example, to improve the efficiency and accuracy of hotspot acquisition, the multiple texts may be preprocessed after they are obtained, so as to remove from each text any information unrelated to video hotspots (for example, punctuation marks and stop words) as well as sensitive information. The preprocessed texts may then be segmented into words, a vocabulary corresponding to the multiple texts may be built from the segmentation result (this vocabulary contains all the words in the multiple texts), and the TF-IDF of each word in the vocabulary may be calculated.
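A minimal sketch of the preprocessing and TF-IDF calculation in step 1021 is given below. It rests on several assumptions that the disclosure does not specify: whitespace tokenization (Chinese text would need a word segmenter instead), a tiny hypothetical stop-word list, and the common per-text definition tf × log(N / df). Filtering of sensitive information is omitted.

```python
import math
import string
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of"}  # hypothetical stop-word list


def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in cleaned.split() if w not in STOP_WORDS]


def tf_idf(tokenized_texts):
    """Return one {word: TF-IDF} dict per text, using tf * log(N / df)."""
    n = len(tokenized_texts)
    # Document frequency: in how many texts each vocabulary word appears.
    doc_freq = Counter()
    for tokens in tokenized_texts:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized_texts:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            w: (c / total) * math.log(n / doc_freq[w])
            for w, c in counts.items()
        })
    return weights
```

With this definition, a word that occurs in every text receives a weight of zero, which is one reason production systems often use a smoothed variant.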
Step 1022: for each text, determine the text vector corresponding to that text according to the TF-IDF of each word in the multiple texts and the word vector corresponding to each word in that text.
Step 1023: according to the first preset category count, cluster the text vectors corresponding to the multiple texts using the preset clustering algorithm, to obtain first text clusters, the number of which equals the first preset category count.
Further, for each text, the word vectors (word embeddings) of the words in that text may be averaged, weighted by the TF-IDF of each word in that text, to obtain the text vector corresponding to that text, that is, the text feature of that text. Then, according to the first preset category count, the text vectors corresponding to the multiple texts may be clustered using the preset clustering algorithm, to obtain that many first text clusters.
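The TF-IDF weighted average of step 1022 can be written out as below. The sketch assumes that word vectors are supplied as a plain `{word: list of floats}` mapping and that words without a vector are skipped; `text_vector` and its parameters are hypothetical names, and the source of the embeddings is not specified by the disclosure.

```python
def text_vector(tokens, tfidf, word_vectors, dim):
    """TF-IDF weighted average of the word vectors of one text.

    tokens: the words of the text. tfidf: {word: TF-IDF} for this text.
    word_vectors: {word: list of floats of length dim} (assumed given).
    """
    total = [0.0] * dim
    weight_sum = 0.0
    # Each distinct word contributes once; its TF-IDF already reflects
    # how often it occurs in the text.
    for word in dict.fromkeys(tokens):
        if word not in word_vectors:
            continue  # assumption: out-of-vocabulary words are ignored
        weight = tfidf.get(word, 0.0)
        for i, x in enumerate(word_vectors[word]):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0.0:
        return [0.0] * dim  # no usable words: fall back to the zero vector
    return [t / weight_sum for t in total]
```

The resulting vectors are what the preset clustering algorithm would consume in step 1023.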
Optionally, step 103 may be implemented as follows:
determine the second preset category count corresponding to the first text cluster according to the texts in that first text cluster.
In one scenario, the second preset category count corresponding to the first text cluster may be determined according to the central sentences and keywords of that first text cluster. For example, the central sentences and keywords of each first text cluster may be fed back to a user, who determines from them the category of the texts contained in that first text cluster and sets the corresponding second preset category count for that first text cluster according to the category. Here, the central sentences may be the several texts in the first text cluster closest to its cluster center, and the keywords may be the several words with the largest TF-IDF in the first text cluster.
In another scenario, the second preset category count corresponding to the first text cluster may be determined according to the number of texts in that first text cluster. Specifically, a larger second preset category count may be set for a first text cluster containing more texts. For example, when the first preset category count is 4 and the four first text clusters contain 100, 10, 20, and 50 texts, the second preset category count may be set to 5 for the cluster with 100 texts, 2 for the cluster with 10 texts, 3 for the cluster with 20 texts, and 4 for the cluster with 50 texts.
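One way to turn "more texts, larger count" into code is a threshold lookup, sketched below. The thresholds are an assumption chosen only so that the example above (100, 10, 20, 50 texts giving 5, 2, 3, 4) is reproduced; the disclosure does not state how the counts are derived, and `second_preset_count` is a hypothetical name.

```python
import bisect

# Hypothetical size thresholds and the category count used for each range:
# up to 10 texts -> 2, up to 20 -> 3, up to 50 -> 4, more than 50 -> 5.
SIZE_BOUNDS = [10, 20, 50]
CATEGORY_COUNTS = [2, 3, 4, 5]


def second_preset_count(num_texts):
    """Map the number of texts in a first text cluster to a category count."""
    return CATEGORY_COUNTS[bisect.bisect_left(SIZE_BOUNDS, num_texts)]


sizes = [100, 10, 20, 50]
counts = [second_preset_count(n) for n in sizes]
```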
Fig. 3 is a flowchart of step 103 according to the embodiment shown in Fig. 1. As shown in Fig. 3, there are multiple second preset category counts, and step 103 may include the following steps:
Step 1031: for each second preset category count, cluster the texts in the first text cluster using the preset clustering algorithm according to that second preset category count, to obtain candidate text clusters, the number of which equals that second preset category count.
Step 1032: determine a target preset category count from the multiple second preset category counts according to the candidate text clusters.
Step 1033: take the candidate text clusters corresponding to the target preset category count as the second text clusters.
For example, to make the resulting second text clusters more accurate, each first text cluster may correspond to multiple second preset category counts. When the texts in a first text cluster are clustered, the preset clustering algorithm may be applied to those texts once for each second preset category count corresponding to that first text cluster, to obtain the set of candidate text clusters corresponding to each count. The set of candidate text clusters corresponding to a second preset category count contains that many candidate text clusters. For example, when a first text cluster corresponds to the second preset category counts 3, 4, and 5, three sets of candidate text clusters are obtained, one each for 3, 4, and 5: the set for 3 contains 3 candidate text clusters, the set for 4 contains 4, and the set for 5 contains 5.
Then, based on the set of candidate text clusters corresponding to each second preset category count, measures such as the silhouette coefficient, the elbow method, or the Calinski-Harabasz (CH) index may be used to evaluate the clustering quality of each set, and the second preset category count corresponding to the set with the best clustering quality is taken as the target preset category count. Finally, the candidate text clusters in the set corresponding to the target preset category count are taken as the second text clusters.
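The selection of the target preset category count can be illustrated with the silhouette coefficient, one of the measures named above. The sketch assumes that the candidate clusterings have already been computed and are passed in as label lists keyed by their category count, and that Euclidean distance is used; `silhouette` and `pick_target_count` are hypothetical names.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of one clustering (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        return 0.0  # the coefficient is undefined for a single cluster

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def pick_target_count(points, candidates):
    """candidates: {category count: labels}; return the best-scoring count."""
    return max(candidates, key=lambda k: silhouette(points, candidates[k]))
```

The pairwise computation is quadratic in the number of texts, so for large clusters a library implementation or sampling would be the more practical choice.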
In summary, the present disclosure first identifies the page information of at least one video page to obtain multiple texts corresponding to the at least one video page, and then clusters the multiple texts to obtain first text clusters, the number of which equals the first preset category count. Next, for each first text cluster, it determines the second preset category count corresponding to that first text cluster and clusters the texts in that first text cluster accordingly, obtaining second text clusters, the number of which equals the second preset category count. Finally, it determines the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster. By clustering the texts of video pages more than once, the present disclosure acquires video hotspots efficiently and keeps them timely; it requires no manual participation, has a low computational cost, avoids ambiguous expressions, and improves the accuracy of the video hotspots obtained.
Fig. 4 is a block diagram of an apparatus for acquiring video hotspots according to an exemplary embodiment. As shown in Fig. 4, the apparatus 200 includes:
an acquisition module 201, configured to identify the page information of at least one video page to obtain multiple texts corresponding to the at least one video page;
a first clustering module 202, configured to cluster the multiple texts to obtain first text clusters, the number of which equals a first preset category count;
a second clustering module 203, configured to, for each first text cluster, determine the second preset category count corresponding to that first text cluster, and cluster the texts in that first text cluster according to the second preset category count to obtain second text clusters, the number of which equals the second preset category count; and
a determination module 204, configured to determine the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster.
图5是根据图4所示实施例示出的一种第一聚类模块的框图。如图5所示,第一聚类模块202包括:Fig. 5 is a block diagram of a first clustering module according to the embodiment shown in Fig. 4 . As shown in Figure 5, the first clustering module 202 includes:
第二确定子模块2021,用于确定多个文本中的每个词语的TF-IDF。The second determination sub-module 2021 is configured to determine the TF-IDF of each word in the multiple texts.
第二确定子模块2021,还用于针对每个文本,根据多个文本中的每个词语的TF-IDF和该文本中的每个词语对应的词向量,确定该文本对应的文本向量。The second determination sub-module 2021 is further configured for each text, according to the TF-IDF of each word in the multiple texts and the word vector corresponding to each word in the text, to determine the text vector corresponding to the text.
第一聚类子模块2022,还用于根据第一预设分类数量,利用预设聚类算法对多个文本对应的文本向量进行聚类,得到第一预设分类数量个第一文本聚类。The first clustering sub-module 2022 is further configured to use a preset clustering algorithm to cluster the text vectors corresponding to a plurality of texts according to the first preset classification number to obtain the first preset classification number of first text clusters .
Optionally, the second clustering module 203 is configured to:
determine, according to the texts in the first text cluster, the second preset classification number corresponding to that first text cluster.
Optionally, the second clustering module 203 is configured to:
determine, according to the number of texts in the first text cluster, the second preset classification number corresponding to that first text cluster.
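One way to derive the second preset classification number from the number of texts is a simple proportional rule. The sketch below is only an assumed heuristic: this passage does not state the mapping, and the function name `second_cluster_count` and the parameters `texts_per_cluster`, `min_k`, and `max_k` are hypothetical.

```python
import math


def second_cluster_count(num_texts, texts_per_cluster=50, min_k=1, max_k=20):
    """Map the size of a first text cluster to a sub-cluster count.

    Assumed rule: roughly one sub-cluster per `texts_per_cluster`
    texts, clamped to [min_k, max_k] and never above the number of
    texts available.
    """
    if num_texts <= 0:
        raise ValueError("a first text cluster must contain at least one text")
    k = math.ceil(num_texts / texts_per_cluster)
    return min(num_texts, max(min_k, min(k, max_k)))
```

Under these assumed defaults, a first text cluster of 120 texts would be split into 3 second text clusters, while a very large cluster is capped at 20.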
FIG. 6 is a block diagram of a second clustering module according to the embodiment shown in FIG. 4. As shown in FIG. 6, the second clustering module 203 includes:
A second clustering sub-module 2031, configured to, for each second preset classification number, cluster the texts in the first text cluster by using a preset clustering algorithm according to that second preset classification number, to obtain that second preset classification number of candidate text clusters.
A third determining sub-module 2032, configured to determine a target preset classification number from the multiple second preset classification numbers according to the candidate text clusters.
The third determining sub-module 2032 is further configured to use the candidate text clusters corresponding to the target preset classification number as the second preset classification number of second text clusters.
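The selection performed by sub-module 2032 can be illustrated by scoring each candidate clustering and keeping the best one. The passage does not name the scoring criterion, so the mean silhouette coefficient used here is an assumption, as are the function names `silhouette` and `pick_target_count` and the representation of candidates as a `{count: labels}` mapping.

```python
import math


def silhouette(vectors, labels):
    """Mean silhouette coefficient of one candidate clustering.

    Higher values mean tighter, better separated clusters.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        return -1.0  # the silhouette is undefined for a single cluster
    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(vectors[i], vectors[j]) for j in own if j != i)
        a /= len(own) - 1
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(vectors)


def pick_target_count(vectors, candidates):
    """Return the candidate count whose labelling scores highest.

    `candidates` maps each second preset classification number to the
    list of cluster labels produced for that number.
    """
    return max(candidates, key=lambda k: silhouette(vectors, candidates[k]))
```

The labels stored under the returned count would then serve as the second text clusters.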
Optionally, the determining module 204 is configured to:
for each second text cluster, determine the target text corresponding to that second text cluster according to the distance between each text in the second text cluster and the cluster center of the second text cluster, and determine the target word corresponding to that second text cluster according to the TF-IDF of each word in the second text cluster;
use the target text and the target word as the video hotspot.
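The hotspot extraction for a single second text cluster can be sketched as below. The sketch makes several assumptions: the cluster center is taken to be the mean of the member text vectors (as it would be for k-means), the target text is the single nearest text, the target word is the single highest-TF-IDF word, and the name `cluster_hotspot` is hypothetical.

```python
import math


def cluster_hotspot(texts, vectors, word_weights):
    """Return (target_text, target_word) for one second text cluster.

    texts        -- the texts in the cluster
    vectors      -- their text vectors, in the same order
    word_weights -- {word: TF-IDF} for the words in the cluster
    """
    dim = len(vectors[0])
    # Assumed cluster center: the mean of the member vectors.
    center = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    nearest = min(range(len(texts)), key=lambda i: math.dist(vectors[i], center))
    target_word = max(word_weights, key=word_weights.get)
    return texts[nearest], target_word
```

Returning several nearest texts or several top-weighted words instead of one would be a straightforward variation.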
FIG. 7 is a block diagram of an acquisition module according to the embodiment shown in FIG. 4. As shown in FIG. 7, the page information includes at least one of text information and audio information, and the acquisition module 201 includes:
A recognition sub-module 2011, configured to perform text recognition on the text information of each video page to obtain the page text corresponding to each video page.
The recognition sub-module 2011 is further configured to perform audio recognition on the audio information of each video page to obtain the audio text corresponding to each video page.
A processing sub-module 2012, configured to use the page text and the audio text as the multiple texts.
In summary, the present disclosure first recognizes the page information of at least one video page to obtain multiple texts corresponding to the at least one video page, and then clusters the multiple texts to obtain a first preset classification number of first text clusters. Next, for each first text cluster, it determines a second preset classification number corresponding to that first text cluster and clusters the texts in that first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters. Finally, it determines the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster. By clustering the texts of the video pages multiple times, the present disclosure acquires video hotspots efficiently: the hotspots remain timely, no manual participation is required, the computational cost is low, and ambiguous expressions are avoided, which improves the accuracy of the acquired video hotspots.
Referring now to FIG. 8, it shows a schematic structural diagram of an electronic device 300 (for example, the terminal device or the server in FIG. 1) suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), and a vehicle-mounted terminal (for example, a car navigation terminal), as well as fixed terminals such as a digital TV and a desktop computer. The electronic device shown in FIG. 8 is only an example and should not limit the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 8, the electronic device 300 may include a processing device 301 (for example, a central processing unit or a graphics processing unit), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the electronic device 300. The processing device 301, the ROM 302, and the RAM 303 are connected to one another through a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Generally, the following devices can be connected to the I/O interface 305: an input device 306 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output device 307 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage device 308 including, for example, a magnetic tape and a hard disk; and a communication device 309. The communication device 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 8 shows the electronic device 300 with various devices, it should be understood that it is not required to implement or have all of the devices shown. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, where the computer program contains program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 309, or installed from the storage device 308, or installed from the ROM 302. When the computer program is executed by the processing device 301, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
In some embodiments, the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or it may exist independently without being assembled into the electronic device.
The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: recognize the page information of at least one video page to obtain multiple texts corresponding to the at least one video page; cluster the multiple texts to obtain a first preset classification number of first text clusters; for each first text cluster, determine a second preset classification number corresponding to that first text cluster, and cluster the texts in that first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters; and determine the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster.
Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself; for example, the acquisition module may also be described as "a module for acquiring multiple texts corresponding to a video page".
The functions described above herein may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), an application specific standard product (ASSP), a system on chip (SOC), a complex programmable logic device (CPLD), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, Example 1 provides a method for acquiring video hotspots, including: recognizing the page information of at least one video page to obtain multiple texts corresponding to the at least one video page; clustering the multiple texts to obtain a first preset classification number of first text clusters; for each first text cluster, determining a second preset classification number corresponding to that first text cluster, and clustering the texts in that first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters; and determining the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster.
According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, wherein clustering the multiple texts to obtain the first preset classification number of first text clusters includes: determining the TF-IDF of each word in the multiple texts; for each text, determining the text vector corresponding to that text according to the TF-IDF of each word in the multiple texts and the word vector corresponding to each word in that text; and clustering the text vectors corresponding to the multiple texts by using a preset clustering algorithm according to the first preset classification number, to obtain the first preset classification number of first text clusters.
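The final clustering step of Example 2 can be illustrated with a small k-means routine. This is a sketch under assumptions: this passage only speaks of a "preset clustering algorithm", so k-means (Lloyd's algorithm) is one possible choice rather than the stated one, and the function name `k_means`, the seeded random initialisation, and the iteration cap are hypothetical.

```python
import math
import random


def k_means(vectors, k, iterations=100, seed=0):
    """Cluster text vectors into k groups with Lloyd's algorithm.

    Returns (labels, centers); labels[i] is the cluster index of
    vectors[i].
    """
    rng = random.Random(seed)
    # Assumed initialisation: k distinct input vectors chosen at random.
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Calling `k_means(text_vectors, first_preset_number)` would yield the first text clusters; the same routine could be reused inside each first text cluster with the second preset classification number.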
According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 1, wherein determining the second preset classification number corresponding to the first text cluster includes: determining, according to the texts in the first text cluster, the second preset classification number corresponding to that first text cluster.
According to one or more embodiments of the present disclosure, Example 4 provides the method of Example 3, wherein determining, according to the texts in the first text cluster, the second preset classification number corresponding to that first text cluster includes: determining, according to the number of texts in the first text cluster, the second preset classification number corresponding to that first text cluster.
According to one or more embodiments of the present disclosure, Example 5 provides the method of Example 1, wherein there are multiple second preset classification numbers, and clustering the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters includes: for each second preset classification number, clustering the texts in the first text cluster by using a preset clustering algorithm according to that second preset classification number, to obtain that second preset classification number of candidate text clusters; determining a target preset classification number from the multiple second preset classification numbers according to the candidate text clusters; and using the candidate text clusters corresponding to the target preset classification number as the second preset classification number of second text clusters.
According to one or more embodiments of the present disclosure, Example 6 provides the method of Example 1, wherein determining the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster includes: for each second text cluster, determining the target text corresponding to that second text cluster according to the distance between each text in the second text cluster and the cluster center of the second text cluster, and determining the target word corresponding to that second text cluster according to the TF-IDF of each word in the second text cluster; and using the target text and the target word as the video hotspot.
According to one or more embodiments of the present disclosure, Example 7 provides the method of Example 1, wherein the page information includes at least one of text information and audio information, and recognizing the page information of at least one video page to obtain multiple texts corresponding to the at least one video page includes: performing text recognition on the text information of each video page to obtain the page text corresponding to each video page; performing audio recognition on the audio information of each video page to obtain the audio text corresponding to each video page; and using the page text and the audio text as the multiple texts.
According to one or more embodiments of the present disclosure, Example 8 provides an apparatus for acquiring video hotspots, including: an acquisition module, configured to recognize the page information of at least one video page to obtain multiple texts corresponding to the at least one video page; a first clustering module, configured to cluster the multiple texts to obtain a first preset classification number of first text clusters; a second clustering module, configured to, for each first text cluster, determine a second preset classification number corresponding to that first text cluster, and cluster the texts in that first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters; and a determining module, configured to determine the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster.
According to one or more embodiments of the present disclosure, Example 9 provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processing device, implements the steps of the method described in any of Examples 1 to 7.
According to one or more embodiments of the present disclosure, Example 10 provides an electronic device, including: a storage device on which a computer program is stored; and a processing device, configured to execute the computer program in the storage device to implement the steps of the method described in any of Examples 1 to 7.
The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims. Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.

Claims (13)

  1. A method for acquiring video hotspots, wherein the method includes:
    recognizing the page information of at least one video page to obtain multiple texts corresponding to the at least one video page;
    clustering the multiple texts to obtain a first preset classification number of first text clusters;
    for each first text cluster, determining a second preset classification number corresponding to that first text cluster, and clustering the texts in that first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters;
    determining the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster.
  2. The method according to claim 1, wherein clustering the multiple texts to obtain the first preset classification number of first text clusters includes:
    determining the TF-IDF of each word in the multiple texts;
    for each text, determining the text vector corresponding to that text according to the TF-IDF of each word in the multiple texts and the word vector corresponding to each word in that text;
    clustering the text vectors corresponding to the multiple texts by using a preset clustering algorithm according to the first preset classification number, to obtain the first preset classification number of first text clusters.
  3. The method according to claim 2, wherein determining the TF-IDF of each word in the multiple texts includes:
    performing text preprocessing on the multiple texts;
    performing word segmentation on the preprocessed texts, and constructing a vocabulary corresponding to the multiple texts according to the word segmentation results;
    calculating the TF-IDF of each word in the vocabulary corresponding to the multiple texts.
  4. The method according to claim 2, wherein, for each text, determining the text vector corresponding to that text according to the TF-IDF of each word in the multiple texts and the word vector corresponding to each word in that text includes:
    for each text, computing a weighted average of the word vectors corresponding to the words in the multiple texts, using the TF-IDF of each word in the multiple texts as the weight, to obtain the text vector corresponding to that text.
  5. The method according to claim 1, wherein determining the second preset classification number corresponding to the first text cluster comprises:
    determining the second preset classification number corresponding to the first text cluster according to the texts in the first text cluster.
  6. The method according to claim 5, wherein determining the second preset classification number corresponding to the first text cluster according to the texts in the first text cluster comprises:
    determining the second preset classification number corresponding to the first text cluster according to a central sentence and a keyword of the first text cluster, wherein the central sentence comprises the text in the first text cluster that is closest to the cluster center of the first text cluster, and the keyword comprises the word with the largest TF-IDF in the first text cluster.
  7. The method according to claim 5, wherein determining the second preset classification number corresponding to the first text cluster according to the texts in the first text cluster comprises:
    determining the second preset classification number corresponding to the first text cluster according to the number of texts in the first text cluster.
  8. The method according to claim 1, wherein there are a plurality of second preset classification numbers, and clustering the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters comprises:
    for each second preset classification number, clustering the texts in the first text cluster using a preset clustering algorithm according to that second preset classification number, to obtain that second preset classification number of candidate text clusters;
    determining a target preset classification number from the plurality of second preset classification numbers according to the candidate text clusters; and
    taking the candidate text clusters corresponding to the target preset classification number as the second preset classification number of second text clusters.
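Claim 8 does not state how the target preset classification number is chosen from the candidate clusterings. One possible criterion, shown here purely as an assumption, is the mean silhouette score: each candidate number is scored by how compact and well separated its clusters are, and the best-scoring number is kept. The names `silhouette` and `select_target_number` are hypothetical, and the candidate clusterings are passed in as precomputed label lists.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette(vectors, labels):
    """Mean silhouette score of a clustering; higher means better separated."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as the worst score
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the same cluster.
        a = sum(euclidean(vectors[i], vectors[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(euclidean(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

def select_target_number(vectors, candidates):
    """`candidates` maps each candidate number to the labels obtained with it."""
    return max(candidates, key=lambda k: silhouette(vectors, candidates[k]))
```

Other criteria, such as manual review of each candidate's central sentences, would fit the claim wording equally well.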
  9. The method according to claim 1, wherein determining the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster comprises:
    for each second text cluster, determining a target text corresponding to the second text cluster according to the distance between each text in the second text cluster and the cluster center of the second text cluster, and determining a target word corresponding to the second text cluster according to the TF-IDF corresponding to each word in the second text cluster; and
    taking the target text and the target word as the video hotspot.
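A minimal sketch of the selection in claim 9, for a single second text cluster. It assumes the target text is the one nearest the cluster center and the target words are the `top_n` words with the highest TF-IDF; the function name `video_hotspot`, the `top_n` parameter, and whitespace word splitting are all hypothetical choices.

```python
def video_hotspot(texts, vectors, center, tfidf, top_n=3):
    """Return (target_text, target_words) for one cluster.

    texts   : the texts in the cluster
    vectors : the text vector of each text, in the same order
    center  : the cluster center
    tfidf   : {word: TF-IDF} for the words of the cluster
    """
    # Target text: the text whose vector is closest to the cluster center.
    best = min(
        range(len(texts)),
        key=lambda i: sum((x - c) ** 2 for x, c in zip(vectors[i], center)),
    )
    # Target words: highest TF-IDF first, ties broken alphabetically.
    words = {w for t in texts for w in t.split()}
    ranked = sorted((w for w in words if w in tfidf), key=lambda w: (-tfidf[w], w))
    return texts[best], ranked[:top_n]
```

Applying this to every second text cluster would give one representative sentence and a few keywords per cluster, which together serve as the hotspots.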
  10. The method according to claim 1, wherein the page information comprises at least one of text information and audio information, and identifying the page information of the at least one video page to obtain the plurality of texts corresponding to the at least one video page comprises:
    performing text recognition on the text information of each video page to obtain a page text corresponding to each video page;
    performing audio recognition on the audio information of each video page to obtain an audio text corresponding to each video page; and
    taking the page text and the audio text as the plurality of texts.
  11. An apparatus for acquiring a video hotspot, the apparatus comprising:
    an acquisition module, configured to identify page information of at least one video page to obtain a plurality of texts corresponding to the at least one video page;
    a first clustering module, configured to cluster the plurality of texts to obtain a first preset classification number of first text clusters;
    a second clustering module, configured to, for each first text cluster, determine a second preset classification number corresponding to the first text cluster, and cluster the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters; and
    a determination module, configured to determine a video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster.
  12. A computer-readable medium storing a computer program, wherein the program, when executed by a processing device, implements the steps of the method according to any one of claims 1 to 10.
  13. An electronic device, comprising:
    a storage device storing a computer program; and
    a processing device, configured to execute the computer program in the storage device to implement the steps of the method according to any one of claims 1 to 10.
PCT/CN2022/092514 2021-07-21 2022-05-12 Method and apparatus for acquiring video hotspot, readable medium, and electronic device WO2023000782A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110825848.8 2021-07-21
CN202110825848.8A CN113420723A (en) 2021-07-21 2021-07-21 Method and device for acquiring video hotspot, readable medium and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023000782A1 (en) 2023-01-26

Family

ID=77718003

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/092514 WO2023000782A1 (en) 2021-07-21 2022-05-12 Method and apparatus for acquiring video hotspot, readable medium, and electronic device

Country Status (2)

Country Link
CN (1) CN113420723A (en)
WO (1) WO2023000782A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420723A (en) * 2021-07-21 2021-09-21 北京有竹居网络技术有限公司 Method and device for acquiring video hotspot, readable medium and electronic equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013049077A1 (en) * 2011-09-26 2013-04-04 Limelight Networks, Inc. Methods and systems for generating automated tags for video files and indentifying intra-video features of interest
CN107133238A (en) * 2016-02-29 2017-09-05 阿里巴巴集团控股有限公司 A kind of text message clustering method and text message clustering system
CN109190017A (en) * 2018-08-02 2019-01-11 腾讯科技(北京)有限公司 Determination method, apparatus, server and the storage medium of hot information
CN109299271A (en) * 2018-10-30 2019-02-01 腾讯科技(深圳)有限公司 Training sample generation, text data, public sentiment event category method and relevant device
CN109739978A (en) * 2018-12-11 2019-05-10 中科恒运股份有限公司 A kind of Text Clustering Method, text cluster device and terminal device
CN109918656A (en) * 2019-02-28 2019-06-21 武汉斗鱼鱼乐网络科技有限公司 A kind of live streaming hot spot acquisition methods, device, server and storage medium
CN111460153A (en) * 2020-03-27 2020-07-28 深圳价值在线信息科技股份有限公司 Hot topic extraction method and device, terminal device and storage medium
CN112925905A (en) * 2021-01-28 2021-06-08 北京达佳互联信息技术有限公司 Method, apparatus, electronic device and storage medium for extracting video subtitles
CN113420723A (en) * 2021-07-21 2021-09-21 北京有竹居网络技术有限公司 Method and device for acquiring video hotspot, readable medium and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982110B (en) * 2012-11-08 2015-04-01 中国科学院自动化研究所 Method for extracting hot spot event information of cyberspace in physical space
CN112749299A (en) * 2019-10-31 2021-05-04 北京国双科技有限公司 Method and device for determining video type, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN113420723A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
US11017774B2 (en) Cognitive audio classifier
WO2022057302A1 (en) Clustering method and apparatus, electronic device, and storage medium
WO2023083142A1 (en) Sentence segmentation method and apparatus, storage medium, and electronic device
WO2023143016A1 (en) Feature extraction model generation method and apparatus, and image feature extraction method and apparatus
US20170337294A1 (en) Determining Answer Stability in a Question Answering System
WO2022247562A1 (en) Multi-modal data retrieval method and apparatus, and medium and electronic device
WO2023142913A1 (en) Video processing method and apparatus, readable medium and electronic device
WO2022037419A1 (en) Audio content recognition method and apparatus, and device and computer-readable medium
WO2023151589A1 (en) Video display method and apparatus, electronic device and storage medium
WO2023273596A1 (en) Method and apparatus for determining text correlation, readable medium, and electronic device
CN112364829B (en) Face recognition method, device, equipment and storage medium
CN113033682B (en) Video classification method, device, readable medium and electronic equipment
WO2023279843A1 (en) Content search method, apparatus and device, and storage medium
JP2023550211A (en) Method and apparatus for generating text
WO2020151548A1 (en) Method and device for sorting followed pages
WO2023093361A1 (en) Image character recognition model training method, and image character recognition method and apparatus
WO2023211369A2 (en) Speech recognition model generation method and apparatus, speech recognition method and apparatus, medium, and device
CN113919320A (en) Method, system and equipment for detecting early rumors of heteromorphic neural network
WO2023000782A1 (en) Method and apparatus for acquiring video hotspot, readable medium, and electronic device
CN113343069B (en) User information processing method, device, medium and electronic equipment
US20230315990A1 (en) Text detection method and apparatus, electronic device, and storage medium
CN114298007A (en) Text similarity determination method, device, equipment and medium
WO2023174075A1 (en) Training method and apparatus for content detection model, and content detection method and apparatus
WO2023143107A1 (en) Character recognition method and apparatus, device, and medium
WO2023130925A1 (en) Font recognition method and apparatus, readable medium, and electronic device

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22844949

Country of ref document: EP

Kind code of ref document: A1