WO2023000782A1

WO2023000782A1 - Method and apparatus for acquiring video hotspot, readable medium, and electronic device

Info

Publication number: WO2023000782A1
Application number: PCT/CN2022/092514
Authority: WO
Inventors: 佘琪; 沈铮阳; 王长虎
Original assignee: 北京有竹居网络技术有限公司
Priority date: 2021-07-21
Filing date: 2022-05-12
Publication date: 2023-01-26
Also published as: CN113420723A

Abstract

The present disclosure relates to a method and apparatus for acquiring a video hotspot, a readable medium, and an electronic device. The method comprises: identifying page information of at least one video page to obtain a plurality of texts; clustering the plurality of texts to obtain a first preset classification quantity of first text clusters; for each first text cluster, determining a second preset classification quantity corresponding to the first text cluster, and clustering texts in the first text cluster according to the second preset classification quantity, to obtain a second preset classification quantity of second text clusters; and determining, according to a cluster center of each second text cluster, a video hotspot corresponding to the at least one video page.

Description

Method, device, readable medium and electronic equipment for acquiring video hotspots

This disclosure claims priority to Chinese patent application No. 202110825848.8, filed on July 21, 2021 and entitled "Method, device, readable medium and electronic equipment for obtaining video hotspots", the entire contents of which are incorporated into this disclosure by reference.

Technical field

The present disclosure relates to the technical field of the Internet, and in particular, to a method, device, readable medium and electronic equipment for acquiring video hotspots.

Background

With the continuous development of Internet technology and multimedia technology, online video is gradually becoming an indispensable part of online life. Discovering the video hotspots of online videos plays an important role in enhancing user stickiness and in public opinion monitoring. At present, video hotspots are mainly obtained through manual summarization or through hotspot discovery models (e.g., a latent Dirichlet allocation model or a latent semantic analysis model). However, as data traffic continues to increase, manual summarization consumes a lot of human resources, is inefficient, and has poor real-time performance. Hotspot discovery models, in turn, incur a high calculation cost as the amount of data increases and are prone to producing ambiguous expressions, which reduces the accuracy of the acquired video hotspots.

Summary

This Summary is provided to introduce, in simplified form, concepts that are described in detail later in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to be used to limit the scope of the claimed technical solution.

In a first aspect, the present disclosure provides a method for acquiring video hotspots, the method comprising:

identifying the page information of at least one video page to obtain multiple texts corresponding to the at least one video page;

clustering the multiple texts to obtain a first preset classification number of first text clusters;

for each first text cluster, determining a second preset classification number corresponding to the first text cluster, and clustering the texts in the first text cluster according to the second preset classification number, to obtain the second preset classification number of second text clusters; and

determining, according to the cluster center of each second text cluster, a video hotspot corresponding to the at least one video page.

In a second aspect, the present disclosure provides a device for acquiring video hotspots, the device comprising:

an acquisition module, configured to identify the page information of at least one video page to obtain multiple texts corresponding to the at least one video page;

a first clustering module, configured to cluster the multiple texts to obtain a first preset classification number of first text clusters;

a second clustering module, configured to, for each first text cluster, determine a second preset classification number corresponding to the first text cluster, and cluster the texts in the first text cluster according to the second preset classification number, to obtain the second preset classification number of second text clusters; and

a determination module, configured to determine, according to the cluster center of each second text cluster, a video hotspot corresponding to the at least one video page.

In a third aspect, the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the method described in the first aspect of the present disclosure are implemented.

In a fourth aspect, the present disclosure provides an electronic device, including:

a storage device on which a computer program is stored;

A processing device configured to execute the computer program in the storage device to implement the steps of the method described in the first aspect of the present disclosure.

Through the above technical solution, the present disclosure first identifies the page information of at least one video page to obtain multiple texts corresponding to the at least one video page, and then clusters the multiple texts to obtain a first preset classification number of first text clusters. Then, for each first text cluster, it determines the second preset classification number corresponding to the first text cluster and clusters the texts in the first text cluster according to the second preset classification number, to obtain the second preset classification number of second text clusters. Finally, it determines the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster. By clustering the texts in the video pages multiple times, the present disclosure acquires video hotspots efficiently: it can ensure the real-time performance of video hotspots, requires no manual participation, has a low calculation cost, and can avoid ambiguous expressions, which improves the accuracy of the acquired video hotspots.

Other features and advantages of the present disclosure will be described in detail in the detailed description that follows.

Description of drawings

The above and other features, advantages and aspects of the various embodiments of the present disclosure will become more apparent with reference to the following detailed description in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale. In the drawings:

Fig. 1 is a flow chart showing a method for acquiring video hotspots according to an exemplary embodiment;

Fig. 2 is a flow chart of step 102 according to the embodiment shown in Fig. 1;

Fig. 3 is a flow chart of step 103 according to the embodiment shown in Fig. 1;

Fig. 4 is a block diagram of a device for acquiring video hotspots according to an exemplary embodiment;

Fig. 5 is a block diagram showing a first clustering module according to the embodiment shown in Fig. 4;

Fig. 6 is a block diagram showing a second clustering module according to the embodiment shown in Fig. 4;

Fig. 7 is a block diagram of an acquisition module according to the embodiment shown in Fig. 4;

Fig. 8 is a block diagram of an electronic device according to an exemplary embodiment.

Detailed description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only, and are not intended to limit the protection scope of the present disclosure.

It should be understood that the various steps described in the method implementations of the present disclosure may be executed in different orders, and/or executed in parallel. Additionally, method embodiments may include additional steps and/or omit performing illustrated steps. The scope of the present disclosure is not limited in this regard.

As used herein, the term "comprise" and its variations are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments." Relevant definitions of other terms will be given in the description below.

It should be noted that concepts such as "first" and "second" mentioned in this disclosure are only used to distinguish different devices, modules or units, and are not used to limit the order of, or the interdependence between, the functions performed by these devices, modules or units.

It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".

The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are used for illustrative purposes only, and are not used to limit the scope of these messages or information.

Fig. 1 is a flow chart of a method for acquiring video hotspots according to an exemplary embodiment. As shown in Figure 1, the method may include the following steps:

Step 101: Identify the page information of at least one video page to obtain multiple texts corresponding to the at least one video page.

For example, a video page contains a large amount of page information from which video hotspots can be summarized, so this page information can be used to discover video hotspots automatically. Specifically, at least one video page for which video hotspots need to be discovered, together with the page information of each video page, can first be obtained. The obtained video page can be a video display page of an online video or a live page of a webcast, and the page information includes at least one of text information and audio information. For example, when the video page is a video display page, the text information can be the information corresponding to texts such as the title of the online video, the topic introduction, the video subtitles, and the bullet comments (also called barrage), and the audio information can be the information corresponding to the sound produced when the online video is played. When the video page is a live page, the text information can be the information corresponding to texts such as the live room title, the live introduction text, the live comments, and the live bullet comments, and the audio information can be the information corresponding to the sound produced during the live broadcast.

The page information of the at least one video page may be recognized to obtain multiple texts as follows: text recognition is performed on the text information of each video page to obtain the page text corresponding to each video page. At the same time, audio recognition can be performed on the audio information of each video page to obtain the audio text corresponding to each video page. Finally, the page text and the audio text can be used as the multiple texts. For example, when the video page is a live page, within a specified period of time (for example, 2 hours), OCR (Optical Character Recognition) technology or existing video data can be used to perform text recognition on the live room title, live introduction text, live comments, live bullet comments (barrage) and other texts of each live page, to obtain the page text corresponding to each live page. At the same time, the sound produced by each live page during the live broadcast within the specified time period can be obtained and converted into text using speech recognition technology, to obtain the audio text corresponding to each live page. Finally, the page text and the audio text corresponding to each live page may be used as the multiple texts.

Step 102, clustering the multiple texts to obtain a first preset classification number of first text clusters.

For example, the first preset classification number can be set in advance, and a preset clustering algorithm is used to cluster the multiple texts according to the first preset classification number, to obtain the first preset classification number of first text clusters. The preset clustering algorithm can be, for example, the K-Means clustering algorithm. The first preset classification number can be set manually based on experience, or can be selected based on the words in the coarse-grained texts among the multiple texts (for example, the texts corresponding to titles, topic introductions, etc.). Obtaining the first preset classification number of first text clusters can be understood as a coarse-grained clustering process: the clustering granularity of the first text clusters is relatively coarse, and each first text cluster contains one category of text. For example, when the first preset classification number is 3, the three first text clusters can respectively contain sports, film and television, and game texts; that is, the clustering granularity of the first text clusters is at the level of sports, film and television, and games.

Step 103, for each first text cluster, determine a second preset classification number corresponding to the first text cluster, and cluster the texts in the first text cluster according to the second preset classification number, to obtain the second preset classification number of second text clusters.

In this step, the second preset classification number corresponding to each first text cluster can first be determined. The second preset classification number can be a preset fixed value, or can be selected based on the texts in the first text cluster. Then, according to the second preset classification number corresponding to each first text cluster, the texts in the first text cluster can be clustered using the preset clustering algorithm, to obtain the second preset classification number of second text clusters corresponding to the first text cluster. The total number of second text clusters finally obtained is therefore the sum of the second preset classification numbers corresponding to the first text clusters. Obtaining the second text clusters corresponding to each first text cluster can be understood as a fine-grained clustering process: the clustering granularity of the second text clusters is relatively fine. For example, when the category of the texts contained in a certain first text cluster is sports, the second preset classification number corresponding to that first text cluster can be set to 3, and the three second text clusters corresponding to that first text cluster can respectively contain track and field, football, and basketball texts; that is, the clustering granularity of the second text clusters is at the level of track and field, football, and basketball.
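The coarse-to-fine procedure of steps 102 and 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the names `sq_dist`, `kmeans`, `two_level_clustering` and the callback `second_number_for` are hypothetical, the text vectors are assumed to be already computed, and seeding K-Means with the first k vectors is a simplification chosen only to keep the example deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's K-Means. Assumes 1 <= k <= len(vectors).
    Seeding with the first k vectors is an assumption of this sketch."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(v, centers[c]))
                  for v in vectors]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def two_level_clustering(vectors, first_number, second_number_for):
    """Return the second text clusters as lists of indices into `vectors`.

    `second_number_for` (hypothetical) maps the members of one first text
    cluster to that cluster's second preset classification number.
    """
    coarse_labels, _ = kmeans(vectors, first_number)
    second_clusters = []
    for c in range(first_number):
        indices = [i for i, label in enumerate(coarse_labels) if label == c]
        if not indices:
            continue
        members = [vectors[i] for i in indices]
        # Clamp so that at least one and at most len(members) clusters are requested.
        k2 = max(1, min(second_number_for(members), len(members)))
        fine_labels, _ = kmeans(members, k2)
        for f in range(k2):
            group = [indices[j] for j, label in enumerate(fine_labels) if label == f]
            if group:
                second_clusters.append(group)
    return second_clusters
```

With a callback that always returns the same value, this corresponds to the "preset fixed value" option; a callback that inspects `members` corresponds to selecting the number based on the texts in the first text cluster.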

Step 104, according to the cluster center of each second text cluster, determine the video hotspot corresponding to at least one video page.

Specifically, after the second preset classification number of second text clusters corresponding to each first text cluster is obtained, for each second text cluster, the target text corresponding to the second text cluster can be determined according to the distance between each text in the second text cluster and the cluster center of the second text cluster. At the same time, the vocabulary of the second text cluster can be constructed (the vocabulary includes all the words in the second text cluster), the TF-IDF (term frequency-inverse document frequency) corresponding to each word in the second text cluster can be determined from this vocabulary, and the target word corresponding to the second text cluster can be determined according to the TF-IDF corresponding to each word in the second text cluster. For example, the first number of texts in the second text cluster that are closest to its cluster center can be used as the target texts, and the second number of words with the largest TF-IDF in the second text cluster can be used as the target words. The target texts and target words can then be used as the video hotspot. Because video hotspots are selected from the second text clusters, their form of expression is clear, which is convenient for subsequent processing and analysis.
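The selection in step 104 can be sketched as follows. This is an illustrative example only: the function names and the default values of `first_number` and `second_number` are assumptions, and the text vectors, the cluster center, and the per-word TF-IDF scores of one second text cluster are assumed to be available already.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def target_texts(texts, vectors, center, first_number):
    """The `first_number` texts whose vectors are closest to the cluster center."""
    ranked = sorted(range(len(texts)), key=lambda i: sq_dist(vectors[i], center))
    return [texts[i] for i in ranked[:first_number]]


def target_words(word_scores, second_number):
    """The `second_number` words with the largest TF-IDF in the cluster."""
    ranked = sorted(word_scores, key=word_scores.get, reverse=True)
    return ranked[:second_number]


def video_hotspot(texts, vectors, center, word_scores,
                  first_number=1, second_number=3):
    """Combine target texts and target words into one hotspot record."""
    return {
        "target_texts": target_texts(texts, vectors, center, first_number),
        "target_words": target_words(word_scores, second_number),
    }
```

Returning both the representative texts and the top-weighted words mirrors the description above, in which the target text and the target words together serve as the video hotspot.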

It should be noted that the method for acquiring video hotspots in the present disclosure can be applied not only to acquiring video and live broadcast hotspots, but also to other types of hotspots. For example, it can be applied to acquiring hotspots in images; the present disclosure does not specifically limit this.

To sum up, the present disclosure first identifies the page information of at least one video page to obtain multiple texts corresponding to the at least one video page, and then clusters the multiple texts to obtain a first preset classification number of first text clusters. Then, for each first text cluster, it determines the second preset classification number corresponding to the first text cluster and clusters the texts in the first text cluster according to the second preset classification number, to obtain the second preset classification number of second text clusters. Finally, it determines the video hotspot corresponding to the at least one video page according to the cluster center of each second text cluster. By clustering the texts in the video pages multiple times, the present disclosure acquires video hotspots efficiently: it can ensure the real-time performance of video hotspots, requires no manual participation, has a low calculation cost, and can avoid ambiguous expressions, which improves the accuracy of the acquired video hotspots.

Fig. 2 is a flow chart of step 102 according to the embodiment shown in Fig. 1. As shown in Figure 2, step 102 may include the following steps:

Step 1021, determine the TF-IDF of each word in the multiple texts.

For example, in order to improve the efficiency and accuracy of acquiring video hotspots, text preprocessing can be performed on the multiple texts after they are acquired, so as to remove sensitive information and information irrelevant to video hotspots (such as punctuation marks, stop words, etc.) from each text. After that, word segmentation can be performed on the preprocessed texts, a vocabulary corresponding to the multiple texts can be constructed from the word segmentation results (the vocabulary includes all the words in the multiple texts), and the TF-IDF of each word in the vocabulary can be calculated.
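As an illustration of step 1021, the sketch below computes TF-IDF for texts that have already been preprocessed and segmented into word lists. The disclosure does not fix a particular TF-IDF formula; the variant used here (term frequency as count divided by text length, inverse document frequency as the logarithm of the number of texts over the number of texts containing the word) and the function name `tf_idf` are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(tokenized_texts):
    """Return one {word: TF-IDF} dict per text.

    Assumed variant: tf = count / len(text), idf = log(N / df).
    """
    n_texts = len(tokenized_texts)
    # Document frequency: the number of texts that contain each word.
    doc_freq = Counter()
    for tokens in tokenized_texts:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized_texts:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_texts / doc_freq[word])
            for word, count in counts.items()
        })
    return scores
```

Under this variant a word that occurs in every text receives a weight of zero, so it contributes nothing when the weights are later used to build text vectors.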

Step 1022, for each text, according to the TF-IDF of each word in the multiple texts and the word vector corresponding to each word in the text, determine the text vector corresponding to the text.

Step 1023: According to the first preset number of categories, use a preset clustering algorithm to cluster the text vectors corresponding to the multiple texts to obtain the first preset number of first text clusters.

Further, for each text, the word vectors (word embeddings) of the words in the text can be averaged, weighted by the TF-IDF of each word, to obtain the text vector corresponding to the text, that is, the text features of the text. Then, according to the first preset classification number, the text vectors corresponding to the multiple texts may be clustered by using a preset clustering algorithm to obtain the first preset classification number of first text clusters.
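The TF-IDF-weighted average of step 1022 can be written as v = (sum over words of w_i * e_i) / (sum over words of w_i), where w_i is the TF-IDF of a word and e_i is its word vector. A minimal sketch follows; the function name, the dictionary-based inputs, and the handling of words without a word vector (they are skipped) are assumptions of this example.

```python
def text_vector(word_weights, word_vectors, dim):
    """TF-IDF-weighted average of the word vectors of one text.

    word_weights: {word: TF-IDF weight} for the words of the text.
    word_vectors: {word: list of `dim` floats} (hypothetical embedding table).
    """
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in word_weights.items():
        vec = word_vectors.get(word)
        if vec is None:
            continue  # assumption: words without an embedding are ignored
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # all-zero vector when no weighted word is available
    return [x / weight_sum for x in total]
```

The resulting vectors are what the preset clustering algorithm in step 1023 operates on.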

Optionally, step 103 can be implemented in the following manner:

According to the texts in the first text cluster, determine the second preset classification quantity corresponding to the first text cluster.

In one scenario, the second preset classification number corresponding to the first text cluster may be determined according to the central sentences and keywords of the first text cluster. For example, the central sentences and keywords of each first text cluster can be fed back to the user; the user can determine the category of the texts contained in the first text cluster from them and set the corresponding second preset classification number for the first text cluster according to that category. The central sentences can be the several texts in the first text cluster that are closest to its cluster center, and the keywords can be the several words with the largest TF-IDF in the first text cluster.

In another scenario, the second preset classification number corresponding to the first text cluster may be determined according to the number of texts in the first text cluster. Specifically, a larger second preset classification number may be set for a first text cluster that contains more texts. For example, when the first preset classification number is 4 and the four first text clusters contain 100, 10, 20, and 50 texts respectively, the second preset classification number can be set to 5 for the first text cluster with 100 texts, to 2 for the one with 10 texts, to 3 for the one with 20 texts, and to 4 for the one with 50 texts.
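One possible way to encode such a size-based rule is a threshold table. The thresholds and the function name below are hypothetical: they are chosen only so that the example in the text (100, 10, 20, and 50 texts mapping to 5, 2, 3, and 4) is reproduced, and the disclosure does not prescribe any particular mapping.

```python
import bisect


def second_preset_number(text_count,
                         thresholds=(10, 20, 50),
                         numbers=(2, 3, 4, 5)):
    """Map the number of texts in a first text cluster to a second preset
    classification number; larger clusters get a larger number.

    `thresholds` are inclusive upper bounds for the first len(thresholds)
    entries of `numbers`; counts above the last threshold get numbers[-1].
    """
    return numbers[bisect.bisect_left(thresholds, text_count)]
```

Any monotonically non-decreasing mapping would satisfy the rule described above; a table keeps the choice easy to tune.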

FIG. 3 is a flow chart of step 103 according to the embodiment shown in FIG. 1. As shown in Figure 3, there are multiple second preset classification numbers, and step 103 may include the following steps:

Step 1031, for each second preset classification number, use a preset clustering algorithm to cluster the texts in the first text cluster according to the second preset classification number, to obtain the second preset classification number of candidate text clusters.

Step 1032, according to the candidate text clusters, determine the target preset classification number from the multiple second preset classification numbers.

Step 1033, use the candidate text clusters corresponding to the target preset classification number as the second preset classification number of second text clusters.

For example, in order to make the obtained second text clusters more accurate, each first text cluster may correspond to multiple second preset classification numbers. When clustering the texts in a first text cluster, the preset clustering algorithm can be applied to the texts in the first text cluster once for each second preset classification number corresponding to that cluster, yielding one candidate text cluster set per second preset classification number. The candidate text cluster set corresponding to a second preset classification number contains that number of candidate text clusters. For example, when a first text cluster corresponds to the second preset classification numbers 3, 4, and 5, three candidate text cluster sets are obtained: the set corresponding to 3 contains 3 candidate text clusters, the set corresponding to 4 contains 4, and the set corresponding to 5 contains 5.

Then, the clustering effect of each candidate text cluster set can be evaluated using indicators such as the silhouette coefficient, the elbow method, and the CH index (Calinski-Harabasz index), and the second preset classification number corresponding to the candidate text cluster set with the best clustering effect is used as the target preset classification number. Finally, the candidate text clusters in the candidate text cluster set corresponding to the target preset classification number are used as the second preset classification number of second text clusters.
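Steps 1031 to 1033 can be sketched as follows, using the mean silhouette coefficient as the only indicator. This is an illustration under assumptions rather than the method of the disclosure: the function names are hypothetical, K-Means is seeded with the first k vectors to keep the example deterministic, and the elbow method and the Calinski-Harabasz index mentioned above are not implemented here.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's K-Means, seeded with the first k vectors (assumption)."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(v, centers[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def silhouette(vectors, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for v, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(v)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for v, label in zip(vectors, labels):
        own = clusters[label]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(v, u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(v, u) for u in other) / len(other)
                for other_label, other in clusters.items()
                if other_label != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(vectors)


def choose_target_number(vectors, candidate_numbers):
    """Cluster once per candidate number and keep the best-scoring result."""
    best = None
    for k in candidate_numbers:
        labels, _ = kmeans(vectors, k)
        score = silhouette(vectors, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best[0], best[2]
```

The returned labels describe the candidate text clusters of the winning set, which then serve as the second text clusters of that first text cluster.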

Fig. 4 is a block diagram of an apparatus for acquiring video hotspots according to an exemplary embodiment. As shown in Figure 4, the device 200 includes:

An acquisition module 201, configured to identify the page information of at least one video page, and obtain multiple texts corresponding to at least one video page;

The first clustering module 202 is configured to cluster multiple texts to obtain a first preset number of first text clusters.

The second clustering module 203 is configured to, for each first text cluster, determine the second preset classification number corresponding to the first text cluster, and cluster the texts in the first text cluster according to the second preset classification number, to obtain the second preset classification number of second text clusters.

Determining module 204 is used for determining the video hotspot corresponding to at least one video page according to the cluster center of each second text cluster.

Fig. 5 is a block diagram of a first clustering module according to the embodiment shown in Fig. 4. As shown in Figure 5, the first clustering module 202 includes:

The second determination sub-module 2021 is configured to determine the TF-IDF of each word in the multiple texts.

The second determination sub-module 2021 is further configured for each text, according to the TF-IDF of each word in the multiple texts and the word vector corresponding to each word in the text, to determine the text vector corresponding to the text.

The first clustering sub-module 2022 is configured to use a preset clustering algorithm to cluster the text vectors corresponding to the multiple texts according to the first preset classification number, to obtain the first preset classification number of first text clusters.

Optionally, the second clustering module 203 is used for:

According to the number of texts in the first text cluster, the second preset classification number corresponding to the first text cluster is determined.

Fig. 6 is a block diagram of a second clustering module according to the embodiment shown in Fig. 4. As shown in Figure 6, the second clustering module 203 includes:

The second clustering sub-module 2031 is configured to, for each second preset classification number, cluster the texts in the first text cluster using a preset clustering algorithm according to the second preset classification number, to obtain the second preset classification number of candidate text clusters.

The third determination sub-module 2032 is configured to determine the target preset classification number from the multiple second preset classification numbers according to the candidate text clusters.

The third determination sub-module 2032 is further configured to use the candidate text clusters corresponding to the target preset classification number as the second preset classification number of second text clusters.

Optionally, the determination module 204 is used for:

For each second text cluster, determine the target text corresponding to the second text cluster according to the distance between each text in the second text cluster and the cluster center of the second text cluster, and determine the target word corresponding to the second text cluster according to the TF-IDF corresponding to each word in the second text cluster.

Use the target text and target words as video hotspots.

Fig. 7 is a block diagram of an acquisition module according to the embodiment shown in Fig. 4. As shown in Figure 7, the page information includes at least one of text information and audio information, and the acquisition module 201 includes:

The recognition sub-module 2011 is configured to perform text recognition on the text information of each video page to obtain the page text corresponding to each video page.

The recognition sub-module 2011 is further configured to perform audio recognition on the audio information of each video page to obtain the audio text corresponding to each video page.

The processing sub-module 2012 is configured to use the page text and the audio text as multiple texts.

Referring now to FIG. 8, it shows a schematic structural diagram of an electronic device (such as the terminal device or server in FIG. 1) 300 suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 8 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 8, an electronic device 300 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 301, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data necessary for the operation of the electronic device 300. The processing device 301, the ROM 302, and the RAM 303 are connected to each other through a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.

Typically, the following devices can be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 307 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 308 including, for example, a magnetic tape, hard disk, etc.; and a communication device 309. The communication device 309 may allow the electronic device 300 to perform wireless or wired communication with other devices to exchange data. While FIG. 8 shows the electronic device 300 having various devices, it should be understood that it is not required to implement or have all of the devices shown. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 309, or installed from the storage device 308, or installed from the ROM 302. When the computer program is executed by the processing device 301, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.

It should be noted that the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.

In some embodiments, the client and the server can communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.

The above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.

The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: identify page information of at least one video page to obtain a plurality of texts corresponding to the at least one video page; cluster the plurality of texts to obtain a first preset classification number of first text clusters; for each of the first text clusters, determine a second preset classification number corresponding to the first text cluster, and cluster the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters; and determine, according to the cluster center of each of the second text clusters, the video hotspot corresponding to the at least one video page.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, via the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware. In some cases, the name of a module does not constitute a limitation on the module itself; for example, the acquisition module may also be described as "a module for obtaining multiple texts corresponding to the video page".

The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, Example 1 provides a method for acquiring video hotspots, including: identifying page information of at least one video page to obtain a plurality of texts corresponding to the at least one video page; clustering the plurality of texts to obtain a first preset classification number of first text clusters; for each of the first text clusters, determining a second preset classification number corresponding to the first text cluster, and clustering the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters; and determining, according to the cluster center of each of the second text clusters, the video hotspot corresponding to the at least one video page.
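
As a non-limiting illustration, the two-level clustering of Example 1 can be sketched as follows. The sketch assumes that each text has already been converted into a numeric vector and uses a plain k-means as the preset clustering algorithm; the function names `kmeans` and `two_level_clusters`, the callback `second_k_for`, and the choice of the first k points as initial centers are assumptions made for this illustration and are not specified by the disclosure.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means. Initial centers are the first k points (assumed, for determinism)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def two_level_clusters(vectors, first_k, second_k_for):
    """Return a list of (member indices, cluster center), one per second text cluster.

    second_k_for is a hypothetical callback mapping the size of a first
    text cluster to its second preset classification number.
    """
    first_labels, _ = kmeans(vectors, first_k)
    result = []
    for c in range(first_k):
        idx = [i for i, lab in enumerate(first_labels) if lab == c]
        if not idx:
            continue
        # Never ask for more sub-clusters than there are texts.
        sub_k = max(1, min(second_k_for(len(idx)), len(idx)))
        sub_labels, sub_centers = kmeans([vectors[i] for i in idx], sub_k)
        for s in range(sub_k):
            members = [i for i, lab in zip(idx, sub_labels) if lab == s]
            if members:
                result.append((members, sub_centers[s]))
    return result
```

Each returned cluster center can then be used to pick the hotspot for that second text cluster.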

According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, wherein clustering the plurality of texts to obtain the first preset classification number of first text clusters includes: determining the TF-IDF of each word in the plurality of texts; for each text, determining a text vector corresponding to the text according to the TF-IDF of each word in the plurality of texts and the word vector corresponding to each word in the text; and clustering, according to the first preset classification number, the text vectors corresponding to the plurality of texts by using a preset clustering algorithm to obtain the first preset classification number of first text clusters.
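
As a non-limiting illustration, the text vector of Example 2 can be sketched as a TF-IDF-weighted average of word vectors. The sketch assumes that the texts are already segmented into words, that term frequency is counted over the whole set of texts, and that a smoothed inverse document frequency, log((1 + N) / (1 + df)) + 1, is used; the disclosure does not fix the exact formula. The names `corpus_tfidf`, `text_vector`, and `word_vectors` are hypothetical.

```python
import math
from collections import Counter


def corpus_tfidf(tokenized_texts):
    """TF-IDF per vocabulary word, with TF counted over all texts (an assumption)."""
    n_texts = len(tokenized_texts)
    term_counts = Counter(w for text in tokenized_texts for w in text)
    doc_counts = Counter(w for text in tokenized_texts for w in set(text))
    total = sum(term_counts.values())
    return {
        w: (term_counts[w] / total)
        * (math.log((1 + n_texts) / (1 + doc_counts[w])) + 1)
        for w in term_counts
    }


def text_vector(tokens, tfidf, word_vectors):
    """TF-IDF-weighted average of the word vectors of one text.

    Words without a word vector or a TF-IDF value are skipped; None is
    returned when no word of the text can be used.
    """
    known = [w for w in tokens if w in word_vectors and w in tfidf]
    if not known:
        return None
    weight_sum = sum(tfidf[w] for w in known)
    dim = len(word_vectors[known[0]])
    return [
        sum(tfidf[w] * word_vectors[w][d] for w in known) / weight_sum
        for d in range(dim)
    ]
```

Dividing by the sum of the weights keeps text vectors of long and short texts on a comparable scale before they are passed to the clustering algorithm.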

According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 1, wherein determining the second preset classification number corresponding to the first text cluster includes: determining, according to the texts in the first text cluster, the second preset classification number corresponding to the first text cluster.

According to one or more embodiments of the present disclosure, Example 4 provides the method of Example 3, wherein determining, according to the texts in the first text cluster, the second preset classification number corresponding to the first text cluster includes: determining, according to the number of texts in the first text cluster, the second preset classification number corresponding to the first text cluster.
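
As a non-limiting illustration, one possible rule for deriving the second preset classification number from the number of texts is sketched below. The rule itself (one second text cluster per `texts_per_cluster` texts, capped at `max_clusters`) and both default values are assumptions made for this illustration; the disclosure only states that the number depends on the number of texts.

```python
import math


def second_cluster_count(num_texts, texts_per_cluster=50, max_clusters=20):
    """Map the size of a first text cluster to a second preset classification number.

    Hypothetical rule: roughly one second text cluster per
    texts_per_cluster texts, at least 1 and at most max_clusters.
    """
    if num_texts <= 0:
        return 0  # nothing to cluster
    return max(1, min(max_clusters, math.ceil(num_texts / texts_per_cluster)))
```

Under this rule, larger first text clusters are split more finely, while small ones are kept as a single second text cluster.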

According to one or more embodiments of the present disclosure, Example 5 provides the method of Example 1, wherein there are a plurality of second preset classification numbers, and clustering the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters includes: for each of the second preset classification numbers, clustering the texts in the first text cluster according to that second preset classification number by using the preset clustering algorithm to obtain that second preset classification number of candidate text clusters; determining a target preset classification number from the plurality of second preset classification numbers according to the candidate text clusters; and using the candidate text clusters corresponding to the target preset classification number as the second preset classification number of second text clusters.
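
As a non-limiting illustration, selecting the target preset classification number from several candidates can be sketched as follows. The disclosure does not name the selection criterion; this sketch assumes the mean silhouette coefficient as the quality measure and assumes that the candidate clusterings have already been computed. The names `mean_silhouette`, `pick_target_count`, and `candidate_labelings` are hypothetical.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as worst case here
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def pick_target_count(points, candidate_labelings):
    """candidate_labelings maps each candidate classification number to its label list."""
    return max(
        candidate_labelings,
        key=lambda k: mean_silhouette(points, candidate_labelings[k]),
    )
```

The candidate text clusters produced with the returned classification number would then be used as the second text clusters.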

According to one or more embodiments of the present disclosure, Example 6 provides the method of Example 1, wherein determining the video hotspot corresponding to the at least one video page according to the cluster center of each of the second text clusters includes: for each of the second text clusters, determining a target text corresponding to the second text cluster according to the distance between each text in the second text cluster and the cluster center of the second text cluster, and determining a target word corresponding to the second text cluster according to the TF-IDF corresponding to each word in the second text cluster; and using the target text and the target word as the video hotspot.
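
As a non-limiting illustration, the selection of the target text and the target word in Example 6 can be sketched as follows. The sketch assumes Euclidean distance between a text vector and the cluster center, and assumes that the target word is the word with the largest TF-IDF in the cluster; the names `target_text` and `target_word` are hypothetical.

```python
import math


def target_text(texts, vectors, center):
    """Return the text whose vector lies closest to the cluster center."""
    best = min(range(len(texts)), key=lambda i: math.dist(vectors[i], center))
    return texts[best]


def target_word(tokenized_texts, tfidf):
    """Return the word of the cluster with the largest TF-IDF, or None.

    Candidates are sorted first so that ties are broken alphabetically.
    """
    words = {w for text in tokenized_texts for w in text if w in tfidf}
    return max(sorted(words), key=lambda w: tfidf[w]) if words else None
```

Together, the returned text and word form the hotspot reported for one second text cluster.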

According to one or more embodiments of the present disclosure, Example 7 provides the method of Example 1, wherein the page information includes at least one of text information and audio information, and identifying the page information of the at least one video page to obtain the plurality of texts corresponding to the at least one video page includes: performing text recognition on the text information of each video page to obtain a page text corresponding to each video page; performing audio recognition on the audio information of each video page to obtain an audio text corresponding to each video page; and using the page texts and the audio texts as the plurality of texts.
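
As a non-limiting illustration, gathering the page texts and the audio texts of Example 7 into one list can be sketched as follows. The recognizers are passed in as callables because the disclosure does not name a particular text recognition or speech recognition engine; the dictionary keys `text_info` and `audio_info` and the function name `collect_texts` are hypothetical.

```python
def collect_texts(pages, recognize_text, recognize_audio):
    """Collect page texts and audio texts from all video pages.

    pages: list of dicts that may carry "text_info" and/or "audio_info".
    recognize_text / recognize_audio: caller-supplied recognizers that
    turn the raw information into a string.
    """
    texts = []
    for page in pages:
        if page.get("text_info") is not None:
            texts.append(recognize_text(page["text_info"]))
        if page.get("audio_info") is not None:
            texts.append(recognize_audio(page["audio_info"]))
    # Drop empty recognition results so they do not enter the clustering.
    return [t for t in texts if t]
```

A page that carries only one kind of information simply contributes one text instead of two.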

According to one or more embodiments of the present disclosure, Example 8 provides an apparatus for acquiring video hotspots, including: an acquisition module configured to identify page information of at least one video page to obtain a plurality of texts corresponding to the at least one video page; a first clustering module configured to cluster the plurality of texts to obtain a first preset classification number of first text clusters; a second clustering module configured to, for each of the first text clusters, determine a second preset classification number corresponding to the first text cluster, and cluster the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters; and a determination module configured to determine the video hotspot corresponding to the at least one video page according to the cluster center of each of the second text clusters.

According to one or more embodiments of the present disclosure, Example 9 provides a computer-readable medium on which a computer program is stored, wherein when the program is executed by a processing device, the steps of the method described in any one of Example 1 to Example 7 are implemented.

According to one or more embodiments of the present disclosure, Example 10 provides an electronic device, including: a storage device on which a computer program is stored; and a processing device configured to execute the computer program in the storage device to implement the steps of the method described in any one of Example 1 to Example 7.

The above description is only a preferred embodiment of the present disclosure and an illustration of the technical principles applied. Those skilled in the art should understand that the scope of this disclosure is not limited to technical solutions formed by the specific combination of the above-mentioned technical features, but also covers other technical solutions formed by any combination of the above-mentioned technical features or their equivalent features, for example, a technical solution formed by replacing the above-mentioned features with (but not limited to) technical features with similar functions disclosed in this disclosure.

In addition, while operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or to be performed in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims. Regarding the apparatus in the foregoing embodiments, the specific manner in which each module executes operations has been described in detail in the embodiments related to the method, and will not be described in detail here.

Claims

A method for acquiring video hotspots, wherein the method comprises:

identifying page information of at least one video page to obtain a plurality of texts corresponding to the at least one video page;

clustering the plurality of texts to obtain a first preset classification number of first text clusters;

for each of the first text clusters, determining a second preset classification number corresponding to the first text cluster, and clustering the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters; and

determining, according to the cluster center of each of the second text clusters, a video hotspot corresponding to the at least one video page.
The method according to claim 1, wherein clustering the plurality of texts to obtain the first preset classification number of first text clusters comprises:

determining the TF-IDF of each word in the plurality of texts;

for each of the texts, determining a text vector corresponding to the text according to the TF-IDF of each word in the plurality of texts and the word vector corresponding to each word in the text; and

clustering, according to the first preset classification number, the text vectors corresponding to the plurality of texts by using a preset clustering algorithm to obtain the first preset classification number of first text clusters.
The method according to claim 2, wherein determining the TF-IDF of each word in the plurality of texts comprises:

performing text preprocessing on the plurality of texts;

performing word segmentation on the plurality of preprocessed texts, and constructing a vocabulary corresponding to the plurality of texts according to the word segmentation results; and

calculating the TF-IDF of each word in the vocabulary corresponding to the plurality of texts.
The method according to claim 2, wherein, for each text, determining the text vector corresponding to the text according to the TF-IDF of each word in the plurality of texts and the word vector corresponding to each word in the text comprises:

for each text, performing a weighted average of the word vectors corresponding to the words in the text, using the TF-IDF of each word in the plurality of texts as the weight, to obtain the text vector corresponding to the text.
The method according to claim 1, wherein determining the second preset classification number corresponding to the first text cluster comprises:

determining, according to the texts in the first text cluster, the second preset classification number corresponding to the first text cluster.
The method according to claim 5, wherein determining, according to the texts in the first text cluster, the second preset classification number corresponding to the first text cluster comprises:

determining, according to a central sentence and a keyword of the first text cluster, the second preset classification number corresponding to the first text cluster, wherein the central sentence includes the text in the first text cluster that is closest to the cluster center of the first text cluster, and the keyword includes the word with the largest TF-IDF in the first text cluster.
The method according to claim 5, wherein determining, according to the texts in the first text cluster, the second preset classification number corresponding to the first text cluster comprises:

determining, according to the number of texts in the first text cluster, the second preset classification number corresponding to the first text cluster.
The method according to claim 1, wherein there are a plurality of second preset classification numbers, and clustering the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters comprises:

for each of the second preset classification numbers, clustering the texts in the first text cluster according to that second preset classification number by using a preset clustering algorithm to obtain that second preset classification number of candidate text clusters;

determining a target preset classification number from the plurality of second preset classification numbers according to the candidate text clusters; and

using the candidate text clusters corresponding to the target preset classification number as the second preset classification number of second text clusters.
The method according to claim 1, wherein determining the video hotspot corresponding to the at least one video page according to the cluster center of each of the second text clusters comprises:

for each of the second text clusters, determining a target text corresponding to the second text cluster according to the distance between each text in the second text cluster and the cluster center of the second text cluster, and determining a target word corresponding to the second text cluster according to the TF-IDF corresponding to each word in the second text cluster; and

using the target text and the target word as the video hotspot.
The method according to claim 1, wherein the page information includes at least one of text information and audio information, and identifying the page information of the at least one video page to obtain the plurality of texts corresponding to the at least one video page comprises:

performing text recognition on the text information of each video page to obtain a page text corresponding to each video page;

performing audio recognition on the audio information of each video page to obtain an audio text corresponding to each video page; and

using the page texts and the audio texts as the plurality of texts.
A device for acquiring video hotspots, wherein the device comprises:

an acquisition module configured to identify page information of at least one video page to obtain a plurality of texts corresponding to the at least one video page;

a first clustering module configured to cluster the plurality of texts to obtain a first preset classification number of first text clusters;

a second clustering module configured to, for each of the first text clusters, determine a second preset classification number corresponding to the first text cluster, and cluster the texts in the first text cluster according to the second preset classification number to obtain the second preset classification number of second text clusters; and

a determination module configured to determine the video hotspot corresponding to the at least one video page according to the cluster center of each of the second text clusters.
A computer-readable medium on which a computer program is stored, wherein, when the program is executed by a processing device, the steps of the method according to any one of claims 1-10 are implemented.
An electronic device, comprising:

a storage device on which a computer program is stored; and

a processing device configured to execute the computer program in the storage device to implement the steps of the method according to any one of claims 1-10.