CN109918656B

CN109918656B - Live broadcast hotspot acquisition method and device, server and storage medium

Info

Publication number: CN109918656B
Application number: CN201910148553.4A
Authority: CN
Inventors: 肖源
Original assignee: Wuhan Douyu Network Technology Co Ltd
Current assignee: Wuhan Douyu Network Technology Co Ltd
Priority date: 2019-02-28
Filing date: 2019-02-28
Publication date: 2022-12-23
Anticipated expiration: 2039-02-28
Also published as: CN109918656A

Abstract

The embodiment of the invention discloses a live broadcast hotspot obtaining method, a live broadcast hotspot obtaining device, a server and a storage medium, and belongs to the field of network live broadcast. The method comprises the following steps: after collecting live topic data, performing word segmentation on the sentences in the live topic data and counting the occurrence frequency of all words; selecting high-frequency words, constructing for each high-frequency word the set of sentences in which it appears, and performing an AND operation between the sentence sets of all the high-frequency words to obtain the collinear words among the sentence sets; selecting high-frequency collinear words, and taking the intersection of the high-frequency collinear words and the high-frequency words to obtain ternary word combinations; and acquiring the hot topics corresponding to the ternary word combinations. With this technical scheme, hot topics can be described accurately by the ternary word combinations, the hotspot extraction process is simplified, and efficiency is improved.

Description

Live broadcast hotspot acquisition method and device, server and storage medium

Technical Field

The invention relates to the field of live webcasting, in particular to a live hotspot acquisition method, a live hotspot acquisition device, a server and a storage medium.

Background

As networks become ever more developed, hot topics readily attract wide attention from users. For websites and apps, acquiring hot topics accurately and in real time is important for increasing users' online time, raising user traffic and the like. This is especially true of a live broadcast platform, which gathers many anchors and users: topics are widely discussed through the barrage or the community, and discovering hotspots in time can improve user experience.

At present, the common hotspot discovery techniques on the market segment the text data into words, extract features, calculate similarity by means of cluster analysis, an LDA model and the like, and then obtain hot topics from the frequency of hotspot keywords. Because these methods obtain hotspots through word or sentence similarity calculation, their accuracy is not high and they easily produce ambiguous expressions.

Disclosure of Invention

In view of this, embodiments of the present invention provide a live broadcast hotspot acquiring method, apparatus, server and storage medium, so as to improve hotspot acquiring efficiency and ensure acquiring accuracy.

In combination with the first aspect of the embodiments of the present invention, a live broadcast hotspot obtaining method is provided, including:

after collecting live topic data within a preset time, performing word segmentation processing on sentences in the live topic data, and counting the occurrence frequency of all words;

selecting a preset number of high-frequency words, constructing a set of sentences in which the high-frequency words are located, and performing AND operation on the sets of the sentences in which all the high-frequency words are located to obtain collinear words among the sets of the sentences;

selecting high-frequency collinear words among sentence sets, and solving the intersection of the high-frequency collinear words and the high-frequency words to obtain a ternary word combination;

and acquiring hot topics corresponding to the ternary vocabulary combination.

In combination with the second aspect of the embodiment of the present invention, there is provided a live broadcast hotspot obtaining device, including:

the word segmentation module is used for carrying out word segmentation on sentences in the live topic data after the live topic data in a preset time length is collected, and counting the occurrence frequency of all words;

the first operation module is used for selecting a preset number of high-frequency words, constructing a set of sentences in which the high-frequency words are located, and performing AND operation on the sets of the sentences in which all the high-frequency words are located to obtain collinear words among the sets of the sentences;

the second operation module is used for selecting high-frequency collinear words among sentence sets, solving the intersection of the high-frequency collinear words and the high-frequency words and obtaining a ternary word combination;

and the acquisition module is used for acquiring the hot topics corresponding to the ternary vocabulary combination.

In combination with the third aspect of the embodiments of the present invention, there is provided a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method according to the first aspect of the embodiments of the present invention.

A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method provided by the first aspect of the embodiments of the present invention.

In a fifth aspect of the embodiments of the present invention, a computer program product is provided, where the computer program product includes a computer program, and the computer program is used for implementing the steps of the method provided in the first aspect of the embodiments of the present invention when being executed by one or more processors.

In the embodiment of the invention, live topic data from a recent period of time are extracted, the topic sentences are segmented into words, and the word frequencies are counted. An AND operation between the sentence sets of the high-frequency words yields binary collinear words; the collinear words are then intersected with the high-frequency words to obtain ternary word combinations, and the high-frequency ternary word combinations express the hot topics accurately.

Drawings

Fig. 1 is a flowchart of a live broadcast hotspot obtaining method according to an embodiment of the present invention;

Fig. 2 is another schematic flowchart of a live broadcast hotspot obtaining method according to a second embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a live broadcast hotspot obtaining device according to a third embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a server according to a fourth embodiment of the present invention.

Detailed Description

The embodiment of the invention provides a live broadcast hotspot acquiring method and device, a server and a storage medium, which are used for accurately and efficiently acquiring a live broadcast hotspot and are convenient to push the hotspot.

In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

Example one

Referring to fig. 1, a flow diagram of a live broadcast hotspot obtaining method provided by the embodiment of the present invention includes:

S101, after collecting live topic data within a preset time, performing word segmentation processing on sentences in the live topic data, and counting the occurrence frequency of all words;

The preset time is a time range, which may span several hours, days or months; generally, the most recent day or week may be selected. The live topic data are topic data related to live content or to anchors, such as the common barrage, and may also include content from live-related communities and forums. Preferably, the live topic data are complete topic sentences extracted from a barrage or a community, which generally have a structure that satisfies Chinese semantic expression, such as subject, predicate and object.

The live topic data include at least live room barrage data collected at intervals of a certain duration; collecting the barrage at intervals guards against short-term screen-flooding behavior by users.
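As a minimal sketch of interval collection, the following assumes each barrage message carries a timestamp in seconds and keeps only the messages that fall in the first `window` seconds of every `period`-second cycle. The function name `sample_barrage` and both parameters are hypothetical; the source does not specify the sampling scheme.

```python
from typing import Iterable, List, Tuple


def sample_barrage(messages: Iterable[Tuple[float, str]],
                   period: float = 60.0,
                   window: float = 10.0) -> List[str]:
    """Keep messages whose timestamp lies in the first `window` seconds
    of each `period`-second cycle (an assumed sampling scheme)."""
    return [text for ts, text in messages if ts % period < window]


# A burst of repeated messages at 15-16 s falls outside the sampling window.
messages = [(0.5, "a"), (3.0, "b"), (15.0, "spam"), (16.0, "spam"), (61.0, "c")]
sampled = sample_barrage(messages)
```

A burst that happens between two sampling windows is never collected, which is one way to read the claim that interval collection limits the effect of short-term flooding.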

Word segmentation is the process of dividing a sentence into separate words, namely recombining a continuous character sequence into a word sequence according to a certain specification. Word segmentation methods include character string matching, understanding-based word segmentation and the like, and the words that make up a sentence can be obtained through word segmentation.

Optionally, recording each sentence collected in the live topic data, and removing stop words of each sentence to obtain a vocabulary in each sentence; and counting the occurrence times of all the vocabularies, and arranging according to the occurrence times of each vocabulary from high to low.

Preferably, the collected live topic data are preprocessed to remove sensitive sentences and short sentences. Sensitive sentences are speech that does not meet the requirements of the relevant regulations; short sentences are sentences of one or two characters, or sentences that repeat a single character. This preprocessing effectively improves the efficiency of the subsequent sentence processing.
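The preprocessing and counting of step S101 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the whitespace split stands in for a real word segmenter (Chinese text would need a dedicated one), and the `STOP_WORDS` and `SENSITIVE` lists are hypothetical placeholders.

```python
from collections import Counter
from typing import Iterable, List, Set

STOP_WORDS: Set[str] = {"the", "a", "is", "of"}  # hypothetical stop-word list
SENSITIVE: Set[str] = {"banned"}                 # hypothetical sensitive words


def is_short(sentence: str) -> bool:
    """True for one- or two-character sentences and single-character repeats."""
    s = sentence.strip()
    return len(s) <= 2 or len(set(s)) == 1


def segment(sentence: str) -> List[str]:
    """Stand-in segmenter: lowercase, split on whitespace, drop stop words."""
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]


def preprocess(sentences: Iterable[str]) -> List[List[str]]:
    """Drop short and sensitive sentences, return the rest as word lists."""
    kept = []
    for s in sentences:
        if is_short(s):
            continue
        words = segment(s)
        if any(w in SENSITIVE for w in words):
            continue
        kept.append(words)
    return kept


def count_words(tokenized: Iterable[List[str]]) -> Counter:
    """Count every occurrence of every word across all sentences."""
    counts: Counter = Counter()
    for words in tokenized:
        counts.update(words)
    return counts


raw = ["666666", "ok", "the trade war is escalating",
       "trade war talks", "banned words here"]
tokenized = preprocess(raw)
counts = count_words(tokenized)
ranked = counts.most_common()  # words ordered from most to least frequent
```

Here `ranked` lists the words from high to low occurrence count, which is the ordering from which the high-frequency words are later taken.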

S102, selecting a preset number of high-frequency words, constructing a set of sentences in which the high-frequency words are located, and performing AND operation on the sets of the sentences in which all the high-frequency words are located to obtain collinear words among the sets of the sentences;

The high-frequency words are the words with the highest occurrence frequency. Specifically, the occurrence frequency of each word is counted, the words are sorted, and a number of top-ranked words are selected; for example, the first 100 words are taken as the high-frequency words.

The sentences containing each high-frequency word are then looked up and placed into a set corresponding to that word; for example, the sentences containing the word "trade war" are placed into the set corresponding to "trade war". All of these sentences come from the collected live topic data.
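A minimal sketch of building the per-word sentence sets, assuming the sentences have already been segmented into word lists and that each sentence is identified by its index. The function name `build_sentence_sets` and the sample tokens are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def build_sentence_sets(tokenized: List[List[str]],
                        top_n: int = 100) -> Dict[str, Set[int]]:
    """Map each of the top_n most frequent words to the indices of the
    sentences that contain it."""
    counts = Counter(w for words in tokenized for w in words)
    high_freq = {w for w, _ in counts.most_common(top_n)}
    sets: Dict[str, Set[int]] = defaultdict(set)
    for idx, words in enumerate(tokenized):
        for w in words:
            if w in high_freq:
                sets[w].add(idx)
    return dict(sets)


tokenized = [["china_us", "trade_war", "tariff"],
             ["trade_war", "tariff", "talks"],
             ["china_us", "summit"]]
sentence_sets = build_sentence_sets(tokenized, top_n=3)
```

Storing sentence indices rather than sentence text keeps the sets small and makes the later set operations cheap.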

The AND operation compares the sentence sets pairwise and finds pairs of words that appear in the same sentence; such words are collinear words. For example, given the sentence set of the high-frequency word "trade war" and the sentence set of the high-frequency word "Zhongmei" (China-US), the AND operation finds the sentences that appear in both sets, i.e. the sentences containing both "Zhongmei" and "trade war", and counts them. If the number of such sentences is not less than 1, "Zhongmei" and "trade war" can be regarded as collinear words.

Optionally, collinear words among the sentence sets are obtained according to formula (1):

where n denotes the number of high-frequency words, set(i) denotes the set of sentences for word i, set(j) denotes the set of sentences for word j, & denotes the AND operation (collinearity), and score(w_i, w_j) expresses the degree of association of the binary phrase.
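Formula (1) itself is not reproduced in this text. The sketch below assumes, from the surrounding definitions, that score(w_i, w_j) is simply the number of sentences shared by set(i) and set(j); that reading, and the function name `collinear_scores`, are assumptions rather than the formula as filed.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def collinear_scores(sentence_sets: Dict[str, Set[int]]) -> Dict[Tuple[str, str], int]:
    """AND every pair of sentence sets; keep pairs sharing at least one sentence.

    The score is assumed to be the size of set(i) & set(j).
    """
    scores: Dict[Tuple[str, str], int] = {}
    for wi, wj in combinations(sorted(sentence_sets), 2):
        shared = sentence_sets[wi] & sentence_sets[wj]
        if shared:  # count not less than 1: wi and wj are collinear
            scores[(wi, wj)] = len(shared)
    return scores


sets = {"china_us": {0, 2, 3}, "trade_war": {0, 1, 3}, "summit": {2}}
pairs = collinear_scores(sets)
```

With n high-frequency words this examines n(n-1)/2 pairs, which is 4950 pairs for the 100 words used as an example in the description.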

S103, selecting high-frequency collinear words among sentence sets, solving the intersection of the high-frequency collinear words and the high-frequency words to obtain a ternary word combination;

The high-frequency collinear words are a number of frequently occurring collinear words, selected according to the counts of collinear words across all sentence sets. Specifically, the number of sentences in which two high-frequency words appear together is counted, or the association degree is calculated; the collinear words are then sorted, and a certain number of top-ranked collinear words, or collinear words with a high association degree, are selected. These collinear words are hot topic keywords.

The intersection here is the intersection between the high-frequency collinear words and the high-frequency words: the sentences in which a high-frequency collinear word pair appears are intersected with the sentences in which a high-frequency word appears. This yields a ternary word combination, and hot topics can be described accurately through ternary word combinations.

Optionally, the intersection of the high-frequency collinear words and the high-frequency vocabulary is solved according to formula (2):

where n denotes the number of high-frequency words, m denotes the number of high-frequency collinear words, set(i, j) denotes the sentence set of a binary phrase, set(k) denotes the sentence set of high-frequency word k, and score(w_i, w_j, w_k) represents the degree of association of the ternary phrase.
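Formula (2) is likewise not reproduced in this text. As a sketch under the assumption that score(w_i, w_j, w_k) is the number of sentences shared by set(i, j) and set(k), the intersection step might look like this; `ternary_scores` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def ternary_scores(sentence_sets: Dict[str, Set[int]],
                   top_pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, ...], int]:
    """Intersect each collinear pair's sentence set with every other
    high-frequency word's sentence set.

    Keys are sorted triples, so the same three words reached from
    different pairs collapse into one entry.
    """
    triples: Dict[Tuple[str, ...], int] = {}
    for wi, wj in top_pairs:
        pair_sentences = sentence_sets[wi] & sentence_sets[wj]  # set(i, j)
        for wk, sk in sentence_sets.items():
            if wk in (wi, wj):
                continue
            shared = pair_sentences & sk
            if shared:
                triples[tuple(sorted((wi, wj, wk)))] = len(shared)
    return triples


sets = {"china_us": {0, 2, 3}, "trade_war": {0, 1, 3},
        "tariff": {0, 3, 4}, "summit": {2}}
top_pairs = [("china_us", "trade_war"), ("china_us", "tariff")]
triples = ternary_scores(sets, top_pairs)
```

Both pairs lead to the same three words, and because the key is order-insensitive only one ternary combination is recorded.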

S104, acquiring hot topics corresponding to the ternary vocabulary combination.

The ternary vocabulary combination can accurately express the content of a hot topic, so the corresponding hot topics can be obtained by selecting high-frequency ternary vocabulary combinations. Specifically, the sentences corresponding to each ternary vocabulary combination are obtained; the combinations are sorted from high to low by the number of sentences in which they appear; repeated words or word combinations are removed; and a certain number of top-ranked combinations are selected, integrated, and displayed as hot topics. A ternary vocabulary combination can be displayed directly after its words are reordered, or displayed after being placed into a corresponding sentence and integrated. Because the ternary vocabulary combination accurately describes the main content of a hotspot, it can be pushed to users after selection and sorting.
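The ranking and de-duplication can be sketched as below. The sketch assumes each ternary combination comes with a non-empty set of sentence indices, treats combinations that differ only in word order as duplicates, and shows the earliest matching sentence as the display text; the function name and the choice of display sentence are assumptions.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def top_hot_topics(triple_sentences: Dict[Tuple[str, str, str], Set[int]],
                   sentences: List[str],
                   k: int = 3) -> List[Tuple[Tuple[str, ...], int, str]]:
    """Merge order-variant triples, rank by sentence count, keep the top k."""
    merged: Dict[FrozenSet[str], Set[int]] = {}
    for triple, ids in triple_sentences.items():
        merged.setdefault(frozenset(triple), set()).update(ids)
    # Sort by descending sentence count, then alphabetically for stable output.
    ranked = sorted(merged.items(), key=lambda kv: (-len(kv[1]), sorted(kv[0])))
    topics = []
    for words, ids in ranked[:k]:
        example = sentences[min(ids)]  # earliest sentence containing the triple
        topics.append((tuple(sorted(words)), len(ids), example))
    return topics


sentences = ["china_us trade_war tariff rises", "trade_war talks",
             "china_us summit", "tariff china_us trade_war again",
             "summit talks host"]
triple_sentences = {
    ("china_us", "trade_war", "tariff"): {0, 3},
    ("tariff", "china_us", "trade_war"): {0, 3},  # same words, different order
    ("summit", "talks", "host"): {4},
}
topics = top_hot_topics(triple_sentences, sentences)
```

Each entry of `topics` carries the word triple, its sentence count and one sentence in which it appears, which corresponds to displaying the combination inside a corresponding sentence.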

Preferably, the words in the ternary vocabulary combination are used as keywords, associated live broadcast rooms are found through keyword matching, and the associated live broadcast rooms are selected for recommendation. Specifically, the live content of each current live broadcast room is collected (it can be obtained from the barrage, the room title and the like), and each live broadcast room is associated with the keywords in the ternary vocabulary combination. When a user searches for such a keyword through a search engine, the live broadcast rooms associated with it can be pushed preferentially. Similarly, keywords extracted from the user's viewing history or speech record can be matched against the keywords in the ternary vocabulary combination, and the associated live broadcast rooms recommended. Using the keywords in the ternary vocabulary combination facilitates content recommendation and guides users toward hot topic discussion, which can increase users' online time and the popularity of live broadcast rooms while safeguarding user experience.

In the technical scheme of this embodiment, live topic data are collected, preprocessed and segmented to obtain the high-frequency words, and the ternary vocabulary combinations are then obtained from count statistics by intersecting words with word combinations.
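One way to picture the keyword matching is the sketch below. It assumes each live broadcast room already has a set of keywords extracted from its barrage or title, and that the user's keywords come from a search query, viewing history or speech record; `recommend_rooms` and the room identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def recommend_rooms(room_keywords: Dict[str, Set[str]],
                    hot_triples: Iterable[Iterable[str]],
                    user_keywords: Set[str]) -> List[str]:
    """Rank rooms by how many hot-topic keywords they share with the user."""
    hot_words = {w for triple in hot_triples for w in triple}
    matched = user_keywords & hot_words  # user keywords that are hot keywords
    scored = []
    for room, keywords in room_keywords.items():
        overlap = len(keywords & matched)
        if overlap:
            scored.append((overlap, room))
    scored.sort(key=lambda t: (-t[0], t[1]))  # most overlap first, then by name
    return [room for _, room in scored]


rooms = {"room_1": {"trade_war", "tariff"},
         "room_2": {"cooking"},
         "room_3": {"tariff"}}
hot = [("china_us", "trade_war", "tariff")]
user = {"tariff", "trade_war", "music"}
recommended = recommend_rooms(rooms, hot, user)
```

Rooms with no hot keyword in common with the user are left out, so the result contains only rooms tied to the current hot topics.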

Example two

Fig. 2 is another schematic flow diagram of the live broadcast hotspot acquisition method provided in the second embodiment of the present invention, and details of the process of obtaining the ternary vocabulary combination are described on the basis of the first embodiment, including the following steps:

S201, acquiring high-frequency vocabularies;

After the live topic data are collected, the user barrage or community speech content is obtained. The topic data consist of individual sentences, recorded as sn = (s1, s2, ..., sn). The sentences are segmented to obtain a number of words, and the occurrence count of each word is obtained.

Words with a high occurrence frequency are then selected. Illustratively, after the words are sorted by occurrence count from high to low, the 100 top-ranked words are selected as the high-frequency words and recorded as wn = (w1, w2, w3, ..., w100). A set is generated for each word in wn: whenever wk occurs in a sentence of sn, that sentence is added to set(k). This yields 100 sentence sets, one for each high-frequency word.

S202, obtaining collinear words;

Each high-frequency word in wn has a corresponding sentence set, and an AND operation is performed between any two sentence sets, i.e. the sentences in which both high-frequency words appear are counted. For example, between sentence set (1) of high-frequency word w1 and sentence set (2) of high-frequency word w2, the operation counts the sentences present in both sets, namely the sentences in which w1 and w2 appear together. If this number is not less than 1, w1 and w2 are collinear words, and the number of sentences in which the collinear words appear expresses the degree of association between the two high-frequency words.

Specifically, collinear words among sentence sets are obtained according to the following formula (1):

where set(i) denotes the set of sentences of word i, set(j) denotes the set of sentences of word j, & denotes the AND operation (collinearity), and score(w_i, w_j) expresses the degree of association of the binary phrase.

score(w_i, w_j) describes the degree of association between high-frequency words; which words are associated with one another can be determined by counting the sentences that their sentence sets have in common.

S203, solving the intersection of the high-frequency vocabulary and the high-frequency collinear words to obtain a ternary vocabulary combination.

The high-frequency collinear words can be obtained by calculating the association degree through formula (1) or by directly counting the collinear words. Specifically, the collinear words are sorted by association degree or by count, and a certain number are selected from high to low as the high-frequency collinear words; for example, 500 word pairs are selected as the high-frequency collinear words.
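Selecting the top-ranked pairs can be sketched with the standard library's `heapq.nlargest`, assuming the pair scores are held in a dictionary. The name `top_collinear_pairs` and the sample scores are illustrative.

```python
import heapq
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def top_collinear_pairs(scores: Dict[Pair, int],
                        limit: int = 500) -> List[Tuple[Pair, int]]:
    """Return up to `limit` pairs with the highest scores, best first."""
    return heapq.nlargest(limit, scores.items(), key=lambda kv: kv[1])


scores = {("a", "b"): 5, ("a", "c"): 1, ("b", "c"): 3}
best = top_collinear_pairs(scores, limit=2)
```

When the limit is much smaller than the number of candidate pairs, this avoids sorting the whole list.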

Further, the intersection of the high-frequency collinear words and the high-frequency vocabulary is solved according to formula (2):

where set(i, j) represents the sentence set of a binary phrase, set(k) represents the sentence set of high-frequency word k, and score(w_i, w_j, w_k) represents the degree of association of the ternary phrase.

set(i, j) is the sentence set corresponding to a high-frequency collinear word pair. The sentence sets of the high-frequency words are matched against the sentence sets of the high-frequency collinear words, and their intersection yields ternary word combinations that contain both a high-frequency collinear word pair and a high-frequency word.

The ternary vocabulary combination can describe a hot event specifically and accurately. After order-insensitive de-duplication and sorting, the sentences corresponding to the ternary vocabulary combinations are collated to obtain a description of the corresponding hot event.

Preferably, the words in the ternary word combination are used as keywords for user search and big data matching, so that live broadcast rooms can be recommended according to user interests. For example, keywords are extracted from the user's viewing history or speech record and matched against the ternary vocabulary combinations, and the associated live broadcast rooms are pushed.

In the embodiment of the invention, the ternary vocabulary combinations are calculated and extracted directly from the statistical characteristics of the words. Hot topics can be described accurately, the calculation is simple, and the hotspot extraction process is greatly simplified compared with traditional word-similarity cluster analysis.

Example three

Fig. 3 is a schematic structural diagram of a live broadcast hotspot obtaining device provided in an embodiment of the present invention, where the device includes:

the word segmentation module 310 is configured to, after collecting live topic data within a predetermined time, perform word segmentation processing on sentences in the live topic data, and count occurrence frequencies of all vocabularies;

optionally, the live topic data include at least live broadcast room barrage data collected at intervals of a certain duration.

Optionally, the word segmentation module 310 specifically includes:

the word segmentation unit is used for recording each sentence collected in the live topic data, removing stop words of each sentence and obtaining words in each sentence;

and the statistical unit is used for counting the occurrence times of all the vocabularies and arranging the vocabularies according to the occurrence times of each vocabulary from high to low.

The first operation module 320 is configured to select a preset number of high-frequency words, construct a set of sentences in which the high-frequency words are located, and perform an and operation between the sets of sentences in which all the high-frequency words are located to obtain collinear words between the sets of sentences;

the high-frequency words are the words with the highest occurrence frequency; specifically, the occurrence frequency of each word is counted, the words are sorted, and a number of top-ranked words are selected, for example the first 100 words. The sentences containing each high-frequency word are then looked up and placed into a set corresponding to that word; for example, the sentences containing the word "trade war" are placed into the set corresponding to "trade war". All of these sentences come from the collected live topic data.

Optionally, the obtaining of collinear words among sentence sets by performing and operation among the sets of sentences in which all high-frequency words are located specifically includes:

finding out collinear words among sentence sets according to formula (1):

where n denotes the number of high-frequency words, set(i) denotes the set of sentences of word i, set(j) denotes the set of sentences of word j, & denotes the AND operation (collinearity), and score(w_i, w_j) expresses the degree of association of the binary phrase.

The second operation module 330 is configured to select high-frequency collinear words among sentence sets, and obtain an intersection of the high-frequency collinear words and the high-frequency words to obtain a ternary word combination;

the high-frequency collinear words are a number of frequently occurring collinear words, selected according to the counts of collinear words across all sentence sets; specifically, the number of sentences in which two high-frequency words appear together is counted, or the association degree is calculated, the collinear words are sorted, and a certain number of top-ranked collinear words, or collinear words with a high association degree, are selected. These collinear words are hot topic keywords.

Optionally, the obtaining of the intersection of the high-frequency collinear words and the high-frequency words and obtaining the ternary word combination specifically includes:

solving the intersection of the high-frequency collinear words and the high-frequency vocabulary according to formula (2):

where n represents the number of high-frequency words, m represents the number of high-frequency collinear words, set(i, j) represents the sentence set of a binary phrase, set(k) represents the sentence set of high-frequency word k, and score(w_i, w_j, w_k) represents the degree of association of the ternary phrase.

The obtaining module 340 is configured to obtain a hot topic corresponding to the ternary vocabulary combination.

Optionally, the obtaining module 340 includes:

the obtaining unit is used for obtaining sentences corresponding to the ternary vocabulary combination;

and the integration unit is used for sequencing the ternary vocabulary combinations from high to low according to the number of sentences in which the ternary vocabulary combinations are located, removing repeated vocabularies or vocabulary combinations, selecting a certain number of high-ranking ternary vocabulary combinations, and displaying the ternary vocabulary combinations as hot topics after integration.

Optionally, the obtaining module 340 further includes:

and the recommendation module is used for taking the vocabulary in the ternary vocabulary combination as a keyword, searching the associated live broadcast room through keyword matching, and selecting the associated live broadcast room for recommendation.

In the device, the ternary vocabulary combination in the live topic data is extracted through the first operation module and the second operation module, so that the hot topic is accurately extracted and expressed, and the device is simple and easy to implement and high in efficiency.
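To show how the four modules hand data to one another, here is a compact sketch that wires modules 310 to 340 together as methods of one class. It is an illustration only: the class name `HotspotDevice`, the whitespace segmenter, the default limits and the use of shared-sentence counts as scores are all assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


class HotspotDevice:
    """Hypothetical wiring of the four modules; not the patented implementation."""

    def __init__(self, top_words: int = 100, top_pairs: int = 500, top_topics: int = 10):
        self.top_words, self.top_pairs, self.top_topics = top_words, top_pairs, top_topics

    def word_segmentation(self, sentences: List[str]) -> Tuple[List[List[str]], Counter]:
        tokenized = [s.split() for s in sentences]  # whitespace stand-in segmenter
        return tokenized, Counter(w for words in tokenized for w in words)

    def first_operation(self, tokenized: List[List[str]], counts: Counter):
        high = [w for w, _ in counts.most_common(self.top_words)]
        sets = {w: {i for i, ws in enumerate(tokenized) if w in ws} for w in high}
        pairs: Dict[Tuple[str, str], Set[int]] = {}
        for wi, wj in combinations(sorted(high), 2):
            shared = sets[wi] & sets[wj]  # AND operation between two sentence sets
            if shared:
                pairs[(wi, wj)] = shared
        return sets, pairs

    def second_operation(self, sets, pairs) -> Dict[Tuple[str, ...], Set[int]]:
        best = sorted(pairs.items(), key=lambda kv: (-len(kv[1]), kv[0]))[: self.top_pairs]
        triples: Dict[Tuple[str, ...], Set[int]] = {}
        for (wi, wj), shared in best:
            for wk, sk in sets.items():
                if wk not in (wi, wj) and shared & sk:
                    triples[tuple(sorted((wi, wj, wk)))] = shared & sk
        return triples

    def acquire(self, triples) -> List[Tuple[str, ...]]:
        ranked = sorted(triples.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        return [t for t, _ in ranked[: self.top_topics]]

    def run(self, sentences: List[str]) -> List[Tuple[str, ...]]:
        tokenized, counts = self.word_segmentation(sentences)
        sets, pairs = self.first_operation(tokenized, counts)
        return self.acquire(self.second_operation(sets, pairs))


sentences = ["china_us trade_war tariff", "china_us trade_war tariff",
             "trade_war talks", "summit host"]
hot_topics = HotspotDevice(top_words=3).run(sentences)
```

Each method consumes only the output of the previous one, mirroring the division into a word segmentation module, two operation modules and an acquisition module.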

Example four

Fig. 4 is a schematic structural diagram of a live hotspot acquisition server according to an embodiment of the present invention. The server, which is a device providing computing services, generally refers to a computer with high computing power, and is provided to a plurality of users via a network. As shown in fig. 4, the server 4 of this embodiment includes: a memory 410, a processor 420, and a system bus 430, the memory 410 including an executable program 4101 stored thereon, it being understood by those skilled in the art that the server architecture shown in FIG. 4 does not constitute a limitation of a server and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.

The following describes each component of the server in detail with reference to fig. 4:

the memory 410 may be used to store software programs and modules, and the processor 420 executes various functional applications of the server and data processing by operating the software programs and modules stored in the memory 410. The memory 410 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the server, and the like. Further, the memory 410 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.

The memory 410 stores an executable program 4101 of the live broadcast hotspot acquisition method. The executable program 4101 may be divided into one or more modules/units, which are stored in the memory 410 and executed by the processor 420 to obtain hot topics. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution process of the executable program 4101 in the server 4. For example, the executable program 4101 may be divided into a word segmentation module, a first operation module, a second operation module and an acquisition module.

The processor 420 is a control center of the server, connects various parts of the entire server apparatus using various interfaces and lines, performs various functions of the server and processes data by operating or executing software programs and/or modules stored in the memory 410 and calling data stored in the memory 410, thereby performing overall monitoring of the server. Alternatively, processor 420 may include one or more processing units; preferably, the processor 420 may integrate an application processor, which mainly handles operating systems, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 420.

The system bus 430 is used to connect functional units inside the computer, and can transmit data information, address information, and control information; it may be, for example, a PCI bus, an ISA bus, a VESA bus, etc. The instructions of the processor 420 are transmitted to the memory 410 through the bus, the memory 410 feeds data back to the processor 420, and the system bus 430 is responsible for data and instruction interaction between the processor 420 and the memory 410. Of course, other devices, such as network interfaces, display devices, etc., may also be connected to the system bus 430.

In this embodiment of the present invention, the executable program executed by the processor 420 included in the server is specifically:

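a live hotspot acquisition method, comprising the following steps:

after collecting live topic data within a preset time, performing word segmentation processing on sentences in the live topic data, and counting the occurrence frequency of all words;

selecting a preset number of high-frequency words, constructing a set of sentences in which the high-frequency words are located, and performing AND operation on the sets of the sentences in which all the high-frequency words are located to obtain collinear words among the sets of the sentences;

selecting high-frequency collinear words among sentence sets, and solving the intersection of the high-frequency collinear words and the high-frequency words to obtain a ternary word combination;

and acquiring hot topics corresponding to the ternary vocabulary combination.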

Further, the live topic data at least comprises live room barrage data acquired at certain time intervals.

Further, the performing word segmentation processing on the sentences in the live topic data, and counting the occurrence frequency of all words specifically include:

recording each sentence collected in the live topic data, and removing stop words of each sentence to obtain words in each sentence;

and counting the occurrence times of all the vocabularies, and arranging according to the occurrence times of each vocabulary.

Further, the obtaining of collinear words among sentence sets by performing and operation among the sets of sentences in which all high-frequency words are located specifically includes:

collinear words among the sentence sets are solved according to formula (1):

where n denotes the number of high-frequency words, set(i) denotes the set of sentences for word i, set(j) denotes the set of sentences for word j, & denotes the AND operation (collinearity), and score(w_i, w_j) represents the degree of association of the binary phrase.

Further, the solving of the intersection of the high-frequency collinear words and the high-frequency words and the obtaining of the ternary word combination specifically comprises:

solving the intersection of the high-frequency collinear words and the high-frequency vocabulary according to a formula (2):
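As a hedged illustration of this intersection step (not formula (2) itself), the sentence subset of each binary phrase can be intersected with the sentence set of every other high-frequency word. The function name `triple_sets`, the sorted-tuple key, and the sample data are assumptions made for the sketch.

```python
def triple_sets(pair_sets, word_sets):
    """Intersect each pair's sentence subset with each remaining word's sentence set.

    pair_sets: {(w_i, w_j): set of sentence ids shared by the pair}
    word_sets: {w_k: set of sentence ids containing w_k}
    Returns {(w1, w2, w3): set of sentence ids containing all three words}.
    """
    triples = {}
    for (wi, wj), shared in pair_sets.items():
        for wk, sentences in word_sets.items():
            if wk in (wi, wj):
                continue  # a triple needs three distinct words
            common = shared & sentences
            if common:
                # Sorting the words makes the key independent of discovery order.
                triples[tuple(sorted((wi, wj, wk)))] = common
    return triples


# Made-up inputs, for illustration only.
pair_sets = {("boss", "fight"): {0, 1}, ("boss", "hard"): {0, 2}}
word_sets = {"boss": {0, 1, 2}, "fight": {0, 1}, "hard": {0, 2}}
triples = triple_sets(pair_sets, word_sets)
```

Here both pairs lead to the same three words, and the sorted key collapses them into a single ternary combination.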

Further, the obtaining of the hot topics corresponding to the ternary vocabulary combination specifically includes:

obtaining sentences corresponding to the ternary vocabulary combination;

and sorting and de-duplicating the ternary vocabulary combinations according to the number of sentences in which each ternary vocabulary combination is located, selecting a certain number of ternary vocabulary combinations, and displaying them as hot topics after integration.
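The ranking and de-duplication described above might look like the following sketch. Treating two combinations as duplicates when they contain the same three words, and joining the words with spaces as the "integration" step, are assumptions; `top_topics` and the sample data are hypothetical.

```python
def top_topics(triples, limit):
    """Rank ternary combinations by sentence count, drop repeats, keep `limit` topics.

    triples: {(w1, w2, w3): set of sentence ids containing all three words}
    """
    # Most sentences first; ties are broken alphabetically for a stable order.
    ranked = sorted(triples.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    seen = set()
    topics = []
    for words, sentences in ranked:
        if len(topics) >= limit:
            break
        key = frozenset(words)  # same words in another order count as a repeat
        if key in seen:
            continue
        seen.add(key)
        topics.append((" ".join(words), len(sentences)))
    return topics


# Made-up ternary combinations, for illustration only.
triples = {
    ("boss", "fight", "hard"): {0, 3, 4},
    ("hard", "boss", "fight"): {0, 3},
    ("new", "skin", "release"): {1, 2},
}
topics = top_topics(triples, 2)
```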

Further, the obtaining of the hot topic corresponding to the ternary vocabulary combination further includes:

and taking the vocabulary in the ternary vocabulary combination as a keyword, searching a related live broadcast room through keyword matching, and selecting the related live broadcast room for recommendation.
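A simple form of the keyword matching could be sketched as below, assuming each live broadcast room exposes a title that can be matched word by word. The `recommend_rooms` function, the room identifiers, and the titles are invented for illustration; the embodiment does not specify which room fields are matched.

```python
def recommend_rooms(keywords, rooms, limit=3):
    """Rank rooms by how many keywords appear in the room title.

    rooms: {room_id: title}; returns up to `limit` room ids, best match first.
    """
    wanted = {k.lower() for k in keywords}
    hits = []
    for room_id, title in rooms.items():
        matched = wanted & set(title.lower().split())
        if matched:
            hits.append((len(matched), room_id))
    # More matched keywords first; room id breaks ties deterministically.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [room_id for _, room_id in hits[:limit]]


# Hypothetical rooms keyed by id.
rooms = {101: "Hard boss fight speedrun", 102: "Cooking stream", 103: "Boss rush"}
recommended = recommend_rooms(("boss", "fight", "hard"), rooms)
```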

It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.

The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A live hotspot obtaining method is characterized by comprising the following steps:

and acquiring hot topics corresponding to the ternary vocabulary combination.

2. The method of claim 1, wherein the live topic data comprises at least live room barrage data collected at intervals.

3. The method according to claim 1, wherein the segmenting of the sentences in the live topic data and the statistics of the occurrence frequency of all vocabularies are specifically as follows:

and counting the occurrence times of all the vocabularies, and arranging according to the occurrence times of each vocabulary from high to low.

4. The method according to claim 1, wherein the obtaining of the collinear words among the sentence sets by performing an and operation among the sentence sets in which all the high-frequency vocabularies are located is specifically:

finding out collinear words among sentence sets according to formula (1):

where n denotes the number of high-frequency words, set(i) denotes the set of sentences containing word i, set(j) denotes the set of sentences containing word j, & denotes the AND operation (collinearity), and score(w_i, w_j) denotes the degree of association of the binary phrase.

5. The method according to claim 1, wherein the intersection of the high-frequency collinear words and the high-frequency vocabulary is obtained to obtain a ternary vocabulary combination by:

where n denotes the number of high-frequency words, m denotes the number of high-frequency collinear words, set(i, j) denotes the sentence subset of a binary phrase, set(k) denotes a high-frequency word, and score(w_i, w_j, w_k) denotes the degree of association of the ternary phrase.

6. The method according to claim 1, wherein the obtaining of the hot topic corresponding to the ternary vocabulary combination specifically comprises:

obtaining sentences corresponding to the ternary vocabulary combination;

and sequencing the ternary vocabulary combinations from high to low according to the number of sentences in which the ternary vocabulary combinations are positioned, removing repeated vocabularies or vocabulary combinations, selecting a certain number of high-ranking ternary vocabulary combinations, and displaying the ternary vocabulary combinations as hot topics after integration.

7. The method of claim 1 or 6, wherein the obtaining of the hot topic corresponding to the ternary vocabulary combination further comprises:

8. A live hotspot acquisition device is characterized by comprising:

9. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the live hotspot acquisition method of any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the live hotspot acquisition method of any one of claims 1 to 7.