CN109918656B - Live broadcast hotspot acquisition method and device, server and storage medium - Google Patents


Info

Publication number
CN109918656B
Authority
CN
China
Legal status
Active
Application number
CN201910148553.4A
Other languages
Chinese (zh)
Other versions
CN109918656A (en)
Inventor
肖源 (Xiao Yuan)
Current Assignee
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN201910148553.4A
Publication of CN109918656A
Application granted
Publication of CN109918656B
Legal status: Active



Abstract

The embodiments of the invention disclose a live broadcast hotspot acquisition method and device, a server and a storage medium, in the field of network live broadcasting. The method comprises: after live topic data are collected, performing word segmentation on the sentences in the data and counting the occurrence frequency of every word; selecting high-frequency words, constructing for each one the set of sentences in which it appears, and performing an AND operation between the sentence sets of all high-frequency words to obtain the collinear (co-occurring) words between the sets; selecting high-frequency collinear words and intersecting them with the high-frequency words to obtain ternary word combinations; and acquiring the hot topics corresponding to the ternary word combinations. With this scheme, hot topics can be described accurately by ternary word combinations, the hotspot extraction process is simplified, and efficiency is improved.

Description

Live broadcast hotspot acquisition method and device, server and storage medium
Technical Field
The invention relates to the field of live webcasting, and in particular to a live broadcast hotspot acquisition method and device, a server and a storage medium.
Background
As networks develop, hot topics readily attract wide user attention. For websites and apps, acquiring hot topics accurately and in real time is important for increasing users' online time and growing user traffic. This is especially true for a live broadcast platform that gathers many anchors and users: topics are discussed widely through barrage (bullet comments) and in the community, and discovering hotspots in time can improve the user experience.
At present, the common hotspot discovery technique on the market extracts features from segmented text data, computes similarity by means such as cluster analysis or an LDA model, and then derives hot topics from the frequency of hotspot keywords after the similarity calculation. Because hotspots are obtained from word or sentence similarity calculations, the acquisition accuracy is limited and ambiguous expressions are easily produced.
Disclosure of Invention
In view of this, embodiments of the present invention provide a live broadcast hotspot acquisition method and device, a server and a storage medium, so as to improve hotspot acquisition efficiency while ensuring acquisition accuracy.
According to a first aspect of the embodiments of the present invention, a live broadcast hotspot acquisition method is provided, including:
after collecting live topic data within a preset time period, performing word segmentation on the sentences in the live topic data, and counting the occurrence frequency of every word;
selecting a preset number of high-frequency words, constructing for each high-frequency word the set of sentences in which it appears, and performing an AND operation between the sentence sets of all high-frequency words to obtain collinear words between the sentence sets;
selecting high-frequency collinear words between the sentence sets, and intersecting the high-frequency collinear words with the high-frequency words to obtain ternary word combinations;
and acquiring hot topics corresponding to the ternary word combinations.
According to a second aspect of the embodiments of the present invention, a live broadcast hotspot acquisition device is provided, including:
a word segmentation module, configured to, after live topic data within a preset time period are collected, perform word segmentation on the sentences in the live topic data and count the occurrence frequency of every word;
a first operation module, configured to select a preset number of high-frequency words, construct for each high-frequency word the set of sentences in which it appears, and perform an AND operation between the sentence sets of all high-frequency words to obtain collinear words between the sentence sets;
a second operation module, configured to select high-frequency collinear words between the sentence sets and intersect them with the high-frequency words to obtain ternary word combinations;
and an acquisition module, configured to acquire hot topics corresponding to the ternary word combinations.
According to a third aspect of the embodiments of the present invention, a server is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the method of the first aspect.
According to a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the steps of the method of the first aspect.
According to a fifth aspect of the embodiments of the present invention, a computer program product is provided, including a computer program that, when executed by one or more processors, implements the steps of the method of the first aspect.
In the embodiments of the invention, live topic data from a recent period are extracted, the topic sentences are segmented into words, and word frequencies are counted. An AND operation between the sentence sets of the high-frequency words yields two-word collinear words; intersecting the collinear words with the high-frequency words then yields ternary word combinations, and the high-frequency ternary word combinations express the hot topics accurately.
Drawings
Fig. 1 is a flowchart of a live broadcast hotspot obtaining method according to an embodiment of the present invention;
Fig. 2 is another flowchart of the live broadcast hotspot acquisition method according to the second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a live broadcast hotspot acquisition device according to the third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a server according to the fourth embodiment of the present invention.
Detailed Description
The embodiments of the invention provide a live broadcast hotspot acquisition method and device, a server and a storage medium, which acquire live broadcast hotspots accurately and efficiently and make it convenient to push the hotspots to users.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Example one
Referring to Fig. 1, the live broadcast hotspot acquisition method provided by this embodiment of the present invention includes:
S101, after collecting live topic data within a preset time period, performing word segmentation on the sentences in the live topic data, and counting the occurrence frequency of every word;
The preset time period is a time range that may span several hours, days or months; typically the most recent day or week is selected. The live topic data are topic data related to live content or anchors, such as ordinary barrage comments, and may also include content from communities and forums related to live broadcasting. Preferably, the live topic data are complete topic sentences extracted from barrage or communities, which generally have a structure meeting the requirements of Chinese semantic expression, such as subject, predicate and object.
The live topic data include at least live room barrage data collected for a certain duration at certain intervals; collecting barrage at intervals prevents short-term screen-flooding by users from skewing the data.
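The interval-based collection can be sketched in Python as below. This is a minimal illustration that assumes messages arrive as (timestamp in seconds, text) pairs and that a collection window of `window_s` seconds opens every `interval_s` seconds; the function name, both parameters and the fixed-window scheme are hypothetical, since the text does not specify the interval or duration.

```python
from typing import List, Tuple


def sample_barrage(messages: List[Tuple[float, str]],
                   interval_s: float, window_s: float) -> List[str]:
    """Keep only messages that fall inside a periodic collection window.

    Hypothetical scheme: a window of window_s seconds opens every
    interval_s seconds, starting at t = 0. Messages sent between windows
    are skipped, which limits how much a short burst of screen-flooding
    can contribute to the collected data.
    """
    if not 0 < window_s <= interval_s:
        raise ValueError("need 0 < window_s <= interval_s")
    kept = []
    for ts, text in messages:
        # Position of this message inside its current interval.
        if ts % interval_s < window_s:
            kept.append(text)
    return kept
```

For example, with `interval_s=60.0` and `window_s=10.0`, messages at 0.5 s and 61.0 s are kept while messages at 12.0 s and 75.0 s are dropped.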
Word segmentation divides a sentence into separate words, that is, it recombines a continuous character sequence into a word sequence according to certain rules. Segmentation methods include string matching, understanding-based segmentation and the like; segmentation yields the words that make up each sentence.
Optionally, each sentence collected in the live topic data is recorded and its stop words are removed to obtain the words in each sentence; the occurrences of all words are then counted, and the words are sorted by occurrence count from high to low.
Preferably, the collected live topic data are preprocessed to remove sensitive sentences and short sentences: sensitive sentences are speech that does not comply with relevant regulations, and short sentences are those with only one or two characters or consisting of a single repeated character. Preprocessing effectively improves the efficiency of subsequent sentence processing.
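The preprocessing and counting of step S101 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the text is already word-segmented and space-separated (real Chinese input would first pass through a word segmenter), and the sensitive-word check is a simple substring stand-in. The function names and the `min_chars` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def preprocess(sentences: Iterable[str], sensitive: Set[str],
               min_chars: int = 3) -> List[str]:
    """Drop sensitive sentences and very short or single-character-repeat ones."""
    kept = []
    for s in sentences:
        compact = s.replace(" ", "")
        if len(compact) < min_chars:             # one- or two-character sentences
            continue
        if len(set(compact)) == 1:               # one character repeated, e.g. "6666"
            continue
        if any(word in s for word in sensitive):  # stand-in for a real filter
            continue
        kept.append(s)
    return kept


def count_words(sentences: List[str], stop_words: Set[str]
                ) -> Tuple[List[List[str]], List[Tuple[str, int]]]:
    """Segment sentences, remove stop words, and rank words by frequency."""
    tokenized: List[List[str]] = []
    counts: Counter = Counter()
    for s in sentences:
        # Assumption: text is already word-segmented and space-separated.
        words = [w for w in s.split() if w not in stop_words]
        tokenized.append(words)
        counts.update(words)
    # most_common() with no argument returns every word, highest count first.
    return tokenized, counts.most_common()
```

The tokenized sentences are returned alongside the ranking because the later steps need to know which words occur in which sentence.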
S102, selecting a preset number of high-frequency words, constructing the set of sentences in which each high-frequency word appears, and performing an AND operation between the sentence sets of all high-frequency words to obtain collinear words between the sentence sets;
High-frequency words are words with a high occurrence frequency: after the occurrence frequency of every word is counted and the words are sorted, the top-ranked words are selected; for example, the top 100 words are taken as high-frequency words.
For each high-frequency word, the sentences containing it are found and placed into the set corresponding to that word; for example, every sentence containing the word "trade war" is placed into the set for "trade war". All of these sentences come from the collected live topic data.
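Selecting the top-ranked words and building their sentence sets amounts to a small inverted index. The Python sketch below is one way to do it, assuming tokenized sentences as input; it stores sentence indices rather than sentence strings so that two identical comments remain distinct members of a set. The function name and `top_n` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def build_sentence_sets(tokenized: List[List[str]],
                        top_n: int = 100) -> Dict[str, Set[int]]:
    """Map each of the top_n most frequent words to the indices of the
    sentences that contain it."""
    counts = Counter(w for words in tokenized for w in words)
    high_freq = [w for w, _ in counts.most_common(top_n)]
    sets: Dict[str, Set[int]] = {w: set() for w in high_freq}
    for idx, words in enumerate(tokenized):
        for w in set(words):          # a sentence joins each word's set once
            if w in sets:
                sets[w].add(idx)
    return sets
```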
The AND operation compares sentence sets pairwise to find pairs of words that occur in the same sentence; such pairs are collinear words. For example, an AND operation between the sentence set of the high-frequency word "China-US" and that of "trade war" finds the sentences present in both sets, i.e. the sentences containing both "China-US" and "trade war", and counts them. If the count is at least 1, "China-US" and "trade war" are regarded as collinear words.
Optionally, collinear words among the sentence sets are obtained according to formula (1):
$$\mathrm{score}(w_i, w_j) = \left|\,\mathrm{set}(i)\ \&\ \mathrm{set}(j)\,\right|, \qquad i, j \in \{1, \dots, n\},\ i \neq j \tag{1}$$
where n denotes the number of high-frequency words, set(i) denotes the sentence set of word i, set(j) denotes the sentence set of word j, & denotes the AND (collinearity) operation, and score(w_i, w_j) denotes the association degree of the binary phrase (w_i, w_j).
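A minimal Python sketch of the pairwise AND operation follows. It assumes formula (1) scores a pair by the number of sentences its two sentence sets share, which is how the text describes the association degree. The function name `collinear_scores` and the `min_shared` threshold are hypothetical; sentence sets are Python sets of sentence indices, so `&` is set intersection.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def collinear_scores(sets: Dict[str, Set[int]],
                     min_shared: int = 1) -> Dict[Tuple[str, str], int]:
    """Score every pair of high-frequency words by the number of sentences
    containing both; pairs sharing fewer than min_shared sentences are
    not considered collinear."""
    scores: Dict[Tuple[str, str], int] = {}
    for wi, wj in combinations(sorted(sets), 2):
        shared = len(sets[wi] & sets[wj])   # the AND operation
        if shared >= min_shared:
            scores[(wi, wj)] = shared
    return scores
```

With n = 100 high-frequency words this performs n(n-1)/2 = 4950 set intersections, which is cheap compared with pairwise sentence-similarity computation.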
S103, selecting high-frequency collinear words between the sentence sets, and intersecting the high-frequency collinear words with the high-frequency words to obtain ternary word combinations;
High-frequency collinear words are selected according to the number of collinear occurrences across all sentence sets. Specifically, for each pair, the number of sentences in which both high-frequency words appear is counted (or its association degree is calculated); the pairs are sorted, and a certain number of top-ranked pairs, or pairs with a high association degree, are selected. These collinear words are hot topic keywords.
The intersection is taken between the high-frequency collinear words and the high-frequency words: based on the sentences in which they appear, the sentence sets of the high-frequency collinear words are intersected with those of the high-frequency words. This yields ternary word combinations, through which hot topics can be described accurately.
Optionally, the intersection of the high-frequency collinear words and the high-frequency vocabulary is solved according to formula (2):
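$$\mathrm{score}(w_i, w_j, w_k) = \left|\,\mathrm{set}(i, j)\ \&\ \mathrm{set}(k)\,\right|, \qquad (w_i, w_j) \text{ one of the } m \text{ high-frequency collinear pairs},\ k \in \{1, \dots, n\},\ k \neq i, j \tag{2}$$
where n denotes the number of high-frequency words, m denotes the number of high-frequency collinear word pairs, set(i, j) = set(i) & set(j) denotes the sentence set of the binary phrase (w_i, w_j), set(k) denotes the sentence set of high-frequency word k, and score(w_i, w_j, w_k) denotes the association degree of the triple.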
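The triple step can be sketched in Python as below, assuming formula (2) counts the sentences shared by set(i, j) and set(k), and that set(i, j) is the intersection of the two words' sentence sets. The function name and the `min_shared` parameter are hypothetical. Each triple is keyed by its sorted words, so the same three words reached from different pairs collapse to one entry; the count is the same either way because set intersection is order-independent.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, ...]


def triple_scores(sets: Dict[str, Set[int]],
                  pairs: Dict[Tuple[str, str], int],
                  min_shared: int = 1) -> Dict[Triple, int]:
    """Intersect the sentence set of each high-frequency collinear pair
    (i, j) with the sentence set of every other high-frequency word k."""
    triples: Dict[Triple, int] = {}
    for wi, wj in pairs:
        pair_set = sets[wi] & sets[wj]        # set(i, j)
        for wk, k_set in sets.items():
            if wk in (wi, wj):
                continue
            shared = len(pair_set & k_set)    # set(i, j) & set(k)
            if shared >= min_shared:
                triples[tuple(sorted((wi, wj, wk)))] = shared
    return triples
```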
S104, acquiring hot topics corresponding to the ternary word combinations.
Ternary word combinations express hot topic content accurately, so the corresponding hot topics can be obtained by selecting high-frequency ternary word combinations. Specifically, the sentences corresponding to each ternary word combination are obtained; the combinations are sorted from high to low by the number of sentences in which they appear; repeated words or word combinations are removed; and a certain number of top-ranked combinations are selected, integrated and displayed as hot topics. A selected combination can be displayed directly after its words are reordered, or placed back into its corresponding sentences and integrated before display. Because a ternary word combination describes the main content of a hotspot accurately, the selected and sorted hotspots can be pushed to users.
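The ranking and deduplication of step S104 might look like the Python sketch below. The text does not define exactly when words or combinations count as "repeated", so the rule here is an assumption: a combination is skipped if it shares more than `max_overlap` words with one already chosen. The function name and both parameters are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, ...]


def pick_hot_topics(triples: Dict[Triple, int], top_k: int = 10,
                    max_overlap: int = 2) -> List[Tuple[Triple, int]]:
    """Rank ternary combinations by supporting-sentence count, drop
    repeats, and keep at most top_k.

    Assumed duplicate rule: skip a combination that shares more than
    max_overlap words with one already chosen. With the default of 2,
    only the same three words in a different order count as a repeat.
    """
    # Highest sentence count first; ties broken alphabetically.
    ranked = sorted(triples.items(), key=lambda kv: (-kv[1], kv[0]))
    chosen: List[Tuple[Triple, int]] = []
    for words, count in ranked:
        ws = set(words)
        if any(len(ws & set(prev)) > max_overlap for prev, _ in chosen):
            continue
        chosen.append((words, count))
        if len(chosen) == top_k:
            break
    return chosen
```

Lowering `max_overlap` to 1 also suppresses near-duplicate topics that differ by a single word, at the cost of possibly hiding distinct sub-topics.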
Preferably, the words in a ternary word combination are used as keywords, associated live rooms are found through keyword matching, and the associated live rooms are selected for recommendation. Specifically, the live content of each current live room is collected (for example from its barrage or its title), and the keywords of each live room are associated with the ternary word combinations. When a user searches for a keyword through the search engine, live rooms related to keywords in the ternary word combinations can be pushed preferentially; similarly, keywords taken from a user's viewing history or comment record can be matched against the ternary word combinations to recommend related live rooms. Using the keywords of the ternary word combinations facilitates content recommendation, guides users into hot topic discussions, increases users' online time and the popularity of live rooms, and safeguards the user experience.
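The keyword matching described above can be illustrated with a deliberately simple Python sketch: each live room is represented by one text string (for example its title plus recent barrage), and rooms are ranked by how many words of a hot-topic triple appear in that text. The room data, function name and `min_hits` parameter are hypothetical, and plain substring matching is an assumption; on real text it can over-match short words, so a production system would match against segmented words instead.

```python
from typing import Dict, List, Tuple


def match_rooms(triple: Tuple[str, ...], rooms: Dict[str, str],
                min_hits: int = 1) -> List[Tuple[str, int]]:
    """Rank live rooms by how many words of a ternary combination occur
    in each room's current text."""
    hits = []
    for room_id, text in rooms.items():
        n = sum(1 for w in triple if w in text)   # plain substring match
        if n >= min_hits:
            hits.append((room_id, n))
    # Most keyword hits first; ties broken by room id.
    hits.sort(key=lambda rh: (-rh[1], rh[0]))
    return hits
```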
In the technical solution of this embodiment, live topic data are collected, preprocessed and segmented to obtain high-frequency words, and ternary word combinations are then obtained from count statistics by intersecting words with word pairs.
Example two
Fig. 2 is another flowchart of the live broadcast hotspot acquisition method provided by the second embodiment of the present invention. On the basis of the first embodiment, it details how the ternary word combinations are obtained, and includes the following steps:
S201, acquiring high-frequency words;
After live topic data are collected, the users' barrage or community comments are obtained. The topic data consist of sentences, recorded as sn = (s1, s2, ..., sn). The sentences are segmented into words, and the number of occurrences of each word is counted.
Words with a high occurrence frequency are then selected. For example, after sorting words by occurrence frequency from high to low, the top 100 words are taken as high-frequency words, recorded as wn = (w1, w2, w3, ..., w100). A set is created for each word in wn, and a sentence of sn is added to set(k) whenever wk occurs in it, giving 100 sentence sets, one per high-frequency word.
S202, obtaining collinear words;
Each high-frequency word in wn has a corresponding sentence set, and the AND operation is performed between any two sentence sets. For example, between the sentence set set(1) of high-frequency word w1 and the sentence set set(2) of high-frequency word w2, the sentences belonging to both sets, i.e. the sentences containing both w1 and w2, are counted. If the count is at least 1, w1 and w2 are collinear words, and the number of sentences in which they co-occur represents the association between the two high-frequency words.
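A small worked instance of the AND operation, assuming, as the text describes, that the association degree counts the sentences shared by the two sets; the sentence labels s1 to s5 are invented for illustration:

```latex
\begin{aligned}
\mathrm{set}(1) &= \{s_1, s_2, s_4\}, \qquad \mathrm{set}(2) = \{s_2, s_4, s_5\},\\
\mathrm{set}(1)\ \&\ \mathrm{set}(2) &= \{s_2, s_4\},\\
\mathrm{score}(w_1, w_2) &= \left|\{s_2, s_4\}\right| = 2 \ge 1,
\end{aligned}
```

so w1 and w2 are collinear words with an association degree of 2.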
Specifically, collinear words among sentence sets are obtained according to the following formula (1):
$$\mathrm{score}(w_i, w_j) = \left|\,\mathrm{set}(i)\ \&\ \mathrm{set}(j)\,\right|, \qquad i, j \in \{1, \dots, n\},\ i \neq j \tag{1}$$
where n denotes the number of high-frequency words, set(i) denotes the sentence set of word i, set(j) denotes the sentence set of word j, & denotes the AND (collinearity) operation, and score(w_i, w_j) denotes the association degree of the binary phrase (w_i, w_j).
score(w_i, w_j) describes how strongly two high-frequency words are related: counting the sentences shared by their sentence sets identifies the words associated with each word.
S203, intersecting the high-frequency words with the high-frequency collinear words to obtain ternary word combinations.
High-frequency collinear words are obtained either by calculating the association degree with formula (1) or by directly counting collinear occurrences: the pairs are sorted by association degree or count, and a certain number are selected from high to low as high-frequency collinear words, for example the top 500 word pairs.
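Selecting the top-ranked pairs is a sort followed by a cut-off. The Python sketch below assumes a dictionary mapping each word pair to its association score; the function name and `min_score` parameter are hypothetical, and `m = 500` mirrors the example figure in the text. Ties are broken alphabetically so that repeated runs over the same data select the same pairs.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def top_collinear(scores: Dict[Pair, int], m: int = 500,
                  min_score: int = 1) -> List[Tuple[Pair, int]]:
    """Keep collinear pairs (score >= min_score), sort them by score from
    high to low, and return the first m as high-frequency collinear words."""
    eligible = [(p, s) for p, s in scores.items() if s >= min_score]
    return sorted(eligible, key=lambda kv: (-kv[1], kv[0]))[:m]
```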
Further, the intersection of the high-frequency collinear words and the high-frequency vocabulary is solved according to formula (2):
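$$\mathrm{score}(w_i, w_j, w_k) = \left|\,\mathrm{set}(i, j)\ \&\ \mathrm{set}(k)\,\right|, \qquad (w_i, w_j) \text{ one of the } m \text{ high-frequency collinear pairs},\ k \in \{1, \dots, n\},\ k \neq i, j \tag{2}$$
where n denotes the number of high-frequency words, m denotes the number of high-frequency collinear word pairs, set(i, j) = set(i) & set(j) denotes the sentence set of the binary phrase (w_i, w_j), set(k) denotes the sentence set of high-frequency word k, and score(w_i, w_j, w_k) denotes the association degree of the triple.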
Here set(i, j) is the sentence set corresponding to a high-frequency collinear pair. The sentence sets of the high-frequency words are matched against those of the high-frequency collinear pairs, and their intersection gives ternary word combinations, each consisting of a high-frequency collinear pair plus a high-frequency word.
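A small worked instance of the triple step, assuming the association degree counts the sentences shared by set(i, j) and set(k); the sentence labels are invented for illustration:

```latex
\begin{aligned}
\mathrm{set}(1, 2) &= \mathrm{set}(1)\ \&\ \mathrm{set}(2) = \{s_2, s_4\}, \qquad \mathrm{set}(3) = \{s_1, s_4, s_6\},\\
\mathrm{score}(w_1, w_2, w_3) &= \left|\{s_2, s_4\}\ \&\ \{s_1, s_4, s_6\}\right| = \left|\{s_4\}\right| = 1,
\end{aligned}
```

so (w1, w2, w3) forms a ternary word combination supported by one sentence, s4.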
A ternary word combination describes a hot event concretely and accurately. After order-insensitive deduplication and sorting of the combinations, the sentences corresponding to each combination are collated to produce a description of the corresponding hot event.
Preferably, the words in the ternary word combinations are used as keywords for user search and big-data matching, and for recommending live rooms according to user interests; for example, keywords extracted from a user's viewing history or comment record are matched against the ternary word combinations, and the associated live rooms are pushed.
In this embodiment of the invention, ternary word combinations are calculated and extracted directly from the statistical characteristics of words, so hot topics can be described accurately with a simple calculation; compared with traditional word-similarity cluster analysis, the hotspot extraction process is greatly simplified.
Example three
Fig. 3 is a schematic structural diagram of a live broadcast hotspot obtaining device provided in an embodiment of the present invention, where the device includes:
the word segmentation module 310 is configured to, after live topic data within a preset time period are collected, perform word segmentation on the sentences in the live topic data and count the occurrence frequency of every word;
Optionally, the live topic data include at least live room barrage data collected for a certain duration at certain intervals.
Optionally, the word segmentation module 310 specifically includes:
the word segmentation unit is used for recording each sentence collected in the live topic data, removing stop words of each sentence and obtaining words in each sentence;
and the statistical unit is used for counting the occurrence times of all the vocabularies and arranging the vocabularies according to the occurrence times of each vocabulary from high to low.
The first operation module 320 is configured to select a preset number of high-frequency words, construct the set of sentences in which each high-frequency word appears, and perform an AND operation between the sentence sets of all high-frequency words to obtain collinear words between the sentence sets;
High-frequency words are words with a high occurrence frequency: after the occurrence frequency of every word is counted and the words are sorted, the top-ranked words are selected, for example the top 100. For each high-frequency word, the sentences containing it are found and placed into the set for that word; for example, every sentence containing "trade war" is placed into the set for "trade war". All of these sentences come from the collected live topic data.
Optionally, performing the AND operation between the sentence sets of all high-frequency words to obtain collinear words between the sentence sets specifically includes:
obtaining the collinear words between the sentence sets according to formula (1):
$$\mathrm{score}(w_i, w_j) = \left|\,\mathrm{set}(i)\ \&\ \mathrm{set}(j)\,\right|, \qquad i, j \in \{1, \dots, n\},\ i \neq j \tag{1}$$
where n denotes the number of high-frequency words, set(i) denotes the sentence set of word i, set(j) denotes the sentence set of word j, & denotes the AND (collinearity) operation, and score(w_i, w_j) denotes the association degree of the binary phrase (w_i, w_j).
The second operation module 330 is configured to select high-frequency collinear words between the sentence sets and intersect them with the high-frequency words to obtain ternary word combinations;
High-frequency collinear words are selected according to the number of collinear occurrences across all sentence sets: for each pair, the number of sentences in which both high-frequency words appear is counted (or its association degree is calculated), the pairs are sorted, and a certain number of top-ranked pairs, or pairs with a high association degree, are selected. These collinear words are hot topic keywords.
Optionally, intersecting the high-frequency collinear words with the high-frequency words to obtain ternary word combinations specifically includes:
computing the intersection of the high-frequency collinear words and the high-frequency words according to formula (2):
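$$\mathrm{score}(w_i, w_j, w_k) = \left|\,\mathrm{set}(i, j)\ \&\ \mathrm{set}(k)\,\right|, \qquad (w_i, w_j) \text{ one of the } m \text{ high-frequency collinear pairs},\ k \in \{1, \dots, n\},\ k \neq i, j \tag{2}$$
where n denotes the number of high-frequency words, m denotes the number of high-frequency collinear word pairs, set(i, j) = set(i) & set(j) denotes the sentence set of the binary phrase (w_i, w_j), set(k) denotes the sentence set of high-frequency word k, and score(w_i, w_j, w_k) denotes the association degree of the triple.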
The obtaining module 340 is configured to obtain a hot topic corresponding to the ternary vocabulary combination.
Optionally, the obtaining module 340 includes:
the obtaining unit is used for obtaining sentences corresponding to the ternary vocabulary combination;
and the integration unit is configured to sort the ternary word combinations from high to low by the number of sentences in which they appear, remove repeated words or word combinations, select a certain number of top-ranked ternary word combinations, and integrate and display them as hot topics.
Optionally, the obtaining module 340 further includes:
a recommendation module, configured to use the words in the ternary word combinations as keywords, find associated live rooms through keyword matching, and select the associated live rooms for recommendation.
In this device, the first and second operation modules extract ternary word combinations from the live topic data, so hot topics are extracted and expressed accurately; the device is simple to implement and efficient.
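To show how modules 310 to 340 could hand data to one another, here is a compact Python sketch of the whole pipeline. The class `HotspotDevice`, its method names, the default limits and the whitespace-based segmentation are hypothetical stand-ins, and the scoring assumes formulas (1) and (2) count shared sentences; it is an illustration of the module structure, not the patented device.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]
Triple = Tuple[str, ...]


class HotspotDevice:
    """Hypothetical wiring of modules 310-340 into one pipeline."""

    def __init__(self, stop_words: Set[str], top_n: int = 100,
                 top_m: int = 500, top_k: int = 3) -> None:
        self.stop_words = stop_words
        self.top_n, self.top_m, self.top_k = top_n, top_m, top_k

    def segment(self, sentences: List[str]) -> List[List[str]]:
        """Module 310: segmentation (assumes whitespace-separated input)."""
        return [[w for w in s.split() if w not in self.stop_words]
                for s in sentences]

    def first_operation(self, tokenized: List[List[str]]
                        ) -> Tuple[Dict[str, Set[int]], Dict[Pair, int]]:
        """Module 320: high-frequency words, sentence sets, collinear pairs."""
        counts = Counter(w for ws in tokenized for w in ws)
        high = [w for w, _ in counts.most_common(self.top_n)]
        sets = {w: {i for i, ws in enumerate(tokenized) if w in ws}
                for w in high}
        pairs: Dict[Pair, int] = {}
        for a, b in combinations(sorted(sets), 2):
            shared = len(sets[a] & sets[b])
            if shared >= 1:
                pairs[(a, b)] = shared
        return sets, pairs

    def second_operation(self, sets: Dict[str, Set[int]],
                         pairs: Dict[Pair, int]) -> Dict[Triple, int]:
        """Module 330: keep the top_m pairs, intersect with every other word."""
        best = sorted(pairs, key=lambda p: (-pairs[p], p))[: self.top_m]
        triples: Dict[Triple, int] = {}
        for a, b in best:
            pair_set = sets[a] & sets[b]
            for c, c_set in sets.items():
                shared = len(pair_set & c_set)
                if c not in (a, b) and shared >= 1:
                    triples[tuple(sorted((a, b, c)))] = shared
        return triples

    def acquire(self, triples: Dict[Triple, int]) -> List[Triple]:
        """Module 340: the top_k triples by supporting-sentence count."""
        return sorted(triples, key=lambda t: (-triples[t], t))[: self.top_k]

    def run(self, sentences: List[str]) -> List[Triple]:
        tokenized = self.segment(sentences)
        sets, pairs = self.first_operation(tokenized)
        return self.acquire(self.second_operation(sets, pairs))
```

Usage: `HotspotDevice(stop_words=set()).run(["china us trade war", "china us trade war talks", "china us", "cat video"])` returns the three most-supported triples, each backed by two sentences; the unrelated "cat video" comment produces no triple because no third word shares its sentence.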
Example four
Fig. 4 is a schematic structural diagram of a live hotspot acquisition server according to an embodiment of the present invention. A server is a device that provides computing services; it generally refers to a computer with high computing power that serves multiple users over a network. As shown in Fig. 4, the server 4 of this embodiment includes a memory 410, a processor 420 and a system bus 430, and the memory 410 stores an executable program 4101. Those skilled in the art will understand that the server structure shown in Fig. 4 does not limit the server, which may include more or fewer components than shown, combine some components, or arrange the components differently.
The following describes each component of the server in detail with reference to fig. 4:
The memory 410 may be used to store software programs and modules, and the processor 420 performs the server's functional applications and data processing by running the software programs and modules stored in the memory 410. The memory 410 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created through use of the server (such as audio data or a phonebook). Further, the memory 410 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The memory 410 stores the executable program 4101 of the live broadcast hotspot acquisition method. The executable program 4101 may be divided into one or more modules/units, which are stored in the memory 410 and executed by the processor 420 to acquire hot topics; these modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the computer program 4101 in the server 4. For example, the computer program 4101 may be divided into a word segmentation module, a first operation module, a second operation module and an acquisition module.
The processor 420 is the control center of the server. It connects all parts of the server using various interfaces and lines, and performs the server's functions and processes data by running or executing the software programs and/or modules stored in the memory 410 and calling the data stored in the memory 410, thereby monitoring the server as a whole. Optionally, the processor 420 may include one or more processing units; preferably, it may integrate an application processor, which mainly handles the operating system and application programs, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 420.
The system bus 430 connects the functional units inside the computer and can transmit data, address and control information; it may be, for example, a PCI bus, an ISA bus or a VESA bus. Instructions from the processor 420 are transmitted to the memory 410 over the bus, the memory 410 returns data to the processor 420, and the system bus 430 handles the exchange of data and instructions between the processor 420 and the memory 410. Other devices, such as network interfaces and display devices, may of course also be connected to the system bus 430.
In this embodiment of the present invention, the executable program executed by the processor 420 of the server specifically implements:
a live hotspot acquisition method, comprising the following steps:
after collecting live topic data within a preset time period, performing word segmentation on the sentences in the live topic data and counting the occurrence frequency of every word;
selecting a preset number of high-frequency words, constructing for each high-frequency word the set of sentences in which it appears, and performing an AND (intersection) operation between the sentence sets of all high-frequency words to obtain the collinear (co-occurring) words between the sentence sets;
selecting the high-frequency collinear words between sentence sets and intersecting them with the high-frequency words to obtain ternary vocabulary combinations;
and acquiring the hot topics corresponding to the ternary vocabulary combinations.
Further, the live topic data comprises at least live-room bullet-comment (barrage) data collected at certain time intervals.
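The "preset time" collection window can be illustrated with a small buffer that keeps only the bullet comments received within the last few seconds. This is a minimal sketch under assumptions the text does not state: each comment arrives as a `(timestamp, text)` pair in time order, and the class name `DanmakuWindow` and its `window_seconds` parameter are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class DanmakuWindow:
    """Hypothetical buffer holding bullet comments from the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._buffer: Deque[Tuple[float, str]] = deque()

    def add(self, timestamp: float, text: str) -> None:
        # Comments are assumed to arrive in non-decreasing timestamp order.
        self._buffer.append((timestamp, text))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop comments that fall outside the preset time window.
        while self._buffer and self._buffer[0][0] < now - self.window_seconds:
            self._buffer.popleft()

    def snapshot(self, now: float) -> List[str]:
        """Return the sentences collected within [now - window_seconds, now]."""
        self._evict(now)
        return [text for _, text in self._buffer]
```

Calling `snapshot(now)` once per collection interval would yield the sentences that feed the word segmentation step; a deque keeps removal of expired comments from the front of the buffer cheap.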
Further, performing word segmentation on the sentences in the live topic data and counting the occurrence frequency of all words specifically comprises:
recording each sentence collected in the live topic data and removing its stop words to obtain the words in each sentence;
and counting the number of occurrences of every word and sorting the words by occurrence count from high to low.
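The segmentation and counting step can be sketched as follows. The patent names neither a segmenter nor a stop-word list, so both are assumptions here: the sketch splits on whitespace and uses a tiny English stop-word set only so that it runs without dependencies. For Chinese bullet comments, a real word segmenter (for example the third-party `jieba` package) would take the place of the hypothetical `tokenize` function.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical stop-word list; a deployment would use a curated list for its language.
STOP_WORDS: Set[str] = {"the", "a", "is", "of", "and", "to"}


def tokenize(sentence: str) -> List[str]:
    # Stand-in segmenter: lowercase, whitespace split, stop words removed.
    return [tok for tok in sentence.lower().split() if tok not in STOP_WORDS]


def count_words(sentences: Iterable[str]) -> Tuple[List[List[str]], List[Tuple[str, int]]]:
    """Segment each sentence, drop stop words, and rank words by occurrence count."""
    segmented = [tokenize(s) for s in sentences]
    counts = Counter(word for words in segmented for word in words)
    # most_common() with no argument sorts all words by count, high to low.
    return segmented, counts.most_common()
```

The segmented sentences are returned alongside the ranking because the next step needs to know which sentences contain each high-frequency word.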
Further, obtaining the collinear words between sentence sets by performing an AND operation between the sentence sets of all high-frequency words specifically comprises:
computing the collinear words between sentence sets according to formula (1):
$$\mathrm{score}(w_i, w_j) = \mathrm{set}(i)\ \&\ \mathrm{set}(j), \quad i, j \in [1, n],\ i \neq j \qquad (1)$$
where n denotes the number of high-frequency words, set(i) denotes the set of sentences containing word i, set(j) denotes the set of sentences containing word j, & denotes the AND (co-occurrence, or "collinear") operation, and score(w_i, w_j) denotes the degree of association of the two-word phrase.
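Formula (1) can be sketched as pairwise set intersection over sentence indices. Two points are assumptions rather than statements from the patent: the association degree score(w_i, w_j) is taken to be the size of the shared sentence set, and pairs whose intersection is empty are discarded. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def sentence_sets(segmented: Sequence[Sequence[str]], top_words: Sequence[str]) -> Dict[str, Set[int]]:
    """set(i): indices of the sentences that contain high-frequency word i."""
    sets: Dict[str, Set[int]] = {w: set() for w in top_words}
    for idx, words in enumerate(segmented):
        for w in words:
            if w in sets:
                sets[w].add(idx)
    return sets


def pair_scores(sets: Dict[str, Set[int]]) -> List[Tuple[Tuple[str, str], Set[int]]]:
    """AND (intersect) the sentence sets of every pair of high-frequency words.

    Results are sorted by |set(i) & set(j)|, high to low; that size stands in
    for score(w_i, w_j) (an assumption).
    """
    pairs: List[Tuple[Tuple[str, str], Set[int]]] = []
    for wi, wj in combinations(sorted(sets), 2):
        shared = sets[wi] & sets[wj]
        if shared:  # keep only word pairs that actually co-occur
            pairs.append(((wi, wj), shared))
    pairs.sort(key=lambda item: len(item[1]), reverse=True)
    return pairs
```

The first m entries of the returned list would then play the role of the high-frequency collinear words; keeping the shared sentence set, not just its size, lets the next step reuse it as set(i, j).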
Further, intersecting the high-frequency collinear words with the high-frequency words to obtain the ternary vocabulary combinations specifically comprises:
intersecting the high-frequency collinear words with the high-frequency words according to formula (2):
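$$\mathrm{score}(w_i, w_j, w_k) = \mathrm{set}(i, j)\ \&\ \mathrm{set}(k), \quad (i, j) \in \text{the } m \text{ high-frequency collinear pairs},\ k \in [1, n] \qquad (2)$$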
where n denotes the number of high-frequency words, m denotes the number of high-frequency collinear words, set(i, j) denotes the sentence set of a two-word phrase, set(k) denotes the sentence set of high-frequency word k, and score(w_i, w_j, w_k) denotes the degree of association among the three words.
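Formula (2) extends the same idea by one word: the sentence set of each high-frequency collinear pair is intersected with the sentence set of every other high-frequency word. As with formula (1), using the size of the resulting set as score(w_i, w_j, w_k) is an assumption, and the function name is hypothetical. Sorting the three words gives each combination one canonical key, so a triple reached from two different pairs is stored only once.

```python
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]
Triple = Tuple[str, str, str]


def triple_scores(
    top_pairs: Sequence[Tuple[Pair, Set[int]]],
    word_sets: Dict[str, Set[int]],
) -> List[Tuple[Triple, Set[int]]]:
    """Intersect set(i, j) of each high-frequency collinear pair with set(k)."""
    triples: Dict[Triple, Set[int]] = {}
    for (wi, wj), pair_set in top_pairs:
        for wk, k_set in word_sets.items():
            if wk in (wi, wj):
                continue  # the third word must differ from the pair
            shared = pair_set & k_set
            if shared:
                a, b, c = sorted((wi, wj, wk))
                # set(i) & set(j) & set(k) is symmetric, so any pair yields the same set.
                triples[(a, b, c)] = shared
    # Rank by the number of sentences shared by all three words, high to low.
    return sorted(triples.items(), key=lambda item: len(item[1]), reverse=True)
```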
Further, acquiring the hot topics corresponding to the ternary vocabulary combinations specifically comprises:
obtaining the sentences corresponding to each ternary vocabulary combination;
and sorting the ternary vocabulary combinations by the number of sentences in which they appear, removing duplicates, selecting a certain number of the combinations, and displaying them as hot topics after integration.
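The ranking and de-duplication step might look like the sketch below. The patent defines neither what counts as a duplicate nor how combinations are integrated for display, so both rules here are hypothetical: a triple sharing two or more words with an already selected triple is skipped, and each topic is shown as its three words, its sentence count, and its earliest supporting sentence.

```python
from typing import List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]


def select_hot_topics(
    scored: Sequence[Tuple[Triple, Set[int]]],
    sentences: Sequence[str],
    top_k: int,
) -> List[Tuple[str, int, str]]:
    """Rank triples by supporting-sentence count, drop near-duplicates, keep top_k.

    Returns (topic label, sentence count, representative sentence) tuples.
    """
    ranked = sorted(scored, key=lambda item: len(item[1]), reverse=True)
    chosen: List[Tuple[Triple, Set[int]]] = []
    for triple, sent_ids in ranked:
        # Hypothetical rule: sharing two or more words with a chosen triple
        # means the same topic has already been selected.
        if any(len(set(triple) & set(prev)) >= 2 for prev, _ in chosen):
            continue
        chosen.append((triple, sent_ids))
        if len(chosen) == top_k:
            break
    # Each sentence set is non-empty, so min() picks the earliest supporting sentence.
    return [(" ".join(t), len(ids), sentences[min(ids)]) for t, ids in chosen]
```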
Further, acquiring the hot topics corresponding to the ternary vocabulary combinations further comprises:
taking the words in a ternary vocabulary combination as keywords, searching for related live broadcast rooms by keyword matching, and selecting related live broadcast rooms for recommendation.
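Keyword-based room recommendation can be approximated with plain substring matching against each room's searchable text. The room data layout (room id mapped to title/tag text), the hit-count ranking, and the `limit` parameter are assumptions made for illustration, not details from the patent.

```python
from typing import Dict, List, Sequence, Tuple


def recommend_rooms(
    keywords: Sequence[str],
    rooms: Dict[str, str],
    limit: int = 3,
) -> List[str]:
    """Rank live rooms by how many topic keywords appear in their searchable text.

    `rooms` maps a hypothetical room id to text such as the room title and tags.
    """
    scored: List[Tuple[int, str]] = []
    for room_id, text in rooms.items():
        text_lower = text.lower()
        hits = sum(1 for kw in keywords if kw.lower() in text_lower)
        if hits:
            scored.append((hits, room_id))
    # More matched keywords first; ties broken by room id for a deterministic order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [room_id for _, room_id in scored[:limit]]
```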
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A live hotspot acquisition method, characterized by comprising the following steps:
after collecting live topic data within a preset time period, performing word segmentation on the sentences in the live topic data and counting the occurrence frequency of every word;
selecting a preset number of high-frequency words, constructing for each high-frequency word the set of sentences in which it appears, and performing an AND operation between the sentence sets of all high-frequency words to obtain the collinear words between the sentence sets;
selecting the high-frequency collinear words between sentence sets and intersecting them with the high-frequency words to obtain ternary vocabulary combinations;
and acquiring the hot topics corresponding to the ternary vocabulary combinations.
2. The method of claim 1, wherein the live topic data comprises at least live-room bullet-comment (barrage) data collected at intervals.
3. The method according to claim 1, wherein performing word segmentation on the sentences in the live topic data and counting the occurrence frequency of all words specifically comprises:
recording each sentence collected in the live topic data and removing its stop words to obtain the words in each sentence;
and counting the number of occurrences of every word and sorting the words by occurrence count from high to low.
4. The method according to claim 1, wherein obtaining the collinear words between sentence sets by performing an AND operation between the sentence sets of all high-frequency words specifically comprises:
computing the collinear words between sentence sets according to formula (1):
$$\mathrm{score}(w_i, w_j) = \mathrm{set}(i)\ \&\ \mathrm{set}(j), \quad i, j \in [1, n],\ i \neq j \qquad (1)$$
where n denotes the number of high-frequency words, set(i) denotes the set of sentences containing word i, set(j) denotes the set of sentences containing word j, & denotes the AND (co-occurrence, or "collinear") operation, and score(w_i, w_j) denotes the degree of association of the two-word phrase.
5. The method according to claim 1, wherein intersecting the high-frequency collinear words with the high-frequency words to obtain the ternary vocabulary combinations specifically comprises:
intersecting the high-frequency collinear words with the high-frequency words according to formula (2):
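$$\mathrm{score}(w_i, w_j, w_k) = \mathrm{set}(i, j)\ \&\ \mathrm{set}(k), \quad (i, j) \in \text{the } m \text{ high-frequency collinear pairs},\ k \in [1, n] \qquad (2)$$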
where n denotes the number of high-frequency words, m denotes the number of high-frequency collinear words, set(i, j) denotes the sentence subset of a two-word phrase, set(k) denotes the sentence set of high-frequency word k, and score(w_i, w_j, w_k) denotes the degree of association among the three words.
6. The method according to claim 1, wherein acquiring the hot topics corresponding to the ternary vocabulary combinations specifically comprises:
obtaining the sentences corresponding to each ternary vocabulary combination;
and sorting the ternary vocabulary combinations from high to low by the number of sentences in which they appear, removing repeated words or word combinations, selecting a certain number of the top-ranked combinations, and displaying them as hot topics after integration.
7. The method of claim 1 or 6, wherein acquiring the hot topics corresponding to the ternary vocabulary combinations further comprises:
taking the words in a ternary vocabulary combination as keywords, searching for related live broadcast rooms by keyword matching, and selecting related live broadcast rooms for recommendation.
8. A live hotspot acquisition device, characterized by comprising:
a word segmentation module, configured to perform word segmentation on the sentences in the live topic data after the live topic data within a preset time period has been collected, and to count the occurrence frequency of all words;
a first operation module, configured to select a preset number of high-frequency words, construct the set of sentences in which each high-frequency word appears, and perform an AND operation between the sentence sets of all high-frequency words to obtain the collinear words between the sentence sets;
a second operation module, configured to select the high-frequency collinear words between sentence sets and intersect them with the high-frequency words to obtain ternary vocabulary combinations;
and an acquisition module, configured to acquire the hot topics corresponding to the ternary vocabulary combinations.
9. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the live hotspot acquisition method of any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the live hotspot acquisition method of any one of claims 1 to 7.
CN201910148553.4A 2019-02-28 2019-02-28 Live broadcast hotspot acquisition method and device, server and storage medium Active CN109918656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910148553.4A CN109918656B (en) 2019-02-28 2019-02-28 Live broadcast hotspot acquisition method and device, server and storage medium

Publications (2)

Publication Number Publication Date
CN109918656A CN109918656A (en) 2019-06-21
CN109918656B true CN109918656B (en) 2022-12-23

Family

ID=66962688

Country Status (1)

Country Link
CN (1) CN109918656B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011182B (en) * 2019-12-19 2023-10-03 北京多点在线科技有限公司 Method, device and storage medium for labeling target object
CN114615510B (en) * 2020-12-08 2024-04-02 抖音视界有限公司 Live broadcast interface display method and equipment
CN113011178B (en) * 2021-03-29 2023-05-16 广州博冠信息科技有限公司 Text generation method, text generation device, electronic device and storage medium
CN113139377A (en) * 2021-04-26 2021-07-20 北京沃东天骏信息技术有限公司 Method, device, equipment and computer readable medium for pushing information
CN113420723A (en) * 2021-07-21 2021-09-21 北京有竹居网络技术有限公司 Method and device for acquiring video hotspot, readable medium and electronic equipment
CN114598899B (en) * 2022-03-15 2023-06-16 中科大数据研究院 Barrage broadcasting analysis method based on crawlers

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008106A (en) * 2013-02-25 2014-08-27 腾讯科技(深圳)有限公司 Method and apparatus for obtaining hot topic
CN104077274A (en) * 2014-06-13 2014-10-01 清华大学 Method and device for extracting hot word phrases from document set
WO2015027909A1 (en) * 2013-08-29 2015-03-05 Tencent Technology (Shenzhen) Company Limited Method and apparatus for obtaining hot-topic information
CN104572849A (en) * 2014-12-17 2015-04-29 西安美林数据技术股份有限公司 Automatic standardized filing method based on text semantic mining

Similar Documents

Publication Publication Date Title
CN109918656B (en) Live broadcast hotspot acquisition method and device, server and storage medium
US11775760B2 (en) Man-machine conversation method, electronic device, and computer-readable medium
CN106649818B (en) Application search intention identification method and device, application search method and server
US7860878B2 (en) Prioritizing media assets for publication
EP2159715B1 (en) System and method for providing a topic-directed search
US9008489B2 (en) Keyword-tagging of scenes of interest within video content
CN109190017B (en) Method and device for determining hotspot information, server and storage medium
WO2016112679A1 (en) Method, system and storage medium for realizing intelligent answering of questions
US20150269163A1 (en) Providing search recommendation
Shi et al. Learning-to-rank for real-time high-precision hashtag recommendation for streaming news
US8255414B2 (en) Search assist powered by session analysis
US8949227B2 (en) System and method for matching entities and synonym group organizer used therein
US20110153595A1 (en) System And Method For Identifying Topics For Short Text Communications
CN106682170B (en) Application search method and device
CN107544988B (en) Method and device for acquiring public opinion data
CN110162768B (en) Method and device for acquiring entity relationship, computer readable medium and electronic equipment
US20150206101A1 (en) System for determining infringement of copyright based on the text reference point and method thereof
CN110889024A (en) Method and device for calculating information-related stock
CN105512300B (en) information filtering method and system
CN111753526A (en) Similar competitive product data analysis method and system
CN111401039A (en) Word retrieval method, device, equipment and storage medium based on binary mutual information
US10078686B2 (en) Combination filter for search query suggestions
CN110019763B (en) Text filtering method, system, equipment and computer readable storage medium
CN108509449B (en) Information processing method and server
CN111104583A (en) Live broadcast room recommendation method, storage medium, electronic device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant