CN117076785A - Hot topic determination method, device, electronic equipment and storage medium - Google Patents

Hot topic determination method, device, electronic equipment and storage medium

Info

Publication number
CN117076785A
Authority
CN
China
Prior art keywords
text
topic
texts
posting
comment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310170224.6A
Other languages
Chinese (zh)
Inventor
黄海涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Rockwell Technology Co Ltd
Original Assignee
Beijing Rockwell Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Rockwell Technology Co Ltd filed Critical Beijing Rockwell Technology Co Ltd
Priority to CN202310170224.6A priority Critical patent/CN117076785A/en
Publication of CN117076785A publication Critical patent/CN117076785A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The application relates to a hot topic determination method, a device, electronic equipment and a storage medium in the technical field of data processing. The method comprises the following steps: determining the text topics contained in a target text, wherein the target text comprises posting texts and comment texts, and counting, for each text topic, the number of posting texts and the number of comment texts in the target text; determining a first influence weight of the posting texts on the text topic and a second influence weight of the comment texts on the text topic, wherein the first influence weight is greater than the second influence weight and the sum of the two weights is 1; substituting the number of posting texts, the number of comment texts, the first influence weight and the second influence weight of each text topic into a preset topic sound volume calculation equation to calculate the topic sound volume of each text topic; and determining the text topics whose topic sound volume falls within a preset sound volume range as hot topics. The scheme can reduce the influence of water army comments and effectively improve the accuracy of topic extraction.

Description

Hot topic determination method, device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of data processing, in particular to a hot topic determination method, a hot topic determination device, electronic equipment and a storage medium.
Background
With the development of technology, more and more users propagate and share messages through instant messaging clients. Because users can comment on and forward forum posts propagated in the instant messaging client, the volume of propagated information is large, and how to extract valuable topic information from it has become a focus of attention in recent years.
At present, the traditional scheme for extracting hot topics analyzes forum posts and comments uniformly, without distinguishing text importance. Competitors may hire a water army (paid posters) to flood the discussion with comments, so that the topics finally extracted are biased.
Disclosure of Invention
In view of the above, the application provides a hot topic determination method, a hot topic determination device, electronic equipment and a storage medium, which can reduce the influence of water army comments and effectively improve the accuracy of topic extraction.
According to a first aspect of the present disclosure, there is provided a hot topic determination method, including:
determining a text topic contained in a target text, wherein the target text comprises a plurality of posting texts and a plurality of comment texts;
counting the number of posting texts and the number of comment texts aiming at each text topic in the target text;
determining a first influence weight of the posting texts on a text topic and a second influence weight of the comment texts on the text topic, wherein the first influence weight is greater than the second influence weight, and the sum of the first influence weight and the second influence weight is 1;
substituting the number of posting texts and the number of comment texts of each text topic, together with the first influence weight and the second influence weight, into a preset topic sound volume calculation equation to calculate the topic sound volume of each text topic, wherein the preset topic sound volume calculation equation calculates the accumulated value of a posting text quantization index and a comment text quantization index and determines the accumulated value as the topic sound volume, the topic sound volume represents topic discussion heat, the posting text quantization index is the product of the number of posting texts and the first influence weight, and the comment text quantization index is the product of the number of comment texts and the second influence weight;
and determining a text topic whose topic sound volume falls within a preset topic sound volume range as a hot topic.
According to a second aspect of the present disclosure, there is provided a hot topic determination apparatus including:
the first determining module is used for determining text topics contained in target texts, wherein the target texts comprise a plurality of posting texts and a plurality of comment texts;
the statistics module is used for counting the number of posting texts and the number of comment texts aiming at each text topic in the target text;
a second determining module, configured to determine a first influence weight of the posting texts on a text topic and a second influence weight of the comment texts on the text topic, where the first influence weight is greater than the second influence weight, and the sum of the first influence weight and the second influence weight is 1;
the computing module is used for substituting the number of posting texts and the number of comment texts of each text topic, together with the first influence weight and the second influence weight, into a preset topic sound volume calculation equation to calculate the topic sound volume of each text topic, wherein the preset topic sound volume calculation equation calculates the accumulated value of a posting text quantization index and a comment text quantization index and determines the accumulated value as the topic sound volume, the topic sound volume represents topic discussion heat, the posting text quantization index is the product of the number of posting texts and the first influence weight, and the comment text quantization index is the product of the number of comment texts and the second influence weight;
and the third determining module is used for determining a text topic whose topic sound volume falls within the preset topic sound volume range as a hot topic.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect described above.
According to a fourth aspect of the present disclosure there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the preceding first aspect.
According to a fifth aspect of the present disclosure there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as in the first aspect described above.
According to the hot topic determination method, the device, the electronic equipment and the storage medium provided by the disclosure, the text topics contained in a target text are determined, wherein the target text comprises a plurality of posting texts and a plurality of comment texts; the number of posting texts and the number of comment texts in the target text are counted for each text topic; a first influence weight of the posting texts on the text topic and a second influence weight of the comment texts on the text topic are determined, the first influence weight being greater than the second influence weight; the number of posting texts, the number of comment texts, the first influence weight and the second influence weight of each text topic are substituted into a preset topic sound volume calculation equation, which calculates the accumulated value of the posting text quantization index and the comment text quantization index, and the accumulated value is determined as the topic sound volume of the corresponding text topic; finally, the text topics whose topic sound volume falls within the preset topic sound volume range are determined as hot topics. In this technical scheme, posting texts are distinguished from comment texts and are configured with a higher influence weight than comment texts, and hot topics are determined comprehensively from two dimensions, the number of posting texts and the number of comment texts. The importance of the texts can therefore be effectively distinguished, the influence of water army comments is reduced, and the accuracy of locating hot topics is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It should be understood that the drawings are for better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a flow chart of a hot topic determination method provided in an embodiment of the present disclosure;
fig. 2 is a flowchart of a hot topic determination method according to another embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a hot topic determination device according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a hot topic determination device according to another embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The hot topic determination method, the device, the electronic equipment and the storage medium of the embodiments of the present disclosure are described below with reference to the accompanying drawings.
In the related art, the traditional scheme for extracting hot topics analyzes forum posts and comments uniformly, without distinguishing text importance. A competitor may hire a water army to flood the discussion with comments, so that the topics finally extracted are biased.
In order to solve the technical problems, the disclosure provides a hot topic determination method, a hot topic determination device, electronic equipment and a storage medium. As shown in fig. 1, an embodiment of the present disclosure provides a hot topic determination method, including:
Step 101, determining a text topic contained in a target text, wherein the target text comprises a plurality of posting texts and a plurality of comment texts.
In the following embodiments of the present disclosure, the technical solution of the present application is described by taking automobile text as an example of the target text, but the technical solution is not limited thereto. For the embodiment of the disclosure, the competing brand models and the car discussion data for the automobile text can be determined in advance according to the service scope; the car discussion data can comprise forum post data and comment data, and the data is collected into a database through an information collection system. As a possible implementation manner, a plurality of posting texts and a plurality of comment texts related to the service may be extracted from the database, and the text topics contained in the target text are determined according to the text contents of the extracted posting texts and comment texts.
Step 102, counting, for each text topic, the number of posting texts and the number of comment texts in the target text, and determining a first influence weight of the posting texts on the text topic and a second influence weight of the comment texts on the text topic, wherein the first influence weight is greater than the second influence weight, and the sum of the first influence weight and the second influence weight is 1.
The first influence weight is a first influence factor, and the second influence weight is a second influence factor. An influence factor is a relative statistic; in general, the higher the influence factor, the greater the influence. For the embodiment of the disclosure, the target text has a plurality of text topics, and each text topic has a plurality of posting texts and a plurality of comment texts; the number of posting texts and the number of comment texts of each text topic are counted, and a first influence weight of the posting texts and a second influence weight of the comment texts are determined. In a specific application scene, water army accounts are mostly minor (secondary) accounts, and on some platforms they have no posting authority, so water army content usually appears in comment texts. The weight calculated for comment texts is therefore generally lower and the weight calculated for posting texts generally higher, so that the text importance of posting texts and comment texts can be effectively distinguished, effective texts are given higher importance than ineffective texts, and the influence of water army comments on topic extraction can be further reduced.
For the disclosed embodiments, sound volume measurement information of the posting texts and comment texts relative to each text topic may be obtained, i.e. the number of posting texts and the number of comment texts of each text topic, together with the first influence weight of the posting texts and the second influence weight of the comment texts. The topic sound volume of each text topic is calculated from the sound volume measurement information, and the topics with the highest topic sound volume, or those whose topic sound volume falls within a preset sound volume range, are taken as hot topics.
Step 103, substituting the number of posting texts and the number of comment texts of each text topic, together with the first influence weight and the second influence weight, into a preset topic sound volume calculation equation to calculate the topic sound volume of each text topic.
The preset topic sound volume calculation equation is a logic equation that calculates the topic sound volume from the number of posting texts, the number of comment texts, the first influence weight and the second influence weight; the equation logic can be customized according to the actual application scene and is not specifically limited. The topic sound volume represents topic discussion heat: the higher the topic sound volume of a text topic, the higher its discussion heat or the degree of user interest in it, and the more likely it is a hot topic; conversely, the lower the topic sound volume of a text topic, the colder the text topic and the lower its discussion heat.
For the embodiments of the present disclosure, as one possible implementation, the equation logic of the preset topic sound volume calculation equation may be: calculating the accumulated value of the posting text quantization index and the comment text quantization index and determining the accumulated value as the topic sound volume, wherein the posting text quantization index is the product of the number of posting texts and the first influence weight, and the comment text quantization index is the product of the number of comment texts and the second influence weight. For example, when determining the topic sound volume of text topic A, a first product of the number of posting texts of text topic A and the first influence weight (i.e., the posting text quantization index) may be calculated, a second product of the number of comment texts of text topic A and the second influence weight (i.e., the comment text quantization index) may be calculated, and finally the sum of the first product and the second product may be determined as the topic sound volume of text topic A. In this way, the topic sound volume of each text topic contained in the target text can be calculated. In a specific application scene, comment texts and posting texts are assigned different text importance, and the topic sound volume of each text topic is calculated comprehensively from the sound volume measurement information, which can improve the accuracy of the topic sound volume calculation; determining hot topics according to the topic sound volume can in turn improve the accuracy of topic extraction.
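By way of a minimal, non-limiting sketch, the equation logic described above can be written as a weighted sum in Python. The function name `topic_volume`, the example counts, and the weight values 0.8 and 0.2 are assumptions made for illustration only; the disclosure requires only that the first influence weight exceed the second and that the two sum to 1.

```python
def topic_volume(post_count, comment_count, w_post=0.8, w_comment=0.2):
    """Weighted sum of posting and comment counts for one text topic."""
    # Constraints stated in the method: w_post > w_comment and w_post + w_comment = 1.
    if not w_post > w_comment:
        raise ValueError("first influence weight must exceed the second")
    if abs(w_post + w_comment - 1.0) > 1e-9:
        raise ValueError("influence weights must sum to 1")
    post_index = post_count * w_post            # posting text quantization index
    comment_index = comment_count * w_comment   # comment text quantization index
    return post_index + comment_index


# Hypothetical counts: topic B receives many comments but few posts.
volume_a = topic_volume(post_count=10, comment_count=50)
volume_b = topic_volume(post_count=2, comment_count=80)
```

With these assumed numbers, topic A scores about 18.0 and topic B about 17.6, whereas an unweighted count (60 versus 82 texts) would rank topic B first. This is the effect the weighting is intended to have when comments are inflated.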
Step 104, determining the text topics whose topic sound volume falls within the preset sound volume range as hot topics.
For the embodiment of the disclosure, after the topic sound volume of each text topic is calculated, as a possible implementation manner, the text topics may be ranked in descending order of topic sound volume, and a preset number of top-ranked text topics are determined as hot topics, where the preset number is a value set in advance and may be chosen as required. For example, when the preset number is set to 1, the selected hot topic is the text topic with the highest topic sound volume; when the preset number is greater than 1 (e.g. 3), the selected hot topics are the text topics whose topic sound volume ranks in the top 3. As another possible implementation manner, a preset sound volume range may be determined, and the one or more text topics whose topic sound volume falls within the preset sound volume range are determined as hot topics. If the preset sound volume range is [1,5], the text topics whose topic sound volume lies within [1,5] can be determined as hot topics.
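The two selection strategies described above, top-ranked topics and a preset sound volume range, can be sketched as follows. This is an illustrative sketch only: the function names, the topic names and the volume values are hypothetical, and the range is treated as closed at both ends, matching the [1,5] notation.

```python
def top_n_topics(volumes, n):
    """Return the n topics with the highest topic sound volume."""
    ranked = sorted(volumes.items(), key=lambda item: item[1], reverse=True)
    return [topic for topic, _ in ranked[:n]]


def topics_in_range(volumes, low, high):
    """Return topics whose volume lies in the closed range [low, high]."""
    return [topic for topic, v in volumes.items() if low <= v <= high]


# Hypothetical topic sound volumes.
volumes = {"battery life": 18.0, "price": 17.6, "paint colour": 3.2, "seats": 0.4}
hot_by_rank = top_n_topics(volumes, n=3)
hot_by_range = topics_in_range(volumes, low=1, high=5)
```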
In summary, according to the hot topic determination method provided by the present disclosure, the posting texts and the comment texts are analyzed to determine, for each text topic, the number of posting texts and the number of comment texts, as well as a first influence weight of the posting texts on the text topic and a second influence weight of the comment texts on the text topic, the first influence weight being greater than the second influence weight; the number of posting texts, the number of comment texts, the first influence weight and the second influence weight of each text topic are substituted into a preset topic sound volume calculation equation, which calculates the accumulated value of the posting text quantization index and the comment text quantization index, and the accumulated value is determined as the topic sound volume of the corresponding text topic; finally, the text topics whose topic sound volume falls within the preset sound volume range are determined as hot topics. In this technical scheme, posting texts are distinguished from comment texts and are configured with a higher influence weight than comment texts, and hot topics are determined comprehensively from two dimensions, the number of posting texts and the number of comment texts. The importance of the texts can therefore be effectively distinguished, the influence of water army comments is reduced, and the accuracy of locating hot topics is improved.
Further, as a refinement and extension of the foregoing embodiment, in order to fully describe a specific implementation procedure of the method of the present embodiment, the present embodiment provides a specific method as shown in fig. 2, where the method includes:
Step 201, segmenting a target text into a plurality of first words, wherein the plurality of first words form a first word sequence, determining, according to the target part of speech of each first word, the second words that match a preset stop part of speech, and removing the second words from the first word sequence to obtain a second word sequence, wherein the target text comprises a plurality of posting texts and a plurality of comment texts.
For the embodiment of the present disclosure, as a possible implementation manner, the target text may be segmented into a plurality of first words based on a preset word segmentation technique, so as to obtain a first word sequence formed by the plurality of first words. The preset word segmentation technique is a technique set in advance that can segment the posting texts and comment texts uploaded by users into words, such as a word segmentation tool built on a conditional random field (CRF). In a specific application scenario, the word segmentation tool can be applied to the target text to obtain the individual first words and the first word sequence, wherein each first word is marked with its target part of speech. Specifically, after the target text is obtained, word segmentation is performed on it to generate a word sequence: each first word is an element of the sequence, and the first words are arranged in the order in which they appear in the input text, giving the first word sequence in the following format: [word 1, word 2, word 3, … word N].
Correspondingly, as a possible way to improve the efficiency of topic clustering, after the first words marked with target parts of speech are obtained by the above method, the first words can be further examined, the stop words in the first word sequence are removed, and a second word sequence containing only valid first words is obtained. The preset stop words can be modal particles, adverbs, prepositions, conjunctions and the like. Such words usually have no definite meaning of their own and only take effect within a complete sentence, such as "common" and "," and "not yet", etc. Since these words rarely express specific information on their own and these function words contribute little to distinguishing topics, they may be filtered out in advance in order to improve the efficiency of topic extraction and save storage space. The recognition and filtering of stop words can be realized based on an existing stop word vocabulary.
Illustratively, the target text is "do not know what you are speaking", and the sentence can be segmented into a first word sequence by the preset word segmentation technique: "do not, know what you are, say". Whether repeated words exist in the first word sequence is then checked; if so, the repetitions are discarded so that repeated words count as one word, and the words with a preset stop part of speech are then eliminated from the first word sequence. The preset stop words can be stored in a preset stop word dictionary: the first words obtained after segmentation are matched against the word characteristics (i.e., the preset stop part of speech) of the stop words in the dictionary, the matching words are the second words (stop words), and the second words are eliminated from the first word sequence to obtain the second word sequence.
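The de-duplication and stop-word removal just described can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the input is assumed to be already segmented and tagged (the CRF-based segmenter is not reproduced here), and the function name, the tag labels in `STOP_POS`, and the example sentence are all hypothetical.

```python
# Hypothetical part-of-speech tags treated as stop-word categories.
STOP_POS = {"particle", "adverb", "preposition", "conjunction"}


def build_second_sequence(tagged_words, stop_pos=STOP_POS):
    """Drop repeated words, then drop words whose tag is a stop part of speech."""
    seen = set()
    second_sequence = []
    for word, pos in tagged_words:
        if word in seen:          # a repeated word is kept only once
            continue
        seen.add(word)
        if pos in stop_pos:       # stop word: removed from the sequence
            continue
        second_sequence.append(word)
    return second_sequence


# Assumed output of a segmenter for an invented sentence.
first_sequence = [
    ("the", "particle"), ("battery", "noun"), ("of", "preposition"),
    ("the", "particle"), ("car", "noun"), ("charges", "verb"),
    ("very", "adverb"), ("slowly", "adverb"), ("battery", "noun"),
]
second_sequence = build_second_sequence(first_sequence)
```

The order of appearance in the input is preserved, consistent with the [word 1, word 2, … word N] format of the word sequence.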
Step 202, performing clustering of word semantics on the second word sequence, and determining the text topics corresponding to the different clustering results.
For the embodiment of the disclosure, as a possible implementation manner, a preset clustering algorithm may be used to cluster the second word sequence by word semantics, and the text topics corresponding to the different clustering results are determined. A clustering algorithm groups data according to specific rules. For the embodiment of the disclosure, the preset clustering algorithm may be the k-means clustering algorithm (k-means). The k-means algorithm is an iterative cluster analysis algorithm, and its specific implementation steps are as follows: the second word sequence comprises a plurality of words, which can be divided into K groups (each group containing a plurality of words); one word is randomly selected from each group as the center point of that group; the distance between each word and each center point is then calculated, and each word is assigned to the center point nearest to it; because the words in a group change, each group recalculates its center point, and this is repeated until the words in the groups no longer change; the center points that no longer change are the text topics.
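A compact sketch of the k-means iteration described above is given here for illustration. Several points are assumptions: the disclosure does not state how words are mapped to vectors, so the words are given invented 2-D coordinates; initial centers are drawn at random from all vectors with a fixed seed for repeatability; and an empty group simply keeps its previous center.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, seed=0, max_iter=100):
    """Cluster vectors into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]  # random initial centers
    labels = None
    for _ in range(max_iter):
        # Assign every vector to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:      # no word changed group: converged
            break
        labels = new_labels
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:               # keep the old center if a group is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Invented words with toy 2-D "semantic" coordinates.
words = ["battery", "range", "charging", "price", "discount", "loan"]
vectors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = k_means(vectors, k=2)
clusters = {}
for word, label in zip(words, labels):
    clusters.setdefault(label, []).append(word)
```

On this toy data the two well-separated groups of words end up in different clusters; each converged center would then stand for one text topic.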
Step 203, counting, for each text topic, the number of posting texts and the number of comment texts in the target text.
For embodiments of the present disclosure, the topic keywords of a text topic may be determined based on a keyword extraction algorithm, i.e., keywords are extracted using statistics of the terms in the document. As one possible implementation manner, the topic keyword of the text topic may be determined by counting word frequency, where the word frequency is the frequency with which a word occurs in the text: the word that occurs most frequently in the text topic is taken as the topic keyword of the text topic. As a possible implementation manner, cosine similarity is used to calculate the semantic feature similarity. The topic keywords and a posting text are converted into feature vectors, the similarity between them is calculated with the cosine formula to give a first semantic feature similarity, and the number of posting texts whose first semantic feature similarity is greater than a preset threshold (for example, 90%) is counted. Likewise, the topic keywords and a comment text are converted into feature vectors, the similarity between them is calculated with the cosine formula to give a second semantic feature similarity, and the number of comment texts whose second semantic feature similarity is greater than the preset threshold (for example, 90%) is counted.
Correspondingly, the specific implementation steps of this embodiment may be: determining the topic keywords of each text topic; calculating a first semantic feature similarity between the topic keywords and each posting text, and counting the number of posting texts whose first semantic feature similarity is greater than a preset threshold; and calculating a second semantic feature similarity between the topic keywords and each comment text, and counting the number of comment texts whose second semantic feature similarity is greater than the preset threshold.
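The counting step can be illustrated with the following sketch, offered under stated assumptions rather than as the disclosed implementation. The feature vectors are taken to be simple bag-of-words counts (the disclosure does not specify the vectorization), the texts are assumed to be already segmented into words, and a threshold of 0.5 is used instead of the 90% example because the toy texts are very short. All names and data are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def count_matching(keywords, texts, threshold):
    """Number of texts whose similarity to the topic keywords exceeds threshold."""
    return sum(1 for t in texts if cosine_similarity(keywords, t) > threshold)


# Invented topic keywords and pre-segmented texts.
keywords = ["battery", "range"]
posts = [["battery", "range", "test"], ["seat", "comfort"]]
comments = [["battery", "range"], ["nice", "post"], ["range", "battery", "battery"]]
post_count = count_matching(keywords, posts, threshold=0.5)
comment_count = count_matching(keywords, comments, threshold=0.5)
```

The resulting `post_count` and `comment_count` are the per-topic quantities that are later multiplied by the first and second influence weights.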
Step 204, determining a first influence weight determined by the posting text for the text topic and a second influence weight determined by the comment text for the text topic, wherein the first influence weight is greater than the second influence weight, and the sum of the first influence weight and the second influence weight is 1.
For the embodiment of the present disclosure, a history text marked with a target topic in a preset time period may be obtained first, where the preset time period is a time period preset by the system, a user may select a corresponding time period according to a requirement, and an exemplary time period preset by the system may be a time period of 3 days, 5 days, one week, one month, three months, half year, etc., and if the user needs history text data of the last 3 days, the user may select the time period as 3 days in the system setting, so that the history text data in the last 3 days may be seen. The target topics are hot topics corresponding to the historical texts, and the historical texts can comprise historical posting texts and historical comment texts. In a specific application scenario, the weight of the influence weight can be divided according to texts of different data sources (such as social software or websites and the like). For the embodiment of the disclosure, when determining the first influence weight of the posting text and the second influence weight of the comment text, a large amount of historical texts with the same sources as the text acquisition sources of the target text can be utilized to calculate the first influence weight of the posting text relative to the text topics and the second influence weight of the comment text relative to the text topics. 
In a specific application scenario, the "water army" (paid or coordinated posters) mostly operates through secondary accounts, and on some platforms such accounts do not have posting permission, so water-army content usually appears in comment texts. For target texts containing many water-army comments, the comment content is often only weakly related to the target topic. Therefore, by evaluating the importance of the historical posting texts and the historical comment texts to the target topic, the posting text can be given a larger weight and the comment text a smaller weight; for example, the first influence weight is 0.8 and the second influence weight is 0.2. Configuring the posting text with a higher influence weight than the comment text makes it possible to effectively distinguish the importance of the two kinds of text.
As one possible implementation, a first contribution value of the historical posting text to the target topic and a second contribution value of the historical comment text to the target topic may be calculated based on a statistical algorithm, Term Frequency-Inverse Document Frequency (TF-IDF); the first contribution value is then determined as the first influence weight and the second contribution value as the second influence weight. TF-IDF is a common weighting technique in information retrieval and data mining, used here to evaluate the importance (i.e., the contribution value or weight) of the historical posting texts and the historical comment texts to the target topic. The weights calculated by the TF-IDF algorithm are updated dynamically as the historical texts are updated; if no historical text has been updated, the weights need not be recalculated and the original weights are used directly. TF-IDF combines two quantities: the term frequency (TF), which refers to how often a given word appears in the document, and the inverse document frequency (IDF), which is a measure of the general importance of the word. The weight calculation process is as follows. First, according to the word frequency calculation formula, a first word frequency of the target keyword in the historical posting text and a second word frequency of the target keyword in the historical comment text are calculated.
The word frequency calculation formula is expressed as:
$TF_{i,j} = \dfrac{n_{i,j}}{\sum_{k} n_{k,j}}$
where $TF_{i,j}$ is the first word frequency / the second word frequency, $n_{i,j}$ is the number of times the target keyword $i$ appears in the historical posting text / the historical comment text $j$, and the denominator $\sum_{k} n_{k,j}$ is the sum of the keyword occurrence counts in that historical text.
Second, a first inverse document frequency (IDF) of the target keyword in the historical posting text and a second inverse document frequency of the target keyword in the historical comment text are calculated according to the inverse document frequency calculation formula.
The inverse document frequency (IDF) calculation formula is expressed as:
$IDF_i = \log \dfrac{|D|}{|\{j : t_i \in d_j\}| + 1}$
where $IDF_i$ is the first inverse document frequency / the second inverse document frequency, $|D|$ is the total number of historical posting texts / historical comment texts, and $|\{j : t_i \in d_j\}|$ is the number of those texts that contain the target keyword $i$; 1 is added so that the denominator is never zero.
Finally, the product of the first word frequency and the first inverse document frequency (IDF) is determined as the first contribution value of the historical posting text to the target topic, and the product of the second word frequency and the second inverse document frequency is determined as the second contribution value of the historical comment text to the target topic.
$\text{TF-IDF}_{i,j} = TF_{i,j} \times IDF_i$
where $\text{TF-IDF}_{i,j}$ is the first contribution value (first influence weight) / the second contribution value (second influence weight).
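The TF-IDF weight calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the TF is computed over the pooled tokens of all posting (or comment) texts, the IDF treats each individual text as one document, and the final `normalize` step, which rescales the two contribution values so that they sum to 1, is an assumption that the source does not describe. All function names and sample data are hypothetical.

```python
import math


def term_frequency(keyword, tokens):
    """TF: occurrences of the keyword divided by the total token count."""
    return tokens.count(keyword) / len(tokens) if tokens else 0.0


def inverse_document_frequency(keyword, texts):
    """IDF: log(|D| / (number of texts containing the keyword + 1))."""
    containing = sum(1 for tokens in texts if keyword in tokens)
    return math.log(len(texts) / (containing + 1))


def contribution(keyword, texts):
    """TF-IDF contribution value of one group of texts for the keyword."""
    pooled = [tok for tokens in texts for tok in tokens]
    return term_frequency(keyword, pooled) * inverse_document_frequency(keyword, texts)


def normalize(first, second):
    """Assumed step: rescale two contribution values so that they sum to 1."""
    total = first + second
    if total <= 0:
        raise ValueError("contribution values must have a positive sum")
    return first / total, second / total


# Hypothetical, already-tokenized historical texts for one target topic.
history_posts = [["battery", "range", "test"], ["paint", "color"],
                 ["seat", "comfort"], ["charging", "speed"]]
history_comments = [["battery", "is", "fine", "i", "guess"], ["nice", "car"],
                    ["first"], ["cool", "photo"]]

first_value = contribution("battery", history_posts)
second_value = contribution("battery", history_comments)
w_post, w_comment = normalize(first_value, second_value)
```

Note that with the `+ 1` in the denominator the IDF can be zero or negative when the keyword occurs in nearly every text, so a real system would need to handle that case before normalizing.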
Correspondingly, the specific implementation steps of this embodiment may be: obtaining historical texts marked with a target topic within a preset time period, wherein the historical texts come from the same acquisition source as the target text and include historical posting texts and historical comment texts; calculating a first contribution value of the historical posting texts to the target topic and a second contribution value of the historical comment texts to the target topic; and determining the first contribution value as the first influence weight and the second contribution value as the second influence weight. Using historical texts already marked with the target topic, the first influence weight of the posting text and the second influence weight of the comment text on the determination of the text topic are derived backwards from the data. For texts containing many water-army comments, the posting text thereby obtains a larger weight and the comment text a smaller weight. In this way, the reliability of the first influence weight and the second influence weight can be ensured, and the influence of water-army comments is effectively reduced.
Step 205, substituting the number of posted texts and the number of comment texts of each text topic, as well as the first influence weight and the second influence weight into a preset topic sound volume calculation equation, calculating the topic sound volume of each text topic, and determining the text topic with the corresponding topic sound volume within the preset sound volume range as a hot topic.
The preset topic sound volume calculation equation may be expressed as:
$P = Q_i w_q + A_i w_a$
where $P$ is the topic sound volume of text topic $i$, $Q_i$ is the number of posting texts of text topic $i$, $w_q$ is the first influence weight, $A_i$ is the number of comment texts of text topic $i$, and $w_a$ is the second influence weight.
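As a worked illustration of the equation, the following minimal sketch computes the topic sound volume for one topic. The default weights 0.8 and 0.2 are the example values given in the description; the function name, the validation checks, and the sample counts are assumptions added for illustration.

```python
import math


def topic_sound_volume(post_count, comment_count, w_post=0.8, w_comment=0.2):
    """P = Q_i * w_q + A_i * w_a, with the constraints stated in step 204."""
    if not math.isclose(w_post + w_comment, 1.0):
        raise ValueError("the two influence weights must sum to 1")
    if w_post <= w_comment:
        raise ValueError("the posting weight must exceed the comment weight")
    return post_count * w_post + comment_count * w_comment


# Hypothetical topic with 10 matching posts and 50 matching comments.
volume = topic_sound_volume(post_count=10, comment_count=50)
```

Here the 10 posts contribute 8 and the 50 comments contribute 10, so a flood of comments still counts, but each comment carries a quarter of the weight of a post.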
After the topic sound volume is calculated, as one possible implementation, the text topics may be ranked by topic sound volume from high to low, and a preset number of top-ranked text topics are determined as hot topics; the preset number is set in advance and may be chosen according to personal preference. As another possible implementation, a preset sound volume range may be determined, and the one or more text topics whose topic sound volume falls within that range are determined as hot topics. For example, if the preset sound volume range is [1,5], the text topics whose topic sound volume lies within [1,5] may be determined as hot topics.
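The two selection strategies in this paragraph can be sketched as follows. The function names and the sample volumes are hypothetical; the range [1,5] is the example from the text.

```python
def top_n_hot_topics(volumes, n):
    """Rank topics by sound volume, high to low, and keep the first n."""
    ranked = sorted(volumes.items(), key=lambda item: item[1], reverse=True)
    return [topic for topic, _ in ranked[:n]]


def hot_topics_in_range(volumes, low, high):
    """Keep the topics whose sound volume lies within [low, high]."""
    return [topic for topic, volume in volumes.items() if low <= volume <= high]


# Hypothetical topic sound volumes.
volumes = {"battery range": 18.0, "paint colors": 3.2, "seat comfort": 0.6}

top_two = top_n_hot_topics(volumes, n=2)
in_range = hot_topics_in_range(volumes, low=1, high=5)
```

With these sample values, the top-two strategy keeps "battery range" and "paint colors", while the range strategy keeps only "paint colors", which shows that the two implementations can select different topics from the same data.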
As a possible implementation manner, after the hot topics are determined, the hot topics within a preset time period (for example, 3 days) may be counted and displayed in a preset display form. The hot topics may be, for example, hotly discussed points about vehicle models or hotly discussed vehicle models, and the preset display form may be a vocabulary form, a website form, a word cloud form, and the like. When the preset display form is the vocabulary form, the hot topics may be sorted by topic sound volume from high to low, and the top two are displayed as words on a system page. When the preset display form is the website form, the hot topics may be sorted by topic sound volume from high to low, and the website corresponding to the hot topic with the highest topic sound volume is displayed on the system page. When the preset display form is the word cloud form, all hot topics may be displayed as the data content of the word cloud; the higher the topic sound volume, the larger the share of space the topic occupies in the word cloud, and each hot topic may additionally be distinguished by a different color.
As a possible implementation manner, hot topics may be pushed to a social group at a fixed time (for example, 9 a.m. every day), and whether a hot topic meets a preset topic abnormal condition is judged in real time; if so, early warning prompt information is output in the group. The preset topic abnormal condition may be that the sound volume of the current hot topic exceeds N times a preset average sound volume (the preset average is user-defined and may be a daily, weekly, or monthly average), or that a sensitive topic is involved. For example, when the sound volume of a hot topic exceeds 5 times the weekly average, early warning prompt information (such as a message prompt, an alarm prompt, or a pop-up prompt) may be sent in the group, and at the same time a temporary group is established and the relevant personnel are added to it to follow up promptly, so that the issue can be addressed at the earliest opportunity.
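The abnormal-condition check can be sketched as a simple predicate. The function name, its parameters, and the treatment of a zero baseline are assumptions; the multiple of 5 and the weekly average are the example values from the text.

```python
def is_abnormal(current_volume, baseline_average, multiple=5.0, is_sensitive=False):
    """Return True if the topic involves a sensitive subject or its sound
    volume exceeds `multiple` times the preset average sound volume."""
    if is_sensitive:
        return True
    # Assumed behavior: without a positive baseline, no volume alert is raised.
    if baseline_average <= 0:
        return False
    return current_volume > multiple * baseline_average


# Hypothetical values: weekly average volume of 4.0, current volume of 25.0.
alert = is_abnormal(current_volume=25.0, baseline_average=4.0)
no_alert = is_abnormal(current_volume=15.0, baseline_average=4.0)
```

How a sensitive topic is detected is not described in the source, so it is passed in here as a precomputed flag.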
In summary, the hot topic determination method provided by the embodiment of the disclosure distinguishes the posting text from the comment text, configures the posting text with a higher influence weight than the comment text, and determines hot topics comprehensively from multiple dimensions by combining the number of posting texts and the number of comment texts. In this way, the importance of the texts can be effectively distinguished, the influence of water-army comments is reduced, and the accuracy of locating hot topics is improved.
Based on the specific implementation of the method shown in fig. 1-2, this embodiment provides a hot topic determining apparatus, as shown in fig. 3, including: a first determining module 31, a statistics module 32, a second determining module 33, a calculation module 34, a third determining module 35;
a first determining module 31 configured to determine a text topic contained in a target text, the target text including a plurality of posting texts and a plurality of comment texts;
a statistics module 32 configured to count the number of posting texts and the number of comment texts for each text topic in the target text;
a second determining module 33 configured to determine a first impact weight determined by the posting text for the text topic and a second impact weight determined by the comment text for the text topic, the first impact weight being greater than the second impact weight, the sum of the first impact weight and the second impact weight being 1;
the calculating module 34 is configured to substitute the number of posted texts and the number of comment texts of each text topic, and the first influence weight and the second influence weight into a preset topic sound volume calculating equation, calculate the topic sound volume of each text topic, wherein the preset topic sound volume calculating equation is used for calculating the accumulated value of the quantisation index of the posted texts and the quantisation index of the comment texts, the accumulated value is determined as the topic sound volume, the topic sound volume is used for representing topic discussion heat, the quantisation index of the posted texts is the product of the number of the posted texts and the first influence weight, and the quantisation index of the comment texts is the product of the number of the comment texts and the second influence weight;
The third determining module 35 is configured to determine a text topic whose corresponding topic sound volume is within a preset sound volume range as a hot topic.
In a specific application scenario, the first determining module 31 is specifically configured to: segment the target text into a plurality of first words, the plurality of first words forming a first word sequence; determine, according to the part of speech of each first word, the second words whose part of speech matches a preset stop-word part of speech, and remove the second words from the first word sequence to obtain a second word sequence; and cluster the second word sequence by word semantics and determine the text topics corresponding to the different clustering results.
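The stop-word removal performed by the first determining module can be illustrated with a minimal sketch. It assumes the word segmentation and part-of-speech tagging have already been done by some external tool, so the input is a list of (word, part-of-speech) pairs; the tag names, the stop-word tag set, and the function name are hypothetical, and the subsequent semantic clustering is not shown.

```python
# Hypothetical set of part-of-speech tags treated as stop-word categories.
STOP_POS = {"determiner", "conjunction", "punctuation"}


def remove_stop_words(tagged_words, stop_pos=STOP_POS):
    """Drop every word whose part of speech matches a stop-word category,
    turning the first word sequence into the second word sequence."""
    return [word for word, pos in tagged_words if pos not in stop_pos]


# First word sequence, already segmented and tagged (sample data).
first_sequence = [
    ("the", "determiner"), ("battery", "noun"), ("drains", "verb"),
    ("and", "conjunction"), ("heats", "verb"), (".", "punctuation"),
]

second_sequence = remove_stop_words(first_sequence)
```

Filtering by part of speech rather than by a fixed word list means that new function words are removed without having to be enumerated in advance.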
In a specific application scenario, the statistics module 32 is specifically configured to determine a topic keyword of each text topic, calculate a first semantic feature similarity of the topic keyword and each of the plurality of posting texts, count the number of posting texts in the plurality of posting texts, where the first semantic feature similarity is greater than a preset threshold, calculate a second semantic feature similarity of the topic keyword and each of the plurality of comment texts, and count the number of comment texts in the plurality of comment texts, where the second semantic feature similarity is greater than the preset threshold.
In a specific application scenario, the second determining module 33 is specifically configured to obtain a history text marked with a target topic within a preset period of time, where the history text is the same as a text obtaining source of the target text, and the history text includes a history posting text and a history comment text, calculate a first contribution value of the history posting text corresponding to the target topic, and a second contribution value of the history comment text corresponding to the target topic, determine the first contribution value as a first impact weight, and determine the second contribution value as a second impact weight.
In a specific application scenario, the preset topic sound volume calculation equation is expressed as:
$P = Q_i w_q + A_i w_a$
where $P$ is the topic sound volume of text topic $i$, $Q_i$ is the number of posting texts of text topic $i$, $w_q$ is the first influence weight, $A_i$ is the number of comment texts of text topic $i$, and $w_a$ is the second influence weight.
In a specific application scenario, as shown in fig. 4, the apparatus further includes: a display module 36;
the display module 36 is configured to count hot topics within a preset time period and display the hot topics in a preset display form.
In a specific application scenario, the apparatus further includes: a judgment module 37;
The judging module 37 is configured to judge whether the hot topic meets the preset topic abnormal condition, if yes, the early warning prompt information is output.
It should be noted that, other corresponding descriptions of each functional unit related to the hot topic determination device provided in this embodiment may refer to corresponding descriptions in fig. 1-2, and are not described herein again.
Based on the above method shown in fig. 1-2, correspondingly, the present embodiment further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the above method shown in fig. 1-2.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to execute the method of each implementation scenario of the present application.
In order to achieve the above object, based on the method shown in fig. 1-2 and the virtual device embodiment shown in fig. 3-4, an embodiment of the present application further provides an electronic device, which may be configured on an end side of a vehicle (such as an electric automobile), where the device includes a storage medium and a processor; a storage medium storing a computer program; a processor for executing a computer program to implement the method as described above with reference to fig. 1-2.
Optionally, the entity device may further include a user interface, a network interface, a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, and so on. The user interface may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), etc.
It will be appreciated by those skilled in the art that the physical device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.
The storage medium may also include an operating system, a network communication module. The operating system is a program that manages the physical device hardware and software resources described above, supporting the execution of information handling programs and other software and/or programs. The network communication module is used for realizing communication among all components in the storage medium and communication with other hardware and software in the information processing entity equipment.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general hardware platforms, or may be implemented by hardware. By applying the scheme of the embodiment, the posting text and the comment text are respectively analyzed, the posting text is configured with the influence weight value higher than that of the comment text, and the hot topics are comprehensively determined from a plurality of dimensions by combining the number of the posting texts and the number of the comment texts, so that the importance of the texts can be effectively distinguished, the influence of the water army comments is reduced, and the positioning accuracy of the hot topics is improved.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
The foregoing is merely exemplary of embodiments of the present application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for determining a hot topic, comprising:
determining a text topic contained in a target text, wherein the target text comprises a plurality of posting texts and a plurality of comment texts;
counting the number of posting texts and the number of comment texts aiming at each text topic in the target text;
determining a first influence weight determined by the posting text for a text topic and a second influence weight determined by the comment text for the text topic, wherein the first influence weight is greater than the second influence weight, and the sum of the first influence weight and the second influence weight is 1;
substituting the posting text quantity and comment text quantity of each text topic and the first influence weight and the second influence weight into a preset topic sound quantity calculation equation to calculate the topic sound quantity of each text topic, wherein the preset topic sound quantity calculation equation is used for calculating accumulated values of posting text quantization indexes and comment text quantization indexes, the accumulated values are determined to be topic sound quantities, the topic sound quantities are used for representing topic discussion heat, the posting text quantization indexes are products of the posting text quantity and the first influence weight, and the comment text quantization indexes are products of the comment text quantity and the second influence weight;
And determining a text topic corresponding to the topic sound volume within a preset topic sound volume range as a hot topic.
2. The method of claim 1, wherein the determining the text topic contained in the target text comprises:
segmenting a target text into a plurality of first words, wherein the plurality of first words form a first word sequence;
determining, according to the part of speech of each first word, a second word whose part of speech matches a preset stop-word part of speech, and removing the second word from the first word sequence to obtain a second word sequence;
and carrying out clustering division of word semantics on the second word sequence, and determining text topics corresponding to different clustering division results.
3. The method of claim 1, wherein the counting the number of posting texts and the number of comment texts for each of the text topics in the target text comprises:
determining topic keywords of each text topic;
calculating first semantic feature similarity of the topic keyword and each of the plurality of posting texts, and counting the number of posting texts in the plurality of posting texts, wherein the first semantic feature similarity is larger than a preset threshold;
Calculating the second semantic feature similarity of the topic keyword and each comment text in the comment texts, and counting the number of comment texts, corresponding to the second semantic feature similarity, in the comment texts, greater than a preset threshold value.
4. The method of claim 1, wherein the determining a first impact weight for text topic determination for the posting text and a second impact weight for text topic determination for the comment text comprises:
acquiring a history text marked with a target topic in a preset time period, wherein the history text and the text of the target text are the same in acquisition source, and the history text comprises a history posting text and a history comment text;
calculating a first contribution value of the historical posting text corresponding to the target topic and a second contribution value of the historical comment text corresponding to the target topic;
the first contribution value is determined as a first impact weight and the second contribution value is determined as a second impact weight.
5. The method of claim 1, wherein the preset topic sound volume calculation equation is characterized by:
$P = Q_i w_q + A_i w_a$
wherein $P$ is the topic sound volume of text topic $i$, $Q_i$ is the number of posting texts of text topic $i$, $w_q$ is the first influence weight, $A_i$ is the number of comment texts of text topic $i$, and $w_a$ is the second influence weight.
6. The method of claim 1, wherein after determining a text topic corresponding to the topic sound volume within a preset sound volume range as a hot topic, the method further comprises:
counting hot topics within a preset time period, and displaying according to a preset display form; and
judging whether the hot topic accords with a preset topic abnormal condition, and if so, outputting early warning prompt information.
7. A hot topic determination apparatus, comprising:
the first determining module is used for determining text topics contained in target texts, wherein the target texts comprise a plurality of posting texts and a plurality of comment texts;
the statistics module is used for counting the number of posting texts and the number of comment texts aiming at each text topic in the target text;
a second determining module, configured to determine a first impact weight determined by the posting text for a text topic and a second impact weight determined by the comment text for the text topic, where the first impact weight is greater than the second impact weight, and a sum of the first impact weight and the second impact weight is 1;
The calculation module is used for substituting the posting text quantity and comment text quantity of each text topic, the first influence weight and the second influence weight into a preset topic sound quantity calculation equation, calculating the topic sound quantity of each text topic, wherein the preset topic sound quantity calculation equation is used for calculating accumulated values of posting text quantization indexes and comment text quantization indexes, determining the accumulated values as topic sound quantities, the topic sound quantities are used for representing topic discussion heat, the posting text quantization indexes are products of the posting text quantity and the first influence weight, and the comment text quantization indexes are products of the comment text quantity and the second influence weight;
and the third determining module is used for determining a text topic corresponding to the topic sound volume in the preset topic sound volume range as a hot topic.
8. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
9. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-6.
10. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-6.
CN202310170224.6A 2023-02-14 2023-02-14 Hot topic determination method, device, electronic equipment and storage medium Pending CN117076785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310170224.6A CN117076785A (en) 2023-02-14 2023-02-14 Hot topic determination method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117076785A true CN117076785A (en) 2023-11-17

Family

ID=88702962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310170224.6A Pending CN117076785A (en) 2023-02-14 2023-02-14 Hot topic determination method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117076785A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination