CN111046650A - Network public opinion automatic identification technology based on element co-occurrence - Google Patents

Info

Publication number
CN111046650A
Authority
CN
China
Prior art keywords
words
occurrence
word
feature
public
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911248914.9A
Other languages
Chinese (zh)
Inventor
程南昌
宋康
邹煜
滕永林
杨柳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China
Priority to CN201911248914.9A
Publication of CN111046650A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention discloses an automatic network public opinion identification technology based on element co-occurrence, comprising an implementation method and a weighting algorithm. The implementation method comprises the following steps: S101: collect 9,436 corpus texts, denoted X, totalling 12.5 million characters, of which 1,836 texts related to public opinion (more than 2.5 million characters) are denoted Y and 7,600 texts unrelated to public opinion (about 10 million characters) are denoted Z; S102: segment the corpus with the automatic word segmentation system CUCBst and compute word frequency statistics; S103: divide the words in X, Y and Z into five levels by frequency; S104: compare, band by band, the words in Z with the words in the same frequency band in X. The weighting algorithm comprises the following steps: S201: first calculate the weights of the feature words, then determine the co-occurrence of the three types of feature words; S202: combine these with the positions of the feature words and the length of the text, and perform a weighted calculation over the four factors to obtain a text score.

Description

Network public opinion automatic identification technology based on element co-occurrence
Technical Field
The invention relates to the technical field of public opinion monitoring, in particular to an automatic network public opinion identification technology based on element co-occurrence.
Background
Research on public opinion detection has focused mainly on topic detection, for which a dedicated international evaluation campaign, Topic Detection and Tracking (TDT), has been held. In TDT, a topic is a set of stories consisting of "a seminal event or activity, along with all directly related events and activities". The task of topic detection is to detect and organize topics that are not known to the system in advance. The technology mainly relies on statistical clustering algorithms such as K-Means, centroid clustering and hierarchical clustering. Because clustering is computationally expensive, systems that detect public-opinion-related topics directly by clustering are rare when massive numbers of network documents are involved.
Although the TDT evaluations ended in 2004, related research has continued. In recent years, the literature has proposed new event detection methods based on topic segmentation and on lemma re-evaluation. New event detection can be used to find the first report of an emergency such as 9/11, and is therefore related to public opinion detection. One study adds subtopic information about the topic under test to the new-event decision; for example, a topic with more subtopics is less likely to be a new event than a topic with fewer subtopics. Another study finds that lemmas of different parts of speech differ in sensitivity across news categories, so lemma weights must be re-evaluated according to the specific category of the news during calculation. The TDT evaluation corpora used in these studies are classified in detail by topic, whereas real network documents carry no such category or subtopic information. Another study uses keyword-based search to find emergencies in Sina blogs, restricting the search results by time period and domain name to reduce redundancy; this is similar to the batch keyword method mentioned earlier. Yet another study identifies hot sentences through hot words and then clusters the hot sentences to identify hot topics. Hot topics are highly likely to be public opinion and are therefore relevant to the present research. Although this approach reduces the cost of clustering from document level to sentence level, recognizing hot words and hot sentences still consumes a large amount of computation.
In summary, the current shortcomings of public opinion detection come down to three points:
(1) domain specificity is weak: existing systems are mostly oriented toward society and politics as a whole;
(2) methods based on batch keywords or public opinion dictionaries dominate, and the defects of these methods are mentioned in the introduction part;
(3) statistical clustering and other new methods remain largely theoretical and are uncommon in actual public opinion detection.
Disclosure of Invention
The invention aims to provide an automatic network public opinion identification technology based on element co-occurrence. Starting from the essence of public opinion, the three main elements that constitute it (subject, object and emotional tendency) are each represented by one type of feature word, and the three types of feature words are dynamically combined according to combination and aggregation relations, so that topics related to public opinion in a given field can be generated and public opinion information in that field can be effectively identified. The method has been applied in practice in a language and script public opinion monitoring system and a higher education public opinion monitoring system, achieving accuracies of 92% and 93% respectively, thereby solving the problems set out in the background.
In order to achieve the purpose, the invention provides the following technical scheme:
The automatic network public opinion identification technology based on element co-occurrence comprises an implementation method and a weighting algorithm, wherein the implementation method comprises the following steps:
S101: collect 9,436 corpus texts, denoted X, totalling 12.5 million characters, of which 1,836 texts related to public opinion (more than 2.5 million characters) are denoted Y and 7,600 texts unrelated to public opinion (about 10 million characters) are denoted Z;
S102: segment the corpus with the automatic word segmentation system CUCBst and compute word frequency statistics;
S103: divide the words in X, Y and Z into five levels by frequency;
S104: compare, band by band, the words in Z with the words in the same frequency band in X, with the aim of extracting the feature words of language and script public opinion;
the weighting algorithm comprises the following steps:
S201: first calculate the weights of the feature words, then determine the co-occurrence of the three types of feature words;
S202: combine these with the positions of the feature words and the length of the text, perform a weighted calculation over the four factors to obtain a text score, and judge that the text belongs to language and script public opinion when the score reaches a certain threshold.
Further, the five levels in S103 are: level 1 (frequency of 1000 or more), level 2 (500 to 999), level 3 (100 to 499), level 4 (5 to 99), and level 5 (1 to 4).
Further, the weighting algorithm further comprises calculating feature word weights.
Further, the factors considered by the weighting algorithm include feature word weights, co-occurrence conditions among the three types of feature words, feature word positions and text lengths.
Furthermore, the quality of the feature word set determines the accuracy and recall rate of public opinion information detection, and in order to ensure the quality of the feature word set, three types of feature words which are automatically extracted need to be manually confirmed item by item.
Compared with the prior art, the invention has the following beneficial effects: starting from the essence of public opinion, the three main elements that constitute it (subject, object and emotional tendency) are each represented by one type of feature word, and the three types of feature words are dynamically combined according to combination and aggregation relations, so that topics related to public opinion in a given field can be generated and public opinion information in that field can be effectively identified. The method has been applied in practice in a language and script public opinion monitoring system and a higher education public opinion monitoring system, with accuracies of 92% and 93% respectively.
Drawings
FIG. 1 is a diagram of the three types of feature words of language and script public opinion and their relationships according to the present invention;
FIG. 2 is a flow chart of the extraction of the three types of feature words according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Public opinion, as defined in the literature, is the sum of the emotions, wishes, attitudes and opinions that the public, made up of individuals and various social groups, holds toward public matters that concern them or are closely tied to their interests within a certain historical stage and social space. It follows that public opinion consists of three basic elements: the subject (the public), the object (public matters of all kinds), and the emotional tendency (the interwoven sum of emotions, wishes, attitudes and opinions). The element co-occurrence method starts from this essence: the three elements of public opinion are represented by three types of feature words, each type standing for one element, and the three types can be dynamically combined and collocated to generate topics related to public opinion in a given field. In the field of language and script, for example, public opinion events such as the "simplified versus traditional characters debate", "defending dialects" and the "letter word controversy" may occur. Once the three elements of public opinion are represented by feature words, their relationship is as shown in FIG. 1.
As shown in FIG. 1, language and script public opinion takes the form: subjects such as experts and teachers hold opinions or attitudes of opposition, rejection or approval toward objects such as Mandarin, traditional Chinese characters, dialects and letter words. The feature words for the three elements "subject", "object" and "emotional tendency" are extracted automatically from existing corpora or summarized from experience. The three types of feature words each play their own role and can be dynamically combined; this gives them great expressive power, allowing them to cover all public opinion information likely to appear in the field of language and script while excluding most non-public-opinion information. The theoretical basis is Saussure's theory of combination (syntagmatic) and aggregation (paradigmatic) relations.
The Swiss linguist Saussure held that in a language state everything is based on relations. At its core are syntagmatic relations and associative relations, that is, combination relations and aggregation relations. A combination relation is the horizontal, linear relation among the language units that appear together in speech; an aggregation relation is the vertical relation among units that can appear in the same position and perform the same function in the language system. According to Saussure's theory, the three types of feature words can be dynamically combined and collocated to generate different topics. For example, the combination relation yields: teachers promote Mandarin, experts reject dialects, the media misuse letter words, the public praise traditional characters, and so on; the aggregation relation yields: teachers promote Mandarin, experts promote Mandarin, the media promote Mandarin, the public promote Mandarin, and so on. The element co-occurrence method simulates the domain knowledge lexicon in the human brain (aggregation) together with the process of understanding and expressing objective things (combination). It has a strong topic generation capacity and can effectively identify every topic it is able to generate. If the three types of feature word sets can be established for the public opinion of a given field, the public opinion of that field can be effectively identified. The topic generation capacity of the method is latent: when feature words co-occur in a language segment, other words unrelated to the feature words are automatically ignored, and the segment is matched against the dynamically generated topic.
For example, from the language segment "some post-90s students like traditional Chinese characters", the topic "students like traditional Chinese characters" can be identified by ignoring the other words.
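The matching behaviour described above can be illustrated with a minimal Python sketch. It assumes the segment has already been word-segmented, and the three small word lists (`SUBJECT_ELEMENT`, `OBJECT_ELEMENT`, `TENDENCY_ELEMENT`) are hypothetical stand-ins for the real feature word sets; none of these names come from the patent.

```python
# Hypothetical mini word lists, one per public opinion element.
SUBJECT_ELEMENT = {"students", "teachers", "experts"}            # who holds the opinion
OBJECT_ELEMENT = {"traditional Chinese characters", "Mandarin"}  # what it is about
TENDENCY_ELEMENT = {"like", "reject", "promote"}                 # emotional tendency


def matched_elements(tokens):
    """Keep only the feature words of each element; all other tokens are ignored."""
    return {
        "subject": [t for t in tokens if t in SUBJECT_ELEMENT],
        "tendency": [t for t in tokens if t in TENDENCY_ELEMENT],
        "object": [t for t in tokens if t in OBJECT_ELEMENT],
    }


# Assumed output of a word segmenter for the example segment.
tokens = ["some", "post-90s", "students", "like", "traditional Chinese characters"]
hits = matched_elements(tokens)
topic = " ".join(hits["subject"] + hits["tendency"] + hits["object"])
print(topic)
```

In this toy case the non-feature tokens "some" and "post-90s" are dropped and the remaining feature words form the topic.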
From the perspective of public opinion detection, the feature words corresponding to the object are the most important: only when words related to language and script appear in a text is it meaningful to ask whether the text belongs to language and script public opinion, so these words are called "subject words" (in the sense of topic words, not to be confused with the subject element). Next come the feature words carrying emotional tendency, called "emotion words". Third are the feature words corresponding to the subject, which is generally people, such as students, parents and teachers. In addition, public opinion needs a certain spatio-temporal background. The corresponding feature words, such as classroom and school, play a role in public opinion detection, and some can even stand in for the subject, as in "the school promotes Mandarin", which brings them close to the feature words of the subject. The feature words for people and for background are therefore merged into one category, the "background words". Among the three types of feature words, no single type appearing alone can directly constitute public opinion; only when two or more types co-occur can a topic be public opinion. For this reason the method is called the "element co-occurrence" method.
The element co-occurrence method achieves public opinion detection by building a lexical knowledge system related to public opinion in a given field. It focuses on combinations of the three basic elements of public opinion rather than on a single point, which gives it great expressive power, and it differs in essence from conventional detection methods based on batch keywords or public opinion dictionaries. Those methods are one-dimensional: they search for a single point such as "demolition incident", "demolition bloodshed case" or "violent terrorist incident". The element co-occurrence method is three-dimensional: three types of feature words combine and co-occur to form different topics. Batch keyword and public opinion dictionary methods also consider co-occurrence, but their co-occurrence is bound to specific words, whereas all elements of the element co-occurrence method can be dynamically combined and collocated, giving it a strong topic generation capacity. This dynamic combination also gives a public opinion monitoring system an early warning function: it can discover previously unknown topics in real time.
The automatic network public opinion identification technology based on element co-occurrence comprises an implementation method and a weighting algorithm. The implementation method first extracts the three types of feature words: the precondition for applying the element co-occurrence method is to find the feature word sets corresponding to the three elements of public opinion. The feature words can be summarized manually or obtained by automatic search.
The implementation method comprises the following steps:
S101: collect 9,436 corpus texts (this collection is hereinafter denoted X), totalling 12.5 million characters, of which 1,836 texts are related to public opinion (hereinafter denoted Y; more than 2.5 million characters) and 7,600 texts are unrelated to public opinion (hereinafter denoted Z; about 10 million characters);
S102: segment the corpus with the automatic word segmentation system CUCBst and compute word frequency statistics;
S103: divide the words in X, Y and Z into five levels by frequency: level 1 (frequency of 1000 or more), level 2 (500 to 999), level 3 (100 to 499), level 4 (5 to 99), level 5 (1 to 4);
S104: compare, band by band, the words in Z with the words in the same frequency band in X, with the aim of extracting the feature words of language and script public opinion. Take the word "language" as an example: it occurs 7,161 times in X, which makes it a level 1 word, but only 62 times in Z, which makes it a level 4 word. If the comparison were made without frequency bands, words characteristic of language and script public opinion could not be extracted.
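Steps S103 and S104 can be sketched in Python as follows. The level boundaries are those given in S103. The comparison rule, keeping a word when its level in X differs from its level in Z, is only one plausible reading of "comparing within the same frequency band"; the source does not spell out the exact criterion, and the function names and the second sample word are hypothetical.

```python
def frequency_level(count):
    """Map a raw word frequency to the five levels of step S103."""
    if count >= 1000:
        return 1
    if count >= 500:
        return 2
    if count >= 100:
        return 3
    if count >= 5:
        return 4
    if count >= 1:
        return 5
    return None  # the word does not occur at all


def candidates_by_level(freq_x, freq_z):
    """Assumed rule: for each level, keep the words that sit at that level
    in X but do not sit at the same level in Z."""
    result = {level: set() for level in range(1, 6)}
    for word, count in freq_x.items():
        level = frequency_level(count)
        if level is not None and frequency_level(freq_z.get(word, 0)) != level:
            result[level].add(word)
    return result


# "language" uses the counts quoted in S104; "the" is a made-up common word.
freq_x = {"language": 7161, "the": 50000}
freq_z = {"language": 62, "the": 40000}
candidates = candidates_by_level(freq_x, freq_z)
print(candidates[1])
```

Here "language" is level 1 in X but level 4 in Z, so it surfaces as a candidate, while a word that is equally frequent in both collections does not.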
The extracted entries are then classified by comparison: subject words are identified against a dictionary of "linguistic nouns"; emotion words are identified against the emotion dictionary compiled by Yangjiang; entries that fall into neither category are automatically classified as background words. Taking the level 1 words in X and Z as an example, the extraction process of the feature words is shown in FIG. 2.
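The classification rule in the paragraph above amounts to two dictionary lookups with a fallback. A minimal Python sketch follows; the two reference sets are tiny hypothetical stand-ins for the real dictionaries, and the function name is an assumption.

```python
def classify_candidates(candidates, linguistic_terms, emotion_lexicon):
    """Sort candidate words into subject, emotion and background words.

    A word found in the linguistic-terms dictionary is a subject word;
    otherwise a word found in the emotion lexicon is an emotion word;
    everything else defaults to a background word.
    """
    classes = {"subject": set(), "emotion": set(), "background": set()}
    for word in candidates:
        if word in linguistic_terms:
            classes["subject"].add(word)
        elif word in emotion_lexicon:
            classes["emotion"].add(word)
        else:
            classes["background"].add(word)
    return classes


# Hypothetical reference dictionaries and candidates.
linguistic_terms = {"dialect", "Mandarin", "syllable"}
emotion_lexicon = {"reject", "praise"}
classes = classify_candidates(
    {"dialect", "reject", "teacher", "school"}, linguistic_terms, emotion_lexicon
)
print(classes)
```

Because the background class is a catch-all, the manual item-by-item confirmation that the text calls for matters most for that class.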
The quality of the feature word set determines the accuracy rate and the recall rate of the public opinion information detection. In order to ensure the quality of the feature word set, the three types of feature words which are automatically extracted need to be manually confirmed item by item.
The weighting algorithm is as follows. Successful extraction of the feature word sets is a precondition for the element co-occurrence method. Element co-occurrence is the main factor in judging public opinion, but not the only one, and the final judgment can only be made after a weighted calculation that includes other factors. First the weights of the feature words are calculated; then, on the basis of the co-occurrence of the three types of feature words, the positions of the feature words and the length of the text are taken into account, and a weighted calculation over the four factors yields the score of the text. When the score reaches a certain threshold, the text is judged to belong to language and script public opinion.
Calculating feature word weights: the importance of a word in a text set is usually expressed by its term frequency-inverse document frequency (TF-IDF) value. TF-IDF holds that the importance of a word increases in proportion to the number of times it appears in a document, but decreases in inverse proportion to the number of texts in the corpus that contain it; that is, distinctive words that appear in only a few documents are weighted more heavily than words that appear in many documents. The shortcoming of TF-IDF is also evident: it underestimates the role of words that occur frequently within one class, although such words represent the textual characteristics of that class and should be given higher weight. The present invention therefore takes the normalized usage rate as the key quantitative criterion, because within a text set, the higher the usage rate of the feature words that appear in a text, the more likely the text is to belong to public opinion. For example, a text in which feature words such as "Chinese" or "dialect" appear is more likely to belong to language and script public opinion than one in which "tone" or "syllable" appears.
Based on the above consideration, the invention determines the weight of a feature word from its normalized usage rate: the higher the usage rate, the higher the weight. Analysis of the usage rates shows that high-usage feature words generally have a usage rate of 0.01 or more, medium-usage feature words lie between 0.001 and 0.01, and low-usage feature words lie below 0.001. Accordingly, the invention sets the feature word weights at three levels, decreasing by 1 per level, with weights of 3, 2 and 1 respectively; for example, the word "language" has weight 3 and the word "book" has weight 1. Table 1 shows the three types of feature words and their normalized usage rates, with 10 representative words for each type.
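The three weight levels can be written as a small Python function. The thresholds 0.01 and 0.001 come from the text; treating a usage rate of exactly 0.001 as medium is an assumption, since the text only says "between 0.001 and 0.01", and the function name is hypothetical.

```python
def feature_word_weight(usage_rate):
    """Map a normalized usage rate to one of the three weight levels."""
    if usage_rate >= 0.01:
        return 3  # high-usage feature word
    if usage_rate >= 0.001:
        return 2  # medium-usage feature word (boundary handling is assumed)
    return 1      # low-usage feature word


# Made-up usage rates, one per level.
for rate in (0.02, 0.005, 0.0005):
    print(rate, feature_word_weight(rate))
```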
TABLE 1 Feature words and their normalized usage rates
[Table 1 is provided as an image in the original publication.]
The calculation formula of the normalized usage rate is as follows:
$$U_i = \frac{F_i \times D_i}{\sum_{j \in V} F_j \times D_j} \qquad (1)$$
where U_i is the normalized usage rate of word i, F represents the frequency of the word, D represents its distribution ratio, the denominator is the normalization term, and V represents the set of all items of the same kind under examination (all word types).
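As a rough illustration, the normalized usage rate can be computed as below. The original formula is only available as an image, so this sketch assumes the common form in which the product of frequency F and distribution ratio D for one word is divided by the sum of the same product over all words in V. The function name and the sample numbers are hypothetical.

```python
def normalized_usage_rates(frequency, distribution):
    """Assumed form: U_i = F_i * D_i / sum over j in V of (F_j * D_j).

    frequency    -- word -> number of occurrences (F)
    distribution -- word -> share of texts that contain the word (D)
    """
    products = {w: frequency[w] * distribution[w] for w in frequency}
    total = sum(products.values())
    if total == 0:
        return {w: 0.0 for w in frequency}
    return {w: p / total for w, p in products.items()}


# Made-up counts: "dialect" is frequent and widespread, "syllable" is not.
rates = normalized_usage_rates(
    {"dialect": 3, "syllable": 1},
    {"dialect": 0.5, "syllable": 0.5},
)
print(rates)
```

Because of the normalization term, the rates over V sum to 1, which is what makes the fixed thresholds for high, medium and low usage comparable across word lists.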
TABLE 2 Co-occurrence of the three types of feature words in clauses
[Table 2 is provided as an image in the original publication.]
Table 2 shows that in example sentence 1 all three types of feature words co-occur, so it can be judged to be language and script public opinion; in example sentence 2 a subject word and an emotion word co-occur, so it can basically be judged to be language and script public opinion; in example sentence 3 a subject word and a background word co-occur. That sentence may be public opinion information about the international promotion of Chinese, but it may also be merely an introduction to the Teaching Chinese as a Foreign Language major at some school, so it cannot be directly judged to be language and script public opinion information.
Generally, the closer the elements are to one another, the tighter the syntactic and semantic relations among them, and the greater the likelihood that they belong to a public-opinion-related topic. In the three language segments listed in Table 2 the co-occurrence distances between feature words are small, all within one clause, but more often the three types of feature words co-occur within a sentence or a paragraph. It is therefore necessary to determine which co-occurrence distance among the three types of feature words gives the best recognition. The invention divides the co-occurrence distance into four levels, namely whole text, paragraph, sentence and clause, and compares them; the comparison requires a weighting algorithm.
Besides co-occurrence, the position of a feature word in the text and the length of the text are also factors that the weighting algorithm must consider. As regards position, the invention considers only two cases, title and body; feature words appearing in the title and in the body carry different weights. As regards text length, the longer the text, the higher its score is likely to be, so a constraint is needed; the invention uses the average length of the texts in Y as that constraint.
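The three co-occurrence cases discussed for Table 2 can be told apart with a short Python check. The sketch assumes a clause that has already been word-segmented; the word lists, the sample clause and the function name are hypothetical.

```python
def cooccurrence_pattern(tokens, subject_words, emotion_words, background_words):
    """Report which of the three co-occurrence cases a clause falls into."""
    has_subject = any(t in subject_words for t in tokens)
    has_emotion = any(t in emotion_words for t in tokens)
    has_background = any(t in background_words for t in tokens)
    if has_subject and has_emotion and has_background:
        return "subject+emotion+background"  # can be judged public opinion
    if has_subject and has_emotion:
        return "subject+emotion"             # basically public opinion
    if has_subject and has_background:
        return "subject+background"          # cannot be judged directly
    return None                              # no qualifying co-occurrence


# Hypothetical word lists and clause.
subject_words = {"dialect", "Mandarin"}
emotion_words = {"reject", "praise"}
background_words = {"experts", "school"}
pattern = cooccurrence_pattern(
    ["experts", "reject", "dialect"], subject_words, emotion_words, background_words
)
print(pattern)
```

Every reported pattern contains a subject word, which mirrors the earlier point that the words tied to the object of public opinion matter most.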
To summarize, the weighting algorithm takes four factors into account: feature word weight, co-occurrence among the three types of feature words, feature word position, and text length. The algorithm must segment the text according to the co-occurrence distance between feature words. As mentioned above, the invention divides the co-occurrence distance into the four levels of whole text, paragraph, sentence and clause; this section describes the weighting algorithm using the sentence level as an example. First, the text is cut into sentences, using "。", "？" and "！" as boundaries; the sentence score is given by formula (2).
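Sentence-level segmentation can be sketched with Python's standard `re` module. The sketch assumes the sentence boundaries are the Chinese full stop, question mark and exclamation mark; the sample text and the function name are illustrative only.

```python
import re

# Assumed sentence terminators: Chinese full stop, question mark, exclamation mark.
SENTENCE_BOUNDARY = re.compile("[。？！]")


def split_sentences(text):
    """Cut a text into sentences and drop empty fragments."""
    return [part.strip() for part in SENTENCE_BOUNDARY.split(text) if part.strip()]


# Illustrative text made of three short sentences.
sentences = split_sentences("教师推广普通话。专家排斥方言？媒体滥用字母词！")
print(len(sentences))
```

Cutting at a different co-occurrence level (paragraph or clause) would only change the boundary pattern, for example splitting on line breaks or adding commas to the character class.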
[Formula (2) is provided as an image in the original publication.]
Sen_i denotes the score of sentence i; a, b and c each denote a word in one of the three feature word lists; F denotes the frequency of the word, U its weight, and P its position score. The score of a feature word from a given list in the sentence equals its frequency in the sentence multiplied by its weight, plus its position score. G_i is the co-occurrence score of the feature words: co-occurrence of all three types scores highest, subject word + emotion word comes next, and subject word + background word scores lowest.
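A sentence score along these lines might be computed as in the following Python sketch. The per-word term (frequency times weight, plus position score) follows the description above. How the co-occurrence score G is combined with the word scores is not recoverable from the text, so simple addition is assumed here, and the numeric bonus values are hypothetical: the source gives only their ordering.

```python
from collections import Counter

# Hypothetical co-occurrence scores; only the ordering is taken from the source.
CO_OCCURRENCE_SCORE = {
    frozenset({"subject", "emotion", "background"}): 3.0,
    frozenset({"subject", "emotion"}): 2.0,
    frozenset({"subject", "background"}): 1.0,
}


def sentence_score(tokens, word_type, word_weight, position_score=0.0):
    """Sum F * U + P over the feature words present, then add G (assumed additive).

    word_type      -- feature word -> "subject", "emotion" or "background"
    word_weight    -- feature word -> weight U (3, 2 or 1)
    position_score -- P, one value for the whole sentence in this sketch
    """
    counts = Counter(t for t in tokens if t in word_type)
    score = sum(n * word_weight[w] + position_score for w, n in counts.items())
    types_present = frozenset(word_type[w] for w in counts)
    return score + CO_OCCURRENCE_SCORE.get(types_present, 0.0)


# Hypothetical feature words and weights.
word_type = {"students": "background", "like": "emotion", "traditional characters": "subject"}
word_weight = {"students": 2, "like": 1, "traditional characters": 3}
score = sentence_score(
    ["students", "like", "traditional characters", "students"], word_type, word_weight
)
print(score)
```

In the example, "students" occurs twice (2 × 2), "like" once (1 × 1) and "traditional characters" once (1 × 3), giving 8, and the three-type co-occurrence adds the assumed bonus of 3.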
Finally, the score of the entire text is shown in equation (3).
$$\mathrm{Text}_i = \Big(\sum_{j} \mathrm{Sen}_j\Big) \times \frac{AL}{L_i} \qquad (3)$$
Text_i denotes the score of text i, the sum runs over all sentences j of text i, AL (Average Length) denotes the average length of all texts in Y, and L_i denotes the length of text i. That is, the score of a text equals the sum of the scores of all its sentences, multiplied by the average text length and divided by the length of the text.
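The text score and the final threshold decision can be sketched in Python as follows. The scoring follows the verbal description of formula (3); the threshold value, the sample numbers and the function names are hypothetical, since the source only says that the score must reach "a certain threshold".

```python
def text_score(sentence_scores, text_length, average_length):
    """Sum of sentence scores, scaled by average length over text length."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return sum(sentence_scores) * average_length / text_length


def is_public_opinion(score, threshold):
    """Judge a text as public opinion once its score reaches the threshold."""
    return score >= threshold


# Made-up numbers: a text twice as long as the average text in Y.
score = text_score([11.0, 4.0], text_length=3000, average_length=1500)
print(score, is_public_opinion(score, threshold=5.0))
```

The length ratio halves the raw sum here, which is how the constraint keeps long texts from scoring high merely because they contain more sentences.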
Table 3 shows one week of randomly sampled monitoring results from the language and script public opinion monitoring system. The data show that the average accuracy of the recognizer in actual monitoring reaches 92%. The system has been adopted by bodies such as the Department of Language and Script Information Management of the Ministry of Education and the National Language Resources Monitoring and Research Center, and has run around the clock for more than 6 years.
TABLE 3 One-week accuracy of the language and script public opinion monitoring system
[Table 3 is provided as an image in the original publication.]
To verify the generality of the element co-occurrence method, it was also used to identify education-related public opinion on the network as part of higher education public opinion monitoring, and this succeeded. Table 4 shows one week of randomly sampled monitoring results from the higher education public opinion monitoring system.
TABLE 4 One-week accuracy of the higher education public opinion monitoring system
[Table 4 is provided as an image in the original publication.]
The data show that the recognizer achieves an average accuracy of 93% in detecting higher education public opinion. The system has been adopted by the Higher Education Communication and Public Opinion Monitoring Research Center of the National Higher Education Quality Monitoring and Evaluation Research Base, which is affiliated with the Higher Education Teaching Evaluation Center of the Ministry of Education, and has run around the clock for more than 4 years.
Starting from the essence of public opinion, the invention represents the three main elements that constitute it (subject, object and emotional tendency) by three types of feature words, and dynamically combines the three types according to combination and aggregation relations, so that topics related to public opinion in a given field can be generated and public opinion information in that field can be effectively identified. The method has been applied in practice in a language and script public opinion monitoring system and a higher education public opinion monitoring system, with accuracies of 92% and 93% respectively.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall be covered by the scope of protection of the present invention.

Claims (5)

1. An automatic network public opinion identification technology based on element co-occurrence, characterized by comprising an implementation method and a weighting algorithm, wherein the implementation method comprises the following steps:
S101: collect 9,436 corpus texts, denoted X, totalling 12.5 million characters, of which 1,836 texts related to public opinion (more than 2.5 million characters) are denoted Y and 7,600 texts unrelated to public opinion (about 10 million characters) are denoted Z;
S102: segment the corpus with the automatic word segmentation system CUCBst and compute word frequency statistics;
S103: divide the words in X, Y and Z into five levels by frequency;
S104: compare, band by band, the words in Z with the words in the same frequency band in X, with the aim of extracting the feature words of language and script public opinion;
the weighting algorithm comprises the following steps:
S201: first calculate the weights of the feature words, then determine the co-occurrence of the three types of feature words;
S202: combine these with the positions of the feature words and the length of the text, perform a weighted calculation over the four factors to obtain a text score, and judge that the text belongs to language and script public opinion when the score reaches a certain threshold.
2. The automatic network public opinion identification technology based on element co-occurrence according to claim 1, wherein the five levels in S103 are: level 1 (frequency of 1000 or more), level 2 (500 to 999), level 3 (100 to 499), level 4 (5 to 99), and level 5 (1 to 4).
3. The automatic network public opinion identification technology based on element co-occurrence according to claim 1, wherein the weighting algorithm further comprises calculating the feature word weights.
4. The automatic network public opinion identification technology based on element co-occurrence according to claim 1, wherein the factors considered by the weighting algorithm are the feature word weight, the co-occurrence among the three types of feature words, the feature word position, and the text length.
5. The automatic network public opinion identification technology based on element co-occurrence according to claim 1, wherein the quality of the feature word set determines the accuracy and recall of public opinion information detection, and, to ensure that quality, the three types of automatically extracted feature words must be manually confirmed item by item.
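The frequency levels of claim 2 and the four factors of claim 4 can be sketched as follows. The level boundaries come from claim 2. The scoring formula, the `lead_size` and `lead_boost` parameters, and all function names are assumptions made for illustration, since the claims name the four factors but do not state how they are combined.

```python
from collections import Counter
from typing import Dict, List, Set


def frequency_level(count: int) -> int:
    """Map a raw word frequency to the five levels listed in claim 2."""
    if count >= 1000:
        return 1
    if count >= 500:
        return 2
    if count >= 100:
        return 3
    if count >= 5:
        return 4
    if count >= 1:
        return 5
    raise ValueError("count must be at least 1")


def group_by_level(tokens: List[str]) -> Dict[int, Dict[str, int]]:
    """Count words in a segmented corpus and bucket them by frequency level."""
    levels: Dict[int, Dict[str, int]] = {lvl: {} for lvl in range(1, 6)}
    for word, n in Counter(tokens).items():
        levels[frequency_level(n)][word] = n
    return levels


def score_text(tokens: List[str],
               weights: Dict[str, float],
               lexicon: Dict[str, Set[str]],
               lead_size: int = 50,
               lead_boost: float = 2.0) -> float:
    """Hypothetical score combining the four factors named in claim 4.

    Assumed combination (not taken from the patent): sum the feature-word
    weights, boost matches among the first `lead_size` tokens, scale by the
    fraction of element types present, and normalize by text length.
    """
    if not tokens:
        return 0.0
    weighted = 0.0
    kinds_seen = set()
    for pos, tok in enumerate(tokens):
        for kind, words in lexicon.items():
            if tok in words:
                kinds_seen.add(kind)
                boost = lead_boost if pos < lead_size else 1.0
                weighted += weights.get(tok, 1.0) * boost
    co_occurrence = len(kinds_seen) / len(lexicon)  # 1.0 only if all types appear
    return weighted * co_occurrence / len(tokens)
```

Under these assumptions, a text would be flagged when `score_text(...)` reaches a threshold tuned on labeled data; the threshold value itself is not given in the claims.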
CN201911248914.9A 2019-12-09 2019-12-09 Network public opinion automatic identification technology based on element co-occurrence Pending CN111046650A (en)

Publications (1)

Publication Number Publication Date
CN111046650A (en) 2020-04-21

Family

ID=70235124

Country Status (1)

Country Link
CN (1) CN111046650A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902619A (en) * 2012-12-28 2014-07-02 中国移动通信集团公司 Internet public opinion monitoring method and system
US20150186503A1 (en) * 2012-10-12 2015-07-02 Tencent Technology (Shenzhen) Company Limited Method, system, and computer readable medium for interest tag recommendation
CN105718598A (en) * 2016-03-07 2016-06-29 天津大学 AT based time model construction method and network emergency early warning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination