CN107783961A - Method, apparatus, and readable storage medium for hot topic identification
- Publication number
- CN107783961A (application CN201711092187.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses a method for hot topic identification: collect the text of a forum; divide the text into words with a word segmentation tool; screen the words against a corpus, and compute, for each screened word in turn, the frequency with which it occurs among all the screened words; select the words whose frequency exceeds a set value as hot topics. The dictionary of the word segmentation tool includes words in a preset standard format. Because everyday expressions and internet slang can be entered into that dictionary as words in the preset standard format, they are recognized and kept as words when the text is segmented, and the segmented words are then screened against the corpus. Hot topics can therefore be identified more accurately. The invention also discloses an apparatus and a computer-readable storage medium for hot topic identification, which have the same effect.
Description
Technical field
The present invention relates to the field of computers, and in particular to a method, an apparatus, and a computer-readable storage medium for hot topic identification.
Background
With the development of computer networks, opinions and comments of all kinds keep appearing online. To follow social hot events in a timely manner, observe social trends, and guide enterprises, governments, and others toward appropriate decisions, it is often necessary to analyze the comments and opinions on the network and identify hot topics.
In the prior art, the acquired text is generally divided into words, the frequency of each word is counted directly, and the high-frequency words are taken as hot topics. On online forums, however, users' comments contain a great deal of internet slang and everyday expressions, which are often written in non-standard ways and easily cause segmentation errors. Moreover, some of the segmented words cannot serve as topics at all, so a word finally selected for its high frequency may not be usable as a hot topic.
Therefore, how to identify hot topics more accurately is a problem that those skilled in the art currently need to solve.
Summary of the invention
The object of the invention is to provide a method, an apparatus, and a computer-readable storage medium for hot topic identification that identify social hot topics more accurately and effectively.
To solve the above technical problem, the invention provides a method for hot topic identification, including:
Collecting the text of a forum;
Dividing the text into words with a word segmentation tool;
Screening the words against a corpus, and computing, for each screened word in turn, the frequency with which it occurs among all the screened words;
Selecting the words whose frequency exceeds a set value as hot topics;
Wherein the dictionary of the word segmentation tool includes words in a preset standard format.
Preferably, after collecting the text of the forum, the method further includes:
Preprocessing the collected text, and then proceeding to the step of dividing the text into words with the word segmentation tool.
Preferably, preprocessing the collected text specifically includes:
Finding the misspelled words and emoticons in the text, and correcting the text accordingly;
Deleting the stop words in the text.
Preferably, after dividing the text into words with the word segmentation tool, the method further includes:
Merging the words that were segmented incorrectly, and then proceeding to the step of screening the words against the corpus.
Preferably, after selecting the words whose frequency exceeds the set value as hot topics, the method further includes:
Analyzing the texts that contain a hot topic with a sentiment dictionary to obtain the sentiment orientation of the corresponding users.
Preferably, collecting the text of the forum is specifically:
Iteratively obtaining the URL links of the forum's web pages with a crawler;
Fetching the web pages from the URL links;
Applying regular-expression matching to the web pages to obtain the required text.
The invention also provides an apparatus for hot topic identification, including:
A collection unit, for collecting the text of a forum;
A division unit, for dividing the text into words with a word segmentation tool;
A screening and calculation unit, for screening the words against a corpus and computing, for each screened word in turn, the frequency with which it occurs among all the screened words;
A selection unit, for selecting the words whose frequency exceeds a set value as hot topics;
Wherein the dictionary of the word segmentation tool includes words in a preset standard format.
Preferably, the apparatus further includes:
A preprocessing unit, for preprocessing the collected text.
The invention also provides an apparatus for hot topic identification, including a processor which, when executing a program stored in a memory, implements the steps of any of the hot topic identification methods described above.
The invention also provides a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the following steps:
Collecting the text of a forum;
Dividing the text into words with a word segmentation tool;
Screening the words against a corpus, and computing, for each screened word in turn, the frequency with which it occurs among all the screened words;
Selecting the words whose frequency exceeds a set value as hot topics;
Wherein the dictionary of the word segmentation tool includes words in a preset standard format.
In the method, the text of a forum is collected; the text is divided into words with a word segmentation tool; the words are screened against a corpus, and for each screened word in turn the frequency with which it occurs among all the screened words is computed; the words whose frequency exceeds a set value are selected as hot topics; and the dictionary of the word segmentation tool includes words in a preset standard format. Because everyday expressions and internet slang can be entered into the dictionary as words in the preset standard format, they are recognized and kept as words when the text is segmented. The segmented words are then screened against the corpus, so that no frequency is computed for words that cannot serve as topics and such words never become hot topics. Hot topics can therefore be identified more accurately. The apparatus and the computer-readable storage medium for hot topic identification provided by the invention have the same effect.
Brief description of the drawings
To explain the embodiments of the invention more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for hot topic identification provided by an embodiment of the invention;
Fig. 2 is a structural diagram of an apparatus for hot topic identification provided by an embodiment of the invention;
Fig. 3 is a structural diagram of an apparatus for hot topic identification provided by an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained from them by those of ordinary skill in the art without creative effort fall within the scope of protection of the invention.
The object of the invention is to provide a method, an apparatus, and a computer-readable storage medium for hot topic identification that identify social hot topics more accurately and effectively.
To help those skilled in the art better understand the technical solution, the invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of a method for hot topic identification provided by an embodiment of the invention. As shown in Fig. 1, the method includes the following steps:
S10: Collect the text of a forum.
A forum is a platform on which users post and reply; users can publish information or put forward opinions there. Forum texts are generally short and contain many everyday expressions and internet slang terms.
S11: Divide the text into words with a word segmentation tool.
A text generally contains several sentences. The word segmentation tool divides each sentence of the collected text into a number of independent words. The tool may include a dictionary, whose words serve as a reference when the text is segmented.
S12: Screen the words against a corpus, and compute, for each screened word in turn, the frequency with which it occurs among all the screened words.
The corpus may contain preset words, each of which is able to serve as a hot topic. From all the words obtained by dividing all the sentences, the words contained in the corpus are kept and the words not contained in it may be deleted. The number of occurrences of each screened word is then counted in turn and divided by the total number of words remaining after screening, which gives the frequency of that word.
In this way, a word that is not in the corpus cannot become a hot topic.
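The screening and frequency calculation of step S12 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `screened_frequencies` and the sample words are assumptions, and the corpus is modeled as a plain set of candidate topic words.

```python
from collections import Counter

def screened_frequencies(words, corpus):
    """Keep only words found in the corpus, then return each kept word's
    share of all kept words (occurrences / total words after screening)."""
    kept = [w for w in words if w in corpus]
    total = len(kept)
    if total == 0:
        return {}  # nothing survived screening, so there is nothing to rank
    counts = Counter(kept)
    return {word: count / total for word, count in counts.items()}

# Hypothetical segmented forum text and corpus, for illustration only.
words = ["housing", "lol", "housing", "exam", "um", "exam", "housing", "traffic"]
corpus = {"housing", "exam", "traffic"}
freqs = screened_frequencies(words, corpus)
```

Note that the denominator is the number of words left after screening, not the number of words in the original text, so the frequencies of the surviving words sum to 1.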
S13: Select the words whose frequency exceeds a set value as hot topics.
The higher the frequency, the more often the word occurs and the more users are discussing it. The set value is chosen as needed, and the words whose frequency exceeds it are selected as hot topics.
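Step S13 amounts to a threshold filter over the frequencies. The sketch below assumes the frequencies are held in a dictionary; the name `select_hot_topics`, the sample values, and the descending sort are illustrative choices that the text does not prescribe.

```python
def select_hot_topics(frequencies, threshold):
    """Return the words whose frequency is strictly greater than the
    threshold, most frequent first."""
    hot = [word for word, freq in frequencies.items() if freq > threshold]
    return sorted(hot, key=lambda word: frequencies[word], reverse=True)

# Hypothetical frequencies and set value, for illustration only.
frequencies = {"housing": 0.5, "exam": 0.3, "traffic": 0.2}
hot_topics = select_hot_topics(frequencies, threshold=0.25)
```

The comparison is strict, matching "frequency exceeds a set value": a word whose frequency equals the threshold is not selected.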
The dictionary of the word segmentation tool includes words in a preset standard format.
Everyday expressions, internet slang, and other non-standard expressions are entered as words in the preset standard format. When the text is divided into words with the word segmentation tool, these expressions can then be picked out and treated as independent words, which avoids segmentation errors.
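The effect of adding slang to the segmentation dictionary can be illustrated with a small dictionary-based segmenter. The text does not say which segmentation algorithm the tool uses; forward maximum matching is used here only as a simple stand-in, and the dictionaries and sample sentence are hypothetical.

```python
def segment(text, dictionary):
    """Forward maximum matching: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    max_len = max((len(word) for word in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

base_dictionary = {"今天", "房价"}   # ordinary vocabulary (assumed)
slang_words = {"给力"}               # slang entered in the preset standard format (assumed)

without_slang = segment("今天房价真给力", base_dictionary)
with_slang = segment("今天房价真给力", base_dictionary | slang_words)
```

Without the slang entry, the expression "给力" falls apart into two single characters; with it, the expression is kept as one word, which is the segmentation error the dictionary extension is meant to avoid.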
To summarize this embodiment: the text of a forum is collected; the text is divided into words with a word segmentation tool; the words are screened against a corpus, and for each screened word in turn the frequency with which it occurs among all the screened words is computed; the words whose frequency exceeds a set value are selected as hot topics; and the dictionary of the word segmentation tool includes words in a preset standard format. Because everyday expressions and internet slang can be entered into the dictionary as words in the preset standard format, they are recognized and divided into independent words when the text is segmented. The segmented words are then screened against the corpus, so that no frequency is computed for words that cannot serve as topics and such words never become hot topics. Hot topics can therefore be identified more accurately.
On the basis of the above embodiment, to segment the text more accurately and thereby identify hot topics more accurately, the method further includes, after collecting the text of the forum: preprocessing the collected text, and then proceeding to step S11.
Preferably, preprocessing the collected text specifically includes finding the misspelled words and emoticons in the text and correcting the text accordingly, and deleting the stop words in the text.
Because users often write in non-standard ways, forum text may contain misspelled words or emoticons. A misspelling library and an emoticon library can be set up in advance; the misspelling library can also contain the corrected word for each misspelling, and the emoticon library can also contain the words that express each emoticon's meaning. The misspelled words and emoticons in the text are detected with these preset libraries, and the text is corrected according to them. Stop words in the text can simply be deleted.
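The preprocessing described above can be sketched with two lookup tables and a stop-word set. The library contents, the function name `preprocess`, and the use of plain substring replacement are assumptions made for illustration; the text only states that preset libraries are used.

```python
def preprocess(text, typo_library, emoticon_library, stop_words):
    """Correct known misspellings, replace emoticons with words that
    express their meaning, then delete stop words."""
    for wrong, right in typo_library.items():
        text = text.replace(wrong, right)
    for emoticon, meaning in emoticon_library.items():
        text = text.replace(emoticon, meaning)
    for stop_word in stop_words:
        text = text.replace(stop_word, "")
    return text

# Hypothetical libraries, for illustration only.
typo_library = {"因该": "应该"}        # misspelling -> corrected word
emoticon_library = {"[微笑]": "开心"}   # emoticon -> word expressing its meaning
stop_words = {"的", "了"}

cleaned = preprocess("房价因该降了[微笑]", typo_library, emoticon_library, stop_words)
```

Deleting stop words by substring is crude, since a stop-word character can also occur inside a legitimate word; a fuller implementation might remove stop words after segmentation instead.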
On the basis of the above embodiment, to identify hot topics more accurately, in a preferred implementation the method further includes, after dividing the text into words with the word segmentation tool: merging the words that were segmented incorrectly, and then proceeding to step S12.
After the text has been divided into words, the result can be checked by segmenting the original text again with the word segmentation tool and comparing the new words with those obtained the first time. If some phrase is segmented differently in the two passes, the words it was split into are merged and the merged result is used as the segmented word; in other words, the phrase itself is used as the word.
The original text can of course be segmented yet again, that is, a third time, and the words from the third pass compared with those from the first two. If some phrase is segmented differently in every pass, the words are merged, that is, the phrase is used directly as the segmented word. The invention does not limit the number of segmentation passes.
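One way to realize the merging step is to keep only the word boundaries on which every segmentation pass agrees, so that any stretch the passes split differently collapses into a single phrase. This is an interpretation of the description, not a method the text spells out; the function names and sample passes are hypothetical, and the passes are assumed to come from differently configured runs of the tool.

```python
def boundaries(words):
    """Return the set of character offsets at which a segmentation cuts."""
    cuts, position = set(), 0
    for word in words:
        position += len(word)
        cuts.add(position)
    return cuts

def merge_inconsistent(*segmentations):
    """Re-split the text at the cut points shared by all passes, which
    merges every phrase that the passes segmented differently."""
    text = "".join(segmentations[0])
    if any("".join(s) != text for s in segmentations):
        raise ValueError("all segmentations must cover the same text")
    common = set.intersection(*(boundaries(s) for s in segmentations))
    merged, start = [], 0
    for cut in sorted(common):
        merged.append(text[start:cut])
        start = cut
    return merged

# Two hypothetical passes over the same sentence.
first_pass = ["今天", "房价", "真", "给", "力"]
second_pass = ["今天", "房价", "真", "给力"]
merged = merge_inconsistent(first_pass, second_pass)
```

Because the function accepts any number of passes, a third or later pass is handled the same way, in line with the statement that the number of passes is not limited.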
On the basis of the above embodiment, to understand people's attitudes and views toward the hot topics, in a preferred implementation the method further includes, after selecting the words whose frequency exceeds the set value as hot topics: analyzing the texts that contain a hot topic with a sentiment dictionary to obtain the sentiment orientation of the corresponding users.
The sentiments covered by the sentiment dictionary can be roughly divided into three classes: positive, negative, and neutral. A dictionary can be built for each class. The texts that contain a hot topic are analyzed, and if a word from the dictionary of a given class occurs in a text, the attitude of that user is recorded as the sentiment class of that dictionary.
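The sentiment step can be sketched as a lookup against one word set per sentiment class. The text does not say what happens when a post matches several classes or none; the sketch below assumes the class with the most matching words wins and that a post with no matches counts as neutral. All names and word lists are hypothetical.

```python
def sentiment_orientation(words, sentiment_dictionaries, default="neutral"):
    """Return the sentiment class whose dictionary matches the most words
    of the post; ties go to the class listed first (an assumption)."""
    hits = {
        label: sum(1 for word in words if word in vocabulary)
        for label, vocabulary in sentiment_dictionaries.items()
    }
    best = max(hits, key=hits.get, default=default)
    return best if hits.get(best, 0) > 0 else default

def topic_sentiments(posts, topic, sentiment_dictionaries):
    """Classify every segmented post that mentions the given hot topic."""
    return [
        sentiment_orientation(post, sentiment_dictionaries)
        for post in posts
        if topic in post
    ]

# Hypothetical sentiment dictionaries and segmented posts.
sentiment_dictionaries = {
    "positive": {"great", "support"},
    "negative": {"terrible", "oppose"},
    "neutral": {"okay"},
}
posts = [
    ["housing", "great"],
    ["housing", "terrible", "oppose"],
    ["exam", "great"],
    ["housing", "prices"],
]
orientations = topic_sentiments(posts, "housing", sentiment_dictionaries)
```

Each post is treated as one user's statement, so the resulting list gives one orientation per post that mentions the topic.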
On the basis of the above embodiment, to collect forum text more efficiently and accurately, collecting the text of the forum is specifically: iteratively obtaining the URL links of the forum's web pages with a crawler, fetching the web pages from the URL links, and applying regular-expression matching to the web pages to obtain the required text.
The URLs of the web pages are obtained with crawler technology, the page structure is analyzed, and regular expressions are matched against the pages, so that the forum text is captured and saved locally.
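The collection step can be sketched with the Python standard library. The page markup matched by `POST_RE`, the example URLs, and the `max_pages` limit are assumptions; a real forum needs patterns written for its own page structure. The `fetch` parameter lets the crawl run against in-memory pages, as in the usage lines, without touching the network.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

LINK_RE = re.compile(r'href="([^"]+)"')
# Hypothetical markup: each post body sits in <div class="post">...</div>.
POST_RE = re.compile(r'<div class="post">(.*?)</div>', re.S)

def fetch_page(url):
    """Download one page and decode it as UTF-8."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

def crawl_forum(start_url, fetch=fetch_page, max_pages=50):
    """Iteratively follow links under start_url and collect post texts."""
    seen, queue, texts = set(), [start_url], []
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        texts.extend(match.strip() for match in POST_RE.findall(html))
        for href in LINK_RE.findall(html):
            link = urljoin(url, href)
            if link.startswith(start_url) and link not in seen:
                queue.append(link)
    return texts

# In-memory stand-in for a forum, used instead of real network access.
pages = {
    "http://forum.example/": '<a href="/t/1">t1</a><a href="http://other.example/x">x</a>',
    "http://forum.example/t/1": '<div class="post"> housing prices </div>'
                                '<div class="post">exam</div><a href="/">home</a>',
}
texts = crawl_forum("http://forum.example/", fetch=pages.__getitem__)
```

Links that leave the forum's URL prefix are skipped, and the `seen` set keeps the iteration from revisiting pages. Saving the collected texts locally is left out of the sketch.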
The embodiments of the method for hot topic identification have been described in detail above. Based on the method described in those embodiments, an embodiment of the invention provides a corresponding apparatus for hot topic identification. Because the apparatus embodiments correspond to the method embodiments, the apparatus embodiments can be understood by referring to the description of the method embodiments and are not described in detail again here.
Fig. 2 is a structural diagram of an apparatus for hot topic identification provided by an embodiment of the invention. As shown in Fig. 2, the apparatus includes:
A collection unit 20, for collecting the text of a forum.
A division unit 21, for dividing the text into words with a word segmentation tool.
A screening and calculation unit 22, for screening the words against a corpus and computing, for each screened word in turn, the frequency with which it occurs among all the screened words.
A selection unit 23, for selecting the words whose frequency exceeds a set value as hot topics.
Wherein the dictionary of the word segmentation tool includes words in a preset standard format.
The collection unit collects the text of a forum; the division unit divides the text into words with a word segmentation tool; the screening and calculation unit screens the words against a corpus and computes, for each screened word in turn, the frequency with which it occurs among all the screened words; the selection unit selects the words whose frequency exceeds a set value as hot topics; and the dictionary of the word segmentation tool includes words in a preset standard format. Because everyday expressions and internet slang can be entered into the dictionary as words in the preset standard format, the division unit recognizes them and divides them into independent words when the text is segmented. The screening and calculation unit then screens the segmented words against the corpus, so that no frequency is computed for words that cannot serve as topics and such words never become hot topics. Hot topics can therefore be identified more accurately.
On the basis of the above embodiment, to segment the text more accurately and thereby identify hot topics more accurately, the apparatus for hot topic identification further includes:
A preprocessing unit, for preprocessing the collected text.
Preferably, the preprocessing unit is specifically used to find the misspelled words and emoticons in the text, correct the text accordingly, and delete the stop words in the text.
The embodiments of the method for hot topic identification have been described in detail above. Based on the method described in those embodiments, an embodiment of the invention provides a corresponding apparatus for hot topic identification. Because the apparatus embodiments correspond to the method embodiments, the apparatus embodiments can be understood by referring to the description of the method embodiments and are not described in detail again here.
Fig. 3 is a structural diagram of an apparatus for hot topic identification provided by an embodiment of the invention. As shown in Fig. 3, the apparatus includes:
A memory 30 and a processor 31.
The memory 30 is used to store a computer program.
The processor 31, when executing the computer program stored in the memory 30, can implement the following steps:
Collecting the text of a forum;
Dividing the text into words with a word segmentation tool;
Screening the words against a corpus, and computing, for each screened word in turn, the frequency with which it occurs among all the screened words;
Selecting the words whose frequency exceeds a set value as hot topics;
Wherein the dictionary of the word segmentation tool includes words in a preset standard format.
In some embodiments of the invention, the processor 31 may also execute the computer program in the memory 30 to implement the following step:
Preprocessing the collected text, and then proceeding to the step of dividing the text into words with the word segmentation tool.
In some embodiments of the invention, the processor 31 may also execute the computer program in the memory 30 to implement the following steps:
Finding the misspelled words and emoticons in the text, and correcting the text accordingly;
Deleting the stop words in the text.
In some embodiments of the invention, the processor 31 may also execute the computer program in the memory 30 to implement the following step:
Merging the words that were segmented incorrectly, and then proceeding to the step of screening the words against the corpus.
In some embodiments of the invention, the processor 31 may also execute the computer program in the memory 30 to implement the following step:
Analyzing the texts that contain a hot topic with a sentiment dictionary to obtain the sentiment orientation of the corresponding users.
In some embodiments of the invention, the processor 31 may also execute the computer program in the memory 30 to implement the following steps:
Iteratively obtaining the URL links of the forum's web pages with a crawler;
Fetching the web pages from the URL links;
Applying regular-expression matching to the web pages to obtain the required text.
In the apparatus for hot topic identification provided by this embodiment, the processor, when executing the computer program in the memory, collects the text of a forum; divides the text into words with a word segmentation tool; screens the words against a corpus and computes, for each screened word in turn, the frequency with which it occurs among all the screened words; and selects the words whose frequency exceeds a set value as hot topics. The dictionary of the word segmentation tool includes words in a preset standard format. Because everyday expressions and internet slang can be entered into the dictionary as words in the preset standard format, they are recognized and divided into independent words when the text is segmented. The segmented words are then screened against the corpus, so that no frequency is computed for words that cannot serve as topics and such words never become hot topics. Hot topics can therefore be identified more accurately.
The invention also provides a computer-readable storage medium corresponding to the above embodiments of the method for hot topic identification. Because the storage medium embodiments correspond to the method embodiments, they can be understood by referring to the description of the method embodiments and are not described in detail again here.
A computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to implement the following steps:
Collecting the text of a forum.
Dividing the text into words with a word segmentation tool.
Screening the words against a corpus, and computing, for each screened word in turn, the frequency with which it occurs among all the screened words.
Selecting the words whose frequency exceeds a set value as hot topics.
Wherein the dictionary of the word segmentation tool includes words in a preset standard format.
It should be noted that the computer-readable storage medium in the invention may be a medium such as a USB flash drive or an optical disc; no specific limitation is imposed.
When the computer program in the computer-readable storage medium provided by the invention is executed by a processor, the text of a forum is collected; the text is divided into words with a word segmentation tool; the words are screened against a corpus, and for each screened word in turn the frequency with which it occurs among all the screened words is computed; and the words whose frequency exceeds a set value are selected as hot topics. The dictionary of the word segmentation tool includes words in a preset standard format. Because everyday expressions and internet slang can be entered into the dictionary as words in the preset standard format, they are recognized and divided into independent words when the text is segmented. The segmented words are then screened against the corpus, so that no frequency is computed for words that cannot serve as topics and such words never become hot topics. Hot topics can therefore be identified more accurately.
The method, apparatus, and computer-readable storage medium for hot topic identification provided by the invention have been described in detail above. The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and for the parts that are the same or similar the embodiments may be referred to one another.
It should be pointed out that those of ordinary skill in the art may make improvements and modifications to the invention without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include" and "comprise" and any variants of them are intended to be non-exclusive, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.
Claims (10)
1. a kind of much-talked-about topic knows method for distinguishing, it is characterised in that including:
Gather text corresponding to forum;
The text is divided into word according to participle instrument;
The word is screened according to corpus, and each word that calculating sifting goes out successively is in the word all filtered out
The frequency of middle appearance;
Frequency is selected to be more than the word of setting value as much-talked-about topic;
Wherein, the dictionary of the participle instrument includes the word of preset standard form.
2. according to the method for claim 1, it is characterised in that after text corresponding to the collection forum, further comprise:
The text collected is pre-processed, and enters the step that the text is divided into word according to participle instrument
Suddenly.
3. according to the method for claim 2, it is characterised in that the described pair of text collected pre-process specifically
Including:
The wrong word in the text and emoticon are obtained, and the text is modified;
Delete the stop words in the text.
4. according to the method for claim 1, it is characterised in that described that the text is divided into word according to participle instrument
Afterwards, further comprise:
The word that participle mistake be present is merged, and enters the described the step of word is screened according to corpus.
5. The method according to claim 1, characterized in that, after selecting the words whose frequency exceeds
the set value as hot topics, the method further comprises:
analyzing the text that contains the hot topics using a sentiment dictionary to obtain the sentiment
orientation of the corresponding users.
6. The method according to claim 1, characterized in that collecting the text from the forum specifically
comprises:
obtaining the URL links of the forum's web pages iteratively with a crawler;
fetching the web pages according to the URL links;
matching regular expressions against the web pages to obtain the required text.
7. A device for identifying hot topics, characterized by comprising:
a collecting unit, configured to collect text from a forum;
a segmentation unit, configured to segment the text into words using a word segmentation tool;
a filtering and computing unit, configured to filter the words against a corpus and to compute, for each
retained word in turn, its frequency of occurrence among all retained words;
a selecting unit, configured to select the words whose frequency exceeds a set value as hot topics;
wherein the dictionary of the word segmentation tool includes words in a preset standard form.
8. The device according to claim 7, characterized by further comprising:
a preprocessing unit, configured to preprocess the collected text.
9. A device for identifying hot topics, characterized by comprising a processor, wherein the processor, when
executing a program stored in a memory, implements the steps of the method for identifying hot topics
according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that a computer program is stored on the
computer-readable storage medium, and the computer program, when executed by a processor, implements the
following steps:
collecting text from a forum;
segmenting the text into words using a word segmentation tool;
filtering the words against a corpus, and computing, for each retained word in turn, its frequency of
occurrence among all retained words;
selecting the words whose frequency exceeds a set value as hot topics;
wherein the dictionary of the word segmentation tool includes words in a preset standard form.
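For illustration only, and not as part of the claims, the steps of claims 1 and 10 might be sketched in Python as follows. The function name `identify_hot_topics`, its parameters, and the toy data are hypothetical. The sketch assumes that "filtering against a corpus" means keeping only the words that appear in the corpus, and that "frequency" means each retained word's share of all retained words; the claims do not fix either detail. Whitespace splitting stands in for the word segmentation tool, whose dictionary of standard-form words would live inside the `segment` callable.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Set


def identify_hot_topics(
    texts: Iterable[str],
    segment: Callable[[str], List[str]],
    corpus: Set[str],
    threshold: float,
) -> Dict[str, float]:
    """Return the retained words whose relative frequency exceeds threshold."""
    # Segment every text, then keep only the words found in the corpus
    # (an assumed reading of "filtering the words against a corpus").
    retained = [w for text in texts for w in segment(text) if w in corpus]
    total = len(retained)
    if total == 0:
        return {}
    counts = Counter(retained)
    # Frequency of each retained word among all retained words.
    return {w: c / total for w, c in counts.items() if c / total > threshold}


# Toy usage: str.split is a placeholder for a real segmentation tool.
posts = ["engine noise after cold start", "cold start engine stall", "brake noise"]
corpus = {"engine", "noise", "brake", "stall"}
hot_topics = identify_hot_topics(posts, str.split, corpus, threshold=0.25)
print(sorted(hot_topics))  # ['engine', 'noise']
```

In the toy data six words survive the filter; "engine" and "noise" each account for 2 of the 6, which exceeds the 0.25 threshold, while "stall" and "brake" do not.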
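The preprocessing of claim 3 and the merging step of claim 4 could look roughly like the sketch below. Everything named here is an assumption made for illustration: the `EMOTICON` pattern (bracketed forum codes and a few ASCII smileys), the `corrections` table of wrongly written forms, and the greedy longest-match rule used to rejoin fragments. The claims only require that wrongly written characters, emoticons, and stop words be handled and that incorrectly segmented words be merged; they do not prescribe how.

```python
import re
from typing import Dict, List, Set

# Hypothetical emoticon pattern: "[smile]"-style codes and simple ASCII smileys.
EMOTICON = re.compile(r"\[[^\[\]]{1,8}\]|[:;]-?[)(DP]")


def preprocess(text: str, corrections: Dict[str, str], stop_words: Set[str]) -> str:
    """Strip emoticons, correct known wrong forms, and drop stop words."""
    text = EMOTICON.sub(" ", text)
    for wrong, right in corrections.items():
        text = text.replace(wrong, right)
    return " ".join(w for w in text.split() if w not in stop_words)


def merge_split_terms(tokens: List[str], dictionary: Set[str], max_len: int = 3) -> List[str]:
    """Rejoin adjacent tokens whose concatenation is a known dictionary term."""
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest run of tokens first, down to a pair.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + n])
            if candidate in dictionary:
                merged.append(candidate)
                i += n
                break
        else:
            # No run starting here forms a dictionary term; keep the token.
            merged.append(tokens[i])
            i += 1
    return merged


cleaned = preprocess("the enigne is loud :-) [smile]", {"enigne": "engine"}, {"the", "is"})
rejoined = merge_split_terms(["双", "离合", "变速箱"], {"双离合"})
```

Here `cleaned` becomes "engine loud", and the over-segmented fragments "双" and "离合" are rejoined into the dictionary term "双离合" before the corpus filter runs.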
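As a hedged illustration of claim 5, a dictionary-based sentiment pass over the posts that mention a hot topic might look like this. The scoring rule (summing signed word weights), the three labels, and the names `sentiment_orientation` and `topic_sentiment` are assumptions; the claim states only that a sentiment dictionary is used to obtain each user's sentiment orientation.

```python
from typing import Dict, Iterable, List, Set, Tuple


def sentiment_orientation(words: Iterable[str], sentiment_dict: Dict[str, int]) -> str:
    """Label a post by the sign of its summed sentiment weights."""
    score = sum(sentiment_dict.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def topic_sentiment(
    posts: List[Tuple[str, List[str]]],
    hot_topics: Set[str],
    sentiment_dict: Dict[str, int],
) -> Dict[str, Dict[str, str]]:
    """Map each hot topic to {user: orientation} for posts that mention it."""
    result: Dict[str, Dict[str, str]] = {}
    for user, words in posts:
        for topic in hot_topics.intersection(words):
            # A later post by the same user overwrites the earlier label.
            result.setdefault(topic, {})[user] = sentiment_orientation(words, sentiment_dict)
    return result


# Toy data: (user, segmented post) pairs and a tiny signed-weight dictionary.
posts = [
    ("u1", ["engine", "noise", "terrible"]),
    ("u2", ["engine", "smooth", "great"]),
    ("u3", ["brake", "fine"]),
]
weights = {"terrible": -2, "great": 2, "smooth": 1, "fine": 1}
by_topic = topic_sentiment(posts, {"engine"}, weights)
```

With this toy input, user `u1` is labelled negative and `u2` positive for the topic "engine"; `u3` is ignored because the post mentions no hot topic.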
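The collection step of claim 6 (iterating over URL links with a crawler, fetching pages, and extracting text by regular expression) can be sketched as below. The `fetch` callable is injected so the example runs without network access; in practice it could wrap an HTTP client. The two patterns, the `class="post"` markup, the page limit, and the example URLs are all hypothetical, since real forum markup will differ.

```python
import re
from collections import deque
from typing import Callable, List
from urllib.parse import urljoin

LINK_RE = re.compile(r'href="([^"]+)"')
# Hypothetical markup: each post body sits in <div class="post">...</div>.
POST_RE = re.compile(r'<div class="post">(.*?)</div>', re.DOTALL)


def crawl_forum(start_url: str, fetch: Callable[[str], str], max_pages: int = 50) -> List[str]:
    """Breadth-first crawl from start_url, returning the extracted post texts."""
    queue = deque([start_url])
    seen = {start_url}
    texts: List[str] = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        # Regular-expression matching yields the required text.
        texts.extend(m.strip() for m in POST_RE.findall(html))
        # Newly discovered links feed the next iteration of the crawl.
        for link in LINK_RE.findall(html):
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return texts


# Toy usage with an in-memory "site" instead of real HTTP requests.
pages = {
    "http://forum.example/list": '<a href="/t/1">one</a><a href="/t/2">two</a>',
    "http://forum.example/t/1": '<div class="post"> engine noise </div><a href="/list">back</a>',
    "http://forum.example/t/2": '<div class="post">brake squeal</div>',
}
collected = crawl_forum("http://forum.example/list", lambda u: pages.get(u, ""))
```

The `seen` set keeps the crawl from revisiting the listing page when a thread links back to it. A production crawler would also restrict links to the forum's own host and respect its crawling policy.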
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711092187.2A CN107783961A (en) | 2017-11-08 | 2017-11-08 | Method, apparatus and readable storage medium for hot topic identification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107783961A (en) | 2018-03-09 |
Family
ID=61433147
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711092187.2A Pending CN107783961A (en) | 2017-11-08 | 2017-11-08 | Method, apparatus and readable storage medium for hot topic identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107783961A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299248A (en) * | 2018-12-12 | 2019-02-01 | 成都航天科工大数据研究院有限公司 | Business intelligence collection method based on natural language processing |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101719122A (en) * | 2009-12-04 | 2010-06-02 | 中国人民解放军信息工程大学 | Method for extracting Chinese named entity from text data |
CN101980199A (en) * | 2010-10-28 | 2011-02-23 | 北京交通大学 | Method and system for discovering network hot topic based on situation assessment |
CN104731770A (en) * | 2015-03-23 | 2015-06-24 | 中国科学技术大学苏州研究院 | Chinese microblog emotion analysis method based on rules and statistical model |
CN105183765A (en) * | 2015-07-30 | 2015-12-23 | 成都鼎智汇科技有限公司 | Big data-based topic extraction method |
JP2016040660A (en) * | 2014-08-12 | 2016-03-24 | 日本電信電話株式会社 | Content recommendation device, content recommendation method, and content recommendation program |
CN105574092A (en) * | 2015-12-10 | 2016-05-11 | 百度在线网络技术(北京)有限公司 | Information mining method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180309 |