CN109902230A - Method and device for processing news data
- Publication number: CN109902230A
- Application number: CN201910112919.2A
- Authority: CN (China)
- Prior art keywords: viewpoint, information, holder, news, sentence
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Classification: Information Retrieval, Db Structures And Fs Structures Therefor
Abstract
This application discloses a method and device for processing news data. The method comprises: obtaining streaming news data and extracting viewpoint holder information and viewpoint information from the news data; performing entity alignment on the viewpoint holder information and storing it in a holder database; storing the viewpoint information in a viewpoint database, wherein the holder database records the importance of each holder; establishing an association between the news data and the viewpoint information; and retrieving news data according to a configured topic and, based on the retrieved news data, determining the viewpoint information in the viewpoint database that is associated with that news data.
Description
Technical field
This application relates to information processing technology, and in particular to a method and device for processing news data.
Background
With the development of the internet, information flows ever faster. News publishers such as news portals and self-media outlets, and the news they publish, keep growing in number, and the cycle of public opinion around hot events and key policies, from onset through outbreak to end, keeps getting shorter. How to grasp the development of public opinion promptly and accurately from a large volume of news reports has become a focus of attention.
Traditional methods rely on manually reading event-related reports to sort out the thread of an event, understand the viewpoints of all parties, organize and edit the raw material, and describe and analyze the situation of a hot event. Such description and analysis is accurate, but it is limited by the efficiency of manual processing and struggles to be timely and comprehensive. With the development and rise of modern artificial intelligence and natural language processing, many public opinion analysis systems have been developed that replace manual work with machines. Such systems can process massive amounts of text quickly, analyze and organize it, and extract valuable information.
Analyzing the viewpoints that experts and institutions express on particular public opinion events or key policies not only reveals the attitudes and positions of all parties, but also gives a comprehensive picture of how public opinion on hot events and key policies is developing, captures information that is timely, forward-looking, guiding, and representative, and so helps decision makers. Extracting and analyzing the viewpoints of experts and institutions is therefore of great significance.
Current systems still have many deficiencies in handling expert and institution viewpoints: the extracted information is neither accurate nor complete, there is no function for automatically evaluating the importance of experts and institutions, and it is difficult to backtrack viewpoints or perform association analysis on them.
Summary of the invention
To solve the above technical problems, the embodiments of the present application provide a method and device for processing news data.
The method for processing news data provided by the embodiments of the present application comprises:
Obtaining streaming news data, and extracting viewpoint holder information and viewpoint information from the news data;
Performing entity alignment on the viewpoint holder information and storing it in a holder database; storing the viewpoint information in a viewpoint database, wherein the holder database records the importance of each holder;
Establishing an association between the news data and the viewpoint information;
Retrieving news data according to a configured topic, and determining, based on the retrieved news data, the viewpoint information in the viewpoint database that is associated with that news data.
The device for processing news data provided by the embodiments of the present application comprises:
An extraction module, configured to obtain streaming news data and extract viewpoint holder information and viewpoint information from the news data;
An alignment module, configured to perform entity alignment on the viewpoint holder information and store it in a holder database, and to store the viewpoint information in a viewpoint database, wherein the holder database records the importance of each holder;
An importance calculation module, configured to calculate and update the importance of the holder;
An analysis module, configured to establish an association between the news data and the viewpoint information, retrieve news data according to a configured topic, and determine, based on the retrieved news data, the viewpoint information in the viewpoint database that is associated with that news data.
The above technical solution of the embodiments of the present application has the following characteristics:
1) It applies natural language processing techniques to extract elements such as experts, institutions, positions, and viewpoints from text, with high processing efficiency, high accuracy, and high recall.
2) It builds expert and institution databases and evaluates the importance of experts, positions, and institutions from the number of news reports, which is a reasonable and effective evaluation mechanism.
3) It can backtrack the expert and institution viewpoints under a topic, perform association analysis with a clustering algorithm, extract keywords from the viewpoints in each cluster, and rank them by the importance of their holders. Clustering and keyword extraction work well, and ranking by importance reflects the weight of each viewpoint.
4) It performs entity alignment and denoising on experts and institutions, which reduces system error.
5) The system is highly extensible: the trigger dictionary and the entity alignment table can be updated and upgraded.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for processing news data provided by an embodiment of the present application;
Fig. 2 is a schematic framework diagram provided by an embodiment of the present application;
Fig. 3 is a flow diagram of the joint extraction method provided by an embodiment of the present application;
Fig. 4 is a flow diagram of processing a single news item with the joint extraction method provided by an embodiment of the present application;
Fig. 5 is a structural diagram of the device for processing news data provided by an embodiment of the present application.
Detailed description
Various exemplary embodiments of the present application are now described in detail with reference to the drawings. Note that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
It should also be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present application or its application or use.
Techniques, methods, and devices known to a person of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be regarded as part of the specification.
Note that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
The embodiments of the present application can be applied to electronic devices such as computer systems and servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with such electronic devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.
Electronic devices such as computer systems and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. In general, program modules may include routines, programs, object programs, components, logic, data structures, and the like, which perform particular tasks or implement particular abstract data types. The computer system or server can be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media, including storage devices.
Current public opinion analysis systems, first, struggle to accurately extract the viewpoints held by the experts and institutions mentioned in a text; second, struggle to assess the importance of experts and institutions; and, in addition, lack functions for backtracking and association analysis of expert and institution viewpoints. To address this, and based on the characteristics of Chinese grammar, the present application proposes a joint extraction method, based on viewpoint trigger words, for experts, institutions, and their viewpoints. On this basis it builds an expert database and an institution database, dynamically evaluates the importance of experts and institutions, and aligns and denoises expert and institution entities. It also builds a news index and a viewpoint database, which provide the basis for viewpoint backtracking, and establishes an analysis system for expert and institution viewpoints.
Fig. 1 is a flow diagram of the method for processing news data provided by an embodiment of the present application. As shown in Fig. 1, the method comprises the following steps:
Step 101: Obtain streaming news data, and extract viewpoint holder information and viewpoint information from the news data.
In the embodiments of the present application, streaming news data is news data obtained in real time as a stream. The technical solution processes real-time news data in a streaming manner; here, processing the news data at least includes extracting viewpoint holder information and viewpoint information from it.
In the embodiments of the present application, the viewpoint holder information extracted from the news data includes at least one of: institution information, position information, expert information.
Optionally, the viewpoint holder information includes institution information and expert information.
Optionally, the viewpoint holder information includes institution information, position information, and expert information.
Step 102: Perform entity alignment on the viewpoint holder information and store it in a holder database; store the viewpoint information in a viewpoint database, wherein the holder database records the importance of each holder.
In the embodiments of the present application, the holder database includes at least one of: an institution database, a position database, an expert database.
The institution database stores institution information, the position database stores position information, and the expert database stores expert information.
In the embodiments of the present application, after entity alignment the viewpoint holder information is stored in the corresponding holder database.
In the embodiments of the present application, extracting viewpoint holder information and viewpoint information from the news data can be implemented through the following steps:
1) Input the news data to be analyzed, and initialize a tenure dictionary and the arrays used to save results;
2) Split the news data into sentences, and determine whether any sentence remains unprocessed;
3) If an unprocessed sentence remains, extract at least one of the following from it: institution information, position information, expert information, viewpoint information;
4) Determine whether the extracted expert information lacks corresponding position information; if so, complete the position information for the expert from the tenure dictionary; if not, update the tenure dictionary;
5) Save the institution information, position information, expert information, and viewpoint information of the sentence to the corresponding result arrays.
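Steps 1) to 5) can be sketched as a loop over the sentences of one news item. This is a minimal illustration rather than the patented implementation: `split_sentences`, `process_news`, the record keys, and the `extract_sentence` callable (which stands in for the sentence-level extraction of step 3) are all hypothetical names introduced here.

```python
import re

def split_sentences(text):
    """Split on sentence-final punctuation (a simplification of step 2)."""
    parts = re.split(r"(?<=[。！？!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]

def process_news(text, extract_sentence):
    """Run steps 1) to 5) for one news item.

    extract_sentence is a hypothetical callable that returns a dict with the
    keys 'institution', 'position', 'expert', 'viewpoint' (values may be
    None), or None when the sentence carries no viewpoint.
    """
    tenure = {}    # step 1: expert name -> position seen earlier in this item
    results = []   # step 1: one record per sentence that yields a viewpoint
    for sentence in split_sentences(text):          # step 2
        record = extract_sentence(sentence)         # step 3
        if record is None:
            continue
        expert, position = record.get("expert"), record.get("position")
        if expert:                                  # step 4
            if position is None:
                # Complete the missing position from the tenure dictionary.
                record["position"] = tenure.get(expert)
            else:
                # Remember the position for later sentences.
                tenure[expert] = position
        results.append(record)                      # step 5
    return results
```

The tenure dictionary is scoped to a single news item in this sketch, so a position stated in the first mention of an expert carries over to later sentences that name the expert without a title.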
Further, in the above scheme, extracting at least one of institution information, position information, expert information, and viewpoint information from the sentence can be implemented through the following steps:
3.1) Input the sentence to be processed;
3.2) Process the sentence as follows: word segmentation, part-of-speech tagging, named entity recognition, syntactic analysis;
3.3) Load the viewpoint trigger dictionary, and determine from it whether the sentence contains a viewpoint trigger word;
3.4) If the sentence contains a viewpoint trigger word, find, according to the syntactic structure of the sentence, the subject of the viewpoint trigger word and the modifiers of that subject, and take the subject together with its modifiers as the target subject;
3.5) Segment the target subject according to the named entity recognition results to obtain at least one of: institution information, position information, expert information; and extract the viewpoint information that follows the viewpoint trigger word from the sentence using regular expressions and the syntactic structure.
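The trigger-word idea in steps 3.3) to 3.5) can be illustrated with a deliberately simplified sketch. It does not perform word segmentation, part-of-speech tagging, named entity recognition, or syntactic analysis: the text before the trigger word stands in for the target subject, and small hand-written lists stand in for the recognizers. The trigger list, the position list, the institution pattern, and the function name `extract` are all assumptions made for illustration.

```python
import re

TRIGGERS = ["表示", "认为", "指出", "强调"]        # hypothetical trigger dictionary
POSITIONS = ["教授", "研究员", "主任", "院长"]      # hypothetical position list
INSTITUTION = r".+?(?:大学|研究院|研究所|委员会|公司)"  # hypothetical pattern

def extract(sentence):
    """Return institution/position/expert/viewpoint for a sentence, or None."""
    # Step 3.3: look for the earliest trigger word in the sentence.
    hits = [(sentence.find(t), t) for t in TRIGGERS if t in sentence]
    if not hits:
        return None
    idx, trigger = min(hits)
    # Step 3.4 (simplified): treat everything before the trigger as the subject.
    subject = sentence[:idx]
    # Step 3.5: the viewpoint is the text after the trigger, minus punctuation.
    viewpoint = sentence[idx + len(trigger):].strip('，,：:“”"。 ')
    record = {"institution": None, "position": None,
              "expert": None, "viewpoint": viewpoint}
    # Step 3.5 (simplified): cut the subject into institution, position, expert.
    m = re.match(INSTITUTION, subject)
    if m:
        record["institution"] = m.group(0)
        subject = subject[m.end():]
    for p in POSITIONS:
        if p in subject:
            record["position"] = p
            subject = subject.replace(p, "", 1)
            break
    record["expert"] = subject.strip() or None
    return record
```

For the sentence 清华大学教授张三表示，经济将稳步增长。 the sketch yields the institution 清华大学, the position 教授, the expert 张三, and the viewpoint 经济将稳步增长. A real system would replace the string slicing with a dependency parse so that subjects are found correctly in longer sentences.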
In the embodiments of the present application, the viewpoint information is stored in the viewpoint database, and the holder database records the importance of each holder.
In the above scheme, the importance of a holder is calculated as follows:
Initialize the importance of the holder, and set the holder's news count to 0;
Within a statistical period, each time a news item containing the viewpoint holder information is processed, add 1 to the holder's news count and add 1 to the holder's importance;
After the current statistical period ends, update the holder's importance and news count according to the following formulas:
[importance update formula missing from the source text]
news_count = 0
where importance denotes the importance of the holder and news_count denotes the holder's news count.
Step 103: Establish an association between the news data and the viewpoint information; retrieve news data according to a configured topic, and determine, based on the retrieved news data, the viewpoint information in the viewpoint database that is associated with that news data.
Further, the technical solution of the embodiments of the present application also comprises the following steps:
Step 104: Cluster the retrieved news items to obtain multiple sub-topics; group the viewpoint information of the same sub-topic into one cluster, obtaining multiple clusters of viewpoint information.
Step 105: Extract keywords from each cluster of viewpoint information, and rank the keywords of each cluster according to the importance of the holders corresponding to the keywords.
The technical solution of the embodiments of the present application is described below with reference to the framework shown in Fig. 2, which relies on the following basic modules:
Extraction module: jointly extracts experts, institutions, and their viewpoints based on viewpoint trigger words.
Importance calculation module: calculates and dynamically updates the importance of experts, positions, and institutions.
Alignment module: aligns and denoises expert and institution entities.
Analysis module: backtracks viewpoints and performs association analysis.
Referring to Fig. 2, for streaming news data from the internet, the processing method proposed in the embodiments of the present application extracts elements such as experts, institutions, positions, and viewpoints from the news data and, after entity alignment, stores them in the expert database, the institution database, and the viewpoint database. The expert and institution databases record the importance of experts, institutions, and positions, which is updated periodically according to the number of news reports. The embodiments also save the news data and establish its association with the viewpoint database. According to the topic (keywords) and time window configured by the user, the system retrieves news and the viewpoints the news contains, which realizes viewpoint backtracking. It then clusters the news to obtain sub-topics, groups the viewpoints under the same sub-topic into one cluster, extracts representative keywords with a keyword extraction algorithm, and finally ranks the viewpoints by the importance of their holders, completing the analysis of expert and institution viewpoints. The technical solution is described in detail below.
(1) Joint extraction of experts, institutions, and their viewpoints based on viewpoint trigger words
Chinese text is varied and rich in its forms of expression, and there are many ways to express what a person has said. They can be divided into explicit and implicit expressions. Implicit expressions have no fixed syntactic structure and are mainly inferred from contextual semantics. News usually uses explicit expressions, namely direct speech and indirect speech. Both have a fixed syntactic structure containing a verb that expresses a viewpoint, called a viewpoint trigger word. For a viewpoint trigger word, the expert, the institution, the viewpoint they hold, and the expert's tenure information can be extracted at the same time according to Chinese syntactic structure. The joint extraction method proposed in the present application, based on viewpoint trigger words, for experts, institutions, and their viewpoints is shown in Fig. 3. The flow for extracting from a single news item with this method is shown in Fig. 4.
(2) Calculation and dynamic update of the importance of experts, positions, and institutions
Because experts and institutions differ widely in social influence, an importance evaluation mechanism for experts and institutions matters for viewpoint analysis. The importance of experts, positions, and institutions can also be used for entity alignment and denoising. In the present application, importance is calculated from the number of news reports, a basis that is both simple and reasonable: the more news reports an expert or institution receives, the greater its influence and the higher its importance. However, the influence of experts and institutions changes over time, and a simple count of news reports is insensitive to such changes, so the present application proposes the following importance calculation method:
1. Initialize the importance (importance) and the number of news reports within the statistical period (news_count) to 0. The statistical period is set to one month in the present application.
2. Within the statistical period, each time a news item containing the statistical object (expert, position, or institution) is processed, add one to its news report count and add one to its importance.
3. After the current statistical period ends, update the importance and the news report count as follows:
[importance update formula missing from the source text]
news_count = 0
where importance denotes the importance of the holder and news_count denotes the holder's news count.
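The bookkeeping in steps 1 to 3 can be sketched as below. The per-news increments and the reset of news_count follow the text. The end-of-period update of importance is an assumption: the text does not give that formula, so a multiplicative decay is used purely as a placeholder, and the `decay` parameter and the names `HolderStats`, `record_news`, and `end_period` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class HolderStats:
    """Step 1: both values start at 0 for a new expert, position, or institution."""
    importance: float = 0.0
    news_count: int = 0

def record_news(stats):
    """Step 2: one more news item in the current period mentions this holder."""
    stats.news_count += 1
    stats.importance += 1

def end_period(stats, decay=0.5):
    """Step 3: close the statistical period.

    ASSUMPTION: the multiplicative decay is a stand-in, not the formula from
    the source. Only the reset of news_count is taken from the text.
    """
    stats.importance *= decay
    stats.news_count = 0
```

Under this placeholder rule a holder who stops appearing in the news loses importance period by period, which is the kind of sensitivity to changing influence that the text asks for.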
(3) Alignment and denoising of expert and institution entities
Entity alignment mainly addresses the following three problems:
1. Multiple expert entities share the same name.
2. The same entity (expert or institution) has multiple names.
3. The same expert entity has multiple positions.
For the first problem, because experts are confined to their professional fields, the probability that two experts in the same field share a name is very small. The present application therefore assumes that no two experts under the same topic share a name.
For the second problem, the system builds an entity alignment table. Before an entity is stored in the database, its name is replaced according to the table and unified to the most common form. The table supports updating and replacement, and its content is obtained from existing knowledge sources such as Baidu Baike.
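The entity alignment table for the second problem can be pictured as a simple alias lookup applied before storage. The sketch below is illustrative only; the table entries and the function name `align` are hypothetical.

```python
# Hypothetical alignment table: alias -> most common name.
ALIAS_TABLE = {
    "北大": "北京大学",
    "Peking University": "北京大学",
}

def align(name, table=ALIAS_TABLE):
    """Return the canonical name for an entity, or the name itself if unknown."""
    name = name.strip()
    return table.get(name, name)
```

Because the table is plain data, it can be updated or replaced without touching the extraction code.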
For the third problem, the system first collects the set of positions with which the same expert appears across all news under the same topic, and chooses the position with the highest importance as the expert's position. If no news under the topic gives a position for the expert, the most important position of the expert with the same name is looked up in the expert database and used as the expert's position.
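The position selection rule can be sketched as follows. The function name `choose_position` and its three arguments are hypothetical; the positions found in the topic's news, the per-position importance values, and the positions recorded in the expert database are assumed to be available as plain Python collections.

```python
def choose_position(topic_positions, position_importance, expert_db_positions=()):
    """Pick one position for an expert.

    topic_positions: positions seen for the expert in news under the topic.
    position_importance: mapping position -> importance.
    expert_db_positions: fallback positions recorded for the same-named
    expert in the expert database.
    """
    # Prefer positions from the topic's own news; otherwise fall back.
    candidates = list(topic_positions) or list(expert_db_positions)
    if not candidates:
        return None
    # Choose the candidate with the highest importance (0 if unknown).
    return max(candidates, key=lambda p: position_importance.get(p, 0.0))
```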
Some news items interview ordinary people, small organizations, or small groups, which should not be treated as experts or institutions, so the expert and institution databases need to be denoised. While periodically updating importance, the present application deletes expert and institution entities whose importance is too low from the databases, thereby achieving denoising.
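Denoising by importance amounts to a filter over the stored entities. The text does not say how low is "too low", so the `threshold` argument below is an assumption, as is the function name `denoise`.

```python
def denoise(holders, threshold):
    """Keep only holders whose importance reaches the (assumed) threshold.

    holders: mapping entity name -> importance.
    """
    return {name: imp for name, imp in holders.items() if imp >= threshold}
```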
(4) Backtracking and association analysis of viewpoints
Analyzing the viewpoints in the news of a topic over a given period requires extracting the experts and institutions in the news and the viewpoints they hold. The volume of news may be large, however, and processing it at query time would take a long time. The present application therefore processes each news item as it streams in, saves the viewpoints to the database, and associates them with the news, so that news can be retrieved by topic and the viewpoints backtracked from it.
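One way to picture the news-to-viewpoint association and the backtracking query is with two linked tables. The sketch below uses an in-memory SQLite database; the schema, the function names, and the use of a substring match as the topic search are assumptions made for illustration, not details given in the text.

```python
import sqlite3

def open_store():
    """Create an in-memory store with a news table and a linked viewpoint table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE news(id INTEGER PRIMARY KEY, published TEXT, body TEXT);
        CREATE TABLE viewpoint(id INTEGER PRIMARY KEY,
                               news_id INTEGER REFERENCES news(id),
                               holder TEXT, content TEXT);
    """)
    return db

def add_news(db, published, body, viewpoints):
    """Store one news item and the (holder, content) viewpoints extracted from it."""
    cur = db.execute("INSERT INTO news(published, body) VALUES (?, ?)",
                     (published, body))
    news_id = cur.lastrowid
    db.executemany(
        "INSERT INTO viewpoint(news_id, holder, content) VALUES (?, ?, ?)",
        [(news_id, holder, content) for holder, content in viewpoints])
    return news_id

def backtrack(db, keyword, start, end):
    """Return viewpoints of news that mention the keyword within [start, end]."""
    rows = db.execute(
        "SELECT v.holder, v.content FROM viewpoint v "
        "JOIN news n ON n.id = v.news_id "
        "WHERE n.body LIKE ? AND n.published BETWEEN ? AND ? "
        "ORDER BY v.id",
        (f"%{keyword}%", start, end))
    return rows.fetchall()
```

Dates are stored as ISO strings so that the time window can be compared as text. Because viewpoints are written when each news item arrives, the query only joins stored rows and does no extraction work.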
A topic may contain several sub-topics, so the present application first clusters the news under the topic to obtain the sub-topics. The news text clustering algorithm is as follows:
1. Compute the TF-IDF values of the words in each news item.
2. Extract the 30 words with the highest TF-IDF values in the news item.
3. Take the average of the word vectors of these 30 words as the vector representation of the news item; the word vectors are pre-trained on a news corpus.
4. Cluster with k-means, determining the number of clusters from the degree of cluster cohesion.
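Steps 1 to 3 of the clustering algorithm can be sketched in plain Python. The sketch assumes the news items are already tokenized and that pre-trained word vectors are available as a dictionary; the TF-IDF variant used here (relative term frequency times log(N/df)) and the function names are assumptions. Step 4 is not shown: the resulting vectors can be passed to any k-means implementation.

```python
import math
from collections import Counter

def top_tfidf_words(docs, k=30):
    """docs: list of token lists. Return the k highest-TF-IDF words per document."""
    n = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for doc in docs for w in set(doc))
    tops = []
    for doc in docs:
        tf = Counter(doc)
        score = {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        tops.append(sorted(score, key=lambda w: (-score[w], w))[:k])
    return tops

def news_vector(words, embeddings):
    """Average the vectors of the words present in embeddings (word -> floats)."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```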
The viewpoints are divided into sets by sub-topic, and the keywords of each viewpoint set are extracted with the TextRank algorithm.
Finally, the viewpoints are ranked by the importance of their holders (experts or institutions) to give the final analysis result.
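The final grouping and ranking can be sketched as below. TextRank keyword extraction is not shown. The function name `rank_viewpoints` and the shapes of its inputs (viewpoint tuples, a mapping from news item to sub-topic label, and a mapping from holder to importance) are assumptions for illustration.

```python
from collections import defaultdict

def rank_viewpoints(viewpoints, cluster_of, importance):
    """Group viewpoints by sub-topic and rank each group by holder importance.

    viewpoints: list of (news_id, holder, content) tuples.
    cluster_of: mapping news_id -> sub-topic label from the clustering step.
    importance: mapping holder -> importance.
    """
    clusters = defaultdict(list)
    for news_id, holder, content in viewpoints:
        clusters[cluster_of[news_id]].append((holder, content))
    # Highest-importance holders first; unknown holders count as 0.
    return {
        label: sorted(items, key=lambda hc: importance.get(hc[0], 0.0), reverse=True)
        for label, items in clusters.items()
    }
```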
Fig. 5 is a structural diagram of the device for processing news data provided by an embodiment of the present application. As shown in Fig. 5, the device comprises:
An extraction module 501, configured to obtain streaming news data and extract viewpoint holder information and viewpoint information from the news data;
An alignment module 502, configured to perform entity alignment on the viewpoint holder information and store it in a holder database, and to store the viewpoint information in a viewpoint database, wherein the holder database records the importance of each holder;
An importance calculation module 503, configured to calculate and update the importance of the holder;
An analysis module 504, configured to establish an association between the news data and the viewpoint information, retrieve news data according to a configured topic, and determine, based on the retrieved news data, the viewpoint information in the viewpoint database that is associated with that news data.
In one embodiment, the analysis module 504 is further configured to cluster the retrieved news items to obtain multiple sub-topics; group the viewpoint information of the same sub-topic into one cluster, obtaining multiple clusters of viewpoint information; and extract keywords from each cluster of viewpoint information and rank the keywords of each cluster according to the importance of the holders corresponding to the keywords.
In one embodiment, the viewpoint holder information includes at least one of: institution information, position information, expert information;
The holder database includes at least one of: an institution database, a position database, an expert database.
In one embodiment, the extraction module 501 is configured to:
Input the news data to be analyzed, and initialize a tenure dictionary and the arrays used to save results;
Split the news data into sentences, and determine whether any sentence remains unprocessed;
If an unprocessed sentence remains, extract at least one of the following from it: institution information, position information, expert information, viewpoint information;
Determine whether the extracted expert information lacks corresponding position information; if so, complete the position information for the expert from the tenure dictionary; if not, update the tenure dictionary;
Save the institution information, position information, expert information, and viewpoint information of the sentence to the corresponding result arrays.
In one embodiment, the extraction module 501 is configured to:
Input the sentence to be processed;
Process the sentence as follows: word segmentation, part-of-speech tagging, named entity recognition, syntactic analysis;
Load the viewpoint trigger dictionary, and determine from it whether the sentence contains a viewpoint trigger word;
If the sentence contains a viewpoint trigger word, find, according to the syntactic structure of the sentence, the subject of the viewpoint trigger word and the modifiers of that subject, and take the subject together with its modifiers as the target subject;
Segment the target subject according to the named entity recognition results to obtain at least one of: institution information, position information, expert information; and extract the viewpoint information that follows the viewpoint trigger word from the sentence using regular expressions and the syntactic structure.
In one embodiment, the importance calculation module 503 is configured to calculate the importance of the holder as follows:
Initialize the importance of the holder, and set the holder's news count to 0;
Within a statistical period, each time a news item containing the viewpoint holder information is processed, add 1 to the holder's news count and add 1 to the holder's importance;
After the current statistical period ends, update the holder's importance and news count according to the following formulas:
[importance update formula missing from the source text]
news_count = 0
where importance denotes the importance of the holder and news_count denotes the holder's news count.
Those skilled in the art will appreciate that the functions implemented by each module of the device for processing news data shown in Fig. 5 can be understood with reference to the foregoing description of the method for processing news data. The functions of each module of the device shown in Fig. 5 can be implemented by a program running on a processor, or by specific logic circuits.
The description of the present invention is given for purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the disclosed form. Many modifications and variations will be obvious to a person of ordinary skill in the art. The embodiments were chosen and described to better explain the principles and practical application of the invention, and to enable those skilled in the art to understand the invention and design various embodiments, with various modifications, suited to particular uses.
Claims (12)
1. a kind of processing method of news data, which is characterized in that the described method includes:
The news data for obtaining streaming extracts viewpoint holder information and viewpoint information from the news data;
Entity registration process is carried out to the viewpoint holder information, and is stored to holder's database;By the viewpoint information
It stores into viewpoint database, wherein record has the different degree of the holder in holder's database;
Establish the incidence relation of the news data Yu the viewpoint information;
Retrieve news data according to the selected topic of configuration, based on the news data retrieved in the viewpoint database determining and institute
State the associated viewpoint information of news data.
2. the method according to claim 1, wherein the method also includes:
Clustering processing is carried out to a plurality of news data retrieved, obtains multiple sub-topics;
The viewpoint information of same sub-topic is classified as cluster, obtains more cluster viewpoint information set;
The keyword of the cluster viewpoint information set is extracted from every cluster viewpoint information set, and is held according to the keyword is corresponding
The different degree for the person of having is ranked up the keyword of each cluster.
3. The method according to claim 1, characterized in that the viewpoint holder information comprises at least one of: organization information, position information, expert information;
the holder database comprises at least one of: an organization database, a position database, an expert database.
4. The method according to claim 3, characterized in that extracting viewpoint holder information and viewpoint information from the news data comprises:
inputting the news data to be analyzed, and initializing a tenure dictionary and an array for saving results;
splitting the news data into sentences, and judging whether any unprocessed sentence remains;
if an unprocessed sentence remains, extracting at least one of the following from the sentence: organization information, position information, expert information, viewpoint information;
judging whether the extracted expert information lacks corresponding position information; if so, completing the position information corresponding to the expert information according to the tenure dictionary; if not, updating the tenure dictionary;
saving the organization information, position information, expert information and viewpoint information corresponding to the sentence into the array for saving results.
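The sentence loop of claim 4, with its tenure dictionary, can be sketched as below. The per-sentence extractor is passed in as a hypothetical callable (`extract_sentence`), and the record keys `organization`, `position`, `expert` and `viewpoint` are names chosen for illustration; sentence splitting is assumed to have been done already.

```python
def extract_from_news(sentences, extract_sentence):
    """Process sentences in order, using a tenure dictionary to fill in missing positions.

    sentences: the news item already split into sentences.
    extract_sentence: hypothetical callable returning a dict with the keys
        'organization', 'position', 'expert', 'viewpoint' (values may be None).
    """
    tenure = {}   # expert name -> last position seen for that expert
    results = []  # one record per processed sentence
    for sentence in sentences:  # the loop ends when no unprocessed sentence remains
        record = dict(extract_sentence(sentence))  # copy so the extractor's dict is untouched
        expert, position = record.get("expert"), record.get("position")
        if expert is not None:
            if position is None:
                # Expert mentioned without a position: complete it from the dictionary.
                record["position"] = tenure.get(expert)
            else:
                # Expert mentioned with a position: remember it for later sentences.
                tenure[expert] = position
        results.append(record)
    return results
```

The point of the dictionary is that news text usually gives a person's full title once and then refers to them by name only, so later sentences can still be attributed to the right position.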
5. The method according to claim 4, characterized in that extracting at least one of organization information, position information, expert information and viewpoint information from the sentence comprises:
inputting the sentence to be processed;
performing the following processing on the sentence: word segmentation, part-of-speech tagging, named entity recognition, syntactic analysis;
loading a viewpoint trigger dictionary, and judging, based on the viewpoint trigger dictionary, whether the sentence contains a viewpoint trigger word;
if the sentence contains a viewpoint trigger word, finding the subject of the viewpoint trigger word and the modifiers of that subject according to the syntactic structure of the sentence, and taking the subject together with its modifiers as the target subject;
segmenting the target subject according to the named entity recognition result to obtain at least one of: organization information, position information, expert information; and extracting the viewpoint information from the part of the sentence following the viewpoint trigger word according to regular expressions and the syntactic structure.
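A heavily simplified Python sketch of the trigger-word step in claim 5 follows. It deliberately swaps the syntactic analysis for a position heuristic (everything before the trigger word is treated as the target subject) and swaps the regular-expression step for plain punctuation stripping. The trigger words, the tag names `ORG`, `TITLE`, `PER` and `O`, and the function name are all illustrative assumptions; segmentation and named entity recognition are assumed to be done upstream.

```python
TRIGGER_WORDS = {"said", "believes", "stated"}  # illustrative viewpoint trigger dictionary


def extract_sentence(tagged_tokens, triggers=TRIGGER_WORDS):
    """Extract holder fields and the viewpoint from one tagged sentence.

    tagged_tokens: list of (token, tag) pairs from an upstream segmenter and
        named entity recognizer; the tags 'ORG', 'TITLE', 'PER' and 'O' are
        assumed labels, not ones named in the source.
    Returns None when the sentence contains no trigger word.
    """
    tokens = [tok for tok, _ in tagged_tokens]
    idx = next((i for i, tok in enumerate(tokens) if tok in triggers), None)
    if idx is None:
        return None
    # Simplification: everything before the trigger is treated as the target
    # subject (subject plus modifiers); a real system would use the parse tree.
    subject = tagged_tokens[:idx]

    def pick(tag):
        words = [tok for tok, t in subject if t == tag]
        return " ".join(words) or None

    # Simplification: the text after the trigger is taken as the viewpoint.
    viewpoint = " ".join(tokens[idx + 1:]).strip(' ,:"') or None
    return {
        "organization": pick("ORG"),
        "position": pick("TITLE"),
        "expert": pick("PER"),
        "viewpoint": viewpoint,
    }
```

The heuristic fails on sentences where the holder follows the quotation, which is exactly the case the syntactic analysis in the claim is meant to handle.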
6. The method according to claim 1, characterized in that the importance of the holder is calculated as follows:
initializing the importance of the holder, and setting the number of news items published by the holder to 0;
within one statistical period, each time a news item containing the viewpoint holder information is processed, adding 1 to the number of news items published by the holder and adding 1 to the importance of the holder;
after the current statistical period ends, updating the importance of the holder and the number of news items published by the holder according to the following formula:
news_count = 0
wherein importance denotes the importance of the holder, and news_count denotes the number of news items published by the holder.
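The counting scheme of claim 6 can be sketched as a small Python class. The names `importance` and `news_count` come from the claim; the class name `HolderStats`, the initial importance of 0, and the `update_rule` parameter are assumptions. The end-of-period importance formula is not reproduced in the text above, so the sketch leaves it as a caller-supplied function rather than guessing it.

```python
class HolderStats:
    """Per-holder counters for one statistical period (field names follow the claim)."""

    def __init__(self):
        self.importance = 0  # initial value is an assumption; the claim only says "initialize"
        self.news_count = 0

    def on_news(self):
        """Called once for each processed news item that mentions this holder."""
        self.news_count += 1
        self.importance += 1

    def end_period(self, update_rule):
        """Close the current statistical period.

        update_rule(importance, news_count) -> new importance. It is a
        caller-supplied placeholder because the importance formula itself is
        not reproduced in the text; only `news_count = 0` is.
        """
        self.importance = update_rule(self.importance, self.news_count)
        self.news_count = 0
```

For example, `stats.end_period(lambda imp, n: imp)` keeps the importance unchanged and only resets the counter; any decay or normalization would go into the supplied rule.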
7. A device for processing news data, characterized in that the device comprises:
an extraction module, configured to obtain streaming news data and to extract viewpoint holder information and viewpoint information from the news data;
an alignment module, configured to perform entity alignment on the viewpoint holder information and store it in a holder database, and to store the viewpoint information in a viewpoint database, wherein the holder database records the importance of each holder;
an importance calculation module, configured to calculate and update the importance of the holder;
an analysis module, configured to establish an association between the news data and the viewpoint information, to retrieve news data according to a configured topic, and to determine, in the viewpoint database, the viewpoint information associated with the retrieved news data.
8. The device according to claim 7, characterized in that the analysis module is further configured to: cluster the plurality of retrieved news items to obtain a plurality of sub-topics; group the viewpoint information belonging to the same sub-topic into one cluster, to obtain a plurality of clusters of viewpoint information; extract keywords from each cluster of viewpoint information, and rank the keywords of each cluster according to the importance of the holders corresponding to the keywords.
9. The device according to claim 7, characterized in that the viewpoint holder information comprises at least one of: organization information, position information, expert information;
the holder database comprises at least one of: an organization database, a position database, an expert database.
10. The device according to claim 9, characterized in that the extraction module is configured to:
input the news data to be analyzed, and initialize a tenure dictionary and an array for saving results;
split the news data into sentences, and judge whether any unprocessed sentence remains;
if an unprocessed sentence remains, extract at least one of the following from the sentence: organization information, position information, expert information, viewpoint information;
judge whether the extracted expert information lacks corresponding position information; if so, complete the position information corresponding to the expert information according to the tenure dictionary; if not, update the tenure dictionary;
save the organization information, position information, expert information and viewpoint information corresponding to the sentence into the array for saving results.
11. The device according to claim 10, characterized in that the extraction module is configured to:
input the sentence to be processed;
perform the following processing on the sentence: word segmentation, part-of-speech tagging, named entity recognition, syntactic analysis;
load a viewpoint trigger dictionary, and judge, based on the viewpoint trigger dictionary, whether the sentence contains a viewpoint trigger word;
if the sentence contains a viewpoint trigger word, find the subject of the viewpoint trigger word and the modifiers of that subject according to the syntactic structure of the sentence, and take the subject together with its modifiers as the target subject;
segment the target subject according to the named entity recognition result to obtain at least one of: organization information, position information, expert information; and extract the viewpoint information from the part of the sentence following the viewpoint trigger word according to regular expressions and the syntactic structure.
12. The device according to claim 7, characterized in that the importance calculation module is configured to calculate the importance of the holder as follows:
initializing the importance of the holder, and setting the number of news items published by the holder to 0;
within one statistical period, each time a news item containing the viewpoint holder information is processed, adding 1 to the number of news items published by the holder and adding 1 to the importance of the holder;
after the current statistical period ends, updating the importance of the holder and the number of news items published by the holder according to the following formula:
news_count = 0
wherein importance denotes the importance of the holder, and news_count denotes the number of news items published by the holder.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910112919.2A CN109902230A (en) | 2019-02-13 | 2019-02-13 | A kind of processing method and processing device of news data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109902230A (en) | 2019-06-18 |
Family
ID=66944852
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910112919.2A Pending CN109902230A (en) | 2019-02-13 | 2019-02-13 | A kind of processing method and processing device of news data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109902230A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050125216A1 (en) * | 2003-12-05 | 2005-06-09 | Chitrapura Krishna P. | Extracting and grouping opinions from text documents |
CN102831192A (en) * | 2012-08-03 | 2012-12-19 | 人民搜索网络股份公司 | News searching device and method based on topics |
US20140089323A1 (en) * | 2012-09-21 | 2014-03-27 | Appinions Inc. | System and method for generating influencer scores |
CN103116644A (en) * | 2013-02-26 | 2013-05-22 | 华南理工大学 | Method for mining orientation of Web themes and supporting decisions |
CN104715014A (en) * | 2015-01-26 | 2015-06-17 | 中山大学 | Online news topic detection method |
CN108776652A (en) * | 2018-05-21 | 2018-11-09 | 众安信息技术服务有限公司 | A kind of forecast for market tendency method based on news corpus |
CN108984521A (en) * | 2018-06-20 | 2018-12-11 | 国家计算机网络与信息安全管理中心 | Personage's viewpoint abstracting method in a kind of media event |
Non-Patent Citations (1)
Title |
---|
SOO-MIN KIM ET.AL.: ""Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Medias Text"", 《TEXT:PROCEEDINGS OF THE ACL WORKSHOP ON SENTIMENT AND SUBJECTIVITY》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113139116A (en) * | 2020-01-19 | 2021-07-20 | 北京中科闻歌科技股份有限公司 | Method, device, equipment and storage medium for extracting media information viewpoints based on BERT |
CN113139116B (en) * | 2020-01-19 | 2024-03-01 | 北京中科闻歌科技股份有限公司 | BERT-based media information viewpoint extraction method, device, equipment and storage medium |
CN117540747A (en) * | 2024-01-09 | 2024-02-09 | 《全国新书目》杂志有限责任公司 | Book publishing intelligent question selecting system based on artificial intelligence |
CN117540747B (en) * | 2024-01-09 | 2024-04-16 | 《全国新书目》杂志有限责任公司 | Book publishing intelligent question selecting system based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190618 |