CN109902230A - A kind of processing method and processing device of news data - Google Patents

A method and device for processing news data

Info

Publication number
CN109902230A
Authority
CN
China
Prior art keywords
viewpoint
information
holder
news
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910112919.2A
Other languages
Chinese (zh)
Inventor
李建欣
闫昊
唐彬
包梦蛟
彭浩
邰振赢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Application filed by Beihang University
Priority to CN201910112919.2A
Publication of CN109902230A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a method and device for processing news data. The method comprises: obtaining streaming news data, and extracting viewpoint holder information and viewpoint information from the news data; performing entity alignment on the viewpoint holder information and storing it in a holder database; storing the viewpoint information in a viewpoint database, wherein the holder database records the importance of each holder; establishing an association between the news data and the viewpoint information; and retrieving news data according to a configured topic, and determining, based on the retrieved news data, the viewpoint information in the viewpoint database that is associated with that news data.

Description

A method and device for processing news data
Technical field
This application relates to information processing technology, and in particular to a method and device for processing news data.
Background
With the development of the Internet, information flows ever faster. News publishers such as news portals and self-media outlets, and the news they publish, keep growing in number, and the cycle of public opinion on hot events and key policies, from onset through outbreak to conclusion, is becoming shorter and shorter. How to grasp the development of public opinion promptly and accurately from a large volume of news reports has become a focus of attention.
The traditional approach relies on people reading reports about an event to trace its course, understand the viewpoints of all parties, organize and edit the raw material, and describe and analyze the situation around a hot event. Such description and analysis is accurate, but it is limited by the efficiency of manual processing and struggles to be timely and comprehensive. With the development and rise of modern artificial intelligence and natural language processing, many public opinion analysis systems have been developed that replace manual work with machines. Such systems can process massive amounts of text quickly, analyze and organize it, and extract valuable information.
By analyzing the viewpoints that experts and institutions express on certain public opinion events or key policies, one can not only understand the attitudes and positions of all parties, but also gain a comprehensive view of how public opinion on hot events and key policies is developing, and capture information that is timely, forward-looking, guiding, and representative, which helps decision-makers. Extracting and analyzing the viewpoints of experts and institutions is therefore of great significance.
Current systems still have many shortcomings in handling expert and institution viewpoints: the extracted information is neither accurate nor complete, there is no function for automatically evaluating the importance of experts and institutions, and it is difficult to backtrack viewpoints or perform association analysis on them.
Summary of the invention
To solve the above technical problems, the embodiments of this application provide a method and device for processing news data.
The method for processing news data provided by the embodiments of this application comprises:
Obtaining streaming news data, and extracting viewpoint holder information and viewpoint information from the news data;
Performing entity alignment on the viewpoint holder information and storing it in a holder database; storing the viewpoint information in a viewpoint database, wherein the holder database records the importance of the holder;
Establishing an association between the news data and the viewpoint information;
Retrieving news data according to a configured topic, and determining, based on the retrieved news data, the viewpoint information in the viewpoint database that is associated with the news data.
The device for processing news data provided by the embodiments of this application comprises:
An extraction module, configured to obtain streaming news data and extract viewpoint holder information and viewpoint information from the news data;
An alignment module, configured to perform entity alignment on the viewpoint holder information and store it in a holder database, and to store the viewpoint information in a viewpoint database, wherein the holder database records the importance of the holder;
An importance calculation module, configured to calculate and update the importance of the holder;
An analysis module, configured to establish an association between the news data and the viewpoint information, to retrieve news data according to a configured topic, and to determine, based on the retrieved news data, the viewpoint information in the viewpoint database that is associated with the news data.
The above technical solution of the embodiments of this application has the following effects:
1) Natural language processing techniques are applied so that elements such as experts, institutions, positions, and viewpoints can be extracted accurately from text, with high processing efficiency, high precision, and high recall.
2) Expert and institution databases are established, and an importance evaluation mechanism for experts, positions, and institutions is built on the number of news reports; this mechanism is reasonable and effective.
3) The viewpoints of experts and institutions under a topic can be backtracked; a clustering algorithm is used for association analysis, keywords are extracted from the viewpoints in each cluster, and ranking is performed according to holder importance. Clustering and keyword extraction work well, and ranking by importance reflects how important each viewpoint is.
4) Entity alignment and denoising are performed on experts and institutions, which reduces system error.
5) The system is highly extensible: the trigger dictionary and the entity alignment table can both be updated and upgraded.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for processing news data provided by an embodiment of this application;
Fig. 2 is a principle framework diagram provided by an embodiment of this application;
Fig. 3 is a flow diagram of the joint extraction method provided by an embodiment of this application;
Fig. 4 is a flow diagram of the joint extraction method applied to a single news item, provided by an embodiment of this application;
Fig. 5 is a structural diagram of the device for processing news data provided by an embodiment of this application.
Detailed description of the embodiments
Various exemplary embodiments of this application will now be described in detail with reference to the drawings. Note that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of this application.
It should also be understood that, for ease of description, the sizes of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits this application or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the specification.
Note that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
The embodiments of this application can be applied to electronic devices such as computer systems and servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with such electronic devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.
Electronic devices such as computer systems and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. In general, program modules may include routines, programs, object programs, components, logic, data structures, and so on, which perform specific tasks or implement specific abstract data types. The computer system or server can be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media that include storage devices.
Current public opinion analysis systems, first, have difficulty accurately extracting the viewpoints held by the experts and institutions mentioned in a text; second, have difficulty assessing the importance of experts and institutions; and, in addition, lack functions for backtracking and association analysis of expert and institution viewpoints. To address this, and in light of the characteristics of Chinese grammar, this application proposes a joint extraction method, based on viewpoint trigger words, for experts, institutions, and their viewpoints. On this basis it builds an expert database and an institution database, dynamically evaluates the importance of experts and institutions, and aligns and denoises expert and institution entities. It also builds a news index and a viewpoint database, which provide the basis for viewpoint backtracking, and establishes an analysis system for expert and institution viewpoints.
Fig. 1 is a flow diagram of the method for processing news data provided by an embodiment of this application. As shown in Fig. 1, the method comprises the following steps:
Step 101: Obtain streaming news data, and extract viewpoint holder information and viewpoint information from the news data.
In the embodiments of this application, streaming news data refers to news data obtained in real time in a streaming manner. The technical solution processes real-time news data in a streaming fashion; here, the processing of news data at least includes extracting viewpoint holder information and viewpoint information from the news data.
In the embodiments of this application, the viewpoint holder information extracted from the news data includes at least one of the following: institution information, position information, expert information.
Optionally, the viewpoint holder information includes institution information and expert information.
Optionally, the viewpoint holder information includes institution information, position information, and expert information.
Step 102: Perform entity alignment on the viewpoint holder information and store it in a holder database; store the viewpoint information in a viewpoint database, wherein the holder database records the importance of the holder.
In the embodiments of this application, the holder database includes at least one of the following: an institution database, a position database, an expert database.
The institution database stores institution information, the position database stores position information, and the expert database stores expert information.
In the embodiments of this application, the viewpoint holder information is stored in the corresponding holder database after entity alignment has been performed on it.
In the embodiments of this application, extracting viewpoint holder information and viewpoint information from the news data can be implemented by the following steps:
1) Input the news data to be analyzed, and initialize the tenure dictionary and the array used to save results;
2) Split the news data into sentences, and determine whether any unprocessed sentence remains;
3) If an unprocessed sentence remains, extract at least one of the following from the sentence: institution information, position information, expert information, viewpoint information;
4) Determine whether the extracted expert information lacks corresponding position information; if so, fill in the position information for the expert from the tenure dictionary; if not, update the tenure dictionary;
5) Save the institution information, position information, expert information, and viewpoint information for the sentence into the corresponding result array.
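Steps 1) to 5) can be sketched roughly as follows. This is a minimal illustration rather than the applicant's implementation: the function names, the record layout, and the punctuation-based sentence splitter are assumptions, and the sentence-level extractor is passed in as a hypothetical callback.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical record layout: organization / position / expert / viewpoint.
Record = Dict[str, Optional[str]]


def split_sentences(text: str) -> List[str]:
    """Simplified splitter: cut after Chinese or Western end punctuation."""
    parts = re.findall(r"[^。！？!?]+[。！？!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def extract_from_news(
    text: str,
    extract_sentence: Callable[[str], Optional[Record]],
) -> List[Record]:
    tenure: Dict[str, str] = {}   # step 1: tenure dictionary (expert -> position)
    results: List[Record] = []    # step 1: array for saving results
    for sentence in split_sentences(text):   # step 2: sentence by sentence
        record = extract_sentence(sentence)  # step 3: hypothetical extractor
        if record is None:
            continue
        expert = record.get("expert")
        if expert:
            if record.get("position") is None:
                # step 4, "yes" branch: fill the position from the dictionary
                record["position"] = tenure.get(expert)
            else:
                # step 4, "no" branch: remember the position for later sentences
                tenure[expert] = record["position"]
        results.append(record)               # step 5: save the result
    return results
```

The tenure dictionary is what lets a later sentence that names only the expert inherit the position stated in an earlier sentence of the same news item.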
Further, in the above scheme, extracting at least one of institution information, position information, expert information, and viewpoint information from the sentence can be implemented by the following steps:
3.1) Input the sentence to be processed;
3.2) Process the sentence as follows: word segmentation, part-of-speech tagging, named entity recognition, syntactic analysis;
3.3) Load the viewpoint trigger dictionary, and determine from it whether the sentence contains a viewpoint trigger word;
3.4) If the sentence contains a viewpoint trigger word, find the subject of the trigger word and the modifiers of that subject according to the syntactic structure of the sentence, and take the subject together with its modifiers as the target subject;
3.5) Split the target subject according to the named entity recognition results to obtain at least one of: institution information, position information, expert information; and extract the viewpoint information that follows the trigger word from the sentence using regular expressions and the syntactic structure.
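The following sketch illustrates steps 3.3) to 3.5) on a sentence that has already been segmented and tagged. It makes several simplifying assumptions that are not in the source: the trigger dictionary contents, the entity tag names (`ORG`, `TITLE`, `PER`, `O`), and, most importantly, the shortcut of treating everything before the trigger word as the target subject instead of using a real syntactic parse.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical trigger dictionary; a real one would be loaded from a file.
TRIGGER_WORDS = {"表示", "认为", "指出", "说"}

# (word, entity tag) pairs, as produced by an upstream segmenter and NER step.
Token = Tuple[str, str]

TAG_TO_FIELD = {"ORG": "organization", "TITLE": "position", "PER": "expert"}


def extract_from_tokens(tokens: List[Token]) -> Optional[Dict[str, Optional[str]]]:
    # Step 3.3: look for the first viewpoint trigger word.
    idx = next((i for i, (w, _) in enumerate(tokens) if w in TRIGGER_WORDS), None)
    if idx is None:
        return None
    # Step 3.4 (simplified): treat all tokens before the trigger as the
    # target subject; the source uses the syntactic structure instead.
    subject = tokens[:idx]
    record: Dict[str, Optional[str]] = {
        "organization": None, "position": None, "expert": None,
    }
    # Step 3.5: split the target subject by entity tag.
    for word, tag in subject:
        field = TAG_TO_FIELD.get(tag)
        if field and record[field] is None:
            record[field] = word
    # Step 3.5: the viewpoint is the text following the trigger word.
    viewpoint = "".join(w for w, _ in tokens[idx + 1:]).strip("，,:：“”\"。")
    record["viewpoint"] = viewpoint or None
    return record
```

For the tagged sentence 北京航空航天大学 / 教授 / 李明 / 表示 / ， / 人工智能 / 发展 / 迅速 / 。, the sketch returns the institution, position, and expert from the subject and the text after the trigger word as the viewpoint.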
In the embodiments of this application, the viewpoint information is stored in the viewpoint database, and the holder database records the importance of the holder.
In the above scheme, the importance of the holder is calculated as follows:
Initialize the importance of the holder, and set the number of news items published by the holder to 0;
Within a statistical period, each time a news item containing the viewpoint holder information is processed, add 1 to the number of news items published by the holder and add 1 to the importance of the holder;
After the current statistical period ends, update the importance of the holder and the number of news items published by the holder according to the following formula:
news_count = 0
Here, importance denotes the importance of the holder, and news_count denotes the number of news items published by the holder.
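The counting scheme above can be sketched as a small tracker. The end-of-period update of importance is not reproduced in this text, so the sketch takes it as a caller-supplied function; the class and method names, the initial importance of 0, and the halving rule used in the usage example are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class HolderStats:
    importance: float = 0.0  # assumed initial value
    news_count: int = 0


class ImportanceTracker:
    def __init__(self, period_update: Callable[[float, int], float]):
        # period_update(importance, news_count) -> new importance.
        # The actual formula is not given here, so it is injected.
        self.period_update = period_update
        self.stats: Dict[str, HolderStats] = {}

    def record_news(self, holder: str) -> None:
        """Called once per processed news item that mentions the holder."""
        s = self.stats.setdefault(holder, HolderStats())
        s.news_count += 1
        s.importance += 1

    def end_period(self) -> None:
        """Called when the statistical period ends."""
        for s in self.stats.values():
            s.importance = self.period_update(s.importance, s.news_count)
            s.news_count = 0  # news_count = 0, as stated in the text
```

A usage example with a placeholder decay rule: `ImportanceTracker(lambda imp, n: imp * 0.5)` would halve every holder's importance at the end of each period; any other rule can be plugged in the same way.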
Step 103: Establish an association between the news data and the viewpoint information; retrieve news data according to a configured topic, and determine, based on the retrieved news data, the viewpoint information in the viewpoint database that is associated with the news data.
Further, the technical solution of the embodiments of this application also comprises the following steps:
Step 104: Cluster the retrieved news items to obtain multiple subtopics; group the viewpoint information belonging to the same subtopic into one cluster, obtaining multiple clusters of viewpoint information.
Step 105: Extract the keywords of each cluster of viewpoint information from that cluster, and rank the keywords of each cluster according to the importance of the holders corresponding to the keywords.
The technical solution of the embodiments of this application is explained below with reference to the principle framework shown in Fig. 2. The framework is realized by the following basic modules:
Extraction module: provides joint extraction, based on viewpoint trigger words, of experts, institutions, and their viewpoints.
Importance calculation module: calculates and dynamically updates the importance of experts, positions, and institutions.
Alignment module: aligns and denoises expert and institution entities.
Analysis module: provides viewpoint backtracking and association analysis.
Referring to Fig. 2, for streaming news data on the Internet, the news data processing method proposed in the embodiments of this application extracts elements such as experts, institutions, positions, and viewpoints from the news data. After entity alignment, these are stored in the expert database, the institution database, and the viewpoint database. The expert database and the institution database record the importance of experts, institutions, and positions, which is updated periodically according to the number of news reports. The embodiments also save the news data and establish associations between it and the viewpoint database. According to the topic (keywords) and time window configured by the user, the news and the viewpoints it contains are retrieved, which realizes viewpoint backtracking. The news is then clustered to obtain subtopics, the viewpoints under the same subtopic are grouped into one cluster, and representative keywords are extracted with a keyword extraction algorithm. Finally, the viewpoints are ranked by the importance of their holders, which completes the analysis of expert and institution viewpoints. The technical solution of the embodiments is described in detail below.
(1) Joint extraction of experts, institutions, and their viewpoints based on viewpoint trigger words
Chinese text is expressed in many varied and rich ways, and there are many ways to express that a person said something. These can be divided into explicit and implicit expression. Implicit expression has no fixed syntactic structure and is mainly inferred from contextual semantics. News, however, usually uses explicit expression, that is, direct speech and indirect speech. Both have a fixed syntactic structure that contains a verb expressing a viewpoint, called the viewpoint trigger word. For a viewpoint trigger word, the expert, the institution, the viewpoint they hold, and the expert's tenure information can be extracted together according to Chinese syntactic structure. The joint extraction method proposed in this application, based on viewpoint trigger words, for experts, institutions, and their viewpoints is shown in Fig. 3. The flow for extracting from a single news item with this method is shown in Fig. 4.
(2) Calculation and dynamic updating of the importance of experts, positions, and institutions
Because different experts and institutions differ greatly in social influence, establishing an importance evaluation mechanism for experts and institutions is significant for viewpoint analysis. In addition, the importance of experts, positions, and institutions can be used for entity alignment and denoising. In this application, importance is calculated from the number of news reports. This basis is both simple and reasonable: the more news reports an expert or institution receives, the greater its influence and therefore the higher its importance. However, the influence of experts and institutions changes over time, and simply counting news reports is insensitive to such changes, so this application proposes an importance calculation method whose processing flow is as follows:
1. Initialize the importance (importance) and the number of news reports within the statistical period (news_count) to 0; in this application the statistical period is set to one month.
2. Within the statistical period, each time a news item containing the statistical object (expert, position, or institution) is processed, add one to the corresponding number of news reports and add one to the importance.
3. After the current statistical period ends, update the importance and the number of news reports as follows:
news_count = 0
Here, importance denotes the importance of the holder, and news_count denotes the number of news items published by the holder.
(3) Alignment and denoising of expert and institution entities
Entity alignment mainly solves the following three problems:
1. Multiple expert entities share the same name.
2. The same entity (expert or institution) has multiple names.
3. The same expert entity holds multiple positions.
For the first problem: because experts are confined to their professional fields, the probability that two experts in the same field share a name is very small. This application therefore assumes that no two experts under the same topic have the same name.
For the second problem: the system builds an entity alignment table. Before an entity is stored in the database, its name is replaced according to this table and unified to the most common name. The table supports updating and replacement, and its content is obtained from existing knowledge sources such as Baidu Baike.
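A lookup of this kind can be sketched as a plain dictionary from alias to canonical name. The table contents and the function name below are hypothetical; in the described system the entries would come from a knowledge source and be updatable.

```python
from typing import Dict

# Hypothetical alignment table: alias -> most common name.
ALIGNMENT_TABLE: Dict[str, str] = {
    "北航": "北京航空航天大学",
    "Beihang University": "北京航空航天大学",
}


def align_entity(name: str, table: Dict[str, str] = ALIGNMENT_TABLE) -> str:
    """Return the canonical name for an entity before it is stored.

    Names that are not in the table are returned unchanged.
    """
    cleaned = name.strip()
    return table.get(cleaned, cleaned)
```

Because the table is an ordinary mapping passed as a parameter, it can be replaced or extended without touching the lookup code, which matches the update and replacement requirement stated above.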
For the third problem: the system first obtains the set of positions with which the expert appears across all news under the same topic, and selects the position with the highest importance as the expert's position. If none of the news under the topic gives a position for the expert, the most important position of the expert with the same name is looked up in the expert database and used as the expert's position.
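The selection rule for positions can be sketched as below. The function and parameter names are hypothetical, and the sketch assumes that position importance is available as a simple mapping and that the expert database lookup has already produced a list of positions for the same-named expert.

```python
from typing import Dict, List, Optional


def choose_position(
    positions_in_topic: List[str],
    position_importance: Dict[str, float],
    expert_db_positions: List[str],
) -> Optional[str]:
    """Pick the most important position for an expert.

    Positions seen in news under the topic take priority; if there are
    none, fall back to the positions recorded in the expert database.
    """
    candidates = positions_in_topic or expert_db_positions
    if not candidates:
        return None
    # Positions with no recorded importance are treated as 0 (an assumption).
    return max(candidates, key=lambda p: position_importance.get(p, 0.0))
```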
Some news items interview ordinary people, small organizations, or small groups, which should not be regarded as experts or institutions, so the expert database and the institution database need to be denoised. While periodically updating importance, this application deletes expert and institution entities whose importance is too low from the databases, thereby achieving denoising.
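Denoising by importance can be sketched as a filter applied after each periodic update. The text does not say what counts as "too low", so the threshold below is an assumed parameter, and the in-memory dictionary stands in for the expert or institution database.

```python
from typing import Dict


def denoise(importance_by_entity: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only entities whose importance reaches the threshold.

    The threshold is an assumption; the source only says that entities
    with importance that is too low are deleted.
    """
    return {
        entity: importance
        for entity, importance in importance_by_entity.items()
        if importance >= threshold
    }
```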
(4) Backtracking and association analysis of viewpoints
Analyzing the viewpoints in the news on a topic over a period of time requires extracting the experts and institutions mentioned in the news and the viewpoints they hold. The volume of news may be large, however, and processing it at query time would take a long time. This application therefore processes each news item as it streams in, saves the viewpoints to the database, and associates them with the news, so that news can be retrieved by topic and the viewpoints backtracked from it.
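One way to picture the news-to-viewpoint association is a pair of tables joined on the news identifier. The sketch below uses an in-memory SQLite database; the schema, the function names, and the use of a substring match on the news text as the topic query are all assumptions, not details from the source.

```python
import sqlite3
from typing import List, Tuple


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE news (
            id INTEGER PRIMARY KEY, text TEXT, published TEXT);
        CREATE TABLE viewpoint (
            id INTEGER PRIMARY KEY,
            news_id INTEGER REFERENCES news(id),
            holder TEXT, content TEXT);
        """
    )
    return conn


def add_news(conn, text: str, published: str,
             viewpoints: List[Tuple[str, str]]) -> int:
    """Store one news item and the (holder, content) viewpoints extracted from it."""
    cur = conn.execute(
        "INSERT INTO news (text, published) VALUES (?, ?)", (text, published))
    news_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO viewpoint (news_id, holder, content) VALUES (?, ?, ?)",
        [(news_id, holder, content) for holder, content in viewpoints])
    return news_id


def backtrack(conn, keyword: str, start: str, end: str) -> List[Tuple[str, str]]:
    """Retrieve viewpoints of news matching a keyword inside a time window.

    Dates are ISO strings (YYYY-MM-DD), which compare correctly as text.
    """
    rows = conn.execute(
        "SELECT v.holder, v.content FROM viewpoint v "
        "JOIN news n ON n.id = v.news_id "
        "WHERE n.text LIKE ? AND n.published BETWEEN ? AND ? "
        "ORDER BY n.published, v.id",
        (f"%{keyword}%", start, end))
    return rows.fetchall()
```

Because viewpoints are extracted once at ingestion time, a topic query only has to run the join, not the extraction pipeline.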
A topic may contain several subtopics. This application first clusters the news under the topic to obtain the subtopics. The clustering algorithm for news text is as follows:
1. Compute the TF-IDF values for each news item.
2. Extract the 30 words with the highest TF-IDF values in the news item.
3. Take the average of the word vectors of these 30 words as the vector of the news item; the word vectors are pre-trained on a news corpus.
4. Cluster with k-means, determining the number of clusters according to how tightly the clusters aggregate.
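Steps 1 to 3 of the clustering algorithm can be sketched in plain Python as follows. The sketch assumes the news items are already segmented into word lists and that pre-trained word vectors are available as a dictionary; the TF-IDF variant used (relative term frequency times log(N / document frequency)) is one common choice and is an assumption, as are the function names.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def top_tfidf_words(docs: List[List[str]], k: int = 30) -> List[List[str]]:
    """For each segmented news item, return its k highest TF-IDF words."""
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    result = []
    for doc in docs:
        term_freq = Counter(doc)
        scores = {
            w: (count / len(doc)) * math.log(n_docs / doc_freq[w])
            for w, count in term_freq.items()
        }
        # Sort by score descending, then alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:k])
    return result


def news_vector(words: Sequence[str],
                embeddings: Dict[str, List[float]],
                dim: int) -> List[float]:
    """Average the word vectors of the selected words (step 3).

    Words without a pre-trained vector are skipped; if none are found,
    a zero vector is returned.
    """
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
```

The resulting news vectors can then be passed to any k-means implementation for step 4, trying several cluster counts and keeping the one with the tightest clusters.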
Viewpoints are divided into sets according to subtopic, and the keywords of each viewpoint set are extracted with the TextRank algorithm. Finally, the viewpoints are ranked by the importance of their holders (experts or institutions) to give the final analysis result.
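The final ranking step can be sketched as a sort within each subtopic cluster. The data layout (cluster id mapped to a list of holder and viewpoint pairs) and the default importance of 0 for unknown holders are assumptions; keyword extraction with TextRank is left out of this sketch.

```python
from typing import Dict, List, Tuple

Viewpoint = Tuple[str, str]  # (holder, viewpoint text)


def rank_clusters(
    clusters: Dict[int, List[Viewpoint]],
    importance: Dict[str, float],
) -> Dict[int, List[Viewpoint]]:
    """Sort the viewpoints of each cluster by holder importance, highest first.

    The sort is stable, so viewpoints from equally important holders
    keep their original order.
    """
    return {
        cluster_id: sorted(
            viewpoints, key=lambda vp: -importance.get(vp[0], 0.0))
        for cluster_id, viewpoints in clusters.items()
    }
```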
Fig. 5 is a structural diagram of the device for processing news data provided by an embodiment of this application. As shown in Fig. 5, the device comprises:
An extraction module 501, configured to obtain streaming news data and extract viewpoint holder information and viewpoint information from the news data;
An alignment module 502, configured to perform entity alignment on the viewpoint holder information and store it in a holder database, and to store the viewpoint information in a viewpoint database, wherein the holder database records the importance of the holder;
An importance calculation module 503, configured to calculate and update the importance of the holder;
An analysis module 504, configured to establish an association between the news data and the viewpoint information, to retrieve news data according to a configured topic, and to determine, based on the retrieved news data, the viewpoint information in the viewpoint database that is associated with the news data.
In one embodiment, the analysis module 504 is further configured to cluster the retrieved news items to obtain multiple subtopics; to group the viewpoint information belonging to the same subtopic into one cluster, obtaining multiple clusters of viewpoint information; and to extract the keywords of each cluster of viewpoint information from that cluster and rank the keywords of each cluster according to the importance of the holders corresponding to the keywords.
In one embodiment, the viewpoint holder information includes at least one of the following: institution information, position information, expert information;
The holder database includes at least one of the following: an institution database, a position database, an expert database.
In one embodiment, the extraction module 501 is configured to:
Input the news data to be analyzed, and initialize the tenure dictionary and the array used to save results;
Split the news data into sentences, and determine whether any unprocessed sentence remains;
If an unprocessed sentence remains, extract at least one of the following from the sentence: institution information, position information, expert information, viewpoint information;
Determine whether the extracted expert information lacks corresponding position information; if so, fill in the position information for the expert from the tenure dictionary; if not, update the tenure dictionary;
Save the institution information, position information, expert information, and viewpoint information for the sentence into the corresponding result array.
In one embodiment, the extraction module 501 is configured to:
Input the sentence to be processed;
Process the sentence as follows: word segmentation, part-of-speech tagging, named entity recognition, syntactic analysis;
Load the viewpoint trigger dictionary, and determine from it whether the sentence contains a viewpoint trigger word;
If the sentence contains a viewpoint trigger word, find the subject of the trigger word and the modifiers of that subject according to the syntactic structure of the sentence, and take the subject together with its modifiers as the target subject;
Split the target subject according to the named entity recognition results to obtain at least one of: institution information, position information, expert information; and extract the viewpoint information that follows the trigger word from the sentence using regular expressions and the syntactic structure.
In one embodiment, the importance calculation module 503 is configured to calculate the importance of the holder as follows:
Initialize the importance of the holder, and set the number of news items published by the holder to 0;
Within a statistical period, each time a news item containing the viewpoint holder information is processed, add 1 to the number of news items published by the holder and add 1 to the importance of the holder;
After the current statistical period ends, update the importance of the holder and the number of news items published by the holder according to the following formula:
news_count = 0
Here, importance denotes the importance of the holder, and news_count denotes the number of news items published by the holder.
Those skilled in the art will understand that the functions of the modules in the news data processing device shown in Fig. 5 can be understood with reference to the foregoing description of the news data processing method. The function of each module may be implemented by a program running on a processor, or by a specific logic circuit.
The description of the present invention is given for purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and design various embodiments, with various modifications, suited to particular uses.

Claims (12)

1. A method for processing news data, characterized in that the method comprises:
Obtaining streaming news data, and extracting viewpoint holder information and viewpoint information from the news data;
Performing entity alignment on the viewpoint holder information and storing it in a holder database; storing the viewpoint information in a viewpoint database, wherein the importance of the holder is recorded in the holder database;
Establishing an association between the news data and the viewpoint information;
Retrieving news data according to a configured topic, and determining, in the viewpoint database, the viewpoint information associated with the retrieved news data.
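For illustration only, the association and topic-based retrieval steps of claim 1 might be sketched as below. The in-memory class `ViewpointStore`, its method names, and the substring match used for retrieval are all assumptions made for this sketch; the text does not specify the storage or retrieval technique.

```python
from collections import defaultdict


class ViewpointStore:
    """Toy in-memory stand-in for the news store and viewpoint database."""

    def __init__(self):
        self.news = {}                    # news_id -> news text
        self.by_news = defaultdict(list)  # news_id -> [(holder, viewpoint)]

    def add(self, news_id, text, extracted):
        """Store a news item and associate its extracted viewpoints with it."""
        self.news[news_id] = text
        self.by_news[news_id].extend(extracted)

    def search(self, topic):
        """Naive retrieval (an assumption): substring match on the text."""
        return [nid for nid, text in self.news.items() if topic in text]

    def viewpoints_for_topic(self, topic):
        """Collect the viewpoints associated with every retrieved item."""
        return [vp for nid in self.search(topic) for vp in self.by_news[nid]]
```

A real system would likely replace `search` with a full-text index; the point of the sketch is only that viewpoints are reached through the news items they were extracted from.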
2. The method according to claim 1, characterized in that the method further comprises:
Clustering the plurality of retrieved news data items to obtain a plurality of sub-topics;
Grouping the viewpoint information of the same sub-topic into one cluster to obtain a plurality of clusters of viewpoint information;
Extracting keywords from each cluster of viewpoint information, and sorting the keywords of each cluster according to the importance of the holder corresponding to each keyword.
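As a hedged illustration of the final sorting step of claim 2, the sketch below assumes that clustering and keyword extraction have already produced `(sub_topic, holder, keyword)` records. The function name `rank_keywords`, the record layout, and the choice to score a keyword by its most important holder are assumptions, not details taken from the text.

```python
from collections import defaultdict


def rank_keywords(records, importance):
    """Sort each sub-topic's keywords by holder importance, highest first.

    records: iterable of (sub_topic, holder, keyword) tuples.
    importance: dict mapping holder -> importance score.
    """
    clusters = defaultdict(dict)  # sub_topic -> {keyword: best score seen}
    for sub_topic, holder, keyword in records:
        score = importance.get(holder, 0)
        best = clusters[sub_topic].get(keyword, float("-inf"))
        # Assumption: a keyword shared by several holders takes the highest.
        clusters[sub_topic][keyword] = max(best, score)
    return {
        topic: sorted(kws, key=lambda k: (-kws[k], k))  # ties broken by name
        for topic, kws in clusters.items()
    }
```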
3. The method according to claim 1, characterized in that the viewpoint holder information includes at least one of the following: organization information, position information, expert information;
The holder database includes at least one of the following: an organization database, a position database, an expert database.
4. The method according to claim 3, characterized in that extracting viewpoint holder information and viewpoint information from the news data comprises:
Inputting the news data to be analyzed, and initializing a tenure dictionary and an array for saving results;
Splitting the news data into sentences, and judging whether there is an unprocessed sentence;
If there is an unprocessed sentence, extracting at least one of the following from the sentence: organization information, position information, expert information, viewpoint information;
Judging whether the extracted expert information lacks corresponding position information; if so, completing the position information corresponding to the expert information according to the tenure dictionary; if not, updating the tenure dictionary;
Saving the organization information, position information, expert information and viewpoint information corresponding to the sentence into the corresponding array for saving results.
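The loop in claim 4 can be sketched as follows, purely as an illustration. The function `process_news`, the dictionary keys, and the punctuation-based sentence splitter are assumptions; the per-sentence extractor is supplied by the caller as `extract_sentence`, since its details belong to a separate step.

```python
import re


def process_news(text, extract_sentence):
    """Extract per-sentence records, completing positions from a tenure dict.

    extract_sentence: callable returning a dict with the (assumed) keys
    'organization', 'position', 'expert', 'viewpoint', or None if the
    sentence carries no viewpoint.
    """
    tenure = {}   # expert name -> most recently seen position
    results = []  # array for saving results
    # Naive splitter (an assumption): break after sentence-final punctuation.
    sentences = [s.strip() for s in re.findall(r"[^.!?。！？]+[.!?。！？]?", text)]
    for sentence in sentences:
        info = extract_sentence(sentence) if sentence else None
        if info is None:
            continue
        expert, position = info.get("expert"), info.get("position")
        if expert:
            if not position:
                # Expert mentioned without a title: complete it if known.
                info["position"] = tenure.get(expert)
            else:
                # Expert mentioned with a title: update the tenure dictionary.
                tenure[expert] = position
        results.append(info)
    return results
```

The tenure dictionary lets a later sentence that names only the expert inherit the position stated in an earlier sentence of the same news item.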
5. The method according to claim 4, characterized in that extracting at least one of the following from the sentence: organization information, position information, expert information, viewpoint information, comprises:
Inputting the sentence to be processed;
Processing the sentence as follows: word segmentation, part-of-speech tagging, named entity recognition, syntactic analysis;
Loading a viewpoint trigger dictionary, and judging, based on the viewpoint trigger dictionary, whether the sentence contains a viewpoint trigger word;
If the sentence contains a viewpoint trigger word, finding the subject of the viewpoint trigger word and the modifiers of the subject according to the syntactic structure of the sentence, and taking the subject of the viewpoint trigger word together with its modifiers as the target subject;
Segmenting the target subject according to the named entity recognition results to obtain at least one of the following: organization information, position information, expert information; and extracting the viewpoint information that follows the viewpoint trigger word from the sentence according to regular expressions and the syntactic structure.
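A much simplified sketch of the trigger-word step in claim 5 is given below for illustration. It deliberately replaces word segmentation, part-of-speech tagging, named entity recognition and syntactic analysis with plain regular expressions over English text, so it shows only the overall shape of the step. The lists `TRIGGERS` and `POSITIONS` and the function `extract` are hypothetical.

```python
import re

TRIGGERS = ("said", "believes", "pointed out", "stated")  # hypothetical trigger dictionary
POSITIONS = ("director", "professor", "spokesperson")     # hypothetical position lexicon

_PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+(?P<trigger>"
    + "|".join(map(re.escape, TRIGGERS))
    + r")\s+(?:that\s+)?(?P<viewpoint>.+?)[.。]?$"
)


def extract(sentence):
    """Return holder parts and the viewpoint, or None if no trigger word."""
    m = _PATTERN.match(sentence.strip())
    if m is None:
        return None
    subject = m.group("subject")  # stands in for the parsed target subject
    org, position, expert = None, None, subject
    for title in POSITIONS:
        found = re.search(rf"\b{re.escape(title)}\b", subject, flags=re.IGNORECASE)
        if found:
            # Split the target subject around the position word.
            org = subject[:found.start()].strip() or None
            position = found.group(0)
            expert = subject[found.end():].strip() or None
            break
    return {"organization": org, "position": position, "expert": expert,
            "trigger": m.group("trigger"), "viewpoint": m.group("viewpoint")}
```

In the claimed method the subject and its modifiers come from the syntactic structure and are segmented using named entity recognition; the lexicon lookup here is only a stand-in for that.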
6. The method according to claim 1, characterized in that the importance of the holder is calculated as follows:
Initializing the importance of the holder, and setting the number of news items published by the holder to 0;
Within one statistical period, each time a news item containing the viewpoint holder information is processed, increasing the number of news items published by the holder by 1, and increasing the importance of the holder by 1;
After the current statistical period ends, updating the importance of the holder and the number of news items published by the holder according to the following formula:
news_count = 0
Wherein, importance denotes the importance of the holder, and news_count denotes the number of news items published by the holder.
7. A device for processing news data, characterized in that the device comprises:
An extraction module, configured to obtain streaming news data and extract viewpoint holder information and viewpoint information from the news data;
An alignment module, configured to perform entity alignment on the viewpoint holder information and store it in a holder database, and to store the viewpoint information in a viewpoint database, wherein the importance of the holder is recorded in the holder database;
An importance calculation module, configured to calculate and update the importance of the holder;
An analysis module, configured to establish an association between the news data and the viewpoint information, to retrieve news data according to a configured topic, and to determine, in the viewpoint database, the viewpoint information associated with the retrieved news data.
8. The device according to claim 7, characterized in that the analysis module is further configured to cluster the plurality of retrieved news data items to obtain a plurality of sub-topics; to group the viewpoint information of the same sub-topic into one cluster to obtain a plurality of clusters of viewpoint information; and to extract keywords from each cluster of viewpoint information and sort the keywords of each cluster according to the importance of the holder corresponding to each keyword.
9. The device according to claim 7, characterized in that the viewpoint holder information includes at least one of the following: organization information, position information, expert information;
The holder database includes at least one of the following: an organization database, a position database, an expert database.
10. The device according to claim 9, characterized in that the extraction module is configured to:
Input the news data to be analyzed, and initialize a tenure dictionary and an array for saving results;
Split the news data into sentences, and judge whether there is an unprocessed sentence;
If there is an unprocessed sentence, extract at least one of the following from the sentence: organization information, position information, expert information, viewpoint information;
Judge whether the extracted expert information lacks corresponding position information; if so, complete the position information corresponding to the expert information according to the tenure dictionary; if not, update the tenure dictionary;
Save the organization information, position information, expert information and viewpoint information corresponding to the sentence into the corresponding array for saving results.
11. The device according to claim 10, characterized in that the extraction module is configured to:
Input the sentence to be processed;
Process the sentence as follows: word segmentation, part-of-speech tagging, named entity recognition, syntactic analysis;
Load a viewpoint trigger dictionary, and judge, based on the viewpoint trigger dictionary, whether the sentence contains a viewpoint trigger word;
If the sentence contains a viewpoint trigger word, find the subject of the viewpoint trigger word and the modifiers of the subject according to the syntactic structure of the sentence, and take the subject of the viewpoint trigger word together with its modifiers as the target subject;
Segment the target subject according to the named entity recognition results to obtain at least one of the following: organization information, position information, expert information; and extract the viewpoint information that follows the viewpoint trigger word from the sentence according to regular expressions and the syntactic structure.
12. The device according to claim 7, characterized in that the importance calculation module is configured to calculate the importance of the holder as follows:
Initializing the importance of the holder, and setting the number of news items published by the holder to 0;
Within one statistical period, each time a news item containing the viewpoint holder information is processed, increasing the number of news items published by the holder by 1, and increasing the importance of the holder by 1;
After the current statistical period ends, updating the importance of the holder and the number of news items published by the holder according to the following formula:
news_count = 0
Wherein, importance denotes the importance of the holder, and news_count denotes the number of news items published by the holder.
CN201910112919.2A 2019-02-13 2019-02-13 A kind of processing method and processing device of news data Pending CN109902230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910112919.2A CN109902230A (en) 2019-02-13 2019-02-13 A kind of processing method and processing device of news data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910112919.2A CN109902230A (en) 2019-02-13 2019-02-13 A kind of processing method and processing device of news data

Publications (1)

Publication Number Publication Date
CN109902230A true CN109902230A (en) 2019-06-18

Family

ID=66944852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910112919.2A Pending CN109902230A (en) 2019-02-13 2019-02-13 A kind of processing method and processing device of news data

Country Status (1)

Country Link
CN (1) CN109902230A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139116A (en) * 2020-01-19 2021-07-20 北京中科闻歌科技股份有限公司 Method, device, equipment and storage medium for extracting media information viewpoints based on BERT
CN117540747A (en) * 2024-01-09 2024-02-09 《全国新书目》杂志有限责任公司 Book publishing intelligent question selecting system based on artificial intelligence

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050125216A1 (en) * 2003-12-05 2005-06-09 Chitrapura Krishna P. Extracting and grouping opinions from text documents
CN102831192A (en) * 2012-08-03 2012-12-19 人民搜索网络股份公司 News searching device and method based on topics
CN103116644A (en) * 2013-02-26 2013-05-22 华南理工大学 Method for mining orientation of Web themes and supporting decisions
US20140089323A1 (en) * 2012-09-21 2014-03-27 Appinions Inc. System and method for generating influencer scores
CN104715014A (en) * 2015-01-26 2015-06-17 中山大学 Online news topic detection method
CN108776652A (en) * 2018-05-21 2018-11-09 众安信息技术服务有限公司 A kind of forecast for market tendency method based on news corpus
CN108984521A (en) * 2018-06-20 2018-12-11 国家计算机网络与信息安全管理中心 Personage's viewpoint abstracting method in a kind of media event

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SOO-MIN KIM ET.AL.: ""Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Medias Text"", 《TEXT:PROCEEDINGS OF THE ACL WORKSHOP ON SENTIMENT AND SUBJECTIVITY》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139116A (en) * 2020-01-19 2021-07-20 北京中科闻歌科技股份有限公司 Method, device, equipment and storage medium for extracting media information viewpoints based on BERT
CN113139116B (en) * 2020-01-19 2024-03-01 北京中科闻歌科技股份有限公司 BERT-based media information viewpoint extraction method, device, equipment and storage medium
CN117540747A (en) * 2024-01-09 2024-02-09 《全国新书目》杂志有限责任公司 Book publishing intelligent question selecting system based on artificial intelligence
CN117540747B (en) * 2024-01-09 2024-04-16 《全国新书目》杂志有限责任公司 Book publishing intelligent question selecting system based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190618