Embodiment
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flow chart of an embodiment of the public sentiment monitoring and analysis method provided by the present invention. As shown in Fig. 1, the method includes:
101. Obtain public sentiment data from the website servers on the Internet, where the public sentiment data includes at least one item of asset information.
The public sentiment monitoring and analysis method provided by the present invention may be performed by a public sentiment monitoring and analysis device, which may specifically take the form of a data center system. The data center system may be located on a server on the Internet. It obtains the asset information corresponding to the public sentiment data that acquisition systems deployed on the Internet collect from the website servers, and sends to each client the target public sentiment data that matches that client's application condition. The website servers may specifically be servers of news sites, forums, blogs, microblogs and the like.
Public sentiment data is opinion data that spreads rapidly on the Internet, forms public opinion and develops into public sentiment. It is data in which large numbers of Internet users express, through clients, their emotions, attitudes and opinions about certain events. Each item of asset information in the public sentiment data may include body text and additional information, where the additional information includes at least one of the following parameters: website, channel, publication time, click count and reply count. The website and channel are the website that hosts the public sentiment data a client, acting as data applicant, requests from the data center system, and the channel to which the requested data belongs. That is, client A may apply for the public sentiment data of certain channels of certain websites, and the data center system may push the data applied for to client A. In addition, the public sentiment data is divided into main-post directory files and reply-post directory files, both of which are generated by the website servers from the posted information. A main-post directory file holds the main posts of a news site or forum, and a reply-post directory file holds the replies to the main posts of a forum.
Specifically, after the data center system has collected the public sentiment data of the website servers on the Internet, it saves the data as files in a standard txt format in a database of the data center system, such as ZooKeeper, MapReduce, HDFS, HBase and Hadoop Core databases. A data upload program in the data center system then parses the txt files: using the markers that correspond to body text, website and channel, it extracts the body text, website and channel from each txt file and thereby obtains at least one item of asset information. An item of asset information may include a title, body text, publication time, author, channel, website, click count, reply count and so on. An item of asset information may be one unit of metadata on the Internet, such as a news article; a news article published on the Internet has attributes such as title, body text, time and author. The data upload program parses out these parameters and stores them in the database of the data center system.
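The extraction step above can be illustrated with a minimal sketch. The source does not specify the layout of the txt files, so the `key: value` line format, the `AssetInfo` class and all field names below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class AssetInfo:
    # Hypothetical field names mirroring the attributes listed in the text.
    title: str = ""
    text: str = ""
    publish_time: str = ""
    author: str = ""
    channel: str = ""
    website: str = ""
    clicks: int = 0
    replies: int = 0


INT_FIELDS = {"clicks", "replies"}


def parse_asset(raw: str) -> AssetInfo:
    """Parse one record of 'key: value' lines (assumed layout) into an AssetInfo."""
    asset = AssetInfo()
    for line in raw.splitlines():
        # Split on the first colon only, so values such as times keep theirs.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not hasattr(asset, key):
            continue  # ignore lines that carry no known marker
        setattr(asset, key, int(value) if key in INT_FIELDS else value)
    return asset


sample = (
    "title: Taxi fares\n"
    "website: news.example\n"
    "channel: local\n"
    "publish_time: 2014-05-01 09:30\n"
    "clicks: 120\n"
    "replies: 7\n"
    "text: Fares may rise."
)
asset = parse_asset(sample)
```

A real parser would follow whatever markers the acquisition systems actually emit; only the idea of mapping markers to fields is taken from the text.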
The parsing thread with which the data upload program parses the txt files is shown in Fig. 2 and proceeds as follows. The thread reads the main-post directory and obtains each file in it in turn. It compares each file with the read-file queue of the data center system to determine whether the file has already been read; the names of files that have been read are kept in the read-file queue, and the names of files that have not been read are saved in the unread-file queue. The thread parses each file in the unread-file queue, stores the resulting asset information in the main-post memory queue (readAssetsMap) of the data center system, and removes the name of the corresponding file from the unread-file queue. When the number of items of asset information in the main-post memory queue exceeds a preset first threshold, the asset information in the main-post memory queue is screened and distributed to the clients, and the main-post memory queue is emptied. The thread then continues to parse the files in the unread-file queue and to store the resulting asset information in the main-post memory queue until every file in the unread-file queue has been parsed. While the main-post directory is being parsed, other threads may be started to parse the corresponding reply-post directory at the same time. readAssetsMap holds key-value pairs in which the key is a file name and the value is the asset information.
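The parsing loop described above might look roughly like the following single-threaded sketch. The function name, its parameters and the choice to mark a file as read immediately after parsing are assumptions; the source only fixes the queues, the threshold and the key-value shape of readAssetsMap.

```python
from collections import deque


def run_parse_pass(dir_files, read_files, parse, distribute, threshold):
    """Parse every unread file of one main-post directory.

    dir_files:  file names currently in the directory
    read_files: set of names already parsed (stand-in for the read-file queue)
    parse:      callable mapping a file name to its asset information
    distribute: callable that screens and sends one batch {file name: asset}
    threshold:  the preset first threshold for the memory queue
    """
    unread = deque(name for name in dir_files if name not in read_files)
    read_assets_map = {}  # key: file name, value: asset information
    while unread:
        name = unread.popleft()  # remove the name from the unread-file queue
        read_assets_map[name] = parse(name)
        read_files.add(name)  # assumption: mark as read right after parsing
        if len(read_assets_map) > threshold:
            distribute(dict(read_assets_map))  # hand the batch to screening
            read_assets_map.clear()  # empty the memory queue
    return read_assets_map  # items left below the threshold


batches = []
already_read = {"a.txt"}
leftover = run_parse_pass(
    ["a.txt", "b.txt", "c.txt", "d.txt"],
    already_read,
    parse=str.upper,  # placeholder for real parsing
    distribute=batches.append,
    threshold=2,
)
```

In the described system the reply-post directory would be handled by a second thread running the same loop over its own queues.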
102. Receive an acquisition request sent by a client, where the acquisition request carries an application condition and a client identifier.
When the asset information includes body text and additional information, and the additional information includes at least one of the parameters website, channel, publication time, click count and reply count, the data center system may, before receiving the application condition sent by the client, send the additional information of each item of asset information to the client, so that the client can determine the parameters of its application condition from that additional information. For example, when the additional information includes website and channel, the application condition may include a website and/or a channel. When the additional information includes website, channel, publication time, click count and reply count, the application condition may include website, channel, publication time, click count and reply count. The website in the application condition may be the address or name of at least one website server, the publication time may be a point in time or a time period, and the click count and reply count may be numeric ranges.
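One possible way to represent and evaluate such an application condition is sketched below. The `Condition` class, its field names and the convention that an unset parameter matches everything are illustrative assumptions, not part of the described method.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Set, Tuple


@dataclass
class Condition:
    # Every parameter is optional; None means "no restriction" (assumption).
    websites: Optional[Set[str]] = None
    channels: Optional[Set[str]] = None
    time_range: Optional[Tuple[datetime, datetime]] = None  # point: start == end
    clicks_range: Optional[Tuple[int, int]] = None
    replies_range: Optional[Tuple[int, int]] = None


def in_range(value, bounds) -> bool:
    """True if bounds is unset or value lies within the inclusive bounds."""
    return bounds is None or bounds[0] <= value <= bounds[1]


def matches(asset: dict, cond: Condition) -> bool:
    """Check one item of asset information against an application condition."""
    return (
        (cond.websites is None or asset["website"] in cond.websites)
        and (cond.channels is None or asset["channel"] in cond.channels)
        and in_range(asset["publish_time"], cond.time_range)
        and in_range(asset["clicks"], cond.clicks_range)
        and in_range(asset["replies"], cond.replies_range)
    )


item = {
    "website": "news.example",
    "channel": "local",
    "publish_time": datetime(2014, 5, 1, 9, 0),
    "clicks": 120,
    "replies": 7,
}
wanted = Condition(
    websites={"news.example"},
    time_range=(datetime(2014, 5, 1), datetime(2014, 5, 2)),
    clicks_range=(100, 1000),
)
```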
Optionally, if the data center system has not received an acquisition request from a client, it may send the client the public sentiment data that matches hot websites and/or hot channels. It may also send the client the public sentiment data whose click count or reply count exceeds a preset number. Optionally, a hot website is a website for which the number of client applications exceeds a preset website threshold, and a hot channel is a channel for which the number of client applications exceeds a preset channel threshold.
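As a rough illustration of how hot websites and hot channels could be derived from past applications, the sketch below counts how often each value was requested. The request layout (a dict with `websites` and `channels` lists) and the function name are hypothetical.

```python
from collections import Counter


def hot_items(requests, key, threshold):
    """Return the values of `key` that were applied for more than `threshold` times."""
    counts = Counter(value for req in requests for value in req.get(key, ()))
    return {value for value, n in counts.items() if n > threshold}


# Hypothetical history of client applications.
requests = [
    {"websites": ["news.example"], "channels": ["finance"]},
    {"websites": ["news.example", "forum.example"], "channels": ["finance"]},
    {"websites": ["news.example"], "channels": ["sports"]},
]
hot_sites = hot_items(requests, "websites", threshold=2)
hot_channels = hot_items(requests, "channels", threshold=1)
```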
103. Screen the asset information included in the public sentiment data according to the application condition to obtain the target public sentiment data that matches the application condition.
Further, to reduce wasted network resources, the data center system may also filter the asset information and thereby reduce the amount of target public sentiment data. Before step 103, the data center system may therefore judge, according to preset keywords, whether an item of asset information is an advertisement, and filter the item out when it is.
Specifically, a dictionary containing the preset keywords may be configured in the data center system. The data center system may judge whether a preset keyword is present in the title or body text of an item of asset information; if a preset keyword is present in the body text, the item is determined to be an advertisement, and items determined to be advertisements are filtered out.
In addition, the asset information may also include a title and an author. The data center system may determine an item of asset information to be an advertisement when a preset keyword is present in its body text, title or author, and filter out the items so determined.
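The keyword-based advertisement filter can be sketched as follows. The keyword set, the field names and the case-insensitive substring match are assumptions chosen for illustration; the source only states that a preset dictionary is checked against the body text, title or author.

```python
# Hypothetical dictionary of preset keywords.
AD_KEYWORDS = {"discount", "free shipping", "buy now"}


def is_advertisement(asset, keywords=AD_KEYWORDS, fields=("text", "title", "author")):
    """True if any preset keyword occurs in any of the checked fields."""
    return any(
        keyword in asset.get(field, "").lower()
        for field in fields
        for keyword in keywords
    )


def drop_advertisements(assets):
    """Keep only the items that are not classified as advertisements."""
    return [asset for asset in assets if not is_advertisement(asset)]


items = [
    {"title": "Taxi fares to rise", "text": "The city council met today.", "author": "reporter"},
    {"title": "Huge DISCOUNT", "text": "Limited offer.", "author": "shop"},
    {"title": "Weekend notes", "text": "Nothing special.", "author": "Buy Now Store"},
]
kept = drop_advertisements(items)
```

Each field is checked separately so that a keyword cannot be matched across the boundary between two fields.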
The data center system may adjust the preset keywords in the database according to sensitivity recognition rules, hot-word discovery rules, entity recognition rules, sensitive-word recommendations and the like. The preset keywords may be stored in the database by category, such as intelligent keywords and intelligent summaries. The data center system may retrieve the asset information that contains preset keywords by means of hybrid retrieval, similarity retrieval, range retrieval, homophone and synonym retrieval and the like.
Further, before step 103, the data center system may query the at least one item of asset information in the public sentiment data according to the application condition, determine the target asset information that meets the application condition, and set the client identifier as the identifier corresponding to that target asset information.
Correspondingly, step 103 may specifically be: screen the asset information included in the public sentiment data according to the client identifier to obtain the target public sentiment data that matches the application condition.
Specifically, when the application condition includes parameters such as website, channel, publication time, click count and reply count, the intelligent analysis service in the data center system may label the asset information according to the feature terms in the application condition sent by the client, such as website, channel, publication time, click count and reply count. For example, if a client has configured certain terms, the intelligent analysis service may check whether an item of asset information contains those terms, that is, whether the item meets the client's application condition. If it does, the item is tagged with the client's identifier. The screening program then uses the identifiers attached to the item to decide which clients it should be sent to and stores the item in the directory corresponding to each such client, which completes the screening of that item.
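The tagging and routing described above can be sketched in two small functions. The in-memory dictionary standing in for the per-client directories, the `clients` key added to each item and the plain substring match on configured terms are all assumptions for illustration.

```python
from collections import defaultdict


def tag_assets(assets, client_terms):
    """Tag each item with the identifiers of clients whose terms it contains."""
    for asset in assets:
        asset["clients"] = [
            client_id
            for client_id, terms in client_terms.items()
            if any(term in asset["text"] for term in terms)
        ]
    return assets


def route(assets):
    """Group tagged items by client identifier (stand-in for client directories)."""
    outbox = defaultdict(list)
    for asset in assets:
        for client_id in asset["clients"]:
            outbox[client_id].append(asset["title"])
    return dict(outbox)


# Hypothetical terms configured by three clients.
client_terms = {"A": ["taxi"], "B": ["metro", "taxi"], "C": ["airline"]}
posts = [
    {"title": "t1", "text": "taxi fares may rise"},
    {"title": "t2", "text": "new metro line opens"},
]
outbox = route(tag_assets(posts, client_terms))
```

Separating tagging from routing mirrors the split in the text between the intelligent analysis service, which attaches identifiers, and the screening program, which only reads them.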
The screening thread with which the data upload program screens the asset information is shown in Fig. 3 and proceeds as follows.
The thread reads the asset information in the main-post memory queue and the reply-post memory queue. It first judges whether the main-post memory queue contains asset information. If it does, the thread takes a fixed number of items from the main-post memory queue and determines whether the intelligent analysis service needs to be called, that is, whether every one of those items already corresponds to a specific client identifier. If an item does not correspond to a specific client identifier, the intelligent analysis service is called to determine which clients need the item, based on each client's application condition and the item's website, channel, publication time, click count and reply count. The thread then loops over the fixed number of items, saves each item to the main-post queue of every client that needs it according to the item's client identifiers, deletes the item from the main-post memory queue, and deletes the corresponding file name from the read-file queue of the parsing thread. If the main-post memory queue contains no asset information, the thread judges whether the reply-post memory queue contains asset information. If it does, the asset information in the reply-post memory queue is processed in a similar way: each item is saved to the reply-post queue of every client that needs it, deleted from the reply-post memory queue, and its file name deleted from the read-file queue of the parsing thread. If the reply-post memory queue contains no asset information either, the thread waits for the parsing thread to store asset information in the main-post or reply-post memory queue and then resumes reading and processing the two queues. The data upload program of the data center system may start multiple screening threads (distUserThread) to process multiple main-post and reply-post memory queues at the same time.
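The queue-selection rule of the screening thread (main posts first, then replies, a fixed number of items at a time) can be sketched as below. The function name and the use of `collections.deque` are assumptions; waiting for the parsing thread and the actual multi-threading are left out.

```python
from collections import deque


def next_batch(main_queue, reply_queue, batch_size):
    """Take up to batch_size items, preferring the main-post memory queue.

    Returns (kind, items); an empty list means both queues were empty and
    the caller should wait for the parsing thread to add more items.
    """
    source, kind = (main_queue, "main") if main_queue else (reply_queue, "reply")
    count = min(batch_size, len(source))
    items = [source.popleft() for _ in range(count)]  # removes them from the queue
    return kind, items


main_queue = deque(["m1", "m2", "m3"])
reply_queue = deque(["r1"])
first = next_batch(main_queue, reply_queue, batch_size=2)
second = next_batch(main_queue, reply_queue, batch_size=2)
third = next_batch(main_queue, reply_queue, batch_size=2)
fourth = next_batch(main_queue, reply_queue, batch_size=2)
```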
104. Send the target public sentiment data to the client according to the client identifier.
Before sending the target public sentiment data to the client, the data center system may also compress it to reduce its data volume, which further reduces wasted network resources.
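The source does not name a compression scheme. As one possible illustration, the sketch below serializes the target data as JSON and compresses it with gzip from the Python standard library; both choices, and the function names, are assumptions.

```python
import gzip
import json


def pack(assets):
    """Serialize items as UTF-8 JSON and gzip-compress the result."""
    raw = json.dumps(assets, ensure_ascii=False).encode("utf-8")
    return gzip.compress(raw)


def unpack(blob):
    """Reverse of pack(): decompress and parse the JSON payload."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Repetitive sample data, which is where compression pays off.
target = [{"title": "Taxi fares", "text": "fares may rise " * 50}] * 20
blob = pack(target)
raw_size = len(json.dumps(target, ensure_ascii=False).encode("utf-8"))
```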
Before sending the target public sentiment data to the client, the data center system may also count the number of websites represented in the target public sentiment data and the amount of target public sentiment data obtained from each website, and send the target public sentiment data, the number of websites and the per-website amounts to the client.
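These two statistics can be computed in a single pass, as in the following sketch. The dictionary keys of the result and the `website` field name are hypothetical.

```python
from collections import Counter


def site_statistics(target_assets):
    """Count the distinct websites and the number of items from each website."""
    per_website = Counter(asset["website"] for asset in target_assets)
    return {"website_count": len(per_website), "per_website": dict(per_website)}


target_assets = [
    {"website": "news.example", "title": "a"},
    {"website": "forum.example", "title": "b"},
    {"website": "news.example", "title": "c"},
]
stats = site_statistics(target_assets)
```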
The output thread with which the data center system sends the target public sentiment data to the client proceeds as follows: it reads the client's main-post queue and, if the queue contains asset information, loops over the queue and writes the asset information to a temporary file of the client. When the size of the temporary file exceeds a preset second threshold, it creates another temporary file and writes the asset information to that file.
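The roll-over behaviour of the output thread can be sketched as follows. The file naming scheme, the one-record-per-line layout and the function name are assumptions; only the rule "start a new temporary file once the current one exceeds the second threshold" comes from the text.

```python
import os
import tempfile


def write_with_rollover(records, out_dir, max_bytes):
    """Write records to temporary files, opening a new file once the
    current one has grown beyond max_bytes. Returns the file paths."""
    paths, handle, size = [], None, 0
    try:
        for record in records:
            if handle is None or size > max_bytes:
                if handle is not None:
                    handle.close()
                path = os.path.join(out_dir, f"client_{len(paths)}.tmp")
                handle = open(path, "wb")
                paths.append(path)
                size = 0
            data = (record + "\n").encode("utf-8")
            handle.write(data)
            size += len(data)  # track bytes written to the current file
    finally:
        if handle is not None:
            handle.close()
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = write_with_rollover(["x" * 10] * 5, tmp, max_bytes=20)
    sizes = [os.path.getsize(p) for p in paths]
```

Because the size is checked before each write, a file may end slightly above the threshold, which matches the "exceeds" wording of the text.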
In addition, the data center system may store a priority parameter for each client. For example, if the priority of a client is level 1, the data center system may send the client the public sentiment data of the top 1000 ranked websites in the target public sentiment data; if the priority is level 2, it may send the public sentiment data of the top 500 ranked websites. This further reduces wasted network resources.
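A priority-based cut-off of this kind might be implemented as below. The source does not say how websites are ranked, so ranking by the number of items per website is an assumption, as are the function and constant names; the limits 1000 and 500 are taken from the example in the text.

```python
from collections import Counter

# Limits from the example in the text: level 1 -> 1000 sites, level 2 -> 500.
SITE_LIMIT_BY_PRIORITY = {1: 1000, 2: 500}


def limit_by_priority(target_assets, priority, limits=SITE_LIMIT_BY_PRIORITY):
    """Keep only the items from the top-ranked websites for this priority.

    Assumption: websites are ranked by how many items they contribute.
    """
    counts = Counter(asset["website"] for asset in target_assets)
    keep = {site for site, _ in counts.most_common(limits[priority])}
    return [asset for asset in target_assets if asset["website"] in keep]


sample_assets = (
    [{"website": "a.example"}] * 3
    + [{"website": "b.example"}] * 2
    + [{"website": "c.example"}]
)
small_limits = {1: 2, 2: 1}  # tiny limits so the effect is visible
level1 = limit_by_priority(sample_assets, 1, small_limits)
level2 = limit_by_priority(sample_assets, 2, small_limits)
```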
After receiving the target public sentiment data, the client may process it, for example by counting the amount of asset information per unit of time to obtain how that amount changes over time. The client may also use rules, topics and the like to obtain the asset information related to the rules and topics the user cares about. For example, if the user cares about how Internet users react to a taxi fare adjustment, the client may set the topic to taxi fare adjustment, obtain the related asset information, determine from it how Internet users react, and then decide on the basis of that reaction whether to adjust taxi fares or to revise the adjustment.
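The client-side statistics mentioned above can be illustrated with a short sketch that counts topic-related items per day. Using one day as the unit of time, matching the topic by substring and the field names are assumptions.

```python
from collections import Counter
from datetime import datetime


def volume_per_day(assets, topic):
    """Count, per calendar day, the items whose text mentions the topic."""
    counts = Counter(
        asset["publish_time"].date() for asset in assets if topic in asset["text"]
    )
    return sorted(counts.items())  # chronological (day, count) pairs


received = [
    {"text": "taxi fare adjustment announced", "publish_time": datetime(2014, 5, 1, 8)},
    {"text": "opinions on taxi fare adjustment", "publish_time": datetime(2014, 5, 1, 20)},
    {"text": "weather report", "publish_time": datetime(2014, 5, 2, 9)},
    {"text": "taxi fare adjustment debate", "publish_time": datetime(2014, 5, 2, 10)},
]
trend = volume_per_day(received, "taxi fare adjustment")
```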
In this embodiment, public sentiment data, which includes at least one item of asset information, is obtained from the website servers on the Internet and screened according to the client's application condition to obtain the target public sentiment data that matches the condition, and that target data is sent to the client. The client can therefore process the target public sentiment data directly, which lowers the hardware cost of prior-art public sentiment monitoring and analysis systems and reduces wasted network resources.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
Fig. 4 is a schematic structural diagram of an embodiment of the public sentiment monitoring and analysis device provided by the present invention. As shown in Fig. 4, the device includes:
an acquisition module 41, configured to obtain public sentiment data from the website servers on the Internet, where the public sentiment data includes at least one item of asset information;
a receiving module 42, configured to receive an acquisition request sent by a client, where the acquisition request carries an application condition and a client identifier;
a screening module 43, configured to screen the asset information included in the public sentiment data according to the application condition to obtain the target public sentiment data that matches the application condition;
a sending module 44, configured to send the target public sentiment data to the client according to the client identifier.
Further, the asset information includes body text and additional information, where the additional information includes at least one of the following parameters: website, channel, publication time, click count and reply count;
the sending module 44 is further configured to send, before the receiving module 42 receives the acquisition request sent by the client, the additional information of each item of asset information to the client, so that the client determines the parameters of the application condition from that additional information.
Further, to reduce wasted network resources, the public sentiment monitoring and analysis device may also filter the asset information and thereby reduce the amount of target public sentiment data. To this end the device may further include a judging module and a filtering module;
the judging module is configured to judge, according to preset keywords, whether an item of asset information is an advertisement before the screening module screens the asset information included in the public sentiment data according to the application condition;
the filtering module is configured to filter out the item of asset information when it is an advertisement.
Still further, the screening module 43 is further configured to, before screening the asset information included in the public sentiment data according to the application condition, query the at least one item of asset information in the public sentiment data according to the application condition, determine the target asset information that meets the application condition,
and set the client identifier as the identifier corresponding to the target asset information;
the screening module is specifically configured to screen the asset information included in the public sentiment data according to the client identifier to obtain the target public sentiment data that matches the application condition.
In this embodiment, public sentiment data, which includes at least one item of asset information, is obtained from the website servers on the Internet and screened according to the client's application condition to obtain the target public sentiment data that matches the condition, and that target data is sent to the client. The client can therefore process the target public sentiment data directly, which lowers the hardware cost of prior-art public sentiment monitoring and analysis systems and reduces wasted network resources.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.