Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an embodiment of the public sentiment monitoring and analysis method provided by the present invention. As shown in Fig. 1, the method comprises:
101. Obtain public sentiment data from the website servers on the Internet, where the public sentiment data comprises at least one item of asset information.
The public sentiment monitoring and analysis method provided by the present invention may be performed by a public sentiment monitoring and analysis device, which may take the form of a data center system. The data center system may be deployed on a server on the Internet. It obtains the asset information corresponding to the public sentiment data gathered by acquisition systems deployed on the website servers, and sends the target public sentiment data that matches a client's application condition to the client's system. The website servers may be, for example, the servers of news sites, forums, blogs, or microblogs.
Public sentiment data refers to opinion data that spreads rapidly on the Internet, forms public opinion, and develops into public sentiment. Such data expresses the emotions, attitudes, and viewpoints of Internet users, submitted through clients, about events on the Internet. Each item of asset information in the public sentiment data may comprise body text and additional information, the additional information comprising at least one of the following parameters: website, channel, publication time, click count, and reply count. Here, website and channel refer to the website on which the public sentiment data requested by a client, acting as data applicant, resides, and the channel to which the requested data belongs. For example, client A may apply for the public sentiment data of a particular channel of a particular website, and the data center system may push that data to client A. In addition, public sentiment data may be divided into main-post directory files and reply-post directory files, both of which each website server generates from posting information. A main-post directory file holds the main posts of a news site or forum; a reply-post directory file holds the replies to the main posts of a forum.
Specifically, after the data center system has collected the public sentiment data of the website servers on the Internet, it saves the data as standard txt files in the data store of the data center system, which may be built on components such as ZooKeeper, MapReduce, HDFS, HBase, and Hadoop Core. A data upload process in the data center system then parses the txt files: using the markers that correspond to body text, website, and channel, it extracts the body text, website, and channel from each txt file and thereby obtains at least one item of asset information. An item of asset information may comprise title, body text, publication time, author, channel, website, click count, reply count, and so on. An item of asset information may be a metadata record, for example a single news item on the Internet. News published on the Internet has attributes such as title, body text, time, and author. The data upload process parses out these parameters and stores them in the database of the data center system.
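As a rough illustration of marker-based extraction, the sketch below parses one txt record into an item of asset information. The marker strings (`<TITLE>`, `<SITE>`, `<CHANNEL>`, `<TEXT>`) and the function name `parse_asset` are hypothetical; the actual markers used by the data upload process are not specified in this description.

```python
# Hypothetical field markers; the real marker strings are not given in the text.
MARKERS = {
    "<TITLE>": "title",
    "<SITE>": "website",
    "<CHANNEL>": "channel",
    "<TEXT>": "text",
}


def parse_asset(raw: str) -> dict:
    """Extract the marked fields of one record into a dictionary."""
    asset = {}
    for line in raw.splitlines():
        for marker, field in MARKERS.items():
            if line.startswith(marker):
                # Keep everything after the marker as the field value.
                asset[field] = line[len(marker):].strip()
                break
    return asset
```

Lines that carry no known marker are simply ignored in this sketch.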
The thread in which the data upload process parses the txt files is shown in Fig. 2 and comprises the following. Read the main-post directory file and obtain each file listed in it in turn. Compare each file with the read-file queue in the data center system to determine whether the file has already been read. Save the names of files that have been read to the read-file queue in the data center system, and save the names of files that have not been read to the unread-file queue in the data center system. Parse each file in the unread-file queue, store the asset information obtained by parsing in the main-post memory queue (readAssetsMap) in the data center system, and remove from the unread-file queue the name of the file that corresponds to that asset information. When the number of items of asset information in the main-post memory queue exceeds a preset first threshold, screen the items in the main-post memory queue, distribute them to the clients, and empty the main-post memory queue. Then continue parsing the files in the unread-file queue and storing the resulting asset information in the main-post memory queue until every file in the unread-file queue has been parsed. While the main-post directory file is being parsed, other threads may also be started to parse the corresponding reply-post directory file at the same time. The readAssetsMap comprises keys and values, where the key is a file name and the value is the asset information.
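The parsing loop described above can be sketched as follows. This is a minimal single-threaded sketch under stated assumptions: `parse_file` and `flush` are hypothetical callbacks standing in for the txt parser and for the screen-and-distribute step, and flushing the final partial batch as well as adding parsed names to the read set are assumptions not spelled out in the text.

```python
from typing import Callable, Dict, List, Set


def parse_main_posts(
    directory_files: List[str],
    read_files: Set[str],
    parse_file: Callable[[str], dict],
    flush: Callable[[Dict[str, dict]], None],
    first_threshold: int,
) -> None:
    # Names not yet in the read-file queue form the unread-file queue.
    unread_queue = [name for name in directory_files if name not in read_files]
    # Main-post memory queue: key is the file name, value is the asset information.
    read_assets_map: Dict[str, dict] = {}
    while unread_queue:
        name = unread_queue.pop(0)  # the name leaves the unread-file queue
        read_assets_map[name] = parse_file(name)
        read_files.add(name)  # assumption: parsed files count as read
        if len(read_assets_map) > first_threshold:
            flush(dict(read_assets_map))  # screen and distribute, then empty
            read_assets_map.clear()
    if read_assets_map:
        flush(dict(read_assets_map))  # assumption: hand over the last partial batch
```

A real deployment would run this loop in its own thread alongside a similar loop for the reply-post directory file.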
102. Receive an acquisition request sent by a client, where the acquisition request carries an application condition and a client identifier.
When the asset information comprises body text and additional information, and the additional information comprises at least one of website, channel, publication time, click count, and reply count, the data center system may, before receiving the application condition sent by the client, send the additional information of each item of asset information to the client, so that the client determines the parameters of the application condition from that additional information. For example, when the additional information comprises website and channel, the application condition may comprise a website and/or a channel. When the additional information comprises website, channel, publication time, click count, and reply count, the application condition may comprise those same five parameters. The website in the application condition may be the address or name of at least one website server; the publication time may be a time point or a time period; and the click count and reply count may each be a numerical range.
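One possible way to represent such an application condition and to test an item of asset information against it is sketched below. The class `ApplicationCondition`, the function `matches`, and the dictionary keys of the asset are illustrative names, not taken from the original; a parameter left as `None` is assumed to place no restriction.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Set, Tuple


@dataclass
class ApplicationCondition:
    websites: Optional[Set[str]] = None                   # site names or addresses
    channels: Optional[Set[str]] = None
    published: Optional[Tuple[datetime, datetime]] = None  # time period
    clicks: Optional[Tuple[int, int]] = None               # numerical range
    replies: Optional[Tuple[int, int]] = None              # numerical range


def in_range(value, bounds) -> bool:
    """An unset range accepts every value; otherwise bounds are inclusive."""
    return bounds is None or bounds[0] <= value <= bounds[1]


def matches(asset: dict, cond: ApplicationCondition) -> bool:
    """Return True when the asset satisfies every parameter that is set."""
    return (
        (cond.websites is None or asset["website"] in cond.websites)
        and (cond.channels is None or asset["channel"] in cond.channels)
        and in_range(asset["published"], cond.published)
        and in_range(asset["clicks"], cond.clicks)
        and in_range(asset["replies"], cond.replies)
    )
```

A single time point can be expressed in this sketch as a period whose start and end are equal.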
Optionally, if the data center system does not receive an acquisition request from a client, it may send the client the public sentiment data that matches hot websites and/or hot channels. The data center system may also send the client the public sentiment data whose click count or reply count exceeds a preset count. Optionally, a hot website may be a website for which the number of client applications exceeds a preset website threshold, and a hot channel may be a channel for which the number of client applications exceeds a preset channel threshold.
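The hot-website and hot-channel rule amounts to counting applications and comparing against a threshold. A minimal sketch, assuming the applications are available as a flat list of website (or channel) names; the function name `hot_items` is illustrative:

```python
from collections import Counter
from typing import Iterable, Set


def hot_items(applications: Iterable[str], threshold: int) -> Set[str]:
    """Return the websites or channels applied for more than `threshold` times."""
    counts = Counter(applications)
    return {item for item, n in counts.items() if n > threshold}
```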
103. Screen the asset information included in the public sentiment data according to the application condition, to obtain target public sentiment data that matches the application condition.
Further, to reduce the waste of network resources, the data center system may also filter the asset information and so reduce the volume of target public sentiment data. Accordingly, before step 103, the data center system may determine, according to preset keywords, whether an item of asset information is advertising information, and filter out the item when it is.
Specifically, a dictionary comprising the preset keywords may be configured in the data center system, and the data center system may determine whether a preset keyword appears in the title or body text of an item of asset information. If a preset keyword appears in the body text of the item, the item is determined to be advertising information, and asset information determined to be advertising information is filtered out.
In addition, the asset information may further comprise a title and an author. The data center system may determine an item of asset information to be advertising information when a preset keyword appears in its body text, title, or author, and filter out the asset information so determined.
The data center system may adjust the preset keywords in the database according to sensitive-word recognition rules, hot-word discovery rules, entity recognition rules, sensitive-word recommendations, and the like. The preset keywords in the database may be stored by category, such as intelligent keywords and intelligent summaries. The data center system may retrieve asset information by means of hybrid retrieval, similarity retrieval, range retrieval, homophone and synonym retrieval, and the like, to obtain the asset information that contains preset keywords.
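The keyword check can be illustrated with a plain substring test over the body text, title, and author. This is only a sketch: the function names and the dictionary keys are assumptions, and a production system would more likely use the retrieval methods the description mentions rather than naive substring matching.

```python
from typing import Iterable, List


def is_advertisement(asset: dict, keywords: Iterable[str],
                     fields=("text", "title", "author")) -> bool:
    """True when any preset keyword occurs in any of the checked fields."""
    keywords = list(keywords)
    return any(kw in asset.get(field, "") for field in fields for kw in keywords)


def filter_ads(assets: List[dict], keywords: Iterable[str]) -> List[dict]:
    """Drop every item of asset information judged to be advertising."""
    keywords = list(keywords)
    return [a for a in assets if not is_advertisement(a, keywords)]
```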
Further, before step 103, the data center system may query the at least one item of asset information in the public sentiment data according to the application condition, determine the target asset information that satisfies the application condition, and set the client identifier as the tag of that target asset information.
Correspondingly, step 103 may specifically be: screen the asset information included in the public sentiment data according to the client identifier, to obtain the target public sentiment data that matches the application condition.
Specifically, when the application condition comprises parameters such as website, channel, publication time, click count, and reply count, the intelligent analysis service in the data center system may tag asset information according to the feature terms in the application condition sent by the client, such as website, channel, publication time, click count, and reply count. For example, a client configures certain terms, and the intelligent analysis service queries whether an item of asset information contains those terms, that is, whether the item satisfies the client's application condition. If it does, the item is tagged with the client's identifier. The screening program then determines from the tag which client the item should be sent to and stores the item in the directory corresponding to that client, which completes the screening of the item.
The thread in which the data upload process screens the asset information is shown in Fig. 3 and comprises:
Read the asset information in the main-post memory queue and the reply-post memory queue. Determine whether the main-post memory queue contains asset information. If it does, take a fixed number of items of asset information from the main-post memory queue and determine whether the intelligent analysis service needs to be called, that is, whether every one of those items already corresponds to a specific client identifier. If an item has no corresponding client identifier, call the intelligent analysis service, which determines the clients that need the item according to the website, channel, publication time, click count, and reply count in each client's application condition and in each item of asset information. Loop over the fixed number of items: according to the client identifier corresponding to each item, save the item to the main-post queue of each client that needs it, delete the item from the main-post memory queue, and delete the corresponding filename from the read-file queue of the parsing thread. If the main-post memory queue contains no asset information, determine whether the reply-post memory queue contains asset information. If it does, process the asset information in the reply-post memory queue in a similar way: save each item to the reply-post queue of each client that needs it, delete the item from the reply-post memory queue, and delete the corresponding filename from the read-file queue of the parsing thread. If the reply-post memory queue contains no asset information either, wait until the parsing thread has stored asset information in the main-post memory queue or the reply-post memory queue, then continue reading and processing the asset information in both queues.
The data upload process of the data center system may start multiple screening threads (distUserThread) to process multiple main-post memory queues and reply-post memory queues simultaneously.
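One pass of the screening loop over the main-post memory queue might look like the sketch below. It is illustrative only: `conditions` maps each client identifier to a hypothetical predicate that stands in for the intelligent analysis service, the `client_ids` key holding existing tags is an assumed field name, and the bookkeeping on the read-file queue is omitted.

```python
from collections import defaultdict, deque
from typing import Callable, Dict


def screen_batch(
    memory_queue: deque,
    conditions: Dict[str, Callable[[dict], bool]],
    batch_size: int,
    client_queues: Dict[str, list],
) -> None:
    """Take up to `batch_size` items and route each to the clients that need it."""
    for _ in range(min(batch_size, len(memory_queue))):
        asset = memory_queue.popleft()  # the item leaves the memory queue
        tags = asset.get("client_ids")
        if not tags:
            # No client identifier yet: evaluate every client's condition.
            tags = [cid for cid, wanted in conditions.items() if wanted(asset)]
        for cid in tags:
            client_queues[cid].append(asset)
```

Passing a `defaultdict(list)` as `client_queues` creates each client's queue on first use; the same function can be reused for the reply-post memory queue.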
104. Send the target public sentiment data to the client according to the client identifier.
Before sending the target public sentiment data to the client, the data center system may also compress it, which reduces the data volume of the target public sentiment data and further reduces the waste of network resources.
Before sending the target public sentiment data to the client, the data center system may also count the number of websites represented in the target public sentiment data and the amount of target public sentiment data obtained from each website, and send the target public sentiment data, the number of websites, and the per-website amounts to the client.
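The description does not name a compression scheme. As one plausible option, the sketch below serializes the asset information as JSON and compresses it with gzip from the Python standard library; both choices, and the function names, are assumptions made for illustration.

```python
import gzip
import json
from typing import List


def compress_assets(assets: List[dict]) -> bytes:
    """Serialize the target public sentiment data as JSON and gzip it."""
    return gzip.compress(json.dumps(assets, ensure_ascii=False).encode("utf-8"))


def decompress_assets(blob: bytes) -> List[dict]:
    """Inverse of compress_assets, as a client would apply it on receipt."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```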
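These statistics reduce to a per-website tally. A minimal sketch, assuming each item of asset information is a dictionary with a `website` key (an illustrative field name):

```python
from collections import Counter
from typing import Dict, List, Tuple


def site_statistics(assets: List[dict]) -> Tuple[int, Dict[str, int]]:
    """Return the number of distinct websites and the item count per website."""
    per_site = Counter(asset["website"] for asset in assets)
    return len(per_site), dict(per_site)
```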
The output thread in which the data center system sends the target public sentiment data to the client comprises the following. Read the main-post queue among the client's queues. If the main-post queue contains asset information, loop over the main-post queue and write its asset information to a temporary file of the client. When the size of the temporary file exceeds a preset second threshold, create another temporary file and write the asset information to that new temporary file of the client.
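The size-based rollover of temporary files can be sketched as follows. The sketch assumes that each item of asset information has already been serialized to one line of text and that the second threshold is measured in bytes; the function name and the `.tmp` suffix are illustrative.

```python
import os
import tempfile
from typing import Iterable, List


def write_client_queue(lines: Iterable[str], directory: str,
                       second_threshold: int) -> List[str]:
    """Write queue items to temporary files, starting a new file once the
    current one has grown beyond `second_threshold` bytes."""
    paths: List[str] = []
    handle = None
    size = 0
    for line in lines:
        if handle is None or size > second_threshold:
            if handle is not None:
                handle.close()
            fd, path = tempfile.mkstemp(dir=directory, suffix=".tmp")
            handle = os.fdopen(fd, "wb")
            paths.append(path)
            size = 0
        data = (line + "\n").encode("utf-8")
        handle.write(data)
        size += len(data)  # track bytes written to the current file
    if handle is not None:
        handle.close()
    return paths
```

Because the check happens before each write, a file can exceed the threshold by at most one item, which matches the "exceeds, then create another" wording above.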
In addition, the data center system may also store a priority parameter for each client. For example, if the priority of a client is level 1, the data center system may send the client the public sentiment data of the 1000 top-ranked websites in the target public sentiment data; if the priority of a client is level 2, the data center system may send the client the public sentiment data of the 500 top-ranked websites in the target public sentiment data. This further reduces the waste of network resources.
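A sketch of priority-based selection is given below. The ranking criterion for websites is not stated in the description, so the sketch takes an already ranked list `ranked_sites` as input; that list, the function name, and the `limits` parameter are assumptions, with the default limits taken from the example figures above.

```python
from typing import Dict, List, Optional

# Level 1 receives the top 1000 websites, level 2 the top 500 (example values).
PRIORITY_LIMITS: Dict[int, int] = {1: 1000, 2: 500}


def select_by_priority(assets: List[dict], ranked_sites: List[str], priority: int,
                       limits: Optional[Dict[int, int]] = None) -> List[dict]:
    """Keep only items from the top-N ranked websites for the client's priority."""
    limits = PRIORITY_LIMITS if limits is None else limits
    allowed = set(ranked_sites[:limits[priority]])
    return [asset for asset in assets if asset["website"] in allowed]
```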
After receiving the target public sentiment data, the client may process it, for example by counting the amount of asset information per unit time to obtain how that amount changes over time. The client may also obtain the asset information relevant to the rules and topics that the user cares about. For example, if the user cares about how Internet users react to a taxi fare adjustment, the client may set the topic to taxi fare adjustment, obtain the asset information related to it, determine from that asset information how Internet users react to the adjustment, and then decide, according to that reaction, whether to adjust taxi fares or to revise the planned adjustment.
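Counting asset information per unit time on the client side can be sketched as below, taking one hour as the unit. The choice of unit, the `published` field name, and the function name are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import List


def volume_per_hour(assets: List[dict]) -> Counter:
    """Count items of asset information per publication hour."""
    return Counter(
        asset["published"].replace(minute=0, second=0, microsecond=0)
        for asset in assets
    )
```

Sorting the resulting keys gives the change in volume over time.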
In this embodiment, public sentiment data comprising at least one item of asset information is obtained from the website servers on the Internet, the public sentiment data is screened according to a client's application condition to obtain the target public sentiment data that matches the application condition, and that target public sentiment data is sent to the client, so that the client can process the target public sentiment data directly. This reduces hardware cost relative to prior-art public sentiment monitoring and analysis systems and reduces the waste of network resources.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Fig. 4 is a schematic structural diagram of an embodiment of the public sentiment monitoring and analysis device provided by the present invention. As shown in Fig. 4, the device comprises:
an acquisition module 41, configured to obtain public sentiment data from the website servers on the Internet, the public sentiment data comprising at least one item of asset information;
a receiving module 42, configured to receive an acquisition request sent by a client, the acquisition request carrying an application condition and a client identifier;
a screening module 43, configured to screen the asset information included in the public sentiment data according to the application condition, to obtain target public sentiment data that matches the application condition; and
a sending module 44, configured to send the target public sentiment data to the client according to the client identifier.
Further, the asset information comprises body text and additional information, the additional information comprising at least one of the following parameters: website, channel, publication time, click count, and reply count.
The sending module 44 is further configured to send the additional information of each item of asset information to the client before the receiving module 42 receives the acquisition request sent by the client, so that the client determines the parameters of the application condition from the additional information of each item of asset information.
Further, to reduce the waste of network resources, the public sentiment monitoring and analysis device may also filter the asset information and so reduce the volume of target public sentiment data. The device may further comprise a judging module and a filtering module.
The judging module is configured to determine, according to preset keywords, whether the asset information is advertising information before the screening module screens the asset information included in the public sentiment data according to the application condition.
The filtering module is configured to filter out the asset information when the asset information is advertising information.
Still further, the screening module 43 is also configured to, before screening the asset information included in the public sentiment data according to the application condition, query the at least one item of asset information in the public sentiment data according to the application condition, determine the target asset information that satisfies the application condition, and set the client identifier as the tag of that target asset information.
The screening module 43 is specifically configured to screen the asset information included in the public sentiment data according to the client identifier, to obtain the target public sentiment data that matches the application condition.
In this embodiment, public sentiment data comprising at least one item of asset information is obtained from the website servers on the Internet, the public sentiment data is screened according to a client's application condition to obtain the target public sentiment data that matches the application condition, and that target public sentiment data is sent to the client, so that the client can process the target public sentiment data directly. This reduces hardware cost relative to prior-art public sentiment monitoring and analysis systems and reduces the waste of network resources.
Finally, it should be noted that the foregoing embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.