Detailed description of the invention
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Referring to Fig. 1, the present disclosure provides an information monitoring method, which includes the following steps.
S101: Capture the information to be monitored.
The capture process includes: capturing information according to preset keywords; and classifying the captured information with a support vector machine (Support Vector Machine, SVM) classifier to obtain the information to be monitored.
In the keyword-based capture, the user first sets, according to the topics he or she is interested in, the keywords required for monitoring and sends them to the system. After obtaining the keywords, the system captures information according to them, fetching from the network platform the information that matches the keywords, including information related to the topics the user cares about. In the embodiments of the present application, the keywords may be configured manually on the client server, and the system may be an information search engine. After obtaining the keywords, the system crawls the matching information, stores everything it has captured, and either sends the information back to the user side or stores it on the server side for analysis and processing in the next step.
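As a rough illustration of keyword-based capture, the sketch below filters already-fetched posts by the preset keywords. The `Post` record, the function name `capture_by_keywords`, and the case-insensitive substring test are assumptions for illustration only; the application does not specify how the search engine matches keywords, and a real system would fetch posts from the network platform rather than from an in-memory list.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Post:
    """Hypothetical record for one captured post."""
    post_id: str
    text: str


def capture_by_keywords(posts: Iterable[Post], keywords: Iterable[str]) -> List[Post]:
    """Keep posts whose text contains at least one preset keyword.

    Matching is a case-insensitive substring test, which is an assumption;
    the source only says that captured information "matches" the keywords.
    """
    lowered = [k.lower() for k in keywords]
    return [p for p in posts if any(k in p.text.lower() for k in lowered)]


if __name__ == "__main__":
    feed = [
        Post("1", "Topped up my Alipay balance today"),
        Post("2", "The weather is nice"),
    ]
    print([p.post_id for p in capture_by_keywords(feed, ["Alipay"])])  # ['1']
```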
The captured information is then classified by the SVM classifier. Specifically, the captured information is first read and recognized, and then classified by the SVM classifier into two classes according to how well it matches the keywords: valuable information is assigned to the first class (for example, a "useful" class), and information without value is assigned to the second class (for example, a "useless" class). In the embodiments of the present application, to avoid missing important information that may have been placed in the "useless" class, the "useless" information may be designated as the information to be monitored, so that it can be processed in depth later.
In this embodiment, the SVM classifier is a model obtained by training on samples. Training finds the separating plane between the two classes of information, that is, a classification function (linear or nonlinear) used to assign the information to a class. The information may be preprocessed before classification, for example by extracting the feature words in the information (which may include textual features, graphic features appearing in the information, and features describing how the information was reposted or forwarded; all of these can be specified during model training) and converting them into a feature vector, which the model then classifies. The classification function is not unique and can be chosen as required; because it directly affects the accuracy of the classifier, extensive model training is needed, and the training process is not described further here. In the embodiments of the present application, once the information has passed through the SVM classifier, subsequent manual screening can be applied directly to the "useful" information. This allows users to obtain information related to their topics of interest quickly and accurately, saves a large amount of manual screening work, and improves processing efficiency.
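The classification step can be sketched with a linear SVM over term-count feature vectors. This is a minimal, dependency-free sketch and not the application's trained model: the Latin-only tokenizer, the toy training sentences, and the Pegasos-style stochastic subgradient training (a standard way to minimize the L2-regularized hinge loss) are all assumptions. A constant feature is appended to each vector so that the bias is learned as an ordinary weight.

```python
import re
from collections import Counter
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    # Lower-cased Latin word tokens; Chinese text would need a word segmenter.
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(texts: Sequence[str]) -> Dict[str, int]:
    words = sorted({tok for t in texts for tok in tokenize(t)})
    return {w: i for i, w in enumerate(words)}


def to_vector(text: str, vocab: Dict[str, int]) -> List[float]:
    """Term-count vector plus a trailing constant 1.0 that acts as the bias."""
    vec = [0.0] * (len(vocab) + 1)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocab:  # words unseen in training are ignored
            vec[vocab[tok]] = float(n)
    vec[-1] = 1.0
    return vec


def train_linear_svm(X: List[List[float]], y: List[int],
                     lam: float = 0.1, epochs: int = 200) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 ("useful") or -1 ("useless"). Samples are visited in a
    fixed order, so training is deterministic.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward the sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w: List[float], x: List[float]) -> str:
    score = sum(wj * xj for wj, xj in zip(w, x))
    return "useful" if score >= 0 else "useless"


if __name__ == "__main__":
    train = [("alipay payment is down", 1), ("alipay refund received", 1),
             ("nice weather today", -1), ("new face cream", -1)]
    vocab = build_vocab([t for t, _ in train])
    w = train_linear_svm([to_vector(t, vocab) for t, _ in train],
                         [lbl for _, lbl in train])
    print(predict(w, to_vector("Alipay payment is down again", vocab)))
```

A library SVM (with a kernel for the nonlinear case the text mentions) would normally replace the hand-written trainer; the loop is spelled out only to show what "finding the classification plane" amounts to.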
In addition, in actual classification there may be multiple SVM classifiers, and the information may pass through them one by one for repeated classification. Each SVM classifier has two classes, and each classifier can be set up for a specific classification item, so that the same piece of information is classified several times, which ultimately improves the accuracy of classification.
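A chain of binary classifiers, each responsible for one classification item, might be wired as below. The two rule-based stages are hypothetical stand-ins for trained SVMs, and the combination rule (a post is "useless" if any stage rejects it) is an assumption, since the application does not say how the per-classifier results are combined.

```python
from typing import Callable, List, Tuple

# A stage returns True for "useful" and False for "useless".
Stage = Tuple[str, Callable[[str], bool]]


def cascade(text: str, stages: List[Stage]) -> Tuple[str, List[str]]:
    """Run every stage on the text, in order.

    Returns the combined label and the names of the stages that rejected it.
    """
    rejected = [name for name, classify in stages if not classify(text)]
    return ("useless" if rejected else "useful"), rejected


# Hypothetical stand-ins for two trained SVMs with different classification items.
def mentions_brand(text: str) -> bool:
    return "alipay" in text.lower()


def not_advertisement(text: str) -> bool:
    return "discount code" not in text.lower()


if __name__ == "__main__":
    stages = [("brand", mentions_brand), ("ads", not_advertisement)]
    print(cascade("Alipay discount code inside", stages))  # ('useless', ['ads'])
```

Reporting which stages rejected a post keeps the decision auditable; a majority vote or a weighted score would be equally plausible combination rules.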
S102: Obtain the sentences in the information, and perform syntactic analysis on them to obtain potential evaluation objects.
Because a piece of information may carry external content (for example, quoted information, URLs, sources, or characters) that does not itself belong to that information, this content should not be included when the sentences of the monitored information are obtained. The embodiments of the present application use regular-expression rules to delete this content, which yields more concise information. Take the following monitored information as an example: "Actually, there are plenty of products worse than this one//@Angela_unhappy Miss: That product is the worst." After processing with the regular-expression rules, the sentence "Actually, there are plenty of products worse than this one." is obtained.
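The clean-up rule can be sketched with Python's `re` module. The patterns are assumptions, not the application's actual expressions: one removes a Weibo-style repost chain (`//@user:` followed by the quoted text, with either an ASCII or a full-width colon) and one removes URLs.

```python
import re

# Assumed patterns, not the application's actual rules.
REPOST_CHAIN = re.compile(r"//@[^:：]+[:：].*$", re.S)  # "//@user: quoted text" to end
URL = re.compile(r"https?://\S+")
SPACES = re.compile(r"\s+")


def strip_external_content(post: str) -> str:
    """Delete repost chains and URLs, then normalize whitespace."""
    text = REPOST_CHAIN.sub("", post)
    text = URL.sub("", text)
    return SPACES.sub(" ", text).strip()


if __name__ == "__main__":
    post = ("Actually, there are plenty of products worse than this one"
            "//@Angela_unhappy: That product is the worst.")
    print(strip_external_content(post))
    # Actually, there are plenty of products worse than this one
```

Unlike the example in the text, this sketch does not re-append a sentence-final full stop after removing the repost chain.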
Further, because a piece of information sometimes contains multiple sentences (separated by punctuation such as "。", "?", and "!"), more than one sentence may be obtained from the same information. For convenience, the following description assumes that the monitored information contains only one sentence.
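Splitting a post into sentences might look like the following sketch. The rule is an assumption: it splits after the full-width terminators 。？！ unconditionally, and after the ASCII terminators . ? ! only when whitespace follows, so that decimals such as 3.5 are not broken.

```python
import re
from typing import List

# Zero-width split points: after full-width terminators, or after ASCII
# terminators that are followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[。？！])|(?<=[.?!])(?=\s)")


def split_sentences(text: str) -> List[str]:
    # re.split accepts patterns that match the empty string on Python 3.7+.
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


if __name__ == "__main__":
    print(split_sentences("Version 3.5 broke again. Why? 价格太高了！还能用吗？"))
    # ['Version 3.5 broke again.', 'Why?', '价格太高了！', '还能用吗？']
```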
After the sentences of the information are obtained, the embodiments of the present application further perform syntactic analysis on each sentence to obtain the potential evaluation objects, specifically:
performing syntactic analysis on the sentence to obtain the node word corresponding to the root node (ROOT) of the sentence;
determining the node words whose relation to the node word corresponding to the root node (ROOT) is a subject-verb relation (Subject-Verb, SBV), a verb-object relation (Verb-Object, VOB), an indirect-object relation (Indirect-Object, IOB), a fronted object (Fronting-Object, FOB), an adverbial modifier (Adverbial, ADV), a coordination (Coordinate, COO), a verb-complement structure (Complement, CMP), or an attribute relation (Attribute, ATT), and setting them as the first child node set;
determining the node words whose relation to a node word in the first child node set is SBV, VOB, IOB, FOB, ADV, COO, CMP, or ATT, and setting them as the second child node set; and
determining the node words contained in the first child node set and the second child node set as the potential evaluation objects of the information.
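Given a dependency parse, the two-level expansion above reduces to two passes over the head indices. The sketch assumes the parse is already available as (index, word, head, relation) records with head 0 denoting ROOT. The labels appear to follow the LTP dependency scheme (where HED marks the word attached to ROOT); the English example sentence and its labels are hypothetical.

```python
from typing import List, NamedTuple, Set, Tuple


class Token(NamedTuple):
    idx: int   # 1-based position in the sentence
    word: str
    head: int  # idx of the governing word; 0 is the virtual ROOT
    rel: str   # dependency label


KEPT = {"SBV", "VOB", "IOB", "FOB", "ADV", "COO", "CMP", "ATT"}


def children_of(tokens: List[Token], parents: Set[int]) -> Set[int]:
    """Indices of tokens attached to any parent through a kept relation."""
    return {t.idx for t in tokens if t.head in parents and t.rel in KEPT}


def potential_objects(tokens: List[Token]) -> Tuple[List[str], List[str]]:
    """Return the first and second child node sets, in sentence order."""
    root_words = {t.idx for t in tokens if t.head == 0}
    first = children_of(tokens, root_words)
    second = children_of(tokens, first)
    words = {t.idx: t.word for t in tokens}
    return [words[i] for i in sorted(first)], [words[i] for i in sorted(second)]


if __name__ == "__main__":
    sent = [Token(1, "The", 3, "ATT"), Token(2, "new", 3, "ATT"),
            Token(3, "app", 4, "SBV"), Token(4, "crashes", 0, "HED"),
            Token(5, "often", 4, "ADV")]
    first, second = potential_objects(sent)
    print(first, second)  # ['app', 'often'] ['The', 'new']
```

The potential evaluation objects are the union of the two returned lists; relations outside the kept set (for example punctuation attached to the root) are skipped.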
S103: Label and extract the potential evaluation objects with a trained conditional random field (CRF) to obtain the final evaluation object.
Feature extraction is performed on the first child node set and the second child node set of the potential evaluation objects according to the sentiment-element extraction rules, yielding a confidence for each potential evaluation object. The confidence is used by the conditional random field (Conditional Random Field, CRF) to compute a probability for each potential evaluation object, and the object with the highest probability is taken as the final evaluation object.
In the embodiments of the present application, the final evaluation object is obtained as follows.
First, part-of-speech tagging and model training are performed. For example, the collected information is annotated manually; the annotated text serves as training data, and a template file is written so that the CRF model can be trained. Model training is implemented with the CRF++ tool. The CRF model is a probabilistic model used to compute the probability of each word to be assessed.
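CRF++ trains from a whitespace-separated column file (one token per line, the gold label in the last column, a blank line between sentences) together with a feature template file, using `crf_learn template train.data model` and then `crf_test -m model test.data`. The sketch below produces the contents of both files. The column choice (word, part of speech, dependency relation) and the OBJ/O label scheme are assumptions, since the application does not list its columns or template.

```python
from typing import Iterable, List, Tuple

# (word, part of speech, dependency relation, gold label); the label is
# "OBJ" for an evaluation-object token and "O" otherwise (assumed scheme).
Row = Tuple[str, str, str, str]

# Feature template: %x[row,col] refers to a cell relative to the current token.
TEMPLATE = """\
# Unigram
U00:%x[0,0]
U01:%x[0,1]
U02:%x[0,2]
U03:%x[-1,0]/%x[0,0]
U04:%x[0,0]/%x[1,0]

# Bigram
B
"""


def to_crfpp(sentences: Iterable[List[Row]]) -> str:
    """Serialize annotated sentences into CRF++ training-file text."""
    blocks = ["\n".join("\t".join(row) for row in sent) + "\n" for sent in sentences]
    return "\n".join(blocks)  # the extra newline leaves a blank line between sentences


if __name__ == "__main__":
    demo = [[("battery", "n", "SBV", "OBJ"), ("drains", "v", "HED", "O"),
             ("fast", "a", "CMP", "O")]]
    print(to_crfpp(demo), end="")
```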
Second, the parser performs word segmentation on the potential evaluation objects (recording the position of each character within its word, so that characters are combined into words), part-of-speech tagging (labeling the part of speech of each segmented word, such as noun, verb, or auxiliary word), and dependency parsing (analyzing the relations between words, such as the verb-object and subject-verb relations). Features are then extracted according to the sentiment-element extraction rules; that is, the confidence of each potential evaluation object is determined from the syntactic relation between the potential evaluation object and the sentiment words, as shown in Table 1 below. The sentiment-word feature is determined by looking words up in a sentiment dictionary (a collection of sentiment words of various kinds).
Table 1
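Table 1's feature list is not reproduced in this text, so the sketch below only illustrates the general idea under stated assumptions: for each potential evaluation object it records the part of speech, its dependency relation, and whether it is directly linked (as head or dependent) to a word found in a sentiment dictionary. The tiny dictionary, the feature names, and the example tokens (a word-by-word gloss of a hypothetical post) are invented for illustration; such features could be emitted as extra columns of the CRF training file.

```python
from typing import Dict, Iterable, List, NamedTuple


class Token(NamedTuple):
    idx: int   # 1-based position
    word: str
    pos: str   # part-of-speech tag
    head: int  # governing token idx; 0 is ROOT
    rel: str   # dependency label


# Hypothetical miniature sentiment dictionary.
SENTIMENT_WORDS = {"terrible", "great", "slow", "love"}


def sentiment_features(tokens: List[Token],
                       candidates: Iterable[int]) -> Dict[int, Dict[str, str]]:
    """Per-candidate features based on the candidate's link to sentiment words."""
    by_idx = {t.idx: t for t in tokens}
    feats = {}
    for i in sorted(candidates):
        tok = by_idx[i]
        head = by_idx.get(tok.head)  # None when the head is ROOT
        dependents = [t for t in tokens if t.head == i]
        linked = (head is not None and head.word in SENTIMENT_WORDS) or any(
            d.word in SENTIMENT_WORDS for d in dependents)
        feats[i] = {"pos": tok.pos, "rel": tok.rel,
                    "near_sentiment": "Y" if linked else "N"}
    return feats


if __name__ == "__main__":
    sent = [Token(1, "phone", "n", 2, "ATT"), Token(2, "battery", "n", 4, "SBV"),
            Token(3, "very", "d", 4, "ADV"), Token(4, "terrible", "a", 0, "HED")]
    print(sentiment_features(sent, {1, 2, 3}))
```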
Third, the probability of each potential evaluation object is computed with the probabilistic model, and the potential evaluation object with the highest probability is taken as the final evaluation object of the information.
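The selection in the third step is an argmax over the candidates. The sketch assumes the per-candidate probabilities have already been obtained from the trained model; how they are read out of CRF++ is not specified in the application, and the numbers in the usage line are made up.

```python
from typing import Dict, Optional


def pick_final_object(probabilities: Dict[str, float]) -> Optional[str]:
    """Return the potential evaluation object with the highest probability.

    Ties go to the candidate inserted first (Python's max keeps the first
    maximal element). Returns None when there are no candidates.
    """
    if not probabilities:
        return None
    return max(probabilities, key=probabilities.get)


if __name__ == "__main__":
    print(pick_final_object({"battery": 0.8, "phone": 0.1}))  # battery
```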
S104: Judge whether the final evaluation object matches the preset keywords.
In a preferred embodiment of the present application, judging whether the final evaluation object matches the preset keywords specifically includes:
comparing the final evaluation object with the keywords to judge whether they have an intersection;
if they do, confirming that the final evaluation object matches the keywords, and retaining the information;
otherwise, confirming that the final evaluation object does not match the keywords, and filtering out the information.
In this process, the final evaluation object of the information is compared with the preset keywords to see whether they intersect, which indicates whether the information concerns a topic the user cares about. If there is no intersection, the final evaluation object does not match the keywords, which means the monitored information is not about a topic the user cares about; the information can then be filtered out directly, without creating a manual screening task for it. If there is an intersection, the final evaluation object matches the keywords, which means the monitored information is in fact "useful" information about a topic the user cares about and should not be filtered out; the information is therefore retained, and a manual screening task is created for it and passed on to the next step of manual screening. Thus, by obtaining the final evaluation object of the "useless" information and comparing it with the keywords to judge whether the information is really "useless", the probability of missing significant information among the "useless" information is greatly reduced, and the accuracy of information monitoring is markedly improved.
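The judging step is a set intersection. The sketch below accepts one or more final evaluation objects and returns whether to retain the information together with the common terms. The function name `judge` and the case-folding before comparison are assumptions; the application only says that an intersection is computed.

```python
from typing import Iterable, Set, Tuple


def judge(final_objects: Iterable[str], keywords: Iterable[str]) -> Tuple[bool, Set[str]]:
    """Return (retain?, intersection) for the final evaluation object(s).

    An empty intersection means the information is filtered out; a non-empty
    one means it is retained and sent on to manual screening.
    """
    objs = {o.casefold() for o in final_objects}
    keys = {k.casefold() for k in keywords}
    common = objs & keys
    return bool(common), common


if __name__ == "__main__":
    print(judge(["Alipay"], ["alipay", "Huabei"]))  # (True, {'alipay'})
```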
In combination with the above monitoring method, the present application further discloses an information monitoring device, including:
a capture module, configured to capture the information to be monitored;
an acquisition module, configured to obtain the sentences in the information and perform syntactic analysis on them to obtain the potential evaluation objects;
an extraction module, configured to label and extract the potential evaluation objects with the trained conditional random field (CRF) to obtain the final evaluation object; and
a judge module, configured to judge whether the final evaluation object matches the preset keywords.
The acquisition module is provided with a parser. The parser is configured to analyze and obtain the node word corresponding to the root node (ROOT) of the sentence, and to set as the first child node set the node words whose relation to that node word is a subject-verb relation (SBV), a verb-object relation (VOB), an indirect-object relation (IOB), a fronted object (FOB), an adverbial modifier (ADV), a coordination (COO), a verb-complement structure (CMP), or an attribute relation (ATT). In addition, the parser is further configured to analyze and obtain the node words whose relation to a node word in the first child node set is a subject-verb relation (SBV), a verb-object relation (VOB), an indirect-object relation (IOB), a fronted object (FOB), an adverbial modifier (ADV), a coordination (COO), a verb-complement structure (CMP), or an attribute relation (ATT), and to set them as the second child node set.
The acquisition module is further provided with an extraction unit, configured to extract the node words in the first child node set and the second child node set and to set the extracted node words as the potential evaluation objects.
The extraction module is provided with a confidence computation unit, configured to perform feature extraction on the potential evaluation objects according to the sentiment-element extraction rules and to obtain the confidence of each potential evaluation object. The extraction module is further provided with a probability computation unit, configured to compute the probability of each potential evaluation object from the conditional random field (CRF) and the confidence.
The capture module is provided with an information scraper and a support vector machine (SVM) classifier. The information scraper is configured to capture information according to the preset keywords, and the SVM classifier is configured to classify the captured information and obtain the information to be monitored.
The judge module is provided with a comparison unit and a confirmation unit. The comparison unit is configured to compare the final evaluation object with the preset keywords and to judge whether they have an intersection; the confirmation unit is configured to confirm, according to whether the intersection exists, whether the information is filtered out.
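One way to picture the device is as four pluggable modules wired into a single pipeline, as in the sketch below. The class name, field names, and callable signatures are hypothetical; in practice each field would be backed by the corresponding component described above (scraper and SVM classifier, parser, CRF-based extraction, comparison and confirmation units), whereas the usage lines plug in toy lambdas.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class InformationMonitoringDevice:
    # Capture module: raw posts -> information to be monitored.
    capture: Callable[[Iterable[str]], List[str]]
    # Acquisition module: post -> potential evaluation objects.
    acquire: Callable[[str], List[str]]
    # Extraction module: (post, candidates) -> final evaluation object or None.
    extract: Callable[[str, List[str]], Optional[str]]
    # Preset keywords used by the judge module.
    keywords: Set[str] = field(default_factory=set)

    def run(self, raw_posts: Iterable[str]) -> List[str]:
        """Return the posts retained by the judge module."""
        retained = []
        for post in self.capture(raw_posts):
            final = self.extract(post, self.acquire(post))
            # Judge module: retain only when the intersection is non-empty.
            if final is not None and {final} & self.keywords:
                retained.append(post)
        return retained


if __name__ == "__main__":
    device = InformationMonitoringDevice(
        capture=lambda posts: [p for p in posts if p],            # drop empty posts
        acquire=lambda post: post.split(),                        # toy "parser"
        extract=lambda post, cands: cands[0] if cands else None,  # toy "CRF"
        keywords={"Alipay"},
    )
    print(device.run(["Alipay is down", "Cream sold out", ""]))  # ['Alipay is down']
```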
The following specific application example illustrates the use of the present application in monitoring public-opinion information on microblogs. The information monitoring flow for microblogs is as follows.
First, keywords are configured, and microblog public-opinion information is captured according to them. For example, if the keyword is configured as "Alipay", the resulting keyword (Key Word) set is {"Alipay"}, meaning that public-opinion information related to "Alipay" is to be captured from microblogs.
Second, the captured microblog public-opinion information is divided into two classes. The classification is performed by the SVM classifier, which has been trained in advance on a large number of sample words. For example, suppose the following microblog post is captured: "By the time there was enough money in Alipay, the face cream had been taken off the shelves." The SVM classifier assigns this post to the "useless" class (because the subject of the sentence is not "Alipay"). To prevent important information from being missed, this "useless" information is nevertheless treated as information to be monitored, so that it can be examined further.
Then, the evaluation object of this microblog post is obtained, as follows. First, the parser performs syntactic analysis on the post and obtains the grammatical structure tree shown in Fig. 3.
Second, the node "taken off the shelves" is obtained from the root node (ROOT), and the child nodes whose relation to the root node word "taken off the shelves" is SBV, VOB, IOB, FOB, ADV, or COO are obtained, namely the three child nodes "face cream", "le" (the aspect particle 了), and "wait" (等). Next, the child nodes whose relation to these three child nodes is SBV, VOB, IOB, FOB, ADV, or COO are obtained, namely the two child nodes "enough" and "money". Finally, the resulting set of matched words (Match Word) contains the five nodes "face cream", "le", "wait", "enough", and "money"; these five nodes are the potential evaluation objects.
Third, the final evaluation object "face cream" is obtained through CRF labeling and extraction (the CRF computes the highest probability for "face cream", so it is taken as the final evaluation object). Intersecting it with the keyword set {"Alipay"} gives an empty intersection, so the information in this microblog post is judged not to match the topic "Alipay" that the user cares about.
Finally, this microblog post is filtered out directly and is not included in the next processing task.
In the embodiments of the present application, microblog information is recognized and classified and, in combination with evaluation-object extraction, the captured microblog public-opinion information is screened and filtered. This improves the accuracy of screening and reduces the risk of missing significant information.
A person skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, CD-ROM, and optical memory) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape and disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or further includes elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
The foregoing descriptions are merely embodiments of the present application and are not intended to limit the present application. For a person skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.