CN106970925A - Abnormality early-warning method and apparatus for user viewpoints - Google Patents

Abnormality early-warning method and apparatus for user viewpoints

Info

Publication number
CN106970925A
Authority
CN
China
Prior art keywords
documents
user perspective
user
early warning
newly
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610024382.0A
Other languages
Chinese (zh)
Other versions
CN106970925B (en)
Inventor
任望
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610024382.0A
Publication of CN106970925A
Application granted
Publication of CN106970925B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application proposes an abnormality early-warning method and apparatus for user viewpoints. The method includes: obtaining user documents that meet a preset condition; clustering the user documents; extracting the user viewpoint of each cluster topic; and issuing an early warning according to the number of user documents for the user viewpoint within a preset time. The method can monitor the document-count growth rate of each user viewpoint in real time, which helps detect large-scale surges of user viewpoints, especially negative ones, in time. An enterprise can therefore respond quickly once a problem is found, effectively prevent the situation from worsening, and handle problems more proactively.

Description

Abnormality early-warning method and apparatus for user viewpoints
Technical field
The application relates to the technical field of document analysis, and in particular to an abnormality early-warning method and apparatus for user viewpoints.
Background art
In recent years, with the development of Internet technology, online services such as chat software, online forums and microblogs have become popular. The influence of public opinion is constantly amplified by the network, the mass media and other channels, and user viewpoints can strongly affect a company's image. For example, a large number of reposts and negative comments on a microblog within a short time can seriously damage the image of an enterprise, a product or a person; if they are not detected and handled in time, the impact can easily spread. It has therefore become very important to analyze user viewpoints, quickly detect negative ones, and issue early warnings.
At present, some have proposed using public-opinion dynamics models to model and predict the propagation of user viewpoints in a network according to the network topology. However, the network public-opinion analysis means and methods available at this stage cannot respond accurately to the actual public-opinion situation and lag seriously in prediction and inference. Moreover, existing methods usually model and predict propagation trends from the network topology and cannot analyze the text content of user viewpoints. As a result, they have difficulty quickly detecting abnormal user viewpoints that grow rapidly on a large scale or are negative, and cannot issue the corresponding early warning or respond quickly.
Summary of the invention
To solve the above problems in the prior art, the purpose of the application is to propose an abnormality early-warning method and apparatus for user viewpoints that can detect abnormal growth of a user viewpoint from changes in the number of user documents for that viewpoint and issue an early warning, so that problems can be found and handled in time and the situation is prevented from escalating.
To achieve the above purpose, the abnormality early-warning method for user viewpoints proposed in an embodiment of the application includes: obtaining user documents that meet a preset condition; clustering the user documents; extracting the user viewpoint of each cluster topic; and issuing an early warning according to the number of user documents for the user viewpoint within a preset time.
To achieve the above purpose, the abnormality early-warning apparatus for user viewpoints proposed in an embodiment of the application includes: an acquisition module, configured to obtain user documents that meet a preset condition; a clustering module, configured to cluster the user documents; an extraction module, configured to extract the user viewpoint of each cluster topic; and a warning module, configured to issue an early warning according to the number of user documents for the user viewpoint within a preset time.
As can be seen from the technical solution provided by the above embodiments, the user documents are clustered, the user viewpoint expressed by each cluster topic is extracted, and the number of user documents for a given user viewpoint within a preset time is analyzed. The document-count growth rate of each user viewpoint is thus monitored in real time and an early warning is issued when the data is abnormal. This helps detect large-scale surges of user viewpoints, especially negative ones, in time, so that an enterprise can respond quickly once a problem is found, effectively prevent the situation from worsening, and handle problems more proactively.
Additional aspects and advantages of the application will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the application.
Brief description of the drawings
To explain the technical solutions in the embodiments of the application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the abnormality early-warning method for user viewpoints proposed in an embodiment of the application;
Fig. 2 is a schematic structural diagram of the abnormality early-warning apparatus for user viewpoints according to an embodiment of the application;
Fig. 3 is a schematic structural diagram of the abnormality early-warning apparatus for user viewpoints according to another embodiment of the application;
Fig. 4 is a schematic structural diagram of the warning module 400 according to another embodiment of the application;
Fig. 5 is a schematic structural diagram of the warning module 400 according to another embodiment of the application;
Fig. 6 is a schematic structural diagram of the warning module 400 according to another embodiment of the application;
Fig. 7 is a schematic flowchart of abnormality early warning for user viewpoints according to a specific embodiment of the application.
Detailed description of the embodiments
The embodiments of the application provide an abnormality early-warning method and apparatus for user viewpoints.
To help those skilled in the art better understand the technical solutions of the application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the application without creative effort shall fall within the scope of protection of the application.
Fig. 1 is a schematic flowchart of the abnormality early-warning method for user viewpoints proposed in an embodiment of the application. In the embodiment shown in the figure, user documents involving preset content are clustered, the user viewpoint expressed by each cluster topic is extracted, and the user viewpoints are analyzed so that an early warning can be issued for abnormal, rapidly growing user viewpoints. As shown in Fig. 1, the method includes:
Step 101: obtain user documents that meet a preset condition.
Specifically, user documents can be obtained in many ways: for example, from web pages, by crawling preset websites, by extraction from a known database, or from the records of a preset program. The preset condition may be relevance to a specific event, product or the like, or the presence of preset words, sentences or the like. For example, user messages, reposts and comments may be crawled from microblog pages that involve the preset content or related words; alternatively, users' comments, messages, feedback and complaints may be obtained directly from the user-feedback records of an internal channel.
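As a rough illustration of the keyword form of the preset condition, the sketch below keeps only documents that contain at least one preset word. The `UserDocument` type, its fields, and the function names are hypothetical; the text does not prescribe any data structure or matching rule.

```python
from dataclasses import dataclass


@dataclass
class UserDocument:
    """Hypothetical container for one user comment, message, or complaint."""
    text: str
    date: str  # ISO date string, e.g. "2016-01-14"


def meets_condition(doc: UserDocument, preset_words: list[str]) -> bool:
    # Assumption: the preset condition is "contains any preset word".
    return any(word in doc.text for word in preset_words)


def acquire(documents: list[UserDocument], preset_words: list[str]) -> list[UserDocument]:
    """Keep only the documents that satisfy the preset condition."""
    return [doc for doc in documents if meets_condition(doc, preset_words)]
```

In practice the input list would come from a crawler or an internal feedback log; only the filtering step is shown here.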
Step 102: cluster the user documents. The similarity between user documents can be calculated, and the documents clustered, with an existing clustering algorithm.
Step 103: extract the user viewpoint of each cluster topic. The user viewpoint expressed by a document group can be extracted from the keywords in the document group obtained by clustering, as described in detail in subsequent embodiments.
Step 104: issue an early warning according to the number of user documents for the user viewpoint within a preset time.
According to one embodiment of the application, clustering the user documents includes: extracting user-intent features from the user documents; performing document similarity analysis on the user-intent features; and clustering the user documents according to the result of the document similarity analysis. Different clustering algorithms essentially all cluster by some measure of similarity. The application can use a variety of clustering methods, preferably a streaming clustering method, that is, a clustering algorithm based on online learning such as the Single Pass algorithm, which clusters the user documents in real time in chronological order. The features that best express user intent are extracted from each user document and used as the basis for similarity analysis and clustering, so that the user intents expressed within each resulting document group are as close as possible, with higher clustering accuracy and efficiency.
According to one embodiment of the application, the user-intent features include dependency features, text features, verb features and user-behavior features. A dependency feature describes the dependency relations between words. In dependency grammar, each sentence is centered on one key word, and this word can be used to represent the user's intent. Specifically, dependency feature extraction is performed on the user documents to obtain the dependency features; text preprocessing is performed to obtain the text features; the verbs in the documents are extracted to obtain the verb features; and the user behaviors related to the preset content are extracted and screened to obtain the user-behavior features. Extracting these user-intent features makes the extracted features more effective and thereby improves the effect and accuracy of the clustering algorithm.
According to one embodiment of the application, extracting the user viewpoint of each cluster topic includes: ranking the words in the user documents of the cluster topic by word frequency; and extracting the user viewpoint of the cluster topic according to the word-frequency ranking. All user documents in the cluster topic can be ranked by word frequency and the several keywords with the highest frequency selected. The word order of these keywords is then determined from the positions at which they occur in each document, and the user viewpoint of the cluster topic is finally extracted.
According to one embodiment of the application, issuing an early warning according to the number of user documents for the user viewpoint within a preset time includes: collecting document-count information for the user viewpoint within the preset time; calculating, from the document-count information, the mean document count of the user viewpoint within the preset time; and issuing an abnormal-viewpoint warning when the distance between the number of newly added documents for the user viewpoint and the mean document count is greater than a first preset threshold. The document-count information may be one or more statistics such as the number of documents added for the user viewpoint within the preset time, the increase per unit time, the mean count within the preset time, and the growth rate. The preset time can be set according to the statistical requirement. For example, to monitor the number of documents added for a given user viewpoint within one day, the document data of the most recent 30 days can be used to calculate the mean number of documents belonging to that viewpoint that appear each day. Whether the number of newly added documents is abnormal is judged against the mean count of user documents for a user viewpoint over the preset period, so that a warning can be issued when such a quantitative anomaly is found. This embodiment can be realized by a method based on an RBF kernel, as described in detail in subsequent embodiments.
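The mean-based check can be sketched in a few lines. This is a minimal illustration that assumes the "distance" is the plain absolute difference from the mean; the text mentions an RBF-kernel-based method for the actual measure, which is not reproduced here. The function name and the threshold values are hypothetical.

```python
from statistics import mean


def mean_based_alert(daily_counts: list[int], new_count: int,
                     first_threshold: float) -> bool:
    """Return True when the new count is too far from the historical mean.

    daily_counts: document counts for one viewpoint over the preset time,
                  e.g. the most recent 30 days.
    Assumption: distance = absolute difference (not the RBF-kernel measure).
    """
    average = mean(daily_counts)
    return abs(new_count - average) > first_threshold
```

For instance, with 30 days at about 10 documents per day and a threshold of 20, a day with 50 new documents would trigger a warning while a day with 12 would not.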
According to one embodiment of the application, issuing an early warning according to the number of user documents for the user viewpoint within a preset time includes: collecting document-count information for the user viewpoint within the preset time; predicting, from the document-count information, the number of newly added documents for the user viewpoint to obtain a predicted count of newly added documents; and issuing an abnormal-viewpoint warning when the difference between the number of newly added documents and the predicted count is greater than a second preset threshold. Specifically, the prediction can be made with a time-series method. Time-series forecasting is a common way of predicting future quantities, and the ARIMA method is a common example. ARIMA predicts the future from historical information: the predicted document count for today can be calculated from historical document counts (for example, the daily document counts of the previous thirty days). If the number of documents contained in the cluster topic is far greater than the historical quantity, an alarm is raised. The use of ARIMA for quantity prediction based on time series is covered in related technical documents, for example "Time Series Forecasting Techniques, Part 3: ARIMA Model Forecasting with Independent Variables" (Shen Hao, 2009-12-02), and is not described further here.
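The forecast-based check can be illustrated as follows. To stay self-contained, this sketch swaps in a least-squares linear trend as a stand-in forecaster; it is not ARIMA, and a real implementation would presumably fit an ARIMA model with a statistics library instead. The function names, the one-sided comparison, and the threshold are assumptions.

```python
def forecast_next(daily_counts: list[float]) -> float:
    """Predict the next value by extrapolating a least-squares line.

    Stand-in for the ARIMA forecast named in the text (assumption).
    """
    n = len(daily_counts)
    if n == 0:
        raise ValueError("need at least one historical count")
    x_mean = (n - 1) / 2
    y_mean = sum(daily_counts) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    if sxx == 0:  # a single observation: no trend can be estimated
        return y_mean
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_counts))
    slope = sxy / sxx
    return y_mean + slope * (n - x_mean)  # value of the fitted line at x = n


def forecast_based_alert(daily_counts: list[float], new_count: float,
                         second_threshold: float) -> bool:
    """Alert when the new count exceeds the forecast by more than the threshold."""
    return new_count - forecast_next(daily_counts) > second_threshold
```

The comparison is one-sided because the text raises an alarm when the count is far greater than history; a symmetric check would use the absolute difference.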
According to one embodiment of the application, issuing an early warning according to the number of user documents for the user viewpoint within a preset time includes: collecting document-count information for the user viewpoint within the preset time; calculating, from the document-count information, the mean document count of the user viewpoint within the preset time; predicting, from the document-count information, the number of newly added documents for the user viewpoint to obtain a predicted count of newly added documents; and issuing an abnormal-viewpoint warning when the distance between the number of newly added documents for the user viewpoint and the mean document count is greater than a first preset threshold and the difference between the number of newly added documents and the predicted count is greater than a second preset threshold. This embodiment combines the two warning conditions and issues an abnormality warning for the user viewpoint only when both occur at the same time, which effectively reduces the false-alarm rate and improves the accuracy of the warning.
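The combined rule reduces to a logical AND of the two conditions. The sketch below takes the historical mean and the predicted count as inputs so that it does not depend on how either is computed; the function and parameter names are hypothetical, and the absolute and one-sided differences are assumptions.

```python
def combined_alert(new_count: float, history_mean: float, predicted_count: float,
                   first_threshold: float, second_threshold: float) -> bool:
    """Warn only when both the mean-based and forecast-based conditions hold."""
    far_from_mean = abs(new_count - history_mean) > first_threshold
    far_from_forecast = (new_count - predicted_count) > second_threshold
    return far_from_mean and far_from_forecast
```

A spike that the forecast already anticipated (for example, a known weekly peak) satisfies only the first condition and is therefore not reported.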
According to the embodiments of the application, the user documents are clustered, the user viewpoint expressed by each cluster topic is extracted, and the number of user documents for a given user viewpoint within a preset time is analyzed. The document-count growth rate of each user viewpoint can thus be monitored in real time and an early warning issued when the data is abnormal. This helps detect large-scale surges of user viewpoints, especially negative ones, in time, so that an enterprise can respond quickly once a problem is found, effectively prevent the situation from worsening, and handle problems more proactively.
Based on the same inventive concept, the embodiments of the application also provide an abnormality early-warning apparatus for user viewpoints, which can be used to implement the method described in the above embodiments, as described in the following embodiments. Because the apparatus solves the problem on a principle similar to that of the method, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again. As used below, the term "unit" or "module" may be a combination of software and/or hardware that realizes a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 2 is a schematic structural diagram of the abnormality early-warning apparatus for user viewpoints according to an embodiment of the application. The apparatus of this embodiment may be composed of logical components that realize the corresponding functions, or may be an electronic device running software that provides those functions.
As shown in Fig. 2, the abnormality early-warning apparatus for user viewpoints includes an acquisition module 100, a clustering module 200, an extraction module 300 and a warning module 400.
Specifically, the acquisition module 100 is configured to obtain user documents that meet a preset condition.
The clustering module 200 is configured to cluster the user documents.
The extraction module 300 is configured to extract the user viewpoint of each cluster topic.
The warning module 400 is configured to issue an early warning according to the number of user documents for the user viewpoint within a preset time.
Fig. 3 is a schematic structural diagram of the abnormality early-warning apparatus for user viewpoints according to another embodiment of the application.
According to one embodiment of the application, as shown in Fig. 3, the clustering module 200 includes an extraction submodule 210, a similarity analysis submodule 220 and a clustering submodule 230.
Specifically, the extraction submodule 210 is configured to extract the user-intent features from the user documents;
the similarity analysis submodule 220 is configured to perform document similarity analysis on the user-intent features;
the clustering submodule 230 is configured to cluster the user documents according to the result of the document similarity analysis.
According to one embodiment of the application, the extraction submodule 210 is specifically configured to extract the dependency features, text features, verb features and user-behavior features from the documents.
According to one embodiment of the application, as shown in Fig. 3, the extraction module 300 may include a word-frequency ranking submodule 310 and a viewpoint extraction submodule 320. The word-frequency ranking submodule 310 is configured to rank the words in the user documents of the cluster topic by word frequency; the viewpoint extraction submodule 320 is configured to extract the user viewpoint of the cluster topic according to the word-frequency ranking.
According to one embodiment of the application, as shown in Fig. 4, the warning module 400 may include a statistics submodule 410, a calculation submodule 420 and a first warning submodule 430. The statistics submodule 410 is configured to collect document-count information for the user viewpoint within the preset time; the calculation submodule 420 is configured to calculate, from the document-count information, the mean document count of the user viewpoint within the preset time; and the first warning submodule 430 is configured to issue an abnormal-viewpoint warning when the distance between the number of newly added documents for the user viewpoint and the mean document count is greater than the first preset threshold.
According to one embodiment of the application, as shown in Fig. 5, the warning module 400 may include a statistics submodule 410, a prediction submodule 440 and a second warning submodule 450. The statistics submodule 410 is configured to collect document-count information for the user viewpoint within the preset time; the prediction submodule 440 is configured to predict, from the document-count information, the number of newly added documents for the user viewpoint to obtain a predicted count of newly added documents; and the second warning submodule 450 is configured to issue an abnormal-viewpoint warning when the difference between the number of newly added documents and the predicted count is greater than the second preset threshold.
According to one embodiment of the application, as shown in Fig. 6, the warning module 400 may include a statistics submodule 410, a calculation submodule 420, a prediction submodule 440 and a third warning submodule 460. The third warning submodule 460 is configured to issue an abnormal-viewpoint warning when the distance between the number of newly added documents for the user viewpoint and the mean document count is greater than the first preset threshold and the difference between the number of newly added documents and the predicted count is greater than the second preset threshold.
According to the embodiments of the application, the user documents are clustered, the user viewpoint expressed by each cluster topic is extracted, and the number of user documents for a given user viewpoint within a preset time is analyzed. The document-count growth rate of each user viewpoint is thus monitored in real time and an early warning is issued when the data is abnormal. This helps detect large-scale surges of user viewpoints, especially negative ones, in time, so that an enterprise can respond quickly once a problem is found, effectively prevent the situation from worsening, and handle problems more proactively.
Fig. 7 is a schematic flowchart of abnormality early warning for user viewpoints, using the above method and apparatus, according to a specific embodiment of the application:
Step 1: obtain user documents that meet a preset condition.
Specifically, user documents can be obtained in many ways: for example, from web pages, by crawling preset websites, by extraction from a known database, or from the records of a preset program. The preset condition may be relevance to a specific event, product or the like, or the presence of preset words, sentences or the like. For example, user messages, reposts and comments may be crawled from microblog pages that involve the preset content or related words; alternatively, users' comments, messages, feedback and complaints may be obtained directly from the user-feedback records of an internal channel. A concrete example is crawling the comments related to "Ant Huabei" (蚂蚁花呗) from Alibaba's official microblog.
Step 2: extract the dependency features from the user documents.
Specifically, a dependency feature describes the dependency relations between the words of a sentence. In dependency grammar, each sentence is centered on one key word, and this word can be used to represent the user's intent. The dependency features in the user documents can be extracted with an existing dependency-parsing algorithm.
Step 3: extract the text features from the user documents.
Specifically, conventional preprocessing can be applied to the text of the user documents. Because the user documents used for early-warning analysis are mostly short dialogue, word segmentation is usually unnecessary; instead, the text is preprocessed with 2-grams. The 2-gram method is a common dictionary-free segmentation method that splits a sentence into overlapping two-character units; for example, 花呗手续费 ("Huabei service fee") is split into 花呗, 呗手, 手续 and 续费. After the 2-gram preprocessing, the text is converted into a vector by a text vector space model.
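The 2-gram preprocessing and vectorization can be sketched as below. The sample string 花呗手续费 is an assumed reconstruction of the example in the text, and the vector space model is taken to be a simple bag of 2-gram counts; the text does not say whether raw counts, TF-IDF, or another weighting is used. Function names are hypothetical.

```python
from collections import Counter


def bigrams(text: str) -> list[str]:
    """Split text into overlapping two-character units, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def to_vector(text: str) -> Counter:
    """Sparse term-count vector over 2-grams (assumed weighting: raw counts)."""
    return Counter(bigrams(text))
```

A sparse `Counter` keeps the vector small for short dialogue text, where most 2-grams of the vocabulary never occur.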
Step 4: extract the verb features from the user documents.
In general, the verb is the most important word in a sentence and best represents the user's intent. Extracting the verbs that represent user intent from the sentences therefore also expresses the user-intent features accurately.
Step 5: extract the user-behavior features from the user documents.
Specifically, the user features related to the preset condition can be extracted. Selecting suitable user features is of great significance for improving classification accuracy. At present, user-behavior features are mainly selected based on business experience. For example, if the preset condition is the product "Ant Huabei", the extracted features may include whether the user has activated the product, the user's most recent login address, the user's most recent IP address, and so on.
Step 6: perform document similarity analysis on the user-intent features.
The user-intent features include the dependency features, text features, verb features and user-behavior features described above.
Specifically, a classical clustering algorithm generally has a similarity-measurement formula. This embodiment is illustrated with a similarity-measurement formula based on cosine distance. The formula is as follows:
$$\mathrm{sim}(doc_1, doc_2) = \alpha\cos(text_1, text_2) + \beta\cos(dep_1, dep_2) + \gamma\cos(verb_1, verb_2) + \theta\cos(beh_1, beh_2)$$
$$\alpha + \beta + \gamma + \theta = 1$$
Here doc1 and doc2 denote two user documents; text1 and text2 are the text features of doc1 and doc2; dep1 and dep2 are their dependency features; verb1 and verb2 are their verb features; beh1 and beh2 are their user-behavior features; cos() denotes similarity measured by the cosine value; and α, β, γ and θ are the corresponding weights. Following the general convention, the similarity usually ranges from 0 to 1, which requires α, β, γ and θ to sum to 1. In general, the closer the similarity is to 1, the more alike the two documents are; the closer it is to 0, the less alike they are, that is, the greater the semantic difference between them.
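The weighted similarity can be sketched as follows, assuming that every one of the four feature groups is compared by cosine and that each group is stored as a sparse vector (a dict from feature to weight). The weight values and the dictionary keys are hypothetical; the text only requires that the four weights sum to 1.

```python
import math

# Hypothetical weights for text, dependency, verb and behavior features.
WEIGHTS = {"text": 0.4, "dep": 0.2, "verb": 0.2, "beh": 0.2}


def cosine(u: dict, v: dict) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity(doc1: dict, doc2: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of per-feature cosine similarities of two documents."""
    return sum(w * cosine(doc1[name], doc2[name]) for name, w in weights.items())
```

With non-negative feature values each cosine lies in [0, 1], so weights that sum to 1 keep the overall similarity in the same range.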
It should be understood that, besides the above four kinds of features, many other user-intent features are possible, and the similarity-measurement formula changes accordingly. The four kinds of features selected in this embodiment make the extracted features more effective and thereby improve the effect and accuracy of the clustering algorithm.
Step 7: cluster the user documents according to the result of the document similarity analysis.
For example, taking a clustering algorithm based on online learning, the user documents can be clustered in real time in chronological order.
First, some hyperparameters of the algorithm need to be specified: t1 is the upper similarity threshold and t2 is the lower similarity threshold. Both t1 and t2 take values between 0 and 1.
Specifically, at the start the number of cluster topics is 0, that is, no user document belongs to any cluster topic. For each user document that flows in, in chronological order, the user-intent features described above are extracted to obtain one large vector. The centroid of the document group of each cluster topic is then calculated, followed by the similarity between the newly arrived user document and the centroid of each cluster topic. If the similarity to some centroid is greater than t1, the user document is assigned to that cluster topic. If all similarities are less than t2, the user document becomes a separate topic of its own. If the similarity lies between t1 and t2, the similarity of the user document is hard to determine and the document can be discarded.
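The single-pass procedure with the two thresholds can be sketched as below. This is an illustrative version under simplifying assumptions: documents are already dense feature vectors of equal length, similarity is plain cosine rather than the weighted formula, and a document above t1 joins the most similar topic. Function names are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of the vectors in one cluster topic."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def single_pass(stream, t1: float, t2: float):
    """Cluster vectors in arrival order; return (clusters, discarded)."""
    clusters: list[list[list[float]]] = []
    discarded: list[list[float]] = []
    for vec in stream:
        sims = [cosine(vec, centroid(cluster)) for cluster in clusters]
        best = max(sims, default=None)
        if best is not None and best > t1:
            clusters[sims.index(best)].append(vec)  # join the closest topic
        elif best is None or best < t2:
            clusters.append([vec])                  # start a new topic
        else:
            discarded.append(vec)                   # ambiguous: between t2 and t1
    return clusters, discarded
```

Recomputing every centroid for each arriving document keeps the sketch short; a production version would more likely maintain running sums per topic.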
Step 8: perform word frequency ranking on the user documents in the cluster topic.
Specifically, to present viewpoints more clearly, a simple viewpoint extraction method may be used. For example, the word frequencies across all user documents in each cluster topic can be counted, and the words in each topic sorted by word frequency. The top 10 words are then selected as the high-frequency words of that cluster topic.
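A minimal Python sketch of the word frequency ranking follows. It assumes that the documents of one cluster topic have already been split into word lists (word segmentation is outside the scope of the sketch); the function name `top_words` is hypothetical.

```python
from collections import Counter


def top_words(tokenized_docs, k=10):
    """Return the k most frequent words across all documents of one topic."""
    counts = Counter(word for doc in tokenized_docs for word in doc)
    return [word for word, _ in counts.most_common(k)]
```

The default k=10 mirrors the top 10 words mentioned above; ties are returned in first-seen order, which is how `Counter.most_common` behaves.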
Step 9: extract the user viewpoint of the cluster topic according to the word frequency ranking.
Specifically, the position at which each selected high-frequency word occurs in each user document can be recorded and the mean position computed. The high-frequency words are then sorted by mean position to obtain their word order, from which the user viewpoint of the cluster topic is extracted. For example, suppose the high-frequency words selected by word frequency are "flower", "open", and "cannot". Each of these three high-frequency words is looked up in the original documents to obtain its position values. If, for instance, a user document contains the two keywords "flower" and "cannot" in that order, the position value of "flower" in that document is 1 and the position value of "cannot" is 2. Proceeding in the same way gives the position value of every high-frequency word of the cluster topic in every user document. Averaging the position values gives a mean position of 1.3 for "flower", 3.5 for "open", and 2.3 for "cannot"; sorting by mean position yields the viewpoint "flower cannot open".
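The ordering by mean position can be sketched in Python as follows. The sketch assumes that the position value of a word is its 1-based index at first occurrence in the tokenized document and that documents not containing the word are skipped when averaging; the application does not spell out either detail. The function name `order_by_mean_position` is hypothetical.

```python
def order_by_mean_position(tokenized_docs, high_freq_words):
    """Sort high-frequency words by their mean position across documents."""
    mean_pos = {}
    for word in high_freq_words:
        # 1-based position of the first occurrence in each document that has the word.
        positions = [doc.index(word) + 1 for doc in tokenized_docs if word in doc]
        if positions:
            mean_pos[word] = sum(positions) / len(positions)
    return sorted(mean_pos, key=mean_pos.get)
```

Joining the returned words in order gives a short phrase that serves as the viewpoint of the cluster topic.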
In the early-warning part, early warning on the document count of a user viewpoint can be carried out in the following three ways.
Step 10: collect the document count information of the user viewpoint within a preset time.
The document count information may be one or more statistics such as the number of documents added for the user viewpoint within the preset time, the increase per unit time, the average count within the preset time, and the growth rate. The preset time can be set according to the statistical requirement. For example, to monitor the number of newly added documents of a user viewpoint within one day, the document data of the most recent 30 days can be obtained to calculate the mean number of documents belonging to that user viewpoint per day.
Step 11: calculate the mean document count of the user viewpoint within the preset time from the document count information.
Step 12: when the distance between the number of newly added documents of the user viewpoint and the mean document count is greater than a first preset threshold, issue an abnormal-viewpoint early warning.
Specifically, the early-warning method of steps 10-12 can be implemented with a method based on the RBF kernel (radial basis function kernel). The RBF kernel has the following form:
$$K(x, x') = \exp\left(-a\,\lVert x - x' \rVert^{2}\right)$$
First, using the RBF-kernel-based method and taking one month of data as an example, the historical data of the month yields the mean daily number of documents belonging to the user viewpoint, as well as the standard deviation of the document count of the user viewpoint over that month. The distance between the number of user documents newly flowing into the user viewpoint and the mean daily document count of the month is then calculated; when this distance is greater than the preset threshold (for example, twice the standard deviation), an early warning is issued.
In this way, the mean number of user documents that a user viewpoint receives within a preset period is used to judge whether the number of newly added documents is abnormal, and an early warning is issued when such a quantitative anomaly is found.
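The check of steps 10-12 can be sketched in Python as follows. The sketch makes several assumptions that the application does not fix: the distance is the absolute difference between the new count and the historical mean, the standard deviation is the population standard deviation, and the RBF kernel takes the usual squared-distance form. The function names and the parameter `k` are hypothetical.

```python
import math
import statistics


def rbf_kernel(x, y, a=1.0):
    """RBF kernel for scalars: 1.0 when x equals y, approaching 0 as they move apart."""
    return math.exp(-a * (x - y) ** 2)


def is_count_abnormal(daily_counts, new_count, k=2.0):
    """Warn when the new count is more than k standard deviations from the mean."""
    mean = statistics.mean(daily_counts)
    std = statistics.pstdev(daily_counts)
    return abs(new_count - mean) > k * std
```

For a > 0 the two views agree: the condition |x − mean| > k·std holds exactly when `rbf_kernel(x, mean, a)` falls below exp(−a·k²·std²), so the threshold can be expressed either on the distance or on the kernel value.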
Optionally, early warning on the number of user documents of the user viewpoint can also be carried out through steps 13-15.
Step 13: collect the document count information of the user viewpoint within the preset time; see step 10.
Step 14: predict the number of newly added documents of the user viewpoint from the document count information to obtain a predicted count of newly added documents.
Step 15: when the difference between the number of newly added documents and the predicted count is greater than a second preset threshold, issue an abnormal-viewpoint early warning.
Specifically, the prediction can use a method based on time series. Time-series forecasting is a common way to predict future quantities, and a widely used method is ARIMA, which predicts the future from historical information. The predicted document count for today can be calculated from historical document counts (for example, the daily document counts of the previous thirty days). If the number of documents contained in the cluster topic is much larger than the historical count, an alarm is raised. The use of ARIMA for quantity forecasting based on time series is described in the relevant technical literature, for example "Time Series Forecasting Techniques, Part 3: ARIMA Model Prediction with Independent Variables" (Shen Hao, 2009-12-02), and is not repeated in this application.
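The prediction-based check of steps 13-15 can be sketched in Python as follows. Note that this is a simplified stand-in, not a full ARIMA implementation: it fits an ARIMA(1,1,0)-style model without a constant term by least squares on the first differences, which is enough to show the flow of forecast and comparison. A real deployment would more likely rely on an established time-series library. The function names are hypothetical, and treating the "difference" as an absolute difference is an assumption.

```python
def forecast_next(history):
    """One-step forecast: last value plus phi times the last first difference."""
    if len(history) < 3:
        raise ValueError("need at least three observations")
    diffs = [b - a for a, b in zip(history, history[1:])]
    # Least-squares AR(1) coefficient on the differences: d_t ~ phi * d_(t-1).
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    phi = num / den if den else 0.0
    return history[-1] + phi * diffs[-1]


def exceeds_forecast(history, new_count, second_threshold):
    """Warn when the new count differs from the forecast by more than the threshold."""
    return abs(new_count - forecast_next(history)) > second_threshold
```

For a steadily rising history such as 10, 12, 14, 16 the sketch continues the trend, so only a count well above the trend triggers the warning.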
In another embodiment of the application, the two approaches of steps 10-15 can also be used jointly for early warning on the number of user documents of the user viewpoint: an abnormal-viewpoint early warning is issued only when the distance between the number of newly added documents of the user viewpoint and the mean document count is greater than the first preset threshold and the difference between the number of newly added documents and the predicted count is greater than the second preset threshold. This can effectively reduce the false-alarm probability and markedly improve the accuracy of the early warning.
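The combined rule can be sketched in Python as follows. The sketch assumes that the mean document count and the predicted count have already been computed by whatever methods are chosen for steps 10-15, and that both comparisons use absolute differences; the function and parameter names are hypothetical.

```python
def combined_warning(new_count, mean_count, predicted_count,
                     first_threshold, second_threshold):
    """Warn only when both the mean-based and the prediction-based checks fire."""
    far_from_mean = abs(new_count - mean_count) > first_threshold
    far_from_prediction = abs(new_count - predicted_count) > second_threshold
    return far_from_mean and far_from_prediction
```

Requiring both conditions means that a count which is far from the historical mean but close to the predicted trend does not raise a warning, which is how the combination reduces false alarms.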
This embodiment clusters the user documents and extracts the user viewpoint expressed by each cluster topic. By analyzing the number of user documents of a given user viewpoint within a preset time, the growth rate of the document count of each user viewpoint can be monitored in real time and an early warning issued when the data are abnormal. This helps detect large surges in a user viewpoint in time, especially surges in negative viewpoints, so that an enterprise can respond quickly once a problem is found, effectively prevent the situation from worsening, and take more initiative in solving problems. Extracting effective user-intent features improves the effect of the clustering algorithm; the streaming clustering method is better suited to real-time computation and clusters more quickly and accurately.
It should be noted that, in the description of the application, the terms "first", "second", and so on are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. In addition, in the description of the application, unless otherwise stated, "multiple" means two or more.
Any process or method description in a flow chart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing a specific logical function or step of the process. The scope of the preferred embodiments of the application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the application belong.
It should be understood that each part of the application can be implemented in hardware, software, firmware, or a combination thereof. In the embodiments above, multiple steps or methods can be implemented with software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies known in the art can be used: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those skilled in the art will understand that all or part of the steps of the method embodiments above can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the application. Such schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the application have been shown and described above, it should be understood that these embodiments are exemplary and are not to be construed as limiting the application; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the application.

Claims (14)

1. A method for early warning of abnormal user viewpoints, characterized by comprising:
obtaining user documents that meet a preset condition;
clustering the user documents;
extracting the user viewpoint of the cluster topic;
performing early warning according to the number of user documents of the user viewpoint within a preset time.
2. The method according to claim 1, characterized in that clustering the user documents comprises:
extracting user-intent features from the user documents;
performing document similarity analysis on the user-intent features;
clustering the user documents according to the result of the document similarity analysis.
3. The method according to claim 2, characterized in that the user-intent features comprise a dependency feature, a text feature, a verb feature, and a user behavior feature.
4. The method according to claim 1, characterized in that extracting the user viewpoint of the cluster topic comprises:
performing word frequency ranking on the user documents in the cluster topic;
extracting the user viewpoint of the cluster topic according to the word frequency ranking.
5. The method according to claim 1, characterized in that performing early warning according to the number of user documents of the user viewpoint within the preset time comprises:
collecting the document count information of the user viewpoint within the preset time;
calculating the mean document count of the user viewpoint within the preset time from the document count information;
when the distance between the number of newly added documents of the user viewpoint and the mean document count is greater than a first preset threshold, issuing an abnormal-viewpoint early warning.
6. The method according to claim 1, characterized in that performing early warning according to the number of user documents of the user viewpoint within the preset time comprises:
collecting the document count information of the user viewpoint within the preset time;
predicting the number of newly added documents of the user viewpoint from the document count information to obtain a predicted count of newly added documents;
when the difference between the number of newly added documents and the predicted count is greater than a second preset threshold, issuing an abnormal-viewpoint early warning.
7. The method according to claim 1, characterized in that performing early warning according to the number of user documents of the user viewpoint within the preset time comprises:
collecting the document count information of the user viewpoint within the preset time;
calculating the mean document count of the user viewpoint within the preset time from the document count information;
predicting the number of newly added documents of the user viewpoint from the document count information to obtain a predicted count of newly added documents;
when the distance between the number of newly added documents of the user viewpoint and the mean document count is greater than a first preset threshold, and the difference between the number of newly added documents and the predicted count is greater than a second preset threshold, issuing an abnormal-viewpoint early warning.
8. A device for early warning of abnormal user viewpoints, characterized by comprising:
an acquisition module, configured to obtain user documents that meet a preset condition;
a clustering module, configured to cluster the user documents;
an extraction module, configured to extract the user viewpoint of the cluster topic;
an early-warning module, configured to perform early warning according to the number of user documents of the user viewpoint within a preset time.
9. The device according to claim 8, characterized in that the clustering module comprises:
an extraction submodule, configured to extract user-intent features from the user documents;
a similarity analysis submodule, configured to perform document similarity analysis on the user-intent features;
a clustering submodule, configured to cluster the user documents according to the result of the document similarity analysis.
10. The device according to claim 9, characterized in that the extraction submodule is specifically configured to extract a dependency feature, a text feature, a verb feature, and a user behavior feature from the documents.
11. The device according to claim 8, characterized in that the extraction module comprises:
a word frequency ranking submodule, configured to perform word frequency ranking on the user documents in the cluster topic;
a viewpoint extraction submodule, configured to extract the user viewpoint of the cluster topic according to the word frequency ranking.
12. The device according to claim 8, characterized in that the early-warning module comprises:
a statistics submodule, configured to collect the document count information of the user viewpoint within the preset time;
a calculation submodule, configured to calculate the mean document count of the user viewpoint within the preset time from the document count information;
a first early-warning submodule, configured to issue an abnormal-viewpoint early warning when the distance between the number of newly added documents of the user viewpoint and the mean document count is greater than a first preset threshold.
13. The device according to claim 8, characterized in that the early-warning module comprises:
a statistics submodule, configured to collect the document count information of the user viewpoint within the preset time;
a prediction submodule, configured to predict the number of newly added documents of the user viewpoint from the document count information to obtain a predicted count of newly added documents;
a second early-warning submodule, configured to issue an abnormal-viewpoint early warning when the difference between the number of newly added documents and the predicted count is greater than a second preset threshold.
14. The device according to claim 8, characterized in that the early-warning module comprises:
a statistics submodule, configured to collect the document count information of the user viewpoint within the preset time;
a calculation submodule, configured to calculate the mean document count of the user viewpoint within the preset time from the document count information;
a prediction submodule, configured to predict the number of newly added documents of the user viewpoint from the document count information to obtain a predicted count of newly added documents;
a third early-warning submodule, configured to issue an abnormal-viewpoint early warning when the distance between the number of newly added documents of the user viewpoint and the mean document count is greater than a first preset threshold and the difference between the number of newly added documents and the predicted count is greater than a second preset threshold.
CN201610024382.0A 2016-01-14 2016-01-14 User viewpoint abnormity early warning method and device Active CN106970925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610024382.0A CN106970925B (en) 2016-01-14 2016-01-14 User viewpoint abnormity early warning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610024382.0A CN106970925B (en) 2016-01-14 2016-01-14 User viewpoint abnormity early warning method and device

Publications (2)

Publication Number Publication Date
CN106970925A true CN106970925A (en) 2017-07-21
CN106970925B CN106970925B (en) 2020-07-03

Family

ID=59335086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610024382.0A Active CN106970925B (en) 2016-01-14 2016-01-14 User viewpoint abnormity early warning method and device

Country Status (1)

Country Link
CN (1) CN106970925B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101394311A (en) * 2008-11-12 2009-03-25 北京交通大学 Network public opinion prediction method based on time sequence
CN103473309A (en) * 2013-09-10 2013-12-25 浙江大学 Text categorization method based on probability word selection and supervision subject model
CN103744877A (en) * 2013-12-20 2014-04-23 潘大庆 Public opinion monitoring application system deployed in internet and application method
CN104216954A (en) * 2014-08-20 2014-12-17 北京邮电大学 Prediction device and prediction method for state of emergency topic
US20150248476A1 (en) * 2013-03-15 2015-09-03 Akuda Labs Llc Automatic Topic Discovery in Streams of Unstructured Data
CN104965931A (en) * 2015-07-30 2015-10-07 成都布林特信息技术有限公司 Big data based public opinion analysis method
CN104965823A (en) * 2015-07-30 2015-10-07 成都鼎智汇科技有限公司 Big data based opinion extraction method
CN105068991A (en) * 2015-07-30 2015-11-18 成都鼎智汇科技有限公司 Big data based public sentiment discovery method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张晓娟: "《查询意图自动分类与分析》", 30 November 2015 *
葛诗利: "《面向大学英语教学的通用计算机作文评分和反馈方法研究》", 30 September 2015 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319692A (en) * 2018-02-01 2018-07-24 北京云知声信息技术有限公司 Abnormal punctuate cleaning method, storage medium and server
CN108319692B (en) * 2018-02-01 2021-03-19 云知声智能科技股份有限公司 Abnormal punctuation cleaning method, storage medium and server

Also Published As

Publication number Publication date
CN106970925B (en) 2020-07-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201013

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201013

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: Greater Cayman, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.

TR01 Transfer of patent right