CN110458296A - The labeling method and device of object event, storage medium and electronic device - Google Patents
The labeling method and device of object event, storage medium and electronic device
- Publication number
- CN110458296A (CN201910713377.4A)
- Authority
- CN
- China
- Prior art keywords
- phrase
- target
- processed
- information
- training
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method and device for labeling an object event, a storage medium, and an electronic device. The method comprises: obtaining a content sentence carried in information to be processed, wherein the content sentence is split into multiple phrases; determining a target phrase among the multiple phrases, wherein the target phrase is a phrase that appears in the same piece of information to be processed and whose number of occurrences within a predetermined time period exceeds a preset count threshold; using a classification model to determine the target category corresponding to the target information to be processed, which is the information to be processed that contains the target phrase, wherein the different categories, including the target category, correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event; and, when the weight corresponding to the target category exceeds a preset weight threshold, marking the target phrase contained in the target information to be processed as the object event. The invention at least solves the problem in the related art that object events are detected inefficiently.
Description
Technical field
The present invention relates to the technical field of game data processing, and in particular to a method and device for labeling an object event, a storage medium, and an electronic device.
Background
In the related art, network hotspot events are currently detected mainly by training a word vector model with word embedding (Word Embedding) algorithms. Specifically, the word vector model yields word-level vector representations; a sentence vector representation is then obtained by concatenating word vectors, by extracting trunk words or the sentence trunk, or by training a further model; finally, the sentence vectors are clustered by a clustering method to obtain event clusters. However, this approach cannot intelligently identify the category of an event cluster. That is, it cannot accurately determine whether an event under detection is a genuine hotspot event or an ordinary event whose frequency is only temporarily high, so whether the event is a hotspot event often has to be judged manually.
In other words, the detection approach provided by the related art requires substantial manual effort, which increases the complexity of event detection and results in low detection efficiency.
No effective solution to the above problem has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a method and device for labeling an object event, a storage medium, and an electronic device, so as to at least solve the problem in the related art that object events are detected inefficiently.
According to one aspect of the embodiments of the present invention, a method for labeling an object event is provided, comprising: obtaining a content sentence carried in information to be processed, wherein the content sentence is split into multiple phrases; determining a target phrase among the multiple phrases, wherein the target phrase is a phrase that appears in the same piece of information to be processed and whose number of occurrences within a predetermined time period exceeds a preset count threshold; using a classification model to determine the target category corresponding to the target information to be processed, which is the information to be processed that contains the target phrase, wherein the different categories, including the target category, correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event; and, when the weight corresponding to the target category exceeds a preset weight threshold, marking the target phrase contained in the target information to be processed as the object event.
According to another aspect of the embodiments of the present invention, a device for labeling an object event is further provided, comprising: an obtaining module, configured to obtain a content sentence carried in information to be processed, wherein the content sentence is split into one or more phrases; a first determining module, configured to determine a target phrase among the phrases, wherein the target phrase is a phrase that appears in the same piece of information to be processed and whose number of occurrences within a predetermined time period exceeds a preset count threshold; a second determining module, configured to use a classification model to determine the target category corresponding to the target information to be processed, which is the information to be processed that contains the target phrase, wherein the different categories, including the target category, correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event; and a marking module, configured to mark the target phrase contained in the target information to be processed as the object event when the weight corresponding to the target category exceeds a preset weight threshold.
Optionally, the second determining module includes: an input unit, configured to input the target information to be processed into the classification model, wherein the target information to be processed contains one or more target phrases, and the classification model is obtained by training an initial classification model using the phrases contained in the information to be processed as training samples; and an output unit, configured to output the target category corresponding to the target phrase.
Optionally, the device further includes: a training module, configured to train the initial classification model using first target information to be processed, whose category has already been determined, as training samples, wherein the first target information to be processed contains phrases marked as object events and phrases not marked as object events.
Optionally, the training module includes: a division unit, configured to divide the first target information to be processed, whose category has already been determined, into a training data set, a validation data set, and a test data set, wherein the training data set and the validation data set are used to train the classification model, and the test data set is used to test the trained classification model; a first segmentation unit, configured to segment the content sentences contained in the training data set and the validation data set into initial training phrases, and to take the initial training phrases whose frequency of occurrence exceeds a preset threshold as initial training samples, wherein the vector dimension of the initial training samples is the number of initial training samples; a computing unit, configured to compute the semantic vector representation of the initial training samples using a vector representation algorithm; a first training unit, configured to input the vector dimension of the initial training samples and the semantic vector representation of the initial training samples into the initial classification model for training, to obtain the classification model; and a test unit, configured to test the training result of the classification model using the test data set and to adjust the model parameters of the classification model.
Optionally, the training module further includes: a second segmentation unit, configured to segment the target content sentence in the target information to be processed into multiple target training phrases, wherein the target training phrases contain only Chinese characters and no stop words, and the stop words include at least interjections and/or pronouns and/or modal particles; a determination unit, configured to determine the target training phrases whose frequency of occurrence exceeds a preset threshold as a bag of words; a first merging unit, configured to merge the bag of words with the current training samples of the classification model to form target training samples; and a second training unit, configured to train the classification model using the target training samples and to adjust the model parameters of the classification model.
Optionally, the training module further includes: a first acquisition unit, configured to obtain the period from the end of the last model training to the current time and to determine second target information to be processed, wherein the second target information to be processed contains phrases whose number of occurrences within a predetermined time period exceeds the preset count threshold; and a second merging unit, configured to merge the phrases contained in the second target information to be processed into the current training samples of the classification model.
Optionally, the first determining module includes: a first determination unit, configured to determine, as first phrases, the phrases that appear in the same content sentence and whose number of occurrences across the content sentences of multiple pieces of information to be processed exceeds a preset threshold, wherein the first phrases contain only Chinese characters; a first discarding unit, configured to discard the first phrases whose share today is less than a first preset share threshold and/or whose word frequency today is less than a first preset word frequency threshold and/or whose word frequency growth rate today is less than a first preset growth rate threshold, to obtain second phrases, wherein the word frequency growth rate today is the growth rate relative to the word frequency of the previous day; a clustering unit, configured to cluster the second phrases to obtain first phrase clusters; a second discarding unit, configured to discard the first phrase clusters whose share today is less than a second preset share threshold and/or whose word frequency today is less than a second preset word frequency threshold and/or whose word frequency growth rate today is less than a second preset growth rate threshold, to obtain second phrase clusters; and a second determination unit, configured to determine the phrases in the second phrase clusters as the target phrases.
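The first discarding step described above can be illustrated with a minimal Python sketch. It is only one possible reading: the names `PhraseStats` and `filter_first_phrases` are hypothetical, today's share is assumed to be a phrase's count divided by the total count for the day, and a phrase is discarded as soon as any one of the three metrics falls below its threshold (one interpretation of "and/or").

```python
from dataclasses import dataclass


@dataclass
class PhraseStats:
    phrase: str
    today_count: int      # occurrences of the phrase today
    yesterday_count: int  # occurrences of the phrase on the previous day


def filter_first_phrases(stats, total_today, min_share, min_count, min_growth):
    """Return the phrases that survive all three threshold checks."""
    kept = []
    for s in stats:
        # Assumed definition: share = phrase count / total phrase count today.
        share = s.today_count / total_today if total_today else 0.0
        # Growth relative to the previous day; a phrase that did not occur
        # yesterday is treated as having unbounded growth.
        if s.yesterday_count > 0:
            growth = (s.today_count - s.yesterday_count) / s.yesterday_count
        else:
            growth = float("inf") if s.today_count > 0 else 0.0
        if share >= min_share and s.today_count >= min_count and growth >= min_growth:
            kept.append(s.phrase)
    return kept
```

The same function could be reused for the second discarding step by passing cluster-level statistics and the second set of thresholds.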
Optionally, the first discarding unit includes: an obtaining subunit, configured to obtain the share today of the current first phrase using the following formula: P1 = exp{(log p/m)/log n}, where p denotes the share of the current first phrase on the previous day, and m and n are constants; a determining subunit, configured to determine the first phrase with the lowest share today by comparing the share today of each first phrase; and a discarding subunit, configured to discard the first phrase with the lowest share today.
Optionally, the first determining module further includes:
a second acquisition unit, configured to obtain the fluctuation coefficient of the first phrase in the current time period by the following formula:
x' = (x − μ) / σ
where x' denotes the fluctuation coefficient, x denotes the word frequency of the first phrase in the current time period, μ denotes the mean word frequency of the first phrase in the same time period of the previous day, and σ denotes the standard deviation of the word frequency of the first phrase in the same time period of the previous day;
a third discarding unit, configured to discard the first phrase when the fluctuation coefficient is less than a preset fluctuation value.
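As a rough illustration of the fluctuation check, the following Python sketch assumes the coefficient is the standard score (x − μ)/σ, which is what the symbol definitions suggest. The function names, the use of the population standard deviation, and the handling of σ = 0 are assumptions, not details taken from the text.

```python
import math
from statistics import mean, pstdev


def fluctuation_coefficient(current, previous_day_counts):
    """Standard score of the current word frequency against the counts
    observed in the same time period of the previous day."""
    mu = mean(previous_day_counts)
    sigma = pstdev(previous_day_counts)  # population standard deviation (assumed)
    if sigma == 0:
        # No spread yesterday: any deviation counts as unbounded fluctuation.
        return 0.0 if current == mu else math.copysign(math.inf, current - mu)
    return (current - mu) / sigma


def keep_phrase(current, previous_day_counts, min_fluctuation):
    """A phrase is discarded when its coefficient is below the preset value."""
    return fluctuation_coefficient(current, previous_day_counts) >= min_fluctuation
```

For example, with previous-day counts of 8, 10, and 12 and a current count of 30, the coefficient is roughly 12.2, so the phrase would be kept under a preset fluctuation value of 3.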
According to yet another aspect of the embodiments of the present invention, a storage medium is further provided. A computer program is stored in the storage medium, and the computer program is configured to execute, when run, the above method for determining an object event.
According to yet another aspect of the embodiments of the present invention, an electronic device is further provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above method for determining an object event through the computer program.
In the embodiments of the present invention, the category of information to be processed is identified automatically using a classification model. A content sentence carried in the information to be processed is obtained, wherein the content sentence is split into multiple phrases; a target phrase is determined among the multiple phrases, wherein the target phrase is a phrase whose number of occurrences within a predetermined time period exceeds a preset count threshold; the classification model is used to determine the target category corresponding to the target information to be processed, which is the information to be processed that contains the target phrase, wherein the different categories, including the target category, correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event; and, when the weight corresponding to the target category exceeds a preset weight threshold, the target phrase contained in the target information to be processed is marked as an object event. Determining the target phrase screens out the phrases whose number of occurrences within the predetermined time period exceeds the preset count threshold. Using the classification model to determine the target category of the target information to be processed then classifies the information to be processed automatically. Because different weights are set for different target categories in the classification model, and only the target phrases of categories that reach the preset weight threshold are marked as object events, the phrases are screened further against the rules for object events. This avoids the inefficiency of manually classifying and screening whether something is an object event, achieves the purpose of automatically identifying the category to which the target information to be processed belongs, and thereby realizes the technical effect of automatically detecting whether a target phrase in the target information to be processed is an object event.
Brief description of the drawings
The drawings described herein are provided for further understanding of the present invention and form a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic diagram of a hardware environment of an optional method for labeling an object event according to an embodiment of the present application;
Fig. 2 is a flowchart of an optional method for labeling an object event according to an embodiment of the present application;
Fig. 3 is a schematic diagram of an optional object event alarm interface according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of another optional object event alarm interface according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of yet another optional object event alarm interface according to an embodiment of the present invention;
Fig. 6 is an optional flowchart of a work order category identification method according to an embodiment of the present application;
Fig. 7 is an optional flowchart of an SVM classification model training method according to an embodiment of the present invention;
Fig. 8 is an optional structural block diagram of a device for labeling an object event according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of a fluctuation coefficient display interface according to an embodiment of the present invention;
Fig. 10 is an optional schematic diagram of the early-warning situation in a given month according to an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of an optional electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
To solve the above technical problem, an embodiment of the present application provides a method for labeling an object event. Fig. 1 is a schematic diagram of a hardware environment of an optional method for labeling an object event according to an embodiment of the present application. As shown in Fig. 1, the hardware environment may include, but is not limited to, a first user equipment 102, a network 110, a server 112, and a second user equipment 202. The first user equipment 102 may include, but is not limited to, a memory 104, a processor 106, and a display 108; the server 112 may include, but is not limited to, a database 114 and a processing engine 116; and the second user equipment 202 may include, but is not limited to, a memory 204, a processor 206, and a display 208. The user equipment here may be, but is not limited to, a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (Mobile Internet Devices, MID), or a PAD. The method for labeling an object event mainly comprises the following steps:
Step S102, the first user equipment 102 sends the information to be processed to the network 110;
Step S104, the network 110 forwards the information to be processed to the server 112;
Step S106, the server 112 marks the information to be processed that meets the requirements as an object event and pushes it to the second user equipment 202;
Step S108, the server 112 returns the processing result for the information to be processed to the network 110;
Step S110, the network 110 feeds the processing result back to the first user equipment 102.
It should be noted that the server 112 may also choose not to feed the processing result back to the first user equipment 102. When the information to be processed sent by the first user equipment 102 is not a request for a data result, or is merely mischief, the server 112 may ignore that information and return no processing result.
Optionally, in step S106, the server 112 may mark the information to be processed that meets the requirements as an object event through the following steps: obtaining a content sentence carried in the information to be processed, wherein the content sentence is split into multiple phrases; determining a target phrase among the multiple phrases, wherein the target phrase is a phrase that appears in the same piece of information to be processed and whose number of occurrences within a predetermined time period exceeds a preset count threshold; using a classification model to determine the target category corresponding to the target information to be processed, which is the information to be processed that contains the target phrase, wherein the different categories, including the target category, correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event; and, when the weight corresponding to the target category exceeds a preset weight threshold, marking the target phrase contained in the target information to be processed as an object event.
Fig. 2 is a flowchart of an optional method for labeling an object event according to an embodiment of the present application. As shown in Fig. 2, the method includes:
Step S202, obtaining a content sentence carried in information to be processed, wherein the content sentence is split into multiple phrases;
Step S204, determining a target phrase among the multiple phrases, wherein the target phrase is a phrase that appears in the same piece of information to be processed and whose number of occurrences within a predetermined time period exceeds a preset count threshold;
Step S206, using a classification model to determine the target category corresponding to the target information to be processed, which is the information to be processed that contains the target phrase, wherein the different categories, including the target category, correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event;
Step S208, when the weight corresponding to the target category exceeds a preset weight threshold, marking the target phrase contained in the target information to be processed as an object event.
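Steps S202 to S208 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the claimed implementation: messages are assumed to arrive already split into phrases, each phrase is counted once per message, and the `classify` callable and the `category_weights` table are hypothetical stand-ins for the trained classification model and its per-category weights.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_target_phrases(messages, now, window, min_count):
    """S204: phrases whose occurrences inside the time window exceed min_count.

    messages is a list of (timestamp, phrases) pairs, where phrases is the
    result of splitting one content sentence (S202)."""
    counts = Counter()
    for timestamp, phrases in messages:
        if now - window <= timestamp <= now:
            counts.update(set(phrases))  # count each phrase once per message
    return {phrase for phrase, count in counts.items() if count > min_count}


def mark_object_events(messages, targets, classify, category_weights, weight_threshold):
    """S206 and S208: classify each message that contains a target phrase and
    mark its target phrases when the category weight exceeds the threshold."""
    marked = set()
    for _, phrases in messages:
        hits = targets.intersection(phrases)
        if not hits:
            continue
        category = classify(phrases)  # hypothetical classifier callable
        if category_weights.get(category, 0.0) > weight_threshold:
            marked |= hits
    return marked
```

Passing the classifier in as a callable keeps the sketch independent of any particular model, so an SVM or any other classifier could be substituted.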
Optionally, the above data processing method is not limited to the scenario of collecting hotspot work orders. It may also be applied, without limitation, to any other scenario in which text messages need to be identified and marked, such as shopping, gaming, instant messaging, and financial application scenarios.
Optionally, the information to be processed may be, but is not limited to, short text information. The content sentence contained in the information to be processed may include, but is not limited to, punctuation marks, emoticons, modal particles, nouns, verbs, and adjectives. After the content sentence is segmented, characters and symbols other than Chinese characters may also be removed. The content sentence carried in each piece of information to be processed may be split into one phrase or multiple phrases.
Optionally, the target category may include, but is not limited to, consulting (for example, a game ranking error, or an installation package that cannot be updated), payment transactions (for example, a lost credit card, a wallet that cannot be opened, or a red packet that cannot be opened), fraud complaints, mischief (for example, malicious screen flooding or keyword spamming), and temporary intermittent hot words (for example, solar term notices or holiday reminders). Different target categories correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event.
For example, suppose two pieces of target information to be processed, A and B, have been determined. The target category corresponding to A is "consulting on how to update the game version", that is, multiple messages all ask how to update the same game version. In this case, the weight of this consulting category is determined to be higher than the preset weight threshold, so the target phrase corresponding to A is determined to be an object event, and the object event is pushed to the staff who handle consulting information, because this sudden hotspot event needs urgent handling. The target category corresponding to B is "Summer Solstice solar term": a large number of users post their feelings about the solar term on the day itself, which is not a sudden hotspot event. The weight of this category is lower than the preset weight threshold, so the target phrase corresponding to B is not marked as an object event and is not pushed to staff for handling.
In an optional embodiment, using the classification model to determine the target category corresponding to the target information to be processed, which is the information to be processed that contains the target phrase, may be implemented through the following steps:
S1, inputting the target information to be processed into the classification model, wherein the target information to be processed contains multiple target phrases, and the classification model is obtained by training an initial classification model using the phrases contained in the information to be processed as training samples;
S2, outputting the target category corresponding to the target phrase.
Optionally, the classification model in the embodiments of the present invention may be, but is not limited to, a support vector machine (SVM) classification model. The phrases in the initially collected information to be processed are input into the initial classification model to train it, yielding a classification model that can automatically classify information to be processed.
In an optional embodiment, before the classification model is used to determine the target category corresponding to the target information to be processed, the initial classification model may be trained through the following step:
training the initial classification model using first target information to be processed, whose category has already been determined, as training samples, wherein the first target information to be processed contains phrases marked as object events and phrases not marked as object events.
In an optional embodiment, training the initial classification model using the first target information to be processed, whose category has already been determined, as training samples may be implemented through the following steps:
S1, dividing the first target information to be processed, whose category has already been determined, into a training data set, a validation data set, and a test data set, wherein the training data set and the validation data set are used to train the initial classification model, and the test data set is used to test the trained classification model;
S2, segmenting the content sentences contained in the training data set and the validation data set into initial training phrases, and taking the initial training phrases whose frequency of occurrence exceeds a preset threshold as initial training samples, wherein the vector dimension of the initial training samples is the number of initial training samples;
S3, computing the semantic vector representation of the initial training samples using a vector representation algorithm;
S4, inputting the vector dimension of the initial training samples and the semantic vector representation of the initial training samples into the initial classification model for training, to obtain the classification model;
S5, testing the training result of the classification model using the test data set, and adjusting the model parameters of the classification model.
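Step S1 above can be illustrated with a short Python sketch of a three-way split. The 70/15/15 proportions, the fixed random seed, and the function name `split_dataset` are assumptions chosen for illustration; the text does not specify how the labeled information is divided.

```python
import random


def split_dataset(samples, train_pct=70, val_pct=15, seed=0):
    """Shuffle labeled samples and split them into training, validation,
    and test sets; the remainder after train and validation goes to test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

Integer percentages are used so that the split sizes do not depend on floating-point rounding.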
Optionally, the first target information to be processed, whose category has already been determined, may be short text information that has been classified manually. It may include, but is not limited to, short text information about some sudden hotspot events, some persistent hotspot events, some non-hotspot events, and some mischief. The training data set, the validation data set, and the test data set may each include the aforementioned kinds of short text information, which is not limited in the embodiments of the present invention.
The initial training phrases can be obtained by segmenting the content sentences with a word segmentation algorithm; other word cutting algorithms or tools may of course be used as well. After segmentation, the non-Chinese parts of the phrases (punctuation marks, special characters, digits, English, and so on) can be filtered out using methods such as regular expressions, and stop words are then removed from the segmentation result. The stop words here may include, but are not limited to, interjections, modal particles, and pronouns. For example, if the content sentence contained in the information to be processed is "!New version game installation package! Why can't it update!!! Why", the sentence may finally be segmented into the phrases "new version, game, installation package, update" or "new version, game, installation package, cannot, update". The segmentation rules for phrases can be configured according to the actual application scenario of the model, which is not limited in the embodiments of the present invention.
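The filtering described above can be sketched in Python on a sentence that has already been segmented. The token list in the usage note is a hypothetical Chinese rendering of the example sentence, the stop-word set is illustrative only, and the regular expression covers just the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so rarer characters in extension blocks would be dropped.

```python
import re

# Anything outside the basic CJK Unified Ideographs block is removed.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")

# Hypothetical stop-word list (interjections, modal particles, pronouns).
STOP_WORDS = {"为什么", "啊", "吗", "我", "你"}


def clean_phrases(phrases, stop_words=STOP_WORDS):
    """Strip non-Chinese characters from each phrase, then drop empty
    results and stop words."""
    cleaned = []
    for phrase in phrases:
        text = NON_CHINESE.sub("", phrase)
        if text and text not in stop_words:
            cleaned.append(text)
    return cleaned
```

With the assumed tokens `["!", "新版", "游戏", "安装包", "!", "为什么", "不能", "更新", "!!!", "为什么"]`, the function returns `["新版", "游戏", "安装包", "不能", "更新"]`, which corresponds to the second segmentation given in the example.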
Optionally, the initial training phrases whose frequency of occurrence exceeds a preset threshold are taken as initial training samples. The preset threshold here may be 0 or any integer greater than 0. When the preset threshold is 0, all initial training phrases obtained after segmentation are taken as initial training samples.
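The frequency threshold can be illustrated with a small Python sketch. The function names are hypothetical; the sketch assumes that the retained phrases form a vocabulary and that each sentence is then represented by a count vector whose dimension equals the vocabulary size.

```python
from collections import Counter


def build_vocabulary(segmented_sentences, min_count=0):
    """Keep phrases that occur more than min_count times across all
    sentences; min_count=0 keeps every phrase."""
    counts = Counter(p for sentence in segmented_sentences for p in sentence)
    return sorted(p for p, c in counts.items() if c > min_count)


def to_count_vector(sentence, vocabulary):
    """Represent one segmented sentence as a vector of phrase counts whose
    dimension equals the vocabulary size."""
    counts = Counter(sentence)
    return [counts[p] for p in vocabulary]
```

Raising `min_count` shrinks the vocabulary and therefore the vector dimension, at the cost of ignoring rare phrases.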
Optionally, the semantic vector representation of the samples can be computed using term frequency-inverse document frequency (TF-IDF) as the vector representation algorithm; other vector representation algorithms may also be used, which is not limited in the embodiments of the present invention.
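A minimal TF-IDF sketch is given below for orientation. It uses one common smoothed variant, idf = ln((1 + N) / (1 + df)) + 1, which is an assumption; the embodiment does not state which TF-IDF variant is used. The function name `tfidf_vectors` and the sample data are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents, vocabulary):
    """Return one TF-IDF vector per document, with one dimension per vocabulary term."""
    n_docs = len(documents)
    vocab_set = set(vocabulary)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc) & vocab_set)  # count each term once per document

    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vector = []
        for term in vocabulary:
            tf = counts[term] / total
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            vector.append(tf * idf)
        vectors.append(vector)
    return vectors


vocab = ["更新", "游戏"]
vectors = tfidf_vectors([["更新", "更新", "游戏"], ["游戏"]], vocab)
print(vectors)
```

The vector dimension equals the size of the vocabulary, which is consistent with the description that the dimension is the number of initial training samples.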
Optionally, the vector dimension of the initial training samples and the semantic vector representation of the initial training samples are input into the initial classification model for training. For example, when the number of initial training samples is 100, the vector dimension 100 and the semantic vector representation of the initial training samples are input into an SVM classification model for training.
Optionally, the training result of the classification model can be tested on the test data set using the K-fold cross-validation algorithm. For example, 100 pieces of first target to-be-processed information are divided into k groups of data, of which 2 groups form the test data set and the other k-2 groups form the training data set and/or the validation data set. After the classification model is trained on the k-2 groups, the 2 test groups can be used to test the training result. The model parameters of the classification model are adjusted and optimized according to the test results; the closer the test results are to the true results, the more stable the model parameters and the more reliable the model.
In an optional embodiment, after the initial classification model is trained using the first target to-be-processed information whose category has been determined as training samples, the above method further includes:
S1, segmenting the target content sentence in the target to-be-processed information into multiple target training phrases, wherein the target training phrases contain only Chinese characters and no stop words, and the stop words include at least interjections and/or pronouns and/or modal particles;
S2, determining the target training phrases whose frequency of occurrence exceeds a preset threshold as a bag of words;
S3, merging the bag of words with the current training samples of the classification model to form target training samples;
S4, training the classification model using the target training samples, and adjusting the model parameters of the classification model.
Optionally, before the target to-be-processed information is input into the classification model for classification, it can also be input into the classification model for training. By using the high-frequency phrases in the target to-be-processed information as a training bag of words, the phrases contained in the target to-be-processed information can be updated into the training of the classification model in real time, which avoids the situation where the target to-be-processed information cannot be recognized when it is input into the classification model for classification.
Optionally, the target training phrases can be segmented in the same way as the initial training phrases described above. For example, if the content sentence contained in the target to-be-processed information is "! New version ocr software installation package! Why can't it update!!! Why", the sentence may finally be segmented into the phrases "new version, ocr software, installation package, update" or "new version, installation package, cannot, update, ocr software". The segmentation rules can be configured according to the actual application scenario of the model, which is not limited in the embodiments of the present invention.
Optionally, the target training phrases whose frequency of occurrence exceeds the preset threshold are used as a training bag of words, and the training bag of words is then merged with the current training samples. This is equivalent to forming new training samples, so that training is performed on samples that contain the target training phrases. After training, the model parameters of the classification model are again adjusted and optimized using the test data set; the closer the test results are to the true results, the more stable the model parameters and the more reliable the model.
In an optional embodiment, after the initial classification model is trained using the first target to-be-processed information whose category has been determined as training samples, the training samples of the classification model can also be updated periodically, which mainly includes the following steps:
S1, obtaining the second target to-be-processed information determined in the period from the end of the last model training to the current time, wherein the second target to-be-processed information contains phrases whose number of occurrences within a predetermined time period exceeds a preset count threshold;
S2, merging the phrases contained in the second target to-be-processed information into the current training samples of the classification model.
Optionally, taking the training of a classification model for customer service work orders as an example, updating the work order classification model may include automatic updating and manual updating.
Automatic updating may include the following steps:
S1, checking several indicators, such as whether the number of work orders in the supplementary-entry work order library exceeds a preset threshold and whether the time elapsed since the last model training exceeds a preset time threshold; if any one of them is met, the model starts updating automatically;
S2, retrieving samples from the supplementary-entry work order library and merging them into the original training samples;
S3, segmenting the processed sentences with a word segmentation algorithm (such as jieba); filtering out the non-Chinese-character parts of the segmented words (punctuation marks, special characters, digits, English, etc.) with methods such as regular expressions; removing stop words (interjections, modal particles, pronouns, etc.) from the segmentation result; and converting the bag of words into an index list;
S4, retraining the work order classification model.
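The trigger check in step S1 of the automatic update can be sketched as below, assuming only the two indicators named above are used. The function name, the parameter names, and the threshold values are hypothetical.

```python
from datetime import datetime, timedelta


def should_auto_update(backlog_count, last_trained, now, count_threshold, max_age):
    """Start an update if either indicator is met: too many newly entered
    work orders, or too much time since the last training run."""
    too_many_orders = backlog_count > count_threshold
    too_old = (now - last_trained) > max_age
    return too_many_orders or too_old


now = datetime(2019, 6, 2)
print(should_auto_update(120, datetime(2019, 6, 1), now, 100, timedelta(days=7)))
print(should_auto_update(50, datetime(2019, 6, 1), now, 100, timedelta(days=7)))
```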
Manual updating may include the following steps:
S1, checking the population stability index (PSI) of the model, and issuing an alarm if it exceeds a threshold;
S2, manually redoing feature engineering on the corpus, re-examining the corpus, training a new model, and manually tuning the model parameters until the model is stable.
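For reference, PSI is commonly computed as the sum over categories of (actual share - expected share) * ln(actual share / expected share). The sketch below follows that common definition; the function name, the small floor used for empty categories, and the sample counts are assumptions, and the alarm threshold is left to the caller.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI between a reference category distribution and a current one."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions need the same categories")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        # Floor the shares so that empty categories do not cause log(0).
        e_share = max(expected / expected_total, eps)
        a_share = max(actual / actual_total, eps)
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi


print(population_stability_index([50, 50], [50, 50]))
print(population_stability_index([50, 50], [80, 20]))
```

A PSI of 0 means the predicted category distribution has not shifted; larger values indicate a larger shift.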
In an optional embodiment, determining the target phrase among the multiple phrases in step S204 can be implemented through the following steps:
S1, determining phrases whose number of occurrences in the content sentences exceeds a preset threshold as first phrases, wherein the first phrases contain only Chinese characters;
S2, discarding first phrases whose share today is less than a first preset share threshold and/or whose word frequency today is less than a first preset word frequency threshold and/or whose word frequency growth rate today is less than a first preset growth rate threshold, to obtain second phrases, wherein the word frequency growth rate today is the growth rate relative to the word frequency of the previous day;
S3, clustering the second phrases to obtain first phrase clusters;
S4, discarding first phrase clusters whose share today is less than a second preset share threshold and/or whose word frequency today is less than a second preset word frequency threshold and/or whose word frequency growth rate today is less than a second preset growth rate threshold, to obtain second phrase clusters;
S5, determining the phrases in the second phrase clusters as target phrases.
Optionally, frequent-pattern mining is first performed on the segmented phrases. The FP-GROWTH frequent mining algorithm can be used to obtain the frequent phrases, that is, the first phrases. FP-GROWTH frequent itemset mining is a kind of association rule mining algorithm, which obtains frequent itemsets by constraining the confidence, support, and lift between associated items. The first phrases then undergo two rounds of filtering and one round of clustering.
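To make the notion of a frequent phrase set concrete, the sketch below counts the support of single phrases and phrase pairs by brute force. It is not an FP-GROWTH implementation (FP-GROWTH builds a prefix tree to avoid enumerating candidates), and it uses support only; the function name, the size limit, and the sample work orders are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_phrase_sets(transactions, min_support, max_size=2):
    """Return {phrase tuple: support} for phrase sets of up to max_size phrases
    whose support (fraction of transactions containing them) is high enough."""
    n = len(transactions)
    counts = Counter()
    for phrases in transactions:
        unique = sorted(set(phrases))  # each set is counted once per transaction
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return {items: c / n for items, c in counts.items() if c / n >= min_support}


orders = [["更新", "版本"], ["更新", "版本", "下载"], ["更新"], ["登录"]]
frequent = frequent_phrase_sets(orders, min_support=0.5)
print(frequent)
```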
The first filtering includes filtering with the logarithmic curve model and parameters configured in the rule engine: a first layer of dynamic filtering is applied to the frequent phrases to filter out those that do not meet the conditions, finally yielding the second phrases. Specifically, first phrases whose share today is less than the first preset share threshold and/or whose word frequency today is less than the first preset word frequency threshold and/or whose word frequency growth rate today is less than the first preset growth rate threshold may be discarded to obtain the second phrases. Optionally, the word frequency growth rate today is the growth rate relative to the word frequency of the previous day.
Optionally, frequent phrases can be determined by checking the word frequency of a phrase on the current day. Even if the word frequency of a phrase meets the preset word frequency threshold, it may still be a commonly used phrase rather than a sudden hot phrase. In that case the word frequency growth rate, that is, the growth rate of the phrase's word frequency today relative to the previous day, can also be considered. When the growth rate also meets the preset rule, the phrase can be further determined to be a frequent phrase. If only the growth rate is examined, the word frequency of the previous day may be a very small base, or even 0, so that a few occurrences today produce a very high growth rate and the data become inaccurate. Therefore, whether the word frequency today and/or the share today meet the preset conditions is considered at the same time, and the final frequent phrases, that is, the second phrases, are determined after comprehensive consideration.
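The combined check can be sketched as below. The sketch assumes the strictest reading of "and/or", in which a phrase is discarded as soon as any one of the three indicators falls below its threshold; the function names, the thresholds, and the total of 779 work orders are hypothetical. Treating a previous-day count of 0 as infinite growth is likewise an assumption, and it is exactly the small-base case that the other two conditions guard against.

```python
def growth_rate(today, yesterday):
    """Relative growth of today's word frequency over the previous day's."""
    if yesterday == 0:
        return float("inf") if today > 0 else 0.0
    return (today - yesterday) / yesterday


def keep_phrase(today, yesterday, total_today, min_share, min_count, min_growth):
    """Keep a phrase only if share, word frequency and growth rate all pass."""
    share = today / total_today if total_today else 0.0
    return (
        share >= min_share
        and today >= min_count
        and growth_rate(today, yesterday) >= min_growth
    )


# 24 occurrences today against 1 the day before gives 23.0, i.e. 2300%.
print(growth_rate(24, 1))
# A phrase seen twice today and never yesterday has unbounded growth,
# but is still discarded because its count and share are too small.
print(keep_phrase(2, 0, 779, min_share=0.01, min_count=5, min_growth=1.0))
```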
Optionally, the share today of the current first phrase can be obtained using the following formula:
P1 = exp{(log p/m)/log n}, where p denotes the share of the current first phrase on the previous day, and m and n are constants;
By comparing the share today of each first phrase, the first phrases with the lowest share today are determined and discarded.
Optionally, the word frequency of a phrase can also be judged using the n-sigma rule. The fluctuation coefficient of the first phrase in the current time period is obtained by the following formula:
x' = (x - μ) / σ
where x' denotes the fluctuation coefficient, x denotes the word frequency of the first phrase in the current time period, μ denotes the mean word frequency of the first phrase in the same time period of the previous day, and σ denotes the standard deviation of the word frequency of the first phrase in the same time period of the previous day;
When the fluctuation coefficient x' is less than a preset fluctuation value, the first phrase is discarded.
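A minimal sketch of this check follows, assuming the fluctuation coefficient takes the usual n-sigma form x' = (x - μ) / σ and that μ and σ are computed from the previous day's word frequencies sampled within the same time period. The function name, the use of the population standard deviation, the handling of σ = 0, and the sample counts are assumptions.

```python
import statistics


def fluctuation_coefficient(current_count, previous_counts):
    """Number of standard deviations by which the current word frequency
    exceeds the previous day's mean for the same time period."""
    mu = statistics.mean(previous_counts)
    sigma = statistics.pstdev(previous_counts)
    if sigma == 0:
        # Flat history: any increase is treated as an unbounded fluctuation.
        return float("inf") if current_count > mu else 0.0
    return (current_count - mu) / sigma


yesterday_counts = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
coefficient = fluctuation_coefficient(15, yesterday_counts)
print(coefficient)          # 5.0
print(coefficient >= 5)     # kept if the preset fluctuation value is 5
```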
After the first filtering described above, the frequent phrases that meet the requirements, that is, the second phrases, are obtained.
Optionally, the filtered frequent phrases can be clustered with the DBSCAN clustering algorithm: all work orders containing frequent phrases are taken out, bag-of-words vectors are used as the vector representation of the work order content, and the content vectors are clustered with the DBSCAN algorithm to obtain frequent phrase clusters, that is, the first phrase clusters.
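The clustering step can be illustrated with a compact, unoptimized DBSCAN over bag-of-words count vectors. The Euclidean distance, the parameter values, the helper names, and the sample work orders are assumptions; a production system would more likely rely on an existing library implementation. `math.dist` requires Python 3.8 or later.

```python
import math


def bag_of_words(tokens, vocabulary):
    """Count vector of a work order over a fixed vocabulary."""
    return [tokens.count(term) for term in vocabulary]


def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # Includes the point itself, so min_samples counts it as well.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = noise
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster_id  # border point joins, but is not expanded
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


vocab = ["更新", "版本", "登录"]
orders = [["更新", "版本"], ["更新", "版本"], ["更新"], ["登录", "登录", "登录"]]
vectors = [bag_of_words(order, vocab) for order in orders]
print(dbscan(vectors, eps=1.0, min_samples=2))  # [0, 0, 0, -1]
```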
Optionally, the rule engine can be used to apply rule-based filtering to the frequent phrase clusters obtained by clustering, to obtain the frequent phrase clusters that finally meet the conditions; that is, a second filtering is performed on the first phrase clusters to obtain the second phrase clusters. The phrases in the resulting second phrase clusters are the target phrases, and the to-be-processed information containing the target phrases is the target to-be-processed information.
The scheme provided in the embodiments of the present invention can be used in any scenario of mining sudden events from short text. Short text messages are monitored, and once a hot/sudden event is found, a sudden-hot-spot reminder can be sent by mail, mini program, or instant messaging group. The reminder indicates the hot work order type, shows the title in the form of a co-occurrence phrase cluster, and combines text, charts, and data to alert staff to the sudden/hot event.
Fig. 3 is a schematic diagram of an optional target event alarm interface according to an embodiment of the present invention. As shown in Fig. 3, the PC alarm page shows how the number of work orders in a cluster changes and presents, with data, the comparison against the same period, so that staff can judge the urgency of the cluster's work orders from the alarm and handle them in time. The alarm details page shows the detailed hot-spot results mined by the algorithm and the time series of the hot phrases. On the optional push interface shown in Fig. 3, the time period of the hot word screening, the product name, and the hot word content "update, version download, update" can be seen at a glance. The trend of the word frequency over time on 2019-6-2 is shown by the solid line in the figure; the highest word frequency, 41, appears at around 12 o'clock. The word frequency of the previous day, 2019-6-1, is shown by the dashed line in the figure; its highest word frequency, 6, also appears at around 12 o'clock. The interface also automatically computes the average word frequency for 2019-6-2. These curves make it possible to confirm intuitively whether the current hot word is a hot/sudden event. Meanwhile, the specific work order information can be shown at the bottom of the interface, for example a work order text with the content "update version, what to do" submitted by user XXX1 at 8:58 on 2019-6-2 for the product with product code a1, which helps staff grasp the specific request of the work order in time.
Fig. 4 is a schematic diagram of another optional target event alarm interface according to an embodiment of the present invention. As shown in Fig. 4, alarms can be pushed in real time through a mini program on a PC or mobile terminal. On this optional push interface, the pushed content may include the hot word "update, version download, update" and the work order category "consultation", together with the product name, the push time, the time period over which work orders containing the hot word are counted (for example 8:00-12:00), the number of work order feedbacks for the hot word in the current time period (24), the growth rate compared with the same period yesterday (2300%), the share of the current hot word among all work orders (3.08%), and the work order content. As shown in Fig. 4, the work order content includes "update version, what to do", "how to update to the latest version", and so on.
Fig. 5 is a schematic diagram of another optional target event alarm interface according to an embodiment of the present invention. As shown in Fig. 5, alarms can be pushed in real time through an instant messaging group on a PC or mobile terminal. On this optional push interface, for example in an instant messaging group named "[product name] user feedback early warning", the user feedback early warning push is received. The pushed content includes, but is not limited to, the hot word "update, version download, update", the product name, the push time, the time period over which work orders containing the hot word are counted (for example 8:00-12:00), the number of work order feedbacks for the hot word in the current time period (24), the growth rate compared with the same period yesterday (2300%), the share of the current hot word among all work orders (3.08%), and the work order content. As shown in Fig. 5, the work order content may include "why am I still asked to update when logging in after downloading the update".
The alarm push interfaces provided in Fig. 3 to Fig. 5 of the embodiments of the present invention enable timely, proactive, real-time tracking and pushing of hot/sudden events. The feedback count, the growth rate against the same period, and the hot word share intuitively present the word frequency of the current hot word in the current time period, its growth rate relative to the previous day, and its share among today's phrases, making it easy to confirm whether it is a hot/sudden event. The mood of the sender at the time can also be recognized from the work order content, so that staff can learn about hot/sudden events anytime and anywhere and respond proactively.
Optionally, hot words are mined from the work order content using the FP-GROWTH model and then clustered, and the category of the clustered work orders is determined by a machine learning model; this can be done through the flow shown in Fig. 6. Fig. 6 is an optional flowchart of a work order category identification method according to an embodiment of the present application. As shown in Fig. 6, the method includes the following steps:
S601, obtaining asynchronous work orders and extracting the work order content;
S602, extracting the problem descriptions from the work orders of the current day's time period and preprocessing them; the preprocessing includes segmenting the coarsely processed problem descriptions (coarse processing includes filtering special characters such as emoticons with regular expressions) with a word segmentation tool, and then filtering out the non-Chinese-character parts of the words (punctuation marks, digits, etc.) with methods such as regular expressions;
S603, mining frequent phrases with the FP-GROWTH frequent mining algorithm; FP-GROWTH frequent itemset mining is a kind of association rule mining algorithm, which obtains frequent itemsets by constraining the confidence, support, and lift between associated items;
S604, filtering with the logarithmic curve model and parameters configured in the rule engine: a first layer of dynamic filtering is applied to the frequent phrases to filter out those that do not meet the conditions;
S605, clustering the filtered frequent phrases with the DBSCAN clustering algorithm: all work orders containing frequent phrases are taken out, bag-of-words vectors are used as the vector representation of the work order content, and the content vectors are clustered with the DBSCAN algorithm to obtain frequent phrase clusters;
S606, applying rule-based filtering to the clustered frequent phrase clusters with the rule engine, to obtain the frequent phrase clusters that finally meet the conditions;
S607, predicting the category of the work orders in each frequent phrase cluster with the SVM work order type classification model;
S608, attaching a category label to the frequent phrase clusters that meet the threshold condition.
Optionally, the training process of the SVM classification model can be implemented through the following steps. Fig. 7 is an optional flowchart of an SVM classification model training method according to an embodiment of the present invention. As shown in Fig. 7, the method includes the following steps:
S701, labeling the work orders to determine the work order categories;
S702, segmenting the filtered and clustered work orders with jieba, and filtering out the non-Chinese-character parts of the segmented words (punctuation marks, special characters, digits, English, etc.) with methods such as regular expressions;
S703, removing stop words (interjections, modal particles, pronouns, etc.) from the segmentation result, screening candidate words by a word frequency condition, building the bag of words, and converting the bag of words into an index list;
S704, dividing the data into a training data set, a validation data set, and a test data set using historically labeled work orders;
S705, training the SVM work order classification model with K-fold cross-validation and optimizing the model parameters;
S7051, segmenting all work orders, filtering out stop words and single-character words, and selecting a subset of words by word frequency as the dimensions of the vector that represents the text;
S7052, computing the TF-IDF vector of each work order as the vector representation of the work order;
S7053, training the SVM work order classification model with K-fold cross-validation and tuning its parameters;
S706, testing the model on the test data set and evaluating its predictive ability with model evaluation metrics such as F1 and the KS value; the larger F1 and KS are, the more accurate the model. The model parameters are tuned continuously to strengthen the predictive and generalization ability of the model;
S707, outputting the trained classification model.
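The two evaluation metrics named in step S706 can be sketched as follows for the binary case (one category against the rest). Reducing the multi-category work order problem to a binary one, the function names, and the sample labels and scores are assumptions made for the illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ks_statistic(y_true, scores, positive=1):
    """Largest gap between true-positive rate and false-positive rate
    over all score thresholds."""
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best = 0.0
    for threshold in sorted(set(scores)):
        tpr = sum(1 for t, s in zip(y_true, scores) if t == positive and s >= threshold) / n_pos
        fpr = sum(1 for t, s in zip(y_true, scores) if t != positive and s >= threshold) / n_neg
        best = max(best, abs(tpr - fpr))
    return best


labels = [1, 1, 0, 0]
print(f1_score(labels, [1, 0, 0, 0]))
print(ks_statistic(labels, [0.9, 0.8, 0.2, 0.1]))
```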
The scheme realizes automatic discovery and statistical recording of hot work order events; uses the n-sigma parameter to assist in accurately identifying hot spots; and integrates hot-spot discovery with hot event type identification, so that current hot work order events can be found and their types reported automatically without human intervention, greatly improving customer service efficiency. Model stability parameters such as PSI are used to continuously check the category distribution of the work orders predicted by the model and to monitor whether the predictive performance of the model is stable.
The scheme provided through the embodiment of the present invention realizes following technical effect:
The embodiments of the present invention use a bag-of-words model; the bag of words is computed in real time from the current corpus, so there is no case in which new words cannot be used.
The embodiments of the present invention mine co-occurrence phrases with the FP-GROWTH model, which filters out low-frequency phrases, and use the co-occurrence phrases as the display form. Work orders represented by bag-of-words vectors are clustered to form event clusters of similar work orders that share high-frequency co-occurrence phrases. The mined event clusters are evaluated by the rule engine, and event clusters can be compared along both the horizontal and the vertical dimension using the co-occurrence phrases; not only the magnitude of an event cluster but also indicators such as growth rate are examined, so that both hot events and sudden events are mined.
The embodiments of the present invention realize intelligent identification of work order categories through a machine learning model, solving the pain point in the prior art that hot spot display requires manual selection.
The embodiments of the present invention close the loop of hot work order discovery and early warning on the technical side; both the mining of frequent phrases and the category labeling of work orders can be completed automatically by the program.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention.
According to another aspect of the embodiments of the present application, a target event labeling apparatus for implementing the above target event labeling method is also provided. Fig. 8 is an optional structural block diagram of the target event labeling apparatus according to an embodiment of the present invention. As shown in Fig. 8, the apparatus includes:
an obtaining module 802, configured to obtain the content sentence carried in the to-be-processed information, wherein the content sentence is segmented into one or more phrases;
a first determining module 804, configured to determine a target phrase among the multiple phrases, wherein the target phrase is a phrase that appears in the same to-be-processed information and whose number of occurrences within a predetermined time period exceeds a preset count threshold;
a second determining module 806, configured to use the classification model to determine the target category corresponding to the target to-be-processed information, that is, the to-be-processed information that contains the target phrase, wherein different categories including the target category correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is a target event;
a marking module 808, configured to mark the target phrase contained in the target to-be-processed information as a target event when the weight corresponding to the target category exceeds a preset weight threshold.
Optionally, the second determining module includes: an input unit, configured to input the target to-be-processed information into the classification model, wherein the target to-be-processed information contains one or more target phrases, and the classification model is obtained by training the initial classification model using the phrases contained in to-be-processed information as training samples; and an output unit, configured to output the target category corresponding to the target phrase.
Optionally, the apparatus further includes: a training module, configured to train the initial classification model using the first target to-be-processed information whose category has been determined as training samples, wherein the first target to-be-processed information contains phrases marked as target events and phrases not marked as target events.
Optionally, the training module includes: a dividing unit, configured to divide the first target to-be-processed information whose category has been determined into a training data set, a validation data set, and a test data set, wherein the training data set and the validation data set are used to train the classification model, and the test data set is used to test the trained classification model; a first segmentation unit, configured to segment the content sentences contained in the training data set and the validation data set into initial training phrases, and to use the initial training phrases whose frequency of occurrence exceeds a preset threshold as the initial training samples, wherein the vector dimension of the initial training samples is the number of initial training samples; a computing unit, configured to compute the semantic vector representation of the initial training samples with a vector representation algorithm; a first training unit, configured to input the vector dimension and the semantic vector representation of the initial training samples into the initial classification model for training to obtain the classification model; and a test unit, configured to test the training result of the classification model with the test data set and adjust the model parameters of the classification model.
Optionally, the training module further includes: a second segmentation unit, configured to segment the target content sentence in the target to-be-processed information into multiple target training phrases, wherein the target training phrases contain only Chinese characters and no stop words, and the stop words include at least interjections and/or pronouns and/or modal particles; a determining unit, configured to determine the target training phrases whose frequency of occurrence exceeds a preset threshold as a bag of words; a first merging unit, configured to merge the bag of words with the current training samples of the classification model to form target training samples; and a second training unit, configured to train the classification model using the target training samples and adjust the model parameters of the classification model.
Optionally, the training module further includes: a first obtaining unit, configured to obtain the second target to-be-processed information determined in the period from the end of the last model training to the current time, wherein the second target to-be-processed information contains phrases whose number of occurrences within a predetermined time period exceeds a preset count threshold; and a second merging unit, configured to merge the phrases contained in the second target to-be-processed information into the current training samples of the classification model.
Optionally, the first determining module includes: a first determining unit, configured to determine phrases that appear in the same content sentence and whose number of occurrences in the content sentences of multiple pieces of to-be-processed information exceeds a preset threshold as first phrases, wherein the first phrases contain only Chinese characters; a first discarding unit, configured to discard first phrases whose share today is less than a first preset share threshold and/or whose word frequency today is less than a first preset word frequency threshold and/or whose word frequency growth rate today is less than a first preset growth rate threshold, to obtain second phrases, wherein the word frequency growth rate today is the growth rate relative to the word frequency of the previous day; a clustering unit, configured to cluster the second phrases to obtain first phrase clusters; a second discarding unit, configured to discard first phrase clusters whose share today is less than a second preset share threshold and/or whose word frequency today is less than a second preset word frequency threshold and/or whose word frequency growth rate today is less than a second preset growth rate threshold, to obtain second phrase clusters; and a second determining unit, configured to determine the phrases in the second phrase clusters as target phrases.
Optionally, the first discarding unit includes: an obtaining subunit, configured to obtain the share today of the current first phrase using the following formula: P1 = exp{(log p/m)/log n}, where p denotes the share of the current first phrase on the previous day, and m and n are constants; a determining subunit, configured to determine the first phrases with the lowest share today by comparing the share today of each first phrase; and a discarding subunit, configured to discard the first phrases with the lowest share today.
Optionally, the first determining module further includes:
a second obtaining unit, configured to obtain the fluctuation coefficient of the first phrase in the current time period by the following formula:
x' = (x - μ) / σ
where x' denotes the fluctuation coefficient, x denotes the word frequency of the first phrase in the current time period, μ denotes the mean word frequency of the first phrase in the same time period of the previous day, and σ denotes the standard deviation of the word frequency of the first phrase in the same time period of the previous day;
a third discarding unit, configured to discard the first phrase when the fluctuation coefficient is less than a preset fluctuation value.
Fig. 9 is a schematic diagram of a fluctuation coefficient display interface according to an embodiment of
the present invention. As shown in Fig. 9, the horizontal axis represents the time period and the vertical
axis represents the word frequency of the target phrase; the dotted line represents yesterday's (2019-6-1)
word frequency and the solid line represents today's (2019-6-2) word frequency. As can be seen from Fig. 9,
today's highest word frequency is 69, and the above formula gives a fluctuation coefficient of 51 for the
10:00-14:00 time period. If the preset fluctuation value is 5, today's fluctuation coefficient of the
target phrase exceeds the preset fluctuation value, and the phrase can be retained.
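The fluctuation check can be illustrated with a short Python sketch. It assumes that the fluctuation coefficient is the standard score x' = (x - μ) / σ implied by the definitions of x, μ and σ above, and that σ is the population standard deviation; both are assumptions, and the function names and sample counts are hypothetical. The sketch does not attempt to reproduce the values read from Fig. 9.

```python
from statistics import mean, pstdev
from typing import Sequence


def fluctuation_coefficient(current_count: float,
                            previous_day_counts: Sequence[float]) -> float:
    """Standard score of the current count against the previous day's counts."""
    mu = mean(previous_day_counts)
    sigma = pstdev(previous_day_counts)  # population standard deviation (assumed)
    if sigma == 0:
        # The previous day was flat, so the ratio is undefined; treat any
        # increase as an unbounded fluctuation and anything else as none.
        return float("inf") if current_count > mu else 0.0
    return (current_count - mu) / sigma


def should_keep(current_count: float,
                previous_day_counts: Sequence[float],
                preset_fluctuation: float = 5.0) -> bool:
    """A phrase is discarded when its coefficient is below the preset value."""
    return fluctuation_coefficient(current_count, previous_day_counts) >= preset_fluctuation


yesterday = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical counts, mean 5, deviation 2
print(fluctuation_coefficient(69, yesterday))  # 32.0
print(should_keep(69, yesterday))              # True
print(should_keep(6, yesterday))               # False
```

How a flat previous day (σ = 0) should be handled is not stated in the description; the branch above is one possible choice.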
Fig. 10 is a schematic diagram of an optional early-warning situation for a certain month according to an
embodiment of the present invention. As shown in Fig. 10, the labeling method for target events provided
by the embodiment of the present invention is applied to burst hot-spot alerting on customer service work
orders (short texts) for products such as electronic payment, PC games, mobile games, and video playback,
and the calculated overall early-warning accuracy reaches 81%. Compared with the original manual handling
of work orders, early warning is achieved where none existed before. The effect is especially pronounced
for products whose service teams are understaffed, and it greatly relieves front-line staff of the
burden of discovering and tallying issues.
According to another aspect of the embodiments of the present invention, an electronic device for
implementing the above labeling method for target events is further provided. The electronic device may
be, but is not limited to being, applied in the server 112 shown in Fig. 1. As shown in Fig. 11, the
electronic device includes a memory 902 and a processor 904; a computer program is stored in the memory
902, and the processor 904 is configured to execute, by means of the computer program, the steps in any of
the above method embodiments.
Optionally, in this embodiment, the electronic device may be located in at least one of a plurality of
network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of
the computer program:
Step S1: obtain the content sentence carried in the information to be processed, wherein the content
sentence is split into multiple phrases;
Step S2: determine a target phrase among the multiple phrases, wherein the target phrase is a phrase whose
number of occurrences within a predetermined time period exceeds a preset count threshold;
Step S3: use a classification model to determine the target category corresponding to the target
information to be processed, that is, the information to be processed that contains the target phrase,
wherein the different categories in the classification model, including the target category, correspond
to different weights, and the weight of the target category indicates the likelihood that the target
phrase becomes a target event;
Step S4: when the weight corresponding to the target category exceeds a preset weight threshold, label
the target phrase contained in the target information to be processed as a target event.
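Steps S1 to S4 can be sketched end to end in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation of the embodiment: split_into_phrases uses whitespace splitting as a placeholder for a real word segmenter, classify is a hypothetical callable standing in for the trained classification model, category_weights is a hypothetical mapping from category to weight, and all messages are assumed to fall within the predetermined time period.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set


def split_into_phrases(sentence: str) -> List[str]:
    """Step S1 placeholder: whitespace split instead of real word segmentation."""
    return sentence.split()


def find_target_phrases(messages: Sequence[str], count_threshold: int) -> Set[str]:
    """Step S2: phrases whose occurrence count exceeds the preset threshold."""
    counts: Counter = Counter()
    for message in messages:
        counts.update(split_into_phrases(message))
    return {phrase for phrase, count in counts.items() if count > count_threshold}


def label_target_events(messages: Sequence[str],
                        classify: Callable[[str], str],
                        category_weights: Dict[str, float],
                        count_threshold: int,
                        weight_threshold: float) -> Set[str]:
    """Steps S3 and S4: classify messages that contain a target phrase and
    label their target phrases as events when the category weight is high enough."""
    targets = find_target_phrases(messages, count_threshold)
    events: Set[str] = set()
    for message in messages:
        hits = targets.intersection(split_into_phrases(message))
        if not hits:
            continue  # not target information to be processed
        category = classify(message)  # S3: hypothetical model call
        if category_weights.get(category, 0.0) > weight_threshold:  # S4
            events.update(hits)
    return events


messages = ["login failed again", "login failed today", "login failed now", "thanks a lot"]
weights = {"outage": 0.9, "chat": 0.1}


def toy_classifier(message: str) -> str:
    # Stand-in for the trained classification model.
    return "outage" if "failed" in message else "chat"


print(sorted(label_target_events(messages, toy_classifier, weights, 2, 0.5)))
# ['failed', 'login']
```

Raising weight_threshold above the weight of every category yields an empty result, which corresponds to the case in which no target phrase is labeled as a target event.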
Optionally, a person of ordinary skill in the art will understand that the structure shown in Fig. 11 is
only illustrative. The electronic device may also be a terminal device such as a smartphone (for example,
an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile
Internet Devices, MID), or a PAD. Fig. 11 does not limit the structure of the electronic device. For
example, the electronic device may include more or fewer components than shown in Fig. 11 (such as a
network interface), or have a configuration different from that shown in Fig. 11.
The memory 902 may be used to store software programs and modules, such as the program instructions and
modules corresponding to the data request processing method and apparatus in the embodiments of the
present invention. The processor 904 runs the software programs and modules stored in the memory 902 to
execute various functional applications and data processing, that is, to implement the above data request
processing method. The memory 902 may include high-speed random access memory, and may also include
non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile
solid-state memory. In some examples, the memory 902 may further include memory located remotely from the
processor 904, and such remote memory may be connected to the terminal through a network. Examples of the
network include, but are not limited to, the Internet, an intranet, a local area network, a mobile
communication network, and combinations thereof. The memory 902 may specifically be used for, but is not
limited to, storing the program steps of the labeling method for target events. As an example, as shown
in Fig. 9, the memory 902 may include, but is not limited to, the acquiring module 802, the first
determining module 804, the second determining module 806, and the labeling module 808 of the above
labeling apparatus for target events. It may also include, but is not limited to, other module units of
the labeling apparatus, which are not described again in this example.
Optionally, the transmission device 906 is used to receive or send data via a network. Specific examples
of the network may include wired networks and wireless networks. In one example, the transmission device
906 includes a network adapter (Network Interface Controller, NIC), which can be connected to other
network devices and a router through a network cable so as to communicate with the Internet or a local
area network. In another example, the transmission device 906 is a radio frequency (Radio Frequency, RF)
module, which is used to communicate with the Internet wirelessly.
In addition, the electronic device further includes: a display 908, configured to display alert pushes
for target events; and a connection bus 910, configured to connect the module components of the
electronic device.
An embodiment of the present invention further provides a storage medium in which a computer program is
stored, wherein the computer program is configured to execute, when run, the steps in any of the above
method embodiments.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for
executing the following steps:
Step S1: obtain the content sentence carried in the information to be processed, wherein the content
sentence is split into multiple phrases;
Step S2: determine a target phrase among the multiple phrases, wherein the target phrase is a phrase whose
number of occurrences within a predetermined time period exceeds a preset count threshold;
Step S3: use a classification model to determine the target category corresponding to the target
information to be processed, that is, the information to be processed that contains the target phrase,
wherein the different categories in the classification model, including the target category, correspond
to different weights, and the weight of the target category indicates the likelihood that the target
phrase becomes a target event;
Step S4: when the weight corresponding to the target category exceeds a preset weight threshold, label
the target phrase contained in the target information to be processed as a target event.
Optionally, the storage medium is further configured to store a computer program for executing the steps
included in the methods of the above embodiments, which are not described again in this embodiment.
Optionally, in this embodiment, a person of ordinary skill in the art will understand that all or some of
the steps of the methods in the above embodiments may be completed by a program instructing hardware
related to a terminal device. The program may be stored in a computer-readable storage medium, and the
storage medium may include a flash disk, read-only memory (Read-Only Memory, ROM), random access memory
(Random Access Memory, RAM), a magnetic disk, an optical disc, or the like.
The serial numbers of the above embodiments of the present application are for description only and do
not indicate the relative merits of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit
and sold or used as an independent product, it may be stored in the above computer-readable storage
medium. Based on this understanding, the technical solution of the present application in essence, or the
part that contributes to the prior art, or all or part of the technical solution, may be embodied in the
form of a software product. The computer software product is stored in a storage medium and includes
several instructions for causing one or more computer devices (which may be personal computers, servers,
network devices, or the like) to execute all or some of the steps of the methods described in the
embodiments of the present application.
In the above embodiments of the present application, the description of each embodiment has its own
emphasis. For parts not described in detail in one embodiment, reference may be made to the related
descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed
client may be implemented in other ways. The apparatus embodiments described above are merely
illustrative. For example, the division of the units is only a division by logical function, and other
divisions are possible in actual implementation: multiple units or components may be combined or
integrated into another system, or some features may be ignored or not executed. In addition, the mutual
coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or
communication connection through some interfaces, units, or modules, and may be electrical or in other
forms.
The units described as separate components may or may not be physically separate, and the components
displayed as units may or may not be physical units; that is, they may be located in one place or
distributed over multiple network units. Some or all of the units may be selected according to actual
needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one
processing unit, each unit may exist physically alone, or two or more units may be integrated into one
unit. The integrated unit may be implemented in the form of hardware or in the form of a software
functional unit.
The above are only preferred embodiments of the present application. It should be noted that a person of
ordinary skill in the art may make several improvements and modifications without departing from the
principles of the present application, and these improvements and modifications shall also be regarded as
falling within the protection scope of the present application.
Claims (10)
1. A labeling method for a target event, characterized by comprising:
obtaining the content sentence carried in information to be processed, wherein the content sentence is
split into multiple phrases;
determining a target phrase among the multiple phrases, wherein the target phrase is a phrase that appears
in the same piece of the information to be processed and whose number of occurrences within a
predetermined time period exceeds a preset count threshold;
using a classification model to determine the target category corresponding to target information to be
processed, that is, the information to be processed that contains the target phrase, wherein the
different categories in the classification model, including the target category, correspond to different
weights, and the weight of the target category indicates the likelihood that the target phrase becomes a
target event;
when the weight corresponding to the target category exceeds a preset weight threshold, labeling the
target phrase contained in the target information to be processed as the target event.
2. The method according to claim 1, characterized in that using the classification model to determine the
target category corresponding to the target information to be processed that contains the target phrase
comprises:
inputting the target information to be processed into the classification model, wherein the
classification model is obtained by training an initial classification model using the phrases contained
in the information to be processed as training samples;
outputting the target category corresponding to the target information to be processed.
3. The method according to claim 2, characterized in that, before using the classification model to
determine the target category corresponding to the target information to be processed that contains the
target phrase, the method further comprises:
training the initial classification model using first target information to be processed, whose category
has already been determined, as training samples, wherein the first target information to be processed
contains phrases labeled as target events and phrases not labeled as target events.
4. The method according to claim 3, characterized in that training the initial classification model using
the first target information to be processed, whose category has already been determined, as training
samples comprises:
dividing the first target information to be processed, whose category has already been determined, into
a training data set, a validation data set, and a test data set, wherein the training data set and the
validation data set are used to train the initial classification model, and the test data set is used to
test the trained classification model;
splitting the content sentences contained in the training data set and the validation data set into
initial training phrases, and taking the initial training phrases whose frequency of occurrence exceeds a
preset threshold as initial training samples, wherein the vector dimension of the initial training
samples is the number of the initial training samples;
calculating a semantic vector representation of the initial training samples by a vector representation
algorithm;
inputting the vector dimension of the initial training samples and the semantic vector representation of
the initial training samples into the initial classification model for training, to obtain the
classification model;
testing the training result of the classification model with the test data set, and adjusting the model
parameters of the classification model.
5. The method according to claim 3, characterized in that, after training the initial classification model
using the first target information to be processed, whose category has already been determined, as
training samples, the method further comprises:
splitting the target content sentence in the target information to be processed into multiple target
training phrases, wherein the target training phrases contain only Chinese characters and do not contain
stop words, and the stop words include at least interjections and/or pronouns and/or modal particles;
determining the target training phrases whose frequency of occurrence exceeds a preset threshold as a bag
of words;
merging the bag of words with the current training samples of the classification model to form target
training samples;
training the classification model using the target training samples, and adjusting the model parameters
of the classification model.
6. The method according to any one of claims 3 to 5, characterized in that, after training the initial
classification model using the first target information to be processed, whose category has already been
determined, as training samples, the method further comprises:
obtaining second target information to be processed that has been determined in the period from the end
of the last model training to the current time, wherein the second target information to be processed
contains phrases whose number of occurrences within the predetermined time period exceeds the preset
count threshold;
merging the phrases contained in the second target information to be processed into the current training
samples of the classification model.
7. The method according to claim 1, characterized in that determining the target phrase among the multiple
phrases comprises:
determining, as first phrases, the phrases that appear in the same content sentence and whose number of
occurrences in the content sentences of multiple pieces of the information to be processed exceeds a
preset threshold, wherein the first phrases contain only Chinese characters;
discarding any first phrase whose today's share is less than a first preset share threshold, and/or whose
today's word frequency is less than a first preset word frequency threshold, and/or whose today's word
frequency growth rate is less than a first preset growth rate threshold, to obtain second phrases, wherein
today's word frequency growth rate is the growth rate relative to the word frequency of the previous day;
clustering the second phrases to obtain first phrase clusters;
discarding any first phrase cluster whose today's share is less than a second preset share threshold,
and/or whose today's word frequency is less than a second preset word frequency threshold, and/or whose
today's word frequency growth rate is less than a second preset growth rate threshold, to obtain second
phrase clusters;
determining the phrases in the second phrase clusters as the target phrases.
8. The method according to claim 7, characterized in that discarding the first phrase whose today's share
is less than the first preset share threshold comprises:
obtaining today's share of the current first phrase using the following formula:
P1=exp (log p/m)/log n) } wherein p denotes the share of the current first phrase on the previous day, and
m and n are constants;
determining and discarding the first phrase with the minimum today's share by comparing today's share of
each first phrase.
9. The method according to claim 7, characterized in that, after discarding the first phrase whose today's
share is less than the first preset share threshold, and/or whose today's word frequency is less than the
first preset word frequency threshold, and/or whose today's word frequency growth rate is less than the
first preset growth rate threshold, the method further comprises:
obtaining the fluctuation coefficient of the first phrase in the current time period by the following
formula: x' = (x - μ) / σ
wherein x' denotes the fluctuation coefficient, x denotes the word frequency of the first phrase in the
current time period, μ denotes the mean word frequency of the first phrase in the same time period of the
previous day, and σ denotes the standard deviation of the word frequency of the first phrase in the same
time period of the previous day;
discarding the first phrase when the fluctuation coefficient is less than a preset fluctuation value.
10. An electronic device, comprising a memory and a processor, characterized in that a computer program is
stored in the memory, and the processor is configured to execute, by means of the computer program, the
method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910713377.4A CN110458296B (en) | 2019-08-02 | 2019-08-02 | Method and device for marking target event, storage medium and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110458296A true CN110458296A (en) | 2019-11-15 |
CN110458296B CN110458296B (en) | 2023-08-29 |
Family
ID=68484679
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910713377.4A Active CN110458296B (en) | 2019-08-02 | 2019-08-02 | Method and device for marking target event, storage medium and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458296B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102929977A (en) * | 2012-10-16 | 2013-02-13 | 浙江大学 | Event tracing method aiming at news website |
US20170083484A1 (en) * | 2015-09-21 | 2017-03-23 | Tata Consultancy Services Limited | Tagging text snippets |
CN106649274A (en) * | 2016-12-27 | 2017-05-10 | 东华互联宜家数据服务有限公司 | Text content tag labeling method and device |
CN106682123A (en) * | 2016-12-09 | 2017-05-17 | 北京锐安科技有限公司 | Hot event acquiring method and device |
CN108170692A (en) * | 2016-12-07 | 2018-06-15 | 腾讯科技(深圳)有限公司 | A kind of focus incident information processing method and device |
CN108563655A (en) * | 2017-12-28 | 2018-09-21 | 北京百度网讯科技有限公司 | Text based event recognition method and device |
CN108595519A (en) * | 2018-03-26 | 2018-09-28 | 平安科技(深圳)有限公司 | Focus incident sorting technique, device and storage medium |
CN108763272A (en) * | 2018-04-08 | 2018-11-06 | 平安科技(深圳)有限公司 | A kind of event information analysis method, computer readable storage medium and terminal device |
CN109271639A (en) * | 2018-10-11 | 2019-01-25 | 南京中孚信息技术有限公司 | Hot ticket finds method and device |
US20190065507A1 (en) * | 2017-08-22 | 2019-02-28 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for information processing |
CN109726289A (en) * | 2018-12-29 | 2019-05-07 | 北京百度网讯科技有限公司 | Event detecting method and device |
CN109918505A (en) * | 2019-02-26 | 2019-06-21 | 西安电子科技大学 | A kind of network security incident visualization method based on text-processing |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111178679A (en) * | 2019-12-06 | 2020-05-19 | 中能瑞通(北京)科技有限公司 | Phase identification method based on clustering algorithm and network search |
CN111060325A (en) * | 2019-12-13 | 2020-04-24 | 斑马网络技术有限公司 | Test scene construction method and device, electronic equipment and storage medium |
CN111782803A (en) * | 2020-06-05 | 2020-10-16 | 京东数字科技控股有限公司 | Work order processing method and device, electronic equipment and storage medium |
CN113419210A (en) * | 2021-06-09 | 2021-09-21 | Oppo广东移动通信有限公司 | Data processing method and device, electronic equipment and storage medium |
CN113645439A (en) * | 2021-06-22 | 2021-11-12 | 宿迁硅基智能科技有限公司 | Event detection method and system, storage medium and electronic device |
CN113645439B (en) * | 2021-06-22 | 2022-07-29 | 宿迁硅基智能科技有限公司 | Event detection method and system, storage medium and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN110458296B (en) | 2023-08-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110458296A (en) | The labeling method and device of object event, storage medium and electronic device | |
CN109271512B (en) | Emotion analysis method, device and storage medium for public opinion comment information | |
CN107609121B (en) | News text classification method based on LDA and word2vec algorithm | |
CN106030571B (en) | Dynamically modifying elements of a user interface based on a knowledge graph | |
US9411327B2 (en) | Systems and methods for classifying data in building automation systems | |
Inzalkar et al. | A survey on text mining-techniques and application | |
CN107578292B (en) | User portrait construction system | |
CN109145215A (en) | Internet public opinion analysis method, apparatus and storage medium | |
CN108776671A (en) | A kind of network public sentiment monitoring system and method | |
CN110020002A (en) | Querying method, device, equipment and the computer storage medium of event handling scheme | |
CN107220295A (en) | A kind of people's contradiction reconciles case retrieval and mediation strategy recommends method | |
CN109033200A (en) | Method, apparatus, equipment and the computer-readable medium of event extraction | |
CN110008343A (en) | File classification method, device, equipment and computer readable storage medium | |
CN106095939B (en) | The acquisition methods and device of account authority | |
CN107704070A (en) | Using method for cleaning, device, storage medium and electronic equipment | |
CN104050361A (en) | Intelligent analysis early warning method for dangerousness tendency of prison persons serving sentences | |
CN112948575B (en) | Text data processing method, apparatus and computer readable storage medium | |
CN104462096B (en) | Public sentiment method for monitoring and analyzing and device | |
CN107145516A (en) | A kind of Text Clustering Method and system | |
CN107229614A (en) | Method and apparatus for grouped data | |
CN107704289A (en) | Using method for cleaning, device, storage medium and electronic equipment | |
CN109960719A (en) | A kind of document handling method and relevant apparatus | |
CN105512300B (en) | information filtering method and system | |
CN112966072A (en) | Case prediction method and device, electronic device and storage medium | |
CN115858906A (en) | Enterprise searching method, device, equipment, computer storage medium and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||