CN110458296A - The labeling method and device of object event, storage medium and electronic device - Google Patents

Info

Publication number
CN110458296A
Authority
CN
China
Prior art keywords
phrase
target
processed
information
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910713377.4A
Other languages
Chinese (zh)
Other versions
CN110458296B (en
Inventor
邹耿鹏
段建波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910713377.4A priority Critical patent/CN110458296B/en
Publication of CN110458296A publication Critical patent/CN110458296A/en
Application granted granted Critical
Publication of CN110458296B publication Critical patent/CN110458296B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method and device for labeling an object event, a storage medium, and an electronic device. The method comprises: obtaining a content sentence carried in information to be processed, wherein the content sentence is split into multiple phrases; determining a target phrase among the multiple phrases, wherein the target phrase is a phrase that appears in the same information to be processed and whose number of occurrences within a predetermined period exceeds a preset count threshold; using a classification model to determine the target category of the target information to be processed, that is, the information to be processed that contains the target phrase, wherein the different categories, including the target category, correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event; and, when the weight corresponding to the target category exceeds a preset weight threshold, marking the target phrase contained in the target information to be processed as the object event. This at least solves the problem in the related art that object events are detected inefficiently.

Description

Method and device for labeling an object event, storage medium, and electronic device
Technical field
The present invention relates to the technical field of game data processing, and in particular to a method and device for labeling an object event, a storage medium, and an electronic device.
Background
In the related art, network hotspot events are currently detected mainly by training a word-vector model with word embedding (Word Embedding) algorithms. Specifically, the word-vector model is used to obtain word-level vector representations; backbone words are then extracted by concatenating word vectors, or the sentence backbone is obtained; a sentence-vector representation is then obtained by means such as training a model; finally, the sentence vectors are clustered with a clustering method to obtain event clusters. However, the approaches provided by the related art cannot intelligently identify the category of an event cluster. That is, they cannot accurately determine whether an event to be detected is a genuine hotspot event or an ordinary event whose frequency is only temporarily high, so it is often necessary to decide manually whether the event is a hotspot event.
In other words, the detection approach provided by the related art requires a large investment of labor, which makes event detection more complex and therefore less efficient.
No effective solution to the above problem has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a method and device for labeling an object event, a storage medium, and an electronic device, to at least solve the problem in the related art that object events are detected inefficiently.
According to one aspect of the embodiments of the present invention, a method for labeling an object event is provided, comprising: obtaining a content sentence carried in information to be processed, wherein the content sentence is split into multiple phrases; determining a target phrase among the multiple phrases, wherein the target phrase is a phrase that appears in the same information to be processed and whose number of occurrences within a predetermined period exceeds a preset count threshold; using a classification model to determine the target category corresponding to the target information to be processed, that is, the information to be processed that contains the target phrase, wherein the different categories, including the target category, correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event; and, when the weight corresponding to the target category exceeds a preset weight threshold, marking the target phrase contained in the target information to be processed as the object event.
According to another aspect of the embodiments of the present invention, a device for labeling an object event is also provided, comprising: an obtaining module, configured to obtain a content sentence carried in information to be processed, wherein the content sentence is split into one or more phrases; a first determining module, configured to determine a target phrase among the phrases, wherein the target phrase is a phrase that appears in the same information to be processed and whose number of occurrences within a predetermined period exceeds a preset count threshold; a second determining module, configured to use a classification model to determine the target category corresponding to the target information to be processed, that is, the information to be processed that contains the target phrase, wherein the different categories, including the target category, correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event; and a marking module, configured to mark the target phrase contained in the target information to be processed as the object event when the weight corresponding to the target category exceeds a preset weight threshold.
Optionally, the second determining module includes: an input unit, configured to input the target information to be processed into the classification model, wherein the target information to be processed contains one or more target phrases, and the classification model is obtained by training an initial classification model using phrases contained in information to be processed as training samples; and an output unit, configured to output the target category corresponding to the target phrase.
Optionally, the device further includes: a training module, configured to train the initial classification model using first target information to be processed, whose category has already been determined, as training samples, wherein the first target information to be processed contains phrases marked as object events and phrases not marked as object events.
Optionally, the training module includes: a division unit, configured to divide the first target information to be processed, whose category has already been determined, into a training data set, a validation data set, and a test data set, wherein the training data set and the validation data set are used to train the classification model, and the test data set is used to test the trained classification model; a first segmentation unit, configured to segment the content sentences contained in the training data set and the validation data set into initial training phrases, and to take the initial training phrases whose frequency of occurrence exceeds a preset threshold as initial training samples, wherein the vector dimension of the initial training samples is the number of initial training samples; a computing unit, configured to compute the semantic vector representation of the initial training samples with a vector representation algorithm; a first training unit, configured to input the vector dimension and the semantic vector representation of the initial training samples into the initial classification model for training, obtaining the classification model; and a test unit, configured to test the training result of the classification model with the test data set and to adjust the model parameters of the classification model.
Optionally, the training module further includes: a second segmentation unit, configured to segment the target content sentences in the target information to be processed into multiple target training phrases, wherein the target training phrases contain only Chinese characters and contain no stop words, and the stop words include at least interjections and/or pronouns and/or modal particles; a determination unit, configured to determine the target training phrases whose frequency of occurrence exceeds a preset threshold as a bag of words; a first merging unit, configured to merge the bag of words with the current training samples of the classification model to form target training samples; and a second training unit, configured to train the classification model with the target training samples and to adjust the model parameters of the classification model.
Optionally, the training module further includes: a first obtaining unit, configured to obtain second target information to be processed that was determined in the period from the end of the last model training to the current time, wherein the second target information to be processed contains phrases whose number of occurrences within the predetermined period exceeds the preset count threshold; and a second merging unit, configured to merge the phrases contained in the second target information to be processed into the current training samples of the classification model.
Optionally, the first determining module includes: a first determination unit, configured to determine as a first phrase any phrase that appears in the same content sentence and whose number of occurrences across the content sentences of multiple pieces of information to be processed exceeds a preset threshold, wherein the first phrase contains only Chinese characters; a first discarding unit, configured to discard first phrases whose share today is below a first preset share threshold and/or whose word frequency today is below a first preset word-frequency threshold and/or whose word-frequency growth rate today is below a first preset growth-rate threshold, obtaining second phrases, wherein the word-frequency growth rate today is the growth rate relative to the word frequency of the previous day; a clustering unit, configured to cluster the second phrases to obtain first phrase clusters; a second discarding unit, configured to discard first phrase clusters whose share today is below a second preset share threshold and/or whose word frequency today is below a second preset word-frequency threshold and/or whose word-frequency growth rate today is below a second preset growth-rate threshold, obtaining second phrase clusters; and a second determination unit, configured to determine the phrases in the second phrase clusters as the target phrases.
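The first discarding step can be illustrated with a minimal sketch. It assumes that a phrase is discarded as soon as any one of the three conditions holds (the text says "and/or"), that the share is the phrase's count divided by the total count of all phrases today, and that the growth rate is the relative change against the previous day. The `PhraseStats` type, the function names, and the threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhraseStats:
    phrase: str
    today_count: int      # occurrences of the phrase today
    yesterday_count: int  # occurrences of the phrase on the previous day
    today_total: int      # occurrences of all phrases today


def growth_rate(today: int, yesterday: int) -> float:
    """Relative growth against the previous day (assumed definition)."""
    if yesterday == 0:
        # A phrase unseen yesterday is treated as unbounded growth.
        return float("inf") if today > 0 else 0.0
    return (today - yesterday) / yesterday


def keep_phrase(s: PhraseStats, min_share: float,
                min_count: int, min_growth: float) -> bool:
    """Keep a phrase only if it passes all three thresholds."""
    share = s.today_count / s.today_total if s.today_total else 0.0
    return (share >= min_share
            and s.today_count >= min_count
            and growth_rate(s.today_count, s.yesterday_count) >= min_growth)


def filter_phrases(stats, min_share=0.05, min_count=50, min_growth=1.0):
    """Return the phrases that survive the first discarding step."""
    return [s.phrase for s in stats
            if keep_phrase(s, min_share, min_count, min_growth)]
```

The same function could be reused for the second discarding step by passing cluster-level statistics and the second set of thresholds.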
Optionally, the first discarding unit includes: an obtaining subunit, configured to obtain the share today of the current first phrase using the following formula: P1=exp (log p/m)/log n) }, where p denotes the share of the current first phrase on the previous day, and m and n are constants; a determining subunit, configured to determine the first phrase with the smallest share today by comparing the share today of each first phrase; and a discarding subunit, configured to discard the first phrase with the smallest share today.
Optionally, the first determining module further includes:
A second obtaining unit, configured to obtain the fluctuation coefficient of the first phrase in the current period by the following formula:
x' = (x - μ) / σ
where x' denotes the fluctuation coefficient, x denotes the word frequency of the first phrase in the current period, μ denotes the mean word frequency of the first phrase in the same period of the previous day, and σ denotes the standard deviation of the word frequency of the first phrase in the same period of the previous day;
A third discarding unit, configured to discard the first phrase when the fluctuation coefficient is less than a preset fluctuation value.
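As a rough illustration of the fluctuation check, the sketch below assumes the coefficient is the standard score of the current word frequency against the previous day's values, which is what the definitions of x, μ and σ suggest. The function names, the use of the population standard deviation, and the handling of σ = 0 are assumptions.

```python
from statistics import mean, pstdev


def fluctuation_coefficient(current: float, previous_day: list) -> float:
    """Standard score of the current word frequency.

    `previous_day` holds the word frequencies observed in the same
    period of the previous day (hypothetical sampling).
    """
    mu = mean(previous_day)
    sigma = pstdev(previous_day)  # population standard deviation (assumed)
    if sigma == 0:
        # No variation yesterday: any change counts as unbounded fluctuation.
        if current == mu:
            return 0.0
        return float("inf") if current > mu else float("-inf")
    return (current - mu) / sigma


def should_discard(current: float, previous_day: list,
                   min_fluctuation: float) -> bool:
    """Discard the phrase when it fluctuates less than the preset value."""
    return fluctuation_coefficient(current, previous_day) < min_fluctuation
```

A phrase whose frequency stays close to yesterday's level yields a coefficient near zero and is dropped, while a sudden surge yields a large positive value and is kept.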
According to yet another aspect of the embodiments of the present invention, a storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to execute, when run, the above method for determining an object event.
According to yet another aspect of the embodiments of the present invention, an electronic device is also provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above method for determining an object event by means of the computer program.
In the embodiments of the present invention, a classification model is used to identify the category of information to be processed automatically. A content sentence carried in the information to be processed is obtained, wherein the content sentence is split into multiple phrases; a target phrase is determined among the multiple phrases, wherein the target phrase is a phrase whose number of occurrences within a predetermined period exceeds a preset count threshold; the classification model is used to determine the target category corresponding to the target information to be processed, that is, the information to be processed that contains the target phrase, wherein the different categories, including the target category, correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event; and, when the weight corresponding to the target category exceeds a preset weight threshold, the target phrase contained in the target information to be processed is marked as an object event. Determining the target phrase screens out the phrases whose number of occurrences within the predetermined period exceeds the preset count threshold. The classification model then determines the target category of the target information to be processed that contains the target phrase, which classifies the information to be processed automatically. Because the classification model assigns different weights to different target categories, and only target phrases whose category reaches the preset weight threshold are marked as object events, the phrases are screened further against the object event rule. This avoids the inefficiency of manually deciding whether a phrase is an object event, achieves automatic identification of the category of the target information to be processed, and thereby achieves the technical effect of automatically detecting whether a target phrase in the target information to be processed is an object event.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the hardware environment of an optional object event labeling method according to an embodiment of the present application;
Fig. 2 is a flowchart of an optional object event labeling method according to an embodiment of the present application;
Fig. 3 is an optional schematic diagram of an object event alarm interface according to an embodiment of the present invention;
Fig. 4 is another optional schematic diagram of an object event alarm interface according to an embodiment of the present invention;
Fig. 5 is yet another optional schematic diagram of an object event alarm interface according to an embodiment of the present invention;
Fig. 6 is an optional flowchart of a work order type identification method according to an embodiment of the present application;
Fig. 7 is an optional flowchart of an SVM classification model training method according to an embodiment of the present invention;
Fig. 8 is an optional structural block diagram of an object event labeling device according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of a fluctuation coefficient display interface according to an embodiment of the present invention;
Fig. 10 is an optional schematic diagram of the early-warning situation in a given month according to an embodiment of the present invention;
Fig. 11 is a structural schematic diagram of an optional electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and do not describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
To solve the above technical problem, an embodiment of the present application provides a method for labeling an object event. Fig. 1 is a schematic diagram of the hardware environment of an optional object event labeling method according to an embodiment of the present application. As shown in Fig. 1, the hardware environment may include, but is not limited to, a first user equipment 102, a network 110, a server 112, and a second user equipment 202. The first user equipment 102 may include, but is not limited to, a memory 104, a processor 106, and a display 108; the server 112 may include, but is not limited to, a database 114 and a processing engine 116; and the second user equipment 202 may include, but is not limited to, a memory 204, a processor 206, and a display 208. The user equipment here may be, but is not limited to, a terminal device such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (Mobile Internet Devices, MID), or a PAD. The object event labeling method mainly comprises the following steps:
Step S102: the first user equipment 102 sends the information to be processed to the network 110;
Step S104: the network 110 forwards the information to be processed to the server 112;
Step S106: the server 112 marks the information to be processed that meets the requirements as an object event and pushes it to the second user equipment 202;
Step S108: the server 112 returns the processing result for the information to be processed to the network 110;
Step S110: the network 110 feeds the processing result back to the first user equipment 102.
It should be noted that the server 112 may also choose not to feed the processing result back to the first user equipment 102. When the information to be processed sent by the first user equipment 102 does not request any data result, or is merely a prank, the server 112 may ignore this information to be processed and return no processing result.
Optionally, in step S106, the server 112 may mark the information to be processed that meets the requirements as an object event through the following steps: obtaining a content sentence carried in the information to be processed, wherein the content sentence is split into multiple phrases; determining a target phrase among the multiple phrases, wherein the target phrase is a phrase that appears in the same information to be processed and whose number of occurrences within a predetermined period exceeds a preset count threshold; using a classification model to determine the target category corresponding to the target information to be processed, that is, the information to be processed that contains the target phrase, wherein the different categories, including the target category, correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event; and, when the weight corresponding to the target category exceeds a preset weight threshold, marking the target phrase contained in the target information to be processed as an object event.
Fig. 2 is a flowchart of an optional object event labeling method according to an embodiment of the present application. As shown in Fig. 2, the method includes:
Step S202: obtain a content sentence carried in information to be processed, wherein the content sentence is split into multiple phrases;
Step S204: determine a target phrase among the multiple phrases, wherein the target phrase is a phrase that appears in the same information to be processed and whose number of occurrences within a predetermined period exceeds a preset count threshold;
Step S206: use a classification model to determine the target category corresponding to the target information to be processed, that is, the information to be processed that contains the target phrase, wherein the different categories, including the target category, correspond to different weights in the classification model, and the weight of the target category indicates the likelihood that the target phrase is an object event;
Step S208: when the weight corresponding to the target category exceeds a preset weight threshold, mark the target phrase contained in the target information to be processed as an object event.
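Steps S204 to S208 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the messages have already been split into phrases (step S202), counts occurrences across all messages in the period, and takes the classifier as a plain callable. The function name, the `classify` callable, and the `category_weights` mapping are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable


def mark_object_events(
    messages: list[list[str]],
    classify: Callable[[list[str]], str],
    category_weights: dict[str, float],
    count_threshold: int,
    weight_threshold: float,
) -> set[str]:
    """Return the target phrases that should be marked as object events."""
    # S204: phrases whose occurrence count in the period exceeds the threshold.
    counts = Counter(p for phrases in messages for p in phrases)
    target_phrases = {p for p, c in counts.items() if c > count_threshold}

    marked: set[str] = set()
    for phrases in messages:
        hits = target_phrases.intersection(phrases)
        if not hits:
            continue  # not target information to be processed
        # S206: category of the message that contains a target phrase.
        category = classify(phrases)
        # S208: mark only when the category weight exceeds the threshold.
        if category_weights.get(category, 0.0) > weight_threshold:
            marked |= hits
    return marked
```

In this sketch a frequent phrase from a low-weight category, such as a seasonal greeting, is counted but never marked, which mirrors the two-stage screening described in the steps.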
Optionally, the above data processing method is not limited to the scenario of obtaining hotspot work orders. It can be applied, without limitation, to any other scenario in which text messages need to be identified and marked, such as shopping, gaming, instant messaging, and financial application scenarios.
Optionally, the information to be processed may be, but is not limited to, short text information, and the content sentence contained in the information to be processed may include, but is not limited to, punctuation marks, emoticons, modal particles, nouns, verbs, and adjectives. After the content sentence is segmented, characters and symbols other than Chinese characters may also be removed. The content sentence carried in each piece of information to be processed may be split into one phrase or multiple phrases.
Optionally, the target category may include, but is not limited to, consultation (for example a game ranking error or an installation package that cannot be updated), payment transactions (for example a lost credit card, a wallet that cannot be opened, or a red packet that cannot be opened), fraud complaints, pranks (for example malicious screen flooding or keyword flooding), and temporary intermittent hot words (for example solar term notices or holiday reminders). Different target categories correspond to different weights in the classification model, and the weight of a target category indicates the likelihood that the target phrase is an object event.
For example, suppose two pieces of target information to be processed, A and B, have been determined. The target category of A is "consultation on how to update the game version", that is, many messages all ask how to update the same game version. The weight of this consultation category is higher than the preset weight threshold, so the target phrase corresponding to A is determined to be an object event, and the object event is pushed to the professionals who handle consultation messages, because this sudden hotspot event requires urgent handling. The target category of B is "summer solstice solar term": a large number of users post remarks about the solar term on the day it falls, which is not a sudden hotspot event. The weight of this category is lower than the preset weight threshold, so the target phrase corresponding to B is not marked as an object event and is not pushed to professionals for handling.
In an optional embodiment, using the classification model to determine the target category corresponding to the target information to be processed that contains the target phrase can be realized through the following steps:
S1: input the target information to be processed into the classification model, wherein the target information to be processed contains multiple target phrases, and the classification model is obtained by training an initial classification model using phrases contained in information to be processed as training samples;
S2: output the target category corresponding to the target phrase.
Optionally, the classification model involved in the embodiments of the present invention may be, but is not limited to, a support vector machine (support vector machine, SVM) classification model. The phrases in the initially collected information to be processed are input into the initial classification model to train it, yielding a classification model that can automatically classify information to be processed.
In an optional embodiment, before the classification model is used to determine the target category corresponding to the target information to be processed that contains the target phrase, the initial classification model can be trained through the following step:
The initial classification model is trained using first target information to be processed, whose category has already been determined, as training samples, wherein the first target information to be processed contains phrases marked as object events and phrases not marked as object events.
In an optional embodiment, training the initial classification model using the first target information to be processed, whose category has already been determined, as training samples can be realized through the following steps:
S1: divide the first target information to be processed, whose category has already been determined, into a training data set, a validation data set, and a test data set, wherein the training data set and the validation data set are used to train the initial classification model, and the test data set is used to test the trained classification model;
S2: segment the content sentences contained in the training data set and the validation data set into initial training phrases, and take the initial training phrases whose frequency of occurrence exceeds a preset threshold as initial training samples, wherein the vector dimension of the initial training samples is the number of initial training samples;
S3: compute the semantic vector representation of the initial training samples with a vector representation algorithm;
S4: input the vector dimension and the semantic vector representation of the initial training samples into the initial classification model for training, obtaining the classification model;
S5: test the training result of the classification model with the test data set, and adjust the model parameters of the classification model.
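Step S1 above can be sketched as a simple shuffled split. The 70/15/15 proportions, the fixed seed, and the function name are assumptions for illustration; the text does not specify how the three data sets are sized.

```python
import random


def split_dataset(samples, train_ratio=0.7, val_ratio=0.15, seed=0):
    """Split labeled samples into training, validation, and test sets."""
    items = list(samples)
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_ratio)
    n_val = round(len(items) * val_ratio)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test
```

Keeping the test set out of both training and phrase extraction (steps S2 to S4) is what makes the final test in S5 meaningful.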
Optionally, the first target information to be processed, whose category has already been determined, may be short text information that has been classified manually. It may include, but is not limited to, short text information of some sudden hotspot events, of some ongoing hotspot events, and of some non-hotspot events, together with some prank messages. The training data set, the validation data set, and the test data set may each include all of these kinds of short text information, which is not limited in the embodiments of the present invention.
The initial training phrases can be obtained by segmenting the content sentence with a word segmentation algorithm; other word-cutting algorithms or tools can of course also be used. After segmentation, the non-Chinese-character parts of the phrases (punctuation marks, special symbols, digits, English, and so on) can be filtered out with methods such as regular expressions, and stop words are then removed from the segmentation result. The stop words here may include, but are not limited to, interjections, modal particles, and pronouns. For example, if the content sentence contained in the information to be processed is "! New version game installation package! Why can't it update!!! Why", the sentence can finally be segmented into the phrases "new version, update, game installation package", or "new version, installation package, cannot, update, game". The segmentation rules can be configured according to the practical application scenario of the model, which is not limited in the embodiments of the present invention.
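The filtering that follows segmentation can be sketched as below. The sketch assumes the sentence has already been cut into tokens by some segmentation tool; it keeps only the Chinese characters of each token (using the basic CJK Unified Ideographs range) and drops stop words. The function name, the token list, and the stop-word set are hypothetical.

```python
import re

# Basic CJK Unified Ideographs block; rarer extension blocks are ignored here.
CHINESE_ONLY = re.compile(r"[\u4e00-\u9fff]+")


def clean_phrases(tokens, stop_words):
    """Strip non-Chinese characters from tokens and remove stop words."""
    cleaned = []
    for token in tokens:
        # Keep only the Chinese characters inside the token.
        chinese = "".join(CHINESE_ONLY.findall(token))
        if chinese and chinese not in stop_words:
            cleaned.append(chinese)
    return cleaned


# Hypothetical segmentation output for the example sentence.
tokens = ["啊", "!", "新版", "游戏", "安装包", "!", "为什么", "不能", "更新", "!!!"]
stop_words = {"啊", "为什么"}  # an interjection and an interrogative pronoun
phrases = clean_phrases(tokens, stop_words)
```

With this input, `phrases` holds the five phrases meaning "new version", "game", "installation package", "cannot", and "update".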
Optionally, the initial training phrases whose frequency of occurrence exceeds a preset threshold are used as the initial training samples. The preset threshold here may be 0 or any integer greater than 0; when the preset threshold is 0, all initial training phrases obtained after segmentation are used as initial training samples.
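A minimal sketch of this threshold selection, assuming that "frequency of occurrence" means the raw count over all segmented sentences; `select_training_samples` is a hypothetical helper name.

```python
from collections import Counter


def select_training_samples(segmented_sentences, threshold=0):
    """Keep phrases whose total occurrence count exceeds `threshold`."""
    counts = Counter(p for sentence in segmented_sentences for p in sentence)
    # Sorting gives each retained phrase a stable index.
    return sorted(p for p, c in counts.items() if c > threshold)


sentences = [["new version", "update"], ["update", "download"], ["update"]]
vocab_all = select_training_samples(sentences, threshold=0)
vocab_frequent = select_training_samples(sentences, threshold=1)
```

With a threshold of 0 every phrase is kept, and the length of the returned list is the vector dimension used for training.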
Optionally, term frequency-inverse document frequency (TF-IDF) may be used as the vector representation algorithm to compute the semantic vector representation of the samples. Other vector representation algorithms may also be used, which is not limited in the embodiments of the present invention.
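As an illustration, one common TF-IDF variant can be computed as below. The source does not say which variant is used, so the unsmoothed form here (term count divided by document length, times the natural log of N over document frequency) is an assumption, and `tfidf_vectors` is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_vectors(documents, vocabulary):
    """Return one TF-IDF vector per document, ordered by `vocabulary`."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each phrase.
    df = Counter(p for doc in documents for p in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        row = []
        for phrase in vocabulary:
            tf = counts[phrase] / len(doc) if doc else 0.0
            idf = math.log(n_docs / df[phrase]) if df[phrase] else 0.0
            row.append(tf * idf)
        vectors.append(row)
    return vectors


docs = [["update", "version"], ["update", "download"]]
vocab = ["download", "update", "version"]
vectors = tfidf_vectors(docs, vocab)
```

A phrase that appears in every document, such as "update" here, gets weight 0, which is why TF-IDF emphasizes phrases that distinguish one work order from another.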
Optionally, the vector dimension of the initial training samples and the semantic vector representations of the initial training samples are input into the initial classification model for training. For example, when the number of initial training samples is 100, the vector dimension 100 and the semantic vector representations of the initial training samples are input into an SVM classification model for training.
Optionally, the training result of the classification model may be tested with the test data set using a K-fold cross-validation training algorithm. For example, 100 pieces of first target information to be processed are divided into k groups of data, of which 2 groups are the test data set and k-2 groups are the training data set and/or validation data set. After the classification model is trained with the k-2 groups, the 2 test groups may be used to test the training result. The model parameters of the classification model are tuned according to the test result; the closer the test result is to the true result, the more stable and reliable the model parameters are.
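The split in this example can be sketched as follows, assuming k = 10 and a simple round-robin assignment; both choices and the function names are illustrative, and in practice the samples would usually be shuffled first.

```python
def split_k_groups(items, k):
    """Deal items round-robin into k groups of near-equal size."""
    return [items[i::k] for i in range(k)]


def two_group_holdout(items, k, test_indices=(0, 1)):
    """Use two of the k groups for testing and the other k-2 for training/validation."""
    groups = split_k_groups(items, k)
    test = [x for i in test_indices for x in groups[i]]
    train = [x for i, g in enumerate(groups) if i not in test_indices for x in g]
    return train, test


samples = list(range(100))  # stands in for 100 pieces of labeled information
train, test = two_group_holdout(samples, k=10)
```

Rotating `test_indices` over different pairs of groups gives the repeated train/test rounds that cross-validation relies on.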
In an optional embodiment, after the initial classification model is trained using the first target information to be processed whose category has been determined as training samples, the above method further includes:
S1, segment the target content sentences in the target information to be processed into multiple target training phrases, where the target training phrases contain only Chinese characters and no stop words, and the stop words include at least interjections and/or pronouns and/or modal particles;
S2, determine the target training phrases whose frequency of occurrence exceeds a preset threshold as a bag of words;
S3, merge the bag of words with the current training samples of the classification model to form target training samples;
S4, train the classification model using the target training samples, and adjust the model parameters of the classification model.
Optionally, before the target information to be processed is input into the classification model for classification detection, it may also be input into the classification model for training. Using the high-frequency phrases in the target information to be processed as a training bag of words allows the phrases it contains to be updated into the training of the classification model in real time, so that the classification model does not fail to recognize the target information to be processed when it is input for classification detection.
Optionally, the target training phrases may be segmented with the same method used above for the initial training phrases. For example, if a content sentence contained in the target information to be processed is "! New version ocr software installation package! Why can't it update!!! Why", this content sentence may finally be segmented into the following phrases: "new version, ocr software, installation package, update", or "new version, ocr software, installation package, cannot, update". The segmentation rules for phrases may be configured according to the practical application scenario of the model, which is not limited in the embodiments of the present invention.
Optionally, the target training phrases whose frequency of occurrence exceeds the preset threshold are used as a training bag of words, and the training bag of words is then merged with the current training samples. This amounts to forming new training samples, so that the training operation is performed on samples that include the target training phrases. After training, the model parameters of the classification model are again tuned using the test data set; the closer the test result is to the true result, the more stable and reliable the model parameters are.
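One way to picture the merge step is as appending unseen phrases to the existing vocabulary. The sketch below assumes the current training samples are represented by an ordered phrase list, which the source does not state; `merge_vocabulary` is a hypothetical name.

```python
def merge_vocabulary(current_vocabulary, bag_of_words):
    """Append unseen phrases so that existing phrase indices stay unchanged."""
    merged = list(current_vocabulary)
    known = set(merged)
    for phrase in sorted(bag_of_words):
        if phrase not in known:
            merged.append(phrase)
            known.add(phrase)
    return merged


current = ["update", "version"]
new_bag = {"ocr software", "update"}
merged = merge_vocabulary(current, new_bag)
```

Appending rather than re-sorting keeps earlier feature indices valid, although the model still has to be retrained because the vector dimension grows.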
In an optional embodiment, after the initial classification model is trained using the first target information to be processed whose category has been determined as training samples, the training samples of the classification model may also be updated periodically, mainly through the following steps:
S1, obtain the second target information to be processed that was determined in the period from the end of the last model training to the current moment, where the second target information to be processed contains phrases whose number of occurrences within a predetermined time period exceeds a preset count threshold;
S2, merge the phrases contained in the second target information to be processed into the current training samples of the classification model.
Optionally, taking the training of a classification model for customer service work orders as an example, updating the work order classification model may include automatic updating and manual updating.
Automatic updating may include the following steps:
S1, check indicators such as whether the number of work orders in the amended-record work order library exceeds a preset threshold and whether the interval between the last model training time and the current time exceeds a preset time threshold; if any one of them is met, the model automatically starts updating;
S2, retrieve samples from the amended-record work order library and merge them with the original training samples;
S3, segment the processed sentences using a word segmentation algorithm (such as the jieba algorithm); filter the non-Chinese character parts of the segmented words (punctuation marks, special characters, numbers, English, etc.) with methods such as regular expressions; remove stop words (interjections, modal particles, pronouns, etc.) from the segmentation result; and convert the bag of words into an index list;
S4, retrain the work order classification model.
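The trigger check in step S1 of the automatic update can be sketched as below. The thresholds and the name `should_retrain` are hypothetical; the source only says that the update starts when any one indicator is met.

```python
from datetime import datetime, timedelta


def should_retrain(new_order_count, last_trained, now, count_threshold, max_interval):
    """Return True if any one of the update conditions is met."""
    too_many_new_orders = new_order_count > count_threshold
    too_long_since_training = (now - last_trained) > max_interval
    return too_many_new_orders or too_long_since_training


now = datetime(2019, 6, 2, 12, 0)
week = timedelta(days=7)
recent = should_retrain(500, now - timedelta(days=1), now, 1000, week)
backlog = should_retrain(1500, now - timedelta(days=1), now, 1000, week)
stale = should_retrain(500, now - timedelta(days=8), now, 1000, week)
```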
Manual updating may include the following steps:
S1, check the population stability index (PSI) of the model; if it exceeds a threshold, issue an alarm;
S2, manually redo feature engineering on the corpus, re-examine the corpus, train a new model, and manually tune the model parameters until the model is stable.
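For reference, PSI is commonly computed as the sum over buckets of (actual share - expected share) x ln(actual share / expected share). The sketch below applies that standard definition to category counts; the `eps` floor for empty buckets and the function name are assumptions, and the alarm threshold is left to the caller because the source does not give one.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI between a baseline distribution and a current distribution."""
    total_e = sum(expected_counts)
    total_a = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor each share at eps so empty buckets do not produce log(0).
        pe = max(e / total_e, eps)
        pa = max(a / total_a, eps)
        psi += (pa - pe) * math.log(pa / pe)
    return psi


baseline = [50, 50]   # predicted category counts at training time (hypothetical)
current = [80, 20]    # predicted category counts now (hypothetical)
psi = population_stability_index(baseline, current)
```

A value of 0 means the two distributions are identical; larger values indicate that the predicted category mix has drifted from the baseline.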
In an optional embodiment, step S204 of determining the target phrases among the multiple phrases may be implemented through the following steps:
S1, determine the phrases whose number of occurrences in the content sentences exceeds a preset threshold as first phrases, where the first phrases contain only Chinese characters;
S2, discard the first phrases whose proportion today is less than a first preset proportion threshold and/or whose word frequency today is less than a first preset word frequency threshold and/or whose word frequency growth rate today is less than a first preset growth rate threshold, to obtain second phrases, where the word frequency growth rate today is the growth rate relative to the word frequency of the previous day;
S3, cluster the second phrases to obtain first phrase clusters;
S4, discard the first phrase clusters whose proportion today is less than a second preset proportion threshold and/or whose word frequency today is less than a second preset word frequency threshold and/or whose word frequency growth rate today is less than a second preset growth rate threshold, to obtain second phrase clusters;
S5, determine the phrases in the second phrase clusters as the target phrases.
Optionally, frequent mining is first performed on the segmented phrases. The FP-GROWTH frequent mining algorithm may be used to obtain the frequent phrases, i.e., the first phrases. FP-GROWTH frequent itemset mining is a kind of association rule mining algorithm that obtains frequent itemsets by constraining the confidence, support, and lift between associated items. The first phrases then undergo two rounds of filtering and one round of clustering.
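To make the notion of a frequent itemset concrete, the sketch below counts support by brute force. It is not FP-GROWTH: FP-GROWTH reaches the same support-based result through a prefix tree without enumerating candidates, whereas this version enumerates every combination up to `max_size`, so it is only suitable for tiny inputs. The names and the absolute-count form of `min_support` are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Count every phrase combination up to max_size and keep those meeting min_support."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


work_orders = [
    ["update", "version", "download"],
    ["update", "version"],
    ["update", "login"],
]
frequent = frequent_itemsets(work_orders, min_support=2)
```

Here "update" and "version" co-occur in two work orders, so the pair is reported alongside the single phrases that meet the support threshold.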
The first filtering includes: filtering with a logarithmic curve model and parameters configured in a rule engine, performing first-layer dynamic filtering on the frequent phrases to remove those that do not meet the conditions, and finally obtaining the second phrases. Specifically, the first phrases whose proportion today is less than the first preset proportion threshold and/or whose word frequency today is less than the first preset word frequency threshold and/or whose word frequency growth rate today is less than the first preset growth rate threshold may be discarded to obtain the second phrases. Optionally, the word frequency growth rate today is the growth rate relative to the word frequency of the previous day.
Optionally, frequent phrases may be determined by checking the word frequency of a phrase on the current day. A phrase whose word frequency meets the preset word frequency threshold may still be a commonly occurring phrase rather than a sudden hotspot phrase, so the word frequency growth rate may also be considered, i.e., the growth rate of the phrase's word frequency today relative to the previous day. When the word frequency growth rate also satisfies the preset rule, the phrase may further be determined as a frequent phrase. If only the word frequency growth rate is considered, the previous day's word frequency may be a very small base or even 0, in which case a handful of occurrences today can yield a very high growth rate and make the data inaccurate. Therefore, whether the word frequency today and/or the proportion today meet the preset conditions is considered at the same time, and the final frequent phrases, i.e., the second phrases, are determined after weighing all of these factors.
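The combined check can be sketched as follows. The source joins the three conditions with "and/or"; this sketch takes the strictest reading (a phrase is kept only if all three thresholds are met), and all threshold values, the total of 779 work orders, and the function names are hypothetical.

```python
def growth_rate(today_count, yesterday_count):
    """Relative growth versus the previous day; infinite when yesterday was 0."""
    if yesterday_count == 0:
        return float("inf") if today_count > 0 else 0.0
    return (today_count - yesterday_count) / yesterday_count


def keep_phrase(today_count, yesterday_count, total_today,
                min_share, min_count, min_growth):
    """Keep a phrase only if share, count, and growth rate all reach their thresholds."""
    share = today_count / total_today if total_today else 0.0
    return (share >= min_share
            and today_count >= min_count
            and growth_rate(today_count, yesterday_count) >= min_growth)


# 24 work orders today versus 1 yesterday is a growth rate of 23, i.e. 2300%.
hot = keep_phrase(24, 1, 779, min_share=0.01, min_count=10, min_growth=1.0)
# 3 occurrences after a day with 0 has unbounded growth but too small a count.
noise = keep_phrase(3, 0, 779, min_share=0.001, min_count=10, min_growth=1.0)
```

The second call shows the small-base problem described above: the growth rate alone would pass, and it is the count threshold that rejects the phrase.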
Optionally, the proportion today of the current first phrase may be obtained using the following formula:
P1 = exp{(log p/m)/log n}, where p denotes the proportion of the current first phrase on the previous day, and m and n are constants;
By comparing the proportion today of each first phrase, the first phrase with the minimum proportion today is determined and discarded.
Optionally, the word frequency of a phrase may also be judged using the n-sigma rule, obtaining the fluctuation coefficient of a first phrase in the current time period by the following formula:
x' = (x - μ) / σ
where x' denotes the fluctuation coefficient, x denotes the word frequency of the first phrase in the current time period, μ denotes the mean word frequency of the first phrase in the same time period of the previous day, and σ denotes the standard deviation of the word frequency of the first phrase in the same time period of the previous day;
When the fluctuation coefficient x' is less than a preset fluctuation value, the first phrase is discarded.
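A small sketch of this check, assuming the usual n-sigma (z-score) form x' = (x - μ) / σ and a population standard deviation; the hourly counts below are hypothetical and are not taken from the figures.

```python
from statistics import mean, pstdev


def fluctuation_coefficient(current_count, previous_day_counts):
    """Distance of the current count from yesterday's mean, in standard deviations."""
    mu = mean(previous_day_counts)
    sigma = pstdev(previous_day_counts)
    if sigma == 0:
        # Yesterday's counts were constant, so any increase is unbounded in sigma units.
        return float("inf") if current_count > mu else 0.0
    return (current_count - mu) / sigma


yesterday = [4, 6, 5, 5]  # word frequency per hour in the same period yesterday
coefficient = fluctuation_coefficient(41, yesterday)
keep = coefficient >= 5   # 5 is used here as the preset fluctuation value
```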
After the above first filtering, the qualifying frequent phrases, i.e., the second phrases, are obtained.
Optionally, the filtered frequent phrases may be clustered using the DBSCAN clustering algorithm: all work orders containing frequent phrases are retrieved, bag-of-words vectors are used as the vector representation of the work order content, and the content vectors are clustered by the DBSCAN algorithm to obtain the frequent phrase clusters, i.e., the first phrase clusters.
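For illustration, a compact DBSCAN over bag-of-words count vectors might look like this. Euclidean distance and the `eps` and `min_pts` values are assumptions, since the source does not specify the distance metric or parameters; a production system would more likely rely on an existing library implementation.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
        cluster += 1
    return labels


# Each row is a bag-of-words count vector for one work order (hypothetical data).
vectors = [[1, 1, 0], [1, 1, 0], [1, 1, 1], [0, 0, 5], [0, 0, 5], [9, 0, 0]]
labels = dbscan(vectors, eps=1.0, min_pts=2)
```

Work orders with the same label form one frequent phrase cluster, and work orders labeled -1 belong to no cluster.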
Optionally, a rule engine may be used to apply rule-based filtering to the frequent phrase clusters obtained by clustering, yielding the final qualifying frequent phrase clusters; that is, a second filtering is performed on the first phrase clusters to obtain the second phrase clusters. The phrases in the resulting second phrase clusters are the target phrases, and the information to be processed that contains target phrases is the target information to be processed.
The solution provided in the embodiments of the present invention can be used in any scenario that mines sudden events from short text. Short text information is monitored, and once a hotspot/sudden event is found, a sudden-hotspot reminder can be sent by mail, mini program, or instant messaging group. The reminder indicates the hotspot work order type, shows the title in the form of a co-occurring phrase cluster, and presents the sudden/hotspot event through a combination of text, charts, and data.
Fig. 3 is an optional schematic diagram of a target event alarm interface according to an embodiment of the present invention. As shown in Fig. 3, the PC alarm page shows how the volume of work orders in a cluster changes and uses data to show the period-over-period comparison of work orders, so that staff can judge the urgency of the cluster of work orders from the alarm and handle it in time. The alarm details page shows the details of the hotspot results mined by the algorithm and the time series of the hotspot phrases. On the optional push interface shown in Fig. 3, one can directly see the time period of the hot word screening, the product name, and the hot word content "update, version download, update". One can also see the trend of the word frequency over time on 2019-6-2, shown by the solid line in the figure; the highest word frequency appears at around 12 o'clock and is 41. The word frequency of the previous day, 2019-6-1, is shown by the dashed line; its highest value appears at around 12 o'clock and is 6. The interface also automatically computes the average word frequency for 2019-6-2. These line charts make it easy to confirm whether the current hot word is a hotspot/sudden event. In addition, specific work order information can be shown at the bottom of the interface, for example the work order text "update version, what should I do" submitted by user XXX1 at 8:58 on 2019-6-2 for the product with product code a1, which helps staff grasp the specific work order request in time.
Fig. 4 is another optional schematic diagram of a target event alarm interface according to an embodiment of the present invention. As shown in Fig. 4, through the real-time push function of a mini program on a PC or mobile terminal, the content pushed on an optional push interface may include the hot word "update, version download, update", the work order category "consultation", the product name, the push time, the time period over which work orders containing the hot word are counted (e.g., 8:00-12:00), the number of work orders fed back for the hot word in the current time period (24), the period-over-period growth rate compared with the same period yesterday (2300%), the proportion of the current hot word among all work orders (3.08%), and the work order content. As shown in Fig. 4, the work order content includes "how do I update the version" and "how to update to the latest version".
Fig. 5 is another optional schematic diagram of a target event alarm interface according to an embodiment of the present invention. As shown in Fig. 5, through the real-time push function of an instant messaging group on a PC or mobile terminal, a user feedback early warning push can be received on an optional push interface, for example in an instant messaging group named "[product name] user feedback early warning". The pushed content includes but is not limited to the hot word "update, version download, update", the product name, the push time, the time period over which work orders containing the hot word are counted (e.g., 8:00-12:00), the number of work orders fed back for the hot word in the current time period (24), the period-over-period growth rate compared with the same period yesterday (2300%), the proportion of the current hot word among all work orders (3.08%), and the work order content. As shown in Fig. 5, the work order content may include "why am I still asked to update at login after downloading the update".
The alarm push display interfaces provided in Fig. 3 to Fig. 5 of the embodiments of the present invention enable timely, proactive, real-time tracking and pushing of hotspot/sudden events. The feedback count, period-over-period growth rate, and hot word proportion directly present the current hot word's word frequency in the current time period, its growth rate relative to the previous day, and its proportion among today's phrases, making it easy to confirm whether it is a hotspot/sudden event. The sender's mood at the time can also be identified from the work order content, which helps staff learn about hotspot/sudden events anytime and anywhere and respond proactively.
Optionally, hotspot words are mined from the work order content using the FP-GROWTH model and then clustered, and the work orders in each resulting cluster are classified by category using a machine learning model. This may be done through the flow shown in Fig. 6. Fig. 6 is an optional flowchart of a work order category identification method according to an embodiment of the present application. As shown in Fig. 6, the method includes the following steps:
S601, obtain asynchronous work orders and extract the work order content;
S602, extract the problem descriptions from the work orders in the current day's time period and preprocess them; the preprocessing includes segmenting the coarsely processed problem descriptions (coarse processing includes filtering special characters such as emoticons with regular expressions) using a word segmentation tool, and then filtering the non-Chinese character parts (punctuation marks, numbers, etc.) out of the words with methods such as regular expressions;
S603, mine frequent phrases using the FP-GROWTH frequent mining algorithm; FP-GROWTH frequent itemset mining is a kind of association rule mining algorithm that obtains frequent itemsets by constraining the confidence, support, and lift between associated items;
S604, filter with a logarithmic curve model and parameters configured in a rule engine, performing first-layer dynamic filtering on the frequent phrases to remove those that do not meet the conditions;
S605, cluster the filtered frequent phrases using the DBSCAN clustering algorithm: retrieve all work orders containing frequent phrases, use bag-of-words vectors as the vector representation of the work order content, and cluster the content vectors by the DBSCAN algorithm to obtain frequent phrase clusters;
S606, apply rule-based filtering to the clustered frequent phrase clusters using a rule engine to obtain the final qualifying frequent phrase clusters;
S607, predict the category of the work orders in each frequent phrase cluster using an SVM work order type classification model;
S608, attach a category label to the frequent phrase clusters that reach the threshold condition.
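Steps S607 and S608 can be pictured as below. The source does not define the threshold condition, so this sketch assumes that a cluster is labeled with its dominant predicted category when that category's share of the cluster reaches `min_share`; `classify` stands in for the trained SVM model and is hypothetical.

```python
from collections import Counter


def tag_cluster(work_orders, classify, min_share):
    """Return the dominant predicted category if its share reaches min_share, else None."""
    if not work_orders:
        return None
    predictions = Counter(classify(order) for order in work_orders)
    category, count = predictions.most_common(1)[0]
    return category if count / len(work_orders) >= min_share else None


def classify(order):
    # Stub standing in for the SVM work order type classification model.
    return "consultation" if "update" in order else "complaint"


cluster = ["how to update", "update failed", "refund please"]
label = tag_cluster(cluster, classify, min_share=0.6)
unlabeled = tag_cluster(cluster, classify, min_share=0.8)
```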
Optionally, the SVM classification model training process may be implemented by the following steps. Fig. 7 is an optional flowchart of an SVM classification model training method according to an embodiment of the present invention. As shown in Fig. 7, the method includes the following steps:
S701, label the work orders and determine the work order categories;
S702, segment the filtered and clustered work orders using jieba, and filter the non-Chinese character parts (punctuation marks, special characters, numbers, English, etc.) out of the segmented words with methods such as regular expressions;
S703, remove stop words (interjections, modal particles, pronouns, etc.) from the segmentation result, screen candidate words by a word frequency condition, build a bag of words, and convert the bag of words into an index list;
S704, divide the historically labeled work order data into a training data set, a validation data set, and a test data set;
S705, train the SVM work order classification model by K-fold cross-validation and optimize the model parameters;
S7051, segment all work orders, filter out stop words and single-character words, and select a subset of words by a word frequency limit as the vector dimensions for representing the text;
S7052, compute the TF-IDF vector of each work order as the vector representation of the work order;
S7053, train the SVM work order classification model by K-fold cross-validation and tune its parameters;
S706, test the model with the test data set and evaluate its predictive ability with model evaluation metrics such as F1 and the KS value; the larger the F1 and KS values, the higher the accuracy of the model. The model parameters are tuned continually to strengthen the predictive and generalization ability of the model;
S707, output the trained classification model.
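The two evaluation metrics named in S706 can be computed as in the sketch below. It assumes a binary setting (label 1 is the positive class) with a score per sample; for a multi-class work order model the metrics would have to be computed per category or averaged, which the source does not detail.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ks_statistic(y_true, scores):
    """Largest gap between true positive rate and false positive rate over all thresholds."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("KS needs at least one positive and one negative sample")
    best = 0.0
    for threshold in sorted(set(scores)):
        tpr = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold) / pos
        fpr = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold) / neg
        best = max(best, abs(tpr - fpr))
    return best


y_true = [1, 1, 0, 0]
f1 = f1_score(y_true, [1, 0, 0, 0])
ks = ks_statistic(y_true, [0.9, 0.8, 0.3, 0.1])
```

A KS value of 1.0, as in this toy example, means some score threshold separates the two classes perfectly.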
The above solution realizes automatic discovery and statistical recording of work order hotspot events; uses the n-sigma parameter to assist in accurately discriminating hotspots; integrates hotspot discovery with hotspot event type identification so that, without human intervention, current hotspot work order events can be found and their event types reported automatically, greatly improving customer service efficiency; and uses model stability parameters such as PSI to continually check the distribution of the work order categories predicted by the model and monitor whether the predictive performance of the model is stable.
The solution provided by the embodiments of the present invention achieves the following technical effects:
The embodiments of the present invention use a bag-of-words model whose bag of words is computed in real time from the current corpus, so there is no case in which a new term cannot be used.
The embodiments of the present invention mine co-occurring phrases with the FP-GROWTH model, filtering out low-frequency phrases, and use co-occurring phrases as the display form. Work orders characterized by bag-of-words vectors are clustered so that similar work orders sharing high-frequency co-occurring phrases form event clusters. The mined event clusters are evaluated by a rule engine; using co-occurring phrases, event clusters can be compared along both horizontal and vertical dimensions, examining not only the magnitude of an event cluster but also indicators such as growth rate, so that both hotspots and sudden events are mined.
The embodiments of the present invention realize intelligent discrimination of work order categories through a machine learning model, solving the pain point in the prior art that hotspot display requires manual selection.
The embodiments of the present invention realize, on the technical side, a closed loop of hotspot work order discovery and early warning; the mining of frequent phrases and the category labeling of work orders can be completed automatically by program.
It should be noted that, for brevity, the foregoing method embodiments are each described as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, computer, server, network device, etc.) to execute the methods described in the embodiments of the present invention.
According to another aspect of the embodiments of the present application, a target event labeling apparatus for implementing the above target event labeling method is also provided. Fig. 8 is an optional structural block diagram of a target event labeling apparatus according to an embodiment of the present invention. As shown in Fig. 8, the apparatus includes:
An obtaining module 802, configured to obtain the content sentences carried in the information to be processed, where a content sentence is segmented into one or more phrases;
A first determining module 804, configured to determine target phrases among the multiple phrases, where a target phrase is a phrase that appears in the same piece of information to be processed and whose number of occurrences within a predetermined time period exceeds a preset count threshold;
A second determining module 806, configured to determine, using a classification model, the target category corresponding to the target information to be processed, i.e., the information to be processed that contains the target phrases, where different categories in the classification model, including the target category, correspond to different weights, and the weight of the target category indicates the likelihood that the target phrase is a target event;
A marking module 808, configured to mark the target phrases contained in the target information to be processed as a target event when the weight corresponding to the target category exceeds a preset weight threshold.
Optionally, the second determining module includes: an input unit, configured to input the target information to be processed into the classification model, where the target information to be processed contains one or more target phrases, and the classification model is obtained by training an initial classification model using the phrases contained in information to be processed as training samples; and an output unit, configured to output the target category corresponding to the target phrases.
Optionally, the apparatus further includes: a training module, configured to train the initial classification model using first target information to be processed whose category has been determined as training samples, where the first target information to be processed contains phrases marked as target events and phrases not marked as target events.
Optionally, the training module includes: a division unit, configured to divide the first target information to be processed whose category has been determined into a training data set, a validation data set, and a test data set, where the training data set and the validation data set are used to train the classification model, and the test data set is used to test the trained classification model; a first segmentation unit, configured to segment the content sentences contained in the training data set and the validation data set into initial training phrases, and to use the initial training phrases whose frequency of occurrence exceeds a preset threshold as initial training samples, where the vector dimension of the initial training samples is the number of initial training samples; a computing unit, configured to compute the semantic vector representations of the initial training samples with a vector representation algorithm; a first training unit, configured to input the vector dimension of the initial training samples and the semantic vector representations of the initial training samples into the initial classification model for training, to obtain the classification model; and a test unit, configured to test the training result of the classification model using the test data set and adjust the model parameters of the classification model.
Optionally, the training module further includes: a second segmentation unit, configured to segment the target content sentences in the target information to be processed into multiple target training phrases, where the target training phrases contain only Chinese characters and no stop words, and the stop words include at least interjections and/or pronouns and/or modal particles; a determination unit, configured to determine the target training phrases whose frequency of occurrence exceeds a preset threshold as a bag of words; a first merging unit, configured to merge the bag of words with the current training samples of the classification model to form target training samples; and a second training unit, configured to train the classification model using the target training samples and adjust the model parameters of the classification model.
Optionally, the training module further includes: a first acquisition unit, configured to obtain the second target information to be processed that was determined in the period from the end of the last model training to the current moment, where the second target information to be processed contains phrases whose number of occurrences within a predetermined time period exceeds a preset count threshold; and a second merging unit, configured to merge the phrases contained in the second target information to be processed into the current training samples of the classification model.
Optionally, the first determining module includes: a first determination unit, configured to determine the phrases that appear in the same content sentence and whose number of occurrences in the content sentences of multiple pieces of information to be processed exceeds a preset threshold as first phrases, where the first phrases contain only Chinese characters; a first discarding unit, configured to discard the first phrases whose proportion today is less than a first preset proportion threshold and/or whose word frequency today is less than a first preset word frequency threshold and/or whose word frequency growth rate today is less than a first preset growth rate threshold, to obtain second phrases, where the word frequency growth rate today is the growth rate relative to the word frequency of the previous day; a clustering unit, configured to cluster the second phrases to obtain first phrase clusters; a second discarding unit, configured to discard the first phrase clusters whose proportion today is less than a second preset proportion threshold and/or whose word frequency today is less than a second preset word frequency threshold and/or whose word frequency growth rate today is less than a second preset growth rate threshold, to obtain second phrase clusters; and a second determination unit, configured to determine the phrases in the second phrase clusters as the target phrases.
Optionally, the first discarding unit includes: an obtaining subunit, configured to obtain the proportion today of the current first phrase using the following formula: P1 = exp{(log p/m)/log n}, where p denotes the proportion of the current first phrase on the previous day, and m and n are constants; a determining subunit, configured to determine the first phrase with the minimum proportion today by comparing the proportion today of each first phrase; and a discarding subunit, configured to discard the first phrase with the minimum proportion today.
Optionally, the first determining module further includes:
A second acquisition unit, configured to obtain the fluctuation coefficient of the first phrase in the current time period by the following formula: x′ = (x − μ) / σ
where x′ denotes the fluctuation coefficient, x denotes the word frequency of the first phrase in the current time period, μ denotes the mean word frequency of the first phrase in the same time period of the previous day, and σ denotes the standard deviation of the word frequency of the first phrase in the same time period of the previous day;
A third discarding unit, configured to discard the first phrase when the fluctuation coefficient is less than a preset fluctuation value.
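A small sketch of the fluctuation check follows. It assumes the fluctuation coefficient is the standard score (x − μ)/σ, which is what the symbol definitions above suggest. The function names, the use of the population standard deviation, and the handling of σ = 0 are illustrative assumptions rather than details taken from the patent.

```python
from statistics import mean, pstdev
from typing import Sequence


def fluctuation_coefficient(current_count: float,
                            previous_day_counts: Sequence[float]) -> float:
    """Standard score of the current count against the previous day's
    counts for the same time period."""
    mu = mean(previous_day_counts)
    sigma = pstdev(previous_day_counts)  # assumption: population std. dev.
    if sigma == 0:
        # Assumption: a flat baseline makes any increase maximally unusual.
        return float("inf") if current_count > mu else 0.0
    return (current_count - mu) / sigma


def should_keep(current_count: float,
                previous_day_counts: Sequence[float],
                preset_fluctuation: float) -> bool:
    """The phrase is discarded when the coefficient is below the preset value."""
    return fluctuation_coefficient(current_count, previous_day_counts) >= preset_fluctuation
```

With a preset fluctuation value of 5, a phrase whose count jumps far above a stable previous-day baseline is kept, while one that stays within a few standard deviations is dropped.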
Fig. 9 is a schematic diagram of a fluctuation-coefficient display interface according to an embodiment of the present invention. As shown in Fig. 9, the horizontal axis represents the time period and the vertical axis represents the word frequency of the target phrase; the dashed line shows yesterday's (2019-6-1) word frequency and the solid line shows today's (2019-6-2) word frequency. As can be seen from Fig. 9, today's highest word frequency is 69, and the fluctuation coefficient for the 10:00-14:00 time period, calculated with the above fluctuation-coefficient formula, is 51. If the preset fluctuation value is 5, today's fluctuation coefficient of the target phrase exceeds the preset fluctuation value, so the phrase can be retained.
Fig. 10 is a schematic diagram of an optional early-warning situation for a certain month according to an embodiment of the present invention. As shown in Fig. 10, the labeling method of the target event provided by the embodiment of the present invention is applied to burst hot-spot alerting on customer service work orders (short texts) for products such as electronic payment, PC games, mobile games, and video playback, and the calculated overall early-warning accuracy reaches 81%. Compared with the original manual process, early warning goes from nonexistent to available. The effect is especially pronounced for products whose customer service is understaffed, greatly relieving front-line staff of the burden of discovering and counting such issues.
According to another aspect of the embodiments of the present invention, an electronic device for implementing the above labeling method of the target event is further provided. The electronic device may be, but is not limited to being, applied in the server 112 shown in Fig. 1. As shown in Fig. 11, the electronic device includes a memory 902 and a processor 904; a computer program is stored in the memory 902, and the processor 904 is configured to execute, by means of the computer program, the steps in any of the above method embodiments.
Optionally, in this embodiment, the electronic device may be located in at least one of multiple network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of the computer program:
Step S1: obtain the content sentence carried in the information to be processed, where the content sentence is split into multiple phrases;
Step S2: determine a target phrase among the multiple phrases, where the target phrase is a phrase whose number of occurrences within a predetermined time period exceeds a preset count threshold;
Step S3: use a classification model to determine the target category corresponding to the target information to be processed, that is, the information to be processed that contains the target phrase, where the different categories in the classification model, including the target category, correspond to different weights, and the weight of the target category indicates the likelihood that the target phrase becomes a target event;
Step S4: when the weight corresponding to the target category exceeds a preset weight threshold, label the target phrase contained in the target information to be processed as a target event.
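Steps S1 to S4 can be sketched end to end as below. This is an illustrative outline only: `label_target_events`, the `segment` and `classify` callables, and the `category_weights` mapping are hypothetical stand-ins for the word segmenter and trained classification model, and the input messages are assumed to have already been restricted to the predetermined time period.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set


def label_target_events(
    messages: Sequence[str],
    segment: Callable[[str], List[str]],   # S1: splits a content sentence into phrases
    classify: Callable[[str], str],        # S3: returns a category for a message
    category_weights: Dict[str, float],    # weight assigned to each category
    count_threshold: int,
    weight_threshold: float,
) -> Set[str]:
    # S1: split every content sentence into phrases.
    segmented = [segment(m) for m in messages]

    # S2: target phrases occur more often than the preset count threshold.
    counts = Counter(p for phrases in segmented for p in phrases)
    target_phrases = {p for p, c in counts.items() if c > count_threshold}

    events: Set[str] = set()
    for message, phrases in zip(messages, segmented):
        hits = target_phrases.intersection(phrases)
        if not hits:
            continue  # not target information to be processed
        # S3: classify the message that contains a target phrase.
        category = classify(message)
        # S4: label the phrases when the category weight is high enough.
        if category_weights.get(category, 0.0) > weight_threshold:
            events.update(hits)
    return events
```

Passing the segmenter and classifier in as callables keeps the sketch independent of any particular tokenizer or model library.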
Optionally, those of ordinary skill in the art will understand that the structure shown in Fig. 11 is merely illustrative. The electronic device may also be a terminal device such as a smartphone (e.g. an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), or a PAD. Fig. 11 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components than shown in Fig. 11 (such as a network interface), or have a configuration different from that shown in Fig. 11.
The memory 902 may be used to store software programs and modules, such as the program instructions/modules corresponding to the data request processing method and device in the embodiments of the present invention. The processor 904 runs the software programs and modules stored in the memory 902 to execute various functional applications and data processing, that is, to implement the above data request processing method. The memory 902 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 902 may further include memory located remotely from the processor 904, and such remote memory may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof. The memory 902 may specifically be used for, but is not limited to, storing the program steps of the labeling method of the target event. As an example, as shown in Fig. 9, the memory 902 may include, but is not limited to, the acquisition module 802, the first determining module 804, the second determining module 806, and the marking module 808 of the above labeling apparatus for the target event. It may also include, but is not limited to, other module units of the labeling apparatus, which are not described again in this example.
Optionally, the transmission device 906 is used to receive or send data via a network. Specific examples of the network may include wired networks and wireless networks. In one example, the transmission device 906 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and a router via a network cable so as to communicate with the Internet or a local area network. In another example, the transmission device 906 is a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
In addition, the electronic device further includes: a display 908, for displaying the alert push of the target event; and a connection bus 910, for connecting the module components of the electronic device.
The embodiments of the present invention further provide a storage medium in which a computer program is stored, where the computer program is configured to execute, when run, the steps in any of the above method embodiments.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
Step S1: obtain the content sentence carried in the information to be processed, where the content sentence is split into multiple phrases;
Step S2: determine a target phrase among the multiple phrases, where the target phrase is a phrase whose number of occurrences within a predetermined time period exceeds a preset count threshold;
Step S3: use a classification model to determine the target category corresponding to the target information to be processed, that is, the information to be processed that contains the target phrase, where the different categories in the classification model, including the target category, correspond to different weights, and the weight of the target category indicates the likelihood that the target phrase becomes a target event;
Step S4: when the weight corresponding to the target category exceeds a preset weight threshold, label the target phrase contained in the target information to be processed as a target event.
Optionally, the storage medium is further configured to store a computer program for executing the steps included in the methods of the above embodiments, which are not described again in this embodiment.
Optionally, in this embodiment, those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments may be completed by a program instructing the relevant hardware of a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or the like.
The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred embodiments of the present application. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A labeling method of a target event, characterized by comprising:
obtaining the content sentence carried in information to be processed, where the content sentence is split into multiple phrases;
determining a target phrase among the multiple phrases, where the target phrase is a phrase that appears in the same information to be processed and whose number of occurrences within a predetermined time period exceeds a preset count threshold;
using a classification model to determine the target category corresponding to the target information to be processed, that is, the information to be processed that contains the target phrase, where the different categories in the classification model, including the target category, correspond to different weights, and the weight of the target category indicates the likelihood that the target phrase becomes a target event;
when the weight corresponding to the target category exceeds a preset weight threshold, labeling the target phrase contained in the target information to be processed as the target event.
2. The method according to claim 1, characterized in that using a classification model to determine the target category corresponding to the target information to be processed that contains the target phrase comprises:
inputting the target information to be processed into the classification model, where the classification model is obtained by training an initial classification model using the phrases contained in the information to be processed as training samples;
outputting the target category corresponding to the target information to be processed.
3. The method according to claim 2, characterized in that, before using a classification model to determine the target category corresponding to the target information to be processed that contains the target phrase, the method further comprises:
training the initial classification model using first target information to be processed, whose category has been determined, as training samples, where the first target information to be processed contains phrases labeled as target events and phrases not labeled as target events.
4. The method according to claim 3, characterized in that training the initial classification model using the first target information to be processed, whose category has been determined, as training samples comprises:
dividing the first target information to be processed, whose category has been determined, into a training data set, a validation data set, and a test data set, where the training data set and the validation data set are used to train the initial classification model, and the test data set is used to test the trained classification model;
segmenting the content sentences contained in the training data set and the validation data set into initial training phrases, and taking the initial training phrases whose frequency of occurrence exceeds a preset threshold as initial training samples, where the vector dimension of the initial training samples is the number of initial training samples;
computing the semantic vector representation of the initial training samples using a vector representation algorithm;
inputting the vector dimension of the initial training samples and the semantic vector representation of the initial training samples into the initial classification model for training, obtaining the classification model;
testing the training result of the classification model using the test data set, and adjusting the model parameters of the classification model.
5. The method according to claim 3, characterized in that, after training the initial classification model using the first target information to be processed, whose category has been determined, as training samples, the method further comprises:
segmenting the target content sentence in the target information to be processed into multiple target training phrases, where the target training phrases contain only Chinese characters and no stop words, the stop words including at least interjections and/or pronouns and/or modal particles;
determining the target training phrases whose frequency of occurrence exceeds a preset threshold as a bag of words;
merging the bag of words with the current training samples of the classification model to form target training samples;
training the classification model using the target training samples, and adjusting the model parameters of the classification model.
6. The method according to any one of claims 3 to 5, characterized in that, after training the initial classification model using the first target information to be processed, whose category has been determined, as training samples, the method further comprises:
obtaining second target information to be processed determined in the period from the end of the last model training to the current time, where the second target information to be processed contains phrases whose number of occurrences within a predetermined time period exceeds a preset count threshold;
merging the phrases contained in the second target information to be processed into the current training samples of the classification model.
7. The method according to claim 1, characterized in that determining a target phrase among the multiple phrases comprises:
determining, as first phrases, phrases that appear in the same content sentence and whose number of occurrences across the content sentences of multiple pieces of information to be processed exceeds a preset threshold, where the first phrases contain only Chinese characters;
discarding the first phrases whose share today is below a first preset share threshold and/or whose word frequency today is below a first preset word-frequency threshold and/or whose word-frequency growth rate today is below a first preset growth-rate threshold, obtaining second phrases, where today's word-frequency growth rate is the growth rate relative to the previous day's word frequency;
clustering the second phrases to obtain first phrase clusters;
discarding the first phrase clusters whose share today is below a second preset share threshold and/or whose word frequency today is below a second preset word-frequency threshold and/or whose word-frequency growth rate today is below a second preset growth-rate threshold, obtaining second phrase clusters;
determining the phrases in the second phrase clusters as the target phrases.
8. The method according to claim 7, characterized in that discarding the first phrases whose share today is below the first preset share threshold comprises:
obtaining today's share of the current first phrase using the following formula:
P1=exp (log p/m)/log n) }, where p denotes the share of the current first phrase on the previous day, and m and n are constants;
determining, by comparing today's share of each first phrase, the first phrase with the smallest share today, and discarding it.
9. The method according to claim 7, characterized in that, after discarding the first phrases whose share today is below the first preset share threshold and/or whose word frequency today is below the first preset word-frequency threshold and/or whose word-frequency growth rate today is below the first preset growth-rate threshold, the method further comprises:
obtaining the fluctuation coefficient of the first phrase in the current time period by the following formula: x′ = (x − μ) / σ
where x′ denotes the fluctuation coefficient, x denotes the word frequency of the first phrase in the current time period, μ denotes the mean word frequency of the first phrase in the same time period of the previous day, and σ denotes the standard deviation of the word frequency of the first phrase in the same time period of the previous day;
discarding the first phrase when the fluctuation coefficient is less than a preset fluctuation value.
10. An electronic device, including a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to execute, by means of the computer program, the method according to any one of claims 1 to 9.
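The bag-of-words step recited in claim 5 can be illustrated with the following sketch. It is a hedged example under stated assumptions, not the claimed implementation: `build_bag_of_words` and `merge_training_vocab` are hypothetical names, sentences are assumed to be already segmented, and "only Chinese characters" is approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# Assumption: "Chinese characters only" means every character lies in
# the basic CJK Unified Ideographs block.
CHINESE_ONLY = re.compile(r"[\u4e00-\u9fff]+")


def build_bag_of_words(
    segmented_sentences: Iterable[List[str]],
    stop_words: Set[str],
    min_count: int,
) -> Set[str]:
    """Keep Chinese-only, non-stop-word phrases whose frequency exceeds min_count."""
    counts = Counter(
        phrase
        for sentence in segmented_sentences
        for phrase in sentence
        if CHINESE_ONLY.fullmatch(phrase) and phrase not in stop_words
    )
    return {phrase for phrase, count in counts.items() if count > min_count}


def merge_training_vocab(current_vocab: Iterable[str], bag: Set[str]) -> Set[str]:
    """Merge the new bag of words with the model's current training phrases."""
    return set(current_vocab) | bag
```

The merged set would then be used to retrain the classification model, as the claim describes.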
CN201910713377.4A 2019-08-02 2019-08-02 Method and device for marking target event, storage medium and electronic device Active CN110458296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910713377.4A CN110458296B (en) 2019-08-02 2019-08-02 Method and device for marking target event, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910713377.4A CN110458296B (en) 2019-08-02 2019-08-02 Method and device for marking target event, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN110458296A (en) 2019-11-15
CN110458296B CN110458296B (en) 2023-08-29

Family

ID=68484679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910713377.4A Active CN110458296B (en) 2019-08-02 2019-08-02 Method and device for marking target event, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN110458296B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111060325A (en) * 2019-12-13 2020-04-24 斑马网络技术有限公司 Test scene construction method and device, electronic equipment and storage medium
CN111178679A (en) * 2019-12-06 2020-05-19 中能瑞通(北京)科技有限公司 Phase identification method based on clustering algorithm and network search
CN111782803A (en) * 2020-06-05 2020-10-16 京东数字科技控股有限公司 Work order processing method and device, electronic equipment and storage medium
CN113419210A (en) * 2021-06-09 2021-09-21 Oppo广东移动通信有限公司 Data processing method and device, electronic equipment and storage medium
CN113645439A (en) * 2021-06-22 2021-11-12 宿迁硅基智能科技有限公司 Event detection method and system, storage medium and electronic device

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929977A (en) * 2012-10-16 2013-02-13 浙江大学 Event tracing method aiming at news website
US20170083484A1 (en) * 2015-09-21 2017-03-23 Tata Consultancy Services Limited Tagging text snippets
CN106649274A (en) * 2016-12-27 2017-05-10 东华互联宜家数据服务有限公司 Text content tag labeling method and device
CN106682123A (en) * 2016-12-09 2017-05-17 北京锐安科技有限公司 Hot event acquiring method and device
CN108170692A (en) * 2016-12-07 2018-06-15 腾讯科技(深圳)有限公司 A kind of focus incident information processing method and device
CN108563655A (en) * 2017-12-28 2018-09-21 北京百度网讯科技有限公司 Text based event recognition method and device
CN108595519A (en) * 2018-03-26 2018-09-28 平安科技(深圳)有限公司 Focus incident sorting technique, device and storage medium
CN108763272A (en) * 2018-04-08 2018-11-06 平安科技(深圳)有限公司 A kind of event information analysis method, computer readable storage medium and terminal device
CN109271639A (en) * 2018-10-11 2019-01-25 南京中孚信息技术有限公司 Hot ticket finds method and device
US20190065507A1 (en) * 2017-08-22 2019-02-28 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for information processing
CN109726289A (en) * 2018-12-29 2019-05-07 北京百度网讯科技有限公司 Event detecting method and device
CN109918505A (en) * 2019-02-26 2019-06-21 西安电子科技大学 A kind of network security incident visualization method based on text-processing

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178679A (en) * 2019-12-06 2020-05-19 中能瑞通(北京)科技有限公司 Phase identification method based on clustering algorithm and network search
CN111060325A (en) * 2019-12-13 2020-04-24 斑马网络技术有限公司 Test scene construction method and device, electronic equipment and storage medium
CN111782803A (en) * 2020-06-05 2020-10-16 京东数字科技控股有限公司 Work order processing method and device, electronic equipment and storage medium
CN113419210A (en) * 2021-06-09 2021-09-21 Oppo广东移动通信有限公司 Data processing method and device, electronic equipment and storage medium
CN113645439A (en) * 2021-06-22 2021-11-12 宿迁硅基智能科技有限公司 Event detection method and system, storage medium and electronic device
CN113645439B (en) * 2021-06-22 2022-07-29 宿迁硅基智能科技有限公司 Event detection method and system, storage medium and electronic device

Also Published As

Publication number Publication date
CN110458296B (en) 2023-08-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant