CN110020147A - Model generation and comment recognition methods, systems, devices and storage media - Google Patents
- Publication number
- CN110020147A CN110020147A CN201711225988.1A CN201711225988A CN110020147A CN 110020147 A CN110020147 A CN 110020147A CN 201711225988 A CN201711225988 A CN 201711225988A CN 110020147 A CN110020147 A CN 110020147A
- Authority
- CN
- China
- Prior art keywords
- data
- comment
- module
- model
- historical review
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a model generation method, a comment recognition method, and corresponding systems, devices and storage media. The model generation method comprises the following steps: S1, obtaining historical comment data; S2, labeling each item of historical comment data to generate first intermediate data, where each item of first intermediate data comprises the historical comment data and a corresponding label, the label being spam comment or valuable comment; S3, converting each item of first intermediate data into a historical comment sequence; S4, obtaining features, and inputting the historical comment sequences and the features into a recurrent neural network for model training, so as to generate a classification model for spam comments. The invention applies recurrent neural networks to spam-comment recognition: a classification model is trained from historical comment data, and new comments to be recognized are fed to the model to determine whether they are spam, which reduces recognition cost and improves the coverage and accuracy of spam-comment recognition.
Description
Technical field
The invention belongs to the field of spam-comment recognition, and in particular relates to a model generation method, a comment recognition method, and corresponding systems, devices and storage media for spam comments based on a recurrent neural network.
Background technique
With the development of the internet and artificial intelligence, the quantity and influence of online comments keep growing. Comments affect people in many fields and are especially important on the internet: effectively mining user information can further improve products, and for users, reading comments reveals the feedback of buyers who already own an item, helping them understand its strengths, weaknesses and value for money in time, and ultimately make a purchase decision. However, comments are often flooded with noise: some do not evaluate the item at all but post unrelated text such as poems, while others are advertisements, links, or even abusive language. Such comments are called spam comments, and recognizing them is a challenging and meaningful task.
In the prior art, spam comments are usually recognized with the analytic hierarchy process (AHP): an analyst determines feature weights from business experience and then writes a formula that scores each comment. This approach has three drawbacks. First, the labor cost is high, since experts with both business experience and product-analysis skills must take part in many evaluation steps, which runs against the machine-learning trend of the current era of artificial intelligence. Second, when the data volume is large, the calculation usually becomes more complex in order to guarantee the accuracy of the feature vector. Third, AHP is a statistical technique that estimates weights on small samples; it is largely qualitative rather than quantitative, so its results are not accurate enough.
Summary of the invention
The technical problem to be solved by the invention is to overcome the high labor cost, complex calculation and limited accuracy of prior-art spam-comment recognition by providing a recurrent-neural-network-based model generation method, comment recognition method, system, device and storage medium that improve the accuracy of spam-comment recognition.
The invention solves the above technical problem through the following technical solutions:
The invention provides a model generation method, characterized by comprising the following steps:
S1, obtaining historical comment data;
S2, labeling each item of historical comment data to generate first intermediate data, where each item of first intermediate data comprises the historical comment data and a corresponding label, the label being spam comment or valuable comment;
S3, converting each item of first intermediate data into a historical comment sequence;
S4, obtaining features, and inputting the historical comment sequences and the features into a recurrent neural network for model training, so as to generate a classification model for spam comments.
In this solution, the historical comment data is comment text; only after it is converted into comment sequences can the recurrent neural network train on it and recognize it.
In this solution, the historical comment data is first labeled to mark each item as a spam comment or a valuable comment; the features and the historical comment sequences corresponding to the labeled data are then fed into the recurrent neural network for training, finally producing a classification model for spam comments. This classification model can later be used to decide whether a new comment is spam.
In this solution, recurrent neural networks are applied to spam-comment recognition: a classification model suited to spam comments is trained from historical comment data, providing a decision on whether each subsequent new comment is spam. Because the classification model recognizes new comments automatically, manual participation is no longer needed, which reduces the labor cost of spam recognition.
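Steps S1–S3 above can be sketched as a small data pipeline. This is a minimal illustration, not the patent's implementation: the helper names (`annotate`, `to_sequence`) and the toy vocabulary are assumptions, and step S4's recurrent network is only indicated by a comment.

```python
# A minimal sketch of steps S1-S3: obtain comments, attach labels, and turn
# each labeled comment into a token-id sequence ready for RNN training.
SPAM, VALUABLE = "spam", "valuable"

def annotate(historical_comments, labels):
    """S2: pair each historical comment with its manual label."""
    assert len(historical_comments) == len(labels)
    return [{"comment": c, "label": l}
            for c, l in zip(historical_comments, labels)]

def to_sequence(item, vocab):
    """S3: convert a labeled comment into a token-id sequence."""
    tokens = item["comment"].split()
    return [vocab.setdefault(t, len(vocab)) for t in tokens], item["label"]

# S1: obtain historical comment data (toy stand-in).
comments = ["great phone fast delivery", "click this link now"]
labels = [VALUABLE, SPAM]

vocab = {}
intermediate = annotate(comments, labels)            # first intermediate data
sequences = [to_sequence(i, vocab) for i in intermediate]
# S4 would feed `sequences` plus engineered features into the RNN for training.
```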
Preferably, the recurrent neural network is an LSTM (Long Short-Term Memory) network.
Preferably, the model training in step S4 comprises a step of tuning core parameters, the core parameters comprising batch_size, num_steps, vocab_size, hidden_units and learning_rate.
In this solution, the recurrent neural network is an LSTM. Among the many parameters of an LSTM, the core parameters that affect the accuracy of spam-comment recognition are batch_size, num_steps, vocab_size, hidden_units and learning_rate. Here batch_size is the number of samples per batch of gradient-descent iteration, i.e. each training step takes batch_size samples from the training set, usually a power of 2; num_steps is the number of deep-learning steps, a positive integer; vocab_size is the size of the recurrent network's word sliding window; hidden_units is the number of deep-learning hidden-layer units; and learning_rate is the learning rate of the deep neural network.
In this solution, training data is selected and forward-propagated through the recurrent neural network to obtain predictions, and the parameters are updated by backpropagation; in this way the core parameters affecting spam-recognition accuracy, and the value of each, are picked out.
Preferably, batch_size is 64, num_steps is 100, vocab_size is 2, hidden_units is 8, and learning_rate is 0.001.
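To make the sizing concrete, here is a single LSTM-cell forward step in NumPy using the preferred batch_size and hidden_units. This is illustrative only: real training would use TensorFlow, and treating vocab_size as the per-step input dimension is an interpretative assumption.

```python
import numpy as np

batch_size, hidden_units, vocab_size = 64, 8, 2   # preferred values above
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step; gates stacked as [input, forget, cell, output]."""
    z = np.concatenate([x, h], axis=1) @ W + b        # (batch, 4*hidden)
    i, f, g, o = np.split(z, 4, axis=1)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # new cell state
    h_new = sigmoid(o) * np.tanh(c_new)               # new hidden state
    return h_new, c_new

W = rng.standard_normal((vocab_size + hidden_units, 4 * hidden_units)) * 0.1
b = np.zeros(4 * hidden_units)
x = rng.standard_normal((batch_size, vocab_size))   # one batch of inputs
h = np.zeros((batch_size, hidden_units))
c = np.zeros((batch_size, hidden_units))

h, c = lstm_step(x, h, c, W, b)
```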
Preferably, in step S4 the core parameters are tuned using TensorFlow (the second-generation artificial-intelligence learning system).
In this solution, distributed TensorFlow is used: multiple devices read the parameter values simultaneously, and once the backpropagation algorithm completes, the parameter values are updated synchronously. No single device updates the parameters alone; the system waits until all devices finish backpropagation and then updates the parameters in one step. In each iteration, each device receives a random slice of the data and computes gradients for its own training parameters; after all devices complete the backpropagation computation, the gradients from the different devices are averaged and the parameters are finally updated.
In this solution, training the model in parallel on multiple GPUs (Graphics Processing Units) through distributed TensorFlow handles large volumes of historical comment sequences with faster processing, which improves the user experience.
Preferably, the model generation method further comprises extracting features to generate the features.
Preferably, the features comprise comment features of the item and user features;
the comment features of the item comprise at least one of the following:
the commenter's comment ranking score, the distance of the comment creation time from the current time, the comment score, the number of likes on the comment, the number of replies to the comment, the comment length, the number of pictures in the comment, whether the comment has a follow-up comment, and the number of item tags included in the comment;
the user features comprise at least one of the following:
user gender, user purchasing-power level, user membership-level information, and user value score.
In this solution, features can be extracted from the historical comment data by feature engineering for use in training the recurrent neural network.
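The feature-engineering step can be sketched as assembling the listed comment and user features into one numeric vector. The field names and encodings below are illustrative assumptions; the patent only enumerates the features.

```python
def extract_features(comment, user):
    """Assemble the patent's listed features into a single numeric vector."""
    return [
        comment["ranking_score"],       # commenter's comment ranking score
        comment["age_days"],            # creation time's distance from now
        comment["score"],               # comment score
        comment["likes"],               # number of likes on the comment
        comment["replies"],             # number of replies to the comment
        comment["length"],              # comment length
        comment["pictures"],            # number of pictures in the comment
        int(comment["has_followup"]),   # whether there is a follow-up comment
        comment["tag_count"],           # number of item tags in the comment
        user["gender"],                 # user gender (encoded)
        user["purchasing_power"],       # user purchasing-power level
        user["member_level"],           # user membership level
        user["value_score"],            # user value score
    ]

vec = extract_features(
    {"ranking_score": 0.7, "age_days": 12, "score": 5, "likes": 3,
     "replies": 1, "length": 48, "pictures": 2, "has_followup": True,
     "tag_count": 1},
    {"gender": 1, "purchasing_power": 3, "member_level": 2, "value_score": 80},
)
```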
Preferably, between step S1 and step S2 the method further comprises LDA (Latent Dirichlet Allocation, a document topic model) topic clustering, the LDA topic clustering comprising the following steps:
T1, converting each item of historical comment data into a historical feature vector;
T2, obtaining the features, and inputting the historical feature vectors and the features into an LDA model for topic clustering, to obtain the number of historical feature vectors under each category of the LDA model;
T3, judging one by one whether the number of historical feature vectors under each category is less than a preset value; if so, executing step T4, otherwise executing step T5;
T4, labeling the historical comment data corresponding to the historical feature vectors under the categories whose number of historical feature vectors is less than the preset value, to generate second intermediate data, where each item of second intermediate data comprises the historical comment data and a corresponding label, the label in the second intermediate data being spam comment;
T5, setting the historical comment data corresponding to the historical feature vectors under the categories whose number of historical feature vectors is greater than or equal to the preset value as historical comment data to be labeled.
Step S2 then becomes: labeling each item of historical comment data to be labeled, to generate the first intermediate data, where each item of first intermediate data comprises the historical comment data to be labeled and a corresponding label, the label being spam comment or valuable comment.
Step S3 then becomes: converting each item of first intermediate data and each item of second intermediate data into a historical comment sequence.
In this solution, LDA topic clustering is performed right after the historical comment data is obtained: the historical comment data under sparsely populated comment categories is labeled spam directly, while the remaining historical comment data is labeled manually as spam or valuable before being fed into the recurrent neural network for training.
In this solution, supervised and unsupervised learning are combined at the machine-learning level: LDA topic clustering first provides heuristic spam labels for the deep-learning training set, the remaining collected data is then labeled with spam labels, and finally the labeled historical comment data is input into the recurrent neural network for model training and parameter tuning.
In this solution, clustering all the historical comment data with LDA topic clustering first allows a portion of the data to be determined as spam directly, reducing the labeling workload while also improving the accuracy of spam-comment recognition.
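The threshold logic of steps T3–T5 can be sketched in pure Python. The LDA step itself is replaced here by precomputed topic assignments, and the preset value and toy data are assumptions for illustration.

```python
from collections import Counter

def split_by_topic_size(comments, topic_of, preset=2):
    """T3-T5: comments in topics smaller than `preset` are labeled spam
    outright; the rest are set aside for manual labeling."""
    sizes = Counter(topic_of)                 # vectors per topic (T2 output)
    auto_spam, to_label = [], []
    for comment, topic in zip(comments, topic_of):
        if sizes[topic] < preset:             # T3: sparsely populated topic?
            auto_spam.append({"comment": comment, "label": "spam"})   # T4
        else:
            to_label.append(comment)          # T5: awaits manual labeling
    return auto_spam, to_label

comments = ["nice product", "good value", "buy pills here"]
topics = [0, 0, 7]                            # topic 7 holds a single comment
auto_spam, to_label = split_by_topic_size(comments, topics)
```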
Preferably, between step S1 and step S2 the method further comprises the step of:
performing data cleaning on the historical comment data;
and in step S2, each item of historical comment data after data cleaning is labeled, to generate the first intermediate data.
In this solution, a data-cleaning step for the historical comment data is also included, which may specifically comprise missing-value processing, outlier processing and heuristic processing of the comment data. As an example of outlier processing: comments normally carry at most tens of pictures, so a comment that occasionally carries 10,000 pictures is treated as outlier data and discarded. As an example of heuristic processing: a comment that contains no language at all, only punctuation marks and digits, is regarded as a spam comment; it can be labeled spam directly and later fed to the recurrent neural network.
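The two cleaning rules can be sketched as below. The picture-count threshold is an illustrative reading of the 10,000-picture example; the patent does not fix an exact cutoff.

```python
import string

MAX_PICTURES = 100   # assumption: normal comments carry at most tens of pictures

def clean(comments):
    """Drop picture-count outliers; auto-label punctuation/digit-only comments."""
    kept, auto_spam = [], []
    for c in comments:
        if c["pictures"] > MAX_PICTURES:      # outlier, e.g. 10,000 pictures
            continue                          # discard; this item is not used
        text = c["text"]
        if text and all(ch in string.punctuation + string.digits + " "
                        for ch in text):
            auto_spam.append(c)               # punctuation and digits only
        else:
            kept.append(c)
    return kept, auto_spam

comments = [
    {"text": "works great", "pictures": 3},
    {"text": "!!! 123 ...", "pictures": 0},
    {"text": "photo dump", "pictures": 10000},
]
kept, auto_spam = clean(comments)
```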
Preferably, step S3 comprises the following steps:
S31, computing the vector of each word in the first intermediate data using word2vec (a tool that represents words as real-valued vectors);
S32, averaging over all the words comprised in the first intermediate data to generate the historical comment sequence.
In this solution, the first intermediate data is passed through word2vec to produce word vectors, which are then averaged to generate the historical comment sequence; a sentence made of text is thus converted into a mathematical vector that is used in the subsequent steps.
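Steps S31–S32 amount to looking up one vector per word and averaging. In this sketch a toy embedding table stands in for the word2vec output; real vectors would be trained with word2vec.

```python
import numpy as np

embeddings = {                      # hypothetical word2vec output (2-d for clarity)
    "great": np.array([1.0, 0.0]),
    "phone": np.array([0.0, 1.0]),
    "spam":  np.array([-1.0, -1.0]),
}

def comment_vector(tokens, emb):
    """S32: average the word vectors of a comment into one sequence vector."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

seq = comment_vector(["great", "phone"], embeddings)
```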
The invention also provides a model generation system, characterized by comprising a data acquisition module, a first labeling module, a first data conversion module and a model training module;
the data acquisition module is used to obtain historical comment data;
the first labeling module is used to label each item of historical comment data to generate first intermediate data, where each item of first intermediate data comprises the historical comment data and a corresponding label, the label being spam comment or valuable comment;
the first data conversion module is used to convert each item of first intermediate data into a historical comment sequence;
the model training module is used to obtain features and input the historical comment sequences and the features into a recurrent neural network for model training, to generate a classification model for spam comments.
Preferably, the model training module further comprises a core-parameter tuning module; the core-parameter tuning module is used to tune core parameters comprising batch_size, num_steps, vocab_size, hidden_units and learning_rate.
Preferably, the model generation system further comprises a feature extraction module, the feature extraction module being used to generate the features.
Preferably, the model generation system further comprises an LDA topic clustering module, the LDA topic clustering module comprising a second data conversion module, a clustering execution module, a judgment module, a second labeling module and a data setting module;
the data acquisition module is also used to call the second data conversion module after obtaining the historical comment data;
the second data conversion module is used to convert each item of historical comment data into a historical feature vector and call the clustering execution module;
the clustering execution module is used to obtain the features, input the historical feature vectors and the features into an LDA model for topic clustering to obtain the number of historical feature vectors under each category of the LDA model, and call the judgment module;
the judgment module is used to judge one by one whether the number of historical feature vectors under each category is less than a preset value, calling the second labeling module if so and the data setting module otherwise;
the second labeling module is used to label the historical comment data corresponding to the historical feature vectors under the categories whose number of historical feature vectors is less than the preset value, to generate second intermediate data, where each item of second intermediate data comprises the historical comment data and a corresponding label, the label in the second intermediate data being spam comment;
the data setting module is used to set the historical comment data corresponding to the historical feature vectors under the categories whose number of historical feature vectors is greater than or equal to the preset value as historical comment data to be labeled;
the first labeling module is used to label each item of historical comment data to be labeled, to generate the first intermediate data, where each item of first intermediate data comprises the historical comment data to be labeled and a corresponding label, the label being spam comment or valuable comment;
the first data conversion module is used to convert each item of first intermediate data and each item of second intermediate data into a historical comment sequence.
Preferably, the model generation system further comprises a data cleaning module;
the data acquisition module is also used to call the data cleaning module after obtaining the historical comment data;
the data cleaning module is used to perform data cleaning on the historical comment data;
the first labeling module is used to label each item of historical comment data after data cleaning, to generate the first intermediate data.
Preferably, the first data conversion module comprises a word-vector generation module and a comment-sequence generation module;
the word-vector generation module is used to compute the vector of each word in the first intermediate data using word2vec;
the comment-sequence generation module is used to average over all the words comprised in the first intermediate data to generate the historical comment sequence.
The invention also provides a model generation device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that the processor implements the aforementioned model generation method when executing the program.
The invention also provides a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the aforementioned model generation method.
The invention also provides a comment recognition method, characterized by comprising the following steps:
L1, obtaining comment data to be recognized;
L2, converting the comment data to be recognized into a comment sequence to be recognized;
L3, inputting the comment sequence to be recognized into the classification model generated in step S4 of the aforementioned model generation method;
L4, the classification model judging whether the comment data corresponding to the comment sequence to be recognized is a spam comment.
In this solution, the comment data to be recognized is new comment data; once input into the classification model generated from the historical comment data, it is directly recognized as a spam comment or a valuable comment. The comment recognition method of this solution recognizes spam automatically, reducing recognition cost and improving the coverage and accuracy of spam-comment recognition. Moreover, with spam comments effectively recognized, only comments with reference value are shown to users, further improving the user experience.
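The recognition flow L1–L4 can be sketched as below. The keyword rule is only a stub standing in for the trained LSTM classification model; the function names are assumptions.

```python
def to_sequence(comment):
    """L2: convert the comment to be recognized into a sequence."""
    return comment.lower().split()

def classification_model(sequence):
    """Stub for the model produced in step S4; returns True for spam."""
    spam_markers = {"http", "link", "buy"}      # toy decision rule, not the LSTM
    return any(tok in spam_markers for tok in sequence)

def is_spam(comment):
    """L3-L4: feed the sequence to the model and return its judgment."""
    return classification_model(to_sequence(comment))

verdict = is_spam("Click this link for free phones")
```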
The invention also provides a comment recognition system, characterized by comprising a to-be-recognized data acquisition module, a sequence generation module, an input module and the aforementioned model generation system;
the to-be-recognized data acquisition module is used to obtain comment data to be recognized;
the sequence generation module is used to convert the comment data to be recognized into a comment sequence to be recognized;
the input module is used to input the comment sequence to be recognized into the classification model;
the classification model is used to judge whether the comment data corresponding to the comment sequence to be recognized is a spam comment.
The invention also provides a comment recognition device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that the processor implements the aforementioned comment recognition method when executing the program.
The invention also provides a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the aforementioned comment recognition method.
The positive effect of the invention is that the model generation and comment recognition methods, systems, devices and storage media provided by the invention apply recurrent neural networks to spam-comment recognition: a classification model suited to spam comments is trained from historical comment data, and new comments to be recognized are then fed to the classification model to determine whether they are spam. The invention recognizes comments to be recognized automatically, reducing recognition cost and improving the coverage and accuracy of spam-comment recognition. Moreover, with spam comments effectively recognized, only comments with reference value are shown to users, further improving the user experience.
Brief description of the drawings
Fig. 1 is a flowchart of the model generation method of embodiment 1 of the invention.
Fig. 2 is a flowchart of step 108 in embodiment 1 of the invention.
Fig. 3 is a flowchart of distributed model training in embodiment 1 of the invention.
Fig. 4 is a module diagram of the model generation system of embodiment 2 of the invention.
Fig. 5 is a hardware structure diagram of the model generation device of embodiment 4 of the invention.
Fig. 6 is a flowchart of the comment recognition method of embodiment 5 of the invention.
Fig. 7 is a module diagram of the comment recognition system of embodiment 6 of the invention.
Specific embodiments
The invention is further illustrated below by way of embodiments, but the invention is not thereby limited to the scope of the described embodiments.
Embodiment 1
As shown in Fig. 1, the model generation method provided in this embodiment comprises the following steps:
Step 101, obtaining historical comment data;
Step 102, performing data cleaning on the historical comment data;
Step 103, performing feature extraction on the cleaned historical comment data to obtain features;
Step 104, converting each item of cleaned historical comment data into a historical feature vector;
Step 105, inputting the historical feature vectors and the features into an LDA model for topic clustering, to obtain the number of historical feature vectors under each category of the LDA model;
Step 106, judging one by one whether the number of historical feature vectors under each category is less than a preset value; if so, labeling the historical comment data corresponding to the historical feature vectors under that category to generate second intermediate data, where each item of second intermediate data comprises the historical comment data and a corresponding label, the label in the second intermediate data being spam comment; otherwise, setting the historical comment data corresponding to the historical feature vectors under that category as historical comment data to be labeled;
Step 107, labeling each item of historical comment data to be labeled, to generate first intermediate data, where each item of first intermediate data comprises the historical comment data to be labeled and a corresponding label, the label being spam comment or valuable comment;
Step 108, converting each item of first intermediate data and each item of second intermediate data into a historical comment sequence;
Step 109, inputting the historical comment sequences and the features into an LSTM for model training, tuning the core parameters with TensorFlow, to generate the classification model for spam comments.
The process of step 108 is shown in Fig. 2 and comprises the following steps:
Step 1081, computing, with word2vec, on the first intermediate data and the second intermediate data respectively, to obtain the vector of each word;
Step 1082, averaging over all the words comprised in the first intermediate data and the second intermediate data respectively, to generate the historical comment sequences.
In this embodiment, the core parameters comprise batch_size, num_steps, vocab_size, hidden_units and learning_rate.
Here batch_size is the number of samples per batch of gradient-descent iteration in the LSTM, i.e. each training step takes batch_size samples from the training set; batch_size is 64 in this embodiment.
num_steps is the number of deep-learning steps in the LSTM; its value is 100 in this embodiment.
vocab_size is the size of the recurrent network's word sliding window in the LSTM; its value is 2 in this embodiment.
hidden_units is the number of deep-learning hidden-layer units in the LSTM; its value is 8 in this embodiment.
learning_rate is the learning rate of the deep neural network in the LSTM; its value is 0.001 in this embodiment.
This embodiment further involves five LSTM parameters, data_dir, ps_hosts, worker_hosts, job_name and tf.device, which are specified directly in this embodiment. Here data_dir is the training-data path, divided in this embodiment into a training set, a validation set and a test set; ps_hosts denotes the machines of the distributed TensorFlow cluster responsible for receiving parameters; worker_hosts denotes the machines of the distributed TensorFlow cluster responsible for computing the training model; job_name denotes the name of the application task started for model training; and tf.device specifies whether the GPU or the CPU (Central Processing Unit) is used during training.
In this embodiment, the features comprise comment features of the item and user features. The comment features of the item comprise the commenter's comment ranking score, the distance of the comment creation time from the current time, the comment score (good, medium or poor), the number of likes on the comment, the number of replies to the comment, the comment length, the number of pictures in the comment, whether the comment has a follow-up comment, and the number of item tags included in the comment. The user features comprise user gender, user purchasing-power level, user membership-level information and user value score, where the user membership-level information indicates, for example, whether the user is a JD PLUS member (a membership tier of JD.com).
In this embodiment, data cleansing is first performed on the historical comment data. Data cleansing specifically includes missing-value processing, outlier processing, and heuristic processing of the comment data. As an example of outlier processing: comment picture counts are normally within the range of a few dozen; if a comment occasionally has 10,000 pictures, that comment's picture data is considered an outlier and is discarded, i.e., the comment is not used. As an example of heuristic processing: if a comment consists entirely of punctuation marks and digits, without any actual language, it is considered a spam comment; it may also be directly labeled as spam and subsequently fed to the recurrent neural network.
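The two cleansing rules just described — discarding outlier picture counts and flagging punctuation/digit-only text as spam — might be sketched as follows. The 1,000-picture cutoff and the regular expression are illustrative assumptions; the embodiment only gives the examples of "a few dozen" being normal and 10,000 being abnormal.

```python
import re

PICTURE_OUTLIER_THRESHOLD = 1000  # assumed cutoff; "a few dozen" is normal

def clean_picture_count(count):
    """Outlier processing: drop an abnormal picture count (e.g. 10,000)
    by returning None so the field is not used."""
    return None if count > PICTURE_OUTLIER_THRESHOLD else count

def is_heuristic_spam(text):
    """Heuristic processing: a comment made up entirely of punctuation
    and digits, with no actual language, is treated as spam."""
    return bool(re.fullmatch(r"[\W\d_]+", text))

print(clean_picture_count(10000))                       # None -> field discarded
print(is_heuristic_spam("!!! 123 ???"))                 # True  -> spam
print(is_heuristic_spam("Great phone, fast delivery"))  # False -> kept
```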
In this embodiment, the cleansed historical comment data is converted into historical feature vectors and then clustered by LDA topic modeling. Specifically, under the resulting topic distribution, historical comment data falling into categories with very few comments is directly labeled as spam, while the other historical comment data is set as historical comment data to be labeled; labels are then applied to determine whether each such comment is spam or a valuable comment, after which the data is input into the recurrent neural network for training. In this way, a portion of the historical comment data can be determined to be spam directly, reducing the labeling workload while also improving the accuracy of spam recognition.
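The cluster-size heuristic above — directly labeling comments in sparsely populated LDA topics as spam and leaving the rest for manual labeling — can be sketched independently of any particular LDA library. The preset value of 5 is an illustrative assumption; the embodiment only says "very few comments".

```python
from collections import Counter

PRESET_VALUE = 5  # illustrative threshold for "very few comments"

def split_by_cluster_size(cluster_ids):
    """Given the LDA topic assigned to each comment, return
    (auto_spam_indices, to_be_labeled_indices). Comments in topics
    with fewer than PRESET_VALUE members are labeled spam directly."""
    sizes = Counter(cluster_ids)
    auto_spam, to_label = [], []
    for idx, cid in enumerate(cluster_ids):
        (auto_spam if sizes[cid] < PRESET_VALUE else to_label).append(idx)
    return auto_spam, to_label

clusters = [0, 0, 0, 0, 0, 1, 1, 2]  # topics 1 and 2 are sparsely populated
spam, pending = split_by_cluster_size(clusters)
print(spam)     # indices labeled spam automatically
print(pending)  # indices sent for manual labeling
```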
In this embodiment, after the historical comment data has successively undergone data cleansing, LDA topic clustering, labeling, and setting, the historical comment data set as to-be-labeled still needs to be labeled manually to determine whether each such comment is spam or a valuable comment. Data conversion is then performed on the labeled first intermediate data, converting the text data into vectors that the LSTM can process. Specifically, word2vec is first applied to the first intermediate data and the second intermediate data to compute a vector for each word; all the word vectors within a single user comment are then averaged to produce the historical comment sequence, i.e., a single vector, thereby converting a sentence composed of text into a mathematical vector. The features and the historical comment sequences are then input to the LSTM for training, with the core parameters tuned using TensorFlow, ultimately producing the classification and recognition model for spam comments. This model is subsequently used to identify whether new comment data is spam.
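The text-to-vector conversion just described — word2vec per word, then averaging over the comment — can be sketched with toy embeddings. The 3-dimensional vectors below stand in for real word2vec output, and the zero-vector fallback for unknown words is an assumption.

```python
# Toy word embeddings standing in for word2vec output (3-d for brevity).
EMBEDDINGS = {
    "great":   [0.9, 0.1, 0.0],
    "phone":   [0.2, 0.8, 0.1],
    "rubbish": [0.0, 0.1, 0.9],
}
UNKNOWN = [0.0, 0.0, 0.0]  # assumed fallback for out-of-vocabulary words

def comment_to_vector(text):
    """Average the word vectors of every word in one comment, producing
    the single comment vector that the LSTM consumes."""
    words = text.lower().split()
    vecs = [EMBEDDINGS.get(w, UNKNOWN) for w in words]
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(3)]

v = comment_to_vector("great phone")
print(v)  # element-wise mean of the two word vectors
```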
In this embodiment, model training is carried out using distributed TensorFlow. As shown in Figure 3, multiple devices read the parameter values simultaneously, and after the back-propagation algorithm completes, the parameter values are updated synchronously: no single device updates the parameters independently; instead, all devices wait until every device has finished back-propagation, and only then are the parameters updated uniformly. In each round of iteration, each device randomly obtains a small portion of the data and computes the gradients of its training parameters; once all devices have completed the back-propagation computation, the average of the parameter gradients across the devices is computed and the parameters are finally updated.
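The synchronous update scheme described here — every device computes a gradient on its own data shard, the gradients are averaged, and a single update is applied — reduces to the following sketch, with plain Python standing in for distributed TensorFlow.

```python
def synchronized_update(param, device_gradients, learning_rate=0.001):
    """One round of synchronous data-parallel training: wait for every
    device's gradient, average them, then apply a single update."""
    avg_grad = sum(device_gradients) / len(device_gradients)
    return param - learning_rate * avg_grad

# Three devices each computed a gradient on its own mini-batch.
new_param = synchronized_update(1.0, [0.3, 0.6, 0.9])
print(new_param)  # 1.0 - 0.001 * mean(gradients)
```

The key property is that all devices see the same parameter value in every round; asynchronous schemes trade this consistency for throughput.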
In this embodiment, distributed TensorFlow with multiple GPUs trains the model in parallel, processing large volumes of historical comment sequences at higher speed and thereby improving the user experience.
In this embodiment, a recurrent neural network is applied to spam-comment recognition, the recurrent neural network being an LSTM. Historical comment data is used to train a classification and recognition model suitable for spam comments, providing a decision basis for identifying whether subsequent new comment data is spam. With this model, whether new comment data is spam can be recognized automatically, no longer relying on manual participation, thereby reducing the labor cost of spam recognition.
In this embodiment, in terms of machine-learning algorithms, supervised and unsupervised learning are combined: LDA topic clustering first provides heuristic spam-label recognition for the deep-learning training set; spam labels are then applied to the remaining data; finally, the labeled historical comment data is input to the recurrent neural network for model training and parameter tuning.
Embodiment 2
As shown in Figure 4, the model generation system of this embodiment includes a data acquisition module 1, a data cleansing module 2, a feature extraction module 3, an LDA topic clustering module 4, a first label annotation module 5, a first data conversion module 6, and a model training module 7.
The model training module 7 further includes a core parameter tuning module 701.
The LDA topic clustering module 4 includes a second data conversion module 401, a clustering execution module 402, a judgment module 403, a second label annotation module 404, and a data setting module 405.
The first data conversion module 6 includes a word vector generation module 601 and an evaluation sequence generation module 602.
The data acquisition module 1 is used to obtain historical comment data and to call the data cleansing module 2 after the historical comment data has been obtained.
The data cleansing module 2 is used to perform data cleansing on the historical comment data.
The feature extraction module 3 is used to perform feature extraction on the cleansed historical comment data to obtain the features, and to call the second data conversion module 401.
The second data conversion module 401 is used to convert each item of cleansed historical comment data into a historical feature vector, and to call the clustering execution module 402.
The clustering execution module 402 is used to input the historical feature vectors and the features into the LDA model for topic clustering, to obtain the number of historical feature vectors under each category of the LDA model, and to call the judgment module 403.
The judgment module 403 is used to judge, one by one, whether the number of historical feature vectors under each category is less than a preset value; if so, the second label annotation module 404 is called; otherwise, the data setting module 405 is called. The second label annotation module 404 is used to label the historical comment data corresponding to the historical feature vectors under categories where the number of historical feature vectors is less than the preset value, to generate second intermediate data; each item of second intermediate data includes the historical comment data and the corresponding label, the label in the second intermediate data being spam comment. The data setting module 405 is used to set, as historical comment data to be labeled, the historical comment data corresponding to the historical feature vectors under categories where the number of historical feature vectors is greater than or equal to the preset value. The judgment module 403 calls the first label annotation module 5 after processing the historical feature vectors under all categories.
The first label annotation module 5 is used to label each item of historical comment data to be labeled, to generate first intermediate data; each item of first intermediate data includes the historical comment data to be labeled and the corresponding label, the label being spam comment or valuable comment. The first label annotation module 5 calls the first data conversion module 6 after labeling is complete.
The first data conversion module 6 is used to convert each item of first intermediate data and each item of second intermediate data into a historical comment sequence and to call the model training module 7.
The model training module 7 is used to input the historical comment sequences and the features into the recurrent neural network for model training; the core parameter tuning module 701 tunes the core parameters using TensorFlow, to generate the classification and recognition model for spam comments.
The word vector generation module 601 is used to compute the vector of each word in the first intermediate data and the second intermediate data, respectively, using word2vec, and to call the evaluation sequence generation module 602.
The evaluation sequence generation module 602 is used to average all the words included in the first intermediate data and the second intermediate data, respectively, to generate the historical comment sequences.
In this embodiment, the recurrent neural network is an LSTM.
In this embodiment, the core parameters include batch_size, num_steps, vocab_size, hidden_units, and learning_rate. The batch_size value is 64, the num_steps value is 100, the vocab_size value is 2, the hidden_units value is 8, and the learning_rate value is 0.001.
In this embodiment, the features include comment features of the commodity and user features. The comment features of the commodity include the reviewer's comment ranking score, the distance of the comment creation time from the current time, the comment score, the number of likes the comment received, the number of replies to the comment, the comment length, the number of pictures in the comment, whether the comment has a follow-up comment, and the number of commodity tags included in the comment. The user features include the user's gender, the user's purchasing-power grade, the user's membership-level information, and the user's value points.
The model generation system provided in this embodiment applies a recurrent neural network to spam-comment recognition, the recurrent neural network being an LSTM. Historical comment data can be used to train a classification and recognition model suitable for spam comments, providing a decision basis for identifying whether subsequent new comment data is spam. With this model, whether new comment data is spam can be recognized automatically, no longer relying on manual participation, thereby reducing the labor cost of spam recognition.
Embodiment 3
Fig. 5 is a structural schematic diagram of a model generation device provided by Embodiment 3 of the present invention. Fig. 5 shows a block diagram of an exemplary model generation device 50 suitable for implementing embodiments of the present invention. The model generation device 50 shown in Fig. 5 is only an example and should not impose any restrictions on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 5, the model generation device 50 may take the form of a general-purpose computing device, such as a server device. The components of the model generation device 50 may include, but are not limited to: at least one processor 51, at least one memory 52, and a bus 53 connecting the different system components (including the memory 52 and the processor 51).
The bus 53 includes a data bus, an address bus, and a control bus.
The memory 52 may include volatile memory, such as random access memory (RAM) 521 and/or cache memory 522, and may further include read-only memory (ROM) 523.
The memory 52 may also include a program/utility 525 having a set of (at least one) program modules 524, such program modules 524 including but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
The processor 51 executes the computer program stored in the memory 52, thereby performing various functional applications and data processing, such as the model generation method provided by Embodiment 1 of the present invention.
The model generation device 50 may also communicate with one or more external devices 54 (such as a keyboard, a pointing device, etc.). This communication may take place through an input/output (I/O) interface 55. Moreover, the model generation device 50 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 56. As shown, the network adapter 56 communicates with the other modules of the model generation device 50 through the bus 53. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the model generation device 50, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems.
It should be noted that although several units/modules or sub-units/modules of the model generation device are mentioned in the detailed description above, such division is merely exemplary and not mandatory. In fact, according to embodiments of the present application, the features and functions of two or more of the units/modules described above may be embodied in a single unit/module; conversely, the features and functions of one unit/module described above may be further divided and embodied by multiple units/modules.
Embodiment 4
This embodiment provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the model generation method provided by Embodiment 1 are realized.
Embodiment 5
As shown in Fig. 6, the comment recognition method of this embodiment comprises the following steps:
Step M1: obtaining comment data to be recognized;
Step M2: converting the comment data to be recognized into a comment sequence to be recognized;
Step M3: inputting the comment sequence to be recognized into the classification and recognition model generated in step 109 of the method described in Embodiment 1;
Step M4: the classification and recognition model judging whether the comment data to be recognized corresponding to the comment sequence to be recognized is a spam comment.
In this embodiment, the comment data to be recognized is new comment data; after this data is input into the classification and recognition model that Embodiment 1 generated from the historical comment data, it can be directly identified as a spam comment or a valuable comment.
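The recognition flow of steps M1–M4 — convert a new comment to a vector, feed it through the trained model, obtain a spam/valuable decision — can be sketched end to end with a trivial stand-in classifier. The threshold rule and toy embeddings below are purely illustrative; in the actual method the classifier is the trained LSTM.

```python
def to_sequence(text, embeddings, dim=3):
    """Step M2: average word vectors into one comment vector."""
    vecs = [embeddings.get(w, [0.0] * dim) for w in text.lower().split()]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def classify(vector, threshold=0.5):
    """Steps M3-M4: a stand-in for the trained LSTM classifier.
    Here 'spam' is declared when the last component dominates."""
    return "spam" if vector[-1] > threshold else "valuable"

EMB = {"great": [0.9, 0.1, 0.0], "rubbish": [0.0, 0.1, 0.9]}
print(classify(to_sequence("rubbish rubbish", EMB)))  # spam
print(classify(to_sequence("great", EMB)))            # valuable
```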
The comment recognition method provided in this embodiment can automatically identify whether comment data to be recognized is a spam comment, reducing the recognition cost and improving the coverage and accuracy of spam-comment recognition. In addition, with spam comments effectively recognized, the comments shown to users all have reference value, further improving the user experience.
Embodiment 6
As shown in Fig. 7, the comment recognition system of this embodiment includes a to-be-recognized data acquisition module 8, a sequence generation module 9, an input module 10, and the model generation system 11 described in Embodiment 2.
The to-be-recognized data acquisition module 8 is used to obtain comment data to be recognized.
The sequence generation module 9 is used to convert the comment data to be recognized into a comment sequence to be recognized.
The input module 10 is used to input the comment sequence to be recognized into the classification and recognition model.
The classification and recognition model is used to judge whether the comment data to be recognized corresponding to the comment sequence to be recognized is a spam comment.
The comment recognition system provided in this embodiment can automatically identify whether comment data to be recognized is a spam comment, reducing the recognition cost and improving the coverage and accuracy of spam-comment recognition. In addition, with spam comments effectively recognized, the comments shown to users all have reference value, further improving the user experience.
Embodiment 7
This embodiment provides a comment recognition device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the program, the processor realizes the comment recognition method provided by Embodiment 5.
Embodiment 8
This embodiment provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the comment recognition method provided by Embodiment 5 are realized.
Although specific embodiments of the present invention have been described above, it will be appreciated by those skilled in the art that these are examples only, and the protection scope of the present invention is defined by the appended claims. Those skilled in the art may make various changes and modifications to these embodiments without departing from the principle and substance of the present invention, and all such changes and modifications fall within the protection scope of the present invention.
Claims (22)
1. A model generation method, characterized by comprising the following steps:
S1: obtaining historical comment data;
S2: labeling each item of historical comment data to generate first intermediate data, each item of first intermediate data including the historical comment data and a corresponding label, the label being spam comment or valuable comment;
S3: converting each item of first intermediate data into a historical comment sequence;
S4: obtaining features, and inputting the historical comment sequences and the features into a recurrent neural network for model training, to generate a classification and recognition model for spam comments.
2. The model generation method of claim 1, wherein the recurrent neural network is an LSTM.
3. The model generation method of claim 1, wherein performing model training in step S4 includes a step of tuning core parameters, the core parameters including batch_size, num_steps, vocab_size, hidden_units, and learning_rate.
4. The model generation method of claim 3, wherein batch_size is 64, num_steps is 100, vocab_size is 2, hidden_units is 8, and learning_rate is 0.001.
5. The model generation method of claim 3, wherein in step S4 the core parameters are tuned using TensorFlow.
6. The model generation method of claim 1, wherein the model generation method further includes extracting features to generate the features.
7. The model generation method of claim 6, wherein the features include comment features of a commodity and user features;
the comment features of the commodity include at least one of the following features:
the reviewer's comment ranking score, the distance of the comment creation time from the current time, the comment score, the number of likes the comment received, the number of replies to the comment, the comment length, the number of pictures in the comment, whether the comment has a follow-up comment, and the number of commodity tags included in the comment;
the user features include at least one of the following features:
the user's gender, the user's purchasing-power grade, the user's membership-level information, and the user's value points.
8. The model generation method of claim 7, wherein LDA topic clustering is further included between step S1 and step S2, the LDA topic clustering comprising the following steps:
T1: converting each item of historical comment data into a historical feature vector;
T2: obtaining the features, and inputting the historical feature vectors and the features into an LDA model for topic clustering, to obtain the number of historical feature vectors under each category of the LDA model;
T3: judging, one by one, whether the number of historical feature vectors under each category is less than a preset value; if so, performing step T4; if not, performing step T5;
T4: labeling the historical comment data corresponding to the historical feature vectors under categories where the number of historical feature vectors is less than the preset value, to generate second intermediate data, each item of second intermediate data including the historical comment data and a corresponding label, the label in the second intermediate data being spam comment;
T5: setting, as historical comment data to be labeled, the historical comment data corresponding to the historical feature vectors under categories where the number of historical feature vectors is greater than or equal to the preset value;
step S2 being:
labeling each item of historical comment data to be labeled to generate the first intermediate data, each item of first intermediate data including the historical comment data to be labeled and a corresponding label, the label being spam comment or valuable comment;
step S3 being: converting each item of first intermediate data and each item of second intermediate data into historical comment sequences.
9. The model generation method of claim 1, wherein the following step is further included between step S1 and step S2:
performing data cleansing on the historical comment data;
and wherein, in step S2, each item of historical comment data after data cleansing is labeled to generate the first intermediate data.
10. The model generation method of claim 1, wherein step S3 comprises the following steps:
S31: computing the vector of each word in the first intermediate data using word2vec;
S32: averaging all the words included in the first intermediate data to generate the historical comment sequence.
11. A model generation system, characterized by including a data acquisition module, a first label annotation module, a first data conversion module, and a model training module;
the data acquisition module being used to obtain historical comment data;
the first label annotation module being used to label each item of historical comment data to generate first intermediate data, each item of first intermediate data including the historical comment data and a corresponding label, the label being spam comment or valuable comment;
the first data conversion module being used to convert each item of first intermediate data into a historical comment sequence;
the model training module being used to obtain features and to input the historical comment sequences and the features into a recurrent neural network for model training, to generate a classification and recognition model for spam comments.
12. The model generation system of claim 11, wherein the model training module further includes a core parameter tuning module;
the core parameter tuning module being used to tune core parameters, the core parameters including batch_size, num_steps, vocab_size, hidden_units, and learning_rate.
13. The model generation system of claim 11, wherein the model generation system further includes a feature extraction module, the feature extraction module being used to generate the features.
14. The model generation system of claim 13, wherein the model generation system further includes an LDA topic clustering module, the LDA topic clustering module including a second data conversion module, a clustering execution module, a judgment module, a second label annotation module, and a data setting module;
the data acquisition module also being used to call the second data conversion module after obtaining the historical comment data;
the second data conversion module being used to convert each item of historical comment data into a historical feature vector and to call the clustering execution module;
the clustering execution module being used to obtain the features, to input the historical feature vectors and the features into an LDA model for topic clustering, to obtain the number of historical feature vectors under each category of the LDA model, and to call the judgment module;
the judgment module being used to judge, one by one, whether the number of historical feature vectors under each category is less than a preset value; if so, the second label annotation module is called; if not, the data setting module is called;
the second label annotation module being used to label the historical comment data corresponding to the historical feature vectors under categories where the number of historical feature vectors is less than the preset value, to generate second intermediate data, each item of second intermediate data including the historical comment data and a corresponding label, the label in the second intermediate data being spam comment;
the data setting module being used to set, as historical comment data to be labeled, the historical comment data corresponding to the historical feature vectors under categories where the number of historical feature vectors is greater than or equal to the preset value;
the first label annotation module being used to label each item of historical comment data to be labeled, to generate the first intermediate data, each item of first intermediate data including the historical comment data to be labeled and a corresponding label, the label being spam comment or valuable comment;
the first data conversion module being used to convert each item of first intermediate data and each item of second intermediate data into historical comment sequences.
15. The model generation system of claim 11, wherein the model generation system further includes a data cleansing module;
the data acquisition module also being used to call the data cleansing module after obtaining the historical comment data;
the data cleansing module being used to perform data cleansing on the historical comment data;
the first label annotation module being used to label each item of historical comment data after data cleansing, to generate the first intermediate data.
16. The model generation system of claim 11, wherein the first data conversion module includes a word vector generation module and an evaluation sequence generation module;
the word vector generation module being used to compute the vector of each word in the first intermediate data using word2vec;
the evaluation sequence generation module being used to average all the words included in the first intermediate data to generate the historical comment sequence.
17. A model generation device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, realizes the model generation method of any one of claims 1 to 10.
18. A computer-readable storage medium on which a computer program is stored, wherein, when executed by a processor, the program realizes the steps of the model generation method of any one of claims 1 to 10.
19. A comment recognition method, characterized by comprising the following steps:
L1: obtaining comment data to be recognized;
L2: converting the comment data to be recognized into a comment sequence to be recognized;
L3: inputting the comment sequence to be recognized into the classification and recognition model generated in step S4 of the model generation method of any one of claims 1 to 10;
L4: the classification and recognition model judging whether the comment data to be recognized corresponding to the comment sequence to be recognized is a spam comment.
20. A comment recognition system, characterized by including a to-be-recognized data acquisition module, a sequence generation module, an input module, and the model generation system of any one of claims 11 to 16;
the to-be-recognized data acquisition module being used to obtain comment data to be recognized;
the sequence generation module being used to convert the comment data to be recognized into a comment sequence to be recognized;
the input module being used to input the comment sequence to be recognized into the classification and recognition model;
the classification and recognition model being used to judge whether the comment data to be recognized corresponding to the comment sequence to be recognized is a spam comment.
21. A comment recognition device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, realizes the comment recognition method of claim 19.
22. A computer-readable storage medium on which a computer program is stored, wherein, when executed by a processor, the program realizes the steps of the comment recognition method of claim 19.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711225988.1A CN110020147A (en) | 2017-11-29 | 2017-11-29 | Model generates, method for distinguishing, system, equipment and storage medium are known in comment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110020147A true CN110020147A (en) | 2019-07-16 |
Family
ID=67185904
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711225988.1A Pending CN110020147A (en) | 2017-11-29 | 2017-11-29 | Model generates, method for distinguishing, system, equipment and storage medium are known in comment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110020147A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079813A (en) * | 2019-12-10 | 2020-04-28 | 北京百度网讯科技有限公司 | Classification model calculation method and device based on model parallelism |
2017-11-29: application CN201711225988.1A filed in China (CN); published as CN110020147A (en); status: active, Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104239539A (en) * | 2013-09-22 | 2014-12-24 | 中科嘉速(北京)并行软件有限公司 | Microblog information filtering method based on multi-information fusion |
CN103996130A (en) * | 2014-04-29 | 2014-08-20 | 北京京东尚科信息技术有限公司 | Goods evaluation information filtering method and system |
US20160267377A1 (en) * | 2015-03-12 | 2016-09-15 | Staples, Inc. | Review Sentiment Analysis |
CN105354216A (en) * | 2015-09-28 | 2016-02-24 | 哈尔滨工业大学 | Chinese microblog topic information processing method |
CN106844349A (en) * | 2017-02-14 | 2017-06-13 | 广西师范大学 | Comment spam recognition method based on co-training |
CN107153642A (en) * | 2017-05-16 | 2017-09-12 | 华北电力大学 | Analysis method for recognizing the sentiment orientation of text comments based on a neural network |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079813A (en) * | 2019-12-10 | 2020-04-28 | 北京百度网讯科技有限公司 | Classification model calculation method and device based on model parallelism |
CN111079813B (en) * | 2019-12-10 | 2023-07-07 | 北京百度网讯科技有限公司 | Classification model calculation method and device based on model parallelism |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110111139B (en) | Behavior prediction model generation method and device, electronic equipment and readable medium | |
CN110610193A (en) | Method and device for processing labeled data | |
CN110263979B (en) | Method and device for predicting sample label based on reinforcement learning model | |
JP6835204B2 (en) | Learning material recommendation method, learning material recommendation device and learning material recommendation program | |
CN109472305A (en) | Answer quality determination model training method, answer quality determination method and device | |
CN115002200B (en) | Message pushing method, device, equipment and storage medium based on user portrait | |
CN112182362A (en) | Method and device for training model for online click rate prediction and recommendation system | |
CN108932648A (en) | Method and apparatus for predicting item attribute data and training a model therefor | |
Isljamovic et al. | Predicting students’ academic performance using artificial neural network: a case study from faculty of organizational sciences | |
CN110458600A (en) | Portrait model training method, device, computer equipment and storage medium | |
CN113742069A (en) | Capacity prediction method and device based on artificial intelligence and storage medium | |
CN110020147A (en) | Comment recognition model generation and recognition method, system, device, and storage medium | |
CN113705159A (en) | Merchant name labeling method, device, equipment and storage medium | |
CN108460049A (en) | Method and system for determining an information category | |
CN115167965A (en) | Transaction progress bar processing method and device | |
CN107291722B (en) | Descriptor classification method and device | |
Falessi et al. | The effort savings from using NLP to classify equivalent requirements | |
CN113837220A (en) | Robot target identification method, system and equipment based on online continuous learning | |
Wei et al. | Software project schedule management using machine learning & data mining | |
CN112446360A (en) | Target behavior detection method and device and electronic equipment | |
CN110807179A (en) | User identification method, device, server and storage medium | |
KR102546328B1 (en) | Method, device and system for providing content information monitoring and content planning automation solution for online marketing | |
Madan et al. | Comparison of benchmarks for machine learning cloud infrastructures | |
Waqas | A simulation-based approach to test the performance of large-scale real time software systems | |
Sabnis et al. | UPreG: An Unsupervised Approach for Building the Concept Prerequisite Graph. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication ||
Application publication date: 20190716 |