CN110119770A - Decision-tree model construction method, device, electronic equipment and medium - Google Patents
Decision-tree model construction method, device, electronic equipment and medium
- Publication number: CN110119770A (application number CN201910349851.XA)
- Authority: CN (China)
- Prior art keywords: answer, text, answer text, decision, feature
- Prior art date: 2019-04-28
- Legal status: Granted
Classifications
- G06F18/214 — Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24323 — Pattern recognition; Analysing; Classification techniques relating to the number of classes; Tree-organised classifiers
- G06F40/216 — Handling natural language data; Natural language analysis; Parsing using statistical methods
- G06F40/242 — Handling natural language data; Natural language analysis; Lexical tools; Dictionaries

(All classifications fall under G — Physics; G06 — Computing, Calculating or Counting; G06F — Electric digital data processing.)
Abstract
An embodiment of the present application provides a decision-tree model construction method, an apparatus, an electronic device, and a medium. The method comprises: constructing a bag-of-words model from training text; establishing a first decision-tree model according to the first feature value of each answer text contained in the bag-of-words model and the answer-score label set for each answer text, and obtaining the importance values of each answer text's word features output by the first decision-tree model; screening out, according to those importance values, the keyword features that satisfy a preset condition from the word features of each answer text; and establishing a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text derived from the keyword features and the answer-score label set for each answer text. With the present application, the interpretability of the model can be guaranteed while score-prediction accuracy is improved.
Description
Technical field
This application relates to the field of deep learning, and in particular to a decision-tree model construction method, an apparatus, an electronic device, and a medium.
Background technique
With the development of science and technology, intelligent scoring systems have emerged to save the trouble of manual scoring, and they are used more and more widely in institutions such as schools and enterprises. Staff can manually formulate rules in an intelligent scoring system, and the system then scores answers according to those manually formulated rules; however, the score-prediction accuracy achieved this way is limited. To improve score-prediction accuracy, some staff score answers with the machine-learning method of logistic regression. Although logistic regression can achieve higher score-prediction accuracy, the interpretability of a model obtained this way is low.
Summary of the invention
Embodiments of the present application provide a decision-tree model construction method, an apparatus, an electronic device, and a medium, which can guarantee the interpretability of the model while improving score-prediction accuracy.
In a first aspect, an embodiment of the present application provides a decision-tree model construction method, comprising:

constructing a bag-of-words model from training text, the bag-of-words model comprising the first feature value of each answer text in the training text;

establishing a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and obtaining the importance values of each answer text's word features output by the first decision-tree model;

screening out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and deriving the second feature value of each answer text from the keyword features;

establishing a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
Optionally, after the second decision-tree model is established, the method further comprises: when answer-score prediction is needed for a target answer text, taking the target answer text as input data of the second decision-tree model; and outputting the scoring-result information of the target answer text through the second decision-tree model.
Optionally, screening out, according to the importance values of each answer text's word features, the keyword features that satisfy the preset condition from the word features of each answer text comprises: screening out, according to the importance values, the first word features whose importance value is greater than or equal to a preset value; receiving a deletion instruction and deleting second word features from the first word features according to the deletion instruction; and determining the first word features remaining after the deletion operation as the keyword features that satisfy the preset condition.
Optionally, establishing the first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text comprises: inputting the first feature value of each answer text and the answer-score label set for each answer text into a first initial decision-tree model so as to train the first initial decision-tree model; and taking the trained first initial decision-tree model as the first decision-tree model.
Optionally, establishing the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text comprises: inputting the second feature value of each answer text and the answer-score label set for each answer text into a second initial decision-tree model so as to train the second initial decision-tree model; and taking the trained second initial decision-tree model as the second decision-tree model.
Optionally, establishing the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text comprises: determining the length of each answer text; and establishing the second decision-tree model according to the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text.
Optionally, constructing the bag-of-words model from the training text comprises: building a dictionary from the training text, the dictionary comprising the word features of each answer text in the training text; counting whether each word feature in the dictionary occurs in each answer text; and determining the first feature value of each answer text according to the statistics, thereby generating a bag-of-words model comprising the first feature value of each answer text.
In a second aspect, an embodiment of the present application provides a decision-tree model construction apparatus, comprising:

a construction unit, configured to construct a bag-of-words model from training text, the bag-of-words model comprising the first feature value of each answer text in the training text;

the construction unit being further configured to establish a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and to obtain the importance values of each answer text's word features output by the first decision-tree model;

a processing unit, configured to screen out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and to derive the second feature value of each answer text from the keyword features;

the construction unit being further configured to establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor, an input device, an output device, and a memory, the processor, input device, output device, and memory being connected to one another, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that comprises program instructions which, when executed by a processor, cause the processor to execute the method of the first aspect.
In conclusion, the electronic device can construct a bag-of-words model from training text and establish a first decision-tree model according to the bag-of-words model and the answer-score label set for each answer text, so as to obtain the importance values of each answer text's word features output by the first decision-tree model, which are used to screen out the keyword features that satisfy a preset condition. The electronic device can then establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text derived from the keyword features and the answer-score label set for each answer text, thereby guaranteeing the interpretability of the model while improving score-prediction accuracy.
Detailed description of the invention
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a decision-tree model construction method provided by an embodiment of the present application;
Fig. 2 is a flow diagram of another decision-tree model construction method provided by an embodiment of the present application;
Fig. 3 is a structural schematic diagram of a decision-tree model construction apparatus provided by an embodiment of the present application;
Fig. 4 is a structural schematic diagram of an electronic device provided by an embodiment of the present application.
Specific embodiment
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.

Fig. 1 is a flow diagram of a decision-tree model construction method provided by an embodiment of the present application. The method can be applied to an electronic device, which may be a terminal or a server. Specifically, the method may include:
S101: construct a bag-of-words model from training text.

The bag-of-words model includes the first feature value of each answer text in the training text. The first feature value may be a feature vector and is determined from the numerical values of the word features of each answer text. Each numerical value is determined according to whether the word feature occurs in the corresponding answer text, or alternatively according to the number of times the word feature occurs in the corresponding answer text; the embodiments of the present invention do not limit this.
In one embodiment, the electronic device constructs the bag-of-words model from the training text as follows: the electronic device builds a dictionary from the training text, the dictionary containing the word features of each answer text in the training text; the electronic device counts whether each word feature in the dictionary occurs in each answer text; and the electronic device determines the first feature value of each answer text from the statistics, generating a bag-of-words model comprising the first feature value of each answer text.
For example, suppose the training text includes answer text 1, "China's capital is Beijing", and answer text 2, "The capital of Britain is London". The dictionary built from this training text contains seven words: China, Britain, of ('s), capital, is, Beijing, and London. Using 0 and 1 to record whether each of these seven words occurs in answer text 1 and answer text 2 (1 if it occurs, 0 if it does not), the first feature value of answer text 1 is determined from the statistics to be (1,0,1,1,1,1,0) and that of answer text 2 to be (0,1,1,1,1,0,1), and a bag-of-words model comprising the first feature values of answer text 1 and answer text 2 is generated.
Besides the above way of generating the bag-of-words model, the model can also be generated by counting the number of times each word feature in the dictionary occurs in each answer text.
In one embodiment, the electronic device may also construct the bag-of-words model from the training text as follows: the electronic device builds a dictionary from the training text, the dictionary containing the word features of each answer text (e.g. word 1, word 2, ..., word N); it counts the number of times each word feature in the dictionary occurs in each answer text, and determines the first feature value of each answer text from those counts, thereby generating a bag-of-words model comprising the first feature value of each answer text.
The difference between the two statistical modes is as follows: if some word occurs twice in answer text 3, the first mode yields a statistic of 1 for that word (indicating that the word occurs in text 3), whereas the second mode yields 2 (indicating that the word occurs twice in text 3). Besides counts, a frequency-based statistic can of course also be used; this is not elaborated here.
In one embodiment, the electronic device builds the dictionary by preprocessing the training text. The preprocessing includes, but is not limited to, processes such as word segmentation and stop-word removal, which are not elaborated here.
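As a concrete illustration, the bag-of-words construction of S101 can be sketched with scikit-learn's CountVectorizer. This is a minimal sketch under the assumption that the answer texts have already been segmented into space-separated tokens; the texts reuse the worked example above and the variable names are illustrative, not the patent's implementation.

```python
# Minimal sketch of S101 (assumed tooling: scikit-learn's CountVectorizer).
from sklearn.feature_extraction.text import CountVectorizer

answers = [
    "china 's capital is beijing",        # answer text 1, pre-segmented
    "the capital of britain is london",   # answer text 2, pre-segmented
]

# First statistical mode: binary occurrence (1 if the word appears, else 0).
# tokenizer=str.split keeps every whitespace-separated token as a word feature.
binary_vec = CountVectorizer(binary=True, tokenizer=str.split)
X_binary = binary_vec.fit_transform(answers)

# Second statistical mode: raw occurrence counts per answer text.
count_vec = CountVectorizer(tokenizer=str.split)
X_counts = count_vec.fit_transform(answers)

print(binary_vec.get_feature_names_out())  # the dictionary (word features)
print(X_binary.toarray())                  # first feature values, binary mode
print(X_counts.toarray())                  # first feature values, count mode
```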
S102: establish a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and obtain the importance values of each answer text's word features output by the first decision-tree model.
In one embodiment, the electronic device establishes the first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text as follows: the electronic device inputs the first feature value of each answer text and the answer-score label set for each answer text into a first initial decision-tree model so as to train it, and takes the trained first initial decision-tree model as the first decision-tree model. For example, the first decision-tree model may be a decision-tree model with a maximum depth of 10 and a minimum of 100 samples per leaf node. The answer-score label may be a score value, such as 90 points, or a grade, such as excellent, good, fair, or poor.
The first decision-tree model can calculate the importance value of each answer text's word features and output those importance values; a higher importance value indicates a greater influence on the score. The importance value may be embodied in forms including, but not limited to, numbers and letters.
The first decision-tree model may also sort the word features of each answer text from high to low importance value and output the sorted word features.
The first decision-tree model may also output a classification result for each word feature, for example "answered well" or "answered poorly".
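Continuing the sketch above, S102 can be illustrated with scikit-learn's DecisionTreeClassifier, whose feature_importances_ attribute plays the role of the importance values. The labels and hyperparameters below follow the examples in the text and are illustrative only; a real training set would contain many answer texts.

```python
# Minimal sketch of S102, continuing from the S101 sketch (X_binary, binary_vec).
from sklearn.tree import DecisionTreeClassifier

y = ["good", "good"]  # answer-score labels, one per answer text (illustrative)

# Hyperparameters follow the text's example: max depth 10, >=100 samples per
# leaf. On this two-sample toy set the tree is trivially a single leaf; with a
# real corpus it learns splits and non-zero importance values.
first_tree = DecisionTreeClassifier(max_depth=10, min_samples_leaf=100)
first_tree.fit(X_binary, y)

# Importance value of each word feature, as output by the first model.
for word, imp in zip(binary_vec.get_feature_names_out(),
                     first_tree.feature_importances_):
    print(word, imp)
```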
S103: screen out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and derive the second feature value of each answer text from the keyword features.
In one embodiment, the electronic device screens out the keyword features that satisfy the preset condition from the word features of each answer text as follows: according to the importance values of each answer text's word features, the electronic device screens out, from the word features of each answer text, the first word features whose importance value is greater than or equal to a preset value, and determines those first word features as the keyword features that satisfy the preset condition. For example, if the electronic device outputs the importance values of 1000 word features, it can screen out from those 1000 word features the 500 whose importance value is greater than or equal to the preset value and determine those 500 word features as the keyword features that satisfy the preset condition.
In another embodiment, the electronic device screens out the keyword features as follows: according to the importance values of each answer text's word features, the electronic device screens out, from the word features of each answer text, the first word features whose importance value is greater than or equal to the preset value; the electronic device receives a deletion instruction and deletes second word features from the first word features according to the deletion instruction; and the electronic device determines the first word features remaining after the deletion operation as the keyword features that satisfy the preset condition. The second word features may be word features with low interpretability or a low contribution. For example, if the electronic device outputs the importance values of 1000 word features, it can screen out the 500 word features whose importance value is greater than or equal to the preset value; after receiving a deletion instruction targeting the 50 word features with the lowest interpretability among those 500, it deletes those 50 word features and determines the remaining 450 word features as the keyword features that satisfy the preset condition.
In another embodiment, the electronic device may screen out the keyword features as follows: according to the importance values of each answer text's word features, the electronic device screens out, from the word features of each answer text, the word features ranked within a preset number of top positions, and determines those top-ranked word features as the keyword features that satisfy the preset condition. For example, if the electronic device outputs the importance values of 1000 word features sorted from high to low importance value, it can screen out from those 1000 word features the 500 ranked first and determine those top-500 word features as the keyword features that satisfy the preset condition.
In another embodiment, the electronic device may screen out the keyword features as follows: according to the importance values of each answer text's word features, the electronic device screens out the word features ranked within the preset number of top positions; the electronic device receives a deletion instruction and deletes third word features from those top-ranked word features according to the deletion instruction; and the electronic device determines the top-ranked word features remaining after the deletion operation as the keyword features that satisfy the preset condition. Depending on the actual situation, the third word features may be identical to or different from the second word features; the third word features are word features with low interpretability or a low contribution.
In one embodiment, the electronic device derives the second feature value of each answer text from the keyword features as follows: the electronic device deletes, from the first feature value of each answer text, the numerical values of the word features other than the keyword features, thereby obtaining the second feature value of each answer text. Deleting directly in this way speeds up modeling and lightens the workload of the electronic device.
Besides the deletion approach above, the electronic device can also redo the statistics. In one embodiment, the electronic device derives the second feature value of each answer text from the keyword features by counting whether each keyword feature occurs in each answer text and determining the second feature value of each answer text from the statistics; or by counting the number of times each keyword feature occurs in each answer text and determining the second feature value of each answer text from those counts.
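A minimal sketch of S103, continuing from the sketches above. The preset threshold and the deleted-feature set are illustrative placeholders, since the patent leaves both open; the empty set stands in for the deletion-instruction variant.

```python
# Minimal sketch of S103, continuing from the S102 sketch (first_tree, X_binary).
import numpy as np

PRESET_VALUE = 0.0  # illustrative threshold; a real deployment would use a
                    # positive preset value so only influential words survive

importances = first_tree.feature_importances_
first_idx = np.flatnonzero(importances >= PRESET_VALUE)  # first word features

# Optionally honor a deletion instruction removing second word features
# (e.g. features with low interpretability); empty here for illustration.
deleted = set()
keyword_idx = np.array([i for i in first_idx if i not in deleted], dtype=int)

# Second feature value via the "direct deletion" variant: keep only the
# keyword-feature columns of each answer text's first feature value.
X_second = X_binary.toarray()[:, keyword_idx]
```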
S104: establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
Specifically, the electronic device establishes the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text as follows: the electronic device inputs the second feature value of each answer text and the answer-score label set for each answer text into a second initial decision-tree model so as to train it, and takes the trained second initial decision-tree model as the second decision-tree model. The second initial decision-tree model may differ from the first initial decision-tree model; for example, the second decision-tree model may be a decision-tree model with a maximum depth of 5 and a minimum of 100 samples per leaf node.
In one embodiment, when answer-score prediction is needed for a target answer text, the target answer text is taken as input data of the second decision-tree model, and the scoring-result information of the target answer text is output by the second decision-tree model. The target answer text may be an answer text to be predicted, for example a new answer text. The scoring-result information may include information such as a score value.
As can be seen, in the embodiment shown in Fig. 1, the electronic device can construct a bag-of-words model from training text and establish a first decision-tree model according to the bag-of-words model and the answer-score label set for each answer text, so as to obtain the importance values of each answer text's word features output by the first decision-tree model, which are used to screen out the keyword features that satisfy a preset condition. The electronic device can then establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text derived from the keyword features and the answer-score label set for each answer text, thereby guaranteeing the interpretability of the model while improving score-prediction accuracy.
Fig. 2 is a flow diagram of another decision-tree model construction method provided by an embodiment of the present application. Specifically, the method may include:
S201: construct a bag-of-words model from training text.

S202: establish a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and obtain the importance values of each answer text's word features output by the first decision-tree model.

S203: screen out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and derive the second feature value of each answer text from the keyword features.
Steps S201–S203 correspond to steps S101–S103 of the embodiment of Fig. 1 and are not repeated here.
S204: determine the length of each answer text.
S205: establish a second decision-tree model, to be used for answer-score prediction, according to the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text.
In this embodiment of the present application, besides establishing the second decision-tree model directly from the second feature value of each answer text and the answer-score label set for each answer text, the electronic device may also introduce the length of each answer text to establish the second decision-tree model. By introducing the length of each answer text, this embodiment can effectively improve score-prediction accuracy.
Specifically, the electronic device establishes the second decision-tree model according to the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text as follows: the electronic device inputs the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text into the second initial decision-tree model so as to train it, and takes the trained second initial decision-tree model as the second decision-tree model.
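A minimal sketch of S204–S205, continuing from the sketches above: each answer text's length is appended as an extra feature column before training the second model. Measuring length in tokens is an assumption; the patent does not fix the unit.

```python
# Minimal sketch of S204-S205, continuing from the earlier sketches.
lengths = np.array([[len(a.split())] for a in answers])  # token count (assumed unit)
X_second_len = np.hstack([X_second, lengths])            # append length column

second_tree_len = DecisionTreeClassifier(max_depth=5, min_samples_leaf=100)
second_tree_len.fit(X_second_len, y)
```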
As can be seen, in the embodiment shown in Fig. 2, the electronic device can construct a bag-of-words model from training text and establish a first decision-tree model according to the bag-of-words model and the answer-score label set for each answer text, so as to obtain the importance values of each answer text's word features output by the first decision-tree model, which are used to screen out the keyword features that satisfy a preset condition. The electronic device can then establish a second decision-tree model, to be used for answer-score prediction, according to the length of each answer text, the second feature value of each answer text derived from the keyword features, and the answer-score label set for each answer text, thereby guaranteeing the interpretability of the model while improving score-prediction accuracy.
Fig. 3 is a structural schematic diagram of a decision-tree model construction apparatus provided by an embodiment of the present application. The apparatus can be applied to an electronic device. Specifically, the apparatus may include:
a construction unit 31, configured to construct a bag-of-words model from training text, the bag-of-words model comprising the first feature value of each answer text in the training text;

the construction unit 31 being further configured to establish a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and to obtain the importance values of each answer text's word features output by the first decision-tree model;

a processing unit 32, configured to screen out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and to derive the second feature value of each answer text from the keyword features;

the construction unit 31 being further configured to establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
In an optional embodiment, the processing unit 32 is further configured, after the second decision-tree model is established and when answer-score prediction is needed for a target answer text, to take the target answer text as input data of the second decision-tree model and to output the scoring-result information of the target answer text through the second decision-tree model.
In an optional embodiment, the processing unit 32 screens out the keyword features that satisfy the preset condition from the word features of each answer text according to the importance values of each answer text's word features, specifically by: screening out, according to the importance values, the first word features whose importance value is greater than or equal to a preset value; receiving a deletion instruction and deleting second word features from the first word features according to the deletion instruction; and determining the first word features remaining after the deletion operation as the keyword features that satisfy the preset condition.
In an optional embodiment, the construction unit 31 establishes the first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, specifically by inputting the first feature value of each answer text and the answer-score label set for each answer text into a first initial decision-tree model so as to train it, and taking the trained first initial decision-tree model as the first decision-tree model.
In an optional embodiment, the construction unit 31 establishes the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text, specifically by inputting the second feature value of each answer text and the answer-score label set for each answer text into a second initial decision-tree model so as to train it, and taking the trained second initial decision-tree model as the second decision-tree model.
In an optional embodiment, the construction unit 31 establishes the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text, specifically by determining the length of each answer text and establishing the second decision-tree model according to the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text.
In an optional embodiment, the construction unit 31 constructs the bag-of-words model from the training text, specifically by building a dictionary from the training text, the dictionary comprising the word features of each answer text in the training text; counting whether each word feature in the dictionary occurs in each answer text; and determining the first feature value of each answer text according to the statistics, thereby generating a bag-of-words model comprising the first feature value of each answer text.
As can be seen, in the embodiment shown in Fig. 3, the electronic device can construct a bag-of-words model from training text and establish a first decision-tree model according to the bag-of-words model and the answer-score label set for each answer text, so as to obtain the importance values of each answer text's word features output by the first decision-tree model, which are used to screen out the keyword features that satisfy a preset condition. The electronic device can then establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text derived from the keyword features and the answer-score label set for each answer text, thereby guaranteeing the interpretability of the model while improving score-prediction accuracy.
Fig. 4 is a structural schematic diagram of an electronic device provided by an embodiment of the present application. The electronic device described in this embodiment may include one or more processors 1000, one or more input devices 2000, one or more output devices 3000, and a memory 4000. The processor 1000, input device 2000, output device 3000, and memory 4000 can be connected by a bus or by other means.
The input device 2000 and output device 3000 may be standard wired or wireless communication interfaces.
The processor 1000 may be a central processing unit (CPU); it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 4000 may be high-speed RAM or non-volatile memory, such as disk storage. The memory 4000 is configured to store a set of program code; the input device 2000, output device 3000, and processor 1000 can call the program code stored in the memory 4000. Specifically:
The processor 1000 is configured to construct a bag-of-words model from training text, the bag-of-words model comprising the first feature value of each answer text in the training text; to establish a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and obtain the importance values of each answer text's word features output by the first decision-tree model; to screen out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and derive the second feature value of each answer text from the keyword features; and to establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
Optionally, the processor 1000 is further configured, after the second decision-tree model is established and when answer-score prediction is needed for a target answer text, to take the target answer text as input data of the second decision-tree model and to output the scoring-result information of the target answer text through the second decision-tree model.
Optionally, the processor 1000 screens out the keyword features that satisfy the preset condition from the word features of each answer text according to the importance values of each answer text's word features, specifically by: screening out, according to the importance values, the first word features whose importance value is greater than or equal to a preset value; receiving a deletion instruction through the input device 2000 and deleting second word features from the first word features according to the deletion instruction; and determining the first word features remaining after the deletion operation as the keyword features that satisfy the preset condition.
Optionally, the processor 1000 establishes the first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, specifically by inputting the first feature value of each answer text and the answer-score label set for each answer text into a first initial decision-tree model so as to train it, and taking the trained first initial decision-tree model as the first decision-tree model.
Optionally, the processor 1000 establishes the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text, specifically by inputting the second feature value of each answer text and the answer-score label set for each answer text into a second initial decision-tree model so as to train it, and taking the trained second initial decision-tree model as the second decision-tree model.
Optionally, the processor 1000 establishes the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text, specifically by determining the length of each answer text and establishing the second decision-tree model according to the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text.
Optionally, the processor 1000 constructs the bag-of-words model from the training text, specifically by building a dictionary from the training text, the dictionary comprising the word features of each answer text in the training text; counting whether each word feature in the dictionary occurs in each answer text; and determining the first feature value of each answer text according to the statistics, thereby generating a bag-of-words model comprising the first feature value of each answer text.
In specific implementations, the processor 1000, input device 2000, and output device 3000 described in the embodiments of the present application can execute the implementations described in the embodiments of Fig. 1 and Fig. 2, as well as the other implementations described in the embodiments of the present application, which are not repeated here.
The functional modules in the embodiments of the application may be integrated into one processing module, or each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be realized in the form of hardware or in the form of a software functional module.
Those of ordinary skill in the art will appreciate that all or part of the processes of the above method embodiments can be accomplished by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above disclosure is only a preferred embodiment of the application and certainly cannot limit the scope of the application's claims. Equivalent variations made according to the claims of the application, through which those skilled in the art can realize all or part of the processes of the above embodiments, still fall within the scope covered by the invention.
Claims (10)
1. A decision-tree model construction method, characterized by comprising:
constructing a bag-of-words model from training text, the bag-of-words model comprising the first feature value of each answer text in the training text;
establishing a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and obtaining the importance values of each answer text's word features output by the first decision-tree model;
screening out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and deriving the second feature value of each answer text from the keyword features;
establishing a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
2. The method according to claim 1, characterized in that, after the second decision-tree model is established, the method further comprises:
when answer-score prediction is needed for a target answer text, taking the target answer text as input data of the second decision-tree model;
outputting the scoring-result information of the target answer text through the second decision-tree model.
3. The method according to claim 1, characterized in that screening out, according to the importance values of each answer text's word features, the keyword features that satisfy the preset condition from the word features of each answer text comprises:
screening out, according to the importance values of each answer text's word features, the first word features whose importance value is greater than or equal to a preset value from the word features of each answer text;
receiving a deletion instruction and deleting second word features from the first word features according to the deletion instruction;
determining the first word features remaining after the deletion operation as the keyword features that satisfy the preset condition.
4. The method according to any one of claims 1 to 3, characterized in that establishing the first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text comprises:
inputting the first feature value of each answer text and the answer-score label set for each answer text into a first initial decision-tree model so as to train the first initial decision-tree model;
taking the trained first initial decision-tree model as the first decision-tree model.
5. The method according to claim 4, characterized in that establishing the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text comprises:
inputting the second feature value of each answer text and the answer-score label set for each answer text into a second initial decision-tree model so as to train the second initial decision-tree model;
taking the trained second initial decision-tree model as the second decision-tree model.
6. The method according to any one of claims 1 to 3, characterized in that establishing the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text comprises:
determining the length of each answer text;
establishing the second decision-tree model according to the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text.
7. The method according to claim 1, characterized in that constructing the bag-of-words model from the training text comprises:
building a dictionary from the training text, the dictionary comprising the word features of each answer text in the training text;
counting whether each word feature in the dictionary occurs in each answer text;
determining the first feature value of each answer text according to the statistics, and generating a bag-of-words model comprising the first feature value of each answer text.
8. A decision-tree model construction apparatus, characterized by comprising:
a construction unit, configured to construct a bag-of-words model from training text, the bag-of-words model comprising the first feature value of each answer text in the training text;
the construction unit being further configured to establish a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and to obtain the importance values of each answer text's word features output by the first decision-tree model;
a processing unit, configured to screen out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and to derive the second feature value of each answer text from the keyword features;
the construction unit being further configured to establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
9. An electronic device, characterized by comprising a processor, an input device, an output device, and a memory, the processor, input device, output device, and memory being connected to one another, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to call the program instructions to execute the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to execute the method according to any one of claims 1-7.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910349851.XA (granted as CN110119770B) | 2019-04-28 | 2019-04-28 | Decision tree model construction method, device, electronic equipment and medium
Publications (2)

Publication Number | Publication Date
---|---
CN110119770A (application publication) | 2019-08-13
CN110119770B (granted publication) | 2024-05-14
Family

ID=67521599

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201910349851.XA (granted as CN110119770B, active) | Decision-tree model construction method, device, electronic equipment and medium | 2019-04-28 | 2019-04-28

Country Status (1)

Country | Link
---|---
CN | CN110119770B (en)
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112395855A (en) * | 2020-12-03 | 2021-02-23 | 中国联合网络通信集团有限公司 | Comment-based evaluation method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150199913A1 (en) * | 2014-01-10 | 2015-07-16 | LightSide Labs, LLC | Method and system for automated essay scoring using nominal classification |
CN108073568A (en) * | 2016-11-10 | 2018-05-25 | 腾讯科技(深圳)有限公司 | keyword extracting method and device |
CN109472305A (en) * | 2018-10-31 | 2019-03-15 | 国信优易数据有限公司 | Answer quality determines model training method, answer quality determination method and device |
Also Published As

Publication Number | Publication Date
---|---
CN110119770B | 2024-05-14
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant