CN110119770A - Decision-tree model construction method, device, electronic equipment and medium - Google Patents
Decision-tree model construction method, device, electronic equipment and medium
- Publication number: CN110119770A (application number CN201910349851.XA)
- Authority: CN (China)
- Prior art keywords: answer, text, answer text, decision, feature
- Prior art date: 2019-04-28
- Legal status: Granted
Classifications
- G06F18/214 — Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24323 — Pattern recognition; Analysing; Classification techniques relating to the number of classes; Tree-organised classifiers
- G06F40/216 — Handling natural language data; Natural language analysis; Parsing using statistical methods
- G06F40/242 — Handling natural language data; Natural language analysis; Lexical tools; Dictionaries

(All classifications fall under G — Physics; G06 — Computing, Calculating or Counting; G06F — Electric digital data processing.)
Abstract
An embodiment of the present application provides a decision-tree model construction method, an apparatus, an electronic device, and a medium. The method comprises: constructing a bag-of-words model from training text; establishing a first decision-tree model according to the first feature value of each answer text contained in the bag-of-words model and the answer-score label set for each answer text, and obtaining the importance values of each answer text's word features output by the first decision-tree model; screening out, according to those importance values, the keyword features that satisfy a preset condition from the word features of each answer text; and establishing a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text derived from the keyword features and the answer-score label set for each answer text. With the present application, the interpretability of the model can be guaranteed while score-prediction accuracy is improved.
Description
Technical field
This application relates to the field of deep learning, and in particular to a decision-tree model construction method, an apparatus, an electronic device, and a medium.
Background technique
With the development of science and technology, intelligent scoring systems have emerged to save the trouble of manual scoring, and they are used more and more widely in institutions such as schools and enterprises. Staff can manually formulate rules in an intelligent scoring system, and the system then scores answers according to those manually formulated rules; however, the score-prediction accuracy achieved this way is limited. To improve score-prediction accuracy, some staff score answers with the machine-learning method of logistic regression. Although logistic regression can achieve higher score-prediction accuracy, the interpretability of a model obtained this way is low.
Summary of the invention
Embodiments of the present application provide a decision-tree model construction method, an apparatus, an electronic device, and a medium, which can guarantee the interpretability of the model while improving score-prediction accuracy.
In a first aspect, an embodiment of the present application provides a decision-tree model construction method, comprising:

constructing a bag-of-words model from training text, the bag-of-words model comprising the first feature value of each answer text in the training text;

establishing a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and obtaining the importance values of each answer text's word features output by the first decision-tree model;

screening out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and deriving the second feature value of each answer text from the keyword features;

establishing a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
Optionally, after the second decision-tree model is established, the method further comprises: when answer-score prediction is needed for a target answer text, taking the target answer text as input data of the second decision-tree model; and outputting the scoring-result information of the target answer text through the second decision-tree model.
Optionally, screening out, according to the importance values of each answer text's word features, the keyword features that satisfy the preset condition from the word features of each answer text comprises: screening out, according to the importance values, the first word features whose importance value is greater than or equal to a preset value; receiving a deletion instruction and deleting second word features from the first word features according to the deletion instruction; and determining the first word features remaining after the deletion operation as the keyword features that satisfy the preset condition.
Optionally, establishing the first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text comprises: inputting the first feature value of each answer text and the answer-score label set for each answer text into a first initial decision-tree model so as to train the first initial decision-tree model; and taking the trained first initial decision-tree model as the first decision-tree model.
Optionally, establishing the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text comprises: inputting the second feature value of each answer text and the answer-score label set for each answer text into a second initial decision-tree model so as to train the second initial decision-tree model; and taking the trained second initial decision-tree model as the second decision-tree model.
Optionally, establishing the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text comprises: determining the length of each answer text; and establishing the second decision-tree model according to the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text.
Optionally, constructing the bag-of-words model from the training text comprises: building a dictionary from the training text, the dictionary comprising the word features of each answer text in the training text; counting whether each word feature in the dictionary occurs in each answer text; and determining the first feature value of each answer text according to the statistics, thereby generating a bag-of-words model comprising the first feature value of each answer text.
In a second aspect, an embodiment of the present application provides a decision-tree model construction apparatus, comprising:

a construction unit, configured to construct a bag-of-words model from training text, the bag-of-words model comprising the first feature value of each answer text in the training text;

the construction unit being further configured to establish a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and to obtain the importance values of each answer text's word features output by the first decision-tree model;

a processing unit, configured to screen out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and to derive the second feature value of each answer text from the keyword features;

the construction unit being further configured to establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor, an input device, an output device, and a memory, the processor, input device, output device, and memory being connected to one another, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that comprises program instructions which, when executed by a processor, cause the processor to execute the method of the first aspect.
In conclusion, the electronic device can construct a bag-of-words model from training text and establish a first decision-tree model according to the bag-of-words model and the answer-score label set for each answer text, so as to obtain the importance values of each answer text's word features output by the first decision-tree model, which are used to screen out the keyword features that satisfy a preset condition. The electronic device can then establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text derived from the keyword features and the answer-score label set for each answer text, thereby guaranteeing the interpretability of the model while improving score-prediction accuracy.
Detailed description of the invention
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a decision-tree model construction method provided by an embodiment of the present application;
Fig. 2 is a flow diagram of another decision-tree model construction method provided by an embodiment of the present application;
Fig. 3 is a structural schematic diagram of a decision-tree model construction apparatus provided by an embodiment of the present application;
Fig. 4 is a structural schematic diagram of an electronic device provided by an embodiment of the present application.
Specific embodiment
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.

Fig. 1 is a flow diagram of a decision-tree model construction method provided by an embodiment of the present application. The method can be applied to an electronic device, which may be a terminal or a server. Specifically, the method may include:
S101: construct a bag-of-words model from training text.

The bag-of-words model includes the first feature value of each answer text in the training text. The first feature value may be a feature vector and is determined from the numerical values of the word features of each answer text. Each numerical value is determined according to whether the word feature occurs in the corresponding answer text, or alternatively according to the number of times the word feature occurs in the corresponding answer text; the embodiments of the present invention do not limit this.
In one embodiment, the electronic device constructs the bag-of-words model from the training text as follows: the electronic device builds a dictionary from the training text, the dictionary containing the word features of each answer text in the training text; the electronic device counts whether each word feature in the dictionary occurs in each answer text; and the electronic device determines the first feature value of each answer text from the statistics, generating a bag-of-words model comprising the first feature value of each answer text.
For example, suppose the training text includes answer text 1, "China's capital is Beijing", and answer text 2, "The capital of Britain is London". The dictionary built from this training text contains seven words: China, Britain, of ('s), capital, is, Beijing, and London. Using 0 and 1 to record whether each of these seven words occurs in answer text 1 and answer text 2 (1 if it occurs, 0 if it does not), the first feature value of answer text 1 is determined from the statistics to be (1,0,1,1,1,1,0) and that of answer text 2 to be (0,1,1,1,1,0,1), and a bag-of-words model comprising the first feature values of answer text 1 and answer text 2 is generated.
Besides the above way of generating the bag-of-words model, the model can also be generated by counting the number of times each word feature in the dictionary occurs in each answer text.
In one embodiment, the electronic device may also construct the bag-of-words model from the training text as follows: the electronic device builds a dictionary from the training text, the dictionary containing the word features of each answer text (e.g. word 1, word 2, ..., word N); it counts the number of times each word feature in the dictionary occurs in each answer text, and determines the first feature value of each answer text from those counts, thereby generating a bag-of-words model comprising the first feature value of each answer text.
The difference between the two statistical modes is as follows: if some word occurs twice in answer text 3, the first mode yields a statistic of 1 for that word (indicating that the word occurs in text 3), whereas the second mode yields 2 (indicating that the word occurs twice in text 3). Besides counts, a frequency-based statistic can of course also be used; this is not elaborated here.
In one embodiment, the electronic device builds the dictionary by preprocessing the training text. The preprocessing includes, but is not limited to, processes such as word segmentation and stop-word removal, which are not elaborated here.
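As a concrete illustration, the bag-of-words construction of S101 can be sketched with scikit-learn's CountVectorizer. This is a minimal sketch under the assumption that the answer texts have already been segmented into space-separated tokens; the texts reuse the worked example above and the variable names are illustrative, not the patent's implementation.

```python
# Minimal sketch of S101 (assumed tooling: scikit-learn's CountVectorizer).
from sklearn.feature_extraction.text import CountVectorizer

answers = [
    "china 's capital is beijing",        # answer text 1, pre-segmented
    "the capital of britain is london",   # answer text 2, pre-segmented
]

# First statistical mode: binary occurrence (1 if the word appears, else 0).
# tokenizer=str.split keeps every whitespace-separated token as a word feature.
binary_vec = CountVectorizer(binary=True, tokenizer=str.split)
X_binary = binary_vec.fit_transform(answers)

# Second statistical mode: raw occurrence counts per answer text.
count_vec = CountVectorizer(tokenizer=str.split)
X_counts = count_vec.fit_transform(answers)

print(binary_vec.get_feature_names_out())  # the dictionary (word features)
print(X_binary.toarray())                  # first feature values, binary mode
print(X_counts.toarray())                  # first feature values, count mode
```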
S102: establish a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and obtain the importance values of each answer text's word features output by the first decision-tree model.
In one embodiment, the electronic device establishes the first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text as follows: the electronic device inputs the first feature value of each answer text and the answer-score label set for each answer text into a first initial decision-tree model so as to train it, and takes the trained first initial decision-tree model as the first decision-tree model. For example, the first decision-tree model may be a decision-tree model with a maximum depth of 10 and a minimum of 100 samples per leaf node. The answer-score label may be a score value, such as 90 points, or a grade, such as excellent, good, fair, or poor.
The first decision-tree model can calculate the importance value of each answer text's word features and output those importance values; a higher importance value indicates a greater influence on the score. The importance value may be embodied in forms including, but not limited to, numbers and letters.
The first decision-tree model may also sort the word features of each answer text from high to low importance value and output the sorted word features.
The first decision-tree model may also output a classification result for each word feature, for example "answered well" or "answered poorly".
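Continuing the sketch above, S102 can be illustrated with scikit-learn's DecisionTreeClassifier, whose feature_importances_ attribute plays the role of the importance values. The labels and hyperparameters below follow the examples in the text and are illustrative only; a real training set would contain many answer texts.

```python
# Minimal sketch of S102, continuing from the S101 sketch (X_binary, binary_vec).
from sklearn.tree import DecisionTreeClassifier

y = ["good", "good"]  # answer-score labels, one per answer text (illustrative)

# Hyperparameters follow the text's example: max depth 10, >=100 samples per
# leaf. On this two-sample toy set the tree is trivially a single leaf; with a
# real corpus it learns splits and non-zero importance values.
first_tree = DecisionTreeClassifier(max_depth=10, min_samples_leaf=100)
first_tree.fit(X_binary, y)

# Importance value of each word feature, as output by the first model.
for word, imp in zip(binary_vec.get_feature_names_out(),
                     first_tree.feature_importances_):
    print(word, imp)
```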
S103: screen out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and derive the second feature value of each answer text from the keyword features.
In one embodiment, the electronic device screens out the keyword features that satisfy the preset condition from the word features of each answer text as follows: according to the importance values of each answer text's word features, the electronic device screens out, from the word features of each answer text, the first word features whose importance value is greater than or equal to a preset value, and determines those first word features as the keyword features that satisfy the preset condition. For example, if the electronic device outputs the importance values of 1000 word features, it can screen out from those 1000 word features the 500 whose importance value is greater than or equal to the preset value and determine those 500 word features as the keyword features that satisfy the preset condition.
In another embodiment, the electronic device screens out the keyword features as follows: according to the importance values of each answer text's word features, the electronic device screens out, from the word features of each answer text, the first word features whose importance value is greater than or equal to the preset value; the electronic device receives a deletion instruction and deletes second word features from the first word features according to the deletion instruction; and the electronic device determines the first word features remaining after the deletion operation as the keyword features that satisfy the preset condition. The second word features may be word features with low interpretability or a low contribution. For example, if the electronic device outputs the importance values of 1000 word features, it can screen out the 500 word features whose importance value is greater than or equal to the preset value; after receiving a deletion instruction targeting the 50 word features with the lowest interpretability among those 500, it deletes those 50 word features and determines the remaining 450 word features as the keyword features that satisfy the preset condition.
In another embodiment, the electronic device may screen out the keyword features as follows: according to the importance values of each answer text's word features, the electronic device screens out, from the word features of each answer text, the word features ranked within a preset number of top positions, and determines those top-ranked word features as the keyword features that satisfy the preset condition. For example, if the electronic device outputs the importance values of 1000 word features sorted from high to low importance value, it can screen out from those 1000 word features the 500 ranked first and determine those top-500 word features as the keyword features that satisfy the preset condition.
In another embodiment, the electronic device may screen out the keyword features as follows: according to the importance values of each answer text's word features, the electronic device screens out the word features ranked within the preset number of top positions; the electronic device receives a deletion instruction and deletes third word features from those top-ranked word features according to the deletion instruction; and the electronic device determines the top-ranked word features remaining after the deletion operation as the keyword features that satisfy the preset condition. Depending on the actual situation, the third word features may be identical to or different from the second word features; the third word features are word features with low interpretability or a low contribution.
In one embodiment, the electronic device derives the second feature value of each answer text from the keyword features as follows: the electronic device deletes, from the first feature value of each answer text, the numerical values of the word features other than the keyword features, thereby obtaining the second feature value of each answer text. Deleting directly in this way speeds up modeling and lightens the workload of the electronic device.
Besides the deletion approach above, the electronic device can also redo the statistics. In one embodiment, the electronic device derives the second feature value of each answer text from the keyword features by counting whether each keyword feature occurs in each answer text and determining the second feature value of each answer text from the statistics; or by counting the number of times each keyword feature occurs in each answer text and determining the second feature value of each answer text from those counts.
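A minimal sketch of S103, continuing from the sketches above. The preset threshold and the deleted-feature set are illustrative placeholders, since the patent leaves both open; the empty set stands in for the deletion-instruction variant.

```python
# Minimal sketch of S103, continuing from the S102 sketch (first_tree, X_binary).
import numpy as np

PRESET_VALUE = 0.0  # illustrative threshold; a real deployment would use a
                    # positive preset value so only influential words survive

importances = first_tree.feature_importances_
first_idx = np.flatnonzero(importances >= PRESET_VALUE)  # first word features

# Optionally honor a deletion instruction removing second word features
# (e.g. features with low interpretability); empty here for illustration.
deleted = set()
keyword_idx = np.array([i for i in first_idx if i not in deleted], dtype=int)

# Second feature value via the "direct deletion" variant: keep only the
# keyword-feature columns of each answer text's first feature value.
X_second = X_binary.toarray()[:, keyword_idx]
```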
S104: establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
Specifically, the electronic device establishes the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text as follows: the electronic device inputs the second feature value of each answer text and the answer-score label set for each answer text into a second initial decision-tree model so as to train it, and takes the trained second initial decision-tree model as the second decision-tree model. The second initial decision-tree model may differ from the first initial decision-tree model; for example, the second decision-tree model may be a decision-tree model with a maximum depth of 5 and a minimum of 100 samples per leaf node.
In one embodiment, when answer-score prediction is needed for a target answer text, the target answer text is taken as input data of the second decision-tree model, and the scoring-result information of the target answer text is output by the second decision-tree model. The target answer text may be an answer text to be predicted, for example a new answer text. The scoring-result information may include information such as a score value.
As can be seen, in the embodiment shown in Fig. 1, the electronic device can construct a bag-of-words model from training text and establish a first decision-tree model according to the bag-of-words model and the answer-score label set for each answer text, so as to obtain the importance values of each answer text's word features output by the first decision-tree model, which are used to screen out the keyword features that satisfy a preset condition. The electronic device can then establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text derived from the keyword features and the answer-score label set for each answer text, thereby guaranteeing the interpretability of the model while improving score-prediction accuracy.
Fig. 2 is a flow diagram of another decision-tree model construction method provided by an embodiment of the present application. Specifically, the method may include:
S201: construct a bag-of-words model from training text.

S202: establish a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and obtain the importance values of each answer text's word features output by the first decision-tree model.

S203: screen out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and derive the second feature value of each answer text from the keyword features.
Steps S201–S203 correspond to steps S101–S103 of the embodiment of Fig. 1 and are not repeated here.
S204: determine the length of each answer text.
S205: establish a second decision-tree model, to be used for answer-score prediction, according to the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text.
In this embodiment of the present application, besides establishing the second decision-tree model directly from the second feature value of each answer text and the answer-score label set for each answer text, the electronic device may also introduce the length of each answer text to establish the second decision-tree model. By introducing the length of each answer text, this embodiment can effectively improve score-prediction accuracy.
Specifically, the electronic device establishes the second decision-tree model according to the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text as follows: the electronic device inputs the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text into the second initial decision-tree model so as to train it, and takes the trained second initial decision-tree model as the second decision-tree model.
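A minimal sketch of S204–S205, continuing from the sketches above: each answer text's length is appended as an extra feature column before training the second model. Measuring length in tokens is an assumption; the patent does not fix the unit.

```python
# Minimal sketch of S204-S205, continuing from the earlier sketches.
lengths = np.array([[len(a.split())] for a in answers])  # token count (assumed unit)
X_second_len = np.hstack([X_second, lengths])            # append length column

second_tree_len = DecisionTreeClassifier(max_depth=5, min_samples_leaf=100)
second_tree_len.fit(X_second_len, y)
```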
As can be seen, in the embodiment shown in Fig. 2, the electronic device can construct a bag-of-words model from training text and establish a first decision-tree model according to the bag-of-words model and the answer-score label set for each answer text, so as to obtain the importance values of each answer text's word features output by the first decision-tree model, which are used to screen out the keyword features that satisfy a preset condition. The electronic device can then establish a second decision-tree model, to be used for answer-score prediction, according to the length of each answer text, the second feature value of each answer text derived from the keyword features, and the answer-score label set for each answer text, thereby guaranteeing the interpretability of the model while improving score-prediction accuracy.
Fig. 3 is a structural schematic diagram of a decision-tree model construction apparatus provided by an embodiment of the present application. The apparatus can be applied to an electronic device. Specifically, the apparatus may include:
a construction unit 31, configured to construct a bag-of-words model from training text, the bag-of-words model comprising the first feature value of each answer text in the training text;

the construction unit 31 being further configured to establish a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and to obtain the importance values of each answer text's word features output by the first decision-tree model;

a processing unit 32, configured to screen out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and to derive the second feature value of each answer text from the keyword features;

the construction unit 31 being further configured to establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
In an optional embodiment, the processing unit 32 is further configured, after the second decision-tree model is established and when answer-score prediction is needed for a target answer text, to take the target answer text as input data of the second decision-tree model and to output the scoring-result information of the target answer text through the second decision-tree model.
In an optional embodiment, the processing unit 32 screens out the keyword features that satisfy the preset condition from the word features of each answer text according to the importance values of each answer text's word features, specifically by: screening out, according to the importance values, the first word features whose importance value is greater than or equal to a preset value; receiving a deletion instruction and deleting second word features from the first word features according to the deletion instruction; and determining the first word features remaining after the deletion operation as the keyword features that satisfy the preset condition.
In an optional embodiment, the construction unit 31 establishes the first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, specifically by inputting the first feature value of each answer text and the answer-score label set for each answer text into a first initial decision-tree model so as to train it, and taking the trained first initial decision-tree model as the first decision-tree model.
In an optional embodiment, the construction unit 31 establishes the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text, specifically by inputting the second feature value of each answer text and the answer-score label set for each answer text into a second initial decision-tree model so as to train it, and taking the trained second initial decision-tree model as the second decision-tree model.
In an optional embodiment, the construction unit 31 establishes the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text, specifically by determining the length of each answer text and establishing the second decision-tree model according to the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text.
In an optional embodiment, the construction unit 31 constructs the bag-of-words model from the training text, specifically by building a dictionary from the training text, the dictionary comprising the word features of each answer text in the training text; counting whether each word feature in the dictionary occurs in each answer text; and determining the first feature value of each answer text according to the statistics, thereby generating a bag-of-words model comprising the first feature value of each answer text.
As can be seen, in the embodiment shown in Fig. 3, the electronic device can construct a bag-of-words model from training text and establish a first decision-tree model according to the bag-of-words model and the answer-score label set for each answer text, so as to obtain the importance values of each answer text's word features output by the first decision-tree model, which are used to screen out the keyword features that satisfy a preset condition. The electronic device can then establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text derived from the keyword features and the answer-score label set for each answer text, thereby guaranteeing the interpretability of the model while improving score-prediction accuracy.
Fig. 4 is a structural schematic diagram of an electronic device provided by an embodiment of the present application. The electronic device described in this embodiment may include one or more processors 1000, one or more input devices 2000, one or more output devices 3000, and a memory 4000. The processor 1000, input device 2000, output device 3000, and memory 4000 can be connected by a bus or by other means.
The input device 2000 and output device 3000 may be standard wired or wireless communication interfaces.
The processor 1000 may be a central processing unit (CPU); it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 4000 may be high-speed RAM or non-volatile memory, such as disk storage. The memory 4000 is configured to store a set of program code; the input device 2000, output device 3000, and processor 1000 can call the program code stored in the memory 4000. Specifically:
The processor 1000 is configured to construct a bag-of-words model from training text, the bag-of-words model comprising the first feature value of each answer text in the training text; to establish a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and obtain the importance values of each answer text's word features output by the first decision-tree model; to screen out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and derive the second feature value of each answer text from the keyword features; and to establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
Optionally, the processor 1000 is further configured, after the second decision-tree model is established and when answer-score prediction is needed for a target answer text, to take the target answer text as input data of the second decision-tree model and to output the scoring-result information of the target answer text through the second decision-tree model.
Optionally, the processor 1000 screens out the keyword features that satisfy the preset condition from the word features of each answer text according to the importance values of each answer text's word features, specifically by: screening out, according to the importance values, the first word features whose importance value is greater than or equal to a preset value; receiving a deletion instruction through the input device 2000 and deleting second word features from the first word features according to the deletion instruction; and determining the first word features remaining after the deletion operation as the keyword features that satisfy the preset condition.
Optionally, the processor 1000 establishes the first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, specifically by inputting the first feature value of each answer text and the answer-score label set for each answer text into a first initial decision-tree model so as to train it, and taking the trained first initial decision-tree model as the first decision-tree model.
Optionally, the processor 1000 establishes the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text, specifically by inputting the second feature value of each answer text and the answer-score label set for each answer text into a second initial decision-tree model so as to train it, and taking the trained second initial decision-tree model as the second decision-tree model.
Optionally, the processor 1000 establishes the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text, specifically by determining the length of each answer text and establishing the second decision-tree model according to the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text.
Optionally, the processor 1000 constructs the bag-of-words model from the training text, specifically by building a dictionary from the training text, the dictionary comprising the word features of each answer text in the training text; counting whether each word feature in the dictionary occurs in each answer text; and determining the first feature value of each answer text according to the statistics, thereby generating a bag-of-words model comprising the first feature value of each answer text.
In specific implementations, the processor 1000, input device 2000, and output device 3000 described in the embodiments of the present application can execute the implementations described in the embodiments of Fig. 1 and Fig. 2, as well as the other implementations described in the embodiments of the present application, which are not repeated here.
The functional modules in the embodiments of the application may be integrated into one processing module, or each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be realized in the form of hardware or in the form of a software functional module.
Those of ordinary skill in the art will appreciate that all or part of the processes of the above method embodiments can be accomplished by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above disclosure is only a preferred embodiment of the application and certainly cannot limit the scope of the application's claims. Equivalent variations made according to the claims of the application, through which those skilled in the art can realize all or part of the processes of the above embodiments, still fall within the scope covered by the invention.
Claims (10)
1. A decision-tree model construction method, characterized by comprising:
constructing a bag-of-words model from training text, the bag-of-words model comprising the first feature value of each answer text in the training text;
establishing a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and obtaining the importance values of each answer text's word features output by the first decision-tree model;
screening out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and deriving the second feature value of each answer text from the keyword features;
establishing a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
2. The method according to claim 1, characterized in that, after the second decision-tree model is established, the method further comprises:
when answer-score prediction is needed for a target answer text, taking the target answer text as input data of the second decision-tree model;
outputting the scoring-result information of the target answer text through the second decision-tree model.
3. The method according to claim 1, characterized in that screening out, according to the importance values of each answer text's word features, the keyword features that satisfy the preset condition from the word features of each answer text comprises:
screening out, according to the importance values of each answer text's word features, the first word features whose importance value is greater than or equal to a preset value from the word features of each answer text;
receiving a deletion instruction and deleting second word features from the first word features according to the deletion instruction;
determining the first word features remaining after the deletion operation as the keyword features that satisfy the preset condition.
4. The method according to any one of claims 1 to 3, characterized in that establishing the first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text comprises:
inputting the first feature value of each answer text and the answer-score label set for each answer text into a first initial decision-tree model so as to train the first initial decision-tree model;
taking the trained first initial decision-tree model as the first decision-tree model.
5. The method according to claim 4, characterized in that establishing the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text comprises:
inputting the second feature value of each answer text and the answer-score label set for each answer text into a second initial decision-tree model so as to train the second initial decision-tree model;
taking the trained second initial decision-tree model as the second decision-tree model.
6. The method according to any one of claims 1 to 3, characterized in that establishing the second decision-tree model according to the second feature value of each answer text and the answer-score label set for each answer text comprises:
determining the length of each answer text;
establishing the second decision-tree model according to the length of each answer text, the second feature value of each answer text, and the answer-score label set for each answer text.
7. The method according to claim 1, characterized in that constructing the bag-of-words model from the training text comprises:
building a dictionary from the training text, the dictionary comprising the word features of each answer text in the training text;
counting whether each word feature in the dictionary occurs in each answer text;
determining the first feature value of each answer text according to the statistics, and generating a bag-of-words model comprising the first feature value of each answer text.
8. A decision-tree model construction apparatus, characterized by comprising:
a construction unit, configured to construct a bag-of-words model from training text, the bag-of-words model comprising the first feature value of each answer text in the training text;
the construction unit being further configured to establish a first decision-tree model according to the first feature value of each answer text and the answer-score label set for each answer text, and to obtain the importance values of each answer text's word features output by the first decision-tree model;
a processing unit, configured to screen out, according to the importance values of each answer text's word features, the keyword features that satisfy a preset condition from the word features of each answer text, and to derive the second feature value of each answer text from the keyword features;
the construction unit being further configured to establish a second decision-tree model, to be used for answer-score prediction, according to the second feature value of each answer text and the answer-score label set for each answer text.
9. An electronic device, characterized by comprising a processor, an input device, an output device, and a memory, the processor, input device, output device, and memory being connected to one another, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to call the program instructions to execute the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to execute the method according to any one of claims 1-7.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910349851.XA (granted as CN110119770B) | 2019-04-28 | 2019-04-28 | Decision tree model construction method, device, electronic equipment and medium
Publications (2)

Publication Number | Publication Date
---|---
CN110119770A (application publication) | 2019-08-13
CN110119770B (granted publication) | 2024-05-14
Family

ID=67521599

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201910349851.XA (granted as CN110119770B, active) | Decision-tree model construction method, device, electronic equipment and medium | 2019-04-28 | 2019-04-28

Country Status (1)

Country | Link
---|---
CN | CN110119770B (en)
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112395855A (en) * | 2020-12-03 | 2021-02-23 | 中国联合网络通信集团有限公司 | Comment-based evaluation method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150199913A1 (en) * | 2014-01-10 | 2015-07-16 | LightSide Labs, LLC | Method and system for automated essay scoring using nominal classification |
CN108073568A (en) * | 2016-11-10 | 2018-05-25 | 腾讯科技(深圳)有限公司 | keyword extracting method and device |
CN109472305A (en) * | 2018-10-31 | 2019-03-15 | 国信优易数据有限公司 | Answer quality determines model training method, answer quality determination method and device |
Also Published As

Publication Number | Publication Date
---|---
CN110119770B | 2024-05-14
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant