Summary of the invention
The embodiments of this specification provide a model score interpretation method, apparatus, and device, to solve the following problem: providing a more convenient scheme for interpreting model scores.
On this basis, an embodiment of this specification provides a model score interpretation method, comprising:
for any data to be assessed, determining the characteristic variables it contains;
for any characteristic variable, determining the characteristic interval to which the value of the characteristic variable belongs;
determining, based on a preset correspondence between characteristic intervals and model score statistical values, the model score statistical value corresponding to the characteristic variable;
sorting the characteristic variables of the data to be assessed according to the model score statistical values to generate a sorting result, and determining, based on the sorting result, the characteristic variables that influence the data to be assessed.
An embodiment of this specification also provides a model score interpretation apparatus, comprising:
a characteristic determination module, which, for any data to be assessed, determines the characteristic variables it contains;
an interval determination module, which, for any characteristic variable, determines the characteristic interval to which the value of the characteristic variable belongs;
a statistical value determination module, which determines, based on a preset correspondence between characteristic intervals and model score statistical values, the model score statistical value corresponding to the characteristic variable;
a sorting and explanation module, which sorts the characteristic variables of the data to be assessed according to the model score statistical values to generate a sorting result, and determines, based on the sorting result, the characteristic variables that influence the data to be assessed.
Correspondingly, an embodiment of this specification also provides a model score interpretation device, comprising:
a memory, which stores a model score interpretation program;
a processor, which calls the model score interpretation program in the memory and executes:
for any data to be assessed, determining the characteristic variables it contains;
for any characteristic variable, determining the characteristic interval to which the value of the characteristic variable belongs;
determining, based on a preset correspondence between characteristic intervals and model score statistical values, the model score statistical value corresponding to the characteristic variable;
sorting the characteristic variables of the data to be assessed according to the model score statistical values to generate a sorting result, and determining, based on the sorting result, the characteristic variables that influence the data to be assessed.
Correspondingly, an embodiment of this specification also provides a non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to:
for any data to be assessed, determine the characteristic variables it contains;
for any characteristic variable, determine the characteristic interval to which the value of the characteristic variable belongs;
determine, based on a preset correspondence between characteristic intervals and model score statistical values, the model score statistical value corresponding to the characteristic variable;
sort the characteristic variables of the data to be assessed according to the model score statistical values to generate a sorting result, and determine, based on the sorting result, the characteristic variables that influence the data to be assessed.
By adopting at least one of the above technical solutions, the embodiments of this specification can achieve the following beneficial effects:
A correspondence between characteristic intervals and model score statistical values is established in advance. Then, for any data to be assessed, the model score statistical value corresponding to the characteristic interval into which the value of each of its characteristic variables falls is determined one by one, and the characteristic variables are sorted according to the model score statistical values. In this way, the characteristic variables that have a large influence on the model score can be determined, and the corresponding explanation reasons can also be output. This approach avoids dependence on training data and does not require labels for any data to be obtained in advance. In addition, the model score statistical values can be updated periodically to track changes in the overall data distribution over time. Furthermore, explanation reasons and combinations thereof can be encoded, so that the data to be assessed can be interpreted qualitatively directly in encoded form. The scheme can be used with any algorithm model that scores based on characteristic variables, and is therefore widely applicable.
Detailed description of embodiments
To make the purposes, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this specification without creative effort shall fall within the protection scope of the present application.
With the development of machine learning algorithms, models are used more and more widely in various fields, and their internal structures are becoming increasingly complex. In operation there is a natural demand for model interpretability; that is, the user of a model needs to understand which characteristic variables have a large influence on the score or assessment result given by the model, and what the specific explanation reasons are. For example, the variable importance ranking automatically generated by an algorithm during model training may be used as the explanation of the model's scoring. However, variable importance reflects the behavior of the model over the entire training data set and is fixed; it cannot adapt to the real-time evolution of online data, and it cannot explain the model's score for a single record.
On this basis, an embodiment of this specification provides a model score interpretation scheme in which characteristic intervals are defined in advance, the model score statistical value of each characteristic interval is calculated, and a correspondence between the two is established. Thus, for any data to be assessed, the values of its characteristic variables can be matched against the above characteristic intervals one by one to obtain the corresponding model score statistical values, and the characteristic variables can then be sorted according to the model score statistical values to determine which characteristic variables have a large influence on the score of the data to be assessed.
As shown in FIG. 1, FIG. 1 is a schematic flowchart of the model score interpretation provided by an embodiment of this specification. The process specifically includes the following steps:
S101: for any data to be assessed, determine the characteristic variables it contains.
It is readily understood that, in model evaluation, data always contain multiple characteristic variables, each with a corresponding value. Some characteristic variables are continuous; for example, the value range of "user registration duration" may be [0, 24000] hours. Other characteristic variables are discrete; for example, "user gender" takes the value "male" or "female", which is usually represented in a model by "0" or "1".
S103: for any characteristic variable, determine the characteristic interval to which the value of the characteristic variable belongs.
As mentioned above, characteristic intervals may be divided manually in practical applications, and they do not overlap one another. For example, "user registration duration" may be divided into three characteristic intervals: [0, 2400), [2400, 7200), and [7200, 24000]. It is readily understood that, for any data to be assessed, the value of any characteristic variable can fall into only one characteristic interval and cannot fall into multiple characteristic intervals at the same time.
In this process, because the data to be assessed contain multiple characteristic variables, the characteristic interval to which each of those characteristic variables belongs must be determined.
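As an illustration of step S103, the following is a minimal sketch of interval matching for the "user registration duration" example above. It assumes the intervals are stored as a sorted list of boundaries; the names REGISTRATION_EDGES and interval_index are hypothetical and are not part of the specification.

```python
from bisect import bisect_right

# Boundaries of the example intervals for "user registration duration" (hours):
# [0, 2400), [2400, 7200), [7200, 24000]
REGISTRATION_EDGES = [0, 2400, 7200, 24000]


def interval_index(value, edges):
    """Return the index of the single interval that contains value.

    Every interval is half-open [lo, hi) except the last one, which is
    closed, so each value in range matches exactly one interval.
    """
    if value < edges[0] or value > edges[-1]:
        raise ValueError(f"{value} lies outside [{edges[0]}, {edges[-1]}]")
    # bisect_right counts the edges that are <= value; the clamp keeps the
    # upper bound of the last interval inside the last interval.
    return min(bisect_right(edges, value) - 1, len(edges) - 2)
```

Because the boundaries are sorted and shared between neighbouring intervals, the non-overlap requirement holds by construction in this sketch.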
S105: based on the preset correspondence between characteristic intervals and model score statistical values, determine the model score statistical value corresponding to the characteristic variable.
In practice, the correspondence between characteristic intervals and model score statistical values may be given based on experience, or may be derived from statistics on real data. As shown in FIG. 2, FIG. 2 is a schematic diagram of the correspondence between characteristic intervals and model score statistical values provided by an embodiment of this specification. Once the characteristic interval to which the value of a characteristic variable belongs has been determined, the model score statistical value of each characteristic variable in the data to be assessed can be determined according to the above correspondence.
S107: sort the characteristic variables of the data to be assessed according to the model score statistical values to generate a sorting result, and determine, based on the sorting result, the characteristic variables that influence the data to be assessed.
The characteristic variables may be sorted from high score to low score or from low score to high score, as determined by actual needs. For example, in the field of risk assessment, if a higher model score indicates that the data to be assessed are more dangerous, the characteristic variables are sorted from high to low and the top N characteristic variables are taken; these N characteristic variables are the ones that have a large influence on the risk score of the data to be assessed. As another example, if a lower model score indicates that the data to be assessed are more stable, the characteristic variables may be sorted from low to high and the top N characteristic variables taken; for those data, these N characteristic variables are the ones that have a large influence on the stability score.
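Steps S105 and S107 can be sketched as a table lookup followed by a sort. The example below is illustrative only: the variable names, the interval boundaries, and all statistical values in CORRESPONDENCE are invented for the sketch, and every interval is treated as half-open [lo, hi).

```python
# Hypothetical correspondence: variable -> list of ((lo, hi), statistical value).
# All numbers are made up for illustration; intervals are half-open [lo, hi).
CORRESPONDENCE = {
    "registration_hours": [((0, 2400), 0.62), ((2400, 7200), 0.35), ((7200, 24001), 0.12)],
    "gender": [((0, 1), 0.30), ((1, 2), 0.28)],
    "orders_last_30d": [((0, 5), 0.55), ((5, 1000), 0.20)],
}


def score_statistic(variable, value, correspondence=CORRESPONDENCE):
    """S105: look up the statistical value of the interval containing value."""
    for (lo, hi), stat in correspondence[variable]:
        if lo <= value < hi:
            return stat
    raise KeyError(f"no interval of {variable!r} contains {value!r}")


def top_variables(record, n, high_first=True, correspondence=CORRESPONDENCE):
    """S107: rank the variables of one record and keep the top n."""
    stats = {var: score_statistic(var, val, correspondence) for var, val in record.items()}
    ranked = sorted(stats.items(), key=lambda item: item[1], reverse=high_first)
    return ranked[:n]
```

Setting high_first=True corresponds to the risk assessment case, where a higher score is more dangerous; high_first=False corresponds to the case where a lower score is the one of interest.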
In the above process of determining the characteristic variables that influence the score of the data to be assessed, a correspondence between characteristic intervals and model score statistical values is established in advance; then, for any data to be assessed, the model score statistical value corresponding to the characteristic interval into which the value of each of its characteristic variables falls is determined one by one, and the characteristic variables are sorted according to the model score statistical values. In this way, the characteristic variables that have a large influence on the model score can be determined. The process avoids dependence on training data, does not require labels for any data to be obtained in advance, and is applicable to any algorithm, thereby effectively explaining the score of the data to be assessed.
As a specific embodiment, the preset correspondence between characteristic intervals and model score statistical values in step S105 is obtained in advance by statistics as follows: obtaining multiple data containing the characteristic variable, and determining the model score of each datum; for any preset characteristic interval, selecting the data whose value of the characteristic variable belongs to that characteristic interval; calculating the model score statistical value of the data whose value of the characteristic variable belongs to that characteristic interval; and establishing the correspondence between the characteristic interval and the model score statistical value. The model score statistical value includes an average value, a maximum value, a minimum value, or a quantile.
It is readily understood that, in this process, the data may come from training data or from real-time online data. For any characteristic interval, there is generally always some portion of the data whose value of the characteristic variable falls into that interval. When the data volume is large enough to be adequately representative, the model score statistical value of that portion of the data reflects the degree of correlation between the characteristic interval and the model score. In addition, when the correspondence is established, inconsistent choices should be avoided, such as taking the average for one characteristic interval and the maximum for another; the model score statistical values of all characteristic intervals should be kept consistent, that is, the same statistic (for example, the average value) should be used for all of them.
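The statistical procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: intervals are half-open (lo, hi) pairs, scores are plain numbers, and the function name build_correspondence is hypothetical. A single statistic function is passed in and applied to every interval, which keeps the statistic consistent across intervals as required above.

```python
from statistics import mean


def build_correspondence(values, scores, intervals, statistic=mean):
    """Map each half-open interval (lo, hi) to one statistic of the scores.

    values[i] is the characteristic variable value of datum i and scores[i]
    is its model score. The same statistic is used for every interval.
    """
    table = {}
    for lo, hi in intervals:
        # Select the scores of the data whose value falls into [lo, hi).
        selected = [s for v, s in zip(values, scores) if lo <= v < hi]
        if selected:  # an interval with no data gets no entry
            table[(lo, hi)] = statistic(selected)
    return table
```

Passing statistic=max or statistic=min switches to the maximum or minimum for all intervals at once; a quantile could be supplied in the same way as a user-defined function.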
In practical applications, obtaining multiple data containing the characteristic variable in the above process comprises: obtaining multiple data containing the characteristic variable within a specified time range. In other words, the model score statistical values may be derived from real-time online data and may change dynamically. For example, the specified time range may cover all data from the time the product went online up to the current time, or it may cover the data of the last week or the last month. In this manner, the model score statistical values can change dynamically with the model's scores on real data, so that the assessed influence of each characteristic interval is adjusted automatically according to the actual situation.
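One way to restrict the statistics to a specified time range is sketched below. The record layout (dictionaries with "value", "score", and "scored_at" keys) and the function name windowed_statistic are assumptions made for the illustration; the specification does not prescribe a data format.

```python
from datetime import datetime, timedelta
from statistics import mean


def windowed_statistic(records, lo, hi, now, window, statistic=mean):
    """Statistic of the scores in interval [lo, hi) over (now - window, now].

    Each record is assumed to be a dict with the keys "value", "score" and
    "scored_at" (a datetime). Returns None when no record qualifies.
    """
    start = now - window
    selected = [
        r["score"]
        for r in records
        if start < r["scored_at"] <= now and lo <= r["value"] < hi
    ]
    return statistic(selected) if selected else None
```

Re-running such a function periodically, for example with window=timedelta(days=7), would let the statistical value of an interval follow the recent data rather than the full history.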
In the above scheme, the characteristic variable includes a single variable or a combination of multiple variables, and the characteristic interval includes a single-variable interval or a combination of multiple variable intervals. For example, a characteristic interval may be "user registration duration < 30 days & high-risk area = 1".
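A combined characteristic interval such as the one in the example above can be sketched as a conjunction of per-variable conditions. The names in_combined_interval, NEW_AND_HIGH_RISK, registration_days, and high_risk_area are hypothetical and chosen only for this illustration.

```python
def in_combined_interval(record, conditions):
    """Return True when the record satisfies every per-variable condition.

    conditions maps a variable name to a predicate on that variable's value.
    """
    return all(predicate(record[variable]) for variable, predicate in conditions.items())


# "user registration duration < 30 days & high-risk area = 1"
NEW_AND_HIGH_RISK = {
    "registration_days": lambda days: days < 30,
    "high_risk_area": lambda flag: flag == 1,
}
```

A combined interval defined this way can be given its own model score statistical value and ranked alongside single-variable intervals.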
Correspondingly, while the correspondence between characteristic intervals and model score statistical values is being established, a correspondence between the preset characteristic intervals and explanation reasons may also be established. An explanation reason here may be a relevant explanation determined manually in advance based on experience; for example, the explanation corresponding to "high-risk area = 1" is "the registered place is a high-risk area". As shown in FIG. 3, FIG. 3 is a schematic diagram of a dimension table of characteristic variables, characteristic variable intervals, model score average values, and explanation reasons provided by an embodiment of this specification. Thus, after the characteristic variables of the data to be assessed have been sorted according to the model score statistical values, the explanation reasons for the score of the data to be assessed can be determined directly from the sorting result (again taking the top N).
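The dimension table described above can be sketched as a mapping from a matched characteristic interval to its statistical value and explanation reason. The table contents below are invented for illustration, apart from the "high-risk area = 1" reason taken from the text; the names DIMENSION_TABLE and top_reasons are hypothetical.

```python
# (variable, interval label) -> (model score average, explanation reason).
# The averages and all but the first reason are illustrative assumptions.
DIMENSION_TABLE = {
    ("high_risk_area", "=1"): (0.71, "The registered place is a high-risk area"),
    ("high_risk_area", "=0"): (0.15, "The registered place is not a high-risk area"),
    ("registration_days", "<30"): (0.64, "The account was registered recently"),
    ("registration_days", ">=30"): (0.22, "The account has been registered for some time"),
}


def top_reasons(matched, n, table=DIMENSION_TABLE):
    """Return the reasons of the n matched intervals with the highest averages.

    matched is the list of (variable, interval label) keys hit by one record.
    """
    ranked = sorted(matched, key=lambda key: table[key][0], reverse=True)
    return [table[key][1] for key in ranked[:n]]
```

Keeping the reason in the same row as the statistical value means that no separate lookup is needed once the sorting result is available.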
In addition, corresponding explanation codes may be assigned in advance to explanation reasons and combinations thereof. For example, "explanation reason X" is assigned the code "0101", "explanation reason Y" is assigned the code "0102", and the group containing both "explanation reason X" and "explanation reason Y" is assigned the hierarchical code "01". Which explanation reasons are assigned the same hierarchical code may be set according to the practical application and is not limited here. As shown in FIG. 4, FIG. 4 is a schematic diagram of the explanation codes provided by an embodiment of this specification. In this manner, the explanation reasons can be determined based on the sorting result and the corresponding explanation codes can then be given directly, which helps business personnel quickly locate the characteristic variables that influence the data to be assessed and provides configurable explanations at different levels.
In practical applications, the model score interpretation scheme provided by the embodiments of this specification is as shown in FIG. 5. FIG. 5 is a logical schematic diagram of the model score interpretation provided by an embodiment of this specification, which comprises four parts: dimension table definition (including the dimension table of explanation reasons and explanation codes, the dimension table of characteristic intervals and model score statistical values, and the dimension table of characteristic intervals and explanation reasons), real-time matching, score sorting, and explanation output (specific explanation reasons may be output, the corresponding explanation codes may be output, and explanations at different levels may also be distinguished).
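The explanation codes described above can be sketched as follows, using the codes "0101", "0102", and "01" from the example. The sketch assumes that the hierarchical code is the first two characters of the detailed code, which is consistent with the example but is not stated by the specification; the names REASON_CODES and explanation_codes are hypothetical.

```python
REASON_CODES = {
    "explanation reason X": "0101",
    "explanation reason Y": "0102",
}


def explanation_codes(reasons, level="detail", codes=REASON_CODES):
    """Translate explanation reasons into codes at the requested level.

    level="detail" returns one code per reason. Any other level returns the
    hierarchical codes, assumed here to be the first two characters of each
    detailed code, with duplicates removed and order preserved.
    """
    detailed = [codes[reason] for reason in reasons]
    if level == "detail":
        return detailed
    return list(dict.fromkeys(code[:2] for code in detailed))
```

With this layout, a coarse report can show a single hierarchical code while a detailed report shows the individual reasons behind it.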
Based on the same idea, the present invention also provides a model score interpretation apparatus. As shown in FIG. 6, FIG. 6 is a schematic structural diagram of the model score interpretation apparatus provided by an embodiment of this specification, comprising:
a characteristic determination module 601, which, for any data to be assessed, determines the characteristic variables it contains;
an interval determination module 603, which, for any characteristic variable, determines the characteristic interval to which the value of the characteristic variable belongs;
a statistical value determination module 605, which determines, based on a preset correspondence between characteristic intervals and model score statistical values, the model score statistical value corresponding to the characteristic variable;
a sorting and explanation module 607, which sorts the characteristic variables of the data to be assessed according to the model score statistical values to generate a sorting result, and determines, based on the sorting result, the characteristic variables that influence the data to be assessed.
Further, the apparatus also comprises a statistical module 609, which obtains multiple data containing the characteristic variable and determines the model score of each datum; for any preset characteristic interval, selects the data whose value of the characteristic variable belongs to that characteristic interval; calculates the model score statistical value of the data whose value of the characteristic variable belongs to that characteristic interval; and establishes the correspondence between the characteristic interval and the model score statistical value; wherein the model score statistical value includes an average value, a maximum value, a minimum value, or a quantile.
Further, the statistical module 609 obtains multiple data containing the characteristic variable within a specified time range.
Further, the characteristic variable includes a single variable or a combination of multiple variables, and the characteristic interval includes a single-variable interval or a combination of multiple variable intervals.
Further, the apparatus also comprises an explanation reason module 611, which determines the explanation reasons of the data to be assessed based on the preset correspondence between characteristic intervals and explanation reasons and on the sorting result.
Further, the apparatus also comprises a coding module 613, which, before the characteristic variables contained in any data to be assessed are determined, assigns corresponding explanation codes to the explanation reasons or to combinations of explanation reasons; the explanation reason module 611 determines the explanation code of the data to be assessed according to the explanation reasons of the data to be assessed.
Correspondingly, an embodiment of this specification also provides a model score interpretation device, comprising:
a memory, which stores a model score interpretation program;
a processor, which calls the model score interpretation program in the memory and executes:
for any data to be assessed, determining the characteristic variables it contains;
for any characteristic variable, determining the characteristic interval to which the value of the characteristic variable belongs;
determining, based on a preset correspondence between characteristic intervals and model score statistical values, the model score statistical value corresponding to the characteristic variable;
sorting the characteristic variables of the data to be assessed according to the model score statistical values to generate a sorting result, and determining, based on the sorting result, the characteristic variables that influence the data to be assessed.
Based on the same inventive idea, an embodiment of the present application also provides a corresponding non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to:
for any data to be assessed, determine the characteristic variables it contains;
for any characteristic variable, determine the characteristic interval to which the value of the characteristic variable belongs;
determine, based on a preset correspondence between characteristic intervals and model score statistical values, the model score statistical value corresponding to the characteristic variable;
sort the characteristic variables of the data to be assessed according to the model score statistical values to generate a sorting result, and determine, based on the sorting result, the characteristic variables that influence the data to be assessed.
The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the apparatus, device, and medium embodiments are substantially similar to the method embodiment, they are described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiment, which are not repeated here.
Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions, steps, or modules recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement to a method flow). However, with the development of technology, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. Designers program a digital system to "integrate" it onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, and the source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by programming the method flow into an integrated circuit using one of the above hardware description languages.
A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for implementing various functions may also be regarded as structures within the hardware component. Indeed, the means for implementing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described with its functions divided into various units. Of course, when the embodiments of this specification are implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, a system, or a computer program product. Therefore, the embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The embodiments of this specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.