CN106780258A - Method and device for building a juvenile crime decision tree

Info

Publication number
CN106780258A
Authority
CN
China
Prior art keywords
crime
influence factor
extent
influence
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611208475.5A
Other languages
Chinese (zh)
Other versions
CN106780258B (en)
Inventor
刘力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netposa Technologies Ltd
Original Assignee
Netposa Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netposa Technologies Ltd
Priority to CN201611208475.5A
Publication of CN106780258A
Application granted
Publication of CN106780258B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G06Q50/265 Personal security, identity or safety
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees

Abstract

The invention provides a method and device for building a juvenile crime decision tree. The method includes: obtaining sample data on juvenile crime; determining the influence value of each crime influence factor on the degree of crime, and screening from the sample data the set of influence factors whose influence value is greater than or equal to a preset value; determining the dependency degree of the degree of crime on each influence factor in the influence factor set, and the association degree between the degree of crime and each influence factor in the set; determining the closeness between the degree of crime and each influence factor in the set from the dependency degree and the association degree; and building the crime decision tree from the closeness, the information entropy of the sample data, and the information entropy of each influence factor in the set. Because influence factors with a small influence value on the degree of crime are screened out, building the decision tree is simpler and takes less time; and because the closeness between the degree of crime and each influence factor is introduced as a parameter, the classification accuracy of the resulting decision tree is improved.

Description

Method and device for building a juvenile crime decision tree
Technical field
The present invention relates to the technical field of big data analysis, and in particular to a method and device for building a juvenile crime decision tree.
Background
As networks, technology, and new cultural trends increasingly shape daily life, the juvenile crime rate has been rising year by year. Minors are the future builders of the country; juvenile crime both endangers social stability and harms the healthy development of minors as a group. Studying the criminal behavior of minors and analyzing the factors that influence juvenile crime is therefore particularly important for preventing it.
In the prior art, a decision tree model of juvenile crime influence factors is generally built with the ID3 algorithm. Many factors influence juvenile crime, however: some have a large influence and some a small one, yet the prior art takes all of them into account when building the decision tree, which makes the building process complicated and time-consuming. In addition, the ID3 algorithm has a multi-value bias (it favors attributes with many values), so the information gain it computes for an influence factor is inaccurate, which lowers the classification accuracy of the resulting decision tree.
Summary of the invention
In view of this, the purpose of the embodiments of the present invention is to provide a method and device for building a juvenile crime decision tree, so as to solve two problems that arise when a decision tree is built directly with the ID3 algorithm as in the prior art: the building process is complicated and time-consuming, and the multi-value bias of the ID3 algorithm lowers the classification accuracy of the decision tree.
In a first aspect, an embodiment of the present invention provides a method for building a juvenile crime decision tree, the method including:
obtaining sample data on juvenile crime, the sample data including the crime influence factors and the degree of crime of minors;
determining the influence value of each crime influence factor on the degree of crime, and screening from the sample data the set of influence factors whose influence value is greater than or equal to a preset value;
determining the dependency degree of the degree of crime on each influence factor in the influence factor set, and determining the association degree between the degree of crime and each influence factor in the influence factor set;
determining, from the dependency degree and the association degree, the closeness between the degree of crime and each influence factor in the influence factor set;
building the crime decision tree from the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which determining the influence value of each crime influence factor on the degree of crime includes:
determining the joint probability distribution between the attribute values of each crime influence factor and the attribute values of the degree of crime;
calculating the covariance between the crime influence factor and the degree of crime from the joint probability distribution;
calculating the variance of the crime influence factor and the variance of the degree of crime from the joint probability distribution;
determining the influence value of the crime influence factor on the degree of crime from the covariance, the variance of the crime influence factor, and the variance of the degree of crime.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which determining the dependency degree of the degree of crime on each influence factor in the influence factor set includes:
determining the positive region of the degree of crime with respect to each influence factor in the influence factor set;
determining the dependency degree from the number of samples in the positive region and the number of samples corresponding to each influence factor in the influence factor set.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which determining the association degree between the degree of crime and each influence factor in the influence factor set includes:
determining the number of samples for which each influence factor in the influence factor set takes each of its attribute values;
determining, within that number of samples, the number of sub-samples belonging to each attribute value of the degree of crime;
determining the association degree between the degree of crime and each influence factor in the influence factor set from the numbers of sub-samples.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which determining the closeness between the degree of crime and each influence factor in the influence factor set from the dependency degree and the association degree includes:
calculating the product of the dependency degree and the association degree;
calculating the sum of the dependency degree and the association degree;
determining the closeness between the degree of crime and each influence factor in the influence factor set from the ratio between the product and the sum.
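The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's definitive formula: it assumes the ratio is the product divided by the sum, and the function name `closeness` and the zero-sum convention are hypothetical.

```python
def closeness(dependency: float, association: float) -> float:
    """Closeness as the ratio of the product to the sum of the two measures.

    Assumption: ratio = product / sum; the text only says "the ratio between"
    the product and the sum. Returning 0.0 when both inputs are zero is a
    hypothetical convention to avoid division by zero.
    """
    if dependency < 0 or association < 0:
        raise ValueError("dependency and association must be non-negative")
    total = dependency + association
    if total == 0:
        return 0.0
    return (dependency * association) / total


# Made-up inputs, not values from the patent.
print(closeness(0.5, 0.5))  # 0.25
```

Under this assumed form the measure is symmetric in its two inputs and is zero whenever either input is zero.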
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, in which building the crime decision tree from the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set includes:
calculating the information gain of each influence factor in the influence factor set from the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set;
determining the maximum information gain, and taking the influence factor corresponding to the maximum information gain as the root split node of the crime decision tree;
building the crime decision tree from the root split node.
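Selecting the root split node by maximum information gain can be sketched as follows. The sketch uses the plain ID3 information gain only; this passage does not give the formula that combines closeness with the entropies, so that adjustment is deliberately left out. The function names, field names, and toy samples are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, factor, target):
    """Plain ID3 gain: H(target) minus the weighted entropy after splitting on factor."""
    groups = {}
    for row in rows:
        groups.setdefault(row[factor], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def pick_root(rows, factors, target):
    """Return the factor with the maximum gain, together with all gains."""
    gains = {f: information_gain(rows, f, target) for f in factors}
    return max(gains, key=gains.get), gains


# Hypothetical toy samples; field names and values are illustrative only.
samples = [
    {"fighting": "often", "sex": "male", "degree": "serious"},
    {"fighting": "often", "sex": "female", "degree": "serious"},
    {"fighting": "none", "sex": "male", "degree": "lighter"},
    {"fighting": "none", "sex": "female", "degree": "lighter"},
]
root, gains = pick_root(samples, ["fighting", "sex"], "degree")
print(root)  # fighting
```

In this toy data, splitting on `fighting` separates the two degrees completely, while splitting on `sex` tells nothing, so `fighting` becomes the root.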
In a second aspect, an embodiment of the present invention provides a device for building a juvenile crime decision tree, the device including:
an acquisition module, for obtaining sample data on juvenile crime, the sample data including the crime influence factors and the degree of crime of minors;
a screening module, for determining the influence value of each crime influence factor on the degree of crime, and screening from the sample data the set of influence factors whose influence value is greater than or equal to a preset value;
a first determining module, for determining the dependency degree of the degree of crime on each influence factor in the influence factor set, and determining the association degree between the degree of crime and each influence factor in the influence factor set;
a second determining module, for determining, from the dependency degree and the association degree, the closeness between the degree of crime and each influence factor in the influence factor set;
a building module, for building the crime decision tree from the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which the screening module includes:
a first determining unit, for determining the joint probability distribution between the attribute values of each crime influence factor and the attribute values of the degree of crime;
a first computing unit, for calculating the covariance between the crime influence factor and the degree of crime from the joint probability distribution;
a second computing unit, for calculating the variance of the crime influence factor and the variance of the degree of crime from the joint probability distribution;
a second determining unit, for determining the influence value of the crime influence factor on the degree of crime from the covariance, the variance of the crime influence factor, and the variance of the degree of crime.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, in which the second determining module includes:
a third computing unit, for calculating the product of the dependency degree and the association degree;
a fourth computing unit, for calculating the sum of the dependency degree and the association degree;
a third determining unit, for determining the closeness between the degree of crime and each influence factor in the influence factor set from the ratio between the product and the sum.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, in which the building module includes:
a fifth computing unit, for calculating the information gain of each influence factor in the influence factor set from the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set;
a fourth determining unit, for determining the maximum information gain, and taking the influence factor corresponding to the maximum information gain as the root split node of the crime decision tree;
a building unit, for building the crime decision tree from the root split node.
The present invention provides a method and device for building a juvenile crime decision tree. The method screens out influence factors with a small influence value on the degree of crime, which makes building the decision tree simpler and less time-consuming. In addition, the closeness between the degree of crime and each influence factor is introduced as a parameter when the information gain is calculated, so the root split node that is determined is more accurate, which improves the classification accuracy of the resulting decision tree.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a flow chart of the method for building a juvenile crime decision tree provided by Embodiment 1 of the present invention;
Fig. 2 shows a flow chart of the method for determining the influence value of a crime influence factor on the degree of crime, within the method for building a juvenile crime decision tree provided by Embodiment 1 of the present invention;
Fig. 3 shows a flow chart of the process of building the crime decision tree, within the method for building a juvenile crime decision tree provided by Embodiment 1 of the present invention;
Fig. 4 shows a schematic structural diagram of the device for building a juvenile crime decision tree provided by Embodiment 2 of the present invention.
Reference numerals in Fig. 4:
410: acquisition module; 420: screening module; 430: first determining module; 440: second determining module; 450: building module.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings herein can be arranged and designed in a variety of configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of it. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.
In the prior art, analyzing the factors that influence juvenile crime requires building a decision tree, and the decision tree model of juvenile crime influence factors is mostly built with the ID3 algorithm. Many factors influence juvenile crime, however: some have a large influence and some a small one, yet the prior art takes all of them into account when building the decision tree, which makes the building process complicated and time-consuming. In addition, the ID3 algorithm has a multi-value bias, so the information gain it computes for an influence factor is inaccurate, which lowers the classification accuracy of the resulting decision tree. On this basis, the embodiments of the present invention provide a method and device for building a juvenile crime decision tree, which are described below through embodiments.
Embodiment 1
An embodiment of the present invention provides a method for building a juvenile crime decision tree. As shown in Fig. 1, the method includes steps S110-S150, as follows.
S110: obtain sample data on juvenile crime, the sample data including the crime influence factors and the degree of crime of minors.
Before a juvenile crime decision tree is built with the method provided by this embodiment, sample data on juvenile crime must be collected; that is, the degree of crime and the corresponding personal influence factors are collected for multiple minors who have committed crimes. For example, the degree of crime, sex, age, family background, and schooling of a minor who committed a crime are recorded, and the recorded data are stored.
When a decision tree is built with the method provided by this embodiment, the sample data on juvenile crime are obtained first.
The degree of crime has four attribute values: lighter, average, heavier, and serious.
The factors that influence juvenile crime may include sex, age, single-parent or divorced family status, parents' workplace, relationship with teachers and parents, only-child status, home location, family economic condition, personal education level, father's education level, mother's education level, student conduct, visits to undesirable venues, smoking and drinking, fighting, and prior criminal record.
Each influence factor has multiple attribute values. For example, the attribute values of sex are male and female; those of family economic condition are affluent, average, and poor; those of father's education level are illiterate, junior middle school and below, senior high school, and university and above; those of mother's education level are illiterate, junior middle school and below, senior high school, and university and above; those of student conduct are excellent, good, fair, and poor; those of visits to undesirable venues are frequent, average, seldom, and none; those of smoking and drinking are often, average, occasionally, and none; those of fighting are frequent, average, seldom, and none; and those of prior criminal record are once, twice, and multiple times.
The above lists only some of the factors that may influence juvenile crime and the attribute values that some of those factors may take. The factors that may influence juvenile crime and their attribute values are not limited to these; the above is given only as an example.
S120: determine the influence value of each crime influence factor on the degree of crime, and screen from the sample data the set of influence factors whose influence value is greater than or equal to a preset value.
As shown in Fig. 2, determining the influence value of each crime influence factor on the degree of crime includes steps S210-S240, as follows:
S210: determine the joint probability distribution between each crime influence factor and the degree of crime;
S220: calculate the covariance between the crime influence factor and the degree of crime from the joint probability distribution;
S230: calculate the variance of the crime influence factor and the variance of the degree of crime from the joint probability distribution;
S240: determine the influence value of the crime influence factor on the degree of crime from the covariance, the variance of the crime influence factor, and the variance of the degree of crime.
The covariance between a crime influence factor and the degree of crime can be calculated by formula (1);
In formula (1), X_P represents a crime influence factor, with P = 1, 2, 3, ...; x_q represents an attribute value of the crime influence factor, with q = 1, 2, ..., k; Y represents the degree of crime; y_t represents an attribute value of the degree of crime, with t = 1, 2, 3, 4; p_qt is the joint probability value between the crime influence factor and the degree of crime; E(X_P) is the expectation of the influence factor; E(Y) is the expectation of the degree of crime; k is the number of attribute values of a given crime influence factor; and m is the number of attribute values of the degree of crime.
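Given the symbols defined above, the covariance in formula (1) presumably takes the standard discrete form. The expression below is a sketch reconstructed from those definitions, not a copy of the patent's own typesetting:

```latex
\operatorname{Cov}(X_P, Y) = \sum_{q=1}^{k} \sum_{t=1}^{m} \bigl(x_q - E(X_P)\bigr)\bigl(y_t - E(Y)\bigr)\, p_{qt}
```

Here the attribute values x_q and y_t are treated as numbers, which assumes some numeric coding of the categories.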
For example, if the crime influence factor is family economic condition, whose attribute values are affluent, average, and poor, then the number of its attribute values is 3, so k = 3; the attribute values of the degree of crime are lighter, average, heavier, and serious, so m = 4. Of course, the attribute values of the degree of crime and of the crime influence factors are not limited to these.
The influence value is an index of how closely a crime influence factor and the degree of crime are related. Taking family economic condition as an example, the following describes in detail how its influence value on the degree of crime is determined.
For example, when P is 2, X_2 represents family economic condition. Here q takes three values: x_1 represents affluent, x_2 represents average, and x_3 represents poor. y_t represents the attribute values of the degree of crime: y_1 represents lighter, y_2 average, y_3 heavier, and y_4 serious.
From the sample data, the number of samples can be determined for each combination of family economic condition (affluent, average, or poor) and degree of crime (lighter, average, heavier, or serious), twelve combinations in all.
From these sample counts, the joint probability distribution between family economic condition and the degree of crime can be determined. For example, suppose that among samples with an affluent family, 3 have a lighter degree of crime, 2 average, 4 heavier, and 1 serious; among samples with an average family, 5 have a lighter degree of crime, 0 average, 2 heavier, and 3 serious; and among samples with a poor family, 2 have a lighter degree of crime, 5 average, 1 heavier, and 0 serious.
The joint probability distribution between family economic condition and the degree of crime determined from these data is shown in Table 1.
Once the joint probability distribution is determined, the covariance between family economic condition and the degree of crime can be determined by formula (1).
The variance of family economic condition and the variance of the degree of crime can likewise be determined from the joint probability distribution in Table 1.
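Table 1: Joint probability distribution between family economic condition and degree of crime (each entry is the sample count given above divided by the 28 samples in total)

Family economic condition   Lighter   Average   Heavier   Serious
Affluent                    3/28      2/28      4/28      1/28
Average                     5/28      0         2/28      3/28
Poor                        2/28      5/28      1/28      0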
From the covariance, the variance of each crime influence factor, and the variance of the degree of crime, the influence value of the crime influence factor on the degree of crime is calculated by formula (2);
In formula (2), ω_P is the influence value of the crime influence factor on the degree of crime, and Cov(X_P, Y) is the covariance between the crime influence factor and the degree of crime; the remaining two terms are the variance of the crime influence factor and the variance of the degree of crime.
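One form of formula (2) that uses exactly these three quantities and stays within the range [0, 1] is the absolute value of the correlation coefficient. This is an assumption offered as a sketch; the variance symbols D(X_P) and D(Y) are introduced here for illustration and do not come from the text:

```latex
\omega_P = \frac{\lvert \operatorname{Cov}(X_P, Y) \rvert}{\sqrt{D(X_P)}\,\sqrt{D(Y)}}
```

The squared correlation coefficient would also fit the same description, so the exact form should be checked against the original formula.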
The variance of family economic condition, the variance of the degree of crime, and the covariance between family economic condition and the degree of crime have been calculated above; the influence value of family economic condition on the degree of crime can therefore be calculated by formula (2).
As formula (2) shows, ω_P takes values in [0, 1], and ω_P is a measure of how closely the crime influence factor and the degree of crime are related.
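The calculation for family economic condition can be sketched in Python using the sample counts from the example. Two things are assumptions, not statements from the text: the numeric coding of the attribute values (1, 2, 3 and 1, 2, 3, 4) and the use of the absolute correlation coefficient as the influence value. The name `influence_value` is hypothetical.

```python
import math

# Sample counts from the family economic condition example:
# rows = affluent, average, poor; columns = lighter, average, heavier, serious.
counts = [
    [3, 2, 4, 1],
    [5, 0, 2, 3],
    [2, 5, 1, 0],
]
# Assumed numeric coding of the attribute values (not given in the text).
x_values = [1, 2, 3]
y_values = [1, 2, 3, 4]


def influence_value(counts, x_values, y_values):
    """Absolute correlation between factor and degree of crime, from joint counts."""
    total = sum(sum(row) for row in counts)
    joint = [[c / total for c in row] for row in counts]  # joint probabilities p_qt
    px = [sum(row) for row in joint]                      # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]                # marginal distribution of Y
    ex = sum(x * p for x, p in zip(x_values, px))
    ey = sum(y * p for y, p in zip(y_values, py))
    cov = sum(
        (x - ex) * (y - ey) * joint[q][t]
        for q, x in enumerate(x_values)
        for t, y in enumerate(y_values)
    )
    var_x = sum((x - ex) ** 2 * p for x, p in zip(x_values, px))
    var_y = sum((y - ey) ** 2 * p for y, p in zip(y_values, py))
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable carries no linear relation
    return abs(cov) / math.sqrt(var_x * var_y)


omega = influence_value(counts, x_values, y_values)
print(round(omega, 3))
```

The result depends on the chosen numeric coding, so a different coding of the categories would give a different influence value.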
Screening from the sample data the set of influence factors whose influence value is greater than or equal to the preset value can be done as follows:
ω_P is compared with the preset value. When ω_P is less than the preset value, the correlation between the crime influence factor and the degree of crime is small, and the crime influence factor is screened out; when ω_P is greater than or equal to the preset value, the correlation between the corresponding crime influence factor and the degree of crime is large, and the influence factor is added to the influence factor set.
The preset value is an influence value set in advance, for example 0.6. The preset value can of course be another number; the embodiments of the present invention do not limit its specific value, and users can set it according to the specific application scenario.
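The screening step amounts to a threshold filter, sketched below. The function name `screen_factors`, the factor names, and the influence values are hypothetical; only the example threshold 0.6 comes from the text.

```python
def screen_factors(influence_values, preset=0.6):
    """Keep factors whose influence value is >= preset (0.6 is the text's example)."""
    return {name for name, omega in influence_values.items() if omega >= preset}


# Hypothetical influence values, for illustration only.
influence_values = {
    "family_economic_condition": 0.72,
    "sex": 0.31,
    "fighting": 0.60,
}
selected = screen_factors(influence_values)
print(sorted(selected))  # ['family_economic_condition', 'fighting']
```

A factor whose influence value equals the preset value is kept, matching the "greater than or equal to" condition.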
The influence factors in the influence factor set are those strongly correlated with the degree of crime; therefore, the influence factors used below when building the crime decision tree are the ones in the influence factor set.
S130: determine the dependency degree of the degree of crime on each influence factor in the influence factor set, and determine the association degree between the degree of crime and each influence factor in the influence factor set.
Determining the dependency degree of the degree of crime on each influence factor in the influence factor set specifically includes: determining the positive region of the degree of crime with respect to each influence factor in the influence factor set; and determining the dependency degree from the number of samples in the positive region and the number of samples corresponding to each influence factor in the influence factor set.
The dependency degree of the degree of crime on each influence factor in the influence factor set can be calculated by formula (3);
In formula (3), spt_X(Y) represents the dependency degree of the degree of crime on each influence factor in the influence factor set; POS_X(Y) represents the X-positive region of Y, where Y represents the degree of crime and X represents the influence factor set; U is the sample data corresponding to any one influence factor in the influence factor set; Card(U) represents the number of samples in U; and Card(POS_X(Y)) represents the number of elements in the positive region.
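From these definitions, formula (3) is presumably the usual rough-set dependency ratio, the size of the positive region divided by the number of samples. The expression below is a sketch reconstructed from the symbol definitions:

```latex
spt_X(Y) = \frac{\operatorname{Card}\bigl(POS_X(Y)\bigr)}{\operatorname{Card}(U)}
```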
Wherein, above-mentioned positive domain and dependency degree will be discussed in detail so that influence factor is as family economical condition as an example below Specific calculating process:
Let the family economic condition be X2, whose attribute values are affluent, average, and difficult. The data on family economic condition selected from the sample data are, for example, U = {record 1 (affluent), record 2 (affluent), record 3 (affluent), record 4 (average), record 5 (average), record 6 (difficult)}.
U contains 6 data records. Records with the same attribute value are placed in one group, and each such group is called an equivalence class.
A known equivalence class of Y is: M = {record 2, record 3, record 6}
The equivalence classes of X2 are:
Group 1: E1 = {record 1 (affluent), record 2 (affluent), record 3 (affluent)}
Group 2: E2 = {record 4 (average), record 5 (average)}
Group 3: E3 = {record 6 (difficult)}
The partition of U with respect to X2 is:
U/X2 = {group 1, group 2, group 3} = {{record 1, record 2, record 3}, {record 4, record 5}, {record 6}}
Find the elements of U/X2 that are contained in M = {record 2, record 3, record 6}. Only {record 6}, that is, group 3, is completely contained in M, so the positive domain of Y with respect to X is POS_X(Y) = {record 6}.
Card denotes the number of samples in a set. POS_X(Y) = {record 6} contains only one sample and U contains 6 samples, so the dependency degree is spt_X(Y) = Card(POS_X(Y)) / Card(U) = 1/6 ≈ 0.167.
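The worked example above can be reproduced with a short sketch. The helper names `partition` and `positive_region` are illustrative, records are represented by their numbers, and only the single equivalence class M of Y given in the text is used.

```python
def partition(records, value_of):
    """Group record ids into equivalence classes by attribute value."""
    classes = {}
    for rec in records:
        classes.setdefault(value_of[rec], set()).add(rec)
    return list(classes.values())


def positive_region(x_classes, y_classes):
    """Union of the X-classes that are completely contained in some Y-class."""
    pos = set()
    for x_class in x_classes:
        if any(x_class <= y_class for y_class in y_classes):
            pos |= x_class
    return pos


universe = [1, 2, 3, 4, 5, 6]
family_economic_condition = {
    1: "affluent", 2: "affluent", 3: "affluent",
    4: "average", 5: "average", 6: "difficult",
}
x_classes = partition(universe, family_economic_condition)
y_classes = [{2, 3, 6}]  # only the class M given in the text

pos = positive_region(x_classes, y_classes)
dependency = len(pos) / len(universe)  # Card(POS_X(Y)) / Card(U)
```

Here `pos` contains only record 6, so `dependency` equals 1/6, matching the example.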
Determining the degree of association between the extent of crime and each influence factor in the influence factor set specifically includes:
determining the number of samples corresponding to each attribute value taken by each influence factor in the influence factor set; determining, within that number of samples, the number of sub-samples belonging to each attribute value of the extent of crime; and determining the degree of association between the extent of crime and each influence factor in the influence factor set from the sub-sample numbers.
The degree of association is the degree of correlation between a given influence factor and the extent of crime. The higher its value, the closer the connection between that influence factor and the extent of crime.
The degree of association can be calculated by formula (4).
In formula (4), S(Y, X_P) is the degree of association between the influence factor X_P and the extent of crime, i is an attribute value of the influence factor X_P, j is an attribute value of the extent of crime, f_{i,j} is the sub-sample number, f_{i,m+1} is the number of samples corresponding to each attribute value taken by the influence factor, and f_{n+1,j} is the number of samples corresponding to each attribute value taken by the extent of crime.
The calculation of the degree of association is likewise illustrated below using family economic condition as an example:
For example, among the samples whose family economic condition is affluent, the numbers of samples with a lighter, average, heavier, and serious extent of crime are 3, 2, 4, and 1 respectively; among the samples whose family economic condition is average, they are 5, 0, 2, and 3; and among the samples whose family economic condition is difficult, they are 2, 5, 1, and 0, as shown in Table 2.
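Table 2
Family economic condition | Extent of crime: lighter | average | heavier | serious
Affluent | 3 | 2 | 4 | 1
Average | 5 | 0 | 2 | 3
Difficult | 2 | 5 | 1 | 0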
According to the data in Table 2, the degree of association between family economic condition and the extent of crime can be calculated by formula (4) as:
The above merely illustrates the calculation of the degree of association; it does not limit the number of samples corresponding to family economic condition or the specific value of the degree of association.
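The counts that feed formula (4) can be organized as a contingency table with marginal totals. The sketch below only builds those inputs from the example counts (f_{i,j}, the row totals f_{i,m+1}, and the column totals f_{n+1,j}); it does not evaluate formula (4) itself, and the variable names are illustrative.

```python
LEVELS = ["lighter", "average", "heavier", "serious"]

# f[i][j]: samples with family economic condition i and extent of crime j
counts = {
    "affluent":  {"lighter": 3, "average": 2, "heavier": 4, "serious": 1},
    "average":   {"lighter": 5, "average": 0, "heavier": 2, "serious": 3},
    "difficult": {"lighter": 2, "average": 5, "heavier": 1, "serious": 0},
}

# f[i][m+1]: samples per attribute value of the influence factor (row totals)
row_totals = {i: sum(row.values()) for i, row in counts.items()}

# f[n+1][j]: samples per attribute value of the extent of crime (column totals)
col_totals = {j: sum(counts[i][j] for i in counts) for j in LEVELS}

total_samples = sum(row_totals.values())
```

With these example counts, the row totals are 10, 10, and 8, and the column totals are 10, 7, 7, and 4, for 28 samples in all.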
S140: determine, according to the dependency degree and the degree of association, the tightness between the extent of crime and each influence factor in the influence factor set.
Because the dependency degree and the degree of association both act at the same time on the correlation between an influence factor and the extent of crime, the two parameters need to be merged into one. The merging can follow the principle of parallel resistance.
Determining the tightness between the extent of crime and each influence factor in the crime influence factor set according to the dependency degree and the degree of association specifically includes:
calculating the product of the dependency degree and the degree of association; calculating the sum of the dependency degree and the degree of association; and determining the tightness between the extent of crime and each influence factor in the influence factor set from the ratio of the product to the sum.
Specifically, the tightness can be calculated by formula (5):
$$hf(X_P) = \frac{spt_X(Y) \times S(Y, X_P)}{spt_X(Y) + S(Y, X_P)} \qquad (5)$$
In formula (5), hf(X_P) is the tightness between the extent of crime and each influence factor in the influence factor set, spt_X(Y) is the dependency degree of the extent of crime on each influence factor in the influence factor set, and S(Y, X_P) is the degree of association between the influence factor X_P and the extent of crime.
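The product-over-sum combination described for the tightness can be sketched as a small function. The name `tightness` and the handling of the case where both inputs are zero are assumptions made for this illustration; the input values in the usage line are hypothetical.

```python
def tightness(dependency, association):
    """Combine two measures as product / sum, like two resistors in parallel."""
    total = dependency + association
    if total == 0:
        return 0.0  # assumption: treat the tightness as 0 when both inputs are 0
    return dependency * association / total


# Hypothetical inputs, for illustration only.
hf_example = tightness(0.5, 0.5)
```

As with parallel resistors, the combined value never exceeds the smaller of the two non-negative inputs, so a factor needs both a high dependency degree and a high degree of association to obtain a high tightness.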
S150: establish the crime decision tree according to the tightness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set.
As shown in Figure 3, establishing the decision tree according to the tightness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set includes steps S310 to S330, as follows.
S310: calculate the information gain of each influence factor in the influence factor set according to the tightness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set;
S320: determine the maximum information gain, and determine the influence factor corresponding to the maximum information gain as the root split node of the crime decision tree;
S330: establish the crime decision tree according to the root split node.
The information gain of each influence factor in the influence factor set is calculated by formula (6) from the tightness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set:
$$\mathrm{Gain}(X_P, S)_{\mathrm{New}} = \mathrm{Gain}(X_P, S) \times hf(X_P) = [\mathrm{Info}(S) - \mathrm{Info}_{X_P}(S)] \times hf(X_P) \qquad (6)$$
In formula (6), Gain(X_P, S) is the information gain calculated by the basic ID3 algorithm, Gain(X_P, S)_New is the information gain calculated by the new ID3 algorithm obtained after correction with the tightness, hf(X_P) is the tightness between the extent of crime and each influence factor in the influence factor set, Info(S) is the information entropy of the sample data, and Info_{X_P}(S) is the information entropy corresponding to each influence factor in the influence factor set.
After the information gain of each influence factor in the influence factor set has been calculated with the above formula, the maximum information gain is selected, and the influence factor corresponding to it is determined as the root split node of the crime decision tree. Then, according to the information gains of the remaining influence factors, the child split nodes of the crime decision tree are determined from the remaining influence factors in the influence factor set, and so on, until the crime decision tree is established.
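The root selection described above can be sketched in a few lines: the ordinary ID3 information gain is multiplied by the tightness of each factor, and the factor with the largest result becomes the root split node. The function names, the toy samples, and the tightness values below are hypothetical and serve only to illustrate the calculation.

```python
import math
from collections import Counter


def entropy(labels):
    """Information entropy Info(S) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(values, labels):
    """Info_X(S): entropy of the labels after splitting on a factor's values."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def weighted_gain(values, labels, hf):
    """ID3 information gain multiplied by the tightness hf of the factor."""
    return (entropy(labels) - conditional_entropy(values, labels)) * hf


def choose_root(factors, labels, hf_by_factor):
    """Return the factor with the largest weighted gain, plus all gains."""
    gains = {
        name: weighted_gain(values, labels, hf_by_factor[name])
        for name, values in factors.items()
    }
    return max(gains, key=gains.get), gains


# Toy data, for illustration only.
labels = ["serious", "serious", "lighter", "lighter"]
factors = {
    "factor_a": ["yes", "yes", "no", "no"],   # separates the labels perfectly
    "factor_b": ["yes", "no", "yes", "no"],   # carries no information
}
hf_by_factor = {"factor_a": 0.2, "factor_b": 0.9}  # hypothetical tightness values

root, gains = choose_root(factors, labels, hf_by_factor)
```

Child split nodes would be chosen the same way on each subset of samples, using the remaining factors.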
After the crime decision tree is established, the factors that lead to a heavier extent of crime can be determined from its output. For example, the output of the crime decision tree is:
1. IF single-parent or divorced family = no AND family economic condition = difficult AND father's education level = junior middle school or below AND frequenting undesirable venues = frequent THEN extent of crime = average;
2. IF single-parent or divorced family = yes AND frequenting undesirable venues = frequent AND fighting = frequent THEN extent of crime = serious;
3. IF single-parent or divorced family = yes AND family economic condition = difficult AND frequenting undesirable venues = frequent AND smoking and drinking = frequent AND fighting = frequent THEN extent of crime = serious;
4. IF single-parent or divorced family = yes AND family economic condition = difficult AND father's education level = illiterate AND school performance = poor AND prior criminal record = once THEN extent of crime = heavier;
5. IF single-parent or divorced family = no AND school performance = poor AND fighting = average AND prior criminal record = twice THEN extent of crime = serious.
From the above output, the factors that lead to a heavier extent of crime include: single-parent or divorced family, family economic condition, parents' education level, frequenting undesirable venues, fighting, prior criminal record, and so on. In particular, single-parent or divorced family, family economic condition, and parents' education level have a relatively large influence on juvenile crime.
To exclude the influence of criminal records, crime sample data of minors with no prior criminal record can be collected when gathering the sample data of juvenile crime.
After the key factors leading to juvenile crime have been determined from the above results, judicial personnel and the relevant departments can adopt strategies to prevent juvenile crime, for example holding counseling talks with teenagers from single-parent or divorced families.
In the method for establishing a juvenile crime decision tree provided by the embodiment of the present invention, influence factors with small influence values on the extent of crime are screened out, so that establishing the decision tree is simple and takes less time. In addition, the tightness between the extent of crime and the influence factor is introduced when calculating the information gain, so that the determined root split node is more accurate, which improves the classification accuracy of the established decision tree.
Embodiment 2
The embodiment of the present invention provides a device for establishing a juvenile crime decision tree, which is used to execute the method provided in Embodiment 1. As shown in Figure 4, the device includes an acquisition module 410, a screening module 420, a first determining module 430, a second determining module 440, and an establishing module 450.
The acquisition module 410 is configured to obtain sample data of juvenile crime, the sample data including crime influence factors and the extent of crime of minors.
The screening module 420 is configured to determine the influence value of each crime influence factor on the extent of crime, and to screen out from the sample data an influence factor set whose influence values are greater than or equal to a preset value.
The first determining module 430 is configured to determine the dependency degree of the extent of crime on each influence factor in the influence factor set, and to determine the degree of association between the extent of crime and each influence factor in the influence factor set.
The second determining module 440 is configured to determine, according to the dependency degree and the degree of association, the tightness between the extent of crime and each influence factor in the influence factor set.
The establishing module 450 is configured to establish the crime decision tree according to the tightness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set.
The screening module 420 determines the influence value of each crime influence factor on the extent of crime by means of a first determining unit, a first computing unit, a second computing unit, and a second determining unit, specifically:
The first determining unit is configured to determine the joint probability distribution between the attribute values of each crime influence factor and the attribute values of the extent of crime. The first computing unit is configured to calculate the covariance between the crime influence factor and the extent of crime according to the joint probability distribution. The second computing unit is configured to calculate the variance of the crime influence factor and the variance of the extent of crime according to the joint probability distribution. The second determining unit is configured to determine the influence value of the crime influence factor on the extent of crime according to the covariance, the variance of the crime influence factor, and the variance of the extent of crime.
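One way to combine a covariance and two variances into a single value is the correlation coefficient. The sketch below assumes that form, takes its absolute value, and assumes the attribute values have been encoded as numbers; none of these choices is confirmed by this passage, and the function name `influence_value` is hypothetical.

```python
import math


def influence_value(joint):
    """Assumed form: |cov(X, Y)| / sqrt(var(X) * var(Y)).

    `joint` maps (x, y) pairs of numeric attribute codes to probabilities.
    """
    mean_x = sum(p * x for (x, _), p in joint.items())
    mean_y = sum(p * y for (_, y), p in joint.items())
    cov = sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in joint.items())
    var_x = sum(p * (x - mean_x) ** 2 for (x, _), p in joint.items())
    var_y = sum(p * (y - mean_y) ** 2 for (_, y), p in joint.items())
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant variable has no measurable influence
    return abs(cov) / math.sqrt(var_x * var_y)


# Toy joint distributions, for illustration only.
perfectly_related = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

strong = influence_value(perfectly_related)
weak = influence_value(independent)
```

Under this assumed form, the value ranges from 0 to 1, which is compatible with an example preset value such as 0.6.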
The second determining module 440 determines the tightness between the extent of crime and each influence factor in the influence factor set according to the dependency degree and the degree of association by means of a third computing unit, a fourth computing unit, and a third determining unit, specifically:
The third computing unit is configured to calculate the product of the dependency degree and the degree of association. The fourth computing unit is configured to calculate the sum of the dependency degree and the degree of association. The third determining unit is configured to determine the tightness between the extent of crime and each influence factor in the influence factor set according to the ratio of the product to the sum.
The establishing module 450 establishes the crime decision tree according to the tightness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set by means of a fifth computing unit, a fourth determining unit, and an establishing unit, specifically:
The fifth computing unit is configured to calculate the information gain of each influence factor in the influence factor set according to the tightness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set. The fourth determining unit is configured to determine the maximum information gain, and to determine the influence factor corresponding to the maximum information gain as the root split node of the crime decision tree. The establishing unit is configured to establish the crime decision tree according to the root split node.
In the device for establishing a juvenile crime decision tree provided by the embodiment of the present invention, influence factors with small influence values on the extent of crime are screened out, so that establishing the decision tree is simple and takes less time. In addition, the tightness between the extent of crime and the influence factor is introduced when calculating the information gain, so that the determined root split node is more accurate, which improves the classification accuracy of the established decision tree.
The device for establishing a juvenile crime decision tree provided by the embodiment of the present invention may be specific hardware on a piece of equipment, or software or firmware installed on the equipment. The implementation principle and technical effects of the device are the same as those of the foregoing method embodiment; for brevity, where the device embodiment does not mention something, refer to the corresponding content in the method embodiment. Those skilled in the art will clearly understand that, for convenience and conciseness of description, the specific working processes of the system, device, and units described above may be found in the corresponding processes of the method embodiment and are not repeated here.
In the embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiment described above is merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some communication interfaces, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments provided by the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that similar reference numerals and letters denote similar items in the accompanying drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In addition, the terms "first", "second", "third", and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are only specific implementations of the present invention, used to illustrate its technical solution rather than to limit it, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features. Such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for establishing a juvenile crime decision tree, characterized in that the method comprises:
obtaining sample data of juvenile crime, the sample data including crime influence factors and the extent of crime of minors;
determining the influence value of each crime influence factor on the extent of crime, and screening out from the sample data an influence factor set whose influence values are greater than or equal to a preset value;
determining the dependency degree of the extent of crime on each influence factor in the influence factor set, and determining the degree of association between the extent of crime and each influence factor in the influence factor set;
determining, according to the dependency degree and the degree of association, the tightness between the extent of crime and each influence factor in the influence factor set;
establishing a crime decision tree according to the tightness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set.
2. The method according to claim 1, characterized in that determining the influence value of each crime influence factor on the extent of crime comprises:
determining the joint probability distribution between each crime influence factor and the extent of crime;
calculating the covariance between the crime influence factor and the extent of crime according to the joint probability distribution;
calculating the variance of the crime influence factor and the variance of the extent of crime according to the joint probability distribution;
determining the influence value of the crime influence factor on the extent of crime according to the covariance, the variance of the crime influence factor, and the variance of the extent of crime.
3. The method according to claim 1, characterized in that determining the dependency degree of the extent of crime on each influence factor in the influence factor set comprises:
determining the positive domain of the extent of crime with respect to each influence factor in the influence factor set;
determining the dependency degree according to the number of samples in the positive domain and the number of samples corresponding to each influence factor in the influence factor set.
4. The method according to claim 1, characterized in that determining the degree of association between the extent of crime and each influence factor in the influence factor set comprises:
determining the number of samples corresponding to each attribute value taken by each influence factor in the influence factor set;
determining, within the number of samples, the number of sub-samples belonging to each attribute value of the extent of crime;
determining the degree of association between the extent of crime and each influence factor in the influence factor set according to the sub-sample numbers.
5. The method according to claim 1, characterized in that determining, according to the dependency degree and the degree of association, the tightness between the extent of crime and each influence factor in the influence factor set comprises:
calculating the product of the dependency degree and the degree of association;
calculating the sum of the dependency degree and the degree of association;
determining the tightness between the extent of crime and each influence factor in the influence factor set according to the ratio of the product to the sum.
6. The method according to claim 1, characterized in that establishing the crime decision tree according to the tightness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set comprises:
calculating the information gain of each influence factor in the influence factor set according to the tightness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set;
determining the maximum information gain, and determining the influence factor corresponding to the maximum information gain as the root split node of the crime decision tree;
establishing the crime decision tree according to the root split node.
7. A device for establishing a juvenile crime decision tree, characterized in that the device comprises:
an acquisition module, configured to obtain sample data of juvenile crime, the sample data including crime influence factors and the extent of crime of minors;
a screening module, configured to determine the influence value of each crime influence factor on the extent of crime, and to screen out from the sample data an influence factor set whose influence values are greater than or equal to a preset value;
a first determining module, configured to determine the dependency degree of the extent of crime on each influence factor in the influence factor set, and to determine the degree of association between the extent of crime and each influence factor in the influence factor set;
a second determining module, configured to determine, according to the dependency degree and the degree of association, the tightness between the extent of crime and each influence factor in the influence factor set;
an establishing module, configured to establish a crime decision tree according to the tightness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set.
8. The device according to claim 7, characterized in that the screening module comprises:
a first determining unit, configured to determine the joint probability distribution between the attribute values of each crime influence factor and the attribute values of the extent of crime;
a first computing unit, configured to calculate the covariance between the crime influence factor and the extent of crime according to the joint probability distribution;
a second computing unit, configured to calculate the variance of the crime influence factor and the variance of the extent of crime according to the joint probability distribution;
a second determining unit, configured to determine the influence value of the crime influence factor on the extent of crime according to the covariance, the variance of the crime influence factor, and the variance of the extent of crime.
9. The device according to claim 7, characterized in that the second determining module comprises:
a third computing unit, configured to calculate the product of the dependency degree and the degree of association;
a fourth computing unit, configured to calculate the sum of the dependency degree and the degree of association;
a third determining unit, configured to determine the tightness between the extent of crime and each influence factor in the influence factor set according to the ratio of the product to the sum.
10. The device according to claim 7, characterized in that the establishing module comprises:
a fifth computing unit, configured to calculate the information gain of each influence factor in the influence factor set according to the tightness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set;
a fourth determining unit, configured to determine the maximum information gain, and to determine the influence factor corresponding to the maximum information gain as the root split node of the crime decision tree;
an establishing unit, configured to establish the crime decision tree according to the root split node.
CN201611208475.5A 2016-12-23 2016-12-23 Method and device for establishing juvenile crime decision tree Active CN106780258B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611208475.5A CN106780258B (en) 2016-12-23 2016-12-23 Method and device for establishing juvenile crime decision tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611208475.5A CN106780258B (en) 2016-12-23 2016-12-23 Method and device for establishing juvenile crime decision tree

Publications (2)

Publication Number Publication Date
CN106780258A true CN106780258A (en) 2017-05-31
CN106780258B CN106780258B (en) 2020-08-14

Family

ID=58920145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611208475.5A Active CN106780258B (en) 2016-12-23 2016-12-23 Method and device for establishing juvenile crime decision tree

Country Status (1)

Country Link
CN (1) CN106780258B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657840A (en) * 2018-11-22 2019-04-19 东软集团股份有限公司 Decision tree generation method, device, computer readable storage medium and electronic equipment
CN110298544A (en) * 2019-05-23 2019-10-01 南京柳橙信息科技有限公司 A kind of minor crime's investigation and assessment overall analysis system
CN114090601A (en) * 2021-11-23 2022-02-25 北京百度网讯科技有限公司 Data screening method, device, equipment and storage medium
CN114090601B (en) * 2021-11-23 2023-11-03 北京百度网讯科技有限公司 Data screening method, device, equipment and storage medium
CN117172381A (en) * 2023-09-05 2023-12-05 薪海科技(上海)有限公司 Risk prediction method based on big data

Also Published As

Publication number Publication date
CN106780258B (en) 2020-08-14

Similar Documents

Publication Publication Date Title
CN104731954B (en) Music is had an X-rayed based on group and recommends method and system
CN107085803B (en) Individualized teaching resource recommendation system based on knowledge graph and ability evaluation
CN110245981B (en) Crowd type identification method based on mobile phone signaling data
CN104182517B (en) Method and device for data processing
CN103530540B (en) User identity attribute detection method based on man-machine interaction behavior characteristics
CN110413707A (en) Method and system for mining and checking fraud gang relationships on the internet
CN106681996B (en) Method and apparatus for determining regions of interest and points of interest within a geographic range
CN106780258A (en) Method and device for establishing juvenile crime decision tree
CN106384282A (en) Method and device for building decision-making model
CN104021153A (en) Personalized recommendation method for campus books
CN110413973A (en) Method and system for automatically generating sets of test papers by computer
CN103208211A (en) Method and device for question selection of network education test
CN107391670A (en) Hybrid recommendation method combining collaborative filtering and user attribute filtering
CN111723973B (en) Learning effect optimization method based on user behavior causal relationship in MOOC log data
CN109740861A (en) Learning data analysis method and device
CN109871479A (en) Collaborative filtering method based on user item categories and rating reliability
CN110110225A (en) Online education recommendation model and construction method based on user behavior data analysis
CN108132964A (en) Collaborative filtering method based on user ratings of item categories
CN107239964A (en) User value scoring method and system
CN108038734B (en) Urban commercial facility spatial distribution detection method and system based on comment data
Widayat et al. Bibliometric analysis and visualization articles on presidential election in social media indexed in Scopus by Indonesian authors
CN107291616A (en) Online platform for generating project reports
CN114022269A (en) Enterprise credit risk assessment method in public credit field
CN106844765A (en) Salient information detection method and device based on convolutional neural networks
Ramadiani et al. Evaluation of student academic performance using e-learning with the association rules method and the importance of performance analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PP01 Preservation of patent right
Effective date of registration: 20220726
Granted publication date: 20200814