CN106780258A - Method and device for establishing a juvenile crime decision tree - Google Patents
Method and device for establishing a juvenile crime decision tree
- Publication number
- CN106780258A CN201611208475.5A
- Authority
- CN
- China
- Prior art keywords
- crime
- influence factor
- extent
- influence
- degree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
- G06Q50/265—Personal security, identity or safety
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
Abstract
The invention provides a method and a device for establishing a juvenile crime decision tree. The method includes: obtaining sample data on juvenile crime; determining the influence value of each crime influence factor on the degree of crime, and screening out from the sample data the set of influence factors whose influence value is greater than or equal to a preset value; determining the degree of dependency of the degree of crime on each influence factor in the set, and the degree of association between the degree of crime and each influence factor in the set; determining, from the degree of dependency and the degree of association, the closeness between the degree of crime and each influence factor in the set; and establishing the crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy of each influence factor in the set. Because influence factors with a small influence value on the degree of crime are screened out, the decision tree is simpler and faster to establish, and because the closeness between the degree of crime and each influence factor is introduced as a parameter, the classification accuracy of the resulting decision tree is improved.
Description
Technical field
The present invention relates to the technical field of big data analysis, and in particular to a method and a device for establishing a juvenile crime decision tree.
Background
As networks, technology, and new cultural trends increasingly shape daily life, the juvenile crime rate has been rising year by year. Minors are the future builders of the country, and juvenile crime both endangers social stability and harms the healthy development of minors as a group. Studying the criminal behavior of minors and analyzing the factors that influence juvenile crime is therefore particularly important for preventing it.
In the prior art, a decision tree model of juvenile crime influence factors is generally established with the ID3 algorithm. Many factors influence juvenile crime, however: some have a large influence and others a small one. The prior art takes all factors into account when establishing the decision tree, which makes the process complicated and time-consuming. In addition, the ID3 algorithm has a multi-value bias (it favors attributes with many distinct values), so the information gain it calculates for an influence factor is inaccurate, which reduces the classification accuracy of the resulting decision tree.
Summary of the invention
In view of this, the purpose of the embodiments of the present invention is to provide a method and a device for establishing a juvenile crime decision tree, so as to solve two problems of establishing the decision tree directly with the ID3 algorithm as in the prior art: the establishment process is complicated and time-consuming, and the multi-value bias of the ID3 algorithm lowers the classification accuracy of the decision tree.
In a first aspect, an embodiment of the present invention provides a method for establishing a juvenile crime decision tree, the method including:
obtaining sample data on juvenile crime, the sample data including the crime influence factors and the degree of crime of minors;
determining the influence value of each crime influence factor on the degree of crime, and screening out from the sample data the set of influence factors whose influence value is greater than or equal to a preset value;
determining the degree of dependency of the degree of crime on each influence factor in the influence factor set, and determining the degree of association between the degree of crime and each influence factor in the influence factor set;
determining, according to the degree of dependency and the degree of association, the closeness between the degree of crime and each influence factor in the influence factor set;
establishing the crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which determining the influence value of each crime influence factor on the degree of crime includes:
determining the joint probability distribution between the attribute values of each crime influence factor and the attribute values of the degree of crime;
calculating the covariance between the crime influence factor and the degree of crime according to the joint probability distribution;
calculating the variance of the crime influence factor and the variance of the degree of crime according to the joint probability distribution;
determining the influence value of the crime influence factor on the degree of crime according to the covariance, the variance of the crime influence factor, and the variance of the degree of crime.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which determining the degree of dependency of the degree of crime on each influence factor in the influence factor set includes:
determining the positive region of the degree of crime with respect to each influence factor in the influence factor set;
determining the degree of dependency according to the number of samples in the positive region and the number of samples corresponding to each influence factor in the influence factor set.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which determining the degree of association between the degree of crime and each influence factor in the influence factor set includes:
determining the number of samples for which each influence factor in the influence factor set takes each of its attribute values;
determining, within those samples, the number of sub-samples belonging to each attribute value of the degree of crime;
determining the degree of association between the degree of crime and each influence factor in the influence factor set according to the sub-sample counts.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which determining the closeness between the degree of crime and each influence factor in the influence factor set according to the degree of dependency and the degree of association includes:
calculating the product of the degree of dependency and the degree of association;
calculating the sum of the degree of dependency and the degree of association;
determining the closeness between the degree of crime and each influence factor in the influence factor set according to the ratio between the product and the sum.
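As a minimal sketch of the closeness step described above, the Python function below computes a ratio from the product and the sum of the degree of dependency and the degree of association. The function name `closeness`, the product-over-sum direction of the ratio, and the zero result when both inputs are zero are assumptions made for illustration; the text only states that a ratio between the two quantities is used.

```python
def closeness(dependency: float, association: float) -> float:
    """Ratio of the product to the sum of the two measures (assumed direction)."""
    total = dependency + association
    if total == 0:
        # Assumption: no dependency and no association means zero closeness.
        return 0.0
    return (dependency * association) / total


# Hypothetical input values, chosen only to exercise the function.
example = closeness(0.5, 0.5)
print(example)
```

With this form the closeness is large only when both measures are large, which is one plausible reason for combining them this way.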
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, in which establishing the crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set includes:
calculating the information gain of each influence factor in the influence factor set according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set;
determining the maximum information gain, and taking the influence factor corresponding to the maximum information gain as the root split node of the crime decision tree;
establishing the crime decision tree according to the root split node.
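The root selection described above can be sketched as follows. This passage does not state how the closeness enters the information gain, so the sketch shows only the standard entropy-based information gain used by ID3 and the selection of the factor with the maximum gain; the closeness adjustment is deliberately left out. The function names, the field names, and the four toy rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, factor, target):
    """Entropy of the target minus its expected entropy after splitting on factor."""
    groups = {}
    for row in rows:
        groups.setdefault(row[factor], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def choose_root(rows, factors, target):
    """Return the factor with the largest information gain, plus all gains."""
    gains = {f: information_gain(rows, f, target) for f in factors}
    return max(gains, key=gains.get), gains


# Hypothetical toy samples, for illustration only.
rows = [
    {"fighting": "frequent", "sex": "male", "degree": "serious"},
    {"fighting": "frequent", "sex": "female", "degree": "serious"},
    {"fighting": "never", "sex": "male", "degree": "relatively light"},
    {"fighting": "never", "sex": "female", "degree": "relatively light"},
]
root, gains = choose_root(rows, ["fighting", "sex"], "degree")
print(root, gains)
```

In the toy data the first factor separates the two classes completely and the second not at all, so the first is chosen as the root split node.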
In a second aspect, an embodiment of the present invention provides a device for establishing a juvenile crime decision tree, the device including:
an acquisition module, configured to obtain sample data on juvenile crime, the sample data including the crime influence factors and the degree of crime of minors;
a screening module, configured to determine the influence value of each crime influence factor on the degree of crime, and to screen out from the sample data the set of influence factors whose influence value is greater than or equal to a preset value;
a first determining module, configured to determine the degree of dependency of the degree of crime on each influence factor in the influence factor set, and to determine the degree of association between the degree of crime and each influence factor in the influence factor set;
a second determining module, configured to determine, according to the degree of dependency and the degree of association, the closeness between the degree of crime and each influence factor in the influence factor set;
an establishing module, configured to establish the crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which the screening module includes:
a first determining unit, configured to determine the joint probability distribution between the attribute values of each crime influence factor and the attribute values of the degree of crime;
a first computing unit, configured to calculate the covariance between the crime influence factor and the degree of crime according to the joint probability distribution;
a second computing unit, configured to calculate the variance of the crime influence factor and the variance of the degree of crime according to the joint probability distribution;
a second determining unit, configured to determine the influence value of the crime influence factor on the degree of crime according to the covariance, the variance of the crime influence factor, and the variance of the degree of crime.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, in which the second determining module includes:
a third computing unit, configured to calculate the product of the degree of dependency and the degree of association;
a fourth computing unit, configured to calculate the sum of the degree of dependency and the degree of association;
a third determining unit, configured to determine the closeness between the degree of crime and each influence factor in the influence factor set according to the ratio between the product and the sum.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, in which the establishing module includes:
a fifth computing unit, configured to calculate the information gain of each influence factor in the influence factor set according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set;
a fourth determining unit, configured to determine the maximum information gain, and to take the influence factor corresponding to the maximum information gain as the root split node of the crime decision tree;
an establishing unit, configured to establish the crime decision tree according to the root split node.
The present invention provides a method and a device for establishing a juvenile crime decision tree. The method screens out influence factors whose influence value on the degree of crime is small, so that the decision tree is simpler and faster to establish. In addition, the closeness between the degree of crime and each influence factor is introduced as a parameter when calculating information gain, so that the root split node is determined more accurately, which improves the classification accuracy of the resulting decision tree.
To make the above objects, features, and advantages of the present invention more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 shows a flowchart of the method for establishing a juvenile crime decision tree provided by Embodiment 1 of the present invention;
Fig. 2 shows a flowchart of the method for determining the influence value of a crime influence factor on the degree of crime, within the method for establishing a juvenile crime decision tree provided by Embodiment 1 of the present invention;
Fig. 3 shows a flowchart of the process of establishing the crime decision tree, within the method for establishing a juvenile crime decision tree provided by Embodiment 1 of the present invention;
Fig. 4 shows a schematic structural diagram of the device for establishing a juvenile crime decision tree provided by Embodiment 2 of the present invention.
Reference numerals in Fig. 4:
410, acquisition module; 420, screening module; 430, first determining module; 440, second determining module; 450, establishing module.
Detailed description of the embodiments
To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings herein may be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
In the prior art, analyzing the factors that influence juvenile crime requires establishing a decision tree, and the decision tree model of juvenile crime influence factors is mostly established with the ID3 algorithm. Many factors influence juvenile crime, however: some have a large influence and others a small one. The prior art takes all factors into account when establishing the decision tree, which makes the process complicated and time-consuming. In addition, the ID3 algorithm has a multi-value bias, so the information gain it calculates for an influence factor is inaccurate, which reduces the classification accuracy of the resulting decision tree. Based on this, the embodiments of the present invention provide a method and a device for establishing a juvenile crime decision tree, which are described below by way of embodiments.
Embodiment 1
This embodiment of the present invention provides a method for establishing a juvenile crime decision tree. As shown in Fig. 1, the method includes steps S110-S150, as follows.
S110, obtain sample data on juvenile crime, the sample data including the crime influence factors of minors and the degree of crime.
Before the juvenile crime decision tree is established using the method provided by this embodiment, sample data on juvenile crime must be collected; that is, the degree of crime and the corresponding personal influence factors of multiple juvenile offenders are collected. For example, data such as the degree of crime, sex, age, family background, and schooling of a juvenile offender are recorded, and the recorded data are stored.
When the decision tree is established using the method provided by this embodiment, the sample data on juvenile crime is obtained first.
The degree of crime takes four attribute values: relatively light, average, relatively heavy, and serious.
The factors that influence juvenile crime may include sex, age, single-parent or divorced family status, parents' place of work, relationship with teachers and parents, only-child status, place of birth, family economic condition, personal education level, father's education level, mother's education level, student conduct, frequenting of undesirable venues, smoking and drinking, fighting, and prior criminal record.
Each influence factor has multiple attribute values. For example, the attribute values of sex include male and female; the attribute values of family economic condition include affluent, average, and poor; the attribute values of father's education level include illiterate, junior high school or below, senior high school, and university or above; the attribute values of mother's education level include illiterate, junior high school or below, senior high school, and university or above; the attribute values of student conduct include excellent, good, average, and poor; the attribute values of frequenting undesirable venues include frequent, average, rare, and never; the attribute values of smoking and drinking include frequent, average, occasional, and never; the attribute values of fighting include frequent, average, rare, and never; and the attribute values of prior criminal record include once, twice, and multiple times.
The above lists only some of the factors that may influence juvenile crime and some of the attribute values those factors may take. It is given by way of example and does not limit the factors that may influence juvenile crime or the attribute values of those factors.
S120, determine the influence value of each crime influence factor on the degree of crime, and screen out from the sample data the set of influence factors whose influence value is greater than or equal to a preset value.
As shown in Fig. 2, determining the influence value of each crime influence factor on the degree of crime includes steps S210-S240, as follows:
S210, determine the joint probability distribution between each crime influence factor and the degree of crime;
S220, calculate the covariance between the crime influence factor and the degree of crime according to the joint probability distribution;
S230, calculate the variance of the crime influence factor and the variance of the degree of crime according to the joint probability distribution;
S240, determine the influence value of the crime influence factor on the degree of crime according to the covariance, the variance of the crime influence factor, and the variance of the degree of crime.
The covariance between the crime influence factor and the degree of crime can be calculated by formula (1):
$$\mathrm{Cov}(X_P, Y) = \sum_{q=1}^{k} \sum_{t=1}^{m} \bigl(x_q - E(X_P)\bigr)\bigl(y_t - E(Y)\bigr)\, p_{qt} \qquad (1)$$
In formula (1), X_P denotes a crime influence factor, with P = 1, 2, 3, ...; x_q denotes an attribute value of the crime influence factor, with q = 1, 2, ..., k; Y denotes the degree of crime; y_t denotes an attribute value of the degree of crime, with t = 1, 2, 3, 4; p_qt is the joint probability of the crime influence factor taking the value x_q and the degree of crime taking the value y_t; E(X_P) is the expectation of the influence factor; E(Y) is the expectation of the degree of crime; k is the number of attribute values of the crime influence factor; and m is the number of attribute values of the degree of crime.
For example, if the crime influence factor is family economic condition, whose attribute values are affluent, average, and poor, then the number of attribute values is 3, that is, k = 3. The attribute values of the degree of crime are relatively light, average, relatively heavy, and serious, so m = 4. The attribute values of the degree of crime and of the crime influence factors are of course not limited to these.
The influence value is an indicator of how closely a crime influence factor and the degree of crime are related. The process of determining the influence value of family economic condition on the degree of crime is described in detail below as an example.
For example, when P = 2, X_2 denotes family economic condition. In this case q takes the values 1 to 3: x_1 denotes affluent, x_2 denotes average, and x_3 denotes poor. y_t denotes an attribute value of the degree of crime, where y_1 denotes relatively light, y_2 denotes average, y_3 denotes relatively heavy, and y_4 denotes serious.
From the sample data, the number of samples for each combination of family economic condition and degree of crime can be determined: affluent families with a relatively light, average, relatively heavy, or serious degree of crime; families of average condition with each of the four degrees of crime; and poor families with each of the four degrees of crime.
The joint probability distribution between family economic condition and the degree of crime can be determined from these sample counts. For example, suppose that for affluent families the number of samples with a relatively light degree of crime is 3, with an average degree 2, with a relatively heavy degree 4, and with a serious degree 1; for families of average condition the corresponding numbers are 5, 0, 2, and 3; and for poor families they are 2, 5, 1, and 0.
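The step from sample counts to a joint probability distribution can be sketched in Python as below, using the counts from the example above. The function name `joint_probabilities` and the row and column ordering are choices made for this illustration, and dividing each count by the total number of samples is the usual empirical estimate, assumed here.

```python
# Sample counts from the worked example. Rows: family economic condition
# (affluent, average, poor). Columns: degree of crime
# (relatively light, average, relatively heavy, serious).
counts = [
    [3, 2, 4, 1],
    [5, 0, 2, 3],
    [2, 5, 1, 0],
]


def joint_probabilities(counts):
    """Return p[q][t] = counts[q][t] / total number of samples."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


p = joint_probabilities(counts)
for row in p:
    print([round(value, 3) for value in row])
```

The twelve counts sum to 28, so each entry of the distribution is the corresponding count divided by 28.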
The joint probability distribution between family economic condition and the degree of crime determined from the above data is shown in Table 1.
After the joint probability distribution is determined, the covariance between family economic condition and the degree of crime can be determined by formula (1).
The variance of family economic condition and the variance of the degree of crime can likewise be determined from the joint probability distribution in Table 1.
Table 1
The influence value of a crime influence factor on the degree of crime is calculated by formula (2) from the covariance, the variance of the crime influence factor, and the variance of the degree of crime.
In formula (2), ω_P is the influence value of the crime influence factor on the degree of crime, Cov(X_P, Y) is the covariance between the crime influence factor and the degree of crime, and the other two terms are the variance of the crime influence factor and the variance of the degree of crime.
The variance of family economic condition, the variance of the degree of crime, and the covariance between family economic condition and the degree of crime have been calculated above, so the influence value of family economic condition on the degree of crime can be calculated by formula (2).
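A hedged sketch of steps S210-S240 in Python follows. The exact form of formula (2) is not legible in this text, so the sketch assumes the influence value is the absolute value of the correlation coefficient, |Cov(X_P, Y)| divided by the square root of the product of the two variances, which is consistent with the stated range [0, 1]. Coding the attribute values as 1, 2, 3 and 1, 2, 3, 4 is also an assumption, as is the function name `influence_value`.

```python
import math

# Sample counts from the worked example (rows: affluent, average, poor;
# columns: relatively light, average, relatively heavy, serious).
counts = [
    [3, 2, 4, 1],
    [5, 0, 2, 3],
    [2, 5, 1, 0],
]


def influence_value(counts, xs, ys):
    """Absolute correlation between factor and degree of crime (assumed formula)."""
    total = sum(sum(row) for row in counts)
    p = [[c / total for c in row] for row in counts]   # joint distribution
    px = [sum(row) for row in p]                       # marginal of the factor
    py = [sum(col) for col in zip(*p)]                 # marginal of the degree
    ex = sum(x * w for x, w in zip(xs, px))            # E(X_P)
    ey = sum(y * w for y, w in zip(ys, py))            # E(Y)
    cov = sum((x - ex) * (y - ey) * p[q][t]
              for q, x in enumerate(xs) for t, y in enumerate(ys))
    var_x = sum((x - ex) ** 2 * w for x, w in zip(xs, px))
    var_y = sum((y - ey) ** 2 * w for y, w in zip(ys, py))
    if var_x == 0 or var_y == 0:
        # Assumption: a constant variable carries no influence.
        return 0.0
    return abs(cov) / math.sqrt(var_x * var_y)


# Assumed numeric coding of the attribute values.
omega = influence_value(counts, xs=[1, 2, 3], ys=[1, 2, 3, 4])
print(omega)
```

A different numeric coding of the attribute values would change the result, so the coding should be fixed before influence values of different factors are compared.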
As can be seen from formula (2), ω_P takes values in [0, 1], and it measures how closely a crime influence factor and the degree of crime are related.
Screening out from the sample data the set of influence factors whose influence value is greater than or equal to the preset value can be realized as follows:
ω_P is compared with the preset value. When ω_P is less than the preset value, the correlation between the crime influence factor and the degree of crime is small, and the crime influence factor is screened out. When ω_P is greater than or equal to the preset value, the correlation between the corresponding crime influence factor and the degree of crime is large, and the influence factor is added to the influence factor set.
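The comparison against the preset value can be sketched as a simple filter. The function name `screen_factors` and the influence values in the example are hypothetical and are not taken from the patent's data.

```python
def screen_factors(influence_values, preset=0.6):
    """Keep the factors whose influence value is at least the preset value."""
    return [name for name, omega in influence_values.items() if omega >= preset]


# Hypothetical influence values, for illustration only.
influence_values = {
    "family economic condition": 0.72,
    "sex": 0.31,
    "fighting": 0.60,
}
factor_set = screen_factors(influence_values)
print(factor_set)
```

A factor whose influence value equals the preset value is kept, matching the "greater than or equal to" condition in the text.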
The preset value is an influence value set in advance, for example 0.6. The preset value may of course be another number; the embodiments of the present invention do not limit its specific value, and the user can set it according to the specific application scenario.
The influence factors in the influence factor set are those with a large correlation with the degree of crime. The influence factors used below in establishing the crime decision tree are therefore the ones in the influence factor set.
S130, determine the degree of dependency of the degree of crime on each influence factor in the influence factor set, and determine the degree of association between the degree of crime and each influence factor in the influence factor set.
Determining the degree of dependency of the degree of crime on each influence factor in the influence factor set specifically includes: determining the positive region of the degree of crime with respect to each influence factor in the influence factor set; and determining the degree of dependency according to the number of samples in the positive region and the number of samples corresponding to each influence factor in the influence factor set.
The degree of dependency of the degree of crime on each influence factor in the influence factor set can be calculated by formula (3):
$$spt_X(Y) = \frac{\mathrm{Card}\bigl(POS_X(Y)\bigr)}{\mathrm{Card}(U)} \qquad (3)$$
In formula (3), spt_X(Y) denotes the degree of dependency of the degree of crime on an influence factor in the influence factor set; POS_X(Y) denotes the X-positive region of Y, where Y denotes the degree of crime and X denotes the influence factor set; U is the sample data corresponding to any one influence factor in the influence factor set; Card(U) is the number of samples in U; and Card(POS_X(Y)) is the number of elements in the positive region.
Wherein, above-mentioned positive domain and dependency degree will be discussed in detail so that influence factor is as family economical condition as an example below
Specific calculating process:
Family economical condition is X2, the property value of family economical condition includes rich, general and difficulty, from sample data
Choose the data on family economical condition, such as, be U=record 1 (affluence), record 2 (affluences), record 3 (affluences),
4 (general) of record, record 5 (general), record 6 (difficulties) }.
Above-mentioned U includes 6 row data records, wherein property value identical will record as one group, and such one group is designated as
Equivalence class;
Suppose a known equivalence class in Y is M = {record 2, record 3, record 6}.
The equivalence classes in $X_2$ are:
Group 1: E1 = {record 1 (affluent), record 2 (affluent), record 3 (affluent)}
Group 2: E2 = {record 4 (general), record 5 (general)}
Group 3: E3 = {record 6 (difficult)}
The partition of U with respect to $X_2$ is:
U/$X_2$ = {group 1, group 2, group 3} = {{record 1, record 2, record 3}, {record 4, record 5}, {record 6}}
Next, find the groups in U/$X_2$ that are completely contained in M = {record 2, record 3, record 6}. Only group 3, {record 6}, satisfies this condition (group 1, for instance, contains record 1, which is not in M). Therefore the X positive domain of Y is $POS_X(Y)$ = {record 6}.
Card denotes the number of samples in a set. $POS_X(Y)$ = {record 6} contains only one sample and U contains 6 samples, so the dependency degree is $spt_X(Y) = Card(POS_X(Y)) / Card(U) = 1/6$.
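The worked example above can be sketched in a few lines of Python. This is a minimal sketch, assuming the dependency degree is the ratio Card(POS_X(Y)) / Card(U) suggested by the symbol definitions, and following the example in using a single equivalence class M of Y. The function names (`partition_by_value`, `positive_domain`, `dependency_degree`) are hypothetical and do not come from the patent.

```python
from fractions import Fraction


def partition_by_value(records):
    """Group record ids by attribute value (the equivalence classes U/X)."""
    classes = {}
    for record_id, value in records.items():
        classes.setdefault(value, set()).add(record_id)
    return list(classes.values())


def positive_domain(x_classes, y_classes):
    """Union of the X-classes that are completely contained in some Y-class."""
    pos = set()
    for block in x_classes:
        if any(block <= target for target in y_classes):
            pos |= block
    return pos


def dependency_degree(x_classes, y_classes, universe_size):
    """Assumed form of formula (3): Card(POS_X(Y)) / Card(U)."""
    return Fraction(len(positive_domain(x_classes, y_classes)), universe_size)


# Records 1-6 with their family economic condition, as in the example.
U = {1: "affluent", 2: "affluent", 3: "affluent",
     4: "general", 5: "general", 6: "difficult"}
M = {2, 3, 6}  # the equivalence class of Y used in the example

x2_classes = partition_by_value(U)
pos = positive_domain(x2_classes, [M])
spt = dependency_degree(x2_classes, [M], len(U))
print(pos, spt)
```

Using `Fraction` keeps the result as the exact ratio rather than a rounded float, which makes it easy to compare with the hand calculation.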
Determining the degree of association between the extent of crime and each influence factor in the influence factor set specifically includes:
determining the number of samples for each attribute value taken by each influence factor in the influence factor set; determining, within those samples, the number of subsamples belonging to each attribute value of the extent of crime; and determining the degree of association between the extent of crime and each influence factor in the influence factor set according to the subsample numbers.
The degree of association measures how strongly an influence factor is related to the extent of crime: the higher its value, the closer the connection between that influence factor and the extent of crime.
The degree of association can be calculated by formula (4).
In formula (4), $S(Y, X_P)$ is the degree of association between influence factor $X_P$ and the extent of crime; i indexes the attribute values of influence factor $X_P$; j indexes the attribute values of the extent of crime; $f_{i,j}$ is the subsample number described above; $f_{i,m+1}$ is the number of samples for each attribute value taken by the influence factor; and $f_{n+1,j}$ is the number of samples for each attribute value taken by the extent of crime.
The calculation of the degree of association is again illustrated with the family economic condition:
For example, among the samples with an affluent family economic condition, 3 have a lighter extent of crime, 2 a general extent, 4 a heavier extent and 1 a serious extent. Among the samples with a general family economic condition, 5 have a lighter extent of crime, 0 a general extent, 2 a heavier extent and 3 a serious extent. Among the samples with a difficult family economic condition, 2 have a lighter extent of crime, 5 a general extent, 1 a heavier extent and 0 a serious extent, as shown in Table 2.
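Table 2
Family economic condition | Lighter | General | Heavier | Serious |
---|---|---|---|---|
Affluent | 3 | 2 | 4 | 1 |
General | 5 | 0 | 2 | 3 |
Difficult | 2 | 5 | 1 | 0 |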
According to the data in Table 2, the degree of association between the family economic condition and the extent of crime is calculated by formula (4).
The above merely illustrates the calculation process of the degree of association; it does not limit the number of samples corresponding to the family economic condition or the specific value of the degree of association.
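The quantities that formula (4) takes as input can be tabulated directly from the counts given in the example. The sketch below only prepares the cell counts $f_{i,j}$ and the marginal totals $f_{i,m+1}$ and $f_{n+1,j}$; it does not compute the degree of association itself, because formula (4) is not reproduced in this text. The variable names are hypothetical.

```python
# Cell counts f[i][j]: rows are family economic condition values,
# columns are extent-of-crime values, taken from the example above.
crime_levels = ["lighter", "general", "heavier", "serious"]
counts = {
    "affluent":  [3, 2, 4, 1],
    "general":   [5, 0, 2, 3],
    "difficult": [2, 5, 1, 0],
}

# f_{i,m+1}: number of samples for each value of the influence factor.
row_totals = {cond: sum(row) for cond, row in counts.items()}

# f_{n+1,j}: number of samples for each value of the extent of crime.
col_totals = {
    level: sum(row[j] for row in counts.values())
    for j, level in enumerate(crime_levels)
}

total = sum(row_totals.values())
print(row_totals, col_totals, total)
```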
S140: determine the tightness between the extent of crime and each influence factor in the influence factor set according to the dependency degree and the degree of association.
Because the dependency degree and the degree of association both act on the relationship between an influence factor and the extent of crime at the same time, the two parameters need to be merged into a single parameter. The merging follows the principle of parallel resistance.
Determining the tightness between the extent of crime and each influence factor in the influence factor set according to the dependency degree and the degree of association specifically includes:
calculating the product of the dependency degree and the degree of association; calculating the sum of the dependency degree and the degree of association; and determining the tightness between the extent of crime and each influence factor in the influence factor set according to the ratio of the product to the sum.
Specifically, the tightness can be calculated by formula (5):
$$hf(X_P) = \frac{spt_X(Y) \times S(Y, X_P)}{spt_X(Y) + S(Y, X_P)} \qquad (5)$$
In formula (5), $hf(X_P)$ is the tightness between the extent of crime and each influence factor in the influence factor set, $spt_X(Y)$ is the dependency degree of the extent of crime on each influence factor in the influence factor set, and $S(Y, X_P)$ is the degree of association between influence factor $X_P$ and the extent of crime.
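The product-over-sum combination described for the tightness is the same form as the equivalent resistance of two resistors in parallel. A minimal sketch follows; the function name `tightness` and the handling of the case where both inputs are zero are assumptions, since the text does not say what happens then.

```python
def tightness(dependency, association):
    """Combine the two degrees as product / sum (parallel-resistance form)."""
    total = dependency + association
    if total == 0:
        # Assumption: if both degrees are zero, treat the tightness as zero
        # instead of dividing by zero.
        return 0.0
    return dependency * association / total


print(tightness(0.5, 0.5))
```

As with parallel resistors, the combined value is never larger than the smaller of the two inputs, so a factor needs both a reasonable dependency degree and a reasonable degree of association to obtain a high tightness.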
S150: establish the crime decision tree according to the tightness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set.
As shown in Figure 3, establishing the decision tree according to the tightness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set includes steps S310 to S330, as follows.
S310: calculate the information gain of each influence factor in the influence factor set according to the tightness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set;
S320: determine the maximum information gain, and take the influence factor corresponding to the maximum information gain as the root split node of the crime decision tree;
S330: establish the crime decision tree according to the root split node.
The information gain of each influence factor in the influence factor set is calculated from the tightness, the information entropy of the sample data and the information entropy corresponding to each influence factor by formula (6):
$$Gain(X_P, S)_{New} = Gain(X_P, S) \times hf(X_P) = [Info(S) - Info_{X_P}(S)] \times hf(X_P) \qquad (6)$$
In formula (6), $Gain(X_P, S)$ is the information gain calculated by the basic ID3 algorithm; $Gain(X_P, S)_{New}$ is the information gain calculated by the new ID3 algorithm obtained after correction with the tightness; $hf(X_P)$ is the tightness between the extent of crime and each influence factor in the influence factor set; $Info(S)$ is the information entropy of the sample data; and $Info_{X_P}(S)$ is the information entropy corresponding to each influence factor in the influence factor set.
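Formula (6) can be illustrated with a short Python sketch: the ordinary ID3 information gain is computed first and then multiplied by the tightness of the factor. The helper names (`entropy`, `conditional_entropy`, `gain_new`), the toy data and the tightness value 0.25 are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(S): Shannon entropy (base 2) of the class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(values, labels):
    """Info_X(S): entropy of the labels after splitting on one factor."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def gain_new(values, labels, hf):
    """Formula (6): basic ID3 gain multiplied by the tightness hf."""
    return (entropy(labels) - conditional_entropy(values, labels)) * hf


# Toy data: the factor separates the two classes perfectly,
# so the basic gain is 1 bit and the corrected gain equals hf.
factor_values = ["yes", "yes", "no", "no"]
crime_extent = ["serious", "serious", "general", "general"]
print(gain_new(factor_values, crime_extent, hf=0.25))
```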
After the information gain of each influence factor in the influence factor set has been calculated by the above formula, the maximum information gain is selected, and the influence factor corresponding to it is taken as the root split node of the crime decision tree. Then, according to the information gains of the remaining influence factors, the child split nodes of the crime decision tree are determined from the remaining influence factors in the influence factor set, and so on, until the crime decision tree is established.
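The selection procedure just described is the usual recursive ID3 construction with the tightness-weighted gain as the split criterion. The following is a minimal sketch under stated assumptions: the stopping rule (pure node or no factors left, with a majority vote at the leaf), the factor names and the tightness values in `hf` are all hypothetical and are not specified by the text.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_gain(rows, labels, factor, hf):
    """Basic ID3 gain of one factor multiplied by its tightness hf[factor]."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[factor], []).append(label)
    cond = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return (entropy(labels) - cond) * hf[factor]


def build_tree(rows, labels, factors, hf):
    """Return a nested dict {factor: {value: subtree_or_label}}."""
    # Assumed stopping rule: pure node or no factors left -> majority label.
    if len(set(labels)) == 1 or not factors:
        return Counter(labels).most_common(1)[0][0]
    best = max(factors, key=lambda f: weighted_gain(rows, labels, f, hf))
    rest = [f for f in factors if f != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rest, hf
        )
    return {best: branches}


rows = [
    {"single_parent": "yes", "economy": "difficult"},
    {"single_parent": "yes", "economy": "affluent"},
    {"single_parent": "no", "economy": "difficult"},
    {"single_parent": "no", "economy": "affluent"},
]
labels = ["serious", "serious", "general", "general"]
hf = {"single_parent": 0.4, "economy": 0.3}  # hypothetical tightness values

tree = build_tree(rows, labels, ["single_parent", "economy"], hf)
print(tree)
```

In this toy data set only `single_parent` carries information about the label, so it becomes the root split node and both branches end in leaves.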
After the crime decision tree has been established, the factors that lead to a heavier extent of crime can be determined from its output. For example, the output of the crime decision tree is:
1. IF single-parent/divorce status = no AND family economic condition = difficult AND father's education level = junior middle school or below AND frequenting bad places = frequent THEN extent of crime = general;
2. IF single-parent/divorce status = yes AND frequenting bad places = frequent AND fighting = frequent THEN extent of crime = serious;
3. IF single-parent/divorce status = yes AND family economic condition = difficult AND frequenting bad places = frequent AND smoking and drinking = frequent AND fighting = frequent THEN extent of crime = serious;
4. IF single-parent/divorce status = yes AND family economic condition = difficult AND father's education level = illiterate AND school performance = poor AND prior criminal record = once THEN extent of crime = heavier;
5. IF single-parent/divorce status = no AND student conduct = poor AND fighting = general AND prior criminal record = twice THEN extent of crime = serious.
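IF-THEN rules of this kind map naturally onto a list of (conditions, outcome) pairs. The sketch below encodes rules 1 and 2 and applies the first matching rule to a record; the key names and the `classify` function are hypothetical, and returning `None` when no rule matches is an assumption.

```python
# Each rule is (conditions, extent_of_crime); keys are hypothetical names.
rules = [
    ({"single_parent_divorced": "no",
      "family_economic_condition": "difficult",
      "father_education": "junior middle school or below",
      "frequents_bad_places": "frequent"}, "general"),
    ({"single_parent_divorced": "yes",
      "frequents_bad_places": "frequent",
      "fighting": "frequent"}, "serious"),
]


def classify(record, rules):
    """Return the outcome of the first rule whose conditions all hold."""
    for conditions, outcome in rules:
        if all(record.get(key) == value for key, value in conditions.items()):
            return outcome
    return None  # assumption: no matching rule means no prediction


record = {
    "single_parent_divorced": "yes",
    "frequents_bad_places": "frequent",
    "fighting": "frequent",
}
print(classify(record, rules))
```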
According to the above output, the factors that lead to a heavier extent of crime include parental divorce status, family economic condition, parents' education level, frequenting bad places, fighting, prior criminal record and so on. Among these, particular attention should be paid to parental divorce status, family economic condition and parents' education level, which have a large influence on juvenile crime.
To exclude the influence of criminal records, crime sample data of juveniles without a prior criminal record can be collected when gathering the juvenile crime sample data.
Once the key factors leading to juvenile crime have been determined from the above results, judicial workers and relevant departments can adopt strategies to prevent juvenile crime, for example counselling and educating teenagers from single-parent or divorced families.
In the method for establishing a juvenile crime decision tree provided by this embodiment of the present invention, influence factors with a small influence value on the extent of crime are screened out, so that the process of establishing the decision tree is simple and takes less time. In addition, the tightness between the extent of crime and each influence factor is introduced when calculating the information gain, so that the root split node is determined more accurately, which improves the classification accuracy of the resulting decision tree.
Embodiment 2
This embodiment of the present invention provides a device for establishing a juvenile crime decision tree. The device is used to perform the method provided in Embodiment 1. As shown in Figure 4, the device includes an acquisition module 410, a screening module 420, a first determining module 430, a second determining module 440 and an establishing module 450;
the acquisition module 410 is used to obtain sample data of juvenile crime, the sample data including crime influence factors and the extent of crime of juveniles;
the screening module 420 is used to determine the influence value of each crime influence factor on the extent of crime, and to screen out from the sample data an influence factor set whose influence values are greater than or equal to a preset value;
the first determining module 430 is used to determine the dependency degree of the extent of crime on each influence factor in the influence factor set, and to determine the degree of association between the extent of crime and each influence factor in the influence factor set;
the second determining module 440 is used to determine the tightness between the extent of crime and each influence factor in the influence factor set according to the dependency degree and the degree of association;
the establishing module 450 is used to establish the crime decision tree according to the tightness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set.
The screening module 420 determines the influence value of each crime influence factor on the extent of crime by means of a first determining unit, a first computing unit, a second computing unit and a second determining unit, specifically:
the first determining unit is used to determine the joint probability distribution between the attribute values of each crime influence factor and the attribute values of the extent of crime; the first computing unit is used to calculate the covariance between the crime influence factor and the extent of crime according to the joint probability distribution; the second computing unit is used to calculate the variance of the crime influence factor and the variance of the extent of crime according to the joint probability distribution; and the second determining unit is used to determine the influence value of the crime influence factor on the extent of crime according to the covariance, the variance of the crime influence factor and the variance of the extent of crime.
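The covariance and variance steps can be sketched as follows. This passage does not give the formula that combines them, so the sketch assumes the influence value is the Pearson correlation coefficient, covariance divided by the square root of the product of the two variances. It also assumes the attribute values have been encoded as numbers. The function name `influence_value` and the example distributions are hypothetical.

```python
import math


def influence_value(joint):
    """joint maps (x, y) numeric codes to probabilities that sum to 1.

    Assumption: the influence value is cov(X, Y) / sqrt(var(X) * var(Y)).
    """
    ex = sum(p * x for (x, _), p in joint.items())
    ey = sum(p * y for (_, y), p in joint.items())
    cov = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())
    var_x = sum(p * (x - ex) ** 2 for (x, _), p in joint.items())
    var_y = sum(p * (y - ey) ** 2 for (_, y), p in joint.items())
    return cov / math.sqrt(var_x * var_y)


# Factor and extent of crime always move together: correlation 1.
perfect = {(0, 0): 0.5, (1, 1): 0.5}
# Factor and extent of crime are independent: correlation 0.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

print(influence_value(perfect), influence_value(independent))
```

Under this assumption, a factor would be kept in the influence factor set when its value reaches the preset threshold mentioned for the screening module.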
The second determining module 440 determines the tightness between the extent of crime and each influence factor in the influence factor set according to the dependency degree and the degree of association by means of a third computing unit, a fourth computing unit and a third determining unit, specifically:
the third computing unit is used to calculate the product of the dependency degree and the degree of association; the fourth computing unit is used to calculate the sum of the dependency degree and the degree of association; and the third determining unit is used to determine the tightness between the extent of crime and each influence factor in the influence factor set according to the ratio of the product to the sum.
The establishing module 450 establishes the crime decision tree according to the tightness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set by means of a fifth computing unit, a fourth determining unit and an establishing unit, specifically:
the fifth computing unit is used to calculate the information gain of each influence factor in the influence factor set according to the tightness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set; the fourth determining unit is used to determine the maximum information gain and to take the influence factor corresponding to the maximum information gain as the root split node of the crime decision tree; and the establishing unit is used to establish the crime decision tree according to the root split node.
In the device for establishing a juvenile crime decision tree provided by this embodiment of the present invention, influence factors with a small influence value on the extent of crime are screened out, so that the process of establishing the decision tree is simple and takes less time. In addition, the tightness between the extent of crime and each influence factor is introduced when calculating the information gain, so that the root split node is determined more accurately, which improves the classification accuracy of the resulting decision tree.
The device for establishing a juvenile crime decision tree provided by this embodiment of the present invention may be specific hardware on equipment, or software or firmware installed on equipment. The implementation principle and technical effect of the device are the same as those of the foregoing method embodiment. For brevity, where the device embodiment does not mention something, refer to the corresponding content in the foregoing method embodiment. Those skilled in the art will clearly understand that, for convenience and conciseness of description, the specific working processes of the system, device and units described above may refer to the corresponding processes in the foregoing method embodiment and are not repeated here.
In the embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments provided by the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In addition, the terms "first", "second", "third" and so on are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, used to illustrate its technical solution rather than to limit it, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent replacements of some of their technical features. Such modifications, changes or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A method for establishing a juvenile crime decision tree, characterized in that the method comprises:
obtaining sample data of juvenile crime, the sample data including crime influence factors and the extent of crime of juveniles;
determining an influence value of each crime influence factor on the extent of crime, and screening out from the sample data an influence factor set whose influence values are greater than or equal to a preset value;
determining a dependency degree of the extent of crime on each influence factor in the influence factor set, and determining a degree of association between the extent of crime and each influence factor in the influence factor set;
determining a tightness between the extent of crime and each influence factor in the influence factor set according to the dependency degree and the degree of association;
establishing a crime decision tree according to the tightness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set.
2. The method according to claim 1, characterized in that determining the influence value of each crime influence factor on the extent of crime comprises:
determining a joint probability distribution between each crime influence factor and the extent of crime;
calculating a covariance between the crime influence factor and the extent of crime according to the joint probability distribution;
calculating a variance of the crime influence factor and a variance of the extent of crime according to the joint probability distribution;
determining the influence value of the crime influence factor on the extent of crime according to the covariance, the variance of the crime influence factor and the variance of the extent of crime.
3. The method according to claim 1, characterized in that determining the dependency degree of the extent of crime on each influence factor in the influence factor set comprises:
determining a positive domain of the extent of crime with respect to each influence factor in the influence factor set;
determining the dependency degree according to the number of samples in the positive domain and the number of samples corresponding to each influence factor in the influence factor set.
4. The method according to claim 1, characterized in that determining the degree of association between the extent of crime and each influence factor in the influence factor set comprises:
determining the number of samples for each attribute value taken by each influence factor in the influence factor set;
determining, within the number of samples, the number of subsamples belonging to each attribute value of the extent of crime;
determining the degree of association between the extent of crime and each influence factor in the influence factor set according to the number of subsamples.
5. The method according to claim 1, characterized in that determining the tightness between the extent of crime and each influence factor in the influence factor set according to the dependency degree and the degree of association comprises:
calculating a product of the dependency degree and the degree of association;
calculating a sum of the dependency degree and the degree of association;
determining the tightness between the extent of crime and each influence factor in the influence factor set according to the ratio of the product to the sum.
6. The method according to claim 1, characterized in that establishing the crime decision tree according to the tightness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set comprises:
calculating an information gain of each influence factor in the influence factor set according to the tightness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set;
determining the maximum information gain, and taking the influence factor corresponding to the maximum information gain as a root split node of the crime decision tree;
establishing the crime decision tree according to the root split node.
7. A device for establishing a juvenile crime decision tree, characterized in that the device comprises:
an acquisition module, used to obtain sample data of juvenile crime, the sample data including crime influence factors and the extent of crime of juveniles;
a screening module, used to determine an influence value of each crime influence factor on the extent of crime, and to screen out from the sample data an influence factor set whose influence values are greater than or equal to a preset value;
a first determining module, used to determine a dependency degree of the extent of crime on each influence factor in the influence factor set, and to determine a degree of association between the extent of crime and each influence factor in the influence factor set;
a second determining module, used to determine a tightness between the extent of crime and each influence factor in the influence factor set according to the dependency degree and the degree of association;
an establishing module, used to establish a crime decision tree according to the tightness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set.
8. The device according to claim 7, characterized in that the screening module comprises:
a first determining unit, used to determine a joint probability distribution between the attribute values of each crime influence factor and the attribute values of the extent of crime;
a first computing unit, used to calculate a covariance between the crime influence factor and the extent of crime according to the joint probability distribution;
a second computing unit, used to calculate a variance of the crime influence factor and a variance of the extent of crime according to the joint probability distribution;
a second determining unit, used to determine the influence value of the crime influence factor on the extent of crime according to the covariance, the variance of the crime influence factor and the variance of the extent of crime.
9. The device according to claim 7, characterized in that the second determining module comprises:
a third computing unit, used to calculate a product of the dependency degree and the degree of association;
a fourth computing unit, used to calculate a sum of the dependency degree and the degree of association;
a third determining unit, used to determine the tightness between the extent of crime and each influence factor in the influence factor set according to the ratio of the product to the sum.
10. The device according to claim 7, characterized in that the establishing module comprises:
a fifth computing unit, used to calculate an information gain of each influence factor in the influence factor set according to the tightness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set;
a fourth determining unit, used to determine the maximum information gain, and to take the influence factor corresponding to the maximum information gain as a root split node of the crime decision tree;
an establishing unit, used to establish the crime decision tree according to the root split node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611208475.5A CN106780258B (en) | 2016-12-23 | 2016-12-23 | Method and device for establishing juvenile crime decision tree |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106780258A true CN106780258A (en) | 2017-05-31 |
CN106780258B CN106780258B (en) | 2020-08-14 |
Family
ID=58920145
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611208475.5A Active CN106780258B (en) | 2016-12-23 | 2016-12-23 | Method and device for establishing juvenile crime decision tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106780258B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109657840A (en) * | 2018-11-22 | 2019-04-19 | 东软集团股份有限公司 | Decision tree generation method, device, computer readable storage medium and electronic equipment |
CN110298544A (en) * | 2019-05-23 | 2019-10-01 | 南京柳橙信息科技有限公司 | A kind of minor crime's investigation and assessment overall analysis system |
CN114090601A (en) * | 2021-11-23 | 2022-02-25 | 北京百度网讯科技有限公司 | Data screening method, device, equipment and storage medium |
CN114090601B (en) * | 2021-11-23 | 2023-11-03 | 北京百度网讯科技有限公司 | Data screening method, device, equipment and storage medium |
CN117172381A (en) * | 2023-09-05 | 2023-12-05 | 薪海科技(上海)有限公司 | Risk prediction method based on big data |
Also Published As
Publication number | Publication date |
---|---|
CN106780258B (en) | 2020-08-14 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
PP01 | Preservation of patent right | Effective date of registration: 20220726; Granted publication date: 20200814 |