CN106780258B - Method and device for establishing juvenile crime decision tree
- Publication number
- CN106780258B (application CN201611208475.5A)
- Authority
- CN
- China
- Prior art keywords
- crime
- degree
- influence
- influence factor
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
- G06Q50/265—Personal security, identity or safety
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Strategic Management (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Educational Administration (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Development Economics (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Alarm Systems (AREA)
Abstract
The invention provides a method and a device for establishing a juvenile crime decision tree. The method comprises: acquiring sample data of minor crimes; determining the influence value of each crime influence factor on the crime degree, and screening out from the sample data an influence factor set whose influence values are greater than or equal to a preset value; determining the degree of dependence of the crime degree on each influence factor in the influence factor set, and the degree of association between the crime degree and each influence factor in the influence factor set; determining the closeness between the crime degree and each influence factor in the influence factor set according to the degree of dependence and the degree of association; and establishing a crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy of each influence factor in the influence factor set. Because influence factors with small influence values on the crime degree are screened out, establishing the decision tree is simpler and takes less time; and because the closeness between the crime degree and each influence factor is introduced as a parameter, the classification precision of the established decision tree is improved.
Description
Technical Field
The invention relates to the technical field of big data analysis, in particular to a method and a device for establishing a juvenile crime decision tree.
Background
As the internet, technology and new cultural trends increasingly shape daily life, the crime rate among minors has been rising year by year. Minors are the future builders of China, and crime committed by minors not only harms social stability but also damages their healthy growth. It is therefore important to study the criminal behavior of minors and to analyze the factors that influence it, so as to prevent such crime.
In the prior art, the ID3 algorithm is generally used to establish a decision tree model of the factors influencing minor crime. Many factors influence minor crime, some strongly and some only weakly, yet the prior art takes all of them into consideration when establishing the decision tree, which makes the establishment process complex and time-consuming. In addition, the ID3 algorithm has a multi-value bias problem (it favors attributes with many values), so the information gain it calculates for the influence factors is inaccurate, which in turn reduces the classification accuracy of the established decision tree.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method and an apparatus for establishing a minor crime decision tree, so as to solve two problems in the prior art: the establishment process of the decision tree is complicated and time-consuming because the decision tree is established by directly applying the ID3 algorithm to all factors, and the classification accuracy of the decision tree is low because of the multi-value bias problem of the ID3 algorithm.
In a first aspect, an embodiment of the present invention provides a method for establishing a juvenile crime decision tree, where the method includes:
acquiring sample data of a minor crime, wherein the sample data comprises crime influence factors and crime degree of the minor;
determining an influence value of each crime influence factor on the crime degree, and screening out an influence factor set of which the influence value is greater than or equal to a preset value from the sample data;
determining the degree of dependence of the crime degree on each influence factor in the set of influence factors, and determining the degree of association between the crime degree and each influence factor in the set of influence factors;
determining the closeness between the crime degree and each influence factor in the influence factor set according to the degree of dependence and the degree of association;
and establishing a crime decision tree according to the closeness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the determining an influence value of each crime influencing factor on the crime degree includes:
determining a joint probability distribution between the attribute value of each of the crime affecting factors and the attribute value of the crime degree;
calculating a covariance between the crime influencing factor and the crime degree according to the joint probability distribution;
calculating the variance of the crime influencing factors and the variance of the crime degree according to the joint probability distribution;
and determining the influence value of the crime influence factor on the crime degree according to the covariance, the variance of the crime influence factor and the variance of the crime degree.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the determining the dependency of the crime degree on each influence factor in the set of influence factors includes:
determining a positive region of the crime degree with respect to each influencing factor in the set of influencing factors;
and determining the dependency according to the number of samples in the positive region and the number of samples corresponding to each influence factor in the influence factor set.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the determining a degree of association between the crime degree and each of the influencing factors in the influencing factor set includes:
determining the number of samples corresponding to each influence factor in the influence factor set when the influence factor takes different attribute values;
determining the number of subsamples of each attribute value corresponding to the crime degree in the number of samples;
and determining the degree of association between the crime degree and each influence factor in the influence factor set according to the number of the subsamples.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the determining, according to the dependency and the association, a closeness between the crime degree and each influencing factor in the set of influencing factors includes:
calculating a product of the dependency and the association;
calculating a sum of the dependency and the association;
and determining the closeness between the crime degree and each influence factor in the influence factor set according to the ratio between the product and the sum.
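The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the patented formula: the text only states that the closeness is determined "according to the ratio" of the product to the sum, so any additional scaling constant is unknown, and the function name `closeness` and the zero-sum guard are assumptions made for this example.

```python
def closeness(dependency: float, association: float) -> float:
    """Ratio of the product to the sum of dependency and association.

    Assumption: the bare ratio is used with no extra scaling constant.
    """
    total = dependency + association
    if total == 0:
        # Assumption: treat a factor with zero dependency and zero
        # association as having zero closeness, avoiding division by zero.
        return 0.0
    return (dependency * association) / total


# Illustrative inputs only; these are not values from the sample data.
example = closeness(0.5, 0.5)
```

With this form, a factor scores high only when both its dependency and its association are high, which matches the stated purpose of combining the two measures.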
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where the establishing a crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each of the influencing factors in the influencing factor set includes:
calculating the information gain of each influence factor in the influence factor set according to the closeness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set;
determining the maximum information gain, and determining the influence factor corresponding to the maximum information gain as a root splitting node of a crime decision tree;
and establishing the crime decision tree according to the root splitting node.
In a second aspect, an embodiment of the present invention provides an apparatus for building a minor crime decision tree, where the apparatus includes:
an acquisition module, configured to acquire sample data of minor crimes, wherein the sample data comprises crime influence factors and crime degrees of minors;
the screening module is used for determining the influence value of each crime influence factor on the crime degree and screening out the influence factor set of which the influence value is greater than or equal to a preset value from the sample data;
a first determining module, configured to determine a degree of dependence of the crime degree on each influencing factor in the influencing factor set, and determine a degree of association between the crime degree and each influencing factor in the influencing factor set;
a second determining module, configured to determine closeness between the crime degree and each influence factor in the set of influence factors according to the dependency degree and the association degree;
and an establishing module, configured to establish a crime decision tree according to the closeness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation manner of the second aspect, where the screening module includes:
a first determination unit for determining a joint probability distribution between an attribute value of each of the crime influencing factors and an attribute value of the crime degree;
a first calculation unit for calculating a covariance between the crime influencing factor and the crime degree from the joint probability distribution;
a second calculation unit for calculating a variance of the crime influencing factor and a variance of the crime degree from the joint probability distribution;
and the second determining unit is used for determining the influence value of the crime influence factor on the crime degree according to the covariance, the variance of the crime influence factor and the variance of the crime degree.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation manner of the second aspect, where the second determining module includes:
a third calculation unit configured to calculate a product between the dependency and the association;
a fourth calculation unit configured to calculate a sum value between the dependency and the association;
a third determining unit, configured to determine the closeness between the crime degree and each influence factor in the set of influence factors according to a ratio between the product and the sum.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation manner of the second aspect, where the establishing module includes:
a fifth calculating unit, configured to calculate, according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the set of influence factors, an information gain of each influence factor in the set of influence factors;
a fourth determining unit, configured to determine a maximum information gain, and determine an influence factor corresponding to the maximum information gain as a root split node of the crime decision tree;
and the establishing unit is used for establishing the crime decision tree according to the root splitting node.
The invention provides a method and a device for establishing a minor crime decision tree, which screen out some influence factors with small influence values on crime degree, so that the establishment process of the decision tree is simple and the time consumption is short.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart illustrating a method for building a minor crime decision tree according to embodiment 1 of the present invention;
Fig. 2 is a flowchart illustrating a method for determining an influence value of a crime influencing factor on a crime degree in a method for establishing a minor crime decision tree according to embodiment 1 of the present invention;
Fig. 3 is a flowchart showing a crime decision tree building process in the minor crime decision tree building method provided in embodiment 1 of the present invention;
Fig. 4 is a schematic structural diagram illustrating an apparatus for creating a minor crime decision tree according to embodiment 2 of the present invention.
Reference numerals in Fig. 4:
410, acquisition module; 420, screening module; 430, first determining module; 440, second determining module; 450, establishing module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
In the prior art, analyzing the factors that influence minor crime requires establishing a decision tree, and most approaches use the ID3 algorithm to establish the decision tree model of these factors. Many factors influence minor crime, some strongly and some only weakly, yet the prior art takes all of them into consideration when establishing the decision tree, so the establishment process is complex and time-consuming. Moreover, the ID3 algorithm has a multi-value bias problem, so the information gain of the influence factors calculated by the ID3 algorithm is inaccurate and the classification accuracy of the established decision tree is reduced. Based on this, the embodiments of the present invention provide a method and an apparatus for establishing a juvenile crime decision tree, which are described below by way of embodiments.
Example 1
An embodiment of the present invention provides a method for establishing a minor crime decision tree, where as shown in fig. 1, the method includes steps S110 to S150, which are specifically described as follows.
S110, sample data of the minor crimes are obtained, and the sample data comprises minor crime influence factors and crime degrees.
Before the minor crime decision tree is established with the method provided by the embodiment of the present invention, sample data of minor crimes must first be acquired. That is, for a plurality of minor offenders, the crime degree and the corresponding personal influence factors are collected; for example, the crime degree, sex, age, family conditions, school conditions and other data of a minor offender are recorded and stored.
Wherein, the crime degree comprises four attribute values of light, general, heavy and serious.
The factors influencing minor crime may include sex, age, individual separation condition, parents' workplace, relationship with parents and teachers, only-child status, family location, family economic condition, the individual's education level, the father's education level, the mother's education level, performance at school, frequenting of undesirable venues, drinking and smoking, fighting, and prior record.
Each of the above influencing factors corresponds to a plurality of attribute values. For example, the attribute values of sex include male and female; the attribute values of family economic condition include affluent, general and difficult; the attribute values of the father's education level include illiterate, junior middle school and below, high school, and university and above; the attribute values of the mother's education level include illiterate, junior middle school and below, high school, and university and above; the attribute values of performance at school include excellent, good, medium and poor; the attribute values of frequenting undesirable venues include frequent, general, rare and none; the attribute values of drinking and smoking include regular, general, occasional and none; the attribute values of fighting include frequent, general, rare and none; and the attribute values of prior record include once, twice and multiple times.
The factors that may influence minor crime and the attribute values of each factor are not limited to those listed; the above description is only an example.
And S120, determining the influence value of each crime influence factor on the crime degree, and screening out an influence factor set with the influence value larger than or equal to a preset value from the sample data.
As shown in Fig. 2, determining the influence value of each crime influencing factor on the crime degree includes steps S210 to S240, which are specifically as follows:
S210, determining the joint probability distribution between each crime influence factor and the crime degree;
S220, calculating the covariance between the crime influencing factor and the crime degree according to the joint probability distribution;
S230, calculating the variance of the crime influencing factor and the variance of the crime degree according to the joint probability distribution;
S240, determining the influence value of the crime influencing factor on the crime degree according to the covariance, the variance of the crime influencing factor and the variance of the crime degree.
The covariance between the crime influencing factor and the crime degree can be calculated by formula (1):
$$\mathrm{Cov}(X_P, Y) = \sum_{q=1}^{k} \sum_{t=1}^{m} \bigl(x_q - E(X_P)\bigr)\bigl(y_t - E(Y)\bigr)\, p_{qt} \qquad (1)$$
where, in formula (1), X_P represents a crime influencing factor, P = 1, 2, 3, …; x_q represents an attribute value of the crime influencing factor, q = 1, 2, …, k; Y represents the crime degree; y_t represents an attribute value of the crime degree, t = 1, 2, 3, 4; p_{qt} is the joint probability of the crime influencing factor taking the value x_q and the crime degree taking the value y_t; E(X_P) is the expectation of the influencing factor; E(Y) is the expectation of the crime degree; k is the number of attribute values of the crime influencing factor; and m is the number of attribute values of the crime degree.
For example, if the crime influencing factor is the family economic condition, whose attribute values are affluent, general and difficult, then the number of attribute values is 3, that is, k is 3. The attribute values of the crime degree are light, general, heavy and serious, so m is 4. Of course, the attribute values of the crime degree and of the crime influencing factor are not limited to these.
The influence value is an index of how closely the crime influence factor is related to the crime degree. The specific process of determining the influence value of the family economic condition on the crime degree is described in detail below as an example.
For example, when P is 2, X_2 represents the family economic condition and q takes the values 1, 2 and 3: x_1 denotes affluent, x_2 denotes general, and x_3 denotes difficult. y_t is an attribute value of the crime degree, where y_1 denotes light, y_2 denotes general, y_3 denotes heavy, and y_4 denotes serious.
From the sample data, the number of samples can be determined for each combination of family economic condition (affluent, general, difficult) and crime degree (light, general, heavy, serious), giving twelve counts in total.
From these counts, the joint probability distribution between the family economic condition and the crime degree can be determined. For example, suppose the counts are as follows:
affluent: light 3, general 2, heavy 4, serious 1;
general: light 5, general 0, heavy 2, serious 3;
difficult: light 2, general 5, heavy 1, serious 0.
The joint probability distribution between the economic condition of the home and the crime degree determined by the above data is shown in table 1.
When the joint probability distribution is determined, the covariance between the home economic condition and the crime degree can be determined by equation (1).
The variance of the family economic condition and the variance of the crime degree are likewise determined from the joint probability distribution in Table 1.
TABLE 1
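The joint distribution, covariance and variances for the family economic condition example can be sketched as follows. The counts are those of the worked example; dividing each count by the total number of samples is the usual empirical estimate of the joint probability. The integer coding of the attribute values (1, 2, 3 for the factor and 1, 2, 3, 4 for the crime degree) is an assumption made for illustration, since the text does not state how categorical values are mapped to numbers, and all function names are hypothetical.

```python
# Rows: family economic condition (affluent, general, difficult).
# Columns: crime degree (light, general, heavy, serious).
counts = [
    [3, 2, 4, 1],
    [5, 0, 2, 3],
    [2, 5, 1, 0],
]

# Assumption: attribute values are coded as consecutive integers.
x_values = [1, 2, 3]
y_values = [1, 2, 3, 4]


def joint_distribution(counts):
    """Turn a table of sample counts into joint probabilities p[q][t]."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def marginals(p):
    """Return the marginal distributions of the factor and the crime degree."""
    px = [sum(row) for row in p]
    py = [sum(row[t] for row in p) for t in range(len(p[0]))]
    return px, py


def expectation(values, probs):
    return sum(v * w for v, w in zip(values, probs))


def variance(values, probs):
    mean = expectation(values, probs)
    return sum((v - mean) ** 2 * w for v, w in zip(values, probs))


def covariance(p, x_values, y_values):
    """Double sum over q and t of (x_q - E(X)) (y_t - E(Y)) p_qt."""
    px, py = marginals(p)
    ex = expectation(x_values, px)
    ey = expectation(y_values, py)
    return sum(
        (x - ex) * (y - ey) * p[q][t]
        for q, x in enumerate(x_values)
        for t, y in enumerate(y_values)
    )


p = joint_distribution(counts)
px, py = marginals(p)
cov_xy = covariance(p, x_values, y_values)
var_x = variance(x_values, px)
var_y = variance(y_values, py)
```

Under a different numeric coding of the attribute values, the covariance and variances would change, so the coding should be fixed once and applied to every factor.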
The influence value of the crime influence factor on the crime degree is calculated by formula (2) from the covariance, the variance of the crime influence factor and the variance of the crime degree;
where, in formula (2), ω_P is the influence value of the crime influencing factor on the crime degree, Cov(X_P, Y) is the covariance between the crime influencing factor and the crime degree, and the two remaining terms are the variance of the crime influencing factor and the variance of the crime degree.
In the above, the variance of the home economic condition, the variance of the crime degree, and the covariance between the home economic condition and the crime degree are calculated, and thus, the influence value of the home economic condition on the crime degree can be calculated by the formula (2).
As can be seen from formula (2), ω_P takes values in [0, 1], and ω_P is a measure of how closely the crime influencing factor is related to the crime degree.
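One quantity that is built from the covariance and the two variances and that lies in [0, 1] is the absolute value of the Pearson correlation coefficient. The sketch below uses that form as an assumption; the exact expression of formula (2) may differ (for instance, a squared correlation would also lie in [0, 1]). The integer coding of attribute values and the function name `influence_value` are likewise assumptions for illustration.

```python
import math


def influence_value(p, x_values, y_values):
    """Absolute correlation between a factor X and the crime degree Y.

    p[q][t] is the joint probability of X = x_values[q] and Y = y_values[t].
    Assumption: formula (2) is taken to be |Cov(X, Y)| / sqrt(Var(X) Var(Y)).
    """
    px = [sum(row) for row in p]
    py = [sum(row[t] for row in p) for t in range(len(y_values))]
    ex = sum(x * w for x, w in zip(x_values, px))
    ey = sum(y * w for y, w in zip(y_values, py))
    var_x = sum((x - ex) ** 2 * w for x, w in zip(x_values, px))
    var_y = sum((y - ey) ** 2 * w for y, w in zip(y_values, py))
    cov = sum(
        (x - ex) * (y - ey) * p[q][t]
        for q, x in enumerate(x_values)
        for t, y in enumerate(y_values)
    )
    if var_x == 0 or var_y == 0:
        # A constant variable has no measurable linear relation.
        return 0.0
    return abs(cov) / math.sqrt(var_x * var_y)


# Counts from the family economic condition example
# (rows: affluent, general, difficult; columns: light, general, heavy, serious).
counts = [[3, 2, 4, 1], [5, 0, 2, 3], [2, 5, 1, 0]]
total = sum(map(sum, counts))
p = [[c / total for c in row] for row in counts]

# Assumption: attribute values coded as consecutive integers.
omega = influence_value(p, [1, 2, 3], [1, 2, 3, 4])
```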
Screening out from the sample data the influence factor set whose influence values are greater than or equal to the preset value can be realized by the following process:
ω_P is compared with the preset value. When ω_P is smaller than the preset value, the degree of correlation between the crime influence factor and the crime degree is small, and the crime influence factor is screened out. When ω_P is greater than or equal to the preset value, the degree of correlation between the corresponding crime influence factor and the crime degree is large, and the influence factor is added to the influence factor set.
The preset value is a predetermined influence value, for example 0.6; of course, the preset value may be another value.
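The screening rule can be sketched as a simple threshold filter. The factor names and influence values below are made up purely to exercise the comparison and are not results from the sample data; the function name `screen_factors` is hypothetical.

```python
def screen_factors(influence_values, preset_value=0.6):
    """Keep the factors whose influence value is >= the preset value."""
    return [
        name
        for name, omega in influence_values.items()
        if omega >= preset_value
    ]


# Hypothetical influence values, for illustration only.
influence_values = {
    "factor_a": 0.72,
    "factor_b": 0.31,
    "factor_c": 0.60,
}

influence_factor_set = screen_factors(influence_values)
```

Note that a factor whose influence value exactly equals the preset value is kept, matching the "greater than or equal to" condition.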
The above influencing factors in the crime influencing factor set are all influencing factors with a large degree of correlation with the crime degree, and therefore, in the following process of establishing the crime decision tree, all the adopted influencing factors are influencing factors in the crime influencing factor set.
S130, determining the degree of dependence of the crime degree on each influence factor in the influence factor set, and determining the degree of association between the crime degree and each influence factor in the influence factor set.
Determining the dependency of the crime degree on each influence factor in the influence factor set specifically includes: determining a positive region of the crime degree with respect to each influencing factor in the set of influencing factors; and determining the dependency according to the number of samples in the positive region and the number of samples corresponding to each influence factor in the influence factor set.
The dependence of the crime degree on each influence factor in the influence factor set can be calculated by formula (3):
$$\mathrm{spt}_X(Y) = \frac{\mathrm{Card}\bigl(\mathrm{POS}_X(Y)\bigr)}{\mathrm{Card}(U)} \qquad (3)$$
where, in formula (3), spt_X(Y) is the degree of dependence of the crime degree on an influence factor in the influence factor set, POS_X(Y) is the X positive region of Y, Y is the crime degree, X is the influence factor set, U is the subsample data corresponding to any influence factor in the influence factor set, Card(U) is the number of records in U, and Card(POS_X(Y)) is the number of elements in the positive region.
The calculation of the positive region and the dependency is described in detail below, taking the family economic condition as the influence factor:
The family economic condition is X_2, and its attribute values are affluent, general and difficult. Data about the family economic condition is selected from the sample data, for example, U = {record 1 (affluent), record 2 (affluent), record 3 (affluent), record 4 (general), record 5 (general), record 6 (difficult)}.
U contains 6 data records. Records with the same attribute value are placed in one group, and each such group is called an equivalence class.
The known equivalence class in Y is: M = {record 2, record 3, record 6}
The equivalence classes in X_2 are:
Group 1: E_1 = {record 1 (affluent), record 2 (affluent), record 3 (affluent)}
Group 2: E_2 = {record 4 (general), record 5 (general)}
Group 3: E_3 = {record 6 (difficult)}
The partition of U with respect to X_2 is:
U/X_2 = {group 1, group 2, group 3} = {{record 1, record 2, record 3}, {record 4, record 5}, {record 6}}
Next, find the groups of U/X_2 that are entirely contained in M = {record 2, record 3, record 6}. Only group 3, {record 6}, is fully contained in M, so the X positive region of Y is POS_X(Y) = {record 6}.
Card denotes the number of samples in a set. POS_X(Y) contains only one sample, record 6, and U contains 6 records, so the dependency in this case is spt_X(Y) = Card(POS_X(Y)) / Card(U) = 1/6.
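The worked example can be reproduced with a short sketch. It follows the text in using a single decision class M and in taking the dependency as the number of records in the positive region divided by the number of records in U; the helper names `partition` and `positive_region` and the key `econ` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def partition(records, key):
    """Group record ids into equivalence classes by the value of one attribute."""
    groups = defaultdict(set)
    for rec_id, attrs in records.items():
        groups[attrs[key]].add(rec_id)
    return list(groups.values())

def positive_region(condition_classes, decision_class):
    """Union of the equivalence classes that are fully contained in decision_class."""
    pos = set()
    for eq_class in condition_classes:
        if eq_class <= decision_class:
            pos |= eq_class
    return pos

# The six records of the example, keyed by record number.
records = {
    1: {"econ": "affluent"}, 2: {"econ": "affluent"}, 3: {"econ": "affluent"},
    4: {"econ": "average"}, 5: {"econ": "average"},
    6: {"econ": "difficult"},
}
M = {2, 3, 6}  # known equivalence class in Y

classes = partition(records, "econ")
pos = positive_region(classes, M)
dependency = Fraction(len(pos), len(records))
print(pos, dependency)
```

Groups 1 and 2 are rejected because record 1 and records 4 and 5 lie outside M, which is why only record 6 survives.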
Determining the degree of association between the crime degree and each influence factor in the influence factor set specifically includes:
determining the number of samples corresponding to each attribute value taken by each influence factor in the influence factor set; determining, within that number of samples, the number of subsamples corresponding to each attribute value of the crime degree; and determining the degree of association between the crime degree and each influence factor in the influence factor set according to the number of subsamples.
The association degree measures how closely a given influence factor is related to the crime degree: the higher its value, the closer the relationship between the influence factor and the crime degree.
The above-mentioned degree of association can be calculated by the formula (4),
where, in formula (4), S(Y, X_P) is the degree of association between the influence factor X_P and the crime degree, i is an attribute value of the influence factor X_P, j is an attribute value of the crime degree, f_{i,j} is the number of subsamples described above, f_{i,m+1} is the number of samples corresponding to each attribute value taken by the influence factor, and f_{n+1,j} is the number of samples corresponding to each attribute value taken by the crime degree.
The specific calculation process of the above-mentioned association degree is still described below by taking the family economic condition as an example:
For example, among samples with an affluent family economic condition, 3 have a light crime degree, 2 a general crime degree, 4 a heavy crime degree, and 1 a serious crime degree; among samples with an average family economic condition, 5 have a light crime degree, 0 a general crime degree, 2 a heavy crime degree, and 3 a serious crime degree; among samples with a difficult family economic condition, 2 have a light crime degree, 5 a general crime degree, 1 a heavy crime degree, and 0 a serious crime degree, as shown in Table 2.
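TABLE 2
Family economic condition | Light | General | Heavy | Serious |
---|---|---|---|---|
Affluent | 3 | 2 | 4 | 1 |
Average | 5 | 0 | 2 | 3 |
Difficult | 2 | 5 | 1 | 0 |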
From the data in table 2, it can be calculated by formula (4) that the correlation between the household economic condition and the crime degree is:
The above is only an example of the specific calculation process of the association degree; it does not limit the number of samples corresponding to the family economic condition or the specific value of the association degree.
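The counts that feed the association degree can be tallied as in the sketch below. It stops at the tallies and does not evaluate formula (4). Reading the row totals as f_{i,m+1} and the column totals as f_{n+1,j} is an interpretation of the symbol definitions above, and the four crime-degree labels (light, general, heavy, serious) are an assumed reading of the example.

```python
# Subsample counts f[i][j]: family economic condition i versus crime degree j,
# as read from the family-economic-condition example (labels are assumptions).
levels = ["light", "general", "heavy", "serious"]
counts = {
    "affluent":  {"light": 3, "general": 2, "heavy": 4, "serious": 1},
    "average":   {"light": 5, "general": 0, "heavy": 2, "serious": 3},
    "difficult": {"light": 2, "general": 5, "heavy": 1, "serious": 0},
}

# Number of samples for each attribute value of the influence factor
# (interpreted here as f_{i,m+1}).
row_totals = {i: sum(row.values()) for i, row in counts.items()}

# Number of samples for each attribute value of the crime degree
# (interpreted here as f_{n+1,j}).
col_totals = {j: sum(counts[i][j] for i in counts) for j in levels}

total = sum(row_totals.values())
print(row_totals, col_totals, total)
```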
And S140, determining the closeness between the crime degree and each influence factor in the influence factor set according to the dependency degree and the relevance degree.
Since the dependency and the association degree both act on the correlation between an influence factor and the crime degree, the two parameters need to be combined into one. The combination follows the same principle as the formula for resistors in parallel.
Determining the closeness between the crime degree and each influence factor in the influence factor set according to the dependency and the association degree specifically includes:
calculating the product of the dependency and the association degree; calculating the sum of the dependency and the association degree; and determining the closeness between the crime degree and each influence factor in the influence factor set according to the ratio of the product to the sum.
Specifically, the closeness can be calculated by formula (5):
$$hf(X_P) = \frac{\mathrm{spt}_X(Y) \times S(Y, X_P)}{\mathrm{spt}_X(Y) + S(Y, X_P)} \tag{5}$$
where, in formula (5), hf(X_P) is the closeness between the crime degree and each influence factor in the influence factor set, spt_X(Y) is the dependence of the crime degree on each influence factor in the influence factor set, and S(Y, X_P) is the degree of association between the influence factor X_P and the crime degree.
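The combination step can be sketched as a small function that takes the ratio of the product to the sum, as described above. The association value 0.5 used in the example is hypothetical, and returning 0.0 when both inputs are zero is an assumption made here to avoid division by zero; the text does not address that case.

```python
import math

def closeness(dependency, association):
    """Ratio of the product to the sum, as for two resistors in parallel."""
    total = dependency + association
    if total == 0:
        return 0.0  # assumption: the text does not define this edge case
    return dependency * association / total

# Dependency 1/6 from the family-economic-condition example,
# combined with a hypothetical association degree of 0.5.
hf = closeness(1 / 6, 0.5)
print(hf)
```

As with parallel resistors, the result is never larger than the smaller of the two inputs, so a factor scores high only when both the dependency and the association degree are high.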
S150, establishing a crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set.
As shown in FIG. 3, establishing the crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set includes steps S310 to S330, as follows.
S310, calculating the information gain of each influence factor in the influence factor set according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set;
S320, determining the maximum information gain, and determining the influence factor corresponding to the maximum information gain as the root splitting node of the crime decision tree;
S330, establishing the crime decision tree according to the root splitting node.
The information gain of each influence factor in the influence factor set is calculated from the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set by formula (6):
$$\mathrm{Gain}(X_P, S)_{\mathrm{New}} = \mathrm{Gain}(X_P, S) \times hf(X_P) = [\mathrm{Info}(S) - \mathrm{Info}_{X_P}(S)] \times hf(X_P) \tag{6}$$
where, in formula (6), Gain(X_P, S) is the information gain calculated by the basic ID3 algorithm, Gain(X_P, S)_New is the information gain calculated by the new ID3 algorithm obtained after the closeness correction, hf(X_P) is the closeness described above, Info(S) is the information entropy of the sample data, and Info_{X_P}(S) is the information entropy corresponding to each influence factor in the influence factor set.
After the information gain of each influence factor in the influence factor set is calculated by the above formula, the maximum information gain is selected, and the influence factor corresponding to it is determined as the root splitting node of the crime decision tree. The sub-splitting nodes are then determined from the remaining influence factors in the influence factor set according to their information gains, and the crime decision tree is built up in the same way.
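The root selection in steps S310 and S320 can be sketched as follows: the ordinary ID3 gain, Info(S) minus the information entropy after splitting on a factor, is multiplied by that factor's closeness, and the factor with the largest weighted gain becomes the root. The toy samples, the factor names `econ` and `fights`, and the closeness values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Information entropy Info(S) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(values, labels):
    """Weighted entropy of the labels after splitting on one factor's values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def weighted_gain(values, labels, hf):
    """Basic ID3 gain multiplied by the closeness hf of the factor."""
    return (entropy(labels) - conditional_entropy(values, labels)) * hf

def pick_root(samples, labels, closeness_by_factor):
    """Return the factor with the largest weighted gain, plus all gains."""
    gains = {
        factor: weighted_gain([s[factor] for s in samples], labels, hf)
        for factor, hf in closeness_by_factor.items()
    }
    return max(gains, key=gains.get), gains

# Hypothetical toy data: "fights" separates the labels perfectly, "econ" does not.
samples = [
    {"econ": "difficult", "fights": "frequent"},
    {"econ": "difficult", "fights": "rare"},
    {"econ": "affluent", "fights": "frequent"},
    {"econ": "affluent", "fights": "rare"},
]
labels = ["serious", "general", "serious", "general"]
closeness_by_factor = {"econ": 0.3, "fights": 0.2}  # hypothetical hf values

root, gains = pick_root(samples, labels, closeness_by_factor)
print(root, gains)
```

The sub-splitting nodes would be chosen by applying the same selection to each branch with the remaining factors, which this sketch leaves out.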
After the crime decision tree is established, factors causing a serious crime degree can be determined according to an output result of the crime decision tree, for example, the output result of the crime decision tree is as follows:
1. IF single-parent or divorced family status = no AND family economic condition = difficult AND father's education level = junior middle school or below AND frequenting of undesirable places = frequent THEN crime degree = general;
2. IF single-parent or divorced family status = yes AND frequenting of undesirable places = frequent AND fighting = frequent THEN crime degree = serious;
3. IF single-parent or divorced family status = yes AND family economic condition = difficult AND frequenting of undesirable places = frequent AND drinking and smoking = frequent AND fighting = frequent THEN crime degree = serious;
4. IF single-parent or divorced family status = yes AND family economic condition = difficult AND father's education level = illiterate AND school performance = poor AND prior criminal record = yes THEN crime degree = heavy;
5. IF single-parent or divorced family status = no AND school performance = poor AND fighting = occasional AND prior criminal record = twice THEN crime degree = serious.
According to the output results, the factors leading to a serious crime degree can be determined to include single-parent or divorced family status, family economic condition, parents' education level, frequenting of undesirable places, fighting, prior criminal record, and the like. Among these, single-parent or divorced family status, family economic condition, and parents' education level have a large influence on juvenile crime and deserve particular attention.
In order to eliminate the influence of prior criminal records, crime sample data of juveniles without a prior criminal record may be collected when the sample data of juvenile crime are collected.
After the important factors leading to juvenile crime are determined from the above results, judicial staff and the relevant departments can adopt strategies for preventing juvenile crime, for example, counselling and education for juveniles from divorced families.
According to the method for establishing a juvenile crime decision tree provided by this embodiment of the invention, influence factors with small influence values on the crime degree are filtered out, so the decision tree is simpler to build and takes less time. In addition, the closeness between the crime degree and each influence factor is introduced when the information gain is calculated, so the determined root splitting node is more accurate and the classification precision of the resulting decision tree is improved.
Example 2
The embodiment of the present invention provides an apparatus for building a juvenile crime decision tree, which is used to execute the method provided in embodiment 1, wherein, as shown in fig. 4, the apparatus includes an obtaining module 410, a screening module 420, a first determining module 430, a second determining module 440, and a building module 450;
the obtaining module 410 is configured to obtain sample data of a minor crime, where the sample data includes crime influencing factors and crime degrees of the minor;
the screening module 420 is configured to determine an influence value of each crime influence factor on the crime degree, and screen out an influence factor set with an influence value greater than or equal to a preset value from the sample data;
the first determining module 430 is configured to determine a degree of dependence of the crime degree on each influencing factor in the influencing factor set, and determine a degree of association between the crime degree and each influencing factor in the influencing factor set;
the second determining module 440, configured to determine closeness between the crime degree and each influencing factor in the influencing factor set according to the dependency degree and the association degree;
the establishing module 450 is configured to establish a crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influencing factor in the influencing factor set.
The screening module 420 determines the influence value of each crime influencing factor on the crime degree through a first determining unit, a first calculating unit, a second calculating unit and a second determining unit, and specifically includes:
the first determining unit is configured to determine a joint probability distribution between the attribute value of each crime influencing factor and the attribute value of the crime degree; the first calculation unit is configured to calculate a covariance between the crime influencing factor and the crime degree based on the joint probability distribution; the second calculation unit is configured to calculate a variance of the crime influencing factor and a variance of the crime degree from the joint probability distribution; the second determining unit is configured to determine an influence value of the crime influencing factor on the crime degree based on the covariance, the variance of the crime influencing factor, and the variance of the crime degree.
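The four units of the screening module can be sketched as below. Two points are assumptions and are not stated in this passage: attribute values are encoded as numbers so that moments can be computed, and the influence value is taken to be the Pearson correlation coefficient, which is one natural way to combine a covariance with two variances. The joint distribution is hypothetical.

```python
import math

def moments_from_joint(joint):
    """Covariance and variances from a joint distribution.

    joint maps (x, y) numeric attribute codes to probabilities that sum to 1.
    """
    ex = sum(p * x for (x, _), p in joint.items())
    ey = sum(p * y for (_, y), p in joint.items())
    cov = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())
    var_x = sum(p * (x - ex) ** 2 for (x, _), p in joint.items())
    var_y = sum(p * (y - ey) ** 2 for (_, y), p in joint.items())
    return cov, var_x, var_y

def influence_value(joint):
    """Assumed form: covariance divided by the product of standard deviations."""
    cov, var_x, var_y = moments_from_joint(joint)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical joint distribution of a binary factor x and a binary crime degree y.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

cov, var_x, var_y = moments_from_joint(joint)
omega = influence_value(joint)
print(cov, var_x, var_y, omega)
```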
The determining, by the second determining module 440, the closeness between the crime degree and each influence factor in the influence factor set according to the dependency degree and the association degree is implemented by a third calculating unit, a fourth calculating unit, and a third determining unit, and specifically includes:
the third calculating unit is configured to calculate a product between the dependency and the association; the fourth calculating unit is configured to calculate a sum of the dependency and the association; the third determining unit is configured to determine the crime degree and the closeness between each of the influencing factors in the influencing factor set according to a ratio between the product and the sum.
The establishing module 450 establishes the crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the influence factor set, which is implemented by a fifth calculating unit, a fourth determining unit, and an establishing unit, and specifically includes:
the fifth calculating unit is configured to calculate an information gain of each influencing factor in the set of influencing factors according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influencing factor in the set of influencing factors; the fourth determining unit is configured to determine a maximum information gain, and determine an influence factor corresponding to the maximum information gain as a root split node of a crime decision tree; the establishing unit is configured to establish the crime decision tree according to the root splitting node.
The device for establishing the minor crime decision tree provided by the embodiment of the invention screens out some influence factors with small influence values on the crime degree, so that the establishment process of the decision tree is simple and the time consumption is short.
The device for establishing the juvenile crime decision tree provided by the embodiment of the invention may be specific hardware on a piece of equipment, or software or firmware installed on the equipment. The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the method embodiment; for brevity, where the device embodiment does not mention something, reference may be made to the corresponding content in the method embodiment. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiment and are not described here again.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not depart from the spirit and scope of the present invention and are intended to be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (8)
1. A method of building a minor crime decision tree, the method comprising:
acquiring sample data of a minor crime, wherein the sample data comprises crime influence factors and crime degree of the minor;
determining an influence value of each crime influence factor on the crime degree, and screening out an influence factor set of which the influence value is greater than or equal to a preset value from the sample data;
determining the degree of dependence of the crime degree on each influence factor in the set of influence factors, and determining the degree of association between the crime degree and each influence factor in the set of influence factors;
determining closeness between the crime degree and each influence factor in the influence factor set according to the dependence degree and the association degree;
establishing a crime decision tree according to the closeness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set;
the determining the influence value of each crime influencing factor on the crime degree comprises the following steps:
determining a joint probability distribution between each of the crime influencing factors and the crime degree;
calculating a covariance between the crime influencing factor and the crime degree according to the joint probability distribution;
calculating the variance of the crime influencing factors and the variance of the crime degree according to the joint probability distribution;
and determining the influence value of the crime influence factor on the crime degree according to the covariance, the variance of the crime influence factor and the variance of the crime degree.
2. The method of claim 1, wherein determining the degree of dependence of the crime degree on each influence factor in the set of influence factors comprises:
determining a positive region of the crime degree with respect to each influence factor in the set of influence factors;
and determining the dependency according to the number of samples in the positive region and the number of samples corresponding to each influence factor in the influence factor set.
3. The method of claim 1, wherein said determining a degree of association between said crime degree and each of said influencing factors in said set of influencing factors comprises:
determining the number of samples corresponding to each attribute value taken by each influence factor in the influence factor set;
determining, within the number of samples, the number of subsamples corresponding to each attribute value of the crime degree;
and determining the degree of association between the crime degree and each influence factor in the influence factor set according to the number of the subsamples.
4. The method of claim 1, wherein said determining a closeness between said crime degree and each influence factor in said set of influence factors based on said dependency and said association degree comprises:
calculating a product of the dependency and the association degree;
calculating a sum of the dependency and the association degree;
and determining the closeness between the crime degree and each influence factor in the influence factor set according to the ratio of the product to the sum.
5. The method of claim 1, wherein said building a crime decision tree based on said closeness, an information entropy of said sample data, and an information entropy corresponding to each of said influencing factors in said set of influencing factors comprises:
calculating the information gain of each influence factor in the influence factor set according to the closeness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set;
determining the maximum information gain, and determining the influence factor corresponding to the maximum information gain as a root splitting node of a crime decision tree;
and establishing the crime decision tree according to the root splitting node.
6. An apparatus for creating a minor crime decision tree, the apparatus comprising:
an obtaining module, configured to obtain sample data of a minor crime, wherein the sample data comprises crime influence factors and crime degrees of the minor;
a screening module, configured to determine the influence value of each crime influence factor on the crime degree, and screen out from the sample data an influence factor set of which the influence value is greater than or equal to a preset value;
a first determining module, configured to determine a degree of dependence of the crime degree on each influencing factor in the influencing factor set, and determine a degree of association between the crime degree and each influencing factor in the influencing factor set;
a second determining module, configured to determine closeness between the crime degree and each influence factor in the set of influence factors according to the dependency degree and the association degree;
an establishing module, configured to establish a crime decision tree according to the closeness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set;
the screening module includes:
a first determination unit for determining a joint probability distribution between an attribute value of each of the crime influencing factors and an attribute value of the crime degree;
a first calculation unit for calculating a covariance between the crime influencing factor and the crime degree from the joint probability distribution;
a second calculation unit for calculating a variance of the crime influencing factor and a variance of the crime degree from the joint probability distribution;
and the second determining unit is used for determining the influence value of the crime influence factor on the crime degree according to the covariance, the variance of the crime influence factor and the variance of the crime degree.
7. The apparatus of claim 6, wherein the second determining module comprises:
a third calculation unit configured to calculate a product between the dependency and the association;
a fourth calculation unit configured to calculate a sum value between the dependency and the association;
a third determining unit, configured to determine the crime degree and the closeness between each influence factor in the set of influence factors according to a ratio between the product and the sum.
8. The apparatus of claim 6, wherein the establishing module comprises:
a fifth calculating unit, configured to calculate, according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the set of influence factors, an information gain of each influence factor in the set of influence factors;
a fourth determining unit, configured to determine a maximum information gain, and determine an influence factor corresponding to the maximum information gain as a root splitting node of the crime decision tree;
and the establishing unit is used for establishing the crime decision tree according to the root splitting node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611208475.5A CN106780258B (en) | 2016-12-23 | 2016-12-23 | Method and device for establishing juvenile crime decision tree |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611208475.5A CN106780258B (en) | 2016-12-23 | 2016-12-23 | Method and device for establishing juvenile crime decision tree |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106780258A CN106780258A (en) | 2017-05-31 |
CN106780258B true CN106780258B (en) | 2020-08-14 |
Family
ID=58920145
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611208475.5A Active CN106780258B (en) | 2016-12-23 | 2016-12-23 | Method and device for establishing juvenile crime decision tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106780258B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109657840A (en) * | 2018-11-22 | 2019-04-19 | 东软集团股份有限公司 | Decision tree generation method, device, computer readable storage medium and electronic equipment |
CN110298544A (en) * | 2019-05-23 | 2019-10-01 | 南京柳橙信息科技有限公司 | A kind of minor crime's investigation and assessment overall analysis system |
CN114090601B (en) * | 2021-11-23 | 2023-11-03 | 北京百度网讯科技有限公司 | Data screening method, device, equipment and storage medium |
CN117172381A (en) * | 2023-09-05 | 2023-12-05 | 薪海科技(上海)有限公司 | Risk prediction method based on big data |
-
2016
- 2016-12-23 CN CN201611208475.5A patent/CN106780258B/en active Active
Non-Patent Citations (3)
Title |
---|
基于均衡系数的决策树优化算法;董跃华等;《计算机应用与软件》;20160715;第33卷(第7期);正文第267-272页 * |
基于相关系数的决策树优化算法;董跃华等;《计算机工程与科学》;20150930;第37卷(第9期);正文第1784-1787页 * |
属性约简的决策树分类算法对未成年人犯罪行为的分析;王慧等;《中国人民公安大学学报( 自然科学版)》;20111115(第4期);正文第29-32页 * |
Also Published As
Publication number | Publication date |
---|---|
CN106780258A (en) | 2017-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9875441B2 (en) | Question recommending method, apparatus and system | |
Boutyline et al. | Belief network analysis: A relational approach to understanding the structure of attitudes | |
CN106780258B (en) | Method and device for establishing juvenile crime decision tree | |
Yu et al. | Multicollinearity in hierarchical linear models | |
Leenders | Modeling social influence through network autocorrelation: constructing the weight matrix | |
Franceschet | Collaboration in computer science: A network science approach | |
US8346746B2 (en) | Aggregation, organization and provision of professional and social information | |
US20150142785A1 (en) | Generalized graph, rule, and spatial structure based recommendation engine | |
CN111259173B (en) | Search information recommendation method and device | |
Rehman et al. | OLAPing social media: The case of Twitter | |
CN110532351B (en) | Recommendation word display method, device and equipment and computer readable storage medium | |
US20140074828A1 (en) | Systems and methods for cataloging consumer preferences in creative content | |
CN108322827B (en) | Method, system, and computer-readable storage medium for measuring video preferences of a user | |
KR20130009987A (en) | Method and system of displaying friend status and computer storage medium for same | |
Michalska-Smith et al. | Telling ecological networks apart by their structure: A computational challenge | |
Sampson et al. | Surpassing the limit: Keyword clustering to improve Twitter sample coverage | |
Fiaidhi et al. | Thick data: A new qualitative analytics for identifying customer insights | |
CN111178179A (en) | Method and device for identifying urban functional area based on pixel scale | |
US9971836B2 (en) | Group forming method, data collecting method and data collecting apparatus | |
Weber et al. | Homophily in the formation and development of learning networks among university students | |
Ashton et al. | One-through six-component solutions from ratings on familiar English personality-descriptive adjectives | |
KR101780237B1 (en) | Method and device for answering user question based on q&a data provided on online | |
Yang et al. | Finding pure submodels for improved differentiation of bifactor and second-order models | |
McDaniel et al. | Changing patterns of religious affiliation, religiosity, and marital dissolution: A 35-year study of three-generation families | |
US20210150565A1 (en) | Method and apparatus for directing acquisition of information in a social network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
PP01 | Preservation of patent right | Effective date of registration: 20220726; Granted publication date: 20200814 |