CN106780258B - Method and device for establishing juvenile crime decision tree - Google Patents


Info

Publication number
CN106780258B
Authority
CN
China
Legal status
Active
Application number
CN201611208475.5A
Other languages
Chinese (zh)
Other versions
CN106780258A (en)
Inventor
刘力
Current Assignee
Netposa Technologies Ltd
Original Assignee
Netposa Technologies Ltd
Application filed by Netposa Technologies Ltd
Priority to CN201611208475.5A
Publication of CN106780258A
Application granted
Publication of CN106780258B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G06Q50/265 Personal security, identity or safety
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Alarm Systems (AREA)

Abstract

The invention provides a method and a device for establishing a juvenile crime decision tree. The method comprises: acquiring sample data of juvenile crimes; determining the influence value of each crime influencing factor on the crime degree, and screening out from the sample data a set of influencing factors whose influence values are greater than or equal to a preset value; determining the degree of dependence of the crime degree on each influencing factor in the set, and the degree of association between the crime degree and each influencing factor in the set; determining the closeness between the crime degree and each influencing factor in the set according to the dependence degree and the association degree; and establishing a crime decision tree according to the closeness, the information entropy of the sample data and the information entropy of each influencing factor in the set. By screening out influencing factors that have small influence values on the crime degree, the invention makes building the decision tree simpler and faster; by introducing the closeness between the crime degree and each influencing factor as a parameter, it improves the classification accuracy of the resulting decision tree.

Description

Method and device for establishing juvenile crime decision tree
Technical Field
The invention relates to the technical field of big data analysis, in particular to a method and a device for establishing a juvenile crime decision tree.
Background
As networks, technology and new cultural trends increasingly shape daily life, the juvenile crime rate has been rising year by year. Minors are the future builders of China, and juvenile crime not only harms social stability but also damages the healthy growth of minors themselves. It is therefore important to study the criminal behavior of minors and to analyze the factors that influence it in order to prevent juvenile crime.
In the prior art, a decision tree model of the factors influencing juvenile crime is generally built with the ID3 algorithm. Many factors influence juvenile crime; some have a large influence and others only a small one, yet the prior art takes all of them into account when building the decision tree, which makes the building process complex and time-consuming. In addition, ID3 suffers from multi-value bias: its information gain favors attributes with many distinct values, so the information gain it computes for the influencing factors is inaccurate and the classification accuracy of the resulting decision tree is reduced.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method and an apparatus for establishing a juvenile crime decision tree, so as to solve two problems in the prior art: building the decision tree directly with the ID3 algorithm is complex and time-consuming, and the multi-value bias of ID3 lowers the classification accuracy of the decision tree.
In a first aspect, an embodiment of the present invention provides a method for establishing a juvenile crime decision tree, where the method includes:
acquiring sample data of juvenile crimes, wherein the sample data comprises the crime influencing factors and crime degrees of the minors;
determining an influence value of each crime influence factor on the crime degree, and screening out an influence factor set of which the influence value is greater than or equal to a preset value from the sample data;
determining the degree of dependence of the crime degree on each influence factor in the set of influence factors, and determining the degree of association between the crime degree and each influence factor in the set of influence factors;
determining closeness between the crime degree and each influence factor in the influence factor set according to the dependence degree and the relevance degree;
and establishing a crime decision tree according to the closeness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the determining an influence value of each crime influencing factor on the crime degree includes:
determining a joint probability distribution between the attribute values of each crime influencing factor and the attribute values of the crime degree;
calculating a covariance between the crime influencing factor and the crime degree according to the joint probability distribution;
calculating the variance of the crime influencing factors and the variance of the crime degree according to the joint probability distribution;
and determining the influence value of the crime influence factor on the crime degree according to the covariance, the variance of the crime influence factor and the variance of the crime degree.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the determining the dependency of the crime degree on each influence factor in the set of influence factors includes:
determining the positive region of the crime degree with respect to each influencing factor in the influencing factor set;
and determining the dependency according to the number of samples in the positive region and the number of samples corresponding to each influencing factor in the influencing factor set.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the determining a degree of association between the crime degree and each of the influencing factors in the influencing factor set includes:
determining the number of samples corresponding to each influence factor in the influence factor set when the influence factor takes different attribute values;
determining the number of subsamples of each attribute value corresponding to the crime degree in the number of samples;
and determining the degree of association between the crime degree and each influencing factor in the influencing factor set according to the numbers of sub-samples.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the determining, according to the dependency and the association degree, the closeness between the crime degree and each influencing factor in the set of influencing factors includes:
calculating a product between the dependency and the relevance;
calculating a sum between the dependency and the association;
and determining the closeness between the crime degree and each influencing factor in the influencing factor set according to the ratio of the product to the sum.
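The closeness computation in this implementation manner reduces to one line of arithmetic. The sketch below takes the closeness literally as the product of the dependency and the association degree divided by their sum; the function name `closeness` is hypothetical, and the patent may apply a constant factor that this text does not state.

```python
def closeness(dependency: float, association: float) -> float:
    """Closeness between the crime degree and one influencing factor,
    taken here as (dependency * association) / (dependency + association)."""
    total = dependency + association
    if total == 0:
        # Both measures are zero: avoid division by zero, report no closeness.
        return 0.0
    return dependency * association / total


# Hypothetical inputs: a dependency of 1/6 and an association degree of 0.5.
print(closeness(1 / 6, 0.5))
```

Because this ratio never exceeds the smaller of its two inputs, a factor only scores a high closeness when both its dependency and its association degree are high.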
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where the establishing a crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each of the influencing factors in the influencing factor set includes:
calculating the information gain of each influencing factor in the influencing factor set according to the closeness, the information entropy of the sample data and the information entropy corresponding to each influencing factor in the influencing factor set;
determining the maximum information gain, and determining the influence factor corresponding to the maximum information gain as a root splitting node of a crime decision tree;
and establishing the crime decision tree according to the root splitting node.
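The root-selection step can be sketched with the standard ID3 quantities the text names: the entropy of the sample data and the conditional entropy after splitting on one influencing factor. How the patent combines these with the closeness is not given in this section, so the sketch below stops at the plain ID3 information gain; a closeness-weighted gain would replace `information_gain`. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(factor_values, labels):
    """Weighted entropy of the labels within each value group of one factor."""
    n = len(labels)
    groups = {}
    for value, label in zip(factor_values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(factor_values, labels):
    """Standard ID3 gain: H(D) - H(D | factor)."""
    return entropy(labels) - conditional_entropy(factor_values, labels)


def choose_root(factors, labels):
    """Name of the factor with the largest gain, used as the root split node."""
    return max(factors, key=lambda name: information_gain(factors[name], labels))


# Hypothetical toy data: crime degree labels and two candidate factors.
labels = ["light", "light", "serious", "serious"]
factors = {
    "prior_record": ["none", "none", "many", "many"],
    "sex": ["male", "female", "male", "female"],
}
print(choose_root(factors, labels))  # prior_record
```

Here `prior_record` separates the two crime degrees perfectly (gain 1 bit) while `sex` leaves both groups mixed (gain 0), so it becomes the root.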
In a second aspect, an embodiment of the present invention provides an apparatus for establishing a juvenile crime decision tree, where the apparatus includes:
an acquisition module, configured to acquire sample data of juvenile crimes, the sample data comprising the crime influencing factors and crime degrees of the minors;
the screening module is used for determining the influence value of each crime influence factor on the crime degree and screening out the influence factor set of which the influence value is greater than or equal to a preset value from the sample data;
a first determining module, configured to determine a degree of dependence of the crime degree on each influencing factor in the influencing factor set, and determine a degree of association between the crime degree and each influencing factor in the influencing factor set;
a second determining module, configured to determine closeness between the crime degree and each influence factor in the set of influence factors according to the dependency degree and the association degree;
and an establishing module, configured to establish a crime decision tree according to the closeness, the information entropy of the sample data and the information entropy corresponding to each influencing factor in the influencing factor set.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation manner of the second aspect, where the screening module includes:
a first determination unit for determining a joint probability distribution between an attribute value of each of the crime influencing factors and an attribute value of the crime degree;
a first calculation unit for calculating a covariance between the crime influencing factor and the crime degree from the joint probability distribution;
a second calculation unit for calculating a variance of the crime influencing factor and a variance of the crime degree from the joint probability distribution;
and the second determining unit is used for determining the influence value of the crime influence factor on the crime degree according to the covariance, the variance of the crime influence factor and the variance of the crime degree.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation manner of the second aspect, where the second determining module includes:
a third calculation unit configured to calculate a product between the dependency and the association;
a fourth calculation unit configured to calculate a sum value between the dependency and the association;
a third determining unit, configured to determine the closeness between the crime degree and each influencing factor in the set of influencing factors according to the ratio of the product to the sum.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation manner of the second aspect, where the establishing module includes:
a fifth calculating unit, configured to calculate, according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the set of influence factors, an information gain of each influence factor in the set of influence factors;
a fourth determining unit, configured to determine a maximum information gain, and determine an influence factor corresponding to the maximum information gain as a root split node of the crime decision tree;
and the establishing unit is used for establishing the crime decision tree according to the root splitting node.
The invention provides a method and a device for establishing a juvenile crime decision tree that screen out influencing factors with small influence values on the crime degree, so that the decision tree is simpler and faster to build.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of a method for establishing a juvenile crime decision tree according to Embodiment 1 of the present invention;
Fig. 2 is a flowchart of determining the influence value of a crime influencing factor on the crime degree in the method for establishing a juvenile crime decision tree according to Embodiment 1 of the present invention;
Fig. 3 is a flowchart of building the crime decision tree in the method for establishing a juvenile crime decision tree according to Embodiment 1 of the present invention;
Fig. 4 is a schematic structural diagram of an apparatus for establishing a juvenile crime decision tree according to Embodiment 2 of the present invention.
Reference numerals in Fig. 4:
410: acquisition module; 420: screening module; 430: first determining module; 440: second determining module; 450: establishing module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
In the prior art, analyzing the factors that influence juvenile crime requires building a decision tree, and most approaches build the decision tree model of these factors with the ID3 algorithm. Many factors influence juvenile crime, some strongly and some only weakly, but the prior art considers all of them when building the decision tree, which makes the process complex and time-consuming. Moreover, the multi-value bias of ID3 makes the information gain it computes for the influencing factors inaccurate, which lowers the classification accuracy of the resulting decision tree. On this basis, the embodiments of the present invention provide a method and an apparatus for establishing a juvenile crime decision tree, described below by way of embodiments.
Example 1
An embodiment of the present invention provides a method for establishing a juvenile crime decision tree. As shown in Fig. 1, the method includes steps S110 to S150, described in detail below.
S110: acquire sample data of juvenile crimes, the sample data comprising the crime influencing factors and crime degrees of the minors.
Before a juvenile crime decision tree is built with the method provided by the embodiment of the invention, sample data of juvenile crimes must be collected, that is, the crime degrees of a number of juvenile offenders together with their corresponding personal influencing factors. For example, for a minor who committed a given crime, the crime degree, sex, age, family situation, school situation and other data are recorded and stored.
When the decision tree is then built with the method, this sample data of juvenile crimes is acquired first.
The crime degree takes four attribute values: light, general, heavy and serious.
The factors influencing juvenile crime may include: sex, age, family separation status, parents' place of work, whether the minor lives with the parents, only-child status, family location, family economic condition, the minor's own education level, father's education level, mother's education level, performance at school, frequenting of bad venues, drinking and smoking, fighting, and prior records.
Each influencing factor has several attribute values. For example, the attribute values of sex are male and female; of family economic condition, rich, general and difficult; of the father's education level, illiterate, junior middle school and below, high school, and university and above; of the mother's education level, the same four levels; of performance at school, excellent, good, medium and poor; of frequenting bad venues, frequent, general, rare and none; of drinking and smoking, regular, general, occasional and none; of fighting, frequent, general, rare and none; and of prior records, once, twice and multiple times.
However, the factors that may influence juvenile crime and their attribute values are not limited to these; the above is only an example.
S120: determine the influence value of each crime influencing factor on the crime degree, and screen out from the sample data the set of influencing factors whose influence values are greater than or equal to a preset value.
As shown in Fig. 2, determining the influence value of each crime influencing factor on the crime degree includes steps S210 to S240:
S210: determine the joint probability distribution between each crime influencing factor and the crime degree;
S220: calculate the covariance between the crime influencing factor and the crime degree from the joint probability distribution;
S230: calculate the variance of the crime influencing factor and the variance of the crime degree from the joint probability distribution;
S240: determine the influence value of the crime influencing factor on the crime degree from the covariance, the variance of the crime influencing factor and the variance of the crime degree.
The covariance between a crime influencing factor and the crime degree can be calculated by formula (1):
$$\mathrm{Cov}(X_P, Y) = \sum_{q=1}^{k} \sum_{t=1}^{m} \bigl(x_q - E(X_P)\bigr)\bigl(y_t - E(Y)\bigr)\, p_{qt} \tag{1}$$
where $X_P$ denotes a crime influencing factor ($P = 1, 2, 3, \dots$); $x_q$ denotes an attribute value of the crime influencing factor ($q = 1, 2, \dots, k$); $Y$ denotes the crime degree and $y_t$ an attribute value of the crime degree ($t = 1, 2, 3, 4$); $p_{qt}$ is the joint probability of $x_q$ and $y_t$; $E(X_P)$ is the expectation of the influencing factor and $E(Y)$ is the expectation of the crime degree; $k$ is the number of attribute values of the crime influencing factor and $m$ is the number of attribute values of the crime degree.
For example, if the crime influencing factor is the family economic condition, whose attribute values are rich, general and difficult, then $k = 3$; the crime degree has the attribute values light, general, heavy and serious, so $m = 4$. The attribute values of the crime degree and of the crime influencing factors are of course not limited to these.
The influence value is an index of how closely a crime influencing factor is related to the crime degree. The process of determining it is described in detail below, taking the family economic condition as an example.
For example, when $P = 2$, $X_2$ denotes the family economic condition and $q$ takes the values 1 to 3: $x_1$ denotes rich, $x_2$ general and $x_3$ difficult. $y_t$ denotes an attribute value of the crime degree: $y_1$ light, $y_2$ general, $y_3$ heavy and $y_4$ serious.
From the sample data, the number of samples for each combination of family economic condition (rich, general, difficult) and crime degree (light, general, heavy, serious) can be counted, giving 3 × 4 = 12 counts.
From these counts the joint probability distribution between the family economic condition and the crime degree can be determined. For example: among samples with a rich family, 3 have a light crime degree, 2 general, 4 heavy and 1 serious; among samples with a general family, 5 are light, 0 general, 2 heavy and 3 serious; among samples with a difficult family, 2 are light, 5 general, 1 heavy and 0 serious.
The joint probability distribution between the family economic condition and the crime degree determined from these data is shown in Table 1.
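Table 1 is not reproduced in this text, but a relative-frequency estimate of the joint distribution can be computed directly from the counts above (28 samples in total). The sketch assumes each joint probability is simply the cell count divided by the total sample count; `joint_distribution` is a hypothetical helper, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Sample counts from the worked example.
# Rows: family economic condition; columns: crime degree.
CRIME_DEGREES = ["light", "general", "heavy", "serious"]
COUNTS = {
    "rich":      [3, 2, 4, 1],
    "general":   [5, 0, 2, 3],
    "difficult": [2, 5, 1, 0],
}


def joint_distribution(counts):
    """Estimate p_qt as n_qt / N, where N is the total number of samples."""
    total = sum(sum(row) for row in counts.values())
    return {cond: [Fraction(c, total) for c in row] for cond, row in counts.items()}


P = joint_distribution(COUNTS)
for cond, row in P.items():
    print(cond, dict(zip(CRIME_DEGREES, (str(p) for p in row))))
```

Under this estimate the rich/light cell, for instance, is 3/28 (about 0.107), and the twelve cells sum to 1.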
Once the joint probability distribution is determined, the covariance between the family economic condition and the crime degree can be calculated by formula (1).
The variance of the family economic condition and the variance of the crime degree are likewise calculated from the joint probability distribution in Table 1.
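TABLE 1 (joint probability distribution between the family economic condition and the crime degree; the table image is not reproduced in this text)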
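Formula (1) and the two variances need numeric attribute values. The text does not say how the categorical values are encoded, so the sketch below assumes ordinal codes (rich = 1, general = 2, difficult = 3; light = 1 through serious = 4) and uses the counts of the worked example. The helper `moments` and the encoding are illustrative assumptions, not the patent's own.

```python
from fractions import Fraction

# Worked-example counts; rows = rich, general, difficult;
# columns = light, general, heavy, serious.
COUNTS = [[3, 2, 4, 1],
          [5, 0, 2, 3],
          [2, 5, 1, 0]]
X_CODES = [1, 2, 3]      # assumed ordinal codes for family economic condition
Y_CODES = [1, 2, 3, 4]   # assumed ordinal codes for crime degree


def moments(p, x_vals, y_vals):
    """Return (Cov(X, Y), Var(X), Var(Y)) for a joint probability table p[q][t]."""
    px = [sum(row) for row in p]                   # marginal distribution of X
    py = [sum(col) for col in zip(*p)]             # marginal distribution of Y
    ex = sum(a * x for a, x in zip(px, x_vals))    # E(X_P)
    ey = sum(b * y for b, y in zip(py, y_vals))    # E(Y)
    cov = sum(p[q][t] * (x_vals[q] - ex) * (y_vals[t] - ey)   # formula (1)
              for q in range(len(x_vals)) for t in range(len(y_vals)))
    var_x = sum(a * (x - ex) ** 2 for a, x in zip(px, x_vals))
    var_y = sum(b * (y - ey) ** 2 for b, y in zip(py, y_vals))
    return cov, var_x, var_y


total = sum(map(sum, COUNTS))
P = [[Fraction(c, total) for c in row] for row in COUNTS]
cov, var_x, var_y = moments(P, X_CODES, Y_CODES)
print(cov, var_x, var_y)  # -51/392 125/196 899/784
```

A different encoding of the categories would change these numbers, which is one reason the encoding should be fixed before the influence values of different factors are compared.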
The influence value of the crime influencing factor on the crime degree is calculated by formula (2) from the covariance, the variance of the crime influencing factor and the variance of the crime degree:
[Formula (2): image not reproduced in this text]
where $\omega_P$ is the influence value of the crime influencing factor on the crime degree, $\mathrm{Cov}(X_P, Y)$ is the covariance between the crime influencing factor and the crime degree, and the remaining two terms are the variance of the crime influencing factor and the variance of the crime degree.
Since the variance of the family economic condition, the variance of the crime degree and the covariance between them have been calculated above, the influence value of the family economic condition on the crime degree can be calculated by formula (2).
As formula (2) shows, $\omega_P$ takes values in [0, 1]; it measures how closely the crime influencing factor is correlated with the crime degree.
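Formula (2) itself is not reproduced in this text. One form consistent with its inputs and with $\omega_P$ lying in [0, 1] is the absolute value of the Pearson correlation coefficient; the sketch below assumes that form (the squared coefficient would also stay in [0, 1]), and `influence_value` is a hypothetical name. The inputs are the moments of the worked example under an assumed ordinal encoding (rich = 1, general = 2, difficult = 3; light = 1 through serious = 4).

```python
import math


def influence_value(cov, var_x, var_y):
    """Assumed form of formula (2): |Cov(X_P, Y)| / sqrt(Var(X_P) * Var(Y))."""
    if var_x == 0 or var_y == 0:
        # A factor (or crime degree) that never varies cannot explain Y.
        return 0.0
    return abs(cov) / math.sqrt(var_x * var_y)


# Moments of the worked example under the assumed ordinal encoding.
omega = influence_value(-51 / 392, 125 / 196, 899 / 784)
print(round(omega, 3))  # 0.152
```

Under these assumptions the family economic condition would fall well below the example preset value of 0.6 and be screened out; real data, or a different encoding or form of formula (2), can give a different result.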
Screening out from the sample data the set of influencing factors whose influence values are greater than or equal to the preset value can be implemented as follows:
compare $\omega_P$ with the preset value. If $\omega_P$ is smaller than the preset value, the crime influencing factor is only weakly correlated with the crime degree and is screened out; if $\omega_P$ is greater than or equal to the preset value, the crime influencing factor is strongly correlated with the crime degree and is added to the influencing factor set.
The preset value is a predetermined influence value, for example 0.6; other values may of course be used.
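The screening rule is a simple threshold filter. In the sketch below the factor names and influence values are hypothetical, chosen only to show the boundary case: a value exactly equal to the preset value is kept, as the text specifies.

```python
def screen_factors(influence_values, preset=0.6):
    """Return, sorted by name, the factors whose influence value is >= preset."""
    return sorted(name for name, w in influence_values.items() if w >= preset)


# Hypothetical influence values, for illustration only.
values = {"age": 0.72, "family_economic_condition": 0.15, "prior_record": 0.6}
print(screen_factors(values))  # ['age', 'prior_record']
```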
All influencing factors in the influencing factor set are strongly correlated with the crime degree; therefore, only the influencing factors in this set are used in the subsequent building of the crime decision tree.
S130: determine the degree of dependence of the crime degree on each influencing factor in the influencing factor set, and the degree of association between the crime degree and each influencing factor in the set.
Determining the dependency of the crime degree on each influencing factor in the set specifically includes: determining the positive region of the crime degree with respect to each influencing factor in the set; and determining the dependency according to the number of samples in the positive region and the number of samples corresponding to each influencing factor in the set.
The dependency of the crime degree on each influencing factor in the influencing factor set can be calculated by formula (3):
$$\mathrm{spt}_X(Y) = \frac{\mathrm{Card}\bigl(POS_X(Y)\bigr)}{\mathrm{Card}(U)} \tag{3}$$
where $\mathrm{spt}_X(Y)$ denotes the dependency of the crime degree on the influencing factor; $POS_X(Y)$ is the X-positive region of Y; $Y$ denotes the crime degree and $X$ the influencing factor (from the influencing factor set) under consideration; $U$ is the sub-sample data corresponding to that influencing factor; $\mathrm{Card}(U)$ is the number of records in $U$; and $\mathrm{Card}(POS_X(Y))$ is the number of elements in the positive region.
Taking the family economic condition as the influencing factor, the calculation of the positive region and of the dependency is described in detail below:
The family economic condition is $X_2$, whose attribute values are rich, general and difficult. The data concerning the family economic condition are selected from the sample data, for example U = {record 1 (rich), record 2 (rich), record 3 (rich), record 4 (general), record 5 (general), record 6 (difficult)}.
U contains 6 data records; records with the same attribute value form a group, which is called an equivalence class.
The equivalence class known from Y is M = {record 2, record 3, record 6}.
The equivalence classes of $X_2$ are:
Group 1: $E_1$ = {record 1 (rich), record 2 (rich), record 3 (rich)}
Group 2: $E_2$ = {record 4 (general), record 5 (general)}
Group 3: $E_3$ = {record 6 (difficult)}
The partition of U with respect to $X_2$ is therefore:
U/$X_2$ = {group 1, group 2, group 3} = {{record 1, record 2, record 3}, {record 4, record 5}, {record 6}}
Find the above U/X2Where M is an element contained in { record 2, record 3, record 6}, in which case U/X2Record 6 of (1) is eligible, only record 6 of group 3 is fully contained by M, at which point POSX(Y) { record 6}, so the X positive field of Y is POSX(Y) = { record 6 }.
Card denotes the number of samples in a set. POS_X(Y) contains only record 6 and U contains 6 records, so the dependency is:
$$spt_X(Y) = \frac{Card(POS_X(Y))}{Card(U)} = \frac{1}{6}$$
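A minimal Python sketch of formula (3), run on the six-record example above. The record layout (record ID mapped to an attribute dict), the key name `economy`, and the helper names are illustrative assumptions rather than part of the described method; like the worked example, it checks containment against the single class M.

```python
from fractions import Fraction


def partition(records, attribute):
    """Group record IDs into equivalence classes by their value of `attribute` (U/X)."""
    classes = {}
    for rid, row in records.items():
        classes.setdefault(row[attribute], set()).add(rid)
    return list(classes.values())


def positive_region(records, attribute, target_class):
    """Union of the equivalence classes of `attribute` that lie entirely inside target_class."""
    pos = set()
    for eq_class in partition(records, attribute):
        if eq_class <= target_class:  # class fully contained in M
            pos |= eq_class
    return pos


def dependency(records, attribute, target_class):
    """Formula (3): spt_X(Y) = Card(POS_X(Y)) / Card(U), kept as an exact fraction."""
    return Fraction(len(positive_region(records, attribute, target_class)), len(records))


# The worked example: U has 6 records, M = {record 2, record 3, record 6}.
U = {
    1: {"economy": "rich"},
    2: {"economy": "rich"},
    3: {"economy": "rich"},
    4: {"economy": "average"},
    5: {"economy": "average"},
    6: {"economy": "difficult"},
}
M = {2, 3, 6}

print(positive_region(U, "economy", M))  # {6}
print(dependency(U, "economy", M))       # 1/6
```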
Determining the degree of association between the crime degree and each influencing factor in the influencing factor set specifically includes:
for each influencing factor in the set, determining the number of samples for each of its attribute values; among those samples, determining the number of subsamples for each attribute value of the crime degree; and determining the degree of association between the crime degree and the influencing factor from these subsample counts.
The association degree measures how strongly a given influencing factor is related to the crime degree: the larger its value, the closer the relationship.
The degree of association can be calculated by formula (4):
[Formula (4)]
wherein, in the formula (4), S (Y, X)P) I is the influence factor X as the degree of correlation between the influence factor XP and the crime degreePCorresponding genusA sex value, j is an attribute value corresponding to the crime degree, fi,jIs the number of the above subsamples, fi,m+1For the number of samples, f, corresponding to each attribute value taken by the influencing factorn+1,jAnd taking the corresponding number of samples when each attribute value is taken for the crime degree.
The calculation of the association degree is again illustrated using household economic condition:
for example, among samples whose household economic condition is rich, 3 have a light crime degree, 2 general, 4 heavy, and 1 serious; among samples whose household economic condition is average, 5 are light, 0 general, 2 heavy, and 3 serious; and among samples whose household economic condition is difficult, 2 are light, 5 general, 1 heavy, and 0 serious, as shown in Table 2.
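Table 2
Household economic condition | Light | General | Heavy | Serious | Total
Rich | 3 | 2 | 4 | 1 | 10
Average | 5 | 0 | 2 | 3 | 10
Difficult | 2 | 5 | 1 | 0 | 8
Total | 10 | 7 | 7 | 4 | 28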
From the data in Table 2, formula (4) gives the degree of association between household economic condition and the crime degree:
[Result of formula (4)]
The above is only an example of how the association degree is calculated; neither the sample counts for household economic condition nor the resulting association value are limiting.
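The exact expression of formula (4) is not given in the text, so the sketch below stops at the quantities formula (4) consumes: it tabulates f_{i,j}, the per-value totals f_{i,m+1}, and the per-crime-degree totals f_{n+1,j} from raw samples. The samples are regenerated from the Table 2 counts; the field names `economy` and `crime_degree` and the function name are hypothetical.

```python
from collections import Counter

CRIME_DEGREES = ["light", "general", "heavy", "serious"]

# Table 2 counts: household economic condition -> counts per crime degree.
TABLE_2 = {
    "rich": [3, 2, 4, 1],
    "average": [5, 0, 2, 3],
    "difficult": [2, 5, 1, 0],
}


def contingency_counts(samples, factor, target="crime_degree"):
    """Return (f, row_totals, col_totals) for one influencing factor.

    f[(i, j)]     -> f_{i,j}: samples with factor value i and crime degree j
    row_totals[i] -> f_{i,m+1}: samples with factor value i
    col_totals[j] -> f_{n+1,j}: samples with crime degree j
    """
    f = Counter((s[factor], s[target]) for s in samples)
    row_totals, col_totals = Counter(), Counter()
    for (i, j), count in f.items():
        row_totals[i] += count
        col_totals[j] += count
    return f, row_totals, col_totals


# Expand the Table 2 counts back into one record per sample (28 in total).
samples = [
    {"economy": economy, "crime_degree": degree}
    for economy, counts in TABLE_2.items()
    for degree, count in zip(CRIME_DEGREES, counts)
    for _ in range(count)
]

f, rows, cols = contingency_counts(samples, "economy")
print(dict(rows))  # {'rich': 10, 'average': 10, 'difficult': 8}
print(dict(cols))  # {'light': 10, 'general': 7, 'heavy': 7, 'serious': 4}
```

Missing combinations, such as average with a general crime degree, simply read back as zero from the `Counter`.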
S140: Determine the closeness between the crime degree and each influencing factor in the influencing factor set according to the dependency and the association degree.
Because the dependency and the association degree both describe how closely an influencing factor is related to the crime degree, they are merged into a single parameter, following the rule for combining two resistors in parallel.
Determining the closeness between the crime degree and each influencing factor in the influencing factor set according to the dependency and the association degree specifically includes:
calculating the product of the dependency and the association degree; calculating their sum; and taking the ratio of the product to the sum as the closeness between the crime degree and the influencing factor.
Specifically, the closeness can be calculated by formula (5):
$$hf(X_P) = \frac{spt_X(Y) \times S(Y, X_P)}{spt_X(Y) + S(Y, X_P)} \qquad (5)$$
wherein, in the above formula (5), hf (X)P) Spt being the degree of crime and the closeness between each influencing factor in the set of influencing factorsXY is the degree of crime dependency on each influencing factor in the set of influencing factors, S (Y, X)P) As influencing factor XPAnd the degree of correlation with the crime degree.
S150: Build the crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influencing factor in the influencing factor set.
As shown in Fig. 3, building the crime decision tree from these quantities includes steps S310 to S330:
S310: Calculate the information gain of each influencing factor in the influencing factor set according to the closeness, the information entropy of the sample data, and the information entropy corresponding to that influencing factor;
S320: Determine the maximum information gain, and take the influencing factor with the maximum information gain as the root split node of the crime decision tree;
S330: Build the crime decision tree from the root split node.
The information gain of each influencing factor in the influencing factor set is calculated from the closeness, the information entropy of the sample data, and the information entropy corresponding to the influencing factor, using formula (6):
$$Gain(X_P, S)_{New} = Gain(X_P, S) \times hf(X_P) = [Info(S) - Info_{X_P}(S)] \times hf(X_P) \qquad (6)$$
In formula (6), Gain(X_P, S) is the information gain calculated by the basic ID3 algorithm; Gain(X_P, S)_New is the information gain calculated by the improved ID3 algorithm obtained after correction by the closeness; hf(X_P) is the closeness given by formula (5); Info(S) is the information entropy of the sample data; and Info_{X_P}(S) is the information entropy corresponding to influencing factor X_P in the influencing factor set.
After the information gain of every influencing factor in the set has been calculated with formula (6), the influencing factor with the maximum information gain is selected as the root split node of the crime decision tree. Child split nodes are then selected from the remaining influencing factors according to their information gains, and so on, until the crime decision tree is complete.
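Steps S310 to S330 and the recursive construction can be sketched as an ID3-style builder whose split criterion is the closeness-corrected gain of formula (6). Several details are assumptions not stated in the text: Info_{X_P}(S) is taken to be the usual ID3 size-weighted entropy of the subsets produced by X_P, entropies use base-2 logarithms, the closeness hf is computed once per factor and reused at every node, and a node becomes a leaf labeled with its majority crime degree when it is pure or no factors remain. The toy samples and hf values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(S): Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def split_entropy(samples, factor, target):
    """Info_{X_P}(S): size-weighted entropy of `target` within each value of `factor`."""
    groups = {}
    for s in samples:
        groups.setdefault(s[factor], []).append(s[target])
    n = len(samples)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def corrected_gain(samples, factor, target, hf):
    """Formula (6): [Info(S) - Info_{X_P}(S)] * hf(X_P)."""
    base = entropy([s[target] for s in samples]) - split_entropy(samples, factor, target)
    return base * hf[factor]


def build_tree(samples, factors, target, hf):
    """Split on the factor with the largest corrected gain, then recurse on each branch.

    A leaf is a crime-degree label; an internal node is (factor, {value: subtree}).
    """
    labels = [s[target] for s in samples]
    if len(set(labels)) == 1 or not factors:
        return Counter(labels).most_common(1)[0][0]  # majority label as leaf
    best = max(factors, key=lambda f: corrected_gain(samples, f, target, hf))
    rest = [f for f in factors if f != best]
    return (best, {
        value: build_tree([s for s in samples if s[best] == value], rest, target, hf)
        for value in sorted({s[best] for s in samples})
    })


# Hypothetical toy data: factor_a separates the classes perfectly, factor_b does not.
samples = [
    {"factor_a": "yes", "factor_b": "x", "degree": "serious"},
    {"factor_a": "yes", "factor_b": "y", "degree": "serious"},
    {"factor_a": "no", "factor_b": "x", "degree": "light"},
    {"factor_a": "no", "factor_b": "y", "degree": "light"},
]
hf = {"factor_a": 0.5, "factor_b": 0.5}  # hypothetical closeness values from formula (5)

print(corrected_gain(samples, "factor_a", "degree", hf))  # 0.5
print(build_tree(samples, ["factor_a", "factor_b"], "degree", hf))
# ('factor_a', {'no': 'light', 'yes': 'serious'})
```

Because hf only rescales each factor's gain, a factor with a slightly lower raw ID3 gain can still win the split if its closeness to the crime degree is higher, which is the effect the correction is meant to have.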
After the crime decision tree is built, the factors that lead to a serious crime degree can be identified from its output. For example, the crime decision tree may output the following rules:
1. IF parental divorce = no AND household economic condition = difficult AND father's education = junior middle school or below AND visits to undesirable venues = frequent THEN crime degree = general;
2. IF parental divorce = yes AND visits to undesirable venues = frequent AND fighting = frequent THEN crime degree = serious;
3. IF parental divorce = yes AND household economic condition = difficult AND visits to undesirable venues = frequent AND drinking and smoking = frequent AND fighting = frequent THEN crime degree = serious;
4. IF parental divorce = yes AND household economic condition = difficult AND father's education = illiterate AND school performance = poor AND prior record = [illegible] THEN crime degree = heavy;
5. IF parental divorce = no AND school performance = poor AND fighting = common AND prior record = twice THEN crime degree = serious.
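Each rule above reads one root-to-leaf path of the tree as a conjunction of factor = value tests. A minimal sketch of that conversion follows, assuming a tree represented as nested (factor, {value: subtree}) tuples with crime-degree labels as leaves; the tree and its factor names are hypothetical and are not the tree that produced the rules above.

```python
def tree_to_rules(tree, conditions=()):
    """Yield one 'IF ... THEN ...' rule per root-to-leaf path.

    Assumed shape: a leaf is a crime-degree label (str); an internal node is
    (factor, {attribute_value: subtree}).
    """
    if isinstance(tree, str):
        body = " AND ".join(f"{factor} = {value}" for factor, value in conditions)
        yield f"IF {body} THEN crime degree = {tree}"
        return
    factor, branches = tree
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + ((factor, value),))


# Hypothetical tree, used only to illustrate the conversion.
tree = ("factor_a", {
    "yes": ("factor_b", {"frequent": "serious", "rare": "general"}),
    "no": "light",
})

for rule in tree_to_rules(tree):
    print(rule)
# IF factor_a = yes AND factor_b = frequent THEN crime degree = serious
# IF factor_a = yes AND factor_b = rare THEN crime degree = general
# IF factor_a = no THEN crime degree = light
```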
From these rules, the factors associated with a serious crime degree include parental divorce, household economic condition, parents' education level, visits to undesirable venues, fighting, and prior records. Of these, parental divorce, household economic condition, and parents' education level have the greatest influence on juvenile crime and deserve particular attention.
To eliminate the influence of prior criminal records, the juvenile crime sample data may be collected only from minors who have no prior criminal record.
Once the main factors behind juvenile crime have been identified from these results, judicial staff and relevant departments can adopt targeted prevention strategies, for example counseling conversations with young people whose parents are divorced.
In the method for building a juvenile crime decision tree provided by this embodiment of the invention, influencing factors with small influence values on the crime degree are filtered out, which keeps the tree-building process simple and short. In addition, introducing the closeness between the crime degree and each influencing factor into the information-gain calculation makes the selected root split node more accurate and improves the classification accuracy of the resulting decision tree.
Example 2
The embodiment of the present invention provides an apparatus for building a juvenile crime decision tree, which is used to execute the method provided in embodiment 1, wherein, as shown in fig. 4, the apparatus includes an obtaining module 410, a screening module 420, a first determining module 430, a second determining module 440, and a building module 450;
the obtaining module 410 is configured to obtain sample data of a minor crime, where the sample data includes crime influencing factors and crime degrees of the minor;
the screening module 420 is configured to determine an influence value of each crime influence factor on the crime degree, and screen out an influence factor set with an influence value greater than or equal to a preset value from the sample data;
the first determining module 430 is configured to determine a degree of dependence of the crime degree on each influencing factor in the influencing factor set, and determine a degree of association between the crime degree and each influencing factor in the influencing factor set;
the second determining module 440, configured to determine closeness between the crime degree and each influencing factor in the influencing factor set according to the dependency degree and the association degree;
the establishing module 450 is configured to establish a crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influencing factor in the influencing factor set.
The screening module 420 determines the influence value of each crime influencing factor on the crime degree through a first determining unit, a first calculating unit, a second calculating unit and a second determining unit, and specifically includes:
the first determining unit is configured to determine a joint probability distribution between the attribute value of each crime influencing factor and the attribute value of the crime degree; the first calculation unit is configured to calculate a covariance between the crime influencing factor and the crime degree based on the joint probability distribution; the second calculation unit is configured to calculate a variance of the crime influencing factor and a variance of the crime degree from the joint probability distribution; the second determining unit is configured to determine an influence value of the crime influencing factor on the crime degree based on the covariance, the variance of the crime influencing factor, and the variance of the crime degree.
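The four units of the screening module can be sketched in Python as follows. The text only says the influence value is determined from the covariance and the two variances; this sketch assumes it is the Pearson correlation coefficient Cov(X, Y)/√(Var(X)·Var(Y)), and that attribute values are encoded as numbers. The numeric codes applied to Table 2 below are hypothetical, and because codes for nominal values are arbitrary, the sign and size of the result depend on that choice. `screen` applies the "greater than or equal to a preset value" rule literally.

```python
import math


def joint_distribution(counts):
    """First determining unit: turn co-occurrence counts {(x, y): n} into probabilities."""
    total = sum(counts.values())
    return {xy: n / total for xy, n in counts.items()}


def influence_value(joint):
    """Assumption: influence value = Pearson correlation computed from the joint distribution."""
    ex = sum(p * x for (x, _), p in joint.items())
    ey = sum(p * y for (_, y), p in joint.items())
    # First calculation unit: covariance.
    cov = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())
    # Second calculation unit: the two variances.
    var_x = sum(p * (x - ex) ** 2 for (x, _), p in joint.items())
    var_y = sum(p * (y - ey) ** 2 for (_, y), p in joint.items())
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant attribute is treated as having no influence
    # Second determining unit: combine covariance and variances.
    return cov / math.sqrt(var_x * var_y)


def screen(influences, preset):
    """Keep the factors whose influence value is >= the preset value."""
    return [name for name, value in influences.items() if value >= preset]


# Table 2 counts with hypothetical numeric codes:
# economy rich=0, average=1, difficult=2; crime degree light=0, general=1, heavy=2, serious=3.
table_2 = {
    (0, 0): 3, (0, 1): 2, (0, 2): 4, (0, 3): 1,
    (1, 0): 5, (1, 1): 0, (1, 2): 2, (1, 3): 3,
    (2, 0): 2, (2, 1): 5, (2, 2): 1, (2, 3): 0,
}
rho = influence_value(joint_distribution(table_2))
print(round(rho, 3))
```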
The second determining module 440 determines the closeness between the crime degree and each influencing factor in the influencing factor set according to the dependency and the association degree by means of a third calculating unit, a fourth calculating unit, and a third determining unit, specifically:
the third calculating unit is configured to calculate the product of the dependency and the association degree; the fourth calculating unit is configured to calculate the sum of the dependency and the association degree; and the third determining unit is configured to determine the closeness between the crime degree and each influencing factor in the influencing factor set according to the ratio of the product to the sum.
The establishing module 450 builds the crime decision tree according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influencing factor in the influencing factor set by means of a fifth calculating unit, a fourth determining unit, and an establishing unit, specifically:
the fifth calculating unit is configured to calculate an information gain of each influencing factor in the set of influencing factors according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influencing factor in the set of influencing factors; the fourth determining unit is configured to determine a maximum information gain, and determine an influence factor corresponding to the maximum information gain as a root split node of a crime decision tree; the establishing unit is configured to establish the crime decision tree according to the root splitting node.
The device for building the minor crime decision tree provided by this embodiment of the invention filters out influencing factors with small influence values on the crime degree, which keeps the tree-building process simple and short.
The device for building the juvenile crime decision tree provided by this embodiment of the invention may be specific hardware on a piece of equipment, or software or firmware installed on it. The device has the same implementation principle and technical effects as the method embodiment; for brevity, wherever the device embodiment is silent, refer to the corresponding content of the method embodiment. Those skilled in the art will understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the method embodiment and are not repeated here.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with this technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are all intended to be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (8)

1. A method of building a minor crime decision tree, the method comprising:
acquiring sample data of a minor crime, wherein the sample data comprises crime influence factors and crime degree of the minor;
determining an influence value of each crime influence factor on the crime degree, and screening out an influence factor set of which the influence value is greater than or equal to a preset value from the sample data;
determining the degree of dependence of the crime degree on each influence factor in the set of influence factors, and determining the degree of association between the crime degree and each influence factor in the set of influence factors;
determining closeness between the crime degree and each influence factor in the influence factor set according to the dependence degree and the association degree;
establishing a crime decision tree according to the closeness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set;
the determining the influence value of each crime influencing factor on the crime degree comprises the following steps:
determining a joint probability distribution between each of the crime influencing factors and the crime degree;
calculating a covariance between the crime influencing factor and the crime degree according to the joint probability distribution;
calculating the variance of the crime influencing factors and the variance of the crime degree according to the joint probability distribution;
and determining the influence value of the crime influence factor on the crime degree according to the covariance, the variance of the crime influence factor and the variance of the crime degree.
2. The method of claim 1, wherein determining the degree of dependence of the crime degree on each influencing factor in the set of influencing factors comprises:
determining a positive region of the crime degree with respect to each influencing factor in the set of influencing factors;
and determining the dependency according to the number of samples in the positive region and the number of samples corresponding to each influence factor in the influence factor set.
3. The method of claim 1, wherein said determining a degree of association between said crime degree and each of said influencing factors in said set of influencing factors comprises:
determining the number of samples corresponding to each influence factor in the influence factor set when the influence factor takes different attribute values;
determining the number of subsamples of each attribute value corresponding to the crime degree in the number of samples;
and determining the degree of association between the crime degree and each influence factor in the influence factor set according to the numbers of the subsamples.
4. The method of claim 1, wherein said determining a closeness between said crime degree and each influencing factor in said set of influencing factors based on said dependency and said association comprises:
calculating a product of the dependency and the association;
calculating a sum of the dependency and the association;
and determining the closeness between the crime degree and each influence factor in the influence factor set according to the ratio of the product to the sum.
5. The method of claim 1, wherein said building a crime decision tree based on said closeness, an information entropy of said sample data, and an information entropy corresponding to each of said influencing factors in said set of influencing factors comprises:
calculating the information gain of each influence factor in the influence factor set according to the closeness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set;
determining the maximum information gain, and determining the influence factor corresponding to the maximum information gain as a root splitting node of a crime decision tree;
and establishing the crime decision tree according to the root splitting node.
6. An apparatus for creating a minor crime decision tree, the apparatus comprising:
an acquisition module, used for acquiring sample data of minor crimes, wherein the sample data comprises crime influence factors and crime degrees of minors;
the screening module is used for determining the influence value of each crime influence factor on the crime degree and screening out the influence factor set of which the influence value is greater than or equal to a preset value from the sample data;
a first determining module, configured to determine a degree of dependence of the crime degree on each influencing factor in the influencing factor set, and determine a degree of association between the crime degree and each influencing factor in the influencing factor set;
a second determining module, configured to determine closeness between the crime degree and each influence factor in the set of influence factors according to the dependency degree and the association degree;
an establishing module, used for establishing a crime decision tree according to the closeness, the information entropy of the sample data and the information entropy corresponding to each influence factor in the influence factor set;
the screening module includes:
a first determination unit for determining a joint probability distribution between an attribute value of each of the crime influencing factors and an attribute value of the crime degree;
a first calculation unit for calculating a covariance between the crime influencing factor and the crime degree from the joint probability distribution;
a second calculation unit for calculating a variance of the crime influencing factor and a variance of the crime degree from the joint probability distribution;
and the second determining unit is used for determining the influence value of the crime influence factor on the crime degree according to the covariance, the variance of the crime influence factor and the variance of the crime degree.
7. The apparatus of claim 6, wherein the second determining module comprises:
a third calculation unit configured to calculate a product between the dependency and the association;
a fourth calculation unit configured to calculate a sum value between the dependency and the association;
a third determining unit, configured to determine the closeness between the crime degree and each influence factor in the set of influence factors according to a ratio between the product and the sum.
8. The apparatus of claim 6, wherein the establishing module comprises:
a fifth calculating unit, configured to calculate, according to the closeness, the information entropy of the sample data, and the information entropy corresponding to each influence factor in the set of influence factors, an information gain of each influence factor in the set of influence factors;
a fourth determining unit, configured to determine a maximum information gain, and determine an influence factor corresponding to the maximum information gain as a root splitting node of the crime decision tree;
and the establishing unit is used for establishing the crime decision tree according to the root splitting node.
CN201611208475.5A 2016-12-23 2016-12-23 Method and device for establishing juvenile crime decision tree Active CN106780258B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611208475.5A CN106780258B (en) 2016-12-23 2016-12-23 Method and device for establishing juvenile crime decision tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611208475.5A CN106780258B (en) 2016-12-23 2016-12-23 Method and device for establishing juvenile crime decision tree

Publications (2)

Publication Number Publication Date
CN106780258A CN106780258A (en) 2017-05-31
CN106780258B true CN106780258B (en) 2020-08-14

Family

ID=58920145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611208475.5A Active CN106780258B (en) 2016-12-23 2016-12-23 Method and device for establishing juvenile crime decision tree

Country Status (1)

Country Link
CN (1) CN106780258B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657840A (en) * 2018-11-22 2019-04-19 东软集团股份有限公司 Decision tree generation method, device, computer readable storage medium and electronic equipment
CN110298544A (en) * 2019-05-23 2019-10-01 南京柳橙信息科技有限公司 A kind of minor crime's investigation and assessment overall analysis system
CN114090601B (en) * 2021-11-23 2023-11-03 北京百度网讯科技有限公司 Data screening method, device, equipment and storage medium
CN117172381A (en) * 2023-09-05 2023-12-05 薪海科技(上海)有限公司 Risk prediction method based on big data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
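Decision tree optimization algorithm based on balance coefficient; Dong Yuehua et al.; Computer Applications and Software; 2016-07-15; Vol. 33, No. 7; pp. 267-272 *
Decision tree optimization algorithm based on correlation coefficient; Dong Yuehua et al.; Computer Engineering & Science; 2015-09-30; Vol. 37, No. 9; pp. 1784-1787 *
Analysis of juvenile criminal behavior with a decision tree classification algorithm based on attribute reduction; Wang Hui et al.; Journal of People's Public Security University of China (Science and Technology); 2011-11-15 (No. 4); pp. 29-32 *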

Also Published As

Publication number Publication date
CN106780258A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
US9875441B2 (en) Question recommending method, apparatus and system
Boutyline et al. Belief network analysis: A relational approach to understanding the structure of attitudes
CN106780258B (en) Method and device for establishing juvenile crime decision tree
Yu et al. Multicollinearity in hierarchical linear models
Leenders Modeling social influence through network autocorrelation: constructing the weight matrix
Franceschet Collaboration in computer science: A network science approach
US8346746B2 (en) Aggregation, organization and provision of professional and social information
US20150142785A1 (en) Generalized graph, rule, and spatial structure based recommendation engine
CN111259173B (en) Search information recommendation method and device
Rehman et al. OLAPing social media: The case of Twitter
CN110532351B (en) Recommendation word display method, device and equipment and computer readable storage medium
US20140074828A1 (en) Systems and methods for cataloging consumer preferences in creative content
CN108322827B (en) Method, system, and computer-readable storage medium for measuring video preferences of a user
KR20130009987A (en) Method and system of displaying friend status and computer storage medium for same
Michalska-Smith et al. Telling ecological networks apart by their structure: A computational challenge
Sampson et al. Surpassing the limit: Keyword clustering to improve Twitter sample coverage
Fiaidhi et al. Thick data: A new qualitative analytics for identifying customer insights
CN111178179A (en) Method and device for identifying urban functional area based on pixel scale
US9971836B2 (en) Group forming method, data collecting method and data collecting apparatus
Weber et al. Homophily in the formation and development of learning networks among university students
Ashton et al. One-through six-component solutions from ratings on familiar English personality-descriptive adjectives
KR101780237B1 (en) Method and device for answering user question based on q&a data provided on online
Yang et al. Finding pure submodels for improved differentiation of bifactor and second-order models
McDaniel et al. Changing patterns of religious affiliation, religiosity, and marital dissolution: A 35-year study of three-generation families
US20210150565A1 (en) Method and apparatus for directing acquisition of information in a social network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PP01 Preservation of patent right

Effective date of registration: 20220726

Granted publication date: 20200814

PP01 Preservation of patent right