CN111768027A

CN111768027A - Reinforcement learning-based crime risk prediction method, medium, and computing device

Info

Publication number: CN111768027A
Application number: CN202010463027.XA
Authority: CN
Inventors: 李康顺; 王梓铭; 刘嘉豪; 方鸿铭; 雷逸舒
Original assignee: South China Agricultural University
Current assignee: South China Agricultural University
Priority date: 2020-05-27
Filing date: 2020-05-27
Publication date: 2020-10-13

Abstract

The invention discloses a reinforcement learning-based method, medium and computing device for predicting the risk of re-offending. A training sample set is first constructed and clustered; N BP neural networks are then constructed, one for each of the N classes obtained by clustering; the attributes of each training sample are input into the BP neural network of the class to which the sample belongs, and the network is trained to obtain a recidivism risk prediction model. For a test sample whose risk of re-offending is to be predicted, the distance between the test sample and each cluster center is calculated, the cluster center nearest to the test sample is selected, and the trained neural network of the corresponding cluster is used as the recidivism risk prediction model of the test sample; the attributes of the test sample are input into this model, which predicts whether the test sample will re-offend. The invention makes the prediction result more realistic, effective and accurate, and the calculation faster.

Description

Reinforcement learning-based crime risk prediction method, medium, and computing device

Technical Field

The invention relates to the technical field of crime prediction, and in particular to a reinforcement learning-based crime risk prediction method, medium and computing device.

Background

Crime is a social phenomenon of human society. With the continuous progress of society, and especially the rapid development of modern science and technology, crime has changed greatly in number, scale, methods and degree of harm, and the threat it poses to society has become more serious. Practice has shown that merely punishing crime after the fact is a palliative measure and is far from sufficient, so people hope to prevent crime instead.

Three special groups, namely parolees, offenders temporarily serving their sentences outside prison, and persons released after serving a prison sentence, are very prone to re-offending because of factors such as poor social adaptability and an unstable psychological state. If such persons re-offend, their methods are harder to guard against and the threat to society is great. How to correctly predict the likelihood that such persons will re-offend and issue a correct risk warning is therefore of great social significance, and is one of the problems society currently needs to solve.

Abroad, research on early-warning techniques for the risk of re-offending has been carried out for about a century, whereas domestic research is relatively scarce. In China, questionnaires and scales are mainly used; the assessment generally considers only basic information about the offender, and the data dimensions and data scale are small. In addition, much domestic research remains at the theoretical level, and only a few scholars predict directly from the data of the persons concerned using neural networks, random forests, classification trees and the like. However, differences in background, living environment and past experience lead to different causes and probabilities of re-offending. It is therefore difficult to predict re-offending effectively and to issue a timely warning with such methods alone.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art and provides a reinforcement learning-based crime risk prediction method, which can predict whether the persons concerned will re-offend and has the advantages of high prediction accuracy and high calculation speed.

A second object of the present invention is to provide a storage medium.

It is a third object of the invention to provide a computing device.

The first object of the invention is achieved by the following technical solution: a reinforcement learning-based crime risk prediction method comprising the following steps:

S1, obtaining training samples to form a training sample set; the training samples comprise persons with a criminal record who have re-offended and persons with a criminal record who have not re-offended;

S2, clustering the training samples according to the continuous attributes and the categorical attributes of the training samples in the training sample set, and defining the number of clusters obtained as N, wherein N is a constant, that is, all training samples in the training sample set are clustered into N classes;

S3, constructing N corresponding neural networks, one for each of the N classes obtained by clustering;

S4, inputting the continuous attributes and the one-hot encoded categorical attributes of each training sample into the neural network corresponding to the class to which the training sample belongs, and training the neural network to obtain a recidivism risk prediction model;

S5, for a test sample whose risk of re-offending is to be predicted, calculating the distance between the test sample and each cluster center, selecting the cluster center nearest to the test sample, and taking the trained neural network corresponding to the cluster of that cluster center as the recidivism risk prediction model of the test sample;

S6, inputting the continuous attributes and the one-hot encoded categorical attributes of the test sample into the recidivism risk prediction model of the test sample, and predicting through the model whether the test sample will re-offend.

Preferably, the continuous attributes include age, height, weight and education level; the categorical attributes include gender, date of the prior offence and type of the prior offence.

Preferably, the process of clustering the training samples in S2 is specifically as follows:

S21, selecting N samples from the training sample set as initial cluster centers according to the required number of clusters N, specifically:

first, taking any training sample from the training sample set as the first initial cluster center;

then, selecting from the training sample set the training sample whose sum of distances to the existing cluster centers is largest as a new initial cluster center, until N initial cluster centers have been selected;

S22, calculating the distance between each training sample in the training sample set and each cluster center, and assigning each training sample to the cluster center nearest to it;

S23, calculating an objective function from the distance between each training sample and the center of the cluster to which it belongs, and judging whether the value of the objective function is unchanged compared with the previously calculated value;

if not, updating the cluster center of each class according to the training samples currently in that class, and returning to step S22;

if yes, clustering is finished.
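The initial-center rule of step S21 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `init_centers`, the `first` parameter and the one-dimensional toy data are assumptions, and any distance function can be passed in.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def init_centers(samples: Sequence[T], n: int,
                 dist: Callable[[T, T], float], first: int = 0) -> List[int]:
    """Return the indices of n initial centers chosen by the max-distance-sum rule."""
    chosen = [first]  # the first center is an arbitrary sample
    while len(chosen) < n:
        # Next center: the unchosen sample with the largest summed distance
        # to all centers selected so far.
        best = max(
            (i for i in range(len(samples)) if i not in chosen),
            key=lambda i: sum(dist(samples[i], samples[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen

# Hypothetical one-dimensional data, absolute difference as the distance.
points = [0.0, 1.0, 4.0, 10.0]
centers = init_centers(points, 3, lambda a, b: abs(a - b), first=1)
```

Because each new center is picked deterministically once the first one is fixed, repeated runs with the same first sample give the same initial centers.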

Further, the distance between each training sample in the training sample set and a cluster center is:

$$d(x_i, K_j) = \sum_{m=1}^{p} \omega_m \left(x_{im} - \bar{x}_{jm}\right)^2 + \gamma \sum_{l=1}^{q} \omega_l \sum_{w=1}^{t_l} \left(\delta(x_{il}, a_l^w) - f_j(a_l^w)\right)^2$$

wherein:

$x_i$ is the i-th training sample in the training sample set;

$K_j$ is the cluster center of class j;

p is the total number of continuous attributes of a training sample;

$\omega_m$ is the weight of the m-th continuous attribute;

$x_{im}$ is the value of the m-th continuous attribute of the i-th training sample;

$\bar{x}_{jm}$ is the average value of the m-th continuous attribute over all training samples in class j;

$\gamma$ is the weight of the categorical attributes relative to the continuous attributes;

q is the total number of categorical attributes of a training sample;

$\omega_l$ is the weight of the l-th categorical attribute;

$t_l$ is the number of values in the value domain of the l-th categorical attribute;

$a_l^w$ is the w-th value in the value domain of the l-th categorical attribute;

$x_{il}$ is the value of the l-th categorical attribute of the i-th training sample, and $\delta(x_{il}, a_l^w)$ equals 1 when $x_{il} = a_l^w$ and 0 otherwise;

$f_j(a_l^w)$ is the frequency with which training samples whose l-th categorical attribute takes the value $a_l^w$ occur in class j;

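the weight $\omega_m$ of the m-th continuous attribute and the weight $\omega_l$ of the l-th categorical attribute are, respectively:

$$\omega_m = \frac{1 - e_m}{\sum_{x=1}^{p}(1 - e_x) + \sum_{y=1}^{q}(1 - E_y)}, \qquad \omega_l = \frac{1 - E_l}{\sum_{x=1}^{p}(1 - e_x) + \sum_{y=1}^{q}(1 - E_y)}$$

wherein:

$$e_m = -\frac{1}{\ln I}\sum_{i=1}^{I} \frac{x_{im}}{\sum_{i'=1}^{I} x_{i'm}} \ln \frac{x_{im}}{\sum_{i'=1}^{I} x_{i'm}}, \qquad E_l = -\frac{1}{\ln t_l}\sum_{w=1}^{t_l} \frac{Y_{wl}}{I} \ln \frac{Y_{wl}}{I}$$

$e_m$ is the information entropy of the m-th continuous attribute, and $e_x$ is the information entropy of the x-th continuous attribute, x = 1, 2, 3, ..., p;

$E_l$ is the information entropy of the l-th categorical attribute, and $E_y$ is the information entropy of the y-th categorical attribute, y = 1, 2, 3, ..., q;

$Y_{wl}$ is the number of training samples in the training sample set whose l-th categorical attribute takes its w-th value;

I is the total number of training samples in the training sample set.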
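The entropy-based weighting can be sketched as below. This is a hedged illustration: it assumes the usual entropy-weight form, in which each entropy is scaled to [0, 1] and each attribute is weighted by one minus its entropy, and it takes the number of observed distinct values as the size of a categorical value domain. The function names and the sample data (the normalised ages, genders and crime types of the seven example samples in the embodiment) are chosen for illustration.

```python
import math
from collections import Counter

def continuous_entropy(values):
    """Entropy of each sample's share of the attribute total, scaled by ln(number of samples)."""
    total = sum(values)
    shares = [v / total for v in values if v > 0]  # 0 * ln 0 is taken as 0
    return -sum(s * math.log(s) for s in shares) / math.log(len(values))

def categorical_entropy(values):
    """Entropy of the value frequencies, scaled by ln(number of distinct values)."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in counts.values()) / math.log(len(counts))

def entropy_weights(entropies):
    """Weight each attribute by (1 - entropy), normalised so the weights sum to 1."""
    gaps = [1.0 - e for e in entropies]
    return [g / sum(gaps) for g in gaps]

age = [0.08, 0.31, 0.81, 0.47, 0.28, 0.0, 1.0]
sex = ["male", "male", "male", "female", "male", "female", "male"]
crime = ["violence", "theft", "robbery", "theft", "violence", "theft", "theft"]

e1 = continuous_entropy(age)
E1 = categorical_entropy(sex)
E2 = categorical_entropy(crime)
weights = entropy_weights([e1, E1, E2])
```

On this data the sketch gives entropies of about 0.81 for age and 0.86 for gender, matching the values quoted in the embodiment; the resulting weights are close to, though not identical to, the rounded weights used there, so the exact normalisation should be treated as an assumption. A categorical attribute with a single observed value would make the scaling factor zero and needs separate handling.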

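Further, the distance between the test sample and each cluster center is:

$$d(x_t, K_j) = \sum_{m=1}^{p} \omega_m \left(x_{tm} - \bar{x}_{jm}\right)^2 + \gamma \sum_{l=1}^{q} \omega_l \sum_{w=1}^{t_l} \left(\delta(x_{tl}, a_l^w) - f_j(a_l^w)\right)^2$$

wherein $x_{tm}$ is the value of the m-th continuous attribute of the test sample $x_t$;

$x_{tl}$ is the value of the l-th categorical attribute of the test sample $x_t$; $\bar{x}_{jm}$ is the average value of the m-th continuous attribute in class j, $a_l^w$ is the w-th value in the value domain of the l-th categorical attribute, $f_j(a_l^w)$ is its frequency in class j, and $\delta(x_{tl}, a_l^w)$ equals 1 when $x_{tl} = a_l^w$ and 0 otherwise.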

Further, the objective function F(X, P) is:

$$F(X, P) = \sum_{j=1}^{N} \sum_{x_i \in J_j} d(x_i, K_j)$$

wherein $J_j$ is the set of training samples of class j, whose cluster center is $K_j$.

Further, in step S23, the cluster center of each class is updated according to the training samples currently in that class, specifically as follows:

Step S231, for each class, calculating the average value of each continuous attribute over all training samples in the class:

$$\bar{x}_{jm} = \frac{1}{n_j} \sum_{x_i \in J_j} x_{im}, \qquad m = 1, 2, 3, \ldots, p$$

wherein $J_j$ is the set of training samples in class j, $n_j$ is the total number of training samples in class j, $x_{im}$ is the value of the m-th continuous attribute of the i-th training sample in class j, $\bar{x}_{jm}$ is the average value of the m-th continuous attribute over all training samples in class j, and p is the total number of continuous attributes of a training sample;

for each class, obtaining the value of each categorical attribute of all training samples in the class, and counting the frequency of each value in the value domain of each categorical attribute:

$$f_j(a_l^w) = \frac{\left|\{x_i \in J_j : x_{il} = a_l^w\}\right|}{n_j}, \qquad w = 1, 2, 3, \ldots, t_l$$

wherein $a_l^w$ is the w-th value in the value domain of the l-th categorical attribute, $f_j(a_l^w)$ is the frequency with which that value occurs in class j, and $t_l$ is the number of values in the value domain of the l-th categorical attribute;

Step S232, taking the obtained $\bar{x}_{jm}$, m = 1, 2, 3, ..., p, and the obtained frequencies $\left(f_j(a_l^1), \ldots, f_j(a_l^{t_l})\right)$, l = 1, 2, 3, ..., q, as the attributes of the new cluster center, thereby obtaining the new cluster center.
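A minimal sketch of this center update is given below. The function name `update_center`, the tuple layout of a sample and the example class (six of the seven example samples from the embodiment, with normalised age) are illustrative assumptions.

```python
from collections import Counter

def update_center(members, domains):
    """Recompute a cluster center from its member samples.

    members: list of (continuous_values, categorical_values) tuples in one class.
    domains: for each categorical attribute, the ordered list of its possible values.
    Returns the per-attribute means and, per categorical attribute, the value frequencies.
    """
    n = len(members)
    p = len(members[0][0])
    # Mean of each continuous attribute over the class.
    means = [sum(cont[m] for cont, _ in members) / n for m in range(p)]
    # Relative frequency of every value of every categorical attribute.
    freqs = []
    for l, domain in enumerate(domains):
        counts = Counter(cat[l] for _, cat in members)
        freqs.append([counts[v] / n for v in domain])
    return means, freqs

domains = [["male", "female"], ["violence", "theft", "robbery"]]
members = [
    ([0.08], ["male", "violence"]),
    ([0.31], ["male", "theft"]),
    ([0.47], ["female", "theft"]),
    ([0.28], ["male", "violence"]),
    ([0.0], ["female", "theft"]),
    ([1.0], ["male", "theft"]),
]
means, freqs = update_center(members, domains)
```

The center is thus not a single sample but a summary of the class: a mean for each continuous attribute and a frequency vector for each categorical attribute.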

Preferably, the neural network is a BP (back-propagation) neural network;

in step S3, a genetic algorithm is used to set initial parameters for each of the N BP neural networks, specifically as follows:

Step S31, randomly generating initial parameters of the BP neural network as the initial population, and setting the maximum number of iterations, the stop error, the crossover probability and the mutation probability of the genetic algorithm;

Step S32, performing selection, crossover and mutation using a tournament selection strategy, a uniform crossover strategy and a uniform mutation strategy, respectively;

Step S33, calculating the fitness of the individuals of each generation; when the fitness is smaller than the stop error or the number of iterations exceeds the maximum number of iterations, stopping the algorithm and returning the final individual as the initial parameters of the BP neural network; otherwise, returning to step S32.
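Steps S31 to S33 can be sketched as follows. This is an illustration under stated assumptions rather than the patented procedure: the function name `evolve` is hypothetical, fitness is treated as a cost to be minimised, mutation is applied gene by gene, the best individual of the final population is returned, and the sum-of-squares fitness is only a stand-in for the cost of a BP network initialised with the individual. The default settings and the four starting individuals are the ones given in the embodiment.

```python
import random

def evolve(population, fitness, max_iter=50, stop_error=0.1,
           p_cross=0.5, p_mut=0.05, gene_range=(-5.0, 5.0), seed=0):
    """Genetic search with tournament selection, uniform crossover, uniform mutation."""
    rng = random.Random(seed)
    pop = [list(ind) for ind in population]
    for _ in range(max_iter):
        if fitness(min(pop, key=fitness)) < stop_error:
            break  # good enough: stop early
        # Tournament selection: draw two individuals, keep the one with lower cost.
        children = [list(min(rng.sample(pop, 2), key=fitness)) for _ in pop]
        # Uniform crossover: swap each gene of a pair with probability p_cross.
        for a, b in zip(children[0::2], children[1::2]):
            for g in range(len(a)):
                if rng.random() < p_cross:
                    a[g], b[g] = b[g], a[g]
        # Uniform mutation: replace a gene by a uniform random value with probability p_mut.
        for ind in children:
            for g in range(len(ind)):
                if rng.random() < p_mut:
                    ind[g] = rng.uniform(*gene_range)
        pop = children
    return min(pop, key=fitness)

population = [
    [1.0, 1.1, 1.2, 1.3, 1.3, 1.5],
    [1.6, 1.7, 1.8, 1.9, 2.0, 2.1],
    [2.2, 2.3, 2.4, 2.5, 2.6, 2.7],
    [2.8, 2.9, 3.0, 3.1, 3.2, 3.3],
]
best = evolve(population, fitness=lambda ind: sum(g * g for g in ind))
```

The sketch keeps no elite individual between generations, so the best cost can temporarily get worse; adding elitism would be a design choice beyond what the text describes.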

The second purpose of the invention is realized by the following technical scheme: a storage medium comprising a processor and a memory for storing a program executable by the processor, wherein the processor executes the program stored in the memory to implement the method for predicting crime risk based on reinforcement learning according to the first object of the present invention.

The third purpose of the invention is realized by the following technical scheme: a computing device stores a program that when executed by a processor implements a reinforcement learning-based crime risk prediction method according to a first object of the present invention.

Compared with the prior art, the invention has the following advantages and effects:

(1) In the reinforcement learning-based crime risk prediction method of the invention, a training sample set is first constructed and then clustered; N corresponding BP neural networks are constructed, one for each of the N classes obtained by clustering; the attributes of each training sample are input into the BP neural network of the class to which the sample belongs, and the network is trained to obtain a recidivism risk prediction model. For a test sample whose risk of re-offending is to be predicted, the distance between the test sample and each cluster center is calculated, the nearest cluster center is selected, the trained neural network of the corresponding cluster is used as the recidivism risk prediction model of the test sample, and the attributes of the test sample are input into this model to predict whether the test sample will re-offend. Because the training samples are first divided by attribute through clustering and a separate neural network is then built for each class, the data fed to each network share similar characteristics and the training data are more targeted, so the prediction is more realistic, effective and accurate, and the calculation is faster.

(2) In the reinforcement learning-based crime risk prediction method of the invention, the initial cluster centers are not selected at random; instead, the N points farthest from one another are selected. The selected initial cluster centers therefore differ more from one another on average, which largely avoids becoming trapped in a local optimum, makes the algorithm more stable, and prevents the clustering result from fluctuating strongly because of randomness in the choice of initial cluster centers.

(3) In the reinforcement learning-based crime risk prediction method of the invention, when the distance between a sample and a cluster center is calculated during clustering, the distance contributed by the categorical attributes is described, by analogy with the continuous attributes, through a Euclidean distance. Existing clustering algorithms use only simple 0-1 matching through a Hamming distance formula, which cannot describe well the dissimilarity between an object and a cluster center. The new way of describing the distance between categorical attributes and the cluster center makes the dissimilarity measure for categorical attributes consistent with that for continuous attributes and therefore more convincing. In addition, the method uses information entropy to describe the amount of information contained in each attribute and assigns each attribute a corresponding weight, so that the importance of each attribute and its influence on the final result are expressed more accurately.

(4) In the reinforcement learning-based crime risk prediction method of the invention, a genetic algorithm is used to set the initial parameters of the BP neural network. In theory a BP neural network can approximate any continuous function, but it easily falls into a locally optimal solution; the genetic algorithm mitigates this defect, so that the BP neural network reaches the globally optimal solution more easily, and better initial parameters also help to accelerate the convergence of the BP neural network.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a flow chart of training sample clustering in the method of the present invention.

Detailed Description

The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.

Example 1

This embodiment discloses a reinforcement learning-based recidivism risk prediction method, which can be used to predict re-offending by persons with a criminal record, so that such persons can be monitored and intervened with in a targeted manner and the impact of re-offending on society reduced. The steps of the method of this embodiment are shown in fig. 1 and include:

S1, obtaining training samples to form a training sample set; the training samples include persons with a criminal record who have re-offended and persons with a criminal record who have not re-offended.

As shown in Table 1, it is assumed that the training sample set includes the following training samples, with the data of each training sample as follows:

TABLE 1

Number	Sex	Age	Type of crime	Re-offended
a1	Male	19	Violence	Yes
a2	Male	27	Theft	No
a3	Male	45	Robbery	Yes
a4	Female	33	Theft	No
a5	Male	26	Violence	No
a6	Female	16	Theft	No
a7	Male	52	Theft	No

In Table 1, age is a continuous attribute of the training sample, and gender and crime type are categorical attributes. In this embodiment, the continuous attributes of a training sample may include age, height, weight and education level; the categorical attributes may include gender, date of the prior offence and type of the prior offence.

In this embodiment, the continuous attributes of the training samples are normalized to obtain the data shown in Table 2:

TABLE 2

Number	Sex	Age	Type of crime	Re-offended
a1	Male	0.08	Violence	Yes
a2	Male	0.31	Theft	No
a3	Male	0.81	Robbery	Yes
a4	Female	0.47	Theft	No
a5	Male	0.28	Violence	No
a6	Female	0	Theft	No
a7	Male	1	Theft	No
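The text does not name the normalization used. As a hedged illustration, min-max scaling of the ages in Table 1, rounded to two decimals, reproduces the age column of Table 2; the variable names are chosen for illustration.

```python
# Ages of a1..a7 from Table 1.
ages = [19, 27, 45, 33, 26, 16, 52]

# Min-max scaling (an assumption; it matches the Table 2 values): (x - min) / (max - min).
lo, hi = min(ages), max(ages)
normalised = [round((a - lo) / (hi - lo), 2) for a in ages]

print(normalised)  # [0.08, 0.31, 0.81, 0.47, 0.28, 0.0, 1.0]
```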

S2, first, the training samples obtained in step S1 are preprocessed to remove redundant items and missing items. Redundant items are attributes of the training samples that have no influence on the prediction of re-offending. A missing item means that the value of some attribute of a training sample is null; attributes with a large number of missing values are removed.

The training samples are then clustered according to the continuous attributes and the categorical attributes of each training sample in the training sample set, and the number of clusters obtained is defined as N, where N is a constant, that is, all training samples in the training sample set are clustered into N classes. In this embodiment, N may be 2 to 4.

In this embodiment, as shown in fig. 2, the specific steps for clustering the training samples are as follows:

S21, according to the required number of clusters N, N samples are selected from the training sample set as initial cluster centers: first, any training sample in the training sample set is taken as the first initial cluster center;

then, the training sample whose sum of distances to the existing cluster centers is largest is selected from the training sample set as a new initial cluster center, that is, the next initial cluster center, until N initial cluster centers have been selected;

in this embodiment, if N is 2, 2 initial cluster centers are selected in step S21: after the 1st cluster center has been selected, the training sample in the training sample set farthest from the 1st cluster center is selected as the 2nd cluster center.

If N is 3, 3 initial cluster centers are selected in step S21: after the 2nd cluster center has been selected, the training sample in the training sample set whose sum of distances to the 1st and 2nd cluster centers is largest is selected as the 3rd cluster center.

In this embodiment, for any cluster center $K_j$, the m-th continuous attribute of the class to which the cluster center belongs is defined as the average value $\bar{x}_{jm}$ of the m-th continuous attribute over all training samples in the class.

The l-th categorical attribute of the class to which the cluster center belongs is defined as the vector of value frequencies:

$$\left(f_j(a_l^1), f_j(a_l^2), \ldots, f_j(a_l^{t_l})\right)$$

where $a_l^w$ is the w-th value in the value domain of the l-th categorical attribute, $f_j(a_l^w)$ is the frequency with which that value occurs in class j, w = 1, 2, 3, ..., $t_l$, and $t_l$ is the number of values in the value domain of the l-th categorical attribute. For the data shown in Table 1 there are only two categorical attributes, gender and crime type, so l is gender or crime type. When l is gender, the value domain of the l-th categorical attribute is (male, female), that is, it contains 2 values, so $t_l$ is 2 and w is 1 or 2; $a_l^1$ is male and $a_l^2$ is female. When l is crime type, the value domain of the l-th categorical attribute is (violence, theft, robbery), that is, it contains 3 values, so $t_l$ is 3 and w is 1, 2 or 3; $a_l^1$ is violence, $a_l^2$ is theft and $a_l^3$ is robbery.

In this embodiment, if a2 is randomly chosen as the first initial cluster center, then, since clustering has not yet started, the class to which the cluster center belongs contains no training sample other than the cluster center itself. Based on the above, the m-th continuous attribute of the class to which the cluster center belongs, that is, age, is:

$$\bar{x}_{jm} = 0.31$$

As shown in Table 1, there is only one continuous attribute, so m and p are both 1. Likewise, the l-th categorical attribute of the class to which the cluster center belongs is as follows:

when l is 1, $f_j(a_1^1)$ and $f_j(a_1^2)$ are 1.0 and 0, respectively;

when l is 2, $f_j(a_2^1)$, $f_j(a_2^2)$ and $f_j(a_2^3)$ are 0, 1.0 and 0, respectively.

Here the 1st and 2nd categorical attributes are gender and crime type, respectively.

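In this embodiment, the distance $d(x_i, K_j)$ between a training sample and a cluster center is calculated as:

$$d(x_i, K_j) = \sum_{m=1}^{p} \omega_m \left(x_{im} - \bar{x}_{jm}\right)^2 + \gamma \sum_{l=1}^{q} \omega_l \sum_{w=1}^{t_l} \left(\delta(x_{il}, a_l^w) - f_j(a_l^w)\right)^2$$

wherein:

$x_i$ is the i-th training sample in the training sample set;

$K_j$ is the cluster center of class j;

$\omega_m$ is the weight of the m-th continuous attribute;

$\gamma$ is the weight of the categorical attributes relative to the continuous attributes; in this embodiment, $\gamma$ takes a value from 0.8 to 1.2;

$\omega_l$ is the weight of the l-th categorical attribute;

$t_l$ is the number of values in the value domain of the l-th categorical attribute;

$a_l^w$ is the w-th value in the value domain of the l-th categorical attribute;

$x_{il}$ is the value of the l-th categorical attribute of the i-th training sample, and $\delta(x_{il}, a_l^w)$ equals 1 when $x_{il} = a_l^w$ and 0 otherwise;

$f_j(a_l^w)$ is the frequency with which training samples whose l-th categorical attribute takes the value $a_l^w$ occur in class j.

The weights are:

$$\omega_m = \frac{1 - e_m}{\sum_{x=1}^{p}(1 - e_x) + \sum_{y=1}^{q}(1 - E_y)}, \qquad \omega_l = \frac{1 - E_l}{\sum_{x=1}^{p}(1 - e_x) + \sum_{y=1}^{q}(1 - E_y)}$$

wherein:

$$e_m = -\frac{1}{\ln I}\sum_{i=1}^{I} \frac{x_{im}}{\sum_{i'=1}^{I} x_{i'm}} \ln \frac{x_{im}}{\sum_{i'=1}^{I} x_{i'm}}, \qquad E_l = -\frac{1}{\ln t_l}\sum_{w=1}^{t_l} \frac{Y_{wl}}{I} \ln \frac{Y_{wl}}{I}$$

$x_{i'm}$ is the value of the m-th continuous attribute of the i'-th training sample;

$Y_{wl}$ is the number of training samples whose l-th categorical attribute takes its w-th value;

I is the total number of training samples in the training sample set.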

Based on the data shown in Table 1, with age normalized as in Table 2, $e_m$, that is, $e_1$, is:

$$e_1 = -\frac{1}{\ln 7}\sum_{i=1}^{7} \frac{x_{i1}}{\sum_{i'=1}^{7} x_{i'1}} \ln \frac{x_{i1}}{\sum_{i'=1}^{7} x_{i'1}}$$

where $\sum_{i'=1}^{7} x_{i'1} = 2.95$, which finally gives $e_1$ = 0.81.

For the categorical attribute gender, that is, when l is 1, $t_l$ is 2, and

$$E_1 = -\frac{1}{\ln 2}\left(\frac{5}{7}\ln\frac{5}{7} + \frac{2}{7}\ln\frac{2}{7}\right)$$

where 5 of the 7 training samples are male and 2 are female, which finally gives $E_1$ = 0.86.

In the same way, $E_2$ is obtained for l = 2, where the corresponding categorical attribute is crime type.

Based on the information entropies obtained above, the weight $\omega_m$ of the 1st continuous attribute, age, is:

ω_m＝0.41，m＝1；

the weight $\omega_l$ of the 1st categorical attribute, gender, is:

ω_l＝0.31，l＝1；

the weight $\omega_l$ of the 2nd categorical attribute, crime type, is:

ω_l＝0.28，l＝2。

S22, the distance between each training sample in the training sample set and each cluster center is calculated using the distance formula above, and each training sample is assigned to the nearest cluster center.

Based on the data in Tables 1 and 2, in this embodiment the distance between each training sample and each cluster center is calculated. For example, for training sample a1, the distances to the cluster center K₁ of class 1 and the cluster center K₂ of class 2 are calculated. If K₁ and K₂ are the initial cluster centers obtained in step S21 above, then with the values of ω_m and ω_l derived in step S21:

d(a₁, K₁) = 0.41 × (0.08 - 0.31)² + 0.31 × ((1 - 1)² + 0²) + 0.28 × ((1 - 0)² + 1² + 0²) = 0.5817;

d(a₁, K₂) = 0.41 × (0.08 - 0.81)² + 0.31 × ((1 - 1)² + 0²) + 0.28 × ((1 - 0)² + 0² + 1²) = 0.7785;

Training sample a1 is therefore assigned to the class of cluster center K₁. The distances from a2 through a7 to K₁ and K₂ are calculated in the same way; based on the results, a2 is assigned to the class of K₁, a3 to the class of K₂, a4 to the class of K₁, a5 to the class of K₁, a6 to the class of K₁, and a7 to the class of K₁.
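The worked distances can be checked with a short sketch. It assumes γ = 1 (within the 0.8 to 1.2 range stated for this embodiment and consistent with the numbers above), takes a2 and a3 as the initial centers K₁ and K₂ as implied by the ages 0.31 and 0.81 in the two calculations, and uses illustrative names throughout.

```python
W_AGE, W_SEX, W_CRIME, GAMMA = 0.41, 0.31, 0.28, 1.0  # GAMMA = 1 is an assumption
SEX = ["male", "female"]
CRIME = ["violence", "theft", "robbery"]

# (normalised age, sex, crime type) for a1..a7, from Table 2.
samples = {
    "a1": (0.08, "male", "violence"),
    "a2": (0.31, "male", "theft"),
    "a3": (0.81, "male", "robbery"),
    "a4": (0.47, "female", "theft"),
    "a5": (0.28, "male", "violence"),
    "a6": (0.0, "female", "theft"),
    "a7": (1.0, "male", "theft"),
}

def indicator(value, domain):
    """1.0 at the position of `value` in the domain, 0.0 elsewhere."""
    return [1.0 if value == v else 0.0 for v in domain]

def as_center(sample):
    """A center made of a single sample: its age plus 0/1 value frequencies."""
    age, sex, crime = sample
    return age, indicator(sex, SEX), indicator(crime, CRIME)

def distance(sample, center):
    age, sex, crime = sample
    c_age, f_sex, f_crime = center
    cont = W_AGE * (age - c_age) ** 2
    cat = W_SEX * sum((d - f) ** 2 for d, f in zip(indicator(sex, SEX), f_sex))
    cat += W_CRIME * sum((d - f) ** 2 for d, f in zip(indicator(crime, CRIME), f_crime))
    return cont + GAMMA * cat

k1, k2 = as_center(samples["a2"]), as_center(samples["a3"])
assignment = {
    name: "K1" if distance(s, k1) <= distance(s, k2) else "K2"
    for name, s in samples.items()
}
print(round(distance(samples["a1"], k1), 4))  # 0.5817
print(round(distance(samples["a1"], k2), 4))  # 0.7785
```

Only a3 ends up in the class of K₂, in agreement with the assignment listed above.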

S23, an objective function is calculated from the distance between each training sample and the center of the cluster to which it belongs, and it is judged whether the value of the objective function is unchanged compared with the previously calculated value.

If not, the cluster center of each class is updated according to the training samples currently in that class, and the process returns to step S22;

if yes, clustering is finished.

In this embodiment, if the cluster centers in step S22 are the initial cluster centers, that is, if clustering is being performed for the first time, there is no previously calculated value of the objective function in step S23; in this case the currently calculated objective function is deemed to have changed compared with the previous value.

In this embodiment, the objective function F(X, P) is:

$$F(X, P) = \sum_{j=1}^{N} \sum_{x_i \in J_j} d(x_i, K_j)$$

where $J_j$ is the set of training samples of class j, whose cluster center is $K_j$.
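The loop formed by steps S22 and S23 can be sketched generically. In this illustration the objective is taken to be the sum of each sample's distance to its own cluster center, and the function name `cluster`, the `max_rounds` guard and the one-dimensional toy data are assumptions made for brevity.

```python
def cluster(samples, centers, dist, update, max_rounds=100):
    """Alternate assignment (S22) and center update until the objective stops changing (S23)."""
    centers = list(centers)  # do not modify the caller's list
    previous = None          # no objective value exists before the first round
    labels = []
    for _ in range(max_rounds):
        # S22: assign every sample to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: dist(s, centers[j]))
                  for s in samples]
        # S23: objective = total distance of samples to their own centers.
        objective = sum(dist(s, centers[j]) for s, j in zip(samples, labels))
        if objective == previous:
            break            # unchanged since the last round: clustering is finished
        previous = objective
        for k in range(len(centers)):
            members = [s for s, j in zip(samples, labels) if j == k]
            if members:      # keep the old center if a class is empty
                centers[k] = update(members)
    return labels, centers

labels, centers = cluster(
    [0.0, 1.0, 9.0, 10.0], [0.0, 10.0],
    dist=lambda x, c: (x - c) ** 2,
    update=lambda xs: sum(xs) / len(xs),
)
```

Treating the first round as "changed" corresponds to initialising `previous` to `None`.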

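In this embodiment, the specific process for updating the cluster center of each class is as follows:

Step S231, for each class, the average value of each continuous attribute over all training samples in the class is calculated:

$$\bar{x}_{jm} = \frac{1}{n_j} \sum_{x_i \in J_j} x_{im}, \qquad m = 1, 2, 3, \ldots, p$$

and the frequency of each value in the value domain of each categorical attribute is counted:

$$f_j(a_l^w) = \frac{\left|\{x_i \in J_j : x_{il} = a_l^w\}\right|}{n_j}, \qquad w = 1, 2, 3, \ldots, t_l$$

where $n_j$ is the total number of training samples in class j, $a_l^w$ is the w-th value in the value domain of the l-th categorical attribute, $f_j(a_l^w)$ is the frequency with which that value occurs in class j, and $t_l$ is the number of values in the value domain of the l-th categorical attribute.

Step S232, the obtained $\bar{x}_{jm}$, m = 1, 2, 3, ..., p, and the obtained frequencies $f_j(a_l^w)$, l = 1, 2, 3, ..., q, are used as the attributes of the new cluster center.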

S3, N corresponding BP neural networks are constructed, one for each of the N classes obtained by clustering. In this embodiment, a genetic algorithm is used to set initial parameters for each of the N BP neural networks, specifically as follows:

Step S31, initial parameters of the BP neural network are randomly generated as the initial population. In this example, 4 individuals are randomly generated as the initial population of the genetic algorithm; the individuals are [1.0,1.1,1.2,1.3,1.3,1.5], [1.6,1.7,1.8,1.9,2.0,2.1], [2.2,2.3,2.4,2.5,2.6,2.7] and [2.8,2.9,3.0,3.1,3.2,3.3].

For the genetic algorithm, the maximum number of iterations is set to 50, the stop error to 0.1, the crossover probability to 0.5 and the mutation probability to 0.05.

Step S32, the training samples are input, the fitness of each individual is calculated from the cross-entropy cost function of the BP neural network, and selection, crossover and mutation are performed using a tournament selection strategy, a uniform crossover strategy and a uniform mutation strategy, respectively, specifically:

First, selection is performed using the tournament strategy. Two individuals are chosen at random from the current population and compared, and the individual with the lower fitness value is taken as an offspring individual; this step is repeated until the number of offspring equals the number of parents.

Next, crossover is performed using the uniform crossover strategy. The selected offspring individuals are paired, and each gene (in this example, a floating-point number) of the two paired individuals is exchanged with the crossover probability to form two new individuals.

Finally, mutation is performed using the uniform mutation strategy. For each offspring individual produced by crossover, three genes are chosen at random; for each, a random number is drawn from a uniform distribution over a specified range (in this example, -5.0 to 5.0) and replaces the gene with the mutation probability.

Step S33, the fitness of the individuals of each generation is calculated. When the fitness is smaller than the stop error or the number of iterations exceeds the maximum number of iterations, the algorithm stops and the best individual is returned as the initial parameters of the BP neural network; otherwise, the process returns to step S32.

In this embodiment, the BP neural network has a three-layer structure, the cost function is the cross-entropy cost function, and the activation function is the sigmoid function.
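A three-layer network of this kind can be sketched in plain Python. This is a toy illustration, not the patent's network: the class name `TinyBP`, the layer sizes, the learning rate, the random initial weights (which the embodiment would instead take from the genetic algorithm) and the four training rows are all assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyBP:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight of every unit is its bias.
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def cost(self, data):
        """Mean cross-entropy between targets and predicted probabilities."""
        eps = 1e-12
        total = 0.0
        for x, t in data:
            _, y = self.forward(x)
            total -= t * math.log(y + eps) + (1 - t) * math.log(1 - y + eps)
        return total / len(data)

    def train(self, data, lr=0.5, epochs=500):
        for _ in range(epochs):
            for x, t in data:
                h, y = self.forward(x)
                d_out = y - t  # cross-entropy + sigmoid: error at the output pre-activation
                d_hid = [d_out * self.w2[k] * h[k] * (1 - h[k]) for k in range(len(h))]
                for k, v in enumerate(h + [1.0]):
                    self.w2[k] -= lr * d_out * v
                for k, row in enumerate(self.w1):
                    for i, v in enumerate(x + [1.0]):
                        row[i] -= lr * d_hid[k] * v

# Hypothetical rows: the target simply equals the first input feature.
data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
net = TinyBP(n_in=2, n_hidden=3)
cost_before = net.cost(data)
net.train(data)
cost_after = net.cost(data)
```

With the cross-entropy cost and a sigmoid output, the output error term reduces to the prediction minus the target, which is why no extra derivative factor appears in `d_out`.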

S4, the continuous attributes and the one-hot encoded categorical attributes of each training sample are input into the BP neural network corresponding to the class to which the training sample belongs, and the BP neural network is trained to obtain a recidivism risk prediction model.

In this embodiment, the categorical attributes of each clustered training sample are one-hot encoded; the continuous attributes do not need one-hot encoding. Based on the training sample data shown in Table 2, the one-hot encoded data are shown in Table 3:

TABLE 3
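The encoding step can be sketched as follows for the samples of Table 2. The column order (age, then one column per gender value, then one column per crime type) and the function name `encode` are assumptions made for illustration.

```python
SEX = ["male", "female"]
CRIME = ["violence", "theft", "robbery"]

def encode(age, sex, crime):
    """Keep the continuous attribute as-is and one-hot encode the categorical ones."""
    return ([age]
            + [1 if sex == v else 0 for v in SEX]
            + [1 if crime == v else 0 for v in CRIME])

row_a1 = encode(0.08, "male", "violence")  # a1 from Table 2
row_a4 = encode(0.47, "female", "theft")   # a4 from Table 2
print(row_a1)  # [0.08, 1, 0, 1, 0, 0]
```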

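S5, for a test sample whose risk of re-offending is to be predicted, the distance between the test sample and each cluster center obtained after the clustering in step S2 is calculated, the cluster center nearest to the test sample is selected, and the trained neural network corresponding to the cluster of that cluster center is used as the recidivism risk prediction model of the test sample.

In this embodiment, with reference to step S2, the distance between the test sample and each cluster center is obtained from the distance formula used between training samples and cluster centers:

$$d(x_t, K_j) = \sum_{m=1}^{p} \omega_m \left(x_{tm} - \bar{x}_{jm}\right)^2 + \gamma \sum_{l=1}^{q} \omega_l \sum_{w=1}^{t_l} \left(\delta(x_{tl}, a_l^w) - f_j(a_l^w)\right)^2$$

where $x_{tm}$ is the value of the m-th continuous attribute of the test sample $x_t$, $x_{tl}$ is the value of its l-th categorical attribute, $\bar{x}_{jm}$ is the average value of the m-th continuous attribute in class j, $f_j(a_l^w)$ is the frequency in class j of the w-th value $a_l^w$ of the l-th categorical attribute, and $\delta(x_{tl}, a_l^w)$ equals 1 when $x_{tl} = a_l^w$ and 0 otherwise.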
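The routing of a test sample to the model of its nearest cluster can be sketched as below. The function name `route_and_predict`, the placeholder models and the one-dimensional toy centers are illustrative assumptions; in the method, each model would be the trained BP network of the cluster.

```python
def route_and_predict(sample, centers, models, dist):
    """Pick the nearest cluster center, then apply that cluster's trained model."""
    j = min(range(len(centers)), key=lambda k: dist(sample, centers[k]))
    return j, models[j](sample)

# Toy setup: two clusters on a line, each with a placeholder "model".
centers = [0.2, 0.8]
models = [lambda x: 0, lambda x: 1]  # stand-ins for the trained per-cluster networks
cluster_index, prediction = route_and_predict(
    0.75, centers, models, dist=lambda x, c: (x - c) ** 2
)
```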

It should be noted that Table 1 above shows the data of a hypothetical training sample set. When the method of this embodiment is actually carried out, the total number of training samples in the training sample set may exceed 60,000, of which the training samples that re-offended account for 10%.

Example 2

The storage medium includes a processor and a memory for storing a program executable by the processor, and is characterized in that when the processor executes the program stored in the memory, the method for predicting crime risk based on reinforcement learning according to embodiment 1 is implemented as follows:

acquiring training samples to form a training sample set; the training samples comprise persons with a prior criminal record who re-offended and persons with a prior criminal record who did not re-offend;

clustering the training samples according to the continuous attributes and categorical attributes of each training sample in the training sample set, and defining the resulting number of clusters as N, where N is a constant, that is, all the training samples in the training sample set are clustered into N classes;

respectively constructing N corresponding neural networks for the N classes obtained by clustering;

inputting the continuous attributes and the one-hot encoded categorical attributes of each training sample into the neural network corresponding to the class to which the training sample belongs, and training the neural network to obtain a crime risk prediction model;

for a test sample whose re-offending risk needs to be predicted, calculating the distance between the test sample and each cluster center, selecting the cluster center with the minimum distance from the test sample, and taking the trained neural network corresponding to the cluster to which that cluster center belongs as the re-offending risk prediction model of the test sample;

inputting the continuous attributes and the one-hot encoded categorical attributes of the test sample into the crime risk prediction model of the test sample, and predicting the re-offending behavior of the test sample through the model.
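
The steps listed above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the clustering is assumed to have already produced `cluster_ids` and `centers`, plain Euclidean distance on the encoded features stands in for the distance formula of embodiment 1, and `TinyBPNetwork`, `train_per_cluster`, and `predict_risk` are hypothetical names. The network is a one-hidden-layer sigmoid network trained by backpropagation on made-up data.

```python
import math
import random


class TinyBPNetwork:
    """One-hidden-layer sigmoid network trained by backpropagation (illustrative)."""

    def __init__(self, n_in, n_hidden=3, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def _forward(self, x):
        hidden = [self._sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = self._sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, x):
        """Return the predicted probability of re-offending for feature vector x."""
        return self._forward(x)[1]

    def train(self, samples, labels, epochs=300, lr=0.5):
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                hidden, out = self._forward(x)
                delta_out = out - y  # cross-entropy gradient at the output pre-activation
                for k, h in enumerate(hidden):
                    # Back-propagate through hidden unit k before updating w2[k].
                    delta_h = delta_out * self.w2[k] * h * (1.0 - h)
                    self.w2[k] -= lr * delta_out * h
                    self.b1[k] -= lr * delta_h
                    for i, xi in enumerate(x):
                        self.w1[k][i] -= lr * delta_h * xi
                self.b2 -= lr * delta_out


def train_per_cluster(samples, labels, cluster_ids, n_clusters):
    """Train one network per cluster on the samples assigned to that cluster."""
    models = {}
    for j in range(n_clusters):
        xs = [x for x, c in zip(samples, cluster_ids) if c == j]
        ys = [y for y, c in zip(labels, cluster_ids) if c == j]
        models[j] = TinyBPNetwork(n_in=len(samples[0]), seed=j)
        models[j].train(xs, ys)
    return models


def predict_risk(x, centers, models):
    """Route x to the model of its nearest center; return (cluster, probability)."""
    j = min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
    return j, models[j].predict(x)


# Made-up encoded samples: one scaled continuous attribute plus a 2-value one-hot.
samples = [[0.1, 1.0, 0.0], [0.2, 0.0, 1.0], [0.9, 1.0, 0.0], [0.8, 0.0, 1.0]]
labels = [0, 1, 0, 1]            # 1 = re-offended, 0 = did not re-offend
cluster_ids = [0, 0, 1, 1]       # assumed output of the clustering step
centers = [[0.15, 0.5, 0.5], [0.85, 0.5, 0.5]]
models = train_per_cluster(samples, labels, cluster_ids, n_clusters=2)
cluster, risk = predict_risk([0.82, 0.0, 1.0], centers, models)
print(cluster, risk)
```

Training a separate network per class means each network only sees samples that are similar to one another, which is the motivation for routing a test sample to the network of its nearest cluster center.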

In this embodiment, the storage medium may be a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), a USB flash drive, a removable hard disk, or another medium.

Example 3

The present embodiment discloses a computing device, which stores a program that, when executed by a processor, implements the reinforcement learning-based crime risk prediction method according to embodiment 1.

In this embodiment, the computing device may be a desktop computer, a notebook computer, a smart phone, a PDA handheld terminal, or a tablet computer.

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims

1. A crime risk prediction method based on reinforcement learning is characterized by comprising the following steps:

2. The reinforcement learning-based crime risk prediction method of claim 1, wherein the continuous attributes include age, height, weight, and education level; the categorical attributes include gender, and the crime date and crime type of the prior criminal record.

3. The method for predicting crime risk based on reinforcement learning of claim 1, wherein the process of clustering the training samples in S2 specifically includes:

if yes, clustering is finished.

4. The reinforcement learning-based crime risk prediction method according to claim 3, wherein the distance between each training sample in the training sample set and the cluster center is:

wherein:

x_i represents the i-th training sample in the training data set;

K_j represents the cluster center of class j;

ω_m represents the weight of the m-th continuous attribute;

ω_l represents the weight of the l-th categorical attribute;

t_l represents the number of values in the value domain of the l-th categorical attribute;

represents the w-th value in the value domain of the l-th categorical attribute;

x_il represents the value of the l-th categorical attribute of the i-th training sample;

represents the frequency of occurrence, in class j, of the training samples whose l-th categorical attribute takes the w-th value of its value domain;

wherein:

i is the total number of training samples in the set of training samples.

5. The reinforcement learning-based crime risk prediction method according to claim 4, wherein the distance between the test sample and each cluster center is:

6. The reinforcement learning-based crime risk prediction method according to claim 4, wherein the objective function F(X, P) is:

J_j is the set of training samples of class j, whose cluster center is K_j.

7. The method for predicting crime risk based on reinforcement learning of claim 3, wherein in step S23, the specific steps of updating the cluster center for each class according to the training samples in the current class are as follows:

wherein

represents the w-th value in the value domain of the l-th categorical attribute,

the value of the l-th categorical attribute is

step S232: the values obtained above are taken as the attributes of the new cluster center, thereby obtaining the new cluster center.

8. The reinforcement learning-based crime risk prediction method of claim 1, wherein the neural network is a BP neural network;

9. A storage medium comprising a processor and a memory for storing a processor-executable program, wherein the processor, when executing the program stored in the memory, implements the reinforcement learning-based crime risk prediction method of any one of claims 1-8.

10. A computing device storing a program that, when executed by a processor, implements the reinforcement learning-based crime risk prediction method of any one of claims 1-8.