CN101187927B - Criminal case joint investigation intelligent analysis method - Google Patents


Info

Publication number
CN101187927B
Authority
CN
China
Prior art keywords
case
vector
component
similarity
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2007100508540A
Other languages
Chinese (zh)
Other versions
CN101187927A (en)
Inventor
刘启和
张建中
陈雷霆
闵帆
何明耘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN2007100508540A
Publication of CN101187927A
Application granted
Publication of CN101187927B
Expired - Fee Related
Anticipated expiration

Abstract

The invention provides an intelligent analysis method for linking related criminal cases that searches both text and images accurately and efficiently. The method comprises: extracting image and text data from the database to form a multidimensional feature vector for each case; defining an operation that handles continuous data and discrete symbolic data uniformly; assigning weights to the components of the multidimensional vector; reducing the dimensionality of each case's feature vector by rough set reduction; and computing, on the reduced vectors, the similarity between the case under analysis and each case in the database, so as to find the cases in the database that are linked to the case under analysis. The invention lets an analyst apply experience and knowledge interactively for flexible search and comparison, gives investigators more accurate linked-case information, and improves investigation efficiency.

Description

An intelligent case-linkage analysis method for criminal cases
Technical field
The present invention relates to an intelligent analysis method for linking related criminal cases.
Background art
Criminal investigators collect large amounts of information at crime scenes and store it in computer systems: footprint photographs, scene photographs, descriptions of objects found at the scene, descriptions of the crime location, and so on. The information comes in many formats, including discrete numbers and symbols, text, and images. Because offenders now often commit crimes in other regions, move from place to place, and offend repeatedly, intelligence analysts must take the features of a current case and run complex searches and comparisons against the cases already recorded in the system to find which cases may have been committed by the same person or gang, and so supply evidence and leads for solving them. At present, case linkage in criminal investigation relies mainly on simple retrieval and manual comparison. This approach is extremely inefficient, and as the analyst's workload and fatigue grow, the accuracy of manual analysis drops sharply, which slows investigations. Methods that extract features from images and methods that classify text are each common, but crime-scene data is heterogeneous: it includes both text and image data, and both discrete and continuous data. In addition, analysts link cases in many different and complex ways, so they need to analyze case-linkage information under many combinations of criteria. Currently known image retrieval or text retrieval techniques are therefore hard to apply in a computer-aided case-linkage analysis system and cannot meet the efficiency and accuracy that case-linkage analysis requires.
Summary of the invention
The object of the invention is to provide an intelligent case-linkage analysis method for criminal cases that retrieves both images and text accurately and efficiently.
The object of the invention is achieved by the following technical solution:
An intelligent case-linkage analysis method for criminal cases comprises the following steps:
Step 1: extract the features of the images and text of each case in the database;
Step 2: represent each image or text feature extracted for a case as one dimension of that case's vector, so that all the image and text features extracted for the case form its multidimensional vector;
Step 3: assign a weight to each dimension of each case's vector; compute the similarity between the cases in the database to obtain a similarity matrix; specify a threshold and compute the neighborhood of each case to obtain neighborhood rough set system 1 of the database;
Step 4: reduce the dimensionality of each case's multidimensional vector;
The procedure is to remove one component from every case's multidimensional vector, assign weights to the remaining components, compute the similarity between cases without that component, and, using the same threshold as in step 3, compute the neighborhood of each case without that component to obtain neighborhood rough set system 2 of the database. Neighborhood rough set system 1 from step 3 is then compared with neighborhood rough set system 2. If the two differ significantly, the component cannot be removed and the dimension of the vector is not reduced; if the two differ little, the component is removed and the dimension of every case's vector is reduced. The procedure is repeated for each of the other components: components that can be reduced are removed and those that cannot are kept, which finally yields the reduced multidimensional vector of each case;
Step 5: compute, on the reduced vectors, the similarity between the case under analysis and each case in the database, and find the cases in the database that are linked to it;
Step 6: if step 5 does not give a satisfactory result, readjust the weights of the case vectors in step 3, adjust the threshold used to compute the neighborhoods, and repeat steps 3 to 5 until a linked-case result is obtained. In step 2, a "feature" of an image or text means an attribute-value pair as listed in the table below: the value of each attribute of a case is one dimension of its vector, and all the attribute-value pairs together form the case's multidimensional vector:
Attribute | Attribute value
Position of a feature point in the image | Pixel value of the feature point
Word in the text data | Frequency of the word in the text
Text used for a specific description | Discrete data
Number used for a specific description | Continuous data
Text used for a specific description may, for example, be the crime tool, whose values may include knife and gun. The number of perpetrators is an attribute whose value is discrete data, and the length of a footprint at the scene is an attribute whose value is continuous data. The method is characterized in that:
1) Each case $C_i$ is represented as an n-dimensional vector $(v_{i1}, v_{i2}, \ldots, v_{in})$ that contains both continuous numeric data and discrete symbolic data. Let $v$ and $s$ be values of the same attribute; the operation '·' is defined as follows:
$$v \cdot s = \begin{cases} v \times s, & v \text{ and } s \text{ are continuous data} \\ 1, & v \text{ and } s \text{ are discrete data and } v = s \\ 0, & v \text{ and } s \text{ are discrete data and } v \neq s \end{cases}$$
2) The image and text features of a case in step 1 are extracted as follows:
Step 1-1: compute the mean squared-gradient matrix of each pixel in the image:
$$N(x,y)=\begin{pmatrix}\left(\frac{\partial I}{\partial x}\right)^2 & \frac{\partial I}{\partial x}\frac{\partial I}{\partial y}\\ \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} & \left(\frac{\partial I}{\partial y}\right)^2\end{pmatrix}$$
where $I(x,y)$ is the gray value at $(x,y)$ in the image. When both eigenvalues of the mean squared-gradient matrix at a point $(x,y)$ are large, the point is a feature point. The feature point response function is:
$$R=\det(N)-k\,(\operatorname{trace}(N))^2$$
where $\det(N)$ is the determinant of $N$, $\operatorname{trace}(N)$ is the trace of $N$, and $k$ is 0.04. The pixels of the image are sorted in descending order of $R$; a required number of feature points $F$ is chosen, the first $F$ pixels in the sorted sequence are taken as feature points, and the positions of the feature points form a feature point vector;
Step 1-2: the text features of a case are extracted as follows:
Segment the text into words and tag their parts of speech, remove the function words, and denote the remaining words $v_1, v_2, \ldots, v_n$. Compute the frequency of each word $v_i$ in the text, denoted $p_i$; taking each word as a dimension gives the word frequency vector $(p_1, p_2, \ldots, p_n)$;
3) The weight vector in step 3 is normalized as follows:
Step 3-1: the weights $r_i$ of the components of a case's multidimensional vector make up the weight vector of the multidimensional vector:
$$R=(r_1,r_2,\ldots,r_n),$$
Step 3-2: the weight vector $R$ is normalized by the following formula:
$$W=\left(\frac{r_1}{\sum_{i=1}^{n} r_i},\frac{r_2}{\sum_{i=1}^{n} r_i},\ldots,\frac{r_n}{\sum_{i=1}^{n} r_i}\right)=(w_1,w_2,\ldots,w_n)$$
where $W=(w_1,w_2,\ldots,w_n)$ is the normalized weight vector.
4) The similarity between cases in step 3 is computed from the weights as follows:
Step 3-3: compute the similarity between two cases in the database;
Let $C_1$ and $C_2$ be two cases with vectors $(v_1, v_2, \ldots, v_n)$ and $(s_1, s_2, \ldots, s_n)$; the similarity between $C_1$ and $C_2$ is computed by the following formula:
$$S(C_1,C_2)=\frac{\sum_{i=1}^{n} w_i\,(v_i\cdot s_i)}{\sum_{i=1}^{n} v_i\cdot v_i\;\sum_{i=1}^{n} s_i\cdot s_i}$$
where $w_i$ is the $i$-th component of the normalized weight vector, $v_i\cdot v_i$ and $s_i\cdot s_i$ are the '·' operation applied to the $i$-th component of $C_1$ and of $C_2$ with itself, and $(v_i\cdot s_i)$ is the '·' operation between the corresponding $i$-th components of the two cases;
Step 3-4: compute the similarity between all cases in the database to obtain the similarity matrix:
Suppose the database contains $m$ cases $C_1, C_2, \ldots, C_m$, each represented as an n-dimensional vector. Computing the similarity between every pair of cases by step 3-3 gives the similarity matrix of all cases:
$$MS=\begin{pmatrix}S(C_1,C_1) & S(C_1,C_2) & \cdots & S(C_1,C_m)\\ S(C_2,C_1) & S(C_2,C_2) & \cdots & S(C_2,C_m)\\ \vdots & \vdots & \ddots & \vdots\\ S(C_m,C_1) & S(C_m,C_2) & \cdots & S(C_m,C_m)\end{pmatrix}$$
Step 3-5: specify a threshold $K$; from $K$ and the similarity matrix, compute the neighborhood $N(C_i)$ of any case $C_i$ by the following formula:
$$N(C_i)=\{C_j \mid S(C_i,C_j)\le K,\ j\in\{1,2,\ldots,m\}\},$$
Step 3-6: compute the neighborhood of every case in the database to obtain neighborhood rough set system 1:
$$NS=\{N(C_1),N(C_2),\ldots,N(C_m)\}$$
5) The dimensionality reduction of the multidimensional vector in step 4 proceeds as follows:
Step 4-1: suppose the cases in the database have $n$ components and let $F=\{1,2,\ldots,n\}$. Let $C_1$ and $C_2$ be two cases with vectors $(v_1, v_2, \ldots, v_n)$ and $(s_1, s_2, \ldots, s_n)$. Remove any one component $i$ from $F$, letting $F=F-\{i\}$, and recompute the similarity of the two cases after the removal by the following formula:
$$S(C_1,C_2)=\frac{\sum_{i\in F} w_i\,(v_i\cdot s_i)}{\sum_{i\in F} v_i\cdot v_i\;\sum_{i\in F} s_i\cdot s_i},$$
Step 4-2: from the similarity computed in step 4-1, apply steps 3-5 and 3-6 to obtain neighborhood rough set system 2:
$$NS^*=\{N^*(C_1),N^*(C_2),\ldots,N^*(C_m)\}$$
Step 4-3: compare the neighborhood rough set systems $NS$ and $NS^*$, defining:
$$L=\frac{1}{2n}\times\sum_{i=1}^{n}\left(\frac{|N(C_i)\cap N^*(C_i)|}{|N(C_i)|}+\frac{|N(C_i)\cap N^*(C_i)|}{|N^*(C_i)|}\right),$$
$L$ describes the degree of difference between the neighborhood rough set systems $NS$ and $NS^*$: the larger its value, the smaller the difference. When $L$ is less than the specified threshold, let $F=F\cup\{i\}$, that is, component $i$ cannot be removed; otherwise component $i$ is removed;
Step 4-4: repeat steps 4-1 to 4-3 until no further component can be removed from $F$; this gives the reduced multidimensional vector $F$;
6) The similarity analysis in step 5 between the case under analysis and the data in the database proceeds as follows:
Step 5-1: for a specified case $C_p$, compute its similarity to each case in the database on the reduced vector $F$ to obtain the similarity vector:
$$(S(C_p,C_1),S(C_p,C_2),\ldots,S(C_p,C_m)),$$
Step 5-2: compute the neighborhood of $C_p$:
$$N(C_p)=\{C_j \mid S(C_p,C_j)\le K,\ j\in\{1,2,\ldots,m\}\},$$
where the cases in the neighborhood $N(C_p)$ are the cases linked to case $C_p$.
Before step 1-1, the image is preprocessed by the following steps:
Step 1-0: acquire data from a specific region of the image, such as a footprint;
Step 1-01: determine the footprint region in the photograph, including the front-edge (toe) point and the heel point of the footprint;
Step 1-02: join the front-edge point and the heel point of the footprint with a line segment, take the midpoint of the segment as the origin, take the segment as the y-axis and the line perpendicular to it as the x-axis to set up a region coordinate system, and compute the position of each pixel of the region in this coordinate system.
The invention extracts useful feature vectors from the images and text in the database and maps the user's knowledge to a weight vector. Using this weight vector together with rough set theory, it dynamically reduces and selects the vector components, and then computes similarity on the reduced vector to carry out case-linkage analysis. The invention defines an operation that handles continuous data and discrete symbolic data uniformly. This avoids two drawbacks of multidimensional vectors that contain continuous data: either the positive region cannot be computed during reduction, or the continuous data must first be discretized before the positive region can be computed, which loses a large amount of useful information. The invention lets analysts apply their experience and knowledge interactively for flexible retrieval and comparison, gives investigators more accurate linked-case information, and improves investigation efficiency.
Description of drawings
Fig. 1 is a flowchart of the present invention.
Embodiment
The invention is described in further detail below with reference to the accompanying drawing:
The database to be analyzed and searched holds the data of each criminal case, including existing case-linkage analysis results, so that cases already linked to one another are marked.
Step 1: extract features from the image data and text data of each case in the database by the following steps;
Step 1-0: footprint photographs among the images are preprocessed before feature extraction;
Step 1-01: determine the footprint region in the photograph and the front-edge (toe) point and heel point of the footprint;
Step 1-02: join the front-edge point and the heel point of the footprint with a line segment, take the midpoint of the segment as the origin, take the segment as the y-axis and the line perpendicular to it as the x-axis to set up a region coordinate system, and compute the position of each pixel of the region in this coordinate system.
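The coordinate change in step 1-02 can be sketched as follows. This is a minimal illustration that assumes the toe and heel points are given as (x, y) pixel coordinates; the function name `to_region_coords`, the heel-to-toe direction of the y-axis, and the orientation chosen for the x-axis are assumptions made here, not details stated in the text.

```python
import math


def to_region_coords(point, toe, heel):
    """Map an image point into the footprint coordinate system of step 1-02.

    The origin is the midpoint of the toe-heel segment. The y-axis runs along
    the segment (heel to toe, an assumed direction) and the x-axis is
    perpendicular to it.
    """
    origin_x = (toe[0] + heel[0]) / 2.0
    origin_y = (toe[1] + heel[1]) / 2.0
    dx, dy = toe[0] - heel[0], toe[1] - heel[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("toe and heel points must differ")
    y_axis = (dx / length, dy / length)   # unit vector along the segment
    x_axis = (y_axis[1], -y_axis[0])      # unit vector perpendicular to it
    px, py = point[0] - origin_x, point[1] - origin_y
    # Project the shifted point onto the two axes.
    return (px * x_axis[0] + py * x_axis[1],
            px * y_axis[0] + py * y_axis[1])


# Hypothetical footprint whose heel is at (0, 0) and toe at (0, 10).
print(to_region_coords((0, 10), toe=(0, 10), heel=(0, 0)))  # the toe point
print(to_region_coords((3, 5), toe=(0, 10), heel=(0, 0)))   # a point beside the midpoint
```

Expressing pixel positions relative to the footprint itself makes the later feature point positions independent of where the footprint sits in the photograph.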
Step 1-1: extract feature points from the image region as follows:
Compute the mean squared-gradient matrix of each pixel in the image:
$$N(x,y)=\begin{pmatrix}\left(\frac{\partial I}{\partial x}\right)^2 & \frac{\partial I}{\partial x}\frac{\partial I}{\partial y}\\ \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} & \left(\frac{\partial I}{\partial y}\right)^2\end{pmatrix},$$
where $I(x,y)$ is the gray value at position $(x,y)$ in the image. If both eigenvalues of the mean squared-gradient matrix at a point are large, the gray level changes strongly near that point, which indicates that the point is a feature point. The feature point response function is:
$$R=\det(N)-k\,(\operatorname{trace}(N))^2$$
where $\det(N)$ is the determinant of $N$, $\operatorname{trace}(N)$ is the trace of $N$, and $k$ is typically 0.04.
The pixels of the image are sorted in descending order of $R$; a required number of feature points $F$ is chosen, the first $F$ pixels in the sorted sequence are taken as feature points, and the positions of the feature points form a feature point vector.
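Step 1-1 can be sketched in plain Python as below. The text does not say how the derivatives are estimated or over what window the gradient products are averaged, so the central differences and the 3x3 averaging window used here are assumptions, as are the function name `harris_feature_points` and the representation of the image as a list of rows of gray values.

```python
def harris_feature_points(img, num_points, k=0.04):
    """Return the positions (x, y) of the num_points pixels with the largest
    response R = det(N) - k * trace(N)^2."""
    h, w = len(img), len(img[0])
    # Central-difference gradients; border pixels keep a zero gradient.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Average the gradient products over a 3x3 window (assumed size).
            sxx = sxy = syy = 0.0
            for j in (-1, 0, 1):
                for i in (-1, 0, 1):
                    gx, gy = ix[y + j][x + i], iy[y + j][x + i]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            sxx, sxy, syy = sxx / 9.0, sxy / 9.0, syy / 9.0
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            responses.append((det - k * trace * trace, (x, y)))
    # Descending sort by response; the first num_points pixels are kept.
    responses.sort(key=lambda item: item[0], reverse=True)
    return [pos for _, pos in responses[:num_points]]


# Hypothetical 8x8 image: a bright square in the lower-right quadrant,
# so its upper-left corner at (4, 4) is the strongest feature point.
image = [[255 if x >= 4 and y >= 4 else 0 for x in range(8)] for y in range(8)]
print(harris_feature_points(image, num_points=1))
```

The returned positions play the role of the feature point vector; a production system would more likely use a tested image library than this loop-based version.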
Step 1-2: extract features from the descriptive text data in the database, such as the modus operandi, crime tool, number of perpetrators, physical appearance of the perpetrators, traces left at the scene, and footprint length. Segment the text into words and tag their parts of speech, remove the function words, and denote the remaining words $w_1, w_2, \ldots, w_n$. Compute the frequency of each word $w_i$ in the text, denoted $p_i$; taking each word as a dimension gives the vector $(p_1, p_2, \ldots, p_n)$.
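The word frequency vector of step 1-2 can be illustrated with a short sketch. It assumes that word segmentation and part-of-speech tagging have already been done by some external tool and that the result is a list of (word, tag) pairs; the tag names in `FUNCTION_TAGS`, the function name, and the use of raw counts as the frequency are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical part-of-speech tags that mark function words.
FUNCTION_TAGS = {"prep", "conj", "aux", "particle"}


def word_frequency_vector(tagged_tokens, vocabulary):
    """Count content words and return their counts in vocabulary order."""
    counts = Counter(word for word, tag in tagged_tokens
                     if tag not in FUNCTION_TAGS)
    return [counts[word] for word in vocabulary]


# Hypothetical tagged description of a case.
tokens = [("suspect", "noun"), ("with", "prep"), ("knife", "noun"),
          ("and", "conj"), ("knife", "noun")]
print(word_frequency_vector(tokens, ["knife", "suspect", "gun"]))
```

Fixing the vocabulary order up front keeps the dimensions aligned across cases, which the later component-wise similarity requires.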
Step 2: express the extracted image and text features of each case in the database as attribute-value pairs. For example, the position of a feature point in an image is treated as an attribute and the pixel value of the feature point as its value; a word in the text data is treated as an attribute and the frequency of that word in the text as its value. Discrete and continuous data already present in a case can likewise be organized as attribute-value pairs: text used for a specific description may be the crime tool, whose values may include knife and gun; the number of perpetrators is an attribute with a discrete value; and the length of a footprint at the scene is an attribute with a continuous value.
If the database contains $m$ cases, the cases are organized into an information table of the following form:
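Case | $A_1$ | $A_2$ | ... | $A_n$
$C_1$ | $v_{11}$ | $v_{12}$ | ... | $v_{1n}$
$C_2$ | $v_{21}$ | $v_{22}$ | ... | $v_{2n}$
... | ... | ... | ... | ...
$C_m$ | $v_{m1}$ | $v_{m2}$ | ... | $v_{mn}$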
where $C_1, C_2, \ldots, C_m$ denote the cases, $A_1, A_2, \ldots, A_n$ denote the $n$ attributes, and $v_{i1}, v_{i2}, \ldots, v_{in}$ denote the values of case $C_i$ on attributes $A_1, A_2, \ldots, A_n$, so each row of the table is the data vector of one case. Each case $C_i$ is represented as an n-dimensional vector $(v_{i1}, v_{i2}, \ldots, v_{in})$ that contains both continuous numeric data and discrete symbolic data. Let $v$ and $s$ be values of the same attribute; the operation '·' is defined as follows:
$$v \cdot s = \begin{cases} v \times s, & v \text{ and } s \text{ are continuous data} \\ 1, & v \text{ and } s \text{ are discrete data and } v = s \\ 0, & v \text{ and } s \text{ are discrete data and } v \neq s \end{cases}$$
For example, if one footprint length $v$ is 19.85 cm and another footprint length $s$ is 19.80 cm, both are continuous data and $v\cdot s = 19.85 \times 19.80$;
if one crime tool $v$ is a knife and another crime tool $s$ is a gun, then $v \neq s$ and $v\cdot s = 0$;
if one number of perpetrators $v$ is 3 and another number of perpetrators $s$ is 3, then $v = s$ and $v\cdot s = 1$.
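The three examples above can be reproduced with a small sketch of the '·' operation. Whether an attribute is continuous cannot be read off the value itself (the perpetrator count 3 is discrete even though it is a number), so the sketch assumes the caller passes a `continuous` flag per attribute; the function name `op` and that flag are illustrative choices, not part of the original text.

```python
def op(v, s, continuous):
    """The '·' operation between two values of the same attribute."""
    if continuous:
        return v * s               # continuous data: ordinary product
    return 1 if v == s else 0      # discrete symbols: 1 on a match, else 0


print(op(19.85, 19.80, continuous=True))     # footprint lengths in cm
print(op("knife", "gun", continuous=False))  # different crime tools
print(op(3, 3, continuous=False))            # same number of perpetrators
```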
Step 3: assign a weight to each dimension of the case vector; compute the similarity between the cases in the database to obtain a similarity matrix; specify a threshold and compute the neighborhood of each case to obtain neighborhood rough set system 1 of the database;
Step 3-1: the analyst draws on experience and the intended mode of analysis to assign a weight to each data source of a case, such as the footprint photograph or the modus operandi. Every component of a case's n-dimensional vector derives from one of these data sources, so each component takes the weight of its source: if several components all come from the same data, for example the footprint photograph, they are all given the weight of the footprint photograph. Every component of the n-dimensional vector thus has a weight, and the weight vector is denoted
$$P=(p_1,p_2,\ldots,p_n),$$
Step 3-2: the weight vector expresses how much importance is attached to each kind of data, and the analyst sets the emphasis of the analysis by adjusting it. For example, to link cases using footprint photographs only, the weight of the footprint photograph can be set to 1 and the weights of all other data to 0. Normalizing the weight vector $P$ gives the normalized weight vector $W$:
$$W=\left(\frac{p_1}{\sum_{i=1}^{n} p_i},\frac{p_2}{\sum_{i=1}^{n} p_i},\ldots,\frac{p_n}{\sum_{i=1}^{n} p_i}\right)=(w_1,w_2,\ldots,w_n),$$
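Steps 3-1 and 3-2 amount to copying each data source's weight onto the components derived from it and then dividing by the total. The sketch below shows this under the assumption that each component is labeled with the name of its source; the source names and function names are hypothetical.

```python
def component_weights(component_sources, source_weights):
    """Give each vector component the weight of the data source it came from."""
    return [source_weights[source] for source in component_sources]


def normalize(weights):
    """Divide every weight by the sum of all weights (step 3-2)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


# Hypothetical vector: two components from the footprint photograph, one from
# the modus operandi, one from the crime tool. Footprint-only analysis.
sources = ["footprint_photo", "footprint_photo", "modus_operandi", "crime_tool"]
p = component_weights(sources, {"footprint_photo": 1.0,
                                "modus_operandi": 0.0,
                                "crime_tool": 0.0})
print(normalize(p))
```

With the footprint-only setting from the example in the text, the two footprint components share the whole weight and the other components drop out of the similarity numerator.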
Step 3-3: let $C_1$ and $C_2$ be two cases with vectors $(v_1, v_2, \ldots, v_n)$ and $(s_1, s_2, \ldots, s_n)$; the similarity between $C_1$ and $C_2$ is computed by the following formula:
$$S(C_1,C_2)=\frac{\sum_{i=1}^{n} w_i\,(v_i\cdot s_i)}{\sum_{i=1}^{n} v_i\cdot v_i\;\sum_{i=1}^{n} s_i\cdot s_i}.$$
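A minimal sketch of the similarity in step 3-3 follows. It reads the formula as a fraction whose denominator is the product of the two self-sums, which is an interpretation of the formula as printed; the helper names, the `kinds` list of per-attribute continuous flags, and the choice to return 0.0 for an empty denominator are assumptions.

```python
def op(v, s, continuous):
    """The '·' operation: product if continuous, equality test if discrete."""
    return v * s if continuous else (1 if v == s else 0)


def similarity(c1, c2, weights, kinds):
    """Weighted similarity of two case vectors.

    kinds[i] is True when attribute i is continuous, False when discrete.
    """
    n = len(kinds)
    numerator = sum(weights[i] * op(c1[i], c2[i], kinds[i]) for i in range(n))
    self1 = sum(op(c1[i], c1[i], kinds[i]) for i in range(n))
    self2 = sum(op(c2[i], c2[i], kinds[i]) for i in range(n))
    if self1 == 0 or self2 == 0:
        return 0.0  # assumed convention for an all-zero vector
    return numerator / (self1 * self2)


# Hypothetical cases: (footprint length in cm, crime tool, perpetrator count).
kinds = [True, False, False]
weights = [0.5, 0.25, 0.25]
case_a = (20.0, "knife", 3)
case_b = (20.0, "gun", 3)
print(similarity(case_a, case_b, weights, kinds))
```

Because continuous attributes enter as raw products, their scale can dominate the discrete 0/1 terms; scaling continuous attributes beforehand would be one way to balance them, though the text does not prescribe it.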
Step 3-4: compute the similarity between all cases in the database to obtain the similarity matrix:
Suppose the database contains $m$ cases $C_1, C_2, \ldots, C_m$; after step 2, each case is represented as an n-dimensional vector. Computing the similarity between every pair of cases by step 3-3 therefore gives the similarity matrix:
$$MS=\begin{pmatrix}S(C_1,C_1) & S(C_1,C_2) & \cdots & S(C_1,C_m)\\ S(C_2,C_1) & S(C_2,C_2) & \cdots & S(C_2,C_m)\\ \vdots & \vdots & \ddots & \vdots\\ S(C_m,C_1) & S(C_m,C_2) & \cdots & S(C_m,C_m)\end{pmatrix}$$
Step 3-5: specify a threshold $K$; from $K$ and the similarity matrix, compute the neighborhood $N(C_i)$ of any case $C_i$ by the following formula:
$$N(C_i)=\{C_j \mid S(C_i,C_j)\le K,\ j\in\{1,2,\ldots,m\}\}$$
Step 3-6: compute the neighborhood of every case in the database to obtain neighborhood rough set system 1:
$$NS=\{N(C_1),N(C_2),\ldots,N(C_m)\}.$$
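Steps 3-4 to 3-6 can be sketched as two small functions. The similarity function is passed in as a parameter so that the sketch stays independent of how similarity is computed; the comparison with the threshold uses `<=` exactly as the neighborhood formula is written. Function names and the sample matrix are illustrative assumptions.

```python
def similarity_matrix(cases, sim):
    """Pairwise similarity of all cases (step 3-4); sim is any two-case function."""
    return [[sim(ci, cj) for cj in cases] for ci in cases]


def neighborhood_system(matrix, k):
    """Neighborhood of every case as a set of case indices (steps 3-5, 3-6).

    Case j belongs to N(C_i) when S(C_i, C_j) <= k, following the formula
    as it is written.
    """
    m = len(matrix)
    return [{j for j in range(m) if matrix[i][j] <= k} for i in range(m)]


# Hypothetical 3x3 similarity matrix and threshold.
ms = [[1.0, 0.2, 0.9],
      [0.2, 1.0, 0.4],
      [0.9, 0.4, 1.0]]
print(neighborhood_system(ms, k=0.5))
```

Storing neighborhoods as sets of indices makes the intersections needed in the later comparison of two neighborhood systems a one-line operation.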
Step 4: reduce the dimensionality of each case's multidimensional vector using a top-down reduction method. The concrete steps are as follows:
Step 4-1: suppose the cases in the database have $n$ components and let $F=\{1,2,\ldots,n\}$. Let $C_1$ and $C_2$ be two cases with vectors $(v_1, v_2, \ldots, v_n)$ and $(s_1, s_2, \ldots, s_n)$. Remove any one component $i$ from $F$, letting $F=F-\{i\}$, and recompute the similarity of the two cases after the removal by the following formula:
$$S(C_1,C_2)=\frac{\sum_{i\in F} w_i\,(v_i\cdot s_i)}{\sum_{i\in F} v_i\cdot v_i\;\sum_{i\in F} s_i\cdot s_i},$$
Step 4-2: from the similarity computed in step 4-1, apply steps 3-5 and 3-6 again to obtain neighborhood rough set system 2:
$$NS^*=\{N^*(C_1),N^*(C_2),\ldots,N^*(C_m)\},$$
Step 4-3: compare the neighborhood rough set systems $NS$ and $NS^*$, defining:
$$L=\frac{1}{2n}\times\sum_{i=1}^{n}\left(\frac{|N(C_i)\cap N^*(C_i)|}{|N(C_i)|}+\frac{|N(C_i)\cap N^*(C_i)|}{|N^*(C_i)|}\right),$$
$L$ describes the degree of difference between the neighborhood rough set systems $NS$ and $NS^*$: the larger its value, the smaller the difference. When $L$ is less than a specified threshold, let $F=F\cup\{i\}$, that is, component $i$ cannot be removed; otherwise component $i$ is removed.
Step 4-4: repeat steps 4-1 to 4-3 until no further component can be removed from $F$. This gives the reduced multidimensional vector $F$.
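The reduction loop of step 4 can be sketched end to end as below. Several details are assumptions rather than statements from the text: the similarity denominator is taken as the product of the two self-sums, the measure L is averaged over the number of cases, an empty neighborhood contributes a ratio of 1.0 so that no division by zero occurs, the baseline is always the full-vector system from step 3, and at least one component is always kept. All function names are hypothetical.

```python
def op(v, s, continuous):
    return v * s if continuous else (1 if v == s else 0)


def similarity(c1, c2, w, kinds, keep):
    """Similarity restricted to the component indices in keep."""
    num = sum(w[i] * op(c1[i], c2[i], kinds[i]) for i in keep)
    d1 = sum(op(c1[i], c1[i], kinds[i]) for i in keep)
    d2 = sum(op(c2[i], c2[i], kinds[i]) for i in keep)
    return num / (d1 * d2) if d1 and d2 else 0.0


def neighborhood_system(cases, w, kinds, keep, k):
    return [{j for j, cj in enumerate(cases)
             if similarity(ci, cj, w, kinds, keep) <= k} for ci in cases]


def agreement(ns, ns_star):
    """The measure L: larger values mean the two systems differ less."""
    total = 0.0
    for a, b in zip(ns, ns_star):
        common = len(a & b)
        total += (common / len(a)) if a else 1.0   # assumed value for empty sets
        total += (common / len(b)) if b else 1.0
    return total / (2 * len(ns))


def reduce_components(cases, w, kinds, k, l_threshold):
    """Drop components whose removal leaves the neighborhoods nearly unchanged."""
    keep = list(range(len(kinds)))
    baseline = neighborhood_system(cases, w, kinds, keep, k)  # system 1
    changed = True
    while changed and len(keep) > 1:
        changed = False
        for i in list(keep):
            trial = [j for j in keep if j != i]
            if not trial:
                break
            system2 = neighborhood_system(cases, w, kinds, trial, k)
            if agreement(baseline, system2) >= l_threshold:
                keep = trial        # component i is redundant
                changed = True
    return keep


# Hypothetical database: (crime tool, time of day); the second attribute is
# identical in every case and carries no information.
cases = [("knife", "night"), ("knife", "night"), ("gun", "night")]
print(reduce_components(cases, w=[0.5, 0.5], kinds=[False, False],
                        k=0.2, l_threshold=0.9))
```

In this toy database the constant time-of-day attribute is removed and the crime tool is kept, because dropping the tool changes every neighborhood while dropping the time of day changes none.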
Step 5: compute, on the reduced vector, the similarity between the case under analysis and each case in the database, and find the cases in the database that are linked to it.
Step 5-1: for a specified case $C_p$, compute its similarity to each case in the database on the reduced vector $F$ to obtain the similarity vector:
$$(S(C_p,C_1),S(C_p,C_2),\ldots,S(C_p,C_m)),$$
Step 5-2: compute the neighborhood of $C_p$:
$$N(C_p)=\{C_j \mid S(C_p,C_j)\le K,\ j\in\{1,2,\ldots,m\}\},$$
where the cases in the neighborhood $N(C_p)$ are the cases linked to case $C_p$.
Step 6: if the analyst is not satisfied with the result of step 5, the weights of the case vectors are readjusted in step 3, the threshold $K$ used to compute the neighborhoods is adjusted, and steps 3 to 5 are repeated to obtain a new linked-case result, until the analyst is satisfied.

Claims (3)

1. An intelligent case-linkage analysis method for criminal cases, comprising the following steps:
Step 1: extract the features of the images and text of each case in the database;
Step 2: represent each image or text feature extracted for a case as one dimension of that case's vector, so that all the image and text features extracted for the case form its multidimensional vector;
Step 3: assign a weight to each component of each case's multidimensional vector; compute the similarity between the cases in the database to obtain a similarity matrix; specify a threshold and compute the neighborhood of each case to obtain neighborhood rough set system 1 of the database;
Step 4: reduce the dimensionality of each case's multidimensional vector;
The procedure is to remove one component from every case's multidimensional vector, assign weights to the remaining components of every case's multidimensional vector, compute the similarity between cases without that component, and, using the same threshold as in step 3, compute the neighborhood of each case without that component to obtain neighborhood rough set system 2 of the database. Neighborhood rough set system 1 from step 3 is then compared with neighborhood rough set system 2. If the two differ significantly, the component cannot be removed and the dimension of the vector is not reduced; if the two differ little, the component is removed and the dimension of every case's vector is reduced. The procedure is repeated for each of the other components: components that can be reduced are removed and those that cannot are kept, which finally yields the reduced multidimensional vector of each case;
Step 5: compute, on the reduced vectors, the similarity between the case under analysis and each case in the database, and find the cases in the database that are linked to it;
Step 6: if step 5 does not give a satisfactory result, readjust in step 3 the weights assigned to the components of each case's multidimensional vector, adjust the threshold used to compute the neighborhoods, and repeat steps 3 to 5 until a linked-case result is obtained. In step 2, a "feature" of an image or text means an attribute-value pair as listed in the table below: the value of each attribute of a case is one dimension of its vector, and all the attribute-value pairs together form the case's multidimensional vector:
Attribute | Attribute value
Position of a feature point in the image | Pixel value of the feature point
Word in the text data | Frequency of the word in the text
Text used for a specific description | Discrete data
Number used for a specific description | Continuous data
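characterized in that:
1) Each case $C_i$ is represented as an n-dimensional vector $(v_{i1}, v_{i2}, \ldots, v_{in})$ that contains both continuous numeric data and discrete symbolic data. Let $v$ and $s$ be values of the same attribute; the operation '·' is defined as follows:
$$v \cdot s = \begin{cases} v \times s, & v \text{ and } s \text{ are continuous data} \\ 1, & v \text{ and } s \text{ are discrete data and } v = s \\ 0, & v \text{ and } s \text{ are discrete data and } v \neq s \end{cases}$$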
2) The image and text features of a case in step 1 are extracted as follows:
Step 1-1: compute the mean squared-gradient matrix of each pixel in the image:
$$N(x,y)=\begin{pmatrix}\left(\frac{\partial I}{\partial x}\right)^2 & \frac{\partial I}{\partial x}\frac{\partial I}{\partial y}\\ \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} & \left(\frac{\partial I}{\partial y}\right)^2\end{pmatrix}$$
where $\frac{\partial I}{\partial x}$ and $\frac{\partial I}{\partial y}$ denote the derivatives of $I(x,y)$ with respect to $x$ and with respect to $y$, and $I(x,y)$ is the gray value at $(x,y)$ in the image. When both eigenvalues of the mean squared-gradient matrix at a point $(x,y)$ are large, the point is a feature point. The feature point response function is:
$$A=\det(N)-k\,(\operatorname{trace}(N))^2$$
where $\det(N)$ is the determinant of $N$, $\operatorname{trace}(N)$ is the trace of $N$, and $k$ is 0.04. The pixels of the image are sorted in descending order of $A$; a required number of feature points $F$ is chosen, the first $F$ pixels in the sorted sequence are taken as feature points, and the positions of the feature points form a feature point vector;
Step 1-2: the text features of a case are extracted as follows:
The text is segmented into words and tagged with parts of speech, the function words are removed, and the remaining words are denoted v_1, v_2, ..., v_n. The frequency of each word v_i in the text is computed and denoted p_i. Taking the words as dimensions yields a word frequency vector (p_1, p_2, ..., p_n);
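Step 1-2 can be sketched as follows. The sketch assumes the text has already been segmented into tokens and that function words are identified by a supplied list rather than by part-of-speech tagging. It also assumes that "frequency" means relative frequency (count divided by the number of remaining words). The function name is hypothetical.

```python
from collections import Counter

def word_frequency_vector(tokens, function_words):
    """Return (vocabulary, frequencies) for the words left after filtering.

    tokens: words of the text after segmentation (assumed given).
    function_words: set of words to drop (stand-in for part-of-speech filtering).
    """
    content = [t for t in tokens if t not in function_words]
    counts = Counter(content)
    vocabulary = sorted(counts)        # fixed order of the dimensions v_1..v_n
    total = len(content)
    freqs = [counts[w] / total for w in vocabulary] if total else []
    return vocabulary, freqs
```

Sorting the vocabulary only fixes the order of the dimensions so that vectors from different cases can be compared component by component.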
3) The weights assigned in step 3 to each component of the multidimensional vector are normalized as follows:
Step 3-1: the weights r_i of the components of a case's multidimensional vector together form the weights of the multidimensional vector, which are therefore written as the weight vector:
R = (r_1, r_2, ..., r_m),
Step 3-2: the weight vector R is normalized according to the following formula:
Figure DEST_PATH_FSB00000205252800011
where W = (w_1, w_2, ..., w_n) is the normalized weight vector;
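The normalization formula itself appears only as a figure. The sketch below is therefore an assumption, not the claimed formula: it divides each weight by the sum of all weights so that the normalized weights add up to 1. The function name is hypothetical.

```python
def normalize_weights(r):
    """Assumed normalization: w_i = r_i / (r_1 + ... + r_n)."""
    total = sum(r)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [ri / total for ri in r]
```

With this choice an analyst can enter weights on any convenient scale, and only their ratios affect the result.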
4) The similarity between cases is computed using the weights assigned in step 3 to each component of the multidimensional vector, as follows:
Step 3-3: compute the similarity between two cases in the database;
Let C_1 and C_2 be two cases with corresponding vectors (v_1, v_2, ..., v_n) and (s_1, s_2, ..., s_n); the similarity between C_1 and C_2 is computed by the following formula:
Here w_i is the i-th component of the normalized weight vector; v_i ' ' v_i and s_i ' ' s_i are the ' ' operation applied to the i-th component of the vector of C_1 and of C_2, respectively, with itself; and (v_i ' ' s_i) is the ' ' operation applied to the corresponding i-th components of the two cases C_1 and C_2;
Step 3-4: compute the similarity between all cases in the database to obtain the similarity matrix. Suppose the database holds m cases C_1, C_2, ..., C_m, each represented as an n-dimensional vector. Computing the similarity between every pair of cases by step 3-3 gives the similarity matrix of all cases:
Figure FSB00000166038200041
Step 3-5: specify a threshold K; from K and the similarity matrix, compute the neighborhood (the "field") N(C_i) of any case C_i by the following formula:
N(C_i) = {C_j | S(C_i, C_j) ≤ K, j ∈ {1, 2, ..., m}},
Step 3-6: compute the neighborhood of every case in the database to obtain neighborhood rough set system 1:
NS = {N(C_1), N(C_2), ..., N(C_m)}
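Steps 3-5 and 3-6 can be sketched as follows, taking the similarity matrix as given, since the similarity formula is not reproduced in the text. The comparison S(C_i, C_j) ≤ K is used exactly as written in the claim. Cases are referred to by their index, and the function name is hypothetical.

```python
def neighborhoods(sim, k):
    """Return [N(C_1), ..., N(C_m)] as sets of case indices.

    sim: m x m similarity matrix, sim[i][j] = S(C_i, C_j).
    k:   threshold K; case j belongs to N(C_i) when sim[i][j] <= k.
    """
    m = len(sim)
    return [{j for j in range(m) if sim[i][j] <= k} for i in range(m)]
```

Keeping each neighborhood as a set makes the later comparison between two neighborhood systems a plain equality check.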
5) The dimension reduction of the multidimensional vector in step 4 proceeds as follows:
Step 4-1: suppose the cases in the database have n components and let F = {1, 2, ..., n}. Let C_1 and C_2 be two cases with corresponding vectors (v_1, v_2, ..., v_n) and (s_1, s_2, ..., s_n). Remove any one component i from F, setting F = F - {i}, and recompute the similarity of the two cases after the removal by the following formula:
Figure FSB00000166038200042
Step 4-2: from the similarities computed in step 4-1, apply steps 3-5 and 3-6 to obtain neighborhood rough set system 2:
NS* = {N*(C_1), N*(C_2), ..., N*(C_m)},
Step 4-3: compare the neighborhood rough set systems NS and NS* and define their difference as:
Figure FSB00000166038200043
L describes the degree of difference between the neighborhood rough set systems NS and NS*: the larger its value, the smaller the difference. When L is less than the specified threshold, set F = F ∪ {i}, that is, component i cannot be removed; otherwise component i is removed;
Step 4-4: repeat steps 4-1 to 4-3 until no further component can be removed from F, which gives the reduced multidimensional vector F;
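The loop of steps 4-1 to 4-4 can be sketched as follows. Two parts are assumptions because the text gives them only as figures. The similarity is passed in as a caller-supplied function similarity(a, b, components). The measure L is replaced by a hypothetical stand-in, the share of cases whose neighborhood is unchanged, which at least matches the stated property that a larger L means a smaller difference. All function names are hypothetical.

```python
def neighborhood_system(cases, components, similarity, k):
    """Neighborhoods of all cases using only the listed components."""
    m = len(cases)
    return [
        {j for j in range(m) if similarity(cases[i], cases[j], components) <= k}
        for i in range(m)
    ]

def agreement(ns_a, ns_b):
    """Stand-in for L (assumption): share of cases with identical neighborhoods."""
    same = sum(1 for a, b in zip(ns_a, ns_b) if a == b)
    return same / len(ns_a)

def reduce_components(cases, n, similarity, k, l_threshold):
    """Drop components while the neighborhood system stays close to NS."""
    kept = list(range(n))                                       # F
    baseline = neighborhood_system(cases, kept, similarity, k)  # NS
    changed = True
    while changed and len(kept) > 1:
        changed = False
        for i in list(kept):
            trial = [c for c in kept if c != i]                 # F - {i}
            if not trial:
                break
            candidate = neighborhood_system(cases, trial, similarity, k)  # NS*
            if agreement(baseline, candidate) >= l_threshold:
                kept = trial          # L not below the threshold: remove i
                changed = True
            # otherwise i is put back, i.e. F = F ∪ {i}
    return kept
```

Comparing every candidate against the neighborhood system of the full vector, rather than against the previous reduction, is a design choice of this sketch; the claims do not say which baseline is intended.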
6) The similarity analysis between the case to be analyzed and the database in step 5 proceeds as follows:
Step 5-1: for the specified case C_p, compute on the reduced vector F the similarity between C_p and each case in the database to obtain the similarity vector:
(S(C_p, C_1), S(C_p, C_2), ..., S(C_p, C_m)),
Step 5-2: compute the neighborhood of C_p as follows:
N(C_p) = {C_j | S(C_p, C_j) ≤ K, j ∈ {1, 2, ..., m}},
where the cases in the neighborhood N(C_p) are the cases to be investigated jointly with case C_p.
2. The intelligent analysis method for joint investigation of criminal cases according to claim 1, characterized in that, before step 1-1, the image is preprocessed by the following step:
Step 1-0: acquire data from a specific region of the image.
3. The intelligent analysis method for joint investigation of criminal cases according to claim 2, characterized in that the image is a footprint, and its preprocessing steps are as follows:
Step 1-01: determine the footprint region in the photograph, including the front point and the heel point of the footprint;
Step 1-02: connect the front point and the heel point of the footprint into a line segment; take the midpoint of this segment as the origin, the segment as the y-axis, and the line perpendicular to it as the x-axis to establish a region coordinate system, and compute the position of each pixel of the region in this coordinate system.
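The coordinate system of step 1-02 can be sketched as follows. The claim fixes the origin and the two axes but not their orientation, so the sketch assumes that the y-axis points from the heel point to the front point and that the x-axis is obtained by rotating the y-axis by 90 degrees clockwise. Points are taken as (x, y) pairs in image coordinates, and the function name is hypothetical.

```python
import numpy as np

def region_coordinates(front, heel, pixels):
    """Map (x, y) image points into the footprint region coordinate system."""
    front = np.asarray(front, dtype=float)
    heel = np.asarray(heel, dtype=float)
    length = np.linalg.norm(front - heel)
    if length == 0:
        raise ValueError("front point and heel point must differ")
    origin = (front + heel) / 2.0                  # midpoint of the segment
    y_axis = (front - heel) / length               # assumed direction: heel to front
    x_axis = np.array([y_axis[1], -y_axis[0]])     # perpendicular to the y-axis
    rel = np.asarray(pixels, dtype=float) - origin
    # Project each point onto the two unit axes.
    return np.stack([rel @ x_axis, rel @ y_axis], axis=1)
```

Expressing pixel positions in this frame removes the dependence on where the footprint lies in the photograph and on how it is rotated.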
CN2007100508540A 2007-12-17 2007-12-17 Criminal case joint investigation intelligent analysis method Expired - Fee Related CN101187927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2007100508540A CN101187927B (en) 2007-12-17 2007-12-17 Criminal case joint investigation intelligent analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2007100508540A CN101187927B (en) 2007-12-17 2007-12-17 Criminal case joint investigation intelligent analysis method

Publications (2)

Publication Number Publication Date
CN101187927A CN101187927A (en) 2008-05-28
CN101187927B true CN101187927B (en) 2010-12-15

Family

ID=39480324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2007100508540A Expired - Fee Related CN101187927B (en) 2007-12-17 2007-12-17 Criminal case joint investigation intelligent analysis method

Country Status (1)

Country Link
CN (1) CN101187927B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346355A (en) * 2013-07-26 2015-02-11 南京中兴力维软件有限公司 Method and system for intelligent retrieval of series public security cases

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663142B (en) * 2012-05-18 2014-02-26 大连海事大学 Knowledge extraction method
CN103513991B (en) * 2013-10-17 2017-04-12 杭州安恒信息技术有限公司 Method for establishing bi-directional mapping among sequences under condition of difference limitation
CN104615600B (en) * 2013-11-04 2019-06-28 深圳力维智联技术有限公司 Similitude case compares implementation method and its device
CN103903210B (en) * 2014-03-31 2018-02-27 安徽新华博信息技术股份有限公司 A kind of analysis method of case feature string simultaneously
CN104601949A (en) * 2014-12-29 2015-05-06 天维尔信息科技股份有限公司 Delayering emergency command dispatching device, delayering emergency command dispatching system and delayering emergency command dispatching method
CN104636503A (en) * 2015-03-10 2015-05-20 浪潮集团有限公司 Method for querying data
CN105260449B (en) * 2015-10-10 2018-10-02 张福辉 A kind of case key-strings string and detection method
CN106127241A (en) * 2016-06-17 2016-11-16 中国电子科技集团公司第二十八研究所 One is combined related cases sorting technique and categorizing system of combining related cases
CN106294319A (en) * 2016-08-04 2017-01-04 武汉数为科技有限公司 One is combined related cases recognition methods
CN106327473A (en) * 2016-08-10 2017-01-11 北京小米移动软件有限公司 Method and device for acquiring foreground images
CN106355537A (en) * 2016-08-23 2017-01-25 冯村 Smart analysis method and system for interrelated cases
CN106952075A (en) * 2017-02-23 2017-07-14 北京奇虎科技有限公司 Case information report, dissemination method and equipment
CN106951906B (en) * 2017-03-22 2020-03-17 重庆市公安局刑事警察总队 Comprehensive analysis method for multi-dimensional classification and identification of sole patterns
CN107122438A (en) * 2017-04-21 2017-09-01 安徽富驰信息技术有限公司 A kind of judicial case search method and system
CN109426905B (en) * 2017-08-29 2022-03-18 北京国双科技有限公司 Criminal document criminal deviation judging method and device
CN110019697A (en) * 2017-08-29 2019-07-16 北京国双科技有限公司 A kind of method for pushing and device of criminal document
CN110020134B (en) * 2017-11-09 2021-08-13 北京国双科技有限公司 Knowledge service information pushing method and system, storage medium and processor
CN107894981A (en) * 2017-12-13 2018-04-10 武汉烽火普天信息技术有限公司 A kind of automatic abstracting method of case semantic feature
CN108228757A (en) * 2017-12-21 2018-06-29 北京市商汤科技开发有限公司 Image search method and device, electronic equipment, storage medium, program
CN108595547A (en) * 2018-04-09 2018-09-28 南京网感至察信息科技有限公司 A kind of similar case search method based on semantics extraction
CN109033351A (en) * 2018-07-25 2018-12-18 北京神州泰岳软件股份有限公司 The merging method and device of merit data
CN109118411B (en) * 2018-08-15 2022-02-22 浙江省绍兴市人民检察院 Criminal execution inspection system and method based on intelligent auxiliary platform and mobile terminal
CN111382769B (en) * 2018-12-29 2023-09-22 阿里巴巴集团控股有限公司 Information processing method, device and system
CN109947824A (en) * 2019-03-21 2019-06-28 赖雪英 Ask lonely data application processing system and its method
CN110019374B (en) * 2019-03-26 2021-03-12 杭州数梦工场科技有限公司 Feature-based data item processing method and device, storage medium and computer equipment
CN110619064A (en) * 2019-08-29 2019-12-27 苏州千视通视觉科技股份有限公司 Case studying and judging method and device based on deep learning
CN110837604B (en) * 2019-10-16 2020-12-25 贝壳找房(北京)科技有限公司 Data analysis method and device based on housing monitoring platform
CN112256809A (en) * 2020-11-13 2021-01-22 珠海大横琴科技发展有限公司 Data processing method and device

Also Published As

Publication number Publication date
CN101187927A (en) 2008-05-28

Similar Documents

Publication Publication Date Title
CN101187927B (en) Criminal case joint investigation intelligent analysis method
Qader et al. An overview of bag of words; importance, implementation, applications, and challenges
CN101877007B (en) Remote sensing image retrieval method with integration of spatial direction relation semanteme
CN106951498A (en) Text clustering method
Popat et al. Hierarchical document clustering based on cosine similarity measure
CN102968626B (en) A kind of method of facial image coupling
CN102663447B (en) Cross-media searching method based on discrimination correlation analysis
CN102364498A (en) Multi-label-based image recognition method
CN107291895B (en) Quick hierarchical document query method
CN109034035A (en) Pedestrian's recognition methods again based on conspicuousness detection and Fusion Features
CN105975932A (en) Gait recognition and classification method based on time sequence shapelet
CN107122411A (en) A kind of collaborative filtering recommending method based on discrete multi views Hash
CN106126585A (en) Unmanned plane image search method based on quality grading with the combination of perception Hash feature
CN102902826A (en) Quick image retrieval method based on reference image indexes
CN106776950B (en) On-site shoe-print trace pattern image retrieval method based on expert experience guidance
JP4937395B2 (en) Feature vector generation apparatus, feature vector generation method and program
CN104794496A (en) Remote sensing character optimization algorithm for improving mRMR (min-redundancy max-relevance) algorithm
CN106844785A (en) A kind of CBIR method based on conspicuousness segmentation
CN113688635A (en) Semantic similarity based class case recommendation method
CN105678244A (en) Approximate video retrieval method based on improvement of editing distance
CN109934852B (en) Video description method based on object attribute relation graph
CN108153818B (en) Big data based clustering method
Shanmugavadivu et al. FOSIR: fuzzy-object-shape for image retrieval applications
CN111078859B (en) Author recommendation method based on reference times
CN110737796B (en) Image retrieval method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20101215

Termination date: 20171217