CN110443320A - Method and device for determining event similarity
- Publication number
- CN110443320A (application CN201910745487.9A)
- Authority
- CN
- China
- Prior art keywords
- event
- attribute
- parameter
- category attributes
- connection
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Abstract
The present invention provides a method and device for determining event similarity. The method comprises: obtaining a first parameter from the discrete attributes among the N attributes of a first event and the discrete attributes among the M attributes of a second event, where the first parameter represents the correlation of the discrete attributes between the two events; obtaining a second parameter from the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, where the second parameter represents the correlation of the continuous attributes between the first event and the second event; determining an association coefficient between the first event and the second event from the first parameter and the second parameter; and, when the association coefficient is greater than a predetermined threshold, determining that the first event and the second event are similar events. This solves the problem in the related art that, when determining the correlation between two events, the weights of the correlation coefficients between them cannot be obtained simply and effectively, which makes the judgment of the correlation between the two events inaccurate.
Description
Technical field
The present invention relates to the field of event processing, and in particular to a method and device for determining event similarity.
Background
In the police field, countless behavioral events occur every day. Analysing the associations between events in depth helps public security officers identify serial and linked cases more efficiently, and helps them judge a current event accurately by referring to associated historical events. Calculating the association coefficient between two samples makes it possible to determine the correlation between two events. The traditional approach is to calculate the distance between the two samples: the smaller the distance, the higher the association coefficient. Distance measures include Euclidean distance, cosine distance, the Jaccard similarity coefficient, and Mahalanobis distance.
However, these distance measures all assume that every attribute (or variable) carries equal weight, that is, that all attributes are equally important. In practice the importance of each attribute differs, and even different values of the same attribute (or variable) differ in importance. The weight of each attribute value must therefore be adjusted before the distance is calculated. Traditionally, the importance of an attribute is judged manually and a weight is assigned accordingly, but this approach is highly subjective, different experts judge differently, and it takes a great deal of time when there are many attributes.
The prior art offers no effective technical solution to the above problem.
Summary of the invention
Embodiments of the present invention provide a method and device for determining event similarity, so as to at least solve the problem in the related art that, when determining the correlation between two events, the weights of the correlation coefficients between the two events cannot be obtained simply and effectively, which makes the judgment of the correlation between the two events inaccurate.
According to one embodiment of the present invention, a method for determining event similarity is provided, comprising: obtaining N attributes representing a first event and M attributes representing a second event, where the M and N attributes include discrete attributes and continuous attributes, and M and N are equal positive integers; determining a first parameter from the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, where the first parameter represents the correlation of the discrete attributes between the first event and the second event; determining a second parameter from the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, where the second parameter represents the correlation of the continuous attributes between the first event and the second event; determining an association coefficient between the first event and the second event from the first parameter and the second parameter; and, when the association coefficient is greater than a predetermined threshold, determining that the first event and the second event are similar events.
Further, determining the first parameter from the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event comprises: obtaining a third parameter and a fourth parameter corresponding to all discrete attributes between the first event and the second event, where the third parameter represents the association relationship of all discrete attributes between the first event and the second event, and the fourth parameter represents the weight coefficients of all discrete attributes between the first event and the second event; and determining the first parameter from the product of the third parameter and the fourth parameter.
Further, obtaining the third parameter corresponding to all discrete attributes between the first event and the second event comprises: when the value of a discrete attribute of the first event is the same as the value of that discrete attribute of the second event, the third parameter takes the value 1; when the two values differ, the third parameter takes the value 0.
Further, obtaining the fourth parameter corresponding to all discrete attributes between the first event and the second event comprises: when a discrete attribute has the same value in the first event and the second event, obtaining the probability with which that value occurs in a sample event library, taking the negative logarithm of the probability, and using the result as the fourth parameter; when the values differ, or the discrete attribute appears only in the first event, the fourth parameter is zero.
Further, determining the second parameter from the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event comprises: obtaining a fifth parameter and a sixth parameter corresponding to all continuous attributes between the first event and the second event, where the fifth parameter represents the association relationship of all continuous attributes between the first event and the second event, and the sixth parameter represents the weight coefficients of all continuous attributes between the first event and the second event; and determining the second parameter from the product of the fifth parameter and the sixth parameter.
Further, obtaining the fifth parameter corresponding to all continuous attributes between the first event and the second event comprises determining the fifth parameter of each continuous attribute by a formula in which $x_{max}$ and $x_{min}$ are respectively the maximum and minimum values of the continuous attribute in the sample event library, $x_1$ is the value of continuous attribute c in the first event, and $x_2$ is the value of continuous attribute c in the second event.
Further, obtaining the sixth parameter corresponding to all continuous attributes between the first event and the second event comprises determining the sixth parameter of each continuous attribute by the following formula:
$$w_C = -\ln\bigl(F(x_2) - F(x_1)\bigr)$$
where $F$ is the cumulative distribution function of continuous attribute c in the sample event library, $F(x_2)$ is its value at the attribute value $x_2$, and $F(x_1)$ is its value at the attribute value $x_1$.
According to another embodiment of the present invention, a device for calculating event similarity is provided, comprising: an acquiring unit, configured to obtain N attributes representing a first event and M attributes representing a second event, where the M and N attributes include discrete attributes and continuous attributes, and M and N are equal positive integers; a first determination unit, configured to determine a first parameter from the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, where the first parameter represents the correlation of the discrete attributes between the first event and the second event; a second determination unit, configured to determine a second parameter from the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, where the second parameter represents the correlation of the continuous attributes between the first event and the second event; a third determination unit, configured to determine an association coefficient between the first event and the second event from the first parameter and the second parameter; and a fourth determination unit, configured to determine, when the association coefficient is greater than a predetermined threshold, that the first event and the second event are similar events.
According to yet another embodiment of the present invention, a storage medium is also provided. A computer program is stored in the storage medium, and the computer program is configured to execute, when run, the steps in any of the above method embodiments.
According to yet another embodiment of the present invention, an electronic device is also provided, comprising a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
With the present invention, N attributes representing a first event and M attributes representing a second event are obtained, where the M and N attributes include discrete attributes and continuous attributes, and M and N are equal positive integers; a first parameter is determined from the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, where the first parameter represents the correlation of the discrete attributes between the first event and the second event; a second parameter is determined from the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, where the second parameter represents the correlation of the continuous attributes between the first event and the second event; an association coefficient between the first event and the second event is determined from the first parameter and the second parameter; and, when the association coefficient is greater than a predetermined threshold, the first event and the second event are determined to be similar events. This can solve the problem in the related art that, when determining the correlation between two events, the weights of the correlation coefficients between the two events cannot be obtained simply and effectively, which makes the judgment of the correlation between the two events inaccurate.
Brief description of the drawings
The drawings described here are provided for further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a mobile terminal for a method for determining event similarity according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for determining event similarity according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the system modules for adaptively calculating the association coefficient between events according to a preferred embodiment of the present invention;
Fig. 4 is a structural block diagram of a device for determining event similarity according to an embodiment of the present invention.
Detailed description
The present invention is described in detail below with reference to the drawings and in combination with embodiments. Note that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another.
Note that the terms "first", "second", and so on in the description, the claims, and the above drawings are used to distinguish similar objects, not to describe a particular order or sequence.
Embodiment 1
The method embodiment provided in Embodiment 1 of this application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, Fig. 1 is a hardware block diagram of a mobile terminal for a method for determining event similarity according to an embodiment of the present invention. As shown in Fig. 1, the mobile terminal 10 may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. Optionally, the mobile terminal may also include a transmission device 106 for communication and an input/output device 108. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the method for determining event similarity in the embodiments of the present invention. The processor 102 runs the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the mobile terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations of these.
The transmission device 106 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.
This embodiment provides a method for determining event similarity that runs on the above mobile terminal or network architecture. Fig. 2 is a flowchart of the method for determining event similarity according to an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps:
Step S202: obtain N attributes representing a first event and M attributes representing a second event, where the M and N attributes include discrete attributes and continuous attributes, and M and N are equal positive integers.
The attributes of an event may include both discrete attributes and continuous attributes. For example, an event in the police field may include the time of the crime, the persons involved in the case, and the case type. The case type is a discrete attribute of the event, and the time of the crime is a continuous attribute of the event.
Step S204: determine a first parameter from the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, where the first parameter represents the correlation of the discrete attributes between the first event and the second event.
Note that determining the first parameter in this way may include: obtaining a third parameter and a fourth parameter corresponding to all discrete attributes between the first event and the second event, where the third parameter represents the association relationship of all discrete attributes between the first event and the second event, and the fourth parameter represents the weight coefficients of all discrete attributes between the first event and the second event; and determining the first parameter from the product of the third parameter and the fourth parameter.
Obtaining the third parameter corresponding to all discrete attributes between the first event and the second event may include: when the value of a discrete attribute of the first event is the same as the value of that discrete attribute of the second event, the third parameter takes the value 1; when the two values differ, the third parameter takes the value 0.
For example, suppose both events have 3 discrete attributes. Two of them have identical values and take the value 1; the remaining one differs and takes the value 0. The third parameter corresponding to the discrete attributes of the events is then a row vector, namely (1, 1, 0).
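The match rule for discrete (categorical) attributes can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function name `match_vector`, the dictionary representation of an event, and the attribute names are assumptions made for the example, and a missing attribute is treated as a mismatch.

```python
def match_vector(event_a, event_b, discrete_keys):
    """Return 1 where both events carry the same value for a discrete attribute, else 0."""
    return [
        1 if key in event_a and key in event_b and event_a[key] == event_b[key] else 0
        for key in discrete_keys
    ]


# Hypothetical events with three discrete attributes; two of them match.
a = {"case_type": "theft", "district": "north", "tool": "crowbar"}
b = {"case_type": "theft", "district": "north", "tool": "knife"}
x_d = match_vector(a, b, ["case_type", "district", "tool"])
print(x_d)
```

The resulting list plays the role of the third parameter, with one entry per discrete attribute.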
Note that obtaining the fourth parameter corresponding to all discrete attributes between the first event and the second event may include: when a discrete attribute has the same value in the first event and the second event, obtaining the probability with which that value occurs in the sample event library, taking the negative logarithm of the probability, and using the result as the fourth parameter; when the values differ, or the discrete attribute appears only in the first event, the fourth parameter is zero.
For example, suppose both events have 3 discrete attributes. Two of them have identical values, whose probabilities in the sample event library are 0.6 and 0.3; the remaining one differs and takes the value 0. The fourth parameter (weight parameter) corresponding to the discrete attributes of the two events is then a row vector, namely $(-\ln 0.6, -\ln 0.3, 0) \approx (0.51, 1.20, 0)$.
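Following the negative-logarithm rule stated above, the weight vector and the first parameter can be computed as in the sketch below. The names `discrete_weights` and `library`, the toy data, and the choice to give zero weight to a value that never occurs in the library are assumptions for illustration only.

```python
import math
from collections import Counter


def discrete_weights(event_a, event_b, discrete_keys, sample_events):
    """Weight -ln(p) for each matching discrete attribute, 0 for mismatches or missing values."""
    weights = []
    for key in discrete_keys:
        va, vb = event_a.get(key), event_b.get(key)
        if va is None or va != vb:
            weights.append(0.0)
            continue
        counts = Counter(e.get(key) for e in sample_events)
        p = counts[va] / len(sample_events)  # empirical probability of the shared value
        # Assumption: a value absent from the library contributes no weight.
        weights.append(-math.log(p) if p > 0 else 0.0)
    return weights


# Toy sample event library: "theft" occurs with probability 0.6, "north" with 0.3.
library = (
    [{"case_type": "theft", "district": "north", "tool": "crowbar"}] * 3
    + [{"case_type": "theft", "district": "south", "tool": "knife"}] * 3
    + [{"case_type": "fraud", "district": "east", "tool": "phone"}] * 4
)
keys = ["case_type", "district", "tool"]
a = {"case_type": "theft", "district": "north", "tool": "crowbar"}
b = {"case_type": "theft", "district": "north", "tool": "knife"}

weights = discrete_weights(a, b, keys, library)
x_d = [1 if a[k] == b[k] else 0 for k in keys]   # match vector (third parameter)
r_d = sum(w * x for w, x in zip(weights, x_d))   # first parameter as a dot product
print(weights, r_d)
```

Rare shared values receive larger weights than common ones, which is the point of using $-\ln p$ instead of a hand-assigned weight.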
Step S206: determine a second parameter from the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, where the second parameter represents the correlation of the continuous attributes between the first event and the second event.
Note that determining the second parameter in this way may include: obtaining a fifth parameter and a sixth parameter corresponding to all continuous attributes between the first event and the second event, where the fifth parameter represents the association relationship of all continuous attributes between the first event and the second event, and the sixth parameter represents the weight coefficients of all continuous attributes between the first event and the second event; and determining the second parameter from the product of the fifth parameter and the sixth parameter.
Obtaining the fifth parameter corresponding to all continuous attributes between the first event and the second event may include determining the fifth parameter of each continuous attribute by a formula in which $x_{max}$ and $x_{min}$ are respectively the maximum and minimum values of the continuous attribute in the sample event library, $x_1$ is the value of continuous attribute c in the first event, and $x_2$ is the value of continuous attribute c in the second event.
Note also that obtaining the sixth parameter corresponding to all continuous attributes between the first event and the second event may include determining the sixth parameter of each continuous attribute by the following formula:
$$w_C = -\ln\bigl(F(x_2) - F(x_1)\bigr)$$
where $F$ is the cumulative distribution function of continuous attribute c in the sample event library, $F(x_2)$ is its value at the attribute value $x_2$, and $F(x_1)$ is its value at the attribute value $x_1$.
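The weight for a continuous attribute can be sketched with an empirical distribution built from the sample values. The function name `continuous_weight` and the toy data are hypothetical. The sketch also makes three assumptions not stated in the formula itself: the two values are ordered so that the difference is non-negative, the interval is treated as closed at both ends, and at least one sample is counted so that the logarithm stays finite.

```python
import math
from bisect import bisect_left, bisect_right


def continuous_weight(x1, x2, sample_values):
    """Negative log of the empirical share of samples lying in the closed interval [lo, hi]."""
    values = sorted(sample_values)
    lo, hi = min(x1, x2), max(x1, x2)  # assumption: order the pair so the mass is non-negative
    in_range = bisect_right(values, hi) - bisect_left(values, lo)
    # Assumption: count at least one sample so the logarithm stays finite.
    mass = max(in_range, 1) / len(values)
    return -math.log(mass)


amounts = list(range(1, 101))                  # toy sample library: values 1..100
w_close = continuous_weight(10, 14, amounts)   # 5 of 100 samples lie in [10, 14]
w_far = continuous_weight(10, 60, amounts)     # 51 of 100 samples lie in [10, 60]
print(w_close, w_far)
```

Two values that enclose only a small share of the samples get a large weight, while values that are far apart in the distribution get a weight close to zero.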
Step S208: determine the association coefficient between the first event and the second event from the first parameter and the second parameter.
Note that the association coefficient between the first event and the second event can be determined by adding the first parameter to the second parameter.
Step S210: when the association coefficient is greater than a predetermined threshold, determine that the first event and the second event are similar events.
After the first event and the second event are determined to be similar events, the method may further include: calculating the similarity between every event in the event library and the first event; obtaining, from all events, those whose similarity to the first event is greater than the predetermined threshold; and displaying those events. This makes it convenient for a user to query events related to the first event.
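Steps S208 and S210 and the library query can be sketched as follows. The names `association_coefficient`, `similar_events`, and `toy_score`, the precomputed toy scores, and the descending sort of the hits are assumptions for illustration; the text only requires the sum of the two parameters and a comparison with the threshold.

```python
def association_coefficient(first_param, second_param):
    """Step S208: the association coefficient is the sum of the two parameters."""
    return first_param + second_param


def similar_events(query, library, score, threshold):
    """Return (event, coefficient) pairs whose coefficient exceeds the threshold, highest first."""
    scored = [(event, score(query, event)) for event in library]
    hits = [(event, r) for event, r in scored if r > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical precomputed (first parameter, second parameter) pairs per library event.
toy_scores = {"e1": (1.7, 0.9), "e2": (0.0, 0.4), "e3": (2.3, 0.0)}


def toy_score(query, event_id):
    r_d, r_c = toy_scores[event_id]
    return association_coefficient(r_d, r_c)


hits = similar_events("q", ["e1", "e2", "e3"], toy_score, threshold=1.0)
print([event for event, _ in hits])
```

In a real system `score` would compute both parameters from the attributes of the two events; it is passed in as a callable here so the thresholding logic stays independent of how the parameters are obtained.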
Through the above steps, N attributes representing a first event and M attributes representing a second event are obtained, where the M and N attributes include discrete attributes and continuous attributes, and M and N are equal positive integers; a first parameter is determined from the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, where the first parameter represents the correlation of the discrete attributes between the first event and the second event; a second parameter is determined from the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, where the second parameter represents the correlation of the continuous attributes between the first event and the second event; an association coefficient between the first event and the second event is determined from the first parameter and the second parameter; and, when the association coefficient is greater than a predetermined threshold, the first event and the second event are determined to be similar events. This solves the problems of the prior art, in which the parameters describing the correlation between two events are usually determined manually before the correlation is determined, which wastes a large amount of manpower, and in which human factors make the determined correlation inaccurate.
Optionally, the above steps may be executed by a server, a terminal, or the like, but are not limited to these.
Optionally, the execution order of step S204 and step S206 can be interchanged, that is, step S206 may be executed first and step S204 afterwards.
From the description of the above embodiments, a person skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
Embodiment 2
Based on the above embodiment, this preferred embodiment also provides a method for calculating the similarity of structured events with adaptive weights.
Event data is data that records the behavior of a subject. In the police field, for example, it records each person's daily travel, accommodation, internet access, and similar data. In general, an event includes three required attribute elements, namely the event subject, the time of occurrence, and the event name, together with optional attributes that depend on the event name. For a train event, for example, the optional attributes include the place of departure, the place of arrival, the train number, and so on, while for a lodging event the optional attributes include the hotel name, the check-out time, and so on. This means that the attribute structures of different events can differ. For events with the same attribute structure, this preferred embodiment proposes a method for calculating the association coefficient with adaptive weights. It is described in detail as follows:
1. Calculation of the association coefficient (equivalent to determining the association coefficient between the first event and the second event)
Assume an event has n attributes, of which the numbers of continuous and discrete attributes are $n_C$ and $n_D$ respectively. The association coefficient of two events is then r, with $r = r_C + r_D$.
For the discrete attributes, $r_D = w_D \cdot x_D$. First, the association degree vector $x_D$ (equivalent to the third parameter) is constructed: for any discrete attribute of the two events, the association degree is 1 if the attribute values are the same and 0 if they differ, and the dimension of the vector equals the number of fields. Second, the weight vector $w_D$ (equivalent to the fourth parameter) is calculated: for any attribute of the two events, the weight is $-\ln p$ if the attribute values are the same (p is the probability with which the value occurs among all samples; for example, if the probability is 0.01, the weight is $-\ln 0.01 \approx 4.6$), and 0 if the values differ or a value is missing. That is, for each discrete attribute d, the weight is given by formula (1):
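$$w_d = \begin{cases} -\ln p, & \text{if the two events have the same value of attribute } d \\ 0, & \text{if the values differ or a value is missing} \end{cases} \tag{1}$$
where p is the frequency with which the attribute value occurs in attribute d.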
For the continuous attributes, $r_C = w_C \cdot x_C$. First, the association degree vector $x_C$ (equivalent to the fifth parameter) is constructed: for any continuous attribute of the two events, the association degree is calculated by formula (2), in which $x_{max}$ and $x_{min}$ are respectively the maximum and minimum values in the sample event library, the purpose being to remove the influence of outliers.
Second, the weight vector $w_C$ is calculated. For any attribute of the two events, the weight is given by formula (3):
$$w_C = -\ln\bigl(F(x_2) - F(x_1)\bigr) \tag{3}$$
where $F(x)$ is the cumulative distribution function of attribute c; that is, the weight equals the negative logarithm of the cumulative proportion of samples lying between $x_1$ and $x_2$ (both inclusive).
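The continuous part $r_C = w_C \cdot x_C$ can be sketched as below. The association degree used here is an assumption: this text names only $x_{max}$, $x_{min}$, $x_1$, and $x_2$ as inputs, so the sketch substitutes a min-max normalized closeness, $1 - |x_1 - x_2| / (x_{max} - x_{min})$, clipped to the range [0, 1]. It is one plausible choice and should not be read as the patent's formula (2). The function names and sample numbers are likewise hypothetical.

```python
def assumed_closeness(x1, x2, x_min, x_max):
    """Assumed association degree: 1 for equal values, 0 when they span the whole range."""
    if x_max == x_min:
        return 1.0  # degenerate range: every value coincides
    distance = abs(x1 - x2) / (x_max - x_min)
    return 1.0 - min(distance, 1.0)  # clip so values beyond the sample range cannot go negative


def continuous_part(weights, closeness):
    """Dot product of the weight vector and the association degree vector."""
    return sum(w * x for w, x in zip(weights, closeness))


# Hypothetical attribute: hour of day, with sample range 0..24.
x_c = assumed_closeness(10, 14, 0, 24)
r_c = continuous_part([3.0], [x_c])  # 3.0 stands in for a weight from formula (3)
print(x_c, r_c)
```

Whatever form the association degree takes, normalizing by the sample range keeps attributes measured in different units comparable before they are weighted.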
2. Influence of attribute correlation on the association coefficient
For discrete fields, consider the following cases:
Case one: two attributes a and b are mutually independent, where a takes the values 0 and 1 (50% each) and b takes the values 0 and 1 (50% each). Attribute c = 2*a + b takes the values 0 to 3 (25% each).
Think that the sum of information content that c attribute ought to be provided with a, b is identical at this time.According to above-mentioned discrete field incidence coefficient meter
Calculate, when the c attribute of two samples of A and B is identical, incidence coefficient be-ln0.25, with a, b attribute incidence coefficient all the same (-
Ln0.5-ln0.5) identical.The result is consistent with actual conditions.
Case two: two attributes a and b are perfectly correlated, where a takes the values 0 and 1 (50% each) and b = a. Attribute c = 2*a + b takes the values 0 and 3 (50% each).
In this case attribute c should provide the same amount of information as a alone or b alone. By the discrete-field calculation above, when samples A and B have the same value of c, the association coefficient is $-\ln 0.5$, whereas when a and b are both the same the association coefficient is $-\ln 0.5 - \ln 0.5 = -\ln 0.25$, twice as large, even though a and b together carry no more information than c. The result is inconsistent with the actual situation.
In summary, for discrete attribute fields the calculation of the association coefficient should fully take into account the correlation between attributes, and the same holds for continuous attributes. The correlation between attributes must therefore be removed before the association coefficient is calculated.
There are two alternative ways to remove the correlation. 1. Calculate the correlation matrix between the fields and filter out attributes whose correlation coefficient is above 0.6 (retaining only one of them). This method is relatively simple and estimates the association coefficient at the cost of some accuracy. 2. Adjust the probability p in $w_i = -\ln p$ to the conditional probability given the preceding attributes (the 1st to the (i-1)-th). In case two, for example, the weight of a = 0 is $-\ln 0.5$, and because b is perfectly correlated with a, p(b = 0 | a = 0) = 1, so the weight of b is $-\ln 1 = 0$. This method can calculate the association coefficient accurately, but when there are many attributes, or when an attribute value occurs with very low probability, the conditional probability is unstable or cannot be calculated.
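The second approach, conditional-probability weights, can be sketched as follows. The name `conditional_weights`, the dictionary representation, and the toy library are assumptions; the event passed in holds the values shared by the two events being compared, and conditioning is done by narrowing the library one attribute at a time.

```python
import math


def conditional_weights(event, keys, library):
    """Weight attribute i by -ln p(value_i | values of the attributes before it)."""
    weights = []
    matching = list(library)  # samples that agree with `event` on all earlier attributes
    for key in keys:
        narrowed = [s for s in matching if s.get(key) == event[key]]
        if not narrowed:
            # The conditional probability cannot be estimated from the library.
            raise ValueError(f"no sample supports {key}={event[key]!r}")
        weights.append(-math.log(len(narrowed) / len(matching)))
        matching = narrowed
    return weights


# Case two: b is a copy of a, each value occurring in half of the samples.
library = [{"a": 0, "b": 0}] * 5 + [{"a": 1, "b": 1}] * 5
weights = conditional_weights({"a": 0, "b": 0}, ["a", "b"], library)
print(weights)
```

The list `matching` shrinks with every attribute, which shows concretely why the estimate becomes unstable or impossible when there are many attributes or rare values.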
3, Calculating the correlation between attributes
The correlation between attributes is calculated in three cases: 1) the correlation between two continuous attributes; 2) the correlation between two discrete attributes; 3) the correlation between a continuous attribute and a discrete attribute.
The correlation between two continuous attributes can be calculated with the Pearson correlation coefficient. If the correlation between two attributes is greater than 0.6, only one of them is retained.
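A minimal sketch of this filtering step is shown below. It assumes the attributes are held as named columns of equal length, that the absolute value of the Pearson coefficient is compared with the 0.6 threshold, and that the first attribute of a correlated pair is the one retained; none of these choices is specified in the text. The column names are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Constant columns (zero variance) are not handled in this sketch.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.6):
    """Keep a column only if its |r| with every already-kept column is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

columns = {
    "amount": [1.0, 2.0, 3.0, 4.0, 5.0],
    "amount_with_fee": [1.1, 2.1, 3.2, 4.1, 5.3],  # nearly a copy of "amount"
    "hour": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = drop_correlated(columns)
print(kept)
```

Because the filter is greedy, the result depends on column order; a different retention rule would be equally consistent with the description.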
The number of distinct values of a discrete attribute varies widely: in a set of 1,000,000 samples, for example, a name attribute can take hundreds of thousands of values, while gender takes only two. Converting discrete attributes directly into dummy variables is therefore computationally expensive and gives unreliable results. To solve this problem, the information gain ratio is used in place of the correlation coefficient, as shown in formula (4). As before, if the correlation between two attributes is greater than 0.6, only one of them is retained.
Here, E(A) and E(B) denote the information entropy of attributes A and B, and E(A | B) denotes the information entropy of attribute A after it is grouped by the classes of B.
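Formula (4) is not reproduced in the text. A common definition of the information gain ratio that uses exactly the three quantities E(A), E(B) and E(A | B) is (E(A) - E(A | B)) / E(B); the sketch below assumes that form, which may differ from the formula in the application. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    """Information entropy (natural logarithm) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def conditional_entropy(a_values, b_values):
    """E(A | B): entropy of A within each class of B, weighted by class size."""
    groups = defaultdict(list)
    for a, b in zip(a_values, b_values):
        groups[b].append(a)
    n = len(a_values)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def gain_ratio(a_values, b_values):
    """Assumed form of formula (4): (E(A) - E(A | B)) / E(B); 0.0 if B is constant."""
    e_b = entropy(b_values)
    if e_b == 0:
        return 0.0
    return (entropy(a_values) - conditional_entropy(a_values, b_values)) / e_b

a = [0, 0, 1, 1]
ratio_copy = gain_ratio(a, a)                    # B is a copy of A
ratio_independent = gain_ratio(a, [0, 1, 0, 1])  # B tells nothing about A
print(ratio_copy, ratio_independent)
```

Under this definition a perfectly dependent pair scores 1 and an independent pair scores 0, so the 0.6 threshold can be applied in the same way as for the Pearson coefficient. Note that this ratio is not symmetric in A and B.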
To calculate the correlation between a continuous attribute and a discrete attribute, the continuous attribute is first converted into a discrete attribute, and the correlation coefficient is then calculated with the same method used for two discrete attributes.
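The text does not say how the continuous attribute is converted. Equal-width binning is one simple possibility and is shown here purely as an assumption; the helper name `equal_width_bins` and the default of four bins are hypothetical.

```python
def equal_width_bins(values, n_bins=4):
    """Map each continuous value to a bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant attribute falls into a single bin
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so it is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

amounts = [0.0, 2.5, 5.0, 7.5, 10.0]
bins = equal_width_bins(amounts)
print(bins)
```

The resulting bin indices can then be treated as values of a discrete attribute when the correlation with another discrete attribute is calculated.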
Fig. 3 is a schematic diagram of the modules of a system for adaptively calculating the association coefficient between events. The system comprises three main modules: a quality management module, a preprocessing module, and a computing module, described in detail below.
The quality management module is mainly responsible for checking data quality. Its functions include filling in or deleting missing values and correcting or deleting outliers.
The preprocessing module detects auto-increment IDs, reduces the dimensionality of the continuous attributes (to remove the interference of correlation), and normalizes features (to unify their scales).
The computing module is responsible for calculating the association degree vector of two events, calculating the correlation coefficients of the discrete attributes and reducing their dimensionality, calculating the weight of each attribute, and calculating the overall association degree of the two events (equivalent to the method in Embodiment 1).
It should be noted that the computing module can be used to perform the following calculations:
Step 1: calculate the association degree vector x. For any discrete attribute of the two events, the association degree of that attribute is 1 if the attribute values are identical and 0 otherwise. Continuous attributes are calculated according to formula (1) in the background section. The dimension of the vector equals the number of fields.
Step 2: calculate the correlation between discrete attributes. Discrete attributes whose correlation coefficient is greater than or equal to 0.6 undergo dimensionality reduction: of several correlated dimensions, only one is retained.
Step 3: calculate the weight vector w. Discrete attributes are calculated according to formula (2), and continuous attributes according to formula (3).
Step 4: calculate the association degree r. r = rC + rD = wC·xC + wD·xD. That is, the association degree is the sum of the discrete-attribute association degree and the continuous-attribute association degree.
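Steps 1, 3 and 4 can be sketched end to end as follows (step 2, the removal of correlated attributes, is assumed to have been done already). Several details are assumptions rather than the application's definitions: formula (1) is taken to be 1 - |x1 - x2| / (xmax - xmin), the discrete weight is -ln p with p counted in the sample library, and the continuous weight -ln(F(x2) - F(x1)) uses an empirical cumulative distribution function with an absolute value and a 1/n floor so that the logarithm is always defined. All function names, the example library and the threshold of 2.0 are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter

def discrete_term(v1, v2, library_values):
    """w * x for one discrete attribute: x = 1 on a match (else 0), w = -ln p."""
    if v1 != v2:
        return 0.0
    p = Counter(library_values)[v1] / len(library_values)
    # A shared value never seen in the library gets weight 0 here (assumption).
    return -math.log(p) if p > 0 else 0.0

def continuous_term(x1, x2, library_values):
    """w * x for one continuous attribute."""
    ordered = sorted(library_values)
    n = len(ordered)
    span = ordered[-1] - ordered[0]
    # Assumed form of formula (1): 1 - |x1 - x2| / (xmax - xmin), clamped at 0.
    x = max(0.0, 1.0 - abs(x1 - x2) / span) if span > 0 else 1.0
    # Weight -ln(F(x2) - F(x1)) with F the empirical CDF of the library.
    # abs() and the 1/n floor are additions that keep the logarithm defined.
    f1 = bisect_right(ordered, x1) / n
    f2 = bisect_right(ordered, x2) / n
    w = -math.log(max(abs(f2 - f1), 1.0 / n))
    return w * x

def association_degree(e1, e2, library, discrete_keys, continuous_keys):
    """r = rC + rD, summed over the retained attributes."""
    r_d = sum(discrete_term(e1[k], e2[k], [e[k] for e in library]) for k in discrete_keys)
    r_c = sum(continuous_term(e1[k], e2[k], [e[k] for e in library]) for k in continuous_keys)
    return r_c + r_d

def is_similar(e1, e2, library, discrete_keys, continuous_keys, threshold):
    """Two events are judged similar when r exceeds the predetermined threshold."""
    return association_degree(e1, e2, library, discrete_keys, continuous_keys) > threshold

library = [
    {"city": "A", "amount": 10.0},
    {"city": "A", "amount": 20.0},
    {"city": "A", "amount": 30.0},
    {"city": "B", "amount": 40.0},
]
keys = (["city"], ["amount"])
rare_1, rare_2 = {"city": "B", "amount": 10.0}, {"city": "B", "amount": 20.0}
common_1, common_2 = {"city": "A", "amount": 10.0}, {"city": "A", "amount": 20.0}
r_rare = association_degree(rare_1, rare_2, library, *keys)
r_common = association_degree(common_1, common_2, library, *keys)
print(r_rare, r_common)
```

Two events that share the rare city "B" receive a higher association degree than two events that share the common city "A", even though their amounts are equally close. This is the intended effect of weighting a match by the negative logarithm of its probability.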
Embodiment 3
This embodiment further provides a device for determining event similarity. The device is used to implement the above embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiment is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a structural block diagram of a device for determining event similarity according to an embodiment of the present invention. As shown in Fig. 4, the device includes: an acquiring unit 41, a first determination unit 43, a second determination unit 45, a third determination unit 47, and a fourth determination unit 49.
The acquiring unit 41 is configured to obtain N attributes representing a first event and M attributes representing a second event, respectively, wherein the M and N attributes include discrete attributes and continuous attributes, and M is equal to N, both being positive integers.
The first determination unit 43 is configured to determine a first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, wherein the first parameter represents the correlation of the discrete attributes between the first event and the second event.
The second determination unit 45 is configured to determine a second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter represents the correlation of the continuous attributes between the first event and the second event.
The third determination unit 47 is configured to determine the association coefficient between the first event and the second event according to the first parameter and the second parameter.
The fourth determination unit 49 is configured to determine that the first event and the second event are similar events when the association coefficient is greater than a predetermined threshold.
With the above device, the acquiring unit 41 obtains the N attributes representing the first event and the M attributes representing the second event, wherein the M and N attributes include discrete attributes and continuous attributes, and M is equal to N, both being positive integers; the first determination unit 43 determines the first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, wherein the first parameter represents the correlation of the discrete attributes between the first event and the second event; the second determination unit 45 determines the second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter represents the correlation of the continuous attributes between the first event and the second event; the third determination unit 47 determines the association coefficient between the first event and the second event according to the first parameter and the second parameter; and the fourth determination unit 49 determines that the first event and the second event are similar events when the association coefficient is greater than the predetermined threshold.
It should be noted that each of the above modules can be implemented in software or hardware. In the latter case, this can be done in the following ways, without being limited to them: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
Embodiment 4
An embodiment of the present invention further provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps in any of the above method embodiments.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, obtain N attributes representing a first event and M attributes representing a second event, respectively, wherein the M and N attributes include discrete attributes and continuous attributes, and M is equal to N, both being positive integers;
S2, determine a first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, wherein the first parameter represents the correlation of the discrete attributes between the first event and the second event;
S3, determine a second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter represents the correlation of the continuous attributes between the first event and the second event;
S4, determine the association coefficient between the first event and the second event according to the first parameter and the second parameter.
Optionally, the storage medium is further configured to store a computer program for executing the following steps:
S1, obtain N attributes representing a first event and M attributes representing a second event, respectively, wherein the M and N attributes include discrete attributes and continuous attributes, and M is equal to N, both being positive integers;
S2, determine a first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, wherein the first parameter represents the correlation of the discrete attributes between the first event and the second event;
S3, determine a second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter represents the correlation of the continuous attributes between the first event and the second event;
S4, determine the association coefficient between the first event and the second event according to the first parameter and the second parameter.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store a computer program.
An embodiment of the present invention further provides an electronic device including a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of the computer program:
S1, obtain N attributes representing a first event and M attributes representing a second event, respectively, wherein the M and N attributes include discrete attributes and continuous attributes, and M is equal to N, both being positive integers;
S2, determine a first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, wherein the first parameter represents the correlation of the discrete attributes between the first event and the second event;
S3, determine a second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter represents the correlation of the continuous attributes between the first event and the second event;
S4, determine the association coefficient between the first event and the second event according to the first parameter and the second parameter.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be performed in an order different from the one given here, or they can each be made into individual integrated circuit modules, or several of the modules or steps can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the principles of the present invention shall fall within its scope of protection.
Claims (10)
1. A method for determining event similarity, characterized by comprising:
obtaining N attributes representing a first event and M attributes representing a second event, respectively, wherein the M and N attributes include discrete attributes and continuous attributes, and M is equal to N, both being positive integers;
determining a first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, wherein the first parameter represents the correlation of the discrete attributes between the first event and the second event;
determining a second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter represents the correlation of the continuous attributes between the first event and the second event;
determining an association coefficient between the first event and the second event according to the first parameter and the second parameter;
determining that the first event and the second event are similar events when the association coefficient is greater than a predetermined threshold.
2. The method according to claim 1, characterized in that determining the first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event comprises:
obtaining a third parameter corresponding to all discrete attributes between the first event and the second event and a fourth parameter corresponding to all discrete attributes between the first event and the second event, wherein the third parameter represents the association relationship of all discrete attributes between the first event and the second event, and the fourth parameter represents the weight coefficients of all discrete attributes between the first event and the second event;
determining the first parameter according to the product of the third parameter and the fourth parameter.
3. The method according to claim 2, characterized in that obtaining the third parameter corresponding to all discrete attributes between the first event and the second event comprises:
when the value of a discrete attribute of the first event is identical to the value of that discrete attribute of the second event, the third parameter takes the value 1;
when the value of a discrete attribute of the first event differs from the value of that discrete attribute of the second event, the third parameter takes the value 0.
4. The method according to claim 2, characterized in that obtaining the fourth parameter corresponding to all discrete attributes between the first event and the second event comprises:
when the value of a discrete attribute in the first event is identical to the value of that discrete attribute in the second event, obtaining the probability of the identical discrete attribute value in a sample event library;
taking the negative logarithm of the probability and using the result as the fourth parameter;
when the value of the discrete attribute in the first event differs from the value of that discrete attribute in the second event, or the discrete attribute appears only in the first event, the fourth parameter takes the value zero.
5. The method according to claim 1, characterized in that determining the second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event comprises:
obtaining a fifth parameter corresponding to all continuous attributes between the first event and the second event and a sixth parameter corresponding to all continuous attributes between the first event and the second event, wherein the fifth parameter represents the association relationship of all continuous attributes between the first event and the second event, and the sixth parameter represents the weight coefficients of all continuous attributes between the first event and the second event;
determining the second parameter according to the product of the fifth parameter and the sixth parameter.
6. The method according to claim 5, characterized in that obtaining the fifth parameter corresponding to all continuous attributes between the first event and the second event comprises:
determining the fifth parameter of each continuous attribute by the following formula:
wherein xmax and xmin are respectively the maximum and minimum values of the continuous attribute in the sample event library, x1 denotes the value of the continuous attribute c in the first event, and x2 denotes the value of the continuous attribute c in the second event.
7. The method according to claim 5, characterized in that obtaining the sixth parameter corresponding to all continuous attributes between the first event and the second event comprises:
determining the sixth parameter of each continuous attribute by the following formula:
wC = -ln(F(x2) - F(x1))
wherein F(x2) is the cumulative distribution function, in the sample event library, of the value x2 of the continuous attribute c, and F(x1) is the cumulative distribution function, in the sample event library, of the value x1 of the continuous attribute c.
8. A device for calculating event similarity, characterized by comprising:
an acquiring unit, configured to obtain N attributes representing a first event and M attributes representing a second event, respectively, wherein the M and N attributes include discrete attributes and continuous attributes, and M is equal to N, both being positive integers;
a first determination unit, configured to determine a first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, wherein the first parameter represents the correlation of the discrete attributes between the first event and the second event;
a second determination unit, configured to determine a second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter represents the correlation of the continuous attributes between the first event and the second event;
a third determination unit, configured to determine an association coefficient between the first event and the second event according to the first parameter and the second parameter;
a fourth determination unit, configured to determine that the first event and the second event are similar events when the association coefficient is greater than a predetermined threshold.
9. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 7.
10. An electronic device, comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910745487.9A CN110443320A (en) | 2019-08-13 | 2019-08-13 | The determination method and device of event similarity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110443320A true CN110443320A (en) | 2019-11-12 |
Family
ID=68435108
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910745487.9A Pending CN110443320A (en) | 2019-08-13 | 2019-08-13 | The determination method and device of event similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110443320A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111125476A (en) * | 2019-12-23 | 2020-05-08 | 北京每日优鲜电子商务有限公司 | Event data processing method and device |
CN112749239A (en) * | 2021-01-20 | 2021-05-04 | 青岛海信网络科技股份有限公司 | Event map construction method and device and computing equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100205108A1 (en) * | 2009-02-11 | 2010-08-12 | Mun Johnathan C | Credit and market risk evaluation method |
CN103198493A (en) * | 2013-04-09 | 2013-07-10 | 天津大学 | Target tracking method based on multi-feature self-adaption fusion and on-line study |
CN106599227A (en) * | 2016-12-19 | 2017-04-26 | 北京天广汇通科技有限公司 | Method and apparatus for obtaining similarity between objects based on attribute values |
CN107169628A (en) * | 2017-04-14 | 2017-09-15 | 华中科技大学 | A kind of distribution network reliability evaluation method based on big data mutual information attribute reduction |
CN107464132A (en) * | 2017-07-04 | 2017-12-12 | 北京三快在线科技有限公司 | A kind of similar users method for digging and device, electronic equipment |
CN108304853A (en) * | 2017-10-10 | 2018-07-20 | 腾讯科技(深圳)有限公司 | Acquisition methods, device, storage medium and the electronic device for the degree of correlation of playing |
KR101983704B1 (en) * | 2017-12-27 | 2019-05-29 | 현대카드 주식회사 | Method for recommending information on websites using personalization algorithm and server using the same |
CN110096708A (en) * | 2019-04-30 | 2019-08-06 | 科大讯飞股份有限公司 | A kind of determining method and device of calibration collection |
Non-Patent Citations (3)
Title |
---|
ASHRAF S ET AL: "New Quadrature-Based Approximations for the Characteristic Function and the Distribution Function of Sums of Lognormal Random Variables", 《IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY》 * |
主编柳杨著: "《数字图像物体识别理论详解与实战》", 31 January 2018 * |
谷淑娟等: "一种改进的 CBR 案例检索相似性度量模型", 《中国管理信息化》 * |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191112 |