CN110443320A - The determination method and device of event similarity - Google Patents

The determination method and device of event similarity

Info

Publication number
CN110443320A
Authority
CN
China
Prior art keywords
event
attribute
parameter
category attributes
connection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910745487.9A
Other languages
Chinese (zh)
Inventor
喻守益
蔡文滨
崔峭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN201910745487.9A priority Critical patent/CN110443320A/en
Publication of CN110443320A publication Critical patent/CN110443320A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures

Abstract

The present invention provides a method and device for determining event similarity. The method comprises: obtaining a first parameter from the discrete attributes among the N attributes of a first event and the discrete attributes among the M attributes of a second event, wherein the first parameter indicates the correlation of the discrete attributes between the first event and the second event; obtaining a second parameter from the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter indicates the correlation of the continuous attributes between the first event and the second event; determining an association coefficient between the first event and the second event from the first parameter and the second parameter; and, when the association coefficient is greater than a predetermined threshold, determining that the first event and the second event are similar events. This solves the problem in the related art that, when determining the correlation between two events, the weights of the correlation coefficients between the two events cannot be obtained simply and effectively, which makes the judgment of the correlation between the two events inaccurate.

Description

Method and device for determining event similarity
Technical field
The present invention relates to the field of event processing, and in particular to a method and device for determining event similarity.
Background
In the policing field, countless behavioral events occur every day. Analyzing the associations between events in depth helps public security officers identify linked (serial) cases more efficiently, and also helps them judge a current event accurately by referring to associated historical events. Computing the association coefficient between two samples determines how strongly two events are correlated. The traditional approach is to compute the distance between the two samples: the smaller the distance, the higher the association coefficient. Common distance measures include Euclidean distance, cosine distance, the Jaccard similarity coefficient, and Mahalanobis distance.
However, these distance measures all assume that every attribute (or variable) carries equal weight, that is, that all attributes are equally important. In practice, attributes differ in importance, and even different values of the same attribute differ in importance. Each attribute value must therefore be weighted before the distance is computed. Traditionally, experts judge the importance of each attribute manually and assign the weights, but this approach is highly subjective: different experts judge differently, and it consumes a great deal of time when there are many attributes.
The prior art offers no effective technical solution to these problems.
Summary of the invention
Embodiments of the present invention provide a method and device for determining event similarity, to at least solve the problem in the related art that, when determining the correlation between two events, the weights of the correlation coefficients between the two events cannot be obtained simply and effectively, which makes the judgment of the correlation between the two events inaccurate.
According to one embodiment of the present invention, a method for determining event similarity is provided, comprising: obtaining N attributes representing a first event and M attributes representing a second event, wherein the M and N attributes comprise discrete attributes and continuous attributes, and M and N are equal positive integers; determining a first parameter from the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, wherein the first parameter indicates the correlation of the discrete attributes between the first event and the second event; determining a second parameter from the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter indicates the correlation of the continuous attributes between the first event and the second event; determining an association coefficient between the first event and the second event from the first parameter and the second parameter; and, when the association coefficient is greater than a predetermined threshold, determining that the first event and the second event are similar events.
Further, determining the first parameter from the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event comprises: obtaining a third parameter and a fourth parameter, each corresponding to all discrete attributes between the first event and the second event, wherein the third parameter indicates the association relationship of all discrete attributes between the first event and the second event, and the fourth parameter indicates the weight coefficients of all discrete attributes between the first event and the second event; and determining the first parameter from the product of the third parameter and the fourth parameter.
Further, obtaining the third parameter corresponding to all discrete attributes between the first event and the second event comprises: when a discrete attribute value of the first event is the same as the corresponding discrete attribute value of the second event, the third parameter takes the value 1; when the two values differ, the third parameter takes the value 0.
Further, obtaining the fourth parameter corresponding to all discrete attributes between the first event and the second event comprises: when a discrete attribute of the first event has the same attribute value as the corresponding discrete attribute of the second event, obtaining the probability of that attribute value in the sample event library and taking the negative logarithm of the probability as the fourth parameter; when the attribute values differ, or the discrete attribute appears only in the first event, the fourth parameter takes the value zero.
Further, determining the second parameter from the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event comprises: obtaining a fifth parameter and a sixth parameter, each corresponding to all continuous attributes between the first event and the second event, wherein the fifth parameter indicates the association relationship of all continuous attributes between the first event and the second event, and the sixth parameter indicates the weight coefficients of all continuous attributes between the first event and the second event; and determining the second parameter from the product of the fifth parameter and the sixth parameter.
Further, obtaining the fifth parameter corresponding to all continuous attributes between the first event and the second event comprises determining the fifth parameter of each continuous attribute by the following formula:
[formula not reproduced in the source text]
where xmax and xmin are respectively the maximum and minimum attribute values of the continuous attribute in the sample event library, x1 is the attribute value of continuous attribute c in the first event, and x2 is the attribute value of continuous attribute c in the second event.
Further, obtaining the sixth parameter corresponding to all continuous attributes between the first event and the second event comprises determining the sixth parameter of each continuous attribute by the following formula:
wC = -ln(F(x2) - F(x1))
where F(x2) is the value of the cumulative distribution function, over the sample event library, at the attribute value x2 of continuous attribute c, and F(x1) is its value at the attribute value x1 of continuous attribute c.
According to another embodiment of the present invention, a device for calculating event similarity is provided, comprising: an acquisition unit, configured to obtain N attributes representing a first event and M attributes representing a second event, wherein the M and N attributes comprise discrete attributes and continuous attributes, and M and N are equal positive integers; a first determination unit, configured to determine a first parameter from the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, wherein the first parameter indicates the correlation of the discrete attributes between the first event and the second event; a second determination unit, configured to determine a second parameter from the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter indicates the correlation of the continuous attributes between the first event and the second event; a third determination unit, configured to determine an association coefficient between the first event and the second event from the first parameter and the second parameter; and a fourth determination unit, configured to determine, when the association coefficient is greater than a predetermined threshold, that the first event and the second event are similar events.
According to yet another embodiment of the present invention, a storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps of any of the above method embodiments.
According to yet another embodiment of the present invention, an electronic device is also provided, comprising a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps of any of the above method embodiments.
With the present invention, N attributes representing a first event and M attributes representing a second event are obtained, wherein the M and N attributes comprise discrete attributes and continuous attributes, and M and N are equal positive integers; a first parameter, indicating the correlation of the discrete attributes between the two events, is determined from the discrete attributes of the first event and of the second event; a second parameter, indicating the correlation of the continuous attributes between the two events, is determined from the continuous attributes of the first event and of the second event; an association coefficient between the first event and the second event is determined from the first parameter and the second parameter; and, when the association coefficient is greater than a predetermined threshold, the first event and the second event are determined to be similar events. This solves the problem in the related art that, when determining the correlation between two events, the weights of the correlation coefficients between the two events cannot be obtained simply and effectively, which makes the judgment of the correlation between the two events inaccurate.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a mobile terminal for a method for determining event similarity according to an embodiment of the present invention;
Fig. 2 is a flowchart of the method for determining event similarity according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the system modules for adaptively computing the association coefficient between events according to a preferred embodiment of the present invention;
Fig. 4 is a structural block diagram of the device for determining event similarity according to an embodiment of the present invention.
Detailed description
The present invention is described in detail below with reference to the drawings and in combination with the embodiments. Note that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another.
Note that the terms "first", "second", and so on in the description, the claims, and the above drawings distinguish similar objects and do not describe a particular order or sequence.
Embodiment 1
The method embodiment provided in Embodiment 1 of this application may be executed on a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, Fig. 1 is a hardware block diagram of a mobile terminal for a method for determining event similarity according to an embodiment of the present invention. As shown in Fig. 1, the mobile terminal 10 may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. Optionally, the mobile terminal may also include a transmission device 106 for communication and an input/output device 108. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 may be used to store computer programs, for example the software programs and modules of application software, such as the computer program corresponding to the method for determining event similarity in the embodiments of the present invention. The processor 102 runs the computer programs stored in the memory 104 to perform various functional applications and data processing, thereby implementing the above method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. A specific example of such a network is the wireless network provided by the communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station and thereby communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.
This embodiment provides a method for determining event similarity that runs on the above mobile terminal or network architecture. Fig. 2 is a flowchart of the method for determining event similarity according to an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps:
Step S202: obtain N attributes representing a first event and M attributes representing a second event, wherein the M and N attributes comprise discrete attributes and continuous attributes, and M and N are equal positive integers.
The attributes of an event may include both discrete attributes and continuous attributes. For example, an event in the policing field may include the time of the crime, the persons involved in the case, and the case type. The case type is a discrete attribute of the event, and the time of the crime is a continuous attribute.
Step S204: determine a first parameter from the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, wherein the first parameter indicates the correlation of the discrete attributes between the first event and the second event.
Note that determining the first parameter may include: obtaining a third parameter and a fourth parameter, each corresponding to all discrete attributes between the first event and the second event, wherein the third parameter indicates the association relationship of all discrete attributes between the first event and the second event, and the fourth parameter indicates the weight coefficients of all discrete attributes between the first event and the second event; and determining the first parameter from the product of the third parameter and the fourth parameter.
Obtaining the third parameter may include: when a discrete attribute value of the first event is the same as the corresponding discrete attribute value of the second event, the third parameter takes the value 1; when the two values differ, the third parameter takes the value 0.
For example, suppose two events each have three discrete attributes. Two of the attributes have the same value in both events, so their entries are 1; the remaining attribute differs, so its entry is 0. The third parameter corresponding to the discrete attributes of the events is then a row vector, namely (1, 1, 0).
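The indicator vector in this example can be sketched as follows. This is a minimal illustration, not the patented implementation: representing an event as a dictionary, and the attribute names and values, are assumptions made only for the example.

```python
def match_vector(event_a, event_b, discrete_keys):
    """Third parameter: 1 where both events carry the same value, else 0."""
    return [
        1 if key in event_a and key in event_b and event_a[key] == event_b[key] else 0
        for key in discrete_keys
    ]


# Hypothetical events with three discrete attributes each.
first = {"case_type": "burglary", "district": "north", "tool": "crowbar"}
second = {"case_type": "burglary", "district": "north", "tool": "knife"}

x_d = match_vector(first, second, ["case_type", "district", "tool"])
print(x_d)
```

An attribute that is missing from either event is treated here as a mismatch, which matches the zero-weight rule for attributes that appear in only one event.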
Note that obtaining the fourth parameter corresponding to all discrete attributes between the first event and the second event may include: when a discrete attribute of the first event has the same attribute value as the corresponding discrete attribute of the second event, obtaining the probability of that attribute value in the sample event library and taking the negative logarithm of the probability as the fourth parameter; when the attribute values differ, or the discrete attribute appears only in the first event, the fourth parameter takes the value zero.
For example, suppose two events each have three discrete attributes, two of which have the same value in both events, with probabilities of 0.6 and 0.3 in the sample event library; their weights are -ln 0.6 ≈ 0.51 and -ln 0.3 ≈ 1.20. The remaining attribute differs, so its weight is 0. The fourth parameter (weight parameter) corresponding to the discrete attributes of the two events is then a row vector, namely (0.51, 1.20, 0).
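A minimal sketch of the weight rule above, assuming the sample event library is a list of dictionaries and that the probability of a value is estimated by its relative frequency in that list. The data and names are hypothetical.

```python
import math
from collections import Counter


def discrete_weights(event_a, event_b, keys, sample_events):
    """Fourth parameter: -ln(p) for matching values, 0 otherwise."""
    weights = []
    for key in keys:
        a, b = event_a.get(key), event_b.get(key)
        if a is None or b is None or a != b:
            weights.append(0.0)  # different or missing value
            continue
        values = [e[key] for e in sample_events if e.get(key) is not None]
        count = Counter(values)[a]
        if count == 0:
            raise ValueError(f"value {a!r} of {key!r} not found in the sample library")
        weights.append(-math.log(count / len(values)))
    return weights


# Hypothetical library: "burglary" has frequency 0.6, "north" has frequency 0.3.
sample = (
    [{"case_type": "burglary", "district": "north", "tool": "crowbar"}] * 3
    + [{"case_type": "burglary", "district": "south", "tool": "knife"}] * 3
    + [{"case_type": "fraud", "district": "south", "tool": "none"}] * 4
)
first = {"case_type": "burglary", "district": "north", "tool": "crowbar"}
second = {"case_type": "burglary", "district": "north", "tool": "knife"}

w_d = discrete_weights(first, second, ["case_type", "district", "tool"], sample)
print([round(w, 2) for w in w_d])
```

Rare matching values receive large weights and common ones small weights, which is what lets the weights adapt to the data without expert input.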
Step S206: determine a second parameter from the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter indicates the correlation of the continuous attributes between the first event and the second event.
Note that determining the second parameter may include: obtaining a fifth parameter and a sixth parameter, each corresponding to all continuous attributes between the first event and the second event, wherein the fifth parameter indicates the association relationship of all continuous attributes between the first event and the second event, and the sixth parameter indicates the weight coefficients of all continuous attributes between the first event and the second event; and determining the second parameter from the product of the fifth parameter and the sixth parameter.
Obtaining the fifth parameter may include determining the fifth parameter of each continuous attribute by the following formula:
[formula not reproduced in the source text]
where xmax and xmin are respectively the maximum and minimum attribute values of the continuous attribute in the sample event library, x1 is the attribute value of continuous attribute c in the first event, and x2 is the attribute value of continuous attribute c in the second event.
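The formula for the fifth parameter does not survive in this text, so the sketch below is only an assumption: it uses a min-max-normalized closeness, 1 - |x1 - x2| / (xmax - xmin), chosen because it depends on exactly the four quantities defined above and yields 1 for identical values. The function name and sample numbers are hypothetical.

```python
def continuous_association(x1, x2, x_min, x_max):
    """Assumed form of the fifth parameter: 1 - |x1 - x2| / (x_max - x_min)."""
    if x_max == x_min:
        return 1.0  # every sample has the same value, so the two events agree
    return 1.0 - abs(x1 - x2) / (x_max - x_min)


# Hypothetical example: hour of day, ranging from 0 to 24 in the sample library.
x_c = continuous_association(10, 14, x_min=0, x_max=24)
print(round(x_c, 3))
```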
Note also that obtaining the sixth parameter corresponding to all continuous attributes between the first event and the second event may include determining the sixth parameter of each continuous attribute by the following formula:
wC = -ln(F(x2) - F(x1))
where F(x2) is the value of the cumulative distribution function, over the sample event library, at the attribute value x2 of continuous attribute c, and F(x1) is its value at the attribute value x1 of continuous attribute c.
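A minimal sketch of this weight using an empirical cumulative distribution: the difference F(x2) - F(x1) is estimated as the fraction of sample values lying between the two attribute values. Ordering the two values and counting both endpoints are assumptions made so that the logarithm is always defined; the sample data are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def continuous_weight(x1, x2, sample_values):
    """Sixth parameter: -ln of the share of samples in [min(x1, x2), max(x1, x2)]."""
    lo, hi = min(x1, x2), max(x1, x2)
    ordered = sorted(sample_values)
    count = bisect_right(ordered, hi) - bisect_left(ordered, lo)
    if count == 0:
        raise ValueError("no sample values fall between x1 and x2")
    return -math.log(count / len(ordered))


# Hypothetical sample library: the values 0, 1, ..., 99.
w_c = continuous_weight(10, 14, range(100))
print(round(w_c, 3))
```

Two values that enclose only a small share of the samples are unusually close, so they earn a large weight; two values that span most of the library earn a weight near zero.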
Step S208: determine the association coefficient between the first event and the second event from the first parameter and the second parameter.
Note that the association coefficient between the first event and the second event can be determined by adding the first parameter and the second parameter.
Step S210: when the association coefficient is greater than a predetermined threshold, determine that the first event and the second event are similar events.
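Steps S208 and S210 can be sketched as below, treating each parameter as a vector with one entry per attribute. The numeric values and the threshold are hypothetical; the patent does not specify a threshold value.

```python
def association_coefficient(x_d, w_d, x_c, w_c):
    """Sum of the discrete part (w_d . x_d) and the continuous part (w_c . x_c)."""
    r_d = sum(w * x for w, x in zip(w_d, x_d))  # first parameter
    r_c = sum(w * x for w, x in zip(w_c, x_c))  # second parameter
    return r_d + r_c


def is_similar(r, threshold):
    """Two events are similar when the coefficient exceeds the threshold."""
    return r > threshold


# Hypothetical vectors for three discrete attributes and one continuous attribute.
r = association_coefficient(
    x_d=[1, 1, 0], w_d=[0.5, 1.2, 0.0],
    x_c=[0.8], w_c=[3.0],
)
print(round(r, 2), is_similar(r, threshold=3.0))
```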
After the first event and the second event have been determined to be similar events, the method may further include: computing the similarity between the first event and every event in the event library; selecting the events whose similarity to the first event is greater than the predetermined threshold; and displaying those events. This makes it convenient for a user to query the events related to the first event.
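The library query described above can be sketched as a simple filter. The scoring function is passed in as a parameter; the stand-in score used here (a count of shared attribute values) and the event records are hypothetical and serve only to make the example runnable.

```python
def find_similar(target, library, score, threshold):
    """Return (coefficient, event) pairs above the threshold, best first."""
    hits = []
    for event in library:
        if event is target:
            continue  # do not compare the event with itself
        r = score(target, event)
        if r > threshold:
            hits.append((r, event))
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


def shared_values(a, b):
    # Hypothetical stand-in score: number of attributes with equal values.
    return sum(1 for key in a if key in b and a[key] == b[key])


library = [
    {"id": 1, "case_type": "burglary", "district": "north"},
    {"id": 2, "case_type": "burglary", "district": "north"},
    {"id": 3, "case_type": "burglary", "district": "south"},
    {"id": 4, "case_type": "fraud", "district": "south"},
]
results = find_similar(library[0], library, shared_values, threshold=0)
print([event["id"] for _, event in results])
```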
Through the above steps, N attributes representing a first event and M attributes representing a second event are obtained, wherein the M and N attributes comprise discrete attributes and continuous attributes, and M and N are equal positive integers; a first parameter, indicating the correlation of the discrete attributes between the two events, is determined from the discrete attributes of the first event and of the second event; a second parameter, indicating the correlation of the continuous attributes between the two events, is determined from the continuous attributes of the first event and of the second event; an association coefficient between the first event and the second event is determined from the first parameter and the second parameter; and, when the association coefficient is greater than a predetermined threshold, the first event and the second event are determined to be similar events. This solves the problems in the prior art that the parameters describing the correlation between two events are usually determined manually, which wastes a great deal of manpower, and that human factors reduce the accuracy of the determined correlation.
Optionally, the above steps may be executed by a server, a terminal, or the like, but are not limited to these.
Optionally, the execution order of steps S204 and S206 may be interchanged; that is, step S206 may be executed first, followed by step S204.
From the description of the above embodiments, a person skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied as a software product stored on a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods of the embodiments of the present invention.
Embodiment 2
Based on the above embodiment, this preferred embodiment also provides an adaptively weighted similarity calculation method for structured events.
Event data records the behavior of a subject; in the policing field, for example, it records each person's daily travel, lodging, internet access, and similar data. In general, an event has three required attribute elements (event subject, time of occurrence, and event name) plus optional attributes that depend on the event name. For a train event, the optional attributes include the departure place, the destination, the train number, and so on; for a lodging event, they include the hotel name, the check-out time, and so on. The attribute structure can therefore differ between different kinds of events. This preferred embodiment proposes an adaptively weighted association coefficient calculation method for events with the same attribute structure. The details are as follows:
1. Calculation of the association coefficient (equivalent to determining the association coefficient between the first event and the second event)
Suppose an event has n attributes, of which nC are continuous attributes and nD are discrete attributes. The association coefficient of two events is r, where r = rC + rD.
For the discrete attributes, rD = wD·xD. First, construct the association degree vector xD (equivalent to the third parameter): for any discrete attribute of the two events, the association degree is 1 if the attribute values are the same and 0 if they differ; the dimension of the vector equals the number of fields. Second, compute the weight vector wD (equivalent to the fourth parameter): for any discrete attribute of the two events, if the attribute values are the same, the weight is -ln p, where p is the probability with which that attribute value occurs among all samples (for example, if the probability is 0.01, the weight is -ln 0.01 ≈ 4.6); if the values differ or a value is missing, the weight is 0. That is, for each discrete attribute d, the weight is given by formula (1):
wD = -ln p if the attribute values of the two events are the same; wD = 0 otherwise (1)
where p is the frequency with which the attribute value occurs in attribute d.
For the continuous attributes, rC = wC·xC. First, construct the association degree vector xC (equivalent to the fifth parameter): for any continuous attribute of the two events, the association degree is computed by formula (2):
[formula (2) not reproduced in the source text]
where xmax and xmin are respectively the maximum and minimum values in the sample event library; their purpose is to remove the influence of outliers.
Second, compute the weight vector wC (equivalent to the sixth parameter). For any continuous attribute of the two events, the weight is computed by formula (3):
wC = -ln(F(x2) - F(x1)) (3)
where F(x) is the cumulative distribution function of attribute c; that is, the weight equals the negative logarithm of the cumulative proportion of samples whose values lie between x1 and x2 (inclusive).
2. Influence of attribute correlation on the association coefficient
For discrete fields, consider the following cases:
Case one: two attributes a and b are mutually independent, where a takes the values 0 and 1 (50% each) and b takes the values 0 and 1 (50% each). Attribute c = 2*a + b takes the values 0 to 3 (25% each).
Here attribute c should provide the same amount of information as a and b together. By the discrete association coefficient calculation above, when two samples A and B have the same value of c, the association coefficient is -ln 0.25, which equals the association coefficient obtained when both a and b match (-ln 0.5 - ln 0.5). This result is consistent with the actual situation.
Case two: two attributes a and b are perfectly correlated, where a takes the values 0 and 1 (50% each) and b = a. Attribute c = 2*a + b takes the values 0 and 3 (50% each).
Here attribute c should provide the same amount of information as a alone or b alone. By the calculation above, when two samples A and B have the same value of c, the association coefficient is -ln 0.5, whereas the association coefficient obtained when both a and b match is -ln 0.5 - ln 0.5, twice as large. This result is inconsistent with the actual situation.
In summary, for discrete attributes the calculation of the association coefficient should take full account of the correlation between attributes, and the same holds for continuous attributes. The correlation between attributes must therefore be removed before the association coefficient is computed.
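The arithmetic of the two cases can be checked directly. The variable names below are illustrative only.

```python
import math

# Case one: a and b independent, so c = 2*a + b has four values at 25% each.
case1_c = -math.log(0.25)                     # weight when c matches
case1_ab = -math.log(0.5) - math.log(0.5)     # weight when a and b both match

# Case two: b = a, so c has two values at 50% each.
case2_c = -math.log(0.5)                      # weight when c matches
case2_ab = -math.log(0.5) - math.log(0.5)     # a and b counted separately

print(math.isclose(case1_c, case1_ab))        # the two routes agree
print(math.isclose(case2_ab, 2 * case2_c))    # the information is double-counted
```

In case two the matching pair (a, b) is credited with twice the information that c carries, even though b adds nothing beyond a.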
There are two alternative methods for removing correlation:
1. Calculate the correlation matrix between fields and filter out attributes whose correlation coefficient exceeds 0.6, retaining only one attribute from each correlated group. This method is relatively simple, but it sacrifices some accuracy in the estimated incidence coefficient.
2. Replace the probability p in wi = -ln p with the conditional probability given the attributes already considered before the i-th attribute. In case two, for example, the weight of a=0 is -ln 0.5; because b is perfectly correlated with a, p(b=0 | a=0) = 1, so the weight of b is -ln 1 = 0. This method calculates the incidence coefficient accurately, but when there are many attributes or when attribute values occur with very low probability, the conditional probability becomes unstable or cannot be calculated.
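The second method can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name conditional_weights is hypothetical, and conditioning each attribute on all attributes before it in the tuple is an assumption about the intended ordering.

```python
import math


def conditional_weights(samples, event):
    """Weight of attribute i is -ln p(value_i | values of attributes 0..i-1)."""
    weights = []
    matching = list(samples)  # samples that agree with the event so far
    for i, value in enumerate(event):
        narrowed = [s for s in matching if s[i] == value]
        if not narrowed:
            # The instability noted in the text: no sample supports the estimate.
            raise ValueError(f"no sample matches the event up to attribute {i}")
        weights.append(-math.log(len(narrowed) / len(matching)))
        matching = narrowed
    return weights
```

With the samples of case two, [(0, 0), (1, 1)], and the event (0, 0), the first weight is -ln 0.5 and the second is zero, as in the example in the text. The shrinking list `matching` also shows why the estimate degrades as the number of attributes grows.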
3. Calculation of attribute correlation
Attribute correlation is calculated in three situations: 1) the correlation between two continuous attributes; 2) the correlation between two discrete attributes; 3) the correlation between a continuous attribute and a discrete attribute.
The correlation between two continuous attributes can be calculated using the Pearson correlation coefficient. If the correlation between two attributes is greater than 0.6, only one of them is retained.
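A minimal sketch of this filtering step for continuous attributes is given below. The function names and the column dictionary are hypothetical. Two choices are assumptions rather than statements from the source: the absolute value of the coefficient is compared with the threshold, and the attribute that appears first is the one retained.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.6):
    """Keep an attribute only if it is not correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

For example, given columns x, y and z where y is almost a multiple of x and z is only weakly related to x, the function keeps x and z.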
The number of distinct values of a discrete attribute varies greatly. Among 1,000,000 samples, for example, a name attribute can take hundreds of thousands of values, while gender takes only two. Converting discrete attributes directly into dummy variables is therefore computationally complex and gives unreliable results. To solve this problem, the information gain ratio is used in place of the correlation coefficient, as shown in formula (4). Likewise, if the correlation between two attributes is greater than 0.6, only one of them is retained.
GainRatio(A, B) = (E(A) - E(A|B)) / E(B) (4)
where E(A) and E(B) denote the information entropy of attributes A and B, and E(A|B) denotes the information entropy of attribute A after the samples are partitioned by the values of attribute B.
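The quantities E(A), E(B) and E(A|B) can be computed as in the following sketch. It assumes the standard form of the information gain ratio, (E(A) - E(A|B)) / E(B), and natural logarithms; the function names are hypothetical and the handling of a constant attribute B is an added convention.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Information entropy of a sequence of discrete values (natural log)."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def conditional_entropy(a, b):
    """Entropy of A within each group of equal B values, weighted by group size."""
    groups = defaultdict(list)
    for x, y in zip(a, b):
        groups[y].append(x)
    n = len(a)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def gain_ratio(a, b):
    """Information gain of A given B, divided by the entropy of B."""
    e_b = entropy(b)
    if e_b == 0:
        return 0.0  # B is constant and carries no information about A
    return (entropy(a) - conditional_entropy(a, b)) / e_b
```

Two identical attributes give a ratio of 1 and two independent attributes give a ratio of 0, so the 0.6 threshold can be applied in the same way as for the Pearson coefficient.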
To calculate the correlation between a continuous attribute and a discrete attribute, the continuous attribute is first converted into a discrete attribute, and the correlation is then calculated in the same way as for two discrete attributes.
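The source does not say how a continuous attribute is converted into a discrete one. The sketch below uses equal-width binning purely as an assumption; the function name equal_width_bins and the default of four bins are hypothetical.

```python
def equal_width_bins(values, bins=4):
    """Map each numeric value to the index of its equal-width bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: a single bin
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]
```

The resulting bin indices can then be treated like any other discrete attribute when the correlation is calculated.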
Fig. 3 is a schematic diagram of the modules of a system for adaptively calculating the incidence coefficient between events. The system comprises three main modules: a quality management module, a preprocessing module and a computing module, which are described in detail below.
The quality management module is mainly responsible for checking data quality. Its functions include filling in or deleting missing values and correcting or deleting abnormal values.
The preprocessing module detects auto-increment IDs, reduces the dimensionality of continuous attributes (to remove interference from correlation) and normalizes features (to unify their scale).
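Two of the preprocessing tasks can be sketched as follows. Both functions are hypothetical illustrations: the source names the tasks but not the methods, so the consecutive-integer test for auto-increment IDs and the min-max scaling are assumptions.

```python
def looks_like_auto_increment_id(values):
    """True if the column consists of consecutive integers in some order."""
    if len(values) < 2 or not all(isinstance(v, int) for v in values):
        return False
    ordered = sorted(values)
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))


def min_max_normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]
```

A column flagged as an auto-increment ID would normally be excluded before similarity is calculated, because each of its values is unique and would dominate a weight of the form -ln p.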
The computing module is responsible for calculating the degree-of-association vector of two events, calculating the correlation coefficients between discrete attributes and reducing the dimensionality accordingly, calculating the weight of each attribute, and calculating the overall degree of association of the two events (equivalent to the method in Embodiment 1).
It should be noted that the computing module can perform the following calculations:
Step 1: calculate the degree-of-association vector x. For any discrete attribute of the two events, the degree of association of the attribute is 1 if the attribute values are the same and 0 if they differ. For continuous attributes, the degree of association is calculated according to formula (1) in the Background Art section. The dimension of the vector equals the number of fields.
Step 2: calculate the correlation between discrete attributes. Discrete attributes whose correlation coefficient is greater than or equal to 0.6 are reduced in dimension, with only one of the correlated dimensions retained.
Step 3: calculate the weight vector w. Discrete attributes are calculated according to formula (2), and continuous attributes according to formula (3).
Step 4: calculate the degree of association r, where r = rC + rD = wC·xC + wD·xD. That is, the degree of association is the sum of the degree of association of the discrete attributes and the degree of association of the continuous attributes.
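Steps 1, 3 and 4 can be combined as in the following sketch (step 2, the removal of correlated attributes, is assumed to have been done already). All names are hypothetical. The discrete weight -ln p follows the negative-logarithm rule of claim 4. Formula (1) is not reproduced in this section, so the continuous degree of association used here, 1 - |x1 - x2| / (xmax - xmin), is an assumption and should be replaced with the actual formula.

```python
import math


def discrete_term(column, v1, v2):
    """w * x for one discrete attribute: x is 1 on a match and 0 otherwise."""
    if v1 != v2:
        return 0.0
    p = column.count(v1) / len(column)  # share of library samples with this value
    if p == 0:
        raise ValueError("attribute value does not occur in the sample library")
    return -math.log(p)


def continuous_term(column, v1, v2):
    """w * x for one continuous attribute."""
    lo, hi = min(v1, v2), max(v1, v2)
    span = max(column) - min(column)
    # ASSUMPTION: stands in for formula (1), which this section does not give.
    x = 1.0 - (hi - lo) / span if span else 1.0
    share = sum(lo <= v <= hi for v in column) / len(column)
    if share == 0:
        raise ValueError("no library sample lies between the two values")
    return -math.log(share) * x  # weight from formula (3) times x


def degree_of_association(library, first, second, discrete_idx, continuous_idx):
    """r = rC + rD, summed over the continuous and discrete attribute positions."""
    columns = list(zip(*library))
    r_d = sum(discrete_term(columns[i], first[i], second[i]) for i in discrete_idx)
    r_c = sum(continuous_term(columns[i], first[i], second[i]) for i in continuous_idx)
    return r_c + r_d
```

The returned value would then be compared with the predetermined threshold to decide whether the two events are similar.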
Embodiment 3
This embodiment further provides a device for determining event similarity. The device is used to implement the above embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiment is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a structural block diagram of the device for determining event similarity according to an embodiment of the present invention. As shown in Fig. 4, the device includes: an acquiring unit 41, a first determination unit 43, a second determination unit 45, a third determination unit 47 and a fourth determination unit 49.
The acquiring unit 41 is configured to obtain N attributes representing a first event and M attributes representing a second event, where the N attributes and the M attributes each include discrete attributes and continuous attributes, and M and N are equal positive integers.
The first determination unit 43 is configured to determine a first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, where the first parameter indicates the correlation of the discrete attributes between the first event and the second event.
The second determination unit 45 is configured to determine a second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, where the second parameter indicates the correlation of the continuous attributes between the first event and the second event.
The third determination unit 47 is configured to determine the incidence coefficient between the first event and the second event according to the first parameter and the second parameter.
The fourth determination unit 49 is configured to determine that the first event and the second event are similar events when the incidence coefficient is greater than a predetermined threshold.
With the above device, the acquiring unit 41 obtains N attributes representing the first event and M attributes representing the second event, where the N attributes and the M attributes each include discrete attributes and continuous attributes, and M and N are equal positive integers; the first determination unit 43 determines the first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, where the first parameter indicates the correlation of the discrete attributes between the first event and the second event; the second determination unit 45 determines the second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, where the second parameter indicates the correlation of the continuous attributes between the first event and the second event; the third determination unit 47 determines the incidence coefficient between the first event and the second event according to the first parameter and the second parameter; and the fourth determination unit 49 determines that the first event and the second event are similar events when the incidence coefficient is greater than the predetermined threshold.
It should be noted that each of the above modules may be implemented in software or hardware. In the latter case, this may be done in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
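For a software implementation, the five units could be arranged as in the sketch below. It is one possible structure and is not taken from the source: the class name, field names and the use of injected callables for the first and second determination units are all hypothetical. The third unit adds the two parameters, following r = rC + rD from the embodiment above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Event = Sequence[object]


@dataclass
class SimilarityDevice:
    discrete_parameter: Callable[[Event, Event], float]    # first determination unit 43
    continuous_parameter: Callable[[Event, Event], float]  # second determination unit 45
    threshold: float                                       # predetermined threshold

    def acquire(self, first: Event, second: Event):
        """Acquiring unit 41: both events must have the same positive number of attributes."""
        if len(first) == 0 or len(first) != len(second):
            raise ValueError("M and N must be equal positive integers")
        return first, second

    def coefficient(self, first: Event, second: Event) -> float:
        """Third determination unit 47: combine the first and second parameters."""
        return self.discrete_parameter(first, second) + self.continuous_parameter(first, second)

    def is_similar(self, first: Event, second: Event) -> bool:
        """Fourth determination unit 49: compare the coefficient with the threshold."""
        first, second = self.acquire(first, second)
        return self.coefficient(first, second) > self.threshold
```

Passing the two parameter calculations in as callables keeps the threshold decision separate from the way the discrete and continuous parameters are obtained.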
Embodiment 4
An embodiment of the present invention further provides a storage medium in which a computer program is stored, where the computer program is configured to execute, when run, the steps in any of the above method embodiments.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1: obtain N attributes representing a first event and M attributes representing a second event, where the N attributes and the M attributes each include discrete attributes and continuous attributes, and M and N are equal positive integers;
S2: determine a first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, where the first parameter indicates the correlation of the discrete attributes between the first event and the second event;
S3: determine a second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, where the second parameter indicates the correlation of the continuous attributes between the first event and the second event;
S4: determine the incidence coefficient between the first event and the second event according to the first parameter and the second parameter.
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store a computer program.
An embodiment of the present invention further provides an electronic device including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of the computer program:
S1: obtain N attributes representing a first event and M attributes representing a second event, where the N attributes and the M attributes each include discrete attributes and continuous attributes, and M and N are equal positive integers;
S2: determine a first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, where the first parameter indicates the correlation of the discrete attributes between the first event and the second event;
S3: determine a second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, where the second parameter indicates the correlation of the continuous attributes between the first event and the second event;
S4: determine the incidence coefficient between the first event and the second event according to the first parameter and the second parameter.
Optionally, for specific examples in this embodiment, refer to the examples described in the above embodiments and optional implementations; they are not repeated here.
Obviously, those skilled in the art should understand that each module or step of the present invention described above can be implemented with a general-purpose computing device. The modules or steps can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be performed in an order different from the one given here, or the modules or steps can each be made into a separate integrated circuit module, or several of them can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above is only a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for determining event similarity, characterized by comprising:
obtaining N attributes representing a first event and M attributes representing a second event, wherein the N attributes and the M attributes each comprise discrete attributes and continuous attributes, and M and N are equal positive integers;
determining a first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, wherein the first parameter indicates the correlation of the discrete attributes between the first event and the second event;
determining a second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter indicates the correlation of the continuous attributes between the first event and the second event;
determining an incidence coefficient between the first event and the second event according to the first parameter and the second parameter; and
determining, when the incidence coefficient is greater than a predetermined threshold, that the first event and the second event are similar events.
2. The method according to claim 1, characterized in that determining the first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event comprises:
obtaining a third parameter corresponding to all the discrete attributes between the first event and the second event and a fourth parameter corresponding to all the discrete attributes between the first event and the second event, wherein the third parameter indicates the association relationship of all the discrete attributes between the first event and the second event, and the fourth parameter indicates the weight coefficients of all the discrete attributes between the first event and the second event; and
determining the first parameter according to the product of the third parameter and the fourth parameter.
3. The method according to claim 2, characterized in that obtaining the third parameter corresponding to all the discrete attributes between the first event and the second event comprises:
when the discrete attribute value of the first event is the same as the discrete attribute value of the second event, setting the third parameter to 1; and
when the discrete attribute value of the first event differs from the discrete attribute value of the second event, setting the third parameter to 0.
4. The method according to claim 2, characterized in that obtaining the fourth parameter corresponding to all the discrete attributes between the first event and the second event comprises:
when the attribute value of a discrete attribute in the first event is the same as the attribute value of that discrete attribute in the second event, obtaining the probability of the identical discrete attribute value in a sample event library;
performing a negative logarithm operation on the probability and taking the result as the fourth parameter; and
when the attribute value of the discrete attribute in the first event differs from the attribute value of that discrete attribute in the second event, or the discrete attribute occurs only in the first event, setting the fourth parameter to zero.
5. The method according to claim 1, characterized in that determining the second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event comprises:
obtaining a fifth parameter corresponding to all the continuous attributes between the first event and the second event and a sixth parameter corresponding to all the continuous attributes between the first event and the second event, wherein the fifth parameter indicates the association relationship of all the continuous attributes between the first event and the second event, and the sixth parameter indicates the weight coefficients of all the continuous attributes between the first event and the second event; and
determining the second parameter according to the product of the fifth parameter and the sixth parameter.
6. The method according to claim 5, characterized in that obtaining the fifth parameter corresponding to all the continuous attributes between the first event and the second event comprises:
determining the fifth parameter of each continuous attribute by the following formula:
wherein xmax and xmin are respectively the maximum and minimum attribute values of the continuous attribute in a sample event library, x1 denotes the attribute value of continuous attribute c in the first event, and x2 denotes the attribute value of continuous attribute c in the second event.
7. The method according to claim 5, characterized in that obtaining the sixth parameter corresponding to all the continuous attributes between the first event and the second event comprises:
determining the sixth parameter of each continuous attribute by the following formula:
wC = -ln(F(x2) - F(x1))
wherein F(x2) is the value of the cumulative distribution function of continuous attribute c in a sample event library at attribute value x2, and F(x1) is the value of the cumulative distribution function of continuous attribute c in the sample event library at attribute value x1.
8. A device for determining event similarity, characterized by comprising:
an acquiring unit, configured to obtain N attributes representing a first event and M attributes representing a second event, wherein the N attributes and the M attributes each comprise discrete attributes and continuous attributes, and M and N are equal positive integers;
a first determination unit, configured to determine a first parameter according to the discrete attributes among the N attributes of the first event and the discrete attributes among the M attributes of the second event, wherein the first parameter indicates the correlation of the discrete attributes between the first event and the second event;
a second determination unit, configured to determine a second parameter according to the continuous attributes among the N attributes of the first event and the continuous attributes among the M attributes of the second event, wherein the second parameter indicates the correlation of the continuous attributes between the first event and the second event;
a third determination unit, configured to determine an incidence coefficient between the first event and the second event according to the first parameter and the second parameter; and
a fourth determination unit, configured to determine, when the incidence coefficient is greater than a predetermined threshold, that the first event and the second event are similar events.
9. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 7.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910745487.9A CN110443320A (en) 2019-08-13 2019-08-13 The determination method and device of event similarity

Publications (1)

Publication Number Publication Date
CN110443320A true CN110443320A (en) 2019-11-12

Family

ID=68435108

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191112
